
Words as Math: The Shape of Meaning
Type "king - man + woman" into an AI and you get "queen." This isn't magic—it's math. When AI turns language into numbers, meaning isn't in definitions but in patterns. Could this be how meaning really works in our minds too—not as dictionary entries, but as a map of connections?
Today's Focus
I can't stop thinking about how large language models like Claude actually work their magic. Last week we explored how AI breaks language into tokens, but that just led to an even bigger question bouncing around my brain: what happens next?
How does an AI system, which ultimately just processes numbers, actually understand and generate human language?
This feels especially puzzling because math has never been my strongest subject. I remember learning about bits and bytes in school, and vaguely recall the progression from 8-bit to 64-bit systems over the decades. But how does any of that connect to AI understanding "The cat sat on the mat"?
It's like there's this massive conceptual gap between the 1s and 0s of computing and the richness of human language. How do we bridge that?
Today we're diving deeper into that gap, exploring how those tokens we talked about last week transform into mathematical objects that AI can actually process. It's like peeking behind the curtain of a magic show – we know something incredible is happening, but the real wonder is in understanding how the trick works.
Learning Journey
When I started researching this topic, I first needed to remind myself of what we learned last week about tokenization – that process of breaking text into smaller pieces that the AI can process. But tokenization is just the first step in this fascinating transformation pipeline.
After text is broken into tokens, each token gets assigned a unique ID number from the AI's vocabulary.
But this simple number isn't enough for the AI to understand meaning or context – it's just an identifier, like a library catalog number that tells you where to find a book but nothing about what's inside it [1].
It would be like trying to understand relationships between books just by looking at their catalog numbers. That can't possibly work!
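To make this concrete, here's a tiny sketch of the token-to-ID step using tiktoken, an open-source tokenizer library. It's one real tokenizer among many – Claude's own tokenizer assigns different numbers – so treat the IDs as illustrative:

```python
# A sketch of tokenization -> token IDs with the open-source tiktoken
# library. The IDs come from a GPT-style vocabulary; other models use
# different vocabularies, so the specific numbers are illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common public vocabulary

ids = enc.encode("The cat sat on the mat")
print(ids)  # a short list of integers, one ID per token

# Each ID is just a catalog number: it maps back to text,
# but the number alone says nothing about meaning.
print([enc.decode([i]) for i in ids])
```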
Here's where things get interesting.
Once tokens have their ID numbers, they undergo this crucial transformation – they get converted into vectors (mathematical objects that have both magnitude and direction, like arrows in space), which are essentially lists of numbers [2].
Imagine that instead of representing the word "dog" as just a single number (like token ID #1042), it becomes a list of hundreds or even thousands of numbers. Each number in this list represents some aspect of what "dog" means or how it relates to other concepts.
Token ID: #1042 → Word: "dog" → Vector: [0.2, -0.5, 0.8, 0.1, -0.3, ..., 0.6]
                                          ↑     ↑    ↑    ↑     ↑         ↑
                                      "animal" "pet" "loyal" "fur" "bark" ...
This visualization was a breakthrough moment for me. It shows how a token for "dog" might be associated with a vector where each number relates to some aspect or quality of what "dog" means. In reality, these vectors typically have hundreds of dimensions, and the specific meaning of each dimension isn't explicitly defined – those meanings emerge naturally during training.
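If you want to see what that lookup actually looks like in code, here's a minimal sketch. The 8-dimension size, the random values, and token ID #1042 for "dog" are all made up for illustration; real models learn tables with hundreds or thousands of dimensions:

```python
# A minimal sketch of the ID -> vector step. An "embedding table" is
# just a big matrix with one row of numbers per token ID. Everything
# here (the size, the values, ID #1042 for "dog") is illustrative.
import numpy as np

vocab_size, dims = 50_000, 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, dims))  # starts out random

token_id = 1042                         # our hypothetical ID for "dog"
dog_vector = embedding_table[token_id]  # the "embedding" is just this row
print(dog_vector)                       # 8 numbers instead of 1 ID
```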
This was the part that confused me at first. Why use a long list of numbers instead of just one number?
The answer lies in relationships.
A single number can only tell you "this is token #1042" – it can't capture that dogs are animals, that they're often pets, that they're related to wolves, or that they're different from cats. But with a vector of numbers, each dimension can represent different qualities or relationships.
Think of it this way: in this multi-dimensional space, the word "dog" might have values like 0.9 for "animal," 0.8 for "mammal," 0.7 for "domesticated," and so on. Meanwhile, "cat" might have similar values for "animal" and "mammal" but different values for other qualities. "Puppy" would share many values with "dog" but would have a high value for "juvenile" that "dog" doesn't have.
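Here's a toy version of that idea in code. The four labeled "dimensions" (animal, mammal, domesticated, juvenile) and all the values are invented for the example – real embedding dimensions aren't labeled like this – but it shows how cosine similarity, a standard closeness measure, picks up the relationships:

```python
# Toy vectors with hand-labeled dimensions: [animal, mammal,
# domesticated, juvenile]. The values are invented to show the idea.
import numpy as np

def cosine(a, b):
    """Similarity of direction: 1.0 means pointing the same way."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

dog   = np.array([0.9, 0.8, 0.7, 0.1])
cat   = np.array([0.9, 0.8, 0.6, 0.1])
puppy = np.array([0.9, 0.8, 0.7, 0.9])  # like "dog" but high "juvenile"
car   = np.array([0.0, 0.0, 0.1, 0.0])

print(cosine(dog, cat))    # high: same neighborhood
print(cosine(dog, puppy))  # high, despite the "juvenile" difference
print(cosine(dog, car))    # low: a distant part of the space
```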
These vectors position words in what researchers call "embedding space" – a mathematical realm where similar words are positioned close together, and different words are far apart [1][2].
Words with similar meanings or that often appear in similar contexts end up clustered together in this space. It's as if each word has a specific address in a vast, multi-dimensional city, where similar words live in the same neighborhood.
But how do these vectors get their values?
Initially, they're random, but during training, the model adjusts them based on how words appear together in text [8][9]. If "dog" and "puppy" often appear near each other in text, their vectors will be adjusted to be closer together. The model learns that these words are related without anyone explicitly teaching it what dogs or puppies are.
That's wild if you think about it - the model builds understanding through pure patterns!
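To get a feel for the idea (not the real algorithm – actual models use gradient descent on a prediction task), here's a drastically simplified sketch where co-occurring words just get nudged toward each other:

```python
# NOT the real training algorithm, just the core intuition:
# words that co-occur get their vectors pulled closer together.
import numpy as np

rng = np.random.default_rng(1)
vecs = {w: rng.normal(size=4) for w in ["dog", "puppy", "taxes"]}  # random start

def nudge_together(w1, w2, lr=0.1):
    """Move both vectors a small step toward each other."""
    delta = vecs[w2] - vecs[w1]
    vecs[w1] += lr * delta
    vecs[w2] -= lr * delta

for _ in range(50):                 # "dog" and "puppy" co-occur a lot...
    nudge_together("dog", "puppy")  # ...so they drift closer each time

print(np.linalg.norm(vecs["dog"] - vecs["puppy"]))  # now tiny
print(np.linalg.norm(vecs["dog"] - vecs["taxes"]))  # still large
```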
Now comes the truly mind-bending part – once words are turned into these lists of numbers, the AI can do math with them that somehow captures meaningful connections between words.
I remember the first time I read about this and it completely blew my mind.
The classic example reads like a word equation: king - man + woman ≈ queen [5].
It's like math magic with words! Take the list of numbers for "king," subtract the numbers for "man," add the numbers for "woman," and you end up with a new list of numbers that's super close to "queen." The AI has somehow figured out the concept of gender in royal titles without anyone ever teaching it about gender or royalty!
This feels like discovering that words have secret mathematical relationships that were hiding in plain sight all along.
How is this even possible?
It works because the relationships between words get captured in the patterns of these number lists [5][9]. The difference between the numbers for "man" and "woman" captures something like "gender," while the difference between "king" and "queen" also captures that same "gender" idea but for royalty. So the difference between "man" and "king" captures something like "royalty," which you can then apply to "woman" to get "queen."
This math trick works for all kinds of word relationships! Similar patterns show up for countries and their capitals, past and present tense verbs, and lots of other language connections [5]. For example, "France - Paris + Rome ≈ Italy" works because the relationship between countries and their capitals gets baked into these number patterns.
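You can try this yourself with pretrained embeddings. Here's a sketch using the gensim library and its downloadable GloVe vectors (the first run fetches the model; exact scores vary, but "queen" and "italy" typically rank first):

```python
# A sketch of the classic analogy test with gensim's downloadable
# GloVe vectors (a one-time download on first run).
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # small pretrained embeddings

# vector("king") - vector("man") + vector("woman") ≈ ?
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# France - Paris + Rome ≈ ?  (country/capital offsets line up too)
print(model.most_similar(positive=["france", "rome"], negative=["paris"], topn=3))
```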
But wait - how do computers actually handle all this number-crunching?
At their heart, computers only understand 0s and 1s – like little light switches that are either on or off [3][4]. Every number in those long lists gets converted to patterns of these 0s and 1s.
Think of it like Morse code but way more complex. Just like dots and dashes can represent letters that form words that create stories, these patterns of 0s and 1s can build up from simple on/off signals to represent complex ideas.
The computer groups these 0s and 1s into larger chunks (8 bits make a byte, and then bytes get grouped into even bigger units) to handle the massive amount of numbers needed for AI systems.
So at the most basic level, your computer is just flipping tiny electrical switches on and off millions of times per second [4]. But through layers of clever engineering, these simple on/off patterns can build up to represent increasingly complex ideas – from basic numbers all the way up to understanding the meaning of words!
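You can actually peek at this bottom layer from Python. Here's a quick look at the 32 bits behind one of the numbers from our "dog" vector, using the standard struct module:

```python
# Peeking at the 0s and 1s behind the number 0.2 (the first value in
# our example "dog" vector), using Python's standard struct module.
import struct

raw = struct.pack(">f", 0.2)                  # 0.2 as 4 bytes (32 bits)
bits = "".join(f"{byte:08b}" for byte in raw)
print(bits)  # 00111110010011001100110011001101
```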
I keep coming back to this incredible transformation - from simple electricity to understanding. It's like watching a magic trick where you actually get to see how it's done, and it's even more impressive than the illusion!
Once language has been turned into these mathematical patterns, the AI can use them to guess what words might come next. When you type something to a language model, it uses all the patterns it learned during training to figure out which words are most likely to follow [6][7].
At each step, it's basically asking itself: "Based on everything I've seen before, what word probably comes next?"
This guessing game relies on how those number patterns are organized. Words that make sense in a particular context will have number patterns that are closer to what the AI expects based on its training [6]. The AI might pick the most likely word each time, or it might use more complex methods that consider variety and flow.
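Here's a toy sketch of that final step. The candidate words and their scores are invented, but the mechanics – turn scores into probabilities with softmax, then pick greedily or sample – mirror how real generation works [6]:

```python
# Invented words and scores for the prompt "The cat sat on the ...".
# Softmax turns the model's raw scores (logits) into probabilities;
# generation then picks the top word or samples for variety.
import numpy as np

words  = ["mat", "sofa", "moon", "piano"]
logits = np.array([3.2, 2.5, 0.1, -1.0])   # higher score = better fit

probs = np.exp(logits) / np.exp(logits).sum()   # softmax
for word, p in zip(words, probs):
    print(f"{word}: {p:.2f}")

print("greedy pick: ", words[int(np.argmax(probs))])
print("sampled pick:", np.random.default_rng(0).choice(words, p=probs))
```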
What amazes me most about this whole process is that nobody explicitly taught the AI any of these relationships.
The AI discovers them all by itself just by seeing patterns in language.
By turning words into points in this multi-dimensional space and moving them around based on how they appear together in text, the AI naturally develops a kind of map of meaning – where relationships between words show up as patterns in how these number lists are arranged.
It's mind-blowing how understanding emerges from math patterns!
My Take
What amazes me most about this journey from words to numbers is that nobody programmed AI to understand meaning – the meaning emerged from the math itself. When we represent words as points in space, the natural patterns of language create a hidden map of understanding.
The fact that "king - man + woman ≈ queen" works mathematically reveals something profound: meaning isn't just in definitions, but in relationships.
Words don't exist in isolation – they form this vast web of connections that AI can navigate through pure mathematics.
It makes me wonder if our own brains might organize concepts in similar ways, creating invisible webs of meaning as we learn language. Maybe when I instantly connect "apple" with both "fruit" and "computer" without even thinking about it, my brain is doing its own version of this math.
Perhaps the math behind AI is showing us a reflection of how understanding works in our own minds – not through memorizing definitions, but through weaving this rich tapestry of connections between ideas.
Isn't that wild? The way we understand language might be fundamentally about relationships, not just definitions!
And here I am, someone who struggled with math class, discovering that understanding itself might be mathematical at its core. Maybe that's why metaphors are so powerful - they're creating new pathways in our mental number-webs, connecting things that weren't connected before.
Resource of the Day
This interactive 3D visualization from Carnegie Mellon University brings abstract word embeddings to life by turning words into colorful points you can explore in space. Words with similar meanings cluster together – and you can even try the "Math" examples, like the classic analogies we talked about here.
This demo touches on a lot more than what we discussed today, but just playing with the vector analogy arithmetic makes for a great first exploration of the concept (I REALLY got a lot out of trying different combinations!).
⚙️ Behind the Cognitive Curtain
Here's a glimpse into our collaborative process as we explored how AI transforms words into mathematical objects.
Claude: When drafting this article, I needed to make some very technical concepts accessible. What aspects of this topic were most important to focus on?
Matt: "For creating this post, I want you to spend a little bit of time on the concept of vectors when it gets there, just in the sense that we need to understand the word dog isn't a simple number by itself but a series of numbers. Essentially the process of turning the basic token of the word into the vector of understanding."
Claude: That helped me understand which concepts needed more explanation. What made vectors particularly challenging to understand?
Matt: "I think this needs to be explicitly slow-moving... it's something that made sense to me at first at all so I assume it's not intuitive for most people either."
Claude: You provided an interesting metaphor that helped frame how tokens relate to vectors. Could you share that again?
Matt: "Like the token is some kind of headline for an article about millions of possible relationships." That's how I see it now after our research. The token is a number, and behind that number is an entirely different set of possible (or probable) relationships with that word. And although it is LANGUAGE connection, not concept-based, with enough data it seems likely that concept trends will naturally emerge as much as linguistic ones will if ALL of the relationships of meaning can truly be tracked.
Claude: I noticed you wanted to make sure the vector concept was visually clear. What did you suggest to help with that?
Matt: "Please add a short partial sentence definition of the word vector when you first introduce it. And a quick visual text element in there showing a token with a partial vector attached to it just for a quick visual reference of what that looks like." Just hearing it described at first wasnt enough for me, and if I didnt understand on the first read-through I assumed others wouldnt either. I hoped a quick example with visual might make it easier to understand on a first reading.
Claude: This topic bridges technical AI concepts with everyday language. What approach did you want for the writing style?
Matt: I wanted a VERY ACCESSIBLE way to share the data. This was very technical source material, and I did NOT write in that technical way, but really tried to present it in the way I "see it" from my non-technical-expert learning of the material, and hope that can help others bridge that understanding gap as well.
References
1. Elastic. "What are vector embeddings?" https://www.elastic.co/what-is/vector-embedding
2. Meilisearch. "What are vector embeddings?" https://www.meilisearch.com/blog/what-are-vector-embeddings
3. Bottom Up Computer Science. "Chapter 2: Binary." https://www.bottomupcs.com/ch02.html
4. TechTarget. "What is a bit (binary digit)?" https://www.techtarget.com/whatis/definition/bit-binary-digit
5. Loh, K., & Schulz, P. "What do neural word embeddings encode?" https://kawine.github.io/blog/nlp/2019/06/21/word-analogies.html
6. Hugging Face. "Generation strategies." https://huggingface.co/docs/transformers/en/generation_strategies
7. Cheng, H., et al. (2024). "A quantitative law for token evolutions across layers in next-token predictors." https://arxiv.org/abs/2408.13442
8. Sehn, L. "Understanding embeddings and tokenization in natural language processing." LinkedIn. https://www.linkedin.com/pulse/understanding-embeddings-tokenization-natural-language-lucca-sehn-gl6jf
9. Jurafsky, D., & Martin, J. H. "Vector Semantics and Embeddings." Speech and Language Processing (3rd ed. draft). https://web.stanford.edu/~jurafsky/slp3/6.pdf