Embeddings: Turning Words into Numbers
How computers understand the meaning of words by mapping them into multi-dimensional space.
If you want a computer to understand language, you can’t just give it strings of text. Computers do math. They need numbers.
Embeddings are the translation layer that converts words (or tokens) into lists of numbers (vectors) that capture their meaning.
The Core Concept: Words as Coordinates
Imagine a 2D graph.
- X-axis: “Royalty”
- Y-axis: “Masculinity”

- King might be at [0.9, 0.9] (Very Royal, Very Masculine)
- Queen might be at [0.9, 0.1] (Very Royal, Not Masculine)
- Man might be at [0.1, 0.9]
- Woman might be at [0.1, 0.1]
Because these are numbers, we can do math with them. The most famous example in NLP history is: $$ \text{King} - \text{Man} + \text{Woman} \approx \text{Queen} $$
If you take the vector for King, subtract the “Man” vector, and add the “Woman” vector, the resulting coordinates land closest to “Queen.”
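With the toy 2D coordinates from the graph above, this arithmetic is a few lines of plain Python (the vectors are illustrative, not real model output):

```python
# Toy 2D embeddings from the graph above: [royalty, masculinity]
vectors = {
    "King":  [0.9, 0.9],
    "Queen": [0.9, 0.1],
    "Man":   [0.1, 0.9],
    "Woman": [0.1, 0.1],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def distance(a, b):
    # Straight-line (Euclidean) distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# King - Man + Woman
result = add(sub(vectors["King"], vectors["Man"]), vectors["Woman"])

# The word whose vector lands closest to the result
closest = min(vectors, key=lambda w: distance(vectors[w], result))
print(closest)  # Queen
```

The result lands at roughly [0.9, 0.1], which is exactly where Queen sits.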
High-Dimensional Space
Real AI models don’t just use 2 dimensions. They use hundreds or thousands.
OpenAI’s text-embedding-3-small uses 1,536 dimensions.
This means every chunk of text is represented by a list of 1,536 numbers. These numbers capture incredibly subtle nuances:
- Syntax
- Tone (happy vs sad)
- Subject matter (biology vs coding)
- Context
Vector Search (Semantic Search)
Embeddings powered the search revolution. Old search engines used Keyword Matching. If you searched “canine diet,” and a page said “dog food,” it might miss it because the words don’t match.
With Vector Embeddings:
- Convert query “canine diet” -> Vector A.
- Convert page “dog food” -> Vector B.
- Calculate how similar they are (Cosine Similarity) between Vector A and Vector B.
- The similarity score is high because “canine” and “dog” are semantically close in the vector space.
The system understands that the concepts are related, even if the words are different.
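A minimal sketch of that comparison in plain Python, using made-up 3-dimensional vectors in place of real model output (actual embeddings would come from a model and have ~1,536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): close to 1.0 means same direction (similar meaning)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real embeddings
query = [0.8, 0.1, 0.2]   # "canine diet"
page = [0.7, 0.2, 0.1]    # "dog food"
other = [0.1, 0.9, 0.8]   # an unrelated page

print(cosine_similarity(query, page))   # high (near 1.0)
print(cosine_similarity(query, other))  # much lower
```

The search system simply ranks pages by this score: “dog food” scores far higher against “canine diet” than the unrelated page does, so it ranks first.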
Visualizing the Space
If you could visualize this 1,536-dimensional space (which humans can’t, but math can), you would see clusters:
- All fruits (apple, pear, banana) are clustered together in one corner.
- All coding terms (python, variable, function) are in another cloud.
- Positive emotions are far away from negative emotions.
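The same idea in miniature: with hypothetical 3D vectors standing in for real embeddings, the nearest neighbours of “apple” turn out to be the other fruits, which is exactly what clustering looks like in vector space.

```python
import math

# Hypothetical toy embeddings; a real model would place related words
# near each other in ~1,536 dimensions instead of 3
words = {
    "apple":    [0.90, 0.80, 0.10],
    "pear":     [0.85, 0.75, 0.15],
    "banana":   [0.80, 0.90, 0.05],
    "python":   [0.10, 0.20, 0.90],
    "variable": [0.15, 0.10, 0.85],
    "function": [0.05, 0.15, 0.95],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sort every other word by distance from "apple"
neighbours = sorted((w for w in words if w != "apple"),
                    key=lambda w: euclidean(words["apple"], words[w]))
print(neighbours[:2])  # ['pear', 'banana'] -- the fruit cluster
```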
Summary
Embeddings are the bridge between human concepts and machine computation. They turn “meaning” into “geometry,” allowing us to calculate the distance between ideas.