Rankly's reverse-engineering of ChatGPT Search's internal pipeline documented the exact moment your content lives or dies. Once ChatGPT decides a web search is needed, a specialised orchestrator model fetches candidate pages, splits them into 128-token chunks, and runs each chunk through a GPU-accelerated embedding model. Every chunk becomes a numerical vector. The query becomes a vector too. The system then scores every chunk against the query using cosine similarity, and completes that entire comparison, across every candidate chunk pulled from every fetched source, in roughly 100 to 200 milliseconds.
That is the actual decision point. Not your domain authority. Not your backlink count. A geometric comparison between two lists of numbers, computed at GPU speed, deciding whether your 128 tokens are close enough in meaning to the user's question to make the final cut. The RAG pipeline guide covered the four-stage retrieval architecture that surrounds this moment. This article opens up the specific mathematical operation at the centre of stage two: what an embedding actually is, how similarity is measured, and what that means for the words you choose to write.
What Is a Vector Embedding, in Plain Terms?
An embedding is a dense list of floating-point numbers, typically 768, 1024, or 1536 of them, that represents the meaning of a piece of text as a point in a high-dimensional mathematical space. The embedding model that produces this list has been trained so that texts with similar meaning end up as points that are geometrically close together, regardless of whether they share any actual words.
The canonical illustration: "How to fix a flat tire" and "Changing a punctured wheel" share zero keywords but describe the same task. A well-trained embedding model places these two sentences' vectors near each other in the embedding space, because it has learned the underlying conceptual relationship, not just the surface text. Meanwhile, "How to fix a flat organisational structure" shares three words with the first sentence but describes something entirely different, and a good embedding model places its vector far away from the tire-repair cluster despite the lexical overlap. This is the mechanism behind the semantic matching described in the RAG pipeline: retrieval by meaning, not by keyword.
Different engines use different embedding models with different dimensionalities. OpenAI's text-embedding-3-small and text-embedding-3-large power much of ChatGPT's retrieval infrastructure. Perplexity operates its own Embeddings API, explicitly documented as producing unnormalized vectors, a technical detail that determines exactly which similarity metric its systems must use, covered below. Google's infrastructure uses its own proprietary embedding models tied to Gemini. The specific model in use is invisible to a content creator, but the underlying mathematical behaviour is consistent enough across all of them that the same writing principles apply regardless of which engine is doing the scoring.
What Does Cosine Similarity Actually Measure?
Cosine similarity measures the angle between two vectors, ignoring their length. A cosine similarity of 1.0 means the two vectors point in exactly the same direction, as close to identical meaning as the model can represent. A score of 0.0 means the vectors are at a 90-degree angle, representing completely unrelated content. Negative values, which are possible but less common in practice, indicate vectors pointing in opposing directions: content that is semantically contrary rather than merely unrelated. In practice, across most production retrieval systems, useful, relevant results tend to fall in the 0.6 to 0.9 range, although the exact band varies with the embedding model.
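To make the three reference points concrete, here is a minimal sketch of the calculation in plain Python. The two-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (vector length is ignored)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2-dimensional vectors, invented for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [3.0, 6.0])   # parallel, different lengths
right_angle = cosine_similarity([1.0, 0.0], [0.0, 5.0])      # 90 degrees apart
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # opposing directions
```

Up to floating-point rounding, the three values come out at 1.0, 0.0, and -1.0. Note that the first pair scores as a perfect match even though one vector is three times longer than the other: only direction counts.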
The reason cosine similarity is the standard choice, rather than a simpler distance measure, comes down to a specific mathematical property that most embedding providers rely on. OpenAI states its embeddings are normalised to a length of 1, which means cosine similarity and Euclidean distance produce mathematically identical rankings for its vectors. Most production systems still default to cosine, or an equivalent dot-product calculation, because for unit-length vectors cosine similarity reduces to a plain dot product, which is simpler and marginally faster to compute at scale. Perplexity's embeddings, by contrast, are explicitly unnormalized, and Perplexity's own documentation is direct about the consequence: teams working with its base64_int8 embedding format must always compare using cosine similarity, never a raw dot product or Euclidean distance, or the resulting rankings will be wrong.
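The difference between normalised and unnormalised vectors is easy to see with toy numbers. The sketch below uses invented two-dimensional vectors and hypothetical chunk names, not output from any real embedding API. It shows a raw dot product rewarding sheer vector length, and then checks that once vectors are scaled to length 1, dot product, cosine, and Euclidean distance all carry the same information.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def unit(a):
    """Scale a vector to length 1 (normalisation)."""
    n = norm(a)
    return [x / n for x in a]

# Toy unnormalised vectors, invented for illustration only.
query = [1.0, 0.0]
chunks = {
    "on_topic_short": [2.0, 0.0],     # same direction as the query, small magnitude
    "off_topic_long": [10.0, 10.0],   # 45 degrees away, large magnitude
}

# A raw dot product picks the long vector; cosine picks the aligned one.
best_by_dot = max(chunks, key=lambda name: dot(query, chunks[name]))
best_by_cosine = max(chunks, key=lambda name: cosine(query, chunks[name]))

# Once every vector has length 1, dot product equals cosine, and squared
# Euclidean distance equals 2 - 2 * cosine, so all three rank identically.
q_unit = unit(query)
c_unit = unit(chunks["off_topic_long"])
squared_distance = sum((x - y) ** 2 for x, y in zip(q_unit, c_unit))
```

Here `best_by_dot` is the off-topic chunk and `best_by_cosine` is the on-topic one, which is the ranking error a raw dot product would introduce on unnormalised vectors.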
This technical distinction matters less for a content creator than the practical implication that follows from it: every major retrieval system, regardless of the specific engine underneath it, is fundamentally asking one question of your content: how close is the angle between what you wrote and what the user asked? Content that embeds into a vector pointing in a meaningfully different semantic direction from the query, even if it uses many of the same keywords, will score poorly. Content that embeds into a vector pointing in a closely aligned direction, even with different surface wording, will score well.
Why Does Approximate Search Matter More Than Exact Search at Scale?
The naive approach to vector search (compute the distance between your query vector and every single stored vector, then return the closest matches) is called exact nearest-neighbour search. It works cleanly for a database of ten thousand vectors. It becomes impractical at the scale AI search operates at: at ten million vectors of 1,536 dimensions each, a single query would require roughly fifteen billion multiply-add operations, a cost that has to be paid again on every query and grows linearly with the size of the index, well beyond what a production system can afford at the latency users expect.
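The fifteen-billion figure is simple multiplication, and the exhaustive scan behind it fits in a few lines. The sketch below is a plain-Python illustration with a tiny index of random vectors standing in for real embeddings; the function name and sizes are chosen for the example.

```python
import math
import random

def exact_nearest(query, vectors, k=3):
    """Exact search: score the query against every stored vector, keep the top k."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:k]

# Cost of one exhaustive query at the scale quoted above:
# one multiply-add per dimension per stored vector.
stored_vectors = 10_000_000
dimensions = 1_536
multiply_adds_per_query = stored_vectors * dimensions  # 15,360,000,000

# Tiny random index, for illustration only (real embeddings are not random).
rng = random.Random(0)
index = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1_000)]
top_matches = exact_nearest(index[42], index, k=3)  # list of (score, vector id)
```

Because the query here is a copy of stored vector 42, that vector comes back as the top match. The work grows in direct proportion to both the number of stored vectors and their dimensionality, which is the scaling problem approximate methods exist to avoid.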
Production retrieval systems instead use approximate nearest-neighbour search, a family of algorithms that trade a small amount of ranking accuracy for a dramatic improvement in speed. The practical consequence for content strategy is subtle but real: because the search is approximate, not exhaustive, a chunk that would technically be the single best match in a perfect exact search might occasionally be missed in favour of a chunk that scores marginally lower but is easier for the approximate algorithm to surface reliably. This is one of the underlying reasons that citation behaviour on the same query can vary slightly between repeated runs: the approximation introduces a small amount of legitimate variance that has nothing to do with your content changing at all.
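One way to picture the trade-off is a partition-based index, sketched below in plain Python. This is an illustrative assumption, not a description of what any particular engine runs: the article's sources do not say which approximate algorithm is in use. Vectors are grouped around a handful of centre points, and a query only scores the vectors in the few groups nearest to it. The randomly sampled centres, the sizes, and the `nprobe` parameter name are all choices made for the example.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    length = dot(v, v) ** 0.5
    return [x / length for x in v]

def build_partitions(vectors, centroids):
    """Assign each vector id to the centroid it is most similar to."""
    partitions = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        partitions[nearest].append(i)
    return partitions

def search(query, vectors, centroids, partitions, nprobe, k=5):
    """Score only the vectors in the nprobe partitions closest to the query."""
    ranked = sorted(range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [i for c in ranked[:nprobe] for i in partitions[c]]
    return sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

# Random unit vectors, for illustration only (real embeddings are not random).
rng = random.Random(0)
vectors = [normalise([rng.gauss(0, 1) for _ in range(16)]) for _ in range(2_000)]
centroids = rng.sample(vectors, 20)   # crude stand-in for trained cluster centres
partitions = build_partitions(vectors, centroids)
query = normalise([rng.gauss(0, 1) for _ in range(16)])

# Probing every partition is an exhaustive scan; probing 3 of 20 is approximate.
exact = search(query, vectors, centroids, partitions, nprobe=len(centroids))
approximate = search(query, vectors, centroids, partitions, nprobe=3)
recall = len(set(exact) & set(approximate)) / len(exact)
```

`recall` is the share of the true top five that the cheaper search also found. Whenever it falls below 1.0, a genuinely better-scoring vector sat in a partition the query never visited, which is exactly the kind of miss described above.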
How Should You Actually Write Content Given How Embeddings Work?
Four practical writing patterns follow directly from the mechanics described above, and each addresses a specific failure mode that shows up repeatedly when content that reads well to a human fails to embed well for a machine.
Write the concept, not just the keyword. Because embedding models cluster by meaning rather than exact wording, content that only uses one narrow phrasing of a concept embeds into a narrower, more fragile region of the vector space. Content that expresses the same idea through a few natural variations, the way a knowledgeable person would explain something multiple ways in conversation, embeds with a richer, more centrally-located vector that scores well against a wider range of query phrasings. This does not mean keyword stuffing with synonyms; it means genuinely explaining a concept from more than one angle within the same section.
Avoid vague, low-information-density language. ZipTie.dev's 2026 analysis of OpenAI's smaller embedding model found it could correctly identify that a passage was broadly about databases with high confidence, but showed far lower confidence pinpointing which specific database question the passage actually answered, a semantic relevance score around 48.6% against a retrieval accuracy of only 39.2%. Vague content produces vague embeddings: correctly clustered at a broad topic level, but positioned too centrally and non-specifically to win a precise cosine similarity contest against a competitor's more specific passage. Specificity is not a stylistic preference under this model. It is what pulls your vector away from the crowded topic centre and toward the precise coordinate the query vector is actually looking for.
Match the query's actual phrasing pattern for your engine. Rankly's analysis of ChatGPT Search found that when a user asks something like "best running shoes for 2026," the system's internal semantic query reformulation shifts the embedding weight disproportionately. In its worked example, roughly 80% of the vector's weight lands on quality-indicator language like "top," "best," and "awards," and only about 20% on the literal product noun. Content that anticipates this weighting, including genuine quality signals, comparative language, and named evidence, not just the product category term, aligns more closely with the actual vector the system is scoring against, not just the surface query text a user typed.
Keep each chunk self-contained at roughly the length the system actually uses. ChatGPT Search chunks at 128 tokens, a little under 100 words. A paragraph that spans a complete, specific claim within that window embeds as a coherent unit. A paragraph that requires the preceding or following paragraph for its meaning to make sense gets embedded as an incomplete fragment, and an incomplete fragment produces a noisier, less confidently-scored vector. The attention mechanism guide covers the related positional effect within a chunk; this is the upstream requirement that the chunk itself must first be a complete semantic unit before positional effects even become relevant.
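A fixed-size chunker makes the risk visible. The sketch below is a simplification under stated assumptions: whitespace-separated words stand in for model tokens (a real tokenizer would usually count more tokens than words), chunks do not overlap, and the function name is invented for the example. What it shows is that chunk boundaries fall wherever the count runs out, not where your paragraphs end.

```python
def chunk_tokens(text, chunk_size=128):
    """Split text into consecutive chunks of at most chunk_size tokens.

    Whitespace-separated words stand in for model tokens here; a real
    tokenizer would usually produce more tokens than words.
    """
    tokens = text.split()
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]

# A hypothetical 300-word page splits into chunks of 128, 128, and 44 words.
page = " ".join(f"word{i}" for i in range(300))
chunks = chunk_tokens(page)
chunk_lengths = [len(chunk.split()) for chunk in chunks]
```

A claim that starts at word 120 and finishes at word 140 would be cut in two by this scheme, with each half embedded separately. Writing paragraphs that complete their point well inside the window is the defence.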
How Does This Interact With Entity Naming and Disambiguation?
Embedding models encode entities as part of the overall semantic vector, and an ambiguous or inconsistent entity reference measurably degrades the precision of that vector. A chunk that says "it improves citation rates significantly" embeds with genuine uncertainty about what "it" refers to, spreading the resulting vector's confidence across multiple possible entity interpretations rather than pointing precisely at one. A chunk that says "FAQPage schema improves AI citation rates" embeds with a sharp, unambiguous entity anchor, producing a vector that a query about FAQPage schema specifically can match against with much higher cosine similarity.
This is the same underlying principle covered from a different angle in the entity disambiguation guide: consistent, explicit naming is not a stylistic nicety. At the embedding layer specifically, it is the difference between a vector that clusters tightly around one clear point in semantic space and a vector that is smeared across several ambiguous possibilities, diluting its similarity score against any single, precisely-worded query.
How NotionCue Helps You Confirm Your Content Is Actually Reaching the Embedding Layer
None of the writing guidance above matters if the content in question never reaches the embedding stage at all. A chunk with a perfectly engineered, specific, entity-anchored claim earns no cosine similarity score at all if the retrieval system's crawler could not fetch the page in the first place, or if the schema and body text disagree so sharply that the extraction pipeline discards the chunk before it is ever embedded.
The NotionCue AI Crawler Audit confirms the prerequisite layer beneath everything covered in this article: whether GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, Claude-SearchBot, and Google-Extended can actually reach your content in the server-rendered HTML response those embedding pipelines fetch from. A page that is technically unreachable never gets a cosine similarity score of any kind, positive or negative. It simply never enters the comparison. Once the audit confirms your content is reaching the crawlers reliably, the writing principles in this guide determine how well it performs from there.
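One narrow slice of reachability can be checked by hand: whether a site's robots.txt rules permit a given crawler at all. The sketch below uses Python's standard-library robots.txt parser against a made-up rules file and a placeholder URL. It is not how the NotionCue audit works, and it says nothing about server-side rendering or fetch errors; it only tests the robots.txt layer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]

def crawler_access(robots_txt, url, user_agents):
    """Report whether robots.txt rules permit each user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}

# Placeholder URL; substitute a real page and that site's real robots.txt.
access = crawler_access(ROBOTS_TXT, "https://example.com/guide", AI_CRAWLERS)
```

With the sample rules above, GPTBot is blocked and the other three crawlers are allowed, so the same page would be eligible for scoring in some engines and invisible to another.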
Start your free NotionCue trial and run the AI Crawler Audit on your highest-priority pages before investing further time in embedding-conscious rewriting. Confirming reachability first avoids optimising prose that never gets the chance to be scored at all.
A quick, free test of your own content's embedding specificity: take a paragraph you believe should earn a citation, and ask ChatGPT or Claude to state, in one sentence, the single specific question that paragraph answers. If the model can only describe the paragraph's broad topic rather than the precise question it resolves, the paragraph is likely producing a vague, low-precision embedding. Rewrite it to lead with the specific claim, and run the same test again.
Frequently Asked Questions About Vector Embeddings and AEO
Can I check which embedding model a specific AI engine uses?
Not reliably for most engines. OpenAI and Perplexity publish some technical documentation about their embedding APIs, but the specific models and configurations used internally for live web-search retrieval are not fully disclosed and can change without notice. The practical approach is to write for the underlying mechanism (semantic specificity, entity clarity, self-contained chunks) rather than trying to reverse-engineer and optimise for one specific model's exact behaviour, since that behaviour is neither fully documented nor stable over time.
Does keyword density still matter at all if embeddings work on meaning?
Marginally, through a different mechanism than embeddings. Most production retrieval systems in 2026 use hybrid search, combining vector similarity with a sparse keyword-matching method like BM25, then merging the two result sets. This means exact-phrase matching on the specific terms a buyer actually types still carries some independent weight, alongside, not instead of, the semantic similarity score. The practical implication is that content should include the literal query phrasing at least once, in addition to the broader conceptual explanation that produces a strong embedding.
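How the two result sets get merged varies by system, and the engines discussed here do not document their method. One widely used technique is reciprocal rank fusion, sketched below with invented page ids; treat it as an example of the general idea rather than a description of any engine's actual merge step.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    k = 60 is a commonly used default, not a value from any specific engine.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for one query.
vector_results = ["page_a", "page_b", "page_c"]    # ranked by embedding similarity
keyword_results = ["page_b", "page_d", "page_a"]   # ranked by BM25 keyword match
merged = reciprocal_rank_fusion([vector_results, keyword_results])
```

In this example `merged` comes out as page_b, page_a, page_d, page_c. The two pages that appear in both lists outrank the two that appear in only one, which is why carrying both a strong embedding and the literal query phrasing pays off under a hybrid scheme.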
Why does the same query sometimes return different citations on repeated attempts?
Approximate nearest-neighbour search, described above, introduces a legitimate small amount of variance by design, trading perfect accuracy for the speed required at production scale. Beyond that, retrieval-based engines like Perplexity fetch live web content on every query, so a page that has been updated, or a competitor's page that has newly appeared, between two identical queries can genuinely change which content scores highest at that specific moment. Both effects mean single-query testing is inherently noisy, which is exactly why repeated, consistent tracking over multiple weeks, rather than a one-off manual check, is the reliable way to measure citation performance.