Why does ChatGPT cite some brands and not others for the same query? The mechanism is more knowable than "AI is a black box" suggests. This piece walks through the actual decision pipeline LLMs run when answering a real-world recommendation query, plus the six signals that determine which sources end up in the response.
The framework synthesizes the 2024 Princeton GEO paper (which empirically tested signal effects across 3 LLMs and 40K queries) with ~18 months of practitioner data from real client audits. It's not theory. It's the model that holds up when you reverse-engineer hundreds of cited responses across categories.
The 4-stage decision pipeline
Every LLM that produces a recommendation runs roughly the same pipeline, with implementation details varying per engine. Understanding the pipeline is the precondition for understanding the signals.
Stage 1: Query understanding
The LLM parses your input into intent + entities + constraints. It decides whether to run a real-time web search or rely on training data. Triggers for web search include:
- Time-sensitive markers ("current", "latest", "2026", "today")
- Recommendation queries ("best X for Y")
- Comparison queries ("X vs Y")
- Local or specific-region queries
- Queries where the model has low confidence from training alone
If web search is triggered, the LLM moves to retrieval. If not, the response comes purely from training data, and your brand's visibility depends on what was in the model's training corpus.
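A minimal sketch of what that trigger logic could look like. The real classifiers inside each engine aren't public, so the regex patterns and thresholds here are illustrative assumptions only:

```python
import re

# Hypothetical heuristics approximating the web-search triggers above.
# Each engine uses its own internal classifier; these patterns are illustrative.
TRIGGER_PATTERNS = {
    "time_sensitive": r"\b(current|latest|today|this (week|month|year)|20\d\d)\b",
    "recommendation": r"\bbest\b.+\bfor\b",
    "comparison":     r"\b\w+\s+vs\.?\s+\w+\b",
    "local":          r"\b(near me|in [A-Z][a-z]+)\b",
}

def should_search_web(query: str) -> bool:
    """Return True if the query matches any pattern suggesting live retrieval."""
    return any(re.search(p, query, re.IGNORECASE) for p in TRIGGER_PATTERNS.values())

print(should_search_web("best CRM for small agencies in 2026"))  # True
print(should_search_web("explain how photosynthesis works"))     # False
```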
Stage 2: Retrieval
The LLM runs one or more web searches (Bing for ChatGPT/Copilot, Google for Gemini, Perplexity's own index for Perplexity, etc.). It typically retrieves 10–30 results and optionally re-ranks them by semantic match to the original query.
Critical implication: to be available for citation, you must first be retrievable. That means good SEO underpins GEO. Pages that don't rank in the underlying search engine's top results almost never make it into the LLM's candidate set.
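A toy illustration of the re-ranking step, assuming you already have raw results from the underlying search engine. Cosine similarity over TF-IDF vectors stands in here for whatever semantic model the engine actually uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank(query: str, results: list[dict], top_k: int = 10) -> list[dict]:
    """Re-order retrieved results by semantic match to the original query.

    `results` is assumed to be a list of {"url": ..., "snippet": ...} dicts
    coming from the underlying search engine (Bing, Google, etc.).
    """
    texts = [query] + [r["snippet"] for r in results]
    vectors = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(vectors[0], vectors[1:]).ravel()
    ranked = sorted(zip(scores, results), key=lambda x: x[0], reverse=True)
    return [r for _, r in ranked[:top_k]]
```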
Stage 3: Source selection
From the retrieved candidates, the LLM picks 3–7 sources to ground its response. This is where the six signals (below) actually operate. Source selection optimizes for:
- Coverage. Together, the picked sources should answer the whole query
- Diversity. Three sources saying the same thing add nothing; one unique source gets cited specifically
- Authority. Established domains weighted higher, but less heavily than in Google ranking
- Freshness. For time-sensitive queries, recency matters a lot
- Confidence. Sources the LLM "trusts" based on whatever proxies it has
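One way to picture how those criteria combine is as a weighted score over candidate sources. The fields and weights below are illustrative assumptions, not any engine's actual formula:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    relevance: float   # semantic match to the query, 0-1
    authority: float   # domain-strength proxy, 0-1
    age_days: int      # days since last update
    novelty: float     # share of claims no other candidate covers, 0-1

# Illustrative weights only; each engine tunes its own trade-offs.
WEIGHTS = {"relevance": 0.35, "authority": 0.2, "freshness": 0.2, "novelty": 0.25}

def score(c: Candidate, time_sensitive: bool) -> float:
    freshness = max(0.0, 1.0 - c.age_days / 365)
    if not time_sensitive:
        freshness = 0.5 + freshness / 2   # freshness matters less for evergreen queries
    return (WEIGHTS["relevance"] * c.relevance
            + WEIGHTS["authority"] * c.authority
            + WEIGHTS["freshness"] * freshness
            + WEIGHTS["novelty"] * c.novelty)

def select_sources(candidates: list[Candidate], time_sensitive: bool, k: int = 5) -> list[Candidate]:
    """Pick the top-k candidates to ground the response on."""
    return sorted(candidates, key=lambda c: score(c, time_sensitive), reverse=True)[:k]
```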
Stage 4: Synthesis
The LLM generates the response, weaving information from the selected sources. Attribution is shown as inline citations, numbered footnotes, or a sources panel depending on engine. Brand mentions can be:
- Bold inline. "I recommend **AcmeSaaS**." The strongest form of citation
- Listed. Appearing in an enumerated list of options
- Referenced indirectly. "a popular Slovenian platform" without naming you (counts as a miss in your audit)
- Linked but not named. Your URL in the citations panel but no mention in the response body (partial win)
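For audit purposes, these outcomes can be checked mechanically against a captured response. A rough sketch, assuming your logging captures the markdown response text and the citation URLs (indirect references can't be detected by brand name, so they fall into "miss"):

```python
def classify_mention(brand: str, domain: str, response_text: str, citation_urls: list[str]) -> str:
    """Classify how a brand shows up in one captured LLM response, per the four outcomes above."""
    text = response_text.lower()
    bolded = f"**{brand.lower()}**" in text          # assumes markdown-formatted capture
    named = brand.lower() in text
    linked = any(domain in url for url in citation_urls)

    if bolded:
        return "bold inline"
    if named:
        return "listed / named"
    if linked:
        return "linked but not named"   # partial win
    return "miss"   # includes indirect references you can't detect by brand name alone
```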
Your GEO work targets Stage 3 (source selection). That's the addressable point in the pipeline.
The 6 signals that move citation
The Princeton paper tested ~20 signal candidates and found six that consistently moved citation rate by 30–40% per signal. Practitioner audits since have confirmed and refined these. In rough order of leverage:
1. Authoritative quoting (cite others to be cited)
Pages that themselves cite credible external sources get cited more by LLMs. This is counterintuitive but empirically robust. LLMs use "does this source cite others" as a proxy for "is this source intellectually honest."
Tactical: every content page should include 2–5 external citations to authoritative sources (research, government data, established industry reports). It costs nothing and meaningfully lifts citation eligibility.
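A quick way to check that tactical point on an existing page: count outbound links to domains other than your own. A sketch using only the standard library, assuming the page is plain static HTML (the threshold of 2–5 is the target above, not something the code enforces):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def external_citation_count(page_url: str) -> int:
    """Count distinct external domains the page links out to."""
    own_domain = urlparse(page_url).netloc
    html = urlopen(page_url).read().decode("utf-8", errors="ignore")
    parser = LinkCollector()
    parser.feed(html)
    external = {urlparse(h).netloc for h in parser.hrefs
                if urlparse(h).netloc and urlparse(h).netloc != own_domain}
    return len(external)
```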
2. Statistics and data
Pages with original numbers (research, surveys, internal data) are cited 30–40% more than narrative pages on the same topic. LLMs strongly prefer sources where they can pull a number to anchor the response.
Tactical: every brand should publish at least one "X by the numbers" page per quarter. It doesn't have to be a major research project. Even a small internal data point + clear visualization beats a fluffy blog post 10× for citation purposes.
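If you want a rough gauge of whether a page actually contains data points (rather than incidental digits), a crude regex heuristic is enough for audit triage. Illustrative only; it will miss plenty of formats:

```python
import re

# Rough heuristic: count numbers that look like data points (percentages,
# currency amounts, large figures) rather than incidental digits.
DATA_POINT = re.compile(r"\b\d[\d,.]*\s*(%|percent|million|billion)", re.IGNORECASE)

def data_point_count(text: str) -> int:
    return len(DATA_POINT.findall(text))

print(data_point_count("Churn fell 23% after onboarding changes; ARR reached 1.2 million."))  # 2
```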
3. Fluency and clarity
Well-written, scannable content beats walls of text. The LLM has to extract usable text. If your prose is dense, ambiguous, or full of marketing fluff, it's harder to extract. Pages that score well on Flesch readability metrics also score well on citation rate.
Tactical: short paragraphs (3–4 sentences), descriptive subheads, bullets over walls of prose, plain language over marketing copy.
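Flesch reading ease is straightforward to compute yourself. This sketch uses a crude vowel-group syllable counter, so treat the output as a rough signal rather than an exact score:

```python
import re

def syllables(word: str) -> int:
    """Very rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables(w) for w in words) / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Scores above roughly 60 generally read as plain, scannable prose.
print(round(flesch_reading_ease("Short paragraphs win. Plain language wins. Cut the fluff."), 1))
```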
4. Comprehensiveness
Pages that cover a topic fully are cited over pages that cover only part of it. LLMs prefer one comprehensive source they can extract from over multiple fragments they have to combine. This is why "definitive guide"-style content outperforms short blog posts in GEO measurement.
Tactical: pillar pages that answer the entire question. Not a teaser leading to gated content.
5. Freshness
Heavy weight for time-sensitive queries. A page updated 2 months ago beats a page updated 18 months ago on the same topic. For evergreen topics, freshness matters less but still measurably.
Tactical: review and update top-traffic pages quarterly. Update the <lastmod> in the sitemap. Add an explicit "updated MM/YYYY" line in the article.
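The sitemap update can be scripted. A sketch using the standard library, assuming a standard sitemap.xml layout; the file path and URL in the commented call are placeholders:

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def touch_lastmod(sitemap_path: str, page_url: str) -> None:
    """Set <lastmod> to today for one URL in a standard sitemap.xml."""
    ET.register_namespace("", NS["sm"])
    tree = ET.parse(sitemap_path)
    for url in tree.getroot().findall("sm:url", NS):
        loc = url.find("sm:loc", NS)
        if loc is not None and loc.text.strip() == page_url:
            lastmod = url.find("sm:lastmod", NS)
            if lastmod is None:
                lastmod = ET.SubElement(url, f"{{{NS['sm']}}}lastmod")
            lastmod.text = date.today().isoformat()
    tree.write(sitemap_path, xml_declaration=True, encoding="utf-8")

# touch_lastmod("sitemap.xml", "https://example.com/pricing-guide")  # placeholder path and URL
```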
6. Semantic uniqueness
If your content says the same thing as everyone else's content, you're fungible. The LLM picks the highest-authority source. If your content says something the others don't, that's where you get specifically cited.
Tactical: pick one strong opinion or framing per article that nobody else in your category has stated as clearly. That's the citation hook. Bland consensus content is invisible.
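Semantic uniqueness is hard to measure exactly, but exact phrase overlap with competitor pages is a workable proxy for triage. A sketch using word shingles; real semantic overlap would need an embedding model, so treat this as a lower bound on how derivative your content is:

```python
import re

def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Break text into overlapping n-word shingles."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def uniqueness(your_text: str, competitor_texts: list[str]) -> float:
    """Share of your page's shingles that appear in no competitor page (1.0 = fully unique)."""
    yours = shingles(your_text)
    if not yours:
        return 0.0
    others = set().union(*(shingles(t) for t in competitor_texts)) if competitor_texts else set()
    return len(yours - others) / len(yours)
```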
Stack effects
These signals stack multiplicatively, not additively. A page with all six gets cited far more than six times as often as a page with one. From practitioner data:
| Signals present | Relative citation rate |
|---|---|
| 0–1 | Baseline (1×) |
| 2–3 | ~2× baseline |
| 4–5 | ~5× baseline |
| All 6 | ~9× baseline |
The implication: the marginal value of moving from "some signals" to "all signals" is enormous. Most pages that miss citation are missing 4+ signals, not just one. The fix is rarely a single tweak. It's a coordinated rebuild.
What about the off-site signals?
The six above are on-page signals. Off-site signals (third-party citations, comparison articles, Reddit/Quora presence, G2/Capterra listings) operate through a separate mechanism: they affect which pages and sources show up in the retrieval set in the first place.
That's why the 5 gap patterns include both kinds. Three of them (Missing stats page, Comparison gap, Unparseable site) are on-page. Two (Reddit vacuum, Empty third-party listings) are off-site. The audit catches both because both affect citation rate, just at different stages of the pipeline.
How to use this in practice
Pick any page on your site you'd like to be cited for a specific query. Score it 0–6 on the signals above. Be honest. Most pages score 1–2.
- Adds citations to other sources? +1
- Contains original or specific data? +1
- Reads cleanly, scans well? +1
- Comprehensively answers the topic? +1
- Updated within last 6 months? +1
- Says something other sources don't? +1
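If you run this audit across many pages, it helps to make the checklist explicit. A sketch where each check is a yes/no judgment you record manually, not something detected automatically:

```python
from dataclasses import dataclass, fields

@dataclass
class SignalChecklist:
    cites_external_sources: bool   # 2-5 links to authoritative sources
    has_original_data: bool        # original or specific numbers
    reads_cleanly: bool            # short paragraphs, subheads, bullets
    answers_topic_fully: bool      # comprehensive, not a teaser
    updated_recently: bool         # within the last 6 months
    says_something_unique: bool    # a claim or framing competitors lack

    def score(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))

page = SignalChecklist(True, False, True, False, False, False)
print(f"{page.score()}/6 signals present")  # most pages honestly land at 1-2
```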
For each missing signal, you have a specific, addressable lever. That's the real product of understanding the mechanism: clarity on what to actually change.
The black-box framing of AI is partly true (we don't know the exact internal weights) but mostly a marketing trope. The mechanism is observable, the signals are testable, and the work is real engineering. The agencies that win this category over the next 24 months will be the ones operating at this level of mechanism awareness. Not the ones selling "AI optimization" as a vague service.