How AI engines choose what to cite

Every AI answer is built on a handful of sources. Across hundreds of thousands of citations, a clear and surprising pecking order emerges — one where community forums beat the world's best newsrooms, and where each engine reads from its own library.

11 min read · 7 charts · Updated June 2026 · Mentionova Research
Chart: Share of all AI citations, by domain (~150,000 citations across ChatGPT, Perplexity & Google AI)
Source: Semrush / Visual Capitalist analysis of ~150,000 AI citations. Domains can co-occur in a single answer, so shares do not sum to 100%.

When an AI engine answers a question, it is not pulling from "the whole web" in any even way. It leans, heavily, on a short list of sources it has learned to trust — and that list looks almost nothing like the one a journalist or a search marketer would predict.

In an analysis of roughly 150,000 citations across the major engines, one domain towered over the rest: Reddit accounted for about 40% of all AI citations, ahead of Wikipedia (26%) and YouTube (24%). The implication is bracing — the models trust a forum of strangers more than almost any institution.
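Because domains can co-occur in a single answer, these shares are not slices of one pie. A minimal sketch of one way such a share could be computed is below, assuming "share" means the fraction of answers that cite a domain at least once. That definition, the `domain_shares` helper, and the toy data are all illustrative assumptions, not the method used in the cited analysis.

```python
from collections import Counter

# Hypothetical toy data: each inner list holds the domains cited in one AI answer.
answers = [
    ["reddit.com", "wikipedia.org"],
    ["reddit.com", "youtube.com"],
    ["wikipedia.org", "youtube.com", "reddit.com"],
    ["example-news.com"],
    ["reddit.com"],
]


def domain_shares(answers):
    """Fraction of answers citing each domain at least once (assumed definition)."""
    counts = Counter()
    for cited in answers:
        counts.update(set(cited))  # count a domain once per answer
    total = len(answers)
    return {domain: counts[domain] / total for domain in counts}


shares = domain_shares(answers)
for domain, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {share:.0%}")

# With co-occurring domains, the shares can add up to more than 100%.
print(f"sum of shares: {sum(shares.values()):.0%}")
```

In this toy set one answer can credit several domains at once, which is why the total exceeds 100% even though no single domain appears in every answer.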

The newsroom blind spot

It gets sharper. A 5W Research study of U.S. ChatGPT answers found that Wikipedia and Reddit together drive more than 25% of citations — while the Wall Street Journal, the New York Times, and Bloomberg do not appear in the top 20 sources at all. The paywall that protects premium journalism also makes it invisible to the machines now summarizing the world.

0 top-tier newsrooms in ChatGPT's top 20. In the 5W study of U.S. ChatGPT citations, WSJ, NYT and Bloomberg were absent from the twenty most-cited sources, outranked by forums, wikis, and review sites.

The models don't cite the best-written source. They cite the most accessible, structured, consensus-shaped one.

Each engine reads a different library

"AI visibility" is not one score, because the engines don't share a reading list. ChatGPT skews toward Wikipedia-style reference material. Perplexity, built citation-first, leans hardest on Reddit and forums. Google's AI Overviews — unsurprisingly — pull from the same web Google already indexes, with Reddit prominent. Claude is the most conservative, citing less and demanding cleaner sourcing.

Chart: Where each engine looks (relative citation emphasis by source type)
Darker = stronger reliance on that source type. Directional, synthesized from the platform citation studies below.

This is why a brand can be the default answer on Perplexity and completely absent on Gemini for the same question. There is no "rank" to check — there are six verdicts, and they routinely disagree.

Chart: Leading source by engine (most-cited single domain, % of citations)
Perplexity and Google AI Overviews both lead with Reddit; ChatGPT leads with Wikipedia. In ChatGPT's U.S. answers, Wikipedia (13.2%) and Reddit (12.0%) alone make up roughly a quarter of all citations (5W Research).

Citations are volatile

The most important — and least appreciated — fact about AI citations is that they move. In September 2025, Reddit's share of ChatGPT citations collapsed from roughly 60% to about 10% in two weeks, almost certainly the result of a quiet change in how the model sourced answers. Nobody got a memo. Brands that had built their whole presence on Reddit threads simply… vanished from the answer.

Chart: Reddit's share of ChatGPT citations, 2025 (a two-week collapse)
A single sourcing change erased most of one platform's reliance on the web's most-cited domain. This is why visibility has to be monitored, not assumed.
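To make "monitored, not assumed" concrete, here is a rough sketch of a collapse detector: it scans a series of share readings and flags any span of `window` readings over which the share fell by at least `min_drop`. The function name, the daily cadence, the thresholds, and the series itself are assumptions chosen for illustration. The series only mimics the shape of the roughly 60% to 10% slide described above and is not measured data.

```python
def find_collapses(daily_share, window=14, min_drop=0.30):
    """Return (start_index, end_index, drop) for each span of `window` readings
    over which the share fell by at least `min_drop` (absolute, 0.30 = 30 points)."""
    alerts = []
    for end in range(window, len(daily_share)):
        start = end - window
        drop = daily_share[start] - daily_share[end]
        if drop >= min_drop:
            alerts.append((start, end, round(drop, 2)))
    return alerts


# Illustrative series only: flat near 60%, a linear slide to 10% over 14 days, then flat.
series = [0.60] * 10 + [0.60 - 0.50 * d / 14 for d in range(1, 15)] + [0.10] * 5

alerts = find_collapses(series)
first = alerts[0]
print(f"first alert: day {first[0]} -> day {first[1]}, drop of {first[2]:.0%}")
```

A fixed absolute threshold is the simplest possible rule. A real monitor would likely also compare against normal day-to-day noise before raising an alert.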

A citation you can't see change is a citation you can lose overnight. That is why we track on a two-hour clock.

What actually earns a citation

Across the studies, the patterns behind getting cited are consistent even as the specific domains churn:

  • Consensus & community. Forums and wikis win because they read as many-voices agreement, not one brand's claim.
  • Structure. Clean headings, lists, and tables are machine-readable; walls of prose are not.
  • Accessibility. If a crawler can't read it (paywall, JS, PDF maze), it can't be cited — full stop.
  • Specific, checkable facts. Numbers and named comparisons get lifted into answers far more than adjectives.
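As a rough illustration of the structure and checkable-facts points above, the sketch below counts headings, list items, table rows, and numeric tokens in an HTML fragment using Python's standard-library parser. The `StructureCounter` class, the choice of signals, and the sample markup are hypothetical. Nothing here reflects how any engine actually scores a page; it is only a quick way to see how much machine-readable structure a page exposes.

```python
from html.parser import HTMLParser
import re


class StructureCounter(HTMLParser):
    """Counts headings, list items, and table rows, and collects text nodes."""

    def __init__(self):
        super().__init__()
        self.counts = {"headings": 0, "list_items": 0, "table_rows": 0}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.counts["headings"] += 1
        elif tag == "li":
            self.counts["list_items"] += 1
        elif tag == "tr":
            self.counts["table_rows"] += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def structure_signals(html):
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    signals = dict(parser.counts)
    # Rough proxy for "specific, checkable facts": numeric tokens in the text.
    signals["numbers"] = len(re.findall(r"\d+(?:\.\d+)?", text))
    return signals


# Hypothetical page fragment for illustration.
sample = """
<h2>Plan comparison</h2>
<ul><li>Starter: 3 seats</li><li>Team: 25 seats</li></ul>
<table><tr><td>Uptime</td><td>99.9%</td></tr></table>
"""
print(structure_signals(sample))
```

The counts only describe markup the parser can see in the raw HTML, so content rendered by JavaScript or locked behind a paywall would register as empty, which is the accessibility point in miniature.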

We turned those findings into a step-by-step method in the next report: the GEO playbook, built on the Princeton benchmark, where the tested optimizations lifted AI visibility by up to 40%. The macro picture behind all of it lives in The State of AI Search, 2026.

Sources

  1. Semrush / Visual Capitalist — analysis of ~150,000 AI citations (Reddit 40.1%, Wikipedia 26.3%, YouTube 23.5%)
  2. 5W Research / PR Newswire — Wikipedia & Reddit drive 25%+ of U.S. ChatGPT citations
  3. Profound — AI platform citation patterns (ChatGPT, AI Overviews, Perplexity)
  4. ZipTie — Why Reddit dominates AI citations
  5. Search Engine Roundtable — ChatGPT sources Wikipedia, Google AIO sources Reddit
  6. Am I Cited — Reddit citation volatility, 2025