How AI engines choose what to cite
Every AI answer is built on a handful of sources. Across well over a hundred thousand citations, a clear and surprising pecking order emerges: one where community forums beat the world's best newsrooms, and where each engine reads from its own library.
When an AI engine answers a question, it does not draw evenly from "the whole web." It leans heavily on a short list of sources it has learned to trust, and that list looks almost nothing like the one a journalist or a search marketer would predict.
In an analysis of roughly 150,000 citations across the major engines, one domain towered over the rest: Reddit accounted for about 40% of all AI citations, ahead of Wikipedia (26%) and YouTube (24%). The implication is bracing: the models lean on a forum of strangers more than on almost any institution.
The newsroom blind spot
It gets sharper. A 5W Research study of U.S. ChatGPT answers found that Wikipedia and Reddit together drive more than 25% of citations — while the Wall Street Journal, the New York Times, and Bloomberg do not appear in the top 20 sources at all. The paywall that protects premium journalism also makes it invisible to the machines now summarizing the world.
The models don't cite the best-written source. They cite the most accessible, structured, consensus-shaped one.
Each engine reads a different library
"AI visibility" is not one score, because the engines don't share a reading list. ChatGPT skews toward Wikipedia-style reference material. Perplexity, built citation-first, leans hardest on Reddit and forums. Google's AI Overviews — unsurprisingly — pull from the same web Google already indexes, with Reddit prominent. Claude is the most conservative, citing less and demanding cleaner sourcing.
This is why a brand can be the default answer on Perplexity and completely absent on Gemini for the same question. There is no "rank" to check — there are six verdicts, and they routinely disagree.
Citations are volatile
The most important, and least appreciated, fact about AI citations is that they move. In September 2025, Reddit's share of ChatGPT citations collapsed from roughly 60% to about 10% in two weeks, most likely the result of a quiet change in how the model sourced answers. Nobody got a memo. Brands that had built their whole presence on Reddit threads simply vanished from the answer.
A citation you can't see change is a citation you can lose overnight. That is why we track citations on a two-hour clock.
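To make the volatility point concrete, here is a minimal Python sketch of the arithmetic a citation tracker might run on each snapshot: compute every domain's share of citations, then flag domains whose share fell sharply between two snapshots. The snapshot format (a flat list of cited domains collected over a fixed set of prompts), the function names, and the 50% relative-drop threshold are hypothetical choices for illustration, not a description of any production tracker. The toy data mirrors the 60% to 10% Reddit drop described above.

```python
from collections import Counter


def citation_share(cited_domains: list[str]) -> dict[str, float]:
    """Return each domain's share of all citations in one snapshot.

    Hypothetical input: one entry per citation observed across a fixed
    panel of prompts at a single point in time.
    """
    counts = Counter(cited_domains)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: n / total for domain, n in counts.items()}


def flag_collapses(
    before: list[str], after: list[str], min_drop: float = 0.5
) -> dict[str, tuple[float, float]]:
    """Flag domains whose share fell by at least `min_drop` (relative) between snapshots.

    Returns {domain: (old_share, new_share)}. Domains that gained share are ignored;
    a domain missing from the later snapshot is treated as having a share of 0.
    """
    share_before = citation_share(before)
    share_after = citation_share(after)
    flagged = {}
    for domain, old in share_before.items():
        new = share_after.get(domain, 0.0)
        if old > 0 and (old - new) / old >= min_drop:
            flagged[domain] = (old, new)
    return flagged


if __name__ == "__main__":
    # Toy snapshots (illustrative data, not real measurements).
    before = ["reddit.com"] * 6 + ["wikipedia.org"] * 4
    after = ["reddit.com"] * 1 + ["wikipedia.org"] * 9
    print(flag_collapses(before, after))  # {'reddit.com': (0.6, 0.1)}
```

A relative threshold is used so that a fall from 60% to 10% and a fall from 6% to 1% are both caught; a real tracker would also want minimum sample sizes before trusting either number.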
What actually earns a citation
Across the studies, the patterns behind getting cited are consistent even as the specific domains churn:
- Consensus & community. Forums and wikis win because they read as many-voices agreement, not one brand's claim.
- Structure. Clean headings, lists, and tables are machine-readable; walls of prose are not.
- Accessibility. If a crawler can't read it (paywall, JS, PDF maze), it can't be cited — full stop.
- Specific, checkable facts. Numbers and named comparisons get lifted into answers far more than adjectives.
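The accessibility point can be spot-checked for one of its failure modes, content that only appears after JavaScript runs. Below is a minimal Python sketch, assuming you already have a page's raw server response (the HTML a crawler that does not execute JavaScript would see) and a hypothetical list of key facts you want cited. It uses the standard-library `html.parser` to collect text outside `<script>` and `<style>` and reports which facts are missing. The function and class names are illustrative, and this is a heuristic only: it says nothing about paywalls, PDFs, or whether a given crawler is allowed to fetch the page.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text from raw HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so tag boundaries don't matter."""
    return " ".join(text.split()).lower()


def missing_without_js(raw_html: str, key_facts: list[str]) -> list[str]:
    """Return the key facts that do NOT appear in the page's server-rendered text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = _normalize(" ".join(parser.chunks))
    return [fact for fact in key_facts if _normalize(fact) not in text]


if __name__ == "__main__":
    # Illustrative page: one price is in the HTML, the other only exists in a script.
    server_html = (
        "<html><body><h1>Plan comparison</h1><p>Pro plan: $49/month</p>"
        "<div id='app'></div><script>render({plan: 'Team plan: $99/month'})</script>"
        "</body></html>"
    )
    print(missing_without_js(server_html, ["Pro plan: $49/month", "Team plan: $99/month"]))
```

Here the Team plan price is flagged as missing: it exists only inside the script, so a crawler that reads the raw HTML never sees it.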
We turned those findings into a step-by-step method in the next report: the GEO playbook, built on the Princeton GEO benchmark study, in which the optimization methods tested lifted AI visibility by up to 40%. The macro picture behind all of it lives in The State of AI Search, 2026.
Sources
- Semrush / Visual Capitalist — analysis of ~150,000 AI citations (Reddit 40.1%, Wikipedia 26.3%, YouTube 23.5%)
- 5W Research / PR Newswire — Wikipedia & Reddit drive 25%+ of U.S. ChatGPT citations
- Profound — AI platform citation patterns (ChatGPT, AI Overviews, Perplexity)
- ZipTie — Why Reddit dominates AI citations
- Search Engine Roundtable — ChatGPT sources Wikipedia, Google AIO sources Reddit
- Am I Cited — Reddit citation volatility, 2025