Why Reddit runs the AI answer
A forum of strangers is the most-cited source in artificial intelligence, ahead of Wikipedia and ahead of YouTube. Here is why the models trust it, what kind of thread actually gets cited, and why that trust is more fragile than it looks.
Reddit is the single most-cited domain in AI answers. In a Semrush analysis of roughly 150,000 citations across the major engines, Reddit accounted for 40.1% of them, ahead of Wikipedia at 26.3% and YouTube at 23.5%. The models that now answer the world's questions trust a forum of anonymous strangers more than almost any institution on the internet.
This is not an accident, and it is not stable. Reddit's dominance is the product of three forces that reinforce one another — and it can be undone, almost overnight, by a single quiet change inside one model. This guide explains both halves: why Reddit runs the AI answer, and why no brand should treat that as a permanent fact.
Why do AI engines cite Reddit so much?
AI engines cite Reddit because three forces stack: licensing deals made its corpus machine-readable, its question-and-answer format maps onto how models answer, and its content carries first-hand experience the models have learned to trust. No other single source combines all three at Reddit's scale.
1. The models pay for the data
In February 2024, Reddit signed a content-licensing deal with Google reported at about $60 million a year, giving Google direct, real-time access to Reddit's posts to train and ground its AI. In its IPO filing that same month, Reddit disclosed data-licensing contracts with an aggregate value of roughly $203 million over two to three years. Months later, OpenAI struck a similar agreement estimated near $70 million a year. The largest model makers have explicitly paid to read this corpus.
2. The format fits how models answer
Most Reddit threads are a question followed by ranked human answers — the exact shape a model needs when a user asks one. Reference pages tell an engine what something is; a Reddit thread tells it so what — which option people actually picked, what broke, and what they would do differently. The structure is pre-chunked for retrieval, and the answer is already written.
3. It reads like real experience
Reddit answers tend to open with "I switched from X to Y and here's what happened" — the first E in Google's E-E-A-T framework, experience, at enormous scale. Models trained on that pattern learn to prize content that sounds like someone who actually did the thing. Polished brand copy rarely clears that bar; an honest comment thread does.
Wikipedia tells the model what something is. Reddit tells it what happened when a real person tried it. The answer engine wants both — and only one of them is for sale on your own website.
Does Reddit dominate every engine equally?
No. Every engine leans on Reddit, but by different amounts and in different ways, because each reads its own slice of the web. Reddit is the top single source on Perplexity, a leading source inside Google's AI Mode, and a major but more volatile one in ChatGPT. There is no single "Reddit number" that holds across all of them.
That split matters for any brand trying to measure its standing. A glowing Reddit thread can make you the default on Perplexity and do almost nothing on Gemini. "AI visibility" is never one score. It is a separate verdict from each engine, and those verdicts often disagree, which is exactly why we cover the full picture in how AI engines choose what to cite.
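One way to see that split in your own data is to compute Reddit's citation share per engine rather than as a single blended number. The sketch below is illustrative only: the `(engine, url)` record shape, the function names, and the sample URLs are assumptions, not the format of any particular monitoring tool.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def citation_share_by_engine(records, domain="reddit.com"):
    """Share of each engine's citations that point at `domain` or a subdomain."""
    totals = Counter()
    hits = Counter()
    for engine, url in records:
        host = domain_of(url)
        totals[engine] += 1
        if host == domain or host.endswith("." + domain):
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Hypothetical sample: (engine, cited URL) pairs collected from your own prompts.
sample = [
    ("perplexity", "https://www.reddit.com/r/example/comments/abc"),
    ("perplexity", "https://old.reddit.com/r/example/comments/def"),
    ("perplexity", "https://en.wikipedia.org/wiki/Example"),
    ("perplexity", "https://www.reddit.com/r/example/comments/ghi"),
    ("gemini", "https://en.wikipedia.org/wiki/Example"),
    ("gemini", "https://www.youtube.com/watch?v=example"),
    ("gemini", "https://example.com/blog"),
    ("gemini", "https://www.reddit.com/r/example/comments/abc"),
]

shares = citation_share_by_engine(sample)
for engine, share in sorted(shares.items()):
    print(f"{engine}: {share:.0%} of citations point at Reddit")
```

Keeping the result keyed by engine makes disagreement between engines visible instead of averaging it away.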
What kind of Reddit post actually gets cited?
The cited post is rarely the viral one. In Semrush's analysis of 248,000 cited Reddit URLs, about 80% had fewer than 20 upvotes, 70% had fewer than 20 comments, and the average cited post was roughly 900 days old. Engines reward durable, specific, on-topic answers — not the threads that won the day on the front page.
Format is the strongest predictor. Three thread types — direct question-and-answer, product comparisons, and discussion threads — together account for roughly three quarters of all cited Reddit content, with Q&A alone making up more than half. If a thread answers a specific question in plain language, it is in the running regardless of its score.
| Signal | What the data shows | Why it matters |
|---|---|---|
| Upvotes | 80% of cited posts under 20 | Score is not the gate — relevance is |
| Age | ~900 days on average | Engines favor settled, evergreen consensus |
| Length | ~80 words median | Tight, specific answers chunk cleanly |
| Format | Q&A, comparison, discussion (~75%) | The thread is already an answer |
Can you manufacture a Reddit citation?
You can try, and it usually backfires. Brands and agencies now seed subreddits with promotional posts engineered to be scraped, a practice moderators are catching and one that can cost you with the models too. Where an engine's access to Reddit includes edit and moderation history, a comment flagged as spam or astroturfing can become a lasting negative signal that ties your brand to manipulation.
The mechanics that make Reddit trustworthy are the same ones that make it hard to fake. You cannot retro-fit 900 days of consensus, and a removed post does not vanish from a model that already read it. Moderators of communities like r/biohackers have publicly exposed companies seeding sponsored content for AI tools to harvest — the kind of story that lives forever in the training data.
But does a Reddit citation win the recommendation?
Not on its own. Being cited is not the same as being recommended. Large aggregate studies draw on randomized keywords, so Reddit and YouTube pile up citations simply by covering everything. On high-intent, bottom-of-funnel buying questions, engines often cite Reddit for context while recommending specific category tools by name — a distinction explored in Search Engine Land's analysis of what actually drives AI recommendations.
A citation is a footnote. A recommendation is the sentence. The brands that win get named in the answer, not just linked beneath it. In short: cited ≠ recommended.
The strategic read: a strong Reddit presence in the specific communities that shape your category is worth pursuing, but it is one input, not the whole game. Owned content that states plainly who you serve and where you win still does the heavy lifting — the durable framework lives in the GEO playbook and the broader answer engine optimization guide.
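The cited-versus-recommended distinction can be checked mechanically for each answer you collect. The following is a minimal sketch under stated assumptions: the `Answer` shape, the labels, and the brand "Acme" with domain `acme.example` are all hypothetical, and a brand being named in the text is only a rough proxy for being recommended, since an answer can also name a brand critically.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Answer:
    """One engine response: the answer text plus the URLs it cites."""
    text: str
    citations: list = field(default_factory=list)


def classify(answer: Answer, brand: str, brand_domain: str) -> str:
    """Label an answer 'named', 'cited_only', or 'absent' for one brand."""
    # Whole-word, case-insensitive match of the brand name in the answer text.
    named = re.search(rf"\b{re.escape(brand)}\b", answer.text, re.IGNORECASE)
    # Substring match of the brand's domain in any cited URL.
    linked = any(brand_domain in url.lower() for url in answer.citations)
    if named:
        return "named"
    if linked:
        return "cited_only"
    return "absent"


# Hypothetical answers for a hypothetical brand.
answers = [
    Answer("For small teams, most users suggest Acme or Globex.",
           ["https://www.reddit.com/r/example/comments/abc"]),
    Answer("Several tools exist; see the sources below.",
           ["https://acme.example/pricing"]),
    Answer("Several tools exist.",
           ["https://www.reddit.com/r/example/comments/abc"]),
]

labels = [classify(a, "Acme", "acme.example") for a in answers]
print(labels)
```

Counting "named" separately from "cited_only" keeps a footnote from being mistaken for a recommendation.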
How volatile are Reddit citations?
Extremely. In September 2025, ChatGPT's Reddit citation share fell from roughly 60% of answers to about 10% in two weeks, after OpenAI moved to avoid over-citing a small set of sites. No announcement preceded it. Reddit's stock dropped about 14% in five days on the reporting, and brands built entirely on Reddit threads simply thinned out of the answer.
The lesson is not "ignore Reddit." It is that any one source, even the most-cited on the internet, is a position you can lose without warning. If you are not watching a citation, you will not see it disappear, which is the whole reason AI brand monitoring exists.
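A simple way to notice a swing like that is to compare citation share between consecutive periods and flag any large relative fall. This is a rough sketch, not a monitoring product: the function name, the 50% threshold, and the weekly series are assumptions. The first and last values echo the roughly 60% and 10% figures above, and the intermediate values are invented for illustration.

```python
def flag_drops(series, threshold=0.5):
    """Return (previous_label, current_label, relative_change) for every
    period-over-period fall of at least `threshold` (0.5 means share halved)."""
    alerts = []
    for (prev_label, prev), (cur_label, cur) in zip(series, series[1:]):
        if prev > 0:
            change = (cur - prev) / prev
            if change <= -threshold:
                alerts.append((prev_label, cur_label, round(change, 2)))
    return alerts


# Hypothetical weekly share of answers citing Reddit on one engine.
weekly = [("week 1", 0.60), ("week 2", 0.58), ("week 3", 0.30), ("week 4", 0.10)]

alerts = flag_drops(weekly)
print(alerts)
```

With this series, only the week 3 to week 4 step is flagged, because the earlier fall from 0.58 to 0.30 is just under the 50% threshold. A lower threshold would catch the slide sooner at the cost of more noise.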
How do you earn a Reddit citation honestly?
You earn it the slow way: by being genuinely useful in the communities that shape your category, then measuring whether it moves the answer. The goal is not a viral post — it is a specific, durable, plain-language answer in a thread the models already read.
- Find the threads that decide your category. Identify the subreddits and Q&A threads engines already cite for your buying questions — those are the rooms that matter.
- Answer the actual question. Contribute specific, first-hand, comparison-style answers — the formats that make up ~75% of citations — not pitches.
- Be honest about who you are. Disclose affiliation; manipulation leaves a permanent trail and the downside dwarfs the upside.
- Let consensus build. Cited posts average ~900 days old. Helpfulness compounds; it does not spike.
- Track it across engines. A Reddit win can lift one model and not another, and can reverse in a week — so measure it where it actually appears.
Key takeaways
- Reddit is the most-cited domain in AI — roughly 40% of all citations, ahead of Wikipedia and YouTube.
- It wins on three reinforcing forces: paid data access, a Q&A format models can lift, and first-hand experience at scale.
- The cited post is quiet, not viral — 80% have under 20 upvotes; format beats popularity.
- Citations are volatile and being cited is not being recommended — so a Reddit strategy must be earned honestly and measured continuously.
Reddit runs the AI answer today because the models pay to read it, it is shaped like an answer, and it sounds like a real person. None of that guarantees it runs the answer next month — and none of it guarantees the brand it cites is yours. The only way to know where you stand is to watch the answer itself, across every engine, as it changes.
Sources
- Semrush — The Most-Cited Domains in AI: A 3-Month Study. Reddit 40.1%, Wikipedia 26.3%, YouTube 23.5%; the September 2025 ChatGPT collapse from ~60% to ~10%.
- Semrush — We Analyzed 248K Reddit Posts: What Drives Visibility in AI Search. 80% of cited posts under 20 upvotes; ~900-day average age; format mix.
- CBS News — Google strikes $60 million deal with Reddit to train AI on human posts.
- Columbia Journalism Review — Reddit Is Winning the AI Game (licensing revenue, OpenAI deal).
- 5W Research / PR Newswire — Wikipedia and Reddit Drive Over 25% of U.S. ChatGPT Citations.
- Search Engine Land — Stop chasing Reddit and Wikipedia: What actually drives AI recommendations (cited vs recommended).
- Mentionova Research — How AI Engines Choose What to Cite and AI Brand Monitoring.