What I Learned From Ahrefs' Study on Why ChatGPT Cites Some Pages

There is a lot of speculation about what makes a page appear as a citation in ChatGPT's responses. Most of the advice in circulation is based on intuition, analogy to traditional SEO, or small anecdotal observations. Ahrefs decided to approach it differently — by analyzing large volumes of actual ChatGPT responses and the pages cited in them, then looking for patterns.

The findings are worth working through carefully. Some of them confirm what many practitioners already suspected. Others challenge assumptions that have become conventional wisdom in the GEO space. All of them have practical implications for how brands should think about earning citations in AI-generated answers.

What Ahrefs Actually Studied

Ahrefs collected thousands of ChatGPT responses across a range of query types and identified which pages ChatGPT chose to cite as sources. They then ran those cited pages through their own data infrastructure, comparing them against non-cited pages on the same topics to identify what distinguished the pages that got cited from those that did not.

The analysis looked at domain-level authority signals, page-level content characteristics, backlink profiles, organic search rankings, and technical crawlability. Comparing cited and non-cited pages across these dimensions made it possible to separate signal from noise and identify which factors were most consistently associated with citation across different topic areas and query types.
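To make the cited-versus-non-cited comparison concrete, here is a minimal sketch of how citation rates could be compared across two groups of pages. The records, the `citation_rate_by_bucket` helper, and the threshold of 60 are all hypothetical and chosen for illustration; they are not Ahrefs' data or method.

```python
from collections import defaultdict

# Hypothetical records: (domain_rating, was_cited).
# These values are made up for illustration and do not come from the study.
pages = [
    (82, True), (75, True), (71, False), (64, True),
    (48, False), (41, True), (35, False), (22, False),
]

def citation_rate_by_bucket(records, threshold=60):
    """Return the share of cited pages at or above, and below, a DR threshold."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [cited, total]
    for dr, cited in records:
        bucket = "high_dr" if dr >= threshold else "low_dr"
        counts[bucket][0] += int(cited)
        counts[bucket][1] += 1
    return {bucket: cited / total for bucket, (cited, total) in counts.items()}

rates = citation_rate_by_bucket(pages)
print(rates)
```

A gap between the two rates on real data would indicate an association, not a cause, which is why the other signals need to be examined alongside it.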

Domain Authority Is the Strongest Predictor

The single factor most correlated with ChatGPT citations was domain authority: specifically, the backlink strength of the domain hosting the page, as measured by Ahrefs' Domain Rating (DR) metric. Pages on high-DR domains were cited at significantly higher rates than pages with comparable content on lower-authority domains.

One plausible reading of this finding is that ChatGPT's citation behavior reflects, at least in part, what the model learned during training about which sources tend to be trustworthy. Domains that have accumulated strong backlink profiles over time are treated as more authoritative sources, and that authority carries into AI-generated responses.

The uncomfortable conclusion here is that a brand with highly relevant, well-written content on a relatively new or low-authority domain may consistently lose citations to older, more established competitors whose content is less specific but whose domain carries more weight. Brand-building and authority development are not separate from AI visibility strategy — they are central to it.

Organic Search Rankings Correlate Strongly With Citations

Pages that ranked on the first page of Google for relevant queries were substantially more likely to be cited by ChatGPT than pages ranking on page two or beyond. This correlation was consistent across topic areas and held even when controlling for domain authority.
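"Controlling for domain authority" can be read in several ways. One simple version is stratification: compare first-page and non-first-page citation rates only among pages in the same DR band. The sketch below assumes that approach; the band boundaries, field names, and sample records are hypothetical, and the study itself may have used a different technique.

```python
def _rate(flags):
    """Share of True values, or None when the group is empty."""
    return sum(flags) / len(flags) if flags else None

def stratified_rates(records, bands=((0, 50), (50, 101))):
    """For each DR band, return (rate for first-page pages, rate for other pages)."""
    result = {}
    for low, high in bands:
        in_band = [r for r in records if low <= r["dr"] < high]
        first = [r["cited"] for r in in_band if r["first_page"]]
        other = [r["cited"] for r in in_band if not r["first_page"]]
        result[(low, high)] = (_rate(first), _rate(other))
    return result

# Hypothetical sample, invented for illustration only.
sample = [
    {"dr": 30, "first_page": True, "cited": True},
    {"dr": 35, "first_page": True, "cited": False},
    {"dr": 20, "first_page": False, "cited": False},
    {"dr": 25, "first_page": False, "cited": False},
    {"dr": 80, "first_page": True, "cited": True},
    {"dr": 85, "first_page": True, "cited": True},
    {"dr": 90, "first_page": False, "cited": True},
    {"dr": 75, "first_page": False, "cited": False},
]

by_band = stratified_rates(sample)
print(by_band)
```

If first-page pages are cited more often within every band, the ranking effect is not simply a side effect of high-DR domains also ranking well.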

This matters because it suggests ChatGPT's training data, which included large amounts of web content, was not uniformly sampled. Pages that had earned visibility through traditional organic search were more likely to have been encountered, crawled thoroughly, and weighted positively during training. The result is a meaningful overlap between what ranks in Google and what gets cited in ChatGPT.

This does not mean traditional SEO and GEO are the same discipline. They are not. But it does mean that ignoring organic search performance while pursuing AI visibility is a mistake. A page that cannot compete in organic search is unlikely to earn consistent AI citations either, at least in the current generation of models.

Page-Level Content Quality Has Independent Predictive Power

Beyond domain authority and search rankings, Ahrefs found that certain content characteristics were independently associated with higher citation rates. Pages that addressed a query comprehensively, included specific data points, and were organized with clear headings and structured sections were cited more often than pages with thin or generalized content on the same topic.

This makes intuitive sense: a language model generating a response that cites sources is more likely to cite a page that gives it something specific and citable — a stat, a definition, a clearly organized explanation — than one that covers the topic in broad strokes. Specificity is not just a quality indicator for human readers. It is a signal to AI systems that a page contains concrete, reliable information worth attributing.

The practical takeaway is that content depth matters more than content volume. A single well-researched, data-grounded page on a narrow topic is likely to earn more citations than five pages of general coverage on the same subject area.

Structured Data Had a Measurable but Secondary Effect

Pages with structured data markup — particularly FAQ schema, Article schema, and HowTo schema — showed higher citation rates than comparable pages without it, but the effect was smaller than domain authority or organic rankings. Structured data appears to function as a secondary signal that helps when other factors are roughly equal, rather than a primary driver of citation.

This is consistent with how structured data works in traditional search. It does not compensate for weak authority or poor content, but it does provide a clear, unambiguous signal to parsing systems about what a page contains. When ChatGPT or another AI system is deciding between two roughly comparable sources, the page with cleaner structure and explicit metadata has an advantage.

The implication is not that structured data is unimportant. It is that it works in conjunction with the factors that matter most, not as a substitute for them. Brands that focus exclusively on schema markup without building underlying authority will not see the results they are hoping for.
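For readers who have not worked with schema markup, the following is a minimal sketch of generating FAQ schema as JSON-LD. It uses the schema.org `FAQPage`, `Question`, and `Answer` types; the `faq_jsonld` helper and the sample question are placeholders for illustration, not something the study prescribes.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder question and answer; replace with content that is visible on the page.
markup = faq_jsonld([
    ("What is Domain Rating?",
     "An Ahrefs metric for the backlink strength of a domain."),
])

# Wrap the JSON in the script tag that carries JSON-LD in an HTML page.
snippet = (
    '<script type="application/ld+json">'
    + json.dumps(markup, indent=2)
    + "</script>"
)
print(snippet)
```

The markup should mirror content that actually appears on the page, since it describes the page rather than replacing it.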

What This Tells Us About How ChatGPT Selects Sources

The pattern across all of Ahrefs' findings is that ChatGPT's citation behavior is not arbitrary. It reflects a consistent underlying logic: prefer established sources, prefer content that has already proven its value through organic discovery, and prefer pages where the relevant information is structured and specific enough to be clearly attributable.

This means that many of the fundamental questions about ChatGPT citations can be answered by asking what a well-structured web of credible, authoritative information would look like. ChatGPT is not operating with alien logic — it is amplifying signals that the broader web ecosystem has been refining for decades.

It also means that the brands most likely to succeed at earning AI citations are not those that optimize narrowly for AI systems, but those that build real authority, publish useful content, and ensure that content is technically accessible. Shortcut strategies do not hold up in this framework.

The Practical Implications for Your Strategy

If your organization is trying to earn more citations in ChatGPT responses, the Ahrefs findings point toward a clear set of priorities.

Domain authority development should be treated as a long-term infrastructure investment. Earning links from credible industry sources, publications, and associations builds the foundation on which everything else rests. This is slow work, but it compounds over time and cannot be bypassed.

Organic search performance remains relevant. Pages that rank on the first page for their target queries are the pages most likely to appear in AI-generated answers about those topics. Strategies that treat SEO and GEO as competing priorities are misallocating effort.

Content depth and specificity matter independently of volume. A focused content audit that identifies where your existing pages are thin or generic — and prioritizes bringing those pages up to a standard where they contain something specific and citable — is likely to produce better results than creating additional general-purpose content.

Structured data and technical quality remain worth investing in, particularly because they operate as differentiators when other factors are comparable. FAQ schema, Article schema, and clean heading hierarchies give AI systems the structural hooks they need to reference your content accurately.
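A clean heading hierarchy is one of the easier items to check automatically. The sketch below is one possible check, assuming "clean" means heading levels never jump by more than one step going down (for example, from h2 straight to h4). The `HeadingCollector` class and `heading_skips` function are hypothetical names, and the rule itself is an assumption rather than a finding from the study.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))

def heading_skips(html):
    """Return (previous, current) level pairs where a level was skipped."""
    parser = HeadingCollector()
    parser.feed(html)
    levels = parser.levels
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]

page = "<h1>Guide</h1><h2>Setup</h2><h4>Details</h4><h2>Usage</h2>"
print(heading_skips(page))
```

Moving back up the hierarchy (h4 to h2) is not flagged, since closing a subsection is normal document structure.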

The Ahrefs study is a useful reminder that AI visibility is not a separate game with different rules. It is the same game of building credible, well-organized, useful information — played on a field where the stakes are higher because the audience increasingly never clicks through to find out what your page actually says. Getting cited is now the point, which means everything that makes a page worth citing is worth investing in seriously.

Source: Why ChatGPT Cites One Page Over Another — Ahrefs Blog

Want to improve your AI visibility?

Start with an AI Visibility Assessment. Receive a prioritized findings report and a consultation to review your roadmap.
