Schema for AI Search: Markup That Gets You Cited
Does schema markup get you cited by AI? The 2026 controlled studies say no — but schema still matters indirectly. Here's what to implement and why.
Eighty-one percent of pages cited by AI search engines carry schema markup, yet a landmark Ahrefs controlled study of 1,885 pages found that adding JSON-LD produced no statistically significant citation gain on any platform it tracked; AI Overview citations actually fell by a small but statistically significant 4.6% (Search Engine Roundtable, May 2026). That gap between correlation and causation is the central tension in structured data strategy for 2026, and understanding it is the difference between wasting implementation budget and building genuine AI visibility.
Quick answer:
Schema markup does not directly cause AI systems to cite you more — most LLMs treat JSON-LD as raw text and several fail to parse it at all. But schema still matters as entity infrastructure: it feeds Google's Knowledge Graph, enables rich results, and signals content quality. Implement it correctly for indirect, compounding benefits — not as a citation shortcut.
The Core Debate: Does Schema Actually Work for AI?
Two data sets dominate the conversation, and they point in opposite directions.
The correlation case is strong on its surface. AccuraCast analyzed 9,000 AI-cited sources across 2,000+ prompts in 2025 and found that 81% of cited pages had schema (enhancely.ai). WPRiders research finds that pages with schema are 36% more likely to appear in AI citations. Relixir tracked 50 sites and found FAQPage schema delivered a 41% citation rate versus 15% without it. OtterlyAI's own GEO experiment (December 2025–March 2026) reported a 611% increase in Google AI Overview citations after implementing schema on its SaaS site (otterly.ai).
The controlled experiment tells a different story. Ahrefs ran the most rigorous test to date in May 2026: 1,885 pages that added JSON-LD versus 4,000 matched control pages, tracked across Google AI Overviews, AI Mode, and ChatGPT for 30 days before and after schema addition. Results:
- AI Overviews: -4.6% (a statistically significant decline, with 1-in-2,500 odds of occurring by chance)
- AI Mode: +2.4% (indistinguishable from zero)
- ChatGPT: +2.2% (indistinguishable from zero)
OtterlyAI's own controlled test found that 6 out of 7 AI platforms failed to retrieve or correctly interpret schema markup when directly asked. Only Gemini succeeded (otterly.ai).
Why Both Data Sets Are Correct
The reconciliation is mechanistic. LLMs tokenize the contents of <script type="application/ld+json"> blocks as raw text, not as semantically validated JSON-LD. Many HTML-to-Markdown extraction pipelines used in AI search strip <script> tags entirely before the model ever sees the page. Mark Williams-Cook's February 2026 experiment confirmed this: he created a fake company with an address embedded exclusively inside invalid, made-up JSON-LD — never in visible HTML. Both ChatGPT and Perplexity successfully extracted the address, proving they treat schema blocks as plain text, not structured data (Search Engine Roundtable).
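The two behaviors are easy to reproduce. The sketch below uses Python's standard-library html.parser to run two hypothetical extractors over the same page: one drops <script> and <style> contents the way many HTML-to-Markdown pipelines do, the other keeps every text node, which is roughly how a model reading raw page text would see it. The page, the class name, and the address are illustrative assumptions; production AI extraction pipelines are not public and are certainly more elaborate.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; optionally skips <script>/<style> contents entirely."""

    SKIPPED = ("script", "style")

    def __init__(self, drop_scripts: bool) -> None:
        super().__init__()
        self.drop_scripts = drop_scripts
        self._skip_depth = 0
        self.chunks = []  # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if self.drop_scripts and tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if self.drop_scripts and tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # When drop_scripts is set, text inside <script>/<style> is discarded;
        # otherwise the JSON-LD body arrives as ordinary text, like any paragraph.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, drop_scripts: bool) -> str:
    parser = TextExtractor(drop_scripts)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "address": "42 Example Street"}
</script>
</head><body><h1>Example Co</h1><p>We make widgets.</p></body></html>"""

print(extract_text(PAGE, drop_scripts=True))   # the address never reaches the model
print(extract_text(PAGE, drop_scripts=False))  # the address is present, but only as plain text
```

Neither path treats the block as structured data: it is either gone or flattened into text, which is consistent with the Williams-Cook result.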
"Schema was invented for algorithms that couldn't handle unstructured text," wrote OtterlyAI's Rick Tousseyn. "AI search is built to do exactly that."
The correlation exists because pages with schema tend to be better-maintained, more technically sophisticated sites — the same sites with stronger content, more backlinks, and higher authority. Schema is "riding the wave of every other signal," as Ahrefs' analysis noted. As Gianluca Fiorelli put it in May 2026: "The Ahrefs study is right. And it's testing the wrong thing." Schema operates at the entity resolution layer, not the citation frequency layer (iloveseo.net).
What AI Platforms Actually Do with Structured Data
Understanding each platform's pipeline changes your implementation priorities.
Google AI Overviews and AI Mode
Google AI Overviews now reach 2.5 billion monthly active users (Google I/O, May 19, 2026) and appear on approximately 48% of all tracked queries (BrightEdge, via averi.ai). Google is the platform where schema has the clearest legitimate indirect impact. Schema feeds the Knowledge Graph (500+ billion facts about 5+ billion entities), which in turn grounds AI Overview content. Google's own Search Central Blog (May 2025) stated that "Structured Data can improve targeting" and "plays a vital role in AI experiences." John Mueller confirmed in April 2025 that "Structured Data can improve targeting for AI," while also reiterating that it is not a direct ranking factor.
The practical path: Schema → Rich Results → Higher CTR → Better Engagement Signals → Higher Rankings → More AI Visibility. It is indirect but real.
Citation overlap between AI Overviews and top-10 organic results has collapsed from 76% (July 2025) to 38% (March 2026), meaning Google AI increasingly pulls from pages not in the top 10 (Ahrefs, 863K keywords). E-E-A-T now functions as a near-binary gate: 96% of AI Overview citations clear the threshold (Wellows).
ChatGPT
ChatGPT processes over 2 billion queries daily and holds 81% of the AI chatbot market. It relies on Microsoft Bing's index for real-time search, meaning Bing's crawl and entity understanding are what matter. SE Ranking found that 71% of pages cited by ChatGPT include structured data, but since ChatGPT reads JSON-LD as plain text (see the Williams-Cook experiment above), the likelier mechanism is Bing's index enrichment rather than real-time JSON-LD parsing. Over 56% of ChatGPT crawling happens to answer live user queries (ziptie.dev). Pages crawled by ChatGPT's bot get 3.2x more human traffic (astiva.ai).
Fresh content matters enormously: 60.5% of ChatGPT's most-cited pages were published within the last two years (Ahrefs). Pages updated in the past three months average 6 citations versus 3.6 for outdated content (DiscoveredLabs, discoveredlabs.com).
Perplexity
Perplexity uses real-time Retrieval-Augmented Generation (RAG), processing roughly 780 million monthly queries. Its referral traffic converts at 14.2% versus Google's 2.8% — a 5x quality multiplier (LLMRefs, via ziptie.dev). Reddit accounts for 46.7% of Perplexity's top citations (Wellows). Person schema with author credentials correlates with 2.3x higher citation rates on Perplexity (Onely). Notably, 92.78% of Perplexity's cited pages have fewer than 10 referring domains (FelloAI) — meaning authority as traditionally measured matters less than content clarity and format.
Gemini and Copilot
Gemini is the only platform that successfully retrieved schema markup when directly requested in OtterlyAI's tests. It has native Knowledge Graph integration and gives weight to VideoObject schema for how-to queries. Microsoft Copilot emphasizes Organization and LocalBusiness schema. Fabrice Canel, Microsoft's Principal Product Manager, stated at SMX Munich (March 2025): "Schema markup helps Microsoft's LLMs understand content."
Which Schema Types Drive AI Visibility
Not all schema types deliver equal value. The research is clearest on five.
| Schema Type | Primary Benefit | Key Stat |
|---|---|---|
| FAQPage | Highest AI citation probability | 41% citation rate vs. 15% without (Relixir, 2025) |
| Organization + sameAs | Entity disambiguation, brand identity | 67% of ChatGPT citations come from sites with this (AISEO.com.mx) |
| Article/BlogPosting | Foundational for content extraction | dateModified drives 3.2x more citations when recent (Digital Bloom) |
| Person (author) | Most under-implemented relative to impact | 2.1x citation rate increase in Claude (Astiva AI, 10,000+ responses) |
| BreadcrumbList | Signals topical hierarchy for AI chunking | Pages with it are consistently selected in AIO tests (Astiva AI) |
FAQPage schema deserves special attention. Google deprecated it for rich results in January 2026 (removing SERP display for most sites), but it is still processed and used by AI systems. Pages with FAQPage schema achieved 73% higher AIO selection rates (Wellows) and appeared in 3.2x more AI responses overall (Frase.io). Keep it where genuine Q&A content exists on the page; do not deploy decorative FAQ schema on template footers.
sameAs is the connective tissue of entity SEO. Linking your Organization schema to LinkedIn, Crunchbase, and Wikidata builds the entity graph that Knowledge Graph-driven platforms use to understand your brand. Sites with comprehensive Organization schema are 3.7x more likely to earn Knowledge Panels. Brands with three well-chosen sameAs entries materially outperform brands with none on entity-disambiguation queries (grupainsight.com).
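A minimal Organization node with sameAs might be generated as follows, so the same dictionary can be reused across templates. The company name, URLs, and the Wikidata item ID are placeholders, and the `#organization` fragment in `@id` is a common convention rather than a requirement.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list) -> dict:
    """Organization node whose sameAs links tie the brand to external profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": f"{url}#organization",  # stable identifier reused on every page
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,
    }


org = organization_jsonld(
    name="Example Co",
    url="https://www.example.com/",
    logo="https://www.example.com/logo.png",
    same_as=[
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder QID, use your real item
    ],
)
print(json.dumps(org, indent=2))
```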
Person schema is consistently underused. Pages without author credentials receive 60% fewer AI citations (Semrush, 2025); put the other way, pages with author bylines and credentials earn 2.5x as many AI citations as anonymous content. The effect is particularly pronounced for Claude, which uses author entity data via sameAs links on YMYL topics.
Schema Types to Deprioritize
Google removed seven structured data types from rich results in June 2025 (CourseInfo, ClaimReview, EstimatedSalary, LearningVideo, SpecialAnnouncement, VehicleListing, Book Actions) and deprecated HowTo rich results on desktop in February 2026. Deprecation affects SERP display only, and the underlying schema is still processed by AI systems, but the loss of rich results should shift your implementation priority toward types that still earn them.
Critically: attribute-rich, complete schema earns a 61.7% citation rate while thin, minimally populated schema underperforms having no schema at all (41.6% vs. 59.8%) according to Growth Marshal's analysis of 730 citations. Google's March 2026 core update explicitly targeted schema abuse patterns. Bad schema is worse than no schema.
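Because thin schema can underperform no schema at all, it helps to measure completeness before shipping. The sketch below scores a JSON-LD node against a per-type checklist. The property lists are an editorial assumption chosen for illustration, not Google's or schema.org's required fields, and the code assumes a single string `@type`; adapt the lists to the documentation for each type you use.

```python
# Hypothetical checklist: illustrative choices, not an official list of required properties.
EXPECTED_PROPERTIES = {
    "Article": ["headline", "author", "datePublished", "dateModified", "image", "publisher"],
    "Organization": ["name", "url", "logo", "sameAs"],
    "Person": ["name", "jobTitle", "url", "sameAs"],
}

EMPTY_VALUES = (None, "", [], {})


def missing_properties(node: dict) -> list:
    """Checklist properties that are absent or empty on this node."""
    expected = EXPECTED_PROPERTIES.get(node.get("@type"), [])
    return [prop for prop in expected if node.get(prop) in EMPTY_VALUES]


def completeness(node: dict) -> float:
    """Share of checklist properties that are populated, from 0.0 to 1.0."""
    expected = EXPECTED_PROPERTIES.get(node.get("@type"))
    if not expected:
        raise ValueError(f"no checklist for @type {node.get('@type')!r}")
    return 1 - len(missing_properties(node)) / len(expected)


thin = {"@type": "Article", "headline": "Schema for AI Search"}
print(missing_properties(thin))
print(f"{completeness(thin):.0%} complete")
```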
What Actually Drives AI Citations (Beyond Schema)
The research is converging on a set of content signals that matter more than schema for most platforms.
Content format. Evertune's May 2026 study of 400 million citations across ChatGPT, Copilot, Gemini, Google AI Mode, and Perplexity found that 63% of all LLM citations point to listicle pages, and 71–86% of those are ranked (numbered Top-N) lists. Query intent shifts the mix, however: for informational queries, articles take 45.5% of citations versus 21.7% for listicles. Front-load your content: 44.2% of extracted passages come from the first 30% of a document.
The "Island Test." Princeton GEO research found that each paragraph should be semantically complete in isolation: AI systems chunk content into 200–300-word units, and self-contained chunks produce clearer embeddings. Avoid starting paragraphs with pronouns, and include context in each section. Semantic completeness correlates strongly with selection (r = 0.89), and content scoring above 8.5/10 on self-contained answerability is 4.2x more likely to appear (Wellows).
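The Island Test can be approximated with a small script. The sketch below flags paragraphs that open with a pronoun likely to point back at earlier text, and greedily packs whole paragraphs into chunks of at most 300 words to mimic the 200–300-word units described above. The pronoun list and the packing rule are assumptions; real AI chunkers split on tokens, headings, or sentences in ways that vary by platform.

```python
import re

# Assumed list of openers that usually lean on a previous paragraph for meaning.
DANGLING_OPENERS = {"it", "this", "that", "these", "those", "they", "their", "its", "he", "she"}


def paragraphs(text: str) -> list:
    """Split on blank lines and drop empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def dangling_paragraphs(text: str) -> list:
    """Paragraphs whose first word is in DANGLING_OPENERS."""
    flagged = []
    for para in paragraphs(text):
        first_word = re.match(r"[A-Za-z']+", para)
        if first_word and first_word.group(0).lower() in DANGLING_OPENERS:
            flagged.append(para)
    return flagged


def chunk_by_words(text: str, max_words: int = 300) -> list:
    """Greedily pack whole paragraphs into chunks of at most max_words words.
    A single paragraph longer than max_words becomes its own oversized chunk."""
    chunks, current, count = [], [], 0
    for para in paragraphs(text):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


SAMPLE = (
    "Schema markup feeds the Knowledge Graph.\n\n"
    "It also enables rich results.\n\n"
    "This matters for AI visibility."
)
print(dangling_paragraphs(SAMPLE))
print(len(chunk_by_words(SAMPLE, max_words=12)))
```

The second and third sample paragraphs get flagged: read alone, neither says what "it" or "this" refers to.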
Freshness. AI Overviews prefer content 25.7% fresher than traditional organic results (Ahrefs). Artificially refreshing publication dates can improve AI ranking positions by up to 95 places. Update dateModified in your Article schema every time you substantively revise content.
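One way to keep dateModified honest is to tie it to a fingerprint of the body text, so the date moves only when the content itself changes. This is a sketch under stated assumptions: the function names are hypothetical, whitespace-only edits are ignored, and every other text change counts as a revision, so judging what is substantive still needs an editor. The dates are placeholders.

```python
import hashlib
from datetime import date


def content_fingerprint(body_text: str) -> str:
    """SHA-256 of the body text with whitespace collapsed, so reflowed text is not a revision."""
    normalized = " ".join(body_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def refresh_date_modified(article: dict, body_text: str, stored_fingerprint: str,
                          today: date) -> tuple:
    """Return (article, fingerprint); dateModified is bumped only if the text changed.
    The input dict is never mutated; a changed article is returned as a new dict."""
    fingerprint = content_fingerprint(body_text)
    if fingerprint != stored_fingerprint:
        article = {**article, "dateModified": today.isoformat()}
    return article, fingerprint


article = {"@type": "BlogPosting", "datePublished": "2026-01-10", "dateModified": "2026-01-10"}
stored = content_fingerprint("Original paragraph.")

unchanged, _ = refresh_date_modified(article, "Original   paragraph.", stored, date(2026, 5, 20))
revised, new_fp = refresh_date_modified(article, "Revised paragraph with new data.", stored,
                                        date(2026, 5, 20))
print(unchanged["dateModified"], revised["dateModified"])
```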
Semantic HTML. Pages with clear H2/H3 hierarchy earn 2.8x more AI citations. Content with consistent heading levels is 40% more likely to be cited (Averi.ai). Pages with 120–180 words between headings receive 70% more citations (SE Ranking). Use HTML5 semantic elements (<article>, <section>, <header>) — they provide structural signals for AI chunking algorithms that complement schema (digitalapplied.com).
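To check the 120–180-word guideline against your own templates, a small parser can count the words that follow each H2 or H3. The sketch below uses Python's html.parser; the class name, function name, and default thresholds are illustrative, text before the first H2/H3 is ignored, and script and style contents are skipped.

```python
from html.parser import HTMLParser


class SectionWordCounter(HTMLParser):
    """Counts words in the body text that follows each <h2>/<h3> heading."""

    HEADINGS = ("h2", "h3")
    SKIPPED = ("script", "style")

    def __init__(self) -> None:
        super().__init__()
        self.sections = []  # [heading text, word count] pairs, in document order
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append(["", 0])
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self.sections or self._skip_depth:
            return  # ignore text before the first heading and inside scripts/styles
        if self._in_heading:
            self.sections[-1][0] = " ".join((self.sections[-1][0] + " " + data).split())
        else:
            self.sections[-1][1] += len(data.split())


def sections_outside_range(html: str, low: int = 120, high: int = 180) -> list:
    counter = SectionWordCounter()
    counter.feed(html)
    counter.close()
    return [(heading, words) for heading, words in counter.sections if not low <= words <= high]


page = ("<h2>Too short</h2><p>" + "word " * 50 + "</p>"
        "<h2>About right</h2><p>" + "word " * 150 + "</p>")
print(sections_outside_range(page))
```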
Third-party validation. Brand mentions correlate with AI visibility more strongly than backlinks (r = 0.664, Wellows). 82.9% of B2B AI citations come from third-party sources (Profound). Perplexity's top citation sources are Reddit (46.7%), Wikipedia (19.8%), and YouTube (13.4%). Being cited by authoritative third parties often matters more than schema on your own site.
For more on entity-based visibility signals, see Entities & Semantic SEO and AI Search & AEO.
What About llms.txt?
Introduced by Jeremy Howard of Answer.AI in September 2024, llms.txt is a proposed standard for giving AI crawlers structured access to site content. As of August 2025, analysis of 1,000 domains showed zero visits from major LLM crawlers to llms.txt files, and only about 951 domains had published one. Google's official May 2026 generative AI optimization guide explicitly stated that "llms.txt has no special treatment." Prioritize schema and content quality first; revisit llms.txt when adoption by crawlers materializes.
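If you publish an llms.txt anyway while crawler adoption plays out, the proposal describes a plain Markdown file: an H1 with the site or project name, a blockquote summary, and H2 sections listing links with optional notes. The generator below is a minimal sketch of that layout; the function name, site, and URLs are hypothetical.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt-style Markdown file.
    sections maps an H2 heading to a list of (name, url, note) tuples; note may be ""."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)


llms_txt = build_llms_txt(
    "Example Co",
    "Example Co publishes guides on structured data and AI search.",
    {
        "Guides": [
            ("Schema for AI search", "https://www.example.com/guides/schema.md",
             "What to implement and why"),
            ("Entity SEO", "https://www.example.com/guides/entities.md", ""),
        ],
    },
)
print(llms_txt)
```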
Practical Implementation: 30-Day Plan
Week 1 — Foundation
Audit current schema with Google's Rich Results Test and Search Console Enhancements reports, and fix validation errors before adding anything new: invalid schema can trigger penalties since Google's March 2026 core update. Then implement Organization schema on your homepage with complete sameAs properties pointing to LinkedIn, Crunchbase, and Wikidata.
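Alongside the Rich Results Test, a quick script can catch the most basic failure, JSON-LD that does not even parse, across many templates at once. The sketch below (hypothetical names, standard library only) pulls every `application/ld+json` block out of a page and reports syntax errors and a missing top-level `@context`. It does not check schema.org vocabulary or Google's rich-result rules; that still needs Google's own tools.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text pieces while inside a JSON-LD block

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def audit_jsonld(html: str) -> list:
    """Return (block index, problem) pairs; an empty list means every block parsed."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for index, raw in enumerate(collector.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append((index, f"invalid JSON: {err.msg} (line {err.lineno})"))
            continue
        if isinstance(data, dict) and "@context" not in data:
            problems.append((index, "missing @context"))
    return problems


page = (
    '<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization"}</script>'
    '<script type="application/ld+json">{"@type": "Article",}</script>'  # trailing comma
    '<script type="application/ld+json">{"@type": "Person"}</script>'
)
print(audit_jsonld(page))
```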
Week 2 — Content schema
Add Article/BlogPosting schema to every content page. Populate author, datePublished, and dateModified — the last field is the one most commonly missing. Add Person schema for content authors with credentials and sameAs links to author profiles. Deploy BreadcrumbList on all pages two or more levels deep.
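Put together, the Week 2 markup for a single post might be generated like this. The author, job title, dates, and URLs are placeholders; the sketch nests the Person inside the BlogPosting for readability, a pattern the Week 4 @graph step replaces with @id references.

```python
import json


def breadcrumb_list(trail: list) -> dict:
    """BreadcrumbList from (name, url) pairs ordered from the homepage down."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


post = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Schema for AI Search: Markup That Gets You Cited",
    "datePublished": "2026-05-01",  # placeholder dates
    "dateModified": "2026-05-20",   # the field most often left out
    "author": {
        "@type": "Person",
        "name": "Jane Doe",         # hypothetical author
        "jobTitle": "Head of SEO",
        "url": "https://www.example.com/authors/jane-doe",
        "sameAs": ["https://www.linkedin.com/in/jane-doe-example"],
    },
}

crumbs = breadcrumb_list([
    ("Home", "https://www.example.com/"),
    ("Guides", "https://www.example.com/guides/"),
    ("Schema for AI Search", "https://www.example.com/guides/schema-ai-search/"),
])

print(json.dumps([post, crumbs], indent=2))
```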
Week 3 — Answer surfaces
Add FAQPage only where genuine Q&A content exists on the page. Structure questions as direct user queries; keep answers 40–60 words. Add Speakable markup to 2–3 key passages per long-form article to flag content suitable for voice and AI direct-quote extraction.
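A sketch of the Week 3 markup follows: one FAQPage object holding every question, a warning helper for the 40–60-word answer guideline, and a WebPage node whose speakable property points at CSS selectors. The helper names and the `.summary`/`.key-takeaway` selectors are hypothetical, and the selectors must match elements that actually exist in your template. The first answer reuses this article's llms.txt FAQ answer; the second is deliberately truncated to trigger the warning.

```python
def faq_page(qa_pairs: list) -> dict:
    """A single FAQPage object containing every visible question on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def answer_length_warnings(qa_pairs: list, low: int = 40, high: int = 60) -> list:
    """Questions whose answers fall outside the word-count guideline."""
    warnings = []
    for question, answer in qa_pairs:
        words = len(answer.split())
        if not low <= words <= high:
            warnings.append(f"{question!r}: {words} words")
    return warnings


def speakable_page(url: str, css_selectors: list) -> dict:
    """WebPage node flagging the passages matched by css_selectors as speakable."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "speakable": {"@type": "SpeakableSpecification", "cssSelector": css_selectors},
    }


LLMS_ANSWER = (
    "Not yet. As of August 2025, analysis of 1,000 domains showed zero visits from major "
    "LLM crawlers to llms.txt files, and Google's May 2026 generative AI optimization guide "
    "stated it receives no special treatment. It is worth monitoring but should not displace "
    "schema implementation in your current priorities."
)
qa = [
    ("Does llms.txt help with AI search visibility?", LLMS_ANSWER),
    ("Is FAQPage schema still worth using?", "Yes, with caveats."),
]
print(answer_length_warnings(qa))
faq = faq_page(qa)
speakable = speakable_page("https://www.example.com/guides/schema-ai-search/",
                           [".summary", ".key-takeaway"])
```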
Week 4 — Entity graph and validation
Connect all schema into a single @graph on each page. Use consistent @id references across pages — entity fragmentation is a common mistake. Run validation on all template types. Establish a baseline measurement of AI citations across platforms using manual sampling or OtterlyAI.
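The sketch below shows the Week 4 pattern: every node lives in one @graph, and nodes refer to each other by @id instead of repeating nested copies. Because the Organization and Person @id values are built from a single site constant, every page emits the same identifiers for the same entities. The helper `dangling_references` is a hypothetical check that flags references to @id values not defined in the graph, one common form of entity fragmentation; if you deliberately resolve some references on other pages, you would need to whitelist them. All names and URLs are placeholders.

```python
SITE = "https://www.example.com"
ORG_ID = f"{SITE}/#organization"
AUTHOR_ID = f"{SITE}/authors/jane-doe#person"


def page_graph(page_url: str, headline: str) -> dict:
    """One @graph per page; shared entities keep the same @id on every page."""
    page_id = f"{page_url}#webpage"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": ORG_ID, "name": "Example Co", "url": f"{SITE}/"},
            {"@type": "Person", "@id": AUTHOR_ID, "name": "Jane Doe", "worksFor": {"@id": ORG_ID}},
            {"@type": "WebPage", "@id": page_id, "url": page_url, "publisher": {"@id": ORG_ID}},
            {
                "@type": "BlogPosting",
                "@id": f"{page_url}#article",
                "headline": headline,
                "author": {"@id": AUTHOR_ID},
                "publisher": {"@id": ORG_ID},
                "mainEntityOfPage": {"@id": page_id},
            },
        ],
    }


def collect_ids(node, defined: set, referenced: set) -> None:
    """Walk JSON-LD; {"@id": x} alone is a reference, @id plus other keys is a definition."""
    if isinstance(node, dict):
        if "@id" in node:
            target = referenced if set(node) == {"@id"} else defined
            target.add(node["@id"])
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def dangling_references(doc: dict) -> set:
    defined, referenced = set(), set()
    collect_ids(doc, defined, referenced)
    return referenced - defined


graph = page_graph(f"{SITE}/guides/schema-ai-search/", "Schema for AI Search")
print(dangling_references(graph))  # an empty set means every reference resolves
```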
Implementation notes:
- Use JSON-LD, not Microdata: it has an 89.4% market share and is Google's recommended format
- Place JSON-LD in the <head>, not injected via JavaScript or Google Tag Manager (LLM crawlers parse raw HTML; GTM injection may not be read)
- One FAQPage object per page: combine all questions into a single object
- Answers must be complete, with no teasers requiring click-through
- Expect rich snippets in 1–4 weeks, any AI citation changes in 4–8 weeks, and authority-building effects in 3–6 months
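For the head-placement note, here is a sketch of rendering JSON-LD server-side so it is already present in the raw HTML that crawlers fetch. It escapes `<` as `\u003c`, which keeps the JSON equivalent while preventing a string such as `</script>` from closing the tag early. The helper names and the string-based insertion before `</head>` are simplifying assumptions; most templating systems have a proper head slot for this.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize one JSON-LD object into a <script> tag that is safe to embed in HTML."""
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    # "<" can only occur inside JSON strings, so the \u003c escape keeps the JSON equivalent
    # while guaranteeing the payload cannot contain a literal "</script>".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


def insert_into_head(html: str, tag: str) -> str:
    """Hypothetical helper: place the tag just before the first </head>."""
    index = html.lower().find("</head>")
    if index == -1:
        raise ValueError("template has no </head>")
    return html[:index] + tag + html[index:]


faq = {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
page = insert_into_head("<html><head><title>Guide</title></head><body></body></html>",
                        jsonld_script_tag(faq))
print(page)
```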
Frequently Asked Questions
Does schema markup directly cause AI systems to cite my content more?
No — not directly. Ahrefs' controlled study of 1,885 pages (May 2026) found no statistically significant citation increase from adding JSON-LD. Most LLMs treat schema blocks as raw text, not validated structured data. Schema works indirectly: it enriches Google's Knowledge Graph, improves entity disambiguation, and unlocks rich results that drive higher CTR and engagement signals, which in turn improve rankings and AI visibility (otterly.ai).
Which schema types matter most for AI Overviews and ChatGPT?
For Google AI Overviews: FAQPage (where genuine Q&A exists), Article/BlogPosting with complete author and date fields, Organization with sameAs, and BreadcrumbList. For ChatGPT, which relies on Bing's index: Organization, Article, and Person schema contribute to Bing's entity understanding. In both cases, thin or incomplete schema performs worse than no schema at all — completeness and accuracy matter more than type coverage (digitalapplied.com).
Is FAQPage schema still worth using after Google deprecated it?
Yes, with caveats. Google deprecated FAQ rich results for most sites in January 2026, meaning you will not see the dropdown FAQ display in SERPs. But FAQPage schema is still processed by Google and used by AI systems — OtterlyAI found FAQ content with FAQPage schema earned 350% more AI citations (2,379 vs. 529). Only use it where genuine Q&A content is visible on the page; decorative or templated FAQ schema invites penalties from Google's March 2026 schema abuse crackdown (peppereffect.com).
Does llms.txt help with AI search visibility?
Not yet. As of August 2025, analysis of 1,000 domains showed zero visits from major LLM crawlers to llms.txt files, and Google's May 2026 generative AI optimization guide stated it receives no special treatment. It is worth monitoring but should not displace schema implementation in your current priorities.
How do I measure whether schema is actually improving my AI visibility?
Manual sampling is the baseline: query AI platforms monthly with your target questions in clean/incognito sessions and document whether your content is cited. Tools like OtterlyAI ($29–$989/month depending on query volume) automate this across ChatGPT, Perplexity, Google AIO, AI Mode, Gemini, and Copilot. Google Search Console added AI Overview data under the "Web" search type in June 2025. Track brand mentions in AI responses alongside traditional Search Console impressions — the two metrics are increasingly diverging.
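Manual sampling is easier to compare month over month if each check is logged the same way. The sketch below records one row per platform-and-query check and computes a citation rate per platform. The field names, the simple cited/not-cited flag, and the sample rows are assumptions for illustration; platform labels are free text.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CitationSample:
    checked_on: date
    platform: str   # free-text label, e.g. "ChatGPT" or "Perplexity"
    query: str
    cited: bool     # did the answer cite or link your domain?


def citation_rate_by_platform(samples: list) -> dict:
    """Share of sampled answers on each platform that cited your domain."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample in samples:
        totals[sample.platform] += 1
        hits[sample.platform] += int(sample.cited)
    return {platform: hits[platform] / totals[platform] for platform in totals}


may = [
    CitationSample(date(2026, 5, 4), "ChatGPT", "schema for ai search", True),
    CitationSample(date(2026, 5, 4), "ChatGPT", "does faq schema help ai", False),
    CitationSample(date(2026, 5, 4), "Perplexity", "schema for ai search", True),
]
print(citation_rate_by_platform(may))
```

Keeping the query list fixed between months matters more than the tooling: rates computed over different questions are not comparable.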
Originally published in the EcomExperts SEO library.