llms.txt is the AI-readable site map. It sits at /llms.txt and tells AI engines what the brand is, what it sells, where the authoritative pages live and what to skip. Most CBD brands either skip it entirely or treat it like a sitemap.xml, listing files instead of meaning.
This piece is the structural brief for an llms.txt that actually drives AI citation outcomes.
What llms.txt is and isn’t
llms.txt is a Markdown-format text file at the root of the site (https://yourbrand.com/llms.txt). The format was proposed in 2024 by Jeremy Howard (Answer.AI) as a parallel to robots.txt for the AI-search era. It remains a proposal rather than a ratified standard.
It is not:
- A sitemap (sitemap.xml does that).
- A robots-permission file (robots.txt does that).
- A feed (rss.xml does that).
- A file listing.
It is:
- A one-page brand summary in human-readable Markdown.
- Optimized for AI parsers to extract entity, scope and authoritative-page references.
- A signal of editorial care — the presence of a thoughtful llms.txt is itself a trust signal.
ChatGPT, Perplexity, Claude and Google AI Overviews all consume llms.txt to some degree as of 2026 (Google only partially). Their interpretation differs, but the consensus pattern is:
- Brand identity confirmation. Engine cross-references llms.txt with on-site content and external citations to confirm the brand entity is what it claims.
- Scope mapping. Engine learns what topics the brand covers (CBD, hemp, vape — not legal advice, not medical advice).
- Authoritative-page list. Engine biases citation toward the URLs explicitly named in llms.txt over arbitrary discovery.
CBD-specific llms.txt structure
Standard llms.txt sections plus CBD-specific additions:
Block 1 — Brand summary (≤200 words):
# YourBrand
> YourBrand is a CBD/hemp DTC brand based in [State, Country], operating since [year]. Lead products: broad-spectrum CBD oil, CBN sleep gummies, CBG focus capsules. Hemp-derived (≤0.3% delta-9 THC) under the 2018 Farm Bill. Ships nationally except [restricted states]. Founder: [Real Name].
Block 2 — Compliance posture (CBD-specific):
## Compliance posture
- Hemp-derived products under the 2018 Farm Bill (≤0.3% delta-9 THC by dry weight).
- FDA disclaimer applies: products have not been evaluated by the FDA.
- Not intended to diagnose, treat, cure or prevent any disease.
- Age-gated 21+ on checkout.
- State-restriction map at /compliance/state-restrictions.
- Lab-tested per batch; COAs at /coa/[batch-id].
Block 3 — Product catalog (top-level):
List 5–10 priority products with format: [Name](URL) — short description. Don’t list the entire catalog; list the canonicals.
Block 4 — Educational content:
List the pillar pages and top FAQ pages. Format: [Title](URL) — what the page covers.
Block 5 — Authoritative blog (latest first):
List the 15–25 most recent or most-cited blog posts. Same format.
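The link-plus-description lines used in Blocks 3 to 5 can be generated from a curated list instead of typed by hand. The following is a minimal Python sketch, not a reference implementation: the names `Entry`, `format_entry` and `format_section`, the section title and the example URLs are all hypothetical. The separator is a parameter because the original llms.txt proposal puts a colon after the link, while this brief uses a dash.

```python
from dataclasses import dataclass

EM_DASH = "\u2014"  # separator this brief uses between link and description


@dataclass(frozen=True)
class Entry:
    name: str
    url: str
    description: str


def format_entry(entry: Entry, separator: str = f" {EM_DASH} ") -> str:
    """Render one curated page as a single-line Markdown list item."""
    # Collapse internal whitespace so each entry stays on one line.
    description = " ".join(entry.description.split())
    return f"- [{entry.name}]({entry.url}){separator}{description}"


def format_section(title: str, entries: list[Entry], limit: int = 10) -> str:
    """Render an H2 section containing at most `limit` entries."""
    lines = [f"## {title}"]
    lines.extend(format_entry(e) for e in entries[:limit])
    return "\n".join(lines)


# Hypothetical catalog entries, for illustration only.
products = [
    Entry("Broad-Spectrum CBD Oil", "https://yourbrand.com/products/cbd-oil",
          "Hemp-derived tincture with per-batch COA."),
    Entry("CBN Sleep Gummies", "https://yourbrand.com/products/cbn-gummies",
          "Gummies formulated with CBN."),
]
section = format_section("Products", products)
print(section)
```

The `limit` argument is there to enforce curation: passing the full catalog still yields only the first ten canonicals.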
Block 6 — Methodology / about:
1–3 paragraphs explaining how the brand operates: sourcing, testing, named experts, certifications.
Block 7 — Out of scope (critical for CBD):
## Out of scope
- Medical advice or condition-specific dosing recommendations.
- Legal advice on cannabis regulation in any jurisdiction.
- Drug-interaction guidance — consult a healthcare provider.
- Sales of products to minors or in restricted states.
- Recommendations for non-hemp-derived THC products.
This block is one of the highest-leverage parts. It tells AI engines what not to extract and what not to attribute to the brand. ChatGPT in particular respects out-of-scope declarations; Perplexity is more permissive but still uses them as signals.
Block 8 — Contact:
- Sales / discovery email
- Privacy email
- Real US phone number
- LinkedIn / Crunchbase / Trustpilot links
- Real US address
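A quick structural check can catch a missing block before deployment. This sketch assumes the file follows the layout described here: an H1 brand name, a blockquote summary of at most 200 words, and H2 sections titled `Compliance posture` and `Out of scope`. The function name `check_structure` and the sample text are hypothetical, and the checks are deliberately shallow.

```python
import re

REQUIRED_SECTIONS = ("Compliance posture", "Out of scope")
MAX_SUMMARY_WORDS = 200  # Block 1 limit from this brief


def check_structure(text: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line.strip()]

    # The file should open with an H1 carrying the brand name.
    if not non_empty or not re.match(r"^# \S", non_empty[0]):
        problems.append("first non-empty line is not an H1 heading")

    # The brand summary is the blockquote (lines starting with '>').
    summary = " ".join(
        line.lstrip("> ").strip() for line in lines if line.startswith(">")
    )
    if not summary:
        problems.append("missing blockquote brand summary")
    elif len(summary.split()) > MAX_SUMMARY_WORDS:
        problems.append("brand summary exceeds 200 words")

    # Collect H2 titles and compare case-insensitively.
    h2_titles = set()
    for line in lines:
        match = re.match(r"^## (.+)$", line)
        if match:
            h2_titles.add(match.group(1).strip().lower())
    for section in REQUIRED_SECTIONS:
        if section.lower() not in h2_titles:
            problems.append(f"missing section: {section}")
    return problems


sample = """# YourBrand
> YourBrand is a CBD/hemp DTC brand.

## Compliance posture
- Age-gated 21+ on checkout.
"""
print(check_structure(sample))  # ['missing section: Out of scope']
```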
Common mistakes
Mistake 1: Treating llms.txt like a sitemap. A 5,000-line llms.txt that lists every URL on the site is treated by AI parsers as low-signal noise. Curate. 200–500 lines is the typical sweet spot for a CBD brand.
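The curation target is easy to enforce mechanically. A minimal sketch, assuming the 200 to 500 line range suggested in this brief; the function name `size_report`, the verdict labels and the return shape are hypothetical.

```python
import re

MIN_LINES, MAX_LINES = 200, 500  # sweet spot suggested in this brief

LINK_PATTERN = re.compile(r"\[[^\]]+\]\([^)\s]+\)")  # Markdown [text](url)


def size_report(text: str) -> dict:
    """Count non-empty lines and Markdown links, then classify the file size."""
    line_count = sum(1 for line in text.splitlines() if line.strip())
    link_count = len(LINK_PATTERN.findall(text))
    if line_count < MIN_LINES:
        verdict = "below range"
    elif line_count > MAX_LINES:
        verdict = "too long: curate"
    else:
        verdict = "ok"
    return {"lines": line_count, "links": link_count, "verdict": verdict}


# A synthetic 5,000-entry file, like the sitemap-style dump described above.
dump = "\n".join(f"- [Page {i}](https://yourbrand.com/p/{i})" for i in range(5000))
print(size_report(dump))
```

Running a check like this in the deploy pipeline keeps the file from drifting back toward a URL dump as the blog grows.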
Mistake 2: Duplicating content from the About page verbatim. llms.txt should compress the brand into AI-extractable claims, not re-publish marketing copy. If a sentence on the About page says “We believe in transparency,” llms.txt should say “We publish per-batch COAs and disclose hemp source state.”
Mistake 3: Omitting compliance posture. For CBD, the compliance block is the single most-extracted section by AI parsers. It tells the engine that the brand operates within FDA-disclaimer norms, makes no medical claims, age-gates checkout and ships only where legal. Skipping it gives the engine no signal.
Mistake 4: Static deployment, never updated. llms.txt deployed once and never updated signals abandonment. Update monthly when blog posts ship; update quarterly to refresh the brand summary; update when products launch or get discontinued.
Mistake 5: Wrong MIME type. Some CDNs serve llms.txt as text/html or application/octet-stream. It should be served as text/plain; charset=utf-8. AI parsers that receive the wrong MIME type may ignore the file entirely.
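Checking the served header takes one request. A minimal sketch using only the Python standard library: the function names `fetch_content_type`, `parse_content_type` and `is_acceptable` are hypothetical, the URL in the usage comment is a placeholder, and the check is intentionally strict in requiring the explicit UTF-8 charset named above.

```python
from urllib.request import Request, urlopen


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Issue a HEAD request and return the raw Content-Type header ('' if absent)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


def parse_content_type(header: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type header into a lowercase media type and its parameters."""
    media_type, *params = [part.strip() for part in header.split(";")]
    parsed = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:
            parsed[key.strip().lower()] = value.strip().strip('"').lower()
    return media_type.lower(), parsed


def is_acceptable(header: str) -> bool:
    """True only for text/plain with an explicit UTF-8 charset."""
    media_type, params = parse_content_type(header)
    return media_type == "text/plain" and params.get("charset") == "utf-8"


# Usage (performs a network request; the URL is a placeholder):
# print(is_acceptable(fetch_content_type("https://yourbrand.com/llms.txt")))
```

Some servers reject HEAD requests; in that case a GET with the body discarded gives the same header.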
Engine-specific interpretation differences (May 2026)
ChatGPT (OpenAI): consumes llms.txt for brand-entity disambiguation and out-of-scope signals. Strong respect for “out of scope” declarations. Updates citation patterns within ~2–4 weeks of llms.txt changes.
Perplexity: consumes llms.txt for authoritative-page lists; uses the page list to bias real-time grounding. Less strict on out-of-scope blocks. Updates within ~1–2 weeks.
Anthropic Claude: consumes llms.txt where present; respects out-of-scope declarations strongly (Constitutional AI alignment). Updates on retraining cadence (slower than ChatGPT).
Google AI Overviews / Gemini: partial consumption as of mid-2026; Google has not officially endorsed the llms.txt standard but the grounding system uses it as a discovery signal. Updates within ~1 week.
Bing Copilot: uses llms.txt as a low-weight signal alongside sitemap.xml and structured data.
What sustained llms.txt management looks like under retainer
Foundation tier: initial deployment with all 8 blocks, monthly content-section refresh, quarterly compliance-posture audit.
Growth tier: same plus weekly authority-page updates, quarterly engine-specific tuning based on citation-tracking data.
Scale tier: same plus engine-specific llms.txt variants (path-based serving for different bot user-agents where engineering allows), real-time content sync with new blog posts, monthly AI-citation-attribution analysis to measure llms.txt impact.
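Serving engine-specific variants comes down to mapping a crawler's User-Agent header to a file path. A minimal sketch, assuming the tokens `GPTBot`, `PerplexityBot` and `ClaudeBot` appear in the respective crawlers' User-Agent strings; the variant paths and the function name `select_variant` are hypothetical. User-Agent strings can be spoofed, so this is routing, not authentication.

```python
from typing import Optional

# Hypothetical variant files keyed by a lowercase User-Agent token.
VARIANTS = {
    "gptbot": "/llms/openai.txt",
    "perplexitybot": "/llms/perplexity.txt",
    "claudebot": "/llms/anthropic.txt",
}
DEFAULT_VARIANT = "/llms.txt"  # served to every unrecognized agent


def select_variant(user_agent: Optional[str]) -> str:
    """Return the llms.txt variant path for a request's User-Agent header."""
    ua = (user_agent or "").lower()
    for token, path in VARIANTS.items():
        if token in ua:
            return path
    return DEFAULT_VARIANT


print(select_variant("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))
print(select_variant("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Keeping the default file as the fallback means a wrong or missing match degrades to the standard llms.txt rather than an error.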