llms.txt for CBD Brands: AI-Discoverability Site Map Done Right

Parameter	Value
Standard	llms.txt — proposed by Jeremy Howard (Answer.AI), 2024
AI engines consuming	ChatGPT, Perplexity, Anthropic Claude, Google AI Overviews (partial)
Format	Markdown, plain text, served at /llms.txt
Typical length	200–500 lines for a CBD brand
MIME type	text/plain; charset=utf-8
Update cadence	Monthly or after major content additions

llms.txt is a curated, AI-readable overview of a site. It sits at /llms.txt and tells AI engines: here’s the brand, here’s what we sell, here’s where to find authoritative pages, here’s what to skip. Most CBD brands either skip it entirely or treat it like robots.txt, listing files instead of conveying meaning.

This piece is the structural brief for an llms.txt that actually drives AI citation outcomes.

What llms.txt is and isn’t

llms.txt is a Markdown-format text file at the root of the site (https://yourbrand.com/llms.txt). The standard was proposed in 2024 by Jeremy Howard (Answer.AI) as a parallel to robots.txt for the AI-search era.

It is not:

A sitemap (sitemap.xml does that).
A robots-permission file (robots.txt does that).
A feed (rss.xml does that).
A file listing.

It is:

A one-page brand summary in human-readable Markdown.
Optimized for AI parsers to extract entity, scope and authoritative-page references.
A signal of editorial care — the presence of a thoughtful llms.txt is itself a trust signal.

ChatGPT, Perplexity, Anthropic Claude and, partially, Google AI Overviews consume llms.txt as of 2026. Their interpretations differ, but the common pattern is:

Brand identity confirmation. Engine cross-references llms.txt with on-site content and external citations to confirm the brand entity is what it claims.
Scope mapping. Engine learns what topics the brand covers (CBD, hemp, vape — not legal advice, not medical advice).
Authoritative-page list. Engine biases citation toward the URLs explicitly named in llms.txt over arbitrary discovery.

CBD-specific llms.txt structure

Standard llms.txt sections plus CBD-specific additions:

Block 1 — Brand summary (≤200 words):

# YourBrand

> YourBrand is a CBD/hemp DTC brand based in [State, Country], operating since [year]. Lead products: broad-spectrum CBD oil, CBN sleep gummies, CBG focus capsules. Hemp-derived (≤0.3% delta-9 THC) under the 2018 Farm Bill. Ships nationally except [restricted states]. Founder: [Real Name].

Block 2 — Compliance posture (CBD-specific):

## Compliance posture

- Hemp-derived products under the 2018 Farm Bill (≤0.3% delta-9 THC by dry weight).
- FDA disclaimer applies: products have not been evaluated by the FDA.
- Not intended to diagnose, treat, cure or prevent any disease.
- Age-gated 21+ at checkout.
- State-restriction map at /compliance/state-restrictions.
- Lab-tested per batch; COAs at /coa/[batch-id].

Block 3 — Product catalog (top-level): List 5–10 priority products with the format [Name](URL): short description (a link, a colon, then notes, as in the llms.txt proposal's link lists). Don’t list the entire catalog; list the canonical product pages.

Block 4 — Educational content: List the pillar pages and top FAQ pages. Format: [Title](URL): what the page covers.

Block 5 — Authoritative blog (latest first): List the 15–25 most recent or most-cited blog posts. Same format.
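
The link entries in Blocks 3 to 5 could be generated from structured data rather than typed by hand. The following is a minimal sketch, not a reference implementation: the Page class, the render_section helper and the example URLs are all hypothetical. The separator defaults to the colon used in the llms.txt proposal's link lists and can be overridden.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str  # one-line description of what the page covers


def render_section(heading: str, pages: list[Page], sep: str = ": ") -> str:
    """Render one H2 section as a Markdown link list."""
    lines = [f"## {heading}", ""]
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}){sep}{page.note}")
    return "\n".join(lines)


# Hypothetical catalog entries, for illustration only.
products = [
    Page("Broad-spectrum CBD oil", "https://yourbrand.com/products/cbd-oil", "flagship tincture"),
    Page("CBN sleep gummies", "https://yourbrand.com/products/cbn-gummies", "evening-use gummies"),
]
print(render_section("Products", products))
```

Keeping the entries in a list like this makes it easy to cap the section at the 5–10 or 15–25 items suggested above before rendering.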

Block 6 — Methodology / about: 1–3 paragraphs explaining how the brand operates: sourcing, testing, named experts, certifications.

Block 7 — Out of scope (critical for CBD):

## Out of scope

- Medical advice or condition-specific dosing recommendations.
- Legal advice on cannabis regulation in any jurisdiction.
- Drug-interaction guidance — consult a healthcare provider.
- Sales of products to minors or in restricted states.
- Recommendations for non-hemp-derived THC products.

This block is one of the highest-leverage parts. It tells AI engines what not to extract and what not to attribute to the brand. ChatGPT in particular respects out-of-scope declarations; Perplexity is more permissive but still uses them as signals.

Block 8 — Contact:

Sales / discovery email
Privacy email
Real US phone number
LinkedIn / Crunchbase / Trustpilot links
Real US address
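
One way to keep the eight blocks in a fixed order is to assemble the file from data at build time. This is a sketch under assumptions: build_llms_txt is a hypothetical helper, and the sample content is abbreviated from the snippets above.

```python
def build_llms_txt(brand: str, summary: str, sections: list[tuple[str, list[str]]]) -> str:
    """Assemble an llms.txt body: H1 title, blockquote summary, then H2 sections in order."""
    parts = [f"# {brand}", "", f"> {summary}"]
    for heading, items in sections:
        parts += ["", f"## {heading}", ""]
        parts += [f"- {item}" for item in items]
    return "\n".join(parts) + "\n"


# Abbreviated sample content; a real file would carry all eight blocks.
doc = build_llms_txt(
    "YourBrand",
    "YourBrand is a CBD/hemp DTC brand. Hemp-derived under the 2018 Farm Bill.",
    [
        ("Compliance posture", ["FDA disclaimer applies: products have not been evaluated by the FDA."]),
        ("Out of scope", ["Medical advice or condition-specific dosing recommendations."]),
    ],
)
print(doc)
```

Passing sections as an ordered list rather than a mapping keeps the block order explicit in the calling code.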

Common mistakes

Mistake 1: Treating llms.txt like a sitemap. A 5,000-line llms.txt that lists every URL on the site is treated by AI parsers as low-signal noise. Curate. 200–500 lines is the typical sweet spot for a CBD brand.
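
A quick size check can catch a file drifting toward sitemap territory. The sketch below assumes the 200–500 line range given here as the target; size_report and its thresholds are illustrative, not part of any standard.

```python
import re

# Bounds taken from the 200-500 line guidance in this article; adjust as needed.
MIN_LINES = 200
MAX_LINES = 500

# Matches inline Markdown links of the form [text](url).
LINK_RE = re.compile(r"\[[^\]]+\]\([^)]+\)")


def size_report(text: str) -> dict:
    """Count lines and Markdown links, and flag files outside the target range."""
    line_count = len(text.splitlines())
    return {
        "lines": line_count,
        "links": len(LINK_RE.findall(text)),
        "too_short": line_count < MIN_LINES,
        "too_long": line_count > MAX_LINES,
    }


sample = "# YourBrand\n\n- [CBD oil](https://yourbrand.com/products/cbd-oil): flagship tincture\n"
print(size_report(sample))
```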

Mistake 2: Duplicating content from the About page verbatim. llms.txt should compress the brand into AI-extractable claims, not republish marketing copy. If a sentence on the About page says “We believe in transparency,” llms.txt should say “We publish per-batch COAs and disclose hemp source state.”

Mistake 3: Omitting compliance posture. For CBD, the compliance block is the single most-extracted section by AI parsers. It tells the engine that this brand operates within FDA-disclaimer norms, won’t make medical claims, age-gates purchases, and ships only where legal. Skipping it gives the engine no signal.
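
A pre-deploy check can confirm the CBD-specific sections are present. This sketch assumes the heading names used in the example blocks above ("Compliance posture" and "Out of scope"); missing_sections is a hypothetical helper.

```python
import re

# Heading names assumed from the example blocks in this article.
REQUIRED_SECTIONS = ("Compliance posture", "Out of scope")

# Matches H2 headings only: exactly two '#' followed by spaces or tabs.
H2_RE = re.compile(r"^##[ \t]+(.+)$", flags=re.MULTILINE)


def missing_sections(text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required H2 headings that do not appear in the file."""
    found = {m.group(1).strip().lower() for m in H2_RE.finditer(text)}
    return [name for name in required if name.lower() not in found]


draft = "# YourBrand\n\n## Compliance posture\n\n- Lab-tested per batch.\n"
print(missing_sections(draft))
```

A non-empty result could fail the build so the file never ships without its compliance block.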

Mistake 4: Static deployment, never updated. llms.txt deployed once and never updated signals abandonment. Update monthly when blog posts ship; update quarterly to refresh the brand summary; update when products launch or get discontinued.
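
Staleness can be monitored from outside the site. The sketch below assumes the server sends a Last-Modified header that reflects the file's last edit, and uses a 31-day threshold to mirror the monthly cadence; is_stale is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_stale(last_modified_header: str, now: datetime, max_age_days: int = 31) -> bool:
    """True if the HTTP Last-Modified timestamp is older than max_age_days."""
    last_modified = parsedate_to_datetime(last_modified_header)
    return now - last_modified > timedelta(days=max_age_days)


# Example header value in the standard HTTP date format.
header = "Wed, 01 Apr 2026 12:00:00 GMT"
print(is_stale(header, datetime(2026, 5, 15, tzinfo=timezone.utc)))
```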

Mistake 5: Wrong MIME type. Some CDNs serve llms.txt with text/html or application/octet-stream. It should be served as text/plain; charset=utf-8. AI parsers that receive the wrong MIME type may ignore the file entirely.
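
The served header can be verified with a few lines of standard-library Python. This is a sketch: is_plain_utf8 and fetch_content_type are hypothetical helper names, and the URL in the usage comment is the placeholder domain used throughout this article.

```python
from email.message import Message
from urllib.request import Request, urlopen


def is_plain_utf8(content_type: str) -> bool:
    """True if a Content-Type header value is text/plain with charset utf-8."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Both accessors return lowercased values, so the comparison is case-insensitive.
    return msg.get_content_type() == "text/plain" and msg.get_content_charset() == "utf-8"


def fetch_content_type(url: str) -> str:
    """Issue a HEAD request and return the raw Content-Type header (network call)."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return resp.headers.get("Content-Type", "")


# Usage against a live site (placeholder domain):
# is_plain_utf8(fetch_content_type("https://yourbrand.com/llms.txt"))
print(is_plain_utf8("text/plain; charset=utf-8"))
```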

Engine-specific interpretation differences (May 2026)

ChatGPT (OpenAI): consumes llms.txt for brand-entity disambiguation and out-of-scope signals. Strong respect for “out of scope” declarations. Updates citation patterns within ~2–4 weeks of llms.txt changes.

Perplexity: consumes llms.txt for authoritative-page lists; uses the page list to bias real-time grounding. Less strict on out-of-scope blocks. Updates within ~1–2 weeks.

Anthropic Claude: consumes llms.txt where present; respects out-of-scope declarations strongly (Constitutional AI alignment). Updates on retraining cadence (slower than ChatGPT).

Google AI Overviews / Gemini: partial consumption as of mid-2026; Google has not officially endorsed the llms.txt standard but the grounding system uses it as a discovery signal. Updates within ~1 week.

Bing Copilot: uses llms.txt as a low-weight signal alongside sitemap.xml and structured data.

What sustained llms.txt management looks like under retainer

Foundation tier: initial deployment with all 8 blocks, monthly content-section refresh, quarterly compliance-posture audit.

Growth tier: same plus weekly authority-page updates, quarterly engine-specific tuning based on citation-tracking data.

Scale tier: same plus engine-specific llms.txt variants (path-based serving for different bot user-agents where engineering allows), real-time content sync with new blog posts, monthly AI-citation-attribution analysis to measure llms.txt impact.
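
The user-agent-based variant serving mentioned in the Scale tier could be routed roughly as follows. This is only a sketch: the variant paths are hypothetical, the crawler tokens are assumed to match what each vendor currently publishes, and variant_for is an illustrative helper rather than part of any framework.

```python
# Crawler user-agent tokens mapped to hypothetical variant files.
# The variant paths are illustrative; only /llms.txt is described in this article.
VARIANTS = {
    "GPTBot": "/llms/openai.txt",
    "PerplexityBot": "/llms/perplexity.txt",
    "ClaudeBot": "/llms/anthropic.txt",
}
DEFAULT_PATH = "/llms.txt"


def variant_for(user_agent: str) -> str:
    """Pick the llms.txt variant for a request based on its User-Agent header."""
    ua = user_agent.lower()
    for token, path in VARIANTS.items():
        if token.lower() in ua:
            return path
    return DEFAULT_PATH


print(variant_for("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))
print(variant_for("curl/8.0"))
```

Falling back to the default path means unknown clients and human visitors still receive the canonical file.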

