
llms.txt Isn't Enough

llms.txt solves discovery. Content negotiation solves consumption. One of these matters 27x more than the other.

Faizan Khan
7 min read

TL;DR

Someone pastes a docs link into Claude Code, Cursor, or even ChatGPT. What happens next?

The agent fetches the page. It gets HTML—divs, React hydration markers, navigation sidebars, cookie consent banners, and somewhere in there, the actual content. The agent parses it. Works fine. But you just burned 14,000 tokens on a page that's 500 tokens of useful information.

llms.txt doesn't fix this.

llms.txt is a discovery mechanism. It tells agents what pages exist. But most AI interactions don't start with "give me an index of all docs." They start with a URL. And when agents fetch that URL, they get HTML.

Here's the token cost for a typical docs page:

Format      Size     Tokens
HTML        58KB     ~14,500
Markdown    2.1KB    ~525

A 27x difference. At scale, that's the gap between AI integrations being viable and not.


Content Negotiation

Use standard HTTP headers:

Text
GET /docs/authentication HTTP/1.1
Accept: text/markdown

The server sees Accept: text/markdown and returns markdown. Same URL, different representation. Browsers send Accept: text/html and get HTML; AI agents send Accept: text/markdown and get markdown.

Bash
# Browser
curl https://docs.example.com/quickstart
# → HTML

# AI agent
curl -H "Accept: text/markdown" https://docs.example.com/quickstart
# → Markdown

No special URLs. No .md suffix. Standard HTTP.


Implementation

Middleware checks the Accept header:

TypeScript
const acceptHeader = req.headers.get('accept') || '';
if (acceptHeader.includes('text/markdown')) {
  return NextResponse.rewrite(markdownApiUrl);
}

If the request wants markdown, serve markdown. If not, HTML.
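
Put together, a minimal version of that middleware might look like the sketch below. It assumes an internal /api/markdown route that renders a page's markdown; that route path and the /docs matcher are illustrative assumptions, not the exact setup described in this post.

TypeScript
// middleware.ts: a minimal sketch of Accept-based content negotiation.
// The /api/markdown route and the /docs matcher are assumptions for illustration.
import { NextResponse, type NextRequest } from 'next/server';

export function middleware(req: NextRequest) {
  const acceptHeader = req.headers.get('accept') || '';

  if (acceptHeader.includes('text/markdown')) {
    // Same public URL, different representation: rewrite to a markdown-rendering route.
    const markdownApiUrl = new URL('/api/markdown', req.url);
    return NextResponse.rewrite(markdownApiUrl);
  }

  // Browsers sending Accept: text/html fall through to the normal HTML page.
  return NextResponse.next();
}

export const config = { matcher: '/docs/:path*' };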

Gotchas

HEAD requests. Agents send HEAD to check headers before downloading:

TypeScript
if (req.method !== 'GET' && req.method !== 'HEAD') {
  return res.status(405).json({ error: 'Method not allowed' });
}

Next.js middleware rewrites don't preserve query params. Pass data via headers:

TypeScript
response.headers.set('x-subdomain', subdomain);
response.headers.set('x-markdown-path', pathname);
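
On the other side of the rewrite, the markdown route can read those headers back. A sketch, assuming a pages-style API handler; loadMarkdown is a hypothetical helper, not something from the actual codebase:

TypeScript
// pages/api/markdown.ts: sketch of the receiving end of the rewrite.
// Header names match the middleware above; loadMarkdown is a hypothetical
// stand-in for whatever actually renders a page's markdown.
import type { NextApiRequest, NextApiResponse } from 'next';

async function loadMarkdown(subdomain?: string, pathname?: string): Promise<string> {
  // Placeholder: a real implementation would look up and render the page.
  return `# ${pathname ?? '/'} (site: ${subdomain ?? 'default'})`;
}

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'GET' && req.method !== 'HEAD') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  const subdomain = req.headers['x-subdomain'] as string | undefined;
  const pathname = req.headers['x-markdown-path'] as string | undefined;
  const markdown = await loadMarkdown(subdomain, pathname);

  res.setHeader('Content-Type', 'text/markdown; charset=utf-8');
  if (req.method === 'HEAD') return res.status(200).end(); // headers only, no body
  return res.status(200).send(markdown);
}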

Cache headers matter. AI agents respect caching:

Text
Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400
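
That's one more header to set, continuing the response object from the header pass-through snippet above:

TypeScript
// Let agents and CDNs reuse the markdown representation for an hour,
// and serve stale copies for up to a day while revalidating.
response.headers.set(
  'Cache-Control',
  'public, s-maxage=3600, stale-while-revalidate=86400'
);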

Discovery Headers

Content negotiation needs discovery. Add these to every response:

Text
Link: <https://docs.example.com/llms.txt>; rel="llms-txt"
X-Llms-Txt: https://docs.example.com/llms.txt

Agents can HEAD any page and find your llms.txt without downloading content.
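
These can be attached in the same middleware as everything else. A sketch, using the example domain from the header block above:

TypeScript
// Advertise llms.txt on every response so a single HEAD request reveals it.
response.headers.set(
  'Link',
  '<https://docs.example.com/llms.txt>; rel="llms-txt"'
);
response.headers.set('X-Llms-Txt', 'https://docs.example.com/llms.txt');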


llms.txt Still Matters

llms.txt isn't useless—it's just not the whole picture. We wrote a full guide on making your docs AI-readable a few weeks ago. The short version:

Markdown
# Project Docs

- [Quickstart](https://docs.example.com/quickstart.md): Get started in 5 minutes
- [Auth](https://docs.example.com/auth.md): API authentication

Good for agents that need to explore. llms-full.txt concatenates everything for agents that want the entire docs set in one request.
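
If you're wondering how llms-full.txt comes together, it can be as simple as concatenating the markdown sources at build time. A rough sketch, assuming a flat docs/ directory and a public/ output folder; neither path comes from this post:

TypeScript
// build-llms-full.ts: naive concatenation of markdown sources into llms-full.txt.
// The docs/ and public/ paths are assumptions for illustration.
import { readdir, readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

const dir = 'docs';
const files = (await readdir(dir)).filter((f) => f.endsWith('.md')).sort();
const sections = await Promise.all(files.map((f) => readFile(join(dir, f), 'utf8')));
await writeFile(join('public', 'llms-full.txt'), sections.join('\n\n---\n\n'));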

But neither helps when someone pastes a single URL into an AI chat. Content negotiation does.


Test Your Docs

Bash
1# llms.txt exists?
2curl https://your-docs.com/llms.txt
3
4# Content negotiation works?
5curl -H "Accept: text/markdown" https://your-docs.com/some-page
6
7# Discovery headers present?
8curl -I https://your-docs.com/some-page | grep -i llms

Most sites fail all three.


The Point

The AI-readable web is being rebuilt. We obsess over SEO for Google. We optimize for crawlers that haven't changed in 20 years. But the new crawlers—the ones that answer questions and write code—are getting HTML soup.

llms.txt helps agents find pages. Content negotiation makes reading them 27x cheaper.

We rolled this out across all Docsalot sites this week—/llms.txt, /llms-full.txt, discovery headers, content negotiation. If you're curious, try curl -H "Accept: text/markdown" against any page on solid-docs.docsalot.dev.

The llms.txt spec is at llmstxt.org. It's short. Read it.