The Documentation Generator That Finally Didn't Suck

I've tried every documentation generator since 2018. Most are garbage. Here's what actually works, and the hard-won principles behind why.

Faizan Khan, Author
14 min read

TL;DR

I've been chasing the dream of automatic documentation for seven years. I've tried Sphinx, JSDoc, TypeDoc, Swagger, Redoc, Docusaurus, GitBook, ReadMe, and at least a dozen tools I've mercifully forgotten. Most of them generated documentation that was technically accurate and completely useless. The rest generated documentation that was neither.

Here's the thing nobody tells you about documentation generation: the hard part isn't generating text from code. Any idiot can parse a function signature and spit out "Parameters: x (int), y (string)". The hard part is generating documentation that helps humans understand things.

After seven years and probably 50 failed experiments, I finally have something that works. Not perfectly (I'll be honest about the gaps), but well enough that I've stopped manually writing API reference docs. That's a sentence I never thought I'd type.

Let me tell you what I learned.


Why Most Doc Generators Fail

Before we talk about what works, we need to understand why the obvious approach doesn't.

The obvious approach: parse code, extract metadata, generate docs. It's a pipeline that looks like this:

```text
Source Code → Parser → Structured Data → Template → Documentation
```

Simple. Elegant. And it produces documentation like this:

processOrder

Parameters:

  • orderId (string)
  • options (object)

Returns: Promise<Order>

That's not documentation. That's a function signature with extra steps.

The problem is that useful documentation answers questions that aren't in the code:

  • When would I call this function vs. the other one?
  • What happens if I pass null?
  • Why does this return a Promise instead of the value directly?
  • How does this fit with the rest of the system?

The code doesn't contain this information. Comments sometimes do, but let's be honest. Your comments are probably lying. Mine definitely are. I write a comment explaining what the code does, then I change the code and forget to change the comment. Now I have a comment that's worse than no comment.


The Fundamental Insight

Here's what took me embarrassingly long to figure out:

Good automatic documentation isn't generated from code. It's generated from code changes.

Think about it. When you write documentation manually, you don't look at your entire codebase and write docs for everything. You write docs when something changes. New feature? Write docs. Changed API? Update docs. Deprecated endpoint? Add a warning.

Documentation is a diff, not a snapshot.

This changes everything about how you build a documentation generator. Instead of:

```text
Codebase → Parser → Docs
```

You want:

```text
Code Change → Semantic Diff → Doc Update
```

The input isn't "here's a function." The input is "here's how this function changed." And that context (what changed, why it changed, what the before and after look like) is exactly what you need to generate useful documentation.


What a Good Generator Actually Does

Let me walk through a real example. Say you have this endpoint:

```python
@app.route('/api/orders/<order_id>', methods=['GET'])
def get_order(order_id: str) -> Order:
    """Fetch an order by ID."""
    order = db.orders.find_one(order_id)
    if not order:
        abort(404)
    return order
```

And you change it to:

```python
@app.route('/api/orders/<order_id>', methods=['GET'])
def get_order(
    order_id: str,
    include_items: bool = False,
    include_history: bool = False
) -> Order:
    """
    Fetch an order by ID.

    Args:
        order_id: The order's unique identifier
        include_items: Whether to include line items in the response
        include_history: Whether to include status change history

    Returns:
        Order object, optionally with items and/or history populated

    Raises:
        404: Order not found
    """
    order = db.orders.find_one(order_id)
    if not order:
        abort(404)
    if include_items:
        order.items = db.order_items.find(order_id=order_id)
    if include_history:
        order.history = db.order_history.find(order_id=order_id)
    return order
```

A dumb doc generator sees the new function and produces:

GET /api/orders/{order_id}

Fetch an order by ID.

Query Parameters:

  • include_items (boolean, optional): Whether to include line items
  • include_history (boolean, optional): Whether to include status change history

Response: Order object

That's fine. It's accurate. But it's missing the story.

A good generator sees the change and produces:

GET /api/orders/{order_id}

Fetch an order by ID.

Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| include_items | boolean | false | Include line items in response. New in v2.3 |
| include_history | boolean | false | Include status change history. New in v2.3 |

Response: Order object

When include_items=true, the response includes the full items array:

```json
{
  "id": "ord_123",
  "status": "shipped",
  "items": [
    {"sku": "WIDGET-01", "quantity": 2, "price": 29.99}
  ]
}
```

Changelog:

  • v2.3: Added include_items and include_history parameters for granular response control

See the difference? The second version understands context. It knows these parameters are new. It can generate an example showing what the expanded response looks like. It adds a changelog entry automatically.


The Architecture That Works

After many failures, here's the architecture I've landed on:

```text
┌─────────────────────────────────────────────────┐
│                 Git Repository                  │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│            Change Detection Layer               │
│  ┌───────────┐  ┌──────────┐  ┌────────────┐    │
│  │ Git Diff  │  │   AST    │  │  Semantic  │    │
│  │  Parser   │──│  Differ  │──│   Change   │    │
│  │           │  │          │  │ Classifier │    │
│  └───────────┘  └──────────┘  └────────────┘    │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│            Documentation Context                │
│  ┌───────────┐  ┌──────────┐  ┌────────────┐    │
│  │ Existing  │  │   Code   │  │   Style    │    │
│  │   Docs    │  │ Context  │  │   Guide    │    │
│  └───────────┘  └──────────┘  └────────────┘    │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               Generation Engine                 │
│                                                 │
│   Input:  Change + Context                      │
│   Output: Documentation Delta                   │
│                                                 │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                Merge & Deploy                   │
│                                                 │
│   Apply delta to existing docs                  │
│   Preserve human-written sections               │
│   Deploy to documentation site                  │
│                                                 │
└─────────────────────────────────────────────────┘
```

Let me explain each layer.

Layer 1: Change Detection

This isn't just git diff. Git diff tells you which lines changed. You need to know which semantic units changed.

Consider this diff:

```diff
-def calculate_total(items):
+def calculate_total(items, apply_discount=False):
     total = sum(item.price for item in items)
+    if apply_discount:
+        total *= 0.9
     return total
```

Git sees: 4 lines changed.

A semantic differ sees:

  • Function calculate_total signature modified
  • New optional parameter apply_discount (boolean, default False)
  • New behavior: 10% discount when apply_discount=True
  • Return type unchanged

That semantic understanding is what enables good documentation generation.

I use tree-sitter for parsing. It gives you an AST for basically any language, and you can diff ASTs to get semantic changes rather than line changes.
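Tree-sitter covers this across languages; to make the idea concrete in a single language, here's a minimal sketch using Python's built-in `ast` module to diff function signatures. The helper names and the change-description format are my own invention, not any particular tool's API:

```python
import ast

def signatures(src: str) -> dict:
    """Map each top-level function name to its (param, default) pairs."""
    sigs = {}
    for node in ast.parse(src).body:
        if isinstance(node, ast.FunctionDef):
            args = node.args
            # Defaults align with the *trailing* positional parameters.
            defaults = [None] * (len(args.args) - len(args.defaults)) + [
                ast.unparse(d) for d in args.defaults
            ]
            sigs[node.name] = list(zip([a.arg for a in args.args], defaults))
    return sigs

def semantic_diff(before: str, after: str) -> list[str]:
    """Describe signature-level changes between two versions of a module."""
    old, new = signatures(before), signatures(after)
    changes = []
    for name in new:
        if name not in old:
            changes.append(f"function {name} added")
        elif old[name] != new[name]:
            for param in [p for p, _ in new[name] if p not in dict(old[name])]:
                default = dict(new[name])[param]
                kind = "optional" if default is not None else "required"
                changes.append(
                    f"{name}: new {kind} parameter {param} (default {default})"
                )
    return changes

before = "def calculate_total(items):\n    return sum(i.price for i in items)\n"
after = (
    "def calculate_total(items, apply_discount=False):\n"
    "    total = sum(i.price for i in items)\n"
    "    if apply_discount:\n"
    "        total *= 0.9\n"
    "    return total\n"
)
print(semantic_diff(before, after))
# → ['calculate_total: new optional parameter apply_discount (default False)']
```

That one-line description, rather than the raw diff hunks, is what feeds the generation layer.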

Layer 2: Documentation Context

The generator needs context beyond the code change:

Existing docs: What documentation already exists for this function? We need to update it, not overwrite it.

Code context: What other functions/classes does this relate to? If calculate_total is called by checkout, that relationship matters.

Style guide: How do we format documentation? What voice do we use? What's the structure for API endpoints vs. utility functions?

All of this gets fed to the generator alongside the semantic change.

Layer 3: Generation

This is where the LLM comes in. The prompt looks something like:

```text
You are updating documentation based on a code change.

CHANGE:
- Function: calculate_total
- Modification: Added optional parameter `apply_discount`
- New behavior: Applies 10% discount when True

EXISTING DOCUMENTATION:
### calculate_total(items)
Calculate the total price for a list of items.
**Parameters:** items (list of Item)
**Returns:** float

STYLE GUIDE:
- Use tables for parameters
- Include type annotations
- Show examples for optional parameters

Generate the updated documentation section.
```

The key insight: we're not asking the LLM to write documentation from scratch. We're asking it to apply a specific change to existing documentation. That's a much more constrained task with much better results.
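Mechanically, assembling that prompt is just templating over the semantic change, the current docs, and the style rules. A sketch (the `SemanticChange` shape is invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class SemanticChange:
    function: str
    modification: str
    behavior: str

def build_prompt(change: SemanticChange, existing_docs: str, style_rules: list[str]) -> str:
    """Assemble the constrained update prompt from change + context."""
    style = "\n".join(f"- {rule}" for rule in style_rules)
    return (
        "You are updating documentation based on a code change.\n\n"
        f"CHANGE:\n- Function: {change.function}\n"
        f"- Modification: {change.modification}\n"
        f"- New behavior: {change.behavior}\n\n"
        f"EXISTING DOCUMENTATION:\n{existing_docs}\n\n"
        f"STYLE GUIDE:\n{style}\n\n"
        "Generate the updated documentation section."
    )

prompt = build_prompt(
    SemanticChange(
        "calculate_total",
        "Added optional parameter `apply_discount`",
        "Applies 10% discount when True",
    ),
    "### calculate_total(items)\nCalculate the total price for a list of items.",
    ["Use tables for parameters", "Include type annotations"],
)
```

The only part the LLM supplies is the rewritten section; everything else is deterministic.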

Layer 4: Merge & Deploy

The output is a delta, not a complete document. We apply that delta to existing docs, being careful to:

  • Preserve human-written sections marked with <!-- human -->
  • Update only the sections affected by the change
  • Add changelog entries where appropriate
  • Maintain formatting consistency

Then we deploy. Either directly (for reference docs) or via PR (for docs that need review).
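The preservation step is the part that's easiest to get wrong. A minimal sketch of the merge, assuming the `<!-- auto -->` / `<!-- human -->` marker convention and a simple regex-based replace (a real implementation would handle multiple auto sections with IDs):

```python
import re

# Matches an auto-generated section, capturing the markers so we keep them.
AUTO = re.compile(r"(<!-- auto -->).*?(<!-- /auto -->)", re.DOTALL)

def apply_delta(doc: str, new_auto_body: str) -> str:
    """Replace the auto section's body; human sections are left untouched."""
    return AUTO.sub(lambda m: f"{m.group(1)}\n{new_auto_body}\n{m.group(2)}", doc)

doc = (
    "<!-- auto -->\nOld generated reference.\n<!-- /auto -->\n\n"
    "<!-- human -->\n**Important:** rate-limited to 100 requests/minute.\n<!-- /human -->\n"
)
updated = apply_delta(doc, "New generated reference.")
```

Note the replacement is done with a callable rather than a replacement string, so backslashes in generated text can't be misread as regex escapes.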


The Config File That Actually Matters

Every documentation system needs configuration. Most get it wrong by either being too simple (no control) or too complex (nobody can use it).

Here's what I've found actually matters:

```yaml
# What to document
sources:
  - pattern: "src/api/**/*.py"
    type: api

  - pattern: "src/lib/**/*.ts"
    type: library

  - pattern: "src/cli/**/*.py"
    type: cli

# What to ignore
ignore:
  - "**/*.test.*"
  - "**/*.spec.*"
  - "**/internal/**"
  - "**/fixtures/**"
  - "node_modules/**"
  - "build/**"

# How to behave
settings:
  # Which branch is truth
  source_branch: main

  # When to generate
  trigger: pr_merged  # or: push, manual

  # Require review for these paths
  require_review:
    - "docs/tutorials/**"
    - "docs/guides/**"

  # Auto-deploy these paths
  auto_deploy:
    - "docs/api/**"
    - "docs/changelog.md"

  # Never touch these paths (human-only)
  preserve:
    - "docs/philosophy.md"
    - "docs/about.md"
```

The preserve section is crucial. Some documentation should never be auto-generated. Vision documents. Architectural decisions. The "why we exist" page. Mark them as preserved, and the system won't touch them.
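Under the hood, routing a path to preserve / require_review / auto_deploy is a first-match precedence check over glob patterns. A sketch using Python's `fnmatch` (the precedence order is my assumption; note that `fnmatch`'s `*` crosses path separators, so `**` behaves as a recursive glob here):

```python
from fnmatch import fnmatch

# Order matters: preserve wins over review, review wins over auto-deploy.
RULES = [
    ("preserve", ["docs/philosophy.md", "docs/about.md"]),
    ("require_review", ["docs/tutorials/**", "docs/guides/**"]),
    ("auto_deploy", ["docs/api/**", "docs/changelog.md"]),
]

def action_for(path: str) -> str:
    """Return the first matching action for a documentation path."""
    for action, patterns in RULES:
        if any(fnmatch(path, pattern) for pattern in patterns):
            return action
    return "ignore"
```

With these rules, `docs/philosophy.md` is never touched, a tutorial change opens a PR, and an API reference change deploys directly.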


Hard Lessons Learned

Let me save you some pain with lessons I learned the hard way.

Lesson 1: Comments Lie, Signatures Don't

I tried using docstrings and comments as the primary source of truth. Big mistake. Comments rot faster than code because there's no type checker for English.

Now I treat comments as hints, not sources. The function signature is truth. The return type is truth. The docstring is "maybe helpful context if it's not outdated."

Lesson 2: Generate Less, Not More

My first system generated documentation for everything. Every private function. Every helper utility. Every internal class.

The result was 400 pages of noise that buried the 40 pages of useful content.

Now: document public API only. Internal implementation details are not documentation. They're code that happens to have English near it.

Lesson 3: Breaking Changes Need Special Handling

If a function signature changes in a way that breaks existing callers, that's not just a documentation update. That's a migration guide, deprecation warning, and changelog entry.

My system detects breaking changes and generates:

  1. A deprecation warning on the old docs
  2. A migration section explaining how to update
  3. A changelog entry with the breaking change prominently marked

````markdown
> ⚠️ **Breaking Change in v3.0**
>
> The `user_id` parameter is now required. Previously it defaulted to the
> current user. To migrate:
>
> ```python
> # Before
> get_orders()
>
> # After
> get_orders(user_id=current_user.id)
> ```
````

This catches what humans often miss when updating docs manually.
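One way to detect this case at the AST level: a parameter that loses its default value (or a new parameter added with no default) makes the signature breaking for existing callers. A minimal sketch using Python's built-in `ast` module (the helper names are my own):

```python
import ast

def param_defaults(src: str, func: str) -> dict:
    """Map a function's parameter names to default AST nodes (None = required)."""
    for node in ast.parse(src).body:
        if isinstance(node, ast.FunctionDef) and node.name == func:
            args = node.args.args
            defaults = [None] * (len(args) - len(node.args.defaults)) + list(
                node.args.defaults
            )
            return {a.arg: d for a, d in zip(args, defaults)}
    return {}

def breaking_changes(before: str, after: str, func: str) -> list[str]:
    """Flag parameters that went from optional (or absent) to required."""
    old, new = param_defaults(before, func), param_defaults(after, func)
    flagged = []
    for name, default in new.items():
        if default is None and (name not in old or old[name] is not None):
            flagged.append(f"`{name}` is now required")
    return flagged

before = "def get_orders(user_id=None):\n    ...\n"
after = "def get_orders(user_id):\n    ...\n"
print(breaking_changes(before, after, "get_orders"))
# → ['`user_id` is now required']
```

Each flagged entry triggers the deprecation-warning / migration-guide / changelog trio rather than a plain doc update.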

Lesson 4: Examples Are Harder Than Reference

Generating "this function takes X and returns Y" is easy. Generating a useful example is hard.

Good examples show:

  • Real-world use cases, not toy inputs
  • Edge cases that matter
  • How this function interacts with others
  • What the output actually looks like

I've had mixed success here. Reference docs: automation works great. Examples: automation works okay, but needs human review.

Lesson 5: You Need Escape Hatches

No matter how good your automation is, sometimes you need to override it. Maybe the auto-generated explanation is technically correct but confusing. Maybe you want to add a warning that isn't derivable from code.

Every section needs an escape hatch:

```markdown
<!-- auto -->
This section is auto-generated from code.
<!-- /auto -->

<!-- human -->
**Important:** This endpoint is rate-limited to 100 requests per minute
per API key. Contact support for higher limits.
<!-- /human -->
```

The automation updates the auto section. The human section is preserved.


What Still Doesn't Work

I want to be honest about the gaps.

Tutorials: Can't automate these. Tutorials need a learning arc, progressive complexity, and understanding of what the reader doesn't know. Code doesn't contain any of that.

Architecture docs: These explain why, not what. Why did you choose microservices? Why is auth handled this way? Why does this module exist? The answers are in Slack history and the heads of engineers who've left.

Troubleshooting guides: These come from support tickets and production incidents, not from code analysis. "If you see error X, it's probably because Y" requires knowing what errors users actually hit.

The "getting started" experience: The first 5 minutes of using your product. This needs to be crafted by humans who understand the user journey, not generated by machines that understand function signatures.

For these, you need humans. Good automation frees humans to focus on this high-value work by eliminating the tedious reference doc updates.


The 80/20 of Documentation

Here's my current mental model:

| Doc Type | % of Total | Automation | Human Role |
|---|---|---|---|
| API Reference | 40% | 100% | Light review |
| Config Reference | 15% | 100% | Verify accuracy |
| Changelog | 10% | 90% | Final edit |
| How-To Guides | 15% | 30% | Write + verify |
| Tutorials | 10% | 0% | Write from scratch |
| Concepts/Architecture | 10% | 0% | Write from scratch |

If you automate reference and changelog well, you've handled 65% of your documentation volume. That's massive. Your humans can focus on the 35% that actually requires human understanding.


Getting Started (For Real)

If you want to build this yourself, here's the minimum viable stack:

  1. Git hook or CI trigger on PR merge to main
  2. Tree-sitter for AST parsing (or language-specific parser)
  3. Semantic diff logic to classify changes
  4. LLM (Claude, GPT-4, or similar) for generation
  5. Merge logic to apply deltas without destroying human content
  6. Deploy pipeline to push docs to your site

The whole thing can be built in a few weeks if you're focused. The iteration to make it work well takes months.
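If it helps to see the shape of the glue, here's a skeletal driver wiring those six pieces together. Every stage is a stub standing in for the real component (git trigger, tree-sitter diff, LLM call, deploy hook); nothing here is a real tool's API:

```python
def detect_changes(old_src: str, new_src: str) -> list[dict]:
    """Stages 2-3: parse both versions and classify semantic changes (stubbed)."""
    return [{"function": "calculate_total", "kind": "param_added"}]

def generate_delta(change: dict, existing_docs: str) -> str:
    """Stage 4: ask the LLM for a documentation delta (stubbed)."""
    return f"Updated docs for {change['function']}"

def merge(existing_docs: str, delta: str) -> str:
    """Stage 5: apply the delta without touching human-written sections."""
    return existing_docs + "\n\n" + delta

def run_pipeline(old_src: str, new_src: str, docs: str) -> str:
    """Stage 1 fires this on PR merge; stage 6 deploys the return value."""
    for change in detect_changes(old_src, new_src):
        docs = merge(docs, generate_delta(change, docs))
    return docs
```

The important structural choice is that every stage consumes and produces plain data (change records, deltas, documents), so each one can be swapped out or tested in isolation.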

Alternatively, there are tools emerging that do this. Some are good. Evaluate based on:

  • Does it understand semantic changes, not just line changes?
  • Can you preserve human-written sections?
  • Does it integrate with your Git workflow?
  • Can you control what gets documented?


The Uncomfortable Truth

Here's what I've come to believe after seven years of trying to solve this:

Most documentation is bad because it's treated as a separate job. Write code, then write docs. Ship feature, then document it. The gap between those activities is where documentation dies.

The solution isn't better writers or better processes. It's eliminating the gap entirely. Documentation should be a build artifact, generated from the same source as your code, deployed through the same pipeline, updated by the same triggers.

We've accepted this for other artifacts. Nobody manually writes compilation outputs. Nobody hand-edits minified JavaScript. But we still manually write documentation as if it's 2005.

The technology to change this finally exists. It's not perfect (I've been honest about the gaps), but it's good enough. Good enough that I've stopped manually updating API docs. Good enough that our docs are never more than a day behind our code.

For the first time in my career, I trust our documentation.

That's not a small thing.


If you're working on documentation tooling or have war stories from the trenches, I'd love to compare notes. This problem is finally solvable, and the more people working on it, the faster we'll all get there.