The Documentation Generator That Finally Didn't Suck
I've tried every documentation generator since 2018. Most are garbage. Here's what actually works, and the hard-won principles behind why.
TL;DR
I've been chasing the dream of automatic documentation for seven years. I've tried Sphinx, JSDoc, TypeDoc, Swagger, Redoc, Docusaurus, GitBook, ReadMe, and at least a dozen tools I've mercifully forgotten. Most of them generated documentation that was technically accurate and completely useless. The rest generated documentation that was neither.
Here's the thing nobody tells you about documentation generation: the hard part isn't generating text from code. Any idiot can parse a function signature and spit out "Parameters: x (int), y (string)". The hard part is generating documentation that helps humans understand things.
After seven years and probably 50 failed experiments, I finally have something that works. Not perfectly (I'll be honest about the gaps), but well enough that I've stopped manually writing API reference docs. That's a sentence I never thought I'd type.
Let me tell you what I learned.
Why Most Doc Generators Fail
Before we talk about what works, we need to understand why the obvious approach doesn't.
The obvious approach: parse code, extract metadata, generate docs. It's a pipeline that looks like this:
```
Source Code → Parser → Structured Data → Template → Documentation
```

Simple. Elegant. And it produces documentation like this:
```
processOrder

Parameters:
  orderId (string)
  options (object)

Returns:
  Promise<Order>
```
That's not documentation. That's a function signature with extra steps.
The problem is that useful documentation answers questions that aren't in the code:
- When would I call this function vs. the other one?
- What happens if I pass null?
- Why does this return a Promise instead of the value directly?
- How does this fit with the rest of the system?
The code doesn't contain this information. Comments sometimes do, but let's be honest: your comments are probably lying. Mine definitely are. I write a comment explaining what the code does, then I change the code and forget to change the comment. Now I have a comment that's worse than no comment.
The Fundamental Insight
Here's what took me embarrassingly long to figure out:
Good automatic documentation isn't generated from code. It's generated from code changes.
Think about it. When you write documentation manually, you don't look at your entire codebase and write docs for everything. You write docs when something changes. New feature? Write docs. Changed API? Update docs. Deprecated endpoint? Add a warning.
Documentation is a diff, not a snapshot.
This changes everything about how you build a documentation generator. Instead of:
```
Codebase → Parser → Docs
```

You want:
```
Code Change → Semantic Diff → Doc Update
```

The input isn't "here's a function." The input is "here's how this function changed." And that context, what changed, why it changed, what the before and after look like, is exactly what you need to generate useful documentation.
What a Good Generator Actually Does
Let me walk through a real example. Say you have this endpoint:
```python
@app.route('/api/orders/<order_id>', methods=['GET'])
def get_order(order_id: str) -> Order:
    """Fetch an order by ID."""
    order = db.orders.find_one(order_id)
    if not order:
        abort(404)
    return order
```

And you change it to:
```python
@app.route('/api/orders/<order_id>', methods=['GET'])
def get_order(
    order_id: str,
    include_items: bool = False,
    include_history: bool = False
) -> Order:
    """
    Fetch an order by ID.

    Args:
        order_id: The order's unique identifier
        include_items: Whether to include line items in the response
        include_history: Whether to include status change history

    Returns:
        Order object, optionally with items and/or history populated

    Raises:
        404: Order not found
    """
    order = db.orders.find_one(order_id)
    if not order:
        abort(404)
    if include_items:
        order.items = db.order_items.find(order_id=order_id)
    if include_history:
        order.history = db.order_history.find(order_id=order_id)
    return order
```

A dumb doc generator sees the new function and produces:
GET /api/orders/{order_id}
Fetch an order by ID.
Query Parameters:
- `include_items` (boolean, optional): Whether to include line items
- `include_history` (boolean, optional): Whether to include status change history

Response: Order object
That's fine. It's accurate. But it's missing the story.
A good generator sees the change and produces:
GET /api/orders/{order_id}
Fetch an order by ID.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `include_items` | boolean | false | Include line items in response. New in v2.3 |
| `include_history` | boolean | false | Include status change history. New in v2.3 |

Response: `Order` object

When `include_items=true`, the response includes the full `items` array:

```json
{
  "id": "ord_123",
  "status": "shipped",
  "items": [
    {"sku": "WIDGET-01", "quantity": 2, "price": 29.99}
  ]
}
```

Changelog:

- v2.3: Added `include_items` and `include_history` parameters for granular response control
See the difference? The second version understands context. It knows these parameters are new. It can generate an example showing what the expanded response looks like. It adds a changelog entry automatically.
The Architecture That Works
After many failures, here's the architecture I've landed on:
```
┌────────────────────────────────────────────────┐
│                 Git Repository                 │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│             Change Detection Layer             │
│ ┌───────────┐ ┌───────────┐ ┌─────────────┐    │
│ │ Git Diff  │ │    AST    │ │  Semantic   │    │
│ │  Parser   │─│  Differ   │─│   Change    │    │
│ │           │ │           │ │ Classifier  │    │
│ └───────────┘ └───────────┘ └─────────────┘    │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│             Documentation Context              │
│ ┌───────────┐ ┌───────────┐ ┌─────────────┐    │
│ │ Existing  │ │   Code    │ │    Style    │    │
│ │   Docs    │ │  Context  │ │    Guide    │    │
│ └───────────┘ └───────────┘ └─────────────┘    │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│               Generation Engine                │
│                                                │
│   Input:  Change + Context                     │
│   Output: Documentation Delta                  │
│                                                │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│                 Merge & Deploy                 │
│                                                │
│   Apply delta to existing docs                 │
│   Preserve human-written sections              │
│   Deploy to documentation site                 │
│                                                │
└────────────────────────────────────────────────┘
```

Let me explain each layer.
Layer 1: Change Detection
This isn't just git diff. Git diff tells you which lines changed. You need to know which semantic units changed.
Consider this diff:
```diff
- def calculate_total(items):
+ def calculate_total(items, apply_discount=False):
      total = sum(item.price for item in items)
+     if apply_discount:
+         total *= 0.9
      return total
```

Git sees: 4 lines changed.
A semantic differ sees:
- Function `calculate_total` signature modified
- New optional parameter `apply_discount` (boolean, default False)
- New behavior: 10% discount when `apply_discount=True`
- Return type unchanged
That semantic understanding is what enables good documentation generation.
I use tree-sitter for parsing. It gives you an AST for basically any language, and you can diff ASTs to get semantic changes rather than line changes.
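Tree-sitter is the language-agnostic way to do this. As a rough sketch of the same idea for Python-only sources, the stdlib `ast` module can surface added parameters without any third-party parser (the helper names here are mine, not from a real tool):

```python
import ast


def param_names(src: str) -> dict[str, list[str]]:
    """Map each top-level function name to its parameter names."""
    tree = ast.parse(src)
    return {
        node.name: [a.arg for a in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def added_params(before: str, after: str) -> dict[str, list[str]]:
    """Parameters present after the change but not before, per function."""
    old, new = param_names(before), param_names(after)
    return {
        name: [p for p in params if p not in old.get(name, [])]
        for name, params in new.items()
        if name in old
    }


before = "def calculate_total(items):\n    return sum(i.price for i in items)\n"
after = (
    "def calculate_total(items, apply_discount=False):\n"
    "    total = sum(i.price for i in items)\n"
    "    if apply_discount:\n"
    "        total *= 0.9\n"
    "    return total\n"
)
print(added_params(before, after))  # {'calculate_total': ['apply_discount']}
```

That dict is already a semantic change ("new parameter on `calculate_total`"), where a line diff would only tell you "line 1 changed."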
Layer 2: Documentation Context
The generator needs context beyond the code change:
Existing docs: What documentation already exists for this function? We need to update it, not overwrite it.
Code context: What other functions/classes does this relate to? If calculate_total is called by checkout, that relationship matters.
Style guide: How do we format documentation? What voice do we use? What's the structure for API endpoints vs. utility functions?
All of this gets fed to the generator alongside the semantic change.
Layer 3: Generation
This is where the LLM comes in. The prompt looks something like:
```
You are updating documentation based on a code change.

CHANGE:
- Function: calculate_total
- Modification: Added optional parameter `apply_discount`
- New behavior: Applies 10% discount when True

EXISTING DOCUMENTATION:
### calculate_total(items)
Calculate the total price for a list of items.
**Parameters:** items (list of Item)
**Returns:** float

STYLE GUIDE:
- Use tables for parameters
- Include type annotations
- Show examples for optional parameters

Generate the updated documentation section.
```

The key insight: we're not asking the LLM to write documentation from scratch. We're asking it to apply a specific change to existing documentation. That's a much more constrained task with much better results.
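Assembling that prompt is plain string work. A minimal sketch (the function name and input shapes are my own, not from a real tool):

```python
def build_prompt(change: dict, existing_docs: str, style_rules: list[str]) -> str:
    """Assemble the constrained 'apply this change' prompt."""
    change_lines = "\n".join(f"- {k}: {v}" for k, v in change.items())
    style_lines = "\n".join(f"- {r}" for r in style_rules)
    return (
        "You are updating documentation based on a code change.\n\n"
        f"CHANGE:\n{change_lines}\n\n"
        f"EXISTING DOCUMENTATION:\n{existing_docs}\n\n"
        f"STYLE GUIDE:\n{style_lines}\n\n"
        "Generate the updated documentation section."
    )


prompt = build_prompt(
    {"Function": "calculate_total",
     "Modification": "Added optional parameter `apply_discount`"},
    "### calculate_total(items)\nCalculate the total price for a list of items.",
    ["Use tables for parameters", "Include type annotations"],
)
```

Keeping the prompt a pure function of (change, existing docs, style guide) also makes the generation step easy to test and replay.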
Layer 4: Merge & Deploy
The output is a delta, not a complete document. We apply that delta to existing docs, being careful to:
- Preserve human-written sections marked with `<!-- human -->`
- Update only the sections affected by the change
- Add changelog entries where appropriate
- Maintain formatting consistency
Then we deploy. Either directly (for reference docs) or via PR (for docs that need review).
The Config File That Actually Matters
Every documentation system needs configuration. Most get it wrong by either being too simple (no control) or too complex (nobody can use it).
Here's what I've found actually matters:
```yaml
# What to document
sources:
  - pattern: "src/api/**/*.py"
    type: api

  - pattern: "src/lib/**/*.ts"
    type: library

  - pattern: "src/cli/**/*.py"
    type: cli

# What to ignore
ignore:
  - "**/*.test.*"
  - "**/*.spec.*"
  - "**/internal/**"
  - "**/fixtures/**"
  - "node_modules/**"
  - "build/**"

# How to behave
settings:
  # Which branch is truth
  source_branch: main

  # When to generate
  trigger: pr_merged  # or: push, manual

  # Require review for these paths
  require_review:
    - "docs/tutorials/**"
    - "docs/guides/**"

  # Auto-deploy these paths
  auto_deploy:
    - "docs/api/**"
    - "docs/changelog.md"

  # Never touch these paths (human-only)
  preserve:
    - "docs/philosophy.md"
    - "docs/about.md"
```

The `preserve` section is crucial. Some documentation should never be auto-generated. Vision documents. Architectural decisions. The "why we exist" page. Mark them as preserved, and the system won't touch them.
Hard Lessons Learned
Let me save you some pain with lessons I learned the hard way.
Lesson 1: Comments Lie, Signatures Don't
I tried using docstrings and comments as the primary source of truth. Big mistake. Comments rot faster than code because there's no type checker for English.
Now I treat comments as hints, not sources. The function signature is truth. The return type is truth. The docstring is "maybe helpful context if it's not outdated."
Lesson 2: Generate Less, Not More
My first system generated documentation for everything. Every private function. Every helper utility. Every internal class.
The result was 400 pages of noise that buried the 40 pages of useful content.
Now: document public API only. Internal implementation details are not documentation. They're code that happens to have English near it.
Lesson 3: Breaking Changes Need Special Handling
If a function signature changes in a way that breaks existing callers, that's not just a documentation update. That's a migration guide, deprecation warning, and changelog entry.
My system detects breaking changes and generates:
- A deprecation warning on the old docs
- A migration section explaining how to update
- A changelog entry with the breaking change prominently marked
> ⚠️ **Breaking Change in v3.0**
>
> The `user_id` parameter is now required. Previously it defaulted to the
> current user. To migrate:
>
> ```python
> # Before
> get_orders()
>
> # After
> get_orders(user_id=current_user.id)
> ```

This catches what humans often miss when updating docs manually.
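One mechanical way to flag the "newly required parameter" case, sketched with the stdlib `ast` module (the helper names are mine; a real classifier would also cover removed functions, renamed parameters, changed return types, and so on):

```python
import ast


def required_params(src: str) -> set[str]:
    """Names of positional parameters that callers must supply (no default)."""
    fn = ast.parse(src).body[0]
    assert isinstance(fn, ast.FunctionDef)
    positional = fn.args.args
    # Defaults align with the *last* n positional parameters.
    n_defaults = len(fn.args.defaults)
    required = positional[: len(positional) - n_defaults]
    return {a.arg for a in required}


def is_breaking(before: str, after: str) -> bool:
    """A change is breaking if it adds a required parameter."""
    return bool(required_params(after) - required_params(before))


old = "def get_orders():\n    ...\n"
new = "def get_orders(user_id):\n    ...\n"
print(is_breaking(old, new))  # True
```

When this returns True, the generator routes the change into the migration-guide path instead of a plain doc update.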
Lesson 4: Examples Are Harder Than Reference
Generating "this function takes X and returns Y" is easy. Generating a useful example is hard.
Good examples show:
- Real-world use cases, not toy inputs
- Edge cases that matter
- How this function interacts with others
- What the output actually looks like
I've had mixed success here. Reference docs: automation works great. Examples: automation works okay, but needs human review.
Lesson 5: You Need Escape Hatches
No matter how good your automation is, sometimes you need to override it. Maybe the auto-generated explanation is technically correct but confusing. Maybe you want to add a warning that isn't derivable from code.
Every section needs an escape hatch:
```
<!-- auto -->
This section is auto-generated from code.
<!-- /auto -->

<!-- human -->
**Important:** This endpoint is rate-limited to 100 requests per minute
per API key. Contact support for higher limits.
<!-- /human -->
```

The automation updates the auto section. The human section is preserved.
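A minimal sketch of that marker-preserving merge, assuming the `<!-- auto -->` / `<!-- human -->` markers above. This regex version only handles a single auto section; a real implementation would match sections to deltas by name:

```python
import re

AUTO = re.compile(r"<!-- auto -->.*?<!-- /auto -->", re.DOTALL)


def apply_delta(doc: str, new_auto_body: str) -> str:
    """Replace only the auto-generated section; human sections survive."""
    replacement = f"<!-- auto -->\n{new_auto_body}\n<!-- /auto -->"
    # Use a lambda so backslashes in generated text aren't treated
    # as regex group references.
    return AUTO.sub(lambda _m: replacement, doc, count=1)


doc = (
    "<!-- auto -->\nOld generated text.\n<!-- /auto -->\n\n"
    "<!-- human -->\n**Important:** rate-limited.\n<!-- /human -->\n"
)
merged = apply_delta(doc, "New generated text.")
```

The non-greedy `.*?` matters: a greedy match would swallow everything between the first `<!-- auto -->` and the last `<!-- /auto -->` in the file, including human sections in between.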
What Still Doesn't Work
I want to be honest about the gaps.
Tutorials: Can't automate these. Tutorials need a learning arc, progressive complexity, and understanding of what the reader doesn't know. Code doesn't contain any of that.
Architecture docs: These explain why, not what. Why did you choose microservices? Why is auth handled this way? Why does this module exist? The answers are in Slack history and the heads of engineers who've left.
Troubleshooting guides: These come from support tickets and production incidents, not from code analysis. "If you see error X, it's probably because Y" requires knowing what errors users actually hit.
The "getting started" experience: The first 5 minutes of using your product. This needs to be crafted by humans who understand the user journey, not generated by machines that understand function signatures.
For these, you need humans. Good automation frees humans to focus on this high-value work by eliminating the tedious reference doc updates.
The 80/20 of Documentation
Here's my current mental model:
| Doc Type | % of Total | Automation | Human Role |
|---|---|---|---|
| API Reference | 40% | 100% | Light review |
| Config Reference | 15% | 100% | Verify accuracy |
| Changelog | 10% | 90% | Final edit |
| How-To Guides | 15% | 30% | Write + verify |
| Tutorials | 10% | 0% | Write from scratch |
| Concepts/Architecture | 10% | 0% | Write from scratch |
If you automate reference and changelog well, you've handled 65% of your documentation volume. That's massive. Your humans can focus on the 35% that actually requires human understanding.
Getting Started (For Real)
If you want to build this yourself, here's the minimum viable stack:
- Git hook or CI trigger on PR merge to main
- Tree-sitter for AST parsing (or language-specific parser)
- Semantic diff logic to classify changes
- LLM (Claude, GPT-4, or similar) for generation
- Merge logic to apply deltas without destroying human content
- Deploy pipeline to push docs to your site
The whole thing can be built in a few weeks if you're focused. The iteration to make it work well takes months.
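The CI trigger in that stack can be a workflow as small as this (a hypothetical GitHub Actions file; the script paths are placeholders for your own generator and deploy steps):

```yaml
# .github/workflows/docs.yml -- illustrative only
name: docs
on:
  push:
    branches: [main]
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # need the previous commit to diff against
      - run: python scripts/generate_docs.py --base HEAD~1 --head HEAD
      - run: python scripts/deploy_docs.py
```

The `fetch-depth: 2` is the point: a change-driven generator needs the before *and* after, so a shallow single-commit checkout isn't enough.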
Alternatively, there are tools emerging that do this. Some are good. Evaluate based on:
- Does it understand semantic changes, not just line changes?
- Can you preserve human-written sections?
- Does it integrate with your Git workflow?
- Can you control what gets documented?
The Uncomfortable Truth
Here's what I've come to believe after seven years of trying to solve this:
Most documentation is bad because it's treated as a separate job. Write code, then write docs. Ship feature, then document it. The gap between those activities is where documentation dies.
The solution isn't better writers or better processes. It's eliminating the gap entirely. Documentation should be a build artifact, generated from the same source as your code, deployed through the same pipeline, updated by the same triggers.
We've accepted this for other artifacts. Nobody manually writes compilation outputs. Nobody hand-edits minified JavaScript. But we still manually write documentation as if it's 2005.
The technology to change this finally exists. It's not perfect (I've been honest about the gaps), but it's good enough. Good enough that I've stopped manually updating API docs. Good enough that our docs are never more than a day behind our code.
For the first time in my career, I trust our documentation.
That's not a small thing.
If you're working on documentation tooling or have war stories from the trenches, I'd love to compare notes. This problem is finally solvable, and the more people working on it, the faster we'll all get there.