
I Let AI Write My Docs for Six Months. Here's What It Can't Do.

After six months of using AI to generate documentation, I have strong opinions about where it's magic and where it's garbage. This is that list.

Faizan Khan
14 min read

TL;DR

I've been running an experiment for the past six months. Every time I needed to write documentation, I tried AI first. Generated API references. Auto-created changelogs. Let the machine take a first pass at tutorials. I wanted to find the actual boundary, not the marketing pitch boundary, the real one, between "AI can do this" and "AI will actively make this worse."

The results are messier than either the hype or the skepticism would suggest. AI-generated documentation isn't the future of all technical writing. But it's also not a gimmick. It's a tool with a surprisingly sharp edge. Useful when you respect its limitations, dangerous when you don't.

Let me show you what I learned.


The Experiment

Here's what I actually did:

For six months, across three different projects (a Python library, a TypeScript SDK, and an internal tool), I documented everything twice:

  1. Once with AI-generated content (various tools, prompts, approaches)
  2. Once the traditional way (me, a keyboard, four cups of coffee)

Then I tracked: time spent, accuracy of the result, user feedback when applicable, and how quickly each version went stale.

This wasn't rigorous science. I didn't control for every variable. But I documented over 200 functions, wrote 15 tutorials, and generated probably 50 pages of various docs. I have opinions now.


The Diataxis Framework (And Why It Matters Here)

Before I get into results, we need a shared vocabulary. There's this framework called Diataxis that divides documentation into four types. It's not perfect, but it's useful for this conversation:

| Type | What It Does | Example |
|---|---|---|
| Tutorials | Teaches through doing | "Build your first widget" |
| How-To Guides | Solves specific problems | "How to configure auth" |
| Explanation | Builds understanding | "How the cache works" |
| Reference | Provides technical details | "API endpoint list" |

The key insight: these aren't just different lengths of content. They require fundamentally different kinds of knowledge to write well. And that distinction is where AI's strengths and weaknesses become obvious.


Where AI Is Genuinely Good

Let me be specific. Not "AI can help with" but "AI is actually better than me at":

1. Reference Documentation

This is AI's home turf. Extracting function signatures, parameter types, return values, default arguments: this is exactly what machines should do.

Here's a Python function:

Python
from typing import Any, Literal

def create_user(
    email: str,
    name: str,
    role: Literal["admin", "user", "guest"] = "user",
    send_welcome_email: bool = True,
    metadata: dict[str, Any] | None = None
) -> User:
    """
    Create a new user in the system.

    Raises:
        ValueError: If email is invalid
        DuplicateUserError: If email already exists
    """
    ...

AI can turn this into:

create_user

Create a new user in the system.

Parameters:

| Name | Type | Default | Description |
|---|---|---|---|
| email | str | required | User's email address |
| name | str | required | User's display name |
| role | "admin" \| "user" \| "guest" | "user" | Permission level |
| send_welcome_email | bool | True | Whether to send welcome email |
| metadata | dict[str, Any] \| None | None | Additional user data |

Returns: User object

Raises:

  • ValueError: If email is invalid
  • DuplicateUserError: If email already exists

And it does this perfectly. Every time. For hundreds of functions. Without getting bored. Without making typos. Without forgetting to update when the signature changes.

Time comparison: 3 minutes (AI) vs 15-20 minutes (me) per function. At scale, this is a 5-10x improvement.

2. Changelog Generation

This one surprised me. I expected AI to be mediocre at changelogs. Instead, it's excellent, if you give it the right input.

The trick: don't ask AI to write a changelog from commit messages. Commit messages are garbage. (Sorry, they just are. Mine say things like "fix" and "wip" and "ugh".)

Instead, give AI the diff plus a one-sentence human summary of intent. Like:

Text
DIFF: [400 lines of code changes]
INTENT: Refactored auth flow to support OAuth providers

AI can then produce:

Authentication Overhaul

  • Added support for OAuth providers (Google, GitHub, Microsoft)
  • New AuthProvider interface for custom auth implementations
  • login() now returns a Session object instead of raw token
  • Breaking: Removed deprecated auth_token parameter from all endpoints
  • Fixed: Session refresh now correctly extends expiration

That's actually useful. Way more useful than the changelog I would have written, which would have been "Updated auth."
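
For what it's worth, here's roughly how I assemble that input. A minimal sketch, assuming git is the source of the diff; the ref names and the instruction wording are just placeholders.

Python
import subprocess

def build_changelog_prompt(base_ref: str, head_ref: str, intent: str) -> str:
    """Combine a git diff with a one-sentence human summary of intent."""
    # Hypothetical refs; swap in whatever range you're actually releasing.
    diff = subprocess.run(
        ["git", "diff", f"{base_ref}..{head_ref}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return (
        f"DIFF:\n{diff}\n"
        f"INTENT: {intent}\n\n"
        "Write user-facing changelog entries as bullet points. "
        "Call out breaking changes explicitly."
    )

# e.g. build_changelog_prompt("v1.4.0", "HEAD", "Refactored auth flow to support OAuth providers")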

3. Configuration Documentation

AI is shockingly good at parsing config schemas and explaining them. Give it a JSON Schema or a TypeScript type definition, and it'll generate docs that are more complete than what a human would write.

TypeScript
interface Config {
  /** Server port. Default: 3000 */
  port?: number;

  /** Database connection string */
  database: string;

  /** Cache settings */
  cache?: {
    /** Enable caching. Default: true */
    enabled?: boolean;
    /** TTL in seconds. Default: 3600 */
    ttl?: number;
    /** Max entries before eviction. Default: 10000 */
    maxSize?: number;
  };

  /** Log level. Default: "info" */
  logLevel?: "debug" | "info" | "warn" | "error";
}

AI will turn this into a properly formatted table with every option, every default, every constraint. It won't miss anything. It won't get tired halfway through and start abbreviating.
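
If your config lives in a JSON Schema rather than a TypeScript type, the same idea applies: flatten the schema into rows first, then hand those to the model. A rough sketch of how I do it; it only walks plain nested objects, ignores oneOf/anyOf, and the file name is made up.

Python
import json

def flatten_schema(schema: dict, prefix: str = "") -> list[dict]:
    """Walk a JSON Schema's properties and collect one row per option."""
    rows = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("type") == "object" and "properties" in spec:
            # Recurse into nested objects, like the cache block above.
            rows.extend(flatten_schema(spec, prefix=f"{path}."))
        else:
            rows.append({
                "option": path,
                "type": spec.get("type", ""),
                "default": spec.get("default", "required"),
                "description": spec.get("description", ""),
            })
    return rows

# e.g. flatten_schema(json.load(open("config.schema.json")))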

4. Consistency Enforcement

Here's one nobody talks about: AI is great at making 50 pages of docs look like they were written by one person instead of five people over three years.

Same heading styles. Same voice. Same way of presenting code examples. Humans are terrible at this. We get bored. We develop stylistic tics. We forget what we did last month.

AI doesn't. Give it a style guide, and it will follow it robotically. That's exactly what you want.


Where AI Is Actively Bad

Now the uncomfortable part. There are categories where AI doesn't just underperform; it produces content that's worse than nothing, because bad docs waste time and destroy trust.

1. Tutorials

This is AI's biggest failure mode, and it's not even close.

A tutorial isn't a list of steps. It's a learning experience. It needs to:

  • Build concepts progressively
  • Anticipate confusion before it happens
  • Know which details matter for a beginner and which don't
  • Tell a story that motivates why you're doing each step

AI produces tutorials that are technically correct and completely useless. They read like:

  1. Install the package
  2. Import the module
  3. Call the function
  4. Pass the parameters
  5. Handle the response

That's not a tutorial. That's a grocery list. There's no understanding being built.

Real tutorials need sentences like:

  • "You might be wondering why we're doing this. Here's why..."
  • "This will error. That's expected. Let's fix it."
  • "Ignore this for now. We'll come back to it."

AI doesn't know what you don't know. It can't model a beginner's mind. It's never been confused.

My rule: Never let AI write tutorials. Not the first draft. Not any draft. It'll take you longer to fix than to write from scratch.

2. Architecture Documentation

"Why is this system designed this way?"

AI cannot answer this question. It can describe what the code does. It cannot explain why you chose microservices over a monolith, why you picked PostgreSQL over MongoDB, why the auth flow has that weird redirect.

Those decisions live in Slack threads, whiteboard photos, and the heads of engineers who've left the company. AI has no access to any of it.

When AI tries to explain architecture, it produces generic platitudes:

"The system uses a modular architecture for scalability and maintainability."

That sentence contains zero information. It's what a junior engineer writes when they don't understand the system but need to fill a page.

My rule: Architecture docs are 100% human-written. No exceptions.

3. Troubleshooting Guides

Troubleshooting requires knowing what goes wrong in practice. Not what could theoretically go wrong: what actually happens to real users.

AI can generate:

"If you receive a connection error, check that the server is running."

Thanks. Incredibly helpful.

What users actually need:

"If you're on macOS and seeing ECONNREFUSED, it's probably because the firewall prompt appeared behind another window. Look for the Security dialog."

That second one came from three support tickets and a frustrated DM. AI doesn't have access to that knowledge. It can't learn from the stupid, specific things that break in production.

My rule: Troubleshooting guides are written by humans who've done support. They're based on real tickets, not imagination.

4. The "Why" Behind Anything

AI can answer "what" and "how." It cannot answer "why."

  • What does this function do? ✅ AI is fine.
  • How do I call this API? ✅ AI is fine.
  • Why would I use this instead of the other option? ❌ AI has no idea.

The "why" requires understanding user goals, tradeoffs, alternatives considered and rejected. AI generates plausible-sounding reasons that are often wrong.

I've seen AI explain that a caching layer was "added for performance" when actually it was added because of a rate-limiting issue with a third-party API. The generated explanation wasn't technically wrong. Caching does improve performance. But it missed the actual point entirely.


The Dangerous Middle Ground

There's a category I haven't mentioned: documentation that AI can sort of do. This is the danger zone.

How-to guides fall here. AI can produce something that looks like a how-to guide. It has steps. They're in order. They might even work.

But the result is often subtly wrong in ways that are hard to catch:

  • Steps that work on the author's machine but not in common environments
  • Missing prerequisites that seem "obvious" to AI
  • Paths that work but aren't best practice
  • Correct instructions that become incorrect after a minor update

This is worse than obvious failure. With tutorials, you can immediately see AI is wrong. With how-to guides, you might not notice for months. Not until users start having problems.

My rule: AI can draft how-to guides, but a human must actually follow the steps before publishing. Every step. In a clean environment.


A Practical Division of Labor

After six months, here's how I actually split the work:

| Documentation Type | AI Role | Human Role |
|---|---|---|
| API Reference | Generate completely | Light review, add context |
| Config Docs | Generate completely | Verify accuracy |
| Changelogs | Generate from diffs + intent | Final edit |
| How-To Guides | Draft structure | Write content, verify steps |
| Tutorials | Don't use AI | Write from scratch |
| Architecture | Don't use AI | Write from scratch |
| Troubleshooting | Don't use AI | Write from real support issues |
| Explanations | Don't use AI | Write from scratch |

Notice the pattern: AI handles the mechanical. Humans handle the meaningful.


How To Actually Make This Work

If you're going to use AI for docs, here's what I've learned about doing it well:

1. Give AI Structured Input

AI is much better at transforming structured data than creating documentation from nothing.

Bad prompt: "Write documentation for this function"

Good input: Function signature + docstring + type hints + example usage

The more structure you give, the better the output.
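
Concretely, "structured input" for a function looks something like this. A minimal sketch, assuming the function is importable at doc-build time; the prompt wording is mine, not any particular tool's.

Python
import inspect

def build_reference_prompt(func) -> str:
    """Turn a live function into structured input for a reference-doc pass."""
    signature = inspect.signature(func)     # parameters, defaults, annotations
    docstring = inspect.getdoc(func) or ""  # cleaned docstring, if present
    return (
        f"Generate reference documentation for `{func.__name__}`.\n"
        f"Signature: {func.__name__}{signature}\n"
        f"Docstring:\n{docstring}\n\n"
        "Output a parameter table (name, type, default, description), "
        "the return type, and any raised exceptions."
    )

# e.g. build_reference_prompt(create_user), using the function from earlier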

2. Keep Humans in the Review Loop

Every AI-generated doc should be reviewed by someone who actually understands the system. Not just for typos, but for subtle inaccuracies that sound right but aren't.

My personal rule: if I can't verify it myself, it doesn't ship.

3. Preserve Human-Written Sections

If you have a hand-written "Overview" section on your API reference page, don't let automation overwrite it. Mark it somehow:

Markdown
<!-- HUMAN-WRITTEN: Do not auto-generate -->
## Overview
This section explains why you'd use this API...
<!-- END HUMAN-WRITTEN -->

<!-- AUTO-GENERATED -->
## API Reference
...

Any decent automation system should respect these markers.
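
If you're rolling your own glue script, it doesn't take much to honor them. A minimal sketch, assuming the exact marker comments above; a real pipeline would be more careful about where each block lands on the page.

Python
import re

# Matches the marker comments from the example above.
HUMAN_BLOCK = re.compile(
    r"<!-- HUMAN-WRITTEN: Do not auto-generate -->.*?<!-- END HUMAN-WRITTEN -->",
    re.DOTALL,
)

def rebuild_page(existing_page: str, generated_reference: str) -> str:
    """Carry human-written blocks forward untouched; replace only the generated part."""
    human_blocks = HUMAN_BLOCK.findall(existing_page)
    return "\n\n".join([*human_blocks, "<!-- AUTO-GENERATED -->", generated_reference])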

4. Track What Goes Wrong

Keep a log of AI documentation mistakes. You'll start to see patterns:

  • Which types of content fail most often
  • Which prompts produce better results
  • Which code patterns confuse the AI

This isn't one-time setup. It's ongoing calibration.
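
My log is nothing fancy: one JSON line per mistake. Something like the sketch below, where the field names are just what I happen to use.

Python
import datetime
import json

def log_doc_mistake(log_path: str, doc_type: str, prompt: str, problem: str) -> None:
    """Append one mistake record so patterns show up over time."""
    record = {
        "when": datetime.datetime.now().isoformat(timespec="seconds"),
        "doc_type": doc_type,   # e.g. "how-to", "reference"
        "prompt": prompt,       # which prompt produced the bad output
        "problem": problem,     # what was wrong, in one sentence
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")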

5. Don't Chase 100% Automation

The goal isn't "no human writing." The goal is "humans write the hard stuff, machines handle the tedious stuff."

If you're spending time on reference docs that could be generated, that's wasted human effort. If you're shipping AI tutorials that confuse users, that's wasted user trust.

Find the line. Stay on the right side of it.


The Honest Assessment

Here's where I've landed after six months:

AI is not going to write your documentation for you. If someone's selling that vision, they're either lying or building something that doesn't exist yet.

AI can save you 30-50% of documentation time. That's real. That's meaningful. That's worth taking seriously.

The time you save should go into the docs AI can't write. Better tutorials. Clearer architecture explanations. Troubleshooting guides based on real problems.

The teams I've seen do this well treat AI as a multiplier, not a replacement. They use it to handle the stuff that shouldn't require human creativity, freeing up humans to do the work that actually requires understanding.

That's not as exciting as "AI writes everything." But it's true. And in my experience, what's true tends to work better than what's exciting.


If you've been running similar experiments, I'd love to compare notes. The line between "AI works" and "AI fails" is still being drawn, and the more data points we have, the better we'll all get at finding it.