Commentary

The Harness Is the Cheap Part: Notes on Rebuilding Claude Code

A commentary on Fareed Khan's from-scratch rebuild of Claude Code — and why its central conclusion should be flipped.

Aleksandr Khomutov Original author: Fareed Khan May 23, 2026 ≈ 6 min

Fareed Khan — an engineer who writes for Level Up Coding — wrote twenty thousand words on how to rebuild Claude Code from scratch: the agent loop, the tool dispatcher, three-layer context compression, subagent isolation, FSM protocols between agents, a git worktree per task. He called the repository "23 Components of the Claude Code Architecture."

I live inside this harness every day — writing skills, agents, hooks and context routing for it — and I want to lead with the point: the article is excellent as a map of the mechanism and dangerous as a call to action. Because the conclusion it invites — "so I could build my own Claude Code" — is the exact opposite of the right one.

What gets reconstructed is precisely the part that costs nothing

Khan states the thesis accurately: "Anthropic built the right harness around the right model," and the harness is "fully reproducible." The first half is true. The second is true only in the sense that a skeleton is reproducible: a while True loop that calls the model, runs a tool call, feeds the result back. That's forty lines. I could write them in an evening, you could, and the author did.

But here is what Khan says himself and then walks straight past: "a poorly written description causes the model to pick the wrong tool." Exactly. And the article then shows toy one-line descriptions. The real value of Claude Code does not sit in the dispatcher — it sits in the fact that the descriptions of its eighteen tools, the text of the summariser prompt used during compression, the phrasing of the system reminders are all polished against millions of real execution traces. Twenty thousand words reconstruct the 20% that comes easy and skip the 80% that is the product: tuning the wording at scale. The skeleton is free precisely because the value isn't in it.

Folklore names

Throughout the article run "internal" names: the master loop nO, the compressor wU2, the async queue h2A, a compression threshold of "roughly 92%." This is presented with the confidence of documentation. In reality it is archaeology of someone else's minified bundle — mnemonics fished out of obfuscated code. As mnemonics they are harmless. As architectural references they are a category error: you are citing a guess, not a public contract. I would not build a single engineering decision on them, and I'd advise the reader to keep the same distance.

Where the author is right, and strongly so

One pattern the article captures flawlessly — progressive disclosure, loading skills on demand. "Install a hundred skills and the system prompt grows by a hundred lines, not a hundred pages." This really is the cleanest component. And it's exactly the idea I took to its logical conclusion: three-axis context routing, where the rules for a specific client, team and role load only when the working directory and the git remote match the right profile. Progressive disclosure isn't a token-saving trick; it's how you keep the model focused. Full marks to the author here.

Subagent isolation is described just as honestly: the parent sees only the final summary, while dozens of intermediate reads stay inside the child and are discarded. This isn't theory — it's the reason I run searches across a large codebase through separate explore agents rather than in the main context. The mechanism is described correctly.

But the conclusions in the multi-agent part are backwards

Here the article takes a wrong turn. The climax of the multi-agent chapter is autonomous task self-assignment: "autonomous self-assignment removes the coordinator entirely," agents pull tasks off a shared board, claiming them atomically through a lock. It's presented as an elegant win. In my experience it is exactly the conclusion that breaks in production.

I have built agent teams for real — an orchestrator plus a developer, a reviewer, a tester, a security engineer, with parallel review and tests. And the expensive part there is not the task-claiming mechanism at all. The expensive part is the loop guard (I cap hard at five iterations, otherwise the agents fall into an endless ping-pong of edits), conflict resolution through the lead (not autonomously — a mediator is mandatory, otherwise two agents silently overwrite each other's work at the level of meaning, not the file), and review fatigue (nobody honestly reviews forty-eight pages of changes). A lock on a file does not save you from a race at the level of intent. "Remove the coordinator entirely" sounds like scalability, but in practice it is giving up the single place where merge decisions are made. I deliberately kept the lead orchestrator — and I don't regret it.

Permissions: regex versus intent

The same blindness shows up in the governance chapter. The article's model: a YAML file with three lists (always_deny, always_allow, ask_user), and a regex over the command string decides the fate of a tool call. For rm -rf / it works. But the genuinely hard case is not the one a regex catches. My own instructions carry a separate gate for production actions: kubectl exec against a shared prod pod, SSH onto a live host, terraform apply against live infrastructure — all of it requires explicit, by-name approval within the current session. No regex over a string can express that, because the danger here is not in the command's syntax but in its semantics: the same kubectl exec is harmless in a sandbox and unacceptable against shared prod. The article shows a permissions skeleton and calls it a "structural property." A skeleton — yes. But safety lives at the level of intent, and a regex does not reach that far.

Phase 5: where the reconstruction is at its most honest

Curiously, the most indisputable part of the article is the last one: parallel tool execution via asyncio.gather, interrupt injection for steering on the fly, prompt caching, and a runtime for MCP. Here Khan has nothing to invent — these are measurable, observable properties, and they are described honestly. Parallel tool calls within a single turn and cache-prefix reuse are things I lean on every day. I would only shift one emphasis: prompt caching is presented as a performance detail, while in real work it is a load-bearing economic beam. The cache's time-to-live (on the order of five minutes) directly dictates the rhythm of my long loops and deferred tasks — not "how to go faster," but "when it even makes sense to wake up at all." The MCP runtime, meanwhile, is the one place the article touches genuine extensibility: any external server becomes a new entry in the tool registry, and here the reconstruction and production line up exactly.

What to do with this

Read the article — as a map it's good, and honest breakdowns of Claude Code's mechanics are rare. Build this harness over a weekend; it's worth it: you'll stop treating the agent as magic and see the bare loop. But don't walk out of the exercise thinking "now I'm my own Anthropic." Walk out with the opposite thought: if the skeleton is this cheap, then all of your work is above the skeleton. Tool descriptions tailored to your domain. A memory layer that survives sessions not as a toy .agent_memory.md but for real. Context routing for your teams and clients. Gates on actions a regex cannot describe. Anthropic's harness gives you an empty, perfectly assembled frame. The value is in the muscle you grow onto it — the muscle nobody, Anthropic included, reproduces in an evening.

What gets reconstructed is precisely the part that costs nothing

Folklore names

Where the author is right, and strongly so

But the conclusions in the multi-agent part are backwards

Permissions: regex versus intent

Phase 5: where the reconstruction is at its most honest

What to do with this

Related articles

The Short Memory File Collapses at Scale: Tokens Versus Rule Precedence

Agentic AI Security: Porting Enterprise Patterns Down to a Solo Harness

Saving Tokens in Agents: What the Survey Gets Right and What It Over-sells