Commentary

The Short Memory File Collapses at Scale: Tokens Versus Rule Precedence

Keeping the memory file tiny is right for a solo repository. But compliance rules from several organizations cannot be pinned to paths — they must be routed.

Aleksandr Khomutov Original author: Anubhav May 23, 2026 ≈ 5 min

Anubhav spent six months tuning his agent setup and laid out the eight-layer stack that, in his words, «finally worked»: a short project memory file (he insists on keeping it «under 500 tokens on purpose» for cache hits), path-scoped rules, plan mode, a single read-only reviewer subagent running on a cheap model, skills, hooks, five MCP servers, and parallel worktrees with headless mode in CI. His conclusion is blunt: «The stack is the workflow. The workflow is the multiplier. The prompt is just the last five percent.» I agree with that closing line completely. My disagreement is elsewhere: I have been building a similar stack for far longer than six months — with multi-agent teams, parallel review and tests, context routing, and layers of hard rules — and the very first layer Anubhav calls the foundation is the one that collapses at my scale.

My thesis is simple. The advice «keep the memory file tiny» is correct for a solo developer in a single repository. But the moment you have compliance rules from several organizations that cannot be pinned to file paths, a small flat file stops being a virtue and becomes a missing capability. Anubhav conflates two different problems: the ambient token cost of context and the authority and precedence of rules. His eight-layer recipe solves only the first.

Where he is right, and genuinely so

Almost everything in his article is where I once started, and these are the right primitives. Path-scoped rules save tokens exactly as he describes: a file of migration conventions should not load when the agent touches the frontend. Plan mode with a read-only subagent explicitly denied write and edit is sound hygiene — exploration and review live in an isolated context, not the main loop.

His emphasis on Deferred Permissions is especially good. Being able to pause an agent mid-run in headless mode, approve the action out of band, and resume from exactly the same place is the right primitive for nightly jobs. Before this, as he accurately notes, a nightly run that needed to push to main either ran with permissions disabled or failed at 3am. His subagents with a narrow tool allowlist, downshifted to a cheap model, are also sensible: the expensive model stays on the hard reasoning while the reviewer runs cheaply in the background. I do not argue here — I use this.

Where it hits its ceiling

The ceiling begins exactly where Anubhav lays the foundation. His first layer is the advice to keep the root memory file tiny because «Cache hit rates drop noticeably past ~500 tokens». The token argument is correct. But he silently equates «cheaper in tokens» with «better structured» — and those are different axes.

My rules arrive from several organizations and teams at once. Some are hard compliance rules: what may and may not appear in commit messages for certain remotes, which production actions require named approval, which organization rules a narrower team layer cannot override. These rules do not glob by file path. They must fire by organization, team, and function — by where you are and on whose behalf you are working, not by which file you are editing. Anubhav's path-scoped mechanism simply does not apply here: there is no glob that expresses «this organization rule overrides the team rule.»

So I do not stuff everything into one file and shrink it — I route the rules at session start along three axes: organization, team, function. Only the relevant triple loads into context. That solves both problems at once: tokens and precedence. Anubhav solves only tokens; his «small file» removes the ambient cost but says nothing about which rule wins when two layers contradict each other. At his scale the question never arises. At mine it arises on day one.

The second place it hits its ceiling is his single read-only reviewer. A reviewer subagent with no termination guarantee is dangerous precisely because it silently assumes one review pass converges. In my setup review and tests run in parallel, and they sometimes conflict — the reviewer says «blocker», the tester says «green». Who is right? A single subagent has no answer to that, and no upper bound on iterations. So my architecture has a hard guard of five iterations and a lead orchestrator that arbitrates the reviewer-versus-tester tie. That is exactly the expensive part a team architecture takes on — and the part a single cheap-model reviewer does not cover.

One more detail about his approval gate. His safety-net hook for pushes to main is a string match: case "$cmd" in *"git push"*" main"*). It catches the familiar shape of the command. But a string match is trivially bypassed — by another synonymous command, a variable, an alias. My approval gates are semantic: they fire on a class of action (a write to a shared external system, a destructive operation in production), not on a specific string. That is the difference between «protection from a typo» and «protection from a bypass».

What to do about it

If you work solo in a single repository, take Anubhav's recipe whole — it is accurate. A small imperative memory file, two or three path-scoped rules, one formatting hook, plan mode for anything risky. Deferred Permissions: mandatory.

But the moment rules from several organizations enter the picture, stop thinking about file size and start thinking about routing and precedence. Separate the two axes Anubhav glues together: tokens are about the cache, authority is about which rule overrides which. Move hard compliance rules into a layer that loads by session context, not by glob. And the moment review becomes more than one pass, give it an explicit upper bound on iterations and an arbiter for conflicts — otherwise the «cheap reviewer» will one day loop forever on an expensive run.

We agree on the essential thing: the stack beats the prompt. Anubhav simply stops at the baseline I started from. His eighth layer is my layer zero.

Where he is right, and genuinely so

Where it hits its ceiling

What to do about it

Related articles

The Harness Is the Cheap Part: Notes on Rebuilding Claude Code

Agentic AI Security: Porting Enterprise Patterns Down to a Solo Harness

Saving Tokens in Agents: What the Survey Gets Right and What It Over-sells