Collecting The Sparse Bits Between

Protect the spec. The rest is derived.

Jan 30, 2026

Shameless 🔌… We’re building tools for this at SpecStory. Check out intent.build.

Andrej Karpathy recently described a phase shift in his own practice. In November, he was coding 80% manually with 20% agent assistance. By December, the ratio had inverted: 80% agents, 20% touchups. “I really am mostly programming in English now,” he wrote, “a bit sheepishly telling the LLM what code to write... in words.”

In a separate post before the end of 2025 he named it:

Andrej Karpathy@karpathy

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become

5:36 PM · Dec 26, 2025 · 16.5M Views

“The profession is being dramatically refactored, as the bits contributed by the programmer are increasingly sparse and between.”

Sparse.

The human contribution to software is not disappearing, but it is becoming discontinuous: concentrated in moments of judgment rather than distributed across hours of implementation. The code still gets written and the tests can be made to pass. But what the developer actually contributes is no longer the code itself. It is the specification that precedes it, the evaluation that follows it, the context that shapes it.

What remains in the gaps?

Drew Breunig, whom Karpathy highlighted, recently published a software library with no code. Wait… come again?

The library is called whenwords [1]. It formats relative time (“three hours ago,” “next Tuesday”) in the style applications need.

It works in Ruby, Python, Rust, Elixir, Swift, PHP, Bash, and Excel of all things.

The repository contains only three files: a specification document of roughly 500 lines, a test suite, and instructions that say, essentially: paste this into Claude.

The library generates itself on demand. The specification is the product here, and the implementation is ephemeral: made to be conjured, used, and discarded.

So this raises some interesting questions: If the programmer’s contribution is specification and judgment, and if agents reliably translate specification into working code, why distribute implementations at all? Why not distribute the intent and let implementation crystallize locally?

Breunig tested it. In his case, Claude has never failed to generate a working whenwords in any language he tried.

He is careful to note the limits: whenwords has 125 tests. SQLite has 51,445. Today, spec-only distribution works for small, stable, well-defined functions.
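To make the idea concrete, here is a minimal sketch of a spec written as test cases, with one disposable implementation checked against it. Everything in it is an assumption for illustration: the names `SPEC_CASES`, `time_ago`, and `conforms`, and the rounding thresholds, are hypothetical and are not taken from the whenwords repository.

```python
# Hypothetical sketch: names and thresholds are illustrative assumptions,
# not the actual whenwords spec or API.

# The "spec": (seconds elapsed, expected phrase) pairs that any
# implementation, in any language, would have to satisfy.
SPEC_CASES = [
    (0, "just now"),
    (59, "just now"),
    (60, "1 minute ago"),
    (150, "2 minutes ago"),
    (3600, "1 hour ago"),
    (3 * 3600, "3 hours ago"),
    (86400, "1 day ago"),
]


def time_ago(seconds: int) -> str:
    """One throwaway implementation; any code passing SPEC_CASES would do."""
    if seconds < 60:
        return "just now"
    # Try the largest unit first and round down to whole units.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            return f"{count} {unit}{'' if count == 1 else 's'} ago"
    raise AssertionError("unreachable: seconds >= 60 always matches a unit")


def conforms(impl) -> bool:
    """Return True if impl agrees with every case in the spec."""
    return all(impl(seconds) == expected for seconds, expected in SPEC_CASES)
```

In this framing the list of cases is the part worth keeping. `time_ago` could be deleted and regenerated, in Python or any other language, as long as `conforms` still returns True.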

When the cost of generating code approaches zero, the artifact that persists is the specification. The code is momentary crystallization. The intent is the durable thing.

If that’s true, the intent is also the defensible thing. The spec is the moat now. Not the code.

Karpathy again: “Don’t tell it what to do, give it success criteria and watch it go.”
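A rough sketch of what “success criteria” can mean in practice: the human writes the check, and candidate implementations are accepted or rejected against it. The lambdas below are hypothetical stand-ins for successive agent outputs, and `meets_criteria` and `first_passing` are illustrative names, not any real tool's API.

```python
from typing import Callable, Iterable, Optional

Candidate = Callable[[int], int]


def meets_criteria(candidate: Candidate) -> bool:
    """The success criteria: input/output pairs for a squaring function."""
    cases = [(0, 0), (2, 4), (-3, 9)]
    try:
        return all(candidate(x) == expected for x, expected in cases)
    except Exception:
        # A candidate that crashes has not met the criteria.
        return False


def first_passing(attempts: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the first attempt that meets the criteria, or None."""
    for candidate in attempts:
        if meets_criteria(candidate):
            return candidate
    return None


# Hypothetical stand-ins for what an agent might produce on successive tries.
attempts = [
    lambda x: x * 2,  # passes (0, 0) and (2, 4) but fails (-3, 9)
    lambda x: x * x,  # passes every case
]
winner = first_passing(attempts)
```

The human effort here sits in `meets_criteria`. The loop never inspects how a candidate works, only whether it passes.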

“If you seek tranquillity, do less.” Or (more accurately) do what’s essential—what the logos of a social being requires, and in the requisite way. Which brings a double satisfaction: to do less, better. (Marcus Aurelius, Meditations)

Living in between

Peter Steinberger is living in the sparse bits between more aggressively than almost anyone I’ve seen.

His Moltbot went super viral. It’s an open-source personal AI agent that accumulated 60,000 GitHub stars in 72 hours. But Moltbot is, to some extent, less interesting than how Pete himself works. In an interview last year with my friend Isaac Flath, he confessed: “I ship code I never read.”

What does he do? He runs 3 to 8 agents simultaneously in a terminal grid, working on a 300,000-line TypeScript codebase by himself [2].

Six hundred commits in a single day. When asked about safeguards (sandboxing, feature branches, careful review) his response was blunt: “To actually be super diligent you would have to be very attentive, which kind of defeats the point of moving fast. So I think yes, YOLO is the only way of running agents.”

Karpathy has noticed something else: he’s starting to atrophy his ability to write code manually. Generation and discrimination are different capabilities, he observes. You can review code just fine even if you struggle to write it. Steinberger has taken this further: he doesn’t struggle to write code, he simply doesn’t. The agents write. He reviews.

What makes it possible is his living CLAUDE.md, which encodes context and constraints. The agents read it and work within it.

But unlike Breunig’s whenwords spec, which is a tightly bounded artifact written upfront, Steinberger’s document accumulates. It is not a blueprint drafted before construction. It is the sediment of decisions made, conversations had, constraints discovered. As a specification, it grows through use.

I think this distinction matters more than it first appears. We hear “specification” and imagine a static requirements document, handed down before implementation begins.

But the specifications emerging in agent-driven development are closer to institutional memory made explicit. The accumulated context that would otherwise live in engineers’ heads, in Slack threads, in tribal knowledge that evaporates when people leave. This is the asset now. Not the codebase. The codebase can be regenerated. But we all know the institutional memory cannot.

The document is not written once. It is maintained.

Steinberger’s philosophy rejects elaborate workflows: no subagents, no RAG systems, no complex orchestration. Direct communication with models. Intuition developed through practice. The skills are the same ones needed to manage senior engineers (clear communication, good judgment about scope, knowing when to trust and when to verify), but applied to agents instead.

Karpathy described it as:

Some powerful alien tool was handed around, except it comes with no manual and everyone has to figure out how to hold it and operate it.

Steinberger and others are writing that manual.

Not everyone is comfortable with this. Security researchers have raised concerns that Moltbot’s default installation exposes it to prompt injection, that a compromised agent with shell access could exfiltrate data.

The tension is real. Trust the agents, move fast, and accept the risk; or slow down, sandbox everything, and review what you can. Most organizations will find their own position between these poles.

But the debate itself tells us something. We are arguing about how much to trust systems that write code we do not read.

For sixty years, code has been an asset. Something maintained, documented, understood, protected. Version control exists because code is precious. Code review exists because code is consequential. Technical debt accrues because code persists.

But if code can be regenerated on demand from specification, it is no longer an asset in the same sense. It becomes a derived artifact, like a compiled binary. You do not read the binary, you read the source.

In the emerging paradigm, you do not read the code. You read “the spec”.

Steinberger’s CLAUDE.md. Breunig’s 500 lines. Karpathy’s success criteria that agents loop against until they pass. What persists is not the code but the accumulated intent: the living document that encodes what should exist and why.

The code is ephemeral. The judgment is not.

Protect the spec. The rest is derived.

[1] https://github.com/dbreunig/whenwords

[2] Much more here: https://steipete.me/posts/2025/shipping-at-inference-speed

Meditations on Tech
