The Context Lesson
At AI speed, you don't feel the friction until it's too late
The biggest lesson from the last 70 years of software development is that context, not code, is the product.
Every methodology arrives promising to finally capture what we mean, not just what we write. Waterfall said: specify everything upfront. Agile said: specify minimally, iterate. Each was right within its constraints. Each breaks when constraints shift.
We’ve tried to make the implicit explicit through documentation, through types, through tests. These efforts help.
But ultimately, software's intent still lives in human minds, and minds don't scale.
The constraint has shifted again.
Agents can now implement at speeds that expose a new bottleneck most haven't digested: we can generate code faster than we can think clearly about what to generate. A three-person team can now ship in three weeks what previously took twenty people six months. But ask that three-person team what changed in week one, and why. Silence.
This is our new epistemological crisis.
The implicit has always been expensive, but we've never been truly upfront about the cost. When a senior engineer leaves, they take with them not just their skill but their understanding of why that database design looks wrong, what alternatives were tried in 2019, and which customers will revolt if you change that workflow.
This is why fully onboarding developers takes months, if not years. Why legacy rewrites fail. Why technical debt isn't really about code quality but about context loss.
Reading code tells you what was built. It never tells you what was tried and discarded, what nearly worked, what the PM originally asked for before the compromise, which edge case drove the architecture.
Human developers have compensated for this for decades through what we politely called "engineering judgment". They asked questions. Inferred missing pieces. Remembered that conversation three months ago where someone mentioned the real constraint was an ask from sales, made to secure the customer contract that would hit Q2's goal. They knew which unstated assumptions were safe and which would detonate. This works because humans maintain an oral tradition alongside written code.
AI agents cannot do this. They cannot attend the retro where everyone agreed the feature was actually solving the wrong problem. Cannot access the Slack thread where the CTO said “never use service X for payment processing, we got burned in 2023.” Cannot read between the lines of a Jira ticket that says “make it more user-friendly.”
When you tell an agent “make it more user-friendly,” it invents something. It has no choice. It cannot ask “user-friendly like the onboarding flow you praised last week, or user-friendly like the competitor’s app you told me to avoid?” It cannot distinguish “we tried that and users hated it” from “no one’s thought of that yet.”
The consequence is direct: everything implicit must become explicit. Not just because agents demand it but because the multi-voice coordination of teams directing AI at speed demands it.
Here's what today looks like in practice. One developer works with a Cursor agent, building a payment flow. Their shared context lives in the conversation: "make it like Stripe but simpler," "add validation like we did for the signup form," "handle the edge case we discussed yesterday." This works. Their agent has enough context to implement correctly.
Now add two more developers. One is building user settings. One is building the admin dashboard. Each has their own conversation with their own agent. Each conversation establishes different assumptions about how payments work, how errors surface, what “completed” means. They’re all implementing parts of the same system. But today they’re building three incompatible visions of it.
By the time they try to integrate, they discover:
The payment flow assumes synchronous confirmation, settings assumes async webhooks, and the admin dashboard assumes batch processing.
One agent used USD cents, another used decimal dollars, the third used string currency amounts “for flexibility”.
Error handling is different in each implementation because each developer said “handle errors gracefully” and each agent invented a different interpretation.
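To make the divergence concrete, here is a sketch (in TypeScript, with hypothetical names and fields reconstructed from the scenario above) of the three payment representations an integrator might find after the fact:

```ts
// Illustrative only: the three shapes an integrator might discover at merge time.
// Names and fields are hypothetical, reconstructed from the divergence above.

// Payment flow agent: synchronous confirmation, integer cents
interface PaymentResult { status: "confirmed" | "declined"; amountCents: number; }

// Settings agent: async webhook events, decimal dollars
interface PaymentWebhookEvent { event: "payment.settled" | "payment.failed"; amountDollars: number; }

// Admin dashboard agent: nightly batch rows, string amounts "for flexibility"
interface BatchPaymentRow { state: string; amount: string; currency: string; }
```

None of these is wrong in isolation; they simply encode three different unstated assumptions about the same system.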
This is a context coordination problem that AI speed makes painful to reconcile. At human implementation speed, these mismatches surface during planning. At AI speed, you discover them at integration.
This is where the context lesson lands.
We think we can mitigate context preservation costs using today’s tools and yesterday’s workflow.
We cannot.
Jira doesn’t help when tickets say “implement feature X” but don’t capture the three alternatives the team rejected or why. Slack doesn’t help when crucial context is buried in #random six months ago.
We thought Agile's oral tradition would scale. It's showing cracks. "Working software over comprehensive documentation" worked when five people sat in one room and could sync context daily with the customer they were serving. It fails when those same five people now coordinate 15 agents and want (or are pressed) to develop at 10x speed.
We think git branches can contain complexity but really they multiply it. Each branch is now a different context universe. The longer branches live, the more their contexts diverge. At human speed, we feel this as merge conflicts. Painful but manageable. At AI speed, we feel it as architectural incompatibility.
At AI speed, unresolved ambiguity becomes an immediate bug. Every unstated assumption becomes an immediate divergence. Every "we'll figure it out later" becomes "shit, we've built three different systems."
Separately, research supports what practitioners are discovering: we are hitting the limits of how models handle context, not because the models are incapable. Far from it.
Even if you try to make everything explicit, the current generation of models fights you in another way: they’re bad at holding long, dense context.
Nelson Liu et al.’s 2023 work on long context windows shows that language models cannot robustly process information in the middle of long contexts and that performance degrades significantly when relevant information is not at the boundaries.
This is why context loading decisions matter (see below). You’re not just managing what the agent sees, you’re managing WHERE it sees it in the prompt. Put critical context in the middle of a 10,000 token prompt and the agent’s performance tanks. This is the “lost in the middle” problem, and it happens dozens of times per day in every AI-assisted codebase.
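A minimal sketch of what that placement decision can look like in practice, under assumptions: buildPrompt, its inputs, and the section headings are hypothetical, and the point is only that critical constraints and the task sit at the boundaries rather than buried in the middle:

```ts
// A sketch of ordering context so critical information sits at the boundaries,
// where models attend best. buildPrompt and its inputs are hypothetical.
function buildPrompt(opts: {
  criticalConstraints: string[]; // decisions the agent must not violate
  referenceMaterial: string[];   // supporting code, docs, examples
  task: string;                  // what to implement right now
}): string {
  return [
    "Hard constraints (read first):",
    ...opts.criticalConstraints,
    "Reference material:",
    ...opts.referenceMaterial,   // bulky, lower-priority context goes in the middle
    "Hard constraints (restated):",
    ...opts.criticalConstraints, // repeated near the end, where attention recovers
    "Task:",
    opts.task,                   // the ask sits at the boundary
  ].join("\n\n");
}
```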
Think about what this means. An AI agent based on today's LLMs, given a two-hundred-page specification, is going to generate chaos. Not because the spec is "bad" but because it can't efficiently pay attention to and decompress all of the expressed intent. Especially without the original context that never made it into the spec. Like:
The customer complaint that drove the redesign
The regulatory concern that killed the first approach
The vendor limitation that forced the workaround
The CEO’s offhand comment that actually set the real requirement
All of this is useful “context.” None of it may fit nicely in “a prewritten spec”. All of it is necessary to implement correctly. Humans reconstruct this through questions and inference. AI agents can’t do it reliably.
But we know the teams succeeding right now have found ways to preserve context¹. The teams struggling are trying to use AI agents within code-centric workflows designed for human bandwidth constraints, where implicit knowledge is acceptable because humans can and do ask clarifying questions.
The adaptation requires inversion.
Time is a river, a violent current of events, glimpsed once and already carried past us, and another follows and is gone. — Marcus Aurelius, Meditations IV.43
Specifications aren't documents that precede code: they're living conversational artifacts that evolve with code. When you change the implementation, you change "the spec". When you update the spec through conversation, you regenerate the implementation.
Tests aren’t verification but rather contracts that define behavior. “The payment should work” isn’t a test. “When a user submits a card, we authorize immediately, capture after confirmation email click, and handle ‘insufficient_funds’ by showing the saved-card selector” is a contract. An agent can implement from that. A human can verify it. Neither needs to guess.
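As a sketch, that contract could be written as an executable test. The checkout module, its functions, and the Vitest-style assertions below are assumptions for illustration, not a prescribed API:

```ts
// A hypothetical contract test for the card-payment behavior described above.
// The checkout module and its functions are illustrative assumptions.
import { describe, it, expect } from "vitest";
import { submitCard, confirmByEmail, getPayment } from "./checkout";

describe("card payment contract", () => {
  it("authorizes immediately when a card is submitted", async () => {
    const payment = await submitCard({ cardToken: "tok_visa", amountCents: 4999 });
    expect(payment.status).toBe("authorized"); // authorized, not yet captured
  });

  it("captures only after the confirmation email is clicked", async () => {
    const payment = await submitCard({ cardToken: "tok_visa", amountCents: 4999 });
    await confirmByEmail(payment.id);
    expect((await getPayment(payment.id)).status).toBe("captured");
  });

  it("shows the saved-card selector on insufficient_funds", async () => {
    const payment = await submitCard({ cardToken: "tok_nofunds", amountCents: 4999 });
    expect(payment.declineCode).toBe("insufficient_funds");
    expect(payment.nextAction).toBe("show_saved_card_selector");
  });
});
```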
Documentation in all its forms is not overhead. It's the actual product, with code as one implementation of it. This sounds backwards until you realize: the code will be rewritten. The platform will change. The framework will be replaced. The context (the why, the tradeoffs, the failed alternatives, the actual requirements) is what must persist and evolve.
When everything is made explicit, anyone (human or agent) can implement correctly. When much is implicit, only the person who was in the room can implement correctly, and they can only do it once before they forget the context themselves.
If context is the product, then someone has to be responsible for shaping, storing, and routing it. That’s not traditional ‘coding’ work but rather decision engineering.
Decision engineering is to context-centric development what software engineering was to code-centric development: the discipline of making choices explicit, traceable, and executable.
But this too creates new work most developers have never been trained for.
There are new micro-decisions that sit beside the strategic ones and compound in ways that aren't obvious:
Context loading: Before prompting, which files do you include? A seasoned developer instinctively knows to reference the auth middleware, the error handling pattern, the logging utility, and the test fixtures. That intuition came from their 10 previous implementations and rewrites. Now a junior must make that decision consciously, every time. Include too little and the agent lacks critical context. Include too much and you run into the "lost in the middle" problem. (A sketch of making these choices explicit follows this list.)
Model selection: GPT-5.1-Codex-Max for architectural planning? Claude 4.5 Sonnet for user story elaboration? Gemini 3 for implementation? Humans switch cognitive modes seamlessly: deep thinking for architecture, procedural execution for implementation. With AI, you must explicitly choose the tool that matches the cognitive mode required. Most developers have never thought about their own thinking this way.
Precision calibration: An agent that reads "implement authentication" might build OAuth 2.0 with refresh tokens, rate limiting, session management, and audit logging. Humans understand scope from context: "we're building an MVP" vs "this is for healthcare data." Agents don't. You must specify explicitly: "username/password only, store in postgres, hash with bcrypt, session expires in 24 hours, no social login, no 2FA yet." Under-specify and you get over-engineering. Over-specify and you're doing the implementation yourself in prose.
Intervention timing: A human developer coding the wrong solution typically self-corrects within minutes because they hit a design smell and pivot. An agent can generate 500 lines implementing an approach that won't work, and you must decide: interrupt early and waste potentially good work, or interrupt late and untangle a mess. You cannot rely on an agent working from an under-specified objective to self-correct without instruction. You must recognize wrong paths during implementation, which requires understanding the solution space before seeing any code².
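One way to make these micro-decisions explicit is a small, versioned task brief that travels with the prompt. Everything below (the field names, file paths, model description, and checkpoint) is an illustrative assumption, not a prescribed format:

```ts
// A sketch of making the micro-decisions above explicit per task, not per prompt.
// Field names, file paths, and the model choice are illustrative assumptions.
interface TaskBrief {
  contextFiles: string[];                          // context loading: what the agent must see
  model: string;                                   // model selection: which tool fits this cognitive mode
  scope: { include: string[]; exclude: string[] }; // precision calibration: what is in and out
  checkpointAfter: string;                         // intervention timing: where a human reviews first
}

const addAuth: TaskBrief = {
  contextFiles: ["src/middleware/auth.ts", "src/lib/errors.ts", "test/fixtures/users.ts"],
  model: "a model suited to procedural implementation work",
  scope: {
    include: ["username/password", "bcrypt hashing", "postgres storage", "24h session expiry"],
    exclude: ["social login", "2FA", "rate limiting"],
  },
  checkpointAfter: "schema and route signatures, before handler bodies",
};
```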
This is the new and constant work of decision engineering. Purtell’s 2025 analysis captures why this matters: “Storing critical system information solely in human minds—and doing so more often as time goes on and AI becomes a bigger part of software—is not a good idea. The I/O bandwidth is low, the information degrades quickly, and collaboration scales poorly.”
Every time you make one of these decisions implicitly, you create context that exists only in your head. Every time someone else prompts the agent without that context, they generate something that won’t integrate with what you built.
The context lesson is that general methods leveraging explicit context always win, eventually.
We tried making code self-documenting. We tried making processes lightweight. We tried making knowledge implicit. Each reduced cognitive load for individuals. Each increased coordination cost for teams. At human speed, this tradeoff was acceptable because humans absorb coordination cost through meetings, questions, and osmosis.
At AI speed, it’s catastrophic. AI agents don’t absorb coordination cost. They amplify it. Give three agents three slightly different contexts and they will generate three incompatible systems at speed.
The teams that will dominate will make intent first-class. Not intent as aspirational documentation that diverges from reality within a sprint. Intent as versioned, tested, executable knowledge that agents consume and humans understand. This means:
Capturing decisions, not just outcomes: "We chose PostgreSQL" vs "We chose PostgreSQL over MongoDB because our data is highly relational, and over MySQL because we need JSONB support for flexible schemas and specific extensions". (A sketch of a decision record follows this list.)
Linking prompts to implementations: When you look at code, you can trace back to the conversation that generated it. When that code breaks, you can see what context was present when it was created.
Treating conversations as “commits”: The discussion about whether to cache at the API layer or the database layer is as important as the caching code itself. Currently, the code goes in git. The discussion disappears into Slack.
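A sketch of what a captured decision might look like as versioned, machine-readable data; the record shape, field names, and values are assumptions for illustration, not a standard or an existing tool:

```ts
// A sketch of a decision record as data that both humans and agents can read.
// The shape and values below are illustrative assumptions.
interface DecisionRecord {
  id: string;                    // e.g. "DEC-042"
  decided: string;               // what was chosen
  alternatives: { option: string; rejectedBecause: string }[];
  drivers: string[];             // the context that forced the choice
  linkedArtifacts: string[];     // prompts, PRs, conversations behind it
}

const databaseChoice: DecisionRecord = {
  id: "DEC-042",
  decided: "PostgreSQL as the primary datastore",
  alternatives: [
    { option: "MongoDB", rejectedBecause: "our data is highly relational" },
    { option: "MySQL", rejectedBecause: "we need JSONB and specific extensions" },
  ],
  drivers: ["flexible schemas for settings", "relational integrity for payments"],
  linkedArtifacts: ["the agent conversation that produced the migration"],
};
```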
Robert C. Martin was right: specifying requirements precisely enough for a machine to execute is programming. But the machine changed. The precision required changed. The audience changed. It’s no longer enough to write code humans can read. We must write specifications both humans and agents can execute, independently and collectively, without diverging.
Teams that master this transition will operate at a different scale entirely. Not because they write more code—they write less. Not because they move faster—they move more coherently. They build systems where:
Context is preserved and can be reconstructed: When someone asks "why does the checkout flow work this way?" the answer isn't "ask Jennifer, she built it" but "read the decision log: here are the three alternatives we tried, here's why this one won, here's the customer feedback that drove it".
Decisions are explicit, not inferred: When an agent needs to implement a feature, it doesn’t guess at requirements from similar features. It reads the explicit specification of what this feature must do, what constraints matter, what tradeoffs are acceptable.
Knowledge accumulates, not degrades: Every implementation adds to shared context. Every decision becomes reference material. Every mistake becomes documented learning. The system gets smarter over time instead of dumber as people leave.
This requires a fundamental reorganization of how software is created. The primary abstraction and artifact is not just code but a time-ordered sequence of decisions, from small to large.
Most haven't fully internalized this yet, and they're using AI agents to write more code faster within existing code-centric workflows. This works for individuals: one person, one agent, one conversation, one context universe. It fails for teams: three people, six agents, three hundred conversations, three context universes trying to be applied against one codebase.
It will fail harder as agents become more capable because more capability without explicit context just means faster divergence.
The skills that made someone a “10x developer” in the code-centric era remain necessary but are no longer sufficient. You still need to write clean code. You still need to understand algorithms. You still need architecture sense.
But now you also need to externalize your thinking.
Make your context explicit. Specify precisely enough for machines while maintaining flexibility for humans.
This is a different skill entirely. Most 10x developers became 10x by internalizing vast amounts of context and applying it through intuition. Now they must externalize that context and make it queryable by both humans and machines.
Teams that solve this move at fundamentally different speeds. Not 10x faster. Not 100x faster. Categorically different.
A code-centric team’s bottleneck: “We need to implement five features, each takes two weeks, we can parallelize across three developers, so roughly three to four weeks total.”
A context-centric team’s bottleneck: “We need to specify five features clearly enough that both humans and agents understand them identically. That work takes a few days of conversational decision engineering. We build 30 prototypes, discover the real requirements, then five agents implement all five features in parallel in days.”
The great irony is that we’ve always known this.
We knew context mattered. We just convinced ourselves we could keep muddling through because making things explicit seemed too expensive. We accepted this as normal. "It takes time to learn the codebase." But what were new hires actually learning? Not the code; any competent developer can read code. They were learning the context of decisions. Why this design? What was tried before? Which parts are critical? Which parts are vestigial? What assumptions can be changed?
All of that context existed. None of it was neatly preserved. So every new person rebuilt it from scratch, imperfectly, by archaeology, intuition and social interaction.
The context lesson is bitter because it was always true. Leverage belongs to those who can externalize context so that humans and agents can execute on it.
Context is still the product. But decisions must now be engineered.
¹ While it was written about building long-running agentic systems, the takeaway from "Don't Build Multi-Agents" is equally applicable to building software with agents: every part of the system must understand the full nuance of the task, not just a slice of it.
² This is the biggest reason why I don't love async agents.


