After Legibility
Reflections on 2025: what happened, what will happen and how we keep pace
NOTA BENE: This meditation runs longer than most. This last year earned it. As does what will be required of us next.
If 2024 was the year AI became a household word, 2025 was the year we spent instructing agents to write code.
We will spend the years that follow learning to live with code that we do not read.
Part I: What Happened
2025 opened with a shock: a small Chinese lab called DeepSeek released a reasoning model that matched (at the time) the best American efforts at a fraction of the cost, sending Nvidia’s stock tumbling seventeen percent in a single day and Silicon Valley scrambling. That single release cracked open assumptions about what AI development required and who could participate in it.
Within weeks, the DeepSeek app had dethroned ChatGPT at the top of the App Store. Within months, every major Chinese tech company had slashed its API prices, and American labs reportedly assembled war rooms to reverse-engineer how DeepSeek had done so much with so little.
The technical shifts followed quickly. Mixture-of-experts architectures (designs that route each query to specialized subnetworks rather than activating the entire model) became the default pattern: Meta’s Llama 4, Alibaba’s Qwen3, and Mistral’s Large 3 all adopted it, training models with hundreds of billions of total parameters while activating only a fraction for any given response.
Context windows exploded past what most had previously considered feasible. Anthropic and Google pushed to one million tokens. And retrieval-augmented generation, once considered absolutely essential for grounding models in large document collections, began to feel like a workaround for a limitation that was fast disappearing. Models could now almost “read everything”.
Reasoning itself also changed character. OpenAI’s o-series models and DeepSeek’s R1 demonstrated that giving models time to think, to generate chains of reasoning before committing to an answer, produced step-changes in reliability on hard problems. By mid-year, “thinking mode” toggles had become standard across frontier models. Users could choose: fast responses for casual queries, extended deliberation for mathematics, coding, or anything requiring careful logic. The models began to show their work, and in doing so, became easier to trust and easier to correct.
Multimodality ceased to be a feature and became the norm. The new crop of models fused modalities from the start, training on text, images, and video together so that understanding flowed between them. Gemini 3 could “watch a video” and reason about what it saw. Claude could look at a screenshot and click the right button.
What followed through the remainder of 2025 was relentless acceleration. OpenAI unified its scattered model lines into GPT-5 in early August via an internal routing system that could reason through complex problems or respond instantly depending on what the query demanded.
Google’s Gemini 3 briefly claimed the throne on every major benchmark in November, prompting internal “code red” memos at OpenAI and a hastily released GPT-5.2 before the year was out.
Anthropic’s Claude 4.5 learned to use computers the way humans do, clicking, typing, and navigating interfaces, achieving scores on real-world computer tasks that would have seemed impossible a year prior. And Moonshot AI’s Kimi K2 earned quiet respect as perhaps the strongest open model available anywhere.
But the most important shift happened in what we built atop them.
More than anything, 2025 was the year of coding agents that could read entire repositories, plan changes across dozens of files, test their work, and submit pull requests for human review.
Devin, initially dismissed as overhyped vaporware when it stumbled through early demos, matured into something that by year’s end was writing a quarter of the code at its parent company, Cognition.
Claude Code brought terminal-first agentic development to the masses (prompting copycat efforts from Google, OpenAI, AMP, and others), and Cursor reached a nine-billion-dollar valuation on the promise that the IDE itself was the most important new piece of software.
Underneath it all, the plumbing matured. Orchestration frameworks moved from experimental libraries to production infrastructure, letting companies deploy fleets of specialized agents working in concert.
Anthropic’s Model Context Protocol emerged as a kind of lingua franca for tool integration, and Anthropic later donated it to the Agentic AI Foundation.
The economics shifted too. DeepSeek’s efficiency gains early in the year triggered price wars that compressed margins across the industry. API costs began to fall by orders of magnitude; at the time of this writing, DeepSeek was offering inference at three cents per million output tokens. The question of who can afford to build with frontier models is now quietly dissolving.
What we know is that the frontier has moved so far that the debates of early 2024 (whether AI could really code, whether reasoning models were a gimmick, whether open source could compete) are beginning to feel like distant history.
The questions we carry into 2026 are different and harder.
You cannot quench understanding unless you put out the insights that compose it. But you can rekindle those at will, like glowing coals. I can control my thoughts as necessary; then how can I be troubled? What is outside my mind means nothing to it. Absorb that lesson and your feet stand firm. You can return to life. Look at things as you did before. And life returns. — VII. 2
Part II: What Will Happen
The year ahead will not slow down, and we won’t easily catch our breath. The challenges of 2026 will be less about capability and more about comprehension: how we maintain any meaningful relationship with the systems we are building.
After Legibility
For sixty years, software development has rested on an implicit assumption: that code is a text meant to be read.
We write it in languages designed for human comprehension. We review each other’s work line by line. We maintain it by understanding it. The entire craft has been organized around the principle that a sufficiently determined person can sit down, read the source, and know what a program does.
When an agent writes a thousand lines of code in response to a natural language prompt, the human who prompted it faces a choice. They can read every line, understand every function, trace every edge case.
Or they can trust.
They can write and run tests, check that the outputs look right, and move on.
The economics of our current situation will push relentlessly toward trust.
Competitive pressure will reward teams that ship fast and penalize those that insist on understanding everything.
This is not hypothetical. It is already happening.
And this year the tension will become organizational. Companies will discover that their codebases have grown in ways no one fully understands. Engineers will inherit systems where significant portions were written by agents guided by people who have since left, with prompts that were never saved and reasoning that was never recorded. The code will work. The tests will pass. And no human will be able to explain, with confidence, why.
The Specification Becomes the Source
When code review as we know it today becomes impossible, something else must take its place. The most likely candidate is the specification itself: the prompts, constraints, requirements, and architectural decisions that guided the agent’s behavior.
If we cannot audit the implementation, we can at least audit the intent.
This represents a fundamental inversion. For decades, specifications have been treated as aspirational documents, often abandoned or outdated by the time the software ships.
The code was the truth.
But when agents write the code, the specification becomes the primary artifact that humans can meaningfully control. The prompt produces the product. The requirements document is the source code. The architecture diagram is the implementation.
Organizations that thrive in 2026 will be those that recognize this shift and build practices around it. They will version their prompts with the same rigor they version their code. They will review changes to system instructions the way they once reviewed pull requests. They will invest in tools that trace outputs back to the intentions that produced them: not just “what did the agent generate?” but “what were we trying to achieve, and how did we describe it?”
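What might that tracing look like in practice? A minimal sketch, assuming a team keeps its specs and prompts in the repository alongside the code; the file layout, field names, and helper functions below are hypothetical illustrations, not an established standard:

```python
# A minimal sketch of "specs as the primary artifact": every agent run is
# logged against the exact version of the prompt/spec that produced it, so
# generated output can be traced back to intent later.

import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit() -> str:
    """Return the repository's current commit hash, pinning the spec version."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def record_run(spec_path: Path, output_path: Path, log_dir: Path = Path("agent_runs")) -> Path:
    """Write a trace record linking an agent's output to the spec that produced it."""
    spec_text = spec_path.read_text()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "spec_file": str(spec_path),
        "spec_sha256": hashlib.sha256(spec_text.encode()).hexdigest(),
        "repo_commit": current_commit(),
        "output_file": str(output_path),
        "output_sha256": hashlib.sha256(output_path.read_bytes()).hexdigest(),
    }
    log_dir.mkdir(exist_ok=True)
    record_path = log_dir / f"{record['output_sha256'][:12]}.json"
    record_path.write_text(json.dumps(record, indent=2))
    return record_path
```

Even something this crude changes the conversation during an incident: instead of asking who wrote this code, you ask which spec, at which version, asked for it.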
This will require new skills. The ability to write precise, unambiguous specifications, long considered a secondary competency adjacent to “real” engineering, will become primary. The engineers who advance will be those who can articulate exactly what they want in language that both humans and machines can interpret consistently. Prompt engineering, once dismissed as a transitional skill, will reveal itself as something more enduring: the craft of encoding human intent into machine-actionable form. Context engineering will go entirely mainstream.
Shared Mental Models
Software teams have always relied on shared mental models: the collective understanding of how systems work, why decisions were made, and where the bodies are buried.
These models live in documentation, certainly, but more often in the heads of long-tenured engineers, transmitted through code review, pair programming, and hallway conversation.
They are what let a team move fast without constant coordination, because everyone already knows the shape of the system.
Agentic development threatens these models in ways we are only beginning to understand.
When an agent writes code, it does not absorb the team’s culture. It does not know that we never use that library because it caused an outage two years ago, unless someone thinks to tell it. It does not know that this particular naming convention signals that a module is deprecated, or that this function is never called directly because it has subtle concurrency bugs that only manifest under load. The agent operates from its context window and its training, not from the accumulated wisdom of the people who have maintained the system.
The result is a kind of erosion.
Each agent-generated change that violates an unwritten norm is a small crack in the team’s shared understanding. Over time, the codebase drifts from what any human would have written. It becomes something alien: functional, passing tests, but organized according to principles that emerged from the intersection of training data and prompts rather than from deliberate human design.
In 2026, organizations will need to confront this erosion directly.
The most sophisticated teams will develop practices for maintaining alignment between agent behavior and human intent: explicit style guides that agents are instructed to follow, architectural decision records that get included in every prompt, persistent context documents that encode not just what the system does but why it does it that way.
They will treat the agent not as a tool but as a new team member who needs onboarding, who needs to be told things explicitly that the humans take for granted.
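One rough sketch of what that onboarding could look like, assuming the team keeps its norms in a handful of markdown files; the directory layout and file names here are invented for illustration:

```python
# "Onboarding" an agent the way you would a new team member: the norms the
# team takes for granted are written down once and prepended to every task.

from pathlib import Path

CONTEXT_DIR = Path("docs/agent-context")


def build_agent_context(task: str) -> str:
    """Assemble the standing context (style guide, ADRs, tribal knowledge)
    plus the task at hand into a single prompt for a coding agent."""
    sections = []
    for name in ["style-guide.md", "architecture-decisions.md", "tribal-knowledge.md"]:
        path = CONTEXT_DIR / name
        if path.exists():
            sections.append(f"## {name}\n\n{path.read_text()}")
    sections.append(f"## Task\n\n{task}")
    return "\n\n".join(sections)


# Example: docs/agent-context/tribal-knowledge.md might contain a line like
# "Never use library X for retries; it caused the 2023 checkout outage."
prompt = build_agent_context("Add rate limiting to the payments webhook handler.")
```

The point is not the code, which is trivial, but the habit: the unwritten rule about that library is now written down, and it travels with every task the agent receives.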
This is harder than it sounds. The knowledge that matters most is often tacit: the kind of thing engineers know but cannot easily articulate.
Extracting it, formalizing it, and encoding it in forms that agents can use will be a discipline unto itself.
Some organizations will develop “context curators” whose job is to maintain the living documentation that keeps agents aligned. Others will fail to do so and will watch their systems become unmaintainable as the humans who understood them gradually lose their grip.
The Hierarchy of Trust
When humans write code, we trust them differentially.
A senior engineer’s changes might be approved with a glance (LGTM, am I right?).
A junior engineer’s work probably gets scrutinized line by line.
This hierarchy of trust is one of the ways we scale: we don’t have infinite attention, so we allocate it based on track record.
Agents complicate this hierarchy. How much trust should we place in an agent’s output?
The answer is: it depends. It depends on the model, the prompt, the domain and the stakes. An agent generating a marketing page deserves different scrutiny than one modifying authentication logic. But our tools and processes are not yet built to encode these distinctions.
In 2026, I believe we will see the emergence of trust frameworks for agentic development. These will likely include tiered review processes, where agent-generated changes in high-risk areas trigger human review while low-risk changes flow through automatically. They will include automated auditing tools that flag deviations from established patterns or unexpected changes to sensitive code paths. They will include more sandboxing and monitoring that lets agent-generated code prove itself in staging before reaching production.
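As a sketch of what the tiered piece might look like, here is a hypothetical risk-routing policy; the path patterns and tier names are illustrative, and a real deployment would hang this off CI and code-owner rules rather than a standalone script:

```python
# A tiered review policy for agent-generated changes: the risk tier is
# decided by which paths a change touches, and the tier decides how much
# human attention the change gets before merging.

from enum import Enum
from fnmatch import fnmatch


class Tier(Enum):
    AUTO = "merge after CI passes"
    SPOT_CHECK = "one reviewer skims the diff"
    FULL_REVIEW = "line-by-line human review required"


# Ordered from most to least sensitive; the first matching pattern wins.
RISK_RULES = [
    ("src/auth/**", Tier.FULL_REVIEW),
    ("src/payments/**", Tier.FULL_REVIEW),
    ("migrations/**", Tier.FULL_REVIEW),
    ("src/**", Tier.SPOT_CHECK),
    ("docs/**", Tier.AUTO),
    ("marketing/**", Tier.AUTO),
]

_ORDER = [Tier.AUTO, Tier.SPOT_CHECK, Tier.FULL_REVIEW]


def review_tier(changed_files: list[str]) -> Tier:
    """Return the strictest tier triggered by any file in the change set."""
    strictest = Tier.AUTO
    for path in changed_files:
        for pattern, tier in RISK_RULES:
            if fnmatch(path, pattern):
                if _ORDER.index(tier) > _ORDER.index(strictest):
                    strictest = tier
                break
    return strictest


# An agent PR touching auth gets full review; a docs-only PR flows through.
print(review_tier(["src/auth/session.py", "docs/changelog.md"]))  # Tier.FULL_REVIEW
print(review_tier(["docs/changelog.md"]))                         # Tier.AUTO
```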
But the deeper challenge is epistemic. How do we know when we can trust?
The agent’s output might look correct, pass tests, and even seem well-designed… but “seeming” is exactly the problem. The history of software is littered with bugs that passed every test and fooled every reviewer, only to fail catastrophically in production.
When humans write such bugs, we conduct postmortems and update our practices. When agents write them at scale, the failure modes multiply.
The organizations that navigate this successfully will be those that develop a sophisticated understanding of where agent output can be trusted and where it cannot.
They will learn, through painful experience, which domains are amenable to agentic development and which require direct human implementation. They will build institutional knowledge about the failure modes of specific models, the kinds of prompts that produce reliable results, and the warning signs that something has gone wrong. This knowledge will become a competitive advantage and a form of operational expertise that cannot be easily copied.
The Role of the Human
What, then, is the role of the human in software development after 2025?
The honest answer is that we do not yet know. But the outlines are drawn.
Humans will become architects in a more literal sense than before. They will design the structures within which agents operate: the boundaries, the interfaces, the invariants that must be maintained. They will specify what should exist without necessarily specifying how it should be implemented. This is less a demotion than a shift in altitude, and many will revel in it: from writing code to designing systems, from solving problems to defining problems worth solving.
Humans will become judges. When agents produce multiple candidate solutions, humans will evaluate them not by reading every line but by assessing outcomes: Does it work? Does it perform? Does it behave correctly under adversarial conditions?
The skill will be less in writing and more in discernment: knowing what questions to ask, what tests to run, what edge cases to probe.
Humans will become curators of context. They will maintain the documents, examples, and instructions that shape agent behavior. They will decide what the agent should know, what principles it should follow, what constraints it should respect. This curatorial role is more important than it might sound. The agent’s output is only as good as the context it receives, and providing good context requires deep understanding of both the problem domain and the agent’s capabilities.
And humans will remain the locus of responsibility. When an agent-generated system fails, when it causes harm, when it does something its creators did not intend, a human will be held accountable. This is not merely a legal or organizational fact. It is an ethical one. The decision to deploy an agent, to trust its output, to put it in a position where it can affect the world: these are human decisions, and they carry human responsibility. No amount of automation can discharge that burden.
Who Thrives
The organizations that navigate 2026 successfully will not be those that adopt agentic tools fastest or most aggressively. They will be those that adopt them most wisely: that understand the tradeoffs, build the supporting practices, and maintain human judgment where it matters.
These organizations will invest heavily in specification and documentation, recognizing that these artifacts are no longer second-class citizens but the primary means of control. They will develop new roles and skills: context curators, context engineers, agentic architects, trust auditors. They will build tooling that surfaces the provenance of agent-generated code and traces decisions back to the intentions and constraints that produced them.
They will also preserve space for human understanding. Not all code will be written by agents, and the code that humans write (the critical paths, the core logic, the places where getting it wrong is unacceptable) will be maintained with precision and craft.
There will be a premium on systems that humans can still reason about, still audit, still trust because they understand them rather than because they have merely tested them.
The organizations that maintain legibility, even at the cost of speed, will be the ones that avoid catastrophic surprises. They will ship more slowly and sleep more soundly.
The Shape of What’s Coming
None of this will be resolved in a year. The transition from human-authored to agent-assisted to agent-primary software development is a process that will unfold over the next few years at least.
But 2026 is when many of the hard questions will become unavoidable. It is when the early adopters will hit the limits of naive automation, when the true costs of illegible systems will begin to come due, when organizations will be forced to develop real answers rather than hopeful experiments.
The year ahead will not slow down. But if we are thoughtful, if we attend to context, to alignment, to the irreducible need for human understanding, we might just keep pace.


