Applied Intelligence
Module 12: Knowing When Not to Use Agents

Context That Can't Be Conveyed

The knowledge that exists nowhere in text

Every codebase contains two systems: the code itself and the invisible context that explains why. Agents read the code. They cannot read the context.

Developers routinely overestimate what agents can learn from files. An agent can parse every function, trace every import, index every test. What it cannot do is infer the five years of decisions, failures, and compromises that shaped the architecture.

The philosopher Michael Polanyi called this "tacit knowledge": we know more than we can tell. A senior engineer "knows" certain modules are fragile without articulating why. They "feel" when a change will break something—and they're usually right. This knowledge resists documentation because it was never explicitly learned. It accumulated through hundreds of debugging sessions, late-night deployments, and whiteboard conversations that nobody recorded.

Research estimates 70-90% of organizational knowledge exists in tacit form. For software teams, the figure may be higher. One study of agile development found that 16% of commits occurred without corresponding issue tickets. 42% of those undocumented changes proved necessary for operation. The code required knowledge that was never captured anywhere.

Tribal knowledge and undocumented conventions

Every team operates with conventions that exist only in shared understanding.

These include implicit style preferences beyond what linters enforce, naming patterns that evolved organically, which modules are "safe" versus landmines, who wrote certain code and what they intended, and which external systems have undocumented quirks requiring workarounds.

New developers don't learn these from documentation. They learn through osmosis: code reviews, pair programming, casual conversations, and mistakes that teammates correct.

Research on software teams found tacit knowledge is "acquired and shared directly through good quality social interactions and through the development of a transactive memory system." The study of 181 team members across 46 organizations concluded that social interaction quality and transactive memory both predict team effectiveness—but neither can be written down.

Agents have no access to social interactions. They cannot call a colleague, ask why a pattern exists, or receive a knowing look when they touch dangerous code. Every session begins from scratch, without the informal knowledge network that makes teams effective.

One practitioner nailed it: "AI coding assistants are like new hires on their first hour on the team—they know absolutely nothing about your codebase, coding conventions, business, users, architectural vision, or preferred libraries." But unlike new hires, agents never learn. Each session resets to day one.

Historical context: decisions whose rationale was lost

Codebases accumulate decisions. Some were carefully considered. Others were expedient fixes under deadline pressure. Over time, the rationale evaporates, leaving only the implementation.

A documented case: a developer found a Sleep call in an authentication library that appeared pointless. Local tests passed when the call was removed; production broke for a subset of users. The Sleep had been added as a temporary buffer to compensate for poor performance in an unrelated module on specific operating systems. The original developer's reasoning was never recorded, so the code looked unnecessary right up until its absence caused failures.
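A minimal sketch of how such a landmine can read in source, written here in hypothetical Python rather than the library from the documented case; the function, the timing value, and the comment are illustrative assumptions, not the original code.

```python
import time


def refresh_auth_token(session, identity_provider):
    """Refresh the session's auth token against the identity provider.

    Hypothetical illustration of the documented case, not the real code.
    """
    token = identity_provider.issue_token(session.user_id)

    # Looks pointless: nothing below reads the clock, and local tests pass
    # without it. In the documented case, a pause like this masked slow
    # behavior in an unrelated module on certain operating systems, and
    # removing it broke production for a subset of users.
    time.sleep(0.5)

    session.token = token
    return session
```

To a reviewer who holds the context, the pause is load-bearing. To an agent reading only the file, it is dead weight waiting to be optimized away.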

Chesterton's Fence, applied to code: before removing something that seems useless, understand why it was built. But understanding requires context that may no longer exist. The original developer left the company. The discussion happened in a chat that wasn't archived. The problem it solved may have disappeared, leaving the solution looking like an artifact.

During a legacy migration at Salesforce, engineers encountered "extensive logic accumulated across numerous files, lacking documentation and featuring heavy use of static methods and tightly coupled class designs." Translation tools produced code that "appeared syntactically correct but deviated from expected runtime behavior" because the intent behind design choices was undocumented. The code worked. Why it worked was lost.

Architectural Decision Records address this problem for teams that use them. Most teams do not. Research identifies the barriers: "bad timing, unclear benefits, and overhead." Writing down rationale takes time during periods when time is scarce. The benefit is deferred—useful only to future maintainers who may never exist. The overhead compounds when decisions change faster than documentation can follow.

For agents, every undocumented decision is a trap. The agent sees code that looks suboptimal and "improves" it. The improvement breaks something because the "suboptimal" code served a purpose the agent couldn't infer.

Any code that looks unnecessary but has survived code review probably serves a purpose. If you cannot determine the purpose, do not delegate its modification to an agent. The agent will optimize it away.

Organizational dynamics and political considerations

Technical decisions don't exist in a vacuum. They emerge from organizational contexts that shape what's possible, preferred, and permitted.

Melvin Conway observed this in 1967: organizations design systems that mirror their communication structures. He cited a case where eight people were assigned to develop compilers—five for COBOL, three for ALGOL. The COBOL compiler ran in five phases; the ALGOL compiler ran in three. The architecture reflected team structure, not technical optimality.

The pattern holds. A technical leader facing a large project with six distributed teams declared: "There are going to be six major subsystems. I have no idea what they are going to be, but there are going to be six of them." The architect deliberately aligned the decomposition with organizational boundaries, knowing that communication barriers between teams would keep interactions between components limited.

Microsoft's analysis of Windows Vista confirmed this empirically: organizational metrics were "statistically significant predictors of failure-proneness." Team structure predicted where bugs would appear.

Understanding organizational context matters. Certain architectural decisions reflect political compromises, not technical preferences. Module boundaries may exist because two teams couldn't agree on ownership. API designs may include inefficiencies because they had to accommodate a powerful stakeholder's existing system. Code that looks irrational often preserves peace.

Agents have no organizational model. They cannot infer that the convoluted authentication flow exists because legal required it, or that the redundant data layer serves a team whose budget depends on maintaining it. They see technical artifacts without political context.

When an agent suggests consolidating redundant systems, it may be technically correct and organizationally catastrophic. The redundancy might be the only thing preventing a turf war from resurfacing.

The session reset problem

Unlike human developers, agents don't accumulate knowledge. Every session starts from zero.

A developer working on a codebase for six months builds understanding incrementally. They remember that the payment module is sensitive, that the database schema has quirks, that certain tests are flaky on Tuesdays. This knowledge doesn't disappear when they close their laptop. It persists, deepens, and eventually becomes intuition.

Agents have no equivalent. The context window contains the current conversation. When the session ends—or when the window fills and requires resetting—that learning vanishes. The next session begins with the agent reading the same files, making the same inferences, potentially repeating the same mistakes.

Research found that when developers provide context manually, 54% report the agent still misses relevant information. This drops to 33% with autonomous context selection and 16% when context persists across sessions. That gap between 54% and 16% is the cost of session resets: useful context that existed and was lost.

The METR study that found experienced developers were 19% slower with AI tools identified context integration challenges as a primary factor. Developers spent time re-explaining project context that a human collaborator would have retained. The time saved by automation was consumed by context reconstruction.

The CLAUDE.md file partially addresses session resets by persisting context across sessions. Treat it as the agent's long-term memory: anything learned through painful iteration should be recorded there. Otherwise, the next session will repeat the iteration.
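What belongs in that file is precisely the context this page describes as unconveyable from code alone. A brief sketch of the kind of entries that earn a place there; the file paths, modules, and notes below are hypothetical, not drawn from a real project.

```markdown
# CLAUDE.md — project memory for agent sessions

## Hard-won context (do not relearn this)
- payments/retry.py: the 500 ms pause before retrying is deliberate;
  the upstream gateway rate-limits bursts. Do not remove it.
- Database migrations must run with a small batch size; large batches
  locked the orders table during a past incident.
- The separate consent step in the auth flow is a legal requirement
  for EU users. Do not consolidate it into the login flow.

## Conventions the linter does not enforce
- Service modules expose a single public entry point named `handle()`.
- Feature flags are read once at request start, never mid-request.
```

Each entry records rationale that would otherwise have to be rediscovered through another painful iteration.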

Enterprise context that's too vast to provide

Enterprise codebases contain knowledge that exists nowhere else: performance decisions from production incidents, architectural patterns from infrastructure migrations, business logic distributed across databases, middleware, and legacy methods.

As one analysis put it: "This isn't knowledge you can easily document. It's accumulated understanding built through years of debugging production issues, performance optimizations, and workarounds for third-party service limitations."

The scale problem compounds the tacit knowledge problem. A small codebase might have a few undocumented conventions. An enterprise system might have thousands. Each service, each integration, each legacy module carries its own invisible context.

At a 50-person company, senior engineers know the product intimately. Vague specifications get caught in code review. Tribal knowledge fills the gaps. At 500+ engineers, that breaks down. The tribal knowledge fragments. No single person holds the full picture. Documentation efforts produce artifacts that age immediately.

Agents operating in enterprise contexts face compounded uncertainty. They cannot know which architectural decisions were deliberate versus expedient. They cannot trace the reasoning behind API contracts that seem arbitrary. They cannot identify which workarounds are essential versus outdated.

Research found agents achieved only 34.2% accuracy on domain-specific requirements in enterprise contexts. In regulated industries—healthcare, finance—success rates dropped to 22.7%. The gap between benchmark performance (65-70%) and enterprise reality (34%) is the weight of unconveyable context.

What this means

The knowledge gap between what agents can read and what they need to understand creates a permanent ceiling on delegation.

Tasks that succeed—documentation, boilerplate, test scaffolds—have low context requirements. The relevant information fits in the context window. Success criteria are verifiable without domain expertise. Error consequences are contained.

Tasks that fail—architectural changes, business logic, legacy modifications—require context that cannot be conveyed. The information doesn't fit because it was never captured. Success depends on judgment requiring understanding not present in any file. Consequences propagate beyond what the agent can verify.

The pattern from Page 1 applies: agents succeed at structured tasks and fail at judgment tasks. Context is why. Structured tasks have explicit context. Judgment tasks require tacit context that doesn't survive translation to text.

The practical response is conservative delegation. When working with unfamiliar code, read it yourself first. When modifying systems with historical baggage, understand the baggage before involving agents. When touching code that has survived without rationale, assume the rationale exists somewhere you haven't found.

Agents are powerful tools for executing well-understood work. They are not substitutes for understanding.
