Applied Intelligence
Module 3: Context Engineering

Common Context Failures and Their Symptoms

The five failure patterns (memory/forgetting, confusion, brittleness, confabulation, quality degradation); diagnosis techniques

The previous pages established context engineering as the core competency of ASD and introduced the three-layer hierarchy that organizes information flow. This page examines what happens when context fails: the specific patterns of breakdown and the symptoms that indicate each failure type.

Recognizing these patterns is essential for diagnosis. When agents produce poor output, the instinct is to rewrite the prompt or blame the model. More often, the root cause is a specific, identifiable context failure that requires a targeted intervention rather than trial-and-error prompting.

The five failure patterns

Context failures cluster into five distinct categories, each with characteristic symptoms and different remediation strategies.

Memory failure (forgetting)

Memory failure occurs when information that was present earlier in a conversation becomes inaccessible to the agent. Unlike human memory, which degrades gradually and unpredictably, agent memory loss follows predictable patterns tied to context window mechanics.

Symptoms:

  • The agent contradicts decisions made earlier in the same session
  • Corrections applied successfully in one turn reappear as errors in subsequent turns
  • The agent asks for information that was already provided
  • Task requirements established at the start are ignored midway through implementation

Root causes:

The "lost in the middle" phenomenon accounts for much of this behavior. As established in Module 2, agents show U-shaped attention: strong recall for information at the beginning and end of context, with 15-20% performance degradation for middle-positioned content. In long conversations, early context drifts toward the middle as new exchanges push to the end.

Auto-compaction creates additional memory loss. When context approaches capacity, tools like Claude Code summarize conversation history to free space. This compression preserves general patterns while losing specific details including corrections made during troubleshooting.

After auto-compaction, agents frequently revert to behaviors that were corrected earlier in the session. The summary captured that a problem existed and was resolved, but not the specific fix that should be applied.
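
A minimal sketch of the mechanic, with a hard-coded stand-in for the model-generated summary that real tools produce:

```python
def compact(history: list[str], keep_recent: int = 2) -> list[str]:
    """Naive auto-compaction: replace all but the most recent turns with
    a short summary, freeing space but discarding specifics."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in for a model-written summary: the gist survives, details do not.
    summary = f"[summary of {len(old)} turns: a parsing bug was found and fixed]"
    return [summary, *recent]

history = [
    "user: the parser breaks on ISO week dates",
    "agent: fixed by using the '%G-W%V-%u' directives",  # the specific fix
    "user: good, now add regression tests",
    "agent: added tests in test_dates.py",
]
print(compact(history))
# The compacted context records THAT a fix happened, not WHICH format
# directives were used, so the agent may regenerate the original bug.
```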

Diagnosis technique:

When the agent forgets established decisions, check the context meter. If utilization exceeds 70%, memory failure is likely. Test by explicitly restating the forgotten information: if the agent immediately incorporates it, the issue is retrieval rather than comprehension.
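
When no built-in meter is available, a rough estimate is enough for triage. A minimal sketch, assuming the common four-characters-per-token approximation for English text and a 200K-token window (adjust both for the model in use):

```python
def context_utilization(history: list[str], window_tokens: int = 200_000) -> float:
    """Crude utilization estimate: total characters / 4 as a token proxy."""
    estimated_tokens = sum(len(turn) for turn in history) / 4
    return estimated_tokens / window_tokens

session = ["user: set up the parser", "agent: done, see parser.py"]  # accumulated turns
util = context_utilization(session)
print(f"estimated utilization: {util:.1%}")
if util > 0.70:
    print("High utilization: suspect memory failure; restate key decisions.")
```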

Context confusion

Context confusion occurs when the agent misinterprets relationships between pieces of information, applies context from one domain to another, or conflates similar-looking patterns that serve different purposes.

Symptoms:

  • The agent uses patterns from the wrong part of the codebase
  • Generated code follows conventions from a different project that was discussed earlier
  • The agent treats deprecated code as authoritative
  • Suggestions contradict the architectural boundaries documented in project context

Root causes:

Codebases contain repetitive structures. Two functions may look syntactically identical but handle different edge cases. A pattern used safely in one module may violate constraints in another. Agents detect surface-level similarity but miss contextual distinctions that experienced developers internalize.

Multi-project conversations compound this problem. Discussing Project A before switching to Project B leaves residual context. The agent may apply Project A's conventions to Project B, generating code that compiles but violates architectural intent.

Diagnosis technique:

When output seems technically correct but architecturally wrong, ask the agent to explain its reasoning. Confusion reveals itself in the explanation: the agent will reference patterns or decisions that belong to a different context. This exposes which information is polluting the current task.

Brittleness

Brittleness describes agent behavior that works correctly under expected conditions but fails unpredictably when conditions vary slightly. The agent demonstrates competence on straightforward cases while breaking on edge cases or environmental variations.

Symptoms:

  • Code works for the happy path but fails on edge cases
  • Solutions assume a specific environment and break elsewhere
  • The agent attempts Linux commands on Windows or vice versa
  • Multi-step workflows succeed on steps 1-7 but fail catastrophically on step 8

Root causes:

Each tool call in an agentic workflow carries independent failure probability. If each step has 90% reliability, a ten-step workflow drops to approximately 35% end-to-end reliability. This explains why simple tasks succeed while complex tasks fail disproportionately.
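
The arithmetic is easy to verify: per-step reliability decays geometrically with workflow length.

```python
def end_to_end_reliability(per_step: float, steps: int) -> float:
    """Success probability of a workflow whose steps fail independently."""
    return per_step ** steps

for steps in (1, 5, 10):
    print(f"steps={steps}: {end_to_end_reliability(0.90, steps):.0%}")
# steps=1: 90%
# steps=5: 59%
# steps=10: 35%
```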

Environment assumptions create additional brittleness. Agents infer environment details from context clues rather than explicit detection. If context suggests one operating system but the actual environment differs, generated commands may fail silently or destructively.
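
The mitigation is to make assumptions explicit. A sketch of the pattern in Python, detecting the platform and verifying dependencies before any command runs; the command choices here are examples, not a complete mapping:

```python
import platform
import shutil

def listing_command() -> list[str]:
    """Pick the directory-listing command for the detected OS instead of
    inferring it from conversational context."""
    if platform.system() == "Windows":
        return ["cmd", "/c", "dir"]
    return ["ls", "-la"]  # Linux and macOS

# Fail loudly on a missing dependency rather than partway through a workflow:
if shutil.which("git") is None:
    raise RuntimeError("git not found on PATH; resolve before proceeding")

print(listing_command())
```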

Research on professional developer workflows found that experienced practitioners keep tasks small and contexts short specifically to avoid brittleness. They focus on individual steps rather than end-to-end automation.

Diagnosis technique:

Brittleness manifests at boundaries: the transition from expected to unexpected conditions. When a workflow fails partway through, examine what changed between the last successful step and the failure. Environment differences, missing dependencies, or assumption violations typically explain the breakdown.

Confabulation

Confabulation, sometimes called hallucination, occurs when agents generate plausible-sounding but incorrect information. In code generation, this produces APIs that do not exist, function signatures that differ from actual implementations, and file paths that lead nowhere.

Symptoms:

  • Generated code imports packages or modules that do not exist
  • API calls use parameters or methods not present in the actual library
  • The agent references documentation that cannot be found
  • File paths or class names are plausible but incorrect

Root causes:

Confabulation emerges when the agent lacks authoritative information but must produce a response. Language models are trained to generate plausible continuations, not to say "I don't know." When relevant documentation is absent from context, the model generates what seems likely based on patterns in its training data.

Package confabulation has become a measurable security concern. Studies show approximately 20% of AI-generated package names are nonexistent. Attackers exploit this by publishing malicious packages with commonly hallucinated names, a technique called "slopsquatting."

Confabulated code often compiles successfully. Syntax is correct; logic is plausible. The errors hide until runtime, when calls to nonexistent APIs fail. This makes confabulation one of the most dangerous failure modes.

Diagnosis technique:

Verify unfamiliar APIs, packages, and file paths before accepting generated code. When the agent references something not in the current codebase, confirm it exists externally. Confabulation detection requires active verification; the agent cannot self-diagnose its own inventions.
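
For Python packages, existence is cheap to check against the registry. A minimal sketch using PyPI's public JSON endpoint; note that existence alone does not establish trustworthiness, only that the name is not a pure invention:

```python
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    """Return True if PyPI knows the package name."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False  # 404: the package does not exist

for name in ("requests", "definitely-not-a-real-package-xyz"):
    print(name, exists_on_pypi(name))
```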

Quality degradation

Quality degradation describes the progressive decline in output quality as sessions extend. Unlike the other failure modes, which produce discrete errors, quality degradation is gradual: responses become less precise, less relevant, and less useful over time.

Symptoms:

  • Early responses are detailed and accurate; later responses become vague
  • The agent starts "playing it safe" with generic suggestions
  • Code quality visibly declines between the start and end of a session
  • Responses begin to cycle through the same suggestions without progress

Root causes:

Multiple factors compound as sessions lengthen. Context fills with exploratory tangents and troubleshooting sequences. The relevant signal becomes diluted in accumulated noise. Each turn adds to the cognitive load the model must process.

Research confirms the pattern: multi-turn conversation performance drops an average of 39% compared to single-turn interactions. Code generation tasks show particular vulnerability because they require sustained attention to detail across multiple exchanges.

The phenomenon accelerates itself. Lower-quality responses lead to more correction attempts, which add more context, which further degrades quality. Without intervention, the spiral continues until the session becomes unproductive.

Diagnosis technique:

Compare current output quality to earlier responses in the same session. If a noticeable decline is evident, quality degradation is occurring. The remedy is not better prompts; it is context intervention through compaction or reset.

Recognizing failure in practice

These five failure modes rarely appear in isolation. A long session might begin with quality degradation, which leads to corrections that cause confusion, which prompts the agent to confabulate solutions, which creates brittleness in the generated code. Understanding the individual patterns makes it possible to identify which failure initiated the cascade.

The diagnostic sequence

  1. Check context utilization. If above 70%, memory failure and quality degradation are probable contributors.

  2. Review recent exchanges. If the agent is contradicting itself or reverting to corrected behaviors, memory failure is primary.

  3. Examine the agent's reasoning. Ask it to explain its approach. Confusion reveals itself through misattributed context.

  4. Verify external references. Any API, package, or file path the agent mentions that you don't recognize warrants verification.

  5. Compare to session start. If early responses were better, quality degradation is occurring regardless of other factors.
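
The sequence can be captured as a simple triage helper. The signals and the 70% threshold below are assumptions mirroring the steps above; real tools surface these signals differently.

```python
def triage(utilization: float, contradicts_earlier: bool,
           misattributed_reasoning: bool, unrecognized_references: bool,
           quality_declined: bool) -> list[str]:
    """Map observed symptoms to likely failure modes, in diagnostic order."""
    findings = []
    if utilization > 0.70:
        findings.append("memory failure / quality degradation probable")
    if contradicts_earlier:
        findings.append("memory failure is primary")
    if misattributed_reasoning:
        findings.append("context confusion")
    if unrecognized_references:
        findings.append("possible confabulation: verify externally")
    if quality_declined:
        findings.append("quality degradation: compact or reset")
    return findings or ["no clear context failure; re-examine the prompt"]

print(triage(0.82, True, False, False, True))
```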

Warning signs that demand intervention

Certain symptoms indicate context failure severe enough to require immediate action:

Symptom | Likely Failure Mode | Immediate Action
Agent repeats the same mistake after correction | Memory failure | Restate correction; consider reset
Agent uses patterns from wrong project/module | Confusion | Clear session; reload only relevant context
Workflow fails on step N after steps 1-(N-1) succeeded | Brittleness | Debug step N in isolation
Agent references APIs you cannot find | Confabulation | Verify before using; provide authoritative docs
Responses become noticeably vaguer over time | Quality degradation | Compact or reset session

The reset decision

The most important diagnostic skill is recognizing when intervention cannot salvage a session.

Consider reset when:

  • Multiple failure modes are compounding simultaneously
  • Three or more correction attempts have not resolved the issue
  • Context utilization exceeds 85%
  • The agent has entered a repetitive loop

Preserve before resetting:

Before clearing context, extract valuable state. Ask the agent to summarize decisions made, code produced, and outstanding tasks. This summary can seed the fresh session with essential context while discarding accumulated noise.

The "thread fold" technique: before reset, prompt the agent with "Another engineer will continue this work tomorrow. Write a summary including all decisions, completed work, and remaining tasks." This captures essential context in compact form.

From diagnosis to prevention

Recognizing failure patterns is the first step. The subsequent pages in this module address prevention: project context that reduces confusion, conversation patterns that avoid degradation, and prompt structures that minimize confabulation.

The goal is not to eliminate all context failures; some are inevitable given the constraints of language models. The goal is to recognize failures quickly, diagnose them accurately, and intervene appropriately. Developers who master this diagnostic skill spend less time fighting context problems and more time directing productive work.

Context failures are not model failures. They are engineering problems with engineering solutions. The remaining pages explore those solutions in detail.
