Warning Signs of Derailed Sessions
Early behavioral indicators
The previous section introduced the fix loop and its three-attempt threshold. But derailed sessions announce themselves through subtler signals that precede obvious failure. Catching these early prevents wasted hours.
Response latency increases. Sessions start with quick responses. As context accumulates with failed attempts and irrelevant information, responses slow. When a request that previously drew a near-instant response now takes thirty seconds, the session has degraded. The agent processes more noise while producing less signal.
File re-reading without progress. Healthy sessions read a file once, understand it, and move forward. Derailed sessions read the same file repeatedly. The agent examines auth/validate.js, proposes a change, encounters an error, then reads auth/validate.js again as if seeing it fresh. This pattern means the agent has lost track of what it learned.
Questions about previously discussed topics. The agent asks about constraints established at session start. "What framework are you using?" when the framework was specified three turns ago. "Should I maintain backward compatibility?" after that requirement was explicitly stated. The agent's effective memory has shortened below the conversation length.
Compaction warnings becoming frequent. Claude Code displays "Compacting conversation" when context approaches limits. One compaction is normal in long sessions. Multiple compactions signal context churn. Each compaction loses information. After several, the agent may have forgotten the original problem while retaining noise from failed attempts.
Conversational deterioration patterns
Beyond latency and memory, the conversation itself shows diagnostic patterns.
Circular explanations. The agent describes its reasoning using the same phrases turn after turn. "The issue is with the token validation" appears identically in three consecutive responses. The explanation doesn't deepen or evolve. The agent has latched onto a pattern without genuine understanding.
Contradictory reasoning within turns. Single responses that argue against themselves signal confusion. "The null check is necessary to prevent crashes. However, removing the null check allows the function to fail fast. The best approach is to add the null check." That whiplash reasoning, recommending addition after noting the value of removal, means the agent is pattern-matching rather than thinking.
Confidence without specificity. Healthy responses cite specific lines, functions, and error messages. Derailed responses become vague while remaining confident. "The problem is in the authentication logic" without identifying which file, which function, or which line. "This should fix the issue" without explaining why. Confidence has detached from concrete analysis.
Scope creep in responses. The agent's responses expand to include tangentially related topics. A question about a failing test triggers suggestions about refactoring the entire test suite, improving the CI pipeline, and restructuring the module hierarchy. This expansion means the agent has lost focus on the immediate problem.
Code-level warning signs
The code itself shows patterns of session deterioration.
Unused constructs appearing. Agent-generated code starts including variables that are declared but never used. Functions get defined but never called. Imports appear at the top of files for modules never referenced below. Research on agent-generated code confirms this: unused constructs and hardcoded debugging statements appear at elevated rates compared to human-written code.
# Warning signs in generated code
def process_data(items):
    result = []
    temp_cache = {}    # Declared, never used
    debug_mode = True  # Hardcoded debugging flag
    backup = items[:]  # Created, never referenced
    for item in items:
        result.append(transform(item))
    return result

These unused elements mean the agent is generating plausible-looking code rather than purposeful code.
Redundant operations. The agent inserts operations that accomplish nothing. Assigning a variable to itself. Checking a condition that is always true. Calling a function whose result is discarded. Each redundancy is cognitive noise that leaked into the output.
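The snippet below is a small, hypothetical illustration of these redundancies; the function and variable names are invented for the example:

# Illustrative redundant operations (hypothetical example)
def apply_discount(total, rate):
    rate = rate                    # Self-assignment accomplishes nothing
    if rate >= 0 or rate < 0:      # Condition is always true
        discounted = total * (1 - rate)
    round(discounted, 2)           # Result computed, then discarded
    return discounted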
Missing boundary checks that were previously present. Earlier iterations included null checks, array bounds validation, and error handling. Later iterations omit them. The agent has forgotten constraints it once understood. This regression is particularly dangerous because the checks existed, creating a false sense that the code handles edge cases.
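A hypothetical before-and-after makes the regression concrete; the function and key names here are illustrative:

# Earlier iteration: boundary checks present (illustrative)
def first_email(users):
    if not users:                     # Guard against an empty list
        return None
    return users[0].get("email")      # Tolerates a missing key

# Later iteration: same function, checks silently dropped
def first_email(users):
    return users[0]["email"]          # Raises on an empty list or missing key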
Inconsistent naming patterns. Variables start following one convention (userName, userEmail) then switch to another (account_status, login_count) within the same function. The agent has lost track of the codebase's style. This inconsistency often correlates with logical inconsistency.
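An illustrative snippet of the drift, using the identifiers mentioned above:

# Naming drift within one function (illustrative)
def build_profile(record):
    userName = record["name"]           # camelCase, matching earlier code
    userEmail = record["email"]
    account_status = record["status"]   # snake_case appears mid-function
    login_count = record["logins"]
    return {"name": userName, "email": userEmail,
            "status": account_status, "logins": login_count}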
Repetitive structure with minor variations. The agent generates the same code block multiple times with small differences. Three nearly identical try-catch blocks in sequence. Four functions that differ only in a single parameter. This repetition means the agent is pattern-completing rather than problem-solving.
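A sketch of what this pattern-completion looks like, with hypothetical function names: three near-duplicate functions that differ only in a single string, where one parameterized function would do.

# Near-duplicate functions differing in a single literal (illustrative)
def load_users(db):
    return db.query("SELECT * FROM users")

def load_orders(db):
    return db.query("SELECT * FROM orders")

def load_invoices(db):
    return db.query("SELECT * FROM invoices")

# One parameterized function would have sufficed:
def load_table(db, table):
    return db.query(f"SELECT * FROM {table}")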
The context exhaustion cascade
Sessions don't degrade linearly. They cascade once certain thresholds are crossed.
The cascade proceeds through stages: responses slow, then file re-reading begins, then contradictions appear, then the code itself deteriorates. By the time garbage code appears, the session has been struggling for several turns.
Research quantifies the performance cliff. At 32,000 tokens of context, accuracy drops 20-60 percentage points compared to short contexts. Models don't announce this degradation. Response confidence remains high while accuracy collapses.
The practical threshold hits earlier than benchmarks suggest. Developer reports show performance drops after 50-60% context consumption, not at 100%. Sessions lasting 8-12 exchanges on complex tasks often show degradation symptoms.
Tool-specific signals
Different tools fail in characteristic ways.
Claude Code announces context pressure through compaction messages and slowing responses. Sessions that freeze for extended periods, showing zero tokens consumed in the UI, have hit a processing wall. Recovery requires restarting the session entirely.
Codex shows degradation through error messages about context limits and stream disconnections before task completion. The message "Stream disconnected before completion" with significant context remaining means the session has entered a failure state.
Cursor users report the agent executing the same terminal command repeatedly with no recognition that it already ran. The infinite loop becomes literal: the same npm test command runs three, four, five times with identical failing output.
Each tool's failure mode reflects its architecture, but the underlying cause is consistent: the agent has lost coherent access to the information it needs.
The accumulation problem
Failed attempts don't just fail. They pollute the context for subsequent attempts.
Each error message adds tokens. Each stack trace consumes context. Each failed fix introduces information about what didn't work, crowding out information about the original problem. By the tenth attempt, the agent may have more context about its failures than about the actual task.
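For a rough sense of scale, with purely illustrative numbers: if the original task description occupies about 500 tokens and each failed attempt adds roughly 1,500 tokens of diffs, error output, and stack traces, then ten attempts contribute around 15,000 tokens of failure noise, thirty times the size of the task the agent is supposedly still working on.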
The "lost in the middle" effect compounds this. Information in the middle of long contexts receives 15-20% less attention than information at the beginning or end. As error messages accumulate in the middle of the conversation, they simultaneously consume space and receive diminishing attention.
This accumulation explains why fresh sessions often resolve problems that polluted sessions could not. Starting over isn't just about clearing errors. It's about restoring the agent's ability to focus on what matters.
Intervention thresholds
Certain symptoms warrant immediate intervention rather than continued attempts.
Stop and reset when the same file has been read three or more times without progress, explanations from consecutive turns directly contradict each other, generated code includes unused constructs in areas previously clean, response latency has doubled or tripled from session start, or the agent asks about constraints established in the first few turns.
Consider narrowing scope when responses expand to include tangentially related suggestions, the agent proposes changes to files not mentioned in the request, or generated code touches more functions than the task requires.
Switch to diagnostic mode when three fix attempts have not resolved the issue, the agent's confidence language ("definitely," "certainly") increases while specificity decreases, or generated tests pass but the underlying functionality remains broken.
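These thresholds lend themselves to a quick self-check. The sketch below is one hypothetical way to encode them as a session-health checklist; the field names and limits simply mirror the guidance above and are not part of any tool's API:

# Hypothetical session-health checklist encoding the thresholds above
from dataclasses import dataclass

@dataclass
class SessionHealth:
    same_file_reads: int        # Times the same file was re-read without progress
    contradictory_turns: bool   # Consecutive turns that directly contradict each other
    unused_constructs: bool     # New unused variables/imports in previously clean code
    latency_ratio: float        # Current response latency / latency at session start
    reasked_constraints: bool   # Agent asked about constraints set in early turns
    fix_attempts: int           # Attempts at fixing the same issue

    def should_reset(self) -> bool:
        return (self.same_file_reads >= 3
                or self.contradictory_turns
                or self.unused_constructs
                or self.latency_ratio >= 2.0
                or self.reasked_constraints)

    def should_diagnose(self) -> bool:
        return self.fix_attempts >= 3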
These thresholds aren't arbitrary. They emerge from the patterns documented across developer communities: sessions showing these symptoms rarely recover through continued iteration. The intervention that works is a new start with clean context.
Developing pattern recognition
Building intuition for session health requires conscious attention during early ASD practice.
During the first few sessions, monitor actively. Note when responses slow, when the agent re-reads files, when explanations start circling. This monitoring builds the pattern recognition that becomes automatic with experience.
Mature practitioners report catching derailed sessions within two turns of deterioration beginning. The subtle signals (a slight latency increase, a hint of repetition in explanations) trigger assessment before obvious failure occurs.
This early detection is one of the core skills of effective ASD: knowing when the tool is helping versus when it's consuming time without progress.