Applied Intelligence
Module 3: Context Engineering

Conversation context mechanics

The ephemeral layer

Conversation context is the most dynamic and least permanent layer of the context hierarchy. Unlike project context that persists across sessions or prompt context that exists only for a single turn, conversation context occupies an unstable middle ground. It accumulates as dialogue progresses, faces compression and degradation, and ultimately disappears when the session ends.

Understanding these mechanics is essential because the 39% average performance drop in multi-turn conversations documented in research stems directly from how this layer behaves. The tools are not failing randomly. They are failing predictably, according to specific patterns that become manageable once understood.

What persists versus what fades

Conversation history in both Claude Code and Codex CLI is stored client-side. Claude Code maintains session data in JSONL files at ~/.claude/projects/[project-hash]/[session-id].jsonl. Codex stores sessions at ~/.codex/sessions/. Neither tool maintains persistent server-side conversation history; all continuity depends on local storage and explicit resumption.
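Because transcripts are plain JSONL on disk, they can be inspected directly. The sketch below parses transcript lines and counts records by type; the per-line record shape (a "type" field such as "user" or "assistant") is an assumption for illustration, not a documented schema.

```python
import json

def summarize_session(jsonl_text: str) -> dict:
    """Count records by type in a session transcript (one JSON object per line)."""
    counts: dict = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        kind = record.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts

# Inline sample standing in for a real file at
# ~/.claude/projects/[project-hash]/[session-id].jsonl
sample = "\n".join([
    json.dumps({"type": "user", "text": "Use async/await throughout"}),
    json.dumps({"type": "assistant", "text": "Understood."}),
    json.dumps({"type": "user", "text": "Refactor the auth module"}),
])
print(summarize_session(sample))  # {'user': 2, 'assistant': 1}
```

The same loop works against a real session file by reading its text with pathlib, since each line is an independent JSON object.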

Within an active session, different types of information demonstrate different persistence characteristics:

High persistence:

  • Explicit decisions stated clearly ("Use async/await throughout this codebase")
  • Files that were read or modified (paths and actions are tracked)
  • Structural patterns established through repetition
  • Constraints defined in the first few exchanges

Medium persistence:

  • Exploratory discussions about approaches
  • Reasoning about why certain choices were made
  • Intermediate steps in problem-solving
  • Context gathered from file exploration

Low persistence:

  • Verbose explanations that don't affect future actions
  • Repetitive confirmations and acknowledgments
  • Detailed tool outputs that served their immediate purpose
  • Discussion of options that were rejected

This hierarchy matters because auto-compaction, the system's response to context pressure, preserves high-persistence information while aggressively dropping low-persistence content. When Claude Code triggers compaction (typically at 75-95% context utilization), it retains key decisions and current status while discarding detailed reasoning and full document contents.

The practical implication: information stated once in passing may not survive to influence later turns. Information established as a clear constraint in the opening exchange typically persists throughout the session.

The U-shaped attention curve

Position within the context window significantly affects how reliably information influences agent behavior. Research consistently demonstrates a U-shaped performance curve: agents process information at the beginning and end of context most accurately, while middle-positioned content shows 15-20% degraded recall.

This phenomenon, often called "lost in the middle," has architectural roots. Rotary Position Embedding (RoPE), used by most modern language models, introduces inherent bias toward beginning and end tokens. MIT research in 2025 confirmed that causal masking and specific positional hidden states intensify this pattern independently of training data.

For multi-turn conversations, this creates a predictable problem. The beginning of context contains system instructions and early exchanges. The end contains the most recent turn. Everything in between, including corrections made mid-session, context gathered during exploration, and refined requirements, occupies the vulnerable middle zone.

Practical mitigations:

The most reliable approach is structural: restate critical information at strategic moments rather than assuming earlier statements remain accessible. When providing important constraints mid-conversation, frame them as current requirements rather than references to previous discussion.

For complex tasks, consider the conversation as having a "beginning, middle, end" structure where key information appears in each section:

  • Early turns: establish core constraints and architecture decisions
  • Mid-session: explore and iterate, accepting that details here are most vulnerable
  • Before completion: restate critical requirements and validate against original intent

The /compact command in Claude Code provides an opportunity to influence what survives compression. Custom compaction instructions can prioritize specific information, for example: "/compact preserve the authentication requirements and API contract details".

Token economy and compression

Every conversation operates within a finite token budget. Claude Code provides approximately 200,000 tokens; Codex offers roughly 192,000. These numbers represent theoretical maximums; practical limits are lower.

Research on context windows reveals that advertised limits rarely correspond to reliable performance. A 200k token context window typically becomes unreliable around 130k tokens, with degradation sudden rather than gradual. The 80% guideline established in earlier modules reflects this reality: plan for meaningful work to occur within 160k tokens, treating the remaining capacity as buffer rather than working space.
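The 80% guideline reduces to simple arithmetic, sketched below with the two budgets mentioned above (the tool names in the dict are just labels for this example):

```python
# Advertised context windows from the section above.
ADVERTISED = {"claude_code": 200_000, "codex": 192_000}

def working_budget(advertised: int, usable_fraction: float = 0.8) -> int:
    """Tokens to plan actual work within, treating the rest as buffer."""
    return int(advertised * usable_fraction)

for tool, limit in ADVERTISED.items():
    print(tool, working_budget(limit))
# claude_code 160000
# codex 153600
```

Note that the 130k reliability cliff cited above sits below even this conservative budget for a 200k window, which is why resetting early beats pushing toward the limit.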

Auto-compaction mechanics:

When context utilization reaches a threshold (75-95% depending on configuration), the system automatically summarizes conversation history. This process preserves:

  • What work was completed
  • Current state and next steps
  • User constraints and requirements
  • Critical context explicitly identified

This process loses:

  • Full text of documents that were read
  • Detailed tool outputs and intermediate results
  • Verbose reasoning chains
  • Assistant confirmations and acknowledgments

A known limitation: corrections made to agent behavior don't always survive compaction. If the agent was producing code with a specific error pattern and feedback corrected it, that correction may be lost after compression. The agent reverts to earlier patterns because the corrective feedback occupied low-persistence conversation content.
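The failure mode can be modeled with a toy compaction pass. This is emphatically not Claude Code's actual algorithm; it is an illustrative sketch, under the assumption that compaction behaves like a priority-ordered filter, of why a correction stored as ordinary low-persistence dialogue can vanish while explicit constraints survive.

```python
# Toy model only: keep high-persistence items first, drop what exceeds budget.
PRIORITY = {"high": 0, "medium": 1, "low": 2}

def compact(items: list, budget: int) -> list:
    """items: (persistence_tier, text) pairs. Returns surviving texts."""
    kept, used = [], 0
    # Stable sort: within a tier, earlier items keep their order.
    for tier, text in sorted(items, key=lambda it: PRIORITY[it[0]]):
        if used + len(text) <= budget:
            kept.append(text)
            used += len(text)
    return kept

history = [
    ("high", "Constraint: use async/await throughout"),
    ("medium", "Decided to use JWT for sessions"),
    ("low", "Correction: stop using var, use const"),
]
# The correction, stored as low-persistence content, does not survive:
print(compact(history, budget=100))
```

Running this drops the correction while the constraint and decision survive, mirroring the reversion behavior described above.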

Manual compaction strategy:

Rather than waiting for auto-compaction at unpredictable moments, manual compaction at natural breakpoints provides more control. After completing a significant subtask, use /compact with specific instructions about what to preserve. This creates a deliberate checkpoint rather than system-determined summarization.

The distinction between /compact and /clear becomes important here. Compaction summarizes and continues, which is useful when accumulated context has value but needs condensing. Clear wipes the context completely, which is appropriate when starting a genuinely new task or when context has become poisoned with accumulated errors.

Client-side memory management

The conversation layer interacts with other memory mechanisms that provide stability across its ephemeral nature.

CLAUDE.md as persistent memory:

CLAUDE.md files are injected at every session start and survive all compaction and clear operations. This makes them the appropriate location for information that must remain stable regardless of conversation dynamics: architectural decisions, coding standards, known issues, and project-specific constraints.

The hierarchical loading order (Enterprise → Global → Project → Local → Rules) means different persistence needs can be addressed at appropriate levels. Global CLAUDE.md handles personal preferences that apply everywhere. Project CLAUDE.md captures codebase-specific knowledge. Local overrides handle experimental or machine-specific variations.
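The layering above can be modeled as a precedence chain. Real CLAUDE.md files are free-form Markdown injected in sequence, so treating each layer as a settings dict, as below, is purely an illustration of precedence (more specific layers winning on conflicts), not the tool's mechanism.

```python
# Loading order from the section above, least to most specific.
LAYER_ORDER = ["enterprise", "global", "project", "local", "rules"]

def effective_context(layers: dict) -> dict:
    """Merge layers in order; later (more specific) layers win on conflicts."""
    merged: dict = {}
    for name in LAYER_ORDER:
        merged.update(layers.get(name, {}))
    return merged

layers = {
    "global": {"style": "concise", "tests": "pytest"},
    "project": {"tests": "pytest -x", "arch": "hexagonal"},
    "local": {"style": "verbose-while-debugging"},
}
print(effective_context(layers))
# {'style': 'verbose-while-debugging', 'tests': 'pytest -x', 'arch': 'hexagonal'}
```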

Session resumption:

Both tools support session resumption from local storage:

  • Claude Code: claude --continue (most recent) or claude --resume abc123 (specific session)
  • Codex: codex resume <SESSION_ID>

Resumption loads full context including tool usage history and checkpoint data. However, resumed sessions still operate under the same context limits; a resumed long session faces the same degradation risks as an active long session.

Handoff summaries:

When ending a session that will be continued later, requesting a handoff summary captures critical context in compact form. A structured handoff (goal, progress, decisions, files modified, failed approaches, open questions) typically requires 500-800 words compared to 10,000+ tokens to replay the full conversation.

The prompt pattern: "Summarize this session for handoff. Another engineer will continue this work without access to our conversation. Include what we accomplished, current state, key decisions, and next steps."

This summary can be stored in a file, added to CLAUDE.md, or used to initialize a new session, converting ephemeral conversation context into persistent project context.
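Writing the handoff to disk is the simplest way to make it persistent. The sketch below assembles a structured handoff and saves it; the HANDOFF.md filename and section headings are assumptions for this example, not a tool convention.

```python
import tempfile
from pathlib import Path

def write_handoff(path: Path, goal: str, progress: str,
                  decisions: list, next_steps: list) -> str:
    """Build a compact handoff document and write it to path."""
    lines = ["# Session handoff", "",
             f"Goal: {goal}", f"Progress: {progress}", "", "Decisions:"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "Next steps:"]
    lines += [f"- {s}" for s in next_steps]
    text = "\n".join(lines) + "\n"
    path.write_text(text)
    return text

doc = write_handoff(Path(tempfile.gettempdir()) / "HANDOFF.md",
                    goal="Migrate auth module to async",
                    progress="login + refresh endpoints converted",
                    decisions=["Use async/await throughout"],
                    next_steps=["Convert logout endpoint", "Re-run load tests"])
print(doc.splitlines()[0])  # "# Session handoff"
```

The same content can be pasted into CLAUDE.md under a "Current work" heading, or fed as the opening prompt of the next session.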

Operating within the mechanics

These mechanics are not obstacles to work around but constraints to work within. The 39% multi-turn degradation, the U-shaped attention curve, the lossy compression: these are architectural realities of current language models.

Effective conversation management accepts these constraints:

  • State critical information at the beginning of sessions and restate before completion
  • Use manual compaction at natural breakpoints rather than waiting for system intervention
  • Treat mid-conversation context as provisional and verify important details before acting
  • Convert valuable discoveries from conversation to project context through CLAUDE.md updates
  • Reset proactively at 70-80% capacity rather than pushing toward theoretical limits

The conversation layer is where most interaction happens, but it is also where most context failures originate. Managing it deliberately rather than assuming it behaves like human memory distinguishes effective agentic development from the frustrating cycle of correction, repetition, and degradation that characterizes unmanaged sessions.
