Applied Intelligence
Module 4: Advanced Context Management

Tool-Specific Context Handling

Different philosophies, different mechanisms

Every AI coding agent faces the same constraint: finite context windows. Each tool solves this differently. Context management strategies that work in one tool often fail in another.

This page compares how Claude Code, Codex, Cursor, GitHub Copilot, and Aider handle context: their window sizes, their compaction approaches, and the trade-offs baked into each design.

Claude Code: summarization-based compaction

Claude Code provides the largest context windows among CLI-based coding agents.

Tier         Context Window     Availability
Standard     200,000 tokens     All users
Enterprise   500,000 tokens     Claude.ai Enterprise
Beta         1,000,000 tokens   Usage tier 4 organizations with beta header

The 1M token window became available in August 2025 for Claude Sonnet 4 and 4.5, requiring the beta header context-1m-2025-08-07 and premium pricing for requests exceeding 200,000 tokens.

Compaction mechanics

Claude Code uses general-purpose summarization for compaction. The model receives no special training for this task; it applies its standard summarization capabilities to the conversation history.

Manual compaction via /compact accepts focus instructions:

/compact Focus on the authentication implementation decisions

Auto-compaction triggers at configurable thresholds. The documented default is 95% capacity, though the VS Code extension triggers earlier (75-78%) to reserve headroom for the compaction process itself.

Configuration options:

# Override auto-compact threshold
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70

# Disable auto-compaction (not recommended)
# In settings.json:
{ "autoCompact": false }

Context awareness

Claude Sonnet 4.5 and Haiku 4.5 receive budget information directly. At conversation start, a budget tag appears:

<budget:token_budget>200000</budget:token_budget>

After each tool call, a usage warning updates:

<system_warning>Token usage: 35000/200000; 165000 remaining</system_warning>

This visibility enables smarter agent decisions about when to compress or split tasks.
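
A harness can act on these tags directly. The sketch below parses the usage warning shown above and applies an 80% rule of thumb; the tag format comes from this page, but the decision logic is a hypothetical example:

import re

warning = "<system_warning>Token usage: 35000/200000; 165000 remaining</system_warning>"

used, budget = map(int, re.search(r"Token usage: (\d+)/(\d+)", warning).groups())

if used / budget > 0.8:
    print("compress or split the task before continuing")
else:
    print(f"{budget - used} tokens of headroom remain")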

What distinguishes Claude Code

The combination of large windows and configurable compaction gives developers significant control. However, because summarization is general-purpose, compression quality depends heavily on how clearly information was structured during the conversation. Explicit decision markers and CLAUDE.md documentation survive compaction better than implicit reasoning.

Codex: native compaction training

Codex takes a different approach: the model itself learns how to compress.

Specification    Value
Input context    272,000 tokens
Output tokens    128,000 tokens
Total budget     400,000 tokens

Handoff summaries

Codex models are natively trained for compaction: they learn to write "handoff summaries" designed specifically for their future self. This differs from Claude Code's general-purpose summarization.

When compaction triggers, Codex:

  1. Analyzes the full conversation with a dedicated summarization prompt
  2. Generates a structured summary emphasizing current progress, key decisions, constraints, user preferences, and remaining tasks
  3. Reconstructs the session with initial context, recent messages (up to 20,000 tokens), and the generated summary

The summary includes context explaining that "another language model started to solve this problem" and to "build on the work already done." This framing helps the model understand its position in a continued workflow.
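
A rough sketch of the reconstruction step, with a hypothetical count_tokens heuristic standing in for real tokenization; the component ordering and budget-filling logic are assumptions, since Codex's internals are not public:

RECENT_BUDGET = 20_000  # the recent-message allowance described in step 3

def count_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: roughly 4 characters per token

def rebuild_session(initial_context: str, summary: str, messages: list[str]) -> list[str]:
    recent: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk backward from the newest message
        cost = count_tokens(msg)
        if used + cost > RECENT_BUDGET:
            break
        recent.append(msg)
        used += cost
    recent.reverse()
    # Assumed ordering: initial context, handoff summary, then recent messages.
    return [initial_context, summary, *recent]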

Configuration

Codex provides granular control through config.toml:

# Token threshold triggering automatic compaction
model_auto_compact_token_limit = 220000

# Custom compaction prompt
compact_prompt = "Focus on preserving API contracts and test requirements"

# Or load from external file
experimental_compact_prompt_file = "~/.codex/compact-prompt.md"

# Tool output token limit (per individual tool result)
tool_output_token_limit = 10000

The model_auto_compact_token_limit defaults to 180,000-244,000 tokens depending on the model variant.
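
A per-result cap like tool_output_token_limit implies that oversized tool output is truncated before it enters context. A hedged sketch of what such a cap could look like; the head-and-tail strategy is an assumption, not Codex's documented behavior:

def cap_tool_output(output: str, limit_tokens: int = 10_000) -> str:
    limit_chars = limit_tokens * 4  # rough 4-characters-per-token heuristic
    if len(output) <= limit_chars:
        return output
    half = limit_chars // 2
    # Keep the beginning and end, which usually carry the command and the result.
    return output[:half] + "\n...[truncated]...\n" + output[-half:]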

What distinguishes Codex

Native compaction training produces more actionable summaries: the model understands what its future self needs in order to continue working. Codex can operate for 24+ hours on complex tasks, compacting multiple times while maintaining coherence.

The trade-off is window size. Codex's 272,000-token input window is smaller than Claude Code's maximum. For single-session work that needs maximum raw capacity, Claude Code wins; for extended multi-session work, Codex's trained compaction preserves context better across resets.

Cursor: dynamic context discovery

Cursor sidesteps the compression problem: instead of compressing existing context, it minimizes what enters context in the first place.

The retrieval architecture

Cursor indexes entire projects into a vector store using embeddings that emphasize comments and docstrings. The system uses Turbopuffer, a specialized search engine for high-dimensional vector data.

When the agent needs information:

  1. Vector search identifies candidate code snippets
  2. An AI model re-ranks results by relevance
  3. Only the most relevant portions enter the active context

This Retrieval-Augmented Generation (RAG) approach means the agent never needs the full codebase in context simultaneously.
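
The sketch below walks the same three steps with toy stand-ins: a letter-frequency embedding instead of a neural model, and the re-ranking step omitted. Cursor's actual stack uses Turbopuffer and a learned re-ranking model:

def embed(text: str) -> list[float]:
    # Toy embedding: letter-frequency vector. Real systems use neural embeddings.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, snippets: list[str], top_k: int = 2) -> list[str]:
    q = embed(query)
    # Step 1: rank candidate snippets by vector similarity.
    ranked = sorted(snippets, key=lambda s: similarity(q, embed(s)), reverse=True)
    # Step 2 (omitted): an AI model would re-rank these candidates.
    # Step 3: only the top snippets enter the active context.
    return ranked[:top_k]

print(retrieve("refresh the auth token", [
    "def refresh_token(session): ...",
    "def render_chart(data): ...",
    "class AuthMiddleware: ...",
]))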

Dynamic context discovery

Cursor's January 2026 update introduced dynamic context discovery:

  • Large outputs (shell commands, tool results) are written to files instead of consuming context
  • The full history is saved to files, with only minimal summaries kept in active context
  • MCP tool definitions load on demand; agents initially receive only tool names

The result: 46.9% reduction in total agent tokens while maintaining access to full information.
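
The first pattern above is easy to sketch: spill oversized output to a file and keep only a stub in context. The threshold and stub format below are assumptions:

import tempfile

def spill_to_file(output: str, max_context_chars: int = 500) -> str:
    # Small outputs pass through; large ones are replaced by a path plus preview.
    if len(output) <= max_context_chars:
        return output
    with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as f:
        f.write(output)
        path = f.name
    return f"[{len(output)} chars written to {path}; preview follows]\n{output[:200]}"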

Parallel agents

Cursor supports running up to 8 agents in parallel on a single prompt. Each agent operates in an isolated copy of the codebase via git worktrees. After all agents finish, Cursor evaluates the runs and recommends the best solution.

Per-workspace limits: 20 worktrees maximum, with automatic cleanup based on last access time (default: 6 hours).
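
The isolation mechanism itself is plain git. A sketch of the setup step, using standard git worktree commands; the naming scheme is invented, and Cursor's actual orchestration is internal:

import subprocess

def create_agent_worktrees(repo_dir: str, n_agents: int) -> list[str]:
    paths = []
    for i in range(n_agents):
        path = f"../agent-{i}"
        # Each agent gets its own branch and its own working copy of the codebase.
        subprocess.run(
            ["git", "worktree", "add", "-b", f"agent-{i}", path],
            cwd=repo_dir, check=True,
        )
        paths.append(path)
    return paths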

What distinguishes Cursor

Cursor shines with large codebases where raw context capacity cannot hold everything relevant. The trade-off: retrieval quality becomes load-bearing. If the vector search misses relevant code, the agent works with incomplete information and has no way to know what it missed.

For codebases with good naming, comments, and structure, retrieval works well. For legacy code with poor documentation, retrieval misses important connections that a full-context approach would catch.

GitHub Copilot: memory-enhanced compaction

GitHub Copilot combines automatic compaction with a cross-agent memory system.

Compaction behavior

Copilot CLI triggers automatic compaction at 95% of the token limit. Compaction runs in the background without blocking the conversation. A warning appears when less than 20% of the limit remains.

The summarization uses SimpleSummarizedHistory, a text-based summary that preserves earlier exchanges while freeing token space. This enables "infinite sessions" through compaction checkpoints.

Manual controls:

/compact    # Manual compression
/context    # Visual breakdown of current token usage
/usage      # Session statistics including per-model token usage

The memory system

Copilot's January 2026 update introduced agentic memory: a cross-agent system where agents create and share memories about repositories.

When an agent discovers actionable insights, it invokes memory creation as a tool call:

Subject: Logging conventions
Fact: All API handlers use structured logging with request_id correlation
Citations: src/handlers/api.ts:45, src/middleware/logging.ts:12
Reason: Consistent logging enables distributed tracing in production

Memories are repository-scoped and validated at retrieval time: the system checks cited code locations for accuracy and detects contradictions with the current code.
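
Validation of this kind is straightforward to sketch: confirm each cited location still exists before trusting the memory. The citation format follows the example above; the validation rule itself is an assumption about how such a check could work:

import pathlib

def memory_is_valid(citations: list[str], repo_root: str = ".") -> bool:
    for citation in citations:
        file_part, _, line_part = citation.partition(":")
        path = pathlib.Path(repo_root) / file_part
        if not path.is_file():
            return False  # cited file was deleted or moved
        if line_part and int(line_part) > len(path.read_text().splitlines()):
            return False  # cited line number no longer exists
    return True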

Cross-agent sharing

Multiple Copilot agents access the same memory pool:

  • Code Review discovers patterns (logging conventions, synchronization requirements)
  • Coding Agent retrieves and applies patterns to implementations
  • CLI uses learned formats for debugging

GitHub's A/B testing showed a 7-percentage-point increase in PR merge rates (90% vs. 83%) and a 3-point precision increase for agents using memories.

What distinguishes Copilot

The memory system creates persistent context that survives across sessions and agents. One developer's agent learns from another's discoveries, because the memory pool is shared across the team.

The trade-off: memories require creation and maintenance. The system relies on agents recognizing what should be remembered, which does not always happen. Memory validation at retrieval time adds latency.

Aider: the repo map approach

Aider takes the minimalist path: no automatic compaction, no token limit enforcement, manual context control throughout.

The repo map

Instead of including full files, Aider generates a concise map of the entire git repository:

  • File list with key symbols defined in each file
  • Critical lines of code for each definition
  • Function, class, and variable signatures without implementations

The map uses tree-sitter to parse source code into Abstract Syntax Trees, identifying where definitions occur. A PageRank-style algorithm on the dependency graph ranks symbols by how frequently they are referenced across the codebase.
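
The ranking idea can be reproduced with networkx: build a graph where an edge from A to B means A references a symbol defined in B, then run PageRank. The edges below are invented for illustration; Aider derives the real graph from tree-sitter parses:

import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("cli.py", "auth.py"),     # cli.py calls functions defined in auth.py
    ("cli.py", "config.py"),
    ("auth.py", "config.py"),  # auth.py also references config symbols
])

ranks = nx.pagerank(g)
# Heavily referenced files (config.py here) rank highest, so their symbol
# signatures earn space in the repo map first.
print(sorted(ranks.items(), key=lambda kv: -kv[1]))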

Dynamic sizing

The repo map adjusts based on chat state:

aider --map-tokens 2000  # Set token budget for repo map

When no files are explicitly added to the chat, the map expands significantly, since Aider needs to understand the entire repo to identify relevant files. When files are added, the map shrinks to stay within budget.

Default: approximately 1,000 tokens for the repo map.
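
The sizing rule reduces to a simple branch. Only the ~1,000-token default comes from this page; the expansion multiplier below is an assumption:

def map_token_budget(files_in_chat: int, base: int = 1000) -> int:
    if files_in_chat == 0:
        return base * 8  # assumed multiplier: expand to survey the whole repo
    return base          # files are in the chat, so shrink back to the default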

Manual context control

Aider relies on explicit user management:

/add src/auth.py        # Add file for editing
/read src/constants.py  # Add file as read-only context
/drop src/auth.py       # Remove file from session
/clear                  # Remove conversation history

The .aiderignore file (following .gitignore syntax) excludes irrelevant repository parts. The --subtree-only switch restricts operations to a subdirectory.

What distinguishes Aider

Maximum control at the cost of manual management. The repo map efficiently represents codebase structure without consuming context on file contents.

The trade-off: no automatic compaction means you manage context yourself. Above approximately 25,000 tokens, Aider warns that models become distracted and less likely to follow system prompts. Add only the files that need editing; use /read for reference-only context.

Choosing based on workflow

Workflow                     Best Fit      Rationale
Single complex session       Claude Code   Largest context window, configurable compaction
Multi-day tasks              Codex         Native compaction training preserves continuity
Large codebase exploration   Cursor        RAG-based retrieval scales beyond context limits
Team collaboration           Copilot       Cross-agent memory sharing
Precise manual control       Aider         No automatic behavior, explicit management

Patterns that work in one tool do not transfer directly. A Claude Code workflow relying on large context windows fails in Aider. A Cursor workflow depending on retrieval quality struggles with poorly documented legacy code. A Copilot workflow expecting memory persistence loses that advantage in Claude Code.

The convergence trend

The tools are converging on similar conclusions despite different starting points:

  • Raw context window size matters less than effective context management
  • Summarization quality depends on explicit structure in the original content
  • Persistent storage (files, memories) compensates for volatile context
  • Retrieval mechanisms enable working beyond context limits

The next page examines persistent context with files, a technique that works across all of these tools.
