Tool-Specific Context Handling
Different philosophies, different mechanisms
Every AI coding agent faces the same constraint: finite context windows. Each tool solves this differently. Context management strategies that work in one tool often fail in another.
This page compares how Claude Code, Codex, Cursor, GitHub Copilot, and Aider handle context: their window sizes, compaction approaches, and the trade-offs baked into each design.
Claude Code: summarization-based compaction
Claude Code provides the largest context windows among CLI-based coding agents.
| Tier | Context Window | Availability |
|---|---|---|
| Standard | 200,000 tokens | All users |
| Enterprise | 500,000 tokens | Claude.ai Enterprise |
| Beta | 1,000,000 tokens | Usage tier 4 organizations with beta header |
The 1M token window became available in August 2025 for Claude Sonnet 4 and 4.5, requiring the beta header `context-1m-2025-08-07` and premium pricing for requests exceeding 200,000 tokens.
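For API access, the opt-in is a request header. A minimal sketch, assuming the Anthropic Python SDK and an illustrative Sonnet model ID; only the header name comes from the documentation above:

```python
# Hypothetical sketch: opting into the 1M-token beta window through the
# Anthropic Python SDK. The header name comes from the docs above; the
# model ID and exact call shape are assumptions and may differ by version.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=4096,
    messages=[{"role": "user", "content": "Summarize src/auth.py"}],
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # beta opt-in
)
print(response.content[0].text)
```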
Compaction mechanics
Claude Code uses general-purpose summarization for compaction. The model does not receive special training for this task; it applies standard summarization capabilities to conversation history.
Manual compaction via /compact accepts focus instructions:
```
/compact Focus on the authentication implementation decisions
```

Auto-compaction triggers at configurable thresholds. The documented default is 95% capacity, though the VS Code extension triggers earlier (75-78%) to reserve headroom for the compaction process itself.
Configuration options:
```
# Override auto-compact threshold
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70

# Disable auto-compaction (not recommended)
# In settings.json:
{ "autoCompact": false }
```

Context awareness
Claude Sonnet 4.5 and Haiku 4.5 receive budget information directly. At conversation start, a budget tag appears:
```
<budget:token_budget>200000</budget:token_budget>
```

After each tool call, a usage warning updates:

```
<system_warning>Token usage: 35000/200000; 165000 remaining</system_warning>
```

This visibility enables smarter agent decisions about when to compress or split tasks.
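A rough sketch of how an agent loop could act on these warnings; the tag format comes from the examples above, while the threshold and function are illustrative rather than Claude Code's actual logic:

```python
import re

# Sketch: reading the usage warning shown above to decide when to compact.
# The 80% threshold and the decision function are illustrative assumptions.
WARNING_RE = re.compile(r"Token usage: (\d+)/(\d+); (\d+) remaining")

def should_compact(system_warning: str, threshold: float = 0.80) -> bool:
    """Return True once usage crosses the given fraction of the budget."""
    match = WARNING_RE.search(system_warning)
    if not match:
        return False
    used, budget, _remaining = map(int, match.groups())
    return used / budget >= threshold

warning = "<system_warning>Token usage: 35000/200000; 165000 remaining</system_warning>"
print(should_compact(warning))  # False: only 17.5% of the budget is used
```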
What distinguishes Claude Code
The combination of large windows and configurable compaction gives developers significant control. However, general-purpose summarization means compression quality depends heavily on how clearly information was structured in the conversation. Explicit decision markers and CLAUDE.md documentation survive better than implicit reasoning.
Codex: native compaction training
Codex takes a different approach: the model itself learns how to compress.
| Specification | Value |
|---|---|
| Input context | 272,000 tokens |
| Output tokens | 128,000 tokens |
| Total budget | 400,000 tokens |
Handoff summaries
Codex models receive native training for compaction: they learn to write "handoff summaries" specifically designed for their future self. This differs from Claude Code's general-purpose summarization.
When compaction triggers, Codex:
- Analyzes the full conversation with a dedicated summarization prompt
- Generates a structured summary emphasizing current progress, key decisions, constraints, user preferences, and remaining tasks
- Reconstructs the session with initial context, recent messages (up to 20,000 tokens), and the generated summary
The summary includes context explaining that "another language model started to solve this problem" and to "build on the work already done." This framing helps the model understand its position in a continued workflow.
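A simplified sketch of that reconstruction step; the names, token accounting, and preamble handling are assumptions, not Codex internals:

```python
# Illustrative sketch of the reconstruction described above: keep the initial
# context, prepend the handoff summary, then replay recent messages from
# newest to oldest until a token budget (20,000 tokens above) is reached.
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str
    tokens: int

def rebuild_session(initial_context, handoff_summary, history, recent_budget=20_000):
    recent, used = [], 0
    for msg in reversed(history):          # walk backwards from the newest message
        if used + msg.tokens > recent_budget:
            break
        recent.append(msg)
        used += msg.tokens
    recent.reverse()                       # restore chronological order
    preamble = Message(
        role="system",
        content=(
            "Another language model started to solve this problem. "
            "Build on the work already done.\n\n" + handoff_summary
        ),
        tokens=0,  # simplification: preamble cost not counted here
    )
    return initial_context + [preamble] + recent
```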
Configuration
Codex provides granular control through config.toml:
```toml
# Token threshold triggering automatic compaction
model_auto_compact_token_limit = 220000

# Custom compaction prompt
compact_prompt = "Focus on preserving API contracts and test requirements"

# Or load from external file
experimental_compact_prompt_file = "~/.codex/compact-prompt.md"

# Tool output token limit (per individual tool result)
tool_output_token_limit = 10000
```

The `model_auto_compact_token_limit` defaults to 180,000-244,000 tokens depending on the model variant.
What distinguishes Codex
Native compaction training produces more actionable summaries: the model understands what its future self needs to continue working. Codex can operate for 24+ hours on complex tasks, compacting multiple times while maintaining coherence.
The trade-off is window size. Codex's 272,000-token input window is smaller than Claude Code's maximum. For single-session work within 200,000 tokens, Claude Code offers more raw capacity. For extended multi-session work, Codex's trained compaction preserves context better across resets.
Cursor: dynamic context discovery
Cursor sidesteps the compression problem: instead of compressing existing context, it minimizes what enters context in the first place.
The retrieval architecture
Cursor indexes entire projects into a vector store using embeddings that emphasize comments and docstrings. The system uses Turbopuffer, a specialized search engine for high-dimensional vector data.
When the agent needs information:
- Vector search identifies candidate code snippets
- An AI model re-ranks results by relevance
- Only the most relevant portions enter the active context
This Retrieval-Augmented Generation (RAG) approach means the agent never needs the full codebase in context simultaneously.
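In rough Python terms, the flow looks like this; the embedding, vector-store, and reranker objects are placeholders, since Cursor's pipeline (Turbopuffer plus its own models) is not a public API:

```python
# Sketch of the retrieve-then-rerank flow described above. All objects and
# parameter values here are illustrative placeholders, not Cursor internals.
def gather_context(query: str, vector_store, reranker, k_candidates=50, k_final=8):
    query_vector = vector_store.embed(query)                  # embed the request
    candidates = vector_store.search(query_vector, top_k=k_candidates)
    scores = reranker.score(query, [c.snippet for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    # Only the top-ranked snippets enter the active context window.
    return [candidate.snippet for candidate, _ in ranked[:k_final]]
```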
Dynamic context discovery
Cursor's January 2026 update introduced dynamic context discovery:
- Large outputs (shell commands, tool results) write to files instead of consuming context
- Full history saves to files, with minimal summaries in active context
- MCP tool definitions load on demand; agents receive only tool names initially
The result: 46.9% reduction in total agent tokens while maintaining access to full information.
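The first of those points can be sketched as spill-to-disk logic: large tool results go to a file and only a short stub stays in context. The paths, limits, and stub format here are illustrative, not Cursor's internals:

```python
# Sketch of the "large outputs go to files" idea described above.
import hashlib
from pathlib import Path

SPILL_DIR = Path(".agent/tool-output")
MAX_INLINE_CHARS = 2_000  # rough stand-in for a token budget

def register_tool_output(name: str, output: str) -> str:
    """Return what the agent should see in context for this tool result."""
    if len(output) <= MAX_INLINE_CHARS:
        return output                       # small results stay in context
    SPILL_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha1(output.encode()).hexdigest()[:8]
    path = SPILL_DIR / f"{name}-{digest}.log"
    path.write_text(output)
    head = output[:500]                     # keep a short preview inline
    return f"[output truncated: {len(output)} chars written to {path}]\n{head}"
```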
Parallel agents
Cursor supports running up to 8 agents in parallel on a single prompt. Each agent operates in an isolated copy of the codebase via git worktrees. After all agents finish, Cursor evaluates the runs and recommends the best solution.
Per-workspace limits: 20 worktrees maximum, with automatic cleanup based on last access time (default: 6 hours).
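The isolation mechanism itself is plain git. A sketch of one agent's lifecycle, where only the git worktree commands are real and the agent launch is a placeholder (cleanup here is immediate for simplicity; Cursor reclaims worktrees by last-access time):

```python
# Sketch: one isolated worktree per agent, each on its own branch.
import subprocess
from pathlib import Path

def run_isolated(agent_id: int, base_branch: str = "main") -> None:
    workdir = Path(f".worktrees/agent-{agent_id}")
    branch = f"agent/{agent_id}"
    subprocess.run(
        ["git", "worktree", "add", "-b", branch, str(workdir), base_branch],
        check=True,
    )
    try:
        # Placeholder for launching the agent inside its isolated copy.
        subprocess.run(["echo", f"agent {agent_id} running in {workdir}"], check=True)
    finally:
        subprocess.run(["git", "worktree", "remove", "--force", str(workdir)], check=True)
```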
What distinguishes Cursor
Cursor shines with large codebases where raw context capacity cannot hold everything relevant. The trade-off: retrieval quality becomes load-bearing. If the vector search misses relevant code, the agent works with incomplete information and has no way to know what it missed.
For codebases with good naming, comments, and structure, retrieval works well. For legacy code with poor documentation, retrieval misses important connections that a full-context approach would catch.
GitHub Copilot: memory-enhanced compaction
GitHub Copilot combines automatic compaction with a cross-agent memory system.
Compaction behavior
Copilot CLI triggers automatic compaction at 95% of the token limit. Compaction runs in the background without blocking the conversation. A warning appears when less than 20% of the limit remains.
The summarization uses `SimpleSummarizedHistory`, a text-based summary that preserves earlier exchanges while freeing token space.
This enables "infinite sessions" through compaction checkpoints.
Manual controls:
```
/compact # Manual compression
/context # Visual breakdown of current token usage
/usage   # Session statistics including per-model token usage
```

The memory system
Copilot's January 2026 update introduced agentic memory: a cross-agent system where agents create and share memories about repositories.
When an agent discovers actionable insights, it invokes memory creation as a tool call:
```
Subject: Logging conventions
Fact: All API handlers use structured logging with request_id correlation
Citations: src/handlers/api.ts:45, src/middleware/logging.ts:12
Reason: Consistent logging enables distributed tracing in production
```

Memories are repository-scoped and validated at retrieval time: the system checks cited code locations for accuracy and detects contradictions with current code.
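A minimal sketch of a repository-scoped memory with retrieval-time validation, following the Subject/Fact/Citations/Reason shape above; the dataclass and the validation rule are assumptions, and this version only checks that cited locations still exist rather than detecting contradictions:

```python
# Sketch of a memory record plus a simple retrieval-time check.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Memory:
    subject: str
    fact: str
    citations: list[str]   # "path:line" references into the repository
    reason: str

def is_still_valid(memory: Memory, repo_root: Path) -> bool:
    """Keep the memory only if every cited file and line still exists."""
    for citation in memory.citations:
        path, _, line = citation.partition(":")
        file = repo_root / path
        if not file.exists():
            return False
        if line and int(line) > len(file.read_text().splitlines()):
            return False
    return True
```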
Cross-agent sharing
Multiple Copilot agents access the same memory pool:
- Code Review discovers patterns (logging conventions, synchronization requirements)
- Coding Agent retrieves and applies patterns to implementations
- CLI uses learned formats for debugging
GitHub's A/B testing showed a 7-percentage-point increase in PR merge rates (90% vs. 83%) and a 3% precision increase for agents using memories.
What distinguishes Copilot
The memory system creates persistent context that survives across sessions and agents. One developer's agent learns from another's discoveries; the memory pool is shared across the team.
The trade-off: memories require creation and maintenance. The system relies on agents recognizing what should be remembered, which does not always happen. Memory validation at retrieval time adds latency.
Aider: the repo map approach
Aider takes the minimalist path: no automatic compaction, no token limit enforcement, manual context control throughout.
The repo map
Instead of including full files, Aider generates a concise map of the entire git repository:
- File list with key symbols defined in each file
- Critical lines of code for each definition
- Function, class, and variable signatures without implementations
The map uses tree-sitter to parse source code into Abstract Syntax Trees, identifying where definitions occur. A PageRank-style algorithm on the dependency graph ranks symbols by how frequently they are referenced across the codebase.
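A toy version of that ranking idea, assuming the definition/reference pairs have already been extracted from the ASTs; this is not Aider's actual implementation:

```python
# Build a graph where files that reference a symbol point at the file that
# defines it, then rank files with PageRank. Edge data is hand-written here;
# in practice it would come from tree-sitter parsing.
import networkx as nx

# (referencing_file, defining_file, symbol) tuples - illustrative only
references = [
    ("src/api.py", "src/auth.py", "verify_token"),
    ("src/cli.py", "src/auth.py", "verify_token"),
    ("src/auth.py", "src/models.py", "User"),
]

graph = nx.DiGraph()
for src, dst, symbol in references:
    graph.add_edge(src, dst, symbol=symbol)

ranks = nx.pagerank(graph)
for file, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {file}")  # higher-ranked files get more map budget
```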
Dynamic sizing
The repo map adjusts based on chat state:
```
aider --map-tokens 2000 # Set token budget for repo map
```

When no files are explicitly added to the chat, the map expands significantly; Aider needs to understand the entire repo to identify relevant files. When files are added, the map shrinks to stay within budget.
Default: approximately 1,000 tokens for the repo map.
Manual context control
Aider relies on explicit user management:
```
/add src/auth.py        # Add file for editing
/read src/constants.py  # Add file as read-only context
/drop src/auth.py       # Remove file from session
/clear                  # Remove conversation history
```

The `.aiderignore` file (following `.gitignore` syntax) excludes irrelevant repository parts.
The `--subtree-only` switch restricts operations to a subdirectory.
What distinguishes Aider
Maximum control at the cost of manual management. The repo map efficiently represents codebase structure without consuming context on file contents.
The trade-off: with no automatic compaction, you manage context yourself. Above approximately 25,000 tokens, Aider warns that models become distracted and less likely to follow system prompts. Add only files that need editing; use /read for files needed only as context.
Choosing based on workflow
| Workflow | Best Fit | Rationale |
|---|---|---|
| Single complex session | Claude Code | Largest context window, configurable compaction |
| Multi-day tasks | Codex | Native compaction training preserves continuity |
| Large codebase exploration | Cursor | RAG-based retrieval scales beyond context limits |
| Team collaboration | Copilot | Cross-agent memory sharing |
| Precise manual control | Aider | No automatic behavior, explicit management |
Patterns that work in one tool do not transfer directly. A Claude Code workflow relying on large context windows fails in Aider. A Cursor workflow depending on retrieval quality struggles with poorly documented legacy code. A Copilot workflow expecting memory persistence loses that advantage in Claude Code.
The convergence trend
The tools are converging on similar conclusions despite different starting points:
- Raw context window size matters less than effective context management
- Summarization quality depends on explicit structure in the original content
- Persistent storage (files, memories) compensates for volatile context
- Retrieval mechanisms enable working beyond context limits
The next page examines persistent context with files, a technique that works across all these tools.