Applied Intelligence
Module 4: Advanced Context Management

Context Compression Techniques

When context must shrink

Context does not just degrade over time; it also hits capacity. When that happens, something has to go.

Every AI coding agent compresses context somehow. Some trigger compaction automatically; others wait for manual intervention. Knowing what survives compression versus what disappears determines whether you lose an hour of work or keep rolling.

Auto-compaction mechanics

Claude Code documentation says auto-compaction triggers at 95% capacity. In practice, behavior varies by interface. The VS Code extension triggers closer to 75-78%, reserving headroom for the compaction process itself. The CLI operates nearer the documented 95%.

You can override the trigger threshold:

CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50  # Triggers at 50% instead of default

Or in settings.json:

{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}

Disabling auto-compaction entirely is possible but not recommended:

{
  "autoCompact": false
}

Codex works differently. Auto-compaction triggers based on model_auto_compact_token_limit, which defaults to 180,000-244,000 tokens depending on the model. The Codex model was trained specifically for this: it writes a "handoff summary" for its future self, preserving enough to continue after context resets.
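
To trigger compaction earlier, the limit can be lowered in Codex's configuration. A minimal sketch with an illustrative value, assuming the standard config location:

# ~/.codex/config.toml
model_auto_compact_token_limit = 150000   # compact earlier than the model default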

What happens during auto-compaction

When Claude Code auto-compacts:

  1. Workflow pauses before the next API call
  2. A summary request injects as a user message
  3. Claude generates a summary wrapped in <summary></summary> tags
  4. The entire conversation history clears, replaced with just the summary
  5. Processing resumes with compressed context

This happens without asking. One moment the full conversation exists; the next, a condensed version replaces it. Since version 2.0.64, compaction executes instantly with no wait time.
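
A minimal sketch of that flow in Python, with illustrative names rather than Claude Code's actual internals:

def call_model(messages):
    """Placeholder for the API call that returns the model's text reply."""
    raise NotImplementedError

def maybe_compact(history, used_tokens, capacity, threshold=0.95):
    """Replace the conversation with a summary once usage crosses the threshold."""
    if used_tokens < threshold * capacity:
        return history  # Below threshold: nothing changes
    # Steps 1-2: pause and inject a summary request as a user message
    request = history + [{
        "role": "user",
        "content": "Summarize this conversation inside <summary></summary> tags.",
    }]
    # Step 3: the model generates the summary
    summary = call_model(request)
    # Steps 4-5: the entire history is replaced; processing resumes from here
    return [{"role": "user", "content": summary}]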

Manual compression: /compact versus /clear

Two commands give direct control over context reduction.

/clear

The nuclear option. /clear removes all conversation history. Nothing survives except project-level context (CLAUDE.md, file system).

Use /clear when:

  • Switching to completely unrelated work
  • Context has been polluted with incorrect assumptions
  • A handoff summary provides enough context to start fresh

Recent updates (v2.1.3) ensure /clear properly resets plan state. Earlier versions left plan files intact after clearing, which caused confusion.

/compact

The surgical approach. /compact reduces conversation size while preserving what matters. Unlike /clear, it maintains continuity.

/compact                           # Default compaction
/compact Focus on authentication   # Prioritize auth-related context

Focus instructions influence what survives:

/compact preserve: 1) WebSocket decision for real-time, 2) API contract details

This does not guarantee preservation; it guides the summarization process. Decisions with explicit preservation instructions survive more reliably than incidental context.

Choosing between them

Situation                            Command               Rationale
Finishing task, starting new one     /clear                Clean slate prevents cross-task confusion
Long session, same task              /compact              Preserve relevant context, reduce noise
Context poisoned by errors           /clear                Remove incorrect assumptions entirely
Complex multi-phase work             /compact with focus   Retain decisions while reducing detail
Session approaching limit mid-task   /compact              Continue work without starting over

The 70% rule from earlier applies: proactive /compact at 70% capacity beats automatic compaction at 95%. Manual intervention lets you direct what survives.

What survives compression

Compaction summarizes rather than deletes. Certain categories survive more reliably than others.

High survival probability

Recent exchanges survive nearly intact; the last 2-3 turns typically make it through in raw or near-raw form.

Architecture decisions persist well. "We chose PostgreSQL over MongoDB for consistency guarantees" sticks around because it was stated explicitly with reasoning.

Current task objectives remain. Without them, the agent cannot continue.

Established patterns survive because they shape ongoing work. "All API endpoints follow REST conventions with snake_case naming" keeps influencing code generation.

Completion status persists. The fact that something finished matters more than how it finished.

Active errors survive because they need attention. Resolved errors often disappear.

Low survival probability

Intermediate reasoning compresses to conclusions. Step-by-step explanations for decisions become just the decisions; the "why" often disappears.

Full file contents become summaries. When files have been read into context, compression retains the fact that they were read, possibly with highlights, but not their complete contents. File paths may or may not survive, depending on the tool.

Exploratory discussions compress aggressively. Back-and-forth ideation that did not produce code changes rarely makes it.

Nuanced conditional rules become vague. "If X and Y but not Z, then W" becomes "conditional logic exists."

Implicit connections disappear. Relationships understood but never stated explicitly do not survive.

Debugging sessions become outcomes. The specific error messages, attempted fixes, and iteration steps vanish.

The compression trade-off

Anthropic's research on context editing provides concrete numbers. In a 100-turn web search evaluation, context editing enabled workflows that would otherwise fail from context overflow. Token consumption dropped 84% while task performance improved 39%.

But those gains cost something.

Consider a session where you established this constraint:

"When modifying authentication, never change the JWT signing algorithm because downstream services validate signatures with hardcoded expectations."

After compaction, this might become:

"Authentication modification constraints established."

The constraint technically survives. The specific rationale and downstream implications do not. A future prompt asking "can I switch from HS256 to RS256?" may get a yes without the context about why that breaks everything.

Influencing compression outcomes

Several techniques improve the odds that critical information survives.

State constraints in CLAUDE.md

Project-level documentation survives all compaction because it is loaded fresh on every interaction. Constraints critical enough to preserve should live in CLAUDE.md rather than relying on conversation memory.

# Authentication constraints

Never modify JWT signing algorithm (HS256).
Downstream services validate signatures with hardcoded algorithm expectations.
Breaking this constraint causes silent authentication failures in production.

Use explicit markers

During conversation, frame important decisions clearly:

DECISION: Using WebSocket for real-time updates.
RATIONALE: Polling created unacceptable latency for trading data.
CONSTRAINT: Must maintain backward compatibility with REST fallback.

Explicit structure signals importance. Unmarked statements are compressed more aggressively than marked ones.

Compact proactively with focus

The 70% rule exists to preserve control over what survives. When compacting manually, you can specify preservation priorities:

/compact preserve the authentication constraint regarding JWT signing

Automatic compaction at 95% does not accept focus instructions. It compresses based on recency and implicit importance only.

Create handoff summaries before compaction

Before triggering /compact on a complex session, state what must survive:

"Before compacting: we decided on WebSocket for real-time, JWT auth is immutable, and the API contract at /v2/trades is frozen. All three must survive."

Stating requirements immediately before compaction places them in the high-attention end position and explicitly marks them for retention.

Tool-specific variations

Different tools handle compression differently.

Claude Code uses general-purpose summarization. The model receives no special training for compaction; it relies on standard summarization capabilities. Compression quality therefore depends on how clearly the conversation structured its information.

Codex models receive native training for compaction. The model learns to write summaries specifically designed for future-self continuation. Handoff summaries from Codex preserve more actionable detail because the model understands what the summary is for.

Cursor takes a different approach: dynamic context discovery. Rather than compressing existing context, Cursor writes large outputs to files and retrieves them on demand. The full history is saved to files, with only minimal summaries kept in active context. This reduced token usage by 46.9% while preserving access to the full information.
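
A rough sketch of that offload pattern in Python (illustrative only, not Cursor's implementation): save the full output to disk and keep only a short pointer in active context.

from pathlib import Path

def offload(output: str, name: str, preview_chars: int = 120) -> str:
    """Save a large tool output to disk; return a short pointer for active context."""
    path = Path(".agent_outputs") / f"{name}.txt"
    path.parent.mkdir(exist_ok=True)
    path.write_text(output)
    preview = output[:preview_chars].replace("\n", " ")
    return f"[{len(output)} chars saved to {path}] preview: {preview}..."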

Amp (from Sourcegraph) rejected compaction entirely. OpenAI research showed that recursive summaries degraded performance: summaries of summaries distort earlier reasoning. Amp implemented handoffs instead: users create new conversation threads with explicit goals, preserving the original context while enabling fresh focus.

The compression mindset

Long sessions compress. This is not a bug to avoid but a constraint to work within. The question is not whether information disappears, but which information.

Structuring sessions with compression in mind means:

  • Stating decisions explicitly when made
  • Moving critical constraints to project documentation
  • Compacting proactively rather than reactively
  • Accepting that some context will disappear

The next page compares how different tools handle context internally across Claude Code, Codex, Cursor, Copilot, and Aider.
