Context Windows in Practice
The 200k token reality and why bigger windows don't equal better understanding
Each file read, each search result returned, each directory listed all consume tokens from a finite budget called the context window. Understanding how context windows work in practice is essential for effective ASD.
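To build intuition for how quickly that budget goes, here is a minimal sketch using the common ~4 characters per token heuristic (an approximation; real tokenizers vary, and the `src` path is just an example):

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # rough heuristic; real tokenizers vary
CONTEXT_WINDOW = 200_000     # tokens

def estimate_tokens(path: Path) -> int:
    """Approximate token cost of reading one file into context."""
    return len(path.read_text(errors="ignore")) // CHARS_PER_TOKEN

def budget_report(paths: list[Path]) -> None:
    used = sum(estimate_tokens(p) for p in paths)
    pct = 100 * used / CONTEXT_WINDOW
    print(f"~{used:,} tokens ({pct:.1f}% of a {CONTEXT_WINDOW:,}-token window)")

budget_report(list(Path("src").rglob("*.py")))   # whole-directory reads add up fast
```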
The 200k token reality
| Tool | Context Window | Notes |
|---|---|---|
| Claude Code | ~200,000 tokens | Enterprise: 500k; Beta: 1M |
| Codex | ~192,000 tokens | |
200,000 tokens represents approximately 150,000 words or 400-500 pages of prose. The scale invites a reasonable assumption: larger context windows should enable better understanding of complex codebases.
This assumption is incorrect. Context windows don't function like computer memory. They function more like human working memory: effective up to a point, then subject to interference, confusion, and degradation.
The lost-in-the-middle effect
Research led by Stanford ("Lost in the Middle: How Language Models Use Long Contexts," Liu et al., 2024) revealed a distinctive U-shaped performance curve:
- **Primacy bias:** Models perform best when relevant information appears at the beginning of the context.
- **Recency bias:** Models also perform well with information at the end of the context.
- **Middle degradation:** Performance drops by 15-20% when critical information is buried in the middle.
In tests, GPT-3.5-Turbo's accuracy with information placed mid-context dropped below its closed-book performance. The model performed worse with the documents than without them.
This phenomenon stems from transformer attention mechanisms. Rotary Position Embedding (RoPE) introduces decay effects that cause models to prioritize tokens at sequence boundaries.
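One practical mitigation is to reorder retrieved material so the highest-relevance items sit at the edges of the prompt rather than the middle. A minimal sketch, assuming relevance scores already come from some retriever:

```python
def reorder_for_edges(docs: list[str], scores: list[float]) -> list[str]:
    """Place the most relevant documents at the start and end of the
    context, pushing the least relevant toward the middle, to work
    with the U-shaped attention curve rather than against it."""
    ranked = [d for d, _ in sorted(zip(docs, scores), key=lambda p: -p[1])]
    front, back = [], []
    for i, doc in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]   # best at the front, second-best at the end, worst mid-context
```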
Why bigger windows ≠ better performance
- **Signal dilution:** More files means more noise. The critical function signature becomes one needle among thousands of lines.
- **Quadratic attention costs:** Self-attention computation grows with the square of token count, creating practical ceilings.
- **Working memory bottlenecks:** The model can hold tokens but cannot effectively reason across them beyond a certain point.
Stay within 80% of the practical limit. For a 200k window, treat ~160k as the ceiling; optimal performance often occurs well below that.
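As a minimal sketch, the 80% rule reduces to a one-line guard an agent loop could run before each step (the figures are this section's heuristics, not any tool's documented internals):

```python
COMPACT_THRESHOLD = 0.80        # the 80% heuristic from above

def should_compact(used_tokens: int, window: int = 200_000) -> bool:
    """True once usage crosses the practical ceiling (~160k for a 200k window)."""
    return used_tokens >= window * COMPACT_THRESHOLD

assert should_compact(165_000)       # past the ~160k ceiling: time to compact
assert not should_compact(120_000)   # still comfortable headroom
```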
When context windows fill
Claude Code manages context automatically through auto-compaction. When utilization reaches ~75-95%, the system:
- Analyzes the conversation
- Identifies key information worth preserving
- Creates a summary of previous interactions
- Replaces old messages with that summary
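Conceptually, the mechanism reduces to a few lines. This is a sketch of the idea, not Claude Code's actual implementation; `summarize` stands in for whichever model call produces the summary:

```python
from typing import Callable

def auto_compact(messages: list[str], used: int, window: int,
                 summarize: Callable[[list[str]], str]) -> list[str]:
    """Naive sketch of auto-compaction: once utilization crosses the
    trigger band, fold older messages into a summary and keep the tail."""
    keep_recent = 10                            # arbitrary: latest turns stay verbatim
    if used / window < 0.75 or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)                    # "key information worth preserving"
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```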
Manual context management
```
/compact preserve the authentication patterns we established
/compact summarize only the API endpoints we discussed
/status   # Check current context usage
/clear    # Complete reset for unrelated tasks
```

Auto-compaction can lose specific details even while preserving general patterns. Corrections made earlier may not survive, causing the agent to repeat mistakes.
Practical context management strategies
**Front-load critical info.** CLAUDE.md and AGENTS.md load first, occupying a privileged position. The middle of a long conversation is the worst place for important guidance.
**Chunk, don't dump.** Loading 10,000 lines when one function is relevant wastes capacity. Read specific sections with line offsets, as in the sketch below.
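A minimal sketch of that offset/limit pattern (the path and numbers are hypothetical):

```python
def read_slice(path: str, offset: int, limit: int = 100) -> str:
    """Return only lines [offset, offset + limit) instead of the whole file."""
    with open(path, errors="ignore") as f:
        lines = f.readlines()
    return "".join(lines[offset : offset + limit])

# Pull just the function under discussion, not all 10,000 lines
snippet = read_slice("src/billing/invoices.py", offset=420, limit=60)
```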
**Reset at boundaries.** Context from debugging auth issues becomes noise when implementing payments. Reset between unrelated tasks.
**Use sub-agents.** The Explore agent conducts investigations in isolated context, returning only distilled findings; the sketch below shows the shape of that isolation.
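A rough sketch of the idea, with `llm` standing in for a hypothetical model call; the point is that the scratch context never reaches the parent conversation:

```python
from typing import Callable

def explore(question: str, llm: Callable[[list[dict]], str]) -> str:
    """Run an investigation in an isolated scratch context. The
    sub-agent's file reads and search results accumulate in `scratch`
    and are discarded; only the distilled answer comes back."""
    scratch: list[dict] = [{"role": "user", "content": question}]
    # ... the sub-agent loop appends tool results to `scratch` ...
    findings = llm(scratch)          # hypothetical model call
    return findings                  # the parent pays only for this string
```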
The paradox of long contexts
Enterprise codebases routinely exceed one million tokens. A monorepo with thousands of files cannot fit in any current context window.
The solution isn't waiting for larger windows. The solution is treating context as a scarce resource requiring deliberate engineering: selecting the minimal set of high-signal tokens that maximize the likelihood of the desired outcome.
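That selection step can be made concrete as a greedy packing problem. A minimal sketch, assuming relevance scores and token counts come from a retriever and a tokenizer:

```python
def pack_context(snippets: list[tuple[str, float, int]],
                 budget: int = 50_000) -> list[str]:
    """Greedy packing of (text, relevance_score, token_count) tuples:
    take the best signal-per-token first until the budget is spent."""
    ranked = sorted(snippets, key=lambda s: s[1] / max(s[2], 1), reverse=True)
    chosen, used = [], 0
    for text, _score, tokens in ranked:
        if used + tokens <= budget:
            chosen.append(text)
            used += tokens
    return chosen
```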
This discipline, context engineering, represents a fundamental skill in ASD. A well-managed 50,000-token context often outperforms a carelessly assembled 200,000-token context.