Summarization and Continuation Patterns
Beyond simple summaries
The previous page introduced the thread fold technique: asking an agent to summarize a session from the perspective of someone continuing the work. This framing produces better results than generic summary requests, but the real challenge lies deeper.
How do you compress thousands of tokens into hundreds of actionable tokens? What should survive compression? The patterns that follow transform session continuation from art into engineering.
The Manus hybrid approach
Manus, an agentic system serving millions of users, developed a compression strategy that balances detail with efficiency. The approach follows a priority hierarchy: raw content takes precedence over compacted content, which takes precedence over summarized content.
The implementation works as follows:
Keep recent turns raw. The last three turns remain in full, uncompressed format. This preserves the "rhythm" of the conversation: the formatting patterns, the established terminology, the implicit agreements about how to communicate. Compressing recent turns loses this rhythm and causes the agent to shift its response style unexpectedly.
Summarize older turns. When context exceeds a threshold (approximately 128,000 tokens), the oldest 20 turns get summarized into a structured JSON format. The summary captures decisions made, files modified, and progress achieved, but discards the conversational back-and-forth that led there.
Use structured summaries, not narrative. The summary format matters. A narrative summary ("We discussed authentication and decided to use JWT...") requires the agent to parse prose and extract relevant facts. A structured summary provides direct access:
{
  "decisions": [
    "Authentication uses JWT with refresh tokens",
    "Token storage in httpOnly cookies, not localStorage"
  ],
  "files_modified": [
    "src/auth/jwt.ts",
    "src/middleware/auth.ts"
  ],
  "current_state": "Implementing refresh token rotation"
}

This hybrid approach reflects a fundamental insight: recent context needs precision, while older context needs only conclusions. The journey to a decision matters less than the decision itself.
Restorable compression
A second principle from production systems: compression should be reversible whenever possible.
Consider what happens when an agent reads a web page or a large file. The content enters context and begins consuming tokens. Hours later, that content sits in context, occupying space while providing diminishing value. The obvious solution is deletion, but deletion is permanent.
Restorable compression replaces content with references to content:
| Original | Compressed |
|---|---|
| Full web page HTML (3,000 tokens) | URL only (20 tokens) |
| Complete file contents (500 tokens) | Path only (10 tokens) |
| Query results (200 tokens) | Query text (30 tokens) |
The key insight: information remains recoverable without staying in context. If the agent needs that web page content again, it can fetch it. If the file contents become relevant, a re-read retrieves them. The reference preserves the ability to restore; the deletion recovers the space.
This approach requires distinguishing between:
Essential context: Information that cannot be reconstructed. Architectural decisions, user requirements, discovered constraints: these must persist.
Recoverable context: Information that can be fetched again. File contents, API responses, search results, documentation: these can be replaced with references.
Disposable context: Information that served its purpose and is no longer relevant. Failed debugging attempts, superseded approaches, exploratory tangents: these can be removed entirely.
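A minimal sketch of the reference-swap step, assuming a simple tagged-dict shape for context items (the `kind` labels and item fields are illustrative, not from any particular framework):

```python
def to_reference(item: dict) -> dict:
    """Swap bulky recoverable content for a reference that can restore it later."""
    if item["kind"] == "web_page":
        return {"kind": "web_page_ref", "url": item["url"]}       # ~20 tokens vs ~3,000
    if item["kind"] == "file":
        return {"kind": "file_ref", "path": item["path"]}         # ~10 tokens vs ~500
    if item["kind"] == "query_result":
        return {"kind": "query_ref", "query": item["query"]}      # ~30 tokens vs ~200
    return item  # essential context passes through untouched
```

The function is lossy on purpose: the agent can re-fetch the URL, re-read the path, or re-run the query whenever the content becomes relevant again.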
Placeholder compression
Silent deletion creates problems: when the agent cannot find expected context, it may confabulate. Placeholder compression provides a middle path: remove the content but leave actionable guidance.
| Tool result | Placeholder |
|---|---|
| File read: [500 lines] | [Re-read if needed: /src/auth/jwt.ts] |
| Query: [results] | [Re-run for fresh data: SELECT * FROM users] |
| Search: [25 files] | [Re-search if needed: "auth middleware"] |
The placeholder acknowledges that information was present, provides recovery instructions, and preserves the signal that an operation occurred.
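A sketch of how the placeholders in the table might be generated. The template strings match the table above; the entry shape and tool names are assumptions for the example:

```python
PLACEHOLDER_TEMPLATES = {
    "read_file": "[Re-read if needed: {arg}]",
    "run_query": "[Re-run for fresh data: {arg}]",
    "search": '[Re-search if needed: "{arg}"]',
}

def compact_result(entry: dict) -> dict:
    """Keep the signal that the call happened; replace its payload with recovery instructions."""
    placeholder = PLACEHOLDER_TEMPLATES[entry["tool"]].format(arg=entry["arg"])
    return {**entry, "result": placeholder}
```

Unlike silent deletion, the compacted entry still records that the tool ran and how to get the data back, so the agent has no gap to fill with confabulation.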
Structured handoff summaries
When context must transfer between sessions (through a reset, a tool switch, or a handoff to another developer), the structure of the summary determines how much value survives.
Unstructured summaries lose information:
"We worked on authentication today. Made good progress on JWT implementation. Some tests are still failing. The code is in the auth directory."
This summary requires the next session to ask follow-up questions, re-discover the architecture, and potentially repeat work that was already completed.
Structured summaries preserve actionable information:
## Session Handoff - Authentication Implementation
### Goal
Implement JWT-based authentication with refresh token rotation.
### Completed
- [x] JWT signing and verification (`/src/auth/jwt.ts`)
- [x] Login endpoint (`/src/routes/auth.ts:24-67`)
### In Progress
- [ ] Refresh token rotation on use
### Key Decisions
- RS256 algorithm for JWT signing
- Refresh tokens in Redis with 7-day TTL
### Next Steps
1. Fix Redis connection pool in tests
2. Implement token rotation

Explicit sections prevent the information loss that occurs when models generate freeform summaries. Research from Factory.ai showed that structured approaches score significantly higher on compression quality than narrative summaries.
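One way to enforce the explicit sections is to render the handoff from named fields rather than asking a model for freeform prose, so no section can be silently omitted. This helper is a hypothetical sketch mirroring the template above:

```python
def render_handoff(title: str, goal: str, completed: list[str],
                   in_progress: list[str], decisions: list[str],
                   next_steps: list[str]) -> str:
    """Render a structured handoff summary with every required section present."""
    sections = [
        (f"## Session Handoff - {title}", []),
        ("### Goal", [goal]),
        ("### Completed", [f"- [x] {c}" for c in completed]),
        ("### In Progress", [f"- [ ] {p}" for p in in_progress]),
        ("### Key Decisions", [f"- {d}" for d in decisions]),
        ("### Next Steps", [f"{i}. {s}" for i, s in enumerate(next_steps, 1)]),
    ]
    lines = []
    for header, body in sections:
        lines.append(header)
        lines.extend(body)
    return "\n".join(lines)
```

The model (or developer) fills in the fields; the template guarantees the shape.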
The anchored iterative approach
A common failure mode: re-summarizing already summarized content. When compression triggers repeatedly, naive approaches re-summarize everything from scratch. Each iteration introduces information loss.
Anchored iterative summarization compresses only newly added content and merges it with existing summaries:
Turn 1-20: [Summary A]
Turn 21-40: [Summary A] + summarize(Turn 21-40) → [Summary B]
Turn 41-60: [Summary B] + summarize(Turn 41-60) → [Summary C]

The anchor remains stable; only new content gets compressed and merged. This prevents recursive degradation where summaries of summaries accumulate. Research showed this approach scored 4.04/5.0 on accuracy compared to 3.43/5.0 for full regeneration.
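The update loop above reduces to a single rule: compress only the delta, then append it to the anchor. A minimal sketch, where `summarize` stands in for a model call (the stub below is an assumption for illustration):

```python
def summarize(turns: list[str]) -> str:
    # Placeholder for an LLM summarization call over just the new turns.
    return f"summary({len(turns)} turns)"

def anchored_update(anchor: str, new_turns: list[str]) -> str:
    """Compress only the new turns, then merge with the stable anchor."""
    delta = summarize(new_turns)
    return f"{anchor} + {delta}" if anchor else delta
```

Because the anchor is never re-summarized, earlier compressions are not compounded: each span of the conversation passes through the lossy step exactly once.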
What survives compression
Understanding what agents prioritize during compression helps predict what will be lost.
High survival probability:
- Recent tool calls and their results (last 3-5 turns)
- Explicit decisions stated clearly
- Current file being edited
- Error messages from failing tests
Low survival probability:
- File paths mentioned in passing
- Nuanced conditional rules
- Information stated early in long sessions
- Implicit connections between decisions
This asymmetry suggests where to invest effort. Critical decisions should be restated explicitly and recently rather than relying on early-session mentions. Important file paths belong in structured handoffs rather than trusted to survive compression.
The continuation mindset
Effective summarization requires thinking ahead. During a session, consider what a future session would need to continue the work.
- State decisions explicitly when made, not implicitly through action
- Prefer structured formats over narrative explanation
- Update project documentation incrementally rather than at session end
- Create checkpoints at natural boundaries, not just at capacity limits
The goal is not perfect preservation but productive continuation. A well-structured handoff summary of 500 tokens enables better continuation than 10,000 tokens of unstructured conversation history.
Quality of preserved context matters more than quantity. The pages that follow examine context window mechanics and optimization strategies that build on these summarization patterns.