Checkpoint, Validation, and Error Recovery
Automatic checkpointing
The previous page covered verification loops and validation gates as patterns. Now for the infrastructure that makes error recovery practical: automatic checkpoints, rewind capabilities, and generator-critic patterns that catch mistakes before they spread.
Claude Code creates checkpoints automatically. Every user prompt that results in a file edit triggers a checkpoint capturing the state before that edit. No explicit commands or configuration are required. The checkpointing system tracks all file modifications made through Claude's editing tools: Write, Edit, and NotebookEdit.
Checkpoints persist across sessions. Stored in SQLite within the `~/.claude` directory, they survive terminal closure and system restarts. The default retention period is 30 days, configurable via the `cleanupPeriodDays` setting. This creates a rolling safety net that covers recent work without unbounded storage growth.
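For example, a longer retention window could be set in the settings file (assuming `~/.claude/settings.json` and the `cleanupPeriodDays` key named above):

```json
{
  "cleanupPeriodDays": 60
}
```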
What checkpoints capture
The system captures file state at edit boundaries:
- Content of files before modification
- New files before creation
- Files before deletion (via edit tools)
Each checkpoint associates with the conversation turn that triggered it.
The interface displays checkpoints as a timeline of your prompts paired with their file impacts, shown in Git-style diff notation: `auth.ts +15 -3`.
What checkpoints miss
Not all changes flow through Claude's editing tools. Checkpoints do not capture:
- Bash command side effects (`rm`, `mv`, `cp`, shell redirects)
- Manual edits made in external editors
- Changes from concurrent Claude sessions
- External operations like `git push` or database modifications
This reflects architectural boundaries. Claude observes edits through its own tools but cannot intercept arbitrary file system operations. For operations outside the checkpointing system, Git remains the authoritative history.
Tip: When directing agents to move or delete files, prefer the Edit tool's capabilities over Bash commands when possible. This keeps operations within the checkpointing system.
The rewind feature
Checkpoints become useful through rewind. Access it three ways:
```shell
# Press Escape twice for instant access
Esc + Esc

# Or use the slash command
/rewind

# Or list checkpoints first, then rewind to a specific one
/checkpoints
/rewind abc123
```

The rewind interface presents your conversation as a timeline. Each entry shows your message and the files it changed. Select a point to restore to the state before that turn.
Three restoration modes
Rewind offers granular control over what gets restored:
| Mode | Effect | Use case |
|---|---|---|
| Conversation only | Returns to earlier conversation state; keeps current code | Agent is confused but code changes are correct |
| Code only | Reverts file changes; keeps conversation understanding | Implementation failed but agent grasps the goal |
| Both | Complete reset to checkpoint state | Everything needs to roll back |
These modes address different failure patterns. Sometimes an agent produces correct code but talks itself into confusion through excessive explanation. Conversation-only rewind clears the verbal drift while preserving working implementation.
The more common case is the reverse: the code goes wrong while the agent's understanding remains sound. Code-only rewind handles this well. The agent retains its grasp of your requirements, the codebase structure, and the approach; only the broken implementation disappears. This often enables immediate success on the next attempt because the understanding persists.
Both-mode rewind handles cases where the entire approach failed. A misunderstood requirement might produce confident but wrong code. Full restoration returns to a clean state for a fresh attempt.
Strategic rewind versus starting over
Rewind targets surgical recovery from recent mistakes. It works well when:
- The error is localized to recent turns
- The agent's earlier understanding was sound
- Context accumulated before the error remains valuable
Rewind becomes less effective when:
- Context pollution accumulated over many turns
- The fundamental approach was flawed from the start
- Earlier context has already degraded
For systemic problems, /clear or a new session often works better than rewinding to a distant checkpoint.
Rewind is for tactical recovery; session management handles strategic resets.
Generator-critic patterns
Beyond reactive recovery, generator-critic patterns prevent errors proactively. The pattern separates generation from evaluation, forcing explicit quality checks between production and acceptance.
In single-agent form, the same agent alternates roles:
```
Generate initial implementation
        ↓
Shift to critic mode: evaluate against requirements
        ↓
Identify gaps or errors
        ↓
Generate corrections
        ↓
Re-evaluate
        ↓
(repeat until satisfactory)
```

The role shift matters. When an agent generates code and immediately declares it complete, confirmation bias operates. Forcing explicit evaluation mode disrupts this pattern. The agent treats its own output as it would external content: critically rather than defensively.
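The loop above can be sketched as a small driver. This is an illustration, not Claude Code's internals: `call_model` stands in for any LLM API call, and the prompts and the `APPROVED` convention are assumptions.

```python
# Minimal single-agent generator-critic loop: the same model alternates
# between producing a draft and critiquing it.

def call_model(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    raise NotImplementedError("swap in a real model call")

def generate_with_critique(task: str, model=call_model, max_rounds: int = 3) -> str:
    draft = model(f"Implement: {task}")
    for _ in range(max_rounds):
        # Critic pass: evaluate the draft against requirements.
        critique = model(
            "Review the implementation below for edge cases, error handling, "
            "and spec deviations. Reply APPROVED if none.\n\n" + draft
        )
        if "APPROVED" in critique:  # critic is satisfied; stop iterating
            break
        # Correction pass: revise the draft against the critique.
        draft = model(f"Revise to address these issues:\n{critique}\n\n{draft}")
    return draft
```

The iteration cap matters in practice: without it, a critic that never approves produces an unbounded loop.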
Implementing critic prompts
Structure critic evaluation explicitly:
```
Review the implementation above for:

1. Edge cases not handled
2. Error conditions without recovery
3. Type mismatches or null safety issues
4. Deviations from the specification

List issues found. Do not defend the implementation.
```

The instruction to avoid defense prevents the agent from rationalizing problems. Without this constraint, agents tend to explain why apparent issues are actually acceptable.
Multi-agent critic patterns
Stronger separation uses distinct agents for generation and criticism. The generator produces code; a separate critic agent with different instructions evaluates it. The critic either approves or returns feedback for revision.
Multi-agent patterns provide:
- Role isolation: The critic never sees the generation process, only the output
- Independent judgment: No memory of why certain choices were made
- Consistent standards: The critic applies the same rubric regardless of generation complexity
The cost is additional latency and token usage. Each review cycle requires at least one more model call. For code paths where bugs would hurt, this investment pays for itself. For routine changes, the overhead may exceed the benefit.
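The multi-agent variant can be sketched similarly, with the critic invoked fresh each round so it sees only the artifact. The role names, rubric, `APPROVED` convention, and `call_model` are illustrative assumptions, not a specific API.

```python
# Multi-agent separation sketch: the generator accumulates conversation
# history, while the critic is stateless and receives only the produced code.

def call_model(messages: list) -> str:
    # Placeholder: swap in a real chat-model call here.
    raise NotImplementedError("swap in a real chat-model call")

CRITIC_RUBRIC = ("You are a code reviewer. Judge only the code you are given. "
                 "Reply APPROVED, or list concrete defects.")

def generate_then_review(task: str, model=call_model, max_rounds: int = 3) -> str:
    history = [{"role": "user", "content": f"Implement: {task}"}]
    code = model(history)
    for _ in range(max_rounds):
        # Role isolation: the critic never sees the generation history.
        verdict = model([{"role": "system", "content": CRITIC_RUBRIC},
                         {"role": "user", "content": code}])
        if verdict.strip().startswith("APPROVED"):
            return code
        # Feed the critic's verdict back to the generator as a new turn.
        history += [{"role": "assistant", "content": code},
                    {"role": "user", "content": f"Reviewer feedback:\n{verdict}"}]
        code = model(history)
    return code
```

Because the critic call contains no memory of the generation process, its judgment cannot be anchored by the generator's stated rationale.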
When critic patterns deliver value
Apply generator-critic patterns when:
- Output quality significantly outweighs speed
- Errors carry meaningful consequences
- Code requires security review or compliance verification
- The domain is unfamiliar to the generator
Skip them for:
- Trivial changes with obvious correctness
- Iterative refinement where tests provide feedback
- Time-sensitive operations where latency matters
Verification loops in practice
Page 10 introduced Boris Cherny's observation that verification loops improve quality 2-3x. Here are the implementation patterns that deliver that improvement.
The test-verify-fix cycle
The tightest verification loop connects agents directly to test execution:
1. Agent writes or modifies code
2. Test suite runs automatically (via hooks or instruction)
3. Agent receives test output
4. Agent fixes failures
5. Repeat until tests pass

This closed loop eliminates the "works for me" problem. The agent cannot claim completion while tests fail. Each iteration narrows the gap between implementation and specification.
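The cycle can be sketched as a small driver loop. Here `cmd` is whatever runs your real test suite and `fix` is whatever hands the failure output back to the agent; both are stand-ins for illustration.

```python
# Closed test-verify-fix loop: run the suite, hand failures back,
# repeat until green or the iteration budget runs out.
import subprocess

def run_suite(cmd):
    # Run the test command; report pass/fail plus combined output.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def fix_until_green(cmd, fix, max_iters=5):
    for _ in range(max_iters):
        passed, output = run_suite(cmd)
        if passed:
            return True
        fix(output)  # e.g. feed the failure output back to the agent
    # Budget exhausted: report the final state of the suite.
    return run_suite(cmd)[0]
```

The budget prevents the loop from spinning forever on a failure the agent cannot fix; the escalation strategies later in this section cover what to do when it runs out.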
Configure this loop through Claude Code hooks:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test" }
        ]
      }
    ]
  }
}
```

With this configuration, every file edit triggers the test suite. The agent sees failures immediately rather than discovering them later when context has evolved.
The validation gate
Validation gates block progress until verification passes. Structure prompts to enforce this:
```
Implement the authentication middleware.
After implementation, run all auth tests.
Do not proceed to route implementation if any test fails.
Fix failures first.
```

The explicit blocking instruction prevents agents from optimistically continuing. Without it, agents may note failures but proceed anyway, accumulating broken dependencies.
Type checking as verification
TypeScript and other statically-typed languages provide continuous verification:
```shell
# Run type checker after edits
tsc --noEmit

# Or in watch mode for continuous feedback
tsc --noEmit --watch
```

Type errors caught immediately cost less to fix than those discovered later. Configure type checking in hooks or instruct agents to run it after structural changes.
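For example, a hook that type-checks after every edit, following the same shape as the test-suite hook shown earlier (the exact schema may vary across Claude Code versions):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx tsc --noEmit" }
        ]
      }
    ]
  }
}
```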
Browser and UI verification
For frontend work, visual verification catches what unit tests miss. Cherny describes his approach: "Claude tests every single change I land to claude.ai/code using the Claude Chrome extension. It opens a browser, tests the UI, and iterates until the code works and the UX feels good."
Automated browser testing through tools like Playwright or Cypress extends this pattern. The agent writes tests that actually interact with rendered UI, catching layout issues, interaction bugs, and visual regressions.
Error recovery strategies
When verification catches errors, recovery strategies determine what happens next.
Targeted correction
For isolated failures, targeted correction works efficiently:
```
Test X failed with error Y.
The failure is in function Z.
Fix only the code causing this specific failure.
Do not modify other functions.
```

Scope constraints prevent agents from "helpfully" refactoring adjacent code while fixing bugs. Each fix addresses one problem without introducing new surface area for issues.
Rollback and retry
When fixes compound instead of resolve, rollback becomes appropriate:
```
The last three attempts to fix this test have introduced new failures.
Rewind to before the first fix attempt.
Approach the original failure differently.
```

Recognizing when iteration produces negative progress requires judgment. Three or four attempts at the same fix without improvement often indicate that the approach, not just the implementation, needs revision.
Escalation to planning
Persistent failures may indicate insufficient planning:
```
This implementation has required six fix iterations.
Stop implementing.
Switch to plan mode and analyze why the approach keeps failing.
Return with a revised strategy before more code changes.
```

Stepping back to analysis mode breaks the fix-fail cycle. The agent reconsiders assumptions rather than repeatedly attempting the same flawed approach.
Combining recovery mechanisms
Error handling works best when these mechanisms layer together:
- Automatic checkpoints capture state continuously
- Verification loops catch errors immediately
- Generator-critic patterns prevent errors before commits
- Rewind capabilities recover from errors that slip through
- Session failure acceptance handles irrecoverable situations
Each layer addresses different failure modes. No single mechanism handles all cases. Automatic checkpoints cost nothing to maintain; use them as the foundation. Verification loops catch most implementation errors; configure them for every substantial codebase. Generator-critic patterns add cost but catch design errors that tests miss; apply them selectively. Rewind handles the failures that get through; know how to use it quickly when needed. Session failure acceptance handles systemic problems; recognize when fresh starts beat continued effort.
The point is not preventing all errors; that is neither possible nor efficient. The point is minimizing the cost of errors by catching them early and recovering from them cheaply. A checkpoint taken automatically costs nothing until needed; a rewind takes seconds; a new session starts clean. The investment in recovery infrastructure pays off in reduced time spent debugging polluted contexts and compounding mistakes.