Showing vs Telling in Corrections
Why examples beat explanations
Page 9 introduced showing versus telling as a principle. This page examines why examples work and how to construct them for maximum impact.
The mechanism is straightforward. When developers describe what they want in prose, agents must extract actionable requirements from natural language. When developers show what they want through examples, agents pattern-match directly. Pattern-matching is more reliable than extraction.
A 2025 study analyzing the DevGPT dataset measured this gap directly. The "question" pattern, where developers ask casual questions, required an average of 11.16 prompts to reach satisfactory output. The "context and instruction" pattern, where developers provide explicit structure, required 8.04 prompts. The "recipe" pattern, providing step-by-step demonstrations, required just 7.07 prompts.
That's 37% fewer iterations. The gap compounds across a project: every correction that includes an example instead of a description saves a round trip.
The anatomy of an effective example
Not all examples work equally well. Effective examples share specific characteristics.
They are minimal. An example that demonstrates a naming convention needs only the relevant naming pattern, not an entire function. Extraneous code obscures the pattern being demonstrated.
They show contrast. The wrong-and-right pair communicates more than the right alone. Seeing what not to do alongside what to do clarifies the boundary.
They include context about where the pattern applies. An example without scope creates uncertainty: does this apply everywhere, or only in specific cases?
A correction that fails to demonstrate:
Use our standard error handling.
The agent has no reference for "standard." It will guess, probably wrong.
The same correction with an effective example:
Use this error handling pattern in all service layer functions:
WRONG (what you generated):
async function fetchUser(id) {
  const user = await db.users.find(id);
  return user;
}
RIGHT (project standard):
async function fetchUser(id: string): Promise<User> {
  try {
    const user = await db.users.find(id);
    if (!user) {
      throw new NotFoundError('User', id);
    }
    return user;
  } catch (error) {
    logger.error('fetchUser_failed', { userId: id, error });
    throw error instanceof AppError ? error : new DatabaseError(error);
  }
}
Apply this pattern to all functions in src/services/.
The example is minimal: one function, focused on error handling. The contrast is explicit: wrong then right, with labels. The scope is clear: service layer functions in a specific directory.
Pattern references over inline examples
Sometimes the pattern is too complex to inline, or it already exists in the codebase. In these cases, pointing to an existing implementation outperforms pasting code.
The Claude Code documentation recommends this approach directly: "Look at how existing widgets are implemented... HotDogWidget.php is a good example to start with." The reference leverages existing code as a living example.
Pattern references have two advantages. They stay current, since an inlined example can become outdated when the codebase evolves, while a reference to a file always reflects the current implementation. They show the pattern in context, preserving relationships between the pattern and its surroundings that extracted examples strip away.
Effective pattern references are specific:
Follow the pattern in src/services/UserService.ts for new services.
Pay attention to how constructor injection works at lines 12-18.
The error handling pattern at lines 45-62 is the one to use.
Line numbers matter. A reference to an entire file forces the agent to identify the relevant pattern. A reference to specific lines eliminates that search.
Vague references fail:
Follow the patterns in the codebase.
The agent cannot search "the codebase" for "patterns." Precision enables action.
The self-review technique
One of the most effective correction patterns involves no correction at all: asking the agent to review its own output.
The Self-Refine framework demonstrates that models can improve their output significantly when asked to critique before revising. The approach is mechanical: generate initial output, ask the model to critique that output, then ask the model to revise based on its own critique.
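In code, the loop is three model calls in sequence. A minimal sketch in TypeScript, assuming a generic callModel function rather than any particular SDK:
// Self-Refine as three sequential calls: generate, critique, then revise.
// callModel stands in for whatever function sends a prompt to your model
// and returns its text response; it is a placeholder, not a real API.
type CallModel = (prompt: string) => Promise<string>;

async function selfRefine(task: string, callModel: CallModel): Promise<string> {
  const draft = await callModel(task);

  const critique = await callModel(
    `Review the following output for this task: ${task}\n` +
    `List concrete problems. Do not rewrite it yet.\n\n${draft}`
  );

  // The revision prompt carries the task, the draft, and the critique,
  // so the model corrects against its own analysis.
  return callModel(
    `Task: ${task}\n\nPrevious output:\n${draft}\n\n` +
    `Critique:\n${critique}\n\nRevise the output to address the critique.`
  );
}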
Studies show GPT-4 improved code optimization scores by 8.7 units and readability by 13.9 units using this technique. The agent finds errors that the developer might miss, and the critique provides focused guidance for revision.
In practice, self-review works as a correction strategy when direct feedback isn't resolving an issue:
Before making any changes, review your previous implementation.
List three potential problems with the current approach.
Then fix the most significant one.
This forces the agent to examine its own work critically. It constrains revision to the most important issue, preventing wholesale rewrites. It generates diagnostic information that reveals the agent's understanding.
Self-review is particularly effective for logic errors. The agent generated the faulty logic, so it has full context about what it intended. Asking it to verify intent against implementation often surfaces the gap.
You wrote this condition at line 23:
if (user.role === 'admin' || user.permissions.includes('write'))
Explain what this condition is supposed to allow.
Then verify that the implementation matches your explanation.
If the explanation doesn't match the requirement, the agent has identified its own error. The next generation will incorporate that self-correction.
The Spotify verification pattern
Production AI systems formalize self-review into their architecture. Spotify's background coding agent implements a dual verification system: deterministic verifiers run first, then an LLM judge evaluates results.
When the LLM judge vetoes output, the agent self-corrects 50% of the time. Half of all vetoed outputs recover through self-correction without human intervention.
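The flow can be approximated with a small loop. A sketch follows, with hypothetical hooks standing in for Spotify's internal verifiers, judge, and agent:
// Dual verification sketch: deterministic checks gate the LLM judge, and a
// veto from either triggers a self-correction pass. All hooks here are
// hypothetical stand-ins, not Spotify's actual interfaces.
type Verdict = { ok: boolean; reason?: string };

interface VerificationHooks {
  runDeterministicChecks: (code: string) => Promise<Verdict[]>; // tests, linters, builds
  judge: (code: string) => Promise<Verdict>;                    // LLM judge
  selfCorrect: (code: string, reason: string) => Promise<string>; // agent revision
}

async function verifyAndCorrect(
  code: string,
  hooks: VerificationHooks,
  maxAttempts = 3
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Deterministic verifiers run first: cheap, unambiguous signals.
    const failures = (await hooks.runDeterministicChecks(code)).filter((v) => !v.ok);
    if (failures.length > 0) {
      code = await hooks.selfCorrect(code, failures[0].reason ?? 'deterministic check failed');
      continue;
    }
    // The LLM judge only evaluates output that passed the deterministic gate.
    const verdict = await hooks.judge(code);
    if (verdict.ok) return code;
    // A veto triggers self-correction rather than immediate escalation.
    code = await hooks.selfCorrect(code, verdict.reason ?? 'judge veto');
  }
  throw new Error('Verification failed after repeated self-correction; escalate to a human.');
}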
Developers can approximate this pattern manually:
Run the tests. If any fail, fix the failures.
After fixing, explain what was wrong and how you fixed it.
The explanation requirement forces the agent to articulate its correction. If the explanation doesn't make sense, the fix probably doesn't either. The explanation becomes a verification signal.
Agents struggle to produce coherent explanations for incorrect fixes. When the fix is wrong, the explanation tends to be vague, circular, or contradictory. Those qualities are easier to spot than subtle code bugs.
Calibrating example quantity
How many examples to provide depends on what you're demonstrating.
For simple patterns like naming conventions, one example suffices. The pattern is mechanical: replace underscores with camelCase, capitalize appropriately, done.
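A single contrasting pair is enough for that kind of correction. The function and fields below are hypothetical:
WRONG (generated):
function get_user_by_id(user_id: string): Promise<User> {
  return db.users.find(user_id);
}
RIGHT (project convention, camelCase):
function getUserById(userId: string): Promise<User> {
  return db.users.find(userId);
}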
For complex patterns like architectural decisions, two to three examples demonstrate variation. One example shows the pattern, two examples show what varies and what stays constant, three examples establish the boundaries.
Our repository pattern:
Example 1: Simple entity
class UserRepository extends BaseRepository<User> {
  constructor(db: Database) {
    super(db, 'users');
  }
}
Example 2: Entity with relations
class OrderRepository extends BaseRepository<Order> {
  constructor(db: Database) {
    super(db, 'orders');
  }

  async findWithItems(id: string): Promise<OrderWithItems> {
    return this.db.orders.find(id).include('items');
  }
}
Example 3: Entity with custom validation
class PaymentRepository extends BaseRepository<Payment> {
  constructor(db: Database, private validator: PaymentValidator) {
    super(db, 'payments');
  }

  async create(payment: Payment): Promise<Payment> {
    await this.validator.validate(payment);
    return super.create(payment);
  }
}
Three examples establish basic structure, how to add relations, and how to add custom behavior. The agent can extrapolate from this foundation.
Beyond three examples, returns diminish. Research suggests 2-8 examples optimize accuracy, but more doesn't mean better. Recent studies on reasoning models found that few-shot prompting can actually degrade performance in advanced models like DeepSeek R1. Sufficiency matters, not abundance.
Good and bad feedback contrasted
The difference between feedback that works and feedback that fails often comes down to structure.
A vague complaint like "This code is wrong" gives the agent nothing to work with. No location, no problem description, no direction for correction. The agent must guess everything.
Compare that to: "The condition at line 34 inverts the logic. It currently reads if (!isValid) but should read if (isValid). The negative check causes valid inputs to be rejected."
Location specified, error identified, correction provided, rationale included.
Explanations without examples fail in a similar way. Telling an agent that "API responses should follow RESTful conventions with proper status codes, consistent error formats, and HATEOAS links where appropriate" leaves "proper," "consistent," and "appropriate" undefined. The agent has no reference implementation.
An example with minimal explanation works better:
Format all API responses like this:
{
  "data": { ... },
  "meta": {
    "timestamp": "2025-01-21T10:30:00Z",
    "requestId": "abc123"
  },
  "links": {
    "self": "/api/users/123",
    "related": { "posts": "/api/users/123/posts" }
  }
}
Errors use the same structure with "error" instead of "data".
The example is the specification. No interpretation required.
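The same contract can also be pinned down as types if the agent keeps drifting from the example. A TypeScript sketch; the fields inside the error object are an assumption, since the prompt above only fixes the envelope:
// Envelope types mirroring the example response above. The code and message
// fields on the error object are assumed, not part of the original prompt.
interface ResponseMeta {
  timestamp: string; // ISO 8601, e.g. "2025-01-21T10:30:00Z"
  requestId: string;
}

interface SuccessResponse<T> {
  data: T;
  meta: ResponseMeta;
  links: { self: string; related?: Record<string, string> };
}

interface ErrorResponse {
  error: { code: string; message: string }; // assumed shape
  meta: ResponseMeta;
  links: { self: string };
}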
Overwhelming context creates different problems. A paragraph explaining JWT tokens, RS256 signing, key rotation schedules, refresh token expiration, and graceful error handling buries the actual action the agent should take. The agent may focus on irrelevant details.
A scoped constraint works better: "This endpoint requires authentication. Add the authMiddleware from src/middleware/auth.ts. If the token is invalid or expired, return 401." Action is clear, location is specified, behavior is constrained.
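Applied to an Express-style route, the resulting change might look like the sketch below; the route path and handler are illustrative, and the 401 behavior is assumed to live inside the referenced middleware:
// Sketch of the scoped change: mount the existing middleware on the endpoint.
// Express-style signatures are assumed; authMiddleware comes from the project
// file the prompt points to (src/middleware/auth.ts).
import express from 'express';
import { authMiddleware } from './middleware/auth';

const app = express();

// The middleware runs before the handler, so an invalid or expired token is
// rejected with a 401 and the handler only sees authenticated requests.
app.get('/api/reports', authMiddleware, (req, res) => {
  res.json({ data: [] }); // illustrative handler body
});

app.listen(3000);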
Feedback as learned skill
Over time, effective patterns become reflexive. Developers stop thinking about how to structure feedback and simply structure it correctly.
Start by examining failed corrections. When an agent iteration doesn't improve, ask: was the location clear? Was the problem identified? Was the expected behavior specified? Was an example provided?
The absent element usually explains the failure.
Then front-load the absent element in the next correction. If the agent seemed confused about location, start with file and line. If the agent seemed uncertain about expected behavior, start with an example.
This self-correction process, applied to your own feedback rather than the agent's output, accelerates skill development. Most developers report that feedback quality becomes automatic within a few weeks of deliberate practice.
Every session benefits from better feedback patterns. Every project accumulates fewer wasted iterations. The agent's apparent capability rises without any change to the underlying model.
That last point matters. Developers who master correction patterns often mistake their own improvement for model improvement. The model is constant. The feedback changed.