Safe Modification Strategies
The legacy code dilemma
Michael Feathers defines legacy code as "code without tests." This definition frames a real bind: making changes safely requires tests, but adding tests often requires changing code first.
When working with unfamiliar codebases, the bind tightens. You do not know which changes are safe. You cannot predict which modifications will cascade into unexpected failures. The code's actual behavior may differ from its documented or intended behavior and you have no way to tell.
AI agents face the same problem. They generate modifications quickly, but speed without safety just produces bugs faster. The techniques that make human modifications safe also apply to agent-directed changes. Understanding these techniques lets you guide agents toward safe modifications rather than risky rewrites.
Cover and modify, not edit and pray
Two approaches dominate legacy code modification.
Edit and Pray works like this:
- Study the code carefully
- Make the change
- Run the system
- Hope nothing broke
The "pray" step gives away the problem. Without automated verification, you cannot know whether the change worked until something fails in production. Manual testing catches obvious errors but misses edge cases and interaction bugs.
Cover and Modify works differently:
- Write tests that capture current behavior
- Verify the tests pass
- Make the change
- Verify the tests still pass
- Know with confidence that behavior is preserved
The tests serve as a safety net. If a modification breaks something, a test fails immediately. You discover problems in seconds rather than days.
This matters for agent-directed work because agents default to edit-and-pray unless directed otherwise. They can generate changes quickly, but they cannot assess whether those changes broke subtle behaviors. Directing agents to follow cover-and-modify workflows produces safer results.
When an agent proposes a change to unfamiliar code, ask: "What tests verify this behavior?" If none exist, the first task is creating them, not making the modification.
Characterization tests: capturing actual behavior
Characterization tests describe what code actually does, not what it should do. Michael Feathers coined the term to distinguish them from specification tests that verify intended behavior.
The recipe:
- Put the code in a test harness
- Write an assertion you expect to fail
- Run the test and observe the actual behavior
- Update the assertion to match the actual behavior
- Repeat until behavior is captured
This seems backward. You write tests that pass by definition. But that is exactly the point. You are not verifying correctness; you are freezing current behavior.
When you later modify the code, any characterization test that fails indicates a behavioral change. You can then decide: was this change intentional or accidental? Intentional changes update the test. Accidental changes revert the modification.
```javascript
// Before understanding the code, discover what it does
describe('calculateDiscount', () => {
  it('captures current behavior for regular customers', () => {
    // We don't know what this should return
    // Run once, see the actual value, then assert that
    const result = calculateDiscount('regular', 100);
    expect(result).toBe(95); // Discovered: 5% discount
  });

  it('captures edge case at zero quantity', () => {
    const result = calculateDiscount('regular', 0);
    expect(result).toBe(0); // Discovered: no error thrown
  });

  it('captures behavior for unknown customer type', () => {
    const result = calculateDiscount('unknown', 100);
    expect(result).toBe(100); // Discovered: falls through to no discount
  });
});
```

Agents do well at generating characterization tests. Direct them to explore code behavior systematically:
```
Examine the calculateDiscount function.
Generate characterization tests that capture its actual behavior.
Test each parameter combination you find in the calling code.
Include edge cases: zero values, null inputs, boundary conditions.
Do not assume what the function should do.
Observe what it actually does and write tests that pass.
```

The minimal change principle
Every modification carries risk. Larger modifications carry more risk. The minimal change principle reduces risk by limiting scope.
A minimal change modifies only what is necessary to achieve the stated goal. It does not clean up surrounding code. It does not refactor adjacent functions. It does not update patterns elsewhere in the file.
This conflicts with developer instincts. When you see bad code, you want to fix it. When you spot inconsistencies, you want to standardize. These impulses introduce additional risk without corresponding benefit to the current task.
For agent-directed modifications, constrain scope explicitly:
```
Add input validation to the createUser function.
Modify only the createUser function itself.
Do not change calling code.
Do not refactor other validation in this file.
Do not update the User class.
The change should be reviewable in under 50 lines.
```

Without these constraints, agents often propose sweeping changes. They see patterns that could be improved and try to improve them all. Each additional modification increases the chance that something breaks.
A useful test: can you describe the change in one sentence? "Add null check to createUser" is a minimal change. "Improve validation across the user module" is not.
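A minimal sketch of what that looks like, using a hypothetical Python version of `createUser`: the guard clause is the entire change, and the surrounding logic is left alone.

```python
# Hypothetical sketch: the whole minimal change is the added guard clause.
def create_user(name, email):
    # Minimal change: new null check, describable in one sentence
    if name is None or email is None:
        raise ValueError("name and email are required")

    # Existing logic stays exactly as it was: no renaming, no cleanup,
    # no refactoring of the surrounding module
    return {"name": name, "email": email}
```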
The sprout and wrap techniques
When adding new behavior to legacy code, two techniques minimize risk: sprouting and wrapping.
Sprout Method extracts new logic into a separate, testable function:
```python
# Original: complex function you don't fully understand
def process_order(order):
    # 200 lines of legacy code
    # You need to add discount calculation somewhere in here
    pass

# Sprout: add new logic in isolated function
def calculate_order_discount(order):
    """New, testable function for discount logic."""
    if order.customer.is_premium:
        return order.total * 0.15
    return order.total * 0.05

# Minimal modification to original
def process_order(order):
    # ... existing code ...
    discount = calculate_order_discount(order)  # Single new line
    # ... existing code continues ...
```

The new function is fully testable in isolation. The modification to the original code is a single line insertion. If something breaks, the problem is either in the new function or at the insertion point, nowhere else.
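Because the sprouted function does not depend on the surrounding legacy code, it can be tested directly. A minimal pytest sketch, assuming the sprout lives in a hypothetical `orders` module and that lightweight stand-ins for the order object are acceptable:

```python
from types import SimpleNamespace

from orders import calculate_order_discount  # hypothetical module name

def _order(total, is_premium):
    # Lightweight stand-in for the real order object
    return SimpleNamespace(total=total, customer=SimpleNamespace(is_premium=is_premium))

def test_premium_customers_get_fifteen_percent():
    assert calculate_order_discount(_order(200.0, True)) == 30.0

def test_regular_customers_get_five_percent():
    assert calculate_order_discount(_order(200.0, False)) == 10.0
```

Nothing in these tests touches the 200 lines of legacy code inside process_order.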
Wrap Method preserves the original function and adds behavior around it:
```python
# Original function you cannot easily modify
def send_notification(user, message):
    # Complex legacy notification logic
    pass

# Wrap: rename original and create new entry point
def _send_notification_original(user, message):
    # Original logic unchanged
    pass

def send_notification(user, message):
    """Wrapper that adds logging before delegation."""
    log_notification_attempt(user, message)
    result = _send_notification_original(user, message)
    log_notification_result(user, result)
    return result
```

Wrapping lets you add behavior without modifying the original logic. The risk concentrates in the wrapper, which you control and understand.
Direct agents toward these patterns:
```
Add audit logging to the processPayment function.
Use the wrap technique: rename the original, create a wrapper with the original name.
The wrapper should log before and after calling the original.
Do not modify the original function's internal logic.
```

Phased modification for code without tests
Some code cannot be tested easily. Dependencies on databases, external services, or global state make isolation difficult. In these cases, a phased approach reduces risk.
Phase 1: Identify seams. A seam is a place where behavior can change without editing the code. Constructor parameters, method arguments, and inheritance all create seams. Find where you can substitute behavior for testing purposes.
Phase 2: Break dependencies. Make minimal changes to introduce seams where none exist. Extract dependencies into parameters. Convert static calls to instance methods. These changes are mechanical and low-risk.
Phase 3: Add characterization tests. With seams in place, write tests that capture current behavior. Substitute test doubles for external dependencies. Build the safety net.
Phase 4: Make the intended change. With tests protecting you, make the modification you originally intended. Run the tests. Verify that behavior is preserved or changed only intentionally.
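As a rough sketch of what Phases 2 and 3 can produce, assume a hypothetical `ReportGenerator` whose `build_report` used to call a module-level `query_database` directly. Injecting that call as a constructor parameter creates the seam, and a characterization test substitutes a fake for the real database:

```python
def query_database(account_id):
    # Existing legacy dependency (stubbed here); hits the real database in production
    raise NotImplementedError

# Phase 2: break the dependency by turning the hard-coded call into a seam
class ReportGenerator:
    def __init__(self, fetch_rows=None):
        # Default preserves current behavior; tests can substitute a double
        self._fetch_rows = fetch_rows if fetch_rows is not None else query_database

    def build_report(self, account_id):
        rows = self._fetch_rows(account_id)
        return {"account": account_id, "row_count": len(rows)}

# Phase 3: characterization test that captures behavior without a database
def test_build_report_counts_rows_for_account():
    def fake_fetch(account_id):
        return [{"id": 1}, {"id": 2}]

    generator = ReportGenerator(fetch_rows=fake_fetch)
    assert generator.build_report("acct-42") == {"account": "acct-42", "row_count": 2}
```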
This phased approach takes longer than direct modification. That time investment pays off in reduced debugging and fewer production incidents. For critical code in unfamiliar systems, the investment is worth it.
Direct the agent through the phases explicitly:

```
I need to modify the OrderProcessor class to add caching.
The class has hard-coded database dependencies.
Phase 1: Identify what I can substitute.
Phase 2: Refactor to inject the database connection.
Phase 3: Write characterization tests with a mock database.
Phase 4: Add caching with test coverage.
Start with Phase 1. Show me the seams you identify.
```

When experts define scope, agents handle mechanics
The highest-leverage pattern for safe modification combines human judgment with agent execution.
Humans provide:
- Scope boundaries (which files, which functions)
- Risk assessment (what must not break)
- Verification criteria (how to know it worked)
- Rollback conditions (when to stop)
Agents provide:
- Mechanical code changes
- Test generation
- Pattern application
- Cross-reference updates
This division plays to each party's strengths. Humans understand context, consequences, and business impact. Agents handle repetitive transformations reliably and quickly.
A modification workflow might look like:
- Human identifies the function needing change
- Human specifies what must not break
- Agent generates characterization tests
- Human reviews and approves tests
- Agent makes the modification
- Agent runs tests and reports results
- Human reviews the final change
At each stage, the human retains decision authority. The agent handles execution. Unsafe modifications get caught before they merge.
The 96% of developers who do not fully trust AI output to be functionally correct are applying appropriate skepticism. Safe modification workflows assume verification is required, not optional.
Incremental over ambitious
Large refactoring projects fail more often than small, incremental improvements. The Strangler Fig pattern shows why: new functionality grows around legacy code until the old code can be removed entirely. No big bang rewrite required.
Applied to individual modifications:
- Fix one function rather than an entire module
- Update one call site rather than all call sites
- Improve one test rather than the test suite
Each small change is easier to review, easier to verify, and easier to revert. Small verified steps accumulate into large changes more safely than ambitious rewrites.
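The same idea in miniature (the function and field names here are hypothetical): a thin facade routes a growing subset of cases to the new implementation while the rest still flows through the legacy path, and the legacy function is deleted only once nothing routes to it.

```python
# Hypothetical facade: callers keep using calculate_shipping, unaware of the migration
MIGRATED_REGIONS = {"eu", "uk"}  # grows one region at a time as confidence builds

def calculate_shipping(order):
    if order["region"] in MIGRATED_REGIONS:
        return _calculate_shipping_new(order)
    return _calculate_shipping_legacy(order)

def _calculate_shipping_new(order):
    # New, tested implementation that gradually strangles the old one
    return round(order["weight_kg"] * 4.5, 2)

def _calculate_shipping_legacy(order):
    # Untouched legacy logic; deleted once MIGRATED_REGIONS covers everything
    return order["weight_kg"] * 4.5
```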
When directing agents, break large requests into sequences:
```
We need to modernize the authentication module.
Task 1: Add characterization tests for the login function.
Task 2: Extract password hashing into a separate function.
Task 3: Add tests for the extracted function.
Task 4: Update the hashing algorithm in the extracted function.
Task 5: Verify all tests pass.
Complete Task 1 and stop. I will review before proceeding.
```

Each task is reviewable independently. Problems surface early when changes are small. The final result is the same as an ambitious rewrite, but the path is safer.
Verification before commitment
Never commit changes to unfamiliar code without verification. "It compiles" is not verification. "Tests pass" is better but not enough.
Complete verification includes:
- All existing tests pass
- New tests cover the changed behavior
- Manual verification of the specific change
- Review of adjacent code for unintended impact
- Confirmation the change addresses the original requirement
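Much of this checklist can be run mechanically before review. A minimal sketch, assuming a pytest suite and a git checkout (the commands and function name are illustrative):

```python
import subprocess

def verify_before_commit(changed_function: str) -> None:
    """Gather evidence for the verification checklist before committing."""
    # All existing tests pass
    subprocess.run(["pytest", "--quiet"], check=True)

    # Review the actual diff, not a summary of intent
    subprocess.run(["git", "diff"], check=True)

    # List call sites of the modified function for adjacent-impact review
    # (exit code 1 just means no other callers, so don't treat it as failure)
    subprocess.run(["git", "grep", "-n", changed_function])

if __name__ == "__main__":
    verify_before_commit("create_user")
```

The remaining items, manual verification of the specific change and confirmation that it addresses the original requirement, still need a human.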
For agent-generated modifications, add an explicit verification step:
```
After making the change:
1. Run the test suite and report results
2. Show me the git diff
3. Identify any functions that call the modified code
4. Explain how you verified the change works
```

This prompt structure forces verification before you accept the change. An agent that cannot explain its verification has not verified sufficiently.
Safe modification of unfamiliar code is slow work. Characterization tests, minimal changes, sprout and wrap, phased modification, explicit verification: these techniques all take time. But they build confidence systematically. Apply them consistently, and unfamiliar code becomes modifiable without becoming dangerous.