Applied Intelligence
Module 5: Output Validation and Iteration

Exercise: Output Validation and Feedback Practice

Overview

This exercise practices the complete validation-feedback loop: reviewing agent-generated code, identifying issues, providing targeted feedback, and iterating to correct solutions.

The ms library converts time strings like "2 days" or "1h" to milliseconds and back. It is small enough to understand completely, widely used (9,000+ npm dependents), and has the kinds of edge cases where AI agents commonly make mistakes: parsing ambiguity, floating-point precision, and boundary conditions.
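
For orientation, here is the existing API in both directions (values reflect the library's documented behavior):

const ms = require('ms');

ms('2 days');   // 172800000
ms('1h');       // 3600000
ms(60000);      // "1m" (formatting direction: milliseconds back to a string)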

In this exercise, an agent generates code that contains realistic errors; you then practice the feedback patterns covered in this module to correct them.

The task

Add a strict mode to the ms library that rejects ambiguous or potentially incorrect inputs instead of guessing.

Currently, ms is permissive:

  • ms("1.5.5d") returns something (incorrect behavior)
  • ms("") returns undefined silently
  • ms("10000000000000000d") may overflow
  • ms(" 1d ") handles whitespace inconsistently

The strict mode should:

  • Reject inputs with multiple decimal points
  • Reject empty strings with an explicit error
  • Reject values that would overflow Number.MAX_SAFE_INTEGER
  • Require trimmed input (leading/trailing whitespace rejected)
  • Return errors instead of undefined for invalid formats

This feature requires parsing validation, numeric boundary checks, and format enforcement: areas where AI-generated code reliably struggles.
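
Concretely, the desired call-level behavior looks roughly like this (illustrative only; exact error messages and option handling are left to the implementation):

ms('1d', { strict: true });                   // 86400000
ms('1.2.3d', { strict: true });               // throws (multiple decimal points)
ms('', { strict: true });                     // throws (empty input)
ms(' 1d ', { strict: true });                 // throws (untrimmed input)
ms('10000000000000000d', { strict: true });   // throws (would exceed Number.MAX_SAFE_INTEGER)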

Setup

Clone the repository and install dependencies:

git clone https://github.com/vercel/ms.git
cd ms
npm install

Run the existing tests to verify setup:

npm test

Explore the codebase structure. The library is a single file with straightforward logic. Understand how parsing works before generating code.
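
If a mental model helps before reading the source: conceptually, ms matches the input against one regex that captures a numeric part and an optional unit, then multiplies by a per-unit constant. The sketch below is a simplification for orientation, not the library's actual code:

// Simplified mental model of ms-style parsing (not the real implementation;
// the real library also accepts full unit names such as "days" and "hours")
const UNIT_MS = { ms: 1, s: 1000, m: 60000, h: 3600000, d: 86400000, w: 604800000 };

function parseApprox(str) {
  const match = /^(-?\d*\.?\d+)\s*([a-z]+)?$/i.exec(str);
  if (!match) return undefined;                 // unrecognized format fails silently
  const value = parseFloat(match[1]);
  const unit = (match[2] || 'ms').toLowerCase();
  return unit in UNIT_MS ? value * UNIT_MS[unit] : undefined;
}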

The experiment

This exercise has four phases: generate code that contains errors, validate it systematically, iterate with targeted feedback, and verify the result.

Phase 1: Generate flawed code

Ask an agent to implement strict mode without providing extensive guidance. The goal is to produce code that works partially but contains the kinds of errors covered in this module.

Prompt:

Add a strict mode option to this ms library. When strict mode is enabled,
the library should reject ambiguous inputs instead of guessing.

Implement:
- ms("1d", { strict: true }) - valid, returns milliseconds
- ms("bad input", { strict: true }) - throws Error instead of returning undefined
- Handle edge cases strictly: empty strings, overflow, malformed numbers

Add the strict option and update the parsing logic.

This prompt is deliberately underspecified. It does not define what "ambiguous" means, does not specify all edge cases, and does not reference existing code patterns. The agent will make assumptions. Some will be wrong.

Before reviewing the output, write down your predictions:

  • What error categories from this module do you expect to appear?
  • Which edge cases will the agent likely miss?
  • What validation issues will you look for first?

Phase 2: Systematic validation

Review the generated code using the validation hierarchy from this module. Do not run tests yet. Catch what code review catches.

Step 1: Logic and correctness scan

Read through the parsing logic. Check for:

  • Off-by-one errors in string slicing
  • Incorrect regex patterns that match too much or too little
  • Type coercion issues (string to number conversions)
  • Missing cases in conditional logic
  • Operations in wrong order

Use the seven error categories: conditional errors, garbage output, math/logic errors, formatting errors, operation order errors, API misuse, index errors. Note which categories you find.
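
A few hypothetical one-liners illustrate what some of these categories look like in JavaScript parsing code:

parseFloat('1.2.3')   // 1.2  - silently truncates at the second decimal point
Number('')            // 0    - empty string coerces to 0, not NaN
Number(' 1d ')        // NaN  - but Number(' 1 ') is 1; whitespace handling is uneven
'1d'.slice(0, -1)     // "1"  - off-by-one risk when separating number from unit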

Step 2: Edge case audit

Test mentally against edge cases:

// Empty and whitespace
ms("", { strict: true })          // should throw
ms(" ", { strict: true })         // should throw
ms(" 1d ", { strict: true })      // should throw (untrimmed)

// Malformed numbers
ms("1.2.3d", { strict: true })    // should throw (multiple decimals)
ms(".5d", { strict: true })       // ambiguous - what should happen?
ms("1.d", { strict: true })       // ambiguous - what should happen?
ms("d", { strict: true })         // should throw (no number)

// Boundary conditions
ms("9007199254740992ms", { strict: true })  // exceeds MAX_SAFE_INTEGER
ms("-1d", { strict: true })       // negative - valid or not?
ms("0d", { strict: true })        // zero - edge case

// Format ambiguity
ms("1 d", { strict: true })       // space between number and unit
ms("1D", { strict: true })        // case sensitivity
ms("1day", { strict: true })      // full word vs abbreviation

For each case, determine:

  • What does the generated code do?
  • What should it do?
  • Was this specified in the prompt?

Note any specification misunderstanding errors, the dominant category from this module.

Step 3: Security check

For a parsing library, security concerns include:

  • ReDoS (Regular Expression Denial of Service) from backtracking
  • Prototype pollution if options objects are mishandled
  • Type confusion from unexpected inputs

Examine any regex patterns added. Check how the options parameter is processed.
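
Two illustrations of what to look for, as hypothetical patterns rather than code the agent necessarily produced:

// ReDoS: nested or ambiguous quantifiers can backtrack exponentially on
// non-matching input such as a long digit run followed by "x"
const risky = /^(\d+\.?)+(ms|s|m|h|d|w|y)$/i;   // pattern shape to avoid

// Prototype pollution / type confusion: read options defensively instead of
// trusting arbitrary objects
function isStrict(options) {
  return options != null &&
    Object.prototype.hasOwnProperty.call(options, 'strict') &&
    options.strict === true;
}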

Record your findings:

Issue | Category | Severity | Line/Location
------|----------|----------|---------------

Aim for 3-5 issues before proceeding. If the generated code is flawless (unlikely), the prompt was too specific.

Phase 3: Feedback iteration

Now practice feedback patterns. The goal is correcting issues efficiently with minimal iterations.

Round 1: Specific feedback

Choose the highest-severity issue from your findings. Provide feedback using the complete feedback format:

Problem: [what's wrong]
Location: [file and line]
Current behavior: [what the code does]
Expected behavior: [what it should do]
Verification: [how to confirm the fix]

Example:

Problem: The regex pattern matches inputs with multiple decimal points.
Location: src/index.js line 23
Current behavior: "1.2.3d" parses as 1.2 days, ignoring ".3"
Expected behavior: Strict mode should throw Error for malformed numbers
Verification: ms("1.2.3d", { strict: true }) should throw

Observe:

  • Did the agent fix the issue in one iteration?
  • Did the fix introduce new problems?
  • Did the fix follow existing code patterns?

Round 2: Batched feedback

List multiple issues in a single message:

Three issues to fix:

1. Empty string handling: ms("", { strict: true }) returns undefined.
   Should throw Error with message "Empty input not allowed in strict mode"

2. Whitespace handling: ms(" 1d ", { strict: true }) parses successfully.
   Should throw Error - require trimmed input in strict mode

3. Overflow check missing: ms("9999999999999999d", { strict: true }) overflows.
   Should throw Error when result exceeds Number.MAX_SAFE_INTEGER

Observe:

  • Did batching reduce total iterations compared to one-at-a-time?
  • Were all issues addressed, or did some get dropped?
  • Did fixing multiple issues introduce conflicts?

Round 3: Show don't tell

For any remaining issues, provide example code:

The error message format is inconsistent with the library style.

Current:
throw new Error("invalid input")

Expected (matches existing patterns):
throw new Error(`Invalid value: "${val}"`)

Follow the existing error format in the library.

Observe:

  • Did the example communicate faster than explanation?
  • Did the agent match the pattern exactly or approximately?

Phase 4: Verification

Run the test suite:

npm test

If tests pass but you know edge cases are unhandled, add tests:

// Add to test file
describe('strict mode', () => {
  it('rejects empty strings', () => {
    expect(() => ms('', { strict: true })).toThrow();
  });

  it('rejects malformed decimals', () => {
    expect(() => ms('1.2.3d', { strict: true })).toThrow();
  });

  it('rejects overflow values', () => {
    expect(() => ms('9999999999999999d', { strict: true })).toThrow();
  });

  it('rejects untrimmed input', () => {
    expect(() => ms(' 1d ', { strict: true })).toThrow();
  });

  it('accepts valid strict input', () => {
    expect(ms('1d', { strict: true })).toBe(86400000);
  });
});

The combination of verification (tests) and feedback (corrections) should produce working code. If the session has derailed (more than 5 correction iterations without progress), recognize the fix loop pattern and start fresh.

Analysis

After completing the exercise, record your observations.

Validation effectiveness

Question | Your Observation
---------|------------------
How many issues did code review catch before running tests? |
What categories of errors appeared most? |
Were specification misunderstanding errors dominant? |
Did any security-relevant issues appear? |

Feedback effectiveness

Feedback approach | Iterations to fix
------------------|-------------------
Vague feedback (if you tried it) |
Specific feedback with location |
Batched multiple issues |
Example-based showing |

Overall patterns

  1. Detection accuracy: How many issues did you find versus how many existed? Did running tests reveal issues you missed?

  2. Feedback efficiency: Which feedback format required the fewest iterations?

  3. Session health: Did the session stay productive, or did you observe derailing patterns? What signals indicated session state?

  4. Trust calibration: After this exercise, how would you calibrate trust for similar parsing tasks? What verification would you require before accepting such code?

What this exercise teaches

The validation-feedback loop is the core skill of output review. This exercise practices each component:

Systematic validation catches issues before they become problems. The hierarchy (logic, security, maintainability, performance) prioritizes review effort. The error category taxonomy helps identify what you're looking at.

Feedback quality determines iteration count. Specific feedback with location, current behavior, expected behavior, and verification closes issues in one round. Vague feedback ("fix the bug") multiplies iterations.

Verification confirms correctness. Tests convert subjective review into objective confirmation. Without verification, review remains opinion.

The complete loop compounds. Better validation finds more issues. Better feedback fixes issues faster. Better verification confirms fixes work. Each component strengthens the others.

Variations

Variation A: Compare feedback approaches

Intentionally try vague feedback first ("the parsing doesn't handle edge cases"). Count iterations. Then start fresh and try specific feedback. Compare quantitatively.

Variation B: Skip code review

Go straight to running tests without reviewing code first. Note what tests catch versus what they miss. Note the debugging time when tests fail without prior understanding of the code. This variation makes the value of code review concrete.

Variation C: Different feature

Instead of strict mode, implement:

  • Microsecond precision support (ms("1.5ms") returns 1.5)
  • Relative time output (ms(86400000, { relative: true }) returns "1 day ago")

Both features introduce floating-point precision issues and output formatting edge cases.
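
The precision issue is inherent to JavaScript numbers rather than to ms itself; for example:

0.1 + 0.2      // 0.30000000000000004
1.1 * 100      // 110.00000000000001
// A naive implementation that scales fractional inputs by unit factors
// inherits this kind of error unless it rounds deliberately.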

Variation D: Deliberate derailing

After generating initial code, provide contradictory feedback:

Actually, strict mode should allow empty strings.
Wait, no, it should reject them.
Actually, make it configurable.

Observe how quickly the session degrades. Practice recognizing derailing patterns in real time.

Completion

The exercise is complete when:

  • Initial code has been generated with minimal prompting
  • At least 3 issues have been identified through code review
  • Feedback has been provided and iterated until tests pass
  • The analysis section has been completed

The goal is not perfect code on the first try. The goal is developing fluency with the validation-feedback loop that makes "almost right" code actually right.
