Exercise: Output Validation and Feedback Practice
Overview
This exercise practices the complete validation-feedback loop: reviewing agent-generated code, identifying issues, providing targeted feedback, and iterating to correct solutions.
The ms library converts time strings like "2 days" or "1h" to milliseconds and back. It is small enough to understand completely, widely used (9,000+ npm dependents), and has the kinds of edge cases where AI agents commonly make mistakes: parsing ambiguity, floating-point precision, and boundary conditions.
You will generate code that contains realistic errors, then practice the feedback patterns covered in this module to correct them.
The task
Add a strict mode to the ms library that rejects ambiguous or potentially incorrect inputs instead of guessing.
Currently, ms is permissive:
ms("1.5.5d")returns something (incorrect behavior)ms("")returns undefined silentlyms("10000000000000000d")may overflowms(" 1d ")handles whitespace inconsistently
The strict mode should:
- Reject inputs with multiple decimal points
- Reject empty strings with an explicit error
- Reject values that would overflow Number.MAX_SAFE_INTEGER
- Require trimmed input (leading/trailing whitespace rejected)
- Return errors instead of undefined for invalid formats
This feature introduces parsing validation, numeric boundary checks, and format enforcement: areas where AI-generated code commonly struggles.
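To make the checks concrete, here is a minimal sketch of what strict-mode validation could look like (illustrative only; the function name and error messages are assumptions, not the expected solution):

```js
// Illustrative sketch of strict-mode input checks, not a reference solution.
function validateStrict(str) {
  if (str.trim().length === 0) {
    throw new Error('Empty input not allowed in strict mode');
  }
  if (str !== str.trim()) {
    throw new Error(`Untrimmed input: "${str}"`);
  }
  // More than one decimal point means the numeric part is malformed.
  if ((str.match(/\./g) || []).length > 1) {
    throw new Error(`Malformed number: "${str}"`);
  }
  // Overflow is checked after parsing, once the unit multiplier is known.
}
```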
Setup
Clone the repository and install dependencies:
```bash
git clone https://github.com/vercel/ms.git
cd ms
npm install
```

Run the existing tests to verify setup:

```bash
npm test
```

Explore the codebase structure. The library is a single file with straightforward logic. Understand how parsing works before generating code.
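For orientation, the parsing approach amounts to capturing a numeric part and a unit with a regex, then multiplying by a unit constant. A simplified approximation follows (the library's actual regex and unit table are more extensive; check the real source for details):

```js
// Simplified approximation of ms-style parsing, not the library's actual code.
const UNITS = { ms: 1, s: 1000, m: 60000, h: 3600000, d: 86400000 };

function parse(str) {
  // Capture an optional sign, a decimal number, and an optional unit.
  const match = /^(-?\d*\.?\d+) *([a-z]+)?$/i.exec(str);
  if (!match) return undefined;
  const n = parseFloat(match[1]);
  const unit = (match[2] || 'ms').toLowerCase();
  return unit in UNITS ? n * UNITS[unit] : undefined;
}
```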
The experiment
This exercise has three phases: generate code that contains errors, practice validation and feedback, and iterate to working code.
Phase 1: Generate flawed code
Ask an agent to implement strict mode without providing extensive guidance. The goal is to produce code that works partially but contains the kinds of errors covered in this module.
Prompt:
```
Add a strict mode option to this ms library. When strict mode is enabled,
the library should reject ambiguous inputs instead of guessing.

Implement:
- ms("1d", { strict: true }) - valid, returns milliseconds
- ms("bad input", { strict: true }) - throws Error instead of returning undefined
- Handle edge cases strictly: empty strings, overflow, malformed numbers

Add the strict option and update the parsing logic.
```

This prompt is deliberately underspecified. It does not define what "ambiguous" means, does not specify all edge cases, and does not reference existing code patterns. The agent will make assumptions. Some will be wrong.
Before reviewing the output, write down your predictions:
- What error categories from this module do you expect to appear?
- Which edge cases will the agent likely miss?
- What validation issues will you look for first?
Phase 2: Systematic validation
Review the generated code using the validation hierarchy from this module. Do not run tests yet. Catch what code review catches.
Step 1: Logic and correctness scan
Read through the parsing logic. Check for:
- Off-by-one errors in string slicing
- Incorrect regex patterns that match too much or too little
- Type coercion issues (string to number conversions)
- Missing cases in conditional logic
- Operations in wrong order
Use the seven error categories: conditional errors, garbage output, math/logic errors, formatting errors, operation order errors, API misuse, index errors. Note which categories you find.
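One coercion pitfall is worth having in mind while scanning: JavaScript's numeric conversions fail in different ways, and parseFloat in particular truncates silently, which is exactly how malformed decimals slip through:

```js
parseFloat('1.2.3'); // 1.2 - parses the leading number, silently drops '.3'
Number('1.2.3');     // NaN - rejects the whole string
parseFloat('1d');    // 1   - stops at the first non-numeric character
Number('');          // 0   - empty string coerces to zero, not NaN
```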
Step 2: Edge case audit
Test mentally against edge cases:
```js
// Empty and whitespace
ms("", { strict: true }) // should throw
ms(" ", { strict: true }) // should throw
ms(" 1d ", { strict: true }) // should throw (untrimmed)

// Malformed numbers
ms("1.2.3d", { strict: true }) // should throw (multiple decimals)
ms(".5d", { strict: true }) // ambiguous - what should happen?
ms("1.d", { strict: true }) // ambiguous - what should happen?
ms("d", { strict: true }) // should throw (no number)

// Boundary conditions
ms("9007199254740992ms", { strict: true }) // exceeds MAX_SAFE_INTEGER
ms("-1d", { strict: true }) // negative - valid or not?
ms("0d", { strict: true }) // zero - edge case

// Format ambiguity
ms("1 d", { strict: true }) // space between number and unit
ms("1D", { strict: true }) // case sensitivity
ms("1day", { strict: true }) // full word vs abbreviation
```

For each case, determine:
- What does the generated code do?
- What should it do?
- Was this specified in the prompt?
Note any specification misunderstanding errors, the dominant category from this module.
Step 3: Security check
For a parsing library, security concerns include:
- ReDoS (Regular Expression Denial of Service) from backtracking
- Prototype pollution if options objects are mishandled
- Type confusion from unexpected inputs
Examine any regex patterns added. Check how the options parameter is processed.
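As a concrete illustration of the ReDoS concern (a classic textbook pattern, not necessarily what the agent produced): nested quantifiers backtrack exponentially on inputs that almost match.

```js
// Vulnerable: the nested quantifier (\d+)+ backtracks exponentially
// when the input almost matches but fails at the end.
const vulnerable = /^(\d+)+d$/;
// vulnerable.test('1'.repeat(30) + 'x') can hang for seconds.

// Safer: a single quantifier with no nesting runs in linear time.
const safer = /^\d+d$/;
```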
Record your findings:
| Issue | Category | Severity | Line/Location |
|---|---|---|---|
Aim for 3-5 issues before proceeding. If the generated code is flawless (unlikely), the prompt was too specific.
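An entry in the table might look like this (values hypothetical, echoing the example used later in this exercise):

| Issue | Category | Severity | Line/Location |
|---|---|---|---|
| Regex accepts multiple decimal points | Math/logic error | High | src/index.js line 23 |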
Phase 3: Feedback iteration
Now practice feedback patterns. The goal is correcting issues efficiently with minimal iterations.
Round 1: Specific feedback
Choose the highest-severity issue from your findings. Provide feedback using the complete feedback format:
```
Problem: [what's wrong]
Location: [file and line]
Current behavior: [what the code does]
Expected behavior: [what it should do]
Verification: [how to confirm the fix]
```

Example:

```
Problem: The regex pattern matches inputs with multiple decimal points.
Location: src/index.js line 23
Current behavior: "1.2.3d" parses as 1.2 days, ignoring ".3"
Expected behavior: Strict mode should throw Error for malformed numbers
Verification: ms("1.2.3d", { strict: true }) should throw
```

Observe:
- Did the agent fix the issue in one iteration?
- Did the fix introduce new problems?
- Did the fix follow existing code patterns?
Round 2: Batched feedback
List multiple issues in a single message:
```
Three issues to fix:

1. Empty string handling: ms("", { strict: true }) returns undefined.
   Should throw Error with message "Empty input not allowed in strict mode"

2. Whitespace handling: ms(" 1d ", { strict: true }) parses successfully.
   Should throw Error - require trimmed input in strict mode

3. Overflow check missing: ms("9999999999999999d", { strict: true }) overflows.
   Should throw Error when result exceeds Number.MAX_SAFE_INTEGER
```

Observe:
- Did batching reduce total iterations compared to one-at-a-time?
- Were all issues addressed, or did some get dropped?
- Did fixing multiple issues introduce conflicts?
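For the overflow item specifically, check how the fix detects the boundary. Comparing after the multiplication means comparing an already-imprecise product; testing the operands first is more robust. A minimal sketch, assuming `n` and a `UNITS` multiplier table from the parsing step (names hypothetical):

```js
// Hypothetical guard: compare before multiplying, so the check does not
// depend on a product that may already have lost precision.
if (Math.abs(n) > Number.MAX_SAFE_INTEGER / UNITS[unit]) {
  throw new Error(`Value exceeds Number.MAX_SAFE_INTEGER: "${str}"`);
}
```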
Round 3: Show don't tell
For any remaining issues, provide example code:
```
The error message format is inconsistent with the library style.

Current:
throw new Error("invalid input")

Expected (matches existing patterns):
throw new Error(`Invalid value: "${val}"`)

Follow the existing error format in the library.
```

Observe:
- Did the example communicate faster than explanation?
- Did the agent match the pattern exactly or approximately?
Phase 4: Verification
Run the test suite:
```bash
npm test
```

If tests pass but you know edge cases are unhandled, add tests:
```js
// Add to test file
describe('strict mode', () => {
  it('rejects empty strings', () => {
    expect(() => ms('', { strict: true })).toThrow();
  });

  it('rejects malformed decimals', () => {
    expect(() => ms('1.2.3d', { strict: true })).toThrow();
  });

  it('rejects overflow values', () => {
    expect(() => ms('9999999999999999d', { strict: true })).toThrow();
  });

  it('rejects untrimmed input', () => {
    expect(() => ms(' 1d ', { strict: true })).toThrow();
  });

  it('accepts valid strict input', () => {
    expect(ms('1d', { strict: true })).toBe(86400000);
  });
});
```

The combination of verification (tests) and feedback (corrections) should produce working code. If the session has derailed (more than 5 correction iterations without progress), recognize the fix loop pattern and start fresh.
Analysis
After completing the exercise, record your observations.
Validation effectiveness
| Question | Your Observation |
|---|---|
| How many issues did code review catch before running tests? | |
| What categories of errors appeared most? | |
| Were specification misunderstanding errors dominant? | |
| Did any security-relevant issues appear? | |
Feedback effectiveness
| Feedback approach | Iterations to fix |
|---|---|
| Vague feedback (if you tried it) | |
| Specific feedback with location | |
| Batched multiple issues | |
| Example-based (show don't tell) | |
Overall patterns
- Detection accuracy: How many issues did you find versus how many existed? Did running tests reveal issues you missed?
- Feedback efficiency: Which feedback format required the fewest iterations?
- Session health: Did the session stay productive, or did you observe derailing patterns? What signals indicated session state?
- Trust calibration: After this exercise, how would you calibrate trust for similar parsing tasks? What verification would you require before accepting such code?
What this exercise teaches
The validation-feedback loop is the core skill of output review. This exercise practices each component:
Systematic validation catches issues before they become problems. The hierarchy (logic, security, maintainability, performance) prioritizes review effort. The error category taxonomy helps identify what you're looking at.
Feedback quality determines iteration count. Specific feedback with location, current behavior, expected behavior, and verification closes issues in one round. Vague feedback ("fix the bug") multiplies iterations.
Verification confirms correctness. Tests convert subjective review into objective confirmation. Without verification, review remains opinion.
The complete loop compounds. Better validation finds more issues. Better feedback fixes issues faster. Better verification confirms fixes work. Each component strengthens the others.
Variations
Variation A: Compare feedback approaches
Intentionally try vague feedback first ("the parsing doesn't handle edge cases"). Count iterations. Then start fresh and try specific feedback. Compare quantitatively.
Variation B: Skip code review
Go straight to running tests without reviewing code first. Note what tests catch versus what they miss. Note the debugging time when tests fail without prior understanding of the code. This variation makes the value of code review concrete.
Variation C: Different feature
Instead of strict mode, implement:
- Microsecond precision support: ms("1.5ms") returns 1.5
- Relative time output: ms(86400000, { relative: true }) returns "1 day ago"
Both features introduce floating-point precision issues and output formatting edge cases.
Variation D: Deliberate derailing
After generating initial code, provide contradictory feedback:
```
Actually, strict mode should allow empty strings.
Wait, no, it should reject them.
Actually, make it configurable.
```

Observe how quickly the session degrades. Practice recognizing derailing patterns in real time.
Completion
The exercise is complete when:
- Initial code has been generated with minimal prompting
- At least 3 issues have been identified through code review
- Feedback has been provided and iterated until tests pass
- The analysis section has been completed
The goal is not perfect code on the first try. The goal is developing fluency with the validation-feedback loop that makes "almost right" code actually right.