Applied Intelligence
Module 8: Code Review and Testing

Exercise: AI-Assisted Code Review Workflow

This exercise walks through the full AI-assisted code review workflow: configuring review guidelines, running AI pre-review, spotting AI-generated code patterns, and combining AI analysis with human judgment.

Overview

HTTPie is a command-line HTTP client. The codebase is Python with good test coverage, and the contribution guidelines require tests for new features: useful constraints for practicing review workflows.

The scenario: reviewing a pull request that contains AI-generated code. You configure review guidelines, run AI pre-review, identify red flags, and make a merge decision.

The scenario

A teammate used Claude Code to implement a new feature for HTTPie: automatic retry logic for failed requests. The PR is ready for review. Your task is to apply the AI-assisted code review workflow from this module.

You will:

  1. Configure review guidelines for the project
  2. Use AI tools to pre-review the changes
  3. Identify patterns specific to AI-generated code
  4. Conduct human review focusing on what AI cannot validate
  5. Document your findings and make a merge recommendation

Setup

Clone the repository:

git clone https://github.com/httpie/cli.git
cd cli

Create a Python virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

Verify the test suite runs:

pytest tests/ -x -q --tb=short 2>/dev/null | head -20

The full test suite takes several minutes. Running a subset confirms the environment works.

Explore the project structure:

ls -la httpie/
cat CONTRIBUTING.md | head -50

Phase 1: Configure review guidelines

Before reviewing, configure AI tools with project-specific guidelines. This gives the AI reviewer context and forces you to think through conventions upfront.

Create CLAUDE.md

Create a CLAUDE.md file in the repository root with review guidelines:

# HTTPie Development Guidelines

## Project Context
HTTPie is a CLI HTTP client focused on usability and intuitive command-line syntax.
All changes must maintain backwards compatibility with existing workflows.

## Code Style
- Follow PEP 8 with 79-character line limit
- Use type hints for function signatures
- Docstrings required for public functions (Google style)
- No print statements in library code (use logging)

## Testing Requirements
- All new features require tests
- Maintain existing test coverage (currently ~90%)
- Tests must run in isolation without network dependencies
- Use pytest fixtures from conftest.py

## Security Considerations
- Never log or display authentication credentials
- Sanitize URLs before logging (remove query parameters with sensitive data)
- Validate all user input before shell operations
- No eval() or exec() with user-provided data

## Review Focus Areas
When reviewing code, prioritize:
1. Security implications (credential handling, input validation)
2. Error handling and edge cases
3. Test coverage for new code paths
4. API compatibility with existing workflows
5. Performance implications for large requests/responses

## Common Patterns
- Use `httpie.output.streams` for output formatting
- Request/response handling flows through `httpie.core`
- CLI argument parsing uses `httpie.cli.definition`
- Authentication plugins follow `httpie.plugins.base` interface

Create AGENTS.md for Codex

If using Codex, create an AGENTS.md file:

## Review Guidelines

### Code Quality
- Verify PEP 8 compliance
- Check type hints on all function signatures
- Ensure error messages are user-friendly

### Testing
- Flag PRs without tests as incomplete
- Check that tests actually exercise new code paths
- Watch for over-mocking (tests should validate real behavior)

### Security
- Mark credential exposure as P0
- Flag any use of eval/exec as P0
- Check input validation at CLI boundaries

Phase 2: Create a PR to review

For this exercise, you will create a simulated PR with AI-generated code. This lets you practice the review workflow without needing an actual teammate.

Generate the feature with Claude Code

Start a Claude Code session:

claude

Prompt the agent to generate the retry feature:

Add automatic retry logic for failed HTTP requests to HTTPie.

Requirements:
- Retry on connection errors and 5xx responses
- Configurable retry count (default 3)
- Exponential backoff between retries
- New CLI flag: --retry and --retry-max
- Add tests for the retry behavior

Create the implementation following existing patterns in this codebase.

Let the agent generate the implementation. Do not review the output yet; that comes in the next phase.

Commit and create a branch

Once generation completes:

git checkout -b feature/auto-retry
git add .
git commit -m "Add automatic retry logic for failed requests"

If the agent produced errors or incomplete code, that is useful for the exercise. Real AI-generated PRs often have issues. Proceed with whatever was generated.

Phase 3: AI pre-review

Run AI tools to pre-review the changes before you look at them. This is the "AI reviews AI" workflow from section 8.5.

Claude Code review

In a new Claude Code session (to avoid the circular validation problem):

claude

Request a code review:

Review the changes on the feature/auto-retry branch compared to main.

Focus on:
1. Security issues
2. Logic errors and edge cases
3. Test coverage gaps
4. Code style consistency with the existing codebase
5. Any patterns typical of AI-generated code

Provide a structured review with severity ratings for each finding.

Document AI findings

Record the AI review findings:

| Finding | Severity | Description | AI Confidence |
|---------|----------|-------------|---------------|
|         |          |             |               |

Pay attention to findings where the AI hedges; those need extra human scrutiny.

Run automated checks

Complement AI review with deterministic tools:

# Type checking
mypy httpie/ --ignore-missing-imports 2>/dev/null | head -20

# Linting
ruff check httpie/ --select=E,W,F 2>/dev/null | head -20

# Test the new code
pytest tests/ -x -q --tb=short -k retry 2>/dev/null

Document any failures:

| Check  | Result | Issue |
|--------|--------|-------|
| mypy   |        |       |
| ruff   |        |       |
| pytest |        |       |

Phase 4: Human review for AI-specific patterns

Now review the changes yourself, focusing on the patterns AI-generated code tends to exhibit. The checklist from section 8.4 guides what to look for.

Baseline verification

Check for immediate red flags:

# Check for hallucinated imports
git diff main --name-only
git diff main -- httpie/ | grep "^+import\|^+from" | head -20

# Check for debugging artifacts
git diff main | grep -E "print\(|console\.log|debugger|TODO|FIXME" | head -10

# Check for hardcoded values
git diff main | grep -E "api_key|secret|password|token" -i | head -10

Record findings:

  • All imports reference real packages
  • No debugging artifacts in production code
  • No hardcoded secrets or credentials
  • No unused imports or dead code

Security verification

Review security-sensitive areas manually:

# Find the retry implementation
git diff main -- httpie/

# Look for input validation
git diff main | grep -A5 -B5 "retry" | head -50

Security checklist (see the sketch after this list):

  • Retry count validated (not unbounded)
  • Backoff delay has reasonable limits
  • No credential exposure in retry logging
  • Error messages do not leak sensitive request data
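
For calibration on those last items, here is a minimal sketch of what a bounded, non-leaking implementation looks like. This is not HTTPie's actual code: `retry_request`, `send`, and the constants are illustrative stand-ins for whatever the agent generated.

import logging
import time
from urllib.parse import urlsplit, urlunsplit

log = logging.getLogger(__name__)

MAX_RETRIES = 10            # hypothetical hard ceiling on --retry-max
MAX_BACKOFF_SECONDS = 30.0  # cap so exponential growth stays bounded


def sanitized(url: str) -> str:
    """Keep only scheme, host, and path when logging a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.hostname or '', parts.path, '', ''))


def backoff_delay(attempt: int, base: float = 0.5) -> float:
    """Exponential backoff (base * 2^attempt), capped at MAX_BACKOFF_SECONDS."""
    return min(base * (2 ** attempt), MAX_BACKOFF_SECONDS)


def retry_request(send, url: str, retries: int = 3):
    """Call send(url), retrying on connection errors and 5xx responses."""
    retries = max(0, min(retries, MAX_RETRIES))  # validate; never trust raw CLI input
    for attempt in range(retries + 1):
        try:
            response = send(url)
            if response.status_code < 500 or attempt == retries:
                return response
        except ConnectionError:
            if attempt == retries:
                raise
        delay = backoff_delay(attempt)
        log.warning('Retrying %s in %.1fs (attempt %d of %d)',
                    sanitized(url), delay, attempt + 1, retries)
        time.sleep(delay)

If the generated diff has no equivalent of the clamp on the retry count, the cap on the backoff delay, or the URL sanitization in log messages, flag it.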

Test quality assessment

Examine the generated tests:

# Find test files for retry
git diff main -- tests/

# Check test coverage
pytest tests/ -x -q --tb=short -k retry --cov=httpie --cov-report=term-missing 2>/dev/null | tail -30

Test quality checklist (see the sketch after this list):

  • Tests exercise actual retry behavior (not just mocks)
  • Edge cases covered (max retries, immediate success, permanent failure)
  • No over-mocking (tests validate real httpie behavior)
  • Test assertions are meaningful (not just "no exception")
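
The difference between an over-mocked test and one that validates real behavior is easiest to see side by side. A sketch under the same assumptions as above (`retry_sketch` and `retry_request` are hypothetical names, not HTTPie's real modules or fixtures):

from unittest import mock

# Hypothetical import path; point it at wherever the generated helper lives.
from retry_sketch import retry_request


def test_retry_over_mocked():
    """Weak: the happy path with everything mocked; the retry path never runs."""
    send = mock.Mock(return_value=mock.Mock(status_code=200))
    response = retry_request(send, 'http://example.org', retries=3)
    assert response.status_code == 200  # passes even if the retry logic is broken


def test_retry_exercises_real_behavior():
    """Stronger: only the transport is faked, so the retry loop actually executes."""
    replies = [ConnectionError('boom'), mock.Mock(status_code=503),
               mock.Mock(status_code=200)]

    def flaky_send(url):
        reply = replies.pop(0)
        if isinstance(reply, Exception):
            raise reply
        return reply

    with mock.patch('time.sleep'):  # skip real backoff delays to keep the test fast
        response = retry_request(flaky_send, 'http://example.org', retries=3)

    assert response.status_code == 200
    assert not replies  # all three attempts were consumed

The first test would still pass if the retry loop were deleted; the second fails unless the loop genuinely survives a connection error and a 5xx response before succeeding.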

Architectural alignment

Check that the implementation follows existing patterns:

# Compare to existing features for patterns
cat httpie/core.py | head -50
cat httpie/cli/definition.py | head -50

Architecture checklist:

  • Retry logic integrates with existing request flow
  • CLI arguments follow existing naming conventions
  • Error handling matches project patterns
  • No duplicate functionality with existing code

Phase 5: Synthesize findings

Combine AI pre-review findings with what you found manually.

Issue categorization

Categorize all findings by severity:

P0 - Blocking (must fix before merge):

  • List security issues, logic errors that break functionality

P1 - High (should fix before merge):

  • List test gaps, significant style violations

P2 - Medium (fix in follow-up):

  • List minor style issues, documentation gaps

P3 - Low (optional improvements):

  • List suggestions, refactoring opportunities

Compare AI vs human findings

| Finding | Found by AI | Found by Human | Notes |
|---------|-------------|----------------|-------|
|         |             |                |       |

Analysis questions:

  • What did AI catch that human might have missed?
  • What did human catch that AI missed?
  • Were there false positives from AI review?
  • Did AI and human agree on severity?

Merge recommendation

Based on your review, make a recommendation:

  • Approve - Ready to merge as-is
  • Approve with comments - Minor issues, can merge after addressing
  • Request changes - Blocking issues require fixes
  • Reject - Fundamental problems, needs rework

Document your rationale:

## Review Summary

**Recommendation:** [Your choice]

**Rationale:**
[Explain why, referencing specific findings]

**Required changes before merge:**
1. [List blocking issues]

**Suggested improvements:**
1. [List non-blocking suggestions]

Phase 6: Fix and re-review

If you found blocking issues, practice the fix-and-review cycle.

Request fixes from the agent

In Claude Code:

Address the following review feedback on the retry feature:

1. [List your P0/P1 findings]

Make the necessary changes and ensure tests pass.

Verify fixes

After the agent makes changes:

# Run tests
pytest tests/ -x -q --tb=short -k retry

# Re-run linting
ruff check httpie/ --select=E,W,F 2>/dev/null | head -10

# Commit fixes
git add .
git commit -m "Address review feedback on retry feature"

Second review pass

Conduct a focused re-review:

Review the latest changes addressing the review feedback.
Verify that the issues were fixed correctly without introducing new problems.

Document whether issues were resolved:

| Original Issue | Status (Fixed / Partially fixed / Not fixed) | Notes |
|----------------|----------------------------------------------|-------|
|                |                                              |       |

Debrief

Workflow effectiveness

| Question | Your Answer |
|----------|-------------|
| How long did AI pre-review take? | |
| How long did human review take? | |
| What percentage of issues did AI find? | |
| Were there false positives from AI? | |
| Did the CLAUDE.md guidelines help? | |

AI-generated code patterns

Which AI-specific patterns did you observe in this PR?

  • Package hallucinations
  • Missing error handling
  • Over-mocked tests
  • Excessive code duplication
  • Debugging artifacts
  • Inconsistent style with existing code
  • Missing edge case handling
  • Other: ____________

Review process improvements

  • What would you change about the CLAUDE.md guidelines?
  • Which automated checks should run before human review?
  • How would you configure CI to catch these issues automatically?

Success criteria

  • CLAUDE.md and AGENTS.md created with project-specific guidelines
  • AI-generated feature created (retry logic)
  • AI pre-review completed with documented findings
  • Automated checks run (mypy, ruff, pytest)
  • Human review completed using AI-specific checklist
  • Findings categorized by severity (P0-P3)
  • AI vs human findings compared
  • Merge recommendation documented with rationale
  • Fix cycle completed (if issues found)
  • Debrief section completed

Variations

Variation A: Different reviewer model

Use a different AI model for review than was used for generation. If Claude Code generated the feature, use Codex for review (or vice versa). Compare findings between models. Do they catch different things?

Variation B: Security-focused review

Generate a feature with intentional security implications:

Add a feature to save request history to a local SQLite database.
Include authentication headers in the saved history for replay.

Conduct a security-focused review. Can you catch the credential storage issue? How well does AI spot security problems?
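
For reference, the credential problem usually appears as something like the following, a hypothetical sketch (not HTTPie code) of the pattern the review should catch:

import sqlite3

def save_history(db_path: str, method: str, url: str, headers: dict) -> None:
    """Red flag: the full header dict, including Authorization, is written to disk."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS history (method TEXT, url TEXT, headers TEXT)')
    conn.execute('INSERT INTO history VALUES (?, ?, ?)',
                 (method, url, str(headers)))  # bearer tokens persist in plaintext
    conn.commit()
    conn.close()

A reviewer (or a well-configured AI reviewer) should insist that Authorization, Cookie, and similar headers are redacted or dropped before anything is persisted.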

Variation C: Test-first review

Before reviewing the implementation, review only the tests:

Review only the test files for the retry feature.
Do these tests adequately validate the expected behavior?
What edge cases are missing?

Then review the implementation. Does reviewing tests first change what you notice?

Variation D: Tiered automation

Configure a CI pipeline that:

  1. Runs linting and type checking
  2. Runs AI pre-review with Claude Code Action
  3. Blocks merge until human approval

Use the GitHub Actions examples from section 8.10. Test the pipeline with another generated PR.

Variation E: Your own codebase

Run this exercise on a project you maintain:

  1. Create CLAUDE.md with your project's guidelines
  2. Generate a feature with an AI agent
  3. Conduct the full review workflow
  4. Compare to how you usually review code

What this exercise teaches

AI pre-review handles the mechanical parts of code review. Syntax errors, style violations, obvious bugs: AI catches these quickly. The 1.7x issue rate in AI-generated code means there's more to catch, but AI review scales.

Human review is still necessary for context. Does this implementation align with business requirements? Does it fit the project's architecture? Will it create maintenance burden? AI cannot answer these.

The workflow is layered, not sequential. AI review and human review happen in parallel, catching different categories of issues. Neither replaces the other.

Configuration matters more than you'd expect. The CLAUDE.md and AGENTS.md files shape AI review quality. Generic review prompts produce generic findings. Project-specific guidelines teach the AI what matters in your codebase.

Circular validation is real. Using the same AI to write and review code creates false confidence. Separate the agents, or use different models, to get independent review.

AI-generated code has different failure modes than human code. Package hallucinations, over-mocking, missing edge cases: these require explicit review attention. The checklist from section 8.4 targets these patterns.

This exercise is one data point. After reviewing multiple AI-generated PRs, you develop a sense for which issues AI catches reliably and which slip through. That's the judgment this module builds.
