Exercise: AI-Assisted Code Review Workflow
This exercise walks through the full AI-assisted code review workflow: configuring review guidelines, running AI pre-review, spotting AI-generated code patterns, and combining AI analysis with human judgment.
Overview
HTTPie is a command-line HTTP client. The codebase is Python with good test coverage, and the contribution guidelines require tests for new features. These are useful constraints for practicing review workflows.
The scenario: reviewing a pull request that contains AI-generated code. You configure review guidelines, run AI pre-review, identify red flags, and make a merge decision.
The scenario
A teammate used Claude Code to implement a new feature for HTTPie: automatic retry logic for failed requests. The PR is ready for review. Your task is to apply the AI-assisted code review workflow from this module.
You will:
- Configure review guidelines for the project
- Use AI tools to pre-review the changes
- Identify patterns specific to AI-generated code
- Conduct human review focusing on what AI cannot validate
- Document your findings and make a merge recommendation
Setup
Clone the repository:
git clone https://github.com/httpie/cli.git
cd cli
Create a Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
Verify the test suite runs:
pytest tests/ -x -q --tb=short 2>/dev/null | head -20
The full test suite takes several minutes. Running a subset confirms the environment works.
Explore the project structure:
ls -la httpie/
cat CONTRIBUTING.md | head -50
Phase 1: Configure review guidelines
Before reviewing, configure AI tools with project-specific guidelines. This gives the AI reviewer context and forces you to think through conventions upfront.
Create CLAUDE.md
Create a CLAUDE.md file in the repository root with review guidelines:
# HTTPie Development Guidelines
## Project Context
HTTPie is a CLI HTTP client focused on usability and intuitive command-line syntax.
All changes must maintain backwards compatibility with existing workflows.
## Code Style
- Follow PEP 8 with 79-character line limit
- Use type hints for function signatures
- Docstrings required for public functions (Google style)
- No print statements in library code (use logging)
## Testing Requirements
- All new features require tests
- Maintain existing test coverage (currently ~90%)
- Tests must run in isolation without network dependencies
- Use pytest fixtures from conftest.py
## Security Considerations
- Never log or display authentication credentials
- Sanitize URLs before logging (remove query parameters with sensitive data)
- Validate all user input before shell operations
- No eval() or exec() with user-provided data
## Review Focus Areas
When reviewing code, prioritize:
1. Security implications (credential handling, input validation)
2. Error handling and edge cases
3. Test coverage for new code paths
4. API compatibility with existing workflows
5. Performance implications for large requests/responses
## Common Patterns
- Use `httpie.output.streams` for output formatting
- Request/response handling flows through `httpie.core`
- CLI argument parsing uses `httpie.cli.definition`
- Authentication plugins follow the `httpie.plugins.base` interface
Create AGENTS.md for Codex
If using Codex, create an AGENTS.md file:
## Review Guidelines
### Code Quality
- Verify PEP 8 compliance
- Check type hints on all function signatures
- Ensure error messages are user-friendly
### Testing
- Flag PRs without tests as incomplete
- Check that tests actually exercise new code paths
- Watch for over-mocking (tests should validate real behavior)
### Security
- Mark credential exposure as P0
- Flag any use of eval/exec as P0
- Check input validation at CLI boundaries
Phase 2: Create a PR to review
For this exercise, you will create a simulated PR with AI-generated code. This lets you practice the review workflow without needing an actual teammate.
Generate the feature with Claude Code
Start a Claude Code session:
claude
Prompt the agent to generate the retry feature:
Add automatic retry logic for failed HTTP requests to HTTPie.
Requirements:
- Retry on connection errors and 5xx responses
- Configurable retry count (default 3)
- Exponential backoff between retries
- New CLI flags: --retry and --retry-max
- Add tests for the retry behavior
Create the implementation following existing patterns in this codebase.
Let the agent generate the implementation. Do not review the output yet; that comes in the next phase.
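Before moving on, it helps to fix in mind what the requirements imply: exponential backoff with the defaults above means roughly 1, 2, and 4 seconds between attempts, with a cap so a large --retry-max cannot stall a request indefinitely. The following is a minimal sketch of that shape for reference during later phases, using hypothetical names rather than HTTPie's actual internals:
import time

import requests

# Hypothetical sketch of the requested behavior, not HTTPie's real API:
# retry on connection errors and 5xx responses with capped exponential backoff.
def request_with_retry(method, url, retries=3, base_delay=1.0, max_delay=30.0, **kwargs):
    for attempt in range(retries + 1):
        try:
            response = requests.request(method, url, **kwargs)
            if response.status_code < 500 or attempt == retries:
                return response
        except requests.ConnectionError:
            if attempt == retries:
                raise
        # 1s, 2s, 4s, ... between attempts; the cap keeps backoff bounded
        time.sleep(min(base_delay * (2 ** attempt), max_delay))
Whatever the agent actually produces will differ; the point is to know what bounded retry behavior looks like before you review it.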
Commit and create a branch
Once generation completes:
git checkout -b feature/auto-retry
git add .
git commit -m "Add automatic retry logic for failed requests"If the agent produced errors or incomplete code, that is useful for the exercise. Real AI-generated PRs often have issues. Proceed with whatever was generated.
Phase 3: AI pre-review
Run AI tools to pre-review the changes before you look at them. This is the "AI reviews AI" workflow from section 8.5.
Claude Code review
In a new Claude Code session (to avoid the circular validation problem):
claude
Request a code review:
Review the changes on the feature/auto-retry branch compared to main.
Focus on:
1. Security issues
2. Logic errors and edge cases
3. Test coverage gaps
4. Code style consistency with the existing codebase
5. Any patterns typical of AI-generated code
Provide a structured review with severity ratings for each finding.
Document AI findings
Record the AI review findings:
| Finding | Severity | Description | AI Confidence |
|---|---|---|---|
Pay attention to findings where the AI hedges; those need extra human scrutiny.
Run automated checks
Complement AI review with deterministic tools:
# Type checking
mypy httpie/ --ignore-missing-imports 2>/dev/null | head -20
# Linting
ruff check httpie/ --select=E,W,F 2>/dev/null | head -20
# Test the new code
pytest tests/ -x -q --tb=short -k retry 2>/dev/null
Document any failures:
| Check | Result | Issue |
|---|---|---|
| mypy | | |
| ruff | | |
| pytest | | |
Phase 4: Human review for AI-specific patterns
Now conduct your own review, focusing on the patterns AI-generated code tends to exhibit. The checklist from section 8.4 guides what to look for.
Baseline verification
Check for immediate red flags:
# Check for hallucinated imports
git diff main --name-only
git diff main -- httpie/ | grep "^+import\|^+from" | head -20
# Check for debugging artifacts
git diff main | grep -E "print\(|console\.log|debugger|TODO|FIXME" | head -10
# Check for hardcoded values
git diff main | grep -E "api_key|secret|password|token" -i | head -10Record findings:
- All imports reference real packages
- No debugging artifacts in production code
- No hardcoded secrets or credentials
- No unused imports or dead code
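To go beyond eyeballing the grep output for the first checklist item, a small helper can confirm that each newly imported top-level module actually resolves in the active environment. This is a rough sketch, assuming you run it from the repository root with the virtual environment active:
# check_imports.py - rough helper for spotting package hallucinations
import importlib.util
import subprocess

diff = subprocess.run(
    ["git", "diff", "main", "--", "httpie/"],
    capture_output=True, text=True, check=True,
).stdout

modules = set()
for line in diff.splitlines():
    if line.startswith("+import ") or line.startswith("+from "):
        # "+import requests.adapters" -> "requests"; "+from httpie.cli import x" -> "httpie"
        name = line.split()[1].split(".")[0].rstrip(",")
        if name:  # skip relative imports such as "from .base import ..."
            modules.add(name)

for name in sorted(modules):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'ok' if found else 'NOT FOUND - possible hallucination'}")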
Security verification
Review security-sensitive areas manually:
# Find the retry implementation
git diff main -- httpie/
# Look for input validation
git diff main | grep -A5 -B5 "retry" | head -50
Security checklist (a sketch of the redaction to look for follows this list):
- Retry count validated (not unbounded)
- Backoff delay has reasonable limits
- No credential exposure in retry logging
- Error messages do not leak sensitive request data
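The logging items are where AI review tends to be most superficial. When you trace any logging the retry path adds, look for redaction before anything is written out. A hedged sketch of the kind of helper a reviewer would want to see, using hypothetical names rather than HTTPie's actual utilities:
from urllib.parse import urlsplit, urlunsplit

SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}

def redact_for_logging(url: str, headers: dict) -> tuple[str, dict]:
    """Strip query strings and mask credential-bearing headers before logging."""
    parts = urlsplit(url)
    safe_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    safe_headers = {
        key: ("<redacted>" if key.lower() in SENSITIVE_HEADERS else value)
        for key, value in headers.items()
    }
    return safe_url, safe_headers
If the generated retry code logs the raw request on every failed attempt, flag it even when no test fails.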
Test quality assessment
Examine the generated tests:
# Find test files for retry
git diff main -- tests/
# Check test coverage
pytest tests/ -x -q --tb=short -k retry --cov=httpie --cov-report=term-missing 2>/dev/null | tail -30
Test quality checklist (a sketch contrasting real and over-mocked retry tests follows this list):
- Tests exercise actual retry behavior (not just mocks)
- Edge cases covered (max retries, immediate success, permanent failure)
- No over-mocking (tests validate real httpie behavior)
- Test assertions are meaningful (not just "no exception")
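To make the over-mocking item concrete: a test that patches the retry helper itself only proves the mock was called, while a test that stubs the transport layer and counts attempts validates real behavior. A hedged sketch, assuming a hypothetical request_with_retry helper like the one sketched in Phase 2; adapt the import and patch target to whatever the agent actually generated:
from unittest import mock

import requests

from myproject.retry import request_with_retry  # hypothetical module path

def test_retries_connection_error_then_5xx_then_succeeds():
    """Stub only the transport layer and assert the retry loop really ran."""
    transport = mock.patch(
        "requests.request",
        side_effect=[
            requests.ConnectionError(),  # attempt 1: network failure
            mock.Mock(status_code=503),  # attempt 2: retryable 5xx
            mock.Mock(status_code=200),  # attempt 3: success
        ],
    )
    with transport as mocked:
        response = request_with_retry("GET", "https://example.org", retries=3, base_delay=0)
    assert response.status_code == 200
    assert mocked.call_count == 3  # the backoff loop actually executed
The anti-pattern to flag is the inverse: patching request_with_retry itself and asserting it was called, which validates nothing about retry behavior.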
Architectural alignment
Check that the implementation follows existing patterns:
# Compare to existing features for patterns
cat httpie/core.py | head -50
cat httpie/cli/definition.py | head -50
Architecture checklist:
- Retry logic integrates with existing request flow
- CLI arguments follow existing naming conventions
- Error handling matches project patterns
- No duplicate functionality with existing code
Phase 5: Synthesize findings
Combine AI pre-review findings with what you found manually.
Issue categorization
Categorize all findings by severity:
P0 - Blocking (must fix before merge):
- List security issues, logic errors that break functionality
P1 - High (should fix before merge):
- List test gaps, significant style violations
P2 - Medium (fix in follow-up):
- List minor style issues, documentation gaps
P3 - Low (optional improvements):
- List suggestions, refactoring opportunities
Compare AI vs human findings
| Finding | Found by AI | Found by Human | Notes |
|---|---|---|---|
Analysis questions:
- What did AI catch that human might have missed?
- What did human catch that AI missed?
- Were there false positives from AI review?
- Did AI and human agree on severity?
Merge recommendation
Based on your review, make a recommendation:
- Approve - Ready to merge as-is
- Approve with comments - Minor issues, can merge after addressing
- Request changes - Blocking issues require fixes
- Reject - Fundamental problems, needs rework
Document your rationale:
## Review Summary
**Recommendation:** [Your choice]
**Rationale:**
[Explain why, referencing specific findings]
**Required changes before merge:**
1. [List blocking issues]
**Suggested improvements:**
1. [List non-blocking suggestions]
Phase 6: Fix and re-review
If you found blocking issues, practice the fix-and-review cycle.
Request fixes from the agent
In Claude Code:
Address the following review feedback on the retry feature:
1. [List your P0/P1 findings]
Make the necessary changes and ensure tests pass.
Verify fixes
After the agent makes changes:
# Run tests
pytest tests/ -x -q --tb=short -k retry
# Re-run linting
ruff check httpie/ --select=E,W,F 2>/dev/null | head -10
# Commit fixes
git add .
git commit -m "Address review feedback on retry feature"Second review pass
Conduct a focused re-review:
Review the latest changes addressing the review feedback.
Verify that the issues were fixed correctly without introducing new problems.
Document whether issues were resolved:
| Original Issue | Status | Notes |
|---|---|---|
| | Fixed / Partially fixed / Not fixed | |
Debrief
Workflow effectiveness
| Question | Your Answer |
|---|---|
| How long did AI pre-review take? | |
| How long did human review take? | |
| What percentage of issues did AI find? | |
| Were there false positives from AI? | |
| Did the CLAUDE.md guidelines help? |
AI-generated code patterns
Which AI-specific patterns did you observe in this PR?
- Package hallucinations
- Missing error handling
- Over-mocked tests
- Excessive code duplication
- Debugging artifacts
- Inconsistent style with existing code
- Missing edge case handling
- Other: ____________
Review process improvements
- What would you change about the CLAUDE.md guidelines?
- Which automated checks should run before human review?
- How would you configure CI to catch these issues automatically?
Success criteria
- CLAUDE.md and AGENTS.md created with project-specific guidelines
- AI-generated feature created (retry logic)
- AI pre-review completed with documented findings
- Automated checks run (mypy, ruff, pytest)
- Human review completed using AI-specific checklist
- Findings categorized by severity (P0-P3)
- AI vs human findings compared
- Merge recommendation documented with rationale
- Fix cycle completed (if issues found)
- Debrief section completed
Variations
Variation A: Different reviewer model
Use a different AI model for review than was used for generation. If Claude Code generated the feature, use Codex for review (or vice versa). Compare findings between models. Do they catch different things?
Variation B: Security-focused review
Generate a feature with intentional security implications:
Add a feature to save request history to a local SQLite database.
Include authentication headers in the saved history for replay.
Conduct a security-focused review. Can you catch the credential storage issue? How well does AI spot security problems?
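The P0 to catch is credential-bearing headers persisted in plaintext. In rough terms, the pattern to flag looks like the following sketch (hypothetical names, not real HTTPie code):
import sqlite3

def save_history(db_path, method, url, headers):
    # Red flag: Authorization/Cookie values end up in plaintext on disk.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history (method TEXT, url TEXT, headers TEXT)"
    )
    conn.execute("INSERT INTO history VALUES (?, ?, ?)", (method, url, str(headers)))
    conn.commit()
    conn.close()
A passing review would require headers to be dropped, redacted, or stored only behind an explicit opt-in; note whether the AI reviewer reaches the same conclusion.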
Variation C: Test-first review
Before reviewing the implementation, review only the tests:
Review only the test files for the retry feature.
Do these tests adequately validate the expected behavior?
What edge cases are missing?
Then review the implementation. Does reviewing tests first change what you notice?
Variation D: Tiered automation
Configure a CI pipeline that:
- Runs linting and type checking
- Runs AI pre-review with Claude Code Action
- Blocks merge until human approval
Use the GitHub Actions examples from section 8.10. Test the pipeline with another generated PR.
Variation E: Your own codebase
Run this exercise on a project you maintain:
- Create CLAUDE.md with your project's guidelines
- Generate a feature with an AI agent
- Conduct the full review workflow
- Compare to how you usually review code
What this exercise teaches
AI pre-review handles the mechanical parts of code review. Syntax errors, style violations, obvious bugs: AI catches these quickly. The 1.7x issue rate in AI-generated code means there's more to catch, but AI review scales.
Human review is still necessary for context. Does this implementation align with business requirements? Does it fit the project's architecture? Will it create maintenance burden? AI cannot answer these.
The workflow is layered, not sequential. AI review and human review happen in parallel, catching different categories of issues. Neither replaces the other.
Configuration matters more than you'd expect. The CLAUDE.md and AGENTS.md files shape AI review quality. Generic review prompts produce generic findings. Project-specific guidelines teach the AI what matters in your codebase.
Circular validation is real. Using the same AI to write and review code creates false confidence. Separate the agents, or use different models, to get independent review.
AI-generated code has different failure modes from human-written code. Package hallucinations, over-mocking, missing edge cases: these require explicit review attention. The checklist from section 8.4 targets these patterns.
This exercise is one data point. After reviewing multiple AI-generated PRs, you develop a sense for which issues AI catches reliably and which slip through. That's the judgment this module builds.