What Changes When AI Writes the Code
The review paradox
Code review exists to catch problems before they reach production. When AI generates code, the nature of those problems shifts. So must the review approach.
Here is the uncomfortable reality: CodeRabbit's December 2025 analysis of 470 open-source GitHub pull requests found AI-generated code contained an average of 10.83 issues per PR compared to 6.45 for human-written code, roughly 1.7 times as many. The distribution matters more than the raw count.
Traditional code review evolved to catch human mistakes: unclear naming, inconsistent formatting, logic errors born from fatigue or unfamiliarity. AI makes different mistakes. It produces syntactically correct code that compiles and often looks reasonable. But it hallucinates package names, misunderstands business context, introduces security vulnerabilities at rates 1.5 to 2.7 times higher than humans, and generates code that violates conventions the AI cannot see.
Module 7 established that all AI-generated code requires human review before merging. This module examines what that review should actually look like.
What AI changes about code review
The style nitpick becomes obsolete
For decades, code review has included low-level feedback: variable naming, indentation, bracket placement, comment formatting. These nitpicks served a purpose when human authors introduced such inconsistencies by hand. AI does not make these mistakes in the same way.
AI-generated code typically passes linting and formatting checks. It adheres to syntactic patterns it observed in training data. When style issues appear, they are inconsistencies with project-specific conventions the AI was not told about. That is a different class of problem than a human forgetting to run the formatter.
This shifts the reviewer's role. Time spent confirming brace placement is time not spent examining whether the authentication logic correctly handles edge cases.
Wire linting and formatting into CI so these checks never consume human review capacity. Tools like ESLint, Pylint, Prettier, and Black enforce consistency automatically. Reviewers who receive PRs where these checks have already passed can focus elsewhere.
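A minimal sketch of that wiring, in Python (the tool list, flags, and the `src` path are illustrative assumptions, not a prescribed setup):

```python
"""Pre-merge lint gate: run formatters and linters, fail CI on any finding."""
import subprocess
import sys

# Each entry pairs a label with a command. Black's --check flag reports
# violations without rewriting files; Pylint exits nonzero on any message.
CHECKS = [
    ("formatting (black)", ["black", "--check", "."]),
    ("linting (pylint)", ["pylint", "src"]),
]

def main() -> int:
    failed = False
    for name, cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"FAILED: {name}", file=sys.stderr)
            failed = True
    # A nonzero exit blocks the merge, so style never reaches a human reviewer.
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```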
Logic errors demand more attention
The CodeRabbit analysis found AI-generated code produces 75% more logic and correctness issues than human-written code. These are not syntax errors. The code compiles. The tests it generates often pass, because the tests were written by the same AI that wrote the code.
Logic errors in AI-generated code take characteristic forms.
Misunderstood requirements. The AI implements what was literally requested, not what was actually needed. A prompt asking for "user validation" might produce email format checking when the requirement was role-based access control.
Flawed control flow. Conditional logic that handles the happy path but fails on edge cases. Loops that iterate one too many or one too few times. Early returns that bypass necessary cleanup.
Silent failures. Error conditions caught but not properly handled. Exceptions logged but not surfaced. Fallback paths that mask problems rather than reporting them.
Type coercion surprises. In dynamically typed languages, AI-generated code may perform implicit conversions that work for test data but fail in production.
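A short, hypothetical sketch shows how two of these forms can coexist in code that compiles and reads plausibly:

```python
import logging

logger = logging.getLogger(__name__)

def save_report(conn, report: dict) -> bool:
    cursor = conn.cursor()
    if not report:
        return False  # flawed control flow: this early return skips cursor.close()
    try:
        cursor.execute(
            "INSERT INTO reports (name, body) VALUES (?, ?)",
            (report["name"], report["body"]),
        )
        conn.commit()
        return True
    except Exception:
        # silent failure: the error is logged but the caller still sees
        # True, so a failed write looks exactly like a successful one
        logger.exception("saving report failed")
        return True
    finally:
        cursor.close()  # cleanup runs only once execution reaches the try block
```

Both defects survive a passing test suite if the tests only exercise the happy path.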
Reviewers must read AI-generated logic as if they wrote none of it, because they did not. The code carries no implicit trust from shared context or known authorship.
Security requires explicit verification
AI-generated code fails security testing at alarming rates. Veracode's 2025 analysis found 45% of AI-generated code contains security flaws. The rate varies by language: 72% for Java, 43% for JavaScript, 38% for Python. No language is safe.
The specific vulnerabilities are predictable:
| Vulnerability Type | Rate vs. Human-Written Code |
|---|---|
| Cross-site scripting (XSS) | 2.74x higher |
| Insecure direct object references | 1.91x higher |
| Improper password handling | 1.88x higher |
| Injection vulnerabilities | 1.5x higher |
These are not exotic attacks. They are OWASP Top 10 vulnerabilities that AI models reproduce from training data patterns that were themselves insecure.
Security review for AI-generated code cannot rely on the assumption that a competent developer avoided obvious mistakes. The AI has no security intuition. It optimizes for code that looks right, not code that is right.
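As a concrete illustration, here is a minimal Python sketch of the injection pattern models commonly reproduce, next to the parameterized form that closes it (the `users` table and function names are hypothetical):

```python
import sqlite3

def find_user_insecure(conn: sqlite3.Connection, name: str):
    # The pattern models reproduce from training data: the query is
    # assembled by string interpolation, so a name like
    # "x' OR '1'='1" returns every row in the table.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver passes name as data, never as
    # SQL, which eliminates the injection path entirely.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```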
Business context cannot be inferred
AI generates code without understanding the business domain. It knows syntax, patterns, and what similar code looks like. It does not know that this particular endpoint handles payment processing and must never return raw error messages to clients.
Only 8% of code review suggestions focus on alignment with company best practices, according to Qodo's 2025 State of AI Code Quality report. Yet alignment failures are among the most costly. Code that works but violates architectural decisions creates technical debt. Code that works but exposes internal details creates security risk. Code that works but ignores business rules creates production incidents.
Human reviewers bring context that AI cannot: knowledge of past incidents, understanding of regulatory requirements, awareness of upcoming changes that make certain approaches problematic. This context is the review's unique value.
The 1.7x issue rate in context
The CodeRabbit finding that AI PRs contain 1.7 times more issues requires interpretation. Not all issues are equal.
Where AI performs worse:
- Performance issues: approximately 8 times higher in AI code, with excessive I/O operations as the most common pattern
- Readability violations: 3 times higher, particularly violations of local naming conventions
- Formatting inconsistencies: 2.66 times higher
- Logic and correctness: 1.75 times higher
- Maintainability concerns: 1.64 times higher
- Security vulnerabilities: 1.57 times higher on average, with specific vulnerability types much higher
Where AI performs better:
- Spelling errors: 1.76 times more common in human code
- Testability issues: 1.32 times more common in human code
Critical and major severity findings appeared 1.4 to 1.7 times more frequently in AI-generated code. These are the categories that cause production incidents.
The data suggests AI excels at the mechanical aspects of code production (correct spelling, test structure, consistent syntax) while struggling with the contextual aspects: performance optimization, business logic, security awareness.
What reviewers should prioritize
Given limited review time, prioritization matters. The following focus areas reflect where AI is most likely to introduce problems that humans must catch.
Does it actually work? Verify the code accomplishes its stated purpose. This sounds obvious, but AI-generated code often implements something adjacent to the requirement. What was requested? What was delivered? Are the acceptance criteria met, not just approximated? Does the implementation handle the cases users will actually encounter?
Security and data handling. Apply security-focused review to all AI-generated code, regardless of apparent scope. Does this code handle user input? Is it validated and sanitized? Does this code touch authentication or authorization? Are edge cases covered? Does this code store or transmit sensitive data? Is it protected appropriately? Does this code generate output that reaches users? Is it escaped properly?
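For the escaping question in particular, a small hypothetical sketch shows the shape of the XSS defect from the table above and its fix:

```python
import html

def render_comment_unsafe(comment: str) -> str:
    # If comment contains "<script>...</script>", the browser executes it:
    # the cross-site scripting pattern AI reproduces most often.
    return f"<div class='comment'>{comment}</div>"

def render_comment_safe(comment: str) -> str:
    # html.escape converts <, >, &, and quotes into entities, so user
    # input renders as text instead of executing as markup.
    return f"<div class='comment'>{html.escape(comment)}</div>"
```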
Architectural alignment. Verify the code fits the system it enters. Does this follow established patterns in the codebase? Does this introduce dependencies that were not discussed? Does this create coupling that will cause problems later? Does this respect boundaries between components?
Performance under real conditions. AI optimizes for code that compiles, not code that performs. What happens at scale? With large inputs? Under load? Are there O(n²) patterns hidden in innocent-looking loops? Does this make network calls or database queries in loops? Does this load large objects into memory unnecessarily?
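A hypothetical sketch of one such innocent-looking pattern and its fix:

```python
def flag_inactive_slow(users: list[str], active: list[str]) -> list[str]:
    # Hidden O(n^2): "u not in active" scans the whole list for every
    # user. Fast on ten-row test data, painful on a million rows.
    return [u for u in users if u not in active]

def flag_inactive(users: list[str], active: list[str]) -> list[str]:
    # One O(n) pass builds a set, turning each membership test into O(1).
    active_set = set(active)
    return [u for u in users if u not in active_set]
```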
Style and formatting. If these issues remain after automated checks, address them. But they should rarely reach human review.
The trust calibration problem
Surveys reveal a gap between AI tool usage and trust in AI output. According to the Stack Overflow 2025 Developer Survey, 84% of developers use AI tools. Yet 46% actively distrust the accuracy of AI output, and 96% report difficulty trusting that AI-generated code is functionally correct.
The calibration problem is real. Only 48% of developers report always checking AI-generated code before committing. The remaining 52% sometimes commit code they have not fully verified.
This gap between widespread usage and incomplete verification makes code review the critical safety net. When developers do not fully verify their own AI-generated code, reviewers must.
Throughput and capacity
Organizations now face a new constraint: review capacity. Amazon CTO Werner Vogels observed that while AI accelerates code generation, "you will review more code because understanding it takes time."
The math is straightforward. If developers produce 30% more code with AI assistance but review throughput per line stays the same, the review queue grows by the same 30%. If AI-generated PRs also carry 1.7 times more issues, each review takes longer still.
Teams shipping 3 times faster (a commonly reported figure) without expanding review capacity create a bottleneck. Code waits in review queues. Reviewers face pressure to approve quickly. Quality suffers.
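A rough model makes the compounding visible, under the simplifying assumption that review effort scales with both code volume and issue count (an assumption for illustration, not a measured result):

```python
# Back-of-the-envelope review load, using the figures cited above.
code_volume_multiplier = 1.3  # developers produce ~30% more code with AI
issue_multiplier = 1.7        # AI PRs carry ~1.7x more issues to find and discuss

# If review effort tracks both factors, keeping pace takes roughly
# 2.2x the review capacity the team needed before AI assistance.
review_load = code_volume_multiplier * issue_multiplier
print(f"Required review capacity: {review_load:.2f}x baseline")  # 2.21x
```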
The solution is not faster rubber-stamping. It is restructuring the review process to allocate human attention where it matters most, using automation to handle what automation can handle, and accepting that review capacity may be the limiting factor, not development velocity.
What this module covers
Module 8 builds on Module 7's policy foundations with practical review methodology. The pages that follow cover:
- Specific focus areas and red flags for AI-generated code
- How to use AI tools as review assistants without circular validation
- Test generation strategies that verify AI output
- CI/CD integration patterns that enforce quality gates
- Metrics that reveal whether AI is helping or hurting code quality
The goal is not to slow down AI-assisted development. It is to ensure that velocity gains from AI do not become quality losses in production.
Code review applies human judgment to catch what automated systems miss. When AI writes the code, that judgment carries more weight.