Focus Areas for AI-Generated Code
Where human attention matters most
Page 1 established that AI-generated code produces 1.7 times more issues than human-written code. The question is: which issues? Not all defects carry equal weight. A formatting inconsistency costs minutes. A security vulnerability costs careers.
Human review time is finite. Spending that time on the categories where AI fails most yields better results than spreading attention evenly. This page examines four focus areas: logic and correctness, security, context alignment, and architectural judgment.
Logic and correctness errors
Logic errors in AI-generated code follow predictable patterns. The CodeRabbit December 2025 study found logic and correctness issues appeared 75% more frequently in AI-generated pull requests than in human-written ones. Algorithm and business logic errors appeared more than twice as often.
These are not syntax errors. The code compiles. The AI-generated tests pass. The problem surfaces in production when edge cases arrive.
The happy path trap
AI models optimize for code that appears correct. Training data contains vastly more working code than failing code. The result: AI generates code that handles expected inputs well and unexpected inputs poorly.
Consider error handling. CodeRabbit found AI-generated code contains nearly twice as many error handling gaps as human-written code. The patterns are consistent:
- Missing null checks before dereferencing
- Early returns that bypass necessary cleanup
- Exception handlers that log but do not propagate
- Fallback values that mask failures instead of reporting them
A function that processes user data might validate the expected fields but silently accept malformed input that crashes downstream systems. The code works in development with clean test data. It fails in production with messy real-world input.
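A minimal sketch of that pattern, with hypothetical function and field names, shows how the gaps listed above combine:

```python
# Illustrative sketch of the happy-path trap; names and fields are hypothetical.
import logging

log = logging.getLogger(__name__)


def process_user(payload: dict) -> dict:
    # Happy path: assumes "email" and "age" are present and well formed.
    email = payload.get("email", "")      # fallback value masks a missing field
    age = int(payload["age"])             # KeyError / ValueError on malformed input
    return {"email": email.lower(), "age": age}


def save_user(payload: dict, store: list) -> bool:
    try:
        store.append(process_user(payload))
        return True
    except Exception as exc:
        log.warning("could not save user: %s", exc)   # logs but does not propagate
        return True                                   # caller never learns the save failed


save_user({"email": "A@B.COM", "age": "30"}, store=[])   # works with clean test data
save_user({"age": "thirty"}, store=[])                   # failure silently swallowed
```

The reviewer's job is to supply the inputs the AI never imagined: the missing field, the non-numeric value, the caller that trusts the return value.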
Computational waste
Performance issues appeared approximately 8 times more frequently in AI-generated code, according to the CodeRabbit analysis. Excessive I/O operations were the most common pattern.
AI generates code that works but does not consider cost. Database queries inside loops. Network calls that could be batched. In-memory operations that process entire datasets when streaming would suffice.
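The shape is easy to recognize once named. A sketch using an in-memory SQLite table (the schema and data are purely illustrative) contrasts the per-item query with a batched version:

```python
# Query-in-a-loop versus a batched query; table and data are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 9.5), (2, 1, 20.0), (3, 2, 5.0)])
user_ids = [1, 2]

# Typical AI-generated shape: one query per user (the N+1 pattern).
totals_slow = {
    uid: conn.execute("SELECT SUM(total) FROM orders WHERE user_id = ?",
                      (uid,)).fetchone()[0]
    for uid in user_ids
}

# Batched alternative: one round trip, grouped by the database.
placeholders = ",".join("?" * len(user_ids))
totals_fast = dict(conn.execute(
    f"SELECT user_id, SUM(total) FROM orders WHERE user_id IN ({placeholders}) "
    "GROUP BY user_id", user_ids))

print(totals_slow, totals_fast)   # same answer, very different cost at scale
```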
These patterns pass functional tests. They fail load tests. Reviewers must trace data flow and ask: what happens when this runs 10,000 times? What happens with 10 gigabytes of input?
Sequencing and dependency errors
Issues tied to missing dependencies and incorrect operation sequencing appeared at close to twice the rate in AI-generated code. Concurrency control mistakes followed the same pattern.
AI lacks understanding of temporal relationships between operations. It may generate code that reads a value before it exists. It may create race conditions by accessing shared state without synchronization. It may assume ordering guarantees that the underlying system does not provide.
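A minimal sketch of unsynchronized shared state, with an illustrative counter, shows the read-modify-write hazard:

```python
# Read-modify-write on shared state without synchronization; values are illustrative.
import threading

balance = 0
lock = threading.Lock()


def deposit_unsafe(amount: int) -> None:
    global balance
    for _ in range(100_000):
        balance += amount              # load, add, store: another thread can interleave here


def deposit_safe(amount: int) -> None:
    global balance
    for _ in range(100_000):
        with lock:                     # the synchronization AI frequently omits
            balance += amount


threads = [threading.Thread(target=deposit_unsafe, args=(1,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)   # expected 400000; the unsafe version can come up short, but only sometimes
```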
These bugs are expensive because they manifest intermittently. The code works in deterministic test environments and fails under production timing variability.
Security validation
Veracode's 2025 GenAI Code Security Report analyzed AI-generated code across multiple languages and found 45% failed basic security tests. Java performed worst at 72% failure. JavaScript followed at 43%. Python, often considered safer due to higher-level abstractions, still failed at 38%.
The Cloud Security Alliance's 2025 analysis found that 48% of AI-generated code contains security vulnerabilities, and 62% contains design flaws or references to known vulnerable patterns.
These are not obscure attack vectors. They are OWASP Top 10 vulnerabilities that security training has addressed for decades. AI reintroduces them because training data contains both secure and insecure patterns, and the model cannot distinguish between them.
Input validation gaps
Missing input validation is the most common security flaw in AI-generated code. The AI produces functional code that assumes well-formed input. Production receives malicious input.
The gap is predictable. AI generates code that processes data. It does not generate code that validates data unless explicitly prompted. Reviewers must verify every point where external data enters the system:
- HTTP request parameters and headers
- Database query results from untrusted sources
- File contents from user uploads
- API responses from third-party services
- Environment variables that users might control
Each entry point requires validation appropriate to its context. AI rarely provides this automatically.
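A sketch of validation at one entry point, with hypothetical parameter names and limits, shows what "appropriate to its context" means in practice:

```python
# Validation at a hypothetical HTTP entry point; names and limits are illustrative.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_page_request(params: dict) -> dict:
    """Validate query parameters before they reach business logic."""
    email = params.get("email", "")
    if not EMAIL_RE.fullmatch(email):
        raise ValueError("invalid email")

    try:
        page = int(params.get("page", "1"))
    except ValueError:
        raise ValueError("page must be an integer") from None
    if not 1 <= page <= 1000:            # reject absurd values, not just wrong types
        raise ValueError("page out of range")

    return {"email": email, "page": page}
```

The point is not these specific checks but their existence: the AI produced code that parses the parameters; the reviewer has to ask what validates them.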
Specific vulnerability rates
CodeRabbit's analysis quantified specific vulnerability rates comparing AI and human code:
| Vulnerability Type | AI vs Human Rate |
|---|---|
| Cross-site scripting (XSS) | 2.74x more likely |
| Insecure direct object references | 1.91x more likely |
| Improper password handling | 1.88x more likely |
| Insecure deserialization | 1.82x more likely |
Cross-site scripting leads the list. AI-generated code that renders user input often fails to escape output properly. The code displays data. The attacker executes scripts.
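A sketch of the missing step, using the standard library's html.escape (the markup itself is illustrative):

```python
# Escaping user content before rendering; the template string is illustrative.
from html import escape

user_comment = '<script>alert("stolen cookies")</script>'

unsafe_html = f"<p>{user_comment}</p>"            # typical AI output: raw interpolation
safe_html = f"<p>{escape(user_comment)}</p>"      # browser renders text, not script

print(unsafe_html)
print(safe_html)   # <p>&lt;script&gt;alert(&quot;stolen cookies&quot;)&lt;/script&gt;</p>
```

Real applications usually rely on templating engines that escape by default, but the review question stays the same: where does user input reach the page, and what escapes it?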
Password handling failures include storing passwords in reversible encryption, comparing passwords without constant-time comparison, and logging authentication attempts with credentials included.
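The comparison flaw in particular has a small, checkable fix. A simplified sketch (a real system would use a dedicated password-hashing library):

```python
# Constant-time verification; hashing and storage are deliberately simplified.
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    candidate = hash_password(password, salt)
    # compare_digest runs in constant time; '==' can leak timing information
    return hmac.compare_digest(candidate, stored_hash)


salt = os.urandom(16)
stored = hash_password("correct horse battery staple", salt)
print(verify_password("correct horse battery staple", salt, stored))   # True
print(verify_password("guess", salt, stored))                          # False
```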
The 86% XSS failure rate
Veracode's deeper analysis found that 86% of AI-generated code fails XSS prevention tests. Similarly, 88% fails log injection prevention tests.
These statistics should inform review priorities. Any AI-generated code that renders user content to browsers or writes user-influenced data to logs requires explicit security review. Assume the AI got it wrong. Verify otherwise.
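For log injection, the review check is equally concrete: does user-controlled text reach a log line with its newlines intact? A sketch of the hardening step (names are illustrative):

```python
# Neutralizing newlines in user-controlled values before logging.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")


def sanitize_for_log(value: str) -> str:
    # A crafted username with embedded CR/LF could otherwise forge log entries.
    return value.replace("\r", "\\r").replace("\n", "\\n")


username = "alice\nINFO:auth:login succeeded for admin"   # attacker-controlled input

log.info("login failed for %s", sanitize_for_log(username))
```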
Context alignment with team conventions
AI agents enter every session without knowledge of team conventions. Page 1 in Module 2 described this as the "new hire without onboarding" problem. The implications for code review are direct.
Qodo's 2025 State of AI Code Quality survey found that only 8% of code review suggestions focus on alignment with company best practices. Yet alignment failures create technical debt that accumulates across the codebase.
When AI ignores local conventions
AI-generated code follows patterns from training data. Those patterns may conflict with established project conventions.
Naming conventions are a visible example. A project using camelCase for variables receives snake_case from an AI trained on Python-dominant data. The code works. The inconsistency confuses maintainers.
Architectural conventions are less visible but more costly. A project that routes database access through a repository layer receives AI-generated code that queries the database directly. The code works. The architectural boundary erodes.
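A sketch of that boundary, with a hypothetical UserRepository standing in for the project's data layer:

```python
# Repository convention versus direct access; class and table names are hypothetical.
import sqlite3


class UserRepository:
    """Project convention: all user queries go through this layer."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_email(self, email: str):
        return self._conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)).fetchone()


def handler_following_convention(repo: UserRepository, email: str):
    return repo.find_by_email(email)


def handler_bypassing_convention(conn: sqlite3.Connection, email: str):
    # Works, passes tests, and quietly erodes the architectural boundary.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)).fetchone()
```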
Error handling conventions vary by project. Some teams throw exceptions. Some teams return error objects. Some teams use Result types. AI generates whatever pattern it saw most often in training, regardless of what the current project uses.
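A sketch of the mismatch, using a hypothetical Result type as a stand-in for whatever pattern the project has standardized on:

```python
# Project convention (Result) versus the dominant training-data pattern (raise).
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[str] = None


def parse_quantity(raw: str) -> Result[int]:
    # Convention: expected failures come back as values, never as exceptions.
    if not raw.isdigit():
        return Result(error=f"not a quantity: {raw!r}")
    return Result(value=int(raw))


def parse_quantity_ai(raw: str) -> int:
    # Typical AI-generated shape: raises, because that is what training data does most.
    return int(raw)   # ValueError leaks to callers written to expect a Result
```

Both functions work in isolation. Only the first fits the codebase.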
The 65% context gap
Qodo's survey found that 65% of developers report AI missing relevant context during refactoring tasks. The percentage rises to 66% for developers who cite "almost right" solutions as their top frustration.
"Almost right" code is expensive code. It requires reviewer time to identify what is wrong. It requires developer time to fix. It often ships because the review missed the subtle misalignment.
What context alignment review looks like
Alignment review answers specific questions:
- Does this code follow the naming conventions in the style guide?
- Does this code use the patterns established elsewhere in this module?
- Does this code introduce dependencies the team has decided against?
- Does this code implement error handling the way other code in this system does?
- Does this code respect the boundaries between components?
These questions cannot be automated reliably. They require human understanding of what the team has decided, explicitly or implicitly. This is review work that AI cannot do for itself.
Architecture and business logic gaps
AI has no understanding of system architecture. It generates code that accomplishes local tasks without considering global constraints.
The Ox Security 2025 analysis characterized AI-generated code as "highly functional but systematically lacking in architectural judgment." The study identified patterns appearing in 80-90% of AI-generated codebases:
- Over-specification of implementation details
- Avoidance of refactoring that would improve structure
- Repetition of bugs across multiple locations (déjà vu bugs)
- Comments that explain what the code does rather than why
These patterns reflect AI optimization for local correctness. The code accomplishes its immediate purpose. It does not fit into a larger design.
Business rules AI cannot know
Business logic encodes decisions that cannot be derived from code patterns. Which users can access which data? What happens when a transaction fails partially? Which error messages are safe to show customers?
AI generates plausible answers. Plausible answers are often wrong answers.
A function processing payments might implement the literal request (transfer funds) without implementing the actual requirement (transfer funds only if compliance checks pass, only during permitted hours, only up to configured limits).
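A sketch of that gap, with the compliance flag, operating hours, and limit as hypothetical placeholders for rules that live outside the codebase:

```python
# Literal request versus actual requirement; every rule here is a placeholder.
from datetime import datetime, timezone

TRANSFER_LIMIT = 10_000


def transfer_literal(ledger: dict, src: str, dst: str, amount: int) -> None:
    # What the prompt asked for: move the funds.
    ledger[src] -= amount
    ledger[dst] += amount


def transfer_with_business_rules(ledger: dict, src: str, dst: str,
                                 amount: int, compliance_ok: bool) -> None:
    # What the business actually requires.
    if not compliance_ok:
        raise PermissionError("compliance check failed")
    if not 6 <= datetime.now(timezone.utc).hour < 22:
        raise RuntimeError("transfers not permitted outside operating hours")
    if amount > TRANSFER_LIMIT:
        raise ValueError("amount exceeds configured limit")
    if ledger[src] < amount:
        raise ValueError("insufficient funds")
    ledger[src] -= amount
    ledger[dst] += amount
```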
These requirements exist in documentation the AI did not read, in conversations the AI did not hear, in organizational knowledge that predates the codebase. Reviewers carry this context. AI does not.
Architectural decisions AI undermines
Teams make architectural decisions for reasons that persist beyond the immediate code. Abstraction boundaries exist to enable future changes. Interface contracts exist to enable independent deployment. Layered architectures exist to isolate concerns.
AI-generated code often violates these decisions because violations work. Reaching through an abstraction layer to access data directly is faster than going through the proper channel. The code passes tests. The architecture erodes.
Reviewers must verify that AI-generated code respects boundaries:
- Does this code bypass abstractions that exist for a reason?
- Does this code couple components that should remain independent?
- Does this code assume implementation details that might change?
- Does this code create dependencies that make future changes harder?
These questions require understanding not just what the code does but why the system is structured as it is.
The human review burden
AI-generated code shifts the review burden toward categories that require human judgment. Formatting and syntax, which humans did inconsistently, AI does well. Logic, security, alignment, and architecture, which require contextual understanding, AI does poorly.
The practical implication: human review of AI-generated code takes more time per issue, not less. The issues that remain after automated checks are harder issues. They require understanding intent, verifying security properties, checking alignment with unwritten conventions, and validating architectural fitness.
Page 3 examines specific red flags that signal AI-generated code has failed in these categories.