Applied Intelligence
Module 5: Output Validation and Iteration

Common Quality Issues in AI-Generated Code

What the multiplier hides

The previous page established the 1.7x issue multiplier: AI-assisted PRs contain 10.83 issues on average compared to 6.45 for human code. That aggregate number hides important detail. Different issue types appear at different rates, and some hurt more than others.

The CodeRabbit analysis broke down issues by category. Logic and correctness problems lead at 1.75x the human rate. Maintainability follows at 1.64x. Security at 1.57x. Performance at 1.42x.

Within each category, specific problems dominate. These aren't random bugs. They're predictable consequences of how language models generate code, and once you know what to look for, they're hard to miss.

Logic errors: when the code looks right

AI-generated code has a quality researchers call "surface-level correctness." The code looks right. It compiles. It handles the obvious case. Then it skips control-flow protections, misorders dependencies, and misuses concurrency primitives in ways that won't show up until production.

The 1.75x logic error rate makes sense when you think about it. Language models optimize for plausible code, not provably correct code. They pattern-match against training data rather than reasoning about invariants. The result is code that passes the eye test but fails the edge test.

Conditionals that forget branches

Conditional errors appear most frequently. AI generates code that handles some branches correctly while others fail silently or crash.

# AI-generated code with conditional error
def process_user(user_data):
    if user_data.get('status') == 'active':
        return handle_active_user(user_data)
    elif user_data.get('status') == 'pending':
        return handle_pending_user(user_data)
    # Missing: what happens for 'inactive', 'suspended', or unknown status?
    # Falls through and returns None implicitly

AI handles the cases present in the prompt or obvious from context. Cases that require domain knowledge, like knowing your system has five user statuses rather than two, get omitted.
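
A corrected sketch makes every branch explicit and lets unknown statuses fail loudly instead of silently returning None; handle_disabled_user is a hypothetical handler added for illustration:

# Corrected sketch: every branch explicit, unknown statuses fail loudly
def process_user(user_data):
    status = user_data.get('status')
    if status == 'active':
        return handle_active_user(user_data)
    if status == 'pending':
        return handle_pending_user(user_data)
    if status in ('inactive', 'suspended'):
        return handle_disabled_user(user_data)  # hypothetical handler
    raise ValueError(f"Unknown user status: {status!r}")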

Operations in the wrong order

Incorrect ordering and faulty dependency flow appear far more frequently in AI code. Operations execute in an order that works for simple cases but fails when state matters.

// AI-generated code with ordering issue
async function initializeApp() {
    loadConfig();           // async call, not awaited
    setupDatabase();        // depends on config being loaded
    startServer();          // depends on database being ready
    // All three promises may run concurrently, causing race conditions
}

The fix requires understanding that configuration must complete before database setup, and database must complete before server start. AI generates code that looks sequential but isn't.
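
A corrected sketch using Python's asyncio, with stand-in steps for the real config, database, and server setup; each await forces the previous step to complete before the next begins:

# Corrected ordering (sketch; the three steps are stand-ins)
import asyncio

async def load_config():
    return {"db_url": "postgres://localhost/app"}   # stand-in config

async def setup_database(config):
    return {"connected_to": config["db_url"]}       # stand-in DB handle

async def start_server(db):
    print("server started against", db["connected_to"])

async def initialize_app():
    config = await load_config()        # completes before database setup
    db = await setup_database(config)   # completes before server start
    await start_server(db)

asyncio.run(initialize_app())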

Business logic from imagination

Logic errors often stem from AI's inability to understand business rules that experienced engineers internalize. A 2025 Monash/Otago University study found GPT-4 produces more complex code requiring additional rework because it lacks organizational context.

// AI-generated code with business logic error
function calculateDiscount(order: Order): number {
    if (order.total > 100) {
        return order.total * 0.1;  // 10% discount
    }
    return 0;
}
// Missing: loyalty tier multipliers, promotion codes,
// maximum discount caps, excluded product categories

The code implements a discount. It doesn't implement your discount system. The agent guessed at business rules because you didn't provide them.
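
Stating the rules in the prompt is half the fix; encoding them explicitly is the other half. A sketch with invented placeholder rules (tier multipliers and a cap) to show the shape:

# Discount with explicit rules (sketch; the rules are placeholders)
TIER_MULTIPLIERS = {"bronze": 1.0, "silver": 1.25, "gold": 1.5}
MAX_DISCOUNT = 50.0

def calculate_discount(total, loyalty_tier):
    if total <= 100:
        return 0.0
    base = total * 0.10                            # base 10% discount
    discount = base * TIER_MULTIPLIERS.get(loyalty_tier, 1.0)
    return min(discount, MAX_DISCOUNT)             # cap per (hypothetical) policy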

Edge cases: the happy path trap

AI excels at happy-path scenarios. It consistently overlooks edge cases including null handling, boundary conditions, error states, and unusual input patterns.

Training data overrepresents common scenarios. Edge cases such as empty arrays, null values, maximum integers, and Unicode characters appear infrequently in training examples. The model learns the common pattern but not the defensive code that protects it.

Missing null checks

AI-generated code often omits null checks, early returns, and guardrails. These omissions tie directly to real-world outages.

// AI-generated code missing null guards
public String formatUserName(User user) {
    return user.getFirstName().trim() + " " +
           user.getLastName().trim().toUpperCase();
}
// Fails: user is null, firstName is null, lastName is null
// Each possible null requires separate handling

The defensive version requires three to five additional lines. AI generates the core logic; you add the protection.
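
A defensive Python sketch, assuming a user object with optional first_name and last_name attributes:

# Defensive version (sketch)
def format_user_name(user):
    if user is None:
        return ""
    first = (user.first_name or "").strip()
    last = (user.last_name or "").strip().upper()
    return f"{first} {last}".strip()   # strip handles an empty first or last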

Boundary failures

Boundary conditions fail predictably: off-by-one errors, integer overflow, empty collections, maximum values.

# AI-generated code with boundary issues
def get_page(items, page_number, page_size=10):
    start = page_number * page_size
    return items[start:start + page_size]
# Fails: page_number = -1 (returns wrong slice)
# Fails: page_number > len(items) / page_size (returns empty, no error)
# Fails: page_size = 0 (infinite loop potential in callers)

Every parameter needs validation. AI generates the algorithm; you add bounds checking.
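
A bounds-checked sketch; whether an out-of-range page raises or returns empty is a policy choice, shown here as raising:

# Bounds-checked pagination (sketch)
def get_page(items, page_number, page_size=10):
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if page_number < 0:
        raise ValueError("page_number must be non-negative")
    start = page_number * page_size
    if items and start >= len(items):
        raise IndexError(f"page {page_number} is out of range")
    return items[start:start + page_size]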

Error handling that isn't

Error handling in AI code follows common training patterns: it covers expected errors but not the edge cases that break production. Crashes on null values and silent failures are the norm.

// AI-generated API endpoint
app.post('/api/users', async (req, res) => {
    const user = await db.createUser(req.body);
    res.json({ success: true, user });
});
// Missing: input validation, authorization check,
// database error handling, rate limiting, audit logging

The prompt asked for a user creation endpoint. The response delivers exactly that and nothing more. Everything you didn't explicitly request is absent.
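
A minimal validation sketch showing the kind of checks the endpoint omits; the field names and limits are invented for illustration:

# Input validation sketch (field names and limits are illustrative)
def validate_new_user(payload):
    errors = {}
    email = str(payload.get("email", ""))
    if "@" not in email:
        errors["email"] = "invalid email address"
    name = str(payload.get("name", "")).strip()
    if not 1 <= len(name) <= 100:
        errors["name"] = "name must be 1-100 characters"
    if errors:
        raise ValueError(errors)
    return {"email": email, "name": name}   # whitelist known fields only

Authorization, database error handling, and audit logging need the same explicit treatment; none of them arrive for free.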

Readability: code that doesn't communicate

Readability issues occur more than 3x as often in AI-assisted PRs. Formatting problems appear at nearly triple the human rate. The result is code that functions but doesn't communicate.

Names that mean nothing

Naming inconsistencies show close to a 2x increase. Unclear naming, mismatched terminology, and generic identifiers increase cognitive load.

# AI-generated code with naming issues
def proc(d):
    r = []
    for i in d:
        if i['t'] == 'a':
            r.append(handle_type_a(i))
        elif i['t'] == 'b':
            r.append(handle_type_b(i))
    return r

The code works. Understanding it requires reading every line. Six months later, maintenance becomes archaeology.
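
The same logic with intent-revealing names; a sketch that also assumes the records carry a descriptive 'type' key:

# Same logic, names that communicate (sketch)
def dispatch_records_by_type(records):
    results = []
    for record in records:
        if record['type'] == 'a':
            results.append(handle_type_a(record))
        elif record['type'] == 'b':
            results.append(handle_type_b(record))
    return results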

AI doesn't adhere to repository idioms. Naming patterns, architectural norms, and formatting conventions drift toward generic defaults. Even teams with formatters and linters see elevated noise: spacing, indentation, structural inconsistencies, and style drift that automation can't catch.

Duplication everywhere

GitClear's analysis of 211 million lines found code cloning increased 4x during AI adoption. Copy-pasted code exceeded moved code for the first time in history. Duplicate code blocks of 5+ lines increased 8x during 2024.

This makes sense: AI generates solutions for each prompt independently. It doesn't know you asked the same question yesterday in a different file. Refactoring, the practice of consolidating duplicated logic, dropped from 24.1% of changes in 2020 to just 9.5% in 2024.

AI excels at adding code. It rarely removes or consolidates. The result is codebases that grow in volume without growing in capability.

Security: optimized for functionality, not defense

The 45% security vulnerability rate reflects a fundamental gap: AI optimizes for functionality, not defense. Security failures aren't occasional mistakes. They're systematic.

Hardcoded credentials

Repositories with Copilot active show 40% higher incidence of secret leaks compared to average public repositories. AI often suggests patterns found in public codebases, including hardcoding API keys directly into source files.

# AI-generated code with hardcoded credentials
import requests

def fetch_weather(city):
    api_key = "sk_live_abc123xyz789"  # Leaked in training data
    response = requests.get(
        f"https://api.weather.com/v1/{city}",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    return response.json()

The model doesn't understand "secrets." It pattern-matches against examples where keys were embedded. Research found 11,908 live secrets in Common Crawl training data. One WalkScore API key appeared 57,029 times across training sources.

The defensive pattern, whether environment variables, a secrets manager, or configuration injection, must come from you.
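
One minimal fix reads the key from an environment variable set at deploy time; WEATHER_API_KEY is an assumed variable name:

# Secret injected from the environment (sketch)
import os
import requests

def fetch_weather(city):
    api_key = os.environ["WEATHER_API_KEY"]   # raises KeyError if unset: fails fast
    response = requests.get(
        f"https://api.weather.com/v1/{city}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()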

Input validation failures

API endpoints generated by AI accept input without validating, sanitizing, or authorizing. The prompt didn't specify security requirements, so the model didn't add them.

XSS prevention fails in 86% of relevant AI code samples. Log injection prevention fails in 88%. These aren't obscure vulnerabilities. They're OWASP Top 10 basics that AI consistently misses.
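
The missing defense is often a single call. A minimal sketch using Python's standard library to escape user input before it reaches HTML:

# Escaping user input before rendering (sketch)
import html

def render_comment(comment):
    # Neutralizes tags and attribute injection in user-supplied text
    return f"<p>{html.escape(comment)}</p>"

The same discipline applies to logs: strip or encode newlines in user input before writing it to a log line, so an attacker can't forge entries.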

Performance: works at demo scale

Excessive I/O operations occur nearly 8x more often in AI-assisted PRs. AI optimizes for "working" solutions, not performance under load.

N+1 queries

AI learns from tutorials that prioritize readability over performance. Without explicit prompting, it doesn't optimize.

# AI-generated code with N+1 query pattern
def get_order_details(order_ids):
    orders = []
    for order_id in order_ids:
        order = db.query(f"SELECT * FROM orders WHERE id = {order_id}")
        customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
        orders.append({"order": order, "customer": customer})
    return orders
# 100 orders = 200 database queries
# Should be: 2 queries with IN clauses and a join

The code works for 10 orders. It times out for 10,000.
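
The batched version; a sketch against a hypothetical db.query that accepts parameterized IN clauses (parameterization also removes the SQL injection risk the f-string queries above carry):

# Batched version (sketch; db.query is a hypothetical parameterized API)
def get_order_details(order_ids):
    order_marks = ",".join("?" for _ in order_ids)
    orders = db.query(
        f"SELECT * FROM orders WHERE id IN ({order_marks})", order_ids)
    customer_ids = list({o.customer_id for o in orders})
    customer_marks = ",".join("?" for _ in customer_ids)
    customers = {c.id: c for c in db.query(
        f"SELECT * FROM customers WHERE id IN ({customer_marks})", customer_ids)}
    return [{"order": o, "customer": customers.get(o.customer_id)}
            for o in orders]
# 100 orders = 2 database queries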

Resource leaks

AI generates code that acquires resources without releasing them: database connections, file handles, network sockets.

# AI-generated code with resource leak
def process_files(file_paths):
    results = []
    for path in file_paths:
        f = open(path, 'r')
        data = f.read()
        results.append(process(data))
    return results
# Files never closed; handles accumulate until process crashes
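
In Python the fix is a context manager; a sketch assuming the same hypothetical process helper:

# Files closed deterministically (sketch)
def process_files(file_paths):
    results = []
    for path in file_paths:
        with open(path, 'r') as f:   # closed on scope exit, even on exceptions
            data = f.read()
        results.append(process(data))
    return results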

The pattern recurs across languages: JDBC connections without close, HTTP clients without shutdown, memory allocated without free.

What to check

These patterns point to a focused review strategy. For each AI-generated change, verify:

Logic completeness: All conditional branches handled explicitly. Dependencies ordered correctly. Business rules match actual requirements.

Edge case coverage: Null and empty inputs handled. Boundary conditions checked. Error states produce meaningful responses.

Security basics: No hardcoded credentials. Inputs validated and sanitized. Authentication and authorization present where needed.

Performance awareness: No database calls inside loops. Resources acquired are released. Operations batch where possible.

Readability standards: Names communicate intent. No unexplained duplication. Style matches repository conventions.

The next page examines security review in depth: the specific vulnerability patterns that dominate AI code and systematic approaches to catching them.
