Automated Consistency Checking
Pattern enforcement at scale
Page 5 covered using agents to pre-review code for bugs and CLAUDE.md compliance. This page addresses a different problem: enforcing consistency across an entire codebase.
When teams adopt agentic development, code volume increases. GitClear's analysis of 211 million changed lines of code found substantial growth with AI assistance. That volume creates drift: naming conventions diverge, API patterns fragment, logging structures vary, test designs lose coherence. A team of ten developers cannot manually verify that every pull request follows the same patterns across hundreds of files. Automated checking fills the gap.
The layered enforcement model
Consistency checking works best when you combine multiple mechanisms:
| Layer | Mechanism | What it catches |
|---|---|---|
| 1 | Linters and formatters | Syntax, whitespace, import order, basic style |
| 2 | Static analysis | Type errors, unreachable code, security patterns |
| 3 | Agent configuration | Project conventions, architectural rules, team standards |
| 4 | Agent review | Semantic consistency, pattern adherence, cross-file alignment |
Never send an agent to do a linter's job. Linters run in milliseconds, produce deterministic output, and integrate with every CI system. Agents are expensive, slow, and probabilistic. Use agents for what linters cannot check: semantic patterns, architectural alignment, and context-dependent rules.
Style guide compliance through configuration
CLAUDE.md and AGENTS.md files transform style guides from documents developers read into rules agents follow.
Claude Code configuration
Claude Code reads CLAUDE.md files hierarchically:
- Global (`~/.claude/CLAUDE.md`): personal preferences
- Project root (`CLAUDE.md`): team standards
- Subdirectory (`src/api/CLAUDE.md`): component-specific rules
Each level can add or override rules. A project-level file might specify:
```markdown
## Style requirements
- Use named exports, not default exports
- Prefer async/await over .then() chains
- Log errors with structured fields: { error, context, userId }
- API handlers return { data, error } shape, never throw
```

The agent reads these rules at session start. Generated code follows them. Review agents check against them.
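As a rough illustration, code generated under these rules might look like the following sketch; the handler name, logger, and fields are hypothetical, not part of any specific project.

```ts
// Hypothetical handler following the rules above: named export, async/await,
// structured error logging, and a { data, error } return shape that never throws.
const logger = {
  error: (fields: Record<string, unknown>) => console.error(JSON.stringify(fields)),
};

type HandlerResult<T> = { data: T | null; error: string | null };

export async function getUserProfile(
  userId: string
): Promise<HandlerResult<{ id: string; name: string }>> {
  try {
    const res = await fetch(`/api/users/${userId}`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return { data: await res.json(), error: null };
  } catch (err) {
    // Structured fields, not a bare message, per the style requirements.
    logger.error({ error: err, context: "getUserProfile", userId });
    return { data: null, error: "Failed to load user profile" }; // never throw
  }
}
```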
For enforcement beyond suggestions, use hooks. Claude Code hooks execute shell commands at specific lifecycle points:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "eslint --fix $CLAUDE_FILE_PATHS && prettier --write $CLAUDE_FILE_PATHS"
          }
        ]
      }
    ]
  }
}
```

This runs ESLint and Prettier after every file modification, so style violations are corrected the moment they are introduced rather than surviving until review. Deterministic enforcement happens automatically.
Codex configuration
Codex reads AGENTS.md files using the same hierarchical pattern:
- Global (`~/.codex/AGENTS.md`): user defaults
- Project root (`AGENTS.md`): repository standards
- Nested directories (`src/services/AGENTS.md`): service-specific rules
AGENTS.md files closer to the current working directory take precedence. This allows different rules for different parts of a monorepo.
A repository-level file might specify:

```markdown
## Code style
- Run `pnpm lint` before committing
- Use Zod for runtime validation, not manual checks
- Service functions return Result<T, Error> types
- Tests use @testing-library patterns, not enzyme
```

Codex checks these files before executing any task.
Review requests via @codex review apply these guidelines when categorizing issues.
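For example, a service function written under these rules might look like the sketch below; the Result type is assumed to be a project-local convention, and the schema is illustrative.

```ts
import { z } from "zod";

// Assumed project-local Result convention referenced in AGENTS.md.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Zod schema instead of hand-rolled validation checks.
const CreateUserInput = z.object({
  email: z.string().email(),
  name: z.string().min(1),
});

export function parseCreateUserInput(
  input: unknown
): Result<z.infer<typeof CreateUserInput>, Error> {
  const parsed = CreateUserInput.safeParse(input);
  if (!parsed.success) {
    return { ok: false, error: new Error(parsed.error.message) };
  }
  return { ok: true, value: parsed.data };
}
```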
API versioning and contract checks
API consistency is harder than code consistency. Schema changes break clients. Endpoint modifications ripple through dependent services. Version mismatches cause production incidents.
Static API comparison
Tools like oasdiff compare OpenAPI specifications to detect breaking changes:
```bash
oasdiff breaking api/v1.yaml api/v2.yaml
```

This identifies removed endpoints, changed request parameters, modified response shapes, and deprecated fields still in use. Integrate this into CI to fail builds when incompatible changes appear.
Agent-assisted API review
Agents add semantic analysis that static tools miss. Include API versioning rules in configuration:
```markdown
## API guidelines
- All breaking changes require major version bump
- Deprecated fields must have removal timeline in description
- New required fields need migration documentation
- Response shape changes require client notification
```

When reviewing API changes, agents check whether the change is additive or breaking, whether deprecation notices exist for removed functionality, and whether version numbers reflect the change magnitude.
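To make the additive-versus-breaking distinction concrete, the hypothetical response types below show the kind of change each rule targets (the endpoint and field names are illustrative):

```ts
// v1 response shape for a hypothetical /users/:id endpoint.
interface UserResponseV1 {
  id: string;
  email: string;
  displayName: string;
}

// Additive change: a new optional field. Existing clients keep working,
// so no major version bump is required.
interface UserResponseAdditive extends UserResponseV1 {
  avatarUrl?: string;
}

// Breaking change: a field is renamed, so clients reading `displayName` break.
// Under the guidelines above this requires a major version bump, a deprecation
// timeline, and client notification.
interface UserResponseV2 {
  id: string;
  email: string;
  fullName: string; // renamed from displayName
}
```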
Agents cannot replace contract testing. They supplement it by catching issues that slip through automated checks.
Cross-file consistency checking
Single-file linting misses consistency issues that span multiple files. A component might follow different patterns than its neighbors. A utility function might duplicate logic that exists elsewhere. Naming conventions might drift between modules.
Agent-based cross-file analysis
Modern agent review tools examine relationships between files:
- Claude Code: The `/code-review` command's Opus agents analyze the full diff context, not just individual files
- GitHub Copilot: Tool-calling agents can examine related files, check how similar patterns are implemented elsewhere, and verify consistency with project conventions
- Qodo: Specialized agents analyze multi-repo context and understand dependencies between components
Cross-file checks to configure:
```markdown
## Cross-file rules
- Components in /components follow the same prop interface patterns
- API handlers use shared error response utilities, not local implementations
- Test files mirror the structure of the files they test
- Database models use consistent field naming (camelCase in code, snake_case in DB)
```
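The second rule is the kind of duplication an agent can spot across files. A minimal sketch of "shared utility versus local copy", with hypothetical module and function names:

```ts
// shared/errors.ts (hypothetical shared utility the rule points to)
export function errorResponse(code: string, message: string) {
  return { data: null, error: { code, message } };
}

// api/orders.ts: consistent — reuses the shared helper.
export function ordersNotFound() {
  return errorResponse("ORDER_NOT_FOUND", "Order does not exist");
}

// api/billing.ts: drift the review should flag — a local re-implementation
// that nests the error fields differently.
export function billingNotFound() {
  return { data: null, error: { message: "Billing record missing", status: 404 } };
}
```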
The limits of cross-file analysis
Agents analyze what they can see. Context windows limit how much code fits in a single review. A 200,000-token context window holds substantial code, but not an entire enterprise monorepo.
For large codebases:
- Focus agent review on changed files and their immediate dependencies
- Use traditional static analysis for repository-wide checks
- Reserve agent analysis for semantic patterns that static tools miss
The confirmation bias problem revisited
Page 5 established that AI reviewing its own output creates confirmation bias. This gets worse in consistency checking.
When the same agent generates code following patterns it inferred from context, then reviews that code against those same inferred patterns, it finds no inconsistencies. Both sides share the same biases. The agent becomes anchored to its initial interpretation of the codebase.
What the research shows
The evidence is consistent: LLMs struggle to detect and correct their own errors without external feedback. MIT Press research found that LLMs "cannot self-correct or even self-detect their own mistakes" in isolation. The CRITIC framework demonstrates that self-correction requires external tools and verification.
Breaking the confirmation loop
Separate generation from review architecturally. Claude Code's /code-review command uses different agents than those that generated the code. The review agents have no access to the generation prompt. They see only the diff.
Use different models for different purposes. Generate code with one model, review with another. Each model has different training data and different blind spots. Using Claude for generation and a security-focused model for audit reduces shared biases.
Provide explicit patterns, not inferred ones. Configuration files (CLAUDE.md, AGENTS.md) supply explicit rules. Agents checking against documented patterns are more reliable than agents inferring patterns from existing code.
Combine with deterministic verification. Linters, type checkers, and test suites provide ground truth. Agents cannot argue with a failing test. External verification breaks the confirmation loop.
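For instance, a plain unit test pins a convention down in a way no agent can rationalize around. This sketch uses Vitest and an inline handler for brevity; in practice the test would import the real module under test:

```ts
import { describe, it, expect } from "vitest";

// Inline stand-in for the module under test.
async function getUser(id: string) {
  try {
    if (!id) throw new Error("missing id");
    return { data: { id }, error: null };
  } catch (err) {
    return { data: null, error: (err as Error).message };
  }
}

describe("handler convention: return { data, error }, never throw", () => {
  it("reports failures through the error field instead of throwing", async () => {
    await expect(getUser("")).resolves.toEqual({ data: null, error: "missing id" });
  });
});
```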
What agent review cannot catch
Knowing the limits prevents over-reliance.
Context blindness
Agents see the diff and surrounding code. They do not understand why the code was written this way, what alternatives were considered and rejected, what constraints apply from external systems, or what business rules govern this domain.
Business logic errors that violate unstated assumptions slip through.
Semantic drift
Agents optimize for local consistency. Code that matches patterns in nearby files passes review. But if the nearby files have already drifted from project standards, the agent reinforces the drift rather than catching it.
Regular human review of architectural patterns catches drift that agents miss.
Novel patterns
Agents recognize patterns from training data. Proprietary frameworks, internal conventions, and domain-specific patterns may not match anything in training. The agent might flag correct code as inconsistent or miss actual inconsistencies in unfamiliar patterns.
Document novel patterns explicitly in configuration files. The agent needs written rules to enforce standards it has never seen.
Building a consistency checking workflow
Combine layers for comprehensive coverage:
Layer 1: Pre-commit hooks. Run linters and formatters on every commit. Fail fast on deterministic issues. No human or agent review needed for formatting violations.
Layer 2: CI static analysis. Type checking, security scanning, and dependency auditing run in CI. Block merges for violations. These checks are fast, deterministic, and comprehensive.
Layer 3: Agent pre-review. Before human review, run agent review for configuration compliance (CLAUDE.md, AGENTS.md rules), pattern consistency across changed files, and API contract compatibility. Authors address findings before requesting human review.
Layer 4: Human architecture review. Human reviewers focus on whether patterns should change (not just whether they match), cross-cutting concerns that span beyond the diff, business logic correctness, and long-term maintainability.
Layer 5: Periodic consistency audits. Schedule regular reviews of pattern drift across the codebase, configuration file accuracy, and whether documented rules match actual practice.
Configuration file maintenance
Consistency checking is only as good as the rules it enforces.
Update rules when patterns change. When a team decides to adopt a new pattern, update CLAUDE.md and AGENTS.md immediately. Stale rules cause false positives (flagging correct new patterns) and false negatives (missing violations of outdated rules).
Document the why, not just the what. Rules with explanations help agents generalize correctly:
```markdown
## Error handling
- Wrap external API calls in try/catch with specific error types
- Why: Generic errors lose context needed for debugging
- Why: Specific types enable targeted retry logic
```

Review rules periodically. As codebases evolve, some rules become obsolete. Others need refinement based on false positive patterns. Treat configuration files as living documents, not write-once artifacts.
The role of human judgment
Automated consistency checking accelerates enforcement. It does not replace the judgment needed to decide what consistency means.
Agents enforce rules. Humans decide which rules matter. Agents flag deviations. Humans decide whether deviations are violations or intentional variations.
The goal is not perfect consistency; it is intentional consistency, where deviations exist because someone decided they should, not because no one noticed.