Automated Consistency Checking
Pattern enforcement at scale
Page 5 covered using agents to pre-review code for bugs and CLAUDE.md compliance. This page addresses a different problem: enforcing consistency across an entire codebase.
When teams adopt agentic development, code volume increases. GitClear's analysis of 211 million changed lines of code found substantial growth with AI assistance. That volume creates drift: naming conventions diverge, API patterns fragment, logging structures vary, test designs lose coherence. A team of ten developers cannot manually verify that every pull request follows the same patterns across hundreds of files. Automated checking fills the gap.
The layered enforcement model
Consistency checking works best when you combine multiple mechanisms:
| Layer | Mechanism | What it catches |
|---|---|---|
| 1 | Linters and formatters | Syntax, whitespace, import order, basic style |
| 2 | Static analysis | Type errors, unreachable code, security patterns |
| 3 | Agent configuration | Project conventions, architectural rules, team standards |
| 4 | Agent review | Semantic consistency, pattern adherence, cross-file alignment |
Never send an agent to do a linter's job. Linters run in milliseconds, produce deterministic output, and integrate with every CI system. Agents are expensive, slow, and probabilistic. Use agents for what linters cannot check: semantic patterns, architectural alignment, and context-dependent rules.
Style guide compliance through configuration
CLAUDE.md and AGENTS.md files transform style guides from documents developers read into rules agents follow.
Claude Code configuration
Claude Code reads CLAUDE.md files hierarchically:
- Global (`~/.claude/CLAUDE.md`): personal preferences
- Project root (`CLAUDE.md`): team standards
- Subdirectory (`src/api/CLAUDE.md`): component-specific rules
Each level can add or override rules. A project-level file might specify:
```markdown
## Style requirements
- Use named exports, not default exports
- Prefer async/await over .then() chains
- Log errors with structured fields: { error, context, userId }
- API handlers return { data, error } shape, never throw
```

The agent reads these rules at session start. Generated code follows them. Review agents check against them.
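As a rough illustration, code generated under these rules might look like the following sketch; the handler name, logger, and fields are hypothetical, not part of any specific project.

```ts
// Hypothetical handler following the rules above: named export, async/await,
// structured error logging, and a { data, error } return shape that never throws.
const logger = {
  error: (fields: Record<string, unknown>) => console.error(JSON.stringify(fields)),
};

type HandlerResult<T> = { data: T | null; error: string | null };

export async function getUserProfile(
  userId: string
): Promise<HandlerResult<{ id: string; name: string }>> {
  try {
    const res = await fetch(`/api/users/${userId}`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return { data: await res.json(), error: null };
  } catch (err) {
    // Structured fields, not a bare message, per the style requirements.
    logger.error({ error: err, context: "getUserProfile", userId });
    return { data: null, error: "Failed to load user profile" }; // never throw
  }
}
```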
For enforcement beyond suggestions, use hooks. Claude Code hooks execute shell commands at specific lifecycle points:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "eslint --fix $CLAUDE_FILE_PATHS && prettier --write $CLAUDE_FILE_PATHS"
          }
        ]
      }
    ]
  }
}
```

This runs ESLint and Prettier after every file modification, so style violations are corrected the moment they are introduced rather than surviving until review. Deterministic enforcement happens automatically.
Codex configuration
Codex reads AGENTS.md files using the same hierarchical pattern:
- Global (`~/.codex/AGENTS.md`): user defaults
- Project root (`AGENTS.md`): repository standards
- Nested directories (`src/services/AGENTS.md`): service-specific rules
AGENTS.md files closer to the current working directory take precedence. This allows different rules for different parts of a monorepo.
A repository-level file might specify:

```markdown
## Code style
- Run `pnpm lint` before committing
- Use Zod for runtime validation, not manual checks
- Service functions return Result<T, Error> types
- Tests use @testing-library patterns, not enzyme
```

Codex checks these files before executing any task.
Review requests via @codex review apply these guidelines when categorizing issues.
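For example, a service function written under these rules might look like the sketch below; the Result type is assumed to be a project-local convention, and the schema is illustrative.

```ts
import { z } from "zod";

// Assumed project-local Result convention referenced in AGENTS.md.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Zod schema instead of hand-rolled validation checks.
const CreateUserInput = z.object({
  email: z.string().email(),
  name: z.string().min(1),
});

export function parseCreateUserInput(
  input: unknown
): Result<z.infer<typeof CreateUserInput>, Error> {
  const parsed = CreateUserInput.safeParse(input);
  if (!parsed.success) {
    return { ok: false, error: new Error(parsed.error.message) };
  }
  return { ok: true, value: parsed.data };
}
```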
API versioning and contract checks
API consistency is harder than code consistency. Schema changes break clients. Endpoint modifications ripple through dependent services. Version mismatches cause production incidents.
Static API comparison
Tools like oasdiff compare OpenAPI specifications to detect breaking changes:
```bash
oasdiff breaking api/v1.yaml api/v2.yaml
```

This identifies removed endpoints, changed request parameters, modified response shapes, and deprecated fields still in use. Integrate this into CI to fail builds when incompatible changes appear.
Agent-assisted API review
Agents add semantic analysis that static tools miss. Include API versioning rules in configuration:
```markdown
## API guidelines
- All breaking changes require major version bump
- Deprecated fields must have removal timeline in description
- New required fields need migration documentation
- Response shape changes require client notification
```

When reviewing API changes, agents check whether the change is additive or breaking, whether deprecation notices exist for removed functionality, and whether version numbers reflect the change magnitude.
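To make the additive-versus-breaking distinction concrete, the hypothetical response types below show the kind of change each rule targets (the endpoint and field names are illustrative):

```ts
// v1 response shape for a hypothetical /users/:id endpoint.
interface UserResponseV1 {
  id: string;
  email: string;
  displayName: string;
}

// Additive change: a new optional field. Existing clients keep working,
// so no major version bump is required.
interface UserResponseAdditive extends UserResponseV1 {
  avatarUrl?: string;
}

// Breaking change: a field is renamed, so clients reading `displayName` break.
// Under the guidelines above this requires a major version bump, a deprecation
// timeline, and client notification.
interface UserResponseV2 {
  id: string;
  email: string;
  fullName: string; // renamed from displayName
}
```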
Agents cannot replace contract testing. They supplement it by catching issues that slip through automated checks.
Cross-file consistency checking
Single-file linting misses consistency issues that span multiple files. A component might follow different patterns than its neighbors. A utility function might duplicate logic that exists elsewhere. Naming conventions might drift between modules.
Agent-based cross-file analysis
Modern agent review tools examine relationships between files:
- Claude Code: The `/code-review` command's Opus agents analyze the full diff context, not just individual files
- GitHub Copilot: Tool-calling agents can examine related files, check how similar patterns are implemented elsewhere, and verify consistency with project conventions
- Qodo: Specialized agents analyze multi-repo context and understand dependencies between components
Cross-file checks to configure:
```markdown
## Cross-file rules
- Components in /components follow the same prop interface patterns
- API handlers use shared error response utilities, not local implementations
- Test files mirror the structure of the files they test
- Database models use consistent field naming (camelCase in code, snake_case in DB)
```
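The second rule is the kind of duplication an agent can spot across files. A minimal sketch of "shared utility versus local copy", with hypothetical module and function names:

```ts
// shared/errors.ts (hypothetical shared utility the rule points to)
export function errorResponse(code: string, message: string) {
  return { data: null, error: { code, message } };
}

// api/orders.ts: consistent — reuses the shared helper.
export function ordersNotFound() {
  return errorResponse("ORDER_NOT_FOUND", "Order does not exist");
}

// api/billing.ts: drift the review should flag — a local re-implementation
// that nests the error fields differently.
export function billingNotFound() {
  return { data: null, error: { message: "Billing record missing", status: 404 } };
}
```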
The limits of cross-file analysis
Agents analyze what they can see. Context windows limit how much code fits in a single review. A 200,000-token context window holds substantial code, but not an entire enterprise monorepo.
For large codebases:
- Focus agent review on changed files and their immediate dependencies
- Use traditional static analysis for repository-wide checks
- Reserve agent analysis for semantic patterns that static tools miss
The confirmation bias problem revisited
Page 5 established that AI reviewing its own output creates confirmation bias. This gets worse in consistency checking.
When the same agent generates code following patterns it inferred from context, then reviews that code against those same inferred patterns, it finds no inconsistencies. Both sides share the same biases. The agent becomes anchored to its initial interpretation of the codebase.
What the research shows
The evidence is consistent: LLMs struggle to detect and correct their own errors without external feedback. MIT Press research found that LLMs "cannot self-correct or even self-detect their own mistakes" in isolation. The CRITIC framework demonstrates that self-correction requires external tools and verification.
Breaking the confirmation loop
Separate generation from review architecturally. Claude Code's /code-review command uses different agents than those that generated the code. The review agents have no access to the generation prompt. They see only the diff.
Use different models for different purposes. Generate code with one model, review with another. Each model has different training data and different blind spots. Using Claude for generation and a security-focused model for audit reduces shared biases.
Provide explicit patterns, not inferred ones. Configuration files (CLAUDE.md, AGENTS.md) supply explicit rules. Agents checking against documented patterns are more reliable than agents inferring patterns from existing code.
Combine with deterministic verification. Linters, type checkers, and test suites provide ground truth. Agents cannot argue with a failing test. External verification breaks the confirmation loop.
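For instance, a plain unit test pins a convention down in a way no agent can rationalize around. This sketch uses Vitest and an inline handler for brevity; in practice the test would import the real module under test:

```ts
import { describe, it, expect } from "vitest";

// Inline stand-in for the module under test.
async function getUser(id: string) {
  try {
    if (!id) throw new Error("missing id");
    return { data: { id }, error: null };
  } catch (err) {
    return { data: null, error: (err as Error).message };
  }
}

describe("handler convention: return { data, error }, never throw", () => {
  it("reports failures through the error field instead of throwing", async () => {
    await expect(getUser("")).resolves.toEqual({ data: null, error: "missing id" });
  });
});
```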
What agent review cannot catch
Knowing the limits prevents over-reliance.
Context blindness
Agents see the diff and surrounding code. They do not understand why the code was written this way, what alternatives were considered and rejected, what constraints apply from external systems, or what business rules govern this domain.
Business logic errors that violate unstated assumptions slip through.
Semantic drift
Agents optimize for local consistency. Code that matches patterns in nearby files passes review. But if the nearby files have already drifted from project standards, the agent reinforces the drift rather than catching it.
Regular human review of architectural patterns catches drift that agents miss.
Novel patterns
Agents recognize patterns from training data. Proprietary frameworks, internal conventions, and domain-specific patterns may not match anything in training. The agent might flag correct code as inconsistent or miss actual inconsistencies in unfamiliar patterns.
Document novel patterns explicitly in configuration files. The agent needs written rules to enforce standards it has never seen.
Building a consistency checking workflow
Combine layers for comprehensive coverage:
Layer 1: Pre-commit hooks. Run linters and formatters on every commit. Fail fast on deterministic issues. No human or agent review needed for formatting violations.
Layer 2: CI static analysis. Type checking, security scanning, and dependency auditing run in CI. Block merges for violations. These checks are fast, deterministic, and comprehensive.
Layer 3: Agent pre-review. Before human review, run agent review for configuration compliance (CLAUDE.md, AGENTS.md rules), pattern consistency across changed files, and API contract compatibility. Authors address findings before requesting human review.
Layer 4: Human architecture review. Human reviewers focus on whether patterns should change (not just whether they match), cross-cutting concerns that span beyond the diff, business logic correctness, and long-term maintainability.
Layer 5: Periodic consistency audits. Schedule regular reviews of pattern drift across the codebase, configuration file accuracy, and whether documented rules match actual practice.
Configuration file maintenance
Consistency checking is only as good as the rules it enforces.
Update rules when patterns change. When a team decides to adopt a new pattern, update CLAUDE.md and AGENTS.md immediately. Stale rules cause false positives (flagging correct new patterns) and false negatives (missing violations of outdated rules).
Document the why, not just the what. Rules with explanations help agents generalize correctly:
```markdown
## Error handling
- Wrap external API calls in try/catch with specific error types
- Why: Generic errors lose context needed for debugging
- Why: Specific types enable targeted retry logic
```

Review rules periodically. As codebases evolve, some rules become obsolete. Others need refinement based on false positive patterns. Treat configuration files as living documents, not write-once artifacts.
The role of human judgment
Automated consistency checking accelerates enforcement. It does not replace the judgment needed to decide what consistency means.
Agents enforce rules. Humans decide which rules matter. Agents flag deviations. Humans decide whether deviations are violations or intentional variations.
The goal is not perfect consistency; it is intentional consistency, where deviations exist because someone decided they should, not because no one noticed.