Applied Intelligence
Module 9: Working with Legacy Code

Finding Undocumented Conventions

The knowledge that lives only in code

Every codebase contains knowledge that exists nowhere else. Not in documentation. Not in wikis. Not in anyone's head. The knowledge lives only in the patterns of the code itself.

Roughly 80% of processes in most organizations remain undocumented. For codebases, this means naming conventions, preferred libraries, error handling strategies, and architectural decisions that accumulated over years without ever being written down. When the developers who established these patterns leave, the knowledge remains, but only if someone can read it out of the code.

This is where agents excel. Static analysis tools check for rule violations. Documentation generators extract comments and signatures. Agents do something neither can: they recognize the implicit conventions that govern how code was actually written, regardless of what the style guide says or omits.

What counts as an undocumented convention

Undocumented conventions fall into several categories.

Naming patterns. Variable prefixes that indicate scope or type. Method names that signal behavior: get for synchronous lookups, fetch for async API calls, load for initializing from storage, retrieve for database queries. File naming that encodes purpose: *Service.ts, *Repository.ts, *Controller.ts. These patterns exist consistently across files but appear nowhere in writing.
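
In TypeScript, the verb convention might look like the sketch below. Every name here is hypothetical, invented purely for illustration; "retrieve" for database queries would follow the same shape:

// Illustrative only: hypothetical functions showing verb conventions.
interface User {
  fullName: string;
  nickname?: string;
}

// "get": synchronous lookup on data already in memory
function getDisplayName(user: User): string {
  return user.nickname ?? user.fullName;
}

// "fetch": asynchronous call to a remote API
async function fetchUser(id: string): Promise<User> {
  const response = await fetch(`/api/users/${id}`);
  return response.json();
}

// "load": initialization from local storage
function loadDraft(): string {
  return localStorage.getItem("draft") ?? "";
}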

Library preferences. A project might use lodash for array manipulation but date-fns for dates. Configuration loading might always use dotenv even when other options exist. HTTP clients might be axios in one layer and fetch in another. These choices become conventions through repetition, not declaration.
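
The convention shows up at the top of nearly every file. A hypothetical import block from such a codebase:

// Hypothetical imports reflecting per-task library conventions
import axios from "axios";                  // HTTP requests: axios
import { groupBy } from "lodash";           // array/object manipulation: lodash
import { addDays, format } from "date-fns"; // dates: date-fns
import "dotenv/config";                     // configuration: always dotenv

const tomorrow = format(addDays(new Date(), 1), "yyyy-MM-dd");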

Error handling strategies. Some codebases wrap everything in try-catch. Others let exceptions propagate. Some use result types. The strategy often varies by layer: services handle errors differently than controllers. None of this may be documented.
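
One layered strategy, sketched with a hypothetical Result type and Express-style handlers (the data-access layer is declared, not implemented, for the sketch):

import type { Request, Response } from "express";

// Hypothetical convention: services return Result values, never throw.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface User { id: string; name: string }

// Hypothetical data-access layer, declared only for this sketch
declare const db: { findUser(id: string): Promise<User | null> };

// Service layer: failures become values
async function findUser(id: string): Promise<Result<User>> {
  const user = await db.findUser(id);
  return user
    ? { ok: true, value: user }
    : { ok: false, error: `Failed to find user: ${id} not found` };
}

// Controller layer: translates errors into HTTP responses
async function getUser(req: Request, res: Response): Promise<void> {
  const result = await findUser(req.params.id);
  if (!result.ok) {
    res.status(404).json({ message: result.error });
    return;
  }
  res.json(result.value);
}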

Structural patterns. Where validation logic lives. How services interact with repositories. What goes in utility functions vs. class methods. The architectural decisions that shaped hundreds of files but never made it into an Architecture Decision Record (ADR).
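
One hypothetical example of such a structural decision: validation always lives in the service layer, and repositories stay persistence-only:

// Hypothetical structural convention: services validate, repositories persist.
interface Order { email: string; total: number }

interface OrderRepository {
  save(order: Order): Promise<void>; // persistence only, no business rules
}

class OrderService {
  constructor(private readonly repo: OrderRepository) {}

  async placeOrder(order: Order): Promise<void> {
    // Validation lives here by convention, never in the repository
    if (!order.email.includes("@")) {
      throw new Error("Failed to place order: invalid email");
    }
    await this.repo.save(order);
  }
}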

Workarounds and constraints. Third-party API limitations that required specific approaches. Performance optimizations from production incidents. Compatibility code for legacy integrations. The reasons are forgotten; the patterns remain.

Developers who find themselves explaining "that's just how we do things here" to new team members have identified an undocumented convention. Agents can find these systematically across an entire codebase.

Asking agents about conventions

Direct questions surface implicit patterns. The agent examines multiple files, identifies commonalities, and reports what it finds.

Naming conventions:

What naming conventions does this codebase follow?
Look at: variable names, function names, file names, class names.
Identify any prefixes, suffixes, or casing patterns that appear consistently.
Show examples of each pattern you identify.

The response reveals patterns like is* for boolean functions, *Handler for event processors, use* for React hooks, I* for TypeScript interfaces.

Library usage:

What libraries does this project use for common tasks?
Specifically:
- HTTP requests
- Date/time handling
- Validation
- Logging
- Testing
Are these used consistently, or do different parts of the codebase use different libraries?

Inconsistency here often indicates accumulated technical debt. Different developers made different choices at different times. Knowing this prevents adding yet another variation.

Error handling:

How does this codebase handle errors?
Look at:
- Exception vs. result types
- Where errors are caught vs. propagated
- Error logging patterns
- User-facing error messages
Is there a consistent strategy, or does it vary by module?

The answer shows whether error handling is systematic or ad-hoc. Systematic handling suggests established conventions. Ad-hoc handling suggests an area that needs documentation.

Extracting coding style beyond linters

Linters enforce configured rules. Conventions often exceed what linters check.

Beyond what ESLint/Prettier would catch, what coding style patterns
does this codebase follow?

Consider:
- How are async operations structured?
- How are imports organized within files?
- How are comments used (or not used)?
- What's the typical function length?
- How is state managed in components?

The agent reads actual code patterns, not configured rules. A linter might not care whether you use early returns or nested conditionals. The codebase might have a clear preference anyway.
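
Both versions below would pass a typical lint configuration, yet a codebase usually leans consistently toward one shape. A hypothetical example:

interface Cart {
  items: unknown[];
  userVerified: boolean;
}

// Nested conditionals: single exit point, deeper indentation
function canCheckoutNested(cart: Cart): boolean {
  if (cart.items.length > 0) {
    if (cart.userVerified) {
      return true;
    }
  }
  return false;
}

// Early returns: guard clauses first, flat body
function canCheckoutEarly(cart: Cart): boolean {
  if (cart.items.length === 0) return false;
  if (!cart.userVerified) return false;
  return true;
}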

Code organization:

How are files in src/components/ organized internally?
Is there a standard structure? For example:
- Imports order
- Type definitions location
- Helper functions placement
- Export style
Show me the pattern from 3-4 representative files.

Asking for multiple examples forces the agent to identify true patterns rather than describing one file's idiosyncrasies.
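
The reported pattern might resemble this skeleton. The code is hypothetical; what matters is the consistent internal order:

// 1. Imports: external packages first, then internal modules
import { useState } from "react";

// 2. Type definitions immediately after imports
interface Order { id: string; total: number }
interface OrderListProps { orders: Order[] }

// 3. Helper functions above the component they support
function formatTotal(total: number): string {
  return `$${total.toFixed(2)}`;
}

// 4. The component itself
function OrderList({ orders }: OrderListProps) {
  const [expanded] = useState(false);
  return expanded ? orders.map((o) => formatTotal(o.total)).join(", ") : null;
}

// 5. Default export at the bottom
export default OrderList;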

The pattern recognition advantage

Humans learning a codebase absorb conventions through osmosis. Weeks of code review. Months of proximity to senior developers. The knowledge transfers slowly, incompletely.

Agents work differently. They analyze hundreds of files in seconds. They identify what's consistent across the codebase rather than what stood out in the three files a human happened to read.

Roblox's engineering team found that agents learn codebases similarly to new team members, picking up naming conventions, preferred libraries, and documentation style. The difference: agents do this faster, across more files, without fatigue.

This creates a specific advantage: asking "what conventions exist" rather than "does this follow conventions." The first question extracts implicit knowledge. The second assumes you already know what to check.

Agents report what they observe, not what's correct. A convention that appears in 90% of files might still be wrong. Consistency indicates a pattern, not necessarily a good practice. Use judgment about which discovered conventions deserve preservation vs. correction.

Identifying patterns that should be documented

Convention discovery is most valuable when it produces artifacts. Ask agents to generate the documentation that should exist:

Based on your analysis of this codebase, draft a "Coding Conventions"
document that captures the implicit patterns. Include:
- Naming conventions
- File organization
- Error handling approach
- Testing patterns
- Common utilities and when to use them

The draft captures knowledge that existed only in code. Review it, refine it, add it to the project. The implicit becomes explicit.

This approach works well with Claude Code's /init command. Running /init causes Claude to examine the codebase (package files, configuration, code structure) and generate a CLAUDE.md file documenting detected patterns. The generated file becomes a starting point, not a final product. Human review adds context the agent missed.

JetBrains Junie can be asked to "create a guidelines.md file that includes the coding conventions being followed in the current codebase." The agent examines existing code and generates documentation.

When agents misidentify conventions

Pattern recognition has failure modes.

Small sample sizes. Three files isn't a convention. Thirty files might be. If the agent identifies a "pattern" from limited examples, verify it appears broadly.

Inconsistent codebases. When different parts of the codebase follow different conventions, agents report what they happen to see first. Ask specifically: "Are there multiple competing patterns for X?" The answer is often "yes, three different approaches exist."

Historical accidents vs. intentional decisions. A pattern might exist because of one developer's habit, not team consensus. Convention discovery surfaces what exists, not why it exists. The "why" still requires human knowledge.

Outdated patterns. Old code follows old conventions. New code might follow different ones. Ask the agent to compare: "Do files modified in the last year follow the same patterns as older files?" Evolution becomes visible.

Validating discovered conventions

Verification strategies for convention discovery:

Cross-reference with team knowledge. If the agent claims the codebase uses early returns consistently, check with developers who know the code. Sometimes the agent is right and the developers didn't realize how consistent they'd been. Sometimes the agent overgeneralized.

Test against recent code. Recent PRs reflect current conventions. Ask the agent to analyze recently merged code specifically. If new code follows different patterns than the historical codebase, the conventions are shifting.

Look for documented exceptions. Some files intentionally deviate from conventions. Generated code, third-party integrations, legacy modules. Ask: "Which files seem to follow different patterns, and is there an obvious reason?" Known exceptions strengthen confidence in the core conventions.

Count occurrences. For critical conventions, ask for numbers. "How many files use pattern A vs. pattern B?" Quantification beats vague claims of consistency.
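
The counts are also cheap to verify independently. A minimal Node script along these lines can confirm the agent's numbers, assuming Node 20+ (for recursive readdirSync), a src/ directory, and placeholder search strings standing in for patterns A and B:

// Sketch: count how many files contain each of two competing patterns
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const files = readdirSync("src", { recursive: true })
  .map(String)
  .filter((name) => name.endsWith(".ts"));

let patternA = 0; // e.g. files that throw
let patternB = 0; // e.g. files that return Result values

for (const name of files) {
  const source = readFileSync(join("src", name), "utf8");
  if (source.includes("throw new")) patternA++;
  if (source.includes("Result<")) patternB++;
}

console.log(`pattern A: ${patternA} files, pattern B: ${patternB} files`);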

From discovered conventions to CLAUDE.md

The goal of convention discovery is improved context for future work. What you learn about the codebase belongs in project documentation.

Effective CLAUDE.md entries for discovered conventions:

## Coding conventions

### Naming
- Boolean functions use `is` prefix: `isValid`, `isEnabled`
- Event handlers use `handle` prefix: `handleClick`, `handleSubmit`
- Async functions use `fetch` prefix for API calls: `fetchUser`, `fetchOrders`

### Error handling
- Services return Result types, never throw
- Controllers catch service errors and transform to HTTP responses
- Error messages use the format: "Failed to {action}: {reason}"

### File structure
- Components follow: imports, types, helpers, component, exports
- Services follow: imports, types, class, private methods, public methods

These entries guide agents to match existing patterns. Without them, agents invent. And inventions rarely match established conventions.

Convention extraction as onboarding

The same process that extracts conventions for documentation accelerates human onboarding.

A new developer can ask the agent: "What do I need to know about conventions in this codebase before I start writing code?"

The answer compresses weeks of osmotic learning into a single response. Not perfectly: nuance gets lost, and lived context still matters. But the starting point is dramatically better than a blank slate.

Teams that use agent-assisted documentation report reducing onboarding time by 40-53%. Convention extraction is a significant factor. New developers write code that fits immediately rather than learning through correction.

Building institutional knowledge

Undocumented conventions represent institutional knowledge locked in code. Key developers leave, taking context with them. The patterns remain, but the reasons evaporate.

Agents can't recover the reasons. They don't explain why the codebase uses dependency injection or why controllers avoid business logic. But they do surface what patterns exist, which is the first step toward documenting why.

Convention discovery transforms implicit knowledge into explicit documentation. The documentation persists beyond any individual developer's tenure. The codebase becomes self-explaining in ways it wasn't before.

The next page examines documentation archaeology: using agents to reconstruct not just patterns but the design decisions that created them, working backward from code to intent.
