Mapping Architectures and Dependencies
The dependency mapping problem
Understanding how components connect across a large codebase is expensive work. Tracing a single data flow from user input through validation, processing, storage, and response touches dozens of files across multiple layers. Multiply this by every feature, and mapping becomes a serious time investment.
Traditional approaches rely on documentation (often outdated), tribal knowledge (often gone with the people who held it), or manual code reading (always slow). The Thoughtworks Technology Radar noted that AI agents can map dependencies across millions of lines of code in hours. The same work done manually takes months.
This isn't automation replacing understanding. It's acceleration that makes understanding feasible when it otherwise wouldn't happen.
Entry point analysis
Every codebase has entry points: places where external requests become internal execution. HTTP endpoints, CLI commands, message queue consumers, scheduled jobs. Finding these maps the boundaries between outside and inside.
Agents find entry points through pattern recognition. Ask directly:
What are all the entry points into this application?
List HTTP endpoints, CLI commands, message handlers, and scheduled tasks.
Show file paths and function names.

The response shows the surface area: how users, other services, and scheduled processes interact with the system. From here, you can trace what happens after entry.
For web applications, entry points concentrate in routing configuration:
Find all route definitions in this codebase. Show the HTTP method,
path pattern, and handler function for each route. Identify which
router or framework is being used.

For event-driven systems, entry points are spread across message handlers:
List all message queue consumers. For each, show the queue name,
message type, and handler function. Identify how acknowledgment
and error handling work.

Entry point analysis answers a fundamental question: how does the outside world talk to this system? Everything else flows from these boundaries.
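Finding entry points is pattern matching at heart, which is why agents handle it well. As a rough illustration of what that matching involves, here is a minimal sketch that greps a tree for entry-point signatures. It assumes a Python codebase with Flask-style routes, Click-style commands, and a consumer decorator; the directory name and every regex are assumptions, not a general-purpose tool:

```python
import re
from pathlib import Path

# Illustrative signatures for common entry points. Real codebases need
# patterns matched to the frameworks they actually use.
ENTRY_POINT_PATTERNS = {
    "http_route": re.compile(
        r'@app\.(?:get|post|put|delete|route)\(\s*["\'](?P<target>[^"\']+)'),
    "cli_command": re.compile(r'@cli\.command\(\s*["\']?(?P<target>[\w-]*)'),
    "queue_consumer": re.compile(r'@consumer\(\s*queue=["\'](?P<target>[^"\']+)'),
}

def find_entry_points(root: str) -> list[tuple[str, str, str]]:
    """Scan every Python file under root for entry-point signatures."""
    hits = []
    for source in Path(root).rglob("*.py"):
        text = source.read_text(errors="ignore")
        for kind, pattern in ENTRY_POINT_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((kind, match.group("target"), str(source)))
    return hits

if __name__ == "__main__":
    for kind, target, path in find_entry_points("src"):
        print(f"{kind:15} {target:30} {path}")
```

An agent does the same job with far more flexibility: it recognizes unfamiliar frameworks, follows indirection, and reads configuration files, where a fixed set of regexes cannot.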
Tracing data flow
Data enters systems, transforms through processing, and exits as responses or side effects. Tracing this flow shows you the implemented reality: not the intended design, but what actually runs.
Start with a specific flow and trace it end to end:
Trace what happens when a user submits an order:
1. Which endpoint receives the request?
2. How is the request validated?
3. What business logic processes the order?
4. Which database tables get written?
5. What events or notifications are triggered?
6. What response goes back to the user?

This gives you a vertical slice through the architecture. The agent follows function calls, reads through layers, and reports the chain of execution.
For verification, ask the agent to show the specific code:
For each step in that trace, show me the exact function calls
and the files they're in. I want to verify this flow myself.

Claims without code references are harder to verify. Specific file and line citations let you confirm the agent got it right.
When tracing data flows, ask for the "happy path" first: the normal successful case. Then ask about error cases, edge cases, and exception paths. The happy path shows intended design; error handling shows actual robustness.
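To make the shape of such a trace concrete, here is a hypothetical vertical slice condensed into a single file. Every name is an illustrative placeholder; in a real codebase each layer lives in its own module, and the agent's job is to report this chain with file paths attached:

```python
# A hypothetical vertical slice through order submission, condensed into
# one file. The numbered steps match the questions in the prompt above.

def submit_order(request: dict) -> dict:
    """1. Entry point: the HTTP handler that receives the request."""
    order = validate_order(request)                 # 2. validation
    order_id = process_order(order)                 # 3. business logic
    publish_event("order.created", order_id)        # 5. events/notifications
    return {"order_id": order_id, "status": "ok"}   # 6. response

def validate_order(request: dict) -> dict:
    if not request.get("items"):
        raise ValueError("order must contain at least one item")
    return request

def process_order(order: dict) -> int:
    return save_order(order)                        # 4. database write

def save_order(order: dict) -> int:
    # Stand-in for a repository call that inserts into an orders table.
    return 42

def publish_event(topic: str, order_id: int) -> None:
    # Stand-in for a message-queue producer.
    print(f"published {topic} for order {order_id}")
```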
Dependency graph construction
Modules depend on other modules. Services call other services. Classes inherit from other classes. These relationships form a dependency graph that determines what you can change safely and what will break.
Agents construct dependency graphs through code analysis:
For the OrderService module, identify:
1. What other modules does it depend on (import/use)?
2. What modules depend on it?
3. Are there any circular dependencies?
4. What external services or APIs does it call?

The response shows coupling. A module with twenty dependencies is harder to modify than one with three. A module that everything else depends on is dangerous to touch.
For visual understanding, request structured output:
List the dependencies between modules in src/services/ as a
simple table: "Module A depends on Module B because [reason]".
Include both direct imports and runtime calls.

Tables and lists are easier to scan than narrative explanations. They also expose gaps: a missing dependency might indicate something the agent overlooked.
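Under the hood, the static half of this analysis amounts to walking import statements. A minimal sketch, assuming a Python codebase and using only the standard library; it sees static imports, not runtime calls such as dependency injection, which is exactly the gap the prompt above asks the agent to cover:

```python
import ast
from collections import defaultdict
from pathlib import Path

def module_name(path: Path, root: Path) -> str:
    """Turn src/services/order.py (relative to src) into 'services.order'."""
    return ".".join(path.relative_to(root).with_suffix("").parts)

def build_import_graph(root: str) -> dict[str, set[str]]:
    """Map each module to the modules it imports."""
    base = Path(root)
    graph = defaultdict(set)
    for source in base.rglob("*.py"):
        name = module_name(source, base)
        tree = ast.parse(source.read_text(errors="ignore"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                graph[name].update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                graph[name].add(node.module)
    return graph

def find_cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Depth-first search for circular dependencies."""
    cycles, path = [], []

    def visit(node: str) -> None:
        if node in path:
            cycles.append(path[path.index(node):] + [node])
            return
        path.append(node)
        for dep in graph.get(node, ()):
            visit(dep)
        path.pop()

    for node in list(graph):
        visit(node)
    return cycles
```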
Cross-layer analysis
Modern applications organize code into layers: controllers, services, repositories, models. Understanding how these layers interact shows you both the architectural patterns and the places where they break down.
How does data flow between layers in this application?
Specifically:
- How do controllers call services?
- How do services access the database?
- Are there places where controllers access the database directly?
- Are there services that call other services, and is that expected?

Layer violations surface through this analysis. A controller that queries the database directly bypasses the service layer. A model that calls an external API inverts the expected flow.
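A layering rule is mechanical enough to sketch. The following is a crude textual check, assuming a layout where directory names match layer names; the allowed-import table and the line-level string matching are both assumptions, and an agent applies the same idea with real import resolution:

```python
from pathlib import Path

# A hypothetical layering rule: each layer may only import from the
# layers listed here. Directory names are assumptions about the layout.
ALLOWED_IMPORTS = {
    "controllers": {"services"},
    "services": {"services", "repositories"},
    "repositories": {"models"},
}

def check_layering(root: str) -> list[tuple[str, str]]:
    """Flag import lines that cross a layer boundary they shouldn't."""
    violations = []
    for source in Path(root).rglob("*.py"):
        layer = source.parent.name
        if layer not in ALLOWED_IMPORTS:
            continue
        for line in source.read_text(errors="ignore").splitlines():
            stripped = line.strip()
            if not stripped.startswith(("import ", "from ")):
                continue
            for other in ALLOWED_IMPORTS:
                if (other != layer and other in stripped
                        and other not in ALLOWED_IMPORTS[layer]):
                    violations.append((str(source), stripped))
    return violations
```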
These violations aren't necessarily bugs. Sometimes they're deliberate shortcuts, sometimes accumulated technical debt. Knowing they exist matters more than judging them immediately.
Service boundary mapping
Distributed systems add another dimension: service-to-service communication. Each service has its own entry points, and now they also call each other through APIs, message queues, or shared databases.
What external services does this application communicate with?
For each:
- What protocol (HTTP, gRPC, message queue, database)?
- What data flows in each direction?
- How are failures handled?
- Where is the client code that makes these calls?

This shows the system's dependencies on external components: databases, third-party APIs, internal microservices. Each dependency is a potential failure point.
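The code-level evidence for these dependencies is the presence of client libraries and their call sites. A minimal sketch of that scan, assuming a Python codebase; the library names below are illustrative examples of what to look for, not an exhaustive list:

```python
import re
from pathlib import Path

# Illustrative signatures of outbound communication; the libraries your
# codebase actually uses determine what to look for.
OUTBOUND_SIGNATURES = {
    "http": re.compile(r"\b(?:requests|httpx)\.(?:get|post|put|delete)\("),
    "grpc": re.compile(r"\bgrpc\.(?:insecure_channel|secure_channel)\("),
    "message_queue": re.compile(r"\bKafkaProducer\(|\bbasic_publish\("),
    "database": re.compile(r"\b(?:psycopg2|sqlalchemy|pymongo)\b"),
}

def find_outbound_calls(root: str) -> list[tuple[str, str]]:
    """Locate files containing client code for external services."""
    findings = []
    for source in Path(root).rglob("*.py"):
        text = source.read_text(errors="ignore")
        for protocol, pattern in OUTBOUND_SIGNATURES.items():
            if pattern.search(text):
                findings.append((protocol, str(source)))
    return findings
```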
For microservice architectures, understanding cross-service flows becomes essential:
Trace a user registration through all involved services.
Start from the initial HTTP request and follow every
inter-service call until the flow completes. Note which
services are synchronous vs. asynchronous.

Distributed flows are harder to trace than local function calls. Agents follow the code, but you may need to correlate with infrastructure documentation or API specifications for the full picture.
Agents see code, not runtime behavior. They can identify that service A calls service B, but not that service B runs on three replicas behind a load balancer. Infrastructure architecture requires additional context beyond the codebase.
Pattern identification at scale
Large codebases contain patterns: some intentional, some emergent over time. Agents identify these patterns by examining multiple instances of similar code.
Look at the API endpoints in src/api/. Do they follow a
consistent pattern? If so, describe the pattern. If not,
what variations exist and which files contain exceptions?

Pattern analysis does two things. First, it documents conventions that may not be written anywhere. Second, it exposes inconsistencies that may need standardization.
For architectural patterns:
Does this codebase use any recognizable architectural patterns?
Look for: MVC, repository pattern, CQRS, event sourcing, dependency
injection, factory patterns. Show examples of each pattern you find.

The agent scans code for structural signatures of known patterns. Recognizing that the codebase uses the repository pattern tells you that new code should follow the same structure.
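For instance, the repository pattern has a recognizable structural signature: an abstract interface over storage, plus concrete implementations that hide persistence details. A minimal illustration of that signature, with all names hypothetical:

```python
from abc import ABC, abstractmethod

# The structural signature of the repository pattern: an abstract
# interface over storage, with concrete implementations hiding the
# persistence details. All names here are hypothetical.

class OrderRepository(ABC):
    @abstractmethod
    def get(self, order_id: int) -> dict: ...

    @abstractmethod
    def save(self, order: dict) -> int: ...

class PostgresOrderRepository(OrderRepository):
    def __init__(self, connection):
        # Injected dependency: another signature agents look for.
        self._conn = connection

    def get(self, order_id: int) -> dict:
        # Real code would run a SQL query through self._conn.
        return {"id": order_id}

    def save(self, order: dict) -> int:
        return 1
```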
Scaling to large codebases
Codebases with millions of lines exceed what any single conversation can analyze. Even with 200k token context windows, comprehensive mapping requires breaking the work into pieces.
Divide by module. Map one subsystem at a time. "Analyze the authentication system" produces focused results. "Analyze everything" produces shallow ones.
Use parallel sub-agents. Each sub-agent investigates a different area:
Explore this codebase using 4 parallel tasks:
1. Map the API layer - all endpoints, routes, and controllers
2. Map the service layer - business logic organization
3. Map the data layer - repositories, models, database access
4. Map the infrastructure layer - external service clients, messaging

Each sub-agent returns a summary. The summaries combine into a comprehensive picture without exhausting any single context window.
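What that orchestration looks like depends on your agent framework. As a shape, not an implementation: run_agent below is a hypothetical stand-in for whatever call launches a sub-agent in your tooling:

```python
from concurrent.futures import ThreadPoolExecutor

# The four mapping tasks from the prompt above, one per sub-agent.
MAPPING_TASKS = [
    "Map the API layer - all endpoints, routes, and controllers",
    "Map the service layer - business logic organization",
    "Map the data layer - repositories, models, database access",
    "Map the infrastructure layer - external service clients, messaging",
]

def run_agent(prompt: str) -> str:
    # Hypothetical stand-in: replace with your agent framework's call.
    raise NotImplementedError

def map_codebase() -> str:
    # Run the investigations in parallel; each gets a fresh context.
    with ThreadPoolExecutor(max_workers=len(MAPPING_TASKS)) as pool:
        summaries = list(pool.map(run_agent, MAPPING_TASKS))
    # Only the combined digest needs to travel into the next session.
    return "\n\n".join(summaries)
```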
Build incrementally. Initial mapping shows you the major components. Subsequent sessions drill into areas that matter for your specific task. Complete mapping of every corner satisfies curiosity but delays practical work.
Documenting what you learn
Architecture understanding exists in your head until you write it down. Agents cannot remember previous sessions. The next conversation starts from zero.
As mapping proceeds, capture findings:
Component inventory. List major components with one-line descriptions. "OrderService: handles order creation, validation, and status updates."
Dependency summary. Document key dependencies between components. "OrderService depends on: InventoryService, PaymentGateway, NotificationService."
Pattern catalog. Record patterns in use. "API endpoints follow controller → service → repository → model pattern."
Boundary documentation. Map external dependencies. "Connects to: PostgreSQL (orders DB), Redis (caching), Stripe API (payments), Kafka (events)."
These notes become context for future sessions. What took hours to discover loads in seconds.
Architecture mapping in practice
Effective architecture mapping follows a progression:
- Identify entry points: where does execution begin?
- Trace representative flows: what paths does data take?
- Map major components: what are the building blocks?
- Document dependencies: what connects to what?
- Identify patterns: what conventions govern the code?
- Note exceptions: where does the code deviate from patterns?
This progression moves from concrete (entry points, specific flows) to abstract (patterns, conventions). Each step builds on the previous.
The goal is not complete knowledge. The goal is knowing enough to work effectively. Perfect maps don't exist for living codebases anyway: they change faster than documentation updates.
What matters is navigating confidently and recognizing what you still need to learn. The next page examines finding conventions that aren't documented: patterns that exist only in the code itself.