What is Context Engineering?
Defining context engineering as the full information ecosystem; distinguishing it from prompt engineering; why vibe coding fails
The previous module established how agents perceive codebases and the constraints of context windows. This module transforms that understanding into practical technique: context engineering, the core competency of Agentic Software Development.
Beyond prompting
The term "prompt engineering" entered common usage as AI tools became mainstream. It originally described the craft of structuring inputs to language models for optimal outputs. Over time, the term diluted to mean little more than "typing things into a chatbot."
Context engineering reclaims the systematic nature of the original concept while expanding its scope. Where prompt engineering focuses on crafting individual instructions, context engineering addresses the entire information ecosystem that shapes agent behavior.
Andrej Karpathy describes context engineering as "the delicate art and science of filling the context window with just the right information for the next step."
The distinction matters because agent performance depends far more on what information is available than on how cleverly a single prompt is worded. A mediocre prompt with excellent context typically outperforms a brilliant prompt with poor context.
The full information ecosystem
Context engineering encompasses seven distinct components that together determine what an agent knows when it responds:
- System instructions: The foundational directives that establish agent behavior, loaded before any conversation begins
- User prompt: The immediate request or instruction in the current turn
- Conversation history: The accumulated dialogue from previous turns in the session
- Long-term memory: Information that persists across sessions, stored in files like CLAUDE.md
- Retrieved information: Content dynamically loaded through tools, file reads, or search operations
- Available tools: The capabilities the agent can invoke, which shape what actions it considers possible
- Output specifications: Constraints on response format, length, and structure
Traditional prompting addresses only the second component. Context engineering orchestrates all seven.
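To make that orchestration concrete, here is a minimal sketch that models the seven components as a single assembly step. It is illustrative only: the names AgentContext and assemble, and the fixed ordering, are assumptions for this example rather than any particular framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Hypothetical container for the seven components of agent context."""
    system_instructions: str                # foundational directives, loaded before any turn
    user_prompt: str                        # the immediate request in the current turn
    conversation_history: list[str] = field(default_factory=list)   # prior turns in the session
    long_term_memory: str = ""              # persistent notes, e.g. the contents of CLAUDE.md
    retrieved_information: list[str] = field(default_factory=list)  # file reads, search results
    available_tools: list[str] = field(default_factory=list)        # capabilities the agent may invoke
    output_specifications: str = ""          # constraints on format, length, and structure

    def assemble(self) -> str:
        """Concatenate the components, in a fixed order, into the text the model actually sees."""
        parts = [
            self.system_instructions,
            self.long_term_memory,
            "\n".join(self.retrieved_information),
            "\n".join(self.conversation_history),
            ("Available tools: " + ", ".join(self.available_tools)) if self.available_tools else "",
            self.output_specifications,
            self.user_prompt,
        ]
        return "\n\n".join(part for part in parts if part)
```

In practice the agent framework performs this assembly internally; the point of the sketch is that six of the seven inputs exist before the user prompt is ever written.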
The context window as workspace
A useful analogy: the context window functions like a developer's desk. The desk has finite space. What occupies that space determines what work is possible.
Scattered, disorganized materials make even simple tasks difficult. A clean desk with precisely the needed references enables focused, efficient work. The size of the desk matters less than how effectively the space is used.
This explains the counterintuitive finding from Module 2: larger context windows do not automatically produce better results. A carefully curated 50,000 token context often outperforms a carelessly assembled 200,000 token context. The quality of information matters more than the quantity.
Why most agent failures are context failures
When an agent produces incorrect output, the instinct is to blame the model. Research and practitioner experience suggest otherwise.
"Most agent failures are not model failures anymore they are context failures."
Consider the failure modes:
Confabulation: The agent invents APIs, file names, or patterns that do not exist. This typically occurs when relevant documentation is absent from context. The agent, lacking authoritative information, generates plausible-sounding alternatives.
Inconsistency: The agent produces code that contradicts established patterns in the codebase. This happens when architectural context is missing or buried in irrelevant history.
Repetitive errors: The agent makes the same mistake across multiple attempts. Often, the error stems from incorrect information that entered context early and propagated.
Quality degradation: Output quality declines as sessions extend. The "lost in the middle" phenomenon means valuable context becomes inaccessible even when technically present.
Each failure mode traces back to context problems, not model limitations. The model processes what it receives. Engineering what it receives is the developer's responsibility.
Quantifying context quality
Context quality can be measured through semantic similarity. Comparing the current conversation state against the original task description makes drift quantifiable.
| Similarity Score | Interpretation | Recommended Action |
|---|---|---|
| > 0.90 | Strongly aligned | Continue |
| 0.75 - 0.90 | Minor drift | Monitor |
| 0.55 - 0.75 | Noticeable deviation | Clarify and refocus |
| < 0.55 | High drift risk | Consider reset |
This framework transforms subjective "something feels off" observations into actionable metrics. When context quality drops below a threshold, intervention prevents compounding errors.
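One way to operationalize this measurement is sketched below: drift is scored as the cosine similarity between embeddings of the original task description and the current conversation state. The sketch assumes the sentence-transformers library as the embedding backend purely for illustration; any embedding model would serve, and the thresholds simply mirror the table above.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Any embedding model works; this small, widely used one keeps the example light.
model = SentenceTransformer("all-MiniLM-L6-v2")

def context_drift(original_task: str, current_state: str) -> tuple[float, str]:
    """Return a similarity score plus the recommended action from the drift table."""
    task_vec, state_vec = model.encode([original_task, current_state])
    score = float(cos_sim(task_vec, state_vec))
    if score > 0.90:
        return score, "Continue"
    if score > 0.75:
        return score, "Monitor"
    if score > 0.55:
        return score, "Clarify and refocus"
    return score, "Consider reset"
```

Calling a function like this periodically, with a summary of recent turns as the current state, turns the "something feels off" intuition into a number that can trigger a refocus or a reset.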
The vibe coding contrast
The term "vibe coding" coined by Andrej Karpathy in February 2025 and named Collins Word of the Year that same year describes an approach where developers describe projects in natural language and accept generated code without deep review.
Vibe coding treats AI output as finished product rather than draft material. For prototypes and throwaway experiments, this works. For production systems, it fails catastrophically.
Studies show AI-generated pull requests contain 1.7x more issues than human-written code. They are 2.74x more likely to introduce XSS vulnerabilities and 1.88x more likely to implement improper password handling.
The failures compound:
Context rot: Without deliberate context management, session quality degrades exponentially. Agents operate on limited memory; as conversations grow messy, suggestions deteriorate.
Architectural drift: AI generates solutions from isolated prompts without unified patterns. The result is a patchwork codebase where each component reflects a different design philosophy.
Debugging impossibility: Developers who accept code without understanding it cannot debug it. When problems emerge, they lack the knowledge to diagnose causes.
Documentation absence: Vibe coding prioritizes speed over explanation. The resulting codebase has no record of why decisions were made.
Enterprise adoption research reinforces these concerns. In one survey of engineering leaders, 14 of 18 reported that vibe coding creates more long-term problems than short-term benefits. Approximately 11% of vibe coding sessions end in code breakdown or abandonment.
Context engineering as professional practice
Where vibe coding abdicates responsibility, context engineering embraces it.
The discipline requires understanding what agents need to perform well and systematically providing it. This includes:
- Structuring project documentation for agent consumption
- Managing conversation context to prevent degradation
- Front-loading critical information while avoiding overload
- Designing prompts that provide sufficient guidance without excessive constraint
- Recognizing when context has degraded and requires intervention
These skills transfer across tools. Whether working with Claude Code, Codex, Cursor, or future agents, the principles remain consistent. The practitioner who masters context engineering can adapt to any agentic tool.
The payoff
Studies of context engineering effectiveness show measurable improvements:
- 10.6% improvement on agentic task completion
- 86.9% average latency reduction through optimized context
- Performance matching top-ranked production agents using smaller, open-source models
The last finding is particularly significant. Effective context engineering can compensate for model limitations. A smaller model with excellent context often outperforms a larger model with poor context.
This has practical implications for cost and speed. Context-engineered workflows frequently achieve better results at lower computational expense.
The context engineering mindset
Effective context engineering requires a mental shift.
The traditional developer asks: "What code solves this problem?"
The context engineer asks: "What information does the agent need to generate code that solves this problem?"
This reframing places information architecture at the center of the development process. Before writing any prompt, the context engineer considers:
- What does the agent already know from project documentation?
- What additional context does this specific task require?
- How should that context be structured for optimal processing?
- What information might distract or confuse the agent?
- How will this context interact with accumulated conversation history?
These questions precede every significant agent interaction. The answers shape not just the immediate prompt but the entire information environment in which the agent operates.
"Rules didn't constrain agents; they unlocked deeper work by reducing environmental uncertainty."
This observation captures the counterintuitive nature of context engineering. Providing more structure enables more autonomy. Agents perform best when the boundaries are clear, leaving them free to work within those boundaries without constant guidance.
The hierarchy ahead
The remainder of this module explores context engineering through three levels:
Project context: The persistent information embedded in documentation, file structure, and naming conventions. This context exists before any conversation begins and shapes every interaction.
Conversation context: The accumulated dialogue within a session. Managing this layer requires understanding what persists, what fades, and how to intervene when quality degrades.
Prompt context: The immediate instruction. Effective prompts draw on project and conversation context while adding task-specific guidance.
Each level builds on the previous. Strong project context reduces the burden on conversation management. Effective conversation context makes individual prompts more powerful. The three layers work together to create conditions where agents consistently produce high-quality output.
This hierarchy provides the structure for systematic improvement. Rather than hoping each interaction goes well, the context engineer designs systems where success is the expected outcome.