Understanding Agent Skills
Beyond prompting
Module 10 covered MCP as a way to give agents access to external data and tools. Skills solve a different problem: teaching agents how to perform tasks, not just what resources they can access.
Think of the difference between giving someone access to a kitchen and teaching them to cook. MCP provides access to ingredients and equipment. Skills provide the recipes.
Agent skills are structured, reusable capability packages that agents load and execute when relevant tasks arise. Unlike prompts, which vanish when the conversation ends, skills persist across sessions and can be shared across teams, projects, and even different agent platforms.
What skills actually are
A skill is a directory containing instructions, optional scripts, templates, and reference materials.
The core file is SKILL.md: YAML frontmatter describing the skill, followed by markdown instructions the agent follows when activated.
```
.claude/skills/
└── code-review/
    ├── SKILL.md               # Required: Instructions and metadata
    ├── scripts/               # Optional: Executable helpers
    │   └── lint-check.sh
    ├── references/            # Optional: Documentation
    │   └── style-guide.md
    └── assets/                # Optional: Templates
        └── review-template.md
```
When Claude Code encounters a task matching a skill's description, it loads the skill's instructions and follows them. Skills are not documentation that agents might consult. They are active instruction sets that shape agent behavior.
The SKILL.md format
Every skill requires a SKILL.md file with YAML frontmatter:
```markdown
---
name: code-review
description: Performs thorough code review following team standards
---

## Review process

1. Check for security vulnerabilities using the OWASP checklist
2. Verify test coverage meets 80% threshold
3. Confirm naming conventions match style guide
4. Flag any dependencies not in the approved list

## Output format

Provide review as a numbered list of findings...
```
The name field identifies the skill (max 64 characters, lowercase with hyphens only). The description field (max 1,024 characters) helps agents decide when to activate the skill.
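As a sketch of how these constraints could be checked, the following hypothetical validator (the names `validate_frontmatter` and `NAME_RE` are invented for illustration, not part of any SDK) enforces the limits above:

```python
import re

# Lowercase words separated by single hyphens, e.g. "code-review".
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of violations; an empty list means the metadata is valid."""
    errors = []
    if len(name) > 64:
        errors.append("name exceeds 64 characters")
    if not NAME_RE.match(name):
        errors.append("name must be lowercase with hyphens only")
    if not description:
        errors.append("description is required")
    elif len(description) > 1024:
        errors.append("description exceeds 1,024 characters")
    return errors

print(validate_frontmatter("code-review", "Performs thorough code review"))  # []
print(validate_frontmatter("Code_Review", ""))
```

A check like this fits naturally in a pre-commit hook or CI step for a shared skills repository.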
How agents discover skills
Skills use lazy loading to manage context. Rather than dumping all skill instructions into context at startup (which would burn thousands of tokens), agents use a three-tier loading strategy:
| Tier | What loads | Token cost | When |
|---|---|---|---|
| Discovery | Name and description only | ~50-100 tokens | Session startup |
| Activation | Full SKILL.md body | ~2,000-5,000 tokens | Skill invocation |
| Execution | Scripts, references, assets | As needed | During execution |
At session startup, Claude Code scans skill directories and loads only metadata. This creates a lightweight index of available capabilities. When a task matches a skill's description, the full instructions load. Supporting files load only when the skill explicitly references them.
Teams may have dozens of skills. Loading all of them into every conversation would exhaust context windows before real work begins. Lazy loading prevents that.
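The three-tier strategy can be sketched in a few lines. This is a simplified illustration, not Claude Code's actual implementation: the frontmatter parser handles only flat `key: value` pairs (a real implementation would use a YAML library), and the function names are invented for the example.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillEntry:
    """Discovery-tier record: only what the startup index needs."""
    name: str
    description: str
    path: Path  # the full SKILL.md stays on disk until activation

def parse_frontmatter(text: str) -> dict:
    """Read flat 'key: value' pairs between the '---' markers."""
    meta = {}
    parts = text.split("---")
    if len(parts) >= 3:
        for line in parts[1].strip().splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta

def discover(skills_dir: Path) -> list[SkillEntry]:
    """Discovery tier: index name and description only (~50-100 tokens per skill)."""
    return [
        SkillEntry(meta["name"], meta["description"], skill_md)
        for skill_md in sorted(skills_dir.glob("*/SKILL.md"))
        if (meta := parse_frontmatter(skill_md.read_text()))
    ]

def activate(entry: SkillEntry) -> str:
    """Activation tier: load the full instruction body only when invoked."""
    return entry.path.read_text().split("---", 2)[-1]
```

Scripts, references, and assets under the skill directory would load later still, only when the activated instructions reference them.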
The Agent Skills open standard
Anthropic released Agent Skills as an open standard in December 2025, publishing the specification and SDK at agentskills.io. The approach mirrors what they did with MCP: define foundational AI infrastructure as an open standard to encourage industry-wide adoption.
Adoption came fast. Within weeks:
- OpenAI Codex added support
- Microsoft VS Code and GitHub Copilot integrated it
- Cursor and Google Gemini CLI joined
- Enterprise platforms like Atlassian, Figma, and Databricks followed
- More than 26 additional tools and platforms adopted the standard
The result is portability. A skill written for Claude Code works in Codex. A team's internal skills transfer when people switch tools. Enterprise skill libraries work across whatever agents the organization runs.
The agentskills.io specification is minimal by design. It defines only what's necessary for cross-platform compatibility: file structure, metadata format, and discovery conventions. Individual platforms extend the spec with proprietary features while maintaining compatibility with the core.
Cross-platform skill anatomy
The open standard defines a baseline every compliant agent must support:
```markdown
---
name: skill-name              # Required: identifier
description: What it does     # Required: helps agents decide when to use
license: Apache-2.0           # Optional: for shared skills
compatibility: Requires git   # Optional: environment requirements
metadata:                     # Optional: arbitrary key-value pairs
  author: team-name
  version: "1.0"
---

Instructions in markdown...
```
Claude Code extends this with additional frontmatter options covered in later sections. Codex supports the same baseline plus its own extensions. The core portability comes from the shared foundation.
How skills differ from prompts
| Aspect | Prompts | Skills |
|---|---|---|
| Persistence | Disappear after conversation ends | Persist across sessions |
| Invocation | Manual, type every time | Automatic or explicit /skill-name |
| Structure | Freeform text | Defined YAML frontmatter + markdown body |
| Portability | Copy-paste between conversations | Share via version control, work cross-platform |
| Resources | Cannot bundle scripts or files | Can include scripts, templates, references |
| Tool permissions | Use conversation defaults | Can specify allowed-tools |
| Context cost | Full cost every time | Lazy loading reduces cost |
The repeatability problem
Prompts work for one-off tasks. Ask an agent to refactor a function, and a well-crafted prompt produces good results.
The problem emerges with repeated tasks. Code review, commit message generation, PR creation, deployment verification: these happen daily or hourly. Without skills, every invocation requires reconstructing the prompt. Instructions drift across team members. Tribal knowledge stays in people's heads instead of being captured somewhere useful.
Skills fix this by encoding procedural knowledge into artifacts that can be version-controlled, reviewed, and shared like any other code.
Deterministic invocation
Prompts are probabilistic. Even with identical prompts, agent behavior varies based on conversation context, model state, and other factors.
Skills introduce determinism at the invocation layer. When someone types /code-review, the skill always loads the same instructions. The agent's interpretation may vary, but the starting point is consistent.
For workflows with side effects (deployments, commits, database migrations), that consistency matters. You want to know exactly what instructions the agent received before it ran something consequential.
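That audit trail can be made concrete. A hypothetical sketch, not a built-in Claude Code feature: hash the loaded instructions before any consequential run, so logs record exactly which version of the skill the agent received.

```python
import hashlib

def load_for_invocation(skill_body: str) -> tuple[str, str]:
    """Return the instructions plus a short digest of exactly what was loaded."""
    digest = hashlib.sha256(skill_body.encode()).hexdigest()[:12]
    return skill_body, digest

body, digest = load_for_invocation("## Deploy steps\n1. Run tests\n2. Push\n")
print(f"invoking with instructions sha256:{digest}")
```

The same instructions always yield the same digest; edit the skill file and the digest changes, giving a verifiable record of what the agent was told before it ran.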
When to use skills versus direct interaction
Skills are not always the right tool. Direct interaction remains better for certain tasks.
Use skills when
Repeating workflows across conversations: If you find yourself typing the same prompt structure repeatedly, that prompt should become a skill. The threshold is roughly three repetitions. After typing the same instructions three times, extract them.
Encoding domain expertise: Teams accumulate knowledge about how to perform tasks correctly in their specific codebase. How to write tests that don't flake. What patterns to use for error handling. Which conventions apply to which directories. Skills capture this expertise in executable form.
Standardizing across a team: When multiple developers should follow the same procedure, a shared skill ensures consistency. Instead of documenting the procedure and hoping people follow it, the skill enforces it every time.
Procedures with side effects: Deployments, commits, PR creation, and database changes need controlled execution. Skills can specify disable-model-invocation: true to prevent accidental activation. The agent only executes when explicitly commanded.
Cross-platform workflows: If your team uses multiple agent tools, skills written to the open standard work everywhere. Write them once, run them in Claude Code, Codex, and other compliant platforms.
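The invocation gate described under procedures with side effects reduces to a small check. A hypothetical sketch, with `may_invoke` invented for illustration; it accepts both the YAML boolean and a raw string value for the flag:

```python
def may_invoke(frontmatter: dict, explicit_command: bool) -> bool:
    """Explicit /skill-name commands always work; model-initiated
    activation is blocked when disable-model-invocation is set."""
    if frontmatter.get("disable-model-invocation") in (True, "true"):
        return explicit_command
    return True

deploy = {"name": "deploy", "disable-model-invocation": True}
print(may_invoke(deploy, explicit_command=False))  # False: no auto-activation
print(may_invoke(deploy, explicit_command=True))   # True: /deploy works
```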
Use direct interaction when
One-off exploratory tasks: Questions like "What does this function do?" or "Help me understand this error" don't need skills. These are ad-hoc queries where conversation context provides everything needed.
Iterative refinement: When you're working through a problem and adjusting your approach based on results, the back-and-forth of direct conversation fits better than triggering predefined procedures.
Novel problems: Skills encode solutions to known problems. When facing something genuinely new, direct interaction lets you explore without constraints.
Tasks that vary significantly each time: If no two invocations look similar, a skill just adds overhead. The skill would need to be so general it provides little value over direct prompting.
Skill composition with other agent capabilities
Skills complement rather than replace other agent building blocks:
| Building block | Best for | How skills relate |
|---|---|---|
| MCP servers | Data connectivity, external systems | Skills can use MCP tools as part of their procedures |
| Subagents | Task parallelization, tool isolation | Skills can spawn subagents for complex workflows |
| Project context | Codebase-wide knowledge | Skills access project context like any agent operation |
| Direct prompts | Ad-hoc requests | Skills formalize prompts that prove valuable |
A sophisticated workflow might combine all of these: a skill that spawns subagents to analyze code in parallel, uses MCP to query a database for configuration, and references project context to understand codebase conventions.
Module 2 introduced .claude/skills/ as a way to organize detailed guidance for large projects.
This module expands on that foundation with the full capabilities of the skill system: frontmatter configuration, invocation patterns, hooks integration, and multi-agent orchestration.
The context economics of skills
Context windows are finite. Procedural knowledge is not.
Without skills, teams face a tradeoff. Either stuff CLAUDE.md with comprehensive procedures (consuming context on every session) or keep procedures brief and accept inconsistent execution when agents improvise to fill gaps.
Skills avoid this tradeoff. Procedures live in skill files. Only metadata loads at startup. Full instructions load on-demand. The context window contains what the current task actually needs, not everything the team has ever documented.
For enterprise teams with hundreds of procedures, the math is stark. A 200-page procedures manual as a single document would consume an entire context window. The same content as 50 skills, loading on demand, might use 5% of that on any given task.
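The arithmetic behind that 5% figure, using the token estimates from the loading-tier table and an assumed 200,000-token context window (round numbers for illustration, not measurements):

```python
CONTEXT_WINDOW = 200_000   # tokens; assumed size of a large context window
SKILLS = 50
METADATA_COST = 100        # discovery tier, per skill (upper end of ~50-100)
ACTIVATION_COST = 5_000    # one full SKILL.md body (upper end of ~2,000-5,000)

# A typical task loads the metadata index plus one activated skill.
cost_per_task = SKILLS * METADATA_COST + ACTIVATION_COST
print(f"{cost_per_task} tokens, {cost_per_task / CONTEXT_WINDOW:.0%} of the window")
```

Even at the upper end of the per-skill estimates, 50 skills cost about as much as a single activated skill body, leaving the rest of the window for the task itself.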