Understanding Agent Skills
Beyond prompting
Module 10 covered MCP as a way to give agents access to external data and tools. Skills solve a different problem: teaching agents how to perform tasks, not just what resources they can access.
Think of the difference between giving someone access to a kitchen and teaching them to cook. MCP provides access to ingredients and equipment. Skills provide the recipes.
Agent skills are structured, reusable capability packages that agents load and execute when relevant tasks arise. Unlike prompts, which vanish when the conversation ends, skills persist across sessions and can be shared across teams, projects, and even different agent platforms.
What skills actually are
A skill is a directory containing instructions, optional scripts, templates, and reference materials.
The core file is SKILL.md: YAML frontmatter describing the skill, followed by markdown instructions the agent follows when activated.
```
.claude/skills/
└── code-review/
    ├── SKILL.md               # Required: Instructions and metadata
    ├── scripts/               # Optional: Executable helpers
    │   └── lint-check.sh
    ├── references/            # Optional: Documentation
    │   └── style-guide.md
    └── assets/                # Optional: Templates
        └── review-template.md
```
When Claude Code encounters a task matching a skill's description, it loads the skill's instructions and follows them. Skills are not documentation that agents might consult. They are active instruction sets that shape agent behavior.
The SKILL.md format
Every skill requires a SKILL.md file with YAML frontmatter:
```markdown
---
name: code-review
description: Performs thorough code review following team standards
---

## Review process

1. Check for security vulnerabilities using the OWASP checklist
2. Verify test coverage meets 80% threshold
3. Confirm naming conventions match style guide
4. Flag any dependencies not in the approved list

## Output format

Provide review as a numbered list of findings...
```
The name field identifies the skill (max 64 characters, lowercase with hyphens only). The description field (max 1,024 characters) helps agents decide when to activate the skill.
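As a sketch of how these constraints could be checked, the following hypothetical validator (the names `validate_frontmatter` and `NAME_RE` are invented for illustration, not part of any SDK) enforces the limits above:

```python
import re

# Lowercase words separated by single hyphens, e.g. "code-review".
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of violations; an empty list means the metadata is valid."""
    errors = []
    if len(name) > 64:
        errors.append("name exceeds 64 characters")
    if not NAME_RE.match(name):
        errors.append("name must be lowercase with hyphens only")
    if not description:
        errors.append("description is required")
    elif len(description) > 1024:
        errors.append("description exceeds 1,024 characters")
    return errors

print(validate_frontmatter("code-review", "Performs thorough code review"))  # []
print(validate_frontmatter("Code_Review", ""))
```

A check like this fits naturally in a pre-commit hook or CI step for a shared skills repository.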
How agents discover skills
Skills use lazy loading to manage context. Rather than dumping all skill instructions into context at startup (which would burn thousands of tokens), agents use a three-tier loading strategy:
| Tier | What loads | Token cost | When |
|---|---|---|---|
| Discovery | Name and description only | ~50-100 tokens | Session startup |
| Activation | Full SKILL.md body | ~2,000-5,000 tokens | Skill invocation |
| Execution | Scripts, references, assets | As needed | During execution |
At session startup, Claude Code scans skill directories and loads only metadata. This creates a lightweight index of available capabilities. When a task matches a skill's description, the full instructions load. Supporting files load only when the skill explicitly references them.
Teams may have dozens of skills. Loading all of them into every conversation would exhaust context windows before real work begins. Lazy loading prevents that.
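The three-tier strategy can be sketched in a few lines. This is a simplified illustration, not Claude Code's actual implementation: the frontmatter parser handles only flat `key: value` pairs (a real implementation would use a YAML library), and the function names are invented for the example.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillEntry:
    """Discovery-tier record: only what the startup index needs."""
    name: str
    description: str
    path: Path  # the full SKILL.md stays on disk until activation

def parse_frontmatter(text: str) -> dict:
    """Read flat 'key: value' pairs between the '---' markers."""
    meta = {}
    parts = text.split("---")
    if len(parts) >= 3:
        for line in parts[1].strip().splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta

def discover(skills_dir: Path) -> list[SkillEntry]:
    """Discovery tier: index name and description only (~50-100 tokens per skill)."""
    return [
        SkillEntry(meta["name"], meta["description"], skill_md)
        for skill_md in sorted(skills_dir.glob("*/SKILL.md"))
        if (meta := parse_frontmatter(skill_md.read_text()))
    ]

def activate(entry: SkillEntry) -> str:
    """Activation tier: load the full instruction body only when invoked."""
    return entry.path.read_text().split("---", 2)[-1]
```

Scripts, references, and assets under the skill directory would load later still, only when the activated instructions reference them.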
The Agent Skills open standard
Anthropic released Agent Skills as an open standard in December 2025, publishing the specification and SDK at agentskills.io. The approach mirrors what they did with MCP: define foundational AI infrastructure as an open standard to encourage industry-wide adoption.
Adoption came fast. Within weeks:
- OpenAI Codex added support
- Microsoft VS Code and GitHub Copilot integrated it
- Cursor and Google Gemini CLI joined
- Enterprise platforms like Atlassian, Figma, and Databricks followed
- More than 26 additional tools and platforms adopted the standard
The result is portability. A skill written for Claude Code works in Codex. A team's internal skills transfer when people switch tools. Enterprise skill libraries work across whatever agents the organization runs.
The agentskills.io specification is minimal by design. It defines only what's necessary for cross-platform compatibility: file structure, metadata format, and discovery conventions. Individual platforms extend the spec with proprietary features while maintaining compatibility with the core.
Cross-platform skill anatomy
The open standard defines a baseline every compliant agent must support:
```markdown
---
name: skill-name              # Required: identifier
description: What it does     # Required: helps agents decide when to use
license: Apache-2.0           # Optional: for shared skills
compatibility: Requires git   # Optional: environment requirements
metadata:                     # Optional: arbitrary key-value pairs
  author: team-name
  version: "1.0"
---

Instructions in markdown...
```
Claude Code extends this with additional frontmatter options covered in later sections. Codex supports the same baseline plus its own extensions. The core portability comes from the shared foundation.
How skills differ from prompts
| Aspect | Prompts | Skills |
|---|---|---|
| Persistence | Disappear after conversation ends | Persist across sessions |
| Invocation | Manual, type every time | Automatic or explicit /skill-name |
| Structure | Freeform text | Defined YAML frontmatter + markdown body |
| Portability | Copy-paste between conversations | Share via version control, work cross-platform |
| Resources | Cannot bundle scripts or files | Can include scripts, templates, references |
| Tool permissions | Use conversation defaults | Can specify allowed-tools |
| Context cost | Full cost every time | Lazy loading reduces cost |
The repeatability problem
Prompts work for one-off tasks. Ask an agent to refactor a function, and a well-crafted prompt produces good results.
The problem emerges with repeated tasks. Code review, commit message generation, PR creation, deployment verification: these happen daily or hourly. Without skills, every invocation requires reconstructing the prompt. Instructions drift across team members. Tribal knowledge stays in people's heads instead of being captured somewhere useful.
Skills fix this by encoding procedural knowledge into artifacts that can be version-controlled, reviewed, and shared like any other code.
Deterministic invocation
Prompts are probabilistic. Even with identical prompts, agent behavior varies based on conversation context, model state, and other factors.
Skills introduce determinism at the invocation layer. When someone types /code-review, the skill always loads the same instructions. The agent's interpretation may vary, but the starting point is consistent.
For workflows with side effects (deployments, commits, database migrations), that consistency matters. You want to know exactly what instructions the agent received before it ran something consequential.
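That audit trail can be made concrete. A hypothetical sketch, not a built-in Claude Code feature: hash the loaded instructions before any consequential run, so logs record exactly which version of the skill the agent received.

```python
import hashlib

def load_for_invocation(skill_body: str) -> tuple[str, str]:
    """Return the instructions plus a short digest of exactly what was loaded."""
    digest = hashlib.sha256(skill_body.encode()).hexdigest()[:12]
    return skill_body, digest

body, digest = load_for_invocation("## Deploy steps\n1. Run tests\n2. Push\n")
print(f"invoking with instructions sha256:{digest}")
```

The same instructions always yield the same digest; edit the skill file and the digest changes, giving a verifiable record of what the agent was told before it ran.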
When to use skills versus direct interaction
Skills are not always the right tool. Direct interaction remains better for certain tasks.
Use skills when
Repeating workflows across conversations: If you find yourself typing the same prompt structure repeatedly, that prompt should become a skill. The threshold is roughly three repetitions. After typing the same instructions three times, extract them.
Encoding domain expertise: Teams accumulate knowledge about how to perform tasks correctly in their specific codebase. How to write tests that don't flake. What patterns to use for error handling. Which conventions apply to which directories. Skills capture this expertise in executable form.
Standardizing across a team: When multiple developers should follow the same procedure, a shared skill ensures consistency. Instead of documenting the procedure and hoping people follow it, the skill enforces it every time.
Procedures with side effects: Deployments, commits, PR creation, and database changes need controlled execution. Skills can specify disable-model-invocation: true to prevent accidental activation. The agent only executes when explicitly commanded.
Cross-platform workflows: If your team uses multiple agent tools, skills written to the open standard work everywhere. Write them once, run them in Claude Code, Codex, and other compliant platforms.
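The invocation gate described under procedures with side effects reduces to a small check. A hypothetical sketch, with `may_invoke` invented for illustration; it accepts both the YAML boolean and a raw string value for the flag:

```python
def may_invoke(frontmatter: dict, explicit_command: bool) -> bool:
    """Explicit /skill-name commands always work; model-initiated
    activation is blocked when disable-model-invocation is set."""
    if frontmatter.get("disable-model-invocation") in (True, "true"):
        return explicit_command
    return True

deploy = {"name": "deploy", "disable-model-invocation": True}
print(may_invoke(deploy, explicit_command=False))  # False: no auto-activation
print(may_invoke(deploy, explicit_command=True))   # True: /deploy works
```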
Use direct interaction when
One-off exploratory tasks: Questions like "What does this function do?" or "Help me understand this error" don't need skills. These are ad-hoc queries where conversation context provides everything needed.
Iterative refinement: When you're working through a problem and adjusting your approach based on results, the back-and-forth of direct conversation fits better than triggering predefined procedures.
Novel problems: Skills encode solutions to known problems. When facing something genuinely new, direct interaction lets you explore without constraints.
Tasks that vary significantly each time: If no two invocations look similar, a skill just adds overhead. The skill would need to be so general it provides little value over direct prompting.
Skill composition with other agent capabilities
Skills complement rather than replace other agent building blocks:
| Building block | Best for | How skills relate |
|---|---|---|
| MCP servers | Data connectivity, external systems | Skills can use MCP tools as part of their procedures |
| Subagents | Task parallelization, tool isolation | Skills can spawn subagents for complex workflows |
| Project context | Codebase-wide knowledge | Skills access project context like any agent operation |
| Direct prompts | Ad-hoc requests | Skills formalize prompts that prove valuable |
A sophisticated workflow might combine all of these: a skill that spawns subagents to analyze code in parallel, uses MCP to query a database for configuration, and references project context to understand codebase conventions.
Module 2 introduced .claude/skills/ as a way to organize detailed guidance for large projects.
This module expands on that foundation with the full capabilities of the skill system: frontmatter configuration, invocation patterns, hooks integration, and multi-agent orchestration.
The context economics of skills
Context windows are finite. Procedural knowledge is not.
Without skills, teams face a tradeoff. Either stuff CLAUDE.md with comprehensive procedures (consuming context on every session) or keep procedures brief and accept inconsistent execution when agents improvise to fill gaps.
Skills avoid this tradeoff. Procedures live in skill files. Only metadata loads at startup. Full instructions load on-demand. The context window contains what the current task actually needs, not everything the team has ever documented.
For enterprise teams with hundreds of procedures, the math is stark. A 200-page procedures manual as a single document would consume an entire context window. The same content as 50 skills, loading on demand, might use 5% of that on any given task.
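The arithmetic behind that 5% figure, using the token estimates from the loading-tier table and an assumed 200,000-token context window (round numbers for illustration, not measurements):

```python
CONTEXT_WINDOW = 200_000   # tokens; assumed size of a large context window
SKILLS = 50
METADATA_COST = 100        # discovery tier, per skill (upper end of ~50-100)
ACTIVATION_COST = 5_000    # one full SKILL.md body (upper end of ~2,000-5,000)

# A typical task loads the metadata index plus one activated skill.
cost_per_task = SKILLS * METADATA_COST + ACTIVATION_COST
print(f"{cost_per_task} tokens, {cost_per_task / CONTEXT_WINDOW:.0%} of the window")
```

Even at the upper end of the per-skill estimates, 50 skills cost about as much as a single activated skill body, leaving the rest of the window for the task itself.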