Building Automation Workflows
From components to systems
The previous pages covered individual capabilities: skills define reusable behaviors, hooks provide deterministic execution, parallel agents tackle concurrent work, and CI/CD integration runs everything unattended. Separately, each solves narrow problems. Combined, they handle complex, multi-stage automation.
This page covers how to combine these components into production systems—practical patterns for building workflows that run reliably without human intervention.
Workflow composition patterns
Three patterns cover most production deployments.
Sequential pipelines:
Each stage completes before the next begins. Output flows forward.
analyze → plan → implement → test → review → deploy

Use sequential pipelines when stages depend on previous results. An implementation agent needs the plan. A test agent needs the implementation. The ordering isn't a preference—it's a constraint.
# Sequential skill composition
skills:
  release-prep:
    stages:
      - skill: code-analysis
        output: analysis.md
      - skill: changelog-generator
        input: analysis.md
        output: CHANGELOG.md
      - skill: version-bump
        input: CHANGELOG.md

The tradeoff: total duration equals the sum of all stages. No parallelization possible.
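The arithmetic is worth seeing once. A quick sketch with illustrative stage durations (the minutes are made up, not benchmarks):

# Sequential vs. theoretical parallel duration (all numbers illustrative)
stage_minutes = {"analyze": 3, "plan": 2, "implement": 12, "test": 6, "review": 4}

sequential_total = sum(stage_minutes.values())  # 27: each stage waits on the last
parallel_floor = max(stage_minutes.values())    # 12: lower bound if stages were independent

print(f"sequential: {sequential_total} min, parallel floor: {parallel_floor} min")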
Fan-out/fan-in:
Work splits across parallel agents, then aggregates.
                 ┌─ ui-tests ──────┐
analyze ────────>├─ api-tests ─────┼───> aggregate → report
                 └─ security-scan ─┘

Use fan-out when tasks are independent but results need consolidation. Test suites covering different layers. Reviews from multiple perspectives. Documentation for different audiences.
# Parallel execution with aggregation
jobs:
  split:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.analyze.outputs.components }}
    steps:
      - id: analyze
        # fromJson downstream requires valid JSON, so double-quote the strings
        run: echo 'components=["ui","api","data"]' >> $GITHUB_OUTPUT
  process:
    needs: split
    runs-on: ubuntu-latest
    strategy:
      matrix:
        component: ${{ fromJson(needs.split.outputs.matrix) }}
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: "Review the ${{ matrix.component }} component"
  aggregate:
    needs: process
    runs-on: ubuntu-latest
    steps:
      - name: Combine results
        run: |
          # Merge all component reviews into unified report

The tradeoff: aggregation adds complexity. Results from parallel agents may conflict. Someone has to write resolution logic—and that someone is you.
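What that resolution logic can look like, as a minimal sketch: it assumes each parallel agent wrote reviews/<component>.json with a hypothetical findings schema, and resolves duplicates by keeping the highest severity.

# aggregate_reviews.py -- minimal fan-in resolution sketch.
# Assumed schema per file: {"findings": [{"file": ..., "severity": ..., "note": ...}]}
import json
from pathlib import Path

SEVERITY_RANK = {"critical": 3, "warning": 2, "info": 1}

def aggregate(review_dir: str = "reviews") -> list[dict]:
    findings: dict[str, dict] = {}
    for path in Path(review_dir).glob("*.json"):
        for f in json.loads(path.read_text())["findings"]:
            key = f["file"] + ":" + f["note"]
            # Conflict rule: when two agents flag the same issue,
            # keep the higher severity rather than duplicating it.
            existing = findings.get(key)
            if existing is None or SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[existing["severity"]]:
                findings[key] = f
    return sorted(findings.values(), key=lambda f: -SEVERITY_RANK[f["severity"]])

if __name__ == "__main__":
    print(json.dumps(aggregate(), indent=2))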
Hierarchical orchestration:
A coordinator agent delegates to specialists.
         orchestrator
        /     |      \
  planner   worker   reviewer
              |
          sub-worker

Use hierarchical orchestration when task scope is unknown upfront. The orchestrator decomposes the work dynamically based on what it discovers.
# Orchestrator pattern in Claude Code
skills:
  feature-implementation:
    context: fork
    agent: orchestrator
    body: |
      You coordinate feature implementation.
      Available specialists:
      - /architect: Design decisions
      - /implementer: Code changes
      - /tester: Test creation
      Analyze the request, delegate appropriately,
      and aggregate results.

The tradeoff: orchestrators add latency and cost. Every delegation is an additional agent invocation. Simple tasks don't need coordinators—you're paying for overhead that buys you nothing.
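A back-of-envelope comparison makes the overhead concrete. Every token count and price below is a hypothetical placeholder, not a measured value:

# Illustrative cost of hierarchical orchestration vs. a single agent.
COST_PER_1K_INPUT = 0.003   # $/1K input tokens (assumed)
COST_PER_1K_OUTPUT = 0.015  # $/1K output tokens (assumed)

def invocation_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * COST_PER_1K_INPUT + \
           (output_tokens / 1000) * COST_PER_1K_OUTPUT

# Direct: one agent does the work.
direct = invocation_cost(8_000, 2_000)
# Hierarchical: orchestrator plus three specialists, each re-reading context.
orchestrated = invocation_cost(8_000, 1_000) + 3 * invocation_cost(6_000, 2_000)
print(f"direct: ${direct:.2f}, orchestrated: ${orchestrated:.2f}")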
Start with sequential pipelines. Add parallelization when duration matters. Use hierarchical orchestration only when decomposition must happen at runtime.
Combining skills with hooks
Skills define what agents do. Hooks enforce what must happen around those actions.
Pre-execution validation:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "python validate_action.py"
          }
        ]
      }
    ]
  }
}

The hook runs before every tool use. Skills operate within that constraint. A deployment skill cannot bypass the validation—hooks execute no matter what.
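What might validate_action.py look like? A sketch, assuming the hook receives the pending tool call as JSON on stdin and that exit code 2 is treated as a blocking error; the policy list is purely illustrative:

# validate_action.py -- illustrative PreToolUse validator.
import json
import sys

BLOCKED_SUBSTRINGS = ["git push --force", "rm -rf /", "npm publish"]  # example policy

event = json.load(sys.stdin)
command = event.get("tool_input", {}).get("command", "")

for blocked in BLOCKED_SUBSTRINGS:
    if blocked in command:
        # stderr explains the rejection; the blocking exit code stops the tool call
        print(f"Blocked by policy: {blocked!r}", file=sys.stderr)
        sys.exit(2)

sys.exit(0)  # allow the tool call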
Post-execution quality gates:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint -- --fix $file_path && npm run typecheck"
          }
        ]
      }
    ]
  }
}

Every file modification triggers linting and type checking. Skills that edit code automatically comply with quality standards.
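The same gate can live in a script instead of an inline command. A sketch, assuming the event JSON exposes the edited path as tool_input.file_path:

# post_edit_gate.py -- illustrative PostToolUse quality gate.
import json
import subprocess
import sys

event = json.load(sys.stdin)
file_path = event.get("tool_input", {}).get("file_path")

if file_path and file_path.endswith((".ts", ".tsx", ".js")):
    # check=True makes a lint or typecheck failure surface as a hook failure
    subprocess.run(["npm", "run", "lint", "--", "--fix", file_path], check=True)
    subprocess.run(["npm", "run", "typecheck"], check=True)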
Skill-specific hooks:
# In SKILL.md frontmatter
---
name: database-migration
hooks:
  pre:
    - command: "pg_dump $DATABASE_URL > backup.sql"
  post:
    - command: "npm run migration:verify"
---

Migration-specific hooks: backup before, verify after. These run in addition to global hooks.
Multi-agent workflow design
Enterprise workflows often require multiple agents working together. The patterns that hold up at scale share three characteristics: explicit handoffs, isolated context, and bounded autonomy.
Clear handoff protocols:
Agents need explicit interfaces. What does one agent produce? What does the next agent expect?
# Explicit contract between agents
analyst:
  output:
    format: json
    schema:
      files_to_change: array
      risk_assessment: string
      dependencies: array

implementer:
  input:
    source: analyst.output
    required: [files_to_change, dependencies]

Without contracts, agents interpret handoffs ambiguously. Structured handoffs eliminate the confusion.
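A contract is only useful if something checks it. A minimal validation step between the two agents, using the field names from the example above:

# validate_handoff.py -- check the analyst's output before the implementer runs.
import json
import sys

REQUIRED = {"files_to_change": list, "dependencies": list, "risk_assessment": str}

def validate(path: str) -> dict:
    with open(path) as f:
        data = json.load(f)
    for field, expected_type in REQUIRED.items():
        if field not in data:
            sys.exit(f"Handoff rejected: missing field {field!r}")
        if not isinstance(data[field], expected_type):
            sys.exit(f"Handoff rejected: {field!r} must be {expected_type.__name__}")
    return data

if __name__ == "__main__":
    validate(sys.argv[1])
    print("Handoff contract satisfied.")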
Context isolation:
Each agent operates in its own context window. Shared state flows through explicit channels, not conversation history.
# State management via files
- name: Agent A writes state
  run: |
    echo "$ANALYSIS_RESULT" > .workflow/state/analysis.json

- name: Agent B reads state
  run: |
    ANALYSIS=$(cat .workflow/state/analysis.json)
    # Proceed with explicit state

File-based state is auditable, versioned, and survives agent restarts.
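A small helper keeps every agent reading and writing state the same way (the .workflow/state path matches the steps above):

# workflow_state.py -- minimal file-based state channel between agents.
import json
from pathlib import Path

STATE_DIR = Path(".workflow/state")

def write_state(name: str, data: dict) -> Path:
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{name}.json"
    path.write_text(json.dumps(data, indent=2))
    return path  # commit this file so the handoff stays versioned and auditable

def read_state(name: str) -> dict:
    return json.loads((STATE_DIR / f"{name}.json").read_text())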
Bounded autonomy:
Define what agents can and cannot do.
agents:
  reviewer:
    permissions:
      - read: "**/*"
      - write: "reviews/**"  # Only write to reviews directory
      - execute: ["npm test", "npm run lint"]
    forbidden:
      - "git push"
      - "npm publish"

Narrow permissions limit blast radius. A reviewer that can only write reviews cannot accidentally deploy.
Production workflow examples
Dependency update workflow:
Combines scheduled execution, parallel processing, and quality gates.
name: Automated Dependency Updates

on:
  schedule:
    - cron: '0 9 * * 1'  # Weekly: Monday 9 AM UTC

jobs:
  identify:
    runs-on: ubuntu-latest
    outputs:
      updates: ${{ steps.check.outputs.packages }}
    steps:
      - uses: actions/checkout@v4
      - id: check
        run: |
          npm outdated --json > outdated.json || true
          # npm prints nothing when everything is current; normalize to {}
          [ -s outdated.json ] || echo '{}' > outdated.json
          echo "packages=$(jq -c 'keys' outdated.json)" >> $GITHUB_OUTPUT

  update:
    needs: identify
    if: needs.identify.outputs.updates != '[]'
    strategy:
      matrix:
        package: ${{ fromJson(needs.identify.outputs.updates) }}
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
            Update ${{ matrix.package }} to latest version.
            Review changelog for breaking changes.
            Update code if breaking changes affect us.
            Run tests to verify.

  aggregate:
    needs: update
    runs-on: ubuntu-latest
    steps:
      - name: Create combined PR
        uses: peter-evans/create-pull-request@v6
        with:
          title: "Weekly dependency updates"
          body: "Automated updates with compatibility verification"

Each package updates in parallel. Failed updates don't block others. A single PR collects successful updates.
Documentation sync workflow:
Keeps documentation aligned with code changes.
name: Documentation Sync

on:
  push:
    branches: [main]
    paths:
      - 'src/**/*.ts'
      - 'src/**/*.tsx'

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      changed-modules: ${{ steps.diff.outputs.modules }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - id: diff
        run: |
          modules=$(git diff --name-only HEAD~1 | grep '^src/' | cut -d'/' -f2 | sort -u | jq -R -s -c 'split("\n")[:-1]')
          echo "modules=$modules" >> $GITHUB_OUTPUT

  update-docs:
    needs: detect-changes
    if: needs.detect-changes.outputs.changed-modules != '[]'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
            Code changed in: ${{ needs.detect-changes.outputs.changed-modules }}
            For each changed module:
            1. Read the updated source code
            2. Find corresponding documentation in docs/
            3. Update documentation to reflect code changes
            4. Ensure examples still work
            Only modify documentation files. Do not change source code.

Documentation updates trigger only when relevant code changes. Agents focus on specific modules, not the entire codebase.
Observability integration
Production workflows need visibility. OpenTelemetry is the emerging standard for instrumenting them.
Semantic conventions for agent workflows:
from opentelemetry import trace

tracer = trace.get_tracer("agent-workflow")

with tracer.start_as_current_span("invoke_agent") as span:
    span.set_attribute("gen_ai.agent.name", "dependency-updater")
    span.set_attribute("gen_ai.agent.id", workflow_run_id)
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.conversation.id", session_id)

    # Agent execution
    result = run_agent(task)

    span.set_attribute("gen_ai.response.model", result.model)
    span.set_attribute("gen_ai.usage.input_tokens", result.input_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", result.output_tokens)

Standardized attributes enable cross-platform analysis. Traces from Claude Code workflows appear alongside Codex workflows in the same dashboard.
Key metrics for workflow health:
| Metric | Purpose | Aggregation |
|---|---|---|
| workflow.duration | End-to-end time | P50, P95, P99 |
| agent.invocations | Agents called per workflow | Sum, average |
| agent.tokens.total | Token consumption | Sum per workflow |
| workflow.success_rate | Completion without error | Percentage |
| escalation.rate | Human intervention required | Percentage |
Track by workflow, by agent type, and by time period. Anomaly detection surfaces degradation before it becomes failure.
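A sketch of the kind of check that turns those metrics into alerts, with a hypothetical threshold and made-up run records:

# workflow_health.py -- turn run records into a health signal.
# The records and the 90% threshold are illustrative, not recommendations.
def health_report(runs: list[dict], min_success_rate: float = 0.9) -> None:
    if not runs:
        return
    success_rate = sum(r["success"] for r in runs) / len(runs)
    escalation_rate = sum(r["escalated"] for r in runs) / len(runs)
    print(f"success: {success_rate:.0%}, escalations: {escalation_rate:.0%}")
    if success_rate < min_success_rate:
        print("ALERT: success rate below threshold; investigate before scaling up")

health_report([
    {"workflow": "dep-update", "success": True, "escalated": False},
    {"workflow": "dep-update", "success": False, "escalated": True},
    {"workflow": "docs-sync", "success": True, "escalated": False},
])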
Workflow maturity
Automation capabilities grow incrementally. Expecting full orchestration on day one sets you up for disappointment.
Level 1: Manual triggers. Single-purpose skills invoked by developers. Human decides when to run. Human reviews all output.
Level 2: Scheduled execution. Skills run on cron schedules. Basic quality gates block bad output. Human reviews flagged issues.
Level 3: Event-driven. Workflows trigger on repository events. Multiple skills compose into pipelines. Human reviews PRs, not individual runs.
Level 4: Self-healing. Workflows detect and recover from failures. Circuit breakers prevent cascading problems (sketched below). Human reviews escalations only.
Level 5: Adaptive orchestration. Orchestrator agents decompose tasks dynamically. Parallel execution scales with workload. Human reviews metrics and adjusts policies.
Most teams plateau at Level 3. Levels 4 and 5 require investment in observability and failure handling that many organizations skip. The right level depends on how much value the automation provides versus operational overhead—Level 3 with solid quality gates beats a fragile Level 5 implementation every time.
Don't skip levels. Each builds capabilities the next requires. Level 4 without Level 2's quality gates creates autonomous systems that produce garbage.
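The circuit breaker mentioned at Level 4 is a small amount of code. A minimal sketch with illustrative thresholds; tune them against real workflow metrics:

# circuit_breaker.py -- minimal sketch of the Level 4 failure guard.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 600):
        self.max_failures = max_failures
        self.cooldown = cooldown_seconds
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at > self.cooldown:
            self.opened_at = None  # half-open: let one attempt through
            self.failures = self.max_failures - 1
            return True
        return False  # open: skip the run and escalate to a human instead

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()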
Making workflows maintainable
Automation that works today breaks tomorrow. Dependencies update. APIs change. Requirements evolve.
Version everything:
Skills, hooks, and workflow definitions belong in version control. Changes go through review. Rollback is a git revert.
Test workflows, not just code:
# Workflow test in CI
- name: Test dependency update workflow
  run: |
    # Simulate an outdated package
    npm install lodash@4.17.0
    # Run workflow logic
    ./scripts/dependency-update.sh
    # Verify expected behavior
    test -f updates.json
    grep -q "lodash" updates.json

Workflow logic is code. Test it like code.
Document decisions:
Why does the security scan run after tests instead of in parallel? Why is the cost limit $5 instead of $10? Future maintainers—including future you—need context.
# Workflow with decision documentation
jobs:
  security:
    # Runs after tests because security scanning on broken code
    # wastes time and produces misleading results.
    # Changed from parallel execution on 2025-11-15 after
    # three incidents of false positives from parse errors.
    needs: test

Comments in workflow files are underutilized. Use them.
What comes next
Skills, hooks, and multi-agent orchestration are tools. This page covered how to combine them. The final module shifts perspective: when should these tools be used at all?
Module 12 addresses judgment—knowing when agentic approaches accelerate work versus when traditional development remains superior. Having powerful automation available doesn't mean using it everywhere.