Building Automation Workflows
From components to systems
The previous pages covered individual capabilities: skills define reusable behaviors, hooks provide deterministic execution, parallel agents tackle concurrent work, and CI/CD integration runs everything unattended. Separately, each solves narrow problems. Combined, they handle complex, multi-stage automation.
This page covers how to combine these components into production systems—practical patterns for building workflows that run reliably without human intervention.
Workflow composition patterns
Three patterns cover most production deployments.
Sequential pipelines:
Each stage completes before the next begins. Output flows forward.
analyze → plan → implement → test → review → deploy

Use sequential pipelines when stages depend on previous results. An implementation agent needs the plan. A test agent needs the implementation. The ordering isn't a preference—it's a constraint.
# Sequential skill composition
skills:
  release-prep:
    stages:
      - skill: code-analysis
        output: analysis.md
      - skill: changelog-generator
        input: analysis.md
        output: CHANGELOG.md
      - skill: version-bump
        input: CHANGELOG.md

The tradeoff: total duration equals the sum of all stages. No parallelization possible.
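The arithmetic is worth seeing once. A quick sketch with illustrative stage durations (the minutes are made up, not benchmarks):

# Sequential vs. theoretical parallel duration (all numbers illustrative)
stage_minutes = {"analyze": 3, "plan": 2, "implement": 12, "test": 6, "review": 4}

sequential_total = sum(stage_minutes.values())  # 27: each stage waits on the last
parallel_floor = max(stage_minutes.values())    # 12: lower bound if stages were independent

print(f"sequential: {sequential_total} min, parallel floor: {parallel_floor} min")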
Fan-out/fan-in:
Work splits across parallel agents, then aggregates.
                 ┌─ ui-tests ──────┐
analyze ────────>├─ api-tests ─────┼───> aggregate → report
                 └─ security-scan ─┘

Use fan-out when tasks are independent but results need consolidation. Test suites covering different layers. Reviews from multiple perspectives. Documentation for different audiences.
# Parallel execution with aggregation
jobs:
  split:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.analyze.outputs.components }}
    steps:
      - id: analyze
        # fromJson downstream requires valid JSON, so double-quote the strings
        run: echo 'components=["ui","api","data"]' >> $GITHUB_OUTPUT
  process:
    needs: split
    runs-on: ubuntu-latest
    strategy:
      matrix:
        component: ${{ fromJson(needs.split.outputs.matrix) }}
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: "Review the ${{ matrix.component }} component"
  aggregate:
    needs: process
    runs-on: ubuntu-latest
    steps:
      - name: Combine results
        run: |
          # Merge all component reviews into unified report

The tradeoff: aggregation adds complexity. Results from parallel agents may conflict. Someone has to write resolution logic—and that someone is you.
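What that resolution logic can look like, as a minimal sketch: it assumes each parallel agent wrote reviews/<component>.json with a hypothetical findings schema, and resolves duplicates by keeping the highest severity.

# aggregate_reviews.py -- minimal fan-in resolution sketch.
# Assumed schema per file: {"findings": [{"file": ..., "severity": ..., "note": ...}]}
import json
from pathlib import Path

SEVERITY_RANK = {"critical": 3, "warning": 2, "info": 1}

def aggregate(review_dir: str = "reviews") -> list[dict]:
    findings: dict[str, dict] = {}
    for path in Path(review_dir).glob("*.json"):
        for f in json.loads(path.read_text())["findings"]:
            key = f["file"] + ":" + f["note"]
            # Conflict rule: when two agents flag the same issue,
            # keep the higher severity rather than duplicating it.
            existing = findings.get(key)
            if existing is None or SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[existing["severity"]]:
                findings[key] = f
    return sorted(findings.values(), key=lambda f: -SEVERITY_RANK[f["severity"]])

if __name__ == "__main__":
    print(json.dumps(aggregate(), indent=2))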
Hierarchical orchestration:
A coordinator agent delegates to specialists.
         orchestrator
        /     |      \
  planner   worker   reviewer
              |
          sub-worker

Use hierarchical orchestration when task scope is unknown upfront. The orchestrator decomposes the work dynamically based on what it discovers.
# Orchestrator pattern in Claude Code
skills:
  feature-implementation:
    context: fork
    agent: orchestrator
    body: |
      You coordinate feature implementation.
      Available specialists:
      - /architect: Design decisions
      - /implementer: Code changes
      - /tester: Test creation
      Analyze the request, delegate appropriately,
      and aggregate results.

The tradeoff: orchestrators add latency and cost. Every delegation is an additional agent invocation. Simple tasks don't need coordinators—you're paying for overhead that buys you nothing.
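A back-of-envelope comparison makes the overhead concrete. Every token count and price below is a hypothetical placeholder, not a measured value:

# Illustrative cost of hierarchical orchestration vs. a single agent.
COST_PER_1K_INPUT = 0.003   # $/1K input tokens (assumed)
COST_PER_1K_OUTPUT = 0.015  # $/1K output tokens (assumed)

def invocation_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * COST_PER_1K_INPUT + \
           (output_tokens / 1000) * COST_PER_1K_OUTPUT

# Direct: one agent does the work.
direct = invocation_cost(8_000, 2_000)
# Hierarchical: orchestrator plus three specialists, each re-reading context.
orchestrated = invocation_cost(8_000, 1_000) + 3 * invocation_cost(6_000, 2_000)
print(f"direct: ${direct:.2f}, orchestrated: ${orchestrated:.2f}")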
Start with sequential pipelines. Add parallelization when duration matters. Use hierarchical orchestration only when decomposition must happen at runtime.
Combining skills with hooks
Skills define what agents do. Hooks enforce what must happen around those actions.
Pre-execution validation:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "python validate_action.py"
          }
        ]
      }
    ]
  }
}

The hook runs before every tool use. Skills operate within that constraint. A deployment skill cannot bypass the validation—hooks execute no matter what.
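What might validate_action.py look like? A sketch, assuming the hook receives the pending tool call as JSON on stdin and that exit code 2 is treated as a blocking error; the policy list is purely illustrative:

# validate_action.py -- illustrative PreToolUse validator.
import json
import sys

BLOCKED_SUBSTRINGS = ["git push --force", "rm -rf /", "npm publish"]  # example policy

event = json.load(sys.stdin)
command = event.get("tool_input", {}).get("command", "")

for blocked in BLOCKED_SUBSTRINGS:
    if blocked in command:
        # stderr explains the rejection; the blocking exit code stops the tool call
        print(f"Blocked by policy: {blocked!r}", file=sys.stderr)
        sys.exit(2)

sys.exit(0)  # allow the tool call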
Post-execution quality gates:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint -- --fix $file_path && npm run typecheck"
          }
        ]
      }
    ]
  }
}

Every file modification triggers linting and type checking. Skills that edit code automatically comply with quality standards.
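The same gate can live in a script instead of an inline command. A sketch, assuming the event JSON exposes the edited path as tool_input.file_path:

# post_edit_gate.py -- illustrative PostToolUse quality gate.
import json
import subprocess
import sys

event = json.load(sys.stdin)
file_path = event.get("tool_input", {}).get("file_path")

if file_path and file_path.endswith((".ts", ".tsx", ".js")):
    # check=True makes a lint or typecheck failure surface as a hook failure
    subprocess.run(["npm", "run", "lint", "--", "--fix", file_path], check=True)
    subprocess.run(["npm", "run", "typecheck"], check=True)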
Skill-specific hooks:
# In SKILL.md frontmatter
---
name: database-migration
hooks:
  pre:
    - command: "pg_dump $DATABASE_URL > backup.sql"
  post:
    - command: "npm run migration:verify"
---

Migration-specific hooks: backup before, verify after. These run in addition to global hooks.
Multi-agent workflow design
Enterprise workflows often require multiple agents working together. The patterns that hold up at scale share three characteristics: explicit handoffs, isolated context, and bounded autonomy.
Clear handoff protocols:
Agents need explicit interfaces. What does one agent produce? What does the next agent expect?
# Explicit contract between agents
analyst:
  output:
    format: json
    schema:
      files_to_change: array
      risk_assessment: string
      dependencies: array

implementer:
  input:
    source: analyst.output
    required: [files_to_change, dependencies]

Without contracts, agents interpret handoffs ambiguously. Structured handoffs eliminate the confusion.
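A contract is only useful if something checks it. A minimal validation step between the two agents, using the field names from the example above:

# validate_handoff.py -- check the analyst's output before the implementer runs.
import json
import sys

REQUIRED = {"files_to_change": list, "dependencies": list, "risk_assessment": str}

def validate(path: str) -> dict:
    with open(path) as f:
        data = json.load(f)
    for field, expected_type in REQUIRED.items():
        if field not in data:
            sys.exit(f"Handoff rejected: missing field {field!r}")
        if not isinstance(data[field], expected_type):
            sys.exit(f"Handoff rejected: {field!r} must be {expected_type.__name__}")
    return data

if __name__ == "__main__":
    validate(sys.argv[1])
    print("Handoff contract satisfied.")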
Context isolation:
Each agent operates in its own context window. Shared state flows through explicit channels, not conversation history.
# State management via files
- name: Agent A writes state
  run: |
    echo "$ANALYSIS_RESULT" > .workflow/state/analysis.json

- name: Agent B reads state
  run: |
    ANALYSIS=$(cat .workflow/state/analysis.json)
    # Proceed with explicit state

File-based state is auditable, versioned, and survives agent restarts.
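A small helper keeps every agent reading and writing state the same way (the .workflow/state path matches the steps above):

# workflow_state.py -- minimal file-based state channel between agents.
import json
from pathlib import Path

STATE_DIR = Path(".workflow/state")

def write_state(name: str, data: dict) -> Path:
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{name}.json"
    path.write_text(json.dumps(data, indent=2))
    return path  # commit this file so the handoff stays versioned and auditable

def read_state(name: str) -> dict:
    return json.loads((STATE_DIR / f"{name}.json").read_text())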
Bounded autonomy:
Define what agents can and cannot do.
agents:
  reviewer:
    permissions:
      - read: "**/*"
      - write: "reviews/**"  # Only write to reviews directory
      - execute: ["npm test", "npm run lint"]
    forbidden:
      - "git push"
      - "npm publish"

Narrow permissions limit blast radius. A reviewer that can only write reviews cannot accidentally deploy.
Production workflow examples
Dependency update workflow:
Combines scheduled execution, parallel processing, and quality gates.
name: Automated Dependency Updates

on:
  schedule:
    - cron: '0 9 * * 1'  # Weekly: Monday 9 AM UTC

jobs:
  identify:
    runs-on: ubuntu-latest
    outputs:
      updates: ${{ steps.check.outputs.packages }}
    steps:
      - uses: actions/checkout@v4
      - id: check
        run: |
          npm outdated --json > outdated.json || true
          # npm prints nothing when everything is current; normalize to {}
          [ -s outdated.json ] || echo '{}' > outdated.json
          echo "packages=$(jq -c 'keys' outdated.json)" >> $GITHUB_OUTPUT

  update:
    needs: identify
    if: needs.identify.outputs.updates != '[]'
    strategy:
      matrix:
        package: ${{ fromJson(needs.identify.outputs.updates) }}
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
            Update ${{ matrix.package }} to latest version.
            Review changelog for breaking changes.
            Update code if breaking changes affect us.
            Run tests to verify.

  aggregate:
    needs: update
    runs-on: ubuntu-latest
    steps:
      - name: Create combined PR
        uses: peter-evans/create-pull-request@v6
        with:
          title: "Weekly dependency updates"
          body: "Automated updates with compatibility verification"

Each package updates in parallel. Failed updates don't block others. A single PR collects successful updates.
Documentation sync workflow:
Keeps documentation aligned with code changes.
name: Documentation Sync

on:
  push:
    branches: [main]
    paths:
      - 'src/**/*.ts'
      - 'src/**/*.tsx'

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      changed-modules: ${{ steps.diff.outputs.modules }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - id: diff
        run: |
          modules=$(git diff --name-only HEAD~1 | grep '^src/' | cut -d'/' -f2 | sort -u | jq -R -s -c 'split("\n")[:-1]')
          echo "modules=$modules" >> $GITHUB_OUTPUT

  update-docs:
    needs: detect-changes
    if: needs.detect-changes.outputs.changed-modules != '[]'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          prompt: |
            Code changed in: ${{ needs.detect-changes.outputs.changed-modules }}
            For each changed module:
            1. Read the updated source code
            2. Find corresponding documentation in docs/
            3. Update documentation to reflect code changes
            4. Ensure examples still work
            Only modify documentation files. Do not change source code.

Documentation updates trigger only when relevant code changes. Agents focus on specific modules, not the entire codebase.
Observability integration
Production workflows need visibility. OpenTelemetry is the emerging standard for instrumenting them.
Semantic conventions for agent workflows:
from opentelemetry import trace

tracer = trace.get_tracer("agent-workflow")

with tracer.start_as_current_span("invoke_agent") as span:
    span.set_attribute("gen_ai.agent.name", "dependency-updater")
    span.set_attribute("gen_ai.agent.id", workflow_run_id)
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.conversation.id", session_id)

    # Agent execution
    result = run_agent(task)

    span.set_attribute("gen_ai.response.model", result.model)
    span.set_attribute("gen_ai.usage.input_tokens", result.input_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", result.output_tokens)

Standardized attributes enable cross-platform analysis. Traces from Claude Code workflows appear alongside Codex workflows in the same dashboard.
Key metrics for workflow health:
| Metric | Purpose | Aggregation |
|---|---|---|
| workflow.duration | End-to-end time | P50, P95, P99 |
| agent.invocations | Agents called per workflow | Sum, average |
| agent.tokens.total | Token consumption | Sum per workflow |
| workflow.success_rate | Completion without error | Percentage |
| escalation.rate | Human intervention required | Percentage |
Track by workflow, by agent type, and by time period. Anomaly detection surfaces degradation before it becomes failure.
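A sketch of the kind of check that turns those metrics into alerts, with a hypothetical threshold and made-up run records:

# workflow_health.py -- turn run records into a health signal.
# The records and the 90% threshold are illustrative, not recommendations.
def health_report(runs: list[dict], min_success_rate: float = 0.9) -> None:
    if not runs:
        return
    success_rate = sum(r["success"] for r in runs) / len(runs)
    escalation_rate = sum(r["escalated"] for r in runs) / len(runs)
    print(f"success: {success_rate:.0%}, escalations: {escalation_rate:.0%}")
    if success_rate < min_success_rate:
        print("ALERT: success rate below threshold; investigate before scaling up")

health_report([
    {"workflow": "dep-update", "success": True, "escalated": False},
    {"workflow": "dep-update", "success": False, "escalated": True},
    {"workflow": "docs-sync", "success": True, "escalated": False},
])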
Workflow maturity
Automation capabilities grow incrementally. Expecting full orchestration on day one sets you up for disappointment.
Level 1: Manual triggers. Single-purpose skills invoked by developers. Human decides when to run. Human reviews all output.
Level 2: Scheduled execution. Skills run on cron schedules. Basic quality gates block bad output. Human reviews flagged issues.
Level 3: Event-driven. Workflows trigger on repository events. Multiple skills compose into pipelines. Human reviews PRs, not individual runs.
Level 4: Self-healing. Workflows detect and recover from failures. Circuit breakers prevent cascading problems (sketched below). Human reviews escalations only.
Level 5: Adaptive orchestration. Orchestrator agents decompose tasks dynamically. Parallel execution scales with workload. Human reviews metrics and adjusts policies.
Most teams plateau at Level 3. Levels 4 and 5 require investment in observability and failure handling that many organizations skip. The right level depends on how much value the automation provides versus operational overhead—Level 3 with solid quality gates beats a fragile Level 5 implementation every time.
Don't skip levels. Each builds capabilities the next requires. Level 4 without Level 2's quality gates creates autonomous systems that produce garbage.
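The circuit breaker mentioned at Level 4 is a small amount of code. A minimal sketch with illustrative thresholds; tune them against real workflow metrics:

# circuit_breaker.py -- minimal sketch of the Level 4 failure guard.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 600):
        self.max_failures = max_failures
        self.cooldown = cooldown_seconds
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at > self.cooldown:
            self.opened_at = None  # half-open: let one attempt through
            self.failures = self.max_failures - 1
            return True
        return False  # open: skip the run and escalate to a human instead

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()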
Making workflows maintainable
Automation that works today breaks tomorrow. Dependencies update. APIs change. Requirements evolve.
Version everything:
Skills, hooks, and workflow definitions belong in version control. Changes go through review. Rollback is a git revert.
Test workflows, not just code:
# Workflow test in CI
- name: Test dependency update workflow
  run: |
    # Simulate an outdated package
    npm install lodash@4.17.0
    # Run workflow logic
    ./scripts/dependency-update.sh
    # Verify expected behavior
    test -f updates.json
    grep -q "lodash" updates.json

Workflow logic is code. Test it like code.
Document decisions:
Why does the security scan run after tests instead of in parallel? Why is the cost limit $5 instead of $10? Future maintainers—including future you—need context.
# Workflow with decision documentation
jobs:
  security:
    # Runs after tests because security scanning on broken code
    # wastes time and produces misleading results.
    # Changed from parallel execution on 2025-11-15 after
    # three incidents of false positives from parse errors.
    needs: test

Comments in workflow files are underutilized. Use them.
What comes next
Skills, hooks, and multi-agent orchestration are tools. This page covered how to combine them. The final module shifts perspective: when should these tools be used at all?
Module 12 addresses judgment—knowing when agentic approaches accelerate work versus when traditional development remains superior. Having powerful automation available doesn't mean using it everywhere.