Forward Deployed Engineer

San Francisco
Deployment
Full-time

The Mission

The best tech companies in the world have a dirty secret: they're slowing down. The same startups that once shipped features daily now take weeks to push a single change. Their velocity has been crushed under the weight of success: millions of users, thousands of microservices, and mountains of technical debt.

In San Francisco, you won't be fighting mainframes. You'll be fighting complexity at scale. Your clients are late-stage unicorns and big tech companies who've lost their startup speed and desperately want it back. Your job is to install the "AI-Native Operating System" that transforms how they build software.

You'll work directly with legendary CTOs and engineering leaders who built the products you use every day. You'll implement agentic CI/CD pipelines that catch bugs before humans do. You'll build self-healing infrastructure that fixes itself at 3am so on-call engineers can sleep. You'll show teams shipping to millions of users how to move fast without breaking things.

This is the frontier of developer productivity. The patterns you establish here will define how the next generation of software gets built.

Responsibilities

  • Implement 'Eval-Driven Development' workflows for engineering teams shipping to millions of users—replacing flaky tests with intelligent agent-powered verification.
  • Build custom Model Context Protocol (MCP) servers that connect Retrain agents to proprietary internal tools: deployment systems, feature flags, observability platforms, and incident response.
  • Optimize agent performance for high-throughput production environments: reduce latency, minimize token costs, and ensure reliability at scale.
  • Design and deploy 'Self-Healing' infrastructure patterns where agents automatically detect, diagnose, and fix production issues before they page humans.
  • Work directly with CTOs and VPs of Engineering to redefine their engineering culture: new hiring criteria, updated performance metrics, and AI-augmented development workflows.
  • Lead architecture reviews and help teams identify the highest-leverage opportunities for agent automation in their specific codebase and workflow.
  • Build demo environments and proof-of-concepts that showcase the art of the possible—turning skeptics into champions.
  • Contribute to Retrain's internal tooling: if you find yourself doing something manually twice, automate it and share it with the team.
  • Develop reference implementations and playbooks that accelerate future deployments and scale our collective knowledge.
  • Represent Retrain at Bay Area tech events, meetups, and conferences—building our brand in the developer community.

Requirements

  • 5+ years of experience in Infrastructure, DevOps, Platform Engineering, or Site Reliability Engineering.
  • Expert-level knowledge of cloud-native technologies: Kubernetes, Docker, Terraform, and at least one major cloud provider (AWS/GCP/Azure).
  • Strong programming skills in Go, Rust, or Python, including experience writing performance-critical code.
  • Experience with modern web stacks (Next.js, React, TypeScript): you should be able to work across the full stack when needed.
  • Deep understanding of CI/CD pipelines, deployment strategies, and release engineering at scale.
  • Track record of developer tooling contributions: you've built internal tools, CLIs, or automation that made your team significantly faster.
  • A history of 'automating yourself out of a job'—you find manual processes offensive and compulsively eliminate them.
  • Strong opinions about developer experience, loosely held—you know what great DX looks like but you're open to learning.
  • Passion for the AI/LLM space: you've built with foundation models and understand their capabilities and limitations.
  • Based in the San Francisco Bay Area or willing to relocate.
