Founding Site Reliability Engineer jobs in United States

Relevance AI · 4 months ago

Founding Site Reliability Engineer

Relevance AI is a fast-growing company focused on AI automation, enabling teams to create intelligent AI agents. They are seeking a Founding Site Reliability Engineer to establish and scale the SRE discipline, ensuring the reliability, scalability, and security of their platform as it powers multi-agent workloads globally.

Agentic AI · Analytics · Artificial Intelligence (AI) · Generative AI · Machine Learning · Software

Responsibilities

Own the SRE function, establishing best practices, tooling, and culture
Tackle reliability challenges unique to multi-agent orchestration at enterprise scale
Guarantee >99.9% uptime of production systems, ensuring reliability at global scale
Architect and automate AWS infrastructure with Terraform and CI/CD pipelines
Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging)
Drive down incidents and MTTR through runbooks, alerting, and incident response excellence
Help scale infra to support hundreds of thousands of agents and billions of API calls
Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy
Act as a founding voice in our SF office, influencing product direction and engineering culture

Qualifications

AWS expertise · Infrastructure as Code · Observability stacks · Incident management · Multi-agent workloads · Soft skills

Required

5+ years in SRE/DevOps/Infrastructure roles, with experience in enterprise SaaS environments
Deep AWS expertise (EC2, ECS/EKS, Lambda, RDS, VPC, IAM)
Proven track record with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation)
Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog)
Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements

Preferred

Prior exposure to AI/ML platforms, data-heavy systems, or multi-agent workloads

Company

Relevance AI

Relevance AI provides an AI agent operating system that helps companies automate repetitive reasoning tasks.

Funding

Current Stage
Growth Stage
Total Funding
$42M
Key Investors
Bessemer Venture Partners · King River Capital · Insight Partners
2025-05-06 · Series B · $24M
2023-12-12 · Series A · $15M
2021-12-12 · Seed · $3M

Leadership Team

Jacky Koh
Founder & co-CEO
Daniel Vassilev
Co-Founder
Company data provided by Crunchbase