Mochi Health · 5 hours ago

Senior/Staff Site Reliability Engineer

Mochi Health is on a mission to be the discovery layer of healthcare, building a platform that empowers patients and providers. They are seeking a Senior/Staff Site Reliability Engineer to develop an AI-driven incident management system and improve the reliability posture of their services through innovative technology and operational excellence.

Health Care
H1B Sponsor Likely

Responsibilities

Build an AI-driven SRE platform that ingests telemetry (logs/metrics/traces), deploy events, and incident artifacts to detect anomalies, summarize failures, and propose mitigations
Design a human-in-the-loop learning loop (RLHF-style) so the system gets better with every incident: capturing decisions, outcomes, and postmortems into training/evaluation data
Create safe auto-remediation capabilities: runbook execution, automated rollbacks, and feature-flag actions, all with strong guardrails, auditability, and progressive rollout controls
Build tooling that can propose bug fixes: generate well-scoped PRs, run tests, support canary releases—with clear handoff and approval flows
Define and operationalize SLOs/SLIs and error budgets for critical user journeys (patient onboarding, provider workflows, pharmacy fulfillment, billing, etc.)
Level up observability end-to-end: alert quality, dashboarding, tracing standards, and “unknown unknown” detection
Lead incident response excellence: on-call improvements, incident command, blameless postmortems, and driving systemic fixes that reduce repeat failures
Partner with product + engineering teams to reduce toil and improve reliability via better architecture, load testing, resilience testing, and capacity planning
Establish reliability standards and patterns across the org (golden signals, deployment safety, dependency management, fault isolation)

Qualifications

Site Reliability Engineering, Kubernetes, AWS, Software Engineering, Incident Response, Automation, AI Tooling, Observability, Startup Mindset, Communication Skills, Collaboration Skills

Required

7+ years in SRE / platform / infrastructure engineering, with a track record of owning production reliability at scale
Deep experience operating Kubernetes-based systems in the cloud (AWS preferred), including networking, autoscaling, rollout strategies, and incident mitigation
Strong software engineering ability—you can debug production issues across services, understand failure modes, and contribute code when needed (Python/Go/TypeScript are all great)
Expert-level grasp of observability and incident response: metrics, logs, tracing, alerting design, and postmortem-driven improvements
Comfortable building automation that touches production—and obsessive about safety: least-privilege access, audit logs, approvals, canaries, and rollback
Excited by AI tooling and agentic workflows (or already experienced): LLM-based triage/summarization, retrieval over runbooks/postmortems, evaluation harnesses, and feedback loops
Strong communication and collaboration skills—you can lead during incidents, write clearly, and align teams around reliability priorities
Startup mindset: you move fast, take end-to-end ownership, and love turning ambiguity into shipped systems
Excited to work in-person with our team in San Francisco

Preferred

Experience building LLM-powered internal tools (incident copilots, automated debugging, RAG over docs/runbooks) and/or RLHF-style feedback pipelines
Familiarity with security and compliance in regulated environments (HIPAA, SOC 2, audit requirements, PHI handling)
Experience with chaos engineering / game days and resilience testing programs
Experience building CI/CD guardrails and progressive delivery systems (canaries, automated verification, safe rollout policies)
Prior work on distributed tracing standards (OpenTelemetry), service meshes, or large-scale event-driven systems

Benefits

Daily Meals and Espresso Bar - Breakfast, lunch, and dinner every weekday. Our on-site barista keeps the espresso and matcha flowing all day
Pre-Tax Commuter Perks - Save on transit and parking through pre-tax commuter benefits
Top-of-Market Compensation - We offer competitive salaries along with generous equity packages so you can share in the success you help create
Profitable and Rapid Growth - We’re scaling fast, with financial discipline and long-term vision. No VC constraints, just sustainable momentum and smart decisions
High-Impact Work - Help shape the future of digital healthcare. Your work here directly improves lives and scales nationwide
World-Class Team - Collaborate with teammates from Tesla, SpaceX, Citadel, Harvard, IIT, and more. We value excellence, humility, and empathy in equal measure
Comprehensive Benefits - 401(k) with match, generous time off, life insurance, and high-quality medical, dental, and vision plans
Mochi Health Membership - We cover your monthly subscription fee so you can experience the same care as our patients (medications not included)
Time to Recharge - Enjoy unlimited PTO, generous company holidays, and true flexibility. We trust you to take the time you need to rest, reset, and thrive
Wellness First - From weekly mindfulness sessions to group workouts and fitness perks, your physical and mental health are top priority
Team Socials and Community - We make time to connect through regular socials, happy hours, and spontaneous events. Our stocked kitchen doesn’t hurt either
Downtown SF HQ - Our San Francisco office is just steps from BART, Muni, and great food. It’s designed for deep work and casual collaboration

Company

Mochi Health

Mochi Health is a provider of FDA-approved prescription medications for weight loss and an obesity treatment program.

H1B Sponsorship

Mochi Health has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Chart: Distribution of Different Job Fields Receiving Sponsorship (legend: represents job field similar to this job)
Chart: Trends of Total Sponsorships, 2025 (1)

Funding

Current Stage: Growth Stage
Total Funding: $0.5M
Key Investors: AngelList
Funding Rounds: 2022-03-14, Pre Seed, $0.5M
Company data provided by Crunchbase