CoreWeave
Staff Production Engineer
CoreWeave is The Essential Cloud for AI™, providing a platform that enables innovators to build and scale AI with confidence. The Staff Production Engineer will design and own foundational platforms that ensure operational excellence, focusing on improving availability, resiliency, and delivery velocity at scale.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning
Responsibilities
Design, build, and own the foundational platforms and frameworks that underpin operational excellence across CoreWeave
Combine deep technical leadership with hands-on engineering to create systems that improve availability, resiliency, and delivery velocity at scale
Develop a deep understanding of CoreWeave’s infrastructure and services, shape architecture and tooling decisions, and partner closely with service owners to operationalize reliability through automation and paved paths rather than manual process or advocacy
Lead technical strategy and execution for internal tooling that reduces manual operations, improves delivery velocity, and supports CoreWeave’s revenue growth through faster, more reliable datacenter delivery
Partner with service owners and platform teams to translate reliability and operational requirements into automation, self-service capabilities, and opinionated paved paths
Build and evolve systems for observability, alerting, automated remediation, resiliency testing, and authoritative sources of truth, operationalizing best practices through tooling rather than manual enforcement
Participate in incident response for critical outages with the explicit goal of improving systems, tooling, and defaults to reduce future operational load, not to serve as a long-term escalation path
Ship production code, participate in on-call rotations as needed, and mentor engineers on platform ownership, operational design, and sustainable production practices
Qualifications
Required
10+ years of experience building and operating distributed systems or cloud platforms at scale
Demonstrated ability to diagnose and resolve complex production failures across services, infrastructure, and automation layers
Strong programming experience (Python, Go, or similar) with a history of shipping and operating production systems
Deep expertise in cloud-native platforms and distributed systems, especially Kubernetes
Advanced experience with observability and incident practices, including metrics, tracing, structured logs, SLIs/SLOs, and post-incident reviews (PIRs)
Proven ability to lead large technical efforts and influence outcomes across teams without direct authority
Track record of delivering durable, platform-driven improvements that reduce operational risk and scale with organizational growth
Preferred
Ownership of foundational internal platforms or frameworks used broadly across an organization
Experience with service tiering, disaster recovery or business continuity planning, chaos engineering, or structured resilience programs
Background operating large-scale AI/cloud infrastructure
Experience guiding organizations through rapid scale while maintaining operational quality and discipline
Benefits
Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid life insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid parental leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day at our office and data center locations
A casual work environment
A work culture focused on innovative disruption
Company
CoreWeave
CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.
Funding
Current Stage: Public Company
Total Funding: $23.37B
Key Investors: Jane Street Capital, Stack Capital, Coatue
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary
Company data provided by Crunchbase.