
CoreWeave · 1 month ago

Director of Engineering, Inference Services

CoreWeave is The Essential Cloud for AI™, delivering a platform that enables innovators to build and scale AI with confidence. The Director of Engineering will lead the development of the next-generation Inference Platform, focusing on optimizing GPU inference services and enhancing the overall developer experience.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning

No H1B · U.S. Citizen Only

Responsibilities

Define and continuously refine the end-to-end Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and world-class developer UX
Set technical standards for runtime selection, GPU/CPU heterogeneity, quantization, and model-optimization techniques
Design and implement a global, Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale
Build adaptive micro-batching, request-routing, and autoscaling mechanisms that maximize GPU utilization while meeting strict SLAs
Integrate model-optimization pipelines (TensorRT, ONNX Runtime, BetterTransformer, AWQ, etc.) for frictionless deployment
Implement state-of-the-art runtime optimizations, including speculative decoding, KV-cache reuse across batches, early-exit heuristics, and tensor-parallel streaming, to squeeze every microsecond out of LLM inference while retaining accuracy
Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms for thousands of models across multiple regions
Drive cost-performance trade-off tooling that makes it trivial for customers to choose the best HW tier for each workload
Hire, mentor, and grow a diverse team of engineers and managers passionate about large-scale AI inference
Foster a customer-obsessed, metrics-driven engineering culture with crisp design reviews and blameless post-mortems
Partner closely with Product, Orchestration, Networking, and Security teams to deliver a unified CoreWeave experience
Engage directly with flagship customers (internal and external) to gather feedback and shape the roadmap

Qualifications

GPU inference services · Kubernetes · Model optimization · Large-scale distributed systems · Real-time data-plane services · Cost-performance optimization · CI/CD for ML workloads · Communication · Team leadership · Collaboration

Required

10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams
Proven success delivering mission-critical model-serving or real-time data-plane services (e.g., Triton, TorchServe, vLLM, Ray Serve, SageMaker Inference, GCP Vertex Prediction)
Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking (gRPC, QUIC, RDMA)
Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies
Expertise in Kubernetes, service meshes, and CI/CD for ML workloads; familiarity with Slurm, Kueue, or other schedulers a plus
Hands-on experience with LLM optimization (quantization, compilation, tensor parallelism, speculative decoding) and hardware-aware model compression
Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences
Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience)

Preferred

Experience operating multi-region inference fleets at a cloud provider or hyperscaler
Contributions to open-source inference or MLOps projects
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) for AI workloads
Background in edge inference, streaming inference, or real-time personalization systems

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage
Public Company
Total Funding
$26.87B
Key Investors
NVIDIA, Goldman Sachs, JP Morgan Chase, Morgan Stanley, MUFG Union Bank, Jane Street Capital
2026-01-26 · Post-IPO Equity · $2B
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $2.5B

Leadership Team

Michael Intrator
Chief Executive Officer
Brannin McBee
Founder & CDO
Company data provided by Crunchbase