Shopify · 5 months ago
Machine Learning Infrastructure Engineers
Shopify empowers entrepreneurs and enterprises to reach their potential. The Machine Learning Infrastructure Engineer will build and operate the platform that powers AI, focusing on high-performance systems and improving the developer experience for ML teams.
E-Commerce · E-Commerce Platforms · Enterprise Software · SaaS
Responsibilities
Build and operate ML control planes, APIs, CLIs, and self-serve golden paths
Design and optimize multi-tenant GPU Kubernetes clusters, including autoscaling, scheduling, packing, and utilization
Own model lifecycle: training orchestration/experiments, registries/versioning, CI/CD, canary/blue-green, and safe rollback
Build real-time serving stacks (KServe/Seldon/TensorFlow Serving) and end-to-end pipelines for batch and streaming
Design feature platforms and engineer storage/data movement for datasets, features, and artifacts tuned for cost/performance
Implement observability and SLOs across pipelines, training, and inference; automate remediation and capacity planning
Partner with ML, data, and product teams to unblock delivery and accelerate idea-to-impact
Qualifications
Required
Proven platform/infrastructure engineering experience with a track record of shipping production systems and code
Deep Kubernetes/containerization expertise for ML workloads (operators, Helm, service mesh/gRPC) and multi-tenant clusters
Hands-on experience running GPU infrastructure at scale (NVIDIA ecosystem; scheduling/packing/optimization)
Strong distributed systems and API/service design fundamentals; experience with high-scale inference
Proficiency with infrastructure-as-code and automation (Terraform, Helm, GitOps) on major clouds (GCP/AWS/Azure)
Observability expertise (Prometheus/Grafana) and SLO-driven operations for ML systems
Proficient in Python/Go/Java; experience building developer tooling and self-service platforms
Preferred
Model serving and lifecycle tooling: KServe/Seldon/TensorFlow Serving, Kubeflow, MLflow/W&B, model registries, DVC
Feature store experience (Feast/Tecton) with online/offline parity and SLAs
Data infrastructure familiarity (Kafka, Spark/Flink) and stateful stores (Redis/MySQL); CI/CD for online/batch inference
Model performance optimization (batching, caching, quantization, distillation) and hardware-aware tuning
Experience with experimentation/A/B testing platforms and online evaluation frameworks
Company
Shopify
Shopify is a cloud-based, multi-channel commerce platform designed for small and medium-sized businesses.
H1B Sponsorship
Shopify has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (57)
2024 (37)
2023 (9)
2022 (84)
2021 (41)
2020 (13)
Funding
Current Stage: Public Company
Total Funding: $122.25M
Key Investors: Bessemer Venture Partners, Klister Credit
2015-05-21: IPO
2013-12-11: Series C · $100M
2011-10-17: Series B · $15M
Company data provided by Crunchbase