Staff Machine Learning Engineer (Infra) jobs in United States

Aarki · 20 hours ago

Staff Machine Learning Engineer (Infra)

Aarki is an AI-driven company specializing in mobile advertising solutions designed to fuel revenue growth. The Staff Machine Learning Engineer (Infra) will design, build, and operate the model training and deployment infrastructure for the Demand-Side Platform, focusing on automation, reproducibility, and reliability.

Advertising · App Marketing · Artificial Intelligence (AI) · Mobile · Mobile Advertising
H1B Sponsor Likely

Responsibilities

Own and evolve shared ML infrastructure for training, deployment, and lifecycle management; deliver measurable gains in reliability, cost, and developer velocity
Lead cross-pod initiatives end-to-end (design → build → production), reducing org bottlenecks and aligning stakeholders on goals and success metrics
Build scalable training and orchestration systems (Prefect-first) for billion-scale datasets with strong failure recovery and backfill support
Build and operate high-throughput, low-latency serving/inference systems for DSP models (bidding, ranking, pacing, fraud), including safe rollouts and performance guardrails
Establish ML observability across the lifecycle: data quality, training stability, drift/anomalies, and regression monitoring with actionable alerting and runbooks
Standardize reproducibility and governance: versioning, lineage/traceability, and experiment tracking (MLflow), with clear production readiness criteria
Drive operational excellence for owned components: on-call ownership, incident response, postmortems, and reliability improvements
Build foundations for feature management (feature pipelines/feature store) and offline/online consistency guarantees

Qualifications

Production ML systems · Python · Spark · Workflow orchestration · DevOps/MLOps practices · High-throughput systems · Experimentation · Reproducibility · C++ · Rust · Probability · Statistics · Ad-tech familiarity · Communication skills

Required

6+ years building and operating production ML systems, including training pipelines and online inference
Strong Python and Spark for large-scale processing (on-prem/YARN environments preferred)
Proven experience with workflow orchestration for ML (Prefect or similar) and production-grade automation
Experience designing and operating serving systems in high-throughput, low-latency environments (REST/gRPC, canary/rollback strategies)
Strong DevOps/MLOps practices: CI/CD, automated testing, infrastructure as code, and reliability engineering
Strong understanding of experimentation and reproducibility: dataset/model versioning, lineage, and traceability; MLflow familiarity preferred
Solid grounding in core ML methods to evaluate and diagnose model/data issues
Strong communication skills across ML and engineering stakeholders

Preferred

Familiarity with systems programming languages such as C++ and Rust
Strong grasp of probability, statistics, and data analysis principles
Ad-tech familiarity: auction dynamics, pacing, fraud signals, creative personalization

Company

Aarki

Aarki is an AI company that offers advertising solutions aimed at enhancing revenue growth for mobile app businesses.

H1B Sponsorship

Aarki has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (8)
2023 (6)

Funding

Current Stage: Growth Stage
Total Funding: $1.74M
Key Investors: Walden Venture Capital
2021-06-02: Acquired
2013-05-02: Convertible Note
2012-03-08: Series A · $1.74M

Leadership Team

Aman Sareen
Chief Executive Officer
Company data provided by Crunchbase