
Physical Intelligence · 2 weeks ago

ML Infra Engineer (Data Systems)

Physical Intelligence is focused on building large-scale robot learning infrastructure. As an ML Infra Engineer (Data Systems), you will design and operate the data infrastructure that supports efficient data processing and machine learning training.

Artificial Intelligence (AI) · Machine Learning · Robotics
H1B Sponsor Likely

Responsibilities

Design and build high-throughput pipelines that validate, transform, and featurize raw multimodal data
Operate large-scale batch and streaming workflows over massive datasets
Design object storage layouts, metadata systems, and efficient access patterns; choose file formats with performance and scalability in mind
Build systems for backfills, dataset rebuilds, garbage collection, and large-scale transformations
Optimize dataloaders, sharding, prefetching, caching, and throughput to reduce the time from data arrival to model training
Build scalable metadata stores for datasets, annotations, and training artifacts
Move hundreds of terabytes to petabytes efficiently across clusters and environments
Implement observability, validation, and guardrails to prevent silent data regressions
Work closely with cross-functional teams of researchers, engineers and roboticists to translate evolving data needs into robust systems

Qualifications

Distributed systems · Large-scale data pipelines · Data ingestion & processing · Batch processing systems · Streaming processing systems · Object storage systems · Performance optimization · Software engineering fundamentals · Cross-functional collaboration · Ownership mindset

Required

Strong software engineering fundamentals
Experience building distributed systems or large-scale data pipelines
Comfort reasoning about performance, memory, I/O, and storage efficiency
Familiarity with batch and/or streaming processing systems
Experience with object storage systems and data format tradeoffs
Ownership mindset: design, build, operate, and iterate on systems end-to-end
Enjoy working closely with researchers and unblocking fast-moving projects

Preferred

Experience with large ML training pipelines or dataloading systems
Knowledge of columnar or custom data formats
Experience with systems like ClickHouse, Ray, Flink, Spark, or similar
Hands-on experience operating petabyte-scale datasets
Experience debugging and fixing performance bottlenecks in data-heavy systems

Company

Physical Intelligence

Physical Intelligence is an AI company developing machine learning for robots and other physical devices.

H1B Sponsorship

Physical Intelligence has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (1)

Funding

Current Stage: Growth Stage
Total Funding: $1.07B
Key Investors: CapitalG, Jeff Bezos, Lux Capital, Thrive Capital
2025-11-20 · Series B · $600M
2024-11-04 · Series A · $400M
2024-03-12 · Seed · $70M

Leadership Team

Lachy Groom
Co-Founder
Company data provided by Crunchbase