Sr. MLOps Engineer – ML Platform jobs in United States

BrightAI · 1 week ago

Sr. MLOps Engineer – ML Platform

Bright.AI is a high-growth Physical AI company transforming how infrastructure businesses interact with the physical world through intelligent automation. The company is seeking a Senior MLOps Engineer to lead the build-out of its cloud-native ML developer platform and production pipelines, with a focus on designing scalable data/model workflows and CI/CD for ML. The role is pivotal in enabling teams to move quickly from notebooks to secure, reliable, and cost-efficient production services.

Artificial Intelligence (AI) · Cloud Computing · Internet of Things · Mobile

Responsibilities

Design, build, and operate our ML/AI development platform on AWS—including Amazon SageMaker AI (Studio/Notebooks, Training/Processing/Batch Transform, Real‑Time & Async Inference, Pipelines, Feature Store) and supporting services
Establish golden‑path project templates, base Docker images, and internal Python libraries to standardize experiments, data processing, training, and deployment workflows
Implement Infrastructure‑as‑Code (e.g., Terraform) and workflow orchestration (Step Functions, Airflow); optionally support EKS for training/inference
Build automated data pipelines with S3, Glue, EMR/Spark (PySpark), Athena/Redshift; add data quality (Great Expectations/Deequ) and lineage
Stand up experiment tracking and a model registry (SageMaker Experiments & Model Registry or MLflow); enforce versioning for data, code, and models
Implement CI/CD for ML (CodeBuild/CodePipeline or GitHub Actions): unit/integration tests, data contracts, model tests, canary/shadow deployments, and safe rollback
Ship real‑time endpoints (SageMaker endpoints, or FastAPI on Lambda/ECS/EKS) and batch jobs; set SLOs and autoscaling, and optimize for cost/performance
Build monitoring & observability for production models and services (drift, performance, bias with SageMaker Model Monitor; service telemetry with CloudWatch/Prometheus/Grafana)
Enforce security & governance: least‑privilege IAM, VPC isolation/PrivateLink, encryption, secret management
Partner with backend engineers to productionize notebooks and prototypes
Help integrate GenAI/Bedrock services where appropriate; support RAG pipelines with vector stores (OpenSearch) and evaluation harnesses

Qualifications

MLOps · AWS · Python · CI/CD for ML · Docker · Terraform · Data engineering · Experiment tracking · Monitoring & observability · Soft skills

Required

B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or related field; advanced degree a plus
5+ years in software/ML engineering, including 2+ years in MLOps or a similar role
Strong programming skills, including proficiency in Python; fluent with Docker and with Terraform or AWS CDK
Hands-on with AWS: SageMaker, S3, IAM, CloudWatch, ECR, and ECS/EKS/Lambda
Built and operated CI/CD for ML (tests for code/data/models; automated deploys) and shipped real-time & batch ML workloads to production
Experience with experiment tracking & model registry (e.g., SageMaker Experiments/Model Registry or MLflow) and data versioning
Implemented monitoring & quality checks (SageMaker Model Monitor, EvidentlyAI, Great Expectations/Deequ) and created on-call procedures and runbooks for model & service incidents
Solid grasp of security & compliance in cloud ML (IAM policy design, VPC/private networking, KMS encryption, secrets management, audit logging)

Preferred

Distributed training at scale (SageMaker Training, PyTorch DDP, Hugging Face on SageMaker)
Data engineering at scale (e.g., Spark/EMR, Glue, Redshift)
Observability stacks (e.g., Grafana), performance tuning, and capacity planning for ML services
LLMOps/RAG (Bedrock, vector databases, evals) as optional capabilities
Prior startup experience building ML platforms and products from the ground up

Company

BrightAI

BrightAI provides physical AI solutions for infrastructure and services.

Funding

Current Stage: Growth Stage
Total Funding: $66M
Key Investors: Upfront Ventures
2025-07-18 · Series A · $51M
2024-11-19 · Seed · $15M

Leadership Team

Alex Hawkinson
Founder & CEO
Company data provided by Crunchbase