DatologyAI · 1 day ago
Software Engineer, Cloud Infrastructure
DatologyAI is a pioneering company focused on optimizing data curation for machine learning models. They are seeking an experienced Cloud Infrastructure Engineer to design, build, and operate secure and scalable cloud infrastructure, collaborating with various teams to support training and inference pipelines.
Artificial Intelligence (AI), Data Center, Data Integration, Database, Information Technology
Responsibilities
Architect and maintain our multi-cloud infrastructure (primarily AWS, potentially Azure/GCP), with a focus on reliability, security, and scalability
Define and implement infrastructure-as-code best practices using Terraform, CloudFormation, Pulumi (and similar technologies)
Design and manage Kubernetes-based systems for model training, inference, and data processing workloads
Optimize our CI/CD pipelines and streamline deployment of services across environments
Build monitoring, alerting, and logging systems to ensure high system availability and observability
Collaborate with research and engineering teams to provide infrastructure support for training large-scale ML models
Ensure our infrastructure supports various deployment models (cloud, on-prem, hybrid) for enterprise use cases
Drive cost-efficiency strategies across compute and storage resources
Respond to and resolve infrastructure-related incidents with a sense of ownership and urgency
Qualifications
Required
You've led or helped build robust infrastructure systems at a startup or fast-moving engineering organization
Deep experience working with cloud providers (especially AWS), and ideally exposure to multi-cloud or hybrid-cloud setups
Strong with Kubernetes, Terraform, and containerized architectures
Confident with systems-level debugging: networking issues, memory leaks, resource bottlenecks, etc.
Comfortable writing clean, maintainable scripts in Bash, Python, or Go
You care deeply about building secure and scalable systems and take pride in reliable infrastructure
You're collaborative, humble, and ready to own high-impact projects end-to-end
Preferred
Experience supporting infrastructure for ML workloads (training pipelines, inference clusters, GPU orchestration)
Built or scaled infrastructure for teams working with large-scale datasets
Exposure to cost monitoring and optimization tools in cloud environments
Background supporting compliance and security in enterprise deployments
Benefits
100% covered health benefits (medical, vision, and dental).
401(k) plan with a generous 4% company match.
Unlimited PTO policy
Annual $2,000 wellness stipend.
Annual $1,000 learning and development stipend.
Daily lunches and snacks are provided in our office!
Relocation assistance for employees moving to the Bay Area.
Company
DatologyAI
DatologyAI is an AI data curation startup that develops deep learning tools for automatically selecting training data.
H1B Sponsorship
DatologyAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (2)
Funding
Current Stage: Early Stage
Total Funding: $57.65M
Key Investors: Felicis, Amplify Partners
2024-05-08 · Series A · $46M
2024-02-22 · Seed · $11.65M
Company data provided by Crunchbase