xAI · 8 hours ago
Site Reliability Engineer - US Government
xAI is a company focused on creating AI systems that understand the universe and aid humanity. They are seeking a Senior Infrastructure Engineer to design, build, and operate secure, scalable infrastructure for critical government projects, ensuring compliance with federal regulations.
Artificial Intelligence (AI) · Generative AI · Information Technology · Machine Learning
Responsibilities
Develop and optimize software to provision and manage xAI’s infrastructure across on-premise, virtual machine, and classified cloud environments, enabling efficient scaling for US government initiatives
Enhance the reliability, performance, and cost-effectiveness of infrastructure to support large-scale AI and application workloads in secure, classified settings
Collaborate with xAI engineers to understand workload requirements and design tailored solutions that meet government-specific needs and compliance standards
Implement robust observability, monitoring, and security practices to ensure the integrity, availability, and confidentiality of critical systems, adhering to federal protocols
Manage storage infrastructure using Infrastructure-as-Code (IaC) tools such as Pulumi, Terraform, or Ansible, with a focus on secure data handling
Drive system reliability through incident management, postmortems, and the definition of clear SLAs and SLOs, while maintaining security and compliance
Qualifications
Required
Active Top Secret (TS) security clearance
5+ years of experience as an Infrastructure Engineer, Site Reliability Engineer, or similar role, with a focus on building and maintaining reliable, scalable systems, preferably in secure or government environments
Proficiency in managing storage infrastructure with IaC tools such as Pulumi, Terraform, or Ansible
Deep understanding of the Kubernetes stack, including CNI, CRI, CSI, and related components
Demonstrated ability to improve system reliability through incident management, postmortems, and defining SLAs/SLOs
Excellent communication and documentation skills, with the ability to handle sensitive information concisely and accurately
Preferred
Deep familiarity with installing and using GPU hardware, including setting up drivers, debugging issues, and ensuring reliability
Experience with high-traffic web or mobile application workloads, including optimizing Kubernetes for large-scale deployments in classified or federal settings
Familiarity with chaos engineering, capacity planning, or similar practices for ensuring system resilience in government projects
Proficiency with tools such as Kyverno or ArgoCD, or with Go programming, for infrastructure automation
Strong sense of ownership, curiosity, and enthusiasm for tackling complex technical challenges in secure environments
Passion for problem-solving and a proactive drive to deliver impactful results while adhering to security protocols
Certifications in security-related fields (e.g., CISSP) or experience in secure federal environments
Benefits
Equity
Comprehensive medical, vision, and dental coverage
Access to a 401(k) retirement plan
Short- and long-term disability insurance
Life insurance
Various other discounts and perks
Company
xAI
xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities.
Funding
Current Stage: Growth Stage
Total Funding: $22.73B
Key Investors: Neptune Digital Assets, SpaceX, Morgan Stanley
2025-12-11 · Secondary Market · $0.3M
2025-07-13 · Corporate Round · $5.32B
2025-07-01 · Debt Financing · $5B
Company data provided by Crunchbase