Varda Space Industries · 6 hours ago
Principal Site Reliability Engineer
Varda is a company focused on developing commercial space infrastructure, aiming to create products and systems that benefit life on Earth. As a Principal Site Reliability Engineer, you will set the technical vision for reliability across various platforms and mentor senior engineers to ensure mission readiness and operational excellence.
Aerospace · Manufacturing · Product Design
Responsibilities
Lead and contribute hands-on to the deployment, maintenance, and operations of mission-critical applications and infrastructure supporting spacecraft, ground systems, and company-wide platforms
Design, execute, and manage highly scalable, reliable, and operable software and infrastructure platforms, applying Infrastructure as Code (IaC) principles to drive automation, consistency, and repeatability across Kubernetes environments
Collaborate closely with software and hardware teams to align reliability best practices, CI/CD pipelines, and compliance with their workflows, enabling faster, more secure deployments for mission-critical systems
Anticipate and address reliability risks, capacity challenges, and performance bottlenecks; develop long-term strategies in partnership with leadership
Rotate through the team’s on-call schedule to keep critical systems healthy and responsive
Occasionally travel to customer sites and other Varda locations to troubleshoot, deploy, or test critical infrastructure
Qualifications
Required
10+ years of experience in SRE, DevOps, or systems engineering, including leadership of large-scale, mission-critical systems
Experience leading technical direction and architecture for large-scale systems
Hands-on experience with observability stacks and telemetry pipelines—including metrics collection, alerting, and dashboards—for Linux systems and Kubernetes workloads (e.g., Prometheus and Grafana)
Strong background in systems architecture and software-defined networking (VPC, subnets, firewalls, VPNs, etc.)
Proficiency in automation and scripting with Python, Bash, or similar languages
Strong, positive communication skills, both written and oral
Preferred
Expertise in time-series databases (e.g., InfluxDB) for large-scale telemetry pipelines
Expertise in provisioning and managing scalable Azure cloud infrastructure using native tools and best practices (Azure GCC High preferred)
Experience with IaC tools such as Terraform and Ansible, and with CI/CD systems such as Git and ArgoCD
Experience building and maintaining dynamic system configurations with templating frameworks such as YAML and Helm
Strong understanding of Linux systems, containerization technologies, and Kubernetes internals
Benefits
Equity in a fully funded space startup with potential for significant growth (interns excluded)
401(k) matching (interns excluded)
Unlimited PTO (interns excluded)
Health insurance, including Vision and Dental
Lunch and snacks provided on site every day. Dinners provided twice a week.
Maternity / Paternity leave (interns excluded)
Company
Varda Space Industries
Low Earth orbit is open for business.
Funding
Current Stage: Growth Stage
Total Funding: $328.02M
Key Investors: Alumni Ventures, Caffeinated Capital, Khosla Ventures
2025-07-10 · Series C · $187M
2024-12-19 · Series C
2024-04-05 · Series B · $90M
Company data provided by crunchbase