Swoon · 5 hours ago
Site Reliability Engineer
Swoon is actively seeking a Product Platform Site Reliability Engineer to join the team. The role involves designing, building, and operating multi-cloud platform infrastructure to support healthcare applications while ensuring reliability, scalability, and security.
Responsibilities
Design, build, and operate shared multi-cloud platform infrastructure (AWS, GCP, Azure) to support secure, scalable, and highly available healthcare applications
Develop and manage Kubernetes-based platform services, including multi-cluster environments and service mesh (Istio), to ensure resilient application delivery
Implement and maintain Infrastructure-as-Code and automation frameworks (Terraform, Helm, CloudFormation, Ansible) to standardize and streamline platform environments
Build and operate CI/CD and GitOps pipelines (Bitbucket Pipelines, ArgoCD) to enable reliable, repeatable, and zero-downtime application deployments
Architect and maintain high-availability, disaster recovery, and cross-region deployment solutions for mission-critical services
Establish and manage platform-wide monitoring, observability, and alerting systems (Prometheus, Grafana, OpenTelemetry) to proactively ensure reliability and performance
Enforce security, compliance, and cost-optimization practices (HIPAA, SOC 2, ISO 27001, FinOps) while reducing operational toil through automation and continuous improvement
Qualifications
Required
Bachelor's degree in Computer Science or a related field
4+ years of hands-on experience operating production-grade cloud and platform infrastructure
Demonstrated expertise in Kubernetes, cloud platforms, and Infrastructure-as-Code, including CI/CD, GitOps, and automated environment management
Strong background in monitoring, observability, and reliability engineering, with experience supporting highly available, distributed systems
Proven ability to diagnose, troubleshoot, and resolve complex platform and infrastructure issues, including participation in on-call rotations and incident response
Proficiency in at least one scripting or programming language (Python, Go, or Bash) for automation, tooling, and operational support
Must be a US Citizen or Permanent Resident; due to the security requirements of this role, no other work authorizations can be accepted
Company
Swoon
In 2010, Swoon launched an agile, client-focused team that is not only savvy in our core industries but also elbow-deep, every day, in getting to know the strongest talent in the technology and professional fields.