InStride · 1 month ago
Principal Site Reliability Engineer (SRE)
InStride is a public benefit corporation focused on empowering employees through education. They are seeking a Principal Site Reliability Engineer to enhance their cloud architecture, automation, and reliability engineering efforts, ensuring operational excellence across their platform.
Continuing Education · E-Learning · EdTech · Education · Higher Education
Responsibilities
Design and operate multi-region, fault-tolerant systems that ensure InStride’s learning platform is always available for learners and partners
Deliver Infrastructure as Code libraries, CI/CD pipelines, and self-service capabilities that reduce operational toil and accelerate developer productivity
Implement defense-in-depth strategies, policy-as-code guardrails, and proactive monitoring to protect sensitive data and maintain trust
Define and enforce SLIs/SLOs, establish error-budget policies, and build monitoring frameworks that inform release readiness and operational decisions
Deploy and manage service mesh solutions that secure, monitor, and optimize service-to-service communication across Kubernetes workloads
Partner with engineering and security stakeholders to shape InStride’s AWS strategy, ensuring scalability, resilience, and cost efficiency
Share expertise, lead design reviews, and guide teams toward modern DevOps and SRE practices, raising the technical bar across the organization
Qualifications
Required
10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads
Hands-on expertise with AWS EKS, Kubernetes networking, Helm, autoscaling frameworks (Karpenter/Cluster Autoscaler), serverless architectures, and API Gateways
Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh) for secure and observable service-to-service communication
Proficiency with Infrastructure as Code (IaC) using AWS CDK (TypeScript preferred, or Python), Terraform, or CloudFormation
Strong programming and automation skills in Go, Python, or TypeScript, with additional proficiency in Bash
Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines
Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with monitoring and alerting stacks (Prometheus, Grafana, CloudWatch, Groundcover)
Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement
Excellent communication skills with the ability to translate reliability metrics into business impact and guide incident/post-mortem discussions
Experience mentoring engineers and influencing enterprise AWS and DevOps strategies without direct management responsibilities
Preferred
Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus
Benefits
401(k) plan with company match
Flexible vacation policy
Paid family leave
Best-in-class health care benefits
And more!
Company
InStride
The premier global provider of strategic enterprise education programs.
H1B Sponsorship
InStride has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (5)
2024 (3)
2023 (4)
2022 (1)
2020 (2)
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase