Staff Site Reliability Engineer jobs in United States

Kharon · 6 hours ago

Staff Site Reliability Engineer

Kharon is a highly disruptive organization that navigates risk at the intersection of global security threats and international commerce. They are seeking a Staff Site Reliability Engineer to build resilient and scalable systems, champion best practices in observability and automation, and collaborate across teams to ensure the reliability of critical insights for their clients.

Analytics · Business Intelligence · Compliance · Risk Management
H1B Sponsor Likely

Responsibilities

Stand up and standardize metrics, logging, tracing, and alert hygiene; introduce golden dashboards and alert runbooks
Coach engineers on reliability practices, including leading incident response (MTTA/MTTR), running blameless postmortems, and conducting reliability reviews
Plan capacity, conduct load/perf tests, and drive performance tuning and cost–reliability tradeoffs
Collaborate with DevOps on Kubernetes/cloud/IaC standards, including creating paved roads and production-readiness checklists for app teams
Work cross-functionally on resilient CI/CD (pre-deployment checks, canary/blue-green, automated rollbacks)
Align with security on least privilege, secrets management, and audit-ready operational practices
Define RTO/RPO, backups, and failover drills; document and test recovery playbooks
Identify repetitive work and opportunities to automate it (scripts, jobs, runbooks, self-service tooling)
Help shape on-call rotations, escalation policies, and handbooks, ultimately improving signal-to-noise and engineer well-being
Assist in defining SLIs/SLOs and error budgets with product/engineering, creating visibility into availability, latency, and quality

Qualifications

Site Reliability Engineering · Cloud Computing · Kubernetes · Infrastructure as Code · Incident Management · Networking Fundamentals · Software Development · Communication Skills · Documentation Skills · Cross-team Collaboration

Required

Bachelor's Degree in Computer Science, Engineering, or a related field
10-12+ years of experience in software engineering or DevOps, with at least 5 years in a site reliability engineering (SRE) or reliability-focused role
Strong networking fundamentals, including DNS, Kubernetes routing, load balancing, WAF, multi-VPC routing in AWS, and Traefik
Solid software fundamentals (one or more of: Python, Java, JavaScript, Go, Scala, or similar) and ability to read/modify production services
Deep experience in a major cloud (AWS/GCP/Azure) and container orchestration (Kubernetes)
Proficiency with IaC (Terraform or equivalent), CI/CD systems, and git-based workflows
Hands-on experience with metrics/logging/tracing systems and alerting best practices
Proven incident commander experience and skillful facilitation of blameless postmortems
Solid grasp of networking, HTTP, load balancing, caching, and data stores (SQL/NoSQL/queues)
Excellent communication, documentation, and cross-team influence

Benefits

Fully sponsored medical, dental, and vision
FSA program for both medical and dependent care
401k + Roth with matching and immediate vesting
Paid time off + 11 paid holidays

Company

Kharon

Network intelligence at the nexus of global security + international commerce

H1B Sponsorship

Kharon has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data Powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
[Chart: Trends of Total Sponsorships: 2021 (1)]

Funding

Current Stage: Growth Stage

Leadership Team

Joshua Shrager
Executive Vice President
Company data provided by Crunchbase