
Oteemo Inc. · 1 day ago

Sr. Site Reliability Engineer

Oteemo Inc. is a leading-edge technology consulting firm focused on empowering organizations through cloud-native and enterprise DevSecOps transformations. The Sr. Site Reliability Engineer will provide design and implementation expertise on infrastructure provisioning, management, and lifecycle implementation of cloud components and services, ensuring high availability and security compliance.

Consulting, Information Technology, Software
Comp. & Benefits
No H1B, Security Clearance Required, U.S. Citizen Only

Responsibilities

Observability & Monitoring: Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications
Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis
Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured
Incident Response & SLOs: Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes
High Availability & Scalability: Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions like Thanos for long-term storage and MinIO for object storage
Security & Compliance: Implement observability best practices in compliance with security frameworks, and integrate with Kubernetes security tools such as NeuVector
Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators
Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation

Qualifications

Kubernetes expertise, Observability stacks, Infrastructure as Code, Scripting & Automation, Incident management, Security monitoring, Technical skills, Business acumen, Communication skills, Customer focus

Required

Active Secret or Top Secret Clearance
Strong Kubernetes expertise in managing and monitoring clusters at scale
Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir
Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry
Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations
Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring
Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks
Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators
Strong troubleshooting and root cause analysis skills in large-scale distributed systems
Experience working in air-gapped or limited-connectivity environments is a plus

Preferred

Experience with NeuVector, Falco, or other Kubernetes security monitoring tools
Knowledge of eBPF-based observability tools such as Cilium Hubble
Experience optimizing observability stacks for performance and cost efficiency
Familiarity with DevSecOps practices and compliance frameworks

Benefits

Competitive pay and benefits

Company

Oteemo Inc.

Oteemo is a technology and business transformation consulting firm that combines deep technical expertise with human-centered design principles to deliver innovative solutions.

Funding

Current Stage
Growth Stage

Leadership Team

Raja Gudepu
Founder & CEO
Company data provided by Crunchbase