Oteemo Inc. · 1 day ago

Sr. Site Reliability Engineer

San Diego, CA

Full-time

Onsite

Senior Level

Oteemo Inc. is a leading-edge technology consulting firm focused on empowering organizations through cloud-native and enterprise DevSecOps transformations. The Sr. Site Reliability Engineer will provide design and implementation expertise on infrastructure provisioning, management, and lifecycle implementation of cloud components and services, ensuring high availability and security compliance.

Consulting · Information Technology · Software

No H1B

Security Clearance Required

U.S. Citizen Only

Responsibilities

Observability & Monitoring: Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications

Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis

Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured

Incident Response & SLOs: Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes

High Availability & Scalability: Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions like Thanos for long-term storage and MinIO for object storage

Security & Compliance: Implement observability best practices in compliance with security frameworks and Kubernetes security tools such as NeuVector

Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators

Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation

Qualifications

Kubernetes expertise · Observability stacks · Infrastructure as Code · Scripting & Automation · Incident management · Security monitoring · Technical skills · Business acumen · Communication skills · Customer focus

Required

Active Secret or Top Secret Clearance

Strong Kubernetes expertise in managing and monitoring clusters at scale

Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir

Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry

Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerting, and PagerDuty/Slack integrations

Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring

Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks

Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators

Strong troubleshooting and root cause analysis skills in large-scale distributed systems

Experience working in air-gapped or limited-connectivity environments is a plus

Preferred

Experience with NeuVector, Falco, or other Kubernetes security monitoring tools

Knowledge of eBPF-based observability tools such as Cilium Hubble

Experience optimizing observability stacks for performance and cost efficiency

Familiarity with DevSecOps practices and compliance frameworks

Benefits

Competitive pay and benefits

Company

Oteemo Inc.

Oteemo is a technology and business transformation consulting firm that combines deep technical expertise with human-centered design principles to deliver innovative solutions.

Founded in 2014