Oteemo Inc. · 13 hours ago
Sr. Site Reliability Engineer
Oteemo Inc. is a leading-edge technology consulting firm focused on empowering organizations through cloud-native and enterprise DevSecOps transformations. The Sr. Site Reliability Engineer will provide design and implementation expertise on infrastructure provisioning, management, and lifecycle implementation of cloud components and services, ensuring high availability and security compliance.
Consulting · Information Technology · Software
Responsibilities
Observability & Monitoring: Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications
Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis
Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured
Incident Response & SLOs: Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes
High Availability & Scalability: Optimize the observability stack for high availability in limited connectivity environments, leveraging solutions like Thanos for long-term storage and MinIO for object storage
Security & Compliance: Implement observability best practices in compliance with security frameworks and Kubernetes security tools such as NeuVector
Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators
Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation
Qualifications
Required
Active Secret or Top Secret Clearance
Strong Kubernetes expertise in managing and monitoring clusters at scale
Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir
Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry
Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations
Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring
Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks
Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators
Strong troubleshooting and root cause analysis skills in large-scale distributed systems
Experience working in air-gapped or limited connectivity environments is a plus
Preferred
Experience with NeuVector, Falco, or other Kubernetes security monitoring tools
Knowledge of eBPF-based observability tools such as Cilium Hubble
Experience optimizing observability stacks for performance and cost efficiency
Familiarity with DevSecOps practices and compliance frameworks
Benefits
Competitive pay and benefits
Company
Oteemo Inc.
Oteemo is a technology and business transformation consulting firm that combines deep technical expertise with human-centered design principles to deliver innovative solutions.
Funding
Current Stage
Growth Stage
Recent News
2023-10-30
Company data provided by Crunchbase