Site Reliability Engineer - Observability jobs in United States

Rivian and Volkswagen Group Technologies · 1 week ago

Site Reliability Engineer - Observability

Rivian and Volkswagen Group Technologies is a joint venture focused on automotive technology for electric vehicles. The role involves designing, implementing, and scaling observability systems to ensure the health and performance of production environments, collaborating with cross-functional teams for actionable insights into distributed systems.

Automotive, Information Technology, Software
H1B Sponsor Likely

Responsibilities

Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, the LGTM stack, OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting
Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments
Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements
Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity
Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data
Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization
Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle

Qualifications

Observability platforms, Kubernetes, Python, OpenTelemetry, Cloud environments, Telemetry solutions, Data-driven decision-making, Problem-solving, Communication

Required

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability
Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog
Experience with OpenTelemetry and distributed tracing in microservices architectures
Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane
Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions
Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals
Exceptional problem-solving and communication skills, and a data-driven approach to decision-making

Benefits

Robust medical, prescription, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and their children up to age 26
Coverage is effective on the first day of employment
Flex Time Off
Retirement savings plans as well as medical, vision and dental coverage

Company

Rivian and Volkswagen Group Technologies

Rivian and Volkswagen Group Technologies develops scalable automotive software and technology platforms for multiple vehicle segments.

H1B Sponsorship

Rivian and Volkswagen Group Technologies has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced)
Trends of Total Sponsorships: 2025 (9)

Funding

Current Stage
Late Stage

Leadership Team

Kranti Garatkar
Staff Technical Program Manager
Company data provided by Crunchbase