Site Reliability Engineer (SRE) jobs in United States

Cognizant

Site Reliability Engineer (SRE)

Cognizant is seeking a Site Reliability Engineer (SRE) to design and implement advanced observability solutions for edge computing environments. The role involves collaborating with engineering and platform teams to ensure high availability, reliability, and performance across distributed systems.

Consulting · Industrial Automation · Information Technology · Software · Software Engineering
No H1B

Responsibilities

Design and implement observability frameworks for edge environments, including monitoring, logging, tracing, and metrics collection
Define and maintain SLIs, SLOs, and business KPIs to improve system reliability across edge and centralized infrastructure
Build and optimize dashboards, visualizations, and alerting systems for real-time insights and rapid incident response
Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing
Collaborate with engineering teams to embed observability best practices into applications and infrastructure
Drive proactive issue detection and resolution, reducing MTTD and MTTR across distributed systems
Lead incident postmortems and implement observability-driven improvements to prevent recurrence
Develop automation scripts and tools to enhance observability pipelines, addressing edge-specific challenges like bandwidth and connectivity
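Several of these responsibilities (SLOs, MTTD, MTTR) come down to simple arithmetic over incident and request data. The following is a hypothetical sketch, not Cognizant's tooling. The `Incident` record and every function name are invented for illustration. MTTR is measured here from detection to resolution, which is one common convention; some teams measure from fault start instead. The error-budget formula assumes a request-based availability SLI.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    started: datetime   # when the fault began
    detected: datetime  # when an alert or monitor flagged it
    resolved: datetime  # when service was restored


def mean_minutes(deltas: list[timedelta]) -> float:
    """Average a non-empty list of timedeltas, in minutes."""
    total = sum(deltas, timedelta())
    return total.total_seconds() / 60 / len(deltas)


def mttd(incidents: list[Incident]) -> float:
    # Mean time to detect: fault start -> detection.
    return mean_minutes([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> float:
    # Mean time to resolve, measured here from detection -> resolution
    # (an assumed convention; some teams measure from fault start).
    return mean_minutes([i.resolved - i.detected for i in incidents])


def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left for a request-based availability SLO.

    slo: target success ratio, e.g. 0.999 for "three nines"
    good, total: successful and total requests in the SLO window
    Returns 1.0 when no requests failed; goes negative once the budget is exhausted.
    """
    allowed_bad = (1 - slo) * total  # failures the SLO tolerates
    actual_bad = total - good        # failures actually observed
    return 1 - actual_bad / allowed_bad


# Usage with made-up numbers: two incidents starting at the same time.
t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=34)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=56)),
]
print(mttd(incidents))  # 5.0
print(mttr(incidents))  # 40.0
print(error_budget_remaining(0.999, 999_500, 1_000_000))  # roughly 0.5
```

With a 99.9% target over one million requests, the SLO tolerates about 1,000 failed requests, so 500 failures leave roughly half of the error budget.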

Qualifications

Cloud observability (OpenTelemetry) · Scripting · Automation · Containerization (GCP, AWS) · Observability frameworks · Programming languages (Go, Python, Java, Rust) · Database management · Networking protocols · Debugging skills · Monitoring tools · CI/CD tools

Required

3–5 years of experience in service reliability/operations for large-scale, high-performance applications in hybrid environments (on-prem and cloud)
Strong scripting and automation skills for building dashboards and managing application performance
Proficiency in programming languages such as Go, Python, Java, or Rust
Hands-on experience with databases (Oracle, SQL Server, Redis, ClickHouse, Postgres, MongoDB, or time-series DBs)
2+ years of experience transitioning platforms to cloud and containerization (GCP, AWS, Rancher, or similar)
Experience maintaining containerized applications in GKE/RKE/AKE environments
Expertise in implementing cloud observability using OpenTelemetry (OTEL) for monitoring and distributed tracing
Knowledge of networking protocols (TCP/IP, HTTP, DNS) and troubleshooting in high-pressure scenarios
Candidate must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future

Preferred

Experience managing application availability for 24x7 high-availability platforms
Familiarity with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace
Hands-on experience with CI/CD tools, Rally, and Confluence
Knowledge of in-memory caching solutions (Redis preferred)
Strong debugging skills across integrated technical platforms and API gateways
Exposure to GCS, Cloud SQL, Spanner, Firestore, and enterprise-level infrastructure operations
Experience with HashiCorp Vault, Vertex AI, Gen AI, and BigQuery

Benefits

Medical/Dental/Vision/Life Insurance
Paid holidays plus Paid Time Off
401(k) plan and contributions
Long-term/Short-term Disability
Paid Parental Leave
Employee Stock Purchase Plan

Company

Cognizant

Cognizant is a professional services company that helps clients alter their business, operating, and technology models for the digital era.

Funding

Current Stage: Public Company
Total Funding: $0.24M
Key Investors: Summit Financial Wealth Advisors
2025-03-08 · Post-IPO Equity
2016-11-18 · Post-IPO Equity · $0.24M
1998-06-19 · IPO

Leadership Team

Ravi Kumar S
Chief Executive Officer
Anil Cheriyan
CTO / EVP Strategy & Technology
Company data provided by Crunchbase