Cognizant · 1 day ago
Site Reliability Engineer
Cognizant, a leading technology company, is seeking a Site Reliability Engineer to design and implement advanced observability solutions for distributed edge computing environments. The role involves collaborating with engineering teams to ensure system reliability and developing automation tools to enhance observability pipelines.
Consulting · Industrial Automation · Information Technology · Software · Software Engineering
Responsibilities
Design and implement observability frameworks for edge environments, including monitoring, logging, tracing, and metrics collection
Define and maintain SLIs, SLOs, and business KPIs to measure and improve system reliability
Build dashboards, visualizations, and alerting systems for real-time insights and incident response
Implement distributed tracing and log aggregation to troubleshoot complex edge issues
Collaborate with engineering teams to embed observability best practices in resource-constrained environments
Drive proactive issue detection and resolution, reducing MTTD and MTTR across distributed systems
Lead incident postmortems and implement observability-driven improvements
Develop automation tools and scripts to enhance observability pipelines
Optimize data storage and querying strategies for performance and scalability
Stay current with emerging observability tools and trends, especially for edge computing
Qualifications
Required
3–5 years of experience in service reliability/operations for large-scale hybrid environments
3–5 years of experience in automation scripting and dashboard development for performance monitoring
2–4 years of experience with programming languages such as Go, Python, Java, or Rust
Working knowledge of databases like Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or time-series databases
At least 2 years of experience with cloud platforms and containerization (GCP, AWS, Azure, Rancher, OpenShift)
Experience maintaining containerized apps in GKE/RKE/AKS environments
Hands-on experience implementing observability using OpenTelemetry (OTEL)
Experience with GraphQL frameworks (Apollo, Prisma, Hasura)
Strong understanding of networking protocols (TCP/IP, HTTP, DNS, Load Balancing, Service Mesh)
Preferred
Proven experience managing 24/7 high-availability platforms for critical applications
Familiarity with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, Dynatrace
Experience with CI/CD tools and platforms (Rally, Confluence, etc.)
Hands-on experience with Redis and in-memory caching solutions
Strong debugging skills across integrated platforms and API gateways
Experience with GCS, Cloud SQL, Spanner, and Firestore
Background in enterprise-level infrastructure and operations
Expertise in Linux/Windows administration and distributed systems
Experience monitoring and troubleshooting HashiCorp Vault environments
Working knowledge of Vertex AI, Gen AI, and BigQuery
Benefits
Medical/Dental/Vision/Life Insurance
Paid holidays plus Paid Time Off
401(k) plan and contributions
Long-term/Short-term Disability
Paid Parental Leave
Employee Stock Purchase Plan
Company
Cognizant
Cognizant is a professional services company that helps clients alter their business, operating, and technology models for the digital era.
H1B Sponsorship
Cognizant has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted segment represents job fields similar to this job]
Trends of Total Sponsorships
2025 (11113)
2024 (11423)
2023 (13054)
2022 (13876)
2021 (12651)
2020 (28659)
Funding
Current Stage: Public Company
Total Funding: $0.24M
Key Investors: Summit Financial Wealth Advisors
2025-03-08 · Post-IPO Equity
2016-11-18 · Post-IPO Equity · $0.24M
1998-06-19 · IPO
Company data provided by Crunchbase