LinkedIn · 4 days ago
Staff Engineer, Site Reliability
LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. The Staff Engineer, Site Reliability role focuses on managing the incident lifecycle and improving site reliability through innovative platforms and analytics.
Professional Networking · Recruiting · Social Media · Social Recruiting
Responsibilities
Designing and evolving the core incident management platforms that power LinkedIn’s full incident lifecycle, from detection and response to problem management and prevention, across thousands of services and teams
Serving in a critical on-call rotation, providing expert incident triage and coordination during high-severity outages. Partnering closely with service owners and product teams to diagnose issues quickly, mitigate member impact, and drive timely resolution under pressure
Transforming raw, unstructured incident data into clear, actionable intelligence using AI and LLM-based systems, including automated summarization, classification, root cause signals, and mitigation recommendations
Building analytics and insights that surface systemic reliability risks, recurring failure patterns, and cross-service dependencies, enabling org-level prioritization rather than isolated, service-by-service fixes
Building platforms and tools that enable realistic, fleet-wide stress testing of data center and regional capacity, validating incident readiness across dependencies, traffic patterns, and growth scenarios before they cause a significant production outage
Driving consistency, clarity, and quality in how incidents are declared, managed, reviewed, and learned from, raising the reliability bar across a large, fast-moving engineering organization
Influencing service architecture, SLOs, and reliability standards through platforms, data, and technical leadership, ensuring improvements are durable, measurable, and adopted at scale
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related technical field, or equivalent practical experience. Many postings also prefer or require an advanced degree (MS/PhD) for Staff-level roles
6+ years of professional experience in software development, distributed systems, or reliability engineering. Some Principal/Staff roles list around 10+ years of experience
Several years of experience leading technical projects or providing architectural leadership (often 3-4+ years)
Strong software engineering fundamentals, with deep experience building products and operating large-scale distributed systems
Expertise in two or more backend languages such as Go, Python, or Java with a track record of owning complex production systems
Full-stack engineering experience, including building user-facing web applications and operational dashboards using modern frontend frameworks such as React.js, along with backend APIs and data pipelines
Understanding of web development fundamentals including API design, performance, accessibility, and building intuitive interfaces for engineers and operational users
Understanding of reliability engineering principles, incident management, observability, and operating systems under failure conditions
Demonstrated ability to lead technical design across teams, influence architecture beyond direct ownership, and drive adoption through well-designed platforms
Strong debugging and root cause analysis skills, with the ability to communicate complex technical findings clearly to engineers, partners, and leadership
Preferred
8+ years of professional experience in software development, distributed systems, or reliability engineering. Some Principal/Staff roles list around 10+ years of experience
Experience applying AI or LLM-based techniques to operational or incident data, including automated summarization, classification, root cause hypothesis generation, or reliability recommendations
Familiarity with vector databases and retrieval-based systems used to power context-aware analytics, search, or agentic workflows
Frontend craftsmanship beyond basic UI, including building data-dense, high-signal interfaces for engineers using React.js, modern state management, and visualization libraries
Experience designing end-to-end full-stack systems where frontend, backend, data, and reliability concerns are considered holistically
Background in building internal developer platforms, observability tools, or incident response systems used at scale
A demonstrated ability to simplify complex workflows, reduce operational toil, and replace manual processes with well-designed automation
Benefits
Generous health and wellness programs and time away for employees of all levels.
Fair and equitable compensation practices.
Company
LinkedIn is a professional networking site that allows users to create business connections, search for jobs, and find potential clients. It is a subsidiary of Microsoft.
H-1B Sponsorship
LinkedIn has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (892)
2024 (1108)
2023 (913)
2022 (1580)
2021 (1043)
2020 (1146)
Funding
Current Stage: Public Company
Total Funding: $154.8M
Key Investors: Bain Capital Ventures, Greylock, Sequoia Capital
2016-06-13: Acquired
2016-02-15: Private Equity
2014-04-01: Series Unknown
Recent News
2026-01-16 · Help Net Security
Company data provided by Crunchbase