Fastly · 1 day ago
Staff Engineer - Observability & Performance
Fastly is a company that helps people stay connected through its edge cloud platform, enabling customers to create secure and reliable digital experiences. They are seeking a Staff Engineer for Observability & Performance to enhance operational efficiency and platform reliability through automation and observability tooling, while collaborating across various teams to ensure high performance and customer success.
Cloud Data Services · Cloud Infrastructure · Cloud Security · Content Delivery Network · Enterprise Software · Security · Software
Responsibilities
This role is approximately 50% Cross-functional Operations, 40% Data Analysis / Traffic Insights, and 10% Site Reliability Engineering, balancing technical expertise with collaboration and strategic impact
Drive the development of automation and observability tooling that improves operational efficiency and platform reliability, including traffic monitoring, alerting, and surveillance tools
Partner with observability teams to implement and improve existing dashboards (Grafana, Prometheus) and metrics pipelines that provide meaningful visibility into traffic patterns, surges, and seasonal trends
Help define SLIs/SLOs, and improve monitoring frameworks, ensuring alerts and dashboards reflect operational reality and proactively surface issues before customer impact
Collaborate with data/analytics teams to leverage data pipelines (e.g., SQL, BigQuery, or other large-scale data stores) for trend analysis, capacity planning, and traffic pattern recognition
Step in to run daily operational standups or coordination meetings as needed, ensuring priorities are clear, follow-ups are tracked, and cross-functional execution maintains momentum
Facilitate cross-team communication during high-impact initiatives or incident reviews, surfacing blockers early and maintaining execution momentum
Assist in root-cause investigations of performance, scalability, or traffic anomalies, and translate learnings into improvements in tooling and architecture
Act as a technical liaison, helping contextualize traffic behavior, system performance, and support escalations with clear insight
Help define and evolve runbooks, incident response processes, post-mortems, and the knowledge base, ensuring that repeated issues are proactively surfaced and addressed via automation or tooling rather than reactive firefighting
Monitor seasonal patterns, major events, and global traffic distribution, helping ensure the platform remains resilient during shifts in demand
Qualifications
Required
8+ years of progressive technical experience in Content Delivery Networks (CDN), Streaming Media, Security, or other high-volume internet traffic environments
Deep understanding of network/distributed/cloud systems: TCP/IP, DNS, HTTP/S, TLS, caching/proxy/CDN technologies; direct experience in CDN, Web Application and API Security a plus
Demonstrated ability to build automation, tooling, and observability systems (e.g., dashboards, alerts, instrumentation, data pipelines). Experience with Prometheus, Grafana, BigQuery/SQL, etc.
Hands-on experience with scripting or programming (e.g., Python, Go, Shell) and comfort building tooling rather than just consuming it
Experience working cross-functionally with engineering, infrastructure, operations, analytics, and customer/account teams. Strong communication skills, ability to translate technical findings to non-technical stakeholders
Demonstrated ability to coordinate complex technical work across multiple teams, facilitate daily standups or working sessions, and maintain operational momentum in complex, fast-moving environments
Proven track record of driving mission-critical reliability and performance improvements in production systems. Strong sense of ownership and accountability
Experience with monitoring/alerting systems and incident response. Bonus for experience with live streaming, high-variability traffic, or global seasonality at scale
Preferred
Experience with large-scale data analytics systems (BigQuery, Spark, Presto) to derive operational insights from traffic telemetry
Familiarity with cloud platforms (AWS, GCP, Azure), infrastructure as code, or container orchestration (Terraform, Kubernetes)
Experience evaluating build-vs-buy decisions and driving platform-wide tooling improvements
Background in media, live events, or streaming operations in a high-throughput, latency-sensitive environment a plus
Benefits
Medical, dental, and vision insurance
Family planning
Mental health support along with Employee Assistance Program
Insurance (Life, Disability, and Accident)
A Flexible Vacation policy
Up to 18 days of accrued paid sick leave
401(k) (including company match)
Employee Stock Purchase Program
12 paid local holidays
12 paid company wellness days
Company
Fastly
Fastly helps digital businesses keep pace with their customer expectations by delivering secure and online experiences.
H1B Sponsorship
Fastly has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship. Legend: represents job field similar to this job]
Trends of Total Sponsorships
2025 (9)
2024 (11)
2023 (7)
2022 (12)
2021 (6)
2020 (5)
Funding
Current Stage
Public Company
Total Funding
$529M
Key Investors
DTCP, ICONIQ Growth, August Capital
2025-12-05 · Post-IPO Debt · $160M
2024-12-02 · Post-IPO Debt · $150M
2019-05-16 · IPO
Company data provided by Crunchbase