Site Reliability Engineer, Monitoring and Control Engineering jobs in United States

NBCUniversal · 1 week ago

Site Reliability Engineer, Monitoring and Control Engineering

NBCUniversal is one of the world's leading media and entertainment companies. This role is responsible for the engineering, operations, support, deployment and maintenance of core Distribution Engineering Monitoring and Control systems, both on-premises and in the cloud.

Broadcasting · Media and Entertainment · News
H1B Sponsor Likely

Responsibilities

Utilize scripting and automation to develop, customize and enhance monitoring/alerting tools for “on-air” environments
Interact with automated monitoring infrastructure to ensure healthy environments
Create system dashboards that improve system availability and reliability
Query data stores to quantify the scope of reported issues
Create new metrics and identify monitoring deliverables to improve site reliability
Act as a Level 2 resource: drive and own investigations into broadcast issues and report findings to leadership and operations in a timely manner
This role requires on-call 24/7 support on a rotating shift schedule
Follow up with team members and third-party vendors when issues cannot be solved, and push vendors for root cause and solutions where possible
Create comprehensive documentation for each encountered issue, explaining the root cause and the steps for effective resolution
Administer monitoring and control systems within the “on-air” environments
Develop proof of concept deployments for evaluation of products and architectures
Utilize modern frameworks and scripting languages to develop products and services for NBCU's IP video distribution environment

Qualifications

Monitoring
Alerting tools
IP video
Broadcast technologies
Cloud environments (AWS preferred)
Scripting languages: C#, Python, Bash
Frontend technologies: Vite, React, NodeJS, TypeScript
Configuration management: Ansible, Salt, Chef
Public cloud platforms: AWS, GCP, Azure
Containerization: Docker, Kubernetes
CI/CD practices (GitHub Actions)
Infrastructure as Code (Terraform)
Linux
Windows administration
Agile process
Troubleshooting technical issues
DevSecOps principles
User interface design
Communication skills
Problem-solving skills

Required

Bachelor's degree in computer science or a related field
Experience with IP video and broadcast technologies
3-5+ years of experience with monitoring and alerting tools, e.g., Grafana, Splunk, ELK Stack, DataMiner
Ability to develop end-to-end monitoring dashboards, alerts and reports for enterprise level environments
3-5 years of SRE experience in the technology sector supporting and maintaining production-quality software or software-defined infrastructure in a high-traffic environment running in the cloud (AWS preferred)
Ability to collect data from various systems using COTS APIs
Experience with scripting languages and tools, e.g., C#, Python, Bash
Experience with modern frontend technologies like Vite, React, NodeJS, TypeScript
Experience with configuration management technology, e.g., Ansible, Salt, and/or Chef
Experience with public cloud platforms such as AWS, GCP or Azure
Experience with networking and cloud-based network environments
Experience with containerization (Docker and Kubernetes)
Experience with CI/CD builds (GitHub Actions), deployment practices, and Infrastructure as Code (Terraform)
Experience in administering Linux and Windows environments
Ability to use Agile processes for project management, development, and tracking
Comfortable working in a fast-paced agile environment. Requirements change quickly and our team needs to adapt to moving targets

Preferred

Experience with a variety of software and hardware operating environments
Experience in troubleshooting complex technical issues
Experience with SMPTE standards and implementation
Experience with PTP implementation
Good communicator, able to clearly articulate complex issues and technologies
Great design and problem-solving skills
Willing to take ownership of problems and see them through to resolution
Experience with DevSecOps principles
Ability to create user interface designs based on client workflows
Ability to gather project requirements from operational partners and work with vendors to meet their needs

Benefits

Medical, dental, and vision insurance
401(k)
Paid leave
Tuition reimbursement
Various other discounts and perks

Company

NBCUniversal

NBCUniversal is a media company that provides entertainment and news development, production, distribution, and marketing services. It is a sub-organization of Comcast.

H1B Sponsorship

NBCUniversal has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2020 (1)

Funding

Current Stage: Late Stage
Total Funding: unknown
2011-01-29: Acquired

Leadership Team

Jeff Shell, CEO
Stephen Burke, Chief executive officer
Company data provided by Crunchbase