Gravity IT Resources · 3 weeks ago
Site Reliability Engineer Managing Director
Gravity IT Resources is seeking a Site Reliability Engineer Managing Director to lead the expansion of SRE practices globally. This role involves evaluating operational workflows, executing a roadmap for proactive SRE-aligned processes, and fostering a high-performing team culture.
Responsibilities
Lead the expansion of SRE practices from a small, high-performing team to a larger global function incorporating on-premises infrastructure technologies
Evaluate current operational workflows and RACIs, identify toil, and complete a skills assessment across the global team
Execute a comprehensive roadmap to transition reactive day-to-day operational activities into proactive, SRE-aligned processes with a focus on reliability, automation, observability, and incident management
Upskill team members through tailored training programs on SRE principles, cloud operations and automation tools
Collaborate with architects, platform engineering, ServiceNow developers, and application teams to define and implement an observability framework that enhances proactive incident detection and reduces MTTR
Define and implement an automation framework to ensure sustainable, responsible, and effective use of automation to reduce toil and risk
Define and regularly review SLIs, SLOs, SLAs, error budgets, and incident response processes
Oversee recruitment, orientation, and professional development of the global SRE team
Foster a high-performing team culture
Build strong relationships with internal and external stakeholders
Prepare and present reports on operational performance
Oversee incident response and post-incident analysis processes and drive a culture of blameless post-mortems across multiple teams
Qualifications
Required
Proven experience in building and leading operational and engineering teams
Adept at fostering collaboration between SRE and application development teams to drive operational excellence, reduce downtime, and help application teams accelerate delivery cycles
Experience defining and monitoring SRE practices, including SLIs, SLOs, SLAs, error budgets, and incident response strategies
Experience overseeing incident response processes, with skill in post-incident analysis and in conducting blameless post-mortems with multiple teams, driving proactive measures to prevent future incidents
Experience spearheading automation initiatives using Terraform that significantly reduced infrastructure provisioning time
Experience with monitoring and observability tools such as LogicMonitor, Azure Monitor, Prometheus, Grafana, Dynatrace, and Splunk
Experience with ServiceNow and Azure DevOps, and a solid understanding of Agile, ITIL, and ITSM frameworks
Strong expertise in Azure technologies; experience with other cloud service providers (CSPs) is highly beneficial
Proficiency in IaC tools including Terraform
Experience with container orchestration
Strong scripting or programming skills (e.g., Python, PowerShell)
Excellent communication skills
Preferred
Experience with SharePoint administration is highly beneficial
Experience managing other managers is highly beneficial
Company
Gravity IT Resources
Gravity IT Resources provides the consulting expertise and IT talent that powers digital transformation.
H1B Sponsorship
Gravity IT Resources has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted field is the one similar to this job]
Trends of Total Sponsorships: 2025 (1)
Funding
Current Stage
Growth Stage
Recent News
Gravity IT Resources
2025-12-05
Company data provided by Crunchbase