Kentro
Service Reliability & Operations Manager (VA ESOM)
Kentro, a company focused on innovation and collaboration, is seeking an experienced Service Reliability & Operations Manager to support its VA ESOM contract across the United States. This role is responsible for ensuring the stability, performance, and resilience of enterprise IT services, overseeing real-time monitoring and major incident response, and driving operational excellence.
Information Technology & Services
Responsibilities
Lead teams responsible for Application Performance Monitoring (APM), observability, and "eyes on glass" 24/7 monitoring functions
Ensure proactive detection of service degradation and performance anomalies
Drive adoption of modern monitoring tools, dashboards, and alerting frameworks
Oversee the major incident process, ensuring rapid triage, escalation, communication, and resolution
Serve as the escalation point for Critical/High incidents and coordinate cross-functional response
Conduct post-incident reviews and ensure corrective actions are implemented
Manage sustainment of critical integrations, ensuring reliability, version alignment, and lifecycle management
Partner with engineering teams to ensure smooth handoffs from project delivery to steady-state operations
Maintain documentation, runbooks, and operational readiness standards
Track and improve KPIs such as MTTR, service availability, alert fidelity, and incident volume trends
Identify systemic issues and drive continuous improvement initiatives across operations
Ensure alignment with ITIL processes, especially incident, problem, and change management
Lead, mentor, and develop a team of analysts, engineers, and incident managers
Foster a culture of accountability, collaboration, and operational discipline
Build succession plans, training programs, and career pathways for operational staff
Partner with other ESOM teams to ensure end-to-end service reliability
Work closely with the PMO on readiness for new services, innovation pilots, and portfolio changes
Provide clear, concise communication to leadership during incidents and operational reviews
Qualifications
Required
Bachelor's degree in computer science, electronics engineering, or another engineering or technical discipline
10+ years in IT operations, service reliability, or incident management, including 5+ years managing managers and large teams
Experience overseeing large teams while supporting a Federal client
Proven experience leading multi-site IT operations and large-scale teams (400+ employees)
Strong background in ITIL practices, incident management, and customer support operations
History of collaboration and flexibility, including developing innovative solutions to the challenges facing geographically distributed teams
Exceptional leadership, coaching, and interpersonal communication skills
Strong analytical and problem-solving skills with a data-driven mindset
Ability to build and maintain strong client relationships and manage escalations effectively
Experience with APM, observability platforms, enterprise monitoring tools, and KPI reporting
Ability to prioritize work and self-direct with minimal input
Strong communication skills that build team cohesion, shared focus, and sustained drive
U.S. citizen or green card holder
Willing and able to obtain a Public Trust Suitability clearance
Must meet updated ID requirements: If you do not currently meet the ID requirements outlined, you must be willing and able to update your current forms of ID in a timely manner to complete the suitability process successfully
Preferred
ITIL Certification
Experience with end-user technologies and concepts
Company
Kentro
IT Concepts has transformed into Kentro: your center for innovation, excellence, and growth.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase