Lawrence Livermore National Laboratory · 20 hours ago

High-Performance Computing (HPC) Center Operations Manager

Lawrence Livermore National Laboratory (LLNL) has turned bold ideas into world-changing impact, advancing science and technology to strengthen U.S. security and promote global stability. LLNL is seeking a High-Performance Computing (HPC) Center Operations Manager to lead a team providing 24x7 support for HPC systems and facilities, overseeing operations and ensuring reliability through innovative solutions.
Information Technology · Marketing · Market Research · Security
Growth Opportunities
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Provide expert technical leadership to team members, including recruiting, hiring, mentoring, conducting performance appraisals, facilitating quarterly feedback sessions and one-on-one meetings, and managing salary and career development to support staff growth and operational excellence
Oversee 24x7 support of HPC systems and networks while utilizing advanced monitoring and diagnostic tools to ensure reliability and rapid incident response for systems and supporting infrastructure. Provide guidance and support to shift supervisors during operational events and ensure effective incident response, resolution, and reporting
Establish, implement, and continuously improve procedures, schedules, and work priorities for HPC operations, identifying and developing key growth areas for staff and processes
Lead the development and deployment of innovative tools and processes to enhance operational efficiency and technical service delivery for HPC facilities and operations
Manage multiple vault-type rooms, oversee siting and infrastructure projects, and ensure strict compliance with safety and security policies and requirements
Develop formal training plans to enhance team skills in alarm response, safety practices, HPC system monitoring, troubleshooting, repair, and issue escalation for operations and facilities teams
Collaborate with senior management in planning, budgeting, and decision-making, and represent the organization in vendor meetings, cross-divisional initiatives, and external organizations such as the Energy Efficiency High Performance Computing Working Group, HPC operational reviews, or other professional best-practice groups
Keep pace with the escalating demands of next-generation platforms by providing solutions for highly unusual and complex HPC engineering challenges that arise from the intersection of extreme power density, precision cooling demands, evolving HPC compute loads, and mission-critical uptime requirements
Perform other duties as assigned

Qualifications

HPC management · Data center infrastructure · Technical leadership · Recruiting technical staff · Performance management · Troubleshooting HPC systems · Advanced communication · Training development · Analytical skills · Collaboration skills · Facilitation skills · Problem solving

Required

This position requires an active Department of Energy (DOE) Q-level clearance or active Top-Secret clearance issued by another U.S. government agency at the time of hire
Bachelor's degree in engineering, computer science or related field, or equivalent combination of education and experience in HPC Facilities and Operations
Significant experience managing and troubleshooting HPC environments, including monitoring and maintenance of systems (e.g. computers, storage) and facilities (e.g. mechanical, electrical, cooling systems)
Advanced technical experience installing and operating HPC equipment, networks, or associated facilities, and resolving issues in cooperation with vendors and staff
Significant experience in recruiting and supervising technical staff, preparing performance reviews, and participating in performance management processes
Advanced communication, facilitation, and collaboration skills to lead a group, explain policies, and interact with management, technical teams, and vendors
Significant experience developing written processes and/or procedures to improve service delivery and operational efficiency, and experience training technicians and engineers and assessing skills
Advanced knowledge of data center infrastructure and equipment

Preferred

Extensive experience working in a High-Performance Computing Center and responding to emergency situations to diagnose and fix significant issues with computers or mechanical equipment while under pressure
Experience in payroll supervision, organizational performance alignment, salary management, and knowledge of DOE/NNSA/LLNL policies and procedures
Experience with HVAC, electrical, and structural systems in a data center environment

Company

Lawrence Livermore National Laboratory

Lawrence Livermore National Laboratory, a national security laboratory, provides transformational solutions to national security challenges.

Funding

Current Stage: Late Stage
Total Funding: $11.4M
Key Investors: ARPA-E, US Department of Energy, DARPA
2023-11-21: Grant
2023-08-14: Grant
2022-09-19: Grant

Leadership Team

Greg Herweg, Chief Technology Officer
David Shaughnessy, Deputy Chief Financial Officer
Company data provided by Crunchbase