HPC Linux Storage Engineer jobs in the United States

Oak Ridge National Laboratory · 1 day ago

HPC Linux Storage Engineer

Oak Ridge National Laboratory (ORNL) is seeking highly skilled professionals to support large-scale storage systems and high-speed parallel file systems critical to advancing scientific discovery and innovation. The HPC Linux Storage Engineer will design, deploy, optimize, and maintain infrastructure that powers cutting-edge research across diverse scientific domains.

Advanced Materials · Clean Energy · Energy · Energy Management · Manufacturing · Nuclear · Renewable Energy
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Architect, deploy, and manage large-scale storage systems and HPC platforms to support research, scientific, and enterprise workloads
Develop and implement solutions for structured, unstructured, and archival data storage, focusing on scalability, reliability, and performance
Apply systems analysis techniques to consult with users/customers, determine functional requirements, and design, test, or optimize storage and computational solutions tailored to their needs
Develop, document, and modify solutions, including system prototypes and automated workflows, to enhance operational efficiency
Ensure the performance, availability, scalability, and security of diverse infrastructure environments
Diagnose and resolve complex operational challenges quickly and effectively, applying advanced performance optimization techniques for a wide range of workloads
Work closely with stakeholders from research, technical, and operational teams to understand workflows, identify opportunities for improvement, and deliver effective solutions
Define, implement, and enforce best practices, standards, and procedures across projects and teams
Automate system configuration, provisioning, monitoring, and maintenance to reduce manual efforts and downtime
Evaluate emerging technologies and tools to continuously improve system capabilities, adapt to changing needs, and plan for future advancements
Support critical infrastructure through participation in a 24/7 on-call rotation and off-hours maintenance windows
Resolve hardware and software issues in coordination with vendors, ensuring minimal impact on operations

Qualifications

HPC storage systems · Linux/UNIX systems · Scripting languages · Configuration management tools · High-performance parallel file systems · Performance monitoring tools · Virtualization platforms · Communication skills · Collaboration skills · Problem-solving skills

Required

Bachelor's degree in computer science, engineering, information technology, or a related field and at least 5 years of professional experience managing Linux/UNIX systems in heterogeneous environments. An equivalent combination of education and experience will be considered
Demonstrated experience with high-performance computing (HPC) storage systems and enterprise storage platforms (e.g., Lustre, GPFS, BeeGFS, or WEKA)
Proficiency in scripting languages (e.g., Python, Bash, Perl), configuration management/automation tools (e.g., Ansible, Puppet), and version control (e.g., Git)
Strong communication, collaboration, and problem-solving skills with the ability to design and implement solutions independently
This position requires the ability to obtain and maintain a clearance from the Department of Energy. As such, it is designated as a Workplace Substance Abuse Program (WSAP) testing position. WSAP positions require passing a pre-placement drug test and participating in an ongoing random drug testing program

Preferred

Active DOE Q, DoD Top Secret, or TS/SCI clearance
Hands-on experience with HPC cluster technologies, including job schedulers (e.g., SLURM) and system deployment tools (e.g., Warewulf, PXE boot, Bright Cluster Manager)
Expertise in high-performance parallel file systems, tape library systems, and storage networking technologies (e.g., RAID, ZFS, NVMe-oF, InfiniBand)
Familiarity with performance monitoring tools (e.g., Grafana, Nagios), benchmarking systems, and I/O optimization techniques
Experience with virtualization and containerization platforms (e.g., VMware, KVM, Podman, Apptainer)
Background in open-source development, including submitting patches upstream and building custom Linux packages (e.g., RPMs for RHEL)
Demonstrated ability to troubleshoot and optimize high-performance storage, compute, and networking systems in HPC environments
Experience documenting technical processes and contributing to complex technical projects in government, scientific, or highly technical settings

Benefits

Flexible work environment
Professional development and leadership opportunities

Company

Oak Ridge National Laboratory

Oak Ridge National Laboratory holds a range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.

Funding

Current Stage
Late Stage
Total Funding
$9.8M
Key Investors
US Department of Energy
2023-09-21 · Grant · $4.8M
2023-07-27 · Grant
2022-03-14 · Grant · $5M

Leadership Team

Arjun Shankar
Division Director, National Center for Computational Sciences, Oak Ridge National Laboratory
Brett Ellis
Division Director, Research Computing Support
Company data provided by crunchbase