Oak Ridge National Laboratory · 1 day ago
HPC Linux Storage Engineer
Oak Ridge National Laboratory (ORNL) is seeking highly skilled professionals to support large-scale storage systems and high-speed parallel file systems critical to advancing scientific discovery. The role involves designing, deploying, optimizing, and maintaining infrastructure that powers cutting-edge research across diverse scientific domains.
Advanced Materials, Clean Energy, Energy, Energy Management, Manufacturing, Nuclear, Renewable Energy
Responsibilities
Design and Management of Infrastructure: Architect, deploy, and manage large-scale storage systems and HPC platforms to support research, scientific, and enterprise workloads. Develop and implement solutions for structured, unstructured, and archival data storage, focusing on scalability, reliability, and performance
Systems Analysis and Development: Apply systems analysis techniques to consult with users/customers, determine functional requirements, and design, test, or optimize storage and computational solutions tailored to their needs. Develop, document, and modify solutions, including system prototypes and automated workflows, to enhance operational efficiency
Performance, Optimization, and Troubleshooting: Ensure the performance, availability, scalability, and security of diverse infrastructure environments. Diagnose and resolve complex operational challenges quickly and effectively, applying advanced performance optimization techniques for a wide range of workloads
Collaboration and Best Practices: Work closely with stakeholders from research, technical, and operational teams to understand workflows, identify opportunities for improvement, and deliver effective solutions. Define, implement, and enforce best practices, standards, and procedures across projects and teams
Automation and Innovation: Automate system configuration, provisioning, monitoring, and maintenance to reduce manual effort and downtime. Evaluate emerging technologies and tools to continuously improve system capabilities, adapt to changing needs, and plan for future advancements
Support and Maintenance: Support critical infrastructure through participation in a 24/7 on-call rotation and off-hours maintenance windows. Resolve hardware and software issues in coordination with vendors, ensuring minimal impact on operations
Qualifications
Required
Bachelor's degree in computer science, engineering, information technology, or a related field; and at least 5 years of professional experience managing Linux/UNIX systems in heterogeneous environments. An equivalent combination of education and experience will be considered
Demonstrated experience with high-performance computing (HPC) storage systems and enterprise storage platforms (e.g., Lustre, GPFS, BeeGFS, or WEKA)
Proficiency in scripting languages (e.g., Python, Bash, Perl) and configuration management/automation tools (e.g., Ansible, Puppet, Git)
Strong communication, collaboration, and problem-solving skills with the ability to design and implement solutions independently
Preferred
Active DOE Q, DoD Top Secret, or TS/SCI clearance
Hands-on experience with HPC cluster technologies, including job schedulers (e.g., Slurm) and system deployment tools (e.g., Warewulf, PXE boot, Bright Cluster Manager)
Expertise in high-performance parallel file systems, tape library systems, and storage networking technologies (e.g., RAID, ZFS, NVMe-oF, InfiniBand)
Familiarity with performance monitoring tools (e.g., Grafana, Nagios), benchmarking systems, and I/O optimization techniques
Experience with virtualization and containerization platforms (e.g., VMware, KVM, Podman, Apptainer)
Background in open-source development, including submitting patches upstream and building custom Linux packages (e.g., RPM for RHEL)
Demonstrated ability to troubleshoot and optimize high-performance storage, compute, and networking systems in HPC environments
Experience documenting technical processes and contributing to complex technical projects in government, scientific, or highly technical settings
Benefits
Professional development and leadership opportunities
Company
Oak Ridge National Laboratory
Oak Ridge National Laboratory holds a range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.
Funding
Current Stage: Late Stage
Total Funding: $9.8M
Key Investors: US Department of Energy
2023-09-21: Grant, $4.8M
2023-07-27: Grant
2022-03-14: Grant, $5M
Company data provided by Crunchbase