Penguin Solutions
Systems Engineer
Penguin Solutions provides dedicated, remote Linux systems DevOps services for complex environments. The Systems Engineer role is a customer-facing position at the client's data center and involves managing HPC clusters, maintaining Linux operating systems, troubleshooting issues, and providing IT support.
Artificial Intelligence (AI) · Cloud Computing · Enterprise Software
Responsibilities
Install, deploy, and administer HPC clusters
Maintain, administer, and patch Linux operating systems and associated software
Work as part of a team to provide IT support and resolve errors
Analyze system log files and perform basic troubleshooting
Write Shell, Python, and Ansible scripts
Document processes in support of fellow Systems Engineers; follow and improve procedures to meet SLAs
Support users with Move/Add/Change requests
Troubleshoot errors and determine root cause
Respond to system alerts and monitoring notifications, sometimes after hours
Stay up to date on advancements in Linux operating systems and associated software
Perform break/fix activities for HPC and datacenter infrastructure, including diagnostics, component swap, firmware validation, and verification testing of repaired systems
Qualifications
Required
Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent experience
UNIX/Linux certification or equivalent experience
5+ years of hands-on experience with UNIX/Linux server environments
HPC Systems Management knowledge
Linux systems administration skills and experience with open-source technologies
Understanding of Linux networking implementation and protocols
Ability to work in ITIL operating models
Able to install, configure, and tune software applications and provide overall support
Requires the ability to stand and walk for extended periods of time, lift and move computer equipment up to 25 pounds, and perform hands-on hardware installation and maintenance tasks requiring fine motor coordination
Demonstrated hands-on break/fix experience and the ability to troubleshoot, repair, and validate HPC cluster nodes, GPU trays, power supplies, and other datacenter components
Experience with Kubernetes and containers is highly desired
Preferred
HPC: Application, Systems Management, OS, Optimization, Hardware and data center needs
Background as an HPC/AI performance specialist, with practical knowledge of administering High-Performance Computing (HPC) technologies, including cluster resource management, job scheduling, Ethernet networking, and InfiniBand
AI & Cloud: Virtualization, Applications, Container Orchestration, Systems Management, and Hardware design
Data: High-Performance Storage and Parallel file systems used in HPC/AI and Cloud
HPC cluster system admin experience
In-depth knowledge of Linux cluster technologies and optimization techniques
HPC scheduler knowledge (Slurm, PBS, LSF)
Takes the initiative to engage the application OEM/vendor on application operations, features, functions, and questions
Familiarity with hands-on HPC hardware service, field replacement procedures, and break/fix workflows in high-availability datacenter environments
Benefits
Medical, dental, and vision benefits
401(k) savings plan
Paid Time Off
Life Insurance
Employee Assistance Plan
Company
Penguin Solutions
At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.
Funding
Current Stage: Late Stage
Total Funding: $19.39M
Key Investors: vSpring Capital
2018-06-11 · Acquired
2011-04-20 · Series D · $1M
2009-11-09 · Series Unknown · $1.5M
Recent News
Inside HPC & AI News | High-Performance Computing & Artificial Intelligence
2025-07-30
Company data provided by Crunchbase