Penguin Solutions · 1 day ago

Systems Engineer

Penguin Solutions is a provider of dedicated, remote Linux systems DevOps for complex environments. The Systems Engineer role involves managing HPC clusters, maintaining Linux operating systems, troubleshooting issues, and providing IT support in a customer-facing position at the client's data center.

Artificial Intelligence (AI), Cloud Computing, Enterprise Software

Responsibilities

Install, deploy, and administer HPC Clusters
Maintain, administer, and patch Linux operating systems and associated software
Work as part of a team to provide IT support and resolve errors
Analyze system log files and perform basic troubleshooting
Write Shell, Python, and Ansible scripts
Document processes in support of other Systems Engineers; follow and improve procedures to meet SLAs
Support users with Move/Add/Change requests
Troubleshoot errors and determine root cause
Respond to system alerts and monitoring, sometimes after hours
Stay up to date on advancements in Linux operating systems and associated software
Perform break/fix activities for HPC and datacenter infrastructure, including diagnostics, component swap, firmware validation, and verification testing of repaired systems

Qualifications

UNIX/Linux certification, HPC Systems Management, Linux systems administration, Kubernetes, Containers, Linux networking protocols, ITIL operating models, HPC cluster system admin, Shell scripting, Python scripting, Ansible scripting, High-Performance Storage, Parallel file systems, HPC Scheduler knowledge

Required

Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent experience
UNIX/Linux certification or equivalent experience
5+ years of hands-on experience with UNIX/Linux server environments
HPC Systems Management knowledge
Linux systems administration skills and experience with open-source technologies
Understanding of Linux networking implementation and protocols
Ability to work in ITIL operating models
Able to install, configure, and tune software applications and provide overall support
Requires the ability to stand and walk for extended periods of time, lift and move computer equipment up to 25 pounds, and perform hands-on hardware installation and maintenance tasks requiring fine motor coordination
Demonstrated hands-on break/fix experience and the ability to troubleshoot, repair, and validate HPC cluster nodes, GPU trays, power supplies, and other datacenter components
Experience with Kubernetes and containers is highly desired

Preferred

HPC: Application, Systems Management, OS, Optimization, Hardware and data center needs
HPC/AI Performance Specialist experience and practical knowledge of administering High-Performance Computing (HPC) technologies, including cluster resource management, job scheduling, Ethernet networking, InfiniBand, etc.
AI & Cloud: Virtualization, Applications, Container Orchestration, Systems Management, and Hardware design
Data: High-Performance Storage and Parallel file systems used in HPC/AI and Cloud
HPC cluster system admin experience
In-depth knowledge of Linux cluster technologies and optimization techniques
HPC Scheduler knowledge (SLURM, PBS, LSF)
Takes initiative to consult the application OEM/vendor about application operations, features, functions, and questions
Familiarity with hands-on HPC hardware service, field replacement procedures, and break/fix workflows in high-availability datacenter environments

Benefits

Medical, dental, and vision benefits
401(k) savings plan
Paid Time Off
Life Insurance
Employee Assistance Plan

Company

Penguin Solutions

At Penguin Solutions, we understand the boundless potential of technology and support our customers in turning cutting-edge ideas into outcomes—faster, and at any scale.

Funding

Current Stage: Late Stage
Total Funding: $19.39M
Key Investors: vSpring Capital
2018-06-11: Acquired
2011-04-20: Series D, $1M
2009-11-09: Series Unknown, $1.5M

Leadership Team

Phillip Pokorny
Chief Technology Officer
Alex Lin
Sr. Technical Product Manager
Company data provided by Crunchbase