Leidos
Graphics Processing Unit (GPU) Engineer
Leidos delivers innovative solutions through the efforts of its diverse and talented people. The company is seeking a highly skilled Systems Engineer with expertise in GPUs and high-speed networking to design, develop, and optimize GPU clusters for enterprise AI applications.
National Defense · Government · Electronics · Software · Information Technology · Computer · Information Services · National Security
Responsibilities
**GPU Cluster Engineering:** Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power-efficiency, and feature requirements
**Operating System Integration:** Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates
**Performance Optimization:** Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers
**Tooling and Automation:** Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt
**Compliance & Documentation:** Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards
Qualifications
Required
Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree
10+ years of relevant systems engineering experience
Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4)
Knowledge of enterprise server components (storage and network controllers, HBAs, SSDs)
Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle Linux, and Rocky Linux)
Excellent problem-solving skills and the ability to collaborate within a team
Candidates must, at a minimum, meet DoD 8570.01-M IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, or CCSP) is also acceptable
Active TS/SCI clearance with polygraph required, or an active TS/SCI clearance and willingness to obtain and maintain a polygraph
US Citizenship is required due to the nature of the government contracts we support
Preferred
Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow)
Familiarity with GPU virtualization and cloud computing
Experience with Prometheus/Grafana for monitoring
Knowledge of distributed resource scheduling systems such as Slurm (preferred) or LSF
Company
Leidos
Leidos is a Fortune 500® innovation company rapidly addressing the world’s most vexing challenges in national security and health.
Funding
Current stage: Public company
Total funding: Unknown
2025-02-20: Post-IPO debt
2013-09-17: IPO
Company data provided by Crunchbase