Lambda · 3 months ago
Manager, Super Intelligence HPC Support
Lambda is a company focused on building Gigawatt-scale AI Factories for Training and Inference. They are seeking a hands-on leader to build and guide their Super Intelligence HPC Support Engineering team, responsible for delivering world-class support to complex customers operating hyperscale GPU clusters.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Data Center, GPU, Machine Learning
Responsibilities
Lead & Develop: Build, coach, and mentor a team of Super Intelligence HPC Support Engineers, ensuring technical excellence and strong execution in customer-facing work
Escalation Ownership: Take point on high-visibility incidents and escalations with hyperscale customers, ensuring timely, transparent, and high-quality outcomes
Customer Advocacy: Represent the needs of Super Intelligence customers in cross-functional discussions, influencing product design and roadmap decisions to improve supportability
Incident Leadership: Guide your team through major incidents, driving consistency in communication, coordination, and resolution under pressure
Operational Excellence: Define and refine support processes, runbooks, and documentation tailored to hyperscale environments
Partnership: Collaborate closely with Product, Engineering, and Data Center teams to ensure Lambda delivers reliable, scalable solutions at the largest levels of deployment
Metrics & Accountability: Monitor team performance and drive improvements in SLA adherence, response and resolution quality, and customer satisfaction
Hands-On Leadership: Step in to troubleshoot complex issues and model the standard of excellence expected from your team
Qualifications
Required
Proven track record leading technical support or engineering teams serving enterprise or hyperscale customers
Skilled at managing customer escalations and major incidents with clarity, confidence, and urgency
Deep expertise in HPC environments including GPU clusters, InfiniBand/RoCE networks, and Linux system administration
Ability to guide engineers through troubleshooting at scale, from orchestration (Slurm/Kubernetes) down to kernel-level debugging
Strong leadership presence: able to inspire, set direction, and build a culture of accountability and customer-first execution
Excellent communication skills, capable of engaging with both engineers and executive stakeholders
Preferred
Advanced degree in Computer Science, Engineering, or related field
Certifications in HPC, networking, or related technologies
Experience with Slurm, Kubernetes, InfiniBand, and other high-performance interconnects (RoCE, NVLink/NVSwitch)
Background supporting Private Cloud environments or other dedicated enterprise clusters
Experience supporting enterprise AI workloads across startups and Fortune 500 companies
Benefits
Health, dental, and vision coverage for you and your dependents
Wellness and Commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use
Company
Lambda
Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.
H1B Sponsorship
Lambda has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (16)
2024 (1)
2023 (3)
2022 (2)
2021 (2)
2020 (3)
Funding
Current Stage: Late Stage
Total Funding: $3.19B
Key Investors: TWG Global, JP Morgan, Macquarie Group
2025-11-18 · Series E · $1.5B
2025-08-19 · Debt Financing · $275M
2025-02-19 · Series D · $480M
Company data provided by Crunchbase