Lambda · 3 months ago

Manager, Super Intelligence HPC Support

United States

Full-time

Remote

Mid, Senior Level

$160K/yr - $242K/yr

Lambda is a company focused on building gigawatt-scale AI factories for training and inference. It is seeking a hands-on leader to build and guide its Super Intelligence HPC Support Engineering team, which is responsible for delivering world-class support to complex customers operating hyperscale GPU clusters.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Data Center · GPU · Machine Learning

Comp. & Benefits

H1B Sponsor Likely

Responsibilities

Lead & Develop: Build, coach, and mentor a team of Super Intelligence HPC Support Engineers, ensuring technical excellence and strong execution in customer-facing work

Escalation Ownership: Take point on high-visibility incidents and escalations with hyperscale customers, ensuring timely, transparent, and high-quality outcomes

Customer Advocacy: Represent the needs of Super Intelligence customers in cross-functional discussions, influencing product design and roadmap decisions to improve supportability

Incident Leadership: Guide your team through major incidents, driving consistency in communication, coordination, and resolution under pressure

Operational Excellence: Define and refine support processes, runbooks, and documentation tailored to hyperscale environments

Partnership: Collaborate closely with Product, Engineering, and Data Center teams to ensure Lambda delivers reliable, scalable solutions at the largest scales of deployment

Metrics & Accountability: Monitor team performance and drive improvements in SLA adherence, response and resolution quality, and customer satisfaction

Hands-On Leadership: Step in to troubleshoot complex issues and model the standard of excellence expected from your team

Qualifications

HPC expertise · GPU clusters · Linux administration · Slurm · Kubernetes · InfiniBand · Networking certifications · Customer advocacy · Team leadership · Communication

Required

Proven track record leading technical support or engineering teams serving enterprise or hyperscale customers

Skilled at managing customer escalations and major incidents with clarity, confidence, and urgency

Deep expertise in HPC environments including GPU clusters, InfiniBand/RoCE networks, and Linux system administration

Ability to guide engineers through troubleshooting at scale, from orchestration (Slurm/Kubernetes) down to kernel-level debugging

Strong leadership presence: able to inspire, set direction, and build a culture of accountability and customer-first execution

Excellent communication skills, capable of engaging with both engineers and executive stakeholders

Preferred

Advanced degree in Computer Science, Engineering, or related field

Certifications in HPC, networking, or related technologies

Experience with Slurm, Kubernetes, InfiniBand, and other high-performance interconnects (RoCE, NVLink/NVSwitch)

Background supporting Private Cloud environments or other dedicated enterprise clusters

Experience supporting enterprise AI workloads across startups and Fortune 500 companies

Benefits

Health, dental, and vision coverage for you and your dependents

Wellness and commuter stipends for select roles

401(k) plan with 2% company match (US employees)

Flexible Paid Time Off Plan that we all actually use

Company

Lambda

Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.

Founded in 2012

San Jose, California, USA

501-1000 employees

https://lambda.ai

H1B Sponsorship

Lambda has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total Sponsorships by Year

2025 (16)

2024 (1)

2023 (3)

2022 (2)

2021 (2)

2020 (3)

Funding

Current Stage

Late Stage

Total Funding

$3.19B

Key Investors

TWG Global · JP Morgan · Macquarie Group

2025-11-18 · Series E · $1.5B

2025-08-19 · Debt Financing · $275M

2025-02-19 · Series D · $480M

Leadership Team

Stephen Balaban

Co-founder, CEO

Michael Balaban

Co-founder, CTO

Recent News

SiliconANGLE

AI cloud provider Lambda reportedly raising $350M round

2026-01-11

Business Wire

Lambda Appoints Leonard Speiser as Chief Operating Officer

2026-01-09

Techmeme

Source: Lambda, which rents access to AI chips and is backed by Nvidia, is in talks to raise $350M+ led by Mubadala Capital, ahead of an IPO planned for H2 2026 (The Information)

2026-01-09

Company data provided by Crunchbase