HeyGen · 3 weeks ago

Tech Lead, AI Compute Infrastructure

Palo Alto, CA

Full-time

Onsite

Senior Level

5+ years exp

HeyGen is a company focused on making visual storytelling accessible to all. They are seeking a seasoned Technical Leader to build and scale the foundational compute infrastructure that powers their AI models, ensuring robust, efficient, and scalable platforms for generative video models.

E-Learning, Generative AI, Software, Web Apps

H1B Sponsor Likely

Responsibilities

Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models

Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including multi-modal high-volume data ingestion/processing, distributed model training, and continuous evaluation/benchmarking

Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication)

Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines

Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems

Qualifications

AI infrastructure, MLOps, GPU optimization, Python, Kubernetes, Ray, Apache Spark, PyTorch, TensorFlow, Technical Leadership, Collaboration, Problem-solving

Required

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience

5+ years of full-time industry experience in large-scale MLOps, AI infrastructure, or HPC systems

Experience with data frameworks and standards such as Ray, Apache Spark, and LanceDB

Strong proficiency in Python and a high-performance language such as C++ for developing core infrastructure components

Deep understanding and hands-on experience with modern orchestration and distributed computing frameworks such as Kubernetes and Ray

Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX

Preferred

Master's or PhD in Computer Science or a related technical field

Demonstrated Tech Lead experience, driving projects from conceptual design through to production deployment across cross-functional teams

Prior experience building infrastructure specifically for Generative AI models (e.g., diffusion models, GANs, or large language models) where cost and latency are critical

Proven background in building and operating large-scale data infrastructure (e.g., Ray, Apache Spark) to manage petabytes of multi-modal data (video, audio, text)

Expertise in GPU acceleration and deep familiarity with low-level compute programming, including CUDA, NCCL, or similar technologies for efficient inter-GPU communication

Benefits

Competitive salary and benefits package.

Dynamic and inclusive work environment.

Opportunities for professional growth and advancement.

Collaborative culture that values innovation and creativity.

Access to the latest technologies and tools.

Company

HeyGen

HeyGen is an AI video generation platform that specializes in video creation, AI avatars, and generative AI.

Founded in 2020

Los Angeles, California, USA

51-200 employees

https://www.heygen.com

H1B Sponsorship

HeyGen has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (11)

2024 (5)

Funding

Current Stage

Growth Stage

Total Funding

$69M

Key Investors

Benchmark

2024-03-25 · Series A · $60M

2022-11-08 · Seed · $9M

Leadership Team

Wayne Liang

Co-founder, Chief Innovation Officer

Recent News

IndiaTimes

Former Meta employee’s dire warning to tech professionals: “Don’t make promotions the goal… when you start to get into that mindset, you lose…”

2026-01-17

Business Insider

This former Meta engineer, who quickly rose through the corporate ranks, says you shouldn't 'aim for promotions'

2025-12-29

Tech Funding News

Alphabet’s GV backs Synthesia in $200M round at $4B valuation to power the next wave of AI video — TFN

2025-10-31

Company data provided by Crunchbase