Modular · 16 hours ago

Senior AI Kernel Engineer

Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack from the ground up. The Senior AI Kernel Engineer will lead the design and optimization of high-performance kernels for large-scale AI inference on GPUs, collaborating with various teams to ensure efficient implementations that run at scale.

AI Infrastructure · Artificial Intelligence (AI) · Generative AI · Machine Learning · Software
H1B Sponsor Likely

Responsibilities

Design, implement, and optimize performance-critical kernels for AI inference workloads (e.g., GEMM, attention, communication, fusion)
Lead kernel-level optimization efforts across single-GPU, multi-GPU, and heterogeneous hardware environments
Make informed trade-offs between latency, throughput, memory footprint, and numerical precision
Drive adoption of new hardware features (e.g., Tensor Cores, asynchronous execution, advanced memory spaces)
Analyze performance using profilers, hardware counters, and microbenchmarks; translate insights into concrete improvements (a microbenchmark sketch follows this list)
Work closely with compiler and runtime teams to influence code generation, scheduling, and kernel fusion strategies
Review and mentor other engineers on kernel design, performance tuning, and best practices
Contribute to technical roadmaps and long-term performance strategy for AI inference
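
A minimal sketch of the kind of kernel and microbenchmark work described above, assuming CUDA C++ as the target (the kernel, sizes, and names are illustrative, not Modular's actual code): a fused bias-add + ReLU kernel timed with CUDA events. A profiler such as Nsight Compute would add hardware-counter detail on top of this.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Fused bias-add + ReLU: one pass over memory instead of two separate kernels.
    __global__ void bias_relu(float* __restrict__ out, const float* __restrict__ in,
                              const float* __restrict__ bias, int rows, int cols) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < rows * cols) {
            float v = in[idx] + bias[idx % cols];
            out[idx] = v > 0.0f ? v : 0.0f;
        }
    }

    int main() {
        const int rows = 4096, cols = 4096, n = rows * cols;
        float *in, *out, *bias;
        // Inputs are left uninitialized; only kernel timing matters here.
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        cudaMalloc(&bias, cols * sizeof(float));

        dim3 block(256), grid((n + block.x - 1) / block.x);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        bias_relu<<<grid, block>>>(out, in, bias, rows, cols);  // warm-up launch
        cudaEventRecord(start);
        bias_relu<<<grid, block>>>(out, in, bias, rows, cols);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // Effective bandwidth: one read and one write of the n-element tensor.
        printf("fused bias+relu: %.3f ms, ~%.1f GB/s\n",
               ms, 2.0 * n * sizeof(float) / (ms * 1e6));
        return 0;
    }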

Qualifications

C/C++ · GPU kernel programming · GPU architecture · Performance optimization · CUDA · HIP · Kernel design · AI inference · Problem-solving · Collaboration

Required

5+ years of experience in performance-critical systems or kernel development (or equivalent depth of expertise)
Strong proficiency in C/C++ and low-level programming
Extensive hands-on experience with GPU kernel programming (CUDA, HIP, or equivalent)
Deep understanding of GPU architecture, including memory hierarchies, synchronization, and execution models (illustrated in the sketch after this list)
Proven track record of delivering measurable performance improvements in production systems
Strong problem-solving skills and ability to work independently on complex, ambiguous performance challenges
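
As a concrete illustration of the memory-hierarchy and synchronization points above, here is a minimal CUDA sketch (sizes and names are illustrative): a matrix transpose that stages a tile in shared memory so that both the global-memory load and the store are coalesced, with one element of padding to avoid shared-memory bank conflicts.

    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr int TILE = 32;

    // Stage a TILE x TILE block in shared memory so reads and writes to global
    // memory are both coalesced; the +1 padding avoids bank conflicts on the
    // transposed (column-wise) accesses.
    __global__ void transpose_tiled(float* __restrict__ out,
                                    const float* __restrict__ in,
                                    int rows, int cols) {
        __shared__ float tile[TILE][TILE + 1];

        int x = blockIdx.x * TILE + threadIdx.x;   // column in the input
        int y = blockIdx.y * TILE + threadIdx.y;   // row in the input
        if (x < cols && y < rows)
            tile[threadIdx.y][threadIdx.x] = in[y * cols + x];

        __syncthreads();  // every load must land before the tile is reused

        x = blockIdx.y * TILE + threadIdx.x;       // column in the output
        y = blockIdx.x * TILE + threadIdx.y;       // row in the output
        if (x < rows && y < cols)
            out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
    }

    int main() {
        const int rows = 1024, cols = 512;
        float *in, *out;
        cudaMallocManaged(&in, rows * cols * sizeof(float));
        cudaMallocManaged(&out, rows * cols * sizeof(float));
        for (int i = 0; i < rows * cols; ++i) in[i] = float(i);

        dim3 block(TILE, TILE);
        dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
        transpose_tiled<<<grid, block>>>(out, in, rows, cols);
        cudaDeviceSynchronize();
        printf("out[3][5] = %.0f, expected %.0f\n", out[3 * rows + 5], in[5 * cols + 3]);
        return 0;
    }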

Preferred

Experience with PTX, assembly-level tuning, or code generation frameworks such as Triton (see the inline-PTX sketch after this list)
Experience optimizing distributed or multi-GPU inference pipelines
Familiarity with custom AI accelerators or domain-specific hardware
Understanding of modern AI models (e.g., transformers, LLMs, diffusion) from a systems and performance perspective
Contributions to open-source kernel libraries, compilers, or performance tools
Experience collaborating directly with hardware or compiler teams
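
A small, hedged example of what assembly-level work can look like in practice, again in CUDA C++ (illustrative only): reading the %laneid special register via inline PTX and combining it with a warp-shuffle reduction so 32 lanes sum their values without touching shared memory.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each block is a single warp; lane 0 writes the warp's sum of 32 inputs.
    __global__ void warp_sum(const float* in, float* out) {
        unsigned lane;
        // Inline PTX: read the lane ID directly from the %laneid special register.
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
        float v = in[blockIdx.x * 32 + lane];
        // Butterfly reduction across the warp using shuffle instructions.
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_xor_sync(0xffffffff, v, offset);
        if (lane == 0) out[blockIdx.x] = v;
    }

    int main() {
        const int warps = 4, n = warps * 32;
        float h_in[n], h_out[warps];
        for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, warps * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
        warp_sum<<<warps, 32>>>(d_in, d_out);
        cudaMemcpy(h_out, d_out, warps * sizeof(float), cudaMemcpyDeviceToHost);
        printf("warp sums: %.0f %.0f %.0f %.0f (expected 32 each)\n",
               h_out[0], h_out[1], h_out[2], h_out[3]);
        return 0;
    }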

Benefits

Premier insurance plans
Up to 5% 401k matching
Flexible paid time off
Stock options

Company

Modular

Modular provides AI infrastructure for deployment, serving, and programming GPUs.

H1B Sponsorship

Modular has a track record of offering H1B sponsorship. This does not guarantee sponsorship for this specific role; the information below is provided for reference (data from the US Department of Labor).
[Chart: distribution of job fields receiving sponsorship; the highlighted field is the one most similar to this role]
Trends of Total Sponsorships
2025 (10)
2024 (6)
2023 (8)
2022 (4)

Funding

Current Stage: Growth Stage
Total Funding: $380M
Key Investors: US Innovative Technology Fund · General Catalyst · Google Ventures
2025-09-24 · Series C · $250M
2023-08-24 · Series B · $100M
2022-06-30 · Seed · $30M

Leadership Team

Chris Lattner
CEO & Co-Founder
Tim Davis
Co-Founder & President

Company data provided by Crunchbase