Cohere · 2 months ago
Member of Technical Staff, Model Efficiency
Cohere is a company dedicated to scaling intelligence to serve humanity through advanced AI systems. The Member of Technical Staff in Model Efficiency will focus on building reliable ML systems and improving LLM inference efficiency by optimizing core performance metrics and collaborating with modeling and systems teams.
Artificial Intelligence (AI), Foundational AI, Generative AI, Machine Learning, Natural Language Processing
Responsibilities
Develop techniques that improve how models execute in production, driving lower latency, higher throughput, and consistent quality across diverse workloads
Work across the inference stack to improve core performance metrics by diving deep into model execution, identifying bottlenecks, and developing innovative optimizations
Collaborate closely with modeling and systems teams to experiment, measure, and ship improvements that meaningfully accelerate inference
Qualifications
Required
5+ years of experience writing high-performance, production-quality code
Strong programming skills in C++ or Python (Rust/Go also welcome)
Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang, etc.)
Ability to diagnose and resolve performance bottlenecks across the model execution stack
A strong bias for action — you ship fast, measure impact, and iterate
Preferred
GPU programming, CUDA, or low-level systems optimization
Language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
Scaling performance-critical distributed systems (e.g., computation, search, storage)
Benefits
An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days!)
Company
Cohere
Cohere is an enterprise AI firm developing secure and private AI technology to address real-world business challenges.
H1B Sponsorship
Cohere has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (11)
2024 (14)
2023 (13)
2022 (5)
2021 (2)
Funding
Current Stage: Late Stage
Total Funding: $1.71B
Key Investors: Government of Canada, Tiger Global Management, Index Ventures
2025-09-24: Series D · $100M
2025-08-14: Series D · $500M
2025-06-17: Secondary Market
Company data provided by Crunchbase