Drafted · 17 hours ago

Senior Backend / ML Ops Engineer

San Francisco, CA

Full-time

Onsite

Senior Level

$150K/yr - $300K/yr

5+ years exp

Drafted is unlocking creativity in the physical world by building foundational models and generative pipelines for floor plans and renderings. The role involves working across the software stack to optimize user experiences and collaborating closely with engineering and research teams.

Artificial Intelligence (AI) · Interior Design

H1B Sponsor Likely

Responsibilities

Building parallel generation pipelines where multiple workers race to fill output slots, with dynamic filtering based on post-processing results

Implementing claim coordination to prevent duplicate work, fallback logic to use best-available generations when hitting retry limits, and caching mechanisms to reuse generations across jobs (same user regenerating with the same prompt)

Developing coordination mechanisms for capacity-constrained pipelines where maximum concurrency is fixed (reserved GPU instances, instance quotas, API rate limits) and peak demand exceeds available capacity—implementing backpressure, admission control, and retry logic to prevent overwhelming downstream consumers

Implementing timeout and cleanup policies that are neither overly conservative nor prone to terminating legitimately slow work, while accounting for (1) high variance in computational complexity (p99 is 10x p50) and (2) variable parallelism, where completion time depends on the concurrent worker count (which fluctuates with queue dynamics and capacity constraints)
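As an illustration of the first two responsibilities, here is a minimal sketch of workers racing to fill output slots, with a post-processing filter and a fallback to the best rejected generations once a shared retry limit is hit. It is not Drafted's implementation: `fill_slots`, `generate`, `passes_filter`, and the score-based fallback are hypothetical names and assumptions chosen for the example, and it runs in a single asyncio process rather than across GPU workers.

```python
import asyncio


async def fill_slots(generate, passes_filter, n_slots, n_workers, max_attempts):
    """Race n_workers to fill n_slots; fall back to the best rejects at the retry limit.

    generate() is a hypothetical coroutine returning (score, generation).
    passes_filter(score) is a hypothetical post-processing check.
    """
    accepted = []   # generations that passed post-processing
    rejected = []   # (score, generation) pairs kept only for fallback
    attempts = 0    # attempt budget shared by all workers

    async def worker():
        nonlocal attempts
        while len(accepted) < n_slots and attempts < max_attempts:
            attempts += 1                  # claim an attempt before awaiting
            score, gen = await generate()
            if len(accepted) >= n_slots:
                return                     # another worker filled the last slot
            if passes_filter(score):
                accepted.append(gen)
            else:
                rejected.append((score, gen))

    await asyncio.gather(*(worker() for _ in range(n_workers)))

    # Fallback: if the retry limit was hit, top up with the best-scoring rejects.
    rejected.sort(key=lambda pair: pair[0], reverse=True)
    missing = n_slots - len(accepted)
    return accepted + [gen for _, gen in rejected[:missing]]
```

Because asyncio only switches tasks at `await`, the attempt claim and the slot check need no lock in this sketch. A multi-process or multi-machine version would need an atomic claim in a shared store instead.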

Qualifications

GPU-based inference services · Job orchestration · Cloud infrastructure · Python · TypeScript · Rust · ML model fine-tuning · 5+ years coding experience · Fan-out architectures · Observability implementation

Required

Building and scaling GPU-based inference services, optimizing for both low latency and high resource utilization

Job orchestration and load balancing with parallel generations, heterogeneous resource constraints (GPU, CPU, I/O), and multi-tiered queues

Implementing observability for latency attribution and failure diagnosis for multi-stage, asynchronous, and cross-platform pipelines

Designing fan-out architectures where upstream job completion triggers multiple independent downstream consumers that have mixed criticality, with some consumers blocking and others best-effort

Familiarity with modern cloud infrastructure: managed databases, job queues, edge compute/CDN, and PaaS deployment platforms

5+ years of coding experience

Vibe coding (with critical judgment about how, when, and how much it is used)

A variety of backends (servers in Python, TypeScript, Rust)

Deployments to different infrastructures (AWS, Cloudflare, Railway, etc.)

Tried fine-tuning an ML model

Knowledge of training infrastructure, especially distributed GPU training across multiple nodes
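The fan-out requirement listed above (upstream job completion triggering blocking and best-effort consumers) can be sketched roughly as follows. This is an assumption-laden illustration, not the company's design: `fan_out`, the consumer callables, and the `best_effort_timeout` parameter are hypothetical, and a production system would more likely use a job queue than in-process tasks.

```python
import asyncio


async def fan_out(result, blocking, best_effort, best_effort_timeout=1.0):
    """Run blocking consumers to completion; contain best-effort failures.

    blocking and best_effort are lists of hypothetical coroutine functions
    that each take the upstream job result.
    """

    async def guarded(consumer):
        # Best-effort consumers get a time limit and never raise into the caller.
        try:
            await asyncio.wait_for(consumer(result), best_effort_timeout)
            return None
        except Exception as exc:  # includes the timeout raised by wait_for
            return exc

    # Start best-effort consumers first so they overlap with the blocking ones.
    background = [asyncio.create_task(guarded(c)) for c in best_effort]

    # A failure in any blocking consumer propagates and fails the job.
    await asyncio.gather(*(c(result) for c in blocking))

    # Return best-effort errors for logging; they do not fail the job.
    return [e for e in await asyncio.gather(*background) if e is not None]
```

The returned list gives the caller something to log or count, which is one simple way to keep best-effort failures visible without letting them block the job.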

Benefits

1-2% equity

Company

Drafted

Drafted.ai is an AI-powered platform that creates personalized home floor plans and layouts based on user inputs.

Founded in 2025

San Francisco, California, USA

2-10 employees

https://www.drafted.ai

H1B Sponsorship

Drafted has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted segment represents job fields similar to this job]

Trends of Total Sponsorships: 2020 (1)

Funding

Current Stage

Early Stage

Total Funding

$1.65M

Key Investors

Convective Capital

2025-12-23 · Pre-Seed · $1.65M

Company data provided by Crunchbase