Senior Backend / ML Ops Engineer jobs in United States

Drafted · 17 hours ago

Senior Backend / ML Ops Engineer

Drafted is unlocking creativity in the physical world by building foundational models and generative pipelines for floor plans and renderings. The role involves working across the software stack to optimize user experiences and collaborate closely with engineering and research teams.

Artificial Intelligence (AI), Interior Design
H1B Sponsor Likely

Responsibilities

Building parallel generation pipelines where multiple workers race to fill output slots, with dynamic filtering based on post-processing results
Implementing claim coordination to prevent duplicate work, fallback logic to use best-available generations when hitting retry limits, and caching mechanisms to reuse generations across jobs (same user regenerating with the same prompt)
Developing coordination mechanisms for capacity-constrained pipelines where maximum concurrency is fixed (reserved GPU instances, instance quotas, API rate limits) and peak demand exceeds available capacity—implementing backpressure, admission control, and retry logic to prevent overwhelming downstream consumers
Implementing timeout and cleanup policies that are neither overly conservative nor prone to prematurely terminating legitimately slow work. These policies must account for (1) high variance in computational complexity (p99 is 10x p50) and (2) variable parallelism, where completion time depends on the concurrent worker count, which fluctuates with queue dynamics and capacity constraints
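
To illustrate the first two responsibilities, here is a minimal Python sketch of workers racing to fill output slots, with a post-processing filter, a shared retry budget, and a fallback to the best-available generations. It is not Drafted's implementation: `SlotBoard`, `fake_generate`, `run_job`, the scoring formula, and the threshold values are all hypothetical, and caching across jobs is left out. The claim step needs no lock here only because asyncio runs on a single thread; a multi-process deployment would presumably need an atomic shared store instead.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SlotBoard:
    """Shared state for one job. All names here are hypothetical."""
    num_slots: int
    max_attempts: int
    threshold: float
    accepted: list = field(default_factory=list)  # passed the filter
    rejected: list = field(default_factory=list)  # (score, generation) pairs
    attempts: int = 0

    def claim(self):
        """Hand out a unique attempt id, or None when no more work is needed."""
        full = len(self.accepted) >= self.num_slots
        if full or self.attempts >= self.max_attempts:
            return None
        attempt = self.attempts
        self.attempts += 1
        return attempt

    def submit(self, generation, score):
        """Accept a generation if it passes the filter and a slot is open."""
        if score >= self.threshold and len(self.accepted) < self.num_slots:
            self.accepted.append(generation)
        else:
            self.rejected.append((score, generation))

    def results(self):
        """Accepted generations, padded with the best-scoring rejects."""
        missing = self.num_slots - len(self.accepted)
        best = sorted(self.rejected, key=lambda pair: pair[0], reverse=True)
        return self.accepted + [gen for _, gen in best[:missing]]


async def fake_generate(attempt):
    """Stand-in for a GPU call; the score is a deterministic placeholder."""
    await asyncio.sleep(0)
    return f"gen-{attempt}", (attempt * 37 % 100) / 100


async def worker(board):
    # Each worker keeps claiming attempts until slots fill or the budget ends.
    while (attempt := board.claim()) is not None:
        generation, score = await fake_generate(attempt)
        board.submit(generation, score)


async def run_job(num_slots, max_attempts, threshold, num_workers=3):
    board = SlotBoard(num_slots, max_attempts, threshold)
    await asyncio.gather(*(worker(board) for _ in range(num_workers)))
    return board.results()
```

Calling `asyncio.run(run_job(2, 6, 0.9))` exhausts the retry budget without any generation passing the filter, so both slots are filled from the highest-scoring rejects. Because workers claim before they know whether earlier attempts will pass, racing can produce a few surplus generations; that is the trade-off for lower latency.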

Qualifications

GPU-based inference services, Job orchestration, Cloud infrastructure, Python, TypeScript, Rust, ML model fine-tuning, 5+ years coding experience, Fan-out architectures, Observability implementation

Required

Building and scaling GPU-based inference services, optimizing for both low latency and high resource utilization
Job orchestration and load balancing with parallel generations, heterogeneous resource constraints (GPU, CPU, I/O), and multi-tiered queues
Implementing observability for latency attribution and failure diagnosis for multi-stage, asynchronous, and cross-platform pipelines
Designing fan-out architectures where upstream job completion triggers multiple independent downstream consumers that have mixed criticality, with some consumers blocking and others best-effort
Familiarity with modern cloud infrastructure: managed databases, job queues, edge compute/CDN, and PaaS deployment platforms
5+ years experience coding
Vibe coding (with critical judgment about how, when, and how much it is used)
Varieties of backends (servers in Python, TypeScript, Rust)
Deployed to different infrastructures (AWS, Cloudflare, Railway, etc.)
Tried fine-tuning an ML model
Knowledge of training infrastructure, especially distributed GPU training across multiple nodes
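
The fan-out requirement above (blocking and best-effort consumers triggered by one upstream completion) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the company's design: `fan_out`, its parameters, and the one-second default timeout are hypothetical. Blocking consumers propagate their failures to the caller, while best-effort consumers are time-boxed and their errors are only collected.

```python
import asyncio


async def fan_out(job, blocking, best_effort, best_effort_timeout=1.0):
    """Run all consumers concurrently; only blocking failures propagate."""

    async def guarded(consumer):
        # Best-effort path: bound the runtime and turn failures into values.
        try:
            await asyncio.wait_for(consumer(job), best_effort_timeout)
            return None
        except Exception as exc:
            return exc

    # Start best-effort consumers first so they overlap with blocking ones.
    background = [asyncio.create_task(guarded(c)) for c in best_effort]
    try:
        # Any exception from a blocking consumer fails the whole fan-out.
        results = await asyncio.gather(*(c(job) for c in blocking))
    finally:
        # Always drain background tasks so none are left pending.
        outcomes = await asyncio.gather(*background)
    errors = [exc for exc in outcomes if exc is not None]
    return results, errors
```

In this sketch the caller still waits for best-effort consumers, but for no longer than `best_effort_timeout`. A production system might instead hand them to a separate queue so they never delay the critical path.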

Benefits

1-2% equity

Company

Drafted

Drafted.ai is an AI-powered platform that creates personalized home floor plans and layouts based on user inputs

H1B Sponsorship

Drafted has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2020 (1)

Funding

Current Stage: Early Stage
Total Funding: $1.65M
Key Investors: Convective Capital
2025-12-23 · Pre Seed · $1.65M
Company data provided by Crunchbase