AI Engineer - Data Platform jobs in United States

Traversal · 3 months ago

AI Engineer - Data Platform

Traversal is an AI Site Reliability Engineer (SRE) trusted by large enterprises to manage complex production incidents. The role involves designing, building, and maintaining backend systems for an AI-driven observability platform, ensuring reliability and performance across various deployments.

Artificial Intelligence (AI) · Software · Software Engineering

Responsibilities

Architecture & Implementation: Contribute to the design and implementation of scalable, resilient infrastructure systems that power AI-driven root cause analysis and observability workflows. These systems must work in a variety of on-premises deployment environments
Low-Level System Design: Work on the foundational building blocks of our infrastructure, ensuring efficient use of resources and high performance at scale
Performance Optimization: Profile and tune backend systems to improve throughput, reduce latency, and minimize bottlenecks across the stack
Observability Systems: Help build and maintain the internal observability stack—logs, metrics, and traces—used by our agents to understand and act on production issues
Hybrid Infrastructure: Support cloud and on-prem architecture to serve both SaaS and enterprise customers
Data Infrastructure: Develop and maintain low-latency, high-throughput pipelines using tools like Kafka, Postgres, and S3 for real-time telemetry workflows
Tooling & Automation: Contribute to infrastructure-as-code, CI/CD tooling, and deployment systems to increase platform velocity and stability
Cross-Team Collaboration: Work with AI, platform, and product teams to ensure smooth integration and shared reliability goals
Using Traversal Internally: Help ensure our own observability tooling supports how we debug, monitor, and operate our systems

Qualifications

Rust · Distributed systems · Performance optimization · Low-level system design · Debugging · Kafka · Terraform · Python · AI familiarity

Required

Professional experience with Rust (our primary language for infrastructure), or strong systems-level programming experience in OCaml, C++, C or Zig
Experience building distributed systems using a variety of application-appropriate datastores (e.g., Postgres, object storage)
Strength in debugging across cloud infrastructure, networking layers, and production systems (instrumentation, provisioning, bug fixes, reliability improvements)
Experience with performance profiling and optimization in backend systems
Exposure to low-level system design concepts (e.g., concurrency models, storage internals, and OS- and DB-level tuning)

Preferred

Experience making complex software systems observable using logs, metrics, and traces
Familiarity with Python-based ecosystems
Background in large-scale, complex, data-driven applications, and familiarity with event streaming platforms such as Kafka
Experience provisioning and managing infrastructure using Terraform, Pulumi, or other IaC tools
Familiarity with AI or LLM-powered products

Benefits

Health insurance
Startup equity
Flexible time off
Plenty of in-office snacks

Company

Traversal

Traversal is building the AI SRE for the enterprise.

Funding

Current Stage: Early Stage
Total Funding: $48M
Key Investors: Sequoia Capital, Kleiner Perkins
2025-06-20 · Seed
2025-06-18 · Series A · $48M
Company data provided by Crunchbase