Etched · 17 hours ago
Infrastructure Software Engineer
Etched is building the world’s first AI inference system purpose-built for transformers. The Infrastructure Software Engineer will lead the development of next-generation infrastructure tooling to enable faster iterations and more reliable builds for AI ASICs and software, focusing on building a hybrid high-performance compute cluster and a state-of-the-art observability stack.
Artificial Intelligence (AI) · Semiconductor · Electronics · Hardware · AI Infrastructure · Computer
Responsibilities
Architect and Scale Distributed Compute Systems: Design and build the orchestration layers that drive our hybrid high-performance clusters—enabling simulation, synthesis, and continuous integration of AI ASICs at unprecedented scale
Build Infrastructure-as-Code Systems: Develop and maintain a fully programmable infrastructure control plane to ensure reproducibility, auditability, and rapid iteration across the entire stack
Optimize End-to-End Developer Experience: Create tools and abstractions that empower engineers to harness massive parallelism without worrying about the underlying complexity
Workload Elasticity, Reliability, and Efficiency: Prototype and execute workload orchestration and migration strategies between on-premise and cloud environments, balancing performance, storage availability and replication, uptime, and cost across heterogeneous hardware and compute backends
Implement real-time telemetry and tracing systems that surface insights from millions of metrics, enabling proactive debugging and system optimization
Push the Limits of Observability: Build a full observability stack that includes dashboards, alerting, automated responses, and a synthetic testing framework that tests infrastructure performance and reliability across application and data flows, so that issues are caught before they affect development and productivity workflows
Qualifications
Required
Are a systems-minded software engineer who loves building foundational platforms, working close to the metal and the cloud, and solving high-leverage problems at scale
Are a deeply technical engineer who treats infrastructure as a software problem, prioritizing clean abstractions, version control, small change lists, easy rollbacks, testing, and long-term maintainability over ad hoc configuration
Have strong programming skills in languages such as Python, Go, Rust, and C++, and are comfortable building production-grade tooling
Possess expert-level knowledge of Linux, virtualization, containerization, and CI/CD pipelines, with a deep understanding of how to debug, optimize, and scale complex systems
Are familiar with Infrastructure as Code tools like OpenTofu, Ansible, or Puppet, and enjoy designing declarative, reproducible infrastructure systems
Understand and use PromQL and other telemetry/query languages, have used LLMs to extract insights from real-time metrics, and know how to architect and tune observability stacks
Have a track record of debugging and resolving difficult hardware-software integration problems across bare-metal systems, networks, and distributed workloads
Can lead and mentor technical teams, guiding design decisions and helping others develop sound engineering instincts
Have 8+ years of experience in infrastructure engineering, systems programming, or backend software development - ideally in environments where performance, scale, or hardware interaction mattered
Are driven by curiosity, take initiative, and have an innate sense of ownership — you thrive in uncharted territory, design for edge cases, and love making systems more powerful, reliable, and elegant
Preferred
Familiarity with the Bazel build system
Deep understanding of ASIC development flows, especially those involving Synopsys, Cadence, and Verilator, including how EDA tools interact with infrastructure for simulation, synthesis, and verification
Hands-on experience architecting systems with AWS, GCP, or Azure, including hybrid on-prem/cloud deployments, workload migration strategies, and cloud-native orchestration tooling
Experience monitoring, provisioning, and debugging bare-metal servers, network hardware, and high-performance storage systems in rack-scale environments
Comfort with profiling and optimizing compute environments for single-threaded latency, memory-bound workloads, or I/O throughput, especially in the context of simulation or CI performance
Proficiency in building or operating telemetry systems at scale using Prometheus, Grafana, Loki, VictoriaMetrics, and tools for distributed tracing, log aggregation, and real-time alerting across heterogeneous channels (SMS, email, push alerts, etc.)
Benefits
Medical, dental, and vision packages with generous premium coverage
$500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office
Company
Etched
Building the hardware for superintelligence
H1B Sponsorship
Etched has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (9)
2024 (11)
2023 (1)
Funding
Current Stage: Growth Stage
Total Funding: $625.36M
Key Investors: Stripes, Positive Sum, Primary Venture Partners
2026-01-14 · Series Unknown · $500M
2024-06-25 · Series A · $120M
2023-05-16 · Seed · $5.36M
Recent News
alleywatch.com
2026-01-20
Company data provided by Crunchbase