Andiamo · 3 days ago

Network Reliability Engineer - Decentralized High-Performance Computing Leader

Seattle, WA

Full-time

Onsite

Senior Level, Lead/Staff

7+ years exp

Andiamo is a globally recognized staffing and consulting firm specializing in technology and go-to-market professionals. They are seeking a Senior Network Reliability Engineer to architect and optimize high-performance network fabrics for AI and HPC workloads, ensuring seamless and efficient operation of advanced compute infrastructure.

Consulting · Human Resources · Information Technology · Staffing Agency

Comp. & Benefits

H1B Sponsor Likely

Responsibilities

Engineer next-generation network performance: Fine-tune TCP/IP, RDMA (RoCE), kernel-bypass technologies (DPDK, XDP, eBPF), and NIC offloads to push latency and throughput to their physical limits for high-performance computing workloads

Deploy and scale at massive capacity: Roll out and optimize large-scale network fabrics across datacenters using top-tier hardware (Arista, NVIDIA/Mellanox, Juniper, and more). Configure advanced BGP/EVPN topologies, spine-leaf architectures, and congestion management for lossless transport

Automate network intelligence: Build telemetry pipelines and automated systems for real-time performance monitoring, packet-loss detection, and predictive congestion analysis across complex environments

Debug at the deepest levels: Lead investigations into packet loss, latency anomalies, and congestion hot spots — diving into kernel traces, switch firmware, and flow control mechanisms to pinpoint and resolve issues

Collaborate with the industry’s best: Work directly with hardware and silicon vendors to debug firmware, optimize RDMA and RoCE paths, validate optics, and integrate emerging technologies like 800G+ links and CPO/LPO networking

Design for resilience and reliability: Simulate large-scale network failures, run game-day exercises, and turn lessons learned into robust automation, playbooks, and SLOs that drive measurable reliability improvements

Qualifications

Network engineering · Linux networking stack · Low-latency networking · Python programming · Infrastructure-as-Code · DPDK · XDP · Automation · Collaboration · Problem-solving

Required

7+ years of experience in network engineering, SRE, or performance infrastructure roles — ideally within AI, HPC, or large-scale cloud environments

Deep understanding of the Linux networking stack, including kernel-level debugging, TCP/IP, InfiniBand, and RoCE

Proven hands-on experience managing multi-layer datacenter networks, network overlays (VXLAN, Geneve), and multi-vendor environments (Arista, NVIDIA/Mellanox, Juniper, etc.)

Strong programming proficiency in Python, Go, or Rust, and experience with Infrastructure-as-Code and modern CI/CD practices

Practical knowledge of DPDK, XDP, eBPF, and hardware acceleration frameworks used in low-latency networking

Demonstrated success in building and scaling high-performance, low-latency network architectures for data-intensive systems or compute clusters

Company

Andiamo

Glassdoor 4.0

The Talent Partners for the AI Revolution.

Founded in 2003

New York, New York, USA

201-500 employees

http://andiamogo.com

H1B Sponsorship

Andiamo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).

Trends of Total Sponsorships

2022 (2)

2021 (1)

Funding

Current Stage

Growth Stage

Leadership Team

Patrick McAdams

CEO & Co-Founder

Steven Kottler

CFO

Company data provided by Crunchbase