Senior / Staff Network Reliability Engineer jobs in United States

Fluidstack · 2 months ago

Senior / Staff Network Reliability Engineer

Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. The Network Reliability Engineer will utilize deep networking expertise and software engineering to maintain high-performance network fabrics, ensuring they are fast, reliable, and cost-efficient at scale.

Cloud Computing, Cloud Storage, Generative AI, GPU, Information Technology, Machine Learning, Private Cloud, Software
H1B Sponsor Likely

Responsibilities

Super-charge the network stack. Tune TCP/IP, RDMA (primarily RoCE congestion control), kernel-bypass frameworks (DPDK, XDP, eBPF) and NIC offloads to squeeze microseconds off packet latency for AI & HPC workloads
Deploy & optimize at scale. Roll out new ToR/spine switches (from NVIDIA, Arista, Juniper, and others), validate SmartNIC and BlueField networking, configure BGP/EVPN fabrics, and optimize flow control (PFC, ECN) for zero-loss transport
Automate observability. Build NIC-to-orchestrator telemetry pipelines, packet-loss detection bots, and real-time throughput/latency dashboards
Root-cause the gnarly stuff. Lead packet captures, congestion analyses and latency regressions; turn insights into switch firmware patches, kernel tuning and topology optimizations
Drive vendor collaboration. Pair with networking vendors to debug hardware, accelerate RDMA paths, validate optics, and integrate emerging network hardware (800G/1.6T, LPO/CPO)
Continuously improve. Inject link failures, run game-days simulating network partitions and codify post-mortem learnings into SLIs/SLOs that matter to customers

Qualifications

Linux networking stack, TCP/IP tuning, RDMA expertise, Python, Go, Rust, Infra-as-Code, CI/CD, Network overlays, Performance engineering, Data-center networking, Vendor collaboration, Packet captures, Congestion analyses, Latency regressions

Required

7+ yrs in network-heavy SRE, performance engineering or data-center networking
Mastery of Linux networking stack and protocol-level debugging (TCP, IB, RoCE)
Production experience with many vendors (Mellanox/NVIDIA, Arista, Juniper, etc.), multi-layer fabrics, and network overlays (VXLAN, Geneve)
Fluency in Python, Go or Rust; solid Infra-as-Code & CI/CD chops
Familiarity with DPDK, XDP, eBPF and InfiniBand/RoCE
Proven track record scaling low-latency, high-throughput networks for AI/ML or HPC clusters

Benefits

Competitive total compensation package (cash + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.

Company

Fluidstack

FluidStack is an AI cloud platform for frontier labs and startups.

H1B Sponsorship

Fluidstack has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (2)

Funding

Current Stage: Growth Stage
Total Funding: unknown
Key Investors: Seedcamp
2025-06-01 Undisclosed
2024-10-01 Private Equity
2018-02-01 Pre Seed

Leadership Team

Gary Wu
CEO, Co-Founder
Company data provided by crunchbase