
Clevanoo LLC · 11 hours ago

Network Architect

Clevanoo LLC is seeking a Network Architect to lead the architecture, design, deployment, and optimization of high-performance InfiniBand fabrics for large-scale GPU clusters supporting AI/ML and HPC workloads. The role involves fabric topology design, routing strategies, and ensuring maximum GPU utilization through advanced technologies and operational excellence.

Staffing & Recruiting
Hiring Manager
Maithili K

Responsibilities

Lead the design of high-performance topologies, specifically Fat-Tree (Clos) and Rail-Optimized designs, ensuring non-blocking communication for massive GPU clusters
Architect the deployment of Unified Fabric Manager (UFM) for centralized subnet management, ensuring redundancy, failover, and historical telemetry tracking
Select and tune routing engines (e.g., Up/Down, Fat-Tree, DF+) to avoid credit loops, and configure Adaptive Routing and Congestion Control to mitigate head-of-line blocking
Architect and validate NVIDIA SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) integration to offload collective operations from GPUs to the switches
Establish baseline metrics for fabric latency and throughput using the perftest suite (e.g., ib_send_lat) and NCCL tests to validate the environment against AI workload requirements
Drive the technical evaluation of NDR (400G) and XDR (800G) hardware, including switches, HCAs (Host Channel Adapters), and rigorous qualification of optical transceivers/cabling
Align with data center facility teams on power, cooling, and cable management, specifically managing Link Budget constraints and signal integrity for high-speed copper/optical links
Lead and mentor mid-level network engineers, establishing best practices for fabric operations, firmware management, and documentation

Qualifications

InfiniBand Mastery
In-Network Computing (SHARP)
UFM Expertise
GPUDirect & Storage
Advanced Troubleshooting
Host-Side Tuning
Automation
HPC Workload Managers
Orchestration
Optical Physics
Ethernet Interop

Required

Bachelor's degree in Computer Science, Electrical Engineering, or related field
8+ years of experience in high-performance networking
4+ years dedicated to InfiniBand architecture in AI or HPC environments
Demonstrated experience leading architecture initiatives and influencing cross-functional stakeholders
Deep understanding of IB architecture layers, including LID/GUID management, Partition Keys (P_Keys), Virtual Lanes (VLs), and Service Levels (SLs)
Hands-on experience deploying and tuning NVIDIA SHARP for collective offload in GPU clusters
Proficiency in configuring and managing NVIDIA UFM (Unified Fabric Manager), including UFM Telemetry and UFM Enterprise features
Expert-level knowledge of GPUDirect RDMA, NVMe-over-Fabrics (NVMe-oF), and storage multipathing over InfiniBand
Proficiency using ibdiagnet, ibqueryerrors, ibdump, and fabric-wide telemetry to identify flapping links, symbol errors, or credit starvation
Expertise in kernel- and host-level tuning, including PCIe parameter optimization, NUMA affinity, OFED driver configuration, and NCCL parameter tuning
Familiarity with automation and configuration management (Ansible, Python) for consistent fabric provisioning and compliance

Preferred

Master's degree in Computer Science, Electrical Engineering, or related field
NVIDIA Certified Associate/Professional – InfiniBand (or equivalent legacy Mellanox Academy certifications)
NVIDIA Certified Associate – AI in the Data Center (validates understanding of the GPU and network ecosystem)
Knowledge of Slurm or Kubernetes integration patterns for RDMA-capable workloads and multi-tenant isolation
Experience with Base Command Manager (BCM) or similar cluster management tools
Experience with advanced optical planning (OSFP/QSFP112), link budget analysis, and calculating Bit Error Rate (BER) impacts
Exposure to multi-site architectures and IB-to-Ethernet gatewaying (e.g., NVIDIA Skyway)

Company

Clevanoo LLC

At Clevanoo LLC, we believe that great companies are built with great talent — and we make it our mission to connect them.

Funding

Current Stage
Early Stage
Company data provided by crunchbase