Together AI · 1 month ago
Network Architect
Together AI is a research-driven artificial intelligence company focused on building the next-generation AI compute platform. As a Network Architect, you will define and evolve the global network architecture that supports AI training and research, collaborating with various teams to ensure optimal performance and resiliency of the network.
AI Infrastructure, Artificial Intelligence (AI), Generative AI, Internet, IT Infrastructure, Open Source
Responsibilities
Define and evolve Together AI’s global routing and backbone architecture, spanning self-built data centers, partner colocation sites, PoPs, cloud regions, and interconnect fabrics
Establish the end-to-end topology strategy for high-bandwidth AI workloads: east–west fabrics, spine/superspine/core, DCI, and cross-region interconnect
Design traffic engineering, load balancing, and capacity planning models to ensure low latency, deterministic performance, and fault tolerance at scale
Develop the multicloud interconnect and peering strategy, including BGP policy frameworks, route leak mitigation, and security posture across heterogeneous networks
Architect the control-plane stack for programmability, stability, and automation—including routing design, provisioning, configuration management, and state consistency
Establish foundational observability primitives for a global backbone (telemetry, flow sampling, path validation, synthetic testing, health models)
Work closely with compute, storage, hardware, and data platform teams to ensure network design meets the performance demands of distributed AI training workloads
Collaborate with operations and NOC teams to ensure designs are supportable, debuggable, and resilient under real-world failure conditions
Provide architectural direction and mentorship to engineers across the org, influencing long-term strategy for both physical and virtual network domains
Model evolving topologies for next-generation workloads (multi-Tbps east–west, high fan-in/fan-out distributed systems, GPU cluster fabrics)
Evaluate and guide the adoption of emerging technologies: advanced optical transport, RoCEv2, high-speed Ethernet fabrics, InfiniBand overlays, EVPN/VXLAN, SR-MPLS/SRv6, programmable data planes, and hardware offload
Qualifications
Required
Have deep experience designing and operating large-scale GPU clusters or HPC-style compute fabrics, and understand the unique demands these workloads place on network design (east–west dominance, congestion behavior, fan-in/fan-out patterns, loss sensitivity)
Are fluent in building high-throughput data center fabrics (leaf–spine/superspine/core) that support tens of thousands of GPUs, multi-terabit east–west traffic, and strict performance SLAs
Have architected or operated RoCEv2 or lossless Ethernet environments at scale—including PFC/ECN tuning, congestion control, and end-to-end stability considerations
Are experienced designing backbone and DCI architectures that support GPU training clusters across multiple regions, interconnect exotic fabrics, and handle high-volume synchronization traffic
Have led architecture for networks spanning multiple clouds, private backbones, and diverse PoPs, and understand how AI workloads behave across these domains
Design with operational realities in mind: observability, capacity modeling, automation, telemetry, and failure-mode analysis for GPU-heavy environments
Are comfortable setting architectural direction in fast-moving environments where compute, storage, and network evolution are tightly coupled
Benefits
Startup equity
Health insurance
Other competitive benefits
Company
Together AI
Together AI is a cloud-based platform for building open-source generative AI and the infrastructure for developing AI models.
H1B Sponsorship
Together AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (19)
2024 (6)
2023 (3)
Funding
Current Stage: Growth Stage
Total Funding: $533.5M
Key Investors: Salesforce Ventures, Lux Capital
2025-02-20 Series B · $305M
2024-03-13 Series A · $106M
2023-11-29 Series A · $102.5M
Company data provided by Crunchbase