Why Networking Scale Matters for AI’s Next Phase
Introduction:
The AI infrastructure boom shows no signs of slowing. Hyperscalers, neoclouds, sovereign clouds, and enterprises are pouring hundreds of billions into GPU and XPU clusters to fuel ever-larger models. Amazon's CAPEX guide was $200B for 2026, and a slew of fresh capital should follow as Anthropic, SpaceX (xAI), and OpenAI eye IPOs this year. Yet beneath the headlines of massive CAPEX lies a critical bottleneck that receives far less attention: the network fabrics responsible for feeding data to those XPUs at the speed and scale required.
Our latest analysis highlights a growing disparity. Compute FLOPs are scaling exponentially, especially as GPUs go multi-die, and that compute is hungry for more scale-out and scale-up bandwidth. Furthermore, as the scaling laws continue to hold, frontier models clearly get more intelligent as scale-out clusters get bigger. More recently, frontier labs have been demanding larger scale-up domains to support larger mixture-of-experts (MoE) models.
Back-end scale-out network bandwidth growth, though impressive, is struggling to keep pace. In addition, limitations in switch radix hamper scale-out networks on the road to million-GPU clusters, because each additional tier of switching brings hundreds of thousands of additional optics. These gaps threaten GPU/XPU utilization, inflate Total Cost of Ownership (TCO), and could slow the industry's ability to develop frontier models that deliver on the promise of agentic AI and, ultimately, superintelligence.
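To see how radix limits bite, consider a back-of-envelope sizing of a nonblocking folded-Clos (fat-tree) fabric. The Python sketch below is illustrative only: the radix and tier figures are generic assumptions rather than any vendor's design, and the optics estimate simply counts two transceivers per switch-to-switch link.

```python
# Fat-tree sizing sketch. Assumptions: a nonblocking folded-Clos of
# identical radix-k switches supports 2 * (k/2)**L endpoints across
# L switching tiers, and each switch-to-switch link level carries one
# link (two optics) per GPU; host-facing links stay on copper.

def max_gpus(radix: int, tiers: int) -> int:
    """Maximum endpoints a nonblocking L-tier fat tree can host."""
    return 2 * (radix // 2) ** tiers

def switch_optics(n_gpus: int, tiers: int) -> int:
    """Rough transceiver count across the (tiers - 1) switch-to-switch
    link levels of a fully built fabric."""
    return 2 * n_gpus * (tiers - 1)

for radix in (64, 128):   # e.g. a 51.2T or 102.4T switch at 800G ports
    for tiers in (3, 4):
        cap = max_gpus(radix, tiers)
        print(f"radix {radix:3d}, {tiers} tiers: up to {cap:>10,} GPUs, "
              f"~{switch_optics(cap, tiers):>12,} optics at full build-out")
```

Under these assumptions, radix-64 silicon tops out around 65,000 GPUs in three tiers, so a million-GPU cluster forces a fourth tier and millions of transceivers; doubling the radix multiplies the capacity of every tier count, which is why higher-radix switching silicon is the lever that matters.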
Scale-up domains, built on protocols such as ESUN or UALink, would ideally grow into the thousands of GPUs. For latency reasons, however, these high-bandwidth domains need to sit behind a single tier of switches. Here again, the throughput and radix of the slowly evolving underlying switching silicon limit how large a scale-up domain can get at a given bandwidth per GPU.
In the Near Term, Compute Explodes While Networks Lag
Consider the trajectory of GPU/XPU compute power. NVIDIA's Hopper GPUs delivered roughly 4 PFLOPs (FP8) per chip in 2022; Blackwell pushes toward 20+ PFLOPs on the same metric, with next-gen architectures expected to continue doubling or more every 12-18 months. Aggregate cluster FLOPs for frontier models have grown from exaFLOPs in 2023 to projections of 100+ zettaFLOPs by late this decade, driven by massive parallelism across hundreds of thousands of GPUs/XPUs.
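The cluster-level arithmetic behind that trajectory is simple multiplication. In the sketch below, the per-chip figures are the round numbers above, while the GPU counts and the 100 PFLOPs next-gen figure are illustrative assumptions rather than reported specifications.

```python
# Aggregate cluster FLOPs = GPU count x per-GPU FLOPs. Per-chip numbers
# come from the text; GPU counts and the 100 PFLOPs chip are assumed.
scenarios = [
    ("2023: ~16k Hopper-class at 4 PFLOPs",         16_000,   4e15),
    ("2025: ~200k Blackwell-class at 20 PFLOPs",   200_000,  20e15),
    ("late decade: 1M next-gen at 100 PFLOPs",   1_000_000, 100e15),
]
for label, gpus, flops_per_gpu in scenarios:
    total = gpus * flops_per_gpu
    # 1 exaFLOP = 1e18 FLOPs; 1,000 exaFLOPs = 1 zettaFLOP
    print(f"{label}: {total / 1e18:,.0f} exaFLOPs")
```

Under these assumptions, aggregate compute climbs from tens of exaFLOPs to 100,000 exaFLOPs, i.e., 100 zettaFLOPs, a roughly 1,500x jump that every tier of the network must ultimately feed.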
In contrast, back-end scale-out network switch bandwidth has advanced steadily but more linearly. Ethernet switches moved from 25.6 Tbps (2022) to 51.2 Tbps (2024-2025), and 102.4 Tbps platforms are emerging in 2H 2026. Per-port speeds have ramped from 400G to 800G, with 1.6T on the horizon. Yet when normalized to the bandwidth required per GPU/XPU, especially for the all-to-all collective operations used in training large models, the compute-to-bandwidth ratio is tilting unfavorably.
The effective scale-out network bandwidth per GPU has increased about 4-5x since 2022, while compute per GPU has surged 10x+. In large clusters, this translates to networks delivering only a fraction of the ideal bisection bandwidth needed for full GPU efficiency, leading to idle time, stragglers, and reduced overall throughput. The result is that networks are becoming the hidden limiter in AI factory economics.
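A quick ratio check makes the drift concrete. The sketch below uses the round numbers cited above and assumes per-GPU NIC speeds track the era's per-port speeds (400G in the Hopper generation, 800G in the Blackwell generation); treat it as illustrative rather than measured.

```python
# FLOPs available per bit/s of scale-out bandwidth, per GPU generation.
# Per-GPU NIC speed is assumed to match the era's per-port speed.
gens = [
    ("2022 Hopper-class",     4e15, 400e9),   # 4 PFLOPs, 400G NIC
    ("2024 Blackwell-class", 20e15, 800e9),   # 20 PFLOPs, 800G NIC
]
for label, flops, nic_bps in gens:
    print(f"{label}: {flops / nic_bps:,.0f} FLOPs per bit/s of scale-out bandwidth")
```

The ratio climbing from 10,000 to 25,000 FLOPs per bit/s is exactly the unfavorable tilt described above: each bit per second of network bandwidth now has to keep 2.5x more compute busy.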
The bandwidth of scale-up networks has moved faster than scale-out, reaching 7.2 Tbps per GPU (per direction) with the NVIDIA GB200 NVL72 platform. Unlike scale-out clusters, however, the size of scale-up domains has grown slowly. Scale-up started with 8 GPUs networked together over NVLink, announced in 2014 and first deployed in the original NVIDIA DGX. It finally scaled to 72 GPUs with the NVIDIA GB200 NVL72, which started shipping in 2025, over 10 years later. Frontier labs would like to see this grow to 576 GPUs and then into the thousands to support larger MoE models, with trillions of parameters, more experts, and more active experts per GPU, all of which delivers more intelligence. That will require a fundamental redesign in silicon to provide a massive leap in radix. Here again, the network is lagging what the compute demands.
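The sizing math shows why. Holding the NVL72 figure of 7.2 Tbps per GPU per direction constant, the sketch below computes what a single switching tier must carry at larger domain sizes; the 4,608-GPU case is an illustrative stand-in for "into the thousands," not an announced configuration.

```python
# One-tier scale-up sizing at a fixed 7.2 Tbps per GPU per direction.
# In a flat design every switch plane reaches every GPU, so each plane
# needs at least one port per GPU in the domain.
GPU_TBPS_PER_DIR = 7.2

for n_gpus in (72, 576, 4_608):   # 4,608 is an assumed "thousands" case
    aggregate_pbps = n_gpus * GPU_TBPS_PER_DIR / 1_000
    print(f"{n_gpus:>5} GPUs -> {aggregate_pbps:5.2f} Pbps through one tier; "
          f"each switch plane needs >= {n_gpus} ports")
```

Growing the domain 64x while keeping a single hop demands a 64x jump in aggregate switch throughput and per-plane port count, which is the "massive leap in radix" referred to above.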
Drivers Amplifying the Stress on Network Infrastructure
Several forces are accelerating this imbalance:
Ever-larger models and context windows: Models with trillions of parameters, more experts, and million-token contexts demand massive data movement for both training and inference. All-reduce and all-gather operations now consume a larger share of runtime.
Disaggregated and distributed architectures: The shift from monolithic GPU boxes to rack-scale and disaggregated designs (e.g., GPU/XPU pools connected via fabrics) multiplies east-west traffic. Emerging patterns like inference disaggregation and multi-site training further strain intra- and inter-cluster links.
Multimodal models and physical AI: Models processing video, audio, and 3D world models are just emerging, and they consume an enormous number of tokens, driving both compute and networking demand.
These drivers turn what was once a manageable interconnect challenge into a core constraint on scaling AI effectively.
Challenges to Accelerating Network Capability Growth
Closing this gap isn't straightforward. Current-generation switch ASIC architectures face fundamental data-path limitations and power envelopes that cap radix and SerDes density. New switching silicon architectures with more throughput and higher radix are the obvious answer.
Optics must evolve as well. For scale-out, pluggable optics, while reliable, introduce power and latency overheads that compound in massive fabrics, and they are also subject to failures, often due to human handling. The industry is slowly transitioning from copper-based DACs to linear pluggable optics (LPO) and half-retimed optics (LRO), and then to near-packaged optics (NPO), co-packaged optics (CPO), and advanced multi-wavelength solutions. These promise higher density, lower power, and higher reliability, but they require breakthroughs in silicon photonics integration, thermal management, and ecosystem alignment (e.g., standards from UEC and OCP). Supply chain maturity for 2nm/3nm processes and optical components also lags compute roadmaps.
Scale-up will have to move to optics as high-bandwidth domains become larger and more I/O has to exit the compute tray. While copper has many positive attributes, it becomes a barrier at 200G SerDes and beyond. CPO, and potentially more exotic solutions like THz RF and microLED, will help increase the density and reach of the I/O needed to achieve scale-up domains of thousands of GPUs with 28.8 Tbps per package or more.
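The lane arithmetic makes the copper problem tangible. Assuming the 28.8 Tbps package figure above, the sketch below counts the electrical lanes required per direction at a given SerDes rate; the 400G rate is a forward-looking assumption, not a shipping speed.

```python
# Electrical lanes per direction implied by a fixed package bandwidth.
PKG_TBPS = 28.8                    # package I/O figure cited above

for serdes_gbps in (200, 400):     # 400G SerDes is an assumed future rate
    lanes = PKG_TBPS * 1_000 / serdes_gbps
    print(f"{PKG_TBPS} Tbps at {serdes_gbps}G per lane -> "
          f"{lanes:.0f} lanes per direction")
```

Driving 144 lanes of 200G signaling off a package over copper strains reach, routing density, and power, which is why CPO and the more exotic media mentioned above enter the conversation.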
Without these advances, network growth risks plateauing relative to GPU/XPU demands, forcing costly over-provisioning or acceptance of lower utilization.
Conclusion: The Network Must Catch Up, Or AI Progress Slows
The chasm between GPU/XPU data demands and scale-up and scale-out network delivery capability is widening, but it also represents one of the largest untapped opportunities in AI infrastructure. Vendors that solve for higher radix, higher effective bandwidth, lower latency, and disaggregated flexibility will unlock higher GPU/XPU utilization, better economics, and faster time-to-insight for customers.
Our outlook remains bullish on the overall market: AI networking could approach $200B+ annually by decade's end. But success will hinge on closing this gap through innovation in fabrics, optics, and architectures. The race is on, and the winners will be those who recognize that in the AI era, compute without commensurate networking is compute left idle.