AI Fabrics for Neo and Sovereign Clouds: Why Non-Blocking Design Matters

Introduction

AI Ethernet switching provides the backbone for connecting GPUs. In our previous blog, we discussed the different types of AI fabrics and the advantages Ethernet holds. These fabrics are key to how Neo Clouds and Sovereign Clouds build next-generation AI clusters, where idle GPUs and congestion carry significant costs. This blog discusses the need for non-blocking network designs in back-end fabrics to eliminate bottlenecks and keep GPUs efficient.

Neo Cloud and Sovereign Cloud Definitions

To quickly ground the discussion, here is how we define these two categories. Neo Clouds are next-generation, GPU-centric cloud providers tailored to the GPUaaS market (CoreWeave, Nebius, Lambda, or the new foundation model builders like Anthropic, OpenAI, xAI). Sovereign Clouds (HUMAIN [Saudi Arabia], G42 [UAE], Japan Sovereign AI) are nation-state-controlled infrastructure ensuring data residency and security, and can be built at the national level or via public/private partnership.

In our Cloud CAPEX and DC Networking research, we expect Neo Clouds to spend over $1T on IT equipment (compute, storage, and networking) through the end of the decade, with over $100B of that going to DC networking. We expect Sovereign Cloud spend to approach $400B over the same period, with network investment approaching $40B. Both segments are incremental to what would be considered the “Rest of the Cloud” and traditional enterprise investments.

Why Non-Blocking Fabrics Matter

Many AI training job failures and costly inefficiencies stem from network faults and delayed job completion, which translate directly into GPU idle time and diminished returns. The fabric’s ability to move large east-west traffic flows at low latency, absorb unpredictable, bursty workloads at scale, and safely move data between training clusters, inference endpoints, and cloud environments is key. A well-designed AI network is therefore critical, and it is more than just bigger pipes: it is built on a non-blocking Ethernet fabric that ensures secure, high-speed, efficient, and lossless data transfer between GPUs. While every customer is different, even a single failed optic on a network link can cost thousands of dollars in lost GPU time.

AI Fabrics Power the Neo and Sovereign Clouds

AI fabrics represent a paradigm shift from traditional data center networks to intent-driven, AI-native overlays. They integrate compute, storage, and networking into a unified plane, handling everything from massive language model training runs to real-time inference; the network determines the performance of the system. Neo Clouds, as emerging GPU-rental specialists, thrive on these fabrics to deliver elastic, high-density AI resources without legacy constraints. Sovereign Clouds, meanwhile, embed sovereignty from the ground up: data stays within borders, processed by vetted hardware to comply with regulations such as the EU AI Act or Saudi Arabia’s NDMO guidelines. Non-blocking fabrics guarantee full bidirectional bandwidth, with no port becoming a choke point under load.
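To make “non-blocking” concrete, here is a minimal sketch of the oversubscription check; the 64-port, 800G leaf split is an illustrative assumption, not a specific Cisco configuration.

```python
# Minimal sketch of the non-blocking check for a leaf switch.
# The 64-port, 800G split below is an illustrative assumption.

def oversubscription(downlinks: int, down_gbps: float,
                     uplinks: int, up_gbps: float) -> float:
    """Ratio of GPU-facing to spine-facing capacity; <= 1.0 is non-blocking."""
    return (downlinks * down_gbps) / (uplinks * up_gbps)

# Hypothetical 64-port 800G leaf: 32 ports down to GPUs, 32 up to spines.
ratio = oversubscription(downlinks=32, down_gbps=800, uplinks=32, up_gbps=800)
print(f"Oversubscription: {ratio:.1f}:1")  # 1.0:1 -> non-blocking
```

Any design that allocates more leaf ports to GPUs than to spines pushes this ratio above 1:1 and introduces a potential choke point under load.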

Cisco’s Silicon One ASIC: The Non-Blocking Engine Behind Its Fabrics

Cisco Silicon One is a complete portfolio of networking silicon spanning AI, hyperscaler, next-gen cloud provider, data center, enterprise, and service provider use cases. Introduced in 2019, Cisco Silicon One plays a critical role in major networks around the world. It provides a single programmable ASIC architecture unifying routing and switching, purpose-built for efficiency, scalability, programmability, and security. On the switching side, Cisco’s current Silicon One portfolio ranges from 25 Gbps to 800 Gbps per port and up to 51.2 Tbps per ASIC, with a solid roadmap to 1.6 Tbps ports and 102.4 Tbps ASICs as customers move towards higher speeds and larger-radix switches. With its high-radix architecture, Cisco Silicon One enables scalability, ultra-low latency, and reduced job completion times.
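As a quick sanity check on what those capacity figures imply for switch radix, the math below uses only the numbers quoted above:

```python
# Switch radix at a given per-port speed, from the capacity figures above:
# 51.2 Tbps ASICs shipping today, 102.4 Tbps on the roadmap.

def port_count(asic_tbps: float, port_gbps: float) -> int:
    """Number of full-rate ports an ASIC of a given capacity can serve."""
    return int(asic_tbps * 1000 // port_gbps)

print(port_count(51.2, 800))    # 64 x 800G ports per 51.2T ASIC
print(port_count(102.4, 1600))  # 64 x 1.6T ports per 102.4T ASIC
print(port_count(51.2, 400))    # 128 x 400G ports: higher radix at lower speed
```

Higher radix matters because more ports per ASIC means fewer tiers, fewer hops, and lower latency for a cluster of a given size.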

Silicon One has a hierarchical pipeline that supports full-mesh topologies with zero blocking. While Silicon One was designed before the AI boom, the ASIC is well matched to AI, and enhancements like ML-based congestion control and deeper ties to Ultra Ethernet help drive the performance needed in Neo and Sovereign Clouds. Cisco recently unveiled the P200, the latest addition aimed at scale-across deployments, seamlessly extending AI workloads and optimizing resources across multiple data center locations.

DC Switching: The Fabric’s Backbone

800 Gbps is the sweet spot for today’s AI fabrics, with a clean line of sight to 1.6 Tbps before the end of the decade. These fabrics can mix Fixed and Modular platforms, but operators tend to prefer Fixed. A two-tier leaf/spine architecture can handle most AI factories in this category, with a third tier added for the largest deployments; the sketch below shows why two tiers go so far. Cisco offers a mix of Nexus 9000 and Cisco 8000 switches based on customers’ unique requirements. We believe the SONiC-based Cisco 8000 will be more popular, given the high-end nature of these customers’ buildouts.
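A back-of-the-envelope sizing sketch, assuming identical k-port switches and a non-blocking 1:1 split at each leaf (illustrative, not a specific Cisco design):

```python
# Scale of a non-blocking two-tier leaf/spine fabric built from identical
# k-port switches, with half of each leaf's ports facing GPUs and half
# facing spines (1:1, non-blocking). Illustrative assumptions only.

def two_tier_max_endpoints(k: int) -> int:
    """Max GPU-facing ports in a non-blocking two-tier Clos of k-port switches.
    Each leaf offers k/2 downlinks; each of the k/2 spines can attach k
    leaves, so the fabric supports k leaves: (k/2) * k = k^2 / 2 endpoints."""
    return (k // 2) * k

for k in (32, 64, 128):
    print(f"{k}-port switches -> up to {two_tier_max_endpoints(k)} GPUs")
# 64-port switches (e.g., 51.2T at 800G) reach 2,048 GPUs in two tiers.
```

Beyond roughly k²/2 endpoints, a third tier (or larger-radix switches) is needed, which is exactly the trade-off the largest deployments face.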

Data center switching forms the skeletal structure of AI fabrics, evolving from 400G to 800G spines to feed AI’s voracious bandwidth. Cisco’s Nexus series, powered by Silicon One, leads here: the Nexus 9000 delivers non-blocking Clos fabrics, scaling to thousands of ports without oversubscription. Strong RoCE support and low-latency fabrics reduce the impact of link flaps and improve model performance. Depending on the size of the operator, having to pause and roll back to a checkpoint can cost over $1M per occurrence.
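To put a number on that checkpoint claim, a simple sketch follows; the GPU count, hourly rate, and hours lost are hypothetical inputs for illustration, not measured figures.

```python
# Illustrative cost of rolling back to a checkpoint after a fabric fault.
# All inputs are hypothetical assumptions, not measured figures.

def rollback_cost(gpus: int, gpu_hourly_usd: float, lost_hours: float) -> float:
    """Cost of GPU time burned redoing work since the last checkpoint."""
    return gpus * gpu_hourly_usd * lost_hours

# Hypothetical: 16,384 GPUs at $2.50/GPU-hour, redoing 30 hours of training.
print(f"${rollback_cost(16_384, 2.50, 30):,.0f}")  # ~$1.2M per occurrence
```

At that scale, even modest checkpoint intervals make each avoided rollback worth far more than the incremental cost of a non-blocking fabric.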

Some new Sovereign Clouds use air-gapped compliance zones and advanced security/network management platforms to provide operational transparency and uptime guarantees. There is also a significant set of government mandates: Middle East governments, for example, enforce strict local data residency, cybersecurity controls, and banking compliance (SAMA, UAE NCA), driving design and vendor decisions for infrastructure.

Cisco’s Middle East Sovereign Cloud Deployments

The Middle East is a sovereign cloud hotspot, with nations like Saudi Arabia investing $40B+ in AI by 2030 to diversify from oil. In May 2025, Cisco partnered with Saudi Arabia’s HUMAIN to construct the kingdom’s foundational AI infrastructure, including sovereign cloud data centers in Riyadh and Jeddah featuring Silicon One-based fabrics for non-blocking AI workloads. The setup supports 1 PB/s of intra-cluster bandwidth, enabling training on national datasets without cross-border data movement.
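To put that intra-cluster figure in perspective, a quick conversion (assuming decimal units and 800G links) shows the scale involved:

```python
# Converting the quoted 1 PB/s of intra-cluster bandwidth into link counts.
# Assumes decimal units (1 PB/s = 10^15 bytes/s) and 800 Gbps links.

bits_per_s = 1e15 * 8                 # 8e15 bits/s = 8,000 Tbps
links_800g = bits_per_s / 800e9       # equivalent 800G links
print(f"{bits_per_s / 1e12:,.0f} Tbps ~= {links_800g:,.0f} x 800G links")
# 8,000 Tbps ~= 10,000 x 800G links of aggregate capacity
```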

Zain KSA’s July 2025 collaboration with Cisco is set to deploy end-to-end AI network fabrics across 5G edges and core DCs. Here, non-blocking DC switching handles telco-grade AI for predictive maintenance, with Silicon One ASICs ensuring 99.999% uptime in air-gapped zones. Cisco has also expanded its cloud data centers in Saudi Arabia, blending Neo Cloud elasticity with sovereign controls to train regional LLMs on Arabic IP.

These deployments aren’t isolated; they showcase the speed and scale of Sovereign AI. Builds that just a few years ago would have been considered hyperscale in size and complexity are now delivered via partnerships between vendors and between the public and private sectors.

Vendor Partnerships Enhance the Ecosystem

No fabric thrives in isolation; it requires compute, storage, and strong SP/integrator relationships. Cisco’s vendor alliances have helped drive these AI deployments, blending best-of-breed technology for Neo and Sovereign Clouds.

Going back to the HUMAIN example above, the Cisco/NVIDIA partnership integrates NVIDIA GPUs into Silicon One fabrics, creating hybrid neo-sovereign platforms. The Cisco Nexus 9000 reference architecture is NVIDIA-certified for AI clusters and supports advanced RDMA over Converged Ethernet (RoCEv2) benchmarking. The partnership also yields unified management via Cisco’s Nexus Dashboard, converging ACI and NX-OS for simplified operations.

These collaborations aren’t just handshakes, as seen in Cisco’s backlog growth. They ultimately provide a faster time to first token (faster time to AI), which is something every customer is trying to achieve. Faster token processing and a lower technical burden to operate are key when an extra day of delay or downtime is measured in the $5-10M range for every $1B invested.
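That rule of thumb is easy to apply to any buildout; in the sketch below, the $3B cluster size is a hypothetical example, while the $5-10M/day per $1B range comes from the figures above.

```python
# Downtime rule of thumb from the text: $5-10M per day per $1B invested.
# The $3B cluster capex below is a hypothetical example.

def daily_delay_cost(capex_billions: float,
                     low_per_b: float = 5e6,
                     high_per_b: float = 10e6) -> tuple[float, float]:
    """Estimated daily cost range of delay or downtime for a given capex."""
    return capex_billions * low_per_b, capex_billions * high_per_b

low, high = daily_delay_cost(3.0)  # hypothetical $3B AI cluster
print(f"${low/1e6:.0f}M-${high/1e6:.0f}M per day of delay")  # $15M-$30M
```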

Next Steps for Customers That Are Earlier on Their Journey

AI fabrics are redefining clouds, but non-blocking design is what ensures they deliver on sovereignty and scale. Customers who are earlier on their journey, including those in the enterprise space, can learn from these early deployments and should start planning their own move into AI. Start small, whether that means a pilot project focused on networking or a small workload to which these AI principles can be applied; it’s neither too early nor too late to start.

The two links below are a great place to learn more. The first features benchmarking statistics and good charts on AI fabric performance. The second dives a little deeper into design and some of the changes in AI fabric infrastructure.

###