Choosing InfiniBand or Ethernet? Ethernet-based AI Networking to grow at over 100% CAGR

DriveNets Highlights AI/ML Network Performance at High Scale with Broadcom’s Jericho 2C+ and Jericho 3-AI.

DriveNets has announced that its Network Cloud-AI software, which supports white boxes based on Broadcom’s Jericho 3-AI and Ramon 3 chipsets, is now available for order, and the company also revealed some very impressive performance results.

One of the highlights is a 10% reduction in Job Completion Time (JCT), combined with the ability to scale out to 32K 800G ports in a cluster. That is a lot of AI/ML servers and GPUs in a single cluster, but it is a scale many customers are planning for in the 800G networking generation.
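To put that port count in context, here is a quick back-of-the-envelope calculation of the aggregate capacity such a cluster represents; the NICs-per-server ratio is our own illustrative assumption, not a DriveNets or Broadcom figure.

```python
# Back-of-the-envelope scale of a 32K x 800G cluster (figures from the text;
# the NICs-per-server ratio below is our own illustrative assumption).
ports = 32_768          # 32K ports in a single cluster
port_speed_gbps = 800   # 800G per port

total_tbps = ports * port_speed_gbps / 1_000
print(f"Aggregate port capacity: {total_tbps:,.0f} Tb/s")      # ~26,214 Tb/s (~26 Pb/s)

nics_per_server = 8     # assumed: one 800G NIC per GPU, 8 GPUs per server
print(f"GPU servers supported: {ports // nics_per_server:,}")  # 4,096 servers
```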

AI/ML Ethernet is on a path to becoming a multi-billion-dollar market and to helping the Ethernet switch market grow to over $30B by 2027, a remarkable testament to the power and scale of Ethernet. Data center switching bandwidth attached to AI continues to become a more significant part of the mix, growing at over 100% CAGR. As more customers choose Ethernet for AI, AI is driving switching market revenue to grow even faster than the traditional server access market.

Figure 1

DriveNets Network Cloud-AI is based on OCP’s Distributed Disaggregated Chassis (DDC) architecture, and that topology was showcased in the OCP experience center. We were impressed with the increased momentum around DDC as both hyperscaler and traditional telco customers embrace the technology. The main value of DDC over a standard Ethernet Clos architecture is that its distributed cell-based fabric provides predictable, lossless Ethernet connectivity that minimizes GPU idle cycles and maximizes the utilization of the AI hardware. Its improved JCT performance and ultra-fast recovery, which supports seamless failure mitigation, provide benefits similar to InfiniBand’s, with the openness of Ethernet.
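To illustrate why a cell-based fabric behaves more predictably than per-flow hashing, the toy model below (our own sketch, not DriveNets code) compares how evenly traffic lands on fabric links under ECMP-style flow hashing versus cell spraying; the link, flow, and cell counts are arbitrary.

```python
import random

# Toy model (not DriveNets code): per-flow ECMP hashing vs. DDC-style cell
# spraying across fabric links. Link, flow, and cell counts are arbitrary.
FABRIC_LINKS = 16
FLOWS = 32                # a handful of large "elephant" flows, typical of AI training
CELLS_PER_FLOW = 10_000   # each flow's traffic, expressed in fixed-size cells

rng = random.Random(42)

# Per-flow ECMP: every cell of a flow follows the single link its header hashes to.
ecmp_load = [0] * FABRIC_LINKS
for _ in range(FLOWS):
    ecmp_load[rng.randrange(FABRIC_LINKS)] += CELLS_PER_FLOW

# Cell spraying: traffic is chopped into cells and spread evenly over all links.
spray_load = [0] * FABRIC_LINKS
for cell in range(FLOWS * CELLS_PER_FLOW):
    spray_load[cell % FABRIC_LINKS] += 1

print("ECMP hashing  -> busiest link:", max(ecmp_load), "idlest:", min(ecmp_load))
print("Cell spraying -> busiest link:", max(spray_load), "idlest:", min(spray_load))
```

In the hashed case, some links typically sit idle while others carry several flows’ worth of cells; with spraying, every link carries the same load, which is the behavior that keeps GPUs from stalling behind a congested path.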

These performance improvements make Ethernet a good fit for large AI clusters and will drive the growth of Ethernet in AI back-end networks reflected in 650 Group’s Ethernet switch market forecasts.

We note that AI is a new use case compared to a few years ago, as customers embrace the cost-effective, high-scale nature of fixed 1RU switches instead of larger modular chassis switches. We continue to see this trend play out each quarter in our Data Center Switching and Routing reports. Cloud providers struggled with modular chassis switches in the early cloud days and settled on fixed topologies, but without DDC an AI cluster cannot scale to that many ports, as the rough math below illustrates.
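A rough radix calculation, assuming a fixed 1RU switch with 64 x 800G ports (an illustrative radix, not a specific product spec), shows how quickly port counts force a design into multiple tiers on the way to 32K endpoints:

```python
# Rough Clos scaling math, assuming fixed 1RU switches with a 64-port 800G
# radix (an illustrative figure, not a specific product spec).
radix = 64
target_ports = 32_768   # 800G endpoints, per the cluster size discussed above

one_switch  = radix                                # single box: 64 endpoints
two_tiers   = (radix // 2) * radix                 # leaf-spine: 2,048 endpoints
three_tiers = (radix // 2) * (radix // 2) * radix  # 3-stage Clos: 65,536 endpoints

for name, count in (("1 switch", one_switch), ("2 tiers", two_tiers), ("3 tiers", three_tiers)):
    print(f"{name}: {count:,} ports (reaches 32K? {count >= target_ports})")
```

A flat leaf-spine of fixed boxes tops out well short of 32K ports; getting there takes a deep multi-tier Clos, where hashing collisions and buffering accumulate, whereas a DDC fabric reaches the same scale while still behaving like a single scheduled element.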

AI/ML networking is following the very same path forward. DriveNets deployments at AT&T help demonstrate the size and scale of DDC with Jericho-based chips, and we expect AI/ML networks to scale to even higher port counts and capacity.