InfiniBand and Ethernet Switch Markets Thrive with AI/ML Support

Bandwidth Forecast to Grow at over 100% Through 2027 as AI Networking Approaches $10B in Market Size

The AI/ML market surged into the spotlight early this year as ChatGPT showed many people the potential of AI. However, AI networks have been thriving for the past two years. In fact, we have been tracking AI/ML networking for nearly that long, and we see AI/ML as a massive opportunity for networking and one of the main drivers of data center networking growth in our forecasts.

The key to AI/ML’s impact on networking is the tremendous amount of bandwidth required to train AI models, the new workloads those models create, and the powerful inference solutions appearing on the market. In addition, many verticals will go through multiple digitization efforts because of AI over the next ten years. Finally, consumers will see rapid AI-driven changes in their personal and professional lives.

Two powerful networks are at the heart of AI/ML clusters. The first is a compute network (often called the back-end or internal network) that connects the AI cluster’s computing elements. The most common example deployed today is NVIDIA’s DGX systems, interconnected with each other, internal memory, and storage through InfiniBand switches. The second network is external, connecting the AI pods/racks to the rest of the data center.

There is much debate over Ethernet vs. InfiniBand, or how one technology is succeeding at the expense or demise of the other, but those debates are misplaced. Ethernet and InfiniBand each have advantages, and both thrive in the same market. In our research, we expect the InfiniBand market to more than double its 2022 size. At the same time, vendors such as NVIDIA offer both InfiniBand and Ethernet products to provide customers with both solutions.

InfiniBand has several advantages. First, the technology has been around for 20 years and is laser-focused on HPC networks. Second, it was built from its inception for HPC and AI networks. Third, AI workloads benefit from its low latency and from features built into the protocol, such as in-network data processing, which accelerates AI further. An excellent example is InfiniBand’s SHARP In-Network Computing technology, which doubles the throughput of AI data reduction operations (a key element in AI training), making InfiniBand the highest-performing network for AI platforms and the leading solution for the compute network. To date, InfiniBand networks in AI have also had higher-speed ports and a higher radix. NVIDIA highlighted performance results at GTC using MLPerf benchmarking that offer some comparison. See NVIDIA’s blog here.
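To make the “data reduction operations” above concrete: in distributed training, each worker’s gradients are summed across all workers and redistributed after every iteration (an all-reduce). The sketch below shows only the semantics of that step; the function name and values are hypothetical, and in a real cluster the reduction runs over the fabric, with SHARP moving it into the switches themselves.

```python
# Illustrative sketch of an all-reduce "data reduction" step (hypothetical
# names and values). Each worker holds a local gradient vector; after the
# all-reduce, every worker holds the element-wise sum of all vectors.
def all_reduce_sum(per_worker_grads):
    """Sum corresponding gradient entries across workers and give every
    worker a copy of the result -- the semantics of an all-reduce."""
    num_workers = len(per_worker_grads)
    reduced = [sum(vals) for vals in zip(*per_worker_grads)]
    return [list(reduced) for _ in range(num_workers)]

# Four workers, each with a local gradient vector after one training step.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
result = all_reduce_sum(grads)
# Every worker now holds the same summed gradient: [16.0, 20.0]
```

In practice this collective runs every iteration across thousands of GPUs, which is why halving its cost in-network matters so much for end-to-end training time.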

Ethernet is the leading external and management network in AI platforms. Leading Ethernet switch vendors in 2022 included Arista, HPE (with Slingshot), NVIDIA, Cisco, and Juniper. Each Ethernet switch vendor is approaching AI from its strengths in the Cloud and HPC markets. With the market thriving, we can expect vendors to spend significant time and resources growing with the AI/ML market.

We can expect some workloads and clusters to be built with InfiniBand, some with Ethernet, and some with both technologies. The key point is that both networks can and will coexist. In addition, each is experiencing robust growth as the AI Networking market grows from $2B in 2022 to over $10B in 2027. To put this growth in perspective, AI/ML will outpace the growth the Cloud experienced when it first became a distinct market nearly 20 years ago.
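As a back-of-the-envelope check on that forecast (using the $2B and $10B figures stated above; the rounding is ours), growing from $2B in 2022 to $10B in 2027 implies a compound annual growth rate of roughly 38%:

```python
# Implied compound annual growth rate (CAGR) for the AI Networking market,
# using the forecast figures from the text: $2B (2022) to $10B (2027).
def cagr(start, end, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end / start) ** (1 / years) - 1

rate = cagr(2.0, 10.0, 2027 - 2022)
print(f"{rate:.1%}")  # roughly 38.0% per year
```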