AI took the world by storm early this year, with ChatGPT demonstrating a transformational leap in how humans interact with machines and how quickly many parts of life will change with AI. Before 2022, AI was niche, relegated to large research labs, Google’s own TPU efforts, and small deployments at the other large Hyperscalers (Amazon, Meta, Microsoft, and Oracle). Throughout the first half of 2023, every major vertical and business has been analyzing the implications of AI for their companies and how best to embrace the opportunity.
Size of the AI Networking Market for Ethernet and InfiniBand
To date, AI traffic and the networking opportunity remain relatively small. However, in 2022, the AI networking market reached $2 B, with InfiniBand responsible for 75% of that revenue. As we look towards 2027, AI networking will surge to over $10 B in revenue, with Ethernet exceeding $6 B. Both Ethernet and InfiniBand will grow robustly during this time. At the same time, bandwidth for AI workloads will grow over 100% per year, well above typical data center bandwidth growth of 30-40% annually. This is key. AI will be the most significant growth driver in the Ethernet switch market for the rest of the decade.
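As a back-of-the-envelope check (ours, not part of the original forecast model), the short sketch below computes the growth rate implied by these figures, treating the “over $10 B” 2027 projection as a point estimate.

```python
# Implied growth from the revenue figures cited above:
# $2B total in 2022, ~$10B projected in 2027; InfiniBand held ~75% of 2022.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

total_2022 = 2.0    # AI networking revenue, $B (2022)
total_2027 = 10.0   # projected revenue, $B (2027), treated as a point estimate
ib_2022 = 0.75 * total_2022  # InfiniBand's 75% share of 2022 revenue

print(f"Implied total-market CAGR, 2022-2027: {cagr(total_2022, total_2027, 5):.0%}")
print(f"InfiniBand 2022 revenue: ${ib_2022:.1f}B")
```

Growing from $2 B to over $10 B in five years implies a compound annual growth rate of roughly 38%.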
AI Networking Topologies are Different
AI clusters typically have two distinct networks in them. The first, and more traditional, is the servers’ external or outward-facing “front-end” network, which needs to be based on Ethernet and IP protocols as it faces the public Internet. The main difference in AI is the need to get large amounts of data into the cluster, so the pipe is larger than that of a traditional web or email server. Future AI designs will drive multiple 112G SerDes lanes per server, manifesting as 100 G or 400 G ports. As a result, AI server speeds for this network will be 1-2 generations ahead of traditional computing.
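To make the lane-to-port arithmetic concrete, here is a minimal sketch (our illustration, not from the article) mapping 112G-class SerDes lane counts to port speeds; the usable Ethernet rate per lane nets out near 100 G once FEC and encoding overhead are accounted for.

```python
# Rough mapping from 112G-class SerDes lane counts to Ethernet port speeds.
# Nominal electrical bandwidth is lanes * 112 Gb/s; the marketed Ethernet
# rate is lower because part of the capacity is consumed by FEC/encoding.

LANE_RATE_GBPS = 112      # nominal 112G-class PAM4 SerDes lane
USABLE_RATE_GBPS = 100    # approximate usable Ethernet rate per lane

for lanes in (1, 2, 4, 8):
    electrical = lanes * LANE_RATE_GBPS
    ethernet = lanes * USABLE_RATE_GBPS
    print(f"{lanes} lane(s): ~{electrical}G electrical -> {ethernet}GE port")
```

This is how multiple 112G lanes per server manifest as the 100 G and 400 G front-end ports noted above, and as the 400 G and 800 G back-end ports discussed next.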
The second, and newer, network is the internal or “back-end” network. This is a unique network connecting the AI cluster’s resources together. For an AI cluster, connecting compute resources to shared storage and memory, and doing so rapidly and without deviations in latency, becomes critical to maximizing the cluster’s performance. Future AI designs for this new network will require multiple 400 G, 800 G, or higher ports per compute server.
AI workloads are heavily dependent on this back-end network, as packet loss or even jitter degrades workload performance, measured in JCT (Job Completion Time), due to the increase in GPU idle cycles spent awaiting network resources. This calls for a predictable, lossless back-end networking solution, which at scale is a significant challenge for any networking technology.
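To illustrate why JCT is so sensitive to the back-end network, the toy model below (an illustration under simplifying assumptions, not a benchmark) treats a synchronous training job as iterations that alternate GPU compute with a communication phase; any per-iteration stall from packet loss or jitter adds directly to JCT while the GPUs sit idle.

```python
# Toy model: JCT for a synchronous training job whose iterations alternate
# GPU compute with a network communication phase. A per-iteration stall
# (retransmits after loss, jitter-induced waits) lands directly in JCT.

def jct_seconds(iterations: int, compute_ms: float, comm_ms: float,
                stall_ms: float = 0.0) -> float:
    """Job Completion Time in seconds; stall_ms is added network wait per iteration."""
    return iterations * (compute_ms + comm_ms + stall_ms) / 1_000

base = jct_seconds(10_000, compute_ms=8.0, comm_ms=2.0)
degraded = jct_seconds(10_000, compute_ms=8.0, comm_ms=2.0, stall_ms=1.0)

print(f"Baseline JCT: {base:.0f} s")
print(f"With a 1 ms stall per iteration: {degraded:.0f} s "
      f"({degraded / base - 1:.0%} longer, all of it GPU idle time)")
```

In this toy model, a single millisecond of network stall per iteration inflates JCT by 10%, which is why a predictable, lossless fabric matters at scale.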
This is why AI networking requires new hardware and software solutions that can increase the AI cluster’s performance by maximizing the use of AI compute resources. Such a new network can drive up to 10% cost savings across the entire AI infrastructure.
DriveNets Enters the Market with a New Approach and a Proven Architecture
DriveNets enters the AI networking market with a unique and proven architecture. Current solutions are based either on an Ethernet Clos architecture, which is standards-based but cannot provide the required performance at scale, or on proprietary solutions that deliver the scale but impose vendor lock-in. DriveNets Network Cloud utilizes the OCP DDC (Open Compute Project Distributed Disaggregated Chassis) architecture, which enables AI clusters to scale with very high performance while keeping JCT to a minimum (much lower than with standard Ethernet). We note that this architecture runs the majority of AT&T’s traffic in the US and scales beyond the current needs of AI in terms of nodes. DriveNets brings this scale, with impressive AI benchmarking results, to a market that is trying to find an optimal solution. We view their entry as a positive for the industry as vendors stake out their expertise in these new network designs.
Second Half 2023 and Beyond
We expect AI to race beyond the largest Hyperscalers with a combination of on-premises and tier-2 cloud-based offerings. As we look towards early 2024, we expect next-generation designs from the Hyperscalers and first-generation designs in the enterprise to increase dramatically as each vertical and enterprise embraces an AI-led digitization/modernization effort.

By Alan Weckel, Founder and Technology Analyst