Broadcom Scales Connectivity And Performance For Advanced AI Workloads

By Patrick Moorhead - April 23, 2024

Throughout the history of technological inflection points, the infrastructure “quadrangle” (compute, memory, storage, connectivity) gets tested. A weak link in any of the four variables stresses the entire system. When it comes to AI workloads, connectivity and performance matter. This is especially true given the immense volumes of data that must be processed to train large language models. Generative AI is poised to deliver massive disruption to nearly every industry, and infrastructure providers rightly want to capitalize on the AI gold rush. The applications for generative AI and other forms of AI are extensive, ranging from improved security posture to new levels of network operational efficiency for enterprises and service providers, as well as the ability to accelerate digital transformation for almost any organization on a broad scale.

Broadcom has long been regarded as a leader in the merchant silicon space. However, recently the company has been on a journey to broaden its reach, as evidenced by its acquisition of VMware. It also recognizes the value of custom silicon designed to optimize discrete workloads that run both on-premises and in the cloud.

This year, the demand for Broadcom’s AI silicon solutions is expected to increase to as much as 35% of its total semiconductor revenue, driving a revised AI revenue target of more than $10 billion for its fiscal 2024. I recently spent time with executives during the Broadcom AI Investor Day and I would like to provide my insights related to the company’s AI silicon mission. I have also asked Will Townsend, who leads the Moor Insights & Strategy networking and security practices, and Matt Kimball, who leads our compute and storage practices, to weigh in with their analysis in this article.

Open, Scalable, Power Efficient

At the AI Investor Day, Broadcom drove home an overarching theme centered on the need for open silicon solutions that are extremely power efficient to enable AI networks at scale. This thinking pervades Broadcom’s approach to AI acceleration and its commitment to Ethernet, PCIe and optics—the critical foundational technologies for AI connectivity. The company’s mantra to develop “Open, Scalable, Power Efficient” silicon solutions to power a rich AI infrastructure ecosystem aligns well with the breadth of its portfolio.

Tailored AI Acceleration For Consumer AI

Broadcom has taken a unique approach to the AI accelerator market. While virtually every other semiconductor company has focused on building big silicon to support the commercial/enterprise segment, Broadcom has focused on enabling consumer AI. This is a market where the use cases are well defined and consumer AI companies are solving clearly identified problems. A good example of this is social media platforms using image recognition to enable tagging of people.

By contrast, in many ways the current enterprise AI hype cycle is a solution in search of many problems. Certainly, AI does and will provide great value to the enterprise. But today the enterprise side is not as clear as the consumer side, where customer needs have been well defined and the AI models required have been identified.

Except for a few niche use cases, AI deployments in the enterprise are in a nascent stage. Because of this, enterprise IT organizations continue to struggle to understand what the ideal AI platform would look like to meet the organization’s needs—or if this is even possible. As these enterprise needs become more apparent, the variety of use cases (and underlying compute characteristics) point to GPUs as the ideal training accelerator. Because of their architecture, an enterprise IT organization can generally rely on GPUs to deliver good performance and support for its workloads. This applies even if performance is less than ideal, even if power consumption is a little too high and even if the system generates a little more heat than hoped for. The net-net is that these generally available GPUs meet the “good enough” criteria to support most enterprise AI deployments.

The consumer AI space and the hyperscale consumer AI providers that serve it are different. The use cases are well defined, and the infrastructure required to deliver optimal user experience is well known. In fact, Broadcom has already served this market for ten years. Because of these factors and the sheer size of the relevant hyperscale environments, a “good enough” deployment would be costly both fiscally and in terms of performance. When considering the value of these platforms (defined as performance/total cost of ownership), the penalties for an unoptimized AI training platform add up. A 10% tax on performance and a 12% tax on power, while costly, can be absorbed by an organization when deploying across, say, 100 servers. These same taxes become extraordinarily burdensome when scaling across 100,000 servers.
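The scaling arithmetic behind this argument can be made concrete. The sketch below applies the 10% performance tax and 12% power tax at the two fleet sizes mentioned; the per-server cost, wattage, electricity price and three-year amortization are illustrative assumptions, not Broadcom figures.

```python
# Hedged sketch: how fixed percentage inefficiencies scale with fleet size.
# All dollar and wattage figures below are illustrative assumptions,
# not numbers from Broadcom.

def annual_waste(servers, server_cost=150_000, watts=10_000,
                 kwh_price=0.10, perf_tax=0.10, power_tax=0.12):
    """Rough annual cost of performance and power inefficiency."""
    # Performance tax: capacity paid for but unusable, amortized
    # over an assumed 3-year server life.
    perf_waste = servers * server_cost * perf_tax / 3
    # Power tax: extra electricity burned per year.
    kwh_per_year = watts / 1000 * 24 * 365
    power_waste = servers * kwh_per_year * kwh_price * power_tax
    return perf_waste + power_waste

for fleet in (100, 100_000):
    print(f"{fleet:>7} servers: ~${annual_waste(fleet):,.0f} wasted per year")
```

The point is not the exact dollar figure but the linearity: the same percentage taxes that are tolerable at 100 servers become a nine-figure annual penalty at 100,000.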

Performant AI at scale requires customization

This is why consumer AI hyperscalers have looked to work with Broadcom to develop what it refers to as an XPU—an AI accelerator that is not quite a GPU and certainly not an ASIC. Instead, it has all the foundational elements of a computational platform—memory, networking, interconnectivity and I/O. The only thing missing is the compute processing unit architecture. This critical element, along with the memory and I/O architectures, is optimized for each customer to deliver that ideal performance/TCO equation.

The Broadcom folks liken this process to making an automobile. It’s like having an entire car ready for a customer, except for the engine. Rather than drop in any old engine, the manufacturer sits down with the driver to better understand how and where the vehicle will be used. Based on this understanding, the manufacturer installs an engine that will deliver the best performance at the lowest fuel consumption. What Broadcom delivers is a customer-specific AI accelerator that is performant and efficient.

Broadcom delivers AI acceleration from the ground up

What Broadcom is doing with AI acceleration is semicustom design at its best. It builds best-of-breed platforms on the most advanced packaging, which enables tailoring for each customer. This is a tremendous display of engineering prowess and efficiency. Further, the company has an IP portfolio that rivals any semiconductor player in the market. This approach allows Broadcom to co-engineer and deliver solutions for customers in months versus years.

The XPU co-development cycle

By absorbing more and more of the complexity from AI infrastructure into its own designs, Broadcom is enabling its customers—the largest of consumer AI providers—to focus on delivering AI services at the highest performance and lowest cost. For example, when looking at interconnects that are critical to openness and AI performance such as PCIe, customers often find themselves waiting for the ecosystem to catch up so they can take advantage of higher speeds and lower latencies. However, because Broadcom is in a continuous innovation cycle with its XPU platform engineering, it is able to deliver the advantages of the newest generations of PCIe long before the mass market has adopted the standard.

Broadcom’s approach is clearly not a play for the enterprise AI market. That market, which has incredible volume, has been dominated by Nvidia, with AMD and Intel fighting to grab some market share. The enterprise space, driven by commercialized software and frameworks, would be extremely difficult to penetrate. That’s partly because of Nvidia’s grip on it, and partly because the Broadcom semicustom approach doesn’t scale across a large number of customers.

A Connectivity Battle — Ethernet Versus InfiniBand

Nvidia’s leadership in AI is evidenced by its current financial performance. The company is quickly building a compelling platform approach that includes silicon, software and curated LLMs, potentially delivering generative AI at scale anchored by its leadership in GPUs. The astounding performance enhancements of its latest Blackwell GPU iteration, introduced at Nvidia GTC this year, cements its market position. (Our colleague Anshel Sag wrote about this in detail last week.) However, the jury is still out regarding the broad adoption of the company’s InfiniBand interconnect solution, acquired through its purchase of Mellanox five years ago. Bluntly stated, InfiniBand represents a somewhat proprietary architecture, despite its ability to deliver high bandwidth, reliability and low latency for supercomputing applications. On the other hand, Ethernet has proven its staying power since 1980 and is a mature, open connectivity technology supported by a broad ecosystem.

At Broadcom’s AI Investor Day, the company made a compelling case for the viability of Ethernet to support next-generation AI workloads, connecting GPUs as well as custom and merchant accelerators within hyperscale environments. Broadcom’s success in delivering Ethernet at scale is undisputed, and includes network adapters, PHY and PoE solutions, switches and switch fabric devices, software and reference designs.

The company claims that among endpoint-scheduled devices, its Tomahawk 5 Ethernet network switch delivers twice the throughput of competing solutions, at 51.2 Tbps, while also delivering power efficiency and acceleration of AI and ML workloads through cognitive routing and advanced telemetry. Furthermore, Tomahawk 5 resource virtualization aims to improve security and enable the efficient utilization of shared infrastructure at scale. For switch-scheduled devices, the Jericho3AI Ethernet switch series is designed to provide optimized load balancing, congestion-free operations and sub-10ns auto-path convergence for zero-impact failover—providing a remarkable 32,000 AI accelerator ports at 800 GbE.
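For context, the headline 51.2 Tbps figure maps directly onto port counts. A quick sketch (the port-speed splits below are simple arithmetic on the stated aggregate bandwidth, not a Broadcom configuration guide):

```python
# Hedged sketch: deriving Ethernet port counts from aggregate
# switch bandwidth. 51.2 Tbps is the stated Tomahawk 5 figure;
# the splits are arithmetic, not vendor-published configurations.

TOMAHAWK5_TBPS = 51.2

for port_gbps in (800, 400, 200):
    ports = int(TOMAHAWK5_TBPS * 1000 / port_gbps)
    print(f"{port_gbps} GbE -> {ports} ports")
```

In other words, a single 51.2 Tbps device can be carved into 64x800GbE, 128x400GbE or 256x200GbE ports, which is what makes the radix attractive for large AI fabrics.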

Endpoint- and switch-scheduled AI network solutions from Broadcom

Finally, Broadcom’s new Thor2 represents the industry’s first 5nm 400 GbE network interface card. Its SmartNIC-like capabilities scale to support 250 million packets per second—at 50% lower power consumption than previous generations. Based on the industry-standard RoCE protocol, it is also compatible with all XPUs, offers advanced congestion control and can be delivered cost-effectively by Broadcom.

At a high level, Broadcom’s development of its next-generation Tomahawk, Jericho and Thor solutions represents a leap forward in silicon design, delivering attractive cost points relative to InfiniBand and making them highly viable considerations for power-hungry AI applications. It also demonstrates that Broadcom’s merchant silicon can compete head-to-head with more custom, purpose-built offerings from Intel and others.

PCIe’s Value Proposition

PCIe enjoys broad adoption, leveraging an open industry standard that provides low latency. Broadcom has considerable depth in PCIe development dating back over two decades. PCIe is currently used widely in connecting the XPUs and the CPUs within each server. Broadcom’s current switching products offer 40% more efficiency at 50% less power, making them compelling as interconnect solutions inside the server. However, PCIe’s bandwidth requires improvement to make it a viable consideration for scale-up fabrics. The company’s ongoing collaboration with AMD aims to accomplish this with PCIe 7.
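PCIe bandwidth doubles with each generation, which is why Gen 7 matters for scale-up fabrics. The sketch below tabulates raw per-direction bandwidth for an x16 link from the PCI-SIG per-lane transfer rates; usable throughput is somewhat lower once encoding and FLIT overhead are accounted for.

```python
# Raw per-direction bandwidth of a PCIe x16 link by generation.
# GT/s values follow the PCI-SIG doubling cadence; real usable
# throughput is a bit lower after encoding/FLIT overhead.

GT_PER_S = {4: 16, 5: 32, 6: 64, 7: 128}  # per-lane transfer rate, GT/s
LANES = 16

for gen, gts in GT_PER_S.items():
    # Each transfer carries one bit per lane; divide by 8 for bytes.
    gbytes = gts * LANES / 8
    print(f"PCIe Gen {gen} x16: ~{gbytes:.0f} GB/s per direction")
```

At Gen 7, an x16 link approaches 256 GB/s in each direction, which is the kind of headroom a scale-up fabric between XPUs would need.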

By improving both bandwidth performance and power efficiency, Broadcom is closing the gap for the even broader application of PCIe switching to AI connectivity. The power consumed by AI workloads is alarming to many enterprises, especially given the focus on more sustainable and greener IT operations. Broadcom clearly understands this concern and is focused on delivering customers an optimal balance of performance and power efficiency.

Broadcom and AMD collaborate on PCIe Gen 7 switch

Why Co-Packaged Optics Are Relevant

Optical interconnects are critical for both front-end and back-end networking in AI clusters. As the number of XPUs increases, optical interconnects must supply ever more bandwidth, and they also drive significant costs. CPO—co-packaged optics—offers an elegant solution to this challenge. CPO, which combines optics and silicon on the same substrate, delivers the lowest cost per bit, eliminating a significant number of components and interconnects. CPO also provides an exceptional balance of performance and power consumption through the elimination of significant levels of interconnect power. Finally, CPO improves overall reliability given that it integrates pluggable laser components with underlying silicon.

CPO delivers lower cost, higher performance and increased reliability

Broadcom has consistently demonstrated its leadership position in CPO in recent years. The company was also the first to offer commercial CPO with pluggable lasers—and did so at a significant power and cost savings. At Broadcom’s AI Investor Day, the company showed its continued leadership by demonstrating Bailly, the industry’s first 51.2 Tbps CPO Ethernet switch. The company claims that Bailly provides the lowest optical interconnect latency, power and cost in a highly integrated silicon chiplet configuration, replacing 128 400G optical transceivers. This device is a flexible, high-performance consideration for optical Ethernet switching thanks to a detachable optical connector, pluggable lasers and an integrated CMOS EIC with TIA and requisite driver. The company also discussed the manufacturing innovations that are critical for reliable CPO mass production.

While Broadcom is making good strides in CPO, it also continues to lead the market in the VCSEL space. Broadcom claims that it is offering the industry’s first optical interconnect designed for AI with its 200G VCSEL. It enables 1.6T optical links, facilitating terabit connectivity, and as a bonus supports both Ethernet and InfiniBand protocols. This dual compatibility is an important point to highlight, as it demonstrates Broadcom’s commitment to providing enterprises with interconnect flexibility.

A Strong Foundation Matters

Broadcom continues to significantly invest in foundational silicon IP to ensure optimized performance for AI connectivity. SerDes is the foundation for all analog solutions related to networking and I/O. The better the SerDes, the more competitive the end product will be in power, performance and reliability.

The company’s 224G SerDes provides 1.6 Tbps of throughput, enabling up to 1,024 lanes within a single chip. Furthermore, Broadcom’s PAM4 DSP is positioned to support FR+ (2km) reaches, enabling a path to 1.6T connectivity in a highly integrated 5nm package.
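The path from 224G SerDes to 1.6T links is straightforward lane math. A 224G PAM4 electrical lane commonly carries roughly 200G of usable payload once coding overhead is accounted for—that effective rate is a rule of thumb, stated here as an assumption, not a Broadcom specification.

```python
# Hedged sketch: lane math behind 1.6T optical links.
# A 224G PAM4 SerDes lane carrying ~200G of usable payload after
# FEC/coding overhead is treated here as an assumption.

RAW_GBPS = 224
EFFECTIVE_GBPS = 200   # assumed usable rate per lane
TARGET_TBPS = 1.6

lanes = round(TARGET_TBPS * 1000 / EFFECTIVE_GBPS)
print(f"1.6T link: {lanes} x {EFFECTIVE_GBPS}G lanes "
      f"({RAW_GBPS}G raw each)")
```

Under that assumption, eight lanes of 224G SerDes yield one 1.6T link, which is why the 224G generation is the gating technology for terabit-class connectivity.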

Broadcom’s focus on PAM4 optics puts it squarely into one of Marvell’s core competencies. Consequently, this represents significant potential upside for Broadcom, especially if it can take market share from an established incumbent.

Final Thoughts

At its AI Investor Day, Broadcom needed to demonstrate to investors that it is a relevant technology company for the next generation of datacenter AI in a world that is increasingly looking closed. Many companies are taking a walled-garden approach, combining hardware and software to create their own platforms. Broadcom is decidedly taking a different tack by being more flexible.

I believe that Broadcom demonstrated its relevance and more, not only by showing that it is a leader in AI networking, but also by its delivery of custom AI compute silicon. While the debate will likely rage for some time regarding Ethernet versus InfiniBand adoption for AI workloads, I believe that Broadcom is positioned well regardless.

Note: Moor Insights & Strategy principal analysts Will Townsend and Matt Kimball also contributed to this article.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.