Google Cloud Next in San Francisco was the first in-person version of the event since 2019. Unsurprisingly, the focus was primarily on the strides Google has made with generative AI and the always-on AI collaborator Duet AI, which is now integrated into Google Workspace and Google Cloud.
You can check out other Moor Insights & Strategy coverage from Robert Kramer (data), Matt Kimball (compute), Melody Brue (modern work) and Will Townsend (networking). Paul Smith-Goodson is working on a report on Vertex AI. In this article, I will discuss, at a high level, the enhancements Google is making to support generative AI.
New compute for generative AI and LLMs
Traditional computing infrastructure cannot support the exponentially growing demands of workloads from large language models (LLMs) or generative AI. As the number of parameters in LLMs increases, so does the need for a cost-effective and scalable AI-optimized infrastructure. In that context, at last week’s event, Google Cloud announced two new additions to its portfolio: an updated Tensor Processing Unit (TPU), which is a custom-designed AI accelerator optimized for training and inference of large AI models, and a graphics processing unit (GPU).
The new Cloud TPU v5e delivers up to 2x higher training performance per dollar and up to 2.5x better inference performance per dollar for LLMs and generative AI models compared to its predecessor, the Cloud TPU v4. The new model integrates with Google Kubernetes Engine (GKE), Vertex AI and leading frameworks such as PyTorch, JAX and TensorFlow.
To give you an idea of how these chips scale, a TPU v5e pod comprises 256 chips networked over ultra-fast links, delivering up to 100 quadrillion int8 operations per second or 100 petaops of compute power, supporting a range of LLM and generative AI model sizes.
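The pod-level arithmetic is easy to sanity-check. A quick sketch, with the caveat that the implied per-chip throughput below is my back-of-the-envelope division of the stated pod total, not an official per-chip spec from Google:

```python
# Back-of-the-envelope check of the TPU v5e pod figure cited above.
# The per-chip number is derived from the pod total, not an official spec.

CHIPS_PER_POD = 256
POD_INT8_OPS_PER_SEC = 100e15  # 100 petaops, as stated by Google

per_chip_ops = POD_INT8_OPS_PER_SEC / CHIPS_PER_POD
print(f"Implied per-chip int8 throughput: {per_chip_ops / 1e12:.0f} teraops/s")
# → Implied per-chip int8 throughput: 391 teraops/s
```

In other words, each chip contributes roughly 400 teraops of int8 compute, and the ultra-fast inter-chip links are what let 256 of them behave as a single 100-petaops pod.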
The TPU has been an interesting one from a research standpoint, given that in the past every time I asked what percentage of AI workloads were run on the TPU, it was a very small number for enterprise customers. This year was different. I met with a few customers and heard anecdotally that the TPU was “sold out.” I will keep asking every year to see what is really happening with the TPU. I do believe that internally, for consumer Google and ads, the TPU is doing a lot.
Also available next month is the A3 GPU supercomputer, based on Nvidia H100 GPUs and run on Google Cloud VMs to power large-scale AI models. Scale and performance have improved compared to the prior generation, with 3x faster training and 10x greater networking bandwidth. The A3 VM features dual 4th Gen Intel Xeon Scalable processors, eight Nvidia H100 GPUs per VM and 2TB of host memory. It took a while for everybody to bring H100s to market, but they're there now to train and run inference on foundation models more efficiently. It's unclear to me how Google's version is differentiated from Azure, Oracle or AWS.
Amping up containerized AI workloads
Google was first to market with a managed Kubernetes service in 2014 with GKE. The new GKE Enterprise combines the best of GKE and Anthos (Google's hybrid and multi-cloud container management platform) into one offering with a unified console. The GKE Enterprise edition includes a new multicluster feature that enables grouping similar workloads into dedicated clusters. Each cluster can have its own configurations and policy guardrails to isolate sensitive workloads and delegate cluster management to other teams.
Most importantly, AI workloads should run more efficiently with GKE, saving on compute cycles by scaling up when demand rises and scaling down when it falls. GKE supports the new TPU v5e and the A3 VM with Nvidia H100 GPUs.
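That scale-up/scale-down behavior follows a simple proportional rule. A minimal sketch of the idea, assuming the workload uses Kubernetes' standard horizontal autoscaling formula (the 60% utilization target and replica counts are arbitrary illustration values, not GKE defaults):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes horizontal autoscaling rule: scale replicas in
    proportion to how far the observed metric is from its target."""
    return max(1, math.ceil(current_replicas * current_metric / target_metric))

# Demand rises: utilization hits 90% against a 60% target -> scale up.
print(desired_replicas(4, 90, 60))  # → 6
# Demand falls: utilization drops to 20% -> scale down, saving compute.
print(desired_replicas(4, 20, 60))  # → 2
```

The same mechanism applied to expensive accelerator-backed nodes is where the compute-cycle savings for AI workloads come from.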
It would be great if these capabilities also ran on-prem via Google Distributed Cloud.
Scaling AI models beyond the boundaries of physical TPU pods
Google has introduced Multislice (currently in preview), a training technology that enables scaling AI models beyond the limits of physical TPU pods up to tens of thousands of TPUs. Before Multislice, a training run could only use a single slice—that is, a reservable collection of chips connected via inter-chip interconnect (ICI). Currently, that would mean up to 3,072 TPU v4 chips, which is the largest slice in the largest Google Cloud TPU system. With Multislice, a training run can scale beyond a single slice and use multiple slices across several pods by communicating via data center networking (DCN).
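To make the ceiling concrete, here is a small sketch of the before-and-after scaling limits. The chip counts come from the article; the eight-slice example is an arbitrary illustration, not a Google-stated maximum:

```python
# Multislice in numbers: a single TPU v4 slice tops out at 3,072 chips
# (the largest slice in the largest Cloud TPU system). Multislice lets
# one training run span several slices, communicating over DCN.
# The slice count below is an arbitrary example, not a Google limit.

MAX_CHIPS_PER_V4_SLICE = 3072

def multislice_chips(num_slices: int,
                     chips_per_slice: int = MAX_CHIPS_PER_V4_SLICE) -> int:
    """Total chips available to one run spread across num_slices slices."""
    if chips_per_slice > MAX_CHIPS_PER_V4_SLICE:
        raise ValueError("a v4 slice cannot exceed 3,072 chips")
    return num_slices * chips_per_slice

# Without Multislice: one slice, so 3,072 chips is the hard ceiling.
print(multislice_chips(1))  # → 3072
# With Multislice: e.g., eight full slices reach tens of thousands of chips.
print(multislice_chips(8))  # → 24576
```

The trade-off is that traffic within a slice rides the fast ICI links, while traffic between slices crosses the slower data center network, so the training framework has to place communication accordingly.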
Google has used Multislice technology to train its own large language model, PaLM 2. PaLM 2 excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency and natural language generation.
This efficiency should show up in both price and performance.
Access to Google services from any cloud
Modern distributed applications running innovations such as generative AI across on-prem and hybrid cloud environments have special requirements. These applications demand high-performance networking to run foundation models, training and inferencing at scale, independent of where the data resides.
Google’s new Cross-Cloud Network is an open and programmable cloud networking platform that enables connectivity and security for applications across clouds and on-prem locations. Cross-Cloud Network is a portfolio of existing and new products from Google Cloud and partners. It includes machine-learning-powered security products such as Google Cloud Armor and security technologies from companies including Palo Alto Networks. Cross-Cloud Network optimizes workload performance with lower latency and higher throughput and bandwidth, which will be crucial as more organizations adopt generative AI.
Run AI applications anywhere
Google Distributed Cloud (GDC) is a set of managed hardware and software solutions based on Anthos that extends Google Cloud's infrastructure and services to the edge and to on-premises data centers. GDC brings AI applications to the edge or on-premises with a rich set of services and extensible hardware form factors. It can be used with connectivity to Google Cloud, or fully air-gapped. Enhancements include Vertex AI integrations and a new managed offering of AlloyDB Omni on GDC Hosted.
I need to do more research on Google’s commitment to run apps truly anywhere, but as I have said numerous times, the new hybrid, multi-cloud world is here to stay, and enterprises want “fabrics” that can be used across on-prem private clouds, the edge, public cloud and sovereign clouds.
Over the past few years, I have been following and researching Google Cloud, and the business has picked up steam ever since Thomas Kurian took over the reins as CEO of the business unit. Kurian changed the culture to meet customers where they are on their journeys instead of forcing them to jump through hoops. And by driving home the "we won't share your data" message, he has single-handedly killed that fear for enterprises, or at least for the reasonable ones.
Google Cloud Next 2023 did not disappoint. I give Google full marks for the general availability of the Duet AI agent for Google Workspace and Google Cloud, which is a big deal. For me, this event marked the point at which Google Cloud stopped following Microsoft, and I'd call it a tie right now. To be honest, it's hard to distinguish the AI services of Google Cloud, AWS and Azure right now.
I wish Google would talk more about its planet-scale AI capabilities. When you add Google ads, Google consumer and Google Cloud, the company has the largest data estate, and I will venture the largest AI estate, on the planet. Scale matters in this game. Yet Google doesn’t talk about this enough.
When talking to enterprise customers, I invariably find that although Google Cloud is perhaps not the primary cloud provider, it is often utilized for its analytics and AI capabilities. As generative AI becomes more ubiquitous, Google can increase its market share. The generative AI opportunity could be Google Cloud’s big play to get to the next level of acceptance in the enterprise—and all the pieces of that puzzle seem to be falling into place.