Google Gemini AI 1.0 and New TPU

By Patrick Moorhead - December 13, 2023

The Six Five team discusses Google Gemini AI 1.0 and New TPU.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we ask that you do not treat us as such.

Transcript:

Daniel Newman: Google came out with some big announcements: a new LLM and new silicon. First of all, Pat, what are you more excited about, the LLM or the silicon?

Patrick Moorhead: I have to tell you, I’m probably excited by both equally, and I know that’s a-

Daniel Newman: Cop out.

Patrick Moorhead: That’s a total cop out, but I got to tell you, I see silicon and I want to dive into it. There are more LLMs coming out than there is new silicon, but Google came out with TPU v5p. And what is a TPU? A TPU is an ASIC, likely built by Broadcom, and the chips are hyper-connected together. We talked about the bandwidth part needed to do training and inference. And it’s the fifth generation, and you might argue it’s the sixth generation, but they’re calling it 5p. When it first came out, and I would say for the first four generations, it was really about internal use cases. So Google Search, Google Photos, and it was really targeted at machine learning and maybe a little bit of deep learning in there, but as it related to Google Cloud, they really didn’t open it up to people until about the fourth generation.

I would say they really opened it up on the fifth generation, because when I talked with Google about v4, there really wasn’t a lot of Google Cloud action. I think the reason for that is, excuse me, they may not have needed it. In the early days in the enterprise, it was really an NVIDIA show for training, and people wanted to have that CUDA compatibility. But NVIDIA is getting into 52-week lead times on their AI solutions, and ASICs are absolutely more efficient; that’s not even debatable, folks. ASICs are just more… They’re harder to program. You have to put in that layer.

Now, with TPU v5, I just got to say it: I feel like it was rushed. Not a lot of preview time on it, not a lot of time to soak. It just hit. The news outlets that covered it first were basically consumer. It’s like, wait a second, what does this mean to the enterprise? Listen, I love consumer, and I think it’s sexy and I think it’s cool, but as analysts, we don’t really impact that. I have a lot of questions. And when I look at what I’ll call the raw specifications, which don’t always matter, like bandwidth and stuff like that, AMD and NVIDIA kind of run circles around it. Also, there were no competitive comparisons. You might say, well, AWS didn’t have competitive comparisons either. But you know what they said? They said that Trainium 2 is the highest-performance, most efficient chip they offer to do training and inference for LLMs. What that means by proxy is, “We’re higher performance than the NVIDIA H100.” I don’t know if that translated to the H200 that they announced as well, but still, it doesn’t matter.

You really don’t know at the end of the day what this can do. The second question I have, probably the final one, is how this is available to Google Cloud customers. I think it’s through Vertex AI. I could be wrong. I hope it is. But my final editorial comment is that the timing was interesting: it was the morning of AMD’s Advancing AI event. Maybe that’s because AMD’s event had Azure and Oracle Cloud, and it didn’t include Google with AMD’s new stuff. But when I was at AMD, we were Google’s largest chip supplier in 2005. To be a partner that long, like 18, 19 years, and to drop a bomb right in front of your partner’s event was interesting.

Daniel Newman: Pat, you hit a lot of the high notes. First of all, the problem is we’re in a bit of an era of leapfrog and compete, and everybody’s showing everything and wanting to get it out to market fast. Whether it’s silicon innovation coming faster than would be optimal or LLMs being launched faster than would be optimal, to me, it’s more about needing to continuously show progress in the public eye. Google, I think, learned its lesson, I hope so, on the first Bard announcement, where it definitely fell on its face, but I do think it recovered quite nicely. This one’s interesting, Pat, because the Gemini demo is really impressive, but there are some discrepancies, and the question in the market is, was the demo real? I’ve been reading a lot about this, and I want to give a two-fold answer.

Pat, there’s always been a little bit of behind-the-curtain work in a demo at any event. The question is where it falls on a scale from “hey, we kind of optimized it a little bit for the demo video” to “we pushed the truck down a hill.” I think it’s a little further to the left or the right, but I do think this is one of those where there are explanations, and what people don’t understand is that we’re seeing progress being made in real time. Everything the Gemini demo did, it is capable of doing, but it wasn’t capable of doing it in the exact way it was presented. Take the rock, paper, scissors demo. Right now, it can’t do it in real time watching the hand gestures back to back to back, but if you prompt it with an image of all three at the same time and hint that it’s a game, it can do that.

I like the analogy because what we saw is the end state. This is where we’re going to be, and we probably will be there in the blink of an eye, but we’re not actually quite there yet; this stuff is being optimized in real time. I think the same thing could be said about the silicon. Look, they’re training the model, and I think the really important thing Google wanted to get out there is that it’s building silicon that can be used to train its own models, which is a big statement piece that all the hyperscalers want to make right now. AWS was able to do this first with Trainium 2 and Anthropic, and of course with what it’s going to do with Titan. Google doesn’t want to be left behind, so it’s like, “Hey, we’re doing this.”

But yes, this is definitely not the last piece of silicon they’re going to develop. I’m pretty sure they’re already taping out or working on their next two or three versions of this thing. It’s kind of like, am I happy with where it’s at right now? Did they make as much headway? Are they ready to compete with the…? I saw some analysts write about the H200 like it’s already out. We love to tell stories about things that don’t exist and then make it look like they do. Vendors love it when people do that, but the truth is everything’s a bit in motion right now.

Patrick Moorhead: Here was the sensitivity, though. Let’s dial back to a year ago, I think, when you and I were at the Microsoft Copilot event. We went right from the big announcement to them letting us, with a person next to us, ask Copilot questions: “Hey, I have $700 and I’m in Barcelona. What should I do? Tell me what you would recommend, and here’s where I am.” It got it right most of the time. It made mistakes. Then, a couple of weeks after, Google does their first Bard event, their stock goes down like 10%, and you can’t even find the replay. It was a disaster, right?

Daniel Newman: Yep.

Patrick Moorhead: But you and I, the whole time, have been like, “This is a marathon and not a sprint,” and here we are. I think even with Google potentially stubbing… I mean, they stubbed their toe in the press on this with the way that they did the demo. I still don’t see any knockout blows that miraculously make Google Search lose 30 points of market share overnight, but I think it’s very important for Google to do a follow-up where they just nail it. Kind of like they did with the enterprise event that you and I attended and then at I/O, right? They came through, and it was very credible, very planned, but I just thought this was rushed.

Daniel Newman: A little bit, and like I said, every demo along the way has had just a little bit of Hollywood, and so the question is pure manipulation versus Hollywood effects.

Patrick Moorhead: The truck down the hill, that is so good. That is solid gold, Dan.

Daniel Newman: Thank you. That’s a Six Five annual highlight as we wrap up the year.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences, with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics, including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high-tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.