On this episode of The Six Five – On The Road, hosts Daniel Newman and Patrick Moorhead welcome Chris Tobias, GM, Americas Technology Leadership/Global Platform ISV Account Team at Intel Corporation for a conversation during VMWare Explore in Las Vegas.
Their discussion covers:
- The industry's evolution from running everything on the core CPU to using more workload-specific accelerators, and how this impacts Intel's strategy
- Why Intel has chosen a cores and accelerator strategy for Xeon processors, and what workloads these accelerators address
- Details on their AI accelerator, Intel AMX
- How Intel AMX compares to a discrete GPU, as well as software and frameworks
Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Sponsored by Intel
Patrick Moorhead: Hi, this is Pat Moorhead and we are live here at VMware Explore 2023, and I’m here with my incredible co-host, Daniel Newman. Dan, how you doing?
Daniel Newman: Good morning from Las Vegas. It’s been a while since we’ve been here for a conference, days, minutes, weeks.
Patrick Moorhead: I know, maybe it’s been a month and I think we’re going to be back here two times more here in Vegas. Listen, this is part of the analyst grind. We go to events, we write stuff, we get up, we do videos like this, but I digress. No. One thing I’m really impressed about at every Explore, and I think this is my 10th as an industry analyst… I came here when I was with those other companies a couple of times. I’m always impressed at the ecosystem activity around it. Whether it’s backup software and restore on the VMware platform, or you have chip makers as well who are adding an incredible amount of value. In fact, and we’ve joked about this on the show, you can’t run software on oxygen.
Daniel Newman: Or air.
Patrick Moorhead: Or air. You need semiconductors. We both thought semiconductors were cool before a lot of other people caught onto that.
Daniel Newman: Yeah, we definitely did. VMware does have a very impressive ecosystem. If you kind of look at where the cloud is heading and you look at what’s going on in the industry, AI obviously is in vogue. Of course, you need to be able to power those data centers and the ability to move data cloud to edge, edge to cloud, and of course the entire ecosystem of running our enterprise apps, Pat, the chip makers are always going to be near and dear to our heart. Of course, you and I love talking about this, so maybe we bring on our guest.
Patrick Moorhead: Yeah, let’s do that. I would say, I mean, these are just the facts folks. More VMware workloads operate on Intel than any other processor out there on the planet. With that, I’d like to introduce Chris from Intel. Chris, how you doing?
Chris Tobias: Doing great this morning. Thanks for having me.
Patrick Moorhead: Yeah, first time guest on The Six Five, not the first person from Intel on The Six Five. In fact, your fearless leader, Pat Gelsinger, and a lot of his SLT have been on The Six Five and continue to be, so it’s great to have you.
Chris Tobias: Absolutely. Love being here. Love talking about our partnership and our work with VMware.
Patrick Moorhead: I think you probably also think that semiconductors are cool too.
Chris Tobias: Absolutely. I’m second generation semiconductor by…
Patrick Moorhead: I love it.
Daniel Newman: Well, listen, let’s start off talking a little bit about the move of AI. There’s a lot of discussion about how it’s being moved from the traditional core CPU to accelerators. Intel, of course one of the largest semiconductor companies in the world, has to have a story right now around AI and how this migration is happening. How does this narrative fit the Intel story right now?
Chris Tobias: It fits great. It’s exactly how we’re investing. We’ve got a two-part investment. One part is on discrete accelerators. These are parts like our Gaudi 2 and our Max Series GPUs, which are discrete accelerators focused on AI. Then, generation over generation, we’ve been adding AI accelerators directly into the Xeon chip.
Patrick Moorhead: A lot of people don’t know, and again, I can just get back to stats and facts. More AI is done on a CPU than any other piece of silicon that’s out there. A lot of it has to do with the flexibility. A lot of it also has to do with those resources aren’t always moving, but there’s also certain workloads that need lower latency and a certain level. Is this the main reason that Intel puts AI accelerators into its Xeon processors?
Chris Tobias: Yeah, that’s a great question. The answer is yes, because if you look at how did VMware and Intel start working together? Virtualization, it gave flexibility in the workload.
Patrick Moorhead: I remember it was going to take down the entire market.
Chris Tobias: Exactly.
Patrick Moorhead: Then the market just went crazy.
Chris Tobias: Exactly. It created a virtuous cycle of growth because you had all this flexible compute. That’s the idea behind putting the accelerators readily accessible inside the virtual machine is you could flex between any workload. You don’t have any stranded resources by having purely dedicated resources that are focused on a certain area.
Daniel Newman: The whole idea of AI and the accelerator tends to be about matching workloads to silicon. As part of your strategy, Chris, I’d just love to hear how you tell that story. Which workloads is Intel focusing on as it pertains to the opportunities for acceleration?
Chris Tobias: Yeah, so when you look at some of these giant multimodal workloads, like what ChatGPT was created around, which rumor has it is pushing a trillion parameters, those require dedicated GPUs to crunch through that kind of work. Very expensive, very time-consuming, a lot of power involved. When you’re creating that kind of model, great, you need that kind of workhorse for it. Now let’s move over to corporations. If you look at actual corporate database size, most are under a terabyte. Most are actually about 500 gigabytes. Not that big. It’s not multimodal. You don’t have a bunch of 8K video you’re trying to correlate with last quarter’s sales, SKUs, et cetera. Putting that type of capability where the corporate data is just makes a lot of sense.
Patrick Moorhead: Yeah, it totally does. We’ve seen recommendation engines that don’t require a discrete GPU, even for inference, be very capable of doing this. I mean, there’s a reason when you go on your, pick your favorite e-commerce site, they’ve been doing AI on the CPU for pretty much forever, since day one. Technologically, as long as I’ve been tracking, gosh, Intel has been doing accelerators since the 486 DX; I’ll call the math coprocessor an accelerator. We saw MMX, SSE, flavors of that, AVX and then multiple variants of that. Can you talk about which accelerators you have in there today?
Chris Tobias: Yeah, so in our last generation we had DL Boost, which built on base technologies including AVX-512 and VNNI, and those were AI accelerators. What we put in this fourth generation Xeon Scalable is something called Advanced Matrix Extensions, AMX. It has a matrix multiply function with a big cache in front of it, and that’s exactly what’s required for generative AI; it’s perfect for generative AI inferencing. It’s even powerful enough to fine-tune models. You could have up to a 10 billion parameter model and easily fine-tune it on a Xeon cluster.
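For readers who want to ground this: the operation AMX accelerates is ordinary matrix multiplication, which dominates the dense layers of transformer inference. Here is a minimal NumPy sketch of such a layer. This is purely illustrative; it is not Intel’s API, and it uses float32 since NumPy has no native BF16 matmul like the one AMX tiles operate on.

```python
import numpy as np

# A transformer dense layer is dominated by one matrix multiply:
# activations (batch x in_features) times weights (in_features x out_features).
rng = np.random.default_rng(0)
batch, d_in, d_out = 4, 8, 3
x = rng.standard_normal((batch, d_in)).astype(np.float32)  # activations
w = rng.standard_normal((d_in, d_out)).astype(np.float32)  # learned weights
b = np.zeros(d_out, dtype=np.float32)                      # bias

y = x @ w + b          # the matmul an AMX tile unit would accelerate
y = np.maximum(y, 0)   # ReLU activation

print(y.shape)  # (4, 3)
```

Hardware like AMX speeds up exactly the `x @ w` step by multiplying small tiles of the matrices in low-precision (BF16/INT8) arithmetic, which is why inference-heavy workloads benefit without a discrete GPU.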
Patrick Moorhead: Well, that is really the key when it comes to corporations in particular. They’re not going to be training a 70 billion parameter model themselves. They might take a Llama, or choose almost anything on Hugging Face, pull that down and do mass prompting or grounding, I think the term is, which is to saturate that model with proprietary data, which increases its accuracy and precision and decreases the amount of skew it has, or when it makes stuff up, essentially. Is that what we’re talking about here?
Chris Tobias: Yeah, absolutely. If you go take, let’s say, Llama 2, we actually have this demo at the show here, a 7 billion parameter model. We’re showing it fine-tuned on Alpaca, a public data set, the finance one. Imagine you’re a financial company; you don’t want to put your data into the open source ecosystem. It’s highly protected, you want to keep your IP, it’s regulated. So we fine-tuned Llama 2 on a four-node Xeon cluster to focus on finance, so it would know things like IRR or NPV. A future or a forward means something different in finance than what ChatGPT might initially spit out. We did this with the 7 billion parameter model on a four-node Xeon cluster in three and a half hours. You could easily just flex your compute to that for the fine-tuning, and then you could just as easily run the inference and not strand resources, using those capabilities.
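The distinction Chris draws between training from scratch and fine-tuning can be illustrated with a deliberately tiny toy: start from “pretrained” weights learned on a general task and adapt them to a new domain with a few gradient steps. This is a simplified NumPy sketch of the idea only, not the actual Llama 2 / Alpaca workflow, and every number in it is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pretrained" weight from a general task: the model learned y = 2.0 * x.
w = 2.0

# Domain-specific data follows a slightly different rule: y = 2.5 * x.
# Fine-tuning only needs to close that small gap, not learn from scratch.
x = rng.standard_normal(64)
y = 2.5 * x

# Fine-tuning loop: a handful of gradient-descent steps on squared error,
# starting from the pretrained weight.
lr = 0.1
for _ in range(50):
    grad = 2 * np.mean((w * x - y) * x)  # d/dw of mean squared error
    w -= lr * grad

print(round(w, 2))  # converges close to 2.5
```

Because the starting point is already near the target, convergence takes far fewer steps and far less compute than training the original model, which is the same economics that let a 7 billion parameter fine-tune finish on a small CPU cluster.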
Patrick Moorhead: That’s probably the best example, aside from maybe a certain SaaS company that I’ve heard of, of using private data in a very narrow space. By the way, this is a first. Even analysts learn. This is the first time that I knew you could do that big of a model on Xeon. It’s impressive.
Daniel Newman: Yeah, I think the interesting thing is it’ll keep circling, and we’ll hear about big GPUs, big GPUs, big GPUs. It’s in vogue, but the truth is, like we’ve found so many times in history, the workloads will determine the silicon. Meaning in this early phase where, Chris, we’re trying to train these massive models, there is a need and there is a capacity. The runway for that is a little bit undetermined right now. As we see accelerators, CPUs, in some cases FPGAs, different technologies become applied to certain AI requirements, we’ll find that the mix will change and AI will revert to the silicon that fits.
I mean, I remember watching some of the stuff DL Boost did on SAP years ago and you were going… You have what you need right now for many of these kinds of acceleration requirements. I think a lot of people want it to be binary, Chris; they want it to be GPU or CPU. I think even other companies have shown that their most advanced technology is a GPU/CPU combination. Having said that though, people want the comparison. Talk about comparing Intel’s acceleration, the AMX strategy, to the GPU. How do you compare where one makes sense and where the other makes sense?
Chris Tobias: Yeah, so when you start getting into the double digits above 10 billion parameters, you’re up at 50 billion plus, you want to look at a dedicated GPU. Generative AI especially is the big thing in vogue right now. This is where our Gaudi 2 is very relevant. To do the initial training on those models, absolutely you need dedicated GPU resources, especially when you get multimodal data, where you’re trying to correlate pictures or video to a large language model. We’ve all seen that you enter the verbal prompts and you can go create pictures. That kind of stuff takes pretty heavy compute resources to create the initial model. But for fine-tuning, in a lot of cases, you see the number of parameters shrink very rapidly once you know and target the use case. A lot of enterprises, when they enter that segment, it starts to become about data security, governance, all those kinds of things. Their market segment has much more focused requirements.
Patrick Moorhead: I think that’s a really good way to look at it. Let’s shift to software. Over 30 years in this business, I think history is important. History doesn’t always define the future, but you need to really look at what’s changed, and one of the biggest things is software and its importance as it relates to semiconductors. Now, software was always important, and I think x86 and its compatibility showed that this was true. In the AI world and acceleration, it is a different game. You still need some sort of ISA to boot the system and run the application, but as we saw, your special instructions do require a special way to do software. Whether it’s ML, DL, or generative AI, it starts with, let’s say, a framework or a model. Can you talk about how Intel supports software and certain frameworks for AI?
Chris Tobias: Yeah, absolutely. One is we believe in an open ecosystem. We’ve been in compilers and libraries for a long time. As compute has become more heterogeneous, as you build out data center, hybrid cloud, multi-cloud, you can have all kinds of compute accelerators in that ecosystem. What we’re trying to build is one set of software tools that will give you the best performance for that heterogeneous compute ecosystem, starting at the compiler, through the libraries, and then we’ll optimize versions of PyTorch, TensorFlow, et cetera. Then on top of that, we work with the application companies. It’s the same with what we contribute to the open source communities, all the way through the software stack, up through the application layer. There are tools, and we have a huge investment in software, so that if you pick this suite of software, everything will run best.
Patrick Moorhead: You talked about frameworks, libraries, compilers. How about these new, it’s funny, large language models are for language. I like to refer to them more as foundational models where it’s text, it’s code, it’s pictures, it’s videos, it’s audio. Is Intel engaged with these large model makers as well?
Chris Tobias: Yeah, absolutely. We work with several companies that are generating those. The biggest repository of the models is Hugging Face. We have a very close relationship with Hugging Face. You can just type Intel Hugging Face and you can see, readily accessible, all the ready-to-go models to use.
Daniel Newman: Perfect. Good answer. All right, so we got to wrap up here in just a minute, but before we go, any other big VMware Explore focal areas for Intel?
Chris Tobias: Yeah. We’re very excited about our partnership with VMware because we’re just prolific across all the enterprises, and it’s really push-button access to turn on AI in your existing footprint. When you add fourth gen Xeon, it just becomes all the more powerful. We’ve worked extensively with VMware on that compatibility to make it easy for everyone to use and get the benefits.
Daniel Newman: I just want to thank you so much for joining us here at VMware Explore 2023 in Las Vegas. It’s great to talk to you. AI is red-hot and it’s good to kind of hear this whole continuum from you. Intel has a story and I think it’s really important that the market hears it. I know there’s been a lot of rotation to thinking it’s all about the GPU. Hopefully, here on the Six Five we provide a little bit of clarity to the market that there’s going to be a different subset of semiconductors that are going to be really important for all the different kinds of AI workloads. Chris, you did a really good job breaking it down today.
Chris Tobias: All right, thanks. Happy to be here.
Daniel Newman: All right everybody, hit that subscribe button. We have a number of videos right here at VMware Explore 2023, but of course, subscribe and join and watch all of The Six Five episodes. Patrick and I appreciate your support. For now, we got to say goodbye. We’ll see you all later.