The Uniqueness of Intel’s Developer Cloud – Six Five On the Road

By Patrick Moorhead - September 21, 2023

On this episode of The Six Five – On The Road, sponsored by Intel, hosts Daniel Newman and Patrick Moorhead welcome Intel’s Markus Flierl, Corporate VP, Intel Developer Cloud, SiteMana Co-founder Peter Ma, and Prediction Guard Founder Dan Whitenack, for a conversation on what makes Intel Developer Cloud unique during Intel Innovation in San Jose, California.

Their discussion covers:

  • An overview of the recently launched Intel Developer Cloud and the implications for developers
  • How developers are harnessing a spectrum of Intel technologies via the Intel Developer Cloud
  • Real-world applications: SiteMana and Prediction Guard’s utilization of the evolving Intel Developer Cloud
  • Measurable outcomes and onboarding guidance for developers engaging with Intel Developer Cloud

Be sure to subscribe to The Six Five Webcast, so you never miss an episode.

Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript:

Patrick Moorhead: Hi. And we are back at Intel Innovation 2023 and The Six Five is on the road. We have an incredible amount of coverage. We’re covering new technologies all the way from the client to the Hyperscaler Data Center, and pretty much everything in between. AI has been a huge focus here. But, Dan, this is not the Vision conference, this is the Innovation conference, which is all about developers, and it’s exciting to see all this developer content.

Daniel Newman: Yeah, it’s been a good couple of days here, Pat. The Innovation Conference brings together the technology arc with the developer community. And as we know, none of this technology is going to run without the developers building the software that sits on top of the silicon. It’s a very important ecosystem. And although we like to sometimes weigh where the importance lies, this is really the amalgamation of all those important factors that come together to drive an AI future.

And so it’s been a lot of fun to kind of hear the roadmap, see Pat doing his pushups, but also pushing the envelope on getting these five process nodes out in four years and really building an AI future and a story for Intel in the AI realm.

Patrick Moorhead: It’s funny, a lot of people like to see these big differences between silicon and CRM. But one thing that’s important to all developers is simplicity and accessibility, and everybody’s trying to shave time off getting their products to market. It just makes sense. And one of the biggest announcements was that the Intel Developer Cloud went GA, with multiple types of silicon, multiple types of tools, and everything from free versions to paid versions.

And we always like to say that partners and customers are the grand purifier, and we not only have the head of Intel Developer Cloud, but two partners here. Markus, Peter, Daniel, welcome to The Six Five for the first time. Thank you for coming on the show and congratulations on the big announcements.

Dan Whitenack: Thanks.

Markus Flierl: Thank you so much.

Daniel Newman: Yeah, we do like to get that kind of ground truth. So we’ll start with Markus because I want to hear the Intel point of view.

Patrick Moorhead: Then compare notes.

Daniel Newman: But then we’ll have Markus put on the earmuffs and then we’re going to ask. But in all seriousness, Markus, everybody’s very interested in what Intel is doing with the Intel Developer Cloud. Give us that high-level, what’s the sauce, what’s the excitement? Why is everybody still focused on this announcement?

Markus Flierl: Yeah, great question. Yeah, so I think the important one is we want to really cater to the developers top to bottom. We are making major investments in all parts of the stack, we are building out fabs, we are investing in process technologies, but very aggressively also investing in the software. And the best way to enable developers, it’s the 21st century, is obviously to provide this as a cloud service. And that’s what we’ve been focused on and we’ve seen two examples of people who are able to take advantage of that very quickly.

A lot of people are just cloud-native, and they want access to the latest and greatest. And the only way they can get access to that is as a cloud service. And that’s what my team and I provide as a cloud service, which means that developers can come in and they can come in at different layers of the stack.

Somebody wants to go down, they want to go down to the bare-metal layer, they want to directly optimize things at the very low levels. Or other people come in, as you said, the CRM, not quite the CRM level, but at the MLOps higher level, and they just want to bring in the containers and they just want to run their workload. And then, of course, everything else in between. That’s what we’re really targeting with the developer cloud is make it as easy.

Roll out the red carpet for the developers, have them come in, bring in their workloads, get the job done, and then they also have, as you mentioned, the paid option. And so they can actually then run some of the production also in that environment. But the first and foremost important one is that they can test out and take advantage of the latest and greatest technology. And especially in the context of AI, it’s not just about just putting in an individual box and that’s it, I’m done. In a lot of cases, it means they need larger clusters.

In some cases, they need hundreds of Gaudi cards or GPU Max cards in the same cluster with high-end storage, high-end networking. We can provide them everything from just a low-end VM, all the way to these large clusters all in one place. And I think that’s where we can help speed things up because that means that if they want to go to a major CSP later on, if they want to go build stuff on-prem, they can really test drive things in developer cloud.

And not just the products that are out on the market today; they already have Emerald Rapids, Fifth Generation Xeon, in the developer cloud today. So if developers come in, they can test drive it. There are restrictions, since we have limited quantities, but they can come in and they can test drive that next-generation hardware with the latest software. They can bring in their software, they can optimize it. By the time it is available at the major CSPs, they can instantly hit the ground running.

So it’s a great way for us to expose our technology early on and then get the feedback from the developers, hear what’s working well, what’s not working well. We expect the performance should be this, they’re only getting 90% of the performance. Where is that performance delta coming from? Maybe we have to go back, fix our part of the software, we can help them fix things on their end. It’s a great way for us to collaborate with the ecosystem.

Patrick Moorhead: Got a big smile on my face when I saw the slide, the one slide, everything’s on it and it shows CPUs, GPUs, FPGAs, Gaudi, and like you said, not only today but also tomorrow. It’s funny people talk about the circle of life. It’s the circle of silicon. On one end, ultimate programmability. And the other end, you have ultimate efficiency and then everything in between. I thought that was really… It’s very unique out there.

So like every great developer tool, every piece of software that has ever been on the planet, there are multiple stages to get it out. You announced a beta previously and at the show, the big news was it was going GA. What were some of the learnings that you made or that you saw in between beta and going GA? Or what optimizations did you make aside from what you would expect, like bugs? Bug fixes are always there, but any major things that changed?

Markus Flierl: At the most basic level, when we first launched last year, we had a huge supply chain problem and all we had was one gig switches. That’s how we strung things together. Since then, we’ve dramatically upgraded. We have 100 gig switches all across the board. The other major shift in focus has been on AI, on generative AI specifically. And that’s where we’ve dramatically increased the number of accelerators that we put in the developer cloud, because that’s where we see a huge amount of demand.

And then the other thing we’re seeing is that as we work with these communities, we are making changes all across the stack and improving things. The workloads are also changing so quickly that the cluster sizes we’re building out are much larger than what we initially anticipated. That’s probably the biggest change I think that we are learning from. It’s hard to predict what’s happening three months from now.

Patrick Moorhead: How often do you do code drops and code changes?

Markus Flierl: It’s continuous. It’s a cloud service. So we’re constantly updating firmware. We have a full validation pipeline, obviously, where we have synthetic workloads running. We make sure that things work well. And then we have a phased deployment out into the cloud, but the changes are flowing out continuously. Because as you can imagine, it’s a cloud service, so there’s many, many different pieces, everything from firmware updates, to OS updates, to network OS changing, storage versions changing. It’s a living organism.

Daniel Newman: A quick adder though to the going GA, are you seeing any material changes in your developer community itself or the developers evolving year to year as you’re seeing this product come to life?

Markus Flierl: Very much so. Very much so. When we first started, it was just bare-metal; mostly that was the focus. And what we’re seeing now is we’re seeing more and more people coming in at higher levels of the stack, and that’s where we’re just going to continue building out higher and higher levels of abstractions. We are going to be introducing a Kubernetes service in the next couple of months as well as additional MLOps services.

So there’s some people really care about the bare-metal, but a lot of people just want to go and get the jobs done. And so that’s where we’re rapidly moving up the stack and providing high-level services for people. So it’s more of a turnkey solution. I’m not the OS expert, I want to come in and test something. I don’t want to bring in my containers, I just want to bring in my models and I just want to run things.

One of the other things we’re doing, for instance, for large language models, we just gave a demo earlier today in the demo grounds where we are also providing a large language model builder environment, meaning that I’m not an expert in building something, but I have a nice UI fully abstracted and I can go build stuff on top of it. So that’s the direction that we are going.

Daniel Newman: Cool. We should give Markus a break.

Patrick Moorhead: No, I think we should. We’ve heard all about the Intel Developer Cloud from Intel. Now let’s-

Markus Flierl: Let’s crosscheck.

Patrick Moorhead: … let’s crosscheck these notes here. Maybe, Peter, start with you with SiteMana, what are you doing in the Intel Developer Cloud?

Peter Ma: So at SiteMana, we essentially do AI prediction and then we do the email generation. We basically use the Intel Developer Cloud to actually generate those emails. One of the things we talked about earlier was like, I’m using one of those 4X, the A1100. So it was like I’m not using the most expensive one because I have to look at long-term unit economics that actually make sense for me.

Right now, you kind of see emails cranking out every 20 seconds or so, and that’s okay because these are not mission-critical. The user does not have to interact with it. This is where you can see the GenAI working in the background, where the machine actually automates, rather than me going in there, like how you interact with most of the LLMs these days. Overall experience, that’s been pretty great.

The most important part, again, unit economics for us. Originally, you have these cloud credits. And then when you burn them up, you buy your GPUs, you run these in your home. And then you essentially use your production service, hitting your home server, generating the stuff and then hitting it back because then you’re like at some point, “Oh, man. I don’t want to run this stuff in my house.” You’re like, “How do I get a data server running?” And then you’re like, “Okay, these GPUs off the market, you have specific licenses, you’re not allowed to put them in a server in a data center.”

You’re like, “Okay, am I really breaking the license? Because I’m not really reselling the GPU, I’m just using some of my own.” And this is why you need a service, which is commercially… You know when people order these really expensive GPU servers, I’m always wondering like, “How are you affording this? How are you…” Because I have to make sure that makes sense for me first because every penny I have to pay for that, that’s one penny out of my pocket. And then Intel Developer Cloud does produce… It hits that pretty good balance in terms of affordability and the profitability.

Patrick Moorhead: That sounds good to me. And it sounds like flexibility when it comes to licensing and all the licenses of what you might use instead.

Peter Ma: Yep.

Patrick Moorhead: That’s good.

Daniel Newman: Scale is important. And I like that he used the term unit economics. When you’re building it in startup mode, you have to be thinking about every expense and every dollar is ideally going towards that customer acquisition.

Patrick Moorhead: We know something about that. I don’t know, our tiny little companies, Dan.

Dan Whitenack: We’ve done it.

Patrick Moorhead: Done this before. Yeah.

Daniel Newman: We’ve done it. We’ve built it. We’ve helped others. Yep, for sure. Dan, guy with a great name. So with Prediction Guard, we’d like to hear similarly your experience working with the Intel Developer Cloud. Talk a little bit about how you arrived and how that’s proliferating for you.

Dan Whitenack: So Prediction Guard provides safe and trustworthy access to LLMs, which means that we host our own LLMs and do custom inference and decoding out of those LLMs to control for things like structure, factuality checking of the output, and toxicity checking of the output. Which means that we need to run a lot of LLMs, in particular the latest Llama 2 models and MPT, Falcon, these private open models that people want to use for real enterprise use cases where they have concerns maybe about privacy or other things of using just a closed API service for their AI interactions.

Patrick Moorhead: By the way, every SaaS conference we’ve been to, whether it’s consumer or enterprise, has the magic layer. And sounds like you’ve got the magic layer that turns a homogeneous experience into one that’s safer and a little bit more predictable.

Dan Whitenack: Correct. Yeah, yeah. So it’s that stage between like, “Oh, I’ve prototyped something cool with OpenAI,” to like, “How do I scale this out in my business?” That’s right where we live. So in terms of Intel Developer Cloud, we need to host these models. We also need to do this in a very efficient way because a lot of our users are really enterprise use cases where, for example, they’re parsing 7,000 medical transcriptions per day to fill out medical forms.

Or they’re processing 1 million patent applications to generate content for lawyers or something like that. So you can’t do that with these per-token usage models that are out there. So we need to host our own models, and provide that in a way that our customers can access them and put a volume of calls into those. So what we’ve done is actually, we’ve used the 8x Gaudi2 instance in Intel Developer Cloud. And currently, we have seven of our LLM models that we support, hosted on Gaudi2. So mainly Llama 2 models, but also other code generation models like WizardCoder.

And we’re actually running those in production in our product. So we’ve got live customers hitting the models in Intel Developer Cloud on Gaudi2 to do these safe and reliable interactions with LLMs. And what we found is we had our models running on cloud A100s. Not only is there a shortage of those and various problems with that, but they are expensive, especially if you want to keep them up all the time.

So for us, it was really a game changer to be able to move these models over, host them, have them up all the time, and with really great tooling from Hugging Face and Intel working together on the Optimum Library, and other things with Habana Labs, we were able to port over our model servers. We had them up and running in an hour after getting access to-

Patrick Moorhead: An hour?

Dan Whitenack: Yeah.

Patrick Moorhead: My follow-up is going to be how long and how hard. That blows me away. One hour.

Dan Whitenack: Yeah. And not only in that hour, but we saw at least some of the models are slightly faster, but some of them, like the Llama 2 models, we’re getting twice the throughput out of the Gaudi2s as compared to what we were running on the A100s.

Daniel Newman: I think that’s the highlight reel moment.

Patrick Moorhead: No, I know that’s the super clip right there. No, listen-

Markus Flierl: We should expect that. Just to clarify. That’s what we’re seeing. There’s a bunch of Hugging Face blogs also. And one of the most recent blogs is talking about how we are beating the H100 also by-

Dan Whitenack: Yeah, yeah, I saw that.

Daniel Newman: I think you and I covered that on one of our Friday shows.

Patrick Moorhead: We did. Yeah, we hit that on The Six Five podcast, Friday. So, Markus, Peter and Dan are obviously in here. And-

Markus Flierl: Delighted to have them.

Patrick Moorhead: Delighted, but for the 5 million other programmers out there that you might want to be attracting, how does somebody get started on this? How do they get started on the Intel Developer Cloud?

Markus Flierl: Very easy. You go to cloud.intel.com. It’s all self-service. You register for the service and off you go. We have the different tiers. You also can apply for cloud credits and you just get started and start using the service.

Patrick Moorhead: Are there any parameters of certain size, or does it have to be a company? Can it be a university or any restrictions on that?

Markus Flierl: We don’t have any restrictions. It’s really open to anybody.

Patrick Moorhead: What’s the advantage for getting in early now that the Intel Developer Cloud is GA?

Markus Flierl: We have an early adopter program and the first 100 people who sign up for it will get super steep discounts. So don’t be late for that.

Patrick Moorhead: Gosh, let’s get the smartphones out, Dan. We’re on this.

Daniel Newman: Already building.

Patrick Moorhead: I love it.

Daniel Newman: Well, that’s very exciting, Markus. And we appreciate you giving us a little bit. For everyone out there, listen, there you go. Check out that link in the show notes. Go give it a shot if you’re a developer. But, Markus, appreciate you giving us the background from the Intel perspective. Dan, Peter, thank you both very much for sharing a little bit about your companies and how you’re using the Intel Developer Cloud. I look forward to having all of these gentlemen back maybe next year.

Patrick Moorhead: Absolutely. And I’m really impressed that there’s actual production workloads being done. I hear developer, I think it’s for development and then you move it somewhere else. But it’s impressive to think that real live production workloads are-

Daniel Newman: You got to run businesses. Print cash.

Patrick Moorhead: Exactly. And what a benefit of not having to move it.

Peter Ma: Next year, we’re going to basically 100x the revenue per inference.

Patrick Moorhead: Oh, wow. You definitely have to come on next year. This would be great. This is a to-be-continued here, folks. But really appreciate that.

Daniel Newman: Remind me to talk founder shares.

Patrick Moorhead: Exactly.

Daniel Newman: All right. Thanks, everybody, for tuning into this episode of The Six Five at Intel Innovation 2023 in San Jose. That was a great conversation. But for now, for Patrick and myself, we got to say goodbye. But we’ll see you back really soon.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.