Unlocking On-Prem Data for Next-Wave Gen AI – Six Five On the Road

By Patrick Moorhead - December 12, 2023

On this episode of The Six Five – On The Road, hosts Daniel Newman and Patrick Moorhead welcome Cloudera’s Senior Director of Product Management David Dichmann, during Cloudera Evolve NYC for a conversation on how Cloudera is helping customers unlock their on-prem data and take advantage of the next wave of generative AI.

Their discussion covers:

  • What the next wave of generative AI looks like from Cloudera’s perspective, particularly around Customer Experience
  • The state of perceived risk versus overall value when it comes to generative AI adoption for enterprises today
  • How businesses across industries can take advantage of this next wave of generative AI
  • What Cloudera is doing to help customers get the most out of their on-prem data

Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.


Patrick Moorhead: The Six Five is on the road in New York City here for Cloudera’s Evolve 2023 event. We are talking some of our favorite topics here. You know, you had to guess it was generative AI. But a lot of discussions here, Dan, I mean, and quite frankly, they’re not necessarily new discussions: to make any type of analytics or machine learning or generative AI work, you have to have your data in order.

Daniel Newman: Yeah, you really do. And we love the new tools. Everybody loves the icing, the cherry, it’s the sundae, but you know, all that stuff that goes in, you know...

Patrick Moorhead: Yeah.

Daniel Newman: You need the ice cream, you know. You need all the stuff that goes in the middle to make it good.

Patrick Moorhead: I’m starting to get hungry here. This is good.

Daniel Newman: I’m here for you. But it doesn’t happen without all of it. And there’s a lot of that here. You sometimes hear things like picks and axes or picks and shovels. Today’s been a lot about how we get that prepared so that we can build these generative apps that really enable businesses to be more productive, more efficient. And of course, driving futuristic CX.

Patrick Moorhead: One of the bigger conversations we’re having is, well, wait a second. If 75 to 90% of the data is still on-prem enterprise data, how are we going to activate generative AI out there? And there’s no better person, I think, than Cloudera’s David Dichmann. Great to see you.

David Dichmann: Great to see you guys again.

Patrick Moorhead: Yeah, thanks for coming on the show.

David Dichmann: My pleasure.

Patrick Moorhead: Yeah, no need to say welcome, first timer. Gosh, you’ve been on a bunch.

David Dichmann: Yeah, it’s been a few, yeah.

Patrick Moorhead: And your episodes always do great. And we’re looking forward to this conversation of essentially activating your on-prem data for generative AI. So thank you.

David Dichmann: No, my pleasure.

Daniel Newman: Yeah. Let’s start there. We love to opine. We like to tell stories and tales about what the future might look like, and sometimes we let our guests do that, don’t we?

Patrick Moorhead: Most of the time we try to.

Daniel Newman: We try, we try. So one of my favorite Stephen Covey lines from the Seven Habits was something like “you begin with the end in mind.” So let’s begin, David, with the end in mind here and say, what do you think this wave, this next wave of generative AI is going to look like?

David Dichmann: Sure. So when we think about generative AI, especially when we think where some of our customers have already put generative AI into productive services today, a lot of it’s based on general data and general models found in the cloud. A lot of the safe data, the data that pretty much doesn’t expose us or reveal us or put us in any form of risk. And that’s great. You’ve got these conversational chatbots. We’ve got these content generators we’re seeing used in marketing, video creation, photo creation, flyer creation, and even in health sciences, we see a lot of predictive medicine going on. But the next wave of generative AI moves us from this conversational experience to something more of a concierge experience where we reveal our private personal information into the generative AI in exchange for a higher value service. So being able to do things like have a detailed conversation with a particular customer and get them into something like a next best offer.

So here’s something I’ve offered you, and you personally would like this offer next, and I might not offer this to anyone else. Very unique. Or in the health sciences field, where I can take the collective understanding and intelligence of all the medical science and apply it perhaps to the most remote locations and treat that specific injury or that specific medical condition, knowing the current facilities available and the collective consciousness of all the medical science. And in order to do that, we have to give up what I like to call the concierge effect. I have to give up a little bit of my personal identity in a trusted relationship with a service provider to get that high value service back. So next wave generative AI is going to provide very personalized, very tailor-fit, high value to the business and to the business’s customers and patients.

Patrick Moorhead: So is this domain specific? At least the POCs that I’m seeing, the ones that are external at least, are very domain specific. But it seems to me though that what the future holds is crossing the lines, let’s say between ERP and CX, maybe hitting proprietary product data. There’s some companies that even talked about legal data that were internalized. What’s the state of that?

David Dichmann: Well, absolutely. So again, today’s generative AI, we have a lot of guardrails in place because we’re still getting the perceived overall value versus the perceived risk. When the value is greater than the risk, we’ll take the leap. And so today we’re putting those guardrails on to constrain that risk, and domain specific is a good way to go. We think of autonomous vehicles. If you’re driving on a closed track, you reduce dramatically the number of parameters that autonomous vehicle has to take into account versus cars, the driverless taxis out on the street. So the more domain specific and constrained we can be,

Patrick Moorhead: Right.

David Dichmann: The happier we are,

Patrick Moorhead: Mm-hmm.

David Dichmann: But we also lose that precise value, that differentiated value. Imagine being able to deliver your customers a much more tailored fit experience. Amazon’s a great example. When you go to Amazon, you get those predictions, right?

Patrick Moorhead: Sure.

David Dichmann: It’s, these people who bought these things probably like those. But I like to think for a second, how many false positives do you still get on that?

Patrick Moorhead: Right.

David Dichmann: Quite a few still. Now that doesn’t mean that they’re not decent recommendations for people like me with a similar shopping habit, but imagine instead of going to Amazon.com, I go to Dave.Amazon.com where only what I want is ever presented to me. It’s already figured out who I am. I’ve given a lot more information about myself to this environment and they’ve learned from everyone else what I would like.

Daniel Newman: Let me double click on that though.

David Dichmann: Yeah.

Daniel Newman: Because it’s a fascinating thought.

David Dichmann: Right.

Daniel Newman: But I would also say AI as we know it, the pre-generative AI has actually been doing this to some extent.

David Dichmann: For sure.

Daniel Newman: When you turn on Netflix, you get a pretty good set of things that you’re most likely to want to watch based on behavior.

David Dichmann: Absolutely.

Daniel Newman: When you go to Amazon, you’re getting filtration, recommender engines.

David Dichmann: Mm-hmm.

Daniel Newman: They are pretty good. So just do it for the audience.

David Dichmann: Yeah.

Daniel Newman: And I’ll pretend like I already know the answer, but what’s the difference? Meaning with generative, how is this actually making that better? Because I think people have actually become really accustomed to it. Whether you shop, or even like Twitter, everything: I expect my feed to be tailored to me.

David Dichmann: Absolutely.

Patrick Moorhead: So what does generative change?

David Dichmann: So it does a couple of things. The first thing it does is, and for analytical AI, a lot of what we’re talking about with our customers was enterprise AI, being able to make those predictions. We’ve gotten pretty good at that. But imagine that we can take the prediction and go to the next level. So one of our customers is listening to the sounds that turbines make to determine when the pitches change, when a turbine is ready for service. So you get the signal, this one’s ready for service. Now what do I do about that? Now imagine I include my financial data. I include all of the historical records about not just this piece of equipment, but the maintenance records of all the equipment. I even include some material sciences in there to determine the wear and tear on the metals and other materials.

And then I make a recommendation. Given this piece of equipment’s current state and our known idea, we recommend replacement versus repair at this point. And I can go a step further: generative AI can also then order the work team, order the part, prepare the materials, schedule everything to be done. So all you have to do is show up at work and say, I am going to replace your turbine tomorrow because I’ve figured it all out. So we’re adding additional value by putting things like sensitive financial information into the equation, deep histories of unstructured data. And here’s the other piece of the puzzle: there’s gold in those hills, that unstructured data.

Patrick Moorhead: Yeah.

David Dichmann: But the mining tool to get that out used to be a thousand people taking a thousand years to figure it all out. Looking at x-rays, blueprints, handwritten documents, stuff that’s really hard to decipher meaning from. Gen AI can be trained on that. And that becomes yet another input to gen AI as well. So now we have better mining equipment to turn the hardest to get at gold into valuable business data. And that gives us even deeper insight. So what we’re getting today is things that are pretty good, right?

Patrick Moorhead: Mm-hmm.

David Dichmann: You’re getting those nice recommendations, but imagine instead of Netflix saying, here’s some movies you might like, before you even get home, it’s queued up the movie you would want to watch and you didn’t even know it yet. So being able to be more prescriptive, more recommendations, and even more automatic actions all come from generative AI.

Patrick Moorhead: So we’re progressing as an industry. We’ve gone from data is the new oil to data is the new gold. I caught that. That was good.

David Dichmann: Can we electrify it?

Patrick Moorhead: Exactly. There we go.

David Dichmann: Absolutely.

Patrick Moorhead: And we saw, we talked about some examples about a big retailer, but also, which I think is the enterprise question, is, hey, with all this data that I have, I’m a manufacturer, I’m an energy company, I’m a pharmaceutical company and I’m a financial institution,

David Dichmann: Yeah.

Patrick Moorhead: What do I have to do to take advantage of it? I mean, quite frankly, we’re not going to take data that we haven’t uploaded into the cloud in 15 years and automagically start doing it. Oh, by the way, co-mingling all that data.

David Dichmann: Right.

Patrick Moorhead: It’s just not going to happen.

David Dichmann: Right.

Patrick Moorhead: So how do they take advantage of it?

David Dichmann: So what we’ve seen is, and we’ll use the gold analogy, we think of things like the lakehouse. A lakehouse is not just a better way to organize and store structured and unstructured data together. That was its original premise, but it’s also a set of services that sit on top of that to do the data preparation, to do the data warehousing analytics, to do data flow and streaming for real-time capture, to do ML on that data. And data lakehouses have been born in the cloud, on cloud data. So the problem is that on-premises data is big and has deep histories. We have one company we’re working with that has 150 years of financial records. They have financial records that predate computers. They’ve digitized this, but getting meaning from it is an unstructured data problem.

So the first thing you have to do is bring lakehouse technology and apply it on-prem because, exactly to your point, I want that trust on that data. I don’t want to whisper my secrets to the public. So I’m going to bring the technology we’re already using in public cloud to mine the best gold out of the data that’s born there and bring it into the enclosure of on-premises and do the mining there as well. But we recognize that that data and the cloud data, and the systems and ML that serve them both, the gen AI that’ll serve them both, need to come together. So if we can extend that bubble of trust into a hybrid environment that covers every cloud, we can now have that on-premises data be mined appropriately to participate and give us that rich personal context. What’s in there? That’s my customers’ and my patients’ PII, HIPAA data, my trade secrets, my blueprints, stuff that I want to keep safe and secure. I want to use it. I’m scared to use it, but if I can create a trust bubble around all of that, I’ll be glad to use it and get us to that much higher-value generative AI.

Daniel Newman: It’s sort of a parallel use case to how certain workloads, whether that’s been confidential computing, of course we know Nitro Enclaves. And what I’m saying is in the gen AI era, there’s an on-prem to cloud sort of trust. And I’ll use vector as a great example. And also vector databases, because it’s put together, creates a lot of opportunity. But there’s this trust that needs to be created, and it’s why a lot of things stay on-prem. Why most of the data’s still on-prem. So we’ve spent a lot of time talking about what could happen and how it can work and even why you might want to do it. But let’s talk about one more thing before we let you go. And by the way, thanks so much for joining us.

David Dichmann: My pleasure.

Daniel Newman: How are you talking to customers about getting this in motion? I mean, I think we have a lot of sort of stuckness, is that a word, stuckness in terms of companies are like, yeah, we want to do this and we’re using enterprise search on Google Cloud, or we are playing with Bedrock, but really getting going.

David Dichmann: Well, absolutely, and I think that the analogy used at the very beginning with Stephen Covey saying, start with the end and bring it back to the beginning. If you have a vision in mind and know that getting to high value next wave generative AI is going to be an iterative process. And we were talking a little bit earlier about the crawl, walk, run, and you’ve got to scuff your knees a little bit before you can really make it go. But knowing where this is going to apply business value, understanding how this can work when you train it on your most sensitive data, to understand that that is going to pay massive dividends, get that value system higher than the perceived risk, we can then encourage customers to take some of those first steps. Like bringing in the chatbot. OCBC Singapore is using Cloudera already to do things like a chatbot to reduce pressure on customer service and support organizations.

They’re using the code generator to improve R&D behavior, and they’ve already started using gen AI specifically for a next best offer program that’s already netted them two times conversion rate on prospects. So they’re already getting small bites that have a big impact on their business. And now the next step is to continue to feed and grow and trust that data to feed and grow these into the next wave, knowing where ultimately that’s going to take you. So start at the end, work to the beginning and take those first baby steps that immediately pay you back as you move towards the future that you want.

Daniel Newman: Well David, I want to thank you so much for joining us here on The Six Five. Always appreciate you coming on. I’m sure this isn’t the last time.

David Dichmann: Absolutely.

Daniel Newman: But best to you. Congratulations on a great Evolve event.

David Dichmann: Thank you.

Daniel Newman: And we’ll talk soon.

David Dichmann: Thank you very much.

Daniel Newman: Thanks. All right, everybody, tune in to all the episodes of Cloudera Evolve 2023 here in New York City for The Six Five On The Road. Hit that subscribe button, be part of our community. We appreciate it. But for this episode, for Patrick Moorhead and myself, we’re signing off. See you soon.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences with the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and in “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience including 15 years of executive experience at high tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.