On this episode of The Six Five – Insider Edition, hosts Daniel Newman and Patrick Moorhead welcome Ram Venkatesh, CTO at Cloudera.
Their discussion covers:
- Why adaptability is crucial for cloud deployments
- The data-driven nature of Cloudera’s customer base
- The implications of transitioning to the hybrid multi-cloud
- How data optimization affects customers and their security
- How data storage capabilities can be most efficiently utilized
It’s a stimulating conversation, and one you won’t want to miss.
Be sure to subscribe to The Six Five Webcast so you never miss an episode.
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.
Patrick Moorhead: Hi, this is Pat Moorhead and we are here for another Six Five podcast and this is an Insider Edition where we have the most influential executives from the most impactful companies on the show to chat with us. Daniel Newman, how are you my friend? You are an amazing host and co-host.
Daniel Newman: Well, thanks. It’s always good to begin with a little bit of flattery. Excited to be back. The insider editions are some of my favorite, Pat, and it sounds like something I may say all the time, but we’re just really lucky that we get to do what we do. We get to talk to some really thoughtful and intelligent people about solving technological and business challenges. And today, we’re going to do that again.
Patrick Moorhead: I know. I’m super excited. We have Ram from Cloudera, the CTO of Cloudera, no less.
Daniel Newman: The CTO of Cloudera.
Patrick Moorhead: Yes. And we are here to talk about our favorite topic, Dan. Man, we just cannot get enough of the hybrid multi-cloud. Ram, welcome to the show.
Ram Venkatesh: Thank you. Thanks, Pat. Hello, Dan. It’s nice to be back on the show. Yeah, I know usually you talk to smart people. Today, you got me. So let’s see what we do.
Patrick Moorhead: I don’t know. You’ve got the T in your name, and that instantly gets you a lot of credibility out there. So we’re going to have to speak in layperson’s terms here, but we are talking to an audience who cares about data, and as you know from working with a lot of customers, the hybrid multi-cloud is on their mind and they’re all trying to figure this out. And I think we’re starting to see the connection between… Well, first of all, nobody is questioning that there’s the hybrid cloud. There are some cloud companies who won’t use the word multi-cloud, but the fact is that 95% of all Fortune 500 customers are using more than one IaaS service. So multi-cloud is here, whether we all want to embrace it or not.
Ram Venkatesh: True. What a difference a year makes.
Patrick Moorhead: I know, right?
Daniel Newman: Yeah. It has been a pretty crazy year for that, and I think at this point just about everyone is at least utilizing it in some way in what I like to call their mark-itecture, meaning how they’re communicating it. But I would agree there seems to be this continuum of how much we are admitting. But Pat, you and I have no problem admitting when we get things right, and we got this one right: that multi-cloud would be the winning architecture. It is the winning architecture now, the multi part of hybrid, because that’s really what it is. Hybrid is the decision to use on-prem and public. But then the idea of using on-prem with multiple publics, or multiple different privates and publics, that’s, like I said, where the continuum comes into play.
And Ram, I just wanted to also point out that you were really gracious when you let me say that you’re still a CTO. For those out there, we have a green room here. Occasionally there’s a little fun thrown around here and Ram threw that out and I said, I would find a way to insert that he’s still the CTO of Cloudera, but let’s kick it off here and let’s talk a little bit about data management, Ram, and let’s talk about it through the customer lens. Cloudera is a company that really does serve a large swath of the world’s largest enterprises and they’re your customers. Talk about how you broadly characterize what makes up this subset of customers for Cloudera.
Ram Venkatesh: Yeah. Thank you. I think it’s a great place to start, with our customers, and the simplest way to put it is that Cloudera has been very fortunate to work with some of the largest enterprises in the world when it comes to data. These are folks who are well along on their journey to be a data-driven enterprise. They think of data as the DNA of their very existence. They view the data estates that they have as probably some of the most valuable parts of their business. We don’t have to explain the value of data to them. And if you ask any of our customers across the board, they’ll tell you that over the next 18 to 24 months, they probably expect to double the amount of data that they have under management. So data’s not going out of style for any of those folks.
Now the other side of it is, they live in the real world, very squarely so, and sometimes, I think, even more than vendors and service providers do. So I think that’s where this notion of hybrid multi-cloud for our customer base is just acknowledging the reality of the optionality that they need to operate with on a daily basis. There’s a lot in there. The vast majority of our customers are on premise today and they are in public cloud today. There’s only a small subset who might be exclusively public cloud or exclusively on premise. So that’s where the customer base is at. For them, this is a matter of how do we do hybrid right, so that we can actually take advantage of the capabilities of the various platforms where they are. That’s what is on their mind. It’s not a question of if, but a question of how do we do this right? How do we get this right so that hybrid actually is a force multiplier for us, as opposed to being a drag in terms of complexity and cost and things like that?
Patrick Moorhead: Yeah. I like the way you characterize your customers, in fact, these folks are leaders in a data world and before people were coining data is the new oil, they were actually doing it, right?
Ram Venkatesh: Oh, yeah. Oh, yeah.
Patrick Moorhead: Yeah. I jumped on the bandwagon. And y’all have petabytes of data under management while a lot of folks are just talking about it. So we’ve been slinging around the words hybrid and multi-cloud. I like to say hybrid multi-cloud. We all have a different way of saying it; we almost need a standard way of saying it. But let’s talk about the benefit. As analysts, we love to talk about the buzzwords, but how are your customers articulating the benefit of the hybrid multi-cloud? What are they asking you for related to that? What are the benefits they expect to get from it?
Ram Venkatesh: Yeah. I think one small correction, I think petabytes are cool, but exabytes are where it’s at.
Patrick Moorhead: Yeah. Just another reason we have you here, Ram. Set us straight on the size of this opportunity here.
Ram Venkatesh: Oh, believe me, we ourselves, constantly, when we see customers talk about the amount of data that they have and how much they expect things to grow, I’m amazed. Having started this journey in data 30 years ago, I think it’s just amazing to see how this continues to grow and this appetite for more volume, more velocity, more kinds of data. There’s just no end in sight to that. And that’s great for us as an industry.
So if you think of what customers are seeking from hybrid, I think a key part of it is public cloud’s really important in their strategy today, right? Because public cloud is where I think, experimentation happens with data. One of the interesting things about building out data use cases for our customers, unlike building conventional software, is there is that experimentation piece to it, right? This is the science in data, is that you got to set up a hypothesis. You bring the data sets together. Sometimes it may not work out and sometimes it works out really well, and then you got to scale this and deploy it out in the next three weeks.
So in that mindset of being able to experiment with confidence, public cloud is a key part of where they want to do that today, because especially given the supply chain concerns around the pandemic, it was really hard to get hardware for a while, especially if you’re not even sure if the use case is going to go forward, try justifying that to your business. So I think if you want to experiment and innovate and think about bringing new use cases to production quickly, I think public cloud is a big component of that strategy. But they also realize that unit economics are crucial for them. So if they do instrument a use case, they want to make sure that they can operate it with the budgetary constraints that they live under, especially now, given the economic uncertainty, the budget’s tight and so people really want to make sure that they’re getting the most value for the amount of money that they’re spending.
So if you have a use case that has a particular consumption footprint to it, you want to run it… If it makes sense to run in the public cloud, that’s where you want to be. But if it really makes sense for that to be run and operationalized on premise, they want the optionality to go do that. So I think this is the lens through which customers see hybrid: it has to be optimized for both of these environments. It’s definitely not a lowest common denominator. This is why I joke that the vendor view of hybrid, regardless of who you talk to, is “our stuff everywhere.” That’s really not what customers are asking for. What customers are asking for is to be able to operate efficiently where they choose to operate.
Patrick Moorhead: And the great news about all this is we’re finally getting there. I think 10 years ago, we all knew where we were going to end up, but maybe we were afraid to bring it up. I mean, listen, I used to work for a company that was a supplier to the largest public cloud out there, so I’m not a cloud denier, a public cloud denier. But I also know that what we’ve seen over the last 50 years of IT is that people want choice. And you also have to abide by the laws of physics and understand that moving big amounts of data, not just petabytes but exabytes, costs money. One of the bigger talked-about things in my circles is the cost of egress and the amount of data being moved. And most of the data is created at the edge, and the cost to even move that data is pretty excessive.
And I think what we’ve seen in the history of IT is that typically you have the data and the processing of that data in a similar place. So data has gravity, and you’d better have a good reason to ship it to another place. But also, the price performance of compute at the edge is getting so much better over time. And then when you add on top the as-a-service capabilities that some of the on-prem vendors have added, which give you scalability and only charge for what you use, you kind of have your cake and eat it too. We’re not all the way there, Ram, but man, we are getting so close. And with packages like CDP that you’re bringing out, which I know we’re going to talk about later, you guys are making it a reality too.
Ram Venkatesh: Absolutely. So I think that unit economics matter. You talked a little bit about data as the new oil. Many of these customers realize that, just like with oil, the threat of a spill or a leak, or a breach, is top of mind for them. So one of the pieces we have not spoken about yet is that there are all these top-line implications of data, but you want to be confident that wherever you are, your data is secured properly and governed properly. Many of them are under tremendous regulatory scrutiny. There are new data localization and data sovereignty rules all over the world. And being able to operate confidently in this kind of a global posture is what enterprises are looking to us for help on. It’s like, how do we do this consistently, so that we’re able to say yes to the business, but in a way where we’re not taking on additional risk?
Daniel Newman: Yeah. Ram, I’m glad you brought some of that to the forefront, because something that was crossing my mind as you were discussing that was sort of the… I know ESG has been a bit of a topic that gets some criticism when it’s not done in a way that’s measurable, especially in what I would call a tougher macro. But there are three important considerations when it comes to your data placement and your architecture that I like to talk about, and those are compliance, sustainability and security. Right now, every company has a fiduciary, long-term responsibility to its shareholders and its stakeholders to be, A, making sure the data’s safe and that it meets all of the regulatory and compliance requirements, and B, making sure that it is in compliance with data rules, localization and laws. And then, like I said, the third part of it, and this is still a little bit more in debate, is how do you build an architecture that is best for your sustainability goals and what your shareholders want?
And so that’s got to be a part of the conversation. And I’ll push back and say that a good architecture also makes these things, something like sustainability, measurable, because I, for one, believe that the world doesn’t only need to hear about the targets that companies have; the world wants to see them metrically delivered through analytics and data, and that’s where architecture matters. So I maybe hinted directionally toward some use cases, but I also wanted to pivot the conversation a little bit away from hybrid cloud architecture and more toward what Cloudera is focused on, and that’s what hybrid cloud architectures enable: a hybrid data analytics environment where basically all the data, in all different formats, in any location, becomes accessible as simply as possible for use across your organization. So as a CTO who’s in front of all the customers, talk about some of the most important use cases for hybrid data analytics. And if you’re willing, Ram, if you can share any specific customers, that would be super interesting to us as well.
Ram Venkatesh: Happy to. In fact, let’s combine this with the sustainability comment, because this was a very interesting learning for me as well. I’m constantly seeing how corporate-level desires or goals around sustainability translate into action, especially in a hybrid context, and what customers are doing with Cloudera that helps them achieve their sustainability goals.
So, just recently, there was one of the largest credit card processing companies in the US. They realized that when it comes to storage, there’s data that’s pretty hot and used all the time, and then there’s data that is infrequently accessed. For regulatory compliance purposes, for example, they may have to hold onto it for three years, seven years, or 11 years in some cases, based on the sensitivity of the data that we’re talking about, right? So for them, focusing on increasing the storage density of how they capture all this information meant that 40% of the carbon footprint goals for that division could be measurably achieved through storage density alone.
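To make the hot versus infrequently accessed split concrete, here is a minimal Python sketch of how records might be bucketed by last access and by a 3-, 7- or 11-year retention period. The `Record` type, the 30-day hot window and the tier names are illustrative assumptions, not anything Cloudera or this customer described.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed threshold: data read within the last 30 days counts as "hot".
HOT_WINDOW = timedelta(days=30)


@dataclass
class Record:
    name: str
    created: date
    last_accessed: date
    retention_years: int  # e.g. 3, 7 or 11, depending on data sensitivity


def retention_expiry(rec: Record) -> date:
    """Date from which the record no longer has to be held."""
    target_year = rec.created.year + rec.retention_years
    try:
        return rec.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return rec.created.replace(year=target_year, day=28)


def storage_tier(rec: Record, today: date) -> str:
    if today >= retention_expiry(rec):
        return "eligible-for-deletion"
    if today - rec.last_accessed <= HOT_WINDOW:
        return "hot"
    return "dense-archive"  # held for compliance, rarely read
```

Everything that lands in the `dense-archive` tier in this sketch is the long-lived, rarely read data where storage density matters most.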
So one of the cool things about our hybrid architecture, at least the way Cloudera realizes it, is the separation between storage and compute, and we want this to be consistent everywhere. Apache Ozone is our community-based, open-source answer for dense storage. Increasing the storage density helped this particular customer capture information at the volume that they needed, but also with the appropriate carbon footprint that they were looking for.
And there’s a similar side to the story on compute utilization. Again, for many of our customers, one of the important points of consideration, and this comes up very commonly with our eCommerce customers, our retail customers and, increasingly, our financial services customers, is that there is a seasonality to the business. During the eight weeks between Halloween and Thanksgiving and Christmas, their business looks very different than it does the rest of the year. And so your dilemma always is: do I size for peak or do I size for steady state?
If you size for steady state, then you might be exposed over the holidays, and that’s the worst time of the year for you to be under duress when it comes to being able to service demand. Whereas if you size for peak, having a bunch of extra capacity doesn’t do anything for your bottom line the rest of the year. So one of the things that they’re doing with hybrid is looking at their seasonal patterns, and, put simply, they want to run their always-on, steady-state workloads with very predictable unit economics on premise, and they want to leverage the cloud for the use cases that require additional compute, taking advantage of the cloud’s infinite scalability and elasticity to do this in a cost-efficient way. So this is one of-
Patrick Moorhead: Can I ask a quick question on that?
Ram Venkatesh: Of course.
Patrick Moorhead: Is that running the same application on different clouds and doing some load-balancing between them? Is that how that works?
Ram Venkatesh: Yeah. What’s fundamental to this is that they want to develop this application once and be able to deploy it… It’s a deployment-type choice as opposed to a development-type choice, and physics is an important consideration. To your point, what data sets does this use case actually need to be able to run in a different context, if it’s running in the cloud instead of on premise?
In one direction, you have one set of economic considerations. In the other direction, you have to think about cloud egress costs as well. It’s the same application, the same APIs, the same services. Users don’t have to be retrained. So if you think of it, there are above-the-waterline considerations for portability, and then there are below-the-waterline operational considerations to make sure that all the dependent data sets are available everywhere. Even something as simple as this: you want the same user, whether they’re coming in through the public cloud or accessing an on-premise system. Dan has got to be the same on both sides so that Dan’s security permissions are the same in both places.
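The direction-dependent transfer cost can be sketched as below. This minimal Python example is an illustration, not any Cloudera tool; it assumes a flat per-GB egress price for data leaving the cloud and free ingress, and both prices, as well as the dataset names and sizes, are hypothetical.

```python
# Assumed prices per GB; real rates vary by provider, region and contract.
EGRESS_PER_GB = 0.09   # data leaving the cloud
INGRESS_PER_GB = 0.0   # data entering the cloud (often free)


def transfer_cost(datasets: dict[str, tuple[str, float]], run_in: str) -> float:
    """Cost of moving every dependent dataset to where the use case runs.

    `datasets` maps a dataset name to (location, size_gb), where location
    is "cloud" or "onprem".
    """
    total = 0.0
    for location, size_gb in datasets.values():
        if location == run_in:
            continue  # already co-located with the compute
        rate = EGRESS_PER_GB if location == "cloud" else INGRESS_PER_GB
        total += size_gb * rate
    return total


def cheaper_placement(datasets: dict[str, tuple[str, float]]) -> str:
    """Pick the environment with the lower data-transfer cost."""
    return min(("onprem", "cloud"), key=lambda env: transfer_cost(datasets, env))
```

Transfer cost is only one input; the compute unit economics discussed earlier in the conversation would be weighed alongside it.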
So there are a lot of, I would say, fairly fundamental concerns around security and networking and storage. The good news is that below the waterline, as an industry, we have made a tremendous amount of progress. For things like identity federation and SAML, probably 98% of our customer base at this point is ready to do that. From a bandwidth-capacity standpoint, they have the relationships with the hyperscalers to have sufficient network bandwidth between their on-premise deployments and their cloud deployments.
This still requires consideration and thought and planning and all of that, but the basic capabilities are there as table stakes. So what they now want is for their use cases to be portable. They want their applications to not have to be rewritten when they migrate between the different environments, and that is what Cloudera can help them with: that portable architecture.
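The “same Dan on both sides” requirement can be sketched as one authorization decision that both environments share. In this hypothetical Python example, `issuer` and `subject` stand in for the Issuer and NameID a SAML assertion carries; signature validation and everything else a real SAML service provider must do is omitted, and the IdP URL, user and dataset names are made up.

```python
from typing import Optional

# Hypothetical: both environments trust the same identity provider (IdP).
SHARED_IDP = "https://idp.example.com"

# One permission table: principal -> dataset -> allowed actions.
PERMISSIONS = {
    "dan@example.com": {"sales_summary": {"read"}},
}


def resolve_principal(issuer: str, subject: str) -> Optional[str]:
    """Map a federated login to a principal; only the shared IdP is trusted."""
    return subject if issuer == SHARED_IDP else None


def authorize(environment: str, issuer: str, subject: str,
              dataset: str, action: str) -> bool:
    # `environment` ("onprem" or "cloud") is deliberately ignored: the same
    # login must get the same answer on both sides.
    principal = resolve_principal(issuer, subject)
    if principal is None:
        return False  # untrusted issuer: deny by default
    return action in PERMISSIONS.get(principal, {}).get(dataset, set())
```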
Daniel Newman: So, a real quick follow-up, just something I get asked a lot, and I’m really interested in how you answer this question: versions of this conversation are happening with a lot of vendors. And Pat likes to tell everybody their cloud sucks, but the truth is that… your cloud is complicated, and your data is dispersed and it’s in many versions and iterations. And Cloudera, you can tell your customers that you have a vehicle or a modality to make it easier and faster for them to get to that state, because ultimately, like you said, it’s the right services. It’s the comfort and familiarity between the IT and the data leaders. But ultimately, it’s that I just need to make sure that when I’m querying my analytics systems, I’m able to get to the right data to get the right answer with as low latency as possible. How do you speed it up? How does Cloudera speed it up? What’s your secret sauce? Is there anything you could say that you’re uniquely doing, if you got asked that question, that makes it easier to get to this state?
Ram Venkatesh: Yeah. So focusing on being able to have customers get the velocity they need for actually deploying their use cases, this is very fundamental to us. I think over the years, we have learned there’s at least three ways we could help with this problem.
The first one is, if we can support a whole bunch of different tools that people use to access the data, then we are not in the business of retraining them. We don’t want customers to say, “Oh, to use Cloudera, I need to learn a particular user-facing tool,” because the barrier to retraining them is so huge. If you want to add thousands of users to a platform, the best way to do it is we support… You come in through Tableau, you come in through Power BI, you can come in through Jupyter Notebooks, you can come in through RStudio, you’ll come in through three more tools that we don’t know anything about today, and that’s okay.
We expect that self-service analytics really means meeting the users where they want to be. So if we can do that… And they can use the same tools, whether they’re running on premise, whether they’re running on Google Cloud or AWS or Azure, it doesn’t matter. That’s a critical part, I think, to make sure that the user base is confident that they have the experience that they need.
Similarly, I think the second piece is: anything that touches the actual customer in terms of an API or a file format or an engine, we want to make sure that it’s based around an open standard with the community. Again, for us, the ethos here is that we want to build things that you can plug other things into. For us, a realization over the last eight or nine years is that data is a team sport, not just inside a company, but increasingly across companies. In an enterprise deployment, you’re going to have us, you’re going to have the hyperscalers, you’re going to have some SaaS providers. So it’s about being able to actually bring in data from lots of different places, and to work with systems both as sources and destinations in a much more open way.
The third piece is interoperability. So we think that these three, for us, these are core to how we think about data architecture, and building around them lets our customers then get agility by not having to retrain their users, by not having to redo their data formats, and by working with the tools that they care about: systems upstream as well as downstream.
Patrick Moorhead: Yeah. So I think we’re probably in the mutual-agreement club here. I like to talk about this in terms of fabrics. As crazy analysts, we have to come up with something. But to enable hybrid multi-cloud across private cloud, a company you just acquired and their cloud, primary public cloud, secondary public cloud, special function cloud, edge cloud, sovereign cloud, you really need what I like to call fabrics. There seems to be an application fabric that’s emerging if you standardize on a certain Kubernetes stack.
You talked about SAML, which is, for lack of a better term, a security fabric that can be used across multiple clouds. It seems like Cloudera is the data fabric that goes across the hybrid and public cloud, but can you talk specifically about which of your products are most appropriate for the hybrid multi-cloud?
Ram Venkatesh: Yeah, so for me, fabric implies consistency: being able to view these things with a level of harmony, or consistency, or sameness, if you will, across on-premise and cloud. So some of the key… And these are all, not coincidentally, open source projects that we have made deep investments in. I think that Apache Iceberg is a core piece of the fabric, because that’s the piece that actually lets us bring unstructured data and structured analytics together. It lets us bring in multiple engines in a consistent way. Iceberg is the same wherever you’re running it. That’s really key. Then there are Apache Ranger and Apache Atlas. Because the way you think about… For us, having the same user is the cornerstone, but that’s just a start. Then we want the same policies. Again, just to pick on Dan: if Dan is not allowed to see PII data somewhere, he shouldn’t be able to see it in a dataset that’s going to be replicated somewhere else. So wherever the data is, we want the same, consistent application of security policies, and Ranger is key to that.
I think that increasingly, as you have this data diaspora, with data in all these different parts of your enterprise, being able to know where things came from and where they’re going is really important. So Apache Atlas, for us, is key to the fabric because it helps you with that. It’s the traffic cop, essentially, right? At any given point in time, if you had to go tell a regulator why your AI model spit out this result, it lets you go all the way back and say, here’s where this data came from. I think these three are pretty key.
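The idea of one policy applied wherever a replica lands can be sketched as follows. Apache Ranger does support tag-based policies, but this Python example is not Ranger’s API or policy model; it is a hypothetical illustration in which a PII tag on a column, rather than the dataset’s location, decides visibility. The group name and column names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset[str] = frozenset()


@dataclass
class Dataset:
    name: str
    location: str  # e.g. "onprem" or "cloud"; the policy never looks at it
    columns: list[Column] = field(default_factory=list)


# One policy, defined once, keyed on tags rather than on location.
PII_READERS = {"compliance-team"}  # hypothetical group name


def visible_columns(user_groups: set[str], ds: Dataset) -> list[str]:
    """Column names this user may see, regardless of where `ds` lives."""
    allowed = []
    for col in ds.columns:
        if "PII" in col.tags and not (user_groups & PII_READERS):
            continue  # hide PII from users outside the allowed group
        allowed.append(col.name)
    return allowed
```

Because the decision depends only on tags and group membership, an on-premise table and its cloud replica give the same answer for the same user.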
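Answering a regulator’s “where did this come from?” amounts to walking a lineage graph backward. The minimal Python sketch below does a breadth-first walk from a model output to the entities that have no recorded inputs. It illustrates the concept only, it is not Apache Atlas’s API, and every entity name is hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each entity -> the inputs it was derived from.
LINEAGE = {
    "model_score_2023_06": ["features_v3", "model_v12"],
    "features_v3": ["transactions_clean"],
    "model_v12": ["training_set_2023_q1"],
    "transactions_clean": ["transactions_raw"],
    "training_set_2023_q1": ["transactions_clean", "chargebacks_raw"],
}


def upstream_sources(entity: str) -> list[str]:
    """Breadth-first walk back to entities with no recorded inputs."""
    seen = {entity}
    sources = []
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        parents = LINEAGE.get(node, [])
        if not parents and node != entity:
            sources.append(node)  # an original source dataset
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)
```

The `seen` set keeps shared ancestors such as `transactions_clean` from being visited twice.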
The fourth one that we are adding to the mix here is Apache Ozone. We’re saying that we like the object store model. We like this disaggregated model between storage and compute, and so having that scalable object store on premise lets us support the same interaction model, whether you’re running on S3 or ADLS Gen2 or GCS or now Ozone. So these are the four key things that I think enable us to have this consistent experience and, hence, the fabric. Does that make sense?
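One way to picture “the same interaction model” is that a job’s code stays the same and only the storage location changes. The Python sketch below maps an environment to the URI scheme a Hadoop-compatible connector would typically use: `s3a://` for S3, `abfss://` for ADLS Gen2, `gs://` for GCS and `ofs://` for Ozone (which also offers an S3-compatible gateway). The storage account, Ozone service ID and volume names are hypothetical placeholders.

```python
def table_uri(env: str, bucket: str, path: str) -> str:
    """Build a storage URI for the same logical table in each environment."""
    if env == "aws":
        return f"s3a://{bucket}/{path}"
    if env == "azure":
        # ADLS Gen2: <container>@<account>.dfs.core.windows.net; account is hypothetical.
        return f"abfss://{bucket}@exampleaccount.dfs.core.windows.net/{path}"
    if env == "gcp":
        return f"gs://{bucket}/{path}"
    if env == "onprem":
        # Ozone ofs: ofs://<om-service>/<volume>/<bucket>/<key>; names are hypothetical.
        return f"ofs://ozone1/datalake/{bucket}/{path}"
    raise ValueError(f"unknown environment: {env}")
```

In this sketch, everything downstream of `table_uri` (the engine, the SQL, the user’s tool) would be identical across environments.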
Daniel Newman: Oh yeah. You hit it home, and hopefully we set you up to do it. Ram, frankly, first of all, thank you for letting me go a little bit off the deep end there and challenging you on the spot. This pod is all about bringing in what’s… You know, we want these data leaders that are listening to us to basically be able to walk away and say, I learned something here.
Ram Venkatesh: Exactly.
Daniel Newman: Like I said, Patrick and I talk both to the entire ecosystem that you work with and, at large scale, to those who are looking to adopt these solutions. What I often think is missed is that you’ll hear things that are very anecdotal, like the data is growing at exponential rates. It’s like, yeah, yeah, I think we’ve all agreed that’s happening. But what you don’t hear is, what’s the sauce? What is the magic that really enables companies to get the most out of their data? It goes back to an old saying: we don’t invest in technology to solve technology problems. We invest in technology to solve business problems. Data solves business problems, and you today have been a great add to the show. So Ram, thank you so much. Can’t wait to have you back on the Six Five again sometime soon.
Ram Venkatesh: Absolutely. Thank you for having me on the show. Appreciate it.
Daniel Newman: All right everyone, there you have it for this Insider Edition. We really appreciate you tuning in. If you like what you heard, hit that subscribe button. You can join us on all the different podcast channels as well as across social media. Send all your positive commentary to me and any complaints you have to Patrick Moorhead; he’s all over Twitter, prolifically. We appreciate you tuning in, but for this show, we’ve got to say goodbye. We’ll see you later.