Their discussion covers:
- An overview of MongoDB’s long-term vision for Atlas, including the latest updates on Atlas Search
- The value developers receive from using Vector Search through MongoDB Atlas Search
- The capabilities of Atlas Stream Processing, another big product update from MongoDB
- Sahir shares some use cases for incorporating real-time analytical capabilities into applications and how MongoDB Atlas is addressing this trend
Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Patrick Moorhead: Hi, this is Pat Moorhead and The Six Five is live at MongoDB Local here in New York City, 2023. As you can hear around us, there’s a lot going on, five simultaneous stages going on, developer, developer, developer, amazing product launches, great technology. Daniel, are you having fun?
Daniel Newman: I’m having fun. Anytime you put growth, technology, developers-
Patrick Moorhead: AI, AI, AI.
Daniel Newman: I was going to try to come up with something else.
Patrick Moorhead: All right, okay, but that’s what everybody wants to talk about, Dan.
Daniel Newman: No, it’s great being here in New York City. This is a great venue, it’s got a little bit of that kind of cool, open startup vibe and it’s also creating a lot of sound, a lot of ambiance around us, but truly, you don’t need that because the ambiance, you can see people are excited, people are pumped up. MongoDB had a lot to say today, it had announcements, it had a growth story.
Patrick Moorhead: They did. The big purveyor of all that goodness on stage was of course, Dave, but you as chief product officer, Sahir, welcome to The Six Five.
Sahir Azam: I am happy to be here. Thanks for making the time, guys.
Patrick Moorhead: Absolutely.
Daniel Newman: It is great to have you here. You made a lot of announcements today, some unveils, some-
Patrick Moorhead: Six press releases.
Daniel Newman: Some press. You’re only doing 30 of these in 19 countries. Can you do six announcements in each one? That would make a product guy-
Sahir Azam: We’ve tried to bundle as many as we can around this.
Daniel Newman: That would make a product guy cry real tears.
Sahir Azam: But we have some other surprises throughout the rest of the year.
Daniel Newman: Well, the nice thing about software is you can move a little faster. It doesn’t happen quite as fast on the hardware front, but in the database space, with gen AI and AI and everything and the expectation that MongoDB is going to have a story, there’s a lot going on there. You made a bunch of announcements around Atlas. Atlas Search was one of the things that really caught my attention. I know it’s not brand new, you’ve been rolling this out for some time and the adoption’s been fast, the customer demand has been fast. Talk a little bit about that evolution of Atlas Search and maybe, is this sort of gen AI moment creating a real accelerator for what was already doing quite well?
Sahir Azam: Sure, yeah. Happy to get into it. Atlas Search is a great example of a capability in the platform that’s very customer driven. We were in so many customers as their core operational database and a lot of them were coming to us and frankly saying, “Why do I have to stand up a separate search database side by side and then manage all these data pipelines that fall over half the time, and the data is hard to keep in sync?” They said for the application use cases of search in particular, this seems like a problem Mongo should be solving and so, that’s really what drove us to it a few years ago.
We started working on an incubation project internally, we identified what kind of indexing technology we wanted to use and really had to build from the ground up an extension to the engine that would make the developer feel like they’re just using a single database, but have the capability of collapsing really two databases into one as well as getting rid of the need to manage all those synchronization tasks.
As always with our business, we tend to see innovative fast moving startups. Perhaps they’re getting started in our self-service channel or developers prototyping, that’s kind of the early stage of adoption. I’ll say in the last year, year and a half, we’ve really started to see mission-critical production, enterprise workloads start to come to the platform because the cost of managing different environments, and not just direct costs of the infrastructure or the licenses or whatnot, but the distraction on people to operate, secure, and develop against a fragmented system, is something that architects, CIOs, CTOs and executives are starting to pay more attention to and that’s really started to drive, I’d say, larger workloads on the platform, which then leads to some of the announcements.
You can see some of the announcements on search this year are about maturity. It’s about how do we give more flexibility to scale the search environment, it’s giving insights into how search is being used so that you can build a smarter search experience. It’s kind of that second phase of product evolution that we’re focused on.
Patrick Moorhead: I really liked your commentary on simplification, and if I look at IT over the last 30 years as I’ve been in here, it’s kind of an accordion where hey, best of breed, best of breed, best of breed and IT’s like, “Man, we’ve got too many best of breeds,” and the cost of integration is outweighing the benefits of having that.
But I wanted to drill down on Atlas Search, a specific type of search, vector search. Can you talk about why you brought that out specifically and maybe the incremental value that it’s bringing to your customers? I heard a little bit of LLM attachment in there too for Vector.
Sahir Azam: Yeah, it’s interesting. We actually started working on vector search almost a year and a half, two years ago, and this was before the whole generative AI boom started, six or eight weeks because our search customers were saying, “Okay, these models are being democratized and we need more than just relevance or keyword based search rankings on text. We want to be able to index other types of information,” so images, video, audio, et cetera; add that in, so that was one driver. And secondarily, they wanted to be able to have a system that can use models to match things that are alike as opposed to having to just have the descriptions or the metadata on the system and all the raw text. We started to hear those requirements, we saw that semantic search use case, I think, that Dave articulated in his keynote because of Atlas Search, and that’s what led us to say, “Okay, there’s actually a second product opportunity here.”
But it really almost has two flavors of use cases. There are ones that are an expansion of Atlas Search into this hybrid search model where you have a combination of relevance or keyword based search and vector search that can broaden the types of data and do similarity search alongside keyword search. But then there’s this new gen AI boom, which is driving use cases, especially recently, which is really saying you can use vectorization to augment these big foundational models like OpenAI, et cetera with proprietary data so you can bring in a company’s unique information, domain specific information, to either make those LLMs more accurate by constraining them to a new ground truth or add proprietary data that they were never trained on in the public sphere to be able to answer against that as well.
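As an illustration of the similarity matching and model augmentation described above (not part of the conversation), here is a minimal Python sketch. It assumes tiny hand-made vectors in place of real model embeddings, and the names `documents`, `top_k`, and `build_prompt` are hypothetical. It does not use or represent the Atlas Vector Search API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-made 3-dimensional "embeddings". Real embeddings come
# from a model and usually have hundreds or thousands of dimensions.
documents = {
    "engine knocking sound": [0.9, 0.1, 0.0],
    "brake squeal recording": [0.1, 0.9, 0.1],
    "warranty policy text": [0.0, 0.1, 0.9],
}


def top_k(query_vector, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in documents.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(question, query_vector):
    """Prepend the best-matching proprietary item to a question for an LLM."""
    context = top_k(query_vector, k=1)[0]
    return f"Answer using only this context: {context}\nQuestion: {question}"
```

A query vector close to the first item, such as `[0.8, 0.2, 0.0]`, ranks "engine knocking sound" first. A production system would replace the linear scan with an approximate nearest-neighbor index.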
Daniel Newman: That’s really the holy grail for enterprises.
Sahir Azam: Yeah, exactly. Exactly.
Daniel Newman: I think it’s become overwhelmingly obvious that this is table stakes for OpenAI, for all of them. Even Microsoft and Google have allegedly come out and said, “The real beauty is when you have that proprietary data.”
By the way, that example you gave the automotive one with the sounds and then being able to take those, man. I’m like, “God, that’s brilliant.”
Sahir Azam: Yeah, that’s a real world example we were lucky enough to get started and working with. When I heard it for the first time myself, I was like, “That is pretty cool.”
Daniel Newman: Because the kind of data that is, that’s a very unique kind of data and using Vector to connect these different data archetypes and then, make it really seamless in the application. Super powerful.
Another thing that you talked about with Atlas, the stream processing. Now, when you say stream processing and you talk about it, the immediate thing that comes to everyone’s mind is Kafka. Talk a little bit about the approach, what you’re doing with stream processing, where you see that going.
Sahir Azam: Yeah, absolutely. We’ve been working with Confluent on integrations to Kafka for years. We have a native connector that we work on with them that integrates Atlas to Confluent Cloud, and it also works with self-managed offerings. There’s a lot of news in the market today.
But what we realized is for application developers who want to power the application experience, we needed a real stream processing engine, and our developer community is used to a very natural, language-idiomatic way of working with data. They don’t want to deal with SQL for streaming and then MQL and documents for data. Really, they pulled us to expand our query engine to be able to process the data that’s sitting in Kafka or other streaming systems across the enterprise. It’s a very complementary extension to be able to say, you have this plumbing, this transport flowing data across your enterprise. If you want to power an application experience, now, a developer can extend and integrate data at rest in the database with data in motion in a singular experience that powers that live application.
That’s pervasive in any type of use case across many verticals, such as fraud detection in financial services. I’ve used an energy grid example, there’s live energy sensor grids and reactive maintenance use cases. There’s plenty of examples of how this is broadly applicable and so, we’re excited to see obviously how the adoption goes as we’ve made it public this week.
Patrick Moorhead: I really have enjoyed over the past couple years how you’ve been adding different capabilities to your… I’m a visual learner, so I love the marketecture diagram that shows-
Sahir Azam: I’ll let the design team know. That’s great.
Patrick Moorhead: It’s how my brain works, fortunately or unfortunately, but it’s been fascinating to see how you keep adding capabilities, but it’s still the one API model document based, which is very consistent and consistency is good.
Sahir Azam: That’s the north star for us because we’re not trying to have the longest list of features or services under one brand or one bill. The way we compete and the reason developers love us is the intuitiveness of how easy it is to work with MongoDB and therefore, do their jobs more easily, faster, more efficiently. We only go after areas that serve developers that can power those modern application experiences and we think we have a right to really differentiate because of documents and that API as our integration point.
Patrick Moorhead: One thing I want to dig into we haven’t talked about yet is analytics. Most people when they think of analytics, it’s more batch mode. I have a question I need, or maybe it’s batched up every week and this is the analysis that I get. I get pretty pictures that Pat likes upfront, maybe it’s a spreadsheet, but real time analytics embedded into applications looks like a very hot opportunity. It’s hard though, it’s not easy to do because of that little timeframe thing because you can’t wait for your customer to give them what they want.
I’m curious, what kind of use cases are you looking at with your customers for embedding real time analytics into applications?
Sahir Azam: Yeah, that’s an area. Last year, we made a lot of new announcements and we’ve obviously been iterating throughout the year. Analytics for us, at a very macro level, what we’re paying attention to is over time, software automates a lot of manual business processes or human reasoning. We saw this with operations moving to DevOps, where code automates operations. We’re seeing the same thing happen with DevSecOps where now, security is being automated in software by development teams from the get go of the process. We think there’s a portion of analytics that’s not going to be an executive looking at a dashboard saying, “Go do this,” it’s going to be software that’s getting a realtime view of what’s happening, processing that and leveraging rules and logic and models to do something off of that. That’s not to say the batch systems go away, but the exciting stuff for us is in the in-app use cases that you’re alluding to.
Now, there are a lot of requirements that are put on a system. We have to invest heavily in new indexing types, our query engine performance. A lot of the performance enhancements I mentioned in MongoDB 7.0 are aimed at these, what you would call, in-app or real-time use cases. What’s interesting is the end user of that or the end developer, a person building these is the developer. It’s not a business analyst writing SQL queries to power a dashboard, it’s a developer writing code. That’s where we think we have a really compelling story for all the reasons we’ve talked about, idiomatic approach to drivers, this amazing developer community that we’re surrounded with.
We’re focused on that really first with the database itself and we’ve been used for things like Customer360, taking a bunch of sources of information about a customer. For example, MetLife is a longtime MongoDB user, they take realtime information from different systems of record that power different insurance lines in different businesses. They pull it together continuously into an ODS layer on Mongo so that their agents can get a real time view of all the policies across the lifecycle of engaging with the brand that they have access to.
We work on logistics management. We work with the train operator in the UK. They take information about real time train routing where there’s congestion, they built software that then makes intelligent decisions on where to slow down and route trains in an automated way where there’s human oversight, but it’s no longer a human having to decide, “That train’s behind, I’ve got to go slow it down so I can let another one by.” They’ve automated that process.
To us, these are all query patterns that are analytical in nature, but applied to the use cases of applications, and stream processing just takes that further because now, you don’t have to necessarily persist all that data in a database before you can start to run these types of queries. You can run windows and functions right on top of the data as it’s being collected in the stream in motion.
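To make the idea of running windows over data in motion concrete (an illustration, not part of the conversation), here is a minimal Python sketch of a tumbling-window average. It assumes events arrive ordered by timestamp as `(timestamp_seconds, sensor_id, value)` tuples; the function name and event shape are hypothetical, and this is not the Atlas Stream Processing API.

```python
def tumbling_window_averages(events, window_seconds):
    """Yield (window_start, {sensor_id: average}) as each window closes.

    Assumes events are ordered by timestamp. Nothing is persisted: only
    the running totals for the currently open window are held in memory.
    """
    current_start = None
    totals = {}  # sensor_id -> (sum of values, count)
    for timestamp, sensor_id, value in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # The event belongs to a new window, so emit the closed one.
            yield current_start, {s: t / c for s, (t, c) in totals.items()}
            totals = {}
        current_start = start
        total, count = totals.get(sensor_id, (0.0, 0))
        totals[sensor_id] = (total + value, count + 1)
    if current_start is not None:
        # Flush the final window once the input ends.
        yield current_start, {s: t / c for s, (t, c) in totals.items()}
```

Because it is a generator, a caller can react to each window as soon as it closes, for example by raising an alert when a sensor's average crosses a threshold.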
Daniel Newman: Powerful. Yeah, the tendency definitely is heading towards everything being real time when possible. I know that batch will still make sense, but the direction of compute, the direction of software, of networking is all about making the data move faster, being brought, and obviously, you gotta have that software layer on top of it.
We’ve only got a minute left, but I would be missing an opportunity if we didn’t talk about security. One of the things that you did talk about that I thought was really important was dealing with the data in use. Meaning, you talked about data at rest and in motion, something I think most companies have figured out how to secure, but that data in use is a problem and you brought a solution to light today.
Sahir Azam: This is an exciting space for us and we’ve been working on security in various different ways for a long time but encryption in particular is interesting, especially in a cloud context because there’s a trust factor or shared trust model between you as a customer and your provider. Your provider could be a database provider like MongoDB that’s operating and scaling your database to the cloud, it could be a major hyperscaler that’s running the underlying infrastructure. It’s a real shared trust model and for extremely sensitive data, customers are very hesitant to potentially take the most sensitive data in a trust model where there are third parties involved.
In 2019, we released a feature called client side field level encryption, which allows the data to be encrypted before it ever gets to the cloud service or to the database, in our case, the database vendor or the cloud providers we partner with. That way, if you ever need to make sure there’s no data exfiltration, data leakage, you can be sure that the only entity with access to the key is your organization. That helps with things like GDPR and the right to be forgotten. If a customer calls and says, “I want my information deleted,” you don’t have to go trace all the backups across who knows where they might sit, you just delete the key. Really powerful.
But the limitation with that approach, like every other database that’s implemented client encryption, is the only types of queries you can run are exact matches, so just a point query on a single record. You really have to be specific on what you’re looking for. The ability to aggregate or find things based on filters and searches has been impossible.
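To illustrate why client-side encryption of this kind tends to support only exact matches (an illustration, not part of the conversation), here is a minimal Python sketch. It uses a keyed hash as a stand-in for a deterministic encryption scheme, which is an assumption made for brevity and not how MongoDB implements the feature; `equality_token` and `UntrustedStore` are hypothetical names, and the stored payloads are opaque placeholders rather than real ciphertext.

```python
import hashlib
import hmac


def equality_token(key: bytes, value: str) -> str:
    """Client-side: derive a deterministic token from a value.

    The same key and value always give the same token, so the server can
    compare tokens for equality without ever seeing the plaintext.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class UntrustedStore:
    """Stand-in for a server that only ever sees tokens and opaque payloads."""

    def __init__(self):
        self._rows = []

    def insert(self, token: str, payload: str) -> None:
        self._rows.append((token, payload))

    def find_equal(self, token: str) -> list:
        # Equality is the only comparison that means anything here: tokens
        # preserve no ordering, prefix, or substring structure of the value.
        return [p for t, p in self._rows if hmac.compare_digest(t, token)]
```

A point lookup works because the client can recompute the token for the exact value it wants. A range or prefix filter cannot be answered from the tokens alone, which is the limitation described above.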
We got in touch around that timeframe with the team from Brown University that was writing some really interesting research around what’s called Structured Encryption. They started a company called Aroki Systems or Aroki Labs. We acquired the team and the founders and the early technology and for two years, we’ve been saying, “How do we take this structured encryption and apply it to an operational database?” That’s what’s been in preview for the last few quarters, we’re announcing the GA in MongoDB 7.0 and it really changes the game because it allows that same value proposition, a single key that you control that none of your providers or any third parties can access. The data is encrypted while it’s in memory in the database server, but yet, you can start to run queries that are searching multiple records and filters as well. It’s an industry first, there’s some real heavy R and D that went into this one, it’s early on this journey. We have great plans for the future, but we’re really excited about where this can go in our industry.
Patrick Moorhead: Yeah, as I hear you talking about this kind of encryption capability, it sounded a little bit of magic. The magic is real and quite frankly, that is many of the tricks that people bring. You’re bringing magic to your developers and they appreciate it.
Sahir Azam: You don’t need to be a cryptology expert to leverage this. It’s all very accessible for any developer. It’s worth noting all of the math and the research behind this is third party validated, open, so it’s trusted, this is not proprietary. We want the world to see how this is done so they can trust it and know that it’s real.
Daniel Newman: I want to thank you so much for joining us here on The Six Five. Great conversation. I think we covered it all. I think we got through all the announcements more or less.
Patrick Moorhead: The only thing we didn’t cover was the financial part, but we talked about that with your financial customers.
Sahir Azam: Oh, all right. That’s a good benefit.
Patrick Moorhead: Citibank, Wells Fargo.
Daniel Newman: And they were both here. And in fairness, basically what you’re building is all that stuff with a financial lens on it. Sahir, thanks so much for joining us at Six Five.
Sahir Azam: Thank you, I really appreciate the time.
Daniel Newman: Thanks. All right. Right. You heard it here. This was a great run through of all of the announcements here today at MongoDB Local in New York City. It is 30 of these events, 19 different countries, but this was a big one. Six different announcements. We’ve been here all day talking to the executives of MongoDB, the customers, and some of the big SIs. If you liked what you heard here, hit that subscribe button, check out all our other episodes. But for Patrick and myself, we got to say goodbye for this one. We’ll see you all really soon.