Artificial Intelligence (AI) and Machine Learning (ML) are without a doubt two of the hottest topics in the tech world today. They are changing the way virtually every industry approaches its biggest problems. AI and ML leaders currently span from healthcare to transportation to social media, and it's hard to attend a major tech event without getting a dose of them, not because of the hype, but because of how they are changing everything.
Moor Insights & Strategy analysts have extensively covered different implementation aspects of AI and ML, stretching from mobile devices all the way to the datacenter. Most of the industry tends to focus on the compute side of training and inference and not much on memory and storage, and I'll admit that analysts like me are partially responsible for this. The reality is that the best AI and ML solutions have the right combination of compute, memory, and storage. I'm not the only one thinking about this architectural "imbalance". While doing some research on AI and ML, I ran across a provocative blog written by Pure Storage Inc.'s Roy Kim. I thought it was spot-on, and it inspired me to research and write even more on the topic.
Data-parallel compute workloads need parallel storage
The explosion of AI and ML is enabled by deep neural network training powered by GPUs from companies like Advanced Micro Devices (AMD) and NVIDIA, and by the big data sets that feed them. In training neural networks on these large data sets, practitioners have found that bottlenecks emerge as the GPUs get faster and the data sets get bigger and richer. One major issue is that the traditional storage feeding these massively parallel deep neural networks running on the GPUs is serial, outdated, and unable to keep up. This is understandable: in the past, most machine learning training occurred on CPUs, which are around 10X slower than the fastest GPUs. Compute performance for machine learning, deep learning, and artificial intelligence tasks has been growing faster than storage speed, and that gap is creating performance problems right now.
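To make the bottleneck concrete, here is a minimal sketch (my own illustration, not Pure's or NVIDIA's code) of the standard workaround when storage can't keep up: reading batches ahead on a background thread so I/O overlaps with compute. The `read_batch` function is a hypothetical stand-in for a storage read; the point is that if storage is serial and slow, the accelerator still ends up waiting, no matter how much prefetching you do.

```python
import queue
import threading

def read_batch(i):
    # Hypothetical stand-in for a storage read; a real loader would
    # fetch training samples from disk or a network share.
    return [i] * 4

def prefetcher(num_batches, depth=8):
    """Overlap storage reads with compute by reading ahead on a
    background thread, a common way to keep a fast GPU fed."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for i in range(num_batches):
            q.put(read_batch(i))
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

# The consuming loop plays the role of the GPU; reads happen in the
# background while it works.
batches = list(prefetcher(3))
print(batches)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

Prefetching hides latency but cannot create bandwidth; once the GPUs consume data faster than the backing storage can deliver it, only a more parallel storage layer closes the gap.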
Solutions do exist
Pure FlashBlade, DirectFlash and NVIDIA DGX-1
In my research, I have found that Pure Storage has been the most vocal advocate on this issue, and the company believes its FlashBlade technology matches the needs of the AI and ML community. It is a compelling value proposition. Pure Storage bills FlashBlade as a massively parallel storage solution that delivers the performance of over 100 disk-based nodes with only 15 of its blades. That configuration takes up 4U of rack space and delivers 17 GB/s of read throughput and 1.5 million IOPS at under 3 ms latency. That is seriously impressive performance, and because FlashBlade is designed for scale and is massively parallel, the company says it can be built out to 75 blades (20U), achieving up to 75 GB/s of throughput and 7.4M IOPS with 8 PB of capacity. I have more companies to sort through, but I can say without a doubt this is impressive. What's even more impressive is that this isn't just theory: web giants are using this configuration for their ML workloads. While Pure cannot name many of its customers, think of one of the largest social media companies on the planet that makes heavy use of machine learning. Pure was able to talk about Zenuity's and Man AHL's FlashBlade and NVIDIA DGX-1 implementations for ML.
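A quick sanity check on those scaling figures (vendor-quoted numbers, not independent measurements) shows what "designed for scale" means in practice: per-blade read throughput stays close to constant from 15 blades to 75 blades, i.e. the scaling is near-linear rather than degrading as blades are added.

```python
# Per-blade read throughput implied by the quoted configurations.
small = 17 / 15   # GB/s per blade at 15 blades (4U)
large = 75 / 75   # GB/s per blade at 75 blades (20U)
print(round(small, 2), round(large, 2))  # 1.13 1.0
```

Roughly 1.13 GB/s per blade at the small configuration versus 1.0 GB/s per blade at the large one: a modest drop-off, which is what you would hope for from a genuinely parallel design.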
Pure Storage likes to compare itself to another leader in the AI and ML space, NVIDIA, with its DGX-1 system. The comparison is that both deliver the performance of roughly 100 nodes in a single, very compact 4U package. I think that's fair. In Roy Kim's blog, he refers to a customer story in which a 4U FlashBlade solution replaced 20 racks of the customer's mechanical disks.
Pure has developed technologies well suited to AI and ML workloads. These include FlashBlade storage systems built on the company's DirectFlash technology, which manages the storage functions at a low level. Pure's Purity software does most of the orchestration of the parallel functions, tying the DirectFlash capabilities of the hardware to higher-level software. Combined, Purity and DirectFlash form the complete Pure Storage FlashBlade solution, which is designed to meet the needs of virtually any high-demand environment, including AI and ML.
Comparing itself to, and partnering with, NVIDIA is a smart move on Pure's part, because NVIDIA commands a lot of respect in the AI and ML community and is commonly seen as the game-changer that brought AI and ML to the forefront with its GPU technologies and SDKs. Pure sees itself as complementary to NVIDIA's DGX-1, a technology that can accelerate the performance of DGX-1 deployments with high-throughput, low-latency storage like FlashBlade. I believe that is true.
An optimized AI and ML workflow requires the right balance of compute, memory, and storage; anything less risks slowing down the entire training process. There has been a lot of talk about optimized ML compute, but not about storage. Based on my initial research, Pure is one of the first storage companies to get this: it has optimized pieces of its hardware and software stack for the workload and already has some of the AI giants using it.
Note: Analyst Anshel Sag contributed to this article.