Artificial Intelligence (AI) and Machine Learning (ML) are without a doubt two of the hottest topics in the tech world today. They are changing the way virtually every industry approaches its biggest problems. AI and ML leaders currently span from healthcare to transportation to social media, and it's hard to attend a major tech event without getting a dose of them, not because of the hype, but because of how they are changing everything.
Moor Insights & Strategy analysts have extensively covered different implementation aspects of AI and ML, stretching from mobile devices all the way to the datacenter. Most of the industry tends to focus on the compute side of training and inference and not much on memory and storage, and I'll admit that analysts like me are partially responsible for this. The reality is that the best AI and ML solutions have the right combination of compute, memory, and storage. I'm not the only one thinking about this architectural "imbalance". While doing some research on AI and ML, I ran across a provocative blog written by Pure Storage Inc.'s Roy Kim. I thought it was spot-on, and it inspired me to research and write even more on the topic.
Data-parallel compute workloads need parallel storage
The explosion of AI and ML is enabled by deep neural network training powered by GPUs from companies like Advanced Micro Devices (AMD) and NVIDIA, and by the big data sets that feed them. In training neural networks on these large data sets, practitioners have found that bottlenecks emerge as the GPUs get faster and the data sets get bigger and richer. One major issue is that the traditional storage feeding these massively parallel deep neural networks running on the GPUs is serial, outdated, and unable to keep up. This is understandable: in the past, most machine learning training occurred on CPUs, which are around 10X slower than the fastest GPUs. Compute performance for machine learning, deep learning, and artificial intelligence tasks has been growing faster than storage speed, and that gap is creating performance problems right now.
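To make the bottleneck concrete, here is a minimal sketch (my own illustration, not Pure's or NVIDIA's code) of the standard workaround when storage can't keep up: reading batches ahead on a background thread so I/O overlaps with compute. The `read_batch` function is a hypothetical stand-in for a storage read; the point is that if storage is serial and slow, the accelerator still ends up waiting, no matter how much prefetching you do.

```python
import queue
import threading

def read_batch(i):
    # Hypothetical stand-in for a storage read; a real loader would
    # fetch training samples from disk or a network share.
    return [i] * 4

def prefetcher(num_batches, depth=8):
    """Overlap storage reads with compute by reading ahead on a
    background thread, a common way to keep a fast GPU fed."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for i in range(num_batches):
            q.put(read_batch(i))
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

# The consuming loop plays the role of the GPU; reads happen in the
# background while it works.
batches = list(prefetcher(3))
print(batches)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

Prefetching hides latency but cannot create bandwidth; once the GPUs consume data faster than the backing storage can deliver it, only a more parallel storage layer closes the gap.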
Solutions do exist
Pure FlashBlade, DirectFlash and NVIDIA DGX-1
In my research, I have found that Pure Storage has been the most vocal advocate on this issue, and the company believes its FlashBlade technology matches the needs of the AI and ML community. It is a compelling value proposition. Pure Storage bills FlashBlade as a massively parallel storage solution that delivers the performance of over 100 disk-based nodes with only 15 of its blades. That configuration takes up 4U of rack space and delivers 17 GB/s of read throughput and 1.5 million IOPS at under 3 ms latency. That is seriously impressive performance, and because FlashBlade is designed for scale and is massively parallel, the company says it can be built out to 75 blades (20U), achieving up to 75 GB/s of throughput and 7.4M IOPS with 8 PB of capacity. I have more companies to sort through, but I can say without a doubt this is impressive. What's even more impressive is that this isn't just theory: web giants are using this configuration for their ML workloads. While Pure cannot name many of its customers, think of one of the largest social media companies on the planet that makes heavy use of machine learning. Pure was able to talk about Zenuity's and Man AHL's FlashBlade and NVIDIA DGX-1 implementations for ML.
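A quick sanity check on those scaling figures (vendor-quoted numbers, not independent measurements) shows what "designed for scale" means in practice: per-blade read throughput stays close to constant from 15 blades to 75 blades, i.e. the scaling is near-linear rather than degrading as blades are added.

```python
# Per-blade read throughput implied by the quoted configurations.
small = 17 / 15   # GB/s per blade at 15 blades (4U)
large = 75 / 75   # GB/s per blade at 75 blades (20U)
print(round(small, 2), round(large, 2))  # 1.13 1.0
```

Roughly 1.13 GB/s per blade at the small configuration versus 1.0 GB/s per blade at the large one: a modest drop-off, which is what you would hope for from a genuinely parallel design.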
Pure Storage likes to compare itself to another leader in the AI and ML space, NVIDIA, with its DGX-1 system. The comparison is that both deliver the performance of roughly 100 nodes in a single, very compact 4U package. I think that's fair. In Roy Kim's blog, he refers to a customer story in which a 4U FlashBlade solution replaced 20 racks of the customer's mechanical disks.
Pure has developed technologies well suited to AI and ML workloads. These include FlashBlade storage systems built on the company's DirectFlash technology, which manages the storage functions at a low level. Pure's Purity software does most of the orchestration of the parallel functions, tying the DirectFlash capabilities of the hardware to higher-level software. Combined, Purity and DirectFlash form the complete Pure Storage FlashBlade solution, which is designed to meet the needs of virtually any high-demand environment, including AI and ML.
Comparing itself to, and partnering with, NVIDIA is a smart move on Pure's part, because NVIDIA commands a lot of respect in the AI and ML community and is commonly seen as the game-changer that brought AI and ML to the forefront with its GPU technologies and SDKs. Pure sees itself as complementary to NVIDIA's DGX-1, a technology that can accelerate the performance of DGX-1 deployments with high-throughput, low-latency storage like FlashBlade. I believe that is true.
An optimized AI and ML workflow requires the right balance of compute, memory, and storage; anything less risks slowing down the entire training process. There has been a lot of talk about optimized ML compute, but not about storage. Based on my initial research, Pure is one of the first storage companies to get this: it has optimized pieces of its hardware and software stack for the workload and already has some of the AI giants using it.
Note: Analyst Anshel Sag contributed to this article.