I was nineteen years old, working my way through college at a big oil company, when IBM’s storage team taught me that some technology can seem like magic. It’s something the company continues to do today.
We were readying IBM’s new RT PC servers, the company’s first production RISC machines, for deployment across a fleet of oil tankers. We liked the machines, but we had a problem keeping them running in our small lab: hard drives continually failed, always in the late afternoon. That would never work on a tanker.
IBM replaced these then state-of-the-art hard drives (this was the late 1980s) almost daily. Finally, after weeks of field support failing to figure out the issue, and hard drives continuing to fail, IBM dispatched a team of engineers from its storage unit in San Jose. We marveled as IBM’s design engineers placed probes and antennas of various sizes not just in the machines, but throughout the offices we were using, everything connected to a dizzying array of scopes and measurement devices. We all sat back and waited.
They told us that if we moved the servers 46 inches to the west, our storage would stop failing. The room we were using bordered an elevator shaft, and the afternoon rush was generating too much radio interference from the electrically noisy elevators. The engineers headed home, ultimately updating the product with additional shielding around the drive cages. It worked.
IBM solved our problem because it does far more than assemble a computer from a collection of commodity parts. IBM designed nearly every element of the storage subsystem and understood the intricate interplay between components, even when faced with random blasts of noise from a nearby elevator motor. We likely would never have solved the problem if that had been a commodity hard drive.
I tell this story to illustrate the power of building technology from the ground up, as IBM does with most of its storage products. While nearly every one of IBM’s storage competitors delivers products based on commodity SSDs, IBM builds its storage technology around its intelligent FlashCore Module. FlashCore is at the heart of what makes IBM’s FlashSystem line unique.
FlashCore Enables Momentum
I don’t think it’s an exaggeration to say that IBM has perhaps the most aggressive product introduction cadence in the storage industry. There is something new announced nearly every quarter, both hardware and software. It’s almost too much to keep track of. What enables this level of momentum at the platform level is IBM’s FlashCore Module.
IBM has been building its FlashCore Modules for a long time. The technology is at the heart of what IBM’s FlashSystem portfolio can deliver, and the company continues evolving it. Earlier this year, for example, IBM released its third-generation FlashCore Module in its new FlashSystem 7300 and FlashSystem 9500 arrays.
The FlashSystem 9500 is a significant platform refresh of the IBM FlashSystem 9200, doubling performance, capacity, and connectivity. The FlashSystem 9500 is a performance monster. Driven by four 24-core Intel Ice Lake processors and up to 3 TB of DRAM cache, the machine can deliver 106 GB/s and 2.4M IOPS across up to 48 ports of either Fibre Channel or RoCE Ethernet.
Some of the 9500’s performance comes from its use of Intel’s latest-generation Ice Lake processors, which bring PCIe 4.0 into the product line. However, the remaining performance boost, and all of the capacity increase, is thanks to the new IBM FlashCore Module v3. With FlashCore v3, the FlashSystem 9500 can deliver 4.6 PB of effective capacity in a 4U package while delivering an impressive 3:1 compression ratio.
IBM’s FlashSystem 7300 also leverages the new FlashCore v3 modules. That machine delivers a 25% performance increase over the previous-generation FlashSystem 7200, serving up to 2.2 PB of effective capacity at up to 50 GB/s. The FlashCore v3 modules give the array the same 3:1 compression as its big brother, the 9500.
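The effective-capacity figures above are simple arithmetic on raw capacity and data-reduction ratio. As a back-of-the-envelope sketch, using only the numbers quoted in this article (this is not IBM’s sizing methodology):

```python
def effective_pb(raw_pb: float, reduction_ratio: float) -> float:
    """Effective capacity presented to hosts, given raw flash
    capacity and a data-reduction (compression) ratio."""
    return raw_pb * reduction_ratio

# Working backward from the article's figures: 4.6 PB effective
# at 3:1 compression implies roughly 1.53 PB of raw flash.
raw = 4.6 / 3
print(f"{raw:.2f} PB raw -> {effective_pb(raw, 3):.1f} PB effective")
```

The point of the arithmetic is that a better compression ratio multiplies every terabyte of physical flash the customer buys.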
FlashSystem is how you consume the data stored on an IBM FlashCore Module. More than the system itself, IBM’s FlashCore technology is at the heart of what IBM is delivering. It’s the critical enabler that allows IBM to continually leap over the competition as it produces new generations of FlashSystem storage at an astonishing rate.
IBM’s FlashCore Module is not an ordinary SSD. It doesn’t just serve bytes across an NVMe interconnect. Instead, the IBM FlashCore Module is a complex computational storage device. The logic inside FlashCore makes QLC flash perform better and more reliably, with higher densities than most TLC-based solutions. This is a big win not just for IBM, but for its storage customers overall.
IBM, which has shipped over 200,000 FlashCore Modules, tells us that the technology allows for a media device with twice the endurance of a standard NVMe flash drive. The latest generation FlashCore Module is available in capacities ranging from 22 TB to 116 TB effective. All of this in a 2.5-inch dual-ported U.2 NVMe form factor. That’s all impressive, but what is inside the device? That’s what I wanted to know.
I asked that question, and IBM kindly connected me with Andy Walls. Andy is an IBM Fellow and the CTO for FlashSystem. He knows FlashCore better than perhaps anyone in the world. Andy is also one of those rare engineers who can make the incredibly complex seem simple. I enjoy spending time with him.
Andy describes FlashCore as a computational storage device. This means that a FlashCore Module combines NAND flash, DRAM and MRAM for caching, and an astonishing amount of compute to deliver greater functionality than a traditional SSD can. It also takes some of the computationally heavy work, such as compression, off the storage array and performs that work on the drive itself. The result is a very efficient and flexible architecture.
The computational part of this storage device is a set of ARM processor cores built into a flexible, reprogrammable onboard FPGA. The primary purpose of this logic is to manage the module’s QLC flash. QLC has a bit of a bad reputation, as the raw media is a little slower and has more endurance concerns than the TLC flash typically found in enterprise storage products. But QLC also has tremendous benefits: it’s cheaper and denser. I can build fatter and more power-efficient arrays with QLC, but I need to overcome its limitations.
Andy explained that IBM has managed to deliver a QLC-based flash drive that is as reliable as, and often more performant than, any TLC-based solution on the market. There’s a lot of low-level technology involved (Andy gave an excellent presentation containing many of the technical details at the 2020 Flash Memory Summit), but the essence of the work is managing the health of the NAND at the block level. It’s about managing hotspots, reducing write amplification, and performing intense health checks and error correction. I’m over-simplifying, but that’s the essence.
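To make the write-amplification point concrete, here is a minimal sketch of the metric that flash controllers like FlashCore work to minimize. The numbers are hypothetical illustrations, not IBM measurements:

```python
def write_amplification(host_bytes: float, nand_bytes: float) -> float:
    """Write amplification factor (WAF): physical NAND bytes
    programmed per logical byte the host wrote. 1.0 is ideal;
    garbage collection and wear leveling push it higher, and on
    QLC media every extra program cycle eats into the already
    limited endurance budget."""
    return nand_bytes / host_bytes

# Hypothetical example: 100 GB of host writes triggering
# 180 GB of NAND programs due to garbage-collection overhead.
print(write_amplification(100, 180))  # 1.8
```

Every tenth of a point a controller shaves off this factor translates directly into longer media life, which is why block-level health management matters so much for QLC.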
It’s not just about the algorithm; a lot of work is also happening across IBM’s engineering and operations teams. Any semiconductor product will contain slight variances across manufacturing runs. Most of these don’t matter, but, with NAND, some do.
Andy shared that IBM works with its NAND partner to do deep characterizations of each batch of raw QLC that IBM brings to its factory floor. These characterizations are programmed into each FlashCore Module in a way that increases reliability.
Beyond making QLC enterprise-ready, the first generation of FlashCore focused on compression. This is one of the most critical attributes of any enterprise storage array. Compression influences efficiency and cost, critical concerns for anyone in IT. IBM didn’t stop there.
Andy said that while IBM started with compression and continues refining that capability, the goal is to offload and accelerate storage functions wherever it makes sense. That promises to enable new and exciting capabilities that we haven’t yet seen in a storage array.
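As a rough illustration of why on-drive compression matters, here is a generic sketch using zlib. This stands in for the concept only; FlashCore’s hardware compression engine is IBM’s own design, not zlib:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Logical bytes per stored byte after compression."""
    return len(data) / len(zlib.compress(data))

# Structured, repetitive data (typical of databases and logs)
# compresses very well...
print(round(compression_ratio(b"user_id,balance,region\n" * 5000), 1))

# ...while already-random (or encrypted) data barely compresses.
print(round(compression_ratio(os.urandom(100_000)), 2))  # ~1.0
```

Doing this work on the module itself, rather than on the array’s controllers, frees the array’s CPUs for other tasks while keeping the effective-capacity benefit.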
Looking forward, it’s conceivable that a FlashCore Module could help with the problem of managing unstructured data, performing filtering, searching, and scanning at the media level. In addition, the processor could potentially be used to deliver real-time statistics about entropy changes of the data stored on the drive itself. The potential is nearly limitless.
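The entropy idea can be sketched in a few lines. This is purely illustrative of the concept described above, not an IBM implementation; `shannon_entropy` is a hypothetical helper, and any real on-module analytics would be far more sophisticated:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0). Compressed
    or encrypted data approaches 8.0, so a sudden entropy jump
    across a volume can signal data being encrypted in place,
    e.g. by ransomware."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A single repeated byte carries no information; uniformly
# distributed bytes hit the 8 bits/byte maximum.
print(round(shannon_entropy(b"a" * 512), 2))            # 0.0
print(round(shannon_entropy(bytes(range(256)) * 4), 2))  # 8.0
```

Computing statistics like this at the media level, as data streams through the module, is exactly the kind of work a computational storage device is positioned to do without burdening the host.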
The Analyst’s Take
There is little dispute that IBM builds some of the fastest and most resilient storage you can buy, something that hasn’t changed in the nearly 70 years since IBM invented the disk drive. When coupled with its very capable Spectrum software suite, IBM storage delivers to an enterprise a level of control of its data that no other vendor can match. IBM can do this because it continues to invest in innovation at every level.
IBM’s only real weakness in storage is one of perception. IBM is alone among its storage competitors in viewing the market not as an opportunity to move boxes, but as a chance to use storage as a strategic lever for solving more critical digital transformation problems. IBM will sell you systems, but its strategic focus is on helping to make your digital transformation successful. Storage is just an enabler.
This makes it hard to compare IBM against the rest of the market. Our industry loves a good horse race, and we wait breathlessly every quarter for the analysts who count things to tell us who’s winning. This isn’t fair to IBM. It’s playing a different game, but it’s playing that game with a portfolio of storage solutions that can solve nearly any data problem.
I’ve learned much about storage technology in the thirty years since I first met the IBM storage team. But, even though I understand the technology at a much deeper level, when I look at how IBM builds its storage arrays from the NAND up, along with the software to optimize my data experience across a hybrid-cloud infrastructure, it can still feel a little like magic.
Note: Moor Insights & Strategy writers and editors may have contributed to this article.