Google recently announced that its Cloud TPU (first announced last May) is now available in limited quantities as a beta on the Google Cloud Platform (GCP) for running TensorFlow-based AI applications. Clearly, the search giant is excited about its shiny new device, which the company claims will substantially reduce its datacenter footprint and cost outlays. However, one does have to marvel a bit at Google’s marketing audacity, considering the plain facts: Google’s 4-chip beast, replete with 64 GB of expensive High Bandwidth Memory (HBM), is roughly 33% more expensive per unit of performance than NVIDIA’s single-chip, year-old Tesla V100 GPU accelerator.
What did Google announce?
Google initially launched its Tensor Processing Unit for training neural networks—the meat and potatoes of building AI applications—one week after NVIDIA’s 2017 GTC event, where the graphics chipmaker unveiled its mammoth Volta GPU. Certainly, the concept of an ASIC (Application-Specific Integrated Circuit) for machine learning has its appeal: eliminate all the graphics-specific silicon, and you should be able to build a faster chip by harvesting that die area for more processing. How much die area that actually frees up is debatable.
Frankly, Google obfuscated the truth by defining a Cloud TPU as a four-chip board and then comparing that board to an NVIDIA Maxwell GPU accelerator that was two generations old at the time. There was no need for all the fancy footwork. The Cloud TPU chip itself is very fast, at 45 trillion operations per second (TOPS)—more than twice the performance of NVIDIA’s Pascal GPU accelerator. At 125 TOPS, though, NVIDIA’s Volta GPU is even faster—thanks in part to its TensorCore feature. So, from a raw performance standpoint, chip to chip, NVIDIA’s Volta V100 is almost three times faster (125 vs. 45 TOPS). The big disclaimer is that this applies if (and only if) your model can take advantage of TensorCores, which perform a 4×4 matrix multiply-accumulate in a single clock cycle.
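To make that TensorCore claim concrete, here is a minimal sketch (plain Python, not NVIDIA’s actual API) of the operation each TensorCore performs every clock cycle: a fused 4×4 matrix multiply-accumulate, D = A×B + C.

```python
# Illustrative sketch of a TensorCore's per-clock operation:
# a fused 4x4 matrix multiply-accumulate, d = a @ b + c.
# (In hardware, A and B are FP16 inputs and C/D are FP32 accumulators.)
def tensor_core_mma(a, b, c):
    """Compute d = a @ b + c for 4x4 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) + c[i][j]
         for j in range(4)]
        for i in range(4)
    ]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]
print(tensor_core_mma(identity, identity, zeros))  # the identity back
```

A V100 packs 640 such units, each doing 64 fused multiply-adds per clock, which is roughly where the headline 125 TOPS figure comes from—and why models that cannot be expressed as these dense matrix operations see far less benefit.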
The following table shows that the 4-die Google Cloud TPU costs over twice as much as the NVIDIA Volta GPU available on the Amazon AWS cloud, while delivering ~67% more performance based on the training time for the ResNet-50 neural network. Net it out, and the Google part costs ~33% more to do the same work.
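That "cost per unit of work" conclusion is easy to check. The hourly prices below are my assumptions (Google's announced beta price for a Cloud TPU and AWS's on-demand price for a single-V100 p3.2xlarge at the time); the ~67% ResNet-50 speedup is the figure cited above.

```python
# Back-of-the-envelope check of the cost-per-unit-of-work claim.
tpu_price_per_hour = 6.50   # 4-die Cloud TPU, beta pricing (assumed)
gpu_price_per_hour = 3.06   # 1x V100, AWS p3.2xlarge on-demand (assumed)
tpu_speedup = 1.67          # TPU trains ResNet-50 ~67% faster than 1 GPU

cost_ratio = tpu_price_per_hour / gpu_price_per_hour  # ~2.1x the price
# Normalize by throughput: cost to finish the same training job.
cost_per_job_ratio = cost_ratio / tpu_speedup
print(f"TPU premium per training job: {cost_per_job_ratio - 1:.0%}")
```

With these assumed prices the premium lands near the low end of the ~33% cited; small changes in the price or speedup assumptions move it a few points either way, but the direction of the conclusion is the same.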
Table 1: Comparing Google Cloud TPU vs. NVIDIA’s Volta (V100) GPU for training convolutional neural networks for image analysis.
Google also announced that its TPU Pods—interconnected TPUs that form a massive compute cluster—would be ready later this year. Google had already announced that a 1,000-TPU pod, called the TensorFlow Research Cloud, would be available at no charge to the “world’s top researchers” to help drive innovation on TensorFlow-based AI models. I should note that Google’s TPU hardware is available only as a service from Google, and that the TPU silicon supports only TensorFlow, Google’s open-source machine learning framework. NVIDIA’s GPUs, on the other hand, support virtually every machine learning software package.
As I have discussed before, Google is intent on building a complete stack of hardware and software for AI applications to run on its cloud. The company clearly envisions a virtuous cycle of benefits flowing from its hardware to its software, and vice versa. I do find it unsettling that the company announced this hardware so early; nine months after the initial announcement, it is only now becoming available—as a beta service, and only in “limited quantities”. But the real disappointment comes from the pricing strategy: why would anyone pay more to get the same job done? Yes, a 4-die Cloud TPU can get the training done faster than a single GPU, but it is >2X slower than a 4-GPU instance on AWS. I can only surmise that the pricing is driven in part by the high cost of the extra HBM2 memory chips, widely believed to be over $300 per TPU die (or $900 more than the 16 GB needed for Volta). Keep in mind, though, that Google gets the ASIC at manufacturing cost, so even the HBM2 delta does not fully explain the higher pricing. Hopefully, these prices will come down after Google irons out whatever wrinkles are limiting the quantities.
Long story short, advanced silicon is hard to do—even if you are Google!