When Google announced its second generation of ASICs to accelerate the company’s machine learning processing, my phone started ringing off the hook with questions about the potential impact on the semiconductor industry. Would the other members of the Super 7, the world’s largest datacenters, all rush to build their own chips for AI? How might this affect NVIDIA, a leading supplier of AI silicon and platforms, and potentially other companies such as AMD, Intel, and the many startups that hope to enter this lucrative market? Is it game over for GPUs and FPGAs just when they were beginning to seem so promising? To answer these and other questions, let us get inside the heads of these Goliaths of the Internet and see what they may be planning.
The Google Cloud TPU is a four-ASIC board that delivers 180 teraflops of performance. Source: Google.
The landscape of silicon for AI
As I explored in an article earlier this year, there are four major types of technology that can be used to accelerate the training and use of deep neural networks: CPUs, GPUs, FPGAs, and ASICs. The good old standby CPU has the advantage of being infinitely programmable, with decent but not stellar performance. It is used primarily in inference workloads, where the trained neural network guides the computation to make accurate predictions about the input data item. FPGAs from Intel and Xilinx, on the other hand, offer excellent performance at very low power, and they add flexibility by allowing the designer to change the underlying hardware to best support changing software. FPGAs are used primarily in machine learning inference, video algorithms, and thousands of small-volume specialized applications. However, the skills needed to program the FPGA hardware are fairly hard to come by, and the performance of an FPGA will not approach that of a high-end GPU for certain workloads.
There are many types of hardware accelerators that are used in Machine Learning today, in training and inference, and in the cloud and at the edge. Source: Moor Insights & Strategy
Technically, a GPU is an ASIC used for processing graphics algorithms. The difference is that a GPU offers an instruction set and libraries that allow it to be programmed to operate on locally stored data, as an accelerator for many parallel algorithms. GPUs excel at performing matrix operations (primarily matrix multiplications, if you remember your high school math) that underlie graphics, AI, and many scientific algorithms. Basically, GPUs are very fast and relatively flexible.
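To make the matrix multiplication point concrete, here is a minimal sketch in plain Python of one neural network layer applied to a small batch of inputs. The function name `matmul` and the sample numbers are illustrative assumptions, not anything taken from a GPU library. The thing to notice is that every cell of the result is computed independently of every other cell, which is what lets a GPU hand each one to a separate hardware thread.

```python
def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); both are lists of rows."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    # Each output cell depends only on one row of a and one column of b,
    # so all m * n cells could be computed at the same time.
    return [
        [sum(row[i] * b[i][j] for i in range(k)) for j in range(n)]
        for row in a
    ]


# Hypothetical data: a batch of 2 samples with 2 features each,
# and a layer that maps 2 inputs to 3 outputs.
inputs = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.5, -1.0, 0.0], [0.25, 1.0, 2.0]]

outputs = matmul(inputs, weights)
print(outputs)
```

A real framework would run the same arithmetic on much larger matrices through a tuned library; the loop here only shows the shape of the work.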
The alternative is to design a custom ASIC dedicated to performing fixed operations extremely fast, since the entire chip’s logic area can be dedicated to a narrow set of functions. In the case of the Google TPU, those functions lend themselves well to a high degree of parallelism, since processing neural networks is an “embarrassingly parallel” workload. Think of an ASIC as a drag racer; it can go very fast, but it can only carry one person in a straight line for a quarter mile. You couldn’t drive one around the block, or take it out on an oval racetrack.
Why did Google build the TPU?
If you think about Google’s business, it has three attributes that likely led it to invest in custom silicon for AI. Examining these factors may be helpful in assessing other companies’ potential likelihood of making similar investments.
- Strategic Intent: Google has repeatedly stated that it has become an “AI First” company. In other words, AI technology has a strategic role across the entire business: Search, self-driving vehicles, Google Cloud, Google Home, and many other new and existing products and services. It therefore makes sense that Google would want to control its own hardware accelerators (TPU) and its own software framework (TensorFlow) on which it will build its products and services. The company is willing to invest to give itself an edge over others with similar, albeit perhaps less ambitious, aspirations.
- Required Scale: Google’s computing infrastructure is the largest in the world, and that scale means that it may have the volume needed to justify the significant costs of developing and maintaining its own hardware platform for AI acceleration. In fact, Google claims that the TPU saved the company from building another 12 datacenters to handle the AI load. Let’s do some sensitivity analysis to understand the likely scale required for a single ASIC cycle. For the sake of argument, let’s assume Google spent O($100M), including mask production, and that each chip will save the company around O($1K). For reference, a single Cloud TPU chip at 45 TFLOPS potentially has a little more than 1/3rd the performance of an NVIDIA Volta GPU at a peak 120 TFLOPS, so you need 3 TPU chips to displace a high-end GPU. Dividing the development cost by the per-chip savings implies Google can just about break even if it deploys O(100K) TPUs, not accounting for the time value of money. That’s a lot of chips for most companies, even for Google. On the other hand, if it only cost Google O($60M) to develop the chip and TPU board, and it saves O($2K) per chip, then it only needs O(30K) chips to break even. Therefore, a similar effort by another large datacenter may require O(30-100K) chips just to break even over a 2-3 year period.
- Importance of Google Cloud: Google execs can’t be satisfied to remain a distant 3rd in the global cloud computing market behind Amazon and Microsoft. They are investing a great deal in Google Cloud under the leadership of Diane Greene, and are now enjoying some of the fastest growth in the industry. Google could use the pricing power and performance of the Google Cloud TPU, along with the popularity of TensorFlow, as a potentially significant advantage in capturing market share for the development of machine learning in the cloud. However, it is important to note that Google says the use of their Cloud TPU would be priced at parity with a high-end GPU for cloud access. Google does not intend to sell the TPU outright.
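The break-even arithmetic in the Required Scale item above can be sketched in a few lines of Python. This is only an illustration of the article's order-of-magnitude estimates; the function and variable names are hypothetical, the dollar figures are the rough assumptions stated in the text rather than actual Google costs, and the time value of money is ignored, as it is in the text.

```python
import math


def break_even_chips(development_cost_usd, savings_per_chip_usd):
    """Number of chips whose combined savings repay the development cost."""
    if savings_per_chip_usd <= 0:
        raise ValueError("savings_per_chip_usd must be positive")
    return math.ceil(development_cost_usd / savings_per_chip_usd)


# Scenario 1 from the text: about $100M to develop, about $1K saved per chip.
high_cost_case = break_even_chips(100e6, 1e3)

# Scenario 2 from the text: about $60M to develop, about $2K saved per chip.
low_cost_case = break_even_chips(60e6, 2e3)

# Peak throughput figures quoted in the text (TFLOPS).
TPU_CHIP_TFLOPS = 45
VOLTA_GPU_TFLOPS = 120
tpu_chips_per_gpu = math.ceil(VOLTA_GPU_TFLOPS / TPU_CHIP_TFLOPS)

print(f"High-cost scenario: {high_cost_case:,} chips to break even")
print(f"Low-cost scenario:  {low_cost_case:,} chips to break even")
print(f"TPU chips needed to match one Volta GPU at peak: {tpu_chips_per_gpu}")
```

Changing either input shows how sensitive the result is: halving the per-chip savings doubles the number of chips that must be deployed before the investment pays for itself.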
Who else looks and thinks like Google?
Frankly, while all of the other Super 7 members (Amazon, Alibaba, Baidu, Facebook, Microsoft, and Tencent) are capable of building their own accelerator, nobody exhibits all three attributes to the extent that Google does. Furthermore, of the companies that are actually close, several seem to be moving in different directions:
- Baidu has recently stated publicly that it is in partnership with NVIDIA for its AI initiatives in the Cloud, Home, and Autos. This doesn’t mean Baidu can’t and won’t build its own chip someday, but for now, the company seems to be satisfied to concentrate on its software and services, which the Chinese market already values. Also, Baidu’s cloud remains a relatively small part of its business.
- Microsoft is the 2nd largest Cloud Services provider, has a large (>5000) stable of AI engineers, and is on a mission to “democratize” AI through its tools and APIs for enterprise customers. However, the company has decided (at least for now) to use Altera FPGAs from Intel in its Azure and Bing infrastructure, believing that it can benefit from a more flexible hardware platform in a fast-changing world. Also, Microsoft uses NVIDIA GPUs to train its neural networks.
- Amazon is perhaps the closest to the Google model outlined above; AWS is huge, and the company is investing heavily in AI. While Amazon may favor the Apache MXNet framework for AI development, its AWS cloud services for AI support all major frameworks, making it the open software Switzerland of the AI development world. Also, being an NVIDIA-based Yin to Google’s TPU-centric Yang could be an effective strategy. However, Amazon has gone down the ASIC path before; it acquired Annapurna Labs in 2015, apparently to shave off costs and latencies in the AWS infrastructure. Because of this, the company already has a chip team on board in Israel. Finally, Amazon, like Baidu, seems to be keen on using FPGAs for their all-programmable nature.
I’m not forecasting that none of the other Super 7 companies will jump the GPU ship and hop on board their own ASIC, but it seems highly unlikely to me that many will, at least not soon. They all seem to have their hands full developing machine learning models with their vast troves of data, and are busy monetizing those models in a variety of products and services. Building an ASIC, and the software that enables it, is an ongoing and expensive proposition that could be a distraction. Alternatively, combining the performance of a GPU for training with the flexibility and efficiency of an FPGA for inference also holds a great deal of promise.
So I, for one, do not think the GPU sky is falling, at least not in the near future. AMD certainly believes there is plenty of demand for GPUs and is aiming its Vega technology right at it.