Arm’s OD and ML processors work together to find and identify faces.
This blog was co-written by Patrick Moorhead, President and Principal Analyst, and Karl Freund, Sr. Analyst, Machine Learning and HPC, Moor Insights & Strategy.
Machine learning is the hottest thing in tech right now: hotter than smartphones, hotter than virtual reality, even hotter than the fabled Internet of Things (IoT). As a result, all of the leading tech companies are now attempting to define themselves as the leader in some kind of machine learning. And it’s not just ecosystem companies like Google, Microsoft, Amazon.com, IBM and Facebook; it’s also chip companies like Advanced Micro Devices, NVIDIA, Intel, Qualcomm, Xilinx and now Arm Holdings, not to mention at least twenty startups, from Silicon Valley to Shanghai, all building silicon for teaching machines to think, and then, well, thinking, with neuromorphic approaches.
ML “training” versus “inference”
While followers and shareholders of NVIDIA have benefited from that company’s domination of machine learning in the data center, it is becoming increasingly important to understand that there are two phases of ML: training and inference. The former demands very fast processors, primarily NVIDIA GPUs today, and tons of data. The latter demands more affordable, lower-power chips that can quickly traverse the neural network to categorize an image, identify a face, or even translate English to French. Most training occurs in the data center, with some on the edge, while most inference will occur on the edge, with some in the data center.
ML in mobile devices today
Inference will increasingly take place in apps on smartphones and other “edge” devices. While most phones have chips that can process rudimentary neural nets, additional performance beyond the CPU and GPU is needed for image and language processing. As a result, Huawei’s latest Kirin 970 has what it calls a Neural Processing Unit, reportedly based on IP from Cambricon. The iPhone X has the A11 Bionic chip, with a custom silicon block for neural network processing that enables face detection and portrait photography, with promises to do more in the future. The Qualcomm Snapdragon 835 accelerates TensorFlow, Caffe, Caffe2, MXNet and Android NNAPI across its CPU, GPU, and most importantly, its DSP. But Arm Holdings, the supplier of the common CPU technology that enables all these devices, has been notably quiet on the AI front. All that is about to change.
What did Arm announce?
Arm Holdings has announced “Project Trillium”, an internal handle for its machine learning IP and software, consisting of three primary components: an Object Detection (OD) processor, a Machine Learning (ML) processor, and neural network software libraries that support heterogeneous compute across the CPU, GPU and ML processor. As one would expect, Arm will focus first on the IP for smartphones and tablets, smart cameras, and what it’s calling “AI-enabled devices”, which we assume are products like drones and smart home devices. Arm says its architecture can scale to autonomous cars and the data center in the future, but we’ll need to see more details before we can weigh in on those claims.
The Arm OD and ML processors
By adding an OD processor to the mix, Arm is delivering more of a workflow solution. When an OD processor sits in front of an ML processor, it screens the incoming data so the ML processor can spend its time on higher-value workloads. This reminds us, in spirit, of the way Intel’s Movidius operates. For example, the OD processor can detect whether a face is present at all, and the ML processor can then identify faces that look similar.
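The division of labor between the OD and ML processors can be sketched as a two-stage cascade. This is only an illustration under stated assumptions: `run_cascade`, `toy_detect`, `toy_identify` and the toy frame format are hypothetical names invented here, not part of any Arm API, and the stand-in functions simply look up canned answers instead of running real models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not an Arm API.
Box = Tuple[int, int, int, int]      # (x, y, width, height)
Frame = Dict[str, Dict[Box, str]]    # toy frame: face box -> person label


def run_cascade(
    frames: List[Frame],
    detect: Callable[[Frame], List[Box]],
    identify: Callable[[Frame, Box], str],
) -> Tuple[List[List[str]], int]:
    """Run a cheap detector on every frame and the costly identifier
    only on the regions the detector flags."""
    results: List[List[str]] = []
    identify_calls = 0
    for frame in frames:
        boxes = detect(frame)            # stage 1: stands in for the OD processor
        labels = []
        for box in boxes:
            labels.append(identify(frame, box))  # stage 2: stands in for the ML processor
            identify_calls += 1
        results.append(labels)
    return results, identify_calls


def toy_detect(frame: Frame) -> List[Box]:
    # Stand-in detector: returns the boxes stored in the toy frame.
    return list(frame["faces"].keys())


def toy_identify(frame: Frame, box: Box) -> str:
    # Stand-in identifier: looks up the label stored for that box.
    return frame["faces"][box]


frames: List[Frame] = [
    {"faces": {}},                                   # empty scene
    {"faces": {(10, 20, 64, 64): "alice"}},
    {"faces": {}},                                   # empty scene
    {"faces": {(5, 5, 32, 32): "bob", (80, 40, 48, 48): "alice"}},
]

results, calls = run_cascade(frames, toy_detect, toy_identify)
print(results)
print("identify calls:", calls)
```

In this toy run the second stage is never invoked for the two empty frames, which is the sense in which a front-end detector frees the ML processor for higher-value work.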
For processing a trained neural network, Arm is offering IP for building a neural network processor that can be standalone or can work in conjunction with the OD processor, an Arm CPU, and/or an Arm Mali GPU. It will be interesting to see how this move affects those 20-odd startups we mentioned earlier. If Arm gets traction selling the ML processor IP to chip builders, and its track record is certainly very good, then companies building ML acceleration chips will face a difficult decision: they can build their own IP or, if their added value sits on top of the ML processor, license Arm’s ML IP. Either way, they will face a large competitive field.
Arm Neural Network software
Similar to what Qualcomm has done, the Arm NN software works across Cortex CPUs, Mali GPUs and the new Arm ML & OD processors, supporting TensorFlow, Caffe, Caffe2, MXNet, and Android NN frameworks.
Arm is providing software libraries that enable developers to use many of the open source ML frameworks to build and execute their neural networks.
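One way to picture software that spans several compute units is a preference list with fallback: use the most specialized unit a device has, and fall back to the CPU otherwise. The sketch below is a hypothetical illustration of that idea only. The function `pick_backend` and the backend names are invented for this example and are not the Arm NN API.

```python
from typing import Iterable, Sequence


def pick_backend(preferred: Sequence[str], available: Iterable[str]) -> str:
    """Return the first backend in `preferred` that the device offers."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise RuntimeError("no preferred backend is available on this device")


# Hypothetical backend names, ordered from most to least specialized.
PREFERENCE = ["ml_processor", "gpu", "cpu"]

# A device with all three compute units versus a CPU-only device.
chosen_flagship = pick_backend(PREFERENCE, {"cpu", "gpu", "ml_processor"})
chosen_budget = pick_backend(PREFERENCE, {"cpu"})
print(chosen_flagship, chosen_budget)
```

The same application code can then run on both devices, which is the practical appeal of a heterogeneous software layer for developers.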
Arm’s entry into the machine learning space may seem a bit late, but the market for edge device ML inference is just now beginning to emerge and should become mainstream in the next 2-3 years as 5G networks enable low-cost, low-latency connectivity. It’s also important to recognize that when Arm makes a move, its decisions affect billions of devices and thousands of companies, so it had better get it right the first time.
It will be very interesting to see how many mobile device and IoT manufacturers sign up. We expect that the low end of the mobile market will be ripe for Arm to penetrate, while higher-end manufacturers will continue to build out their own ML capabilities, at least for the next few years, if for no other reason than to come across as unique. Smart IoT may be a larger opportunity for Arm if the industry’s prognosticators are correct. However, Arm’s IP for lower-end products will not come out until after the initial IP for mobile is released later this year.
In the end, it is all about performance per watt, and Arm is targeting about 3 TOPS/Watt on the 7nm manufacturing process. That is not bad, although there are startups shooting for much more. With the OD processor in front of the ML processor, though, the entire workflow could be more efficient. We can’t wait!
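To put the 3 TOPS/Watt target in perspective, the short calculation below converts it into peak throughput for a given power budget and into energy per operation. Only the 3 TOPS/Watt figure comes from the article. The 1.5 W power budget and the workload of one billion operations per inference are hypothetical values chosen for illustration, and the results are peak numbers that ignore memory traffic and real-world utilization.

```python
TOPS_PER_WATT = 3.0  # efficiency target cited in the article


def throughput_tops(power_watts: float, tops_per_watt: float = TOPS_PER_WATT) -> float:
    """Peak throughput in tera-operations per second at a given power budget."""
    return power_watts * tops_per_watt


def energy_per_op_picojoules(tops_per_watt: float = TOPS_PER_WATT) -> float:
    """Energy per operation in picojoules.

    1 TOPS/W equals 1e12 operations per joule, so the energy per
    operation is 1 / (tops_per_watt * 1e12) joules.
    """
    ops_per_joule = tops_per_watt * 1e12
    return 1e12 / ops_per_joule


def peak_inferences_per_second(power_watts: float, ops_per_inference: float) -> float:
    """Upper bound on inferences per second, assuming perfect utilization."""
    return throughput_tops(power_watts) * 1e12 / ops_per_inference


# Hypothetical inputs, not figures from Arm.
power_budget_watts = 1.5
ops_per_inference = 1e9

tops = throughput_tops(power_budget_watts)
picojoules = energy_per_op_picojoules()
inferences = peak_inferences_per_second(power_budget_watts, ops_per_inference)

print(f"peak throughput: {tops:.1f} TOPS")
print(f"energy per operation: {picojoules:.3f} pJ")
print(f"peak inferences per second: {inferences:.0f}")
```

Under these assumptions, each operation costs roughly a third of a picojoule, and a chip with twice the TOPS/Watt would do the same work in half the energy.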