Last fall, a bit of nerdy controversy arose around AI chip startup NovuMind when the company announced its first low-power chip for processing neural networks. The company claimed that its patented design could natively process 3D tensor data far more efficiently than designs that must first pre-process the 3D tensors into 2D matrices. NovuMind highlighted its advantages at low and high resolutions, while typical benchmarks (such as ResNet-50) target the medium-resolution workloads common in datacenters, where NovuMind's advantage is smaller. Now, recent customer wins and trial deployments may provide the trump card in the debate. Let’s take a look at the company, its first product, and some case studies that seem to bear out NovuMind’s dramatic performance-per-watt claims. More details are available in my research report here.
What makes NovuMind special?
As an industry analyst, I’ve helped a number of investment firms examine various startups over the last 18 months. One thing I’ve noted is that many of these companies are dismissive of NVIDIA and its reliance on GPUs. This is a dangerous attitude in my opinion, given NVIDIA’s strong ecosystem, engineering bench strength, and innovation track record. Many startups plan to go head-to-head with NVIDIA for the high-end training market where NVIDIA has a dominant position that will be tough to crack. A competitive product would need to be perhaps 10 times faster than NVIDIA to justify the expenses and risks of adoption.
Other challengers, such as Habana Labs and Thinci, decided to focus first on the inference market. Here, accelerators will be used in a wide range of applications with widely varying power budgets, from cameras and TVs to autonomous vehicles. Ren Wu, NovuMind’s founder and CEO, is also steering his startup toward these large inference markets. His bet is that NovuMind’s very low-power, low-cost offerings will open doors to design wins while flying below NVIDIA’s high-performance, high-price radar.
NovuMind’s unique approach to efficient Domain Specific Architectures (DSAs) seems to be winning over early customers. The company focuses on the largest market for neural networks: image processing with Convolutional Neural Networks (CNNs). Dr. Wu explains that his product’s efficiency is enabled by the company’s patented NovuTensor approach: the full 3D tensor convolution operation, which lies at the heart of many deep learning algorithms, is processed natively, without first decomposing the 3D tensor into 2D matrices. While many AI chips seek to increase compute efficiency by reducing computational precision, the NovuTensor approach is potentially more impactful. It stands to increase performance through a higher level of parallelism, eliminate the data-copy operations normally required to unfold 3D tensors, and increase compute density by loading two tensors at a time. The company’s current chip, which it has been sampling since last fall, is built on 28nm process technology. The company’s roadmap calls for more advanced process nodes to increase compute bandwidth.
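To see why avoiding the unfolding step matters, here is a rough sketch (my own illustration, with made-up tensor sizes; not NovuMind's implementation) of the standard "im2col" lowering that matrix-oriented designs commonly use to turn a 3D convolution into a 2D matrix multiply. Every input value gets copied roughly k² times, and that inflated copy is exactly the data movement a native 3D convolution engine can skip:

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) tensor into a 2D matrix whose columns are the
    C*k*k input patches under each kernel position (stride 1, no padding).
    The convolution then becomes a single matrix multiply: W_2d @ cols."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols

# A hypothetical 64-channel 56x56 feature map with a 3x3 kernel:
x = np.random.rand(64, 56, 56)
cols = im2col(x, 3)
print(x.size)     # 200704 input values
print(cols.size)  # 1679616 values after unfolding -- over 8x the data
```

The hardware argument, in effect, is that convolving the 3D tensor directly avoids ever materializing (and moving) this inflated intermediate matrix.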
I’m particularly intrigued by the applications being tested by NovuMind’s customers, some of which could produce massive demand. Here are a couple of examples:
The first customer example is Singapore’s NCS Pte. Ltd., a subsidiary of telco giant SingTel. NovuMind is enabling surveillance systems with advanced video analytics capabilities to detect things of interest such as particular vehicles, people, or activities. This local intelligence transforms cameras from mere video capture devices into endpoints that can provide alerts so that authorities can respond to situations immediately. In this application, low power (3 Watts!), low cost, and low latency are critical factors. “Our application environment demands high performance at very low power levels,” said Kar Han Tan, head of R&D at NCS. “NovuMind is able to deliver a solution that meets our customers’ needs, and at a competitive price-point, thanks to their unique industry-leading architecture.” NovuMind is being tested in the first-phase rollout of some 200,000 cameras.
A second customer is a large consumer electronics firm that is testing NovuMind’s first chip in prototype 8K super-resolution TV sets. Here the chip upscales 4K content to full 8K, using AI to fill in detail and produce ultra-high-quality images while reducing the bandwidth requirements of native 8K transmission. Obviously, power requirements are stringent; nobody wants a cooling fan in their TV. Since the screen in front of you is likely not an 8K display, the images in Figure 2 cannot do this task justice. If you enlarge them, however, you can get a feel for the benefits.
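As a quick back-of-the-envelope check on the bandwidth argument (the resolutions below are the standard UHD figures, not NovuMind-specific numbers):

```python
# Standard UHD pixel counts (width x height)
pixels_4k = 3840 * 2160   #  8,294,400 pixels per frame
pixels_8k = 7680 * 4320   # 33,177,600 pixels per frame

# Native 8K carries 4x the raw pixel data of 4K, so transmitting 4K
# and upscaling in the TV cuts raw pixel bandwidth by 75 percent.
print(pixels_8k / pixels_4k)  # 4.0
```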
The world of AI inference accelerators will soon begin to cross the high-volume chasm, demanding many millions and even billions of chips to accelerate embedded AI at the edge for smart cities, factories, and appliances. NovuMind seems to have developed a good fit for many of these applications that use Convolutional Neural Networks by increasing compute density and lowering power through a novel and patented design. Early customers are testing the first-generation chip now, with excellent results in consumer product, medical device, retail, and smart city surveillance applications. The fact that NovuMind can achieve such performance at very low power using the conservative and inexpensive 28nm manufacturing process means that it has a lot of headroom to grow as its product line advances in the next few years.
We will soon see more interesting startups showing off their AI chips, but I believe NovuMind has an early lead that bears watching.