Modern data scientists have an insatiable appetite for more performance to train and run deep neural networks (DNNs) for artificial intelligence (AI). In fact, research by OpenAI has shown that DNN performance requirements are doubling every three and a half months, far faster than the traditional Moore’s Law rate for central processing units (CPUs), whose performance has historically doubled every 18 months. While NVIDIA graphics processing units (GPUs) have largely enabled this advancement, some wonder whether a new, ground-up approach to silicon and system design might be better suited to the task. Given the growth prospects for AI, it’s no surprise that scores of startups, along with large companies like Intel, are readying new silicon to enter the race. Wave Computing (“Wave”) believes its early time to market and novel “dataflow” architecture will pave its way to success. In particular, Wave’s system design has the potential to improve scalability, which is essential for training large AI models. This article looks at Wave’s architectural foundation for performance and scalability.

Table Of Contents
- Introduction
- A Dataflow Primer
- Beyond Dataflow: System-Level Scalability
- Putting It All Together
- Conclusions
- Figure 1: A Typical Neural Network For Deep Learning
- Figure 2: Slack-Matching Buffers
- Figure 3: Distributed Agent Management
- Figure 4: Dataflow Processing Units Interconnected Through Fabric

Companies Cited
- Broadcom
- MIPS Technologies
- MIT
- NVIDIA
- Wave Computing