From the cars we drive to the smartphones we use to cloud services ranging from IaaS to SaaS, AI is going to have an impact across every category of digital device. But for AI to scale up and maximize its impact, AI processing needs to happen in a hybrid form: both in the cloud and at the edge of the network.
The large language models (LLMs) powering ChatGPT and other AIs require a significant amount of compute, to say the least. Training and optimizing an LLM with hundreds of billions of parameters takes many months. The more parameters a model has, the more time and resources it requires, but also the more capable it becomes. Similarly, the more users access an AI model, the more resources it takes and the more it costs.
As the scale of use grows, the demand for AI processing becomes impractical to handle in the cloud alone. That’s when on-device processing needs to pick up the slack to meet our AI demands.
In this article, I want to unpack the need for strong on-device AI processing and the leadership role Qualcomm is taking on that front.
LLMs are not too large for devices
While large language models take months to train on very powerful server hardware, mobile devices actually do have the performance to run smaller LLMs locally. A very helpful graphic in Qualcomm’s whitepaper The Future of AI is Hybrid shows how mobile devices can run many different generative AI models and model sizes.
I expect two things to happen as LLMs become more common and begin to run regularly on-device: (1) they will become more optimized; and (2) mobile devices will become more performant and thus more capable of running larger AI models.
AI engineers and developers are going to get better at optimizing training data so that LLMs are more capable with fewer parameters. I expect researchers to find a sweet spot for the number of parameters for LLMs, looking more at optimizing the data the models are trained on rather than maximizing the number of parameters for training more capable models.
In many use cases, AI models do not need to know everything under the sun. For example, an AI model being used as a medical assistant does not need a large set of data on legal cases. So rather than using a general-purpose LLM like GPT-4 that has hundreds of billions of parameters and must run in the cloud, applications can use customized AI models with fewer than a hundred billion parameters to adequately address one specific task.
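One way to picture this split is a small dispatcher that prefers a task-specific model on the device and falls back to a general-purpose model in the cloud. The sketch below is purely illustrative: the model names, the parameter counts and the device size limit are hypothetical values chosen for the example, not figures from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_billions: float
    tasks: frozenset  # tasks the model is specialized for; empty means general-purpose


# Hypothetical values, for illustration only.
DEVICE_LIMIT_BILLIONS = 10.0
ON_DEVICE_MODELS = [
    Model("medical-assistant-small", 7.0, frozenset({"medical"})),
]
CLOUD_MODEL = Model("general-purpose-large", 175.0, frozenset())


def route(task: str) -> tuple:
    """Return (location, model): a specialized on-device model if one fits, else the cloud."""
    for model in ON_DEVICE_MODELS:
        if task in model.tasks and model.params_billions <= DEVICE_LIMIT_BILLIONS:
            return ("device", model)
    return ("cloud", CLOUD_MODEL)


if __name__ == "__main__":
    for task in ("medical", "legal"):
        location, model = route(task)
        print(f"{task}: {location} ({model.name})")
```

In this toy setup, a medical query stays on the device because a small specialized model covers it, while a legal query goes to the cloud because no on-device model handles that task.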
A couple of great examples come to mind when looking at the capability of on-device AI. First, starting from these simple instructions for implementing it on Android, I was able to run Alpaca, Stanford University’s fine-tuned version of Meta’s 7-billion-parameter LLaMA model. It can run on a Snapdragon Insider smartphone with a Snapdragon 888 SoC.
Second, I was also given the opportunity to demo Qualcomm’s 1-billion-parameter Stable Diffusion model at Mobile World Congress this year. Using a text-to-image prompt, it was able to generate a new image in under 15 seconds.
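A rough back-of-the-envelope calculation helps show why models of these sizes are plausible on a phone. The sketch below estimates only the memory needed to hold the weights (parameters times bits per weight); it ignores activations and runtime overhead, and the precisions listed are illustrative assumptions, not the settings used in either demo.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Model sizes mentioned in the article; the precisions are illustrative assumptions.
MODELS = {"7B language model": 7e9, "1B image model": 1e9}
PRECISIONS = {"FP16": 16, "INT8": 8, "INT4": 4}


def footprint_table() -> dict:
    """Map each model to its estimated weight footprint at each precision."""
    return {
        name: {label: weight_memory_gb(params, bits) for label, bits in PRECISIONS.items()}
        for name, params in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{label}: {gb:.1f} GB" for label, gb in row.items())
        print(f"{name} -> {cells}")
```

Under these assumptions, the weights of a 7-billion-parameter model take about 14 GB at 16 bits per weight but about 3.5 GB at 4 bits, which is why reducing precision matters so much for running models on a device.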
The ability to run an LLM on mobile devices and the ability to create custom, less resource-intensive LLMs for specific tasks will work together to open up a future of scalable AI in the palms of our hands.
Qualcomm’s leadership in on-device AI
I believe that with its hardware and software capabilities, Qualcomm is well-positioned to scale hybrid AI. AI is involved across every one of its product segments, from smartphones and PCs to automotive solutions and networking.
One of Qualcomm’s strategies is to scale its technology across these segments, so the same AI technology it builds for smaller devices (think smartphones and IoT components) can then be scaled for larger devices like PCs and vehicle SoCs. Through this strategy, Qualcomm creates a unified AI stack that enables developers to deploy AI apps across the full range of the company’s segments. This strategy has been in place for years and has positioned Qualcomm as a leader in AI research; as with its other innovative technologies, Qualcomm now has a long list of AI “firsts” in terms of research and proofs-of-concept.
I am also confident in Qualcomm’s understanding that AI’s future requires a hybrid approach combining cloud-based and on-device computing—an approach that Qualcomm is well situated to accommodate and cultivate.
Other developments in on-device AI
At the Google I/O developer conference last month, Google announced new AI features and a new emphasis on on-device AI. Specifically, it announced the PaLM 2 family of AI models, the smallest of which is codenamed Gecko and is capable of running on mobile devices. Google also released a new machine learning kit API that enables developers to run TensorFlow Lite models on Android.
Meanwhile, at Microsoft Build 2023, Microsoft shared a new Windows AI Library, ONNX Runtime and Hybrid Loop. ONNX Runtime gives third-party developers access to the same tools that Microsoft uses to run AI models on Windows. While ONNX Runtime connects developers to machine learning models, Windows AI Library will house a curated collection of ready-to-use machine learning models for features like Windows Studio Effects.
Qualcomm, Google, Microsoft and many others within the Windows and Android ecosystems are creating an incredible environment for developers to scale and enable AI at the edge. We are moving well beyond the stage of using AI only for predictive text, turning speech into text or improving autocorrect. From this point forward, AI will scale far beyond the cloud, and a crucial enabler of this scaling will be on-device AI.
Fortunately, the companies I just mentioned did not start their AI roadmaps yesterday but have spent a long time strategically planning for the emergence of AI across different devices and functions. (Apple, meanwhile . . . not so much.) I am excited to see the new era of LLMs create new possibilities for all devices on the edge. LLMs are only going to become more capable, and edge processing is only going to get better. The era of hybrid AI could change everything.
Note: Moor Insights & Strategy co-op Jacob Freyman contributed to this article.