Amazon.com has built a formidable position in Infrastructure as a Service (IaaS) with Amazon Web Services Elastic Compute Cloud (AWS EC2), and it hopes to extend that position to applications that benefit from acceleration using GPUs and FPGAs. FPGAs are relatively difficult to program, and the cloud leader is laying the foundation to simplify FPGA adoption by creating a marketplace for accelerated applications built on Xilinx FPGAs. This article examines the progress the two companies have made since they announced Amazon EC2 F1 instances a year ago, and sheds some light on an important advantage Xilinx brings to the table compared to its competition. The bottom line is that AWS and Xilinx have made progress in establishing the category, and now need to broaden the base of applications, enhance the tools, and turn up the marketing volume.
Slow but tangible progress
Meeting with Xilinx and its partners this week at AWS re:Invent gave me a chance to understand where FPGAs are making progress, and what is preventing them from making more. If you recall, the companies initially made the F1 instances available as a service, and then added pre-configured and optimized FPGA applications on AWS Marketplace, targeting the genomics, video encoding and complex analytics markets.
While examining the companies’ progress since these announcements, keep in mind that the long pole in the tent here is the hard work of building reliable high-performance applications on FPGAs. Programming tools can help and are essential, but the optimization devil is in the details, slowing the pace of adoption. The good news is that once an FPGA application is finally built, optimized and released on AWS Marketplace, the solutions provider can market it as a cloud-based service to customers worldwide, realizing a potentially sizable return on its investment because the solution is readily available on AWS EC2.
Here is a list of the accomplishments Xilinx and Amazon have made in the last year:
- AWS has now deployed the F1 instances to four regions: NA-East, NA-West, Europe (Dublin), and most recently the GovCloud region, which supports the US federal government. Additional regions may follow over the next year; that would certainly be a leading indicator of market traction.
- To support the Asian markets, where AWS has limited presence, Xilinx has won over support from the Alibaba and Huawei cloud operations. While not a large primary cloud provider, Huawei has one of the largest FPGA developer communities, with thousands of RTL programmers who can now easily access Xilinx FPGAs in the cloud. Baidu and Tencent also have adopted Xilinx FPGAs as a service this year.
- Xilinx has launched a global developer outreach program, and has already trained over 1,000 developers at 3 Xilinx Developer Forums—with more to come.
- Xilinx has recently released a Machine Learning (ML) Amazon Machine Image (AMI), bringing the Xilinx Reconfigurable Acceleration Stack (announced last year) for ML Inference to the AWS cloud. This code has matured considerably, and can now compile neural networks from GPU-trained Caffe models into neural networks optimized for FPGA acceleration, including support for 8-bit execution. Xilinx has partnered with DeePhi to demonstrate a 40X performance boost over CPUs for speech recognition, and as we will see below with the RYFT case study, Xilinx has achieved nearly 100X performance improvement for complex image queries. I expect Xilinx will continue to enhance this offering to include support for MXNet and TensorFlow DNN Frameworks in the coming months.
- Xilinx partner Edico Genome recently achieved a Guinness World Record for decoding human genomes, analyzing 1,000 full human genomes on 1,000 F1 instances in 2 hours, 25 minutes. That is a remarkable 100-fold improvement in performance, which should shorten the time needed to diagnose genetic diseases and help save lives.
- AWS has added support for the Xilinx SDAccel programming environment to all AWS regions for solution developers, another step in making it easier to build FPGA-accelerated solutions for cloud or on-prem deployments.
- Xilinx partner Ryft has built an impressive analytic platform on F1, enabling near-real-time analytics by eliminating data preparation bottlenecks (including ETL tasks such as indexing and transformation).
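As a back-of-the-envelope check on the Edico Genome figures in the list above, the short Python sketch below works out what the record implies per instance. It assumes the 1,000 genomes were spread evenly across the 1,000 F1 instances (one genome each), and it reads the "100-fold" claim as a ratio of elapsed time per genome; the article does not spell out either point, so treat the derived numbers as illustrative only.

```python
# Figures quoted in the article.
GENOMES = 1000
INSTANCES = 1000
ELAPSED_MINUTES = 2 * 60 + 25      # 2 hours, 25 minutes
CLAIMED_SPEEDUP = 100              # "100-fold improvement"

# ASSUMPTION: the work was split evenly, one genome per instance.
genomes_per_instance = GENOMES / INSTANCES
minutes_per_genome = ELAPSED_MINUTES / genomes_per_instance

# Aggregate throughput of the whole fleet, in genomes per hour.
aggregate_per_hour = GENOMES / (ELAPSED_MINUTES / 60)

# Total F1 instance-hours consumed by the run.
instance_hours = INSTANCES * ELAPSED_MINUTES / 60

# ASSUMPTION: "100-fold" compares elapsed time per genome, so the implied
# baseline is simply 100 times the accelerated time.
implied_baseline_days = minutes_per_genome * CLAIMED_SPEEDUP / (60 * 24)

print(f"Minutes per genome on one F1 instance: {minutes_per_genome:.0f}")
print(f"Fleet throughput: {aggregate_per_hour:.1f} genomes/hour")
print(f"Instance-hours used: {instance_hours:.1f}")
print(f"Implied baseline per genome: {implied_baseline_days:.2f} days")
```

Under those assumptions, each instance spends roughly two and a half hours on one genome, and the implied unaccelerated baseline is on the order of ten days per genome.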
A deep dive into RYFT
I wanted to follow with a deep dive into RYFT, in order to examine the benefits and challenges that FPGA acceleration can entail. RYFT is a Washington Beltway company that provides a comprehensive suite of analytic tools for unstructured data, selling into the intelligence and public safety communities to help them extract actionable, time-critical information. The challenge is that, with traditional approaches, it takes a tremendous amount of time to transform and index unstructured data before it can be queried. RYFT has demonstrated near-real-time performance for Elasticsearch queries on unstructured data using the AWS F1 instances, achieving performance improvements approaching 100-fold over CPU-based searches. Intelligence teams cannot wait for hours or days when trying to find a specific suspicious vehicle in a metropolitan area; public safety is at stake.
Figure 1: By using the Xilinx-powered AWS F1 instances to accelerate their Elasticsearch function, RYFT has decreased response times by two orders of magnitude.
To make the problem of real-time complex analytics on unstructured data even more difficult, each data type demands a different algorithm. Because of this, RYFT provides ten primitives for complex queries. To keep response times low, RYFT hosts all of these primitives on AWS F1 instances; however, each primitive’s code typically consumes about one quarter of the logic gates on an FPGA, so all ten cannot be loaded on a single FPGA at the same time. To provide an agile platform that supports all ten of these search modes on each FPGA, RYFT leverages Xilinx’s unique capability of partial reconfiguration. As a result, RYFT can reconfigure each FPGA on the fly, in seconds, to switch to a different analytic primitive. When I asked RYFT why it chose Xilinx over Intel Altera, the company responded that only Xilinx can fully support this requirement.
Figure 2: RYFT provides 10 different analytics and primitives, each of which can be loaded onto the Xilinx FPGA via partial reconfiguration, making agile FPGA acceleration both faster and less costly.
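To make the capacity arithmetic behind partial reconfiguration concrete, here is a minimal Python sketch of a toy model. It is not RYFT's or Xilinx's implementation: the `ReconfigurableFpga` class, the least-recently-used swap policy, and the primitive names `p0` through `p9` are all hypothetical, and the model treats "about one quarter" as exactly 25% while ignoring any fixed logic the FPGA needs for its own infrastructure.

```python
import math
from collections import OrderedDict

NUM_PRIMITIVES = 10           # from the article
FABRIC_FRACTION_EACH = 0.25   # "about 1/4th" in the article; treated as exact here


def max_resident(fraction_each: float) -> int:
    """Primitives that fit on one FPGA at once (ignores fixed infrastructure logic)."""
    return math.floor(1.0 / fraction_each + 1e-9)


class ReconfigurableFpga:
    """Toy model: a fixed number of regions, swapped with a hypothetical LRU policy."""

    def __init__(self, slots: int):
        self.slots = slots
        self.resident = OrderedDict()  # primitive name -> None, least recent first
        self.loads = 0                 # number of times a region was (re)programmed

    def run(self, primitive: str) -> bool:
        """Return True if the primitive was already loaded, False if it had to be loaded."""
        if primitive in self.resident:
            self.resident.move_to_end(primitive)
            return True
        if len(self.resident) >= self.slots:
            self.resident.popitem(last=False)  # evict the least recently used primitive
        self.resident[primitive] = None
        self.loads += 1
        return False


slots = max_resident(FABRIC_FRACTION_EACH)
# Without reconfiguration, keeping all ten primitives loaded needs several FPGAs.
fpgas_needed_static = math.ceil(NUM_PRIMITIVES * FABRIC_FRACTION_EACH)

fpga = ReconfigurableFpga(slots)
workload = ["p0", "p1", "p2", "p3", "p0", "p4", "p1", "p9", "p0"]  # hypothetical queries
hits = [fpga.run(p) for p in workload]

print(f"Primitives resident at once: {slots}")
print(f"FPGAs needed to hold all ten statically: {fpgas_needed_static}")
print(f"Region loads for this workload: {fpga.loads}, already-loaded hits: {sum(hits)}")
```

In this simplified model a single FPGA holds four primitives at a time and swaps others in on demand, whereas keeping all ten permanently loaded would take three FPGAs. The cost of each swap is the reconfiguration time, which the article puts at seconds.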
Many investors and industry observers ask me, “When will FPGAs take off in the data center, and why would Amazon.com or solution providers select Xilinx over its competitors?” The answer to the first question is that adoption has finally started, but we need to see the cumulative effect of having these solutions available on AWS Marketplace ramp up with additional services. There are currently around a dozen AWS AMIs for F1 solutions, and each has taken considerable time and effort to build, years in some cases. However, as more solutions are added to the platform, the channel multiplier effect will eventually help FPGAs cross the chasm.
As for the differentiation that Xilinx brings to the table, the company has typically been first to market with the newest process node, and is already developing its 7nm product. In addition, the ability to support partial reconfiguration helps its customers do more with each FPGA, enabling a larger solution set to be implemented on each one. Just how durable this advantage proves to be remains to be seen, but at least for now, Amazon.com, Huawei, Alibaba, Baidu, and Tencent have all voted for Xilinx.