The AI Revolution Requires Accelerated Compute

By Hubert Yoshida posted 07-24-2019 00:00


We are in the midst of a new technology revolution: the AI revolution. It differs from previous revolutions, such as the industrial revolution or the information revolution, in that it is not based on improving human effort through explicit human knowledge. It goes beyond that by providing machines with the ability to learn and develop tacit knowledge, the intuitive know-how that resides in the human brain. The AI revolution can provide superhuman artificial intelligence that could deliver incalculable benefits for society.


AI or Machine Learning processes enormous amounts of data and is very compute intensive. One of its limitations is the capability of today’s computers. CPUs are built on the Von Neumann architecture, which connects a processing unit to memory over a bus and processes instructions one at a time, in sequence. The speed and capabilities of this type of processor used to double every two years as researchers packed more transistors onto a microchip, per Moore’s Law. Unfortunately, this is no longer the case, and companies are scrambling to find other ways to improve processing speeds to support the AI revolution.

One approach that is gaining attention today is “Accelerated Computing” (AC). IDC defines AC as the practice of offloading key workloads to silicon subsystems like high-speed GPUs (Graphics Processing Units) and low-latency FPGAs (Field Programmable Gate Arrays). These multi-chip configurations increasingly target the unstructured data workloads leveraged by artificial intelligence, advanced data analytics, cloud computing and scientific research.

CPUs, GPUs and FPGAs process tasks in different ways. A typical CPU is optimized for sequential, serial processing and is designed to maximize the performance of a single task within a job, like transaction processing. GPUs, on the other hand, use a massively parallel architecture aimed at handling many operations at the same time. As a result, GPUs can be 50 to 100 times faster than CPUs in tasks that can be split into many parallel processes, such as machine learning and big data analysis. While CPUs and GPUs execute software, FPGAs are hardware implementations of algorithms, and a hardware implementation is generally faster than the same algorithm running as software. However, FPGAs have traditionally been weak at floating-point arithmetic, which is used for intensive signal- and image-processing applications. FPGAs can be reprogrammed, but doing so requires a special hardware description language, which differs from normal programming languages in that it can express parameters such as propagation delays and signal strengths.
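
As a rough illustration of the difference between serial and parallel processing, the sketch below computes the same sum of squares two ways: one element at a time in sequence, and split into independent chunks that are handled side by side and then combined. The function names, chunk size and worker count are hypothetical, and Python threads stand in for GPU cores only to show how the work is divided. It is not a performance benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares_serial(values):
    """Process one element at a time, in order (the sequential style)."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_chunked(values, chunk_size=1000, workers=4):
    """Split the data into independent chunks, process them side by side,
    then combine the partial results (the data-parallel style)."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_squares_serial, chunks))
    return sum(partial_sums)


data = list(range(10_000))
serial_result = sum_squares_serial(data)
chunked_result = sum_squares_chunked(data)
print(serial_result == chunked_result)  # both approaches give the same answer
```

The point of the chunked version is that no chunk depends on any other, which is the property that lets a GPU spread such work across thousands of cores.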

All three types of processors could be used in combination. The FPGA could forward incoming data at high speeds, while the GPU would handle the heavy algorithmic work. CPUs would play a management role, interpreting the results of the GPU and sending the “answer” to the user. Such a combined system would play to the strengths of each type of processor while maximizing system efficiency. Since the FPGA would have fewer responsibilities, it could be smaller and less difficult to design and therefore cheaper and faster to implement. Accelerated Computing can create powerful compute engines out of standard CPUs, GPUs and FPGAs.
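
The division of labor described above can be sketched as a three-stage pipeline. This is a minimal software analogy, not how any real FPGA, GPU or Hitachi product is programmed. The function names, the batch size and the stand-in "heavy" operation (squaring each value) are all assumptions made for illustration.

```python
def fpga_forward(stream, batch_size=4):
    """FPGA role (hypothetical): move incoming data along quickly, in batches."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # forward any final, partially filled batch
        yield batch


def gpu_compute(batches):
    """GPU role (hypothetical): apply the same operation to every element
    of a batch. Squaring stands in for the heavy algorithmic work."""
    for batch in batches:
        yield [x * x for x in batch]


def cpu_manage(results):
    """CPU role (hypothetical): interpret the results and produce the answer."""
    total = 0
    count = 0
    for batch in results:
        total += sum(batch)
        count += len(batch)
    return {"items": count, "sum_of_squares": total}


incoming = range(10)
answer = cpu_manage(gpu_compute(fpga_forward(incoming)))
print(answer)  # {'items': 10, 'sum_of_squares': 285}
```

Each stage does only the job it is suited for, so the data-forwarding stage stays simple, which mirrors the argument that the FPGA in such a system could be smaller and easier to design.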

Hitachi has been using Accelerated Compute for some time, since the introduction of HNAS, our high-performance NAS controller, over a decade ago. HNAS uses FPGAs to accelerate data movement while a CPU handles data management. (Refer to my previous blog post, “Solving The Von Neumann Bottleneck With FPGAs”.)

Last year we introduced an Accelerated Compute model of our Unified Compute Processor, the Hitachi Advanced Server DS225. The Hitachi Advanced Server DS225 delivers unparalleled compute density and efficiency to meet the needs of the most demanding high-performance applications in the data center. DS225 takes full advantage of the ground-breaking Intel Xeon Scalable Processor family in combination with NVIDIA Tesla GPUs. By combining the Intel processors with up to four dual-width 300W graphics accelerator cards and up to 3TB of memory capacity in a 2U rack space package, this server stands ready to address the challenging compute demands of the AI Revolution.



Supercomputers are also turning to accelerated compute. In June of 2018, the Summit computer at the United States Department of Energy's Oak Ridge National Laboratory (ORNL) topped the Top500 supercomputing list with a sustained performance of 122.3 petaflops on the High Performance Linpack test used to rank that list. This surpassed the Sunway TaihuLight system at the National Supercomputing Center in Wuxi, China, which is capable of 93.01 petaflops.
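
To put the two Linpack figures in proportion, here is a quick calculation using only the numbers quoted above. The variable names are mine.

```python
summit_pflops = 122.3       # Summit, High Performance Linpack, June 2018
taihulight_pflops = 93.01   # Sunway TaihuLight

ratio = summit_pflops / taihulight_pflops
percent_faster = (ratio - 1) * 100

print(f"Summit / TaihuLight = {ratio:.2f}x")
print(f"Summit is about {percent_faster:.0f}% faster")
```

In other words, Summit's result is roughly 1.3 times that of the previous leader.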


Unlike earlier supercomputers, the Summit computer uses standard components and software, designed by IBM and NVIDIA. Summit has a 4,608-node hybrid architecture, where each node contains two IBM POWER9 CPUs and six NVIDIA Volta GPUs, all connected together with NVIDIA’s high-speed NVLink. Each node has over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs, plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes are connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. The operating system is Red Hat Enterprise Linux (RHEL) version 7.5.
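
The per-node figures above add up quickly across the whole machine. The short calculation below multiplies them out. The constant names are mine, and the petabyte conversion assumes decimal units (1 PB = 1,000,000 GB).

```python
NODES = 4608
CPUS_PER_NODE = 2          # IBM POWER9
GPUS_PER_NODE = 6          # NVIDIA Volta
NVRAM_GB_PER_NODE = 800    # non-volatile RAM per node

total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE
# Assumption: decimal units, so 1 PB = 1,000,000 GB
total_nvram_pb = NODES * NVRAM_GB_PER_NODE / 1_000_000

print(total_cpus)                 # 9216
print(total_gpus)                 # 27648
print(round(total_nvram_pb, 2))   # 3.69
```

That is three GPUs for every CPU, which shows how much of Summit's compute capacity sits in the accelerators.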

Supercomputers with accelerated computing are breaking the boundaries around many sciences. The Summit computer will be used in several studies, including the following:

Astrophysics: With 100 times more compute power than was previously available, scientists will be able to build higher resolution models to study things like supernovas for clues on how heavy metals were seeded in the universe.

Materials: Studying the behavior of sub-atomic particles to develop new materials for energy storage, conversion and production.

Cancer Surveillance: A comprehensive view of the U.S. cancer population at a level of detail typically obtained only for clinical trial patients. This will help to uncover hidden relationships between disease factors such as genes, biological markers and environment.

Systems Biology: Using a mix of AI techniques, researchers will be able to identify patterns in the function, cooperation and evolution of human proteins and cellular systems. These patterns can collectively give rise to clinical phenotypes, the observable traits of diseases such as Alzheimer’s, heart disease or addiction, and can inform the drug discovery process.


Accelerated Computing will become more ubiquitous as demand for AI and machine learning continues to increase. Accelerated Computing will need to fill the demand for intensive compute power until quantum computers become available for commercial use. When that happens, Artificial Intelligence will take an exponential step forward, providing social innovations that will vastly improve our lives and society.




08-02-2019 22:34

The power of parallelism!

08-02-2019 20:50

So true, the power of supercomputers and the innovation to come is amazing to ponder. It's so affordable now as well. I'm running a 6GB NVIDIA GPU in a gaming PC, and now most gamers are using 10GB or more. Amazing to think how fast they offload work from the CPU to allow for amazing gaming experiences. Thanks for the blog!