Hu Yoshida

Solving The Von Neumann Bottleneck With FPGAs

Blog Post created by Hu Yoshida Employee on Feb 1, 2019

Lately I have been focused on the operational aspects of data and how to prepare data for business outcomes, and less on the infrastructure that supports that data. As in all things, there needs to be a balance, so I am reviewing some of the innovations that we have made in our infrastructure portfolio which contribute to operational excellence. Today I will be covering the advances that we have made in the area of hybrid-core architecture and its application to Network Attached Storage. This hybrid-core architecture is a unique approach which we believe will position us for the future, not only for NAS but for the future of compute in general.

 

FPGA Candy.png

The Need for New Compute Architectures

The growth in performance of non-volatile memory technologies such as storage-class memories, and the growing demand for intensive compute for graphics processing, analytics/machine learning, cryptocurrencies, and edge processing, are starting to exceed the performance capabilities of CPUs. CPUs are based on the Von Neumann architecture, where processor and memory sit on opposite sides of a slow bus. If you want to compute something, you have to move the inputs across the bus to the processor, then store the outputs back to memory when the computation completes. Your throughput is limited by the speed of the memory bus. While processor speeds have increased significantly, memory improvements have mostly been in density rather than transfer rates. As a result, an increasing amount of processor time is spent idling, waiting for data to be fetched from memory. This is referred to as the Von Neumann bottleneck.
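To make the bottleneck concrete, here is a minimal sketch of the reasoning above: sustained throughput is capped by whichever is smaller, the processor's peak rate or the rate at which the bus can feed it. The function names and all of the numbers are hypothetical, chosen only for illustration, and do not describe any particular CPU.

```python
def attainable_ops_per_sec(peak_ops_per_sec, bus_bytes_per_sec, bytes_moved_per_op):
    """Upper bound on sustained throughput when every operation
    must move its data across the memory bus."""
    memory_bound = bus_bytes_per_sec / bytes_moved_per_op
    return min(peak_ops_per_sec, memory_bound)


def processor_utilization(peak_ops_per_sec, bus_bytes_per_sec, bytes_moved_per_op):
    """Fraction of the processor's peak rate that the bus can actually feed."""
    attainable = attainable_ops_per_sec(
        peak_ops_per_sec, bus_bytes_per_sec, bytes_moved_per_op
    )
    return attainable / peak_ops_per_sec


# Hypothetical figures, for illustration only.
PEAK_OPS = 100e9      # processor could execute 100 billion ops/s
BUS_BYTES = 50e9      # memory bus delivers 50 GB/s
BYTES_PER_OP = 8      # each op needs 8 bytes moved over the bus

attainable = attainable_ops_per_sec(PEAK_OPS, BUS_BYTES, BYTES_PER_OP)
utilization = processor_utilization(PEAK_OPS, BUS_BYTES, BYTES_PER_OP)
print(f"attainable: {attainable:.3g} ops/s, utilization: {utilization:.1%}")
```

With these made-up numbers the processor spends most of its time waiting, and making it faster would not help: only moving fewer bytes per operation or widening the bus raises the bound.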

 

Field Programmable Gate Arrays

Hitachi has been working with combinations of different compute architectures to overcome this bottleneck for some time. One of these is the FPGA (Field Programmable Gate Array), a parallel state machine architecture. Hitachi has invested thousands of man hours of research and development in FPGA technology, producing over 90 patents. A CPU is an instruction stream processor: it runs through the instructions in software to access data from memory and move, modify, or delete it in order to accomplish some task. FPGAs, by contrast, are a reconfigurable systems paradigm formulated around the idea of a data stream processor. Instead of fetching and processing instructions to operate on data, the data stream processor operates on data directly by means of a multidimensional network of configurable logic blocks (CLBs) connected via programmable interconnects. Each logic block computes a partial result as a function of the data received from its upstream neighbors, stores the result within itself, and passes it downstream. In a data-stream based system, execution of a program is not determined by instructions but by the transportation of data from one cell to another: as soon as a unit of data arrives at a cell, it is processed.
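The data-stream idea can be sketched in a few lines of software. This is only a toy model of a one-dimensional chain of cells, not how an FPGA is actually programmed; the `Cell` class, `run_stream` function, and the three example functions are all hypothetical. Each cell holds a fixed function, latches its result on every clock tick, and hands its previous result to the next cell downstream.

```python
class Cell:
    """Toy model of one logic block with a fixed function and one stored result."""

    def __init__(self, func):
        self.func = func
        self.stored = None  # result latched inside the cell

    def clock(self, upstream_value):
        """On a tick: pass the stored result downstream, then latch a new one.
        None means "no data on the wire" (functions are assumed never to return None)."""
        downstream_value = self.stored
        self.stored = None if upstream_value is None else self.func(upstream_value)
        return downstream_value


def run_stream(cells, inputs):
    """Feed inputs one per tick, then keep ticking until the chain has drained."""
    outputs = []
    for value in list(inputs) + [None] * len(cells):
        for cell in cells:
            value = cell.clock(value)  # each cell sees its neighbor's previous result
        if value is not None:
            outputs.append(value)
    return outputs


# Hypothetical three-cell chain computing (x + 1) * 2 - 3.
chain = [Cell(lambda x: x + 1), Cell(lambda x: x * 2), Cell(lambda x: x - 3)]
print(run_stream(chain, [1, 2, 3]))  # [1, 3, 5]
```

There is no instruction pointer anywhere in this model. What happens next is decided entirely by which data has arrived at which cell, and all three cells are busy at once as soon as the chain fills.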

 

Today’s FPGAs are high-performance hardware components with their own memory, input/output buffers, and clock distribution - all embedded within the chip. In their core design and functionality, FPGAs are similar to ASICs (Application Specific Integrated Circuits) in that they are programmed to perform specific tasks at high speeds. With advances in design, today’s FPGAs can scale to handle millions of tasks per clock cycle, without sacrificing speed or reliability. This makes them ideally suited for lower level protocol handling, data movement and object handling. Unlike ASICs (that cannot be upgraded after leaving the factory), an FPGA is an integrated circuit that can be reprogrammed at will, enabling it to have the flexibility to perform new or updated tasks, support new protocols or resolve issues. It can be upgraded easily with a new firmware image in the same fashion as for switches or routers today.

 

Hitachi HNAS Incorporates FPGAs

At the heart of Hitachi's high performance NAS (HNAS) is a hybrid-core architecture of FPGAs and multi-core Intel processors. HNAS has over 1 million logic blocks inside its primary FPGAs, giving it a peak processing capacity of about 125 trillion tasks per second, an order of magnitude more tasks than the fastest general purpose CPU. Because each of the logic blocks performs well-defined, repeatable tasks, performance is also very predictable. HNAS was introduced in 2011, and as new generations of FPGAs have increased the density of logic blocks, I/O channels and clock speeds, increasingly powerful servers have been introduced.
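As a back-of-the-envelope check on how those two figures relate, the short calculation below divides the peak task rate by the number of blocks. Both inputs are the rounded figures quoted above. The step from "tasks per block per second" to a clock rate rests on an assumption that is mine for illustration and is not stated anywhere in this post: that each block completes one task per clock cycle.

```python
# Rounded figures quoted in the post ("over 1 million", "about 125 trillion").
LOGIC_BLOCKS = 1_000_000
PEAK_TASKS_PER_SECOND = 125 * 10**12

# Average rate each block would need to sustain to reach the quoted peak.
tasks_per_block_per_second = PEAK_TASKS_PER_SECOND // LOGIC_BLOCKS

# ASSUMPTION (illustrative only): one task per logic block per clock cycle.
implied_clock_mhz = tasks_per_block_per_second / 1e6

print(f"{tasks_per_block_per_second:,} tasks per block per second")
print(f"implied effective clock: {implied_clock_mhz:.0f} MHz")
```

Under that assumption the peak figure comes from width rather than speed: a modest per-block rate multiplied by a very large number of blocks all working in the same cycle.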

 

FPGAs are not always a better choice than multi-core CPUs. CPUs are the best technology choice for advanced functions such as higher-level protocol processing and exception handling, functions that are not easily broken down into well-defined tasks. This makes them extremely flexible as a programming platform, but it comes at a tradeoff in reliable and predictable performance. As more processes compete for a share of the I/O channel into the CPU, performance is impacted.

 

HNAS Hybrid-Core Architecture

Hitachi has taken a hybrid-core approach, combining a multi-core Intel processor with FPGAs to address the requirements of a high performance NAS system. One of the key advantages of using a hybrid-core architecture is the ability to optimize and separate data movement and management processes that would normally contend for system resources. The HNAS hybrid-core architecture allows for the widest applicability for changing workloads, data sets and access patterns. Some of the attributes include:

  • High degree of parallelism
    Parallelism is key to performance. While CPU based systems can provide some degree of parallelism, such implementations require synchronization that limits scalability.
  • Off-loading
    Off-loading allows the core file system to independently process metadata and move data while the multi-core processor module is dedicated to data management. This provides another degree of parallelism.
  • Pipelining
    Pipelining is achieved when multiple instructions are simultaneously overlapped in execution. For a NAS system it means multiple file requests overlapping in execution.

Pipeline.png
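The benefit of pipelining can be put into numbers with a simple idealized model. This sketch assumes every stage takes exactly one time unit and the pipeline never stalls. Real file requests are not that uniform, and the stage count used here is hypothetical rather than a description of the HNAS pipeline.

```python
def sequential_ticks(num_requests, num_stages):
    """Each request passes through every stage before the next request starts."""
    return num_requests * num_stages


def pipelined_ticks(num_requests, num_stages):
    """The first request takes num_stages ticks to fill the pipeline;
    after that, one request completes on every tick."""
    if num_requests == 0:
        return 0
    return num_stages + (num_requests - 1)


# Hypothetical workload: 100 file requests through a 4-stage pipeline.
requests, stages = 100, 4
seq = sequential_ticks(requests, stages)
pipe = pipelined_ticks(requests, stages)
print(f"sequential: {seq} ticks, pipelined: {pipe} ticks, speedup: {seq / pipe:.2f}x")
```

As the number of requests grows, the speedup approaches the number of stages, which is why overlapping many file requests matters more than making any single request faster.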

Another advantage of the hybrid-core architecture is the ability to target functions to the most appropriate processing element for that task, and this aspect of the architecture takes full advantage of the innovations in multi-core processing. High-speed data movement is a highly repeatable task that is best executed in FPGAs, but higher level functions such as protocol session handling, packet decoding, and error / exception handling need a flexible processor to handle these computations quickly and efficiently. The unique hybrid-core architecture integrates these two processing elements seamlessly within the operating and file system structure, using dedicated core(s) within the CPU to work directly with the FPGA layers within the architecture. The remaining core(s) within the CPU are dedicated to system management processes, maintaining the separation between data movement and management. The hybrid core approach has enabled new programmable functions to be introduced and integrated with new innovations in virtualization, object store and clouds through the life of the HNAS product.
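The division of labor described above can be summarized as a simple placement table. This is only an illustrative sketch: the `Target` enum, the function names in `PLACEMENT`, and the `place` helper are hypothetical labels for the categories mentioned in this post, not identifiers from the HNAS software.

```python
from enum import Enum


class Target(Enum):
    FPGA = "FPGA data path"
    CPU_DATA = "CPU core(s) dedicated to working with the FPGA layers"
    CPU_MGMT = "remaining CPU core(s) for system management"


# Hypothetical labels for the function categories named in the text.
PLACEMENT = {
    "high_speed_data_movement": Target.FPGA,
    "protocol_session_handling": Target.CPU_DATA,
    "packet_decoding": Target.CPU_DATA,
    "error_exception_handling": Target.CPU_DATA,
    "system_management": Target.CPU_MGMT,
}


def place(function_name):
    """Return the processing element a function category is assigned to."""
    try:
        return PLACEMENT[function_name]
    except KeyError:
        raise ValueError(f"no placement defined for {function_name!r}") from None


for name in PLACEMENT:
    print(f"{name:28s} -> {place(name).value}")
```

The point of the table is that the mapping is fixed by the nature of the work: repeatable, high-volume tasks go to the FPGAs, and irregular tasks go to the CPU.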

 

For us, it’s not just about a powerful hardware platform or the versatile Silicon file system; it’s about a unified system design that forms the foundation of the Hitachi storage solution. The HNAS 4000 integrally links its hardware and software together in design, features and performance to deliver a robust storage platform as the foundation of the Hitachi Family storage systems. On top of this foundation, the HNAS 4000 layers intelligent virtualization, data protection and data management features to deliver a flexible, scalable storage system.

 

The Basic Architecture of HNAS

The basic architecture of HNAS consists of a Management Board (MMB) and a File System Board (MFB).

HNAS Architecture.png

File System Board (MFB)

The File System Board (MFB) is the core of the hardware accelerated file system. It is responsible for core file system functions such as object management, free space management and directory tree management, as well as Ethernet and FC handling. It consists of four FPGAs connected by dedicated, point-to-point Low Voltage Differential Signaling (LVDS) Fastpath connections to guarantee very high throughput for data reads and writes. Each FPGA has dedicated memory for processing and buffers, which eliminates memory contention between the FPGAs, unlike a shared memory pool in a CPU architecture.

  • The Network Interface FPGA is responsible for all Ethernet based I/O functions.
  • The Data Movement FPGA is responsible for all data and control traffic routing throughout the node, interfacing with all major processing elements within the node, including the MMB, as well as connecting to companion nodes within an HNAS cluster.
  • The Disk Interface FPGA (DI) is responsible for connectivity to the backend storage system and for controlling how data is stored and spread across those physical devices.
  • The Hitachi NAS Silicon File System FPGA (WFS) is responsible for the object based file system structure, metadata management, and for executing advanced features such as data management and data protection. It is the hardware file system in the HNAS. By moving all fundamental file system tasks into the WFS FPGA, HNAS delivers high and predictable performance.
  • The MFB coordinates with the MMB via a dedicated 8-lane PCIe 2.0 bus path (500MB/s per lane for send and 500MB/s per lane for receive, simultaneously).
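For a sense of scale, the per-lane figures in the last bullet multiply out as follows. This is a small sketch of the arithmetic only. It uses the nominal per-lane rate quoted above and says nothing about measured throughput between the two boards.

```python
LANES = 8
MB_PER_SEC_PER_LANE = 500  # nominal rate per lane, per direction, as quoted above


def link_bandwidth_mb_per_sec(lanes, mb_per_sec_per_lane):
    """Nominal one-direction bandwidth of a multi-lane link."""
    return lanes * mb_per_sec_per_lane


send = link_bandwidth_mb_per_sec(LANES, MB_PER_SEC_PER_LANE)
receive = link_bandwidth_mb_per_sec(LANES, MB_PER_SEC_PER_LANE)
aggregate = send + receive  # both directions run simultaneously

print(f"send: {send} MB/s, receive: {receive} MB/s, aggregate: {aggregate} MB/s")
```

That works out to a nominal 4,000MB/s in each direction between the MFB and the MMB, or 8,000MB/s counting both directions together.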

Management Board (MMB)

The Management Board provides out-of-band data management and system management functions for the HNAS 4000. Depending on the HNAS model, the platform uses 4 to 8 core processors. Leveraging the flexibility of multi-core processing, the MMB serves a dual purpose. In support of the FPGAs in the File System Board, the MMB provides high-level data management and hosts the operating system within two or more dedicated CPU cores in a software stack known as BALI. The remaining cores of the CPU are set aside for the Linux based system management, monitoring processes and application level APIs. The MMB is responsible for:

  • System Management
  • Security and Authentication
  • NFS, CIFS, iSCSI, NDMP
  • OSI Layer 5, 6 & 7 Protocols

 

A Growing Industry Trend

The market for FPGAs has been heating up. Several years ago, Intel acquired Altera, one of the largest FPGA companies, for $16.7 billion. Intel, the world's largest chip company, has identified FPGAs as a mature and growing market and is embedding FPGAs into its chipsets. Today Intel offers a full SoC (System on Chip) FPGA product portfolio spanning high-end, midrange and low-end applications.

 

Microsoft announced that it has deployed FPGAs in more than half its servers. The chips have been put to use in a variety of first-party Microsoft services, and they're now starting to accelerate networking on the company's Azure cloud platform. Microsoft's deployment of the programmable hardware is important as the previously reliable increase in CPU speeds continues to slow down. FPGAs can provide an additional speed boost in processing power for the particular tasks that they've been configured to work on, cutting down on the time it takes to do things like manage the flow of network traffic or translate text.

 

Amazon now offers EC2 F1 instances, which use FPGAs to enable delivery of custom hardware accelerations. F1 instances are advertised to be easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI that supports hardware level development in the cloud. (An Amazon Machine Image is a special type of virtual appliance that is used to create a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2.) Using F1 instances to deploy hardware accelerations can be useful in many applications to solve complex science, engineering, and business problems that require high bandwidth, enhanced networking, and very high compute capabilities.

 

FPGA Developments in Hitachi Vantara

Hitachi Vantara, with its long experience with FPGAs and extensive IP portfolio, is continuing several active and innovative FPGA development tracks along similar lines as those explored and implemented by Microsoft and Amazon.

 

Hitachi provides VSP G400/600/800 systems with embedded FPGAs that tier to our HCP object store or to Amazon AWS and Microsoft Azure cloud services. With this Data Migration to Cloud (DMT2C) feature, customers can significantly reduce CAPEX by tiering “cold” files from their primary Tier 1 Hitachi VSP flash storage to lower cost HCP or public cloud services. Neil Salamack’s blog post, “Cloud Connected Flash – A Modern Recipe for Data Tiering, Cloud Mobility, and Analytics,” explains the benefits that this provides.

 

Hitachi has demonstrated a functional prototype running with HNAS and VSP to capture finance data and report on things like currency market movements. Hitachi has also demonstrated the acceleration of Pentaho functions with FPGAs, and presented FPGA enabled Pentaho-BA as a research topic at the Flash Memory Summit. Pentaho engineers have demonstrated 10 to 100 times faster analytics with much less space, far fewer resources, and at a fraction of the cost to deploy. FPGAs are very well suited for AI/ML implementations and excel in deep learning, where training iterative models may take hours or even days while consuming large amounts of electrical power.

 

Hitachi researchers are working on a software defined FPGA accelerator that can use a common FPGA platform on which we can develop algorithms that are much more transferable across workloads. The benefit will be the acceleration of insights across many analytic opportunities and many different application types, and bringing things to market faster. In this way we hope to crunch those massive data repositories, deliver faster business outcomes and solve social innovation problems. It also means that as we see data gravity pull more compute to the edge, we can vastly accelerate what we can do in edge devices with less physical hardware because of the massive compute and focused resources that we can apply with FPGAs.

 

Hitachi has led the way and will continue to be a leader in providing innovative FPGA-based solutions.
