VSP 5000 Breaks NVMe Performance Challenges

By Hubert Yoshida posted 10-15-2019 18:45


In May of this year, George Crump of Storage Switzerland commented on the state of NVMe storage performance:

“NVMe Flash Arrays promise an unprecedented level of performance thanks to the higher command count, queue depth and PCIe connectivity. Most NVMe Arrays are boasting IOPS statistics of close to one million IOPS and latency in the low hundreds of microseconds.”
George Crump 
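The "higher command count and queue depth" Crump mentions come from the NVMe specification itself. A rough back-of-the-envelope comparison (the AHCI and NVMe figures are spec limits; the SAS value is a typical per-port queue depth, not a hard spec limit):

```python
# Theoretical outstanding-command capacity per device interface.
# AHCI and NVMe numbers are spec maximums; the SAS figure is a
# commonly cited typical queue depth, used here for illustration.
ahci_slots = 1 * 32              # AHCI/SATA: 1 queue x 32 commands
sas_slots = 1 * 256              # SAS: typically ~256 queued commands
nvme_slots = 65_535 * 65_536     # NVMe: up to 64K I/O queues x 64K entries

print(f"AHCI/SATA outstanding commands:        {ahci_slots}")
print(f"SAS outstanding commands (typical):    {sas_slots}")
print(f"NVMe theoretical outstanding commands: {nvme_slots:,}")
```

That enormous parallelism is exactly why NVMe stresses every other part of a storage controller, as discussed below.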

Last week at Hitachi’s NEXT 2019 event in Las Vegas, Hitachi Vantara announced our next-generation, high-performance storage array, the VSP 5000 series with NVMe. The VSP 5000 series shatters NVMe-based storage system performance numbers by delivering 21 million IOPS and latencies as low as 70 microseconds!

How was this done? Wouldn’t NVMe provide the same performance improvement for any storage array? The answer is that NVMe is just one component of a storage array, and the difference lies in how it is implemented in the storage controller. The VSP 5000, with the Hitachi Storage Virtualization Operating System RF (Resilient Flash), is flash-optimized for NVMe as well as for SAS intermix, and is built around a new internal PCIe-based switching fabric and an AI/ML-based operations center for greater performance, scalability, availability, and ease of use.

Other storage system vendors have rushed to deliver NVMe in their arrays without making the changes needed to free it from the constraints of older disk-based controllers. Marc Staimer blogged about how NVMe performance challenges expose the CPU choke point. Some vendors have tried to address this by adding more CPUs; however, that adds more software and additional cost. Hitachi chose to optimize the storage architecture before implementing NVMe in order to get the maximum benefit from its new performance capabilities.

IOPS has been the traditional way to compare the performance of storage arrays. However, the more important performance number is latency. The less latency a storage system has, the more work it can do, delivering more data to more processors in less time. This means your applications can run faster, do more work, and require less compute and networking resources, which reduces cost. At 70 microseconds, the VSP 5000's latency is less than half that of competitive NVMe storage systems.
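The link between latency and work done can be sketched with Little's Law: for a fixed number of outstanding I/Os, achievable throughput is concurrency divided by latency. The 70-microsecond figure is from the announcement above; the 150-microsecond competitor latency and the queue depth of 32 are assumptions for illustration only:

```python
# Little's Law: throughput (IOPS) = outstanding I/Os / latency.
# 70 us is the VSP 5000 figure; 150 us and the queue depth of 32
# are hypothetical values chosen purely to illustrate the math.
def iops(queue_depth: int, latency_s: float) -> float:
    """Achievable IOPS for a given number of outstanding I/Os."""
    return queue_depth / latency_s

qd = 32                      # outstanding I/Os (assumed)
fast = iops(qd, 70e-6)       # 70 microseconds
slow = iops(qd, 150e-6)      # 150 microseconds (hypothetical competitor)

print(f"At 70 us:  {fast:,.0f} IOPS")   # ~457,143
print(f"At 150 us: {slow:,.0f} IOPS")   # ~213,333
print(f"Speedup from lower latency: {fast / slow:.2f}x")
```

At the same concurrency, halving latency more than doubles the work done, which is why latency, not the headline IOPS number, is the figure to watch.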

NVMe’s blazing-fast performance has exposed what look like CPU choke points in servers and storage controllers, but the real bottlenecks are not the CPUs themselves: they lie in the internal interconnects, the controller cache architecture, and storage software that was designed for the latencies of mechanical hard disks. With hard disks, after issuing an I/O request the CPU had to wait while the R/W (read/write) head was positioned over the right track and the target disk sector rotated under the head. During this time, the CPU could branch out to other tasks or sit idle. As a result, more and more software was added to provide features such as deduplication, compression, snapshots, clones, replication, tiering, and error detection and correction. The CPUs in the storage controllers were able to do more and more work, which increased the availability, security, and operational efficiency of enterprise storage systems. However, this left less CPU available to process I/O when it came to NVMe storage controllers. While all-flash storage vendors were able to operate with disk-oriented Serial Attached SCSI (SAS) and SATA storage controllers without hitting these bottlenecks, the introduction of NVMe has exposed them, limiting the ability to realize the full capability of the NVMe architecture.
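The mechanical wait described above is easy to quantify. Using typical enterprise hard-disk figures (15K RPM and a 4 ms average seek, illustrative assumptions rather than measurements of any specific drive), the idle window a disk-era CPU had per I/O dwarfs the 70-microsecond NVMe latency cited in this post:

```python
# Why disk-era controllers had CPU time to spare: mechanical latency math.
# 15K RPM and 4 ms average seek are typical enterprise-HDD assumptions.
rpm = 15_000
avg_rotational_s = 0.5 * (60 / rpm)   # half a revolution on average
avg_seek_s = 4e-3                     # assumed average seek time
hdd_latency_s = avg_rotational_s + avg_seek_s

nvme_latency_s = 70e-6                # the VSP 5000 figure from this post

print(f"HDD mechanical latency: {hdd_latency_s * 1e3:.1f} ms")   # 6.0 ms
print(f"NVMe array latency:     {nvme_latency_s * 1e6:.0f} us")  # 70 us
print(f"Ratio: {hdd_latency_s / nvme_latency_s:.0f}x")           # ~86x
```

With roughly two orders of magnitude less time per I/O, the software a controller could comfortably run between disk requests becomes a choke point under NVMe.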

New technologies like Storage Class Memory (SCM) and containers will add to the problem, because their increased performance will put even more pressure on the CPUs. The problem of NVMe exposing CPU choke points was the topic of several presentations at the last Flash Memory Summit in Santa Clara in August. Those vendors who rushed to deliver NVMe without redesigning their storage controllers are suffering the effect of these choke points on their NVMe performance.

Hitachi had the foresight to redesign the VSP storage controller for NVMe and NVMe-oF, starting with the Storage Virtualization Operating System RF (Resilient Flash), where we began to offload some of the software into FPGAs. Now, with the VSP 5000, Hitachi has introduced a new high-performance internal switch fabric implemented with PCIe. This patented Hitachi Accelerated Fabric allows the Hitachi Storage Virtualization Operating System RF to offload I/O traffic between controller blocks, using an architecture that provides immediate processing power without wait time or interruption to maximize I/O throughput. As a result, your applications suffer no latency increase, since access to data is accelerated between nodes even as you scale your system out. Hitachi also redesigned the shared memory and data cache to streamline the movement of data through the controller for increased performance and resiliency. This improves not only the performance of NVMe flash but also that of lower-cost SAS flash devices, future SCM devices, and other data services, such as data reduction, automation, and metro-clustering, which are available with the VSP 5000 series.

The VSP 5000 is a major advancement for organizations looking to modernize their data centers and realize the full benefit of the latest technologies like NVMe, meeting demanding service-level requirements across a wide range of workloads and edge-to-core-to-multi-cloud deployment models. For more information, see the VSP 5000 series data sheet.

Congratulations to Josef Newgarden of Team Penske, who finished number one in the 2019 IndyCar standings.
