
Virtual Storage Platform E590 and E790: High Performance With a Low Profile

By Charles Lofton posted 04-21-2021 00:09

  

Introduction

Hitachi’s new Virtual Storage Platform (VSP) E590 and E790 combine the fast NVMe protocol with advanced processing power and sophisticated cache architecture to deliver the highest I/O performance available in a 2U form factor. In this post we will examine the industry-leading performance of the 2U VSP E Series arrays. Before we compare the “hero” performance numbers measured on VSP E590 and VSP E790 to other platforms, let us begin with a few details about the data:
 
  • The competitive hero numbers are well-informed estimates based on publicly shared data.
  • All hero numbers represent 100% random read cache hits (best-case performance).
  • The block size used for comparison is 8 KiB unless stated otherwise.

IOPS

Figure 1 displays the peak IOPS per system achieved by the 2U VSP E790 compared with peak IOPS estimates for competitors’ midrange platforms. The E790 leads the way in performance density, logging 1.36X more IOPS than its closest competitor while using 75% less rack space. Many platforms in this category require as many as eight controllers to achieve the results shown in Figure 1. In fairness, note that the Huawei Dorado 6000 V6 and the IBM FlashSystem 7200 can exceed the E790’s peak system IOPS in large scale-out configurations (twelve or more controllers for the Huawei Dorado 6000 V6 and eight controllers for the IBM FlashSystem 7200). Figure 2 highlights the efficiency of the VSP architecture, which allows the E790 to achieve at least 1.5X more IOPS per controller than the competition. As Figures 1 and 2 make clear, the VSP E790 is unsurpassed in IOPS per unit of rack space and IOPS per controller.
 

Figure 1. VSP E790 Peak IOPS Per System Compared to the Competition



Figure 2. VSP E790 Peak IOPS Per Controller Compared to the Competition


The VSP E590 also achieves the highest performance density of any array in its class. As shown in Figure 3, the E590 logged 1.23X more IOPS per system than its closest competitor while occupying 75% less rack space. Again, note that one competitor, the Huawei Dorado 5000, could outperform the 2U, dual-controller E590 when configured with 10 or more controllers (up to 16 are possible). But as Figure 4 shows, the E590 delivers at least 4X more IOPS per controller than any competitor.
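The system-level and rack-space figures compound. As a rough sketch, reading "1.36X more IOPS" as 1.36 times the competitor's IOPS, and assuming the 75% rack-space saving is measured against the same closest-competitor configuration, the IOPS-per-rack-unit advantage is the IOPS ratio divided by the fraction of rack space used. The function name `density_advantage` is our own, hypothetical helper.

```python
def density_advantage(iops_ratio: float, rack_space_reduction: float) -> float:
    """IOPS-per-rack-unit advantage implied by a system IOPS ratio and a rack-space saving.

    iops_ratio: array A's system IOPS divided by array B's system IOPS.
    rack_space_reduction: fraction of B's rack space that A saves (0.75 means 75% less).
    Assumes both numbers describe the same pair of configurations.
    """
    if not 0.0 <= rack_space_reduction < 1.0:
        raise ValueError("rack_space_reduction must be in [0, 1)")
    # A occupies (1 - reduction) of B's rack space, so divide the IOPS ratio by that fraction.
    return iops_ratio / (1.0 - rack_space_reduction)


# Ratios taken from the text: 1.36X (E790) and 1.23X (E590), each with 75% less rack space.
for name, ratio in (("VSP E790", 1.36), ("VSP E590", 1.23)):
    print(f"{name}: {density_advantage(ratio, 0.75):.2f}X IOPS per rack unit vs. closest competitor")
```

Under those assumptions, the E790 works out to roughly 5.4X and the E590 to roughly 4.9X the IOPS per rack unit of their closest competitors.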
 

Figure 3. VSP E590 Peak IOPS Per System Compared to the Competition

 

Figure 4. VSP E590 Peak IOPS Per Controller Compared to the Competition



Latency

Turning to response time, the VSP E590 and E790 achieved latency as low as 66 microseconds (0.066 milliseconds). In both comparisons an IBM system achieved a response time nearly as fast, but note that our comparison with IBM relies on a published “hero” measurement taken with a 4 KiB block size, while Hitachi uses 8 KiB. The smaller block size used by IBM tends to increase IOPS and reduce latency, so the VSP Ex90 results are even more impressive when viewed in this context.
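Why does a smaller block size flatter both metrics? A simplified model, assuming the only block-size-dependent cost is moving bytes across a single path of fixed bandwidth, shows that halving the block size doubles the bandwidth-bound IOPS ceiling and halves the per-I/O transfer time. The 4,000 MiB/s path bandwidth is a hypothetical value, not a measured VSP or IBM figure, and real arrays also have fixed per-command costs, so measured differences are typically smaller than this model suggests.

```python
KIB = 1024
MIB = 1024 * KIB

# Hypothetical path bandwidth for illustration only; not a measured VSP or IBM figure.
LINK_MIB_PER_S = 4000


def bandwidth_bound_iops(bandwidth_mib_s: float, block_kib: float) -> float:
    """IOPS ceiling when the only limit is how many bytes per second the path can move."""
    return bandwidth_mib_s * MIB / (block_kib * KIB)


def transfer_time_us(bandwidth_mib_s: float, block_kib: float) -> float:
    """Microseconds spent just moving one block across the path (excludes all other work)."""
    return (block_kib * KIB) / (bandwidth_mib_s * MIB) * 1e6


for block_kib in (4, 8):
    print(
        f"{block_kib} KiB: ceiling {bandwidth_bound_iops(LINK_MIB_PER_S, block_kib):,.0f} IOPS, "
        f"transfer {transfer_time_us(LINK_MIB_PER_S, block_kib):.2f} us"
    )
```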
 

Figure 5. VSP E590 and E790 Latency Compared to the Competition

 

Hitachi Cache Architecture

The industry-leading performance density of the VSP E590 and E790 is enabled by several factors, including the fast NVMe protocol, a compact and efficient hardware design, and advanced processors. But because the IOPS and response time measurements in Figures 1-5 come from cache-hit random read tests, Hitachi’s cache architecture deserves a closer look. Figure 6 presents a simplified block diagram of VSP Ex90 memory allocation. Note that each core has a dedicated area of cache (local memory) for its own use. Within each core’s local memory is a local cache directory containing pointers to the cache addresses of the most frequently referenced blocks. When a read command arrives at the controller, it is assigned to one of the cores, and that core checks its local directory to see whether the requested block is in cache. If the block is listed in the local cache directory, the data can be returned to the host in the shortest possible time. If it is not, the assigned core then checks the full cache directory (also called the virtual cache directory) in the controller’s package memory. If a pointer to the requested data is found in the virtual cache directory, the command completes with only a few microseconds of additional response time (to allow for the extra lookup). When the host is reading from a relatively narrow range of blocks, Hitachi’s cache directory system allows most reads to be completed after a simple lookup in a core’s local cache directory, enabling extremely low latency as well as the highest IOPS available from a 2U array.

Also note that the performance metrics discussed so far would be just as good with data reduction enabled: data read from cache incurs no decompression or deduplication-related overhead.
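The two-level lookup described above can be sketched as a toy model: each core first consults its own small local directory, falls back to the shared virtual cache directory, and otherwise reports a miss. This is a hypothetical illustration, not Hitachi's implementation; the class and method names, the capacity limit, and the use of LRU replacement (as a stand-in for "most frequently referenced") are all our assumptions.

```python
from collections import OrderedDict


class TwoLevelDirectory:
    """Toy model of per-core local cache directories backed by a shared virtual cache directory.

    Hypothetical sketch: names, the capacity limit, and LRU replacement are illustrative
    assumptions, not Hitachi's implementation.
    """

    def __init__(self, num_cores: int, local_capacity: int, virtual_dir: dict[int, int]):
        # Full (virtual) cache directory in package memory: block address -> cache address.
        self.virtual_dir = virtual_dir
        self.local_capacity = local_capacity
        # One small local directory per core, ordered from least to most recently used.
        self.local_dirs = [OrderedDict() for _ in range(num_cores)]

    def read(self, core: int, block: int) -> str:
        local = self.local_dirs[core]
        if block in local:
            local.move_to_end(block)  # fastest path: pointer found in the core's own directory
            return "local-hit"
        if block in self.virtual_dir:
            # Extra lookup in package memory; remember the pointer locally for next time.
            local[block] = self.virtual_dir[block]
            if len(local) > self.local_capacity:
                local.popitem(last=False)  # drop the least recently used pointer
            return "virtual-hit"
        return "miss"  # not in cache; the data would have to be read from the drives


dirs = TwoLevelDirectory(num_cores=2, local_capacity=2, virtual_dir={10: 0, 11: 1, 12: 2})
print(dirs.read(0, 10))  # virtual-hit: first access goes through package memory
print(dirs.read(0, 10))  # local-hit: the pointer is now in core 0's local directory
print(dirs.read(1, 10))  # virtual-hit: core 1 has its own, separate local directory
print(dirs.read(0, 99))  # miss
```

In this model, a host that keeps rereading a narrow range of blocks quickly settles into the local-hit path, which mirrors the behavior the paragraph describes.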


Figure 6. VSP E590 and VSP E790 Memory Allocation



Beyond The Hero Numbers

While we commonly observe cache hit rates of 70% for reads and 40% for writes in our installed base, it is helpful to know how the VSP Ex90 performs against the competition when the workload has lower cache hit rates and/or a mix of reads and writes with block sizes larger than 8 KiB. We recently conducted apples-to-apples comparisons of the VSP E790 with the Dell EMC PowerStore 7000T and the HPE Primera A670, based on performance testing done by Principled Technologies. The summary results in Table 1 show that the Hitachi Vantara VSP E790 outperformed both the PowerStore 7000T and the Primera A670 midrange storage systems in throughput and latency. In data reduction savings, the VSP E790 (5.7:1) trailed the PowerStore 7000T (7.1:1) and exceeded the Primera A670 (2.0:1). Data reduction was enabled for all tests.

Several points are worth noting about the tests in Table 1. First, the data reduction test used an 800 GB data set, while a 25 TB data set was used for all performance tests. Because 25 TB is roughly 32X the VSP E790 cache capacity, and because the workload was distributed evenly across the data set, cache hit rates were low for the performance tests. Hitachi’s cache architecture is a distinct advantage, but these results show that the E series can outperform the competition on workloads with low cache hit rates as well. Second, the sequential read test used multiple threads per LUN, so the result includes cache hits obtained through sequential prefetch and is higher than what we normally achieve in our standard sequential read test. (The same test method was used on all platforms.) Finally, the OLTP-like latency was measured at a fixed rate of 107,000 IOPS on all platforms; the VSP E790 achieved the fastest response time on this simulated OLTP test, 0.34 ms. See the GPSE White Paper for full details of the VSP E790 competitive testing.
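Why does a data set about 32X the cache size keep hit rates low? Assuming reads are spread uniformly over the data set and the whole cache holds data from it, the steady-state hit probability is roughly cache size divided by data set size, or about 1/32 (roughly 3%). The sketch checks that estimate with a small LRU simulation. The block counts are hypothetical, scaled-down values chosen only to preserve the 32:1 ratio, and LRU stands in for whatever replacement policy the arrays actually use.

```python
import random
from collections import OrderedDict


def uniform_hit_rate_estimate(cache_size: float, dataset_size: float) -> float:
    """Approximate steady-state hit probability when every block is equally likely to be read."""
    return min(1.0, cache_size / dataset_size)


def simulate_lru_hit_rate(dataset_blocks: int, cache_blocks: int, requests: int, seed: int = 0) -> float:
    """Replay uniformly random block reads through an LRU cache and return the hit ratio."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(requests):
        block = rng.randrange(dataset_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / requests


# Data set is about 32X the cache, so scale both down while keeping that ratio.
print(f"analytic:  {uniform_hit_rate_estimate(1, 32):.3f}")
print(f"simulated: {simulate_lru_hit_rate(32_000, 1_000, 200_000):.3f}")
```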


Table 1. Performance Comparison of VSP E790 With Dell/EMC PowerStore 7000T and HPE Primera A670

Workload description | VSP E790 | PowerStore 7000T | Primera A670
--- | --- | --- | ---
Data reduction | | |
50% 128KB and 50% 256KB write | 5.7:1 (reduced 800 GB of data to 141 GB) | 7.1:1 (reduced 800 GB of data to 111 GB) | 2.0:1 (reduced 800 GB of data to 401 GB)
Rate: input/output operations per second (IOPS) | | |
8KB random 100% write (IOPS) | 259,206 | 232,602 | 75,160
32KB 70/30 read/write (IOPS) | 263,772 | 222,865 | 111,026
OLTP-like (IOPS) | 227,650 | 182,030 | 105,687
Throughput: bandwidth (MiB/s) | | |
256KB random read (MiB/s) | 25,070 | 23,239 | 10,763
256KB sequential read (MiB/s) | 37,717 | 23,238 | 9,877
Latency (ms) | | |
OLTP-like (ms) | 0.34 | 0.56 | 2.01
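To express the performance rows of Table 1 in relative terms, the short script below divides each VSP E790 result by the corresponding competitor result, inverting the ratio for latency, where lower is better. It only restates the published numbers; the dictionary keys and helper names are our own labels.

```python
# Values transcribed from Table 1: (VSP E790, PowerStore 7000T, Primera A670).
TABLE_1 = {
    "8KB random 100% write (IOPS)": (259_206, 232_602, 75_160),
    "32KB 70/30 read/write (IOPS)": (263_772, 222_865, 111_026),
    "OLTP-like (IOPS)": (227_650, 182_030, 105_687),
    "256KB random read (MiB/s)": (25_070, 23_239, 10_763),
    "256KB sequential read (MiB/s)": (37_717, 23_238, 9_877),
    "OLTP-like latency (ms)": (0.34, 0.56, 2.01),
}
# For latency a smaller number is better, so the ratio is inverted for that row.
LOWER_IS_BETTER = {"OLTP-like latency (ms)"}


def e790_advantage(metric: str, e790: float, other: float) -> float:
    """How many times better the VSP E790 result is than another system's result."""
    return other / e790 if metric in LOWER_IS_BETTER else e790 / other


def summarize() -> dict[str, tuple[float, float]]:
    """Map each metric to (advantage vs. PowerStore 7000T, advantage vs. Primera A670)."""
    return {
        metric: (e790_advantage(metric, e790, ps), e790_advantage(metric, e790, a670))
        for metric, (e790, ps, a670) in TABLE_1.items()
    }


for metric, (vs_ps, vs_a670) in summarize().items():
    print(f"{metric}: {vs_ps:.2f}X vs. PowerStore 7000T, {vs_a670:.2f}X vs. Primera A670")
```

For example, the OLTP-like row works out to about 1.25X the IOPS of the PowerStore 7000T, and every performance ratio is above 1.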


This concludes our brief review of the best-in-class performance available from the VSP E590 and VSP E790. For additional information, see the GPSE connect page.


 
 
