Hitachi Vantara VSP 5600: World’s Lowest Latency Flash (NVMe) Storage System at 39µs!

By Stephen Rice posted 05-10-2022 15:51

Hitachi’s all-new enterprise storage product, the Virtual Storage Platform (VSP) 5600 series storage system, proves that it is best-in-class! The VSP 5600 NVMe storage system broke the record for low latency, achieving 39µs. Along with setting a latency record, the hardware design has eight 9’s of availability, which is 99.999999%. In an FC/SCSI environment, the VSP 5600 SAS SSD storage system also achieved a leading rate of 33 Million IOPS.
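To put eight 9’s in perspective, the availability figure can be converted into expected downtime per year. The following is a minimal sketch, assuming a 365-day year and that "n nines" means an availability of 1 - 10^-n; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_seconds_per_year(nines: int) -> float:
    """Expected unavailable seconds per year for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # eight nines -> 1e-8, i.e. 99.999999% available
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (5, 8):
        print(f"{n} nines: {downtime_seconds_per_year(n):.3f} seconds of downtime per year")
```

Under these assumptions, eight 9’s corresponds to roughly a third of a second of downtime per year, compared with about five minutes for five 9’s.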

When the 39µs latency is compared to competitive offerings, Hitachi reigns:

IBM FlashSystem 9500 = Under 50µs latency
Dell EMC PowerMax = Under 100µs latency
NetApp AFF A900 = Under 100µs latency
Pure FlashArray//XL = 150µs latency

Internal lab tests confirm its superiority: The 39µs measurement was achieved using a 100% Random Read 8KB Cache Hit workload generated by the industry-standard benchmarking tool, Vdbench.

HOW DOES LOW LATENCY HELP GIVE MY BUSINESS A COMPETITIVE EDGE?

Low latency equates to fast response time. Having the lowest latency or the fastest response time determines how quickly the host can request and receive data from the storage subsystem. A fast response time helps your business achieve the quickest possible response to your demanding customer needs. However, a slower response time causes customers to complain, leave your site, and go to a competitor’s site for their needs.

WHAT EXACTLY ARE THE TEST RESULT DETAILS?

When the VSP 5600-2N NVMe storage system was measured for best latency capabilities, the optimally configured and tuned system achieved an average response time of 39µs (39 microseconds (µs) = 0.039 milliseconds (ms)).

At 39µs response time, the VSP 5600-2N storage system achieved over 700K IOPS, as shown in the following figure:

Figure 1: 8K Random Read Cache Hit
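The relationship between the two headline numbers can be checked with Little's Law (average outstanding I/Os = throughput × response time). The following is a minimal sketch, assuming steady state and taking 700K IOPS as a round figure; the function name is illustrative and the result is derived, not a measured value from the lab tests.

```python
def outstanding_ios(iops: float, latency_us: float) -> float:
    """Average number of I/Os in flight, by Little's Law (L = lambda * W)."""
    latency_seconds = latency_us / 1_000_000
    return iops * latency_seconds


if __name__ == "__main__":
    latency_us = 39
    print(f"{latency_us} µs = {latency_us / 1000} ms")
    in_flight = outstanding_ios(700_000, latency_us)
    print(f"Average I/Os in flight across the whole system: {in_flight:.1f}")
```

Under these assumptions, about 27 I/Os are in flight on average across all volumes, which is consistent with a very low per-volume queue depth.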

HOW DOES THE VSP 5600-2N COMPARE TO THE VSP 5500-2N STORAGE SYSTEM?

In the record low latency tests, both the VSP 5600 and VSP 5500 storage systems were in an end-to-end NVMe protocol test setup, which is not a legacy SCSI configuration. Additionally, the VSP 5600-2N NVMe storage system broke the record for the best latency mark of 39µs, which is a 44% reduction in latency compared to the VSP 5500-2N NVMe storage system.

Figure 2: VSP 5600-2N NVMe storage system compared with the VSP 5500-2N NVMe storage system

Using the legacy SCSI protocol, the industry leading rate of 33 Million IOPS was achieved on a VSP 5600-6N SAS storage system, which is a 57% increase in performance over the VSP 5500 storage system.
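The percentages above imply approximate baseline figures for the VSP 5500. The following sketch back-calculates them; the results are derived from the stated percentages, not measured values reported in this post, and the function names are illustrative.

```python
def baseline_from_reduction(new_value: float, reduction_pct: float) -> float:
    """Baseline value, given the new value and the percentage reduction from baseline."""
    return new_value / (1 - reduction_pct / 100)


def baseline_from_increase(new_value: float, increase_pct: float) -> float:
    """Baseline value, given the new value and the percentage increase over baseline."""
    return new_value / (1 + increase_pct / 100)


if __name__ == "__main__":
    # 39 µs is a 44% reduction relative to the VSP 5500-2N NVMe latency.
    vsp5500_latency_us = baseline_from_reduction(39, 44)
    # 33 Million IOPS is a 57% increase over the VSP 5500 rate.
    vsp5500_iops = baseline_from_increase(33_000_000, 57)
    print(f"Implied VSP 5500-2N latency: {vsp5500_latency_us:.1f} µs")
    print(f"Implied VSP 5500 IOPS: {vsp5500_iops / 1e6:.1f} Million")
```

This works out to roughly 70µs and roughly 21 Million IOPS for the VSP 5500, subject to rounding in the published percentages.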

VSP 5600-2N CONFIGURATION AND TEST METHODOLOGY DETAILS

It’s important to understand how the VSP 5600-2N NVMe storage system test bed was configured to achieve the 39µs result, because any change in the setup may produce different response times.

Factory default SOMs were used.
Each Parity Group (PG) was created using the default dispersed method (drives equally distributed across all trays and DKBs).
12 RAID-6 (6D+2P) PGs were used (NVMe 7.6TB SSD).
No capacity savings or encryption functionality was enabled.
Dynamic Provisioning (HDP) was used.
One HDP pool was created per PG, using 12 PGs. Default pool threshold settings were used.
64 x 4,275GB (1024**3) DP-Vols were created from the pool.
MPU ownership of DP-Vols used the default round-robin setting.
DP-Vols were mapped evenly across FE ports = 1 DP-Vol per port. No multi-pathing was used (one path per DP-Vol) and straight I/O access (both front-end and back-end) was used for peak performance. Straight I/O access is critical for reaching champion performance levels and is described in detail in our Cross/Straight internal blog.
Fibre Channel HBA parameters were set on hosts. The queue depth settings prevent bottlenecks from occurring on the host side for Random workloads.
On Linux hosts, application performance tuning was set using the ulimit command to run a large number of Vdbench processes.
Vdbench prefill (using 256KB sequential writes) step was completed. Prefilling thin-provisioned volumes prevents artificially high performance caused by reading from the HDP special null area, or from writing to unused blocks (to avoid garbage collection overhead). Additionally, prefilling avoids the overhead of new page allocation.
Vdbench 100% Random 8KB Read Cache Hit performance tests were conducted (see the parameter file in the next section).
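The capacity figures in the configuration above can be cross-checked with some simple arithmetic. The following is a rough sketch, assuming the 7.6TB drive size is quoted in decimal terabytes and that "GB (1024**3)" means binary gigabytes; it ignores formatting and pool metadata overhead, so the results are estimates rather than values reported by the storage system.

```python
GIB = 1024 ** 3  # the post's "GB (1024**3)"
TB = 10 ** 12    # assumption: drive capacity is quoted in decimal TB

PARITY_GROUPS = 12
DRIVES_PER_PG = 8        # RAID-6 6D+2P
DATA_DRIVES_PER_PG = 6
DRIVE_TB = 7.6
VOLUMES = 64
VOLUME_GIB = 4275
PREFILL_XFER_BYTES = 256 * 1024  # 256KB sequential prefill writes

total_drives = PARITY_GROUPS * DRIVES_PER_PG
data_capacity_tb = PARITY_GROUPS * DATA_DRIVES_PER_PG * DRIVE_TB  # excludes parity
volume_bytes = VOLUMES * VOLUME_GIB * GIB
volume_tb = volume_bytes / TB
prefill_writes = volume_bytes // PREFILL_XFER_BYTES

if __name__ == "__main__":
    print(f"Drives in use: {total_drives}")
    print(f"Data capacity before overhead: {data_capacity_tb:.1f} TB")
    print(f"Total DP-Vol capacity: {volume_tb:.1f} TB")
    print(f"Share of data capacity provisioned: {volume_tb / data_capacity_tb:.0%}")
    print(f"256KB writes needed to prefill every volume: {prefill_writes:,}")
```

Under these assumptions the 64 DP-Vols cover a little over half of the data capacity of the 96 drives, and the prefill step issues more than a billion 256KB writes.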

BENCHMARK TOOL USED TO PRODUCE PEAK IOPS

Vdbench is an open-source enterprise-grade benchmark tool that was designed to help engineers and customers generate disk I/O workloads for validating storage performance and storage data integrity. Vdbench was written in Java, runs on various platforms (Windows, Linux, and so on), and can be downloaded from the Oracle website. This I/O load simulation tool gives control over workload parameters like I/O rate, LUN or file size, transfer size, cache hit percentage, and so on.

VDBENCH PARAMETER FILES

In these tests, Vdbench was run in multi-host mode using JVMS=12. Vdbench runs as two or more Java Virtual Machines (JVMs). The master JVM starts the test, parses parameters, determines which workloads to run, and completes all the reporting. Workloads are run on each JVM (except for sequential workloads). The default number of JVMs is eight (per host). In lab tests, we experimented with this setting, as you may want to do, because the maximum number of IOPS that a single JVM can handle is dependent on host CPU speed.

The Vdbench parameter files used to achieve 39µs are as follows:

Sequential Prefill Job

Performance Test: Random 8KB Read Cache Hit Workload

This random read cache hit workload used the Vdbench default value for the hitarea parameter (1MB), which means that the first 1MB of each volume is accessed for cache hits and the remaining space is accessed for cache misses. When Vdbench generates a cache hit, it generates an I/O to the hit area, which assumes that the data accessed is, or soon will be, residing in cache. Cache misses are targeted toward the miss area, which assumes that the miss area is large enough to ensure that most random accesses are cache misses.
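The workload settings described in this post can be assembled into a Vdbench parameter file along the following lines. This is a minimal sketch and not the parameter file used in the lab tests: the device paths, run duration, and reporting interval are placeholders, the helper name is hypothetical, and the output uses a single-host form rather than the multi-host setup described above. Check every parameter against the Vdbench user guide before running it.

```python
def build_parameter_file(luns, jvms=12, elapsed=300, interval=5, threads=1):
    """Return Vdbench parameter-file text for a 100% random 8KB read cache hit run.

    `luns` is a list of device paths (placeholders here). `elapsed` and
    `interval` are illustrative values, not taken from the lab tests.
    """
    lines = [f"hd=default,jvms={jvms}"]
    for index, lun in enumerate(luns, start=1):
        # hitarea=1m is the Vdbench default; it is written out here for clarity.
        lines.append(f"sd=sd{index},lun={lun},openflags=o_direct,hitarea=1m")
    # 8KB transfers, 100% reads, 100% read hits, 100% random seeks.
    lines.append("wd=wd1,sd=sd*,xfersize=8k,rdpct=100,rhpct=100,seekpct=100")
    lines.append(
        f"rd=rd1,wd=wd1,iorate=max,elapsed={elapsed},"
        f"interval={interval},threads={threads}"
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device names; substitute the DP-Vol devices seen by the host.
    print(build_parameter_file(["/dev/sdb", "/dev/sdc"]), end="")
```

Generating the file from a list of devices keeps one storage definition per DP-Vol, which matches the one-volume-per-port mapping described in the configuration details.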

KEY TAKE-AWAYS

If you want to achieve a response time of 39µs on a VSP 5600-2N storage system yourself, keep these tips in mind:

Run a Random 8KB Read workload with 100% Cache Hit.
Use full cache on the storage system (2TB).
Use a single Vdbench thread. Adding threads increases IOPS but may also increase latency.
Check that the queue depth settings on the host side are not throttling I/O requests to the storage system (check the number of application threads per DP-Vol, the number of DP-Vols per port, and the relevant HBA parameter settings).
Use a single path per DP-Vol and confirm that both front-end and back-end straight I/O access is established.
