Hitachi Vantara VSP 5600: New World’s Fastest Flash Storage System at 33 Million IOPS!

By Sumit Keshri posted 04-11-2022 19:32

Hitachi’s all-new enterprise storage product, the Virtual Storage Platform (VSP) 5600 storage system, out-performed the previous peak of 21M IOPS (which was achieved with the VSP 5500 storage system) and yet again logged best-in-class performance! Our lab tests confirm its superiority with peak performance hitting a massive 33 million IOPS. This measurement was achieved using a 100% Random 8KB Read Cache Hit workload generated by the industry-standard benchmarking tool, Vdbench.

Compared with competitive offerings, the VSP 5600's peak is more than twice as high:

IBM FlashSystem 9500R = Up to 16M IOPS (4K Block size)
Dell EMC PowerMax 8000 = Up to 15M IOPS
NetApp AFF A900 = Up to 14.4M IOPS

Test Result Details

When the optimally configured and tuned VSP 5600-6N storage system was pushed to its peak, it achieved 33M IOPS at an average response time of 0.69ms. That is 57% more IOPS than the VSP 5500 storage system achieved (21 million IOPS at 2.66 milliseconds).

The VSP 5600-6N storage system achieved the following:

At 98µs response time, reached approximately 21M IOPS.
At 0.21ms response time, reached up to 28.6M IOPS.
In a 256KB Random Read Cache Hit workload, attained 312 GB/s throughput.
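
As a rough cross-check of these figures, Little's Law (average I/Os in flight = IOPS x response time) shows how much concurrency each measurement point implies. The sketch below uses only the numbers quoted above. The helper names are illustrative, and the unit conventions (8KB = 8 x 1024 bytes, GB = 10^9 bytes) are assumptions, since the post does not state them.

```python
# Little's Law: average I/Os in flight = throughput (IOPS) x response time (s).
# The (IOPS, response time) pairs are the figures quoted in the test results.

def outstanding_ios(iops: float, response_time_s: float) -> float:
    """Average number of I/Os in flight implied by Little's Law."""
    return iops * response_time_s


def bandwidth_gb_per_s(iops: float, block_bytes: int) -> float:
    """Data rate in decimal GB/s for a given IOPS and transfer size."""
    return iops * block_bytes / 1e9


measurements = [
    ("21M IOPS at 98 us", 21e6, 98e-6),
    ("28.6M IOPS at 0.21 ms", 28.6e6, 0.21e-3),
    ("33M IOPS at 0.69 ms", 33e6, 0.69e-3),
]

for label, iops, rt in measurements:
    print(f"{label}: ~{outstanding_ios(iops, rt):,.0f} I/Os in flight")

# Assumption: 8KB = 8 * 1024 bytes and GB = 10**9 bytes.
peak_8k_gb_s = bandwidth_gb_per_s(33e6, 8 * 1024)
print(f"33M IOPS x 8KB = ~{peak_8k_gb_s:.0f} GB/s")

# Gain in peak IOPS over the VSP 5500 result of 21M IOPS.
gain_pct = (33e6 / 21e6 - 1) * 100
print(f"Gain over VSP 5500: ~{gain_pct:.0f}%")
```

Under these assumptions, the three points imply roughly 2,058, 6,006, and 22,770 I/Os in flight. In other words, the last 4.4M IOPS before the peak require nearly four times the concurrency of the 28.6M IOPS point.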

In 2N and 4N configurations, the VSP 5600 storage system scaled proportionally, achieving more than one third and two thirds, respectively, of the IOPS achieved by the VSP 5600-6N.

VSP 5600-6N Configuration Setup Details

It’s important to understand how the VSP 5600-6N storage system test bed was configured to achieve a 33M IOPS result because variation from this fully populated and optimally configured setup may result in lower performance:

Test Methodology

No capacity savings or encryption functionality was enabled.
Dynamic Provisioning (HDP) was used.
Factory default SOMs were used.
Each Parity Group (PG) was created using the default dispersed method (drives equally distributed across all trays and DKBs).
12 RAID-6 (6D+2P) PGs were created per two-node module.
Eight 2.53TB LDEVs (pool volumes) were created per PG, using 100% of PG capacity (for 3.8TB SSDs).
Multiple HDP pools were created per two-node module. However, this was not integral to reaching peak IOPS; it was only a byproduct of the configuration created to test a variety of drive counts. (In the past, the HDP pool count has had no effect on performance.) Default pool threshold settings were used.
In each two-node module, 768 x 162.5 GB DP-Vols were created (340717440 blocks each) to consume 50% of pool capacity (a total of 2304 DP-Vols). The number of DP-Vols was determined using a rule of thumb of eight DP-Vols per flash drive to facilitate concurrent I/O to flash drives.
MPU ownership of DP-Vols used the default round-robin setting.
DP-Vols were mapped evenly across FE ports (12 DP-Vols per port). No multi-pathing was used (one path per DP-Vol) and straight I/O access (both front-end and back-end) was used for peak performance. Straight I/O access is critical for reaching champion performance levels and is described in more detail in our Cross/Straight internal blog.
Fibre Channel HBA parameters were set on hosts. The queue depth settings prevent bottlenecks from occurring on the host side for Random workloads. In these lab tests, with lpfc_lun_queue_depth=128, up to 128 I/O requests per DP-Vol can be “in flight” to the storage. So, with 12 DP-Vols per port, a maximum of 12*128 = 1536 I/O requests can be in progress from host to storage, which is well within the HBA port threshold (lpfc_hba_queue_depth=8192).
On Linux hosts, the ulimit command was used to raise resource limits so that a large number of Vdbench processes could run.
Vdbench prefill (using 256KB sequential writes) was completed. Prefilling thin-provisioned volumes prevents artificially high performance caused by reading from the HDP special null area, or from writing to unused blocks (to avoid garbage collection overhead). Additionally, prefilling avoids the overhead of new page allocation.
Vdbench Random 8KB Read Cache Hit performance tests were conducted (see the parameter file in the next section).
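
The volume counts and queue-depth limits in this list follow from a few multiplications, which the sketch below reproduces from the stated values. Two inputs are assumptions rather than statements from the post: that a 6N system consists of three two-node modules (consistent with 3 x 768 = 2304), and that a block is 512 bytes. The FE port count is derived from the even, single-path mapping and is not a figure quoted above.

```python
# Values stated in the test methodology.
PGS_PER_MODULE = 12          # RAID-6 (6D+2P) parity groups per two-node module
DRIVES_PER_PG = 6 + 2        # 6 data + 2 parity drives
DPVOLS_PER_DRIVE = 8         # rule of thumb from the text
DPVOLS_PER_PORT = 12
LUN_QUEUE_DEPTH = 128        # lpfc_lun_queue_depth
HBA_QUEUE_DEPTH = 8192       # lpfc_hba_queue_depth
BLOCKS_PER_DPVOL = 340_717_440

# Assumptions (not stated in the text).
MODULES = 3                  # 6N taken as three two-node modules
BLOCK_BYTES = 512            # 512-byte blocks


def derive_layout() -> dict:
    """Recompute the DP-Vol layout and per-port queue usage."""
    drives_per_module = PGS_PER_MODULE * DRIVES_PER_PG
    dpvols_per_module = drives_per_module * DPVOLS_PER_DRIVE
    total_dpvols = dpvols_per_module * MODULES
    return {
        "drives_per_module": drives_per_module,
        "dpvols_per_module": dpvols_per_module,
        "total_dpvols": total_dpvols,
        # Derived: one path per DP-Vol, 12 DP-Vols per port.
        "fe_ports": total_dpvols // DPVOLS_PER_PORT,
        "in_flight_per_port": DPVOLS_PER_PORT * LUN_QUEUE_DEPTH,
        "dpvol_gib": BLOCKS_PER_DPVOL * BLOCK_BYTES / 2**30,
    }


layout = derive_layout()
for name, value in layout.items():
    print(f"{name}: {value}")

# The per-port total must stay below the HBA queue depth to avoid host-side throttling.
port_ok = layout["in_flight_per_port"] <= HBA_QUEUE_DEPTH
print(f"per-port queue within HBA limit: {port_ok}")
```

With 512-byte blocks, 340717440 blocks comes to about 162.5 GiB, which suggests the quoted "162.5 GB" is a binary-unit figure.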

Replicating 33M IOPS on a Similar Sized Setup

If you want to measure 33M IOPS on a VSP 5600-6N storage system yourself, keep these tips in mind:

Run a Random 8KB Read workload with 100% Cache Hits.
Use full cache on the storage system (6TB).
Check that the queue depth settings on the host side are not throttling I/O requests to the storage system (check the number of application threads per DP-Vol, the number of DP-Vols per port, and the relevant HBA parameter settings).
Use a single path per DP-Vol and confirm that both front-end and back-end straight I/O access is established.
Use enough hosts to drive the storage to maximum performance (for example, eight hosts that can only drive 11M IOPS won’t be able to drive the VSP 5600-6N storage system to 33M IOPS). In our test, we used 24 powerful hosts.
If you are using Vdbench, you can experiment with the JVM count as needed.
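
To size the host side before a run, a simple estimate like the following can help. It is a minimal sketch with hypothetical helper names: it assumes load spreads evenly across hosts and JVMs, which real test beds only approximate. The inputs come from this post: the 33M IOPS target, the 24 hosts used, the eight-hosts-at-11M example, and the JVMS=12 setting mentioned in the parameter file section.

```python
import math


def hosts_needed(target_iops: float, per_host_iops: float) -> int:
    """Smallest number of hosts that can generate target_iops in aggregate."""
    return math.ceil(target_iops / per_host_iops)


def per_jvm_iops(target_iops: float, hosts: int, jvms_per_host: int) -> float:
    """Average IOPS each workload JVM must sustain, assuming an even spread."""
    return target_iops / (hosts * jvms_per_host)


TARGET_IOPS = 33e6

# From the example above: eight hosts that together drive only 11M IOPS.
per_host = 11e6 / 8
print(f"per-host capability: {per_host:,.0f} IOPS")
print(f"hosts needed for 33M IOPS: {hosts_needed(TARGET_IOPS, per_host)}")

# With the 24 hosts and 12 JVMs per host used in the lab tests.
jvm_load = per_jvm_iops(TARGET_IOPS, hosts=24, jvms_per_host=12)
print(f"average load per JVM: {jvm_load:,.0f} IOPS")
```

If a single JVM on your hosts cannot sustain the per-JVM figure this produces, raising the JVM count or adding hosts is the obvious next experiment.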

Benchmark tool used to produce peak IOPS

Vdbench is an open-source enterprise-grade benchmark tool that was designed to help engineers and customers generate disk I/O workloads for validating storage performance and storage data integrity. Vdbench was written in Java, runs on various platforms (Windows, Linux, and so on), and can be downloaded from the Oracle website. This I/O load simulation tool gives control over workload parameters like I/O rate, LUN or file size, transfer size, cache hit percentage, and so on.

Vdbench Parameter Files

In these tests, Vdbench was run in multi-host mode using JVMS=12. Vdbench runs as two or more Java Virtual Machines (JVMs). The master JVM starts the test, parses parameters, determines which workloads to run, and completes all the reporting. Workloads are run on each JVM (except for sequential workloads). The default number of JVMs is eight (per host). In lab tests, we experimented with this setting, as you may want to do, because the maximum number of IOPS that a single JVM can handle depends on host CPU speed.

The Vdbench parameter files used to achieve 33M IOPS are as follows:

Prefill operation Vdbench parameter file

8KB Random Read Cache Hit Workload Vdbench Parameter file

The random read cache hit workload used the Vdbench default value for the hit area parameter (1MB), which means that the first 1MB of each volume is accessed for cache hits and the remaining space is accessed for cache misses. When Vdbench generates a cache hit, it generates an I/O to the hit area, which assumes that the data accessed is, or will be soon, residing in cache. Cache misses are targeted toward the miss area, which assumes that the miss area is large enough to ensure that most random accesses are cache misses.
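
The hit-area behavior described above can be pictured with a toy model. This is not Vdbench's actual implementation. It is a hypothetical sketch that picks an aligned offset from the first 1MB of a volume for a cache hit and from the rest of the volume for a miss. The volume size assumes 512-byte blocks, and the 2304-volume total is the DP-Vol count from the test methodology.

```python
import random

HIT_AREA_BYTES = 1024 * 1024        # 1MB default hit area, per the text
XFER_BYTES = 8 * 1024               # 8KB transfer size
VOLUME_BYTES = 340_717_440 * 512    # one DP-Vol, assuming 512-byte blocks


def next_offset(volume_bytes: int, read_hit_pct: float, rng: random.Random) -> int:
    """Toy model: choose an aligned offset in the hit area or the miss area."""
    if rng.random() * 100 < read_hit_pct:
        lo, hi = 0, HIT_AREA_BYTES               # intended cache hit
    else:
        lo, hi = HIT_AREA_BYTES, volume_bytes    # intended cache miss
    slots = (hi - lo) // XFER_BYTES
    return lo + rng.randrange(slots) * XFER_BYTES


rng = random.Random(0)
hits = [next_offset(VOLUME_BYTES, 100, rng) for _ in range(1000)]
print("all offsets inside hit area:", all(o < HIT_AREA_BYTES for o in hits))

# Combined hit-area footprint across all 2304 DP-Vols.
total_hit_bytes = 2304 * HIT_AREA_BYTES
print(f"total hit area: {total_hit_bytes / 2**30:.2f} GiB")
```

In this model, a 100% cache hit workload touches only about 2.25 GiB across all 2304 volumes, a tiny fraction of the 6TB cache, so nearly every read should be served from cache once the hit areas are resident.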

#EnterpriseStorage
#VSP5000
#VSP5600
#HitachiVirtualStoragePlatformVSP
#FlashStorage
