
VMware Multipathing Impact on VSP F/G

Blog post created by Jeff Chen on Jan 28, 2019

With vSphere 6.7 U1 and vSphere 6.5 P03 or later, VMware's default multipathing policy for the Hitachi VSP storage family is set to round robin.

VMware Native Multipathing rules for Hitachi VSP now enabled by default in vSphere builds
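
To confirm which policy a host is actually using for its VSP LUNs, the SATP claim rules and per-device settings can be checked with esxcli. This is only a minimal sketch; the exact output varies by ESXi build, and Hitachi devices typically appear with the vendor string HITACHI:

  # List the NMP claim rules and look for the Hitachi entry
  esxcli storage nmp satp rule list | grep -i HITACHI

  # Show each device's SATP and Path Selection Policy (look for VMW_PSP_RR)
  esxcli storage nmp device list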

 

By default, VMware's round-robin multipathing sends 1000 IOs or 10MB of data down a single path before switching to the next path. Other storage vendors sometimes recommend changing this IO limit to 1 for better performance, so we performed some detailed tests of this setting with a Hitachi VSP G900. This is what we found.
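
For reference, the limit is inspected and changed per device, per host, with esxcli. A minimal sketch follows; the naa identifier is a placeholder, so substitute the actual device ID backing your datastore:

  # Show the current round-robin settings for one device (placeholder ID)
  esxcli storage nmp psp roundrobin deviceconfig get --device=naa.60060e80xxxxxxxxxxxxxxxxxxxxxxxx

  # Change from the default (1000 IOs / 10MB per path) to 1 IO per path
  esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=naa.60060e80xxxxxxxxxxxxxxxxxxxxxxxx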

 

Summary

  • In a typical mixed VMware environment with multiple ESXi hosts and multiple datastores, the difference between an IO limit of 1 and the default limit of 1000 is minimal. VMware's default configuration is still recommended for the Hitachi VSP F/G series.
  • In certain specific configurations, such as 1 ESXi host with 1 datastore, changing the IO limit to 1 increases IO performance for sequential workloads with larger block sizes. This might apply to a backup scenario; however, it is not common to use flash storage as backup media.
  • In certain circumstances, setting the IO limit to 20 can provide a potential 3-5% reduction in latency as well as a 3-5% increase in IOPS (a sketch of how to apply this host-wide follows this summary). For testing purposes, all data was served from storage cache; in a real environment with backend disk access, this improvement can be even less significant.
  • For VSP F/G storage with flash disks, it is always recommended to provision multiple datastores per parity group (within a DP pool) to increase storage IO performance, as shown in test cases 1 and 2.
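
If you do choose a non-default limit such as 20, one way to apply it host-wide, so that newly presented VSP LUNs inherit it, is an SATP claim rule. This is only a sketch: verify the vendor/model strings (HITACHI / OPEN-V) against your own device list, and note that devices already claimed must be reclaimed (or the host rebooted) before the rule takes effect:

  # Claim Hitachi OPEN-V LUNs with round robin and an IO operation limit of 20
  esxcli storage nmp satp rule add --satp=VMW_SATP_DEFAULT_AA --vendor=HITACHI --model=OPEN-V \
      --psp=VMW_PSP_RR --psp-option="iops=20" --description="Hitachi VSP RR iops=20"

  # Confirm the rule was added
  esxcli storage nmp satp rule list | grep OPEN-V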

 

Note: To eliminate the storage latency caused by accessing data from the backend disks, small 1GB VMDKs were used. This ensures that the storage can serve all of the data from cache.

Test Environment

Hitachi Unified Compute Platform CI was used for this testing.

Hitachi Unified Compute Platform CI (UCP CI) Series | Hitachi Vantara

 

Main components used are listed below:

  • 2x UCP CI Compute nodes with 2x 16Gbps FC HBA ports
    • 2 paths per datastore
  • Hitachi VSP G900
    • 2x 16Gbps FC ports used
    • 1x RAID6 (6D+2P) Parity Group with 1.9TB SSDs
  • VMware vSphere 6.7
  • VMware HCIBench was used as the workload generator
    • Test VMs: 4 vCPU, 4GB RAM, 16GB OS VMDK, OS - Photon OS 1.0
    • Test Data VMDKs per VM: 2x 1GB VMDKs

 

Test Case

  1. 1 ESXi host with 1 datastore, with 8 VMs running the workloads below
  2. 1 ESXi host with 8 datastores, with 8 VMs running the workloads below
  3. 2 ESXi hosts with 8 datastores, with 16 VMs (8 VMs/host) running the workloads below

 

Workload Definition

Workload                              I/O Size
100% Random Read                      8K
100% Random Write                     8K
100% Sequential Read                  256K
100% Sequential Write                 256K
100% Random 50% Read / 50% Write      8K
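
For context, HCIBench drives these patterns from inside the guest test VMs. A rough standalone fio equivalent of the 8K, 50% read / 50% write random row is sketched below (this is not the actual HCIBench parameter file; the target device, queue depth, and runtime are illustrative placeholders):

  # 8K random, 50/50 read/write against one test data VMDK inside a guest
  fio --name=rand8k-5050 --filename=/dev/sdb --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=50 --bs=8k --iodepth=32 --runtime=300 --time_based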

 

Detailed Test Results

1 ESXi Host with 1 Datastore

In this configuration, we observed more of an advantage with the IO limit set to 1. On large-block sequential workloads in particular, there was roughly a 2x improvement. However, a single datastore from a single DP pool is not a very common setup for flash storage.

 

1 ESXi Host with 8 Datastores

For Hitachi flash storage, it is recommended to create multiple LUNs/datastores from the same storage pool to increase I/O performance. This configuration still focuses on 1 ESXi host, but with multiple datastores. Compared to the single datastore in the previous test case, I/O performance increased 2x. The difference between an IO limit of 1 and an IO limit of 1000 is much smaller, especially for the 50%-read/50%-write random workload.

 

2 ESXi Hosts with 8 Datastores

This configuration uses multiple ESXi hosts sharing the same 8 datastores, which is a more common configuration in a mixed VMware environment. Here, the IO performance difference between the IO limit settings is much less significant.

 

We also ran an additional test with an IO limit of 20 on the 50%-read/50%-write random workload, and it achieved the best result, by a small percentage.

 

In summary, we still recommend using the default IO limit of 1000 for general VMware use cases, but setting it to a lower value such as 20 or 1 may improve IO performance for certain workloads and environments.
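
If you have experimented with a lower value and want to return a device to the default behavior, the setting can simply be put back (the device ID below is again a placeholder):

  # Restore the default of 1000 IOs per path before switching
  esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1000 --device=naa.60060e80xxxxxxxxxxxxxxxxxxxxxxxx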
