With vSphere 6.7 U1 and vSphere 6.5 P03 or later, VMware's default multipathing policy for the Hitachi VSP storage family is round-robin.
By default, VMware's round-robin multipathing sends 1000 IOs or 10 MB of data down a single path before switching to the next path. Other storage vendors might recommend changing this IO limit to 1 for better performance. We therefore ran detailed tests of this setting on a Hitachi VSP G900, and this is what we found.
- In a general mixed VMware environment with multiple ESXi hosts and multiple datastores, the difference between an IO limit of 1 and an IO limit of 1000 is minimal. VMware's default configuration is still recommended for the Hitachi VSP F/G series.
- In certain specific configurations, such as 1 ESXi host with 1 datastore, changing the IO limit to 1 increases IO performance on sequential workloads with larger block sizes. This may apply to backup scenarios. However, flash storage is not commonly used as backup media.
- In certain circumstances, setting the IO limit to 20 can provide a potential 3-5% reduction in latency as well as a 3-5% increase in IOPS. (For testing purposes, all data was served from storage cache. In a real environment with backend disk access, this improvement can be even less significant.)
- For VSP F/G storage with flash disks, it is always recommended to provision multiple datastores per parity group (within a DP pool) to increase storage IO performance, as shown in test cases 1 and 2.
Note: To eliminate the storage latency of accessing data on the backend disks, small 1 GB VMDKs were used. This ensures that the storage can serve all the data from cache.
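To make the IO limit easier to picture, the path-switching rule described above can be sketched as a small simulation. This is a simplified model, not VMware's actual implementation: it assumes the path changes as soon as either the IO count or the byte count for the current path reaches its limit, and the function name and parameters are made up for illustration.

```python
def simulate_round_robin(io_sizes, num_paths=2, iops_limit=1000,
                         bytes_limit=10 * 1024 * 1024):
    """Count how many IOs land on each path under a round-robin policy.

    Simplified model (assumption): switch to the next path once either
    the IO count or the byte count on the current path hits its limit.
    """
    counts = [0] * num_paths
    path = 0
    ios_on_path = 0
    bytes_on_path = 0
    for size in io_sizes:
        counts[path] += 1
        ios_on_path += 1
        bytes_on_path += size
        if ios_on_path >= iops_limit or bytes_on_path >= bytes_limit:
            # Limit reached: move to the next path and reset the counters.
            path = (path + 1) % num_paths
            ios_on_path = 0
            bytes_on_path = 0
    return counts


# A burst of 500 IOs of 8 KB each over 2 paths.
burst = [8 * 1024] * 500
print(simulate_round_robin(burst, iops_limit=1000))  # [500, 0]
print(simulate_round_robin(burst, iops_limit=20))    # [260, 240]
print(simulate_round_robin(burst, iops_limit=1))     # [250, 250]
```

In this model, a short burst from a single datastore stays on one path with the default limit, while a lower limit spreads it across both paths. With many datastores and hosts issuing IO at the same time, the paths are already kept busy, which is consistent with the smaller differences seen in the multi-datastore tests.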
Hitachi Unified Compute Platform CI was used for this testing.
Main components used are listed below:
- 2x UCP CI Compute nodes with 2x 16Gbps FC HBA ports
- 2 paths per datastore
- Hitachi VSP G900
- 2x 16Gbps FC ports used
- 1x RAID6 (6D+2P) Parity Group with 1.9TB SSDs
- VMware vSphere 6.7
- VMware HCIBench was used as workload generator
- Test VMs: 4 vCPU, 4GB RAM, 16GB OS VMDK, OS - Photon OS 1.0
- Test Data VMDKs per VM: 2x 1GB VMDKs
- 1 ESXi host with 1 datastore, with 8 VMs running the workloads below
- 1 ESXi host with 8 datastores, with 8 VMs running the workloads below
- 2 ESXi hosts with 8 datastores, with 16 VMs (8 VMs per host) running the workloads below
| Workload | Block size |
|---|---|
| 100% Random Read | 8K |
| 100% Random Write | 8K |
| 100% Sequential Read | 256K |
| 100% Sequential Write | 256K |
| 100% Random, 50% Read / 50% Write | 8K |
Detailed Test Results
1 ESXi Host with 1 Datastore
In this configuration, setting the IO limit to 1 showed the largest advantage. On large-block sequential workloads in particular, the improvement was about 2x. However, a single datastore on a single DP pool is probably not a very common setup for flash storage.
1 ESXi Host with 8 Datastores
For Hitachi flash storage, it is recommended to create multiple LUNs/datastores from the same storage pool to increase I/O performance. This configuration still uses 1 ESXi host, but with multiple datastores. Compared with the single-datastore configuration in the previous test case, I/O performance increased 2x. The difference between an IO limit of 1 and an IO limit of 1000 is much smaller, especially for the 50% read / 50% write random workload.
2 ESXi Hosts with 8 Datastores
This configuration uses multiple ESXi hosts sharing the same 8 datastores. This is considered a more common configuration in a mixed VMware environment. In this configuration, the IO performance difference between the IO limit settings is much less significant.
We also ran an additional test with an IO limit of 20 on the 50% read / 50% write random workload, and it achieved the best result by a small percentage.
In summary, we still recommend using the default IO limit of 1000 for general VMware use cases, but lowering it to a value such as 20 or 1 might improve IO performance for certain workloads and environments.
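For readers who want to try a lower IO limit on a test device, the sketch below builds the per-device `esxcli` command line that, to our understanding, sets the round-robin IO operation limit on vSphere 6.x. The helper name and the device identifier are placeholders, not values from our test environment, and the command syntax should be verified against the documentation for your ESXi version before use.

```python
import shlex


def build_rr_iops_command(device_id, iops):
    """Return an esxcli command string that sets the round-robin IO limit.

    device_id is the NAA identifier of the LUN (placeholder in the example
    below); iops is the number of IOs sent down a path before switching.
    """
    if iops < 1:
        raise ValueError("IO limit must be at least 1")
    args = [
        "esxcli", "storage", "nmp", "psp", "roundrobin", "deviceconfig", "set",
        "--type=iops",
        f"--iops={iops}",
        f"--device={device_id}",
    ]
    # shlex.join quotes arguments only when the shell would require it.
    return shlex.join(args)


# Hypothetical device identifier, for illustration only.
print(build_rr_iops_command("naa.0000000000000000", 20))
```

The function only produces the command text. Running it on an ESXi host, and deciding which devices to apply it to, is left to the administrator.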