Table of Contents (Quick Links)
Executive Summary
Test Environment Configuration
Test Design and Implementation
Test Procedures
Environment Prerequisite
Test Items
Test Results
Summary
Executive Summary
Hitachi Virtual Storage Platform (VSP) 5000 series storage systems provide high availability and represent the industry’s highest performing and most scalable storage solution. The VSP 5000 series offers high performance, high availability, and reliability for enterprise-class data centers and features the industry's most comprehensive suite of local and remote data protection capabilities. In a high-availability environment, the primary SVP is the active unit, while the secondary SVP acts as the hot standby. If the primary SVP fails, the hot standby SVP takes over, so the dual-SVP configuration eliminates the SVP as a single point of failure.
In this case, we used an OpenShift environment along with Hitachi Storage Plug-in for Containers (HSPC) to test the high availability configuration for the SVP.
The primary purposes of the tests are as follows:
- Check whether the dual SVP failover completes successfully (observe the status change of the primary and standby SVPs), and then try reinstating the SVPs to their original state after failover. Also, measure the time taken for the SVP failover (moving completely from primary to standby).
- Run OpenShift operations during the failover. This includes a mix of storage operations and container operations (both using HSPC).
Test Environment Configuration
A detailed component summary of the test environment is provided in Table 1.
Table 1: Testbed Information
Testbed Configuration
The following image shows a high-level overview of the testbed configuration:
The following lists the environment prerequisites:
- One VSP 5600-2N storage system used as the target storage system (Dual SVP configuration).
- A six-node Red Hat OpenShift cluster installed using the “BareMetal (x86-64) Assisted Installer” from the official Red Hat site.
- The OpenShift cluster consisted of three worker nodes and three controller/master nodes.
- HSPC v3.10 installed using OperatorHub (after the cluster was deployed).
- After deploying HSPC v3.10, the following was configured:
- A Persistent Volume Claim (PVC) and its underlying Persistent Volume (PV) are created by default.
- A StatefulSet application consisting of WordPress and MariaDB was installed using the Helm tool, and the related Pods were created automatically.
- Initially, two MariaDB Pods were created for testing (a verification sketch follows this list).
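The exact resource names depend on the Helm chart used. As a rough illustration only, the following Python sketch uses the Kubernetes Python client to confirm that the StatefulSet Pods are running and that each PVC is bound to an underlying PV provisioned through HSPC. The namespace ("wordpress") and the label selector are assumptions for illustration, not values taken from the test bed.

```python
# Minimal verification sketch. Assumptions: namespace "wordpress" and the
# label "app.kubernetes.io/name=mariadb" are illustrative placeholders,
# not values recorded from the test environment.
from kubernetes import client, config

config.load_kube_config()          # use the cluster's kubeconfig (oc/kubectl)
core = client.CoreV1Api()

# Confirm the MariaDB Pods created by the StatefulSet are running.
pods = core.list_namespaced_pod(
    namespace="wordpress",
    label_selector="app.kubernetes.io/name=mariadb",
)
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)

# Confirm each PVC is bound to an underlying PV.
pvcs = core.list_namespaced_persistent_volume_claim(namespace="wordpress")
for pvc in pvcs.items:
    print(pvc.metadata.name, pvc.status.phase, pvc.spec.volume_name)
```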
Test Items
The following lists the test items targeted in this project:
- Test the failover of the SVP in a Dual SVP configuration.
- During an SVP failover, the ‘Resize existing PVC (ONLINE- Storage Operation using HSPC)’ operation in OpenShift was triggered. We observed the behavior of the operation as recorded in the OpenShift UI.
- During an SVP failover, the ‘Scale the StatefulSet Replica from 2 to 3 (DB Pods)’ operation in OpenShift was initiated. We observed the behavior of the operation as recorded in the OpenShift UI.
- During an SVP failover, the ‘Scale down the underlying pods for StatefulSets app from 3 to 2’ operation in OpenShift was triggered. We observed the behavior of the operation as recorded in the OpenShift UI.
- Test items 2, 3, and 4 were tested serially, and multiple ‘Switch SVP’ operations were triggered as needed (a sketch of these operations follows this list).
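The three OpenShift operations behind test items 2, 3, and 4 map to standard Kubernetes actions: expanding a PVC and scaling a StatefulSet up or down. The sketch below shows these actions with the Kubernetes Python client; the PVC name, StatefulSet name, namespace, and target size are illustrative placeholders rather than values from the test environment.

```python
# Hedged sketch of the three OpenShift operations used as test items.
# "data-mariadb-0", "mariadb", "wordpress", and "10Gi" are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
apps = client.AppsV1Api()

# Test item 2: online resize of an existing PVC (the CSI driver, here HSPC,
# expands the backing volume when the requested size grows).
core.patch_namespaced_persistent_volume_claim(
    name="data-mariadb-0",
    namespace="wordpress",
    body={"spec": {"resources": {"requests": {"storage": "10Gi"}}}},
)

# Test item 3: scale the StatefulSet replicas from 2 to 3.
apps.patch_namespaced_stateful_set_scale(
    name="mariadb",
    namespace="wordpress",
    body={"spec": {"replicas": 3}},
)

# Test item 4: scale the underlying Pods back down from 3 to 2.
apps.patch_namespaced_stateful_set_scale(
    name="mariadb",
    namespace="wordpress",
    body={"spec": {"replicas": 2}},
)
```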
Test Results
Checking the Failover of SVP in Dual SVP Configuration
In the lab, the ‘Switch SVP’ operation was triggered, and it was verified that the SVP switch completes successfully.
Observations are as follows:
- ‘Switch SVP’ takes about 25 minutes to complete, with most of that time spent copying the configuration. The SVP remains available for most of this period.
- At the end of the process, there is an actual switchover window of approximately three minutes during which the SVP stops responding (user RDP connections to the primary SVP time out because of the restart). These three minutes are the critical failover period, and all HSPC and OpenShift testing was performed during this window (a simple way to time it is sketched after these observations).
- Following the switchover (failover), the SVP continued to operate with the same primary IP address.
- Other than the change in desktops, it is very difficult to distinguish the primary SVP OS from the secondary one.
- Initiating another ‘Switch SVP’ operation reinstates the SVPs to their original state.
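Because the failover time is measured from the moment the primary SVP stops responding until it responds again, a simple reachability probe is enough to capture the window. The following is a minimal sketch of such a probe; the SVP address and port are placeholders, not values from the test bed.

```python
# Rough sketch for timing the switchover window: poll the primary SVP's
# management address and log when it stops and resumes responding.
# The address and port below are placeholders (documentation range).
import socket
import time
from datetime import datetime

SVP_HOST = "192.0.2.10"   # placeholder primary SVP IP
SVP_PORT = 443            # management port assumed reachable over TCP

def svp_reachable(timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SVP succeeds."""
    try:
        with socket.create_connection((SVP_HOST, SVP_PORT), timeout=timeout):
            return True
    except OSError:
        return False

was_up = True
while True:
    up = svp_reachable()
    if up != was_up:
        state = "reachable" if up else "unreachable"
        print(f"{datetime.now().isoformat()} SVP became {state}")
        was_up = up
    time.sleep(5)
```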
The following are some screenshots of the full procedure:
Figure 1: Switch SVP initiate operation.
Resizing the existing PVC (ONLINE- Storage Operation using HSPC)
The ‘Resize existing PVC’ operation was performed during the SVP failover window (the approximately three minutes of downtime); it was triggered about a minute after the actual switching started. The operation initially failed and waited for the storage to become available. Once the storage was available, it completed successfully, as seen in the OpenShift UI.
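One way to observe this retry behavior outside the OpenShift UI is to poll the PVC until its reported capacity matches the requested size. The sketch below assumes the same placeholder names as the earlier sketches and a matching size format; it is an illustration, not the procedure used in the test.

```python
# Poll the resized PVC until its reported capacity matches the request.
# Names are placeholders; the string comparison assumes the requested and
# reported sizes use the same unit formatting (for example, "10Gi").
import time
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

while True:
    pvc = core.read_namespaced_persistent_volume_claim(
        name="data-mariadb-0", namespace="wordpress"
    )
    requested = pvc.spec.resources.requests["storage"]
    actual = (pvc.status.capacity or {}).get("storage")
    print(f"requested={requested} actual={actual} phase={pvc.status.phase}")
    if actual == requested:
        break                    # expansion finished once the SVP is back
    time.sleep(10)
```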
The screenshots and HSPC logs are as follows:
Figure 25: The HSPC logs show that the scale-down of the Pod operation completes when the storage is available.
Summary
- ‘Switch SVP’ in a dual SVP configuration was successful and took about three minutes for the actual switch/failover. During this period, the SVP was unavailable (downtime).
- When triggered during SVP failover (downtime), ‘Resize PVC’ fails. From the OpenShift UI, it was observed that the operation kept retrying until the Switch SVP completed, after which it ran successfully.
- When triggered during SVP failover (downtime), ‘Scaling up the Replica’ also fails. From the OpenShift UI, it was observed that the operation kept retrying until the Switch SVP completed, after which it ran successfully.
- When triggered during SVP failover (downtime), ‘Scaling down the underlying pods for Stateful app’ passes. However, the Persistent Volume remains in the ‘Released’ state until the SVP is available.
All tests were completed successfully under lab conditions.
#DataProtection #FlashStorage