Do you have mission critical apps running on Red Hat OpenShift (OCP) in your datacenter?
Is application business continuity or zero downtime one of your top priorities?
Do you need to move stateful applications or virtual machines (VMs) across multiple datacenters?
If your answer is YES to any of the questions above, this multi-site OpenShift stretch cluster solution is for you.
Hitachi Vantara, in collaboration with Red Hat, has developed and validated a modern high availability (HA) architecture that uses a stretched OpenShift cluster together with Hitachi VSP One Block. It combines our stretched storage solution, based on global-active device (GAD), with Hitachi Storage Plugin for Containers (HSPC), the Hitachi CSI driver, and its associated operators to automate provisioning of both standard and stretched persistent volumes.
The high-level diagram of this solution is shown below. These cities are just an example.
Key features of global-active device include:
- Synchronous Replication: Ensures real-time data mirroring between primary and secondary storage systems, maintaining data consistency and integrity.
- High Availability: Provides continuous server I/O operations even during failures, ensuring that applications remain operational.
- Disaster Recovery: Facilitates rapid recovery from unexpected failures by enabling seamless failover and failback without impacting storage.
- Load Balancing: Allows for the migration of virtual storage machines without impacting storage, optimizing resource utilization.
- Active-Active Design: Enables production workloads on two systems simultaneously, ensuring full data consistency and protection.
- Zero Recovery Time Objectives (RTO): Offers zero downtime and no data loss for applications that require continuous operations.
- Global Storage Virtualization: Allows read and write copies of the same data across two systems or geographic locations.
- Simplified Operations: Automates high availability and simplifies distributed system design.
- Fault-Tolerant Infrastructure: Provides failover clustering and server load balancing without impacting storage.
The diagram below shows more detailed components and architecture.
In this configuration, each VM disk is a global-active device pair. The pair is provisioned automatically and exposed to the OpenShift cluster as a stretched persistent volume (PV), and to the VM pod(s) through a stretched persistent volume claim (PVC) that uses an HSPC-enabled StorageClass. The HSPC operator can be installed from OperatorHub in OpenShift.
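As a rough illustration of how a workload might request such a stretched volume, the sketch below builds a standard Kubernetes PersistentVolumeClaim manifest in Python. The StorageClass name `hspc-gad-stretched`, the claim name, the namespace, and the size are hypothetical placeholders, not values from the validated configuration. The StorageClass itself, including whatever HSPC parameters enable global-active device, is assumed to exist already and is not shown. ReadWriteMany access with Block volume mode is also an assumption, chosen because it is a common pattern for VM disks that must support live migration.

```python
import json


def build_stretched_pvc(name, namespace, size, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    Only standard Kubernetes PVC fields are used; nothing here is
    specific to HSPC apart from the (hypothetical) StorageClass name.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: shared block access so a VM can live-migrate.
            "accessModes": ["ReadWriteMany"],
            "volumeMode": "Block",
            "resources": {"requests": {"storage": size}},
            # Hypothetical name of an HSPC-backed, GAD-enabled StorageClass.
            "storageClassName": storage_class,
        },
    }


pvc = build_stretched_pvc("vm-disk-0", "demo-vms", "50Gi", "hspc-gad-stretched")
print(json.dumps(pvc, indent=2))
```

The printed JSON is a valid manifest format for `oc apply -f -`, so the output could be piped to the cluster once the placeholder values are replaced with real ones.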
The distance and network requirements between sites are:
- GAD requires a round-trip time (RTT) of 10 ms or less between the two VSP One storage systems.
- The OCP control plane requires an RTT of 500 ms or less between the VM in the public cloud and the control plane nodes in the data centers.
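A minimal sketch of how these two limits could be checked during site planning is shown below. The thresholds come from the requirements above; the function names and the sample RTT values are made up for illustration, and the measurements themselves are assumed to come from whatever tool you already use (for example, ping statistics).

```python
# Thresholds from the requirements above, in milliseconds.
GAD_MAX_RTT_MS = 10.0
CONTROL_PLANE_MAX_RTT_MS = 500.0


def validate_sites(storage_rtt_ms, control_plane_rtt_ms):
    """Return a list of violations; an empty list means both links pass."""
    problems = []
    if storage_rtt_ms > GAD_MAX_RTT_MS:
        problems.append(
            f"storage link RTT {storage_rtt_ms} ms exceeds {GAD_MAX_RTT_MS} ms"
        )
    if control_plane_rtt_ms > CONTROL_PLANE_MAX_RTT_MS:
        problems.append(
            f"control plane RTT {control_plane_rtt_ms} ms exceeds "
            f"{CONTROL_PLANE_MAX_RTT_MS} ms"
        )
    return problems


# Sample (invented) measurements: one passing pair, one failing pair.
print(validate_sites(4.2, 85.0))
print(validate_sites(14.0, 85.0))
```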
The main validated resiliency test cases are:
- Zero downtime in the event of a storage system failure or a partial network failure, provided at least one path to the global-active device pair remains active.
- Minimal downtime (a few minutes) in cases of host failure, total network failure, or site failure, as VMs need to restart on a new host.
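The two outcomes above can be summarized as a small lookup, which may be handy when writing a runbook or test plan. This is only a restatement of the listed cases, not a model of how the cluster behaves; the enum and function names are illustrative, and the case with no active path to the pair is marked as unspecified because the text above does not cover it.

```python
from enum import Enum


class Failure(Enum):
    STORAGE_SYSTEM = "storage system failure"
    PARTIAL_NETWORK = "partial network failure"
    HOST = "host failure"
    TOTAL_NETWORK = "total network failure"
    SITE = "site failure"


# Failure types listed under the zero-downtime case.
ZERO_DOWNTIME_CASES = {Failure.STORAGE_SYSTEM, Failure.PARTIAL_NETWORK}


def expected_impact(failure, gad_path_active=True):
    """Map a failure type to the outcome stated in the test cases above."""
    if failure in ZERO_DOWNTIME_CASES:
        if gad_path_active:
            return "zero downtime"
        # The source only covers the case where a path remains active.
        return "unspecified: no active path to the pair"
    return "minimal downtime: VMs restart on a new host"


for f in Failure:
    print(f"{f.value}: {expected_impact(f)}")
```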
For more detailed validations, see the reference architecture paper below:
https://www.hitachivantara.com/content/dam/hvac/pdfs/architecture-guide/deploy-openshift-stretched-clusters-with-storage-one-platform.pdf
Summary
A stretched persistent volume (PV) dynamically provisioned by the Hitachi CSI driver is a volume with synchronous, bi-directional global-active device replication (also referred to as active-active storage) between VSP storage systems at different sites, all within a single Kubernetes or Red Hat OpenShift cluster spanning three sites.
Benefits:
- High resiliency and business continuity for mission critical applications:
  - Zero downtime in the event of a storage system failure or a partial network failure
  - Minimal downtime (a few minutes) in cases of host failure, total network failure, or site failure
- Application and data mobility and load balancing:
  - VM live migration between the two data centers
  - Easy movement of stateful applications between the two data centers
For more information, see the reference architecture paper:
https://www.hitachivantara.com/content/dam/hvac/pdfs/architecture-guide/deploy-openshift-stretched-clusters-with-storage-one-platform.pdf
Key Elements and Functions
Hitachi Virtual Storage Platform One Block (VSP One Block)
The Hitachi Virtual Storage Platform One Block series simplifies system setup and management through Hitachi Clear Sight and VSP One Block Administrator. Dynamic Drive Protection reduces RAID complexity, and always-on compression and deduplication enhance simplicity.
Dynamic Carbon Reduction optimizes energy usage by switching CPUs to ECO mode during low activity. Adaptive Data Reduction (ADR) is always on, enhancing efficiency and reducing the overall CO2 footprint.
Thin Image Advanced (TIA) integrates with major snapshot ecosystems, prioritizing security by defending against threats and ensuring data confidentiality. CyberArk Privileged Access Manager plugins enhance block storage system security by prioritizing data confidentiality, ensuring compliance, and actively defending against security threats.
Global-Active Device (GAD)
Global-active device is Hitachi’s data mirroring technology that delivers high availability and disaster recovery for storage environments by synchronously replicating data between geographically separated storage systems. This active-active replication maintains real-time data consistency and integrity, allowing seamless failover and failback operations, which are vital for continuous data availability and rapid recovery. The technology configures virtual storage machines on both the primary and secondary systems with identical virtual LDEV numbers, so the host sees the paired volumes as a single volume. A quorum disk (located on an external storage system, an iSCSI server, or even in a cloud environment) acts as a heartbeat to monitor and coordinate the pair, ensuring that host I/O is correctly redirected if a communication failure occurs.
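To make the "host sees the paired volumes as one" idea concrete, here is a toy model in Python. It is not how GAD or host multipathing is implemented: the `Path` fields, the `vsm_id` identifier, and the sample values are hypothetical, and the quorum disk is deliberately left out. The only point is that paths reporting the same virtual identity collapse into one device, and I/O stays possible while any one of those paths is online.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    physical_array: str  # which storage system serves this path
    vsm_id: str          # hypothetical ID of the virtual storage machine
    virtual_ldev: int    # virtual LDEV number presented to the host
    online: bool = True


def group_by_virtual_identity(paths):
    """Group paths into devices keyed by (vsm_id, virtual_ldev)."""
    devices = defaultdict(list)
    for p in paths:
        devices[(p.vsm_id, p.virtual_ldev)].append(p)
    return dict(devices)


def io_possible(device_paths):
    """I/O can continue while at least one path to the device is online."""
    return any(p.online for p in device_paths)


# Two physical volumes, one per site, presenting the same virtual identity.
paths = [
    Path("array-site-a", "vsm-1", 0x10),
    Path("array-site-b", "vsm-1", 0x10),
]
devices = group_by_virtual_identity(paths)
print(len(devices))  # the host sees a single device

# Simulate losing the site A array: only the site B path stays online.
degraded = [
    Path("array-site-a", "vsm-1", 0x10, online=False),
    Path("array-site-b", "vsm-1", 0x10),
]
print(io_possible(degraded))
```

In this toy model the pair shows up as one device with two paths, which mirrors the condition stated earlier that at least one path to the global-active device pair must remain active.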