

Seamless Upgrade of Google Distributed Cloud (Software Only) on Bare Metal with Cassandra StatefulSet Workloads Backed by Hitachi VSP Storage

By Prasanta Dey

  

Introduction

Managing distributed cloud environments on bare-metal infrastructure can be challenging, especially when integrating enterprise storage for stateful workloads.

The setup consists of Google Distributed Cloud (software only) on bare metal, where a Cassandra StatefulSet runs with persistent volumes from a Hitachi VSP storage system. The volumes are provisioned by the Hitachi Storage Plug-in for Containers (HSPC) CSI driver, v3.16.1.

In this blog, we walk through deploying Google Distributed Cloud (software only) on bare metal, creating a Cassandra StatefulSet with volumes provisioned by the HSPC CSI driver (v3.16.1), upgrading the GDC cluster from v1.31 to v1.32, and verifying that the StatefulSet keeps running with no downtime during the upgrade.

The overall objective was to ensure stable, persistent storage for Cassandra throughout the upgrade process. The upgrade proved seamless: the HSPC plugin v3.16.1 delivered reliable storage and the StatefulSet remained healthy after the cluster upgrade, demonstrating the robustness of the solution.

Key Elements and Functions

Google Distributed Cloud (software only) for bare metal:

GDC (software only) is a cloud-centric container platform for building and managing modern hybrid and multi-cloud environments through a single pane of glass in the Google Cloud console. It runs on-premises in a bare metal environment and brings Google Kubernetes Engine (GKE) to on-premises data centers.

GDC unifies the management of infrastructure and applications across on-premises, edge, and multiple public clouds with a Google Cloud-backed control plane for consistent operation at scale.

HSPC: A CSI plugin from Hitachi used to provision persistent volumes from Hitachi storage systems to Red Hat OpenShift or Kubernetes clusters, preserving data after the container life cycle ends.

Hitachi Storage Plug-in for Containers (HSPC) provides connectivity between Kubernetes containers and Hitachi Virtual Storage Platform (VSP) storage systems.

Integration between GDC on-prem and VSP series storage using Container Storage Interface (CSI) means users can consume enterprise-class storage by abstracting the complexity of the underlying storage infrastructure.

The Hitachi CSI plugin HSPC is GDC Ready storage qualified. Refer to the following URL for validated versions.

https://docs.cloud.google.com/kubernetes-engine/enterprise/docs/resources/partner-storage#hitachivantara

Hitachi VSP: A VSP storage system was used to provide persistent volumes for the GDC cluster deployed on-premises.

Testbed Configuration

 

Resources used in this environment

Hardware:

Hitachi Advanced Server HA820 G2: 5x bare metal servers (1x control plane, 3x worker nodes, and 1x admin workstation).

Hitachi VSP 5600: Hitachi VSP 5000 series storage system serving as block storage for persistent volumes in the cluster. 2x 32 Gb FC ports were used.

Brocade and Cisco MDS switches: Fibre Channel (FC) switches providing SAN connectivity to the data center storage network.

Ethernet switch: Cisco network switch at the data center providing management network connectivity between resources.

 

Software:

Google Distributed Cloud (software only): v1.31 and v1.32.

HSPC: Hitachi Storage Plug-in for Containers (v3.16.1) CSI driver.

RHEL: Operating system used on the control plane, worker nodes, and admin workstation.

bmctl: Command-line tool for Google Distributed Cloud that simplifies cluster creation and management.

 

Procedure

Step 1: Create the Google Distributed Cloud (software only) for bare metal cluster, previously known as Anthos clusters on bare metal. Download the bmctl tool and use it to install the cluster. Before running the cluster create command below, prepare the environment by following the instructions in the installation guide.
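
As part of that preparation, bmctl can generate a cluster configuration skeleton that you then edit with your node IPs, VIPs, and SSH credentials. A minimal sketch (the project ID is a placeholder, not a value from this environment):

# Generate bmctl-workspace/hv-anthos/hv-anthos.yaml, enable the required Google APIs,
# and create the service accounts used by the cluster
./bmctl create config -c hv-anthos \
  --enable-apis --create-service-accounts --project-id=<your-project-id>

Edit the generated file to describe the control plane, worker nodes, and load balancer before running the create command below.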

[root@gdcv-admin baremetal]# bmctl create cluster --cluster=hv-anthos

[2025-12-09 04:43:30-0500] Running command: ./bmctl create cluster --cluster=hv-anthos

Please check the logs at bmctl-workspace/hv-anthos/log/create-cluster-20251209-044330/create-cluster.log

[2025-12-09 04:43:35-0500] Creating bootstrap cluster... OK

[2025-12-09 04:44:33-0500] Installing dependency components...

[2025-12-09 04:45:54-0500] Waiting for preflight check operator to show up... OK

[2025-12-09 04:46:24-0500] Waiting for preflight check job to finish... OK

[2025-12-09 04:49:44-0500] - Validation Category: machines and network

[2025-12-09 04:49:44-0500]      - [PASSED] node-network

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.159-gcp

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.160

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.164-gcp

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.166

[2025-12-09 04:49:44-0500]      - [PASSED] pod-cidr

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.159

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.160-gcp

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.164

[2025-12-09 04:49:44-0500]      - [PASSED] 172.23.57.166-gcp

[2025-12-09 04:49:44-0500]      - [PASSED] gcp

[2025-12-09 04:49:44-0500] Flushing logs... OK

[2025-12-09 04:49:45-0500] Applying resources for new cluster

[2025-12-09 04:49:45-0500] Waiting for cluster kubeconfig to become ready OK

[2025-12-09 04:56:55-0500] Writing kubeconfig file

[2025-12-09 04:56:55-0500] kubeconfig of cluster being created is present at bmctl-workspace/hv-anthos/hv-anthos-kubeconfig

[2025-12-09 04:56:55-0500] Please restrict access to this file as it contains authentication credentials of your cluster.

[2025-12-09 04:56:55-0500] Waiting for cluster to become ready OK

[2025-12-09 05:05:05-0500] Please run

[2025-12-09 05:05:05-0500] kubectl --kubeconfig bmctl-workspace/hv-anthos/hv-anthos-kubeconfig get nodes

[2025-12-09 05:05:05-0500] to get cluster nodes status.

[2025-12-09 05:05:05-0500] Waiting for node pools to become ready OK

[2025-12-09 05:05:25-0500] Waiting for metrics to become ready in GCP OK

[2025-12-09 05:05:46-0500] Waiting for cluster API provider to install in the created admin cluster OK

[2025-12-09 05:05:56-0500] Moving admin cluster resources to the created admin cluster

[2025-12-09 05:05:59-0500] Flushing logs... OK

[2025-12-09 05:05:59-0500] Deleting bootstrap cluster... OK

[root@gdcv-admin baremetal]#

 

After successful preflight checks and resource application, the cluster was ready with nodes running v1.31.13-gke.300.

[root@gdcv-admin baremetal]# kubectl get nodes

NAME        STATUS   ROLES           AGE   VERSION

cp164       Ready    control-plane   18m   v1.31.13-gke.300

worker159   Ready    worker          12m   v1.31.13-gke.300

worker160   Ready    worker          12m   v1.31.13-gke.300

worker166   Ready    worker          13m   v1.31.13-gke.300

 

Step 2: Install the Hitachi CSI plugin (HSPC) and configure storage. Refer to the HSPC Quick Reference Guide (see References) to install the driver. After installation, verify that HSPC is installed:

[root@gdcv-admin operator]# kubectl get hspc -n kube-system

NAME   READY   AGE

hspc   true    3m25s
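
Beyond the hspc custom resource, the driver registration can be sanity-checked with standard Kubernetes objects. A quick sketch (output varies by environment):

# CSIDriver object registered by the plugin (the name matches the provisioner used later)
kubectl get csidriver hspc.csi.hitachi.com

# Per-node CSI registration
kubectl get csinodes

# Controller and node plugin pods in kube-system
kubectl get pods -n kube-system | grep -i hspc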

 

Create a secret: Create a secret for the storage credentials. Provide the base64-encoded storage REST API URL, username, and password.

[root@gdcv-admin sample]# cat secret-e990.yaml

apiVersion: v1

kind: Secret

metadata:

  name: secret-e990

type: Opaque

data:

  # base64 encoded storage url. E.g.: echo -n "http://172.16.1.1" | base64

  url: aHR0cDovLzE3Mi4yMy42Ni4xNQ==

  # base64 encoded storage username. E.g.: echo -n "User01" | base64

  user: bWFpolRlbmFuY2U=

  # base64 encoded storage password. E.g.: echo -n "Password01" | base64

  password: cmFpZC1tYWludLLuYW5jZQ==

[root@gdcv-admin sample]#  kubectl create -f secret-e990.yaml

secret/secret-e990 created

[root@gdcv-admin sample]# kubectl get secret

NAME                      TYPE     DATA   AGE

metrics-server-operator   Opaque   0      36m

secret-e990               Opaque   3      7s
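
The same secret can also be created imperatively, letting kubectl handle the base64 encoding. A sketch, with placeholder values:

kubectl create secret generic secret-e990 \
  --from-literal=url=http://<storage-rest-api-address> \
  --from-literal=user=<storage-username> \
  --from-literal=password=<storage-password>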

 

Create a storage class: To enable persistent storage for Cassandra, we created a storage class. The StorageClass file contains the storage settings that Storage Plug-in for Containers needs to work with your environment. Update the YAML file with the StorageClass name, storage serial number, HDP pool ID, storage port IDs, connection type, filesystem type, and secret name. The following sample shows the required parameters.

[root@gdcv-admin sample]# cat sc-e990.yaml

apiVersion: storage.k8s.io/v1

kind: StorageClass

metadata:

  name: sc-e990

  annotations:

    kubernetes.io/description: Hitachi Storage Plug-in for Containers

provisioner: hspc.csi.hitachi.com

reclaimPolicy: Delete

volumeBindingMode: Immediate

allowVolumeExpansion: true

parameters:

  serialNumber: "415855"

  poolID: "3"

  portID: CL5-A,CL6-A

  connectionType: fc

  csi.storage.k8s.io/fstype: ext4

  csi.storage.k8s.io/node-publish-secret-name: "secret-e990"

  csi.storage.k8s.io/node-publish-secret-namespace: "default"

  csi.storage.k8s.io/provisioner-secret-name: "secret-e990"

  csi.storage.k8s.io/provisioner-secret-namespace: "default"

  csi.storage.k8s.io/controller-publish-secret-name: "secret-e990"

  csi.storage.k8s.io/controller-publish-secret-namespace: "default"

  csi.storage.k8s.io/node-stage-secret-name: "secret-e990"

  csi.storage.k8s.io/node-stage-secret-namespace: "default"

  csi.storage.k8s.io/controller-expand-secret-name: "secret-e990"

  csi.storage.k8s.io/controller-expand-secret-namespace: "default"

[root@gdcv-admin sample]#

[root@gdcv-admin sample]# kubectl create -f sc-e990.yaml

storageclass.storage.k8s.io/sc-e990 created

[root@gdcv-admin sample]# kubectl get sc

NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE

anthos-system   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  37m

local-disks     kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  37m

local-shared    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  37m

sc-e990         hspc.csi.hitachi.com           Delete          Immediate              true                   11s
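
Before deploying Cassandra, the new storage class can be exercised with a small test PVC; the 1Gi pvc-e990 volume that appears in the post-upgrade PVC listing appears to have been created for this purpose. A minimal sketch:

# pvc-e990.yaml: 1Gi test claim against the Hitachi storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-e990
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sc-e990
  resources:
    requests:
      storage: 1Gi

Apply it with kubectl create -f pvc-e990.yaml and confirm it reaches the Bound state with kubectl get pvc.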

 

Step 3: Deploy Cassandra StatefulSet

We deployed a Cassandra StatefulSet with three replicas using Hitachi storage.

Service: Create a headless Service for the StatefulSet. Open the URL cassandra-service.yml for the manifest file.
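
For readers who cannot open the link, a minimal headless Service consistent with the kubectl get svc output below would look roughly like this (a sketch; the app: cassandra selector is assumed from the upstream Cassandra StatefulSet example):

apiVersion: v1
kind: Service
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  clusterIP: None        # headless: gives each Cassandra pod a stable DNS name
  ports:
    - port: 9042         # CQL native transport port
  selector:
    app: cassandra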

 [root@gdcv-admin sample]# kubectl apply -f cassandra-service.yaml

service/cassandra created

 [root@gdcv-admin sample]# kubectl get svc

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE

cassandra    ClusterIP   None         <none>        9042/TCP   42m

kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP    21h

  

StatefulSet: Open the link cassandra-statefulset.yml for the manifest file of the Cassandra StatefulSet. Provide the StorageClass name and the size of the volume in the volume section; a representative excerpt is shown below.

 
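
The full manifest is in the linked file; the volume-related portion, consistent with the PVCs created in this environment (12Gi, storage class sc-e990, claim template named cassandra-data), looks roughly like this sketch (the /cassandra_data mount path is assumed from the upstream example):

# Excerpt from the StatefulSet spec
        volumeMounts:
          - name: cassandra-data
            mountPath: /cassandra_data   # Cassandra data directory (assumed path)
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: sc-e990        # Hitachi VSP storage class created earlier
        resources:
          requests:
            storage: 12Gi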

Run the following command to create the Cassandra StatefulSet

[root@gdcv-admin sample]# kubectl apply -f cassandra-statefulset.yaml

statefulset.apps/cassandra created

[root@gdcv-admin sample]#

Verify that the Cassandra nodes are healthy and that the ring has formed across the cluster.

[root@gdcv-admin sample]# kubectl get sts

NAME        READY   AGE

cassandra   3/3     51m

[root@gdcv-admin sample]#

 [root@gdcv-admin sample]# kubectl get pod

NAME          READY   STATUS    RESTARTS   AGE

cassandra-0   1/1     Running   0          8m35s

cassandra-1   1/1     Running   0          10m

cassandra-2   1/1     Running   0          11m

[root@gdcv-admin sample]# kubectl exec -it cassandra-0 -- nodetool status

Datacenter: DC1-K8Demo

======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack

UN  192.169.5.172  210.31 KiB  32           64.8%             da9eb032-e726-419c-86d1-c95399b90d69  Rack1-K8Demo

UN  192.169.3.142  229.36 KiB  32           65.1%             70331a1e-ff02-4e54-96df-a07541da75c1  Rack1-K8Demo

UN  192.169.6.181  236.33 KiB  32           70.1%             5f599838-a957-46a6-b712-9d0bc7eeebda  Rack1-K8Demo

[root@gdcv-admin sample]#
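
To make the post-upgrade persistence check concrete, a small amount of test data can be written before the upgrade. This step is optional and assumes cqlsh is available in the Cassandra image used by the StatefulSet:

# Create a keyspace and table, and insert a marker row before upgrading
kubectl exec cassandra-0 -- cqlsh -e "
  CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
  CREATE TABLE IF NOT EXISTS demo.kv (k text PRIMARY KEY, v text);
  INSERT INTO demo.kv (k, v) VALUES ('upgrade-test', 'before-1.32');"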

 

Verify that each pod has mounted a persistent volume provisioned by the HSPC CSI driver.

[root@gdcv-admin sample]# kubectl get pvc

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE

cassandra-data-cassandra-0   Bound    pvc-d377fe31-d07f-4f4b-a26b-cbc660a6cefe   12Gi       RWO            sc-e990        <unset>                 42m

cassandra-data-cassandra-1   Bound    pvc-6a9079f5-669d-4770-84b5-8719e6e09746   12Gi       RWO            sc-e990        <unset>                 41m

cassandra-data-cassandra-2   Bound    pvc-78bffc07-6914-48b6-986e-f4efaae6d07c   12Gi       RWO            sc-e990        <unset>                 40m

[root@gdcv-admin sample]#
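
The mount inside each pod can also be inspected directly. A sketch, assuming the data volume is mounted at /cassandra_data as in the upstream example:

# Show the block device backing the Cassandra data directory
kubectl exec cassandra-0 -- df -h /cassandra_data

# Map each PVC to its underlying persistent volume and storage class
kubectl get pv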

 

Step 4: Upgrade Cluster from v1.31 to v1.32

Download the latest bmctl binary and initiate the upgrade:

[root@gdcv-admin baremetal]#  gcloud storage cp gs://anthos-baremetal-release/bmctl/1.32.700-gke.64/linux-amd64/bmctl .

Copying gs://anthos-baremetal-release/bmctl/1.32.700-gke.64/linux-amd64/bmctl to file://./bmctl

Completed files 1/1 | 129.3MiB/129.3MiB

Average throughput: 49.8MiB/s

[root@gdcv-admin baremetal]# chmod +x bmctl

[root@gdcv-admin baremetal]# bmctl version

bmctl version: 1.32.700-gke.64, git commit: 129fb6950142c813e047b5e1edc155dd3b7cc191, build date: 2025-11-22 12:27:40 PST , metadata image digest: sha256:24e91590349c5ad47df602196f16172e2dae1399826af479b326eceeda458900

[root@gdcv-admin baremetal]#

 

Update anthosBareMetalVersion in the cluster configuration file.

[root@gdcv-admin hv-anthos]# cat hv-anthos.yaml |grep anthosBareMetalVersion

 anthosBareMetalVersion: 1.32.700-gke.64

[root@gdcv-admin hv-anthos]#
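
As the upgrade log below also recommends, take a cluster backup before upgrading. A sketch using the same workspace paths as this environment (see the upgrade documentation in the References for the full options):

./bmctl backup cluster -c hv-anthos \
  --kubeconfig /root/baremetal/bmctl-workspace/hv-anthos/hv-anthos-kubeconfig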

 

Cluster version before upgrading.

[root@gdcv-admin baremetal]# kubectl get nodes -o wide

NAME        STATUS   ROLES           AGE   VERSION            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                KERNEL-VERSION                 CONTAINER-RUNTIME

cp164       Ready    control-plane   21h   v1.31.13-gke.300   172.23.57.164   <none>        Red Hat Enterprise Linux 9.4 (Plow)     5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.28-gke.0

worker159   Ready    worker          21h   v1.31.13-gke.300   172.23.57.159   <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.el8_10.x86_64       containerd://1.7.28-gke.0

worker160   Ready    worker          21h   v1.31.13-gke.300   172.23.57.160   <none>        Red Hat Enterprise Linux 9.4 (Plow)     5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.28-gke.0

worker166   Ready    worker          21h   v1.31.13-gke.300   172.23.57.166   <none>        Red Hat Enterprise Linux 9.2 (Plow)     5.14.0-284.11.1.el9_2.x86_64   containerd://1.7.28-gke.0

[root@gdcv-admin baremetal]#

 

Upgrade the cluster.

[root@gdcv-admin baremetal]# bmctl upgrade cluster -c hv-anthos --kubeconfig /root/baremetal/bmctl-workspace/hv-anthos/hv-anthos-kubeconfig

[2025-12-10 03:01:21-0500] Running command: ./bmctl upgrade cluster -c hv-anthos --kubeconfig /root/baremetal/bmctl-workspace/hv-anthos/hv-anthos-kubeconfig

Please check the logs at bmctl-workspace/hv-anthos/log/upgrade-cluster-20251210-030121/upgrade-cluster.log

[2025-12-10 03:01:21-0500] Before upgrade, please use `bmctl backup cluster` to create a backup.

[2025-12-10 03:01:27-0500] "spec.gkeOnPremAPI" isn't specified in the configuration file of cluster "hv-anthos". This cluster will enroll automatically to GKE onprem API for easier management with gcloud, UI and terraform after upgrade if GKE Onprem API is enabled in Google Cloud services. To unenroll, set "spec.gkeOnPremAPI.enabled" to "false" after upgrade.

[2025-12-10 03:01:28-0500] The current version of cluster is 1.31.1100-gke.40

[2025-12-10 03:01:28-0500] The version to be upgraded to is 1.32.700-gke.64

[2025-12-10 03:01:28-0500] Waiting for preflight check operator to show up... OK

[2025-12-10 03:01:38-0500] Waiting for preflight check job to finish... OK

[2025-12-10 03:03:18-0500] - Validation Category: machines and network

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.164-gcp

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.166

[2025-12-10 03:03:18-0500]      - [PASSED] gcp

[2025-12-10 03:03:18-0500]      - [PASSED] node-network

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.159

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.166-gcp

[2025-12-10 03:03:18-0500]      - [PASSED] cluster-upgrade-check

[2025-12-10 03:03:18-0500]      - [PASSED] pod-cidr

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.159-gcp

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.160

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.160-gcp

[2025-12-10 03:03:18-0500]      - [PASSED] 172.23.57.164

[2025-12-10 03:03:18-0500] Flushing logs... OK

[2025-12-10 03:03:18-0500] Bumping the old version 1.31.1100-gke.40 to new version 1.32.700-gke.64 in the cluster resource.

[2025-12-10 03:03:18-0500] Waiting for machines to upgrade...  pending: 4/4 upgraded: 0/4

I1210 03:03:40.164895 1487306 request.go:697] Waited for 1.180098689s due to client-side throttling, not priority and fairness, request: GET:https://172.23.56.151:443/api/v1/namespaces/cluster-hv-anthos/pods/bm-system-machine-upgrade-preflight-chec67f2d98450a7c108518g2p5/log?container=a

[2025-12-10 03:03:18-0500] Waiting for machines to upgrade... OK

[2025-12-10 03:25:09-0500] Writing kubeconfig file: clusterName = hv-anthos, path = bmctl-workspace/hv-anthos/hv-anthos-kubeconfig

[root@gdcv-admin baremetal]#

  

The upgrade process ran the machine and network preflight checks, then rolled out the new version across all nodes, which now report Kubernetes v1.32.9-gke.700.

[root@gdcv-admin baremetal]# kubectl get nodes -o wide

NAME        STATUS   ROLES           AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                KERNEL-VERSION                 CONTAINER-RUNTIME

cp164       Ready    control-plane   22h   v1.32.9-gke.700   172.23.57.164   <none>        Red Hat Enterprise Linux 9.4 (Plow)     5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.29-gke.1

worker159   Ready    worker          22h   v1.32.9-gke.700   172.23.57.159   <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.el8_10.x86_64       containerd://1.7.29-gke.1

worker160   Ready    worker          22h   v1.32.9-gke.700   172.23.57.160   <none>        Red Hat Enterprise Linux 9.4 (Plow)     5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.29-gke.1

worker166   Ready    worker          22h   v1.32.9-gke.700   172.23.57.166   <none>        Red Hat Enterprise Linux 9.2 (Plow)     5.14.0-284.11.1.el9_2.x86_64   containerd://1.7.29-gke.1

[root@gdcv-admin baremetal]#

 

Post-Upgrade Validation

After the upgrade, Cassandra pods remained stable:

[root@gdcv-admin baremetal]# kubectl get sts

NAME        READY   AGE

cassandra   3/3     3h4m

[root@gdcv-admin baremetal]#

 

Verify that the persistent volumes are intact and that the cluster continues to serve workloads without disruption.

[root@gdcv-admin hv-anthos]# kubectl get pod

NAME          READY   STATUS    RESTARTS   AGE

cassandra-0   1/1     Running   0          4m56s

cassandra-1   1/1     Running   0          86s

cassandra-2   1/1     Running   0          9m12s

[root@gdcv-admin hv-anthos]#

[root@gdcv-admin baremetal]# kubectl get pvc

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE

cassandra-data-cassandra-0   Bound    pvc-d377fe31-d07f-4f4b-a26b-cbc660a6cefe   12Gi       RWO            sc-e990        <unset>                 179m

cassandra-data-cassandra-1   Bound    pvc-6a9079f5-669d-4770-84b5-8719e6e09746   12Gi       RWO            sc-e990        <unset>                 178m

cassandra-data-cassandra-2   Bound    pvc-78bffc07-6914-48b6-986e-f4efaae6d07c   12Gi       RWO            sc-e990        <unset>                 177m

pvc-e990                     Bound    pvc-cfa6a148-60c5-4c05-9053-13f6a30297b9   1Gi        RWO            sc-e990        <unset>                 22h

[root@gdcv-admin baremetal]#
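
Optionally, re-run nodetool status and read back the marker row written before the upgrade to confirm that the ring and the data survived intact (assumes cqlsh is available in the image):

# Ring status after the upgrade
kubectl exec cassandra-0 -- nodetool status

# The pre-upgrade test row should still be present
kubectl exec cassandra-0 -- cqlsh -e "SELECT * FROM demo.kv WHERE k = 'upgrade-test';"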

 

Key Takeaways

Cluster Upgrade: Smooth transition from v1.31 to v1.32 using bmctl upgrade.

Hitachi CSI Integration: HSPC plugin v3.16.1 provided robust storage for stateful workloads.

Stateful Workload Resilience: StatefulSet remained healthy post-upgrade, validating the resilience of the setup.

Conclusion

This approach demonstrates how enterprise-grade storage and distributed databases can coexist in a modern hybrid cloud environment. By leveraging GDC (Software Only) on Bare Metal and Hitachi VSP storage, organizations can run stateful workloads with minimal downtime during upgrades.

 

Reference

·       Upgrade GDC for bare metal cluster: https://docs.cloud.google.com/kubernetes-engine/distributed-cloud/bare-metal/docs/how-to/upgrade

·       Quick Reference Guide on Hitachi Storage Plug-in for Containers Version 3.16.1: https://docs.hitachivantara.com/api/khub/documents/ue0_hsREFOjsI8_GLe065Q/content


#HybridCloudServices


#VSP5000Series