
Advanced Snapshots to Immutability: IRIS Database Protection on Hitachi VSP via HDPS

By Dipta Kundu posted 2 days ago

  

This blog provides an end-to-end walkthrough of how HDPS protects an InterSystems IRIS database, from intelligent content discovery to atomic application quiescence through the IRIS ExternalFreeze mechanism. It then explores how Hitachi TIA SafeSnap snapshots are created and committed at the storage layer, and how these snapshots enable both rapid out-of-place recovery and long-term immutable tiering to VSP One Object storage.

Section A: Introduction - Why Hardware Snapshots for IRIS? 

The integration of InterSystems IRIS with Hitachi Virtual Storage Platform (VSP), orchestrated through HDPS IntelliSnap, represents a highly advanced data protection architecture. For mission-critical environments, especially in healthcare, where Epic Systems relies on IRIS, rapid, application-consistent snapshots are essential for operational resilience. Traditional backups (agent-based, file-level, or database exports) introduce I/O overhead and long backup windows, making them unsuitable for always-on environments.

Hardware snapshots eliminate this constraint. With Hitachi Thin Image Advanced (TIA), combined with IntelliSnap orchestration and IRIS ExternalFreeze / ExternalThaw, the production impact is reduced to seconds while point-in-time consistency is preserved.

Section B: Architecture Overview: Components & Data Flow

The environment under analysis consists of a source host (DBIRIS), Hitachi VSP One Block High End, a CommServe server (siscommserve-2), a dedicated MediaAgent (EPIC2), and a Hitachi VSP One Object target array. 

| Component | Details |
|---|---|
| Database Platform | InterSystems IRIS (EPIC instance) |
| Version | IRIS 2025.2 (Build 227U) |
| Operating System | Red Hat Enterprise Linux 9 (x86-64) |
| Backup Software | Commvault v11.42.60 |
| Storage Platform | Hitachi VSP B85 & VSP1O |
| Freeze/Thaw Scripts | Default IRIS scripts (Freeze / Thaw), invoked automatically by HDPS through a user-defined subclient with the moniker %IRISDB% |

Pre-configuration for a snap backup:

User-defined subclient configuration:


[Screenshot: user-defined subclient configuration]

  • IRIS database volumes on the source host are mapped via HBAs to 10 multipath devices (mpathdw–mpathef), each corresponding to an LDEV on the VSP One BHE. raidcom commands are issued to the VSP One BHE via HORCM instance #10.
  • All IRIS components (database files, the Write Image Journal (WIJ), and the journal directory) co-reside on a single LVM volume group striped across all 10 LDEVs. As a result, a single consistency group snapshot atomically captures the entire IRIS data estate.
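As a quick sanity check on the device count, the range mpathdw to mpathef can be enumerated. This is a minimal sketch that assumes the multipath aliases follow the usual letter sequence (…dy, dz, ea, eb…); the helper names are illustrative and not part of any tool.

```python
def suffix_to_index(suffix):
    """Convert a letter suffix such as 'dw' to its 1-based position (a=1, z=26, aa=27)."""
    n = 0
    for ch in suffix:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n


def index_to_suffix(n):
    """Inverse of suffix_to_index."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        chars.append(chr(ord("a") + rem))
    return "".join(reversed(chars))


def mpath_range(first, last, prefix="mpath"):
    """List every alias from first to last, inclusive."""
    start = suffix_to_index(first[len(prefix):])
    end = suffix_to_index(last[len(prefix):])
    return [prefix + index_to_suffix(i) for i in range(start, end + 1)]


devices = mpath_range("mpathdw", "mpathef")
print(len(devices), devices)
```

The list contains ten aliases, matching the ten LDEVs described above.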

Section C: Anatomy of an application-consistent IRIS snapshot

A forensic examination of four distinct log sources (messages.log, CVFSSnap.log, CVMA.log, and job-11541-log) is used to decode how TIA snapshots are orchestrated, which database quiescence mechanisms IRIS employs, and the granular workflows underpinning application-consistent data protection at scale.

Phase 1: Job Initiation, Content Discovery & Preparation

IRIS Content Discovery via irissession: The CommServe JobManager on siscommserve-2 receives an IMMEDIATE SNAPBACKUP REQUEST for subclient 2026IRIS (AppType 29 = Linux File System, BkpLevel Full). After license validation, CVFSSnap.exe is launched on DBIRIS. The agent immediately identifies SnapEngine type 34 (Hitachi/TIA).

What makes this workflow particularly elegant is the IRIS-aware content discovery phase. Rather than relying on static path configuration, CVFSSnap uses the special %IRISDB% wildcard to dynamically enumerate all IRIS components at runtime.

SVOL Allocation on the Hitachi VSP: CVFSSnap sends prepareVolumeSnaps to the local CVMA. CVMA authenticates against the CommServe SnapManager for array lookup validation, then proceeds with SVOL provisioning. The HORCM instance #101 is confirmed running, and all 10 PVOLs (LDEVs 20–29) are correctly identified as DRS (Data Reduction Shared) volumes.

The capacity figure of 83,886,080 blocks equates to exactly 40 GiB per SVOL (83,886,080 × 512 bytes). Ten SVOLs totaling 400 GiB are provisioned in approximately 55 seconds (23:51:07 to 23:52:02, including labeling and port mapping). This entire preparation phase completes before IRIS is touched in any way.
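The capacity arithmetic can be checked in a few lines. This is a minimal sketch using only the figures quoted above (512-byte blocks, 83,886,080 blocks per SVOL, ten SVOLs); the function name is illustrative.

```python
BLOCK_SIZE_BYTES = 512
BLOCKS_PER_SVOL = 83_886_080
SVOL_COUNT = 10
GIB = 2**30  # bytes in one GiB


def svol_capacity_gib(blocks, block_size=BLOCK_SIZE_BYTES):
    """Convert a block count to GiB."""
    return blocks * block_size / GIB


per_svol = svol_capacity_gib(BLOCKS_PER_SVOL)
total = per_svol * SVOL_COUNT
print(f"{per_svol:.0f} GiB per SVOL, {total:.0f} GiB total")
# prints "40 GiB per SVOL, 400 GiB total"
```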

NOTE: Safe Snap Retention Confirmed: Copy 129 has Safe Snap protection configured for 24 hours on VSP model RH20ETP. The snapshot pair cannot be deleted within this window, providing a recovery safety net independent of HDPS snap management, a critical secondary safeguard.
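The effect of the 24-hour window can be modeled as a simple time comparison. This is only a sketch of the policy as described in the note, not how the array enforces it; the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SAFE_SNAP_RETENTION = timedelta(hours=24)  # retention from the note above


def protected_until(created_at, retention=SAFE_SNAP_RETENTION):
    """Time before which the snapshot pair cannot be deleted."""
    return created_at + retention


def can_delete(created_at, now, retention=SAFE_SNAP_RETENTION):
    """True only once the retention window has fully elapsed."""
    return now >= protected_until(created_at, retention)


# Hypothetical creation time, used only to exercise the functions.
created = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(can_delete(created, created + timedelta(hours=23)))  # prints False
print(can_delete(created, created + timedelta(hours=24)))  # prints True
```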

 

Phase 2: Application Quiescence: The ExternalFreeze Mechanism

This is the most significant phase of the entire workflow. After SVOLs are fully provisioned and the array is primed, CVFSSnap initiates the IRIS quiescence sequence. This is an application-aware freeze, not a generic OS-level filesystem freeze.

Pre-Freeze Write Daemon State Check: Before issuing the freeze, CVFSSnap defensively queries the IRIS Write Daemon (WD) state. Return code 3 means the WD is running normally (not suspended). This guard prevents a double-freeze scenario that could corrupt the database state machine.
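The guard can be sketched as follows. The return codes 3 and 5 come from the log analysis in this post. The exact call CVFSSnap issues is not shown in the logs; the sketch assumes the Backup.General.IsWDSuspendedExt() classmethod from the InterSystems class reference, and the instance name "IRIS" and all helper names are hypothetical.

```python
import subprocess

WD_RUNNING = 3    # write daemon running normally (not suspended)
WD_SUSPENDED = 5  # write daemon suspended (instance already frozen)


def wd_state_command(instance):
    """Build the argv for the WD state query (assumed call, see the text above)."""
    return ["iris", "session", instance, "-U%SYS",
            "##Class(Backup.General).IsWDSuspendedExt()"]


def safe_to_freeze(exit_code):
    """Interpret the exit code; anything other than 3 or 5 is treated as an error."""
    if exit_code == WD_RUNNING:
        return True
    if exit_code == WD_SUSPENDED:
        return False
    raise ValueError(f"unexpected write daemon state code: {exit_code}")


def check_before_freeze(instance, run=subprocess.call):
    """Run the query and decide whether a freeze may proceed."""
    return safe_to_freeze(run(wd_state_command(instance)))


# Exercise the logic with a stand-in runner instead of a real IRIS instance.
print(check_before_freeze("IRIS", run=lambda argv: WD_RUNNING))    # prints True
print(check_before_freeze("IRIS", run=lambda argv: WD_SUSPENDED))  # prints False
```

Injecting the runner keeps the decision logic testable on a host with no IRIS installation.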

ExternalFreeze(): Three Simultaneous Atomic Actions

CVFSSnap pipes a multi-line ObjectScript command into an irissession process. The description ID 1773813046 is the Unix epoch backup reference timestamp; it becomes the snapshot's canonical identity anchor for future recovery operations. IRIS performs three actions atomically:

  1. Journal Roll: The currently active journal file is closed and a new one is opened (20260316.002). This creates a clean temporal boundary: the backup reference journal is precisely known, which enables exact point-in-time recovery with no ambiguity about which journal transactions need replaying.
  2. Write Daemon Suspension: The IRIS Write Daemon is halted. This stops all dirty page flushes from the in-memory global buffer cache to the on-disk database files. The on-disk state is guaranteed consistent: every already-flushed transaction is stable, and no partial page writes occur during the snapshot window.
  3. WIJ Integrity: With the Write Daemon suspended, the WIJ contains zero pending (in-flight) write intentions. The snapshot captures the WIJ in a clean state, confirmed empirically by the 0 pending blocks observed at restore startup time.

NOTE: Freeze Window Impact: During the ExternalFreeze window, IRIS continues to accept incoming transactions into memory (the global buffer cache), but no dirty pages are written to disk. Applications experience no downtime, only a brief latency increase while the WD is suspended. Observed freeze windows across all analyzed jobs: 3 to 4 seconds.

 

Phase 3: Snapshot (TIA SafeSnap) creation on the VSP Array

With IRIS frozen, CVFSSnap immediately calls createVolumeSnaps. CVMA issues the raidcom snapshot creation command against the pre-staged device group. This is the critical atomic moment: all 10 PVOL → SVOL pairs are snapped simultaneously by the array's consistency group mechanism.

 Snapshots as seen via raidcom:

 Snapshots as seen via CommServe UI:

NOTE: Retention is enabled and is shown as “Protected Until”.

Phase 4: ExternalThaw & Post-Snapshot Metadata Commit

The moment createVolumeSnaps returns success, CVFSSnap performs a post-snap Write Daemon state verification (rc=5, meaning suspended, as expected) and immediately issues ExternalThaw(). This is not a fire-and-forget call: CVFSSnap explicitly confirms that the WD is still suspended before thawing, which guards against edge cases where an external process has already resumed the database.
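The ordering of the checks across Phases 2 to 4 can be sketched as below. This is an illustrative model of the sequence described in this post, not CVFSSnap's implementation: the four callables, the QuiesceError class, and the FakeIris stand-in are all hypothetical. Thawing in a finally block when the snapshot step fails is a design choice of this sketch, not something the logs show.

```python
WD_RUNNING, WD_SUSPENDED = 3, 5  # write daemon state codes from the logs


class QuiesceError(RuntimeError):
    """Raised when the instance is not in the expected state before a freeze."""


def snapshot_with_quiesce(wd_state, freeze, create_snapshot, thaw):
    """Pre-check, freeze, snapshot, post-check, thaw."""
    if wd_state() != WD_RUNNING:
        raise QuiesceError("write daemon already suspended; refusing to freeze twice")
    freeze()
    try:
        create_snapshot()
    finally:
        # Thaw only if the write daemon is still suspended.
        if wd_state() == WD_SUSPENDED:
            thaw()


class FakeIris:
    """In-memory stand-in that records the order of operations."""

    def __init__(self):
        self.suspended = False
        self.events = []

    def wd_state(self):
        return WD_SUSPENDED if self.suspended else WD_RUNNING

    def freeze(self):
        self.suspended = True
        self.events.append("freeze")

    def thaw(self):
        self.suspended = False
        self.events.append("thaw")

    def snapshot(self):
        self.events.append(f"snapshot(frozen={self.suspended})")


iris = FakeIris()
snapshot_with_quiesce(iris.wd_state, iris.freeze, iris.snapshot, iris.thaw)
print(iris.events)  # prints ['freeze', 'snapshot(frozen=True)', 'thaw']
```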

On thaw, the IRIS Write Daemon restarts and immediately begins flushing queued dirty pages. Journaling continues in file 20260316.002 (opened at freeze time). The closed journal 20260316.001 is compressed by 73% (262144 → 69632 bytes).
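The 73% figure follows directly from the two byte counts. A minimal check, with an illustrative function name:

```python
def compression_saving(original_bytes, compressed_bytes):
    """Fraction of the original size removed by compression."""
    return 1 - compressed_bytes / original_bytes


saving = compression_saving(262_144, 69_632)
print(f"{saving:.1%}")  # prints "73.4%"
```

In other words, the 256 KiB journal shrinks to 68 KiB.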

Section D: Out-of-place restores

Snapshots captured using IntelliSnap can be restored to an alternate location (out-of-place). The restored IRIS database instance comes up cleanly without any recovery errors. The log snippet below shows the restore process in progress:

View from HDPS UI:

The messages.log on the destination host shows:

Starting WIJ recovery for '/irisdbvol/mgr/IRIS.WIJ'. 0 blocks pending in this WIJ. Exiting with status 3 (Success).

This confirms a clean, application-consistent recovery. The post-restore database integrity check reports Error Count: 0.
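A restore validation script could extract the pending-block count from the messages.log text quoted above. This is a minimal sketch: it matches only the phrase shown in this post, real log lines carry additional prefixes such as timestamps, and the function name is hypothetical.

```python
import re

PENDING_RE = re.compile(r"(\d+) blocks pending in this WIJ")


def wij_pending_blocks(log_text):
    """Return the pending block count from the WIJ recovery message, or None if absent."""
    match = PENDING_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = ("Starting WIJ recovery for '/irisdbvol/mgr/IRIS.WIJ'. "
          "0 blocks pending in this WIJ. Exiting with status 3 (Success).")
print(wij_pending_blocks(sample))  # prints 0
```

A result of 0 corresponds to the clean WIJ state expected from an application-consistent snapshot.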

 

Section E: Backup to VSP One Object immutable tier

After the snapshot-based backup of the IRIS database, an additional backup copy is tiered to VSP One Object (VSP1O). By enabling retention mode on the target bucket, this copy becomes immutable, providing an extra layer of protection against accidental deletion or ransomware.

 

Section F: Key Takeaways

  • The "All-or-Nothing" HDPS subclient configuration requirement: While Hitachi Vantara storage is highly flexible, the HDPS Snap Engine enforces a strict data path requirement. For IntelliSnap to function, every volume included in the subclient (including paths referenced by %IRISDB%) must reside on Hitachi LDEVs. If any component, be it a log directory, a small configuration file, or a database segment, resides on local/internal disks or non-HDS storage, snapshot preparation fails immediately. NOTE: This is an HDPS orchestration requirement, not a hardware limitation. The SnapEngine validates array identity for every volume during the prepareVolumeSnaps phase.
  • Safe Snap 24-Hour Retention Active: Copy 129 in the log indicates that Safe Snap protection is configured for 24 hours on VSP model RH20ETP. The snapshot pair cannot be deleted within this window, providing a recovery safety net independent of Commvault snap management.
  • DRS Volume Handling: All 10 PVOLs are correctly identified as DRS (Data Reduction Shared) volumes, and the appropriate SVOL creation path is taken. The IS_570_DRS_ naming convention in the snapshot group confirms DRS-aware processing end-to-end.
  • If a consistency group is enabled in the Snap configuration, HDPS will not show snapshots in the Disk View. This is a known issue in version 11.42.60 and is expected to be fixed in a future release. However, the snapshots are successfully created and can still be seen in Volume View.
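The all-or-nothing rule from the first takeaway can be expressed as a pre-flight check. This is a sketch under stated assumptions: the path-to-vendor inventory is hypothetical input that an administrator would have to gather separately, the vendor strings and the /var/log/iris and /irisdbvol/journal paths are made up for illustration, and the function is not part of HDPS.

```python
def non_array_paths(path_to_vendor, required_vendor="HITACHI"):
    """Return the paths whose backing storage is not on the required array vendor."""
    return sorted(
        path for path, vendor in path_to_vendor.items()
        if vendor.upper() != required_vendor
    )


# Hypothetical inventory: one path sits on a local disk.
inventory = {
    "/irisdbvol/mgr": "HITACHI",
    "/irisdbvol/journal": "HITACHI",
    "/var/log/iris": "LOCAL",
}

offenders = non_array_paths(inventory)
if offenders:
    print("snapshot preparation would fail for:", offenders)
# prints "snapshot preparation would fail for: ['/var/log/iris']"
```

A single offending path is enough to block the whole subclient, which is why the check returns every offender rather than stopping at the first.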

Section G: Conclusion

The integration of InterSystems IRIS with HDPS IntelliSnap and Hitachi VSP Thin Image Advanced represents a resilient, production-grade data protection architecture for mission-critical healthcare workloads. The forensic analysis of Job 11541 demonstrates that when all components are correctly configured, the system delivers atomically consistent, application-aware snapshots with sub-five-second production impact windows and a comprehensive metadata trail that supports reliable, auditable restore operations.

Hitachi VSP provides the instantaneous consistency group snapshots of DRS volumes with Safe Snap retention. HDPS SnapEngine wraps that capability with IRIS-aware quiescence (via the ExternalFreeze API), correct DRS detection, and full catalog integration. The combination eliminates the traditional trade-off between backup consistency and production impact.


#DataProtection

Comments

2 days ago

Very impressive content! The detailed explanation makes it very informative.

2 days ago

Great blog! A clear and comprehensive explanation of protecting InterSystems IRIS using ExternalFreeze, HDPS, and TIA SafeSnap—especially the insights into atomic quiescence and immutable object storage with VSP One Object.