Bringing Hitachi VSP Block Storage into Splunk SIEM

By Karthik Raja Balasubramanian posted 2 days ago


Why We Built This

Enterprise storage does not live in isolation. Servers, networks, applications, and security systems are already monitored inside SIEM platforms, and Splunk is the most widely deployed among them. But storage? Storage has always been the missing piece. Teams managing Hitachi VSP arrays had rich, granular metrics available inside VSP 360, but that data never made it into the same operational dashboards where the rest of the infrastructure was visible.

We believed this gap was worth closing for every customer running VSP arrays in a Splunk-centric environment. Storage observability should not require a separate console. Capacity alerts should feed into the same workflows as every other infrastructure alert. Performance data should sit alongside application and server metrics so that teams can correlate, not guess.

So we built it: a Splunk Technology Add-on (TA) and a Splunk App that pull configuration, performance, and synthetic metrics from Hitachi VSP block arrays via the VSP 360 platform and stream them into Splunk securely, automatically, and at scale. This post explains what we built, how it works, and what it means for customer operations teams.

What This Adds to Customer Infrastructure

When a customer runs Hitachi VSP One Block Storage arrays alongside a Splunk SIEM deployment, there is a natural question: why is the storage estate invisible inside Splunk while everything else is there? This add-on answers that question directly.

Here is what customers gain the moment the add-on is deployed:

Storage inside the SIEM: VSP One Block array metrics (capacity, IOPS, response time, cache health, and CHA port load) are indexed in Splunk alongside server, application, network, and security data. Storage is no longer a separate investigation.

Correlation without context-switching: When application response times spike, operators can check whether storage latency or cache pressure is contributing from the same Splunk dashboard, using the same time range, without logging into a separate console.

Fleet-wide capacity planning: Pool utilization trends across every enrolled array are visible in one place, queryable with Splunk Search Processing Language (SPL), and alertable on thresholds so that capacity issues surface before they become incidents.

Proactive performance monitoring: The add-on collects data at 60-second intervals. Cache write-pending rate, IOPS spikes, port saturation, and parity group bottlenecks are all visible in near real time.

Audit-ready, secure data pipeline: Every connection is HTTPS-only, SSL-verified, and authenticated with encrypted OAuth2 credentials, meeting the security and compliance requirements of enterprise environments.

The Foundation: ClearSight Advanced (CSAD) in VSP 360

Hitachi VSP 360 includes ClearSight Advanced, an analytics component that provides visibility into storage configuration, capacity, health, and performance data through resource-based views and API-driven access.

CSAD organizes its data around four concepts: resource types (categories of storage components such as raidPool, raidLdev, and raidPort), attributes (properties of each resource such as usageRate or totalIOPS), signatures (unique IDs for each resource instance, for example raidPool#70000-5), and relations (parent–child links between resources, such as raidPool → raidStorage). The add-on is built around all four of these: it knows what to ask for, how to identify what it receives, and how to connect related components together in Splunk.

The schema the add-on ships with today covers 27 configuration resource types with over 200 attributes, 11 performance resource types with 97 metrics, and 5 synthetic resource types with 48 CSAD-computed aggregate metrics. This is the full picture of a VSP array’s health, available in Splunk from the first collection cycle.

How the Add-on Works: A 5-Step Pipeline

Every collection run, whether for configuration, performance, or synthetic data, follows the same five-step pipeline. This consistency is deliberate: it makes the add-on predictable, maintainable, and easy to extend as new VSP resource types or metrics become available.

The diagram illustrates the four-layer integration architecture: Hitachi VSP block storage arrays feed metrics into the VSP 360 CSAD database, which the custom Splunk add-on extracts via secure HTTPS (OAuth2) through a five-step pipeline (schema, MQL query construction, API call, transformation, and event emission), indexing data under three sourcetypes into Splunk, where two Dashboard Studio dashboards and a 78-pair relations lookup deliver fleet-wide and deep-dive storage observability.

Step 1: Embedded Schema (vsp360_schema.py)

The add-on ships with a Python dictionary that defines every resource type and every attribute it knows about. This is the single source of truth for what to ask CSAD. If an attribute is not in the schema, the add-on does not request it. This prevents API errors and makes collection fully deterministic.
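To make the idea concrete, here is a minimal sketch of what such an embedded schema can look like. The dictionary layout, key names, and attribute lists below are illustrative assumptions, not the actual contents of vsp360_schema.py:

```python
# Hypothetical sketch of an embedded schema: each resource type lists the
# attributes the collector is allowed to request from CSAD. Anything not
# listed here is simply never asked for.
VSP360_SCHEMA = {
    "raidPool": {
        "kind": "configuration",
        "attributes": ["name", "usageRate", "totalCapacity"],
    },
    "raidLdev": {
        "kind": "performance",
        "attributes": ["totalIOPS", "responseTime"],
    },
}

def allowed_attributes(resource_type):
    """Return only schema-listed attributes; unknown types yield nothing,
    which is what makes collection deterministic."""
    return VSP360_SCHEMA.get(resource_type, {}).get("attributes", [])
```

Because the schema is the single gatekeeper, an unsupported resource type produces an empty request rather than a failing API call.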

Step 2: MQL Query Construction (vsp360_common.py → mql_for())

For each resource type, the add-on builds a Metric Query Language (MQL) string. Configuration attributes use the = prefix for scalar values. Performance and synthetic attributes use the @ prefix for timeseries arrays. If a resource type has more than 25 attributes, the query is automatically chunked to prevent CSAD API timeouts. Results from all chunks are merged per signature before any further processing.
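The prefix and chunking rules can be sketched as follows. The query syntax shown is an illustrative simplification of MQL, and the function names mirror but do not reproduce the add-on's actual code:

```python
def chunk_attributes(attributes, limit=25):
    """Split an attribute list so no single query carries more than
    `limit` attributes, mirroring the chunking that avoids CSAD timeouts."""
    return [attributes[i:i + limit] for i in range(0, len(attributes), limit)]

def mql_for(resource_type, attributes, timeseries=False):
    """Build one MQL-style query string: '=' marks scalar configuration
    values, '@' marks timeseries arrays. Illustrative grammar only."""
    prefix = "@" if timeseries else "="
    fields = ",".join(prefix + a for a in attributes)
    return f"{resource_type}[{fields}]"
```

A resource type with 60 attributes would therefore be fetched as three queries (25 + 25 + 10), and the per-signature results merged afterwards.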

Step 3: Secure CSAD API Call (VSP360Client)

The add-on authenticates against the VSP 360 Keycloak realm using an OAuth2 client credentials grant, then sends the MQL query via HTTPS POST to the CSAD endpoint. The token is refreshed at 80% of its lifetime with a 30-second minimum floor, ensuring that collections never fail mid-run due to an expired session. All credentials are stored in Splunk’s encrypted credential store and are never written to any configuration file on disk.
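The refresh policy is simple enough to express in one function. Interpreting "80% of lifetime with a 30-second minimum floor" as a floor on the refresh interval is our reading of the description above, so treat this as a sketch of the policy rather than the client's actual code:

```python
def refresh_interval(expires_in_seconds):
    """Seconds to wait before refreshing an OAuth2 token: 80% of its
    lifetime, but never less than 30 seconds (assumed interpretation)."""
    return max(expires_in_seconds * 0.8, 30.0)
```

For a typical 300-second Keycloak token this refreshes at the 240-second mark, leaving a full minute of margin before expiry.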

Step 4: Data Transformation (vsp360_common.py)

Three functions process every response record before it becomes a Splunk event. parse_signature() extracts the array serial number and component instance from the CSAD signature string. unwrap_scalar() strips the JSON envelope from configuration attributes and keeps the value and unit. expand_timeseries() iterates each performance sample and computes its exact epoch — so a metric sampled at 14:14:00 lands in Splunk as 14:14:00, not as the collection time. Port names are normalized from CL3^A to CL3:A to match Hitachi storage standards. Any attributes that CSAD does not return for a given array model are filled with null, so dashboards never break.
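The transformations above can be sketched in a few lines. The signature format, function signatures, and NaN/Inf handling shown are simplified assumptions based on the description, not the add-on's actual implementation:

```python
def parse_signature(signature):
    """Split a CSAD signature like 'raidPool#70000-5' into
    (resource_type, array_serial, instance). Format is illustrative."""
    resource_type, _, rest = signature.partition("#")
    serial, _, instance = rest.partition("-")
    return resource_type, serial, instance

def normalize_port(name):
    """CSAD reports ports as 'CL3^A'; Hitachi convention is 'CL3:A'."""
    return name.replace("^", ":")

def expand_timeseries(start_epoch, interval_seconds, values):
    """Pair each sample with its exact epoch so events land in Splunk at
    their sample time, not the collection time. NaN/Inf samples are
    dropped (v == v is False only for NaN)."""
    return [
        (start_epoch + i * interval_seconds, v)
        for i, v in enumerate(values)
        if v == v and v not in (float("inf"), float("-inf"))
    ]
```

Note that the epoch offset is computed from the sample's position, so dropping a corrupt sample never shifts the timestamps of the samples after it.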

Step 5: Splunk Event Emission

Events are written to Splunk in a fan-out pattern under three distinct sourcetypes:

Sourcetype | Pattern | What It Contains
hitachi:vspblock:config | 1 event per signature + attribute | Static configuration - pool names, capacities, volume sizes, port types
hitachi:vspblock:perf | 1 event per signature + metric + sample timestamp | Live performance - IOPS, response time, transfer rate, utilization
hitachi:vspblock:synth | 1 event per signature + metric + sample timestamp | CSAD-computed aggregates - e.g., LDEV IOPS rolled up to the array level

Separate checkpoint namespaces (configuration::, performance::, synthetic::) prevent any overlap between the three inputs, and a 60-second duplicate guard prevents double-writes if Splunk triggers the same input twice in quick succession.
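The namespace-plus-window idea can be sketched like this. The class name, namespace strings, and in-memory state are illustrative; the real add-on persists checkpoints through Splunk's checkpoint mechanism:

```python
import time

class CheckpointGuard:
    """Sketch of per-input checkpoint namespacing with a 60-second
    duplicate guard. Names are illustrative, not the add-on's API."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self._last_run = {}  # namespace -> epoch of last accepted run

    def should_collect(self, namespace, now=None):
        """Reject a run if the same namespace fired within the window;
        separate namespaces (configuration::, performance::, synthetic::)
        never interfere with each other."""
        now = time.time() if now is None else now
        last = self._last_run.get(namespace)
        if last is not None and (now - last) < self.window:
            return False
        self._last_run[namespace] = now
        return True
```

Because each input keys its state under its own prefix, a performance run can never consume or overwrite a configuration checkpoint.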

Security: Hardened by Design, Not as an Afterthought

Adding a new data pipeline into a production SIEM environment carries real security obligations. The add-on was built with those obligations in mind from the first line of code.

        HTTPS enforced at startup: The client raises a ValueError if an HTTP URL is configured — before any network call is made. No credentials or data ever travel in plaintext.

        SSL certificate verification: Enforced on every call to both the OAuth2 token endpoint and the CSAD query endpoint. Self-signed certificates will fail unless the CA is explicitly trusted.

        Encrypted credential storage: The client ID and client secret are stored in Splunk’s encrypted credential store and are never written to any configuration file on disk.

        Environment isolation: The HTTP client ignores proxy settings and .netrc credential files present on the host server, ensuring that only explicitly configured connections are used.

        Smart token refresh: The OAuth2 token is refreshed at 80% of its lifetime with a 30-second minimum floor, preventing mid-collection expiry without unnecessary over-refreshing.

        NaN/Inf filtering: Corrupt or sentinel values in timeseries responses are silently dropped so that they never produce invalid events in Splunk.

        Duplicate guard: A 60-second checkpoint window prevents the same data from being indexed twice if a collection cycle overlaps.
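The fail-fast HTTPS check in the first bullet amounts to a guard like the following; the function name and error message are ours, not the add-on's:

```python
def validate_base_url(url):
    """Reject plaintext endpoints before any network call is made,
    mirroring the startup check described above (sketch)."""
    if not url.lower().startswith("https://"):
        raise ValueError("VSP 360 endpoint must use HTTPS: " + url)
    return url
```

Failing at configuration time, rather than on the first request, means a misconfigured URL can never leak a single credential over the wire.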

The Splunk App: Two Dashboards Built for Storage Operations

The add-on handles the data pipeline. The Splunk App puts that data to work. It ships with two Splunk Dashboard Studio dashboards, each designed for a specific operational workflow. Both support three shared filters: Time Range, Storage Serial Number (with an “All Arrays” option), and VSP 360 Host.

Dashboard 1: Fleet Operations & Performance

This is the command center - the screen a storage operations lead opens first thing to know whether anything needs attention across the entire fleet.

        Storage System KPIs: Eight single-value tiles showing model, total capacity (TiB), used (TiB), free (TiB), allocated (TiB), attached volume count, port count, and data reduction ratio - color-coded for quick visual scanning.

        Pool Capacity Analytics: Two side-by-side line charts tracking pool used vs. total capacity (TiB) over time, and pool utilization percentage over time.

        Storage Performance Trends: Area charts showing total IOPS and transfer rate (GiB/s) per array - useful for detecting fleet-wide surges and identifying arrays approaching their performance ceiling.

        Cache & Processor Utilization: Cache utilization per CLPR, cache write-pending rate per CLPR (an early-warning metric when drives cannot absorb writes fast enough), and MPB utilization per processor core.

        Top Port Performance: Top 10 busiest front-end ports by IOPS, response time (ms), utilization percentage, and transfer rate (MiB/s), calculated dynamically over the selected time window.

Dashboard 2: LDEV, Pool & PG Performance Analytics

This is the deep-dive dashboard for performance troubleshooting, workload characterization, and disk-level bottleneck identification.

        LDEV Performance: Seven metric groups covering total/read/write IOPS; total/read/write response time (ms); total/read/write transfer rate (MiB/s); random IOPS; sequential IOPS; random transfer rate; and sequential transfer rate. The top 10 LDEVs are selected automatically - no manual filtering is required.

        Pool Performance: Total/read/write IOPS, response time (ms), and transfer rate (MiB/s) at the pool level across all pools.

        Parity Group Performance: PG utilization as a shaded area chart (making hotspots immediately visible), total/read/write IOPS, transfer rate, and read/write response time - where disk-level bottlenecks become visible before they affect volumes.

Cross-Component Intelligence: The Relations Lookup

The add-on ships with a lookup table (vsp360_relations.csv) that maps 78 parent–child relationship pairs across all 27 resource types. It is used at Splunk search time, enabling SPL queries that join related storage components together. Without it, each resource type is an island. With it, the Splunk environment understands the topology of storage arrays.

Examples:

  • Show all volumes (LDEVs) in Pool 5 on array 40000 - uses the LDEV → Pool → Array chain.

  • Which host groups are connected to port CL1-A? - uses the Host Group → Port relation.

  • Which drives belong to parity group 1-3? - uses the Drive → Parity Group relation.
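The lookup's role is easy to see with a miniature stand-in. The CSV column names and sample signatures below are assumptions for illustration; the real vsp360_relations.csv may use different headers:

```python
import csv
import io

# Hypothetical miniature of a relations lookup: parent-child pairs that
# let searches walk the array topology one hop at a time.
RELATIONS_CSV = """child_type,child_signature,parent_type,parent_signature
raidLdev,raidLdev#40000-00:04:39,raidPool,raidPool#40000-5
raidPool,raidPool#40000-5,raidStorage,raidStorage#40000
"""

def load_relations(text):
    """Parse the lookup CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def parents_of(relations, child_signature):
    """Walk one hop up the topology, as an SPL lookup join would."""
    return [r["parent_signature"] for r in relations
            if r["child_signature"] == child_signature]
```

Chaining two hops (LDEV → Pool, then Pool → Array) is exactly the LDEV → Pool → Array traversal used in the first example above.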

Key Engineering Decisions

Design Choice | Why It Matters
Query chunking at 25 attributes | Prevents CSAD API timeouts on resource types with large attribute lists
Null-fill for missing attributes | Dashboards never break when a field is absent on a specific VSP model
Exact sample timestamps | Accurate Splunk time-range searches and charting - not collection time
Separate checkpoint namespaces | The three inputs never overwrite each other's checkpoint state
props.conf DATETIME_CONFIG = CURRENT | Prevents Splunk from misinterpreting LDEV instance IDs (e.g., 00:04:39) as timestamps

What’s Next

        Splunk Alerting: Any dashboard panel - pool above 85%, cache write-pending above threshold, port saturation - can become a Splunk alert that feeds into existing incident management workflows.

        Cross-Domain Correlation: Correlate VSP storage events with application logs, server metrics, and network data already in Splunk.

        Multi-VSP 360 Fleet Expansion: Multiple VSP 360 hosts are supported through separate input stanzas. New arrays are picked up automatically.

        Extending to Other Hitachi Platforms: The same five-step pipeline architecture can be adapted for Hitachi VSP One Block File Storage, Hitachi VSP One Block Object Storage, and other platforms exposed through the CSAD API.

Getting Started

1.      Install the Splunk Add-on for Hitachi VSP Block Storage on your Splunk instance.

2.      Configure the VSP 360 connection - provide the hostname, OAuth2 client ID, and client secret. Credentials are encrypted automatically.

3.      Enable the three modular inputs: Configuration, Performance, and Synthetic.

4.      Install the Splunk App for Hitachi VSP Block Storage to activate both dashboards.

5.      Open the dashboards. VSP arrays will begin populating on the first collection cycle.

Closing

VSP 360 already captures everything happening on the storage estate. This add-on brings that data into Splunk - where operations teams, security teams, and capacity planners can put it to use alongside every other signal in the customer’s environment.

#VSP360 #VSPOneBlock #VSPOneBlockHighEnd #VSP5000Series #VSPESeries

