Getting Control of Data Copies

By Hubert Yoshida posted 03-06-2018 00:00


Back in November 2014 I posted a blog on how “Controlling your explosion of copies may be your biggest opportunity to reduce costs”. I quoted a study by Laura DuBois of IDC which reported that 65% of external storage system capacity is used to store non-primary data such as snapshots, clones, replicas, archives, and backup data. This was up from 60% just a year earlier. At this rate it was estimated that by 2016 the spend on storage for copy data would approach $50 billion and copy data capacity would exceed 315 million TB. I could not find a more recent study, but I would estimate that the percentage may have increased due to more online operations, ETL for analytics, DevOps, and the larger number of shorter-lived applications, which tend to leave behind dark data that never gets cleaned up. Copies serve a very useful purpose in an agile IT environment, just as the massive underwater bulk of an iceberg provides the displacement that keeps it afloat. However, the copies need to be monitored, managed, and automated to reduce costly waste and inefficiency.


At that time, in 2014, our answer for copy data management was a product called Hitachi Data Instance Manager, which came from the acquisition of the Cofio Aimstor product. Most users at that time were using this product as a lower-cost backup solution. A key feature was a workflow manager with settable policies for scheduling the operations it controlled. Since then, Cofio and Hitachi engineers have worked to build the latest enterprise features into this product and renamed it Hitachi Data Instance Director, or HDID (which sounds better than HDIM). HDID provides continuous data protection, backup and archiving with storage-based snapshots, clones, and remote replication in addition to application server hosts.

In October of last year, with the announcement of the new Hitachi Vantara company, we announced Hitachi Data Instance Director v6, which was re-architected with MongoDB as the underlying database. The more robust database enables HDID to scale to hundreds of thousands of nodes, compared to previous versions which scaled to thousands of nodes. Now you can set up as many users as you want with access rights. Another improvement was an upgrade from a single-login access control to granular role-based access controls that align user access capabilities with the business’s organizational structure and responsibilities.

Another major enhancement was a RESTful API layer, which enables the delivery of recovery, DR, and copy data management as a private cloud service. Rich Vining, our Senior World Wide Product Manager for Data Protection and Governance, explains this in his recent blog post, “Expand Data Protection into Enterprise Copy Data Management”:

“Hitachi Vantara defines copy data management as a method for creating, controlling and reducing the number of copies of data that an organization stores. It includes provisioning copies, or virtual copies, for several use cases, including backup, business continuity and disaster recovery (BC/DR), and data repurposing for other secondary functions such as DevOps, financial reporting, e-discovery and much more.”

Read Rich’s blog to see how HDID can solve the copy explosion that I described above by automating the provisioning of snapshots, clones, backups and other copy mechanisms, mounting virtual copies to virtual machines, automatically refreshing them and, more importantly, expiring them when they are no longer needed.
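To make the idea of policy-driven expiration concrete, here is a minimal sketch in Python. It is not HDID code and does not use the HDID API; the `Copy` record, the `RETENTION` periods, and the `expired_copies` helper are all hypothetical names chosen for illustration. It only shows the general principle of comparing each copy's age against a retention policy for its type.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Copy:
    """Hypothetical record describing one data copy (not an HDID type)."""
    name: str
    kind: str          # "snapshot", "clone", or "backup"
    created: datetime


# Hypothetical retention periods per copy type; real policies would differ.
RETENTION = {
    "snapshot": timedelta(days=7),
    "clone": timedelta(days=30),
    "backup": timedelta(days=365),
}


def expired_copies(copies, now):
    """Return the copies whose age exceeds the retention period for their kind."""
    return [c for c in copies if now - c.created > RETENTION[c.kind]]


now = datetime(2018, 3, 6)
copies = [
    Copy("db-snap-01", "snapshot", datetime(2018, 3, 1)),    # 5 days old
    Copy("db-snap-00", "snapshot", datetime(2018, 2, 20)),   # 14 days old
    Copy("devops-clone", "clone", datetime(2018, 1, 15)),    # 50 days old
]

for c in expired_copies(copies, now):
    print(f"expire {c.name} ({c.kind})")
```

In this sketch the five-day-old snapshot is kept, while the older snapshot and the clone are flagged for expiration. A copy data management product would apply the same kind of rule automatically and on a schedule, rather than relying on someone to remember to clean up.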

Think of HDID as a way to automate the copy data process and reduce the estimated $50 billion spend on storage for copy data.
