We all create and store copies of important data files for recovery against a range of threats, as I summarized in my last blog. But we also create additional copies for other purposes, and if they are created ad hoc, or outside of a backup or copy data management system, we lose visibility into, and control over, them as soon as they are delivered.
Copy data is a real problem for many organizations, even if they don’t know it (yet). Businesses often keep 20 copies or more of their data to support functions such as backup and disaster recovery, test and development, sales and marketing campaigns, reporting, e-discovery, analytics, and more. Phil Goodwin, an analyst at IDC, said in our joint webcast that the average number across all organizations is about 13 copies of every data asset. The problem is that no one knows how many copies there are, who has access to them, how long they are retained, or what they are all used for.
Addressing the copy data problem should be a component of digital transformation or data center modernization initiatives, supporting the goals of optimizing costs and enhancing data governance.
Uncontrolled copy data becomes an immediate problem with the implementation of new regulations, such as the European Union’s General Data Protection Regulation (GDPR) and the recently passed California Consumer Privacy Act (CCPA). In these cases, it will be impossible for an organization to comply with a data subject’s request to have their personal information deleted if it can’t find all the instances of that data.
Your employees will always find ways to get the data they need to perform their jobs, but this can and does lead to bad behaviors and a lack of control. This is especially true for copy data. An application developer needs a copy of your production database? No problem, right? But should she have access to critical, and perhaps sensitive, data? Can she create additional copies for her test and quality assurance (QA) colleagues? What happens to this copy when you give her a fresh copy next week?
The right approach is to provide copies of data as needed, but in a controlled manner. Hitachi Data Instance Director (HDID) automates policy-based copy creation, refresh, and expiration. HDID automates and orchestrates the fast, non-disruptive snapshot and cloning capabilities built into the operating system of the Hitachi Virtual Storage Platform (VSP) family of enterprise arrays to create copies. These copies can be automatically mounted for user access, automatically refreshed so you don’t have to create another copy, and then automatically expired and deleted, all based on the policies of your organization.
This solution offers a range of choices to meet any requirement, including local or remote physical or virtual copies, created on a schedule or on an ad-hoc basis. The policies specify what to copy, when, where to make it available, to whom access is granted, and how long to retain it. While this approach will not prevent bad behavior, it removes the need to behave badly in order to get your job done.
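To make the policy idea concrete, here is a minimal sketch of what a copy-lifecycle policy captures: what to copy, when, where to mount it, who gets access, and how long to keep it. This is a hypothetical illustration in Python, not HDID’s actual interface; all names and fields here are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy model -- invented for illustration, not HDID's API.
@dataclass
class CopyPolicy:
    source: str              # what to copy, e.g. "prod-db"
    schedule: str            # when, e.g. a cron expression
    mount_target: str        # where to make the copy available
    allowed_users: set       # to whom access is granted
    retention: timedelta     # how long to retain each copy

@dataclass
class Copy:
    policy: CopyPolicy
    created_at: datetime

    def is_expired(self, now: datetime) -> bool:
        # A copy expires once its age exceeds the policy's retention period.
        return now >= self.created_at + self.policy.retention

def expire_copies(copies, now):
    """Split copies into (kept, expired) per each copy's retention policy."""
    kept = [c for c in copies if not c.is_expired(now)]
    expired = [c for c in copies if c.is_expired(now)]
    return kept, expired
```

With a seven-day retention policy, a copy created on January 1 would be flagged as expired on January 9, while one refreshed on January 6 would be kept; the point is that expiration is decided by policy, not by whoever happens to remember the copy exists.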
Our goal is not to reduce the number of copies to one or two, but to reduce it to the right number for your organization by repurposing the copies you’re already making and refreshing existing copies, so that you don’t have to store many orphaned, dated, and forgotten copies.
Rich Vining is a Sr. WW Product Marketing Manager for Data Protection and Copy Data Management Solutions at Hitachi Vantara and has been publishing his thoughts on data storage and data management since the mid-1990s. The contents of this blog are his own.