Rich Vining

GDPR: What About Backups?

Blog post created by Rich Vining on Jul 26, 2018

Note: This is an update to a blog post published in October 2017.


Hundreds of stories and blog posts have been published about the perceived lack of market readiness for the EU's General Data Protection Regulation (GDPR), the potentially crushing fines for non-compliance, different approaches to achieving compliance, and many other topics.


But there is one subject on which I have seen very little coverage: the data stored in copies kept for backup and disaster recovery.


Of the 99 articles in the GDPR, one that gets a lot of attention is Article 17, the new “right to be forgotten”. Essentially, an individual in the EU (the “data subject”) can ask a data controller to erase all copies of their personal data from its systems, including copies that have been shared with its data processors. There are exceptions, such as health and financial firms that are legally required to retain data for a set period, but those exceptions apply only to the regulated information, not to other copies that may be used for purposes such as sales and marketing.


This requirement to delete all copies of personally identifiable information (PII) extends to the copies stored for backup and disaster recovery operations. As part of their data governance regime, most organizations retain multiple point-in-time copies of backup data for extended periods (often many years), traditionally on tape and usually offline in a remote vault. That data is viewed as a “last resort” to recover from a large disaster, to recover old data that has since been purged from production systems, or to archive the data from systems that have been decommissioned.


Let’s say that you are one of those businesses that has years’ worth of backup tapes containing customer transactional records. Each full backup contains largely the same data as the previous full backup, meaning that you probably have many backup tapes containing the same customer records.


When one of your customers asks you to “forget me”, can you really be expected to bring every tape back from the vault, restore each tape, find the relevant customer records, delete them, and then write the remaining data back to tape? And then repeat the process every time you receive a new request to be forgotten?


The answer is an emphatic “yes”. Noted data security expert David Froud explains why in detail in his article, “GDPR: Does the Right to Erasure Include Backups?”.


To address this very difficult challenge, I believe you have only four choices, one of which is not really a choice at all:

  1. Employ a team of backup administrators whose sole job is to perform the process described above after every erasure request. You may need additional tape and backup server hardware to handle the volume. This is likely a very expensive solution, but probably the path of least resistance.
  2. Maintain a run-book of data deletion requests. Each time a backup copy is restored, compare the restored data against the run-book and delete all flagged records before the data is put back into production. You’ll probably need to add staff for this method, and it will undoubtedly extend recovery times, which could mean increasingly costly system downtime.
  3. Pretending that your backup data doesn't include any PII, and praying that you never get audited or restore a tape that contains previously deleted PII, is not an option, as Mr. Froud points out. You're going to have a very difficult time defending yourself to a supervisory authority or in a court of law. This is a recipe for financial disaster.
  4. Take this opportunity to modernize your approach to data protection, rethink your data retention requirements, and get rid of all those tapes. This is likely the most painful course of action, at least in the short term, but it will yield significant savings in the future and make compliance with GDPR and other data-centric regulations much easier.
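Option 2 boils down to a filter that sits between the restore and the return to production. The Python sketch below illustrates the idea under simplifying assumptions: each restored record is a dictionary with a `customer_id` key, and the run-book is an in-memory map from customer ID to the date the erasure request was received (kept for audit purposes only). The names `ErasureRunbook` and `scrub_restored_records` are hypothetical and are not part of any Hitachi product; a real implementation would have to locate PII across every table and file in the backup, not just a single key.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ErasureRunbook:
    """Hypothetical run-book: one entry per Article 17 erasure request."""
    requests: dict[str, date] = field(default_factory=dict)

    def add_request(self, customer_id: str, received: date) -> None:
        # Keep the earliest request date; a repeat request does not overwrite it.
        self.requests.setdefault(customer_id, received)

    def is_flagged(self, customer_id: str) -> bool:
        return customer_id in self.requests


def scrub_restored_records(records, runbook):
    """Drop every restored record whose customer appears in the run-book.

    Returns (kept, removed_count) so the scrub can be logged before the
    restored data is put back into production.
    """
    kept = [r for r in records if not runbook.is_flagged(r["customer_id"])]
    return kept, len(records) - len(kept)


if __name__ == "__main__":
    runbook = ErasureRunbook()
    runbook.add_request("C-1002", date(2018, 6, 1))
    restored = [
        {"customer_id": "C-1001", "order_total": 40.0},
        {"customer_id": "C-1002", "order_total": 15.5},
        {"customer_id": "C-1003", "order_total": 99.9},
    ]
    kept, removed = scrub_restored_records(restored, runbook)
    print(f"removed {removed} record(s); kept {[r['customer_id'] for r in kept]}")
```

The check runs on every restore, which is exactly why this option lengthens recovery times: the scrub cost grows with both the size of the backup and the number of requests in the run-book.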


If you are interested in exploring option 4 above, Hitachi Vantara offers modern data protection solutions for both structured and unstructured data types.


For high-performance, highly available database environments, Hitachi Data Instance Director (HDID) uses storage-based snapshot and replication technologies to create fast and frequent copies that are application-consistent, non-disruptive to the production environment, and offer almost instant recovery when needed. Mounting a snapshot or clone and searching it for PII is a fast and simple process.


For unstructured data, HDID can back up file system data to Hitachi Content Platform (HCP), our highly scalable object storage system. HDID stores the data on HCP in a native file format that allows it to be used for purposes other than recovery. For example, the backup data can be indexed by Hitachi Content Intelligence, and when PII is discovered during that process, custom metadata can be added to the object to make that data easy to find later.


For more information on the ways Hitachi can help you manage and control your copy data, download "Get Value from Backup Data with Governance Copy Services".


I would love to hear your thoughts on this topic.


Rich Vining is a Sr. Product Marketing Manager for Data Protection Solutions at Hitachi Vantara and has been publishing his thoughts on data storage and data management since the mid-1990s. The contents of this blog are his own.