Rich Vining

GDPR: What About Backups?

Blog post by Rich Vining on Oct 11, 2017

If you’ve been following my blog, you’ve seen snippets of many news articles about the European Union’s General Data Protection Regulation (GDPR). Many have been about the perceived lack of preparation in the market, the potentially crushing fines that may be imposed for non-compliance, different approaches to achieving compliance, and many other topics.


But there is one subject that I have seen very little coverage on: the data that is stored in copies for backup and disaster recovery.


Of the 99 articles in the GDPR, one that gets a lot of attention is Article 17, the new “right to be forgotten”. Essentially, an EU resident can ask a data controller to delete all copies of their personal information from the controller’s systems, as well as any copies that have been shared with the controller’s data processors. There are some exceptions, such as health and financial firms that are required to retain the data for some period of time, but those exceptions apply only to the regulated information and not to other copies that may be used for sales and marketing.


This requirement to delete all copies of personally identifiable information (PII) extends to the copies stored in backup and disaster recovery data stores. As part of their data governance regime, most organizations retain multiple point-in-time copies of backup data for extended periods of time (many years), traditionally on tape and usually off-line in a remote vault. That data is viewed as a “last resort” to recover from some large disaster, to recover old data that has since been purged from production systems, or to archive the data from systems that have been decommissioned.


Let’s say that you are one of those businesses that has years’ worth of backup tapes containing customer transactional records. Each full backup contains largely the same data as the previous full backup, meaning that you probably have many backup tapes containing the same customer records.


When one of your customers asks you to “forget me”, can you really be expected to bring every tape back from the vault, restore each tape, find the relevant customer records, delete them, and then re-commit the database to the tape? And then repeat the process every time you receive a new customer request to be forgotten?


The answer is an emphatic “yes”. Noted data security expert David Froud provides a thorough explanation of why in his article, “GDPR: Does the Right to Erasure Include Backups?”.


To address this very difficult challenge, I believe you have only four choices:

  1. Employ a team of backup administrators whose sole job will be to perform the process noted above following every erasure request. You may need additional tape and backup server hardware to handle the volume. This seems like a very expensive solution, but it is probably the path of least resistance.
  2. Develop a run-book of data deletion requests. Each time a backup copy is restored, compare the data against the run-book and delete all flagged records before the restored data is operationalized. You’ll probably need to add staff for this method, and it will undoubtedly extend recovery times, which could result in costly additional system downtime.
  3. Pretend that your backup data doesn't include any PII, and pray that you never get audited or restore a tape that contains previously deleted PII. As Mr. Froud points out, this is not really an option. You're going to have a very difficult time defending yourself to the supervisory authority and in a court of law. This is a recipe for financial disaster.
  4. Take this opportunity to modernize your approach to data protection, rethink your data retention requirements, and get rid of all those tapes. This is likely the most painful course of action, at least in the short term, but will yield significant savings in the future and enable much easier compliance with GDPR and other data-centric regulations.
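
To make option 2 above more concrete, here is a minimal Python sketch of the run-book check that would run between restoring a backup and putting the data back into service. Everything in it is an assumption for illustration only: the `ErasureRequest` record, the `scrub_restored_records` function, the `customer_id` field, and the sample data are hypothetical and do not come from any particular backup product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ErasureRequest:
    """One run-book entry (hypothetical schema): who asked to be forgotten, and when."""
    subject_id: str
    requested_on: date


def scrub_restored_records(records, runbook, id_field="customer_id"):
    """Split restored records into those to keep and those to purge.

    A record is purged when its subject ID appears in the run-book.
    The field name `customer_id` is an assumption; use your own key.
    """
    flagged_ids = {request.subject_id for request in runbook}
    kept, purged = [], []
    for record in records:
        if record.get(id_field) in flagged_ids:
            purged.append(record)
        else:
            kept.append(record)
    return kept, purged


if __name__ == "__main__":
    # Illustrative data only: one erasure request, two restored records.
    runbook = [ErasureRequest("C-1002", date(2018, 6, 1))]
    restored = [
        {"customer_id": "C-1001", "name": "Alice"},
        {"customer_id": "C-1002", "name": "Bob"},
    ]
    kept, purged = scrub_restored_records(restored, runbook)
    print(f"kept {len(kept)} record(s), purged {len(purged)} record(s)")
```

Returning the purged records separately, rather than silently dropping them, gives you something to log as evidence that the erasure was re-applied after the restore. Note that the run-book itself must hold only the minimum identifier needed to match records, since it is also a store of personal data.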


If you are interested in exploring option 4 above, Hitachi Vantara offers modern data protection solutions for both structured and unstructured data types.


For high-performance, highly-available database environments, Hitachi Data Instance Director uses storage-based snapshot and replication technologies to create fast and frequent copies that are application-consistent, non-disruptive to the production environment, and offer almost instant recovery when needed.


For file and user data, the Hitachi Content Platform family provides a highly-scalable object storage solution that does not require backup, and includes advanced indexing and search capabilities to easily handle data erasure requests.


Rich Vining is a Sr. Product Marketing Manager for Data Protection Solutions at Hitachi Vantara and has been publishing his thoughts on data storage and data management since the mid-1990s. The contents of this blog are his own.