This blog series examines the benefits of aligning data protection processes to business requirements, rather than asking the business to accept the limitations of the current backup and recovery infrastructure. The previous installment covered the challenges that this approach can help overcome. In this post, I look at operational recovery.
Operational recovery is the ability to locally restore data following an event such as a lost file or email, an application error, or a failed disk volume or system. The technology used to accomplish operational recovery depends on the service level objectives of the application or data, as described in my earlier post.
The service level objectives can be defined in a number of ways, each measured in time:
- The backup window is the maximum amount of time that a given backup or copy operation should take. Put another way, it is how long the organization is willing to pause access to the data while it is backed up. Traditionally, this may be several hours each night for an incremental backup, or longer periods on weekends for a full backup. But as businesses become more global and interconnected, that much downtime is unacceptable for many applications; the backup window goal for critical or important applications may be zero.
- Recovery point objective (RPO) defines how far apart the points in time are from which data can be restored, which comes down to how often the backup is performed. A nightly backup results in a 24-hour RPO, meaning that up to 24 hours of your most recently created data is at risk of loss. That might be fine for standard data, but probably not for important and critical data, which may require far more frequent protection.
- The recovery time objective (RTO) is the amount of time in which a system, application or process must be restored following an outage. This measure could include the time to troubleshoot the problem, apply a fix, restart and test. It is very common to have a different RTO for each failure type. For example, you may have an RTO of 2 days following a major disaster, 30 minutes to restore a single file or email, or less than a minute to restore a business-critical application.
- Retention defines how long the copy of the data object needs to be stored, and in which manner. This service-level objective (SLO) can be applied for point-in-time recovery purposes, as in how long to keep a backup set, or for longer-term requirements which may be set by government regulations and corporate governance mandates. The other side of retention is expiration, which specifies if, and when, to delete the copy data, and whether to do so in a manner that prevents its future discovery.
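The four objectives above can be thought of as a small policy record that a schedule is checked against. The sketch below is purely illustrative (the class and function names are my own, not part of any product); it models the SLOs as durations and tests whether a backup interval satisfies a given RPO.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class ServiceLevelObjectives:
    """Illustrative SLO record; the field names are hypothetical."""
    backup_window: timedelta   # max time a copy operation may take (zero = no pause allowed)
    rpo: timedelta             # max acceptable window of data loss
    rto: timedelta             # max acceptable time to restore service
    retention: timedelta       # how long copies must be kept

def meets_rpo(backup_interval: timedelta, slo: ServiceLevelObjectives) -> bool:
    # A schedule meets the RPO only if backups run at least as often
    # as the objective allows data to be lost.
    return backup_interval <= slo.rpo

# A critical application, per the examples in the text: zero backup
# window, sub-minute RTO, and a tight RPO.
critical = ServiceLevelObjectives(
    backup_window=timedelta(0),
    rpo=timedelta(minutes=15),
    rto=timedelta(minutes=1),
    retention=timedelta(days=365),
)

print(meets_rpo(timedelta(hours=24), critical))   # nightly backup -> False
print(meets_rpo(timedelta(minutes=5), critical))  # near-continuous -> True
```

As the output suggests, a traditional nightly schedule cannot satisfy a 15-minute RPO, which is exactly why tighter objectives push organizations toward the technologies discussed next.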
Operational Recovery Technology Choices
All approaches to operational recovery involve creating a point-in-time copy of the data to be protected and providing a method to restore it on demand. The differences in technology choices are in their service level capabilities, as noted above, as well as in the granularity of the restore (a single email or an entire mailbox), and of course, the cost.
The available choices can be broken down into these categories:
- Batch Backup (full, differential or incremental) is the traditional model that has been around since the beginning of computing. Batch backup is most appropriate for standard data, such as employee files, sales and marketing collateral, and other non-essential data. It usually cannot be run fast enough, or frequently enough, to meet today’s tighter service level objectives, especially for critical data and large database files.
- Live Backup is similar to an incremental batch backup in the way it copies data, but it is able to do so without stopping the application or file system. This results in a less disruptive backup window.
- Continuous Data Protection, or CDP, captures and copies every change that is stored to disk in real, or near-real, time. This completely eliminates the need for a backup window and drives RPO to almost zero. However, since every change, even an interim change, is captured, CDP can consume much more storage. It is best used as a near-term operational recovery solution for critical and important data.
- Snapshots quickly capture changes in a data set using advanced pointer-based storage techniques. By themselves, snapshots do not offer application consistency, and hardware-based snapshots are stored on the same system as the production data, so they do not offer protection from a system-level failure.
- Cloud-based Backup, or Backup-as-a-Service, may include any of the techniques noted above, but uses a third-party service to host the protected data, typically priced per unit of capacity per month. This model can provide significant cost savings, but it can also deliver less-than-ideal performance and raises questions about data security and long-term availability.
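Because the categories above differ chiefly in their service level capabilities, matching a workload to a technology is essentially a decision table over its objectives. The helper below is a rough sketch of that mapping; the thresholds and category strings are my own illustrative assumptions, not rules from any product or standard.

```python
from datetime import timedelta

def choose_technology(rpo: timedelta, backup_window: timedelta) -> str:
    """Hypothetical decision helper: pick an operational-recovery
    category from RPO and backup-window tolerance. Thresholds are
    illustrative assumptions only."""
    if rpo <= timedelta(minutes=1):
        return "CDP"                        # near-zero RPO, no backup window
    if backup_window == timedelta(0):
        return "snapshots or live backup"   # no pause allowed, modest RPO
    if rpo <= timedelta(hours=1):
        return "snapshots"                  # frequent points in time, low overhead
    return "batch backup"                   # standard data on a nightly schedule

print(choose_technology(timedelta(seconds=30), timedelta(0)))      # CDP
print(choose_technology(timedelta(hours=24), timedelta(hours=4)))  # batch backup
```

A real deployment would also weigh restore granularity, retention, and cost, as noted earlier; the point of the sketch is simply that different data classes legitimately land on different technologies.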
Your data protection solution for operational recovery may combine several of these technologies, each solving an individual requirement. However, as noted in the previous post, you really don't want to have to manage five different protection tools. Hitachi Data Systems today announced a solution to this dilemma.
Hitachi Data Instance Director (HDID) provides a unified platform for managing backup, CDP, snapshots, replication, archiving and more. Its unique whiteboard-style user interface makes it easy to create and manage sophisticated policy-based workflows that combine these protection technologies.
For VMware vSphere environments, HDS also offers Virtual Infrastructure Integrator, which provides storage-based operational recovery from within the standard vCenter interface.
More to come on business-defined data protection, focusing on disaster recovery, in my next installment.