Hu Yoshida

Hitachi Content Platform Eliminates Backup For Unstructured Data

Blog Post created by Hu Yoshida Employee on Sep 29, 2016

Analysts tell us that 80% of our enterprise data is unstructured data and that percentage is being driven higher by mobile, social, IoT, analytics and cloud. That data is as important to your business as structured data and needs to be protected and retained for disaster recovery and compliance. When unstructured data was a small fraction of the total compared to structured data, we managed it as we managed structured data.

 

As unstructured data began to grow rapidly, the fundamental differences between structured and unstructured data began to impact the IT environment. The huge amounts of unstructured data led first to the adoption of lower-cost NAS filers, which were easy to install. However, it quickly became obvious that these NAS systems did not scale. The first few NAS systems were easy to manage, but as they proliferated it became difficult to manage and maintain the infrastructure and to provide the data governance, protection, and search that their use required.

 

Backup.jpg

Object storage can eliminate the scale limitations of NAS. An object is data (typically a file) bundled together with its metadata. Each object is given a unique ID and stored in a flat structure, which makes it possible to scale beyond the hierarchical limitations of a NAS file system. An application retrieves an object by presenting its object ID.
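
To make the idea concrete, here is a minimal sketch of a flat object store in Python. It is purely illustrative and is not HCP's API: the names FlatObjectStore, put, get, and find are hypothetical, and a real object store would persist and distribute its objects rather than hold them in memory.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Data bundled together with its metadata (hypothetical structure)."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Illustrative in-memory store: one flat namespace keyed by object ID."""

    def __init__(self):
        # No directories or hierarchy, just object ID -> object.
        self._objects = {}

    def put(self, data, metadata=None):
        """Store data with its metadata and return a newly generated unique ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Retrieve an object by presenting its ID."""
        return self._objects[object_id]

    def find(self, **criteria):
        """Return the IDs of all objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]
```

An application would call put() once, keep the returned ID, and later call get() with that ID. Searching by metadata, as in find(owner="hu"), never has to walk a directory tree.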

 

The power of an object store is in the metadata. The metadata can be kept locally or geographically separated for redundancy, and it can provide self-healing, enable automation and life cycle management through policies, and open up opportunities for analytics that could never be done before with structured data warehouses. Object storage is ideal for the burgeoning demand of unstructured data. One problem that still remains is the management and infrastructure needed to back up this data.

 

You have heard us recommend that you store your unstructured data on our Hitachi Content Platform (HCP) and eliminate backup. Some of you may have assumed that all object storage platforms can do this, while others have been skeptical, since their practice is to snapshot and back up unstructured data just as they do their structured data.

 

Let me set the record straight. Not all object storage systems can eliminate backup, and yes, you still need to snapshot and back up your unstructured data if you are not using HCP. The reason is file or object updates: if you update a stored object, you lose the original unless you have a copy of it in a snapshot or a backup.

 

This is not the case with HCP. HCP solves the backup problem by hashing every object to ensure that it is unique. If an update is made to the object, it is stored as a new version and the original is not overwritten unless you request it. HCP also stores the object in two places (or more if you like) so that you always have a copy, thus eliminating the need for backup. The hash is also checked when the object is accessed to ensure that it has not been corrupted in any way, and to prove immutability to auditors. HCP supports policy-based retention periods and can delete objects automatically or mark them as Write Once Read Many (WORM) for long-term retention.
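
The combination described above (a new version on every update, a stored hash, at least two copies, and a retention period) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HCP's implementation or API: the class VersionedStore and its methods are hypothetical, SHA-256 is an assumed choice of hash, and the copies are in-memory dictionaries standing in for separate storage locations.

```python
import hashlib
import time


class VersionedStore:
    """Toy model: versioned, hash-verified, multi-copy storage with retention."""

    def __init__(self, replicas=2):
        if replicas < 2:
            raise ValueError("at least two copies are required")
        # Each replica maps (name, version) -> data.
        self._replicas = [dict() for _ in range(replicas)]
        self._digests = {}       # name -> list of SHA-256 digests, one per version
        self._retain_until = {}  # name -> time before which deletion is refused

    def put(self, name, data, retain_seconds=0, now=None):
        """Store data as a NEW version; earlier versions are never overwritten."""
        now = time.time() if now is None else now
        history = self._digests.setdefault(name, [])
        history.append(hashlib.sha256(data).hexdigest())
        version = len(history)
        for replica in self._replicas:
            replica[(name, version)] = data
        # A later put can extend the retention period but never shorten it.
        self._retain_until[name] = max(
            self._retain_until.get(name, 0), now + retain_seconds
        )
        return version

    def get(self, name, version=None):
        """Return a version after checking its hash, repairing any bad copy."""
        history = self._digests[name]
        version = len(history) if version is None else version
        if not 1 <= version <= len(history):
            raise KeyError(f"{name!r} has no version {version}")
        expected = history[version - 1]
        key = (name, version)
        good = None
        for replica in self._replicas:
            data = replica.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                good = data
                break
        if good is None:
            raise IOError("every copy failed hash verification")
        # Self-healing: overwrite missing or corrupted copies with the good one.
        for replica in self._replicas:
            if replica.get(key) != good:
                replica[key] = good
        return good

    def delete(self, name, now=None):
        """Delete all versions, but only once the retention period has passed."""
        now = time.time() if now is None else now
        if now < self._retain_until.get(name, 0):
            raise PermissionError("object is still under retention")
        for version in range(1, len(self._digests.pop(name)) + 1):
            for replica in self._replicas:
                replica.pop((name, version), None)
        self._retain_until.pop(name, None)
```

In this sketch an update never destroys the original, so get(name, 1) still returns the first version after a second put. If one copy is damaged, the hash check catches it and the surviving copy is used to repair it, which is the property that makes a separate snapshot or backup copy unnecessary in the scenario described here.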

 

HCP eliminates the need for backup of unstructured data. It not only eliminates the need for snapshots and multiple generations of backup infrastructure, it can also work with, or even eliminate the need for, backup management software and backup administrators. Data can be loaded directly into HCP from the application and be available for use by other applications, subject to authorization and authentication procedures. In our own DevOps environment, builds and test results go directly into HCP. Other tools like Git, Wiki, and JIRA are easily backed up to HCP using Duplicity, a free software tool for backup orchestration. Everything in our DevOps operation is backed up on HCP and is searchable.

 

Gartner’s recent Critical Capabilities for Object Storage ranked HCP as the best for data protection.

Gartner HCP.jpg

 

For more information on HCP and data protection see this recent whitepaper.
