Metadata: The Key To Managing The Zettabyte Apocalypse

By Hubert Yoshida posted 08-19-2020 18:57


Jon Toigo blogged about the Z-pocalypse in his post Are We Trending Toward Disaster? Back in 2016 he posed the question: Will stage 6 be the “zettabyte apocalypse” – the cataclysmic disaster created by too much data and not enough storage for it all? While the storage industry has been doing its best to make storage larger, cheaper and faster, there is no way it can match the zettabyte growth of storage demand. The solution lies in making storage systems smarter. The next generation of intelligent storage will need much more information about the data in order to enable the smooth integration of storage resource management, storage service management and data management processes. This information about the data, or metadata, remains with the data throughout its lifecycle and provides the hooks necessary to classify the data’s contents, establish context and enable the highly granular data management automation and lifecycle controls that are missing in most traditional storage architectures.

In 2016 there were 16 zettabytes of data in the world. This year, more than 59 zettabytes (ZB) of data will be created, captured, copied and consumed, according to a new update to the Global DataSphere from International Data Corporation (IDC): “The COVID-19 pandemic is contributing to this figure by causing an abrupt increase in the number of work from home employees and changing the mix of data being created to a richer set of data that includes video communication and a tangible increase in the consumption of downloaded and streamed video.” The movement to IoT and the growth of operational data are adding to the increase in unstructured data.

Hitachi Vantara has commissioned 451 Research to provide a Pathfinder Report on metadata.

In this Pathfinder Report, 451 Research explains the value of next-generation storage systems that use the power of metadata to better identify, utilize, protect and control unstructured business data. Tackling infrastructure costs is a good strategy for managing the rapid growth of enterprise data, but the bigger challenges and value lie in making the best use of business data throughout its entire life. The traditional model of “save everything just in case” is becoming impractical and costly, especially because it fails to consider that not all data is created equal.

Key Findings:

  • Unstructured data is becoming the new mission-critical data. Documents and digital media files represent a substantial percentage of the business data being generated today, and the need to protect unstructured data can be directly tied to the overwhelming growth of data storage costs.
  • Legal and industry compliance issues are driving the need for data awareness. Healthcare, financial services and other heavily regulated sectors require ready access to all forms of data – unstructured or not – and availability can mean the difference between financial success and failure, or perhaps even life and death.
  • Metadata-based storage and indexing is the key to long-term unstructured data management. Metadata ‘sticky notes’ with no indexing provide marginal long-term value. Visionary solutions will offer the tools to identify, categorize and search stored data, and help automate and control its movement throughout its lifecycle.
  • Capturing useful metadata is a major challenge. The best time to collect metadata is at the time of data creation, but there is no common mechanism at the OS/storage level for metadata creation. Until metadata gathering is enforced at data creation, the big challenge will be generating metadata after the fact, which requires systems with search and cataloging abilities that can address text, sensor and digital media files.
  • Best practices for business metadata generation are poorly defined. Unstructured data storage should contain a common and extensible set of basic metadata fields that enable policy-based management regardless of where the data physically resides or what business environment it serves.
  • Not all object-based cloud storage platforms are alike. Both private and public cloud services from vendors such as Amazon, Microsoft, IBM and Google vary substantially in their feature sets, metadata environments and performance tiers, making movement between providers a challenge. Hybrid cloud customers should have the flexibility to utilize all hybrid cloud storage options based on the combination of cost, performance, resilience and availability that best suits business needs.
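The “common and extensible set of basic metadata fields” called for above can be pictured as a small schema that travels with each object and drives policy decisions wherever the data resides. The following is a minimal sketch in Python; every field name and policy rule here is an illustrative assumption, not a standard or a product feature:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical baseline metadata fields; a real deployment would extend
# this set to match its own compliance and business requirements.
@dataclass
class ObjectMetadata:
    object_id: str
    content_type: str          # e.g. "document", "video", "sensor-log"
    created: datetime
    owner: str
    retention_days: int        # how long policy requires the data be kept
    classification: str = "internal"        # e.g. "public", "regulated"
    custom: dict = field(default_factory=dict)  # extensible, app-specific fields

def retention_expired(meta: ObjectMetadata, now: datetime) -> bool:
    """Policy check that works anywhere the metadata travels with the data."""
    return now > meta.created + timedelta(days=meta.retention_days)

meta = ObjectMetadata(
    object_id="scan-0001",
    content_type="document",
    created=datetime(2015, 1, 1),
    owner="radiology",
    retention_days=365 * 7,    # e.g. a seven-year regulatory hold
    classification="regulated",
)
print(retention_expired(meta, datetime(2020, 8, 19)))  # → False: still under hold
```

Because the policy logic reads only the metadata, the same retention or placement decision can be made regardless of which tier or cloud currently holds the object.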

Metadata provides the tools for organization, automation, policy management and visibility, and it is the key to managing data growth in a zettabyte world. Hitachi Content Platform (HCP) is recognized by analysts as having a robust metadata architecture coupled with intelligent policy-based management. HCP-based solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored across a variety of storage tiers. HCP also supports custom metadata and can store multiple annotations of that metadata for more advanced data management and analytics.
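The single-object model described above — file data, file metadata and multiple named custom-metadata annotations tracked as one unit — can be illustrated generically. This sketch is not the HCP interface; the class, keys and annotation names are hypothetical:

```python
# Generic sketch of the single-object model: data, system metadata and
# multiple named custom-metadata annotations travel together as one unit.
class StoredObject:
    def __init__(self, key: str, data: bytes, system_meta: dict):
        self.key = key
        self.data = data
        self.system_meta = system_meta   # size, hash, timestamps, tier, ...
        self.annotations = {}            # annotation name -> metadata document

    def annotate(self, name: str, doc: dict) -> None:
        """Attach or replace one named annotation (e.g. from one application)."""
        self.annotations[name] = doc

obj = StoredObject(
    key="patients/scan-0001.dcm",
    data=b"...",
    system_meta={"size": 3, "tier": "flash"},
)
# Different applications add their own annotations without touching the data,
# so analytics and policy engines can consume each view independently.
obj.annotate("clinical", {"modality": "MRI", "study": "neuro"})
obj.annotate("billing", {"code": "70551", "insurer": "acme"})

print(sorted(obj.annotations))  # → ['billing', 'clinical']
```

Keeping the annotations bound to the object, rather than in a separate database, is what lets the metadata follow the data as it moves between tiers or clouds.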

If your conception of object storage is that it is great for archive but too slow for tier 1 applications due to the overhead of metadata creation and management, the reality is that – with today’s processing power, memory, flash storage, NVMe and high-speed networking – object storage can accommodate a growing number of tier 1 applications. To further increase tier 1 performance, HCP’s latest performance enhancement comes through an OEM partnership with WekaIO, which provides a fast, efficient and resilient distributed parallel file system that is cloud native and delivers the performance of all-flash arrays, the simplicity of file storage and the scalability of the cloud.