How to Find Valuable Nuggets in the Data Gold Rush

By Lothar Schubert posted 02-01-2020 18:19


 In the world of enterprise technology, an evolution is taking place that’s being driven by DataOps. DataOps momentum is growing fast and enthusiasm is boiling over.

The excitement is palpable and understandable, but businesses must fight the urge to rush headlong into the hype. Pause and take time to understand what DataOps means in the context of your business and how it can be gradually and systematically adopted.

In the end, the companies that strike “DataOps gold” won’t be those that head off to gold country but rather the businesses that embrace a more methodical approach focused on identifying smaller nuggets, those projects that can have a large impact and over time will gradually transform their businesses.

At Hitachi Vantara, we know DataOps represents the synthesis of what has worked well in enterprise software over the past few decades, and we’ve learned a lot. First and foremost, when it comes to gaining transformative value from data, most victories come from the creation and productization of data supply chains. This is true of ERP systems, CRM, social media, digital advertising and others.

We’ve also learned that the days when data supply chains were created by one-off custom engineering projects are long gone. Today we’re operating in a world where there are a multitude of powerful tools on the table. This includes data lakes, data discovery tools, special-purpose data repositories of many types, advanced analytics, artificial intelligence (AI) and machine learning (ML), and innovations in edge computing and storage. These new tools have created incredibly powerful services that have been supercharged by the cloud.

In this new world, data landscapes and services are being described and controlled by metadata, which lays the groundwork for policy-based automation, risk management, and orchestration at a new scale. Enter DataOps, which is essentially a new meta-design pattern for combining these elements into data supply chains that are much easier to create, maintain, and automate. DataOps takes the lessons learned in DevOps—breaking silos and expanding automation—and extends them to encompass all data in an enterprise.

Like DevOps, DataOps isn’t a new product category. It’s an emerging discipline or, as we say at Hitachi Vantara, a methodology that combines new data management architectures, policies, and collaborative best practices. And as with many emerging concepts, vendors and internal departments are all claiming DataOps leadership.

For this very reason, vendors must explain what they mean when talking about DataOps in order to help businesses find the right fit. At Hitachi Vantara we did just that in a new white paper titled, “Is DataOps Your Windfall of Value?” This paper breaks DataOps down into three layers:



      DataOps for analytics
      Governance and measurement
      Operations agility

 Here’s a quick summary of each of these layers.



DataOps for analytics creates agile data pipelines that streamline the process of getting data into the hands of end users who want to solve business challenges. Now, instead of having data “pushed” at them by data gatekeepers, end users can “pull” data from wherever it is located, which ultimately restructures the entire data supply chain.

As a result, we have more agile data pipelines. However, these pipelines are not focused solely on providing data for dashboards and analytics. Instead, we have different pipelines for everything from AI/ML models to applications to support for automation.

Because of this, DataOps requires a more searchable and automated data management infrastructure than most companies have today, one that helps users locate and transform data to feed these agile data pipelines.

With DataOps, the amount of data and the demand for it will increase, as will the number of users looking to tap into it. If that’s not challenging enough, these users will also be looking for different types of data pipelines, with some relying on real-time data and others operating with batch processes that run less frequently. Because of this, DataOps must ensure that all of these pipelines can be easily maintained and supported, so the business can tap into the full value of its data and make each piece more useful to the organization.


Governance and Measurement

There is no doubt that DevOps addressed governance challenges at some level. However, those challenges pale in comparison with the ones DataOps must address. DataOps is far more complex and will require a larger number of individuals and units within an organization to work together to get results. Basically, the entire organization, from data experts, IT, operations, data analysts and data scientists to end users, customers, business partners and others, must work together to achieve DataOps success.

Why, you ask? DataOps involves large and diverse datasets originating from varying sources with different characteristics. This makes controlling usage and documenting compliance with regulations far more challenging. Just think about it. Any piece of data that contains identifiable information must be regulated and its usage tracked, while at the same time its confidentiality or sensitivity is protected.

Therefore, the automation that DataOps provides also requires rigorous policy-based governance, the kind capable of monitoring and limiting access and also encrypting or anonymizing data using any tactics deemed necessary. This automation must be applied to governance processes and operations like search, archive, retention management, legal holds, GDPR compliance, and more to handle the increased complexity of controlling the landscape and satisfying regulatory requirements. Data catalogs play an increasingly important role here.
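To make the idea of policy-based governance a little more concrete, here is a minimal Python sketch. It is purely illustrative and not how any Hitachi Vantara product works: the FieldPolicy class, the role names, and the sample record are all hypothetical. It shows per-field policies limiting access by role and masking an identifiable field with a salted hash, which is pseudonymization rather than full anonymization.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldPolicy:
    """Hypothetical per-field rule: who may read it, and whether to mask it."""
    allowed_roles: frozenset
    pseudonymize: bool = False


def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    # One-way salted hash: records stay joinable without exposing the raw value.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def apply_policies(record: dict, policies: dict, role: str) -> dict:
    """Return the view of `record` that `role` is allowed to see."""
    view = {}
    for name, value in record.items():
        policy = policies.get(name)
        if policy is None or role not in policy.allowed_roles:
            continue  # default deny: fields without a policy, or not permitted, are dropped
        view[name] = pseudonymize(str(value)) if policy.pseudonymize else value
    return view


# Illustrative policies and data (assumptions, not taken from the article).
POLICIES = {
    "customer_email": FieldPolicy(frozenset({"analyst", "support"}), pseudonymize=True),
    "order_total": FieldPolicy(frozenset({"analyst"})),
}

record = {"customer_email": "ada@example.com", "order_total": 42.5, "notes": "call back"}
analyst_view = apply_policies(record, POLICIES, "analyst")
support_view = apply_policies(record, POLICIES, "support")
```

In this sketch the analyst sees a masked email plus the order total, the support role sees only the masked email, and the untagged notes field is withheld from everyone. The default-deny choice is deliberate: data that has not yet been cataloged stays out of reach until someone attaches a policy to it.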


Operations Agility

Just as DataOps allows businesses to create and manage dramatically more agile data pipelines, it also necessitates more agile operations environments that can support the creation, implementation and management of the data flowing through the business.

Remember now, unlike in the past, the goal of modern data management and DataOps is not to reflexively centralize data. Quite the opposite. DataOps allows it to be distributed to every corner of a business while simultaneously managing the data to optimize it for business access, cost and compliance wherever it resides.

The good news is that many current repositories and storage mechanisms can support multicloud infrastructures that allow data to be accessed from IoT devices, branch offices, edge computing, on-premises data centers, co-location facilities, SaaS applications and public clouds. However, this expansive infrastructure will require the operation of data pipelines and data storage to become policy-based and automated, which is possible through a greater reliance on metadata.

A metadata-driven data storage strategy delivers details on the contents of each dataset and related data management policies (such as retention, encryption, governance, etc.) as well as the location and risk profile of each piece of data. As a result, metadata delivers on two levels. It improves governance while also increasing automation for all DataOps infrastructure by ensuring that properly tagged data is managed and delivered in the right way.
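As a rough illustration of how metadata can drive automation, consider the Python sketch below. Everything in it is an assumption made for the example: the DatasetMetadata fields, the risk labels, the location names, and the three rules are hypothetical and much simpler than a real policy engine. The point is only that once each dataset carries tags for retention, encryption, location and risk, management actions can be derived from the tags instead of being hand-coded per dataset.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record attached to each dataset."""
    name: str
    location: str        # e.g. "edge", "on-prem", "public-cloud"
    risk: str            # "low" or "high"
    retention_days: int  # how long the data must stay readily available
    encrypted: bool
    created: date


def plan_actions(meta: DatasetMetadata, today: date) -> list:
    """Derive management actions purely from the dataset's metadata."""
    actions = []
    if meta.risk == "high" and not meta.encrypted:
        actions.append("encrypt")
    if today > meta.created + timedelta(days=meta.retention_days):
        actions.append("archive")
    if meta.risk == "high" and meta.location == "public-cloud":
        actions.append("review-placement")
    return actions


# Illustrative datasets (made up for this example).
sensor_logs = DatasetMetadata(
    name="sensor_logs", location="edge", risk="high",
    retention_days=365, encrypted=False, created=date(2019, 1, 1),
)
sales_summary = DatasetMetadata(
    name="sales_summary", location="on-prem", risk="low",
    retention_days=3650, encrypted=True, created=date(2019, 6, 1),
)

today = date(2020, 2, 1)
sensor_plan = plan_actions(sensor_logs, today)
sales_plan = plan_actions(sales_summary, today)
```

Here the high-risk, unencrypted sensor data that has passed its retention window is flagged for encryption and archiving, while the low-risk sales summary needs no action. Changing a policy means changing a rule or a tag, not rewriting a pipeline.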

Now, success here requires new advances in automation and in the flexibility of configuration. It also requires that data management be led by metadata-driven policies that allow configuration to occur without undue complexity or copious amounts of code. I’ll be honest: these advances are not easy to achieve, but once they are in place, the true power of DataOps can be unleashed.

At Hitachi Vantara, we believe the concepts I’ve touched on here are what it takes to bring your DataOps goldmine to life. Some of the nuggets will likely come from a victory in one of these layers while the larger successes will occur when all three come to life and work together using pervasive automation to access data, transform it and deliver it to the right person where it matters most.

At Hitachi Vantara we call this Your DataOps Advantage. And we enable this with Lumada. Please check out our DataOps Resource Center to learn more.