The Changing Demands of Data Require DataOps

By Hubert Yoshida posted 07-08-2019 00:00


At a high level, DataOps may be described as the process of getting the right data to the right place at the right time.


According to Wikipedia, this process includes the following:

  • Validation – ensuring that supplied data is correct and relevant.
  • Sorting – "arranging items in some sequence and/or in different sets."
  • Summarization – reducing detailed data to its main points.
  • Aggregation – combining multiple pieces of data.
  • Analysis – the "collection, organization, analysis, interpretation and presentation of data."
  • Reporting – listing detail or summary data or computed information.
  • Classification – separating data into various categories.
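A few of these steps can be sketched in plain Python. This is a minimal, hypothetical example (the records and thresholds are invented for illustration) showing validation, sorting, aggregation, and classification over a small set of sales records:

```python
# Hypothetical sales records used to illustrate a few DataOps steps.
records = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": -5.0},   # invalid: negative amount
    {"region": "west", "amount": 200.0},
]

# Validation: keep only records whose fields are correct and relevant.
valid = [r for r in records if r["amount"] >= 0]

# Sorting: arrange items in some sequence (largest amount first).
ordered = sorted(valid, key=lambda r: r["amount"], reverse=True)

# Aggregation: combine multiple pieces of data per region.
totals = {}
for r in valid:
    totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]

# Classification: separate the aggregates into categories by size.
size = {region: ("large" if t >= 150 else "small")
        for region, t in totals.items()}

print(totals)  # {'east': 120.0, 'west': 280.0}
print(size)    # {'east': 'small', 'west': 'large'}
```

Real pipelines would of course do this with a platform or library rather than hand-written loops, but the shape of the work is the same.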

While one could expand the definition of these terms, there are a few processes that are being driven by new demands in the marketplace.

Big data is driving the demand for data curation. Data curation involves all the processes needed for data creation, maintenance, and management, together with the capacity to add value to data through the addition of metadata. Data curation also ensures that data can be reliably retrieved for future research purposes or reuse.
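One way to picture adding value through metadata is to wrap a dataset with a descriptive envelope at curation time. The sketch below is hypothetical (the `curate` function and its fields are invented for illustration), but it shows the idea of recording source, description, and creation time alongside the data so it can be reliably found and reused later:

```python
import datetime
import json

def curate(data, source, description):
    """Wrap raw data with descriptive metadata for later retrieval."""
    return {
        "metadata": {
            "source": source,
            "description": description,
            "created": datetime.datetime.utcnow().isoformat(),
            "record_count": len(data),
        },
        "data": data,
    }

dataset = curate([{"id": 1}, {"id": 2}],
                 source="sensor-A",
                 description="raw temperature readings")
print(json.dumps(dataset["metadata"], indent=2))
```

A catalog or object store would index on the metadata envelope, which is what makes the underlying data discoverable years later.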

Data governance requires that data be immutable, that it can be forgotten on demand, and that users be notified when their data has been exposed. Data that is stored on electronic systems must be able to prove that it has not changed since it was ingested. Under certain conditions, users may require that all their data be deleted or forgotten. If a data breach occurs, the stewards of that data must notify all affected users of the breach within a certain time period.
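The "prove it has not changed since ingest" requirement is typically met with cryptographic digests. As a minimal sketch (not any particular product's mechanism), a system records a SHA-256 hash at ingest time and re-computes it later; any change to the bytes produces a different hash:

```python
import hashlib

def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical record ingested into the system.
record = b"patient-42,2019-07-08,normal"
ingest_hash = digest(record)  # stored separately at ingest time

# Later audit: the stored data still matches its ingest-time digest.
assert digest(record) == ingest_hash

# Any tampering, however small, is detected.
assert digest(record + b"!") != ingest_hash
```

In practice the digests themselves must be protected (e.g. stored on WORM media or signed), since a proof is only as trustworthy as the reference hash.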

Video analytics requires a tremendous number of visual data points, including LIDAR for remote sensing, to distinguish whether an object in the road is a plastic bag blowing in the wind or a person. There are also privacy requirements for pixelation of certain parts of an object.
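Pixelation for privacy is conceptually simple: replace each block of pixels in a region with its average value, destroying fine detail. A toy sketch on a tiny grayscale image (represented as a list of rows of brightness values; a real system would operate on video frames with an imaging library):

```python
def pixelate(img, block=2):
    """Replace each block x block tile of a grayscale image with its average."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Gather the cells in this tile (clipped at the image edge).
            cells = [img[j][i]
                     for j in range(y, min(y + block, h))
                     for i in range(x, min(x + block, w))]
            avg = sum(cells) // len(cells)
            # Overwrite the tile with its average brightness.
            for j in range(y, min(y + block, h)):
                for i in range(x, min(x + block, w)):
                    out[j][i] = avg
    return out

face = [[10, 20, 30, 40],
        [50, 60, 70, 80]]
print(pixelate(face))  # [[35, 35, 55, 55], [35, 35, 55, 55]]
```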

IoT data often requires a fourth dimension: time. Time-series data tells you what occurred when, so that you can correlate it with other events that may have led to an outage. IoT data may also arrive as streams, requiring algorithms to compensate for time gaps.
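One common way to compensate for time gaps is forward-filling: when a sample interval is missing, carry the last known reading forward. A minimal sketch with invented sensor data (real systems would use a time-series database or a library such as pandas):

```python
# Hypothetical sensor readings: seconds since start -> temperature.
# The samples at t=20 and t=30 were lost in transit.
readings = {0: 21.5, 10: 21.7, 40: 22.3}

def forward_fill(readings, interval=10, end=40):
    """Fill missing sample intervals with the last known value."""
    filled, last = {}, None
    for t in range(0, end + 1, interval):
        last = readings.get(t, last)  # reuse the last reading in a gap
        filled[t] = last
    return filled

print(forward_fill(readings))
# {0: 21.5, 10: 21.7, 20: 21.7, 30: 21.7, 40: 22.3}
```

Forward-filling is only one choice; interpolation or explicit null markers may be more appropriate when gaps are long, since a carried-forward value can mask the very outage you are trying to detect.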

For many reasons, the provenance of data becomes important. Where did the data originate, and what has happened to the data over its lifetime? This is where blockchain could be used.
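The blockchain idea, reduced to its essence, is a hash chain: each lifecycle event is hashed together with the hash of the previous event, so altering any past event invalidates every hash after it. A hypothetical sketch (a real deployment would use a distributed ledger, not a local list):

```python
import hashlib

def chain(events):
    """Build a hash chain over a dataset's lifecycle events."""
    prev, links = "0" * 64, []
    for event in events:
        h = hashlib.sha256((prev + event).encode()).hexdigest()
        links.append({"event": event, "hash": h})
        prev = h  # each link depends on everything before it
    return links

history = chain(["ingested from sensor-A", "cleansed", "aggregated"])

# Re-deriving the same events reproduces the same chain...
assert chain(["ingested from sensor-A", "cleansed", "aggregated"]) == history

# ...while tampering with an earlier event changes every later hash.
assert chain(["tampered", "cleansed", "aggregated"]) != history
```

What a shared ledger adds on top of this is that no single party holds the only copy, so no single party can silently rewrite the history.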

All these processes come under DataOps: getting the right data to the right place at the right time. There is a lot involved, so a key requirement is to automate as much of this as you can and use a common platform like Pentaho, HCP, and Lumada. There are also solution services that can help, like Hitachi Vantara’s Insight services and REAN Cloud services.