Hu Yoshida

DataOps and Hitachi Vantara

Blog post created by Hu Yoshida on Apr 11, 2019

According to the Harvard Business Review, "Cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all. More than 70% of employees have access to data they should not, and 80% of analysts’ time is spent simply discovering and preparing data. Data breaches are common, rogue data sets propagate in silos, and companies’ data technology often isn’t up to the demands put on it." That was in a report back in 2017. What has changed since then?

 

Few Data Management Frameworks are Business Focused

Data management has been around since the beginning of IT, and a lot of technology has focused on big data deployments, governance, best practices, and tools. However, the large data hubs built over the last 25 years (e.g., data warehouses, master data management, data lakes, Hadoop, Salesforce, and ERP) have resulted in more data silos that are not easily understood, related, or shared. Few, if any, data management frameworks are business focused. A business-focused framework would not only promote efficient use of data and allocation of resources, but also curate the data so that its meaning, and the technologies applied to it, are understood, and data engineers can move and transform the essential data that data consumers need.

 

Introducing DataOps

Today more customers are focusing on the operational aspects of data rather than on the fundamentals of capturing, storing, and protecting data. Following the success of DevOps (a set of practices that automates the processes between software development and IT teams so that they can build, test, and release software faster and more reliably), companies are now focusing on DataOps. DataOps is best described by Andy Palmer, who coined the term in 2015: “The framework of tools and culture that allow data engineering organizations to deliver rapid, comprehensive and curated data to their users … [it] is the intersection of data engineering, data integration, data quality and data security. Fundamentally, DataOps is an umbrella term that attempts to unify all the roles and responsibilities in the data engineering domain by applying collaborative techniques to a team. Its mission is to deliver data by aligning the burden of testing together with various integration and deployment tasks.”
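To make the idea of pairing testing with delivery a little more concrete, here is a minimal sketch in Python of a data quality gate: a batch of records is only "published" if every check passes. This is an illustration only, not a Hitachi Vantara or Pentaho API. The names `Check`, `run_checks`, `publish_if_clean`, and the sample records and rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: a plain dict with "customer_id" and "amount".
Record = Dict[str, object]


@dataclass
class Check:
    """A named data quality rule applied to one record at a time."""
    name: str
    passes: Callable[[Record], bool]


def run_checks(records: List[Record], checks: List[Check]) -> Dict[str, int]:
    """Return a mapping of check name to the number of failing records."""
    failures = {check.name: 0 for check in checks}
    for record in records:
        for check in checks:
            if not check.passes(record):
                failures[check.name] += 1
    return failures


def publish_if_clean(
    records: List[Record], checks: List[Check]
) -> Tuple[bool, Dict[str, int]]:
    """Gate the delivery step: report success only if no check failed."""
    failures = run_checks(records, checks)
    is_clean = not any(failures.values())
    return is_clean, failures


# Illustrative sample data and rules (assumptions, not from the post).
SAMPLE = [
    {"customer_id": "c-1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},      # missing customer id
    {"customer_id": "c-3", "amount": -2.0},  # negative amount
]

CHECKS = [
    Check("has_customer_id", lambda r: bool(r.get("customer_id"))),
    Check("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
]

if __name__ == "__main__":
    ok, failures = publish_if_clean(SAMPLE, CHECKS)
    print("publish" if ok else "hold back", failures)
```

In a real pipeline the gate would sit between the integration step and the deployment step, so that data consumers only ever see batches that passed the agreed checks.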

 

At Hitachi Vantara we have been applying our technologies to DataOps in four areas: Hitachi Content Platform, Pentaho, Enterprise IT Infrastructure, and REAN Cloud.

 

  • HCP: Object storage for unstructured data through our Hitachi Content Platform and Hitachi Content Intelligence software. Object storage with rich metadata, together with content intelligence, data integration, and analytics orchestration tools, enables business executives to identify data sources, data quality issues, types of analysis, and the new work practices needed to use those insights.

HCP DataOps.png

 

  • Pentaho: Pentaho streamlines the entire machine learning workflow and enables teams of data scientists, engineers and analysts to train, tune, test and deploy predictive models.

Pentaho DataOps.png

  • IT Infrastructure: Secure enterprise IT infrastructure that extends from edge to core to cloud, based on REST APIs for easy integration with third-party vendors. This gives us the opportunity not only to connect with other vendors’ management stacks, like ServiceNow, but also to apply analytics and machine learning and to automate the deployment of resources through REST APIs.

 

IT Data Ops.png

 

  • REAN Cloud: A cloud-agnostic managed services platform for DataOps in the cloud. It provides highly differentiated offerings to migrate applications to the cloud and to modernize them to take advantage of cloud offerings for data warehouse modernization, predictive agile analytics, and real-time IoT. REAN Cloud also provides ongoing managed services.

REAN Data Ops.png

Summary

  • Big Data systems are becoming a center of gravity in terms of storage, access and operations.
  • Businesses are looking to DataOps to speed up the process of turning data into business outcomes.
  • DataOps is needed to understand the meaning of the data as well as the technologies that are applied to the data so that data engineers can move, automate and transform the essential data that data consumers need.
  • Hitachi Vantara provides DataOps tools and platforms through:
    • Hitachi Content Platform
    • Pentaho data integration and analytics orchestration
    • Infrastructure analytics and automation
    • REAN Cloud migration, modernization, and managed services

 

Blak Hole Pic.png

Grad student Katie Bouman uses DataOps to capture the first picture of a black hole.
