Last week Hitachi Vantara announced that it completed the acquisition of the assets of privately held Waterline Data, Inc., a provider of intelligent data cataloging solutions. Hitachi Vantara also introduced Lumada Data Catalog, incorporating Waterline's data cataloging technology into the Lumada portfolio to solve modern data challenges for analytics and governance across edge-to-core-to-cloud environments.
I would like to extend a warm welcome to their founder Alex Gorelik and the Waterline Data team as they join the Hitachi Vantara Family.
Today data management and big data are much more complicated than they were just a few years ago. The business world is much more demanding, diverse, dynamic and unstructured. The databases that used to hold several terabytes are now data lakes holding multiple petabytes. We are also finding that data that previously may have seemed unrelated can offer new insights when curated and analyzed together. We also have new governance requirements that restrict us from accessing the actual data in certain regions, yet that data can still offer valuable insight if we know its attributes. The whole process of business intelligence has become more agile with automated pipelines, low code, containers, and self-service. This is where the addition of an intelligent data catalog can help DataOps unlock the value of your data, delivering the right data to the right people at the right time.
This week we added the Lumada Data Catalog to our portfolio as a result of integrating Waterline Data’s technology into the Hitachi Vantara Lumada portfolio. Waterline Founder and CTO Alex Gorelik led the product integration. Alex published a blog post explaining the value of the Data Catalog and how the catalog complements and integrates with the products in the Lumada Data Services portfolio, including Pentaho, Lumada Edge Intelligence, Lumada Data Lake, Lumada Data Optimizer for Hadoop, and Hitachi Content Intelligence. A data catalog is a tool designed to help organizations find and manage large amounts of data, including tables, files and databases, in their enterprise data stores. Data catalogs centralize metadata in one location, providing a full view of each piece of data across databases, and contain information about the data’s location, profile, statistics, summaries and comments. A catalog also enables collaboration between the various data gatekeepers. This systematized service helps make data sources more discoverable and manageable for users and helps organizations make more informed decisions about how to use their data.
Data catalogs are a hot topic today, and the Waterline Data Catalog has unique advantages, which Alex describes in his blog. A key differentiator is Fingerprinting™.
“Waterline Data specifically provides a unique, patented technology called “data fingerprinting.” The basic idea is that data fields carry unique fingerprints, just like we all have our own fingerprints. Based on similarity between fingerprints for different fields, we can tell whether these fields contain the same information. Once a field is given a tag – a business term that describes its contents, Waterline uses AI and ML to automate the discovery and classification of other fields with similar fingerprints and suggest semantic tags for them. The analysts and data stewards can then accept or reject these suggestions, which trains the catalog to become more accurate in its classifications.”
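To make the idea in the quote concrete, here is a minimal Python sketch of fingerprint matching and tag suggestion. It is an illustration only and not Waterline's patented method: the fingerprint used here (the mix of value "shapes" in a field), the cosine similarity measure, the 0.8 threshold, and every function and field name are assumptions chosen for the example.

```python
from collections import Counter
import math
import re


def shape(value: str) -> str:
    """Reduce a value to a coarse pattern: every digit -> 9, every letter -> A."""
    s = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", s)


def fingerprint(values):
    """Hypothetical fingerprint: relative frequency of each value shape in a field."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints (1.0 means the same shape mix)."""
    dot = sum(w * fp_b.get(p, 0.0) for p, w in fp_a.items())
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(tagged, untagged, threshold=0.8):
    """For each untagged field, suggest the tag of the most similar tagged field."""
    suggestions = {}
    for name, values in untagged.items():
        fp = fingerprint(values)
        best_tag, best_score = None, 0.0
        for tag, tagged_values in tagged.items():
            score = similarity(fp, fingerprint(tagged_values))
            if score > best_score:
                best_tag, best_score = tag, score
        # Only suggest a tag when the best match clears the (assumed) threshold.
        if best_tag is not None and best_score >= threshold:
            suggestions[name] = (best_tag, round(best_score, 2))
    return suggestions


# Fields a data steward has already tagged with business terms (made-up sample data).
tagged = {
    "us_phone": ["415-555-0134", "212-555-0199", "650-555-0111"],
    "zip_code": ["94105", "10001", "60614"],
}
# Fields with uninformative technical names that still need a tag.
untagged = {
    "col_17": ["303-555-0142", "718-555-0177"],
    "col_23": ["02139", "73301", "98101"],
    "col_31": ["hello world", "n/a"],
}
suggestions = suggest_tags(tagged, untagged)
print(suggestions)
```

In this toy data, `col_17` and `col_23` receive suggested tags while `col_31` gets none. The accept-or-reject step that the quote describes, where analysts' decisions train the catalog, is not modeled here; a real system would feed those decisions back into the matching.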
This Fingerprinting™ and automated tagging technology provides for faster and easier discovery of the vast amounts of data stored in data warehouses, cloud services and databases across the enterprise. Waterline Data Fingerprinting combines big data analysis, machine learning and human curation to automatically catalog data and data lineage at scale, reducing the manual tagging of data by over 80%, increasing the overall data inventory and lineage accuracy 10x, and reducing the people cost for manual tagging and inventorying of data by 90%.
Most data has technical metadata, the names that developers gave it, but this technical metadata is inconsistent and frequently misleading. Data consumers want to use business metadata, meaning standard business terms, to find the data that they need. Waterline’s Aristotle is a machine learning system that associates technical metadata with business terms (business metadata), thereby closing the operational gap.
With Fingerprinting, the Waterline Data Catalog is the industry’s only data catalog to combine AI and machine learning with best-in-class crowdsourcing and big data scalability, delivering a modern data catalog that meets today’s enterprise needs. The patent publication, US 2015-0356123, is titled “Systems and Methods for Management of Data Platforms” and names Waterline Data founder and CTO Alex Gorelik as the inventor. Over his career, Alex has been granted over 20 patents in data management.
In addition to his many patents, Alex is a prolific blogger and author. Last year he published a book, “The Enterprise Big Data Lake,” which is a great read for understanding what is required of data lakes. His blogs on Waterlinedata.com are also very educational, and we are looking forward to more of his blogs on the Hitachi Vantara Community.
We are very pleased to welcome Alex and the creative Waterline Data team to our Hitachi Vantara Family. Alex has joined us as a Senior Fellow, Digital Solutions.