Introducing Lumada Data Catalog 6.0, business rules and field lineage
By Glen Martin, posted 12-09-2020 07:11
I’m excited to announce our imminent release of Lumada Data Catalog version 6.0 on December 8th. This release has been in the works since we were acquired by Hitachi Vantara in January of this year, and I’m very proud of what the team has put together. Let me tell you about a few highlights.
Business Rules
A buzzword that has been bandied about lately is “Active Metadata”. In simplest terms, metadata is active when actions are taken autonomously on it.
In a separate announcement, Pentaho 9.1 added search and input steps for Lumada Data Catalog. These allow a dataflow to be parameterized so that it searches the catalog for files matching a condition and processes them.
But a first step is the automatic classification of data, labelling it with business-relevant tags or properties such as “Purchase Order” and “Meets Policy”.
Here’s an example. Maybe your business has this rule:
Sales is allowed to discount up to 35% for normal customers, and up to 45% for Tier A customers.
Of course you try to control this in bid creation, but exceptions could slip in, so you want to verify. You find a file of sales bids that includes one row per bid, each bid including the customer and the discount rate that was applied. To determine if the bid file meets the rule, you need to look at the data row by row, figure out whether the customer is Tier A, and accordingly determine if the discount rate is too high. If any rows in the file fail this test, we’ll mark the file as “Policy Violation”.
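To make the row-by-row test concrete, here is a minimal sketch in plain Python. It is not Lumada Data Catalog's rule syntax; the `Bid` class, the `MAX_DISCOUNT` table, and both function names are hypothetical, and it assumes the Tier A customers are available as a simple set of names.

```python
from dataclasses import dataclass

# Limits from the example rule above, as fractions (hypothetical layout).
MAX_DISCOUNT = {"A": 0.45, "normal": 0.35}


@dataclass
class Bid:
    customer: str
    discount: float  # fraction, e.g. 0.40 for a 40% discount


def violates_policy(bid, tier_a_customers):
    """Return True if the bid's discount exceeds the limit for its customer tier."""
    tier = "A" if bid.customer in tier_a_customers else "normal"
    return bid.discount > MAX_DISCOUNT[tier]


def tag_bid_file(bids, tier_a_customers):
    """Tag the whole file based on whether any single row breaks the rule."""
    if any(violates_policy(bid, tier_a_customers) for bid in bids):
        return "Policy Violation"
    return "Meets Policy"
```

A 40% discount passes when the customer is Tier A and fails otherwise, so the same file can receive either tag depending on the tier lookup.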
Our new Business Rules feature is intended for these kinds of complex tests of data, metadata, or combinations of both. In version 6.0 we have included a rules processing job, to be executed after our normal fingerprint-based tagging. It automatically determines which rules to run against a data file based on the file’s content tags. The rule can then further describe the data with appropriate business tags.
Which brings us back to where we started, Active Metadata. Fingerprint-based tagging automatically labels fields according to their contents. Rules are automatically bound to the data using those content tags, and the rules then add additional business-relevant tags. Finally, a scheduled Pentaho 9.1 job queries the catalog for those same business flags, and pulls the right data into a processing pipeline. Automatically, start to finish. That’s active metadata.
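The flow described above (content tags select the rules, rules add business tags, a downstream job searches for those tags) can be sketched as follows. This is an illustration only: `apply_rules`, `search_catalog`, and the dictionary-based catalog are hypothetical stand-ins, not the Lumada Data Catalog or Pentaho APIs.

```python
def apply_rules(content_tags, rows, rules):
    """Run every rule whose required content tags are all present on the file.

    rules is a list of (required_tags, rule_function) pairs; each rule
    function takes the file's rows and returns one business tag.
    """
    business_tags = set()
    for required_tags, rule in rules:
        if required_tags <= content_tags:  # rule is bound to this file
            business_tags.add(rule(rows))
    return business_tags


def search_catalog(catalog, wanted_tag):
    """Return the names of catalog entries carrying the wanted business tag."""
    return sorted(name for name, tags in catalog.items() if wanted_tag in tags)


def discount_rule(rows):
    """Example rule: flag the file if any row exceeds a 35% discount."""
    too_high = any(row["discount"] > 0.35 for row in rows)
    return "Policy Violation" if too_high else "Meets Policy"
```

A scheduled job would then call something like `search_catalog(catalog, "Policy Violation")` and feed the matching files into its pipeline.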
Field-based Data Lineage
Data lineage has gained a lot of attention in the past few years, in part to document data included in regulatory reports. But a big problem with creating end-to-end lineages is missing links. Many lineage tools are very good at capturing formal ETL lineages, database queries and views, and the like. But that’s only a fraction of the data movements in a modern enterprise.
A unique feature of Lumada Data Catalog is the ability to infer missing lineages, using an offshoot of our fingerprint-based data matching. We infer lineage at both the file/table and field/column levels.
With version 6.0 we’ve added a powerful feature to examine and update inferred field lineages. While viewing a lineage, a steward can view a lineage step in tabular form and examine the lineages that were inferred or supplied by other users. If you want to add a new lineage, the tool calculates data correlation to help find the best source field to link. The new field lineage editor also provides a text field in which you can document important context about the lineage, such as joins or filters, or a description which could mention the business purpose or job information.
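As a rough illustration of ranking candidate source fields, the sketch below scores each candidate by how many of the target field's distinct values it also contains. The post does not say how the product computes data correlation, so this overlap ratio is a stand-in chosen for simplicity, and the function names are hypothetical.

```python
def overlap(target_values, source_values):
    """Fraction of the target's distinct values that also appear in the source."""
    target = set(target_values)
    if not target:
        return 0.0
    return len(target & set(source_values)) / len(target)


def rank_source_fields(target_values, candidates):
    """Return (field_name, score) pairs, best candidate first.

    candidates maps a field name to the values sampled from that field.
    """
    scores = {name: overlap(target_values, values) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A steward would still review the top suggestion before linking it, since a high overlap can occur by coincidence on small or low-cardinality fields.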
Role-based Access Controls
Five years ago, Waterline built our data catalog expecting that it would be deployed by and for a data analytics team, where everyone in the team used the same data lake, and their organization already had effective ways to control permissions on data files. In that scenario, Waterline access control permissions followed the permissions of the files containing data, making it very simple to administer.
In the past few years, our data catalog has attracted attention for use in a much broader organizational context, cataloguing data spanning the enterprise, in which new classes of users are concerned with broad data flows and controls. We were asked to help organizations provide appropriate visibility of metadata.
In version 6.0, we’ve released a new role-based access control feature that allows responsible administrators to grant permissions to metadata and catalog features at a fine-grained level. This allows them to create a permissions group where, say, a Privacy Steward can assign and clear Privacy flags on data wherever it may be located. A Shipping Staff member can see Bills of Lading, including the flags identifying which part is sensitive content, but can’t change those privacy flags. That sort of thing.
The Role-based Access Control feature will allow organizations to create and fine-tune permissions to meet the varied and matrixed roles held by their staff.
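The Privacy Steward and Shipping Staff example can be modelled with a simple role-to-permission table. This is a generic sketch of role-based access control, not the product's configuration format; the permission names and the `is_allowed` helper are hypothetical.

```python
# Hypothetical permission names; the two roles come from the example above.
ROLE_PERMISSIONS = {
    "Privacy Steward": {"view_metadata", "set_privacy_flag", "clear_privacy_flag"},
    "Shipping Staff": {"view_metadata"},
}


def is_allowed(roles, action):
    """Return True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Because a user can hold several roles at once, the check passes if any one of them grants the action, which is one way to accommodate matrixed responsibilities.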
These are only the high points of the release; there are many other new features, large and small. If you’re already a customer, thank you, and I hope you enjoy the new version. And if you’re not, I’ll be happy to chat about what you’re doing and how we can help.
If you’ve read this far and would like a little more depth, check out the webinar I recorded a few weeks ago:
Illuminate Data with AI-Powered Catalog, Lineage and Business Rules
Comments
Dipta Kundu
05-05-2022 13:23
Informative
© Hitachi Vantara LLC 2023. All Rights Reserved.