Introducing Lumada Data Catalog 6.0, business rules and field lineage

By Glen Martin posted 12-09-2020 07:11

I’m excited to announce our imminent release of Lumada Data Catalog version 6.0 on December 8th. This release has been in the works since we were acquired by Hitachi Vantara in January of this year, and I’m very proud of what the team has put together. Let me tell you about a few highlights.

Business Rules

A buzzword that has been bandied about lately is “Active Metadata”. In simplest terms, metadata is active when actions are taken autonomously on it.

In a separate announcement, Pentaho 9.1 added search and input steps for Lumada Data Catalog, which allow a Dataflow to be parameterized to search the catalog for files matching a condition and then process them.

But a first step is the automatic classification of data, labelling it with business-relevant tags or properties such as “Purchase Order” and “Meets Policy”.

Here’s an example. Maybe your business has this rule: Sales is allowed to discount up to 35% for normal customers, and up to 45% for Tier A customers. Of course you try to control this in bid creation, but exceptions could slip in, so you want to verify. You find a file of sales bids that includes one row per bid, each bid including the customer and the discount rate that was applied. To determine whether the bid file meets the rule, you need to look at the data row by row, figure out whether the customer is Tier A, and accordingly determine whether the discount rate is too high. If any rows in the file fail this test, we’ll mark the file as “Policy Violation”.

Our new Business Rules feature is intended for these kinds of complex tests of data, metadata, or combinations of both. In version 6.0 we have included a rules processing job, to be executed after our normal fingerprint-based tagging. It automatically determines which rules to run against a data file based on the content tags. The rule can then further describe the data with appropriate business tags.
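
To make the discount example concrete, here is a minimal Python sketch of the row-by-row test. It is not Lumada Data Catalog's rule syntax or API; the names Bid, violates_policy and tag_bid_file, the tier lookup table, and the sample customers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Limits taken from the example rule in the text: 45% for Tier A, 35% otherwise.
MAX_DISCOUNT_BY_TIER = {"A": 0.45}
DEFAULT_MAX_DISCOUNT = 0.35


@dataclass
class Bid:
    """One row of the (hypothetical) sales bid file."""
    customer: str
    discount: float  # expressed as a fraction, e.g. 0.30 for 30%


def violates_policy(bid, tier_by_customer):
    """Return True when the bid's discount exceeds the limit for its customer tier."""
    tier = tier_by_customer.get(bid.customer)
    limit = MAX_DISCOUNT_BY_TIER.get(tier, DEFAULT_MAX_DISCOUNT)
    return bid.discount > limit


def tag_bid_file(bids, tier_by_customer):
    """Tag the whole file: a single failing row marks it as a violation."""
    if any(violates_policy(bid, tier_by_customer) for bid in bids):
        return "Policy Violation"
    return "Meets Policy"


# Hypothetical sample data: Acme is Tier A, Globex is a normal customer.
tiers = {"Acme": "A"}
bids = [Bid("Acme", 0.40), Bid("Globex", 0.30)]
print(tag_bid_file(bids, tiers))
```

The point of the sketch is the shape of the test: the rule needs both the data (the discount in each row) and a lookup (the customer's tier), and the resulting tag applies to the file as a whole.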

Which brings us back to where we started, Active Metadata. Fingerprint-based tagging automatically labels fields according to their contents. Rules are automatically bound to the data using those content tags, and the rules then add additional business-relevant tags. Finally, a scheduled Pentaho 9.1 job queries the catalog for those same business tags, and pulls the right data into a processing pipeline. Automatically, start to finish. That’s active metadata.

Field-based Data Lineage

Data lineage has gained a lot of attention in the past few years, in part to document data included in regulatory reports. But a big problem with creating end-to-end lineages is missing links. Many lineage tools are very good at capturing formal ETL lineages, database queries and views, and the like. But that’s only a fraction of the data movements in a modern enterprise.

A unique feature of Lumada Data Catalog is the ability to infer missing lineages, using an offshoot of our fingerprint-based data matching. We infer lineage at both the file/table and field/column levels.

With version 6.0 we’ve added a powerful feature to examine and update inferred field lineages. While viewing a lineage, a steward can view a lineage step in tabular form and examine the lineages that were inferred or supplied by other users. If you want to add a new lineage, the tool calculates data correlation to help find the best source field to link. The new field lineage editor also provides a text field in which you can document important context about the lineage, such as joins or filters, or a description which could mention the business purpose or job information.
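
As a rough illustration of how data correlation can point to a likely source field, the Python sketch below ranks candidate fields by how many of the target field's values they also contain. This is an assumption made for illustration, not the catalog's actual correlation measure, which isn't described here; value_overlap, rank_source_fields and the sample field names are hypothetical.

```python
def value_overlap(source_values, target_values):
    """Fraction of distinct target values that also appear in the source field."""
    source, target = set(source_values), set(target_values)
    if not target:
        return 0.0
    return len(source & target) / len(target)


def rank_source_fields(candidates, target_values):
    """Sort candidate source fields from most to least overlap with the target."""
    scores = {
        name: value_overlap(values, target_values)
        for name, values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample fields from an upstream table.
candidates = {
    "orders.order_id": [10, 11, 3],
    "orders.customer_id": [1, 2, 3, 4],
}
target_values = [1, 2, 3]  # values of the field whose source we want to find

ranked = rank_source_fields(candidates, target_values)
print(ranked[0][0])
```

A steward would still confirm the suggested link by hand; a score like this only narrows down where to look.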

Role-based Access Controls

Five years ago, Waterline built our data catalog expecting that it would be deployed by and for a data analytics team, where everyone in the team used the same data lake, and their organization already had effective ways to control permissions on data files. In that scenario, Waterline access control permissions followed the permissions of the files containing data, making it very simple to administer.

In the past few years, our data catalog has attracted attention for use in a much broader organizational context, cataloguing data spanning the enterprise, in which new classes of users are concerned with broad data flows and controls.  We were asked to help organizations provide appropriate visibility of metadata.

In version 6.0, we’ve released a new role-based access control feature that allows responsible administrators to grant permissions to metadata and catalog features on a fine-grained level. This allows them to create a permissions group where, say, a Privacy Steward can assign and clear Privacy flags on data wherever it may be located. A Shipping Staff member can see Bills of Lading, including the flags identifying which part is sensitive content, but can’t change those privacy flags. That sort of thing.

The Role-based Access Control feature will allow organizations to create and fine-tune permissions to meet the varied and matrixed roles held by their staff.
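
The Privacy Steward and Shipping Staff example above can be sketched as a simple role-to-permission mapping. This Python sketch only illustrates the idea; the Permission names, the ROLES table and is_allowed are hypothetical and do not reflect how Lumada Data Catalog models or stores its roles.

```python
from enum import Enum, auto


class Permission(Enum):
    """Hypothetical fine-grained permissions on catalog metadata."""
    VIEW_PRIVACY_FLAGS = auto()
    EDIT_PRIVACY_FLAGS = auto()


# Each role is granted a set of permissions, as in the example in the text.
ROLES = {
    "Privacy Steward": {Permission.VIEW_PRIVACY_FLAGS, Permission.EDIT_PRIVACY_FLAGS},
    "Shipping Staff": {Permission.VIEW_PRIVACY_FLAGS},
}


def is_allowed(role, permission):
    """Return True when the role has been granted the permission."""
    return permission in ROLES.get(role, set())


print(is_allowed("Privacy Steward", Permission.EDIT_PRIVACY_FLAGS))
print(is_allowed("Shipping Staff", Permission.EDIT_PRIVACY_FLAGS))
```

Unknown roles get no permissions at all in this sketch, so access has to be granted explicitly rather than assumed.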

These are only the high points of the release; there are many other new features, large and small. If you’re already a customer, thank you, and I hope you enjoy the new version. And if you’re not, I’ll be happy to chat about what you’re doing and how we can help.

If you’ve read this far and would like a little more depth, check out the webinar I recorded a few weeks ago: Illuminate Data with AI-Powered Catalog, Lineage and Business Rules
