Michael Pacheco

Announcing Hitachi Content Intelligence v1.4: Data Processing - Part 5 of 6

Blog Post created by Michael Pacheco on Mar 25, 2019

One of the key advantages of Content Intelligence is its flexibility to connect to data and transform it in many ways. Content Intelligence includes a comprehensive library of processing stages to analyze, extract, filter, transform, enrich, and further act upon data. The Plugin Software Development Kit can also be used to create new processing stages or connectors to new data sources. Below are the new data processing enhancements in Content Intelligence v1.4.

 

Decompression Stage

Just because data is compressed shouldn’t mean that Content Intelligence cannot access and process it. The new Decompression stage automatically decompresses documents that have been compressed in popular formats, including GZIP, BZIP2, and XZ, so that files of these types residing on your data sources can be read and processed by Content Intelligence workflows.

 

Of course, this is not to be confused with the existing TAR Expansion stage, which expands TAR archives.
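The post does not describe how the Decompression stage recognizes each format, so the following is only an illustrative sketch of one common approach: checking the well-known signature bytes at the start of a file. The helper name detectCompression is hypothetical and is not part of Content Intelligence.

```javascript
// Well-known signature bytes ("magic numbers") for the three formats named above.
const SIGNATURES = [
  { format: "gzip", bytes: [0x1f, 0x8b] },
  { format: "bzip2", bytes: [0x42, 0x5a, 0x68] }, // ASCII "BZh"
  { format: "xz", bytes: [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00] },
];

// Hypothetical helper: returns the format name, or null if no signature matches.
function detectCompression(data) {
  for (const { format, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return format;
    }
  }
  return null;
}

const sample = Uint8Array.from([0x1f, 0x8b, 0x08, 0x00]);
console.log(detectCompression(sample)); // prints: gzip
```

A TAR archive would return null here, which mirrors the distinction between decompression and TAR expansion.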

Content Intelligence - Decompression Stage.png

Attach Stream Stage

This new processing stage can blend data streams from multiple sources into single documents for further processing. This can be useful if you would like to index related files together.

 

For example, you can index e-mail attachments together with their corresponding e-mail messages, or subtitles along with video files.
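To make the idea concrete, here is a minimal sketch of combining two related items into one document. The document shape (an id plus a map of named streams) and the function attachStream are assumptions made for illustration; the post does not show the stage's actual data model or configuration.

```javascript
// Hypothetical document shape: an id plus a map of named content streams.
function attachStream(primary, secondary, streamName) {
  if (streamName in primary.streams) {
    throw new Error(`Stream "${streamName}" already exists on ${primary.id}`);
  }
  // Return a new document so that both inputs stay unchanged.
  return {
    ...primary,
    streams: { ...primary.streams, [streamName]: secondary.streams.content },
  };
}

const email = { id: "msg-001.eml", streams: { content: "Quarterly report attached." } };
const attachment = { id: "report.txt", streams: { content: "Revenue grew this quarter." } };

const combined = attachStream(email, attachment, "attachment");
console.log(Object.keys(combined.streams)); // [ 'content', 'attachment' ]
```

With both streams on one document, a later indexing step could make the message and its attachment searchable together.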

Content Intelligence - Attach Stream Stage.png

JavaScript Stage

New with Content Intelligence v1.4, you can include your own JavaScript in workflows to perform custom actions against individual document fields or entire documents.

 

In comparison to developing completely new plugins, the JavaScript stage requires far fewer steps to get code in place. This is especially useful if you need to do something relatively simple, but there is not an existing processing stage for it.

 

Let’s say that you have a database of medical records that contains height information in centimeters, and you want to convert it to inches. That should be straightforward, but there is no existing processing stage for it. With the JavaScript stage and a couple of short lines of code, you could easily convert all height values from centimeters to inches by dividing by 2.54 (1 inch = 2.54 centimeters).
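The conversion itself might look something like the sketch below. It works on a plain JavaScript object rather than the stage's real document API, which this post does not show, and the field names heightCm and heightIn are hypothetical.

```javascript
const CM_PER_INCH = 2.54;

// Hypothetical field names: reads "heightCm" and adds "heightIn",
// rounded to one decimal place.
function convertHeight(doc) {
  const cm = parseFloat(doc.heightCm);
  if (!Number.isFinite(cm)) {
    return doc; // leave documents without a usable height untouched
  }
  const inches = Math.round((cm / CM_PER_INCH) * 10) / 10;
  return { ...doc, heightIn: inches };
}

console.log(convertHeight({ patientId: "p-17", heightCm: "177.8" }));
// { patientId: 'p-17', heightCm: '177.8', heightIn: 70 }
```

Keeping the original centimeter value alongside the converted one is a design choice in this sketch; you could just as easily overwrite the field.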

Content Intelligence - JavaScript Stage.png

Conditionals For Field Values And Date Math

In a workflow, if you have a processing stage that should affect only certain documents, you can include conditional statements to determine which documents will be processed or bypassed by the stage. New with Hitachi Content Intelligence v1.4, you can include field values and date math in conditional statements.

 

For example, you can compare the values of two fields (foo and bar) with each other and then conditionally continue processing based on the results.
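The logic of such a condition can be sketched in plain JavaScript. This is not the conditional syntax used in the product, which is configured in the workflow itself; the function fieldsMatch and the sample documents are assumptions for illustration only.

```javascript
// Hypothetical condition: a document is processed only when two fields are equal.
function fieldsMatch(doc, fieldA, fieldB) {
  // A missing field never matches, so incomplete documents bypass the stage.
  if (!(fieldA in doc) || !(fieldB in doc)) {
    return false;
  }
  return String(doc[fieldA]) === String(doc[fieldB]);
}

const docs = [
  { id: 1, foo: "alpha", bar: "alpha" },
  { id: 2, foo: "alpha", bar: "beta" },
  { id: 3, foo: "alpha" },
];

const toProcess = docs.filter((d) => fieldsMatch(d, "foo", "bar"));
console.log(toProcess.map((d) => d.id)); // [ 1 ]
```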

Content Intelligence - Field Value Conditionals.png

You can now also use date math to compare date and time field values against points in time calculated relative to a fixed moment. For instance, you may want to determine whether a value falls within a relative period (the past 6 months, the past 24 hours, the past year, 3 years from now, and so on). This introduces a great deal of simplicity and many new possibilities to your Content Intelligence workflows!
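As a rough illustration of what a "past 6 months" check involves, consider the sketch below. It is plain JavaScript, not the date math syntax of Content Intelligence, and the helper names addMonths and withinPastMonths are hypothetical.

```javascript
// Returns a new Date shifted by the given number of months (negative = past).
// Note: setUTCMonth rolls over when the target month has fewer days.
function addMonths(date, months) {
  const result = new Date(date.getTime());
  result.setUTCMonth(result.getUTCMonth() + months);
  return result;
}

// True when the value falls between (now - months) and now, inclusive.
function withinPastMonths(value, months, now = new Date()) {
  const t = new Date(value).getTime();
  if (Number.isNaN(t)) {
    return false; // unparseable dates never match
  }
  return t >= addMonths(now, -months).getTime() && t <= now.getTime();
}

const referenceTime = new Date("2019-03-25T00:00:00Z");
console.log(withinPastMonths("2018-12-01T00:00:00Z", 6, referenceTime)); // true
console.log(withinPastMonths("2018-06-01T00:00:00Z", 6, referenceTime)); // false
```

Passing the reference time as a parameter keeps the check repeatable, which is handy when testing a condition against known dates.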

Content Intelligence - Date Math Conditionals.png

Processing Failed Documents

When a processing stage fails to process a document, there are now options to send that document through additional stages. When you enable the Continue Processing Failed Documents option in the workflow, the reason for the failure is added to the document metadata, and the document continues through the remainder of the pipeline. This allows the system to handle failed documents differently based on the reason for the failure.

 

For example, you can index all documents whose error mentions "encrypted", and then either drop them from the workflow to prevent further processing or proceed with the remaining processing stages in the pipeline.
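The routing decision described here could be sketched as follows. The metadata field name failureReason and the function routeFailedDocument are assumptions; the post says only that the failure reason is added to the document metadata, not what the field is called.

```javascript
// Hypothetical field name "failureReason" holding the recorded failure reason.
// Returns "drop" for encryption-related failures and "continue" otherwise.
function routeFailedDocument(doc) {
  const reason = String(doc.failureReason || "").toLowerCase();
  return reason.includes("encrypted") ? "drop" : "continue";
}

const failedDocs = [
  { uri: "secret.pdf", failureReason: "Document is encrypted" },
  { uri: "scan.tiff", failureReason: "Unsupported image format" },
];

for (const doc of failedDocs) {
  console.log(`${doc.uri}: ${routeFailedDocument(doc)}`);
}
// secret.pdf: drop
// scan.tiff: continue
```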

Content Intelligence - Continue Processing Failed Documents.png

Troubleshooting Document Failures

It’s now easier to investigate document failures from workflows. You can view and filter document failures by date, category, or reason to pinpoint the failures that you’re looking for.

 

With consolidated and filterable views in the Document Failures table, you can identify commonalities and uncover trends for documents that failed for the same reasons or during a certain time window.
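The kind of filtering and grouping that the table supports can be approximated in a few lines. The records below are invented for illustration, and the field names (uri, category, reason, time) are assumptions rather than the product's actual schema.

```javascript
// Hypothetical failure records; field names and values are assumptions.
const failures = [
  { uri: "a.pdf", category: "Parsing", reason: "Document is encrypted", time: "2019-03-20T10:00:00Z" },
  { uri: "b.docx", category: "Parsing", reason: "Document is encrypted", time: "2019-03-21T09:30:00Z" },
  { uri: "c.gz", category: "Decompression", reason: "Unexpected end of stream", time: "2019-03-24T16:45:00Z" },
];

// Keep failures that match every filter that was supplied.
function filterFailures(items, { from, to, category, reason } = {}) {
  return items.filter((f) => {
    const t = Date.parse(f.time);
    if (from && t < Date.parse(from)) return false;
    if (to && t > Date.parse(to)) return false;
    if (category && f.category !== category) return false;
    if (reason && !f.reason.toLowerCase().includes(reason.toLowerCase())) return false;
    return true;
  });
}

// Count failures per reason to surface the most common causes.
function countByReason(items) {
  const counts = {};
  for (const f of items) {
    counts[f.reason] = (counts[f.reason] || 0) + 1;
  }
  return counts;
}

console.log(filterFailures(failures, { from: "2019-03-21T00:00:00Z" }).map((f) => f.uri));
// [ 'b.docx', 'c.gz' ]
console.log(countByReason(failures));
// { 'Document is encrypted': 2, 'Unexpected end of stream': 1 }
```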
Content Intelligence - Document Failures Table.png

 


Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco
