One of the key advantages of Content Intelligence is its flexibility to connect to and transform data in many ways. Content Intelligence includes a comprehensive library of processing stages to analyze, extract, filter, transform, enrich, and otherwise act upon data. The Plugin Software Development Kit can also be used to create new processing stages or connectors to new data sources. Below are the new data processing enhancements in Content Intelligence v1.4.
The fact that data is compressed shouldn’t mean that Content Intelligence cannot access and process it. The new Decompression stage automatically decompresses documents that have been compressed with popular formats, including GZIP, BZIP2, and XZ, so that files of these types residing on your data sources can be read and processed by Content Intelligence workflows.
This stage is not to be confused with the existing TAR Expansion stage, which expands TAR archives.
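To make the idea concrete, here is a minimal Python sketch of format detection followed by decompression. It is not the Decompression stage's actual implementation, which this post does not describe. It assumes the format can be recognized from the leading bytes of the file, and the function name `decompress_if_needed` is hypothetical.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes that identify each compression format.
MAGIC_BYTES = {
    b"\x1f\x8b": gzip.decompress,        # GZIP
    b"BZh": bz2.decompress,              # BZIP2
    b"\xfd7zXZ\x00": lzma.decompress,    # XZ
}


def decompress_if_needed(data: bytes) -> bytes:
    """Return decompressed bytes, or the input unchanged if no format matches."""
    for magic, decompress in MAGIC_BYTES.items():
        if data.startswith(magic):
            return decompress(data)
    return data
```

Content that does not match any of the three signatures passes through untouched, which mirrors how a stage like this can sit in a pipeline without disturbing uncompressed documents.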
This new processing stage can blend data streams from multiple sources into single documents for further processing. This can be useful if you would like to index related files together.
For example, you can index e-mail attachments together with their corresponding e-mail messages, or subtitles along with video files.
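As a rough illustration of the idea, the Python sketch below groups related items into one document that carries several named streams. The dictionary layout, the `message_id` field, and the function `merge_by_key` are all assumptions made for this example; the post does not describe how the stage itself is configured.

```python
def merge_by_key(items: list, key: str) -> list:
    """Combine items that share the same value for `key` into one document.

    Each item is a dict with `key`, a "name", and a "stream" of bytes
    (a hypothetical layout chosen for this sketch).
    """
    merged = {}
    for item in items:
        # Create the combined document the first time a key value is seen.
        group = merged.setdefault(item[key], {key: item[key], "streams": {}})
        # Attach this item's data as a named stream on the combined document.
        group["streams"][item["name"]] = item["stream"]
    return list(merged.values())


# Made-up sample data: one e-mail with an attachment, one without.
items = [
    {"message_id": "m1", "name": "message.eml", "stream": b"Hi"},
    {"message_id": "m1", "name": "report.pdf", "stream": b"%PDF"},
    {"message_id": "m2", "name": "message.eml", "stream": b"Yo"},
]
documents = merge_by_key(items, "message_id")
```

Here the message and its attachment end up in a single document with two streams, so later stages can index them together.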
Conditionals For Field Values And Date Math
In a workflow, if you want a processing stage to affect only certain documents, you can include conditional statements that determine which documents are processed or bypassed by the stage. New in Hitachi Content Intelligence v1.4, these conditional statements can include field values and date math.
For example, you can compare the values of two fields (foo and bar) with each other and then conditionally continue processing based on the results.
You can now also compare date and time field values to perform calculations that are relative to fixed moments in time. For instance, you may want to determine whether the values fall within a relative period (past 6 months, past 24 hours, past year, 3 years from now, and so on). This introduces a great deal of simplicity and many new possibilities to your Content Intelligence workflows!
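The logic behind these two kinds of conditions can be sketched in a few lines of Python. This shows only the comparison logic, not the product's conditional syntax, which is not reproduced in this post. The helper names `fields_match` and `within_past` and the `modified` field are hypothetical, and "6 months" is approximated as 182 days.

```python
from datetime import datetime, timedelta, timezone


def fields_match(document: dict, left: str, right: str) -> bool:
    """True when both fields are present and hold equal values."""
    return left in document and right in document and document[left] == document[right]


def within_past(value: datetime, period: timedelta, now: datetime) -> bool:
    """True when `value` falls between `now - period` and `now`."""
    return now - period <= value <= now


# Made-up document and a fixed "now" so the example is repeatable.
now = datetime(2018, 6, 1, tzinfo=timezone.utc)
document = {
    "foo": "a",
    "bar": "a",
    "modified": datetime(2018, 3, 15, tzinfo=timezone.utc),
}

# Process the document only if foo equals bar and it changed in the past 6 months.
should_process = fields_match(document, "foo", "bar") and within_past(
    document["modified"], timedelta(days=182), now
)
```

A period in the future, such as "3 years from now", would follow the same pattern with the bounds reversed.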
Processing Failed Documents
When a processing stage fails to process a document, there are now options to continue processing that document through additional stages. When you enable the Continue Processing Failed Documents option in the workflow, the reason for the failure is added to the document metadata, and the document continues through the remainder of the pipeline. This allows the system to handle failed documents differently based on the reason for the failure.
For example, you can index all documents whose error mentions "encrypted", and then either drop them from the workflow to prevent further processing or proceed with the remaining processing stages in the pipeline.
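The behavior described above can be pictured with a small Python sketch. It is a simplified model, not the product's pipeline engine: the `failure_reason` metadata field, the `continue_on_failure` flag, and both example stages are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional

Stage = Callable[[dict], Optional[dict]]


def run_pipeline(document: dict, stages: List[Stage],
                 continue_on_failure: bool = True) -> Optional[dict]:
    """Run stages in order; a stage returns None to drop the document."""
    for stage in stages:
        try:
            result = stage(document)
        except Exception as error:
            if not continue_on_failure:
                return None  # failed document leaves the pipeline
            # Record why the stage failed, then move on to the next stage.
            document["failure_reason"] = str(error)
            continue
        if result is None:
            return None  # stage chose to drop the document
        document = result
    return document


def extract_text(document: dict) -> dict:
    """Hypothetical stage that cannot read encrypted content."""
    if document.get("encrypted"):
        raise ValueError("document is encrypted")
    document["text"] = document.get("content", "")
    return document


def drop_encrypted(document: dict) -> Optional[dict]:
    """Hypothetical conditional stage: drop documents that failed due to encryption."""
    if "encrypted" in document.get("failure_reason", ""):
        return None
    return document
```

With `continue_on_failure` left on, an encrypted document reaches `drop_encrypted` carrying its failure reason, so the decision to drop or keep it is made by a later stage rather than by the failure itself.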
Troubleshooting Document Failures
It’s now easier to investigate document failures from workflows. You can view and filter document failures by date, category, or reason to pinpoint the failures that you’re looking for.
With consolidated and filterable views in the Document Failures table, you can identify commonalities and uncover trends for documents that failed for the same reasons or during a certain time window.
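For readers who think in code, the kind of filtering and grouping the table offers looks roughly like the Python sketch below. The `DocumentFailure` record, its field names, and the sample rows are invented for this example and do not reflect the product's data model or any real failures.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class DocumentFailure:
    document: str
    time: datetime
    category: str
    reason: str


def filter_failures(
    failures: Iterable[DocumentFailure],
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
    category: Optional[str] = None,
    reason_contains: Optional[str] = None,
) -> List[DocumentFailure]:
    """Keep only the failures that satisfy every filter that was supplied."""
    selected = []
    for failure in failures:
        if since is not None and failure.time < since:
            continue
        if until is not None and failure.time > until:
            continue
        if category is not None and failure.category != category:
            continue
        if reason_contains is not None and reason_contains.lower() not in failure.reason.lower():
            continue
        selected.append(failure)
    return selected


def count_by_reason(failures: Iterable[DocumentFailure]) -> Counter:
    """Tally failures per reason to surface the most common causes."""
    return Counter(failure.reason for failure in failures)


# Made-up sample rows for illustration only.
SAMPLE_FAILURES = [
    DocumentFailure("a.pdf", datetime(2018, 5, 1, 9, 0), "Stage", "Document is encrypted"),
    DocumentFailure("b.pdf", datetime(2018, 5, 1, 9, 5), "Stage", "Document is encrypted"),
    DocumentFailure("c.doc", datetime(2018, 5, 2, 14, 0), "Connection", "Timed out"),
]
```

Calling `filter_failures(SAMPLE_FAILURES, reason_contains="encrypted")` narrows the list to the two encrypted documents, and `count_by_reason` shows at a glance which cause dominates.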
Want to learn more? For more details on all of the great features and enhancements of Content Intelligence v1.4, check out my other blogs:
Also, check out the following resources:
- Content Monitor Datasheet
- Content Intelligence Datasheet
- Content Intelligence Product Page
- Content Intelligence Community
- Announcing Hitachi Content Intelligence v1.3
Thanks for reading!
Senior Solutions Marketing Manager, Hitachi Vantara
Follow me on Twitter: @TechMikePacheco