
Announcing Hitachi Content Intelligence v1.5

By Michael Pacheco posted 10-08-2019 04:13

  

Maximize the value of your enterprise data, wherever it resides, and deliver the best-quality information where and when it’s needed most.
 

 

From deeper integration with Hitachi Content Platform to customized searches and improved data processing and scalability, the latest release of Content Intelligence brings many new features and enhancements to an already robust data processing solution.  I cover most of these improvements in this post.


Tighter Integration with Hitachi Content Platform

Content Intelligence can connect to and process data from a variety of sources containing both structured and unstructured data.  In this release, Hitachi Vantara has doubled down on integration with Hitachi Content Platform (HCP) data sources.  This should come as no surprise, as HCP is the flagship of Hitachi Vantara’s Data Intelligence portfolio.  HCP is a massively scalable, flexible hybrid cloud object storage solution.  It supports the most demanding enterprise use cases and gives you control of your data so you can satisfy even the most critical compliance requirements.

Monitor HCP Replication
Hitachi Content Monitor (Content Monitor), a feature of Content Intelligence, provides near real-time performance monitoring of multiple HCP clusters at scale from an interactive, dashboard-rich user interface.  In a long-awaited addition, Content Monitor can now monitor HCP replication statistics, including the number of objects replicated per link, the rate of those operations, the number of pending operations, and any related errors.


 

Forecast Remaining Useful Life of HCP SSDs
Also new with Content Monitor is an enhancement to the existing Forecasting feature: it can now forecast the remaining useful life of solid-state drives (SSDs) on HCP, with an estimate of the time remaining until an SSD fails.  Content Monitor also provides a confidence rating for each prediction so that you can proactively investigate and plan maintenance before an actual failure occurs.
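
The post does not describe how Content Monitor produces its forecasts.  Purely to illustrate the idea, here is a minimal Python sketch that fits a straight line to wear readings and extrapolates to 100% wear.  The function name, the sample readings, and the use of R-squared as a stand-in for a confidence rating are all my assumptions, not the product's model.

```python
def forecast_days_remaining(days, wear_pct):
    """Fit wear = slope * day + intercept by least squares and extrapolate to 100%.

    Returns (days_remaining_after_last_sample, r_squared).
    Hypothetical helper; not Content Monitor's actual model.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(wear_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, wear_pct))
    syy = sum((y - mean_y) ** 2 for y in wear_pct)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None, 0.0  # wear is not increasing, so there is no failure date to project
    day_at_full_wear = (100.0 - intercept) / slope
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return day_at_full_wear - days[-1], r_squared


# Made-up readings: day number and percent of rated endurance consumed.
days = [0, 30, 60, 90]
wear = [10.0, 12.0, 14.0, 16.0]
remaining, confidence = forecast_days_remaining(days, wear)
```

With these perfectly linear sample readings, the projection is about 1,260 days past the last sample with an R-squared of 1; real wear data would be noisier and the fit correspondingly less certain.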


 

Explore More HCP Services and Operations
In v1.4, Content Monitor introduced the ability to monitor the activity of services running on HCP for granular details on how long the services have been running, and the number of objects processed.  In v1.5, not only can you now monitor the Compression/Encryption services running on HCP S Series nodes, but you can additionally monitor the processing rates of all monitored services for more granular information on how many objects were examined, serviced, or failed per second.  
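
A per-second processing rate is typically derived from two samples of a cumulative counter.  The sketch below shows that arithmetic in Python; the counter names and sample values are hypothetical, and I am assuming (not quoting) how such rates could be computed.

```python
def rate_per_second(prev_count, curr_count, prev_time, curr_time):
    """Average objects per second between two cumulative counter samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_count - prev_count) / elapsed


# Hypothetical samples taken 60 seconds apart.
earlier = {"examined": 12_000, "serviced": 11_400, "failed": 30}
later = {"examined": 18_000, "serviced": 17_100, "failed": 36}
rates = {name: rate_per_second(earlier[name], later[name], 0, 60) for name in earlier}
```

Here the service examined 100 objects per second, serviced 95, and failed 0.1 over the sampled minute.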


 

Perform Bulk Actions on HCP Documents
Following the trend of deeper integration with HCP, you can now conditionally perform Bulk Actions on Hitachi Content Search (Content Search) results containing documents that reside on HCP data sources.  This is key to streamlining Data Governance and eDiscovery initiatives, because you can selectively apply operations to entire groups of documents: applying and clearing legal holds, setting retention, deleting, and purging.

 
 

Track HCP Versions
HCP can store multiple versions of an object, thus providing a history of how the data has changed over time.  Legal and Compliance departments often require that files be preserved forever, or for long periods of time, to satisfy their investigations and compliance needs.  Another benefit of versioning is that it supports rolling files back to previous points in time for recoverability purposes.  Given the importance of this key HCP feature, you can now index and search all versions of documents residing on HCP.  So, if you search for a document that resides on HCP, you see not only the current version but also the contents of all previous versions of the file.

 
 

Move HCP Data More Efficiently With New Copy File Action
In Workflow Designer, when creating workflows that involve transferring data within an HCP system, a new Copy File action leverages the HCP Put-Copy REST API to more efficiently move or copy the data.  In prior versions of Content Intelligence, the data first had to be downloaded from the source HCP location to Content Intelligence, and then re-uploaded to the destination HCP location.  This incurred additional overhead and required extra steps.  The new Copy File feature paves the way for running performant migrations and more efficient copy operations with HCP namespaces.
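
To see why a server-side copy is cheaper, here is a toy Python model that counts the bytes passing through the client in each approach.  It does not call the HCP REST API; the class, its methods, and the object keys are hypothetical stand-ins.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store; counts bytes sent to or from a client."""

    def __init__(self):
        self.objects = {}
        self.bytes_over_wire = 0

    def get(self, key):
        data = self.objects[key]
        self.bytes_over_wire += len(data)   # download to the client
        return data

    def put(self, key, data):
        self.bytes_over_wire += len(data)   # upload from the client
        self.objects[key] = data

    def copy(self, src_key, dst_key):
        # Server-side copy: the payload never leaves the store.
        self.objects[dst_key] = self.objects[src_key]


store = ToyObjectStore()
store.objects["ns1/report.pdf"] = b"x" * 1_000_000   # seed without counting traffic

# Earlier approach: download, then re-upload.
store.put("ns2/report.pdf", store.get("ns1/report.pdf"))
download_upload_bytes = store.bytes_over_wire

# Copy-style approach: one request, no payload passing through the client.
store.bytes_over_wire = 0
store.copy("ns1/report.pdf", "ns3/report.pdf")
server_side_bytes = store.bytes_over_wire
```

In this model the download-and-upload path moves the 1 MB object twice (2,000,000 bytes), while the server-side copy moves none of it through the client.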

 


Even More Customized Search Experiences

Every release of Content Intelligence includes new ways to customize end-user search experiences.  In v1.5, you can now configure entire indexes and individual fields with your language of choice, as well as dynamically filter search results based on the authenticated user.

Set Language for Search Indexes
Hitachi Content Intelligence search indexes can be configured with a choice of many languages.  By default, indexes are set to use English.  The new Language Select index configuration option allows you to assign different languages to individual fields or to entire indexes.  You might select a different language if you know that a specific data source contains non-English text.  When an index or field is set to a specific language, text is processed according to the rules of that language, and you can search the index in that language.  Supported languages include:  Arabic, Armenian, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Farsi, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Norwegian, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Thai, Turkish, and Ukrainian.


 

Automatically Filter Queries by User
In Hitachi Content Intelligence v1.4, Filter Queries provided an additional way to customize searches by statically limiting results for specified users.  In v1.5, you can now apply user variables to Filter Queries so that results are dynamically limited according to the (CIFS and HCP Anywhere) permissions of the authenticated user.  Supported variables are:  $USERID, $USERSHORTNAME, $USERLONGNAME, and $USERDISPLAYNAME.  So, when an authenticated user performs a search, only the documents that the user has permission to view are displayed.
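
As a rough illustration of variable substitution, the Python sketch below expands the four variables in a filter query using the standard library's string.Template, which happens to use the same $NAME syntax.  The field name allowed_users, the query format, and the user attribute values are hypothetical; the product's own substitution logic may differ.

```python
from string import Template


def expand_filter_query(filter_query, user):
    """Replace $USER... variables in a filter query with the user's attributes."""
    return Template(filter_query).substitute(
        USERID=user["id"],
        USERSHORTNAME=user["short_name"],
        USERLONGNAME=user["long_name"],
        USERDISPLAYNAME=user["display_name"],
    )


# Hypothetical authenticated user.
user = {
    "id": "1001",
    "short_name": "jdoe",
    "long_name": "jdoe@example.com",
    "display_name": "Jane Doe",
}
query = expand_filter_query("allowed_users:$USERSHORTNAME", user)
```

The same stored filter query then yields allowed_users:jdoe for this user and a different filter for anyone else who signs in.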


Data Processing Improvements and Other Enhancements

With more data processing flexibility and improvements to performance and scale, Hitachi Content Intelligence v1.5 streamlines Data Operations to help unlock the full value of your data. 

Maintain Filesystem Timestamps
When data is moved from one file system to another, common behavior is that timestamps get modified to the current time during the process.  There are many reasons why you would instead want to preserve the timestamp information.  For example, this is critical for compliance and legal investigations.  With Hitachi Content Intelligence v1.5, you can now preserve important timestamp information - such as access time, creation time, and modification time - from CIFS, HDFS and LFS file systems while moving or copying data on those sources.
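
For a sense of what preserving timestamps involves at the file-system level, here is a minimal Python sketch on a local file system.  It copies a file's contents and then carries over the access and modification times with os.utime (shutil.copy2 does much the same in one call).  Creation time cannot be set this way on most platforms, so this sketch does not cover it, and it is not how Content Intelligence itself is implemented.

```python
import os
import shutil
import tempfile


def copy_preserving_times(src, dst):
    """Copy a file, then carry over its access and modification times."""
    shutil.copyfile(src, dst)   # contents only; dst gets the current time as its mtime
    info = os.stat(src)
    os.utime(dst, ns=(info.st_atime_ns, info.st_mtime_ns))


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "evidence.txt")
    dst = os.path.join(workdir, "evidence_copy.txt")
    with open(src, "w") as handle:
        handle.write("contract draft")
    os.utime(src, (1_000_000_000, 1_000_000_000))   # pretend the file dates from 2001
    copy_preserving_times(src, dst)
    src_mtime = os.stat(src).st_mtime_ns
    dst_mtime = os.stat(dst).st_mtime_ns
```

After the copy, the destination reports the same 2001 modification time as the source instead of the time the copy ran.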

 

Process All Documents, Including Empty Folders
Just as important for Compliance and Legal investigations, as well as for file transfers, is maintaining directory structures during copy or migration operations.  In previous versions of Content Intelligence, if a source directory was empty, it was not processed in a workflow.  Content Intelligence v1.5 includes a new Process All Documents option so that empty directories are processed as regular objects and can be included in search results and/or written to target CIFS, LFS, HDFS, S3, or HCP Anywhere destinations.
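
The difference is easy to picture with a small directory walk.  In the Python sketch below, the process_all flag is a hypothetical stand-in for the new option: without it only files are listed, and with it empty directories are listed as well.

```python
import os
import tempfile


def list_documents(root, process_all=False):
    """Return relative paths of files under root; with process_all, add empty directories."""
    found = []
    for current, subdirs, files in os.walk(root):
        if process_all and current != root and not subdirs and not files:
            rel_dir = os.path.relpath(current, root).replace(os.sep, "/")
            found.append(rel_dir + "/")
        for name in files:
            rel_file = os.path.relpath(os.path.join(current, name), root)
            found.append(rel_file.replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cases", "empty_case"))
    with open(os.path.join(root, "cases", "notes.txt"), "w") as handle:
        handle.write("n/a")
    files_only = list_documents(root)
    everything = list_documents(root, process_all=True)
```

Only the second listing contains cases/empty_case/, which is what lets a copy or migration recreate the full directory structure at the destination.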

 

 

Use Search Results As Inputs to Workflows
You probably know that one of the key functions of Content Intelligence and Content Search is to provide customized, secure, and federated searches across all your organizational data.  It accomplishes this with workflows that apply powerful data processing, most notably content indexing.  A workflow consists of three key components:  Input, Pipeline, and Output.  Inputs define your data sources.  Pipelines consist of multiple stages that process the input data.  The processed data is then output to an index, which is a collection of data that you can search.  In Content Intelligence v1.5, you can select an existing internal or external Solr index as the input to your workflow.  This allows you to predefine index queries in a workflow and process the query results as new documents, perhaps to edit, delete, index, or store them elsewhere.  This new functionality opens the door to new and innovative ways to get more value from your data.
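
Here is a conceptual Python sketch of that flow, with the results of a query against an existing index used as the workflow input.  The function names, the document fields, and the in-memory lists standing in for indexes are all hypothetical; this is not the Workflow Designer API.

```python
def run_workflow(input_docs, pipeline, output_index):
    """Push each input document through every stage, then write it to the output index."""
    for doc in input_docs:
        for stage in pipeline:
            doc = stage(doc)
            if doc is None:   # a stage may drop a document
                break
        else:
            output_index.append(doc)
    return output_index


def query_index(index, predicate):
    """Stand-in for a predefined index query; yields copies of matching documents."""
    return (dict(doc) for doc in index if predicate(doc))


def tag_for_review(doc):
    """Example pipeline stage that adds a field to each document."""
    doc["status"] = "needs_review"
    return doc


existing_index = [
    {"id": 1, "type": "contract", "year": 2012},
    {"id": 2, "type": "memo", "year": 2018},
    {"id": 3, "type": "contract", "year": 2019},
]
old_contracts = query_index(
    existing_index, lambda d: d["type"] == "contract" and d["year"] < 2015
)
review_index = run_workflow(old_contracts, [tag_for_review], [])
```

Only document 1 matches the query, so the new index ends up with a single tagged document while the original index is left unchanged.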

 

 

Dynamically Scale Indexes 
To keep Content Intelligence performing and scaling well, size your indexes to accommodate future growth.  One way to accomplish this is with sharding.  An index can be split into smaller segments, called shards, that get dynamically distributed across the cluster.  This allows the index to grow very large, as shards are dynamically balanced across the cluster when nodes run out of space or are heavily loaded.  You configure the index shard count during initial index creation.  So, what if, after some time passes, you realize that your index is no longer optimally configured for your current needs?  Perhaps you guessed wrong at the start, or your cluster has grown unexpectedly, and you now need to resize your index.  In Content Intelligence v1.5, you can increase the shard count of your index seamlessly with a new index configuration option, Desired Index Shard Count.  This reduces some of the upfront sizing guesswork, because you can now scale up your index at any time.
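
The post does not say how the shard count is increased internally.  One common approach in hash-range-sharded indexes is to split an existing shard's range in two, and the Python sketch below illustrates that idea under that assumption.  The function names, the choice of CRC32 as the hash, and the document IDs are all hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32


def make_shards(count):
    """Divide the hash space into `count` contiguous ranges, one per shard."""
    step = HASH_SPACE // count
    return [
        (i * step, HASH_SPACE if i == count - 1 else (i + 1) * step)
        for i in range(count)
    ]


def split_shard(shards, position):
    """Replace one shard's range with two halves, raising the shard count by one."""
    low, high = shards[position]
    middle = (low + high) // 2
    return shards[:position] + [(low, middle), (middle, high)] + shards[position + 1:]


def shard_for(doc_id, shards):
    """Return the range that owns a document ID (CRC32 is an arbitrary hash choice)."""
    value = zlib.crc32(doc_id.encode("utf-8"))
    return next(rng for rng in shards if rng[0] <= value < rng[1])


shards = make_shards(2)
bigger = split_shard(shards, 0)   # two shards become three
doc_ids = [f"doc-{n}" for n in range(1000)]
moved = [d for d in doc_ids if shard_for(d, shards) != shard_for(d, bigger)]
untouched = [d for d in doc_ids if shard_for(d, shards) == shards[1]]
```

The design point is that only documents from the shard being split are reassigned; documents in the other shard stay exactly where they were, which is what makes growing the shard count incremental rather than a full re-index.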

 

Configure Document Processing Time Limits
In Workflow Designer, you can now configure alerts for documents that are taking longer than expected to process.  For documents that take longer to process than the configured time limit (the default is 5 minutes), an alert is displayed in the workflow status.  This can help pinpoint problematic files that may be impacting workflow performance.
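
The idea can be sketched in a few lines of Python.  This simplified version checks elapsed time only after each document finishes, whereas the product raises alerts for documents that are still in progress; the function names and the simulated clock are hypothetical.

```python
import time

DEFAULT_LIMIT_SECONDS = 5 * 60   # the 5-minute default mentioned above


def process_with_alerts(documents, process, limit_seconds=DEFAULT_LIMIT_SECONDS,
                        clock=time.monotonic):
    """Process each document and collect alerts for any that exceed the time limit."""
    alerts = []
    for name in documents:
        started = clock()
        process(name)
        elapsed = clock() - started
        if elapsed > limit_seconds:
            alerts.append(f"{name} took {elapsed:.0f}s (limit {limit_seconds}s)")
    return alerts


# Simulated clock so the example runs instantly: each document "takes" a fixed time.
durations = {"small.txt": 2, "huge_scan.pdf": 900}
fake_now = [0.0]


def fake_process(name):
    fake_now[0] += durations[name]


alerts = process_with_alerts(durations, fake_process, clock=lambda: fake_now[0])
```

Only huge_scan.pdf exceeds the 300-second limit here, so it is the only document flagged.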


Want to learn more?  Be sure to check out the following resources:

 


Thanks for reading!


Michael Pacheco
Senior Solutions Marketing Manager, Hitachi Vantara
Follow me on Twitter:
@TechMikePacheco

 


#HitachiContentIntelligenceHCI
#Blog
#ThoughtLeadership
