
Hitachi Content Intelligence


As HCM sees more use by our HCP customers, some deployments are running into more advanced topologies, namely ones that put load balancers (LB) in front of the HCP. For the newly initiated, an LB serves an important function for HCP clients that cannot rely on DNS to obtain the HCP node topology. This topic has been discussed in a number of postings:

 

HCM with Load Balancer-fronted HCP (vADC)

HCM cannot connect to HCP nodes via SNMP when using a load Balancer

HCM for HCP behind Load Balancer

 

I wanted to summarize here the requirements for making HCM work when an LB stands between it and HCP:

 

  • The LB must be configured to allow UDP traffic on the SNMP (161, 162) and Syslog (9601) ports. This is not usually the default configuration. It must also allow MAPI calls (TCP 9090) to pass through it.
  • The Node Status signal (the one responsible for the HTTP connections metric) requires HCM to resolve the HCP cluster name to the IPs of all nodes so that it can send a node status request directly to every node. If DNS resolves only to the single IP of the load balancer, HCM will get node status metrics from just one node.
  • For access logs arriving over Syslog to be properly attributed to the HCP being monitored, the HCP system names must resolve in DNS to the IPs of all HCP nodes. When a syslog message arrives in HCM, HCM compares the source IP of the message to the IPs of the nodes it sees in DNS; if they do not match, the syslog message will not be processed. As with the Node Status signal, if your DNS resolves the HCP name to the single LB IP, HCM won’t be able to attribute Syslog traffic to the HCP you are monitoring. Until we can find a permanent solution to this issue, we ask that you edit the /etc/hosts file on the HCM node and insert the IP addresses of ALL the HCP nodes in ALL the HCP clusters that you plan to monitor through the LB (see the example below).
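A minimal sketch of what that hosts-file workaround could look like, assuming a single HCP cluster named hcp1.example.com with four nodes. The name and addresses are placeholders; use your own HCP system names and the real IPs of every node, and repeat the block for each cluster monitored through the LB. Resolver behavior with repeated names can vary, so verify from the HCM node that a lookup of the cluster name returns all node addresses:

# /etc/hosts on the HCM node (example addresses only)
10.0.0.11   hcp1.example.com
10.0.0.12   hcp1.example.com
10.0.0.13   hcp1.example.com
10.0.0.14   hcp1.example.com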

 

To rectify the syslog issue above, we are considering enhancing the HCM configuration wizard with an extra radio button that becomes active when you enable the syslog signal. The radio button will ask whether the HCP cluster name resolves to all HCP node addresses. If the answer is NO, we will ask you to input the node IP addresses into a text box.

 

If you are ever in doubt about which metric is derived from which signal source, you can look it up in the Metrics Glossary in What is Hitachi Content Monitor (customer-facing deck), starting on slide 19.

With a new look and feel, improved navigation, and richer visualizations, Content Intelligence v1.4 delivers a greatly enhanced user experience.

 

Streamlined User Experience

Enhancements to the Admin App include more intuitive configuration screens, improved help menus, and new dashboards with rich charts and graphs that provide convenient views into system health, services, events, configuration, and more.

Content Intelligence - User Interface - Dashboard.png

Content Intelligence - User Interface - Services.png


New Single Sign-On Portal

With the new Single Sign-On portal, you simply log in once to automatically authenticate to and conveniently access all of the Content Intelligence applications that you have access to (Admin App, Content Search, Content Monitor, Workflow Designer). You no longer need to log in to each of these separately and can launch every app you have access to right from the new Single Sign-On portal.

 

By the way, you'll also notice that Workflow Designer and Admin App now have their own dedicated applications. Previously, these two were accessed and managed together. The separation of these makes a lot of sense. You can now isolate these functions to the respective users that will either design workflows, or perform system administration.

Content Intelligence - User Interface - SSO.png

Improved Installation Wizard

An updated installation wizard simplifies the deployment of Content Intelligence with granular details on installation progress, such as completed and remaining steps. The new wizard also now includes intuitive descriptions of the installable components with screenshots.

Content Intelligence - User Interface - Installation.png

 

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

New and improved connectors for HCP Anywhere and CIFS provide simplified access and more robust processing of data on these sources.

 

HCP Anywhere System-Wide Connector

Hitachi Content Intelligence includes two data connectors that are able to access and process files that reside on HCP Anywhere. HCP Anywhere allows you to mobilize, protect, sync, and share HCP data from anywhere – including end user devices and remote offices.

 

The first version of the HCP Anywhere connector was limited to the files within a single user’s HCP Anywhere folder. The new HCP Anywhere System-Wide Connector can access and process all user and shared data that resides within HCP Anywhere.

 

This provides a much simpler and more comprehensive view of all end-user, remote office, and edge data. This new connector can be used to augment existing end-user data protection and compliance strategies, especially for data that resides outside of the datacenter – such as on desktops, laptops, mobile devices, and remote offices.

Content Intelligence - HCP Anywhere Data Connector.png

CIFS Connector

Having spent the majority of my career working closely with Microsoft Windows environments, this is one of my favorite features of Hitachi Content Intelligence v1.4. A new CIFS connector provides access to remote CIFS shares.

 

Yes, Content Intelligence has been able to process data from CIFS shares in previous versions. But that required Samba clients to be installed and CIFS shares to be locally mounted on each Content Intelligence cluster node. That was a lot of extra work – and difficult to sustain at scale.

 

The new CIFS connector authenticates to remote CIFS shares using Active Directory credentials, and also provides new abilities to filter the directories to crawl by whitelisting or blacklisting them.

Content Intelligence - CIFS Data Connector.png

 

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

There are some slick new ways to further customize end-user search experiences and perform maintenance on your indexes, with Filter Queries, Delete Documents By Query, and Index Or Delete actions.

 

Filter Queries

Filter Queries are new index query settings that allow for further customization of search results. Using Filter Queries, query settings can be configured to additionally limit search results for targeted users.

 

For example, let’s say several indexed documents have been deleted. While privileged users might need awareness of the deletions and want them reflected in their search results for audit reasons, most users might not want to see those documents in their search results.

 

In the example below, while the deleted documents are in the index, the standard users group will not see them in their search results, as their queries have been configured to filter out all deleted documents (-HCI_deleted:true).
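Conceptually, the configured filter is combined with whatever the user types. For illustration (the search term "invoice" is made up here; -HCI_deleted:true comes from the example above), a standard user's search effectively becomes:

invoice -HCI_deleted:true

Privileged users, whose query settings omit the filter, still see the deleted documents in their results.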

Content Search - Filter Queries.png

Index Or Delete Action

There are now more choices on how to handle deleted documents in indexes.

 

Previously, deleted documents were re-indexed with a custom metadata tag, HCI_deleted. When adding an index to a workflow as an output, you now have the option to automatically remove all deleted documents from the index with the new Index Or Delete action.

 

With this new option, deleted documents are removed from the index instead of being indexed.

Content Search - Index Or Delete Action.png


Delete Documents By Query

A new index maintenance option allows you to delete documents in bulk from your index.

 

For example, let’s say that you wanted to remove a group of unwanted documents from your index. Perhaps some documents have been deleted from the source and you want the index to reflect that.

 

The new Delete Documents by Query feature will automatically remove from the index all documents matching your submitted query. This is a very destructive operation that cannot be undone. Thus, before using this feature, it is highly recommended to first test with a regular query within Workflow Designer to ensure that only the desired documents are returned.
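For instance, using the HCI_deleted tag described in the Index Or Delete section above, you could first run the query in Workflow Designer to confirm exactly which documents match, and only then submit the same query here to purge them:

HCI_deleted:true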

Content Search - Delete By Query.png

 

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

One of the key advantages of Content Intelligence is its flexibility to connect to and transform data in various ways. Content Intelligence includes a comprehensive library of processing stages to analyze, extract, filter, transform, enrich, and further act upon data. The Plugin Software Development Kit can also be leveraged to create new processing stages or connectors to new data sources. Below are the new data processing enhancements in Content Intelligence v1.4.

 

Decompression Stage

Just because data is compressed shouldn’t mean that Content Intelligence cannot access and process it. The new Decompression stage automatically decompresses documents that have been compressed with popular compression formats, including GZIP, BZIP2, and XZ, so that files of these types residing on your data sources can be read and processed by Content Intelligence workflows.

 

Of course, this is not to be confused with the existing TAR Expansion stage, which expands TAR archives.

Content Intelligence - Decompression Stage.png
Attach Stream Stage

This new processing stage can blend data streams from multiple sources into single documents for further processing. This can be useful if you would like to index related files together.

 

For example, you can index e-mail attachments together with their corresponding e-mail messages, or subtitles along with video files.

Content Intelligence - Attach Stream Stage.png
JavaScript Stage
New with Content Intelligence v1.4, you can now include your own JavaScript in workflows to perform custom actions against individual document fields or entire documents.

 

In comparison to developing completely new plugins, the JavaScript stage requires far fewer steps to get code in place. This is especially useful if you need to do something relatively simple, but there is not an existing processing stage for it.

 

Let’s say that you have a database of medical records which contains height information in centimeters, and you want to convert it to inches. Should be straightforward, right? But there is no existing processing stage for something like that. Using the JavaScript stage and a couple of short lines of code, you could easily convert all height values from centimeters to inches by dividing by 2.54 (1 inch = 2.54 centimeters), as in the sketch below.
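Here is a minimal sketch of what such a script could look like. Note that the field names (height_cm, height_in) and the doc.getFieldValue / doc.setFieldValue helpers are illustrative assumptions, not the documented HCI JavaScript stage API; check the product documentation for the exact object model exposed to scripts.

// Illustrative only: assumes the stage exposes the current document as "doc"
// with simple get/set helpers for single-valued string fields.
var cm = parseFloat(doc.getFieldValue("height_cm"));   // hypothetical source field
if (!isNaN(cm)) {
    // 1 inch = 2.54 centimeters
    doc.setFieldValue("height_in", (cm / 2.54).toFixed(2));
}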

Content Intelligence - JavaScript Stage.png

Conditionals For Field Values And Date Math

In a workflow, if you have a processing stage that you want to affect only certain documents, you can include conditional statements to determine which documents will be processed or bypassed by the stage. New with Hitachi Content Intelligence v1.4, you can now include field values and date math in conditional statements.

 

For example, you can compare the values of two fields (foo and bar) with each other and then conditionally continue processing based on the results.

Content Intelligence - Field Value Conditionals.png
You can now also compare date and time field values to perform calculations that are relative to fixed moments in time. For instance, you may want to determine whether the values fall within a relative period (past 6 months, past 24 hours, past year, 3 years from now, and so on). This introduces a great deal of simplicity and many new possibilities to your Content Intelligence workflows!
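Treat the following as pseudocode rather than literal HCI syntax (the exact conditional expressions are defined in the workflow editor); it simply illustrates the two new kinds of conditions:

foo == bar                          (compare the values of two fields)
Creation_Date >= NOW - 6 MONTHS     (date math relative to the current time)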

Content Intelligence - Date Math Conditionals.png

Processing Failed Documents

For cases where a particular processing stage fails to process a document, there are now additional options to further process those documents through additional stages. By enabling the Continue Processing Failed Documents option in the workflow, the reason for the failure will be added to the document metadata, and the document will continue being processed by the remainder of the pipeline. This allows the system to conditionally handle failed documents differently based on the reason for the failure.

 

For example, you can identify all documents whose failure reason mentions "encrypted", and then either drop them from the workflow to prevent further processing, or proceed with the remaining processing stages in the pipeline.

Content Intelligence - Continue Processing Failed Documents.png

Troubleshooting Document Failures

It's now easier to investigate document failures from workflows. You can now view and filter document failures by date, category, or reason, making it much easier to pinpoint the failures that you’re looking for.

 

With consolidated and filterable views in the Document Failures table, you can identify commonalities and uncover trends for documents that failed for the same reasons or during a certain time window.
Content Intelligence - Document Failures Table.png

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

True to the main value of its parent product, Hitachi Content Intelligence, Hitachi Content Monitor’s (Content Monitor) latest Artificial Intelligence features, Forecasting and Anomaly Detection, are powered by new Machine Learning algorithms that make Content Monitor even smarter. With Content Monitor, you can centrally monitor the storage performance of multiple Hitachi Content Platform (HCP) clusters in near real-time and for specific time periods from a single view. With Anomaly Detection, Content Monitor can now also detect abnormal behaviors, based on historical HCP performance, to warn you about potential problems before they occur. Additionally, Content Monitor can now also predict future system needs based on historical performance behaviors with Forecasting.

 

Forecasting

Forecasting, a Content Monitor feature, analyzes HCP storage consumption behaviors to provide a daily prediction, with a confidence rating, of future storage capacity needs.

Content Monitor - Forecasting - Days Remaining Capacity Confidence.png

Content Monitor - Forecasting - Graph.png

 

Anomaly Detection

With Anomaly Detection, Content Monitor monitors HCP storage consumption and front-end network traffic patterns and generates notifications when abnormal behaviors are detected.

Content Monitor - Anomaly Detection.png

Import Logs

When you add a new HCP system to Content Monitor, you not only gain the ability to start monitoring it in near real-time; metrics also continue to be captured for performance insights within specific time periods. The new log import feature allows you to import logs from an existing HCP system for visibility into historical performance. This can come in handy for populating Content Monitor’s performance dashboards with historical information, as well as for troubleshooting and isolating issues.

Content Monitor - Import Historical HCP Logs.png


New Dashboard Visualizations For Objects Serviced By HCP

For even more comprehensive insights into HCP system performance, customizable dashboards provide the ability to monitor the activity of services running on HCP, with interactive and rich visualizations for granular details such as how long the services have been running, and how many objects have been processed.

Content Monitor - Objects Services By HCP.png

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

Hitachi Content Intelligence delivers a flexible and robust solution framework for comprehensive discovery and quick exploration of critical business data and storage operations.

Whether your data is on-premises, off-premises, in the cloud, structured, or unstructured, Hitachi Content Intelligence (Content Intelligence) delivers a powerful framework of tools for connecting to, transforming, and acting upon organizational data to maximize the value of it for better business outcomes.

Content Intelligence - Overview.png
Using the Content Intelligence Workflow Designer, you can create customized workflows to connect to all of your data repositories, transform that data with a comprehensive library of processing stages, and then output the data to optimize it for use by downstream applications and users.


Hitachi Content Search (Content Search), delivered with Content Intelligence, provides customized, secure, and federated searches across all of your organizational data sources from a self-service user portal so users can, on a role-based basis, quickly find the data they need whenever required.

 

Also included with Content Intelligence is Hitachi Content Monitor (Content Monitor), a tightly integrated, cost-effective add-on to the Hitachi Content Platform (HCP) that delivers near real-time monitoring and performance visualizations of multiple HCP clusters at scale from an interactive, dashboard-rich user interface. If you are already using HCP for archiving, compliance, or as part of your larger Data Governance strategy, you should absolutely be using Content Monitor to monitor its operational performance. Given its low cost, easy installation, and robust feature set, it truly is a no-brainer!

Content Intelligence - Components.png

Content Intelligence v1.4 delivers many new enhancements to Workflow Designer, Content Search, and Content Monitor. This release includes data connectors to new data sources, processing stages to transform data in new ways, and many other improvements. Not to mention – the newly updated User Interface looks really nice with a more streamlined user experience!

 

Want to learn more? Be sure to check out parts 2 through 6 of this blog series via the links below.


What’s new in Content Intelligence v1.4:

  • Content Monitor Features:
    • Artificial Intelligence and Machine Learning with Anomaly Detection and Forecasting
    • Import historical HCP logs
    • New visualizations
  • Data Connector Enhancements:
    • New connectors for HCP Anywhere and CIFS
  • Content Search Features:
    • Customize search results with Filter Queries
    • Index maintenance options for bulk-removal of documents
  • Data Processing Enhancements:
    • Decompression stage
    • Attach Stream stage
    • JavaScript stage
    • Compare field values and perform date math within conditionals
    • Improved management of document failures
  • User Interface Enhancements:
    • Streamlined user experience with richer visualizations
    • New Single Sign-On portal
    • Dedicated applications for workflow design and system administration
    • Improved installation wizard

 

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter: @TechMikePacheco

Jared Cohen

HCI 1.4.0 Release

Posted by Jared Cohen Employee Mar 22, 2019

Hi all,

 

We have just released version 1.4.0 of Hitachi Content Intelligence. All artifacts, along with the release notes and installation instructions, are available for download from the Downloads page on the community.

 

Enjoy!

 

Thanks,

-Jared

The need for simple compliance search

Companies are often faced with the need to produce documents and emails as part of legal disputes. Maybe an employee deleted emails from the server (maybe even years ago), or an external partner claims the contents of a mailed document were different. Quite often the work is then passed to the IT team: find the necessary emails. This often means restoring from backup, digging around in multiple archives, finding and searching through PST files, and so on.

 

Lacking the proper tools, this is almost guaranteed to be a waste of time, and it is work that should not fall to IT in the first place. And with the next request it starts all over again...

 

The simple solution

A proper and lightweight solution is to combine Hitachi Content Intelligence and Hitachi Content Platform to create a tailored search, in this case specifically for email. The high-level architecture looks something like this:

 

Architecture.png

Using the SMTP journaling mechanism, Hitachi Content Platform (HCP) can ingest email directly from the mail server or gateway without the need for any third-party software. Hitachi Content Intelligence (HCI) then processes the ingested emails, extracts the relevant information, and creates a searchable index.

The built-in search app of HCI is the front end for the data consumers (e.g. those who need to search, like the legal or compliance team) to carry out their own searches without involving IT.

A good search design is crucial to make it fast, intuitive and, last but not least, accurate. Who says search has to be cumbersome? The search app is very flexible and can easily be adapted, without any coding of course, to email search, which is what this blog entry is all about.

 

 

How to set it up

This guide assumes the following:

  • HCP and HCI are already up and running.
  • Your email server / gateway is configured to forward incoming and outgoing email to HCP via SMTP.
  • Authentication is gracefully ignored to keep it simple. HCI does support very fine-grained access control to make sure sensitive data (as an index of your email data would be) can only be accessed by authorized users. For a deep dive on this topic, have a look at this excellent summary: HCI: Authentication and Authorization

 

Step 0: The development dataset

I have used the widely known Enron email dataset, specifically the one provided by EDRM and Nuix (available here: https://www.edrm.net/resources/data-sets/edrm-enron-email-data-set/ ), which includes the attachments, has been cleansed of PII, and is provided as a set of PST files.

For my purposes I extracted the PST files into separate .eml files with readpst, part of the libpst package. This mimics the real-life scenario, since HCP will be set up to ingest each email as an .eml object including any attachments. All email data shown in this blog is from this dataset.
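For reference, a typical readpst invocation for this kind of extraction might look like the line below. The mailbox and output directory names are placeholders, and flags can differ between libpst versions, so consult the readpst man page on your system:

readpst -e -o ./eml_out enron_mailbox.pst

Here -e writes each message as a separate file with an .eml extension and -o sets the output directory.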

 

Step 1: Think about the user experience

Now, before actually doing anything like configuring the user interface or adapting index workflows, pause to think about the different ways emails should be searched for. I came up with the following ways a user might search for email:

  • free text search by sender (from), recipients (to) or carbon copy (cc)
  • faceting for sender, recipient or carbon copy
  • free text search by subject
  • free text search by content (email body and attachments)
  • searches refined by date or date range
  • sort by date
  • searches refined by attachment type

 

Those should cover most requests, like "find the email Jane Doe sent to our competitor with the confidential Excel spreadsheet sometime last month". It would allow you to quickly produce a complete set of conversations between a sender and a recipient for a certain time or subject, or indeed between two companies based on their domains. Simple searches for content would also be possible, of course.

 

A simple search flow might be: look for a certain time range, pick a sender and recipient, and then look for specific subjects or contents. We want to make it easy for the end user to create those kinds of searches and will tailor the setup to enable just that.

 

Step 2: Prepare HCP

The first step is to set up HCP the proper way. There are multiple ways HCP can store emails ingested through SMTP. First, we need a namespace where SMTP is enabled and the correct storage options are set. You can do so in the Protocols section of the admin interface:

 

HCP settings.png

 

By choosing .eml with inline attachments, each email will be stored as a single entity, which makes later referencing easier and more granular.

Step 3: Set up the data connection to HCP

After you have verified that emails are being ingested on HCP, HCI needs to know where to look. HCI supports HCP right out of the box in multiple ways. For this use case the HCP MQE (Metadata Query Engine) data connection is used:

 

Data Connection.png

 

With this connection, the indexing job can pick up only the changes since the last run. Given HCP's WORM nature, we don't need to look at indexed objects twice, since we can be sure they have not changed.

 

Step 4: Develop the pipelines

This is the part where we adapt the pipeline to create a meaningful index for the email search use case. The HCI default pipeline provided with the system is a very good starting point. Basically it will do the following:

  • detect the MIME type
  • expand any archives / documents like PST, zip, mbox, ...
  • expand emails into separate parts: this is where all the attachments are separated from the email body and processed separately
  • extract text and metadata: this creates fields with the extracted text and special fields like Message_From and Creation_Date
  • snippet extraction: this creates a short preview of the content for display in the search results
  • date conversion: one of the most important stages as it will bring all dates to a common format (ISO 8601)

 

For this particular use case I added only two processing stages to the default pipeline:

  • one mapping stage to create renamed copies of certain fields to make the end user experience better, e.g. Message_From to from (more on that later)
  • one tagging stage to tag emails with attachments

 

pipeline.png

 

mapping.png

 

I created a second processing pipeline to create additional tags for file type icons (this makes search results nicer to look at) and to map the gazillion different content types to more meaningful values. For example, "image/gif", "image/jpeg", and so on are all mapped to the type "Image". Again, this makes it easier for the end user, who will prefer to work with meaningful terms instead of cryptic IT speak.

 

tagging.png

if.png

tagging2.png

 

If you want to have a detailed look, the pipelines are available in the attached export bundle at the end of the post.

 

Step 5: Create and kick off the workflow

The workflow is where you tie it all together. You need to provide the input (the data connection we set up), the processing pipelines, and the output (an empty index in our case).

 

workflow.png

 

Let the workflow run once. It is best to use a small but reasonable sample set for the initial development. I picked a couple of mailboxes from the Enron dataset with about 50,000 emails to start with. This keeps iterating (adapting the index for the use case) fast and should include enough emails that most types we expect in real life are covered. If not (e.g. no emails with presentations), just pick some more.

After the workflow has finished, the index is ready and we can set up the look-and-feel experience for the end users.

 

Step 6: Adapt the index schema

The index is set up as a collection of fields, each with a certain type. The type, together with its attributes, defines how a field is treated with regard to search.

 

For email search, senders and recipients play a big role. Those are extracted from the email headers and stored in the fields Message_From and Message_To. Unfortunately this is not very well defined, and the same address can be written in a multitude of ways. For example, for John Doe you might see things like John Doe <john.doe@example.com>, john.doe@example.com, "Doe, John" <John.Doe@example.com>, or John <JOHN.DOE@example.com>.

Message_From and Message_To are created as type "strings" by the system. Fields of this type are not broken up into separate tokens, and in order to produce a hit the search term needs to be exact, including case. This is a pain, since the exact format and spelling is typically unknown. Besides that, typing

Message_From:"John <JOHN.DOE@example.com>"

into the search field is not very satisfying.

Fortunately HCI has a solution for this. By changing the type from "strings" to "text_hci" we get a tokenized, case-insensitive search for these fields. Since the Message_From values were copied to a field called from, we can make the change there and keep the original:

 

schema.png

 

Now our entry into the search field can be as simple as

from:john

to produce a match on all of the above variants. Neat, right?

 

The last change to the schema is that the Creation_Date field needs to be changed from type "tdates" to "tdate". An email can only have one sent date anyway, and having a single value allows sorting by date, which is super useful.
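In Solr schema terms (shown purely for illustration, since HCI's indexes follow Solr conventions and the product can also write to external Apache Solr indexes; in HCI itself the change is made in the schema editor UI), the switch corresponds to making the date field single-valued, roughly like this:

<field name="Creation_Date" type="tdate" indexed="true" stored="true" multiValued="false"/>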

 

Step 7: Adapt the end user experience

Now that everything is in place, we are just a few steps away from a super-efficient search for our end users. The behavior of the search app is adapted within the Query Settings section of a particular index.

 

The first thing is to allow a download of the search results. The intention here is to provide a list of links into the archive for every hit in the final search.

 

Next we pretty up the user interface by changing the display names of the fields we want to expose. This is achieved in the Fields section. The Message_xx fields are also the ones we enable as facets; they will show up on the left side of the UI. I have chosen the Creation_Date and Content_Type fields as refinements. Finally, the Creation_Date field should also be sortable. All those settings are available in the sections highlighted below:

 

fields.png

 

The last adjustment (Relevancy and Access Control have not been tampered with) is how the results are displayed, which is configured in the Results section. There are basically two different kinds of search results in this case: email bodies and attachments, so the results are tailored towards those two cases. To summarize, the following will be displayed: the subject as the title, the URI to the original object in HCP, a snippet of the content, a file type icon (if it is an attachment), the date, the sender, the recipient, the carbon copy, and a list of attachments:

 

results.png

 

Results in the search UI will then look like this for both cases:

 

email.png

attachment.png

 

The UI now has a couple of features used to iteratively drill down into the results. Here they are mapped out (left to right, top to bottom):

  1. select index (if you have more than one)
  2. search field
  3. download button
  4. enable refinements
  5. refinements (Content Type and Date)
  6. sorting
  7. facets with separate search field
  8. results section
  9. ...more to expand the full contents of each result

 

ui.png

 

On to the last task: finding emails.

 

Step 8: Test and try out the search UI

Now it's time to put the search features to good use. Of all the different ways to search, here is one example. Let's assume we know the following: one employee received a document sometime in October containing the words "fun run".

One way would be to first limit the time frame with the Date refinement, using the date picker. Second, we enter +"fun run" in the search field and get 5 results. Why the plus sign and why the quotes? That's down to syntax. The plus forces a logical AND, and the quotes ensure the term is searched for in exactly that order. If in doubt about syntax, HCI has you covered again with a simple built-in help function:
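Assuming the search field accepts the Lucene/Solr-style syntax that the built-in help describes, the same narrowing could also be expressed entirely in the query; the date range below is illustrative only (pick the October of the year in question):

+"fun run" +Creation_Date:[2001-10-01T00:00:00Z TO 2001-10-31T23:59:59Z]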

 

help.png

 

Quickly investigating the result set of 5 shows that we found the document in question and identified it as the "Enron Running Club" document, which was actually sent to a distribution list. The whole thing took less than a minute to drill down to a couple of hits, without even using the full breadth of the capabilities we built. Pretty impressive!

 

running.png

 

 

What now?

I am sure by now you are itching to try this out yourself or see it in action. Get in touch with us so we can schedule a demo with your trusted Solution Consultant!

 

If you already have HCI, you can just download the exported versions of the processing pipelines and index definitions; I have attached them for your convenience. Be sure to optimize them a bit more and remove unnecessary fields to keep the index lean and mean. The workflows and data connections are not included, as they are unique to your specific environment.

 

Happy searching,

-Christian

Jared Cohen

Updating to HCI 1.4.0

Posted by Jared Cohen Employee Mar 1, 2019

Content Intelligence version 1.4.0 will be available soon. Here are some notes to keep in mind when updating from HCI version 1.3.1 or earlier to 1.4.0.

 

There are a couple of steps to take before updating to 1.4.0:

1. Make sure that all instances are running the minimum required Docker version or later (Docker 1.13.1).

2. Make sure that your HCI system has a hostname configured (Admin App > System Configuration > Security).

 

After updating to 1.4.0, there are a few extra steps you may wish to take:

1. If you are using the Monitor App and would like to use the new analytics features (Anomaly Detection and/or Forecasting):

  • You will need to scale up the new Monitor App Analytics service (Admin App > Services > Add Service).
  • You will also need to configure the Anomaly Detection job type and Forecasting job type to run somewhere on the cluster (Admin App > Jobs)

2. If you have users configured with an identity provider (i.e. any users EXCEPT the local admin user), you will need to add the new permissions to those users' roles for any of the new features as desired (including historical log import in the Monitor App and Index Maintenance for Search). This is done at Admin App > Configuration > Security > Roles.

 

Hope this helps,

-Jared

Use this blog post to import the HCI workflow component configuration .bundle file for the integration of Hitachi Data Instance Director (HDID), Hitachi Content Platform (HCP), and Hitachi Content Intelligence (HCI).

Importing the HCI bundle provides the HCI Workflow, Data Connection, Pipeline, Index, and Content Class needed for the HDID, HCP, and HCI product integration to perform self-service search using HCI.

 

Please find the "hci_HDID_HCP_HCI_Export.bundle" bundle file attached.

 

Follow the steps below to import the HCI bundle, which includes the HCI Workflow, Data Connection, Pipeline, Index, and Content Class.

1.  Log in to the Hitachi Content Intelligence admin console (Administration App) at https://IP:8000 with the admin user name and password.

2.  Click on Workflows. The Workflow Designer page opens.

3.  Click on Import/Export, then click on Import.

4.  Click Upload to upload the HDID bundle into HCI.

5.  Download the attached ‘hci_HDID_HCP_HCI_Export.bundle’ file to a local location, then browse to and select it.

6.  Select all the components to import.

7.  Click on the Complete Import button. This imports the HCI workflow component configuration .bundle file into HCI.

 

After a successful HCI bundle import, the HCI Workflow, Data Connection, Pipeline, Index, and Content Class are created.

 

Workflow:

  • Workflow 'Workflow_HDID_HCP_HCI' was created.
  • This workflow is used to perform self-service search of Hitachi Content Platform data using Hitachi Content Intelligence with Hitachi Content Search.

Data Connection:

  • Data connection 'DataConnection_HDID_HCP_HCI' was created with 'HCP MQE' connection type.
  • This data connection is used in the 'Workflow_HDID_HCP_HCI' workflow.

  • You need to modify the HCP Data Connection details (HCP System Name, HCP Tenant Name, HCP Namespace Name, HCP Tenant User Name, and Password) to match your environment. The figure shows the HCP connection details from the environment where the bundle was exported.

Processing Pipeline:

  • Pipeline 'ProcessingPipeline_HDID_HCP_HCP' was created.
  • This pipeline is used in the 'Workflow_HDID_HCP_HCI' workflow.
  • This pipeline detects document types, expands archives, and performs basic content and metadata extraction. It is suitable for basic enterprise search use cases.

  • In the 'ProcessingPipeline_HDID_HCP_HCP' pipeline, the content class 'ContentClass_HDID_HCP_HCI', which was imported along with the other components, was added inside the Content Class Extraction stage.

 

Index Collections:

  • Index 'IndexCollections_HDID_HCP_HCP' was created.
  • This index collection is used in the 'Workflow_HDID_HCP_HCI' workflow.

Content Classes:

  • Content Class 'ContentClass_HDID_HCP_HCP' was created.
  • This content class is used in the 'ProcessingPipeline_HDID_HCP_HCP' Pipeline.

  • The following content properties were added to the 'ContentClass_HDID_HCP_HCP' content class; they are needed to search the HCP data that was copied from the source data using HDID.

 

Click here to get the Implementation Guide for the integration of HDID, HCP, and HCI.


We have just released the Content Intelligence version 1.3.1 maintenance release.

 

This release contains a number of bug fixes. The release notes and installation instructions are available along with the product downloads from the Downloads page.

Hi everybody,

 

I've been cooking a couple of tools to try to analyze the capacity consumed by live and backup object versions in HCP using Content Intelligence.

 

What I have managed to obtain so far works like this:

The first tool is a stage plugin that calculates the total size of all versions of the object:

And the second tool is a Python script that uses the data obtained above to generate a report with the size of active and backup versions aggregated by the field of your choice (so you can obtain the capacity consumption of, for example, each namespace, as seen here:)
The script output is a CSV report stamped with that day's date, so you can automate its execution to produce reports each week, for example.

 

You can find the source code, plugin and python script here:

 

https://hcpanywhere.hitachivantara.com/u/1ORe3kpkXtfG5F7E/Latest?l

 

This is all a super-early concept for a PoC, so I would appreciate any tips/advice/suggestions/corrections you have.

 

There are some things that I'm still not sure about, specifically:

 

  • Can I access the authentication token in the HCP connector from inside the stage? If I could do that I wouldn't have to configure the auth token in the stage configuration.
  • Is the SOLR stats functionality expected to be added to the HCI Search API in the future (or was it already added and I didn't realize)? It's what I'm using provisionally for the PoC in the python script at the moment, but I would like to rewrite it to use the HCI Search API, if possible.

 

Thank you in advance!

 

EDIT 26/07/18 - Updated plugin and source code, modifying the authorization settings as suggested by Yury. Thanks again for the tip!

 

 

Jon Chinitz

Plugins

Posted by Jon Chinitz Employee Jun 14, 2018

I am seeing more folks contributing plugins, whether they are pipeline stages or connectors. I have created a dedicated card on the Overview page that will list them all. To have your plugin automatically added, tag the upload with the string "hci stage" (or "plugin", "stage", or "connector").

 

Thanks for all your contributions and keep them coming!

 

Jonathan

Hitachi Content Intelligence delivers a flexible and robust solution framework to provide comprehensive discovery and quick exploration of critical business data and storage operations.

 

Make smarter decisions with better data and deliver the best information to the right people at the right time.

  • Connect to all of your data for real-time access regardless of its location or format - including on-premises, off-premises, or in the cloud
  • Combine multiple data sources into a single, centralized, and unified search experience
  • Data in context is everything – put data into meaningful form that can be easily consumed
  • Deliver relevant and insightful business information to the right users - wherever they are, whenever they need it

 

Designed for performance and scalable to meet your needs.

  • Flexible deployment options enable physical, virtual, or hosted instances
  • Dynamically scale performance up to 10,000+ nodes
  • Adopt new data formats, and create custom data connections and processing stages for business integrations and custom applications with a fully-featured software development kit

 

Connect Understand Act.png

 

What’s new in Hitachi Content Intelligence v1.3

 

  • Hitachi Content Monitor
  • Simplified navigation of Hitachi Content Intelligence consoles
  • External storage support for Docker Service Containers
  • Increased flexibility with new Workflow Jobs
  • Enhanced data processing actions
  • New and improved data connectors
  • Overall improvements to performance and functionality

 

Hitachi Content Monitor provides enhanced storage monitoring for Hitachi Content Platform.

  • Centrally monitor HCP G Series and HCP VM storage performance at scale, in near real-time, and for specific time periods
  • Analyze trends to improve capacity planning of resources - such as storage, compute, and networking
  • Customize monitoring of performance metrics that are relevant to business needs
  • Create detailed analytics and graphical visualizations that are easy to understand

 

HCP Storage and Objects - for blog.png

 

Hitachi Content Platform (HCP) is a massively scalable, multi-tiered, multi-tenant, hybrid cloud solution that spans small, mid-sized, and enterprise organizations.  While HCP already provides monitoring capabilities, Hitachi Content Monitor (Content Monitor) is a tightly-integrated, cost-effective add-on that delivers enhanced monitoring and performance visualizations of HCP G Series and HCP VM storage nodes.

 

Content Monitor’s tight-integration with HCP enables comprehensive insights into HCP performance to enable proactive capacity planning and more timely troubleshooting.  Customizable and pre-built dashboards provide a convenient view of critical HCP events and performance violations.  Receive e-mail and syslog notifications when defined thresholds are exceeded.  Aggregate and visualize multiple HCP performance metrics into a single view, and correlate events with each other to enable deeper insights into HCP behavior.

 

Content Monitor is quick to install, easy to configure, and simple to use.

 

HCP Application Load.png

With Content Monitor, a feature of the Hitachi Content Intelligence (Content Intelligence) product, you can monitor multiple HCP clusters in near real-time from a single management console for information on capacity, I/O, utilization, throughput, latency, and more.

 

 

Simplified navigation

  • Easily and seamlessly navigate, and automatically authenticate, between Content Intelligence apps (Admin, Search, Monitor) with enhanced toolbar actions 
  • No more need for numerous web browser tabs

 

External storage support for Docker Service Containers

  • Use external storage with Content Intelligence for more robust data storage features and improved sharing of remote volumes across multiple containers

 

Increased flexibility with new Workflow Jobs

  • Each Content Intelligence workflow job can now be individually monitored and configured to run on all Content Intelligence instances, a specific subset of instances, or to float across instances to dynamically run wherever resources are available

 

Enhanced data processing actions

  • Conditionally index processed documents to existing Content Intelligence, Elasticsearch, or Apache Solr indexes
  • New Aggregation calculations for 'Standard Deviation' and 'Variance' of values in fields of data

 

New and improved Content Intelligence data connectors

  • New connector for performance monitoring of HCP systems
  • New connector for processing HCP syslog events on Apache Kafka queues, and improvements to existing Kafka queue connectors

 

For more information, join the Hitachi Content Intelligence Community.

 

Also, check out the following resources:

 

Thanks for reading!

 


Michael Pacheco

Senior Solutions Marketing Manager, Hitachi Vantara

 

Follow me on Twitter:  @TechMikePacheco