Hitachi Content Platform

 How to preserve tagged field after Read Lines?

Isaac Pittman posted 01-08-2018 15:24

My goal is to:

1. Input a .tar file, containing a CSV file

2. Tag it with a unique ID, based on the filename (or something else TBD)

3. Extract the tar file

4. Read Lines from the extracted CSV

5. Output to indexes, including the unique ID that was added in step 2

I've added a Tagging stage (step 2) to the Extraction pipeline (step 3), to tag the unique ID. But, when I test the workflow, the Read Lines stage does not seem to copy the tagged field into the new documents it generates. (If it matters, I have recursion on.)

Is it possible for a tag added to a document to be applied to the resulting documents, when the original passes through the Read Lines stage? Or should I find another way to accomplish this (perhaps using HCI_parentUri, which seems to contain the URL of the original .tar file)?


Benjamin Isherwood

Hi Isaac,

The "Read Lines" stage will transform the input document completely into a number of new documents (one for each line read).

You would need to tag the individual documents AFTER the "Read Lines" stage in the pipeline.

Yes, if you need the parent file information to generate the tag content, it is preserved on each newly generated "line" document in the HCI_parentUri, HCI_parentId, and HCI_parentDisplay fields.

A mechanism that forces metadata from a parent document onto the documents expanded from it would make a good feature request.

-Ben
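
The approach described above (tag after "Read Lines", deriving the ID from the parent fields) can be sketched in plain Python. This is only an illustration of the logic, not HCI stage configuration or the HCI plugin API. It assumes each "line" document can be viewed as a dictionary of fields, that HCI_parentUri holds the URL of the original .tar file (as observed in the question), and that the ID should be the archive's file name without its extension. The names `unique_id_from_parent_uri`, `tag_line_documents`, and the `uniqueId` field are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def unique_id_from_parent_uri(parent_uri):
    """Derive an ID from the file name at the end of a parent URI.

    Assumption: the ID is the archive file name with its extension removed.
    """
    path = unquote(urlparse(parent_uri).path)
    name = posixpath.basename(path.rstrip("/"))
    # Check the longer archive extensions first so ".tar.gz" is not cut to ".gz".
    for ext in (".tar.gz", ".tgz", ".tar"):
        if name.lower().endswith(ext):
            return name[: -len(ext)]
    return name


def tag_line_documents(line_docs, id_field="uniqueId"):
    """Return copies of the line documents with the derived ID added.

    Documents without an HCI_parentUri field are copied unchanged.
    """
    tagged = []
    for doc in line_docs:
        new_doc = dict(doc)
        parent_uri = doc.get("HCI_parentUri")
        if parent_uri:
            new_doc[id_field] = unique_id_from_parent_uri(parent_uri)
        tagged.append(new_doc)
    return tagged
```

In a real workflow, the equivalent step would be a stage placed after "Read Lines" that writes the new field from HCI_parentUri. The URL in any test of this sketch is made up.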

Troy Myers

You can log into the HALO lab (labs.hds.com) under HCI-training or HCI-salesdemo. We have an example pipeline for this called "ReadHDICIFSLOGS". You can look at the pipeline, or export it via AW and use it as your template. The CSV file is in the HCP NS, along with a custom index.

Troy