Hitachi Content Platform


How to reset the Search index in HCI v1.3?

  • Object Storage
  • Hitachi Content Platform HCP
Tanmoy Panja posted 10-04-2018 18:30

Hi,

I am working on integration between HDID v6.6.2, HCP v8.1 and HCI v1.3.45.

Using HDID, I initially uploaded 210 documents to an HCP namespace and ran the workflow successfully in HCI. I was able to search the documents using the Search app.

But then I deleted the HCP namespace from HDID and created a new namespace. Using this new namespace, I uploaded only one Word document into HCP. However, the Search app is still displaying documents from the old namespace. I have also updated the data connection in my workflow so that it points to the new namespace.

Is there any way to reset the search index in HCI? I am a little stuck here and need your help.

HCI link for your reference: https://172.17.68.16:8000/ (admin/p@ssw0rd)


Troy Myers

Not sure what you mean by reset, but you really have two options.

A) If all you want is to reset the workflow stats, you can go into the workflow and clear it/start over. This will reset the stats back to zero.

B) If you are referring to the index itself (clearing out the data in it so it no longer shows up in search), the easiest answer is to copy it as a new index (e.g. "index 2"). The copy carries over your index settings but not the actual content that was indexed.

Workflow and index settings troy notes

Adding multiple fields to an index

HCI index options-20171224 1534-2.mp4

HCI workflow settings-20171226 1359-1.mp4


John Goodman

Hi Troy,

The problem we are having is this:

Tanmoy had indexing running on an HCP namespace (for example, NS1). Later he deleted/purged the namespace (NS1) and created a new namespace (NS2). He then updated the HCI data connection to point to this new namespace (NS2). When we rerun the indexing process, we still see results from the old, deleted/purged namespace (NS1).

What is the expected behavior here? Should the old namespace results remain in the index even though the namespace has been deleted?

Eckhard Roeser

Hi John,

Yes. This is expected behavior as long as you search the old index. Do what Troy showed: copy the index to a new index with a different name. HCI will create the new index with exactly the same settings as the old one, but with no data in it. Once this is done, change the workflow output to point to the new index. Then run the workflow again with the new namespace as the input. All the old data should then be gone.
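A deliberately simplified model may help show why the old results persist. The sketch below assumes (an assumption for illustration, not HCI's actual implementation) that a workflow run only adds or updates index entries for objects it finds in its current input, and never revisits objects from a data connection it no longer reads. Under that assumption, rerunning into the same index keeps the NS1 entries, while a fresh copy of the index holds only what the new run wrote. The function name and object paths are hypothetical.

```python
"""Toy model (not HCI code) of why a reused index keeps results from a deleted namespace."""


def run_workflow(index: dict[str, str], source: dict[str, str]) -> None:
    """Add or update one index entry per object currently in the source.

    Assumption: objects that are no longer in the source are never visited,
    so their existing index entries are left untouched.
    """
    for object_path, content in source.items():
        index[object_path] = content


# 210 documents in the first namespace (NS1), as in the thread.
ns1 = {f"ns1/doc{i}.docx": "old content" for i in range(210)}
# The replacement namespace (NS2) holds a single Word document.
ns2 = {"ns2/report.docx": "new content"}

old_index: dict[str, str] = {}
run_workflow(old_index, ns1)  # first run: 210 entries
ns1.clear()                   # NS1 deleted/purged; nothing tells the index
run_workflow(old_index, ns2)  # rerun against NS2 into the SAME index
print(len(old_index))         # 211: the NS1 entries are still searchable

new_index: dict[str, str] = {}  # copied index: same settings, no data
run_workflow(new_index, ns2)
print(len(new_index))           # 1: only the NS2 document
```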

Ecky

Troy Myers

This also brings up another good point. Even if a file/object is deleted from the original source, it will still show up in your index; the "HCI_deleted" field is what marks it as deleted. You need to have that field included in your index, and then, if you do not want deleted objects to show, add a query parameter that excludes them. I believe the value to filter on is "true".

Tanmoy Panja

Do we need to specify this field under Schema? For my index, it looks like the "HCI_deleted" field has already been added.


Please let me know which attributes to select or discard.


Jonathan Chinitz

You have 2 options:

1. In your search query, specify -HCI_deleted:true, which will exclude any deleted entries in the index.

2. Create a new index and change the workflow to reflect the new index and not the old one. Delete the old index.

We are adding "Delete by Query" to version 1.4 to make this easier to manage.
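In Lucene-style query syntax, a leading "+" marks a required clause and a leading "-" a prohibited one, so deleted entries are excluded with "-", not "+". The sketch below shows how a front end might wrap every user query with that exclusion. It assumes (not confirmed for HCI 1.3) that the search query field accepts Lucene-style syntax and that "*:*" works as a match-all query; the function name is hypothetical.

```python
def exclude_deleted(user_query: str) -> str:
    """Wrap a user's query so documents tagged HCI_deleted:true are excluded.

    '+' makes the user's query required; '-' prohibits deleted documents.
    Assumption: an empty query is replaced by the match-all query '*:*'.
    """
    query = user_query.strip() or "*:*"
    return f"+({query}) -HCI_deleted:true"


print(exclude_deleted("invoice"))  # +(invoice) -HCI_deleted:true
print(exclude_deleted(""))         # +(*:*) -HCI_deleted:true
```

Wrapping the user's text in parentheses keeps any OR clauses the user typed from escaping the required group.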

Tanmoy Panja

Thanks, John, for your valuable input.

Data Conversion

Jon, is it possible to tell the Search App itself (not a custom search app) not to show deleted files?

I am trying to do this with the latest HCP AW connector.

I've tried to remove such files at the pipeline level with something like this (an if statement based on HCI_deleted), but I can still see the files in the results. I ran the workflow, deleted a file, and reran the workflow. The deleted file was still returned in the search results:

User3 with npp...exe deleted!


Workflow with if statement:


Search results: one of the files (npp...exe) is still there, and it has no HCI_deleted field at all (I added this field to the displayed metadata).

One of the deleted files is also there, with HCI_deleted = true, even though I added this if statement (the pipeline results show 0 dropped documents).

So each time I want precise results, I need to delete the index, create a new one, and use that. Is there any workaround that avoids this?

Data Conversion

Thanks Yury!

1. From that post: "A search application could filter out all deleted objects from queries by specifying -(HCI_deleted:true) as a filter query on all user query requests". Where do I turn on that setting so it is applied automatically for all users (so users don't need to type this expression into the search console themselves)? The idea is that users never even see that option, and the HCP AW connector doesn't have the convenient "Exclude deleted objects" option.

2. Above, I described a case where the object was deleted by the user but is still in the search index without the HCI_deleted tag, which is a bit strange.

Data Conversion

There is no way to exclude deleted files from search results automatically in 1.3. This is improved in 1.4, where you can configure the workflow to delete documents from the index when the file is deleted from the source.

I'm not sure about the file that was deleted but is not tagged with the HCI_deleted=true field. That should not happen. If it is reproducible, I'd be interested in the exact steps.
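Since 1.3 cannot hide deleted documents automatically, one manual option is to post-filter results on the client side. The sketch below is hypothetical: it assumes each search result is available as a dictionary of field values, which is not a documented HCI result format. It also shows the limitation raised in this thread: a document that was deleted at the source but never received an HCI_deleted field looks exactly like a live one and cannot be filtered out this way.

```python
from typing import Any


def split_by_deleted_flag(
    results: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition search results into (live, tagged_deleted).

    Only documents whose HCI_deleted value is true (boolean True or the
    string "true", case-insensitive) count as deleted. A document with no
    HCI_deleted field is treated as live.
    """
    live: list[dict[str, Any]] = []
    deleted: list[dict[str, Any]] = []
    for doc in results:
        is_deleted = str(doc.get("HCI_deleted", "")).lower() == "true"
        (deleted if is_deleted else live).append(doc)
    return live, deleted


# Hypothetical result rows; the paths are made up for illustration.
results = [
    {"path": "user3/file_a.exe"},  # deleted at the source, but never tagged
    {"path": "user3/file_b.txt", "HCI_deleted": "true"},
    {"path": "user3/file_c.txt", "HCI_deleted": False},
]
live, deleted = split_by_deleted_flag(results)
print([d["path"] for d in live])     # ['user3/file_a.exe', 'user3/file_c.txt']
print([d["path"] for d in deleted])  # ['user3/file_b.txt']
```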

Data Conversion

Ok, thanks Yury!

I will finish the main PoC task and then try to reproduce this error.
