Hitachi Content Platform

 HCI 4 node cluster performance is very slow

  • Object Storage
  • Hitachi Content Intelligence HCI
Data Conversion posted 09-27-2018 05:44

Hi Team,

We are running an HCI system with 4 nodes, each with 64 GB RAM and a 500 GB hard disk. The file system is a share from HNAS containing around 170 million files. The workflow is running very slowly: the 4-node HCI cluster cannot scan even 100K files per hour, yet server utilisation does not go beyond 20-30% and plenty of resources are free on the servers. Initially the workflow was taking batches of 10,000+ documents and scanning 1M files per hour. Now the batch size has dropped to fewer than 1,000 files, and it takes a minimum of 12 hours to scan 1M files. We faced this issue before and logged a case earlier, but the issue is not resolved.

Following the recommendations from the support team, we tried making some changes to the pipelines. For example, we removed most of the regex fields in the date conversion stage and added many content_type fields in the MIME type detection stage, but none of these changes gave us consistent performance, and the rate is still very slow.

At this rate, it would take months for HCI to scan the customer's 170 million file file system. Can anyone please suggest workarounds or recommendations if you have faced a similar situation before?
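To put rough numbers on "months", here is a minimal sketch of the arithmetic using only the figures quoted in this post (170 million files, 1M files per hour originally, 1M files per 12 hours now). The function name `hours_to_scan` is made up for illustration, and the calculation assumes the scan rate stays constant, which may not hold in practice.

```python
def hours_to_scan(total_files: int, files_per_hour: float) -> float:
    """Hours needed to scan total_files at a constant rate."""
    if files_per_hour <= 0:
        raise ValueError("files_per_hour must be positive")
    return total_files / files_per_hour


TOTAL_FILES = 170_000_000        # size of the HNAS share, from the post
ORIGINAL_RATE = 1_000_000        # files/hour when the workflow was healthy
CURRENT_RATE = 1_000_000 / 12    # 1M files per 12 hours, about 83K files/hour

for label, rate in (("original", ORIGINAL_RATE), ("current", CURRENT_RATE)):
    hours = hours_to_scan(TOTAL_FILES, rate)
    print(f"{label}: {rate:,.0f} files/hour, "
          f"about {hours:,.0f} hours ({hours / 24:.0f} days)")
```

At the current rate this comes to roughly 2,040 hours, or about 85 days, compared with about a week at the original rate.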


Jonathan Chinitz

If you have opened a case for this, and it is not resolved, please have it escalated to Engineering.

Also please download the logs from all 4 servers and upload them to the case.

Some things to check:

1. Are these physical or virtual servers?

2. How much CPU (cores) does each server have?

3. Is the HNAS share mounted in the same path on all 4 nodes?

4. Are the nodes doing anything else? If they are virtual nodes, can you verify what else is running on that machine that might be causing this issue?

5. Are you seeing any document errors or task failures OR is the document processing rate simply decreasing?

Data Conversion

Hello Jon,

Yes, we are following up with engineering, and in the meantime I was looking for a workaround our colleagues might be able to help with. We have also uploaded the logs from all 4 servers. Please find my answers inline, and let us know if you have any recommendations.

1. Are these physical or virtual servers? Answer: Virtual servers running on ESXi 6.5U1.

2. How much CPU (cores) does each server have? Answer: 8 CPUs each.

3. Is the HNAS share mounted in the same path on all 4 nodes? Answer: Yes, on the same path.

4. Are the nodes doing anything else? If they are virtual nodes, can you verify what else is running on that machine that might be causing this issue? Answer: We will definitely check this, but these nodes are doing nothing else. We checked with the team handling the VM environment, and they found no performance issues on the ESXi servers and no specific hardware failures.

5. Are you seeing any document errors or task failures OR is the document processing rate simply decreasing? Answer: We haven't seen any document errors or task failures so far; only the document processing rate is dropping drastically.

Jonathan Chinitz

How many CPUs does the host have? If each HCI VM has 8 cores then 4 machines have 32. Does your host have 32 CPUs? More? Less? Even though they are "virtual" the host still needs the aggregate horsepower.
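As a rough illustration of this check, the sketch below compares the vCPUs allocated to the HCI VMs with the physical cores on the host. The VM count and vCPUs per VM come from this thread; `HOST_PHYSICAL_CORES` is a hypothetical placeholder (the real value has not been posted), and the helper name `vcpu_ratio` is made up for this example. It also ignores any other VMs sharing the same host, which would push the ratio higher.

```python
def vcpu_ratio(vms: int, vcpus_per_vm: int, host_physical_cores: int) -> float:
    """Ratio of allocated vCPUs to physical cores on the host."""
    if host_physical_cores <= 0:
        raise ValueError("host_physical_cores must be positive")
    return (vms * vcpus_per_vm) / host_physical_cores


HCI_VMS = 4                 # from the thread
VCPUS_PER_VM = 8            # from the thread
HOST_PHYSICAL_CORES = 16    # hypothetical value, replace with the real host figure

ratio = vcpu_ratio(HCI_VMS, VCPUS_PER_VM, HOST_PHYSICAL_CORES)
print(f"{HCI_VMS * VCPUS_PER_VM} vCPUs on {HOST_PHYSICAL_CORES} physical cores: "
      f"ratio {ratio:.2f}")
if ratio > 1:
    print("The HCI VMs alone request more vCPUs than the host has cores.")
```

A ratio above 1 does not prove the host is the bottleneck, but it is a reason to look at CPU ready time on the ESXi side.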

Troy Myers

What are your workflow settings? Most important are the heap settings in the workflow and the memory in the index. Please take a look at the document below to make sure you are utilizing all the RAM in your configuration.

Workflow and index settings troy notes

Data Conversion

Hello Troy,

Sorry, I was busy with other commitments. The issue is still pending, as it was earlier. Here are our workflow settings.

[Three screenshots of the workflow settings; images not available]

The memory of the index service is shown below.

[Screenshot of the index service memory settings; image not available]

Jonathan Chinitz

Have you opened a ticket with GSC?

Data Conversion

Hello Jon,

We did log a case with GSC. Case number: 00688667, logged on 27 September. Apart from asking us to upload logs, the support team has given us no response, even after repeated follow-ups.

Jonathan Chinitz

I will look into this.