Hi Team,
We are running an HCI system with 4 nodes, each with 64 GB RAM and a 500 GB hard disk. The file system is a share from HNAS containing around 170 million files. The workflow is running very slowly, and the 4-node HCI cluster cannot scan even 100K files per hour, even though server utilisation does not go beyond 20-30% and plenty of resources are free on the servers. Initially the workflow was taking batches of 10,000+ documents and scanning 1M files per hour. Now the batch size has dropped to fewer than 1,000 files, and it takes at least 12 hours to scan 1M files. We faced this issue before and logged a case, but the issue is not resolved.
Following the recommendations from the support team, we tried making some changes to the pipelines. For example, we removed most of the regex fields in the date conversion stage and added many content_type fields in the MIME type detection stage, but none of these changes gave us consistent performance, and the rate is still very slow.
At this rate, it would take months for HCI to scan the customer's 170 million file FS. Can anyone please suggest workarounds or recommendations if you have faced a similar situation before?
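To put rough numbers on this, here is a back-of-the-envelope estimate in Python. It assumes the scan rate stays constant over the whole run, which may not hold in practice, and the function name `estimated_scan_days` is only illustrative.

```python
def estimated_scan_days(total_files: int, files_per_hour: float) -> float:
    """Return the days needed to scan total_files at a steady hourly rate."""
    return total_files / files_per_hour / 24


TOTAL_FILES = 170_000_000  # approximate file count on the HNAS share

# Rates taken from the figures quoted above.
rates = {
    "original rate (1M files per hour)": 1_000_000,
    "current rate (1M files per 12 hours)": 1_000_000 / 12,
}

for label, rate in rates.items():
    days = estimated_scan_days(TOTAL_FILES, rate)
    print(f"{label}: {days:.1f} days")
```

By this estimate the full scan would have taken about 7 days at the original rate, against roughly 85 days at the current one.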
If you have opened a case for this and it is not resolved, please have it escalated to Engineering.
Also, please download the logs from all 4 servers and upload them to the case.
Some things to check:
1. Are these physical or virtual servers?
2. How much CPU (cores) does each server have?
3. Is the HNAS share mounted in the same path on all 4 nodes?
4. Are the nodes doing anything else? If they are virtual nodes, can you verify what else is running on the host machine that might be causing this issue?
5. Are you seeing any document errors or task failures, or is the document processing rate simply decreasing?
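To collect the answers to items 2 and 3 quickly, a small Python sketch like the one below could be run on each node. It is only an illustration: `/mnt/hnas_share` is a placeholder and not the real mount location, and `node_report` is a made-up helper name. It uses only the Python standard library, nothing from HCI itself.

```python
import os
import socket


def node_report(mount_path: str) -> dict:
    """Collect the hostname, logical CPU count, and mount status of a path."""
    return {
        "hostname": socket.gethostname(),
        "cpu_cores": os.cpu_count(),  # logical cores visible to the OS
        "mount_path": mount_path,
        "path_exists": os.path.exists(mount_path),
        "is_mount_point": os.path.ismount(mount_path),
    }


if __name__ == "__main__":
    # Placeholder path: replace with the path where the HNAS share is mounted.
    report = node_report("/mnt/hnas_share")
    for key, value in report.items():
        print(f"{key}: {value}")
```

Comparing the output from all 4 nodes should show whether the core counts match and whether the share is mounted at the same path everywhere.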