check driver heap limit error
Marek Kaszycki posted 10-12-2018 08:06

I have a workflow that essentially works (tests complete just fine), but it's unable to run as a task.

I have an HCI cluster with 5 nodes, dedicated to this single workflow. Each of them has 4 vCPUs, 16 GB RAM, 24 GB of swap space, and enough disk for everything (the filesystem for Docker is between 26% and 49% used per node).

My settings are 3072m (or 3g) for both the driver heap limit and the executor heap limit, so well within the available memory and within the limit for swap space. But when I try to run the task, I run into this error:

Check Driver Heap limit

Please confirm the Driver Heap limit setting is tuned appropriately in the workflow task settings under Memory. If running on Fedora OS, also be sure to enable swap memory on all instances. Restart the workflow task. If the problem persists, contact your authorized service provider.

The error doesn't tell me if it's too little or too much. The files to be indexed are well within this limit (I don't think any of them are more than 30 MB).

While the task is running, the metrics don't budge from zero; nothing related to performance is listed, no aggregations or anything.

Also, the error doesn't appear until midnight, so I can't tell whether my changes to the settings are doing anything before I come into the office the next day, which wastes time unnecessarily.

I'm stuck and I don't know how to move forward with this. I'd appreciate anything to help me move forward.
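For what it's worth, a quick way to sanity-check the memory and swap actually visible on each node is to read /proc/meminfo. This is generic Linux, not anything HCI-specific:

```python
# Quick sanity check of the RAM and swap visible on a node.
# Generic Linux only (/proc/meminfo); nothing here is HCI-specific.

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of values in kB."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            # Values are reported as e.g. "16314880 kB"
            info[key.strip()] = int(rest.split()[0])
    return info

if __name__ == "__main__":
    mem = read_meminfo()
    total_gb = mem["MemTotal"] / (1024 * 1024)
    swap_gb = mem["SwapTotal"] / (1024 * 1024)
    print(f"RAM:  {total_gb:.1f} GB")
    print(f"Swap: {swap_gb:.1f} GB (0.0 means swap is not enabled)")
```

Running this on every node confirms the 16 GB / 24 GB figures are what the OS actually sees inside the instances.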


#HitachiContentIntelligenceHCI
Jonathan Chinitz

Marek:

1. Copy and Paste the error from the Task window into this note.

2. What connector are you using?

3. How many documents do you expect the workflow to process?

4. Is the Workflow Agent job configured to run on ALL instances?

5. What are the task settings? Default or Custom? If Custom pls paste them here.

Marek Kaszycki

1. Title: "Check Driver Heap limit"

Description:

"Please confirm the Driver Heap limit setting is tuned appropriately in the workflow task settings under Memory. If running on Fedora OS, also be sure to enable swap memory on all instances. Restart the workflow task. If the problem persists, contact your authorized service provider."

2. HCP MQE.

3. I think it's a few million, let's say 2.5 million now.

4. I think so, I never disabled it, but I don't know how to check.

5. I used the defaults, but I eventually customized them. Here are all the settings:

Check for Updates: No (disabled from default)

Workflow-Agent Recursion Enabled: Yes (limit: 50) (default)

Workflow Preprocessing Recursion Enabled: Yes (limit: 50) (default)

Performance: Default (used to be "Custom", but I didn't actually customize any settings)

Halt task after set amount of failures: No

Collect Aggregation Metrics: Yes

Collect Historical Metrics: No

Driver Heap Limit: 3g

Executor Heap Limit: 3g
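As a back-of-envelope check of those settings against a 16 GB node (the overhead figures below are illustrative assumptions, not HCI-documented values):

```python
# Back-of-envelope memory budget for one node. The overhead figures
# below are illustrative assumptions, not HCI-documented values.

node_ram_gb = 16
driver_heap_gb = 3       # Driver Heap Limit: 3g
executor_heap_gb = 3     # Executor Heap Limit: 3g
jvm_overhead = 0.10      # assumed ~10% off-heap overhead per JVM
os_and_services_gb = 4   # assumed OS, Docker, and other node services

jvm_total = (driver_heap_gb + executor_heap_gb) * (1 + jvm_overhead)
budget = jvm_total + os_and_services_gb
headroom = node_ram_gb - budget

print(f"JVM footprint: ~{jvm_total:.1f} GB")
print(f"Total budget:  ~{budget:.1f} GB of {node_ram_gb} GB")
print(f"Headroom:      ~{headroom:.1f} GB")
```

Under these assumptions the heap settings fit, but the headroom shrinks quickly if the other services on the node need more than assumed here.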

Jonathan Chinitz

Are you crawling the entire cluster? A specific Tenant? A specific namespace? Have you tried using the HCP connector instead of MQE?

Marek Kaszycki

Not the entire cluster, one specific machine. A specific tenant, and a specific namespace. The connection works.

I would prefer not to use the HCP connector in place of HCP MQE because we are definitely going to use MQE in the end, and it would be like switching horses midstream.

I can change this setting, but can it really help anything with this particular error? Could you point me to the support document that shows this as a potential solution?

Jonathan Chinitz

Basic troubleshooting -- swap the MQE connector for the HCP one and see if you get the same error. Should take you 60 seconds.

Marek Kaszycki

Ok, done. It's not doing anything.

As I mentioned, for some reason every HCI task I run doesn't do anything until midnight, at which point it informs me that there's an error.

It's been running for 5 minutes now, 0 bytes read. As much as I'd like to stay, I have to leave for today, so I'll pick this up on Monday.

Thanks for your help so far.

Jonathan Chinitz

Marek:

Open a Support Case please.

Jonathan Chinitz

And upload the logs from the nodes to the case.

Jared Cohen

The message you are reporting indicates a probable OutOfMemory error.

This is most likely a resource issue. While 16 GB is the minimum required memory footprint per node and will work for some use cases, it is not enough for most, which is why we recommend at least 32 GB of memory per node.

If you are able, test this out on a system with more memory. You can increase the driver and executor heap in the workflow, although that may not be necessary.

If this doesn't help, then as Jon said we'll need to triage via an escalation.
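One way to gather evidence for the OutOfMemory theory before escalating is to scan the node logs for JVM OOM markers. The log location varies by install, so the sketch below takes whatever directory you point it at (the demo creates a fabricated sample log just so it runs standalone):

```python
# Scan a directory of logs for JVM out-of-memory markers.
# The log directory path is whatever your install uses; the demo
# below fabricates a sample log so the script is self-contained.
import pathlib
import tempfile

OOM_MARKERS = ("java.lang.OutOfMemoryError", "GC overhead limit exceeded")

def find_oom_lines(log_dir):
    """Return (file, line_number, line) tuples mentioning an OOM marker."""
    hits = []
    for path in pathlib.Path(log_dir).rglob("*.log"):
        lines = path.read_text(errors="replace").splitlines()
        for i, line in enumerate(lines, 1):
            if any(marker in line for marker in OOM_MARKERS):
                hits.append((path.name, i, line.strip()))
    return hits

if __name__ == "__main__":
    # Demo with a fabricated log line; point log_dir at real logs instead.
    with tempfile.TemporaryDirectory() as d:
        sample = pathlib.Path(d) / "workflow-task.log"
        sample.write_text("INFO starting task\n"
                          "ERROR java.lang.OutOfMemoryError: Java heap space\n")
        for name, lineno, text in find_oom_lines(d):
            print(f"{name}:{lineno}: {text}")
```

If the scan turns up OOM lines, attach those log excerpts to the support case along with the full node logs.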