Hitachi Content Platform

How to debug a workflow

Hedde van der Hoeven posted 09-27-2019 10:50

Dear community,

A while ago we had an issue where one single HCP object would kill a workflow.

I went through this with HCI engineering, and they managed to isolate the actual object causing the issue so we could exclude it.

I have a similar issue but I can't remember exactly how to set it up.

I think these were roughly the steps.

1) Set the logging mode (I don't know where)

2) Set the job to use one instance

3) Set the workflow to process 1 object at a time

4) Tail the log (which log, and where: in the container or on the host?)

Can you please help, or if there is an easier way, please share :)

Hedde

#HitachiContentIntelligenceHCI
#HitachiContentPlatformHCP
#ObjectStorage

Jonathan Chinitz posted 09-27-2019 10:57

Hedde:

The bad news -- I don't know all the details of how to do this. I will get them to you.

The good news -- in HCI 1.5 (next month) we added a feature called "Stall Detection" that will do all this for you :-). The same way that you can monitor the progress of the jobs in the Task UI, it will now show you which document is taking "too long".

Jared Cohen posted 09-27-2019 12:56

Hi Hedde,

After we dealt with that issue on your system, we added the notification stages to help with this in the future.

The basic process to catch documents getting stuck is:

1) Spin up a syslog server somewhere that your cluster has network access to (it can even be on one of the HCI instances if you want).
2) Add a Syslog Notification stage to the beginning of your pipeline with a message similar to STARTED pipeline: ${HCI_URI}
3) Add a Syslog Notification stage to the end of your pipeline with a message similar to ENDED pipeline: ${HCI_URI}
4) Run the workflow. Your syslog server will now have logs of every document that entered and exited the pipeline.
5) Compare the entries to find the first document that entered but did not exit the pipeline; that's the one that was stuck. There are a number of ways to do this. I think we put the syslog lines into an Excel sheet, sorted them somehow, and eyeballed it to find the outlier. The exact details are a bit rusty, because it was a while ago.
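
As a rough alternative to the spreadsheet comparison, the Python sketch below reads a syslog file and lists every URI that logged STARTED more times than ENDED. It assumes the two notification messages use exactly the STARTED pipeline: / ENDED pipeline: wording suggested above, that each message lands on its own line, and that the URI is the rest of that line. The function name find_stuck and the sample host names in the comments are made up for illustration and are not part of HCI.

```python
import re
import sys
from collections import Counter

# Assumption: messages look like "... STARTED pipeline: <uri>" or
# "... ENDED pipeline: <uri>". Whatever syslog prefix precedes them
# (timestamp, host, tag) is ignored.
MARKER = re.compile(r"\b(STARTED|ENDED) pipeline: (.+?)\s*$")


def find_stuck(lines):
    """Return URIs with more STARTED than ENDED entries, in first-seen order."""
    open_count = Counter()   # STARTED minus ENDED, per URI
    first_seen = {}          # line number of the first STARTED, per URI
    for number, line in enumerate(lines, start=1):
        match = MARKER.search(line)
        if match is None:
            continue  # unrelated log line
        event, uri = match.groups()
        if event == "STARTED":
            open_count[uri] += 1
            first_seen.setdefault(uri, number)
        else:
            open_count[uri] -= 1
    stuck = [uri for uri, count in open_count.items() if count > 0]
    return sorted(stuck, key=first_seen.__getitem__)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example line this would match (hypothetical host and URI):
    # Sep 27 12:00:01 hci1 wf: STARTED pipeline: https://ns.example.com/rest/b.pdf
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for uri in find_stuck(handle):
            print(uri)
```

Saved under a name of your choice, it takes the path of the syslog file as its only argument; where that file lives depends on how the syslog server is configured. If the workflow is still running, documents that are merely in flight will also show up, so it is easiest to read the output after pausing the workflow.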

One key to making this easier is to try to do this once you know you are close to the failing document. We had reduced the batch size and paused the workflow when we knew it was on the batch causing problems so that there weren't tons of documents in the batch. That makes finding the outlier in the logs quicker.

Hope this helps,

-Jared

Hedde van der Hoeven posted 09-27-2019 13:18

Hi Jared, thanks for your reply.

We use Ansible-managed syslog configurations, so I can't make any changes to this on our Linux instances.

Is there a possibility to do this the "old fashioned" way? If not, I'll have to go and jump through some hoops :)

Cheers,

Hedde

Hedde van der Hoeven posted 09-27-2019 14:29

Don't worry, got it working :)
