
 Scope of StagePlugin Session data

Clifford Grimm posted 03-02-2020 15:39

When developing a stage, there are hooks to set/use session data for the step. What I am looking to understand is the scope of the session information, taking into consideration that a stage can be used in pre-processing or Workflow-Agent modes. Then also consider a stage that is introduced in a pipeline multiple times. Under all these situations, what is considered a "session"? Is a session owned by a single configured step? Is the session shared by all steps in a pipeline that are of the same type? Then, when executing in Workflow-Agent mode, will each executing instance of a step have its own session, regardless of what and how many HCI batches and/or instances it could be executed on? And considering all this, what considerations should be made around locking, if any, to ensure the session is properly updated?

 

For those who are not aware of what this is: there is a PluginSession interface that can be implemented in a class and that allows for storing various information that can be used by the step. The session can be initially configured via the startSession method that can be implemented as part of the StagePlugin. Then, during execution against each document, the session can be accessed to either read or record information. This can be handy for things like establishing a connection to another resource, such as a database, to obtain additional information the stage may require during execution.
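
To make that concrete, here is a minimal sketch of a session class that holds a database connection, in line with the description above. It is illustrative only: a real plugin would implement the SDK's PluginSession interface, whose package and exact methods are not shown here, and the connection details are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch only: in a real plugin this class would implement the SDK's
// PluginSession interface; the interface's exact methods are not shown here.
public class DatabaseLookupSession /* implements PluginSession */ {

    private final Connection connection;

    private DatabaseLookupSession(Connection connection) {
        this.connection = connection;
    }

    // Typically called from the stage's startSession hook so the expensive
    // connection is opened once and reused for every document.
    public static DatabaseLookupSession open(String jdbcUrl, String user, String password)
            throws SQLException {
        return new DatabaseLookupSession(DriverManager.getConnection(jdbcUrl, user, password));
    }

    // Used while processing each document to look up extra information.
    public Connection connection() {
        return connection;
    }

    // Called when the session is torn down (e.g. the workflow stops).
    public void close() {
        try {
            connection.close();
        } catch (SQLException ignored) {
            // Best effort: the session is going away regardless.
        }
    }
}
```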


#HitachiContentIntelligenceHCI
Jordan Diehl

Plugin sessions were designed to hold resources that are expensive to set up and tear down (e.g. database connections). In the case of a workflow, the preprocessing and connector plugin sessions will be started when the workflow starts running and remain open until the workflow stops running (i.e. when it completes, pauses, or halts). The Workflow-Agent starts its own sessions on every partition that runs, and those sessions will be closed when the partition has completed.

 

The sessions are not intended to store state. They could be used to hold a connection to some kind of external database and store/read the information there, but separate sessions are created on separate nodes/threads, so you might need some kind of external cluster lock as well if you attempt to store state this way. This seems to be the solution you have already described in your other question though, so you are already doing what we would have suggested.
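
One way to get that locking is to let the database itself serialize the increment with a row lock, rather than a separate cluster lock. A hedged sketch of that idea, assuming a simple counters table (columns name, value) that you create and seed yourself; nothing here is part of the HCI SDK:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative helper: atomically bumps a named counter row and returns the
// new value. The SELECT ... FOR UPDATE row lock makes concurrent sessions on
// other nodes/threads queue up behind each other.
public final class ExternalCounter {

    public static long incrementAndGet(Connection conn, String counterName) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            long next;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT value FROM counters WHERE name = ? FOR UPDATE")) {
                select.setString(1, counterName);
                try (ResultSet rs = select.executeQuery()) {
                    rs.next();
                    next = rs.getLong(1) + 1;
                }
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE counters SET value = ? WHERE name = ?")) {
                update.setLong(1, next);
                update.setString(2, counterName);
                update.executeUpdate();
            }
            conn.commit();
            return next;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```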

Clifford Grimm

Thanks Jordan. I am struggling with implementing a data migration mechanism that puts content onto HCP with a substantially variable folder structure, where each folder holds only a fixed number of entries. In the very simplest example, I need to change the folder when a certain number of objects have been written. For example, start off with Folder_001, then when it reaches 10,000 items, switch to Folder_002.

 

What I am contemplating is having a pre-processing pipeline that essentially has a step that starts with an initial value of 1 and increments the value for every document that passes through. The most efficient approach would be to store the value in the stage session and increment the in-memory value on every call. From what I can tell, it seems to work pretty well using the pre-processing pipeline. However, I am not really sure when "instances" of the step are created, so I want to make sure the behavior is predictable.
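
The in-memory counter itself is small; a sketch of what that session-held state might look like (the class name and folder-naming pattern are made up for illustration, and the 10,000 limit is taken from this thread; nothing here is an HCI SDK class):

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the in-session counter idea. AtomicLong is not strictly
// needed if the pipeline is single-threaded, but it is a cheap safety net.
public class FolderCounterSession /* implements PluginSession */ {

    private static final long ITEMS_PER_FOLDER = 10_000;

    private final AtomicLong documentsSeen = new AtomicLong();

    // Called by the stage for every document that passes through the
    // pre-processing pipeline.
    public String folderForNextDocument() {
        long count = documentsSeen.incrementAndGet();             // 1, 2, 3, ...
        long folderIndex = ((count - 1) / ITEMS_PER_FOLDER) + 1;  // Folder_001 holds docs 1..10,000
        return String.format("Folder_%03d", folderIndex);
    }
}
```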

 

I am still going to be taking the external database approach for most stuff, but still need to consider if I can effectively do any caching in a stage session. I would hate to contemplate updating a database every time the counter is incremented.

 

So for session behaviors, my guess is that the following determines when, and how many, separate instances/sessions would be created:

1) Each occurrence of the step in a processing pipeline. So if it is in a pipeline twice, there will be separate "sessions" for each occurrence.

2) For each parallel job, there will be a multiple of the "N" occurrences from item 1 above. So if there are 2 instances in a pipeline and 2 parallel jobs, there would be a total of 4 instances of the step, with separate sessions.

 

Remaining questions:

1) What is the lifecycle of any session? Is it destroyed and recreated from scratch when a task is paused, stopped, or sleeps between "Check for updates"? Any other times? Thus I am assuming that when the task starts back up, new sessions will be started, and in this example the counter will restart from 1?

2) Does the task performance configuration for parallel jobs impact pipelines configured for pre-processing execution? Or does it only impact Workflow-Agent mode?

 

 

Jordan Diehl

1) The sessions for the pre-processing pipeline stages are created when the workflow task starts, and will only be stopped when the task is paused or stops. They will remain open during the sleeps between "Check for updates". So the workflow being paused, completed, or halted should be the only times when you will lose the in-memory information in the session.

 

2) No, the task performance parallel jobs setting does not impact the pre-processing pipelines. The pre-processing pipeline elements are executed serially and single-threaded.

 

One approach you could take is a bit of a combination of the two things we've talked about. You could use the counter in the pre-processing session to count up to 10,000, then increment a value in an external database. That would mean that if the workflow ever stopped, you would still have the folder number saved externally. The main negative is that if the workflow were to ever stop, you would not know exactly how many objects were put into the folder.
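
As an illustration of that combination, here is a sketch that keeps the per-document count in the session and only touches the external database when the folder number changes. It reuses the hypothetical ExternalCounter helper sketched earlier; the class and field names are made up, not HCI SDK names.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the hybrid approach: in-memory counting within a folder, with the
// folder index persisted externally once per 10,000 documents.
public class HybridFolderSession /* implements PluginSession */ {

    private static final long ITEMS_PER_FOLDER = 10_000;

    private final Connection conn;
    private final AtomicLong documentsInCurrentFolder = new AtomicLong();
    private volatile long currentFolderIndex;

    public HybridFolderSession(Connection conn) throws SQLException {
        this.conn = conn;
        // Start in a fresh folder so a restart never reuses a partially filled one.
        this.currentFolderIndex = ExternalCounter.incrementAndGet(conn, "folderIndex");
    }

    // Called for every document; returns the folder the document should go to.
    public String folderForNextDocument() throws SQLException {
        if (documentsInCurrentFolder.incrementAndGet() > ITEMS_PER_FOLDER) {
            // Crossed the boundary: move to the next folder and persist that fact.
            currentFolderIndex = ExternalCounter.incrementAndGet(conn, "folderIndex");
            documentsInCurrentFolder.set(1);
        }
        return String.format("Folder_%03d", currentFolderIndex);
    }
}
```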

 

So if you are just trying to spread the objects somewhat evenly across the directories, you can just increment the folder value on session startup (to avoid ever putting more than 10,000 objects in a folder) and accept that stopping the workflow will always result in a folder with fewer than 10,000 objects. If you have a hard requirement to have exactly 10,000 objects in every folder, then this won't work though.