How to properly implement a terminating end event that can trigger at any point during the process? - bpmn

I’m currently modeling a process with 2 exception statuses (a patient dies & No Neurologist found).
If no Neurologist is found (this can only happen once in my process), the process stops.
Another exception status is triggered when a patient dies at any point during the process. If this exception status occurs, the process stops.
I'm having difficulty modeling these exception statuses. Attached you can find my current attempt; I'm not 100% sure it is correct.
Example of my attempt

Terminate end events are rarely needed. There are usually more elegant, clearer solutions than this 'kill all' switch. Their purpose is to terminate any parallel activities / consume any tokens which exist in the same scope. The same can usually be achieved with interrupting (e.g. conditional) boundary events, which get triggered e.g. by a data change. A boundary event makes it clearly visible in the process where a cancellation can occur and under which circumstances, and it allows ending a process in a more controlled manner.
In your particular use case (the diagram you attached) you don't need terminate end events at all. You are using two interrupting boundary events (escalation and error) on the scope created by the embedded sub process. The scope of the embedded sub process is already terminated when these interrupting events occur. A subsequent terminate end event in the parent process's scope would cancel everything in that scope. In your case the parent scope is the root process instance, but since there is no token flow parallel to the embedded sub process, there is nothing to cancel.
Also see:
https://docs.camunda.org/manual/latest/reference/bpmn20/events/terminate-event/
https://docs.camunda.org/manual/latest/reference/bpmn20/events/error-events/#error-boundary-event
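Purely as an illustration of the boundary-event approach, here is a minimal sketch assuming Camunda 7's Java delegate API; the class name, the error code "patientDied" and the process variable are invented for this example. A service task inside the embedded sub process raises a BPMN error, and the interrupting error boundary event attached to the sub process catches it and cancels that scope:

    import org.camunda.bpm.engine.delegate.BpmnError;
    import org.camunda.bpm.engine.delegate.DelegateExecution;
    import org.camunda.bpm.engine.delegate.JavaDelegate;

    // Illustrative only: checks a process variable and raises a BPMN error that
    // the interrupting error boundary event on the sub process can catch.
    public class CheckPatientStatusDelegate implements JavaDelegate {

        @Override
        public void execute(DelegateExecution execution) {
            Boolean patientDied = (Boolean) execution.getVariable("patientDied"); // assumed variable
            if (Boolean.TRUE.equals(patientDied)) {
                // The error code must match the errorRef of the boundary event in the model.
                throw new BpmnError("patientDied");
            }
        }
    }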

Related

Gateway that waits for just one token, then cancels other incoming paths

How to model running multiple tasks/branches in parallel, and wait for just the first one to finish. Then the other (running) branches should be cancelled. To illustrate what I'm asking (what to use instead of the X gateway):
As far as I know, the exclusive gateway's join function is to immediately proceed. It neither stops/cancels the other branches, nor does it stop further executions of the output (so multiple tokens can pass through it).
Is this the answer?
Or perhaps this is even better?
I would do the following:
Starting off from your third diagram, wrap the tasks ‘a’ and ‘b’ inside your subprocess into another transaction subprocess (but still inside the bigger subprocess that you had already used).
At the boundary of this new sub process as well as the boundary of the task ‘c’, you should add interrupting boundary signal events that lead to a None end event.
After task ‘b’ and ‘c’, add a signal end event. Each of these two signal end events should be caught by the interrupting boundary signal events of the other subprocess or task that you want to stop. So, if task ‘c’ is completed, the signal that is thrown right after that should be caught by the boundary on the transaction subprocess of tasks ‘a’ and ‘b’. The signal end event after ‘b’ should be caught by the boundary event on task ‘c’.
After the bigger subprocess, which contains task ‘c’ as well as the inner subprocess for ‘a’ and ‘b’, you continue just like in your third diagram with a merging exclusive gateway and the “Do once” task. I would keep the timer boundary event on the bigger subprocess like you did in your third diagram.
Here is what this would look like:
However, you could also draw a simpler diagram with an additional exclusive gateway before the "Do only once" activity that filters out any remaining tokens if that activity has already been carried out. Your diagram would be easier to understand, but the process would be slightly different from your requirements: you would allow a situation where activity b is carried out even though activity c has already been completed. So, instead of cancelling one branch you would ignore it. Depending on your business context, this might have certain implications.
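If you do go for that simpler variant, here is a minimal sketch of one way to wire it up, assuming a Camunda-style engine; the delegate class name, the doneOnce variable and the condition expressions are invented for this example. The "Do only once" activity sets a flag, and the extra exclusive gateway routes later tokens away based on it:

    import org.camunda.bpm.engine.delegate.DelegateExecution;
    import org.camunda.bpm.engine.delegate.JavaDelegate;

    // Illustrative only: attached to the "Do only once" activity. The extra gateway
    // in front of it uses conditions such as ${doneOnce} (route to an end event)
    // and ${!doneOnce} (proceed). The variable should be initialised to false when
    // the process starts, e.g. via a start execution listener, so that both
    // expressions always resolve.
    public class MarkDoneOnceDelegate implements JavaDelegate {

        @Override
        public void execute(DelegateExecution execution) {
            execution.setVariable("doneOnce", true); // later tokens are filtered out by the gateway
        }
    }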
A third option would be to use a terminate end event instead of a none end event. That way, all remaining tokens in the process instance are consumed as soon as the first one reaches the end. However, semantically, that might not be the most elegant solution, because a termination is intended to signal that your process has finished abnormally.

How long should a process live in Camunda

How long should a process live in a Camunda BPMN workflow?
I have a process that can run multiple times throughout the life of a product. I need to keep track of and update data points that this workflow handles for the product.
One proposal was to write a looping BPMN that listens for an event to start the process, and ends with it back on the Receive Task listening for the event to fire again.
However, this would result in processes that never actually end because they always loop back, but we have no guarantees about when or how many times this event could be fired.
I have also considered creating a BPMN that just does one run and terminates. This relieves the problem of a long-living process, but I lose all of the process variables that are included.
EDIT:
Here is a simplified diagram of the looping mechanism we're looking at. I don't want to re-check eligibility after the first time, but I want to verify and save the address any time it changes.
Simplified Address Diagram
Honestly, the BPMN file (aka the process definition) should be the one to dictate how long it "lives". For example, if you have a process that requires your user to contact a customer and wait for their answer, the process could easily state that "1 month" is the time to wait before sending a reminder (or reacting in any other way to the timer's expiration).
But we also have to differentiate between the "time to live / life cycle of the real-life process" conceptualized through the BPMN file vs. the "time to live / life cycle of the process in your Camunda engine" (for lack of a better term).
Each instance of a process in Camunda has a unique identifier. You do not have to let the "in-memory instance of the process" live until it is completed ... you could instead instantiate it every time an event is sent to the unique ID of a process instance, handle that event/command, and stop the instance (not the lifecycle of the process) once the event/command has been handled.
The only time I worked with Camunda, that's what we did. Basically, we'd send the Camunda API the name of the BPMN file, the ID of the process instance we had previously started, and all the pertinent information to handle the event/command that would affect the process (including process variables).
This way, when an event/command is successfully handled by the Camunda API, you can store all the process variables in the "return message" after it has been processed, and you never really lose process variables, since you always "reload" them from the latest "state" of the process (i.e. the response you got the last time you sent an event to a specific process instance).
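As a concrete sketch of that idea, assuming Camunda 7's Java API (the message name "AddressChanged" and the variable names are only placeholders for this example), correlating an event to an existing instance and handing it the data it needs could look like this:

    import org.camunda.bpm.engine.ProcessEngines;
    import org.camunda.bpm.engine.RuntimeService;

    // Illustrative only: the caller keeps just the process instance id; when a new
    // event arrives it correlates a message to that instance, together with the
    // data the next iteration of the loop needs as process variables.
    public class AddressEventGateway {

        private final RuntimeService runtimeService =
                ProcessEngines.getDefaultProcessEngine().getRuntimeService();

        public void onAddressChanged(String processInstanceId, String newAddress) {
            runtimeService.createMessageCorrelation("AddressChanged") // message the receive task waits for
                    .processInstanceId(processInstanceId)             // the instance started earlier
                    .setVariable("newAddress", newAddress)            // data for this run of the loop
                    .correlate();                                     // wakes up the waiting message event
        }
    }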
Hopefully I'm being clear?

Catch the event of a blocked instance only after a timeout

I have a program where I start several process instances using a cron. For each process instance I have a maximum time, and if the execution time exceeds it, I have to consider it as failure and use some specific methods.
For now what I did was simply to check, once my process instance has finished, whether the elapsed time exceeds the given maximum time.
But what if my process instance gets blocked for some reason (e.g. server not responding)? I need to catch this event and perform failure operations as soon as the process gets blocked and timeout is exceeded.
How can I catch these two conditions?
I had a look at the FlowableEngineEventType, but there isn’t a PROCESS_BLOCKED/SUSPENDED type of event. But, even if it were, how do I fire it only if a certain amount of time has passed?
I assume that this is the same question as this from the Flowable Forum.
If you are using the Flowable HTTP Task, then have a look at the documentation to see how you can set the timeouts on it and how you can react to errors there. If you are firing GET requests from your own code, you would need to write your own business logic that throws some kind of BpmnError, and you would then handle that in your process.
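If you go down the "own code" route, a minimal sketch could look like the following, assuming Flowable's JavaDelegate API; the class name, the error codes and the serviceUrl variable are invented for this example. An interrupting error boundary event in the model can then catch the error and run your failure operations:

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    import org.flowable.engine.delegate.BpmnError;
    import org.flowable.engine.delegate.DelegateExecution;
    import org.flowable.engine.delegate.JavaDelegate;

    // Illustrative only: a service task that fires a GET request with explicit
    // timeouts and converts "server not responding" into a BPMN error that an
    // error boundary event can catch.
    public class CallServerDelegate implements JavaDelegate {

        @Override
        public void execute(DelegateExecution execution) {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create((String) execution.getVariable("serviceUrl"))) // assumed variable
                    .timeout(Duration.ofSeconds(30))
                    .GET()
                    .build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() >= 400) {
                    throw new BpmnError("SERVICE_ERROR");  // handled by an error boundary event
                }
            } catch (IOException e) {
                throw new BpmnError("SERVICE_TIMEOUT");    // no response within the timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new BpmnError("SERVICE_TIMEOUT");
            }
        }
    }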
The Flowable process instance does not have the concept of being blocked; you have to model that manually in your process.

How to make a Saga handler Reentrant

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
This seems to be cross-posted in the Particular Software Google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times, but it could also take action if the expected task(s) didn't complete or are overdue.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga, check whether the saga was already created or already has the 'busy' status; if it does not have this status, send a message to do some work. This guards that the task is only initiated once, and after that the saga is stored.
The handler can now do the actual task; when it completes, it can do a 'Reply' back to the saga.
When the saga receives the reply it can start any other follow-up task or raise an event, and it can also 'complete'.
Optimistic concurrency control and batched sends
If two messages are received that create/update the same saga instance, only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequentially, both will be processed unless the saga checks whether the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries, eventually the second copy will be processed too, but the saga checks the saga data for a field that it knows would normally be initialized by a message that 'starts' the saga. If that field is already initialized, it assumes the message has already been processed and just returns.
It also demonstrates batched sends: messages are not actually sent until all handlers/sagas have completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking; it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once, and if you use non-transactional resources, make sure that logic is idempotent.
So, after a lot of testing, I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like; they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using Entity Framework and Azure SQL, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database also works across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay; in others you could throw an exception to get the handler to run again). The first thing the handler does (before any work, although I'm not 100% sure that it matters) is write a row to the table. If the write fails (probably because the unique constraint is violated), the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then it removes the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
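For what it's worth, here is a sketch of that guard in plain JDBC, just to make the select/insert/delete sequence explicit; the table and column names are made up, and the real implementation uses Entity Framework and Azure SQL rather than this code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Illustrative only. Assumed table:
    //   CREATE TABLE task_lock (saga_key VARCHAR(64) NOT NULL PRIMARY KEY,
    //                           started_at DATETIME2 NOT NULL);
    public class StartTaskGuard {

        // Returns true if this handler may start the work; false if a run is already in progress.
        public boolean tryAcquire(Connection con, String sagaKey) throws SQLException {
            try (PreparedStatement check = con.prepareStatement(
                    "SELECT 1 FROM task_lock WHERE saga_key = ?")) {
                check.setString(1, sagaKey);
                try (ResultSet rs = check.executeQuery()) {
                    if (rs.next()) {
                        return false; // a row exists: another run is busy, just return from the handler
                    }
                }
            }
            // The primary key rejects concurrent inserts; the resulting SQLException
            // bubbles up, so the message goes back on the queue and is retried.
            try (PreparedStatement insert = con.prepareStatement(
                    "INSERT INTO task_lock (saga_key, started_at) VALUES (?, CURRENT_TIMESTAMP)")) {
                insert.setString(1, sagaKey);
                insert.executeUpdate();
            }
            return true;
        }

        // Called after the work has completed.
        public void release(Connection con, String sagaKey) throws SQLException {
            try (PreparedStatement delete = con.prepareStatement(
                    "DELETE FROM task_lock WHERE saga_key = ?")) {
                delete.setString(1, sagaKey);
                delete.executeUpdate();
            }
        }
    }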
Maybe this can help someone with a similar problem.

Task API - Handling Already Finished Task

I'm making an API and have a function which takes a task and runs it. When the task finishes successfully, its status is set to 'Completed'. Now, let's say the user of the API accidentally (or for whatever reason) sends that same task (or any other already completed task) back into the same function. What should the API do?
Throw an exception
Pretend as if I've rerun the task and tell the user (through events or whatever) that it is done/completed (again).
Do nothing and just ignore it.
Is there a standard or best practice for something like this?
Pretending to rerun hides what's probably a user error; this can lead to deadlocks or other logic bugs (e.g. I create an event, wait on it, and run a task that should reset it at some point; that never happens, deadlock). Also, done handlers may fail if invoked twice per one successful task run.
Doing nothing is more or less the same. Done handlers can't fail now :), but they are not invoked at all, so a bug is more probable if the done handler performed necessary communication with the spawning thread.
The worst thing is that these may or may not happen, depending on the timing, e.g. the task may still be running by the time the user calls the function the second time (what do you do then, by the way?).
So, do throw an exception unless task status is "not started". The user can always check the status and perform the necessary processing in the unlikely case she needs it.
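As a minimal illustration of that rule (the type and method names here are invented, not part of any particular framework), the run function can simply reject anything that is not in its initial state:

    // Illustrative only: reject a run request unless the task is still in its initial state.
    public class TaskRunner {

        public enum TaskStatus { NOT_STARTED, RUNNING, COMPLETED }

        public static class Task {
            private final String id;
            private TaskStatus status = TaskStatus.NOT_STARTED;

            public Task(String id) { this.id = id; }
            public String getId() { return id; }
            public TaskStatus getStatus() { return status; }
            void setStatus(TaskStatus status) { this.status = status; }
        }

        public void run(Task task) {
            if (task.getStatus() != TaskStatus.NOT_STARTED) {
                // Completed or currently running: surface the caller's mistake instead of hiding it.
                throw new IllegalStateException(
                        "Task " + task.getId() + " cannot be started, status is " + task.getStatus());
            }
            task.setStatus(TaskStatus.RUNNING);
            // ... do the actual work ...
            task.setStatus(TaskStatus.COMPLETED);
        }
    }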