Activity trigger called twice within an orchestrator - azure-storage

We have an orchestrator which gets called by a timer trigger every minute. In the orchestrator, multiple activity triggers are called using the function chaining pattern. However, there was one instance where each activity trigger was called twice, with a time difference of just 7 milliseconds.
What I am assuming is that when the 1st activity trigger was called, the checkpoint was delayed even though the process had done its job, so when the orchestrator restarted, it executed the 1st activity trigger again because it did not find data in the Azure Storage queue. Can somebody confirm whether this is the case, or is there some issue with the way activity triggers behave?

This is the replay behavior of the orchestrator that you are observing. If an orchestrator function emits log messages, the replay behavior may cause duplicate log messages to be emitted. This is normal and by-design. Take a look at this documentation for more information.
When an orchestration function is given more work to do, the orchestrator wakes up and re-executes the entire function from the start to rebuild the local state. During the replay, if the code tries to call a function (or do any other async work), the Durable Task Framework consults the execution history of the current orchestration. If it finds that the activity function has already executed and yielded a result, it replays that function's result and the orchestrator code continues to run. Replay continues until the function code is finished or until it has scheduled new async work.
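To make the replay idea concrete, here is a minimal sketch in plain Python. It is not the Durable Functions API: `run_activity`, `orchestrator`, `run_episode` and the in-memory `history` list are all hypothetical stand-ins for the framework and its storage-backed execution history. It only shows why orchestrator code (and its log lines) runs several times while each activity runs once.

```python
from typing import Generator, List, Optional

activity_calls: List[str] = []     # real activity executions
orchestrator_logs: List[str] = []  # log lines emitted by orchestrator code


def run_activity(name: str) -> str:
    """Hypothetical activity: records that it really ran and returns a result."""
    activity_calls.append(name)
    return f"{name}-done"


def orchestrator() -> Generator[str, str, List[str]]:
    """Function chaining: each yield asks the framework to run one activity."""
    orchestrator_logs.append("orchestrator started")  # emitted on every replay
    first = yield "step1"
    second = yield "step2"
    return [first, second]


def run_episode(history: List[str]) -> Optional[List[str]]:
    """Re-execute the orchestrator from the start, replaying recorded results.

    Runs at most one new activity, checkpoints its result, then unloads.
    Returns the final output once the orchestrator finishes, else None.
    """
    gen = orchestrator()
    replayed = 0
    try:
        request = next(gen)
        while True:
            if replayed < len(history):
                result = history[replayed]  # replay: the activity is NOT re-run
                replayed += 1
                request = gen.send(result)
            else:
                history.append(run_activity(request))  # new work, then checkpoint
                return None
    except StopIteration as finished:
        return finished.value


history: List[str] = []
outputs = [run_episode(history) for _ in range(3)]
```

In this toy model the orchestrator body starts three times, so its log line appears three times, yet `activity_calls` contains each step only once.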

Related

Are async routing functions and asynchronous middleware in Express blocking the execution process (in 2021)?

I know that Express allows asynchronous functions in routes and middleware, but is this correct? I read the documentation, and it specifies that no asynchronous routes or middleware should be assigned. Does Express currently support asynchronous functions? Do they block the execution process, or do asynchronous functions currently not block it?
For example, if I define an asynchronous route and requests are made to that route at the same time, are they resolved in parallel? That is:
Or, when assigning asynchronous routes, will these requests be resolved one after the other? That is:
This is what I mean by "blocking the execution process": if one fails, are the other requests left pending? Or am I misunderstanding?
I hope you can help me.
You can use async functions just fine with Express, but whether or not they block has nothing to do with whether they are async, but everything to do with what the code in the function does. If it starts an asynchronous operation and then returns, then it won't block. But, if it executes a bunch of time consuming synchronous code before it returns, that will block.
If getDBInfo() is asynchronous and returns a promise that resolves when it completes, then your examples will have the three database operations in flight at the same time. Whether or not they actually run truly in parallel depends entirely upon your database implementation, but the code you show here allows them to run in parallel if the database implements that.
The single thread of JavaScript execution will run the first call to getDBInfo(); that DB request will be started and will immediately return a promise. Then, it will hit the await and suspend the execution of the containing function. That allows the event loop to start processing the second request, which will do the same. When it hits the await, it will suspend execution of the containing function and allow the event loop to process the third request, which will do likewise. Then, sometime later, one of the DB calls will complete (it could be any one of the three), which will resolve its promise, which will resume the function so it can send the response. Then, one after another, the other two DB calls will finish and send their responses.
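The same interleaving can be sketched with Python's asyncio, which also runs coroutines on a single-threaded event loop. This is an analogy, not Express code: `get_db_info` is a hypothetical stand-in for getDBInfo() that just sleeps, and the delays are made up so that the second request finishes first.

```python
import asyncio
from typing import List

events: List[str] = []  # order in which handlers start and finish


async def get_db_info(request_id: int, delay: float) -> str:
    """Hypothetical async DB call: yields to the event loop while 'waiting'."""
    await asyncio.sleep(delay)
    return f"rows-for-{request_id}"


async def handle_request(request_id: int, delay: float) -> str:
    events.append(f"start {request_id}")
    data = await get_db_info(request_id, delay)  # handler is suspended here
    events.append(f"finish {request_id}")
    return data


async def main() -> List[str]:
    # Three requests arriving at (almost) the same time.
    return await asyncio.gather(
        handle_request(1, 0.15),
        handle_request(2, 0.05),
        handle_request(3, 0.10),
    )


results = asyncio.run(main())
```

All three handlers start before any of them finishes, and they finish in the order their simulated DB calls complete. If a handler instead ran a long synchronous loop before its first await, the other two could not start until it was done.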

Catch the event of a blocked instance only after a timeout

I have a program where I start several process instances using a cron. For each process instance I have a maximum time, and if the execution time exceeds it, I have to consider it as failure and use some specific methods.
For now what I did was simply to check, once my process instance has finished, if the elapsed time exceeds or not the given maximum time.
But what if my process instance gets blocked for some reason (e.g. server not responding)? I need to catch this event and perform failure operations as soon as the process gets blocked and timeout is exceeded.
How can I catch these two conditions?
I had a look at the FlowableEngineEventType, but there isn’t a PROCESS_BLOCKED/SUSPENDED type of event. But even if there were, how would I fire it only after a certain amount of time has passed?
I assume that this is the same question as this from the Flowable Forum.
If you are using the Flowable HTTP Task then have a look at the documentation to see how you can set the timeouts on it and how you can react on errors there. If you are firing GET requests from your own code you would need to write your own business logic that would throw some kind of BpmnError and you would then handle that in your process.
A Flowable process instance does not have the concept of being blocked; you have to model that manually.
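For the "write your own business logic" route, the general shape is a watchdog that triggers failure handling as soon as the deadline passes, rather than checking elapsed time after the work returns. A minimal sketch in plain Python, assuming nothing about Flowable: `run_with_deadline` and `on_failure` are hypothetical names, and in a real process the failure branch is where you would raise your error.

```python
import concurrent.futures
import time
from typing import Any, Callable, Optional


def run_with_deadline(
    task: Callable[[], Any],
    max_seconds: float,
    on_failure: Callable[[], None],
) -> Optional[Any]:
    """Run task in a worker thread; call on_failure if it exceeds max_seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        return future.result(timeout=max_seconds)
    except concurrent.futures.TimeoutError:
        on_failure()  # fires at the deadline, not when the task finally returns
        return None
    finally:
        # Do not wait for a possibly blocked worker; it is left to finish alone.
        pool.shutdown(wait=False)


failures = []
started = time.monotonic()
outcome = run_with_deadline(
    lambda: time.sleep(0.5),  # simulates a server that does not respond in time
    0.05,
    lambda: failures.append("timeout"),
)
elapsed = time.monotonic() - started
```

Note that the worker thread is not killed here; the sketch only guarantees that the failure path runs on time.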

sc_spawn and other process [SystemC]

Can you explain the difference between sc_spawn and another process (SC_METHOD, SC_THREAD, SC_CTHREAD )?
Thanks all.
Hook
To understand this, you have to get an idea of the phases of elaboration and simulation of SystemC first. The phases of elaboration and simulation shall run in the following sequence (from IEEE Std 1666-2011):
1. Elaboration: Construction of the module hierarchy
2. Elaboration: Callbacks to function before_end_of_elaboration
3. Elaboration: Callbacks to function end_of_elaboration
4. Simulation: Callbacks to function start_of_simulation
5. Simulation: Initialization phase
6. Simulation: Evaluation, update, delta notification, and timed notification phases (repeated)
7. Simulation: Callbacks to function end_of_simulation
8. Simulation: Destruction of the module hierarchy
Processes are objects derived from sc_object and are created by calling SC_METHOD, SC_THREAD, SC_CTHREAD, or the sc_spawn function.
If a process is created during elaboration (1.) or the before_end_of_elaboration callback (2.), it is a static process. If it is created during the end_of_elaboration callback (3.) or during simulation, it is a dynamic process.
A process instance created by the SC_METHOD, SC_THREAD, or SC_CTHREAD macro is an unspawned process instance and is typically a static process. Spawned process instances are processes created by calling sc_spawn. Typically, they are dynamic processes, but they can be static if sc_spawn is called before the end_of_elaboration phase.
This means, to wrap this up in simple words, that sc_spawn enables you to dynamically add processes during simulation. For example: there can be cases where you only need a certain process, if a certain condition during your simulation becomes true.
Now let's take a look where the processes spawn during the simulation. The actual simulation of SystemC (6.) consists of these phases:
1. Initialization phase: Execute all processes (except SC_CTHREADs) in an unspecified order.
2. Evaluation phase: Select a process that is ready to run and resume its execution. This may cause immediate event notifications to occur, which may result in additional processes being made ready to run in this same phase. Repeat as long as there are still processes ready to run.
3. Update phase: Execute any pending calls to update() resulting from request_update() calls made in step 1 or 2.
4. Delta notification phase: If there are pending delta notifications (resulting from calls to notify()), determine which processes are ready to run due to the delayed notifications and go to step 2.
5. Timed notification phase: If pending timed notifications or time-outs exist:
   advance simulation time to the time of the earliest pending timed notification or time-out;
   determine which process instances are sensitive to the events notified and time-outs lapsing at this precise time;
   add all such process instances to the set of runnable processes.
   If no pending timed notifications or time-outs exist, the simulation ends. Otherwise, go to the evaluation phase (step 2).
If sc_spawn is called to create a spawned process instance, the new process will be added to the set of runnable processes (except if dont_initialize is called). If sc_spawn is called during the evaluation phase, it shall be runnable in the current evaluation phase (2.). If it is called during the update phase (3.), it shall be runnable in the next evaluation phase.
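The "runnable in the current evaluation phase" rule can be pictured with a toy model. This is plain Python, not SystemC: `ToyKernel`, `spawn` and `evaluate` are hypothetical names, and the FIFO order is an arbitrary choice for the sketch (SystemC leaves the order unspecified).

```python
from collections import deque
from typing import Callable, Deque, List


class ToyKernel:
    """Toy model of a single evaluation phase; only the runnable set is modeled."""

    def __init__(self) -> None:
        self.runnable: Deque[Callable[["ToyKernel"], None]] = deque()
        self.trace: List[str] = []
        self.evaluation_phases = 0

    def spawn(self, process: Callable[["ToyKernel"], None]) -> None:
        # Loosely analogous to sc_spawn: the new process joins the runnable set.
        self.runnable.append(process)

    def evaluate(self) -> None:
        # Keep running processes until none is ready, including newly spawned ones.
        self.evaluation_phases += 1
        while self.runnable:
            process = self.runnable.popleft()
            process(self)


def child(kernel: ToyKernel) -> None:
    kernel.trace.append("child ran")


def parent(kernel: ToyKernel) -> None:
    kernel.trace.append("parent ran")
    condition_met = True  # stand-in for a condition that becomes true at run time
    if condition_met:
        kernel.spawn(child)  # spawned while the evaluation phase is in progress


kernel = ToyKernel()
kernel.spawn(parent)
kernel.evaluate()
```

Here the child is created while the parent is executing and still runs within the same call to `evaluate()`, which mirrors the behavior described above for a spawn during the evaluation phase.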
If sc_spawn is called during elaboration, the spawned process will be a child of the module instance which calls sc_spawn. If it is called during simulation, it will be a child of the process that called the function sc_spawn. You may call sc_spawn from a method process (SC_METHOD), a thread process (SC_THREAD), or a clocked thread process (SC_CTHREAD).
This tutorial shows the difference between implementing processes through SC_METHOD and SC_THREAD, and sc_spawn.

How to make a Saga handler Reentrant

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks, it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite-sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure Storage for saga storage, in case it makes a difference for the next bit. I'm also using NSB 6.
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
This seems to be cross-posted in the Particular Software Google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga, check whether the saga was already created or already has the 'busy' status. If it does not have this status, send a message to do some work. This guards against the task being initiated more than once; after that, the saga is stored.
The handler can now do the actual task. When it completes, it can do a 'Reply' back to the saga.
When the saga receives the reply, it can start any follow-up task or raise an event, and it can also 'complete'.
Optimistic concurrency control and batched sends
If two messages are received that create/update the same saga instance, only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are processed sequentially rather than in parallel, both fail unless the saga checks whether the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries, the second copy will eventually be processed too, but the saga checks the saga data for a field that it knows would normally be initialized by a message that 'starts' the saga. If that field is already initialized, it assumes the message has already been processed and just returns:
It also demonstrates batched sends. Messages are not sent until all handlers/sagas have completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking; it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once, so if you use non-transactional resources, make sure that logic is idempotent.
So after a lot of testing, I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using Entity Framework and Azure SQL, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.
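For anyone who wants to see the unique-index idea in code, here is a rough sketch. It uses Python's built-in sqlite3 instead of Entity Framework and Azure SQL, and the table name, column names and the `handle` function are all made up for illustration. It also returns False on a conflict instead of throwing to put the message back on the queue, and it leaves out the timestamp-based reset.

```python
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE running_tasks (customer_id TEXT, task_type TEXT, started_at TEXT)"
)
# The unique index is what turns a second concurrent start into an error.
conn.execute(
    "CREATE UNIQUE INDEX ux_running_tasks ON running_tasks (customer_id, task_type)"
)


def try_acquire(db: sqlite3.Connection, customer_id: str, task_type: str) -> bool:
    """Insert the marker row; a unique-index violation means someone else holds it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO running_tasks VALUES (?, ?, datetime('now'))",
                (customer_id, task_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(db: sqlite3.Connection, customer_id: str, task_type: str) -> None:
    with db:
        db.execute(
            "DELETE FROM running_tasks WHERE customer_id = ? AND task_type = ?",
            (customer_id, task_type),
        )


def handle(
    db: sqlite3.Connection, customer_id: str, task_type: str, work: Callable[[], None]
) -> bool:
    """Run work only if no other run holds the marker row for the same key."""
    if not try_acquire(db, customer_id, task_type):
        return False  # already running; a real handler might throw to retry later
    try:
        work()
    finally:
        release(db, customer_id, task_type)  # remove the row once the work is over
    return True


nested_attempts = []
first_run = handle(
    conn, "c1", "import",
    lambda: nested_attempts.append(handle(conn, "c1", "import", lambda: None)),
)
second_run = handle(conn, "c1", "import", lambda: None)
```

The nested call stands in for a second message arriving while the first is still being processed: it is rejected, and a later run succeeds once the row has been removed. Releasing in `finally` only covers exceptions inside the process; a crash would still need the timestamp reset described above.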

Task API - Handling Already Finished Task

I'm making an API and have a function which takes a task and runs it. When the task has finished successfully, its status is set to 'Completed'. Now, let's say the user of the API accidentally (or for whatever reason) sends that same task (or any other already completed task) back into the same function. What should the API do?
Throw an exception
Pretend as if I've rerun the task and tell the user (through events or whatever) that it is done/completed (again).
Do nothing and just ignore it.
Is there a standard or best practice for something like this?
Pretending to rerun hides what's probably a user error, and this can lead to deadlocks or other logic bugs (e.g. I create an event, wait on it, and run a task that should reset it at some point; it never happens, so I deadlock). Also, done handlers may fail if invoked twice per one successful task run.
Doing nothing is more or less the same - done handlers can't fail now :), but they are not invoked at all - a bug is more probable if done handler performed necessary communication with the spawning thread.
The worst thing is - these may happen or not happen, depending on the timing. I.e. the task may still be running by the time the user calls the function the second time (what do you do then, by the way?)
So, do throw an exception unless task status is "not started". The user can always check the status and perform the necessary processing in the unlikely case she needs it.
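A minimal sketch of that rule, with hypothetical names (`Task`, `TaskStatus`, `TaskAlreadyStartedError`, `run_task`). It is single-threaded on purpose; a real API would need to make the status check and the status change atomic.

```python
from enum import Enum
from typing import Any, Callable


class TaskStatus(Enum):
    NOT_STARTED = "not started"
    RUNNING = "running"
    COMPLETED = "completed"


class TaskAlreadyStartedError(Exception):
    """Raised when a task that is running or completed is submitted again."""


class Task:
    def __init__(self, action: Callable[[], Any]) -> None:
        self.action = action
        self.status = TaskStatus.NOT_STARTED
        self.result: Any = None


def run_task(task: Task) -> Any:
    # Refuse anything that is not in the initial state instead of guessing intent.
    if task.status is not TaskStatus.NOT_STARTED:
        raise TaskAlreadyStartedError(f"task is already {task.status.value}")
    task.status = TaskStatus.RUNNING
    task.result = task.action()
    task.status = TaskStatus.COMPLETED
    return task.result


task = Task(lambda: 42)
first_result = run_task(task)
try:
    run_task(task)
    second_outcome = "ran again"
except TaskAlreadyStartedError as error:
    second_outcome = str(error)
```

A caller who really wants the earlier result can read `task.status` and `task.result` instead of resubmitting.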