Do Hangfire's recurring jobs store their state for later execution attempts in the case of an exception?

I have a program which connects to a web service to pull some messages. After I receive them I have no way of reading those messages again, so I decided to secure them in a persistent store to be processed by other parties.
I wrapped this request-and-persist method in a Hangfire recurring job (AddOrUpdate with a cron expression), hoping that in case of an exception during job execution Hangfire will attempt to execute the task again later with its stored state. Is my assumption correct? I couldn't see any explanation in the documentation regarding recurring job states.
In the case of delayed, recurring, or fire-and-forget jobs, does Hangfire serialize the code of those jobs, together with their state, to the database?
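For illustration, the setup looks roughly like this (MessageImporter, PullAndPersist, and the schedule are placeholders for my actual code):

using Hangfire;

public class MessageImporter
{
    public void PullAndPersist()
    {
        // call the web service, then save the received messages
        // to the persistent store (implementation omitted)
    }
}

public static class JobRegistration
{
    public static void Register()
    {
        // Registered at application startup; my understanding is that Hangfire
        // persists the method call (type, method name, serialized arguments)
        // to its storage and re-invokes it on later attempts if it fails.
        RecurringJob.AddOrUpdate<MessageImporter>(
            "pull-and-persist-messages",
            importer => importer.PullAndPersist(),
            Cron.Hourly());
    }
}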

The following article answers this question and gives some information about how background jobs are handled.

Related

Is it safe to call BackgroundJob.Enqueue from inside a RecurringJob?

I have a RecurringJob that receives some rows from a database and I want to send an email for each row. Is it safe to call BackgroundJob.Enqueue within the recurring job handler for each row to send an email?
My aim is to keep the work in the recurring job to a minimum.
It's a bit late to answer this but yes, you can call BackgroundJob.Enqueue() from within the recurring job. We use MongoDB, and since it is thread-safe, we have one job creating other jobs that run both serially and in parallel.
The purpose of a background job is to perform a task, whether you start it from an API call, a recurring job, or another background job.
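As a minimal sketch of that pattern (the EmailSender class and the GetPendingRowIds query below are made up; plug in your own data access and email logic):

using System.Collections.Generic;
using Hangfire;

public class EmailSender
{
    public void Send(int rowId)
    {
        // look up the row and send the email (implementation omitted)
    }
}

public class SendEmailsRecurringJob
{
    // Invoked by the recurring schedule; it only enumerates the pending rows and
    // enqueues one fire-and-forget job per row, so the recurring job stays cheap.
    public void Run()
    {
        foreach (var rowId in GetPendingRowIds())
        {
            BackgroundJob.Enqueue<EmailSender>(sender => sender.Send(rowId));
        }
    }

    // Hypothetical data access; replace with your real query.
    private IEnumerable<int> GetPendingRowIds()
    {
        yield break;
    }
}

Each enqueued job then gets Hangfire's normal retry behaviour independently of the recurring job that created it.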

Is there a way to get confirmation from the scheduler that the job has been accepted?

I'm using dask.distributed.Client to connect to a remote Dask scheduler that manages a bunch of workers. I am submitting my job using client.submit and keeping track of the returned Future:
client = Client("some-example-host:8786")
future = client.submit(job, job_args)
I want to be able to know if/when the job has been sent to and accepted by the scheduler. This is so that I can add some retry logic in cases when the scheduler goes down.
Is there an easy way to get confirmation that the scheduler has received and accepted the job?
Some additional points:
I realise that distributed.client.Future has a status property, but I'm hesitant to use it as it is not documented in the API.
I have tried using dask.callbacks.Callback but with no success. Any assistance with using callbacks with distributed.Client would be appreciated.
EDIT: I could also have the job post back a notification when it starts, but I would like to leave this approach as a last resort if the Client does not support this.

How to make a Saga handler Reentrant

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but given the way the data looks, it's very difficult to tell what still needs to be processed. I batch it using messages that each process a bite-sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, I just return.
I'm using Azure Storage for saga storage, if it makes a difference for the next bit. I'm also using NSB 6.
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
This seems to be cross-posted in the Particular Software Google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times, but it could also take action if the expected task(s) didn't complete or are overdue.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga, check whether the saga was already created or already has the 'busy' status; if it does not have this status, send a message to do some work. This guards that the task is only initiated once, and after that the saga is stored.
The handler can now do the actual task; when it completes, it can 'Reply' back to the saga.
When the saga receives the reply it can start any other follow-up task or raise an event, and it can also 'complete'.
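A rough sketch of that shape, assuming NServiceBus 6 and made-up message types (StartProcessing, ProcessData, ProcessingCompleted); treat it as an outline rather than a drop-in implementation:

using System;
using System.Threading.Tasks;
using NServiceBus;

public class StartProcessing : ICommand { public Guid ProjectId { get; set; } }
public class ProcessData : ICommand { public Guid ProjectId { get; set; } }
public class ProcessingCompleted : IMessage { public Guid ProjectId { get; set; } }

public class ProcessingPolicyData : ContainSagaData
{
    public Guid ProjectId { get; set; }
    public bool Busy { get; set; }
}

public class ProcessingPolicy : Saga<ProcessingPolicyData>,
    IAmStartedByMessages<StartProcessing>,
    IHandleMessages<ProcessingCompleted>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<ProcessingPolicyData> mapper)
    {
        mapper.ConfigureMapping<StartProcessing>(m => m.ProjectId).ToSaga(s => s.ProjectId);
        mapper.ConfigureMapping<ProcessingCompleted>(m => m.ProjectId).ToSaga(s => s.ProjectId);
    }

    public async Task Handle(StartProcessing message, IMessageHandlerContext context)
    {
        if (Data.Busy)
        {
            return; // a run is already in flight, ignore the duplicate start
        }

        Data.Busy = true;
        await context.Send(new ProcessData { ProjectId = message.ProjectId });
        // note: Data (including Busy) is only persisted after this handler completes
    }

    public Task Handle(ProcessingCompleted message, IMessageHandlerContext context)
    {
        MarkAsComplete(); // or reset Busy and keep the saga around for the next run
        return Task.CompletedTask;
    }
}

// Plain handler that does the actual work and replies to the saga when done.
public class ProcessDataHandler : IHandleMessages<ProcessData>
{
    public async Task Handle(ProcessData message, IMessageHandlerContext context)
    {
        // ... do the long-running work in bite-sized chunks here ...
        await context.Reply(new ProcessingCompleted { ProjectId = message.ProjectId });
    }
}

The point is that the saga only orchestrates (sets the busy flag, sends the work, reacts to the reply), while the plain handler does the actual work.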
Optimistic concurrency control and batched sends
If two messages are received that create/update the same saga instance, only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequentially, both will go through unless the saga checks whether the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only one message succeeds due to optimistic concurrency control.
Due to retries, the second copy will eventually be processed too, but the saga checks the saga data for a field that it knows would normally be initialized by a message that 'starts' the saga. If that field is already initialized, it assumes the message has already been processed and just returns.
It also demonstrates batched sends: messages are not actually sent until all handlers/sagas have completed.
Saga design
The following video might help you with designing your sagas and understanding the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking; it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once, and if you use non-transactional resources, make sure that logic is idempotent.
So, after a lot of testing, I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like; they are only saved at the end of the handler.
So if the saga receives two simultaneous messages, the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using Entity Framework and Azure SQL, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint, and generally it seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then it removes the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
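A trimmed-down sketch of that guard, assuming Entity Framework 6 (EF Core is analogous), a hypothetical LockDbContext/ProcessLock entity with a unique index over the saga 'id' columns, and the StartProcessing message name from the sketch above:

using System;
using System.Data.Entity;            // EF6; use Microsoft.EntityFrameworkCore for EF Core
using System.Threading.Tasks;
using NServiceBus;

public class ProcessLock
{
    public int Id { get; set; }
    public Guid ProjectId { get; set; }      // the saga 'id' columns; unique index across them
    public DateTime StartedUtc { get; set; }
}

public class LockDbContext : DbContext
{
    public DbSet<ProcessLock> ProcessLocks { get; set; }
}

public class StartProcessingHandler : IHandleMessages<StartProcessing>
{
    public async Task Handle(StartProcessing message, IMessageHandlerContext context)
    {
        using (var db = new LockDbContext())
        {
            // If a row already exists, another run is in progress: bail out
            // (or throw if you want the message retried later instead).
            if (await db.ProcessLocks.AnyAsync(l => l.ProjectId == message.ProjectId))
            {
                return;
            }

            db.ProcessLocks.Add(new ProcessLock
            {
                ProjectId = message.ProjectId,
                StartedUtc = DateTime.UtcNow
            });

            // The unique index makes this throw DbUpdateException if another endpoint
            // instance inserted the same key first; the exception bubbles up and
            // NServiceBus puts the message back on the queue.
            await db.SaveChangesAsync();
        }

        // ... do the actual work, then delete the ProcessLock row ...
    }
}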
Maybe this can help someone with a similar problem.

How does ActiveMQ's AMQ_SCHEDULED_DELAY message work?

We want to use the delay feature from ActiveMQ to delay a particular event. How does AMQ_SCHEDULED_DELAY work internally? The documentation has information about the scheduler but no information about what mechanism it uses to delay a message. For that reason we are not sure how delaying is going to affect ActiveMQ. Does ActiveMQ use polling or an asynchronous mechanism to achieve the delay?
I ask this question because people from my organization want to pick a different technology. I do not have any proof that ActiveMQ's delay is any better.
Here is a link to the source code. I was thinking of looking through the code, but I'm not good at Java. Can anyone help?
The default implementation of ActiveMQ does use polling.
ActiveMQ internally keeps polling for the scheduled (or delayed) messages using a background scheduler thread. This thread reads the list of scheduled events (or messages) and fires the jobs, rescheduling repeating jobs as needed before firing the job event.
The list of scheduled events is stored in sorted order in ActiveMQ's internal storage, so during a poll it just reads the events that are scheduled for the earliest processing. Since the messages are persisted during enqueuing, scheduling may not have a visible performance impact during processing.
However, before adopting it, you can set up your own benchmark, without worrying too much about internal implementation details, to see that your performance/SLA requirements are being met.
For more details, you may refer to the Javadoc of the job scheduler API. For the default implementation, you can refer to the code.
Hope this helps.
Looking at the source code mentioned by @skadya, "polling" is not the term I would use. It appears to use the Java Object class's wait(long timeout) method to determine when to "wake up" the thread that runs the jobs.
So I wouldn't call it polling. I would call it an asynchronous mechanism in which the thread wakes up via a timeout set to a value appropriate for the next scheduled job's commencement (i.e. so that the next scheduled job runs at the appropriate time).
Javadoc for Object.wait(long timeout)
Note that the implementation of Object.wait is a native (i.e. non-Java) implementation provided by the JDK / JRE / JVM for a given platform, for what that's worth.
It is possible to do a performance test with the ActiveMQ web console. There is an option to send messages with a configurable delay and a configurable number of messages. It doesn't answer my question, but it seems like the best option for comparing the two approaches.

Long running workflow in asp.net mvc

I'm developing an intranet site using asp.net mvc4 to manage some of our data. One important feature of this site is to trigger import/export jobs. These jobs can take anywhere between 5 minutes to 1 hour. Users of the site need to be able to determine whether a job is currently running as well as the status of prior jobs. Many jobs will often include warning messages concerning duplicate data and these warnings need to be visible on the site.
My plan is to implement these long running processes as a WCF Workflow Service that the asp.net site will interact with. I've got much of the business logic implemented via activities and have tested it using a simple console application. I should note I'm using a correlation handle in order to partition the service based on specific "Projects" on the site.
My problem is how to query the status of an active job (if one exists) as well as the warning messages of previous jobs. I suspect the best way to do this would be to use the AppFabric tracking service and have my ASP.NET site query a SQL monitoring store and report back on the current status. After setting up AppFabric and adding custom tracking messages, I ran into a few issues. My first issue is that I cannot figure out how to filter out workflow instances that were not using the correct correlation handle, as I'd like to show only workflows for a specific project. The other issue is that the tracking database can be delayed quite a bit, which causes issues when I'm trying to determine whether a workflow is currently running.
Another possible solution could be to have the workflow explicitly update a database with its current status and any error messages. I'm leaning towards this solution but could use some expert advice.
TL;DR: I need to know the best way to query the execution status and any warning messages of a WCF Workflow service.
As you want to query workflow status and messages even after the workflow is finished, I would start by creating a table where you can convert the correlation values a client sends into the related workflow ID. I would create a custom activity to do that and drop it right after the Receive that creates the workflow.
Next I would create a regular WCF service the client app uses to query the status. This WCF service can query the WF persistence store to see if a given workflow is still running. If so, the active bookmarks column will tell you what SOAP messages the workflow is currently waiting for.
As far as messages go, you can either use the AppFabric tracking infrastructure to store and retrieve them, or you could create a custom activity and store them in your own database. It really depends on whether you are also interested in the standard WF tracking messages that are generated.
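A rough sketch of the custom activity mentioned above (the MappingStore helper is hypothetical; replace it with whatever data access writes to your own mapping table):

using System;
using System.Activities;

// Dropped right after the Receive that creates the workflow, so the correlation
// value the client knows about can later be translated to the workflow instance id.
public sealed class RecordWorkflowInstance : CodeActivity
{
    public InArgument<string> ProjectId { get; set; }

    protected override void Execute(CodeActivityContext context)
    {
        var projectId = context.GetValue(ProjectId);
        var instanceId = context.WorkflowInstanceId;
        MappingStore.Save(projectId, instanceId);
    }
}

public static class MappingStore
{
    // Hypothetical: store (projectId, instanceId) in your own table so the
    // status-query WCF service can look up the instance later.
    public static void Save(string projectId, Guid instanceId)
    {
        // implementation omitted
    }
}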
Update on checking for running workflow instances:
There are several downsides to adding an IsRunning message to your workflow. For one, you would need to make sure one branch keeps looping and waiting for the message but stops as soon as the other, real workflow branch is done. That is certainly possible, but it complicates the workflow and is a possible source of errors. And as it is not part of the business problem, it really has no place in the workflow as far as I am concerned. It also means that you have to load a workflow from disk and persist it back just to tell you that it is there. If it was finished, you would need to wait for a fault to indicate there was no workflow instance, and that usually means you get a timeout exception after, by default, 60 seconds. Add throttling to that and your request might be queued because there are too many other workflow instances or SOAP requests being processed, so a timeout might mean that a workflow instance exists but is unreachable due to system constraints.
Instead I would opt for the simple thing and check whether the record in the instance store is still available. The additional info from the active bookmarks column will tell you what the workflow is waiting on, information I have used in the past to dynamically update the UI by enabling/disabling UI elements.