RabbitMQ job queues completion indicator event

I am trying out RabbitMQ with Spring Boot. I have a main process, and within that process I create many small tasks that can be processed by other workers. From the main process's perspective, I'd like to know when all of these tasks are completed so that it can move to the next step. I did not find an easy way to query RabbitMQ to tell whether the tasks are complete.
One solution I can think of is to store these tasks in a database and, when each message is processed, update the database with a COMPLETE status. Once all jobs are in COMPLETE status, the main process knows the jobs are done and can move to the next step of its process.
Another solution I can think of is for the main process to maintain the list of jobs being sent to other workers. Once each worker completes its job, it sends a message back to the main process indicating the job is complete. The main process then marks the job complete and removes the item from the list. Once the list is empty, the main process knows the jobs are done and can move to the next step of its work.
I am looking to learn best practices on how other people have dealt with this kind of situation. I appreciate any suggestions.
Thank you!

There is no way to query RabbitMQ for this information.
The best way to approach this is with the use of a process manager.
The basic idea is to have your individual steps send a message back to a central process that keeps track of which steps are done. When that main process receives notice that all of the steps are done, it lets the system move on to the next thing.
The details of this approach are fairly complex, but I do have a blog post that covers the core of a process manager from a JavaScript/NodeJS perspective.
You should be able to find something like a "process manager" or "saga", as they are sometimes called, for your language and RabbitMQ framework of choice. If not, you should be able to write one for your process without too much trouble, as described in my blog post.
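For illustration, here is a minimal sketch of the tracking half of such a process manager in plain Java, independent of the messaging layer. All names here are hypothetical; in a Spring Boot app, markComplete would typically be called from a @RabbitListener on a reply queue, and every task should be registered before completions start arriving so the empty check cannot fire early.

    import java.util.Set;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical process manager: register task ids up front, then
    // mark them complete as reply messages arrive from the workers.
    public class ProcessManager {
        private final Set<String> pending = ConcurrentHashMap.newKeySet();
        private final CompletableFuture<Void> allDone = new CompletableFuture<>();

        // Called by the main process for every task it publishes.
        public void register(String taskId) {
            pending.add(taskId);
        }

        // Called when a worker reports completion, e.g. from a
        // @RabbitListener on a "task.completed" reply queue.
        public void markComplete(String taskId) {
            pending.remove(taskId);
            if (pending.isEmpty()) {
                allDone.complete(null); // unblock whoever is waiting
            }
        }

        // The main process blocks on (or chains a callback to) this future
        // to move on to the next step of its work.
        public CompletableFuture<Void> whenAllDone() {
            return allDone;
        }
    }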

Related

Hangfire - Is there a way to attach additional meta data to jobs when they are created to be able to identify them later?

I am looking to implement Hangfire within an ASP.NET Core application.
However, I'm struggling to understand how best to prevent the user from creating duplicate Hangfire "Fire-and-Forget" jobs.
The Problem
Say the user, via the app, creates a job that does some processing relating to a specific client. The process may take several minutes to complete. I want to be able to prevent the user from creating another job for the same client while there are other jobs for that client still being processed by Hangfire (i.e. there can only be 1 processing job for a specific client at any one time, although several different clients could also each have their own job being processed).
Solution?
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created, which I can then use to interrogate the jobs currently processing in Hangfire to see if any of them relate to the client id in question.
It seems like such a basic feature that would prove so useful for such scenarios, but I'm coming to the conclusion that such a thing isn't supported, which surprises me.
... Unless you know different.
Hangfire looks great, and I'm keen to use it, but this might be a show-stopper for me.
Any advice would be gratefully received.
Thanks
"I need a way to attach additional meta-data (in this example, the client id) to each job as it is created"
Adding metadata to jobs can be achieved by means of Hangfire filters.
You may have a look at this answer.
https://stackoverflow.com/a/57396553/1236044
Depending on your needs you may use more filter types.
For example, the IElectStateFilter may be useful to filter out jobs if another one is currently processing.
If you have several processing servers, you will need your own storage solution to handle your own custom currently-processing/priority/locking mechanism.
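Hangfire filters themselves are C#, but the guard logic at the core of the "one processing job per client" rule is small in any language. Here is a minimal, hypothetical Java sketch of that check; on a single server an in-memory set works, while with several processing servers the set would have to live in shared storage (a database row or a distributed lock), as noted above.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical guard: allows at most one in-flight job per client id.
    public class ClientJobGuard {
        private final Set<String> processing = ConcurrentHashMap.newKeySet();

        // Atomically claim the client; returns false if a job is already running.
        public boolean tryStart(String clientId) {
            return processing.add(clientId);
        }

        // Release the client when the job finishes, whether it succeeded or failed.
        public void finish(String clientId) {
            processing.remove(clientId);
        }
    }

The enqueue path would call tryStart before creating the job and refuse (or re-queue) the request when it returns false, and a completion filter would call finish.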

How do I wait for all work to complete in Akka.Net?

I have successfully sent work to a pool of actors to perform my work, but now I want to do some aggregation on the results returned by all the workers. How do I know that everyone is done?
The best I have come up with is to maintain a set of request ids and wait for that set to become empty, but this seems inelegant.
Generally, you want to use what we call the "Commander" pattern for this. Essentially, you have one stateful actor (the Commander) that is responsible for starting and monitoring the task. You then farm out the actual work across the actor pool, and have them report back to the Commander as they finish. The commander can then track the progress of the job by calculating # completions / size of worker pool.
This way, the workers can be monitored and restarted independently as they do the work, but all of the precious task-level state and information lives in the Commander (this is called the "Error Kernel" pattern).
You can see an example of this in the Akka.NET scalable webcrawler demo.
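Akka.NET mirrors the JVM Akka API closely, so for illustration here is a minimal Commander sketched with classic Akka's Java API; the actor, message names, and counting scheme are made up for the example.

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.Props;

    // Hypothetical Commander: tracks completions reported by the worker pool
    // and notifies a subscriber once every task has finished.
    public class Commander extends AbstractActor {
        public static final String TASK_DONE = "task-done";
        public static final String ALL_DONE = "all-done";

        private final int totalTasks;
        private final ActorRef notifyWhenDone;
        private int completed = 0;

        public Commander(int totalTasks, ActorRef notifyWhenDone) {
            this.totalTasks = totalTasks;
            this.notifyWhenDone = notifyWhenDone;
        }

        public static Props props(int totalTasks, ActorRef notifyWhenDone) {
            return Props.create(Commander.class,
                    () -> new Commander(totalTasks, notifyWhenDone));
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    // each worker replies with TASK_DONE when it finishes
                    .matchEquals(TASK_DONE, msg -> {
                        completed++;
                        if (completed == totalTasks) {
                            notifyWhenDone.tell(ALL_DONE, getSelf());
                        }
                    })
                    .build();
        }
    }

Because the task-level state lives only in the Commander, the workers stay disposable: any of them can crash and be restarted by its supervisor without losing track of overall progress.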

SSIS 2005 Control Flow Priority

The short version is I am looking for a way to prioritize certain tasks in SSIS 2005 control flows. That is, I want to be able to set it up so that Task B does not start until Task A has started, but Task B does not need to wait for Task A to complete. The goal is to reduce the amount of time where I have idle threads hanging around waiting for Task A to complete so that they can move on to Tasks C, D & E.
The issue I am dealing with is converting a data warehouse load from a linear job that calls a bunch of SPs to an SSIS package calling the same SPs but running multiple threads in parallel. So basically I have a bunch of Execute SQL Task and Sequence Container objects with Precedence Constraints mapping out the dependencies. So far no problems; things are working great, and it cut our load time a bunch.
However I noticed that tasks with no downstream dependencies are commonly being sequenced before those that do have dependencies. This is causing a lot of idle time in certain spots that I would like to minimize.
For example: I have about 60 procs involved with this load, ~10 of them have no dependencies at all and can run at any time. Then I have another one with no upstream dependencies but almost every other task in the job is dependent on it. I would like to make sure that the task with the dependencies is running before I pick up any of the tasks with no dependencies. This is just one example, there are similar situations in other spots as well.
Any ideas?
I am late in updating over here, but I also raised this issue over on the MSDN forums and we were able to devise a partial workaround. See here for the full thread, or here for the feature request asking Microsoft to give us a way to do this cleanly...
The short version is that you use a series of Boolean variables to control loops that act like roadblocks and prevent the flow from reaching the lower priority tasks until the higher priority items have started.
The steps involved are:
Declare a bool variable for each of the high priority tasks and default the values to false.
Create a pre-execute event for each of the high priority tasks.
In the pre-execute event create a script task which sets the appropriate bool to true.
At each choke point insert a For Loop that keeps looping while the appropriate bool(s) are false. (I have a script with a 1 second sleep inside each loop, but it also works with empty loops.)
If done properly this gives you a tool where at each choke point the package has some number of high priority tasks ready to run and a blocking loop that keeps it from proceeding down the lower priority branches until said high priority items are running. Once all of the high priority tasks have been started the loop clears and allows any remaining threads to move on to lower priority tasks. Worst case is one thread sits in the loop while waiting for other threads to come along and pick up the high priority tasks.
The major drawback to this approach is the risk of deadlocking the package if too many blocking loops get queued up at the same time, or if you misread your dependencies and have loops waiting for tasks that never start. Careful analysis is needed to decide which items deserve higher priority and where exactly to insert the blocks.
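Outside of SSIS, the same roadblock idea is simply a start latch: lower-priority work blocks at the choke point until every high-priority task has signalled that it is running. A minimal Java sketch of that gating logic, purely to illustrate the mechanism (the SSIS implementation uses the boolean variables and loops described above):

    import java.util.concurrent.CountDownLatch;

    // Illustrative roadblock: N high-priority tasks each signal "started",
    // and lower-priority work waits at the choke point until all have.
    public class PriorityGate {
        private final CountDownLatch started;

        public PriorityGate(int highPriorityTaskCount) {
            this.started = new CountDownLatch(highPriorityTaskCount);
        }

        // Called from each high-priority task as soon as it begins running
        // (the equivalent of the pre-execute script setting its bool to true).
        public void markStarted() {
            started.countDown();
        }

        // Called at the choke point before any lower-priority task runs
        // (the equivalent of the blocking loop).
        public void awaitHighPriorityStarts() throws InterruptedException {
            started.await();
        }
    }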
I don't know any elegant way to do this, but my first shot would be something like this:
Put the proc that has to run first in a Sequence Container. In that same sequence container, add a script task that simply waits 5-10 seconds or so before each of the 10 independent steps can run. Then chain the rest of the procs below that sequence container.

Service Broker Design

I’m looking to introduce SQL Server Service Broker.
I have a remote orders database and a local processing database; all activity on the processing database has to happen in sequence, and this seems a perfect job for Service Broker!
I’ve set up the infrastructure, I can send and receive messages, and now I’m looking at the design of the processing. As I said, all processes for one order need to be completed in sequence, so I’ll put them in one conversation.
One of these processes is a request for external flat file data; we then wait (it could be several days) and then import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I’ve had some ideas but I’m sure I’m missing a trick somewhere
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion this is like a chain of responsibility. As far as I can understand, we have the following workflow:
1.) Process the message.
2.) Wait for the external file; this can be a busy wait, or, if the external data source provides a notification, it can be handled in a non-polling manner.
3.) Once the data is received, process it.
So my suggestion would be to use 3 different queues, one for each part; when one part is done, it forwards a new message to the next queue in the chain.
I am assuming that processing one order will not disrupt the processing of another.
I am thinking MSMQ with a Windows Workflow sequential workflow might also be a candidate for this task.
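As a language-neutral illustration of the three-queue chain (Service Broker itself would model this as three queues/services with T-SQL activation), here is a hypothetical Java sketch using in-memory queues, where each stage consumes from its own queue and forwards the order to the next one:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Illustrative three-stage chain: process -> wait for file -> finish.
    public class OrderPipeline {
        private final BlockingQueue<String> toProcess = new LinkedBlockingQueue<>();
        private final BlockingQueue<String> awaitingFile = new LinkedBlockingQueue<>();
        private final BlockingQueue<String> toFinish = new LinkedBlockingQueue<>();

        // Stage 1: run the first half of the tasks, then park the order on
        // the "awaiting file" queue instead of holding a transaction open.
        void stageOne() throws InterruptedException {
            String order = toProcess.take();
            // ... first half of the processing ...
            awaitingFile.put(order);
        }

        // Called when the external flat file arrives (event-driven, not
        // polling): move the parked order on to the final stage.
        void onFileArrived(String order) throws InterruptedException {
            if (awaitingFile.remove(order)) {
                toFinish.put(order);
            }
        }

        // Stage 2: import the file and run the remaining tasks in sequence.
        void stageTwo() throws InterruptedException {
            String order = toFinish.take();
            // ... import the flat file and finish the order ...
        }
    }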

Design for VB.NET scheduler application

I wish to develop an application in VB.NET to provide the following functionality and hope you can give me some pointers on which direction to take.
I need some kind of “server” type component which sits in the background monitoring requests from users and performing various tasks. (This component could be installed locally or centrally.)
The users submit an instruction to the “server” to perform a certain task at a designated date and time (or to perform the task straight away).
The “server” would perform the task at the desired date and time and inform the user of the result of the task.
I have thought of using a central database to which the user writes the instructions. The “server” could read from the database to obtain the instructions, and write the result back to the database.
I want a fast reaction to the instructions, so the “server” must poll the database every few seconds; I fear this may be detrimental to performance. Also how do I get the server to perform the task at the desired time?
Again checking all outstanding tasks against the current time is not very efficient, so I thought about utilising the Windows Scheduler, but I am not sure of the best way of integrating this functionality.
I would be grateful for any ideas, pointers or suggestions.
Have you looked at Quartz.NET? It's a scheduling framework which might be useful to you.
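Quartz.NET is a port of the Java Quartz scheduler, so as a flavor of what it gives you, here is a minimal sketch of scheduling a one-off job with the Java original; the job class and identities are invented for the example.

    import java.util.Date;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    // Hypothetical task the user asked the "server" to run.
    public class SendReportJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // ... perform the task and record the result for the user ...
        }
    }

    class SchedulerExample {
        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();

            JobDetail job = JobBuilder.newJob(SendReportJob.class)
                    .withIdentity("sendReport", "userJobs")
                    .build();

            // Fire once at the user's requested date and time; the scheduler
            // does the waiting, so no database polling is needed.
            Trigger trigger = TriggerBuilder.newTrigger()
                    .startAt(new Date(System.currentTimeMillis() + 60_000))
                    .build();

            scheduler.scheduleJob(job, trigger);
        }
    }

The same shape carries over to Quartz.NET, and a persistent job store means scheduled work can survive a restart of the "server" component.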
We have a similar system where we work, utilising a webservice to accept requests, run them when required, and notify callers with the results if necessary.
In our case the callers were other applications and not people.
The web service consisted of the following methods: (rough version, not exact)
int AddJob(string jobType, string input, datetime startTime) // schedules job and sets timer to call StartJobs when needed, and then returns job id
void GetResults(int jobId, out string status, out string output) // gets results (status="queued / running / completed / failed")
void StartJobs() //called via a timer as needed to kick off scheduled jobs
We also built in checks to limit how many jobs could run simultaneously and whether they could retry if they failed, and it emails admins if any job fails its last attempt.
Our version is much more comprehensive than this, with the jobs actually being webservices themselves, support for simultaneous running, and built-in workflow so jobs can wait on others, but maybe it will give you some ideas. It's not a trivial project, but it was fun to implement!