I have a long-running business process (weeks, not minutes). The start of the process is triggered by a user's action. The problem I have is that the next step in the saga will come from one of two places: either a second user interaction, or, if after a period of say one week the user hasn't performed an action, another point in the saga should be triggered.
Is a saga the correct mechanism to use in this business process? If so, how is this achieved? If a saga isn't suitable for this task, is there a better mechanism than simply executing nightly batch jobs against a database? My aversion to running a nightly batch job is simply the size of the database table I'd have to hit to query for the next point in the saga.
Yes - sagas are designed for exactly these scenarios. When the saga starts, it requests a timeout for one week; whichever arrives first - the user's second action or the timeout - drives the next step, and the saga ignores whichever arrives second.
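As a minimal, framework-neutral sketch of that pattern (all names here are hypothetical; a real saga framework would persist the state and deliver the timeout as a durable message so it survives restarts):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: a saga step that advances on EITHER a second user action OR a
// one-week timeout, whichever arrives first. The late arrival is ignored.
public class LongRunningSaga {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private boolean completed = false;

    // The first user action starts the saga and requests the timeout.
    public synchronized void handleStart() {
        timer.schedule(this::handleTimeout, 7, TimeUnit.DAYS);
    }

    // The second user action arrived before the deadline.
    public synchronized void handleUserAction() {
        if (!completed) {
            completed = true;
            advance("user acted in time");
        }
    }

    // A week passed with no user action.
    private synchronized void handleTimeout() {
        if (!completed) {
            completed = true;
            advance("timeout elapsed");
        }
    }

    private void advance(String reason) {
        System.out.println("Next saga step triggered: " + reason);
    }
}
```

Because both paths funnel into the same guarded flag, there is no nightly batch scan of a large table: each saga instance wakes itself up exactly when its own deadline passes.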
I am trying out RabbitMQ with Spring Boot. I have a main process, and within that process I am creating many small tasks that can be processed by other workers. From the main process's perspective, I'd like to know when all of these tasks are completed so that it can move to the next step. I did not find an easy way to query RabbitMQ for whether the tasks are complete.
One solution I can think of is to store these tasks in a database and, when each message is completed, update the database with a COMPLETE status. Once all jobs are in COMPLETE status, the main process knows the jobs are complete and can move to the next step of its process.
Another solution I can think of is that the main process maintains a list of the jobs being sent to other workers. Once each worker completes its job, it sends a message to the main process indicating the job is complete. The main process then marks the job as complete and removes the item from the list. Once the list is empty, the main process knows the jobs are complete and can move to the next step of its work.
I am looking to learn best practices for how other people have dealt with this kind of situation. I'd appreciate any suggestions.
Thank you!
There is no way to query RabbitMQ for this information.
The best way to approach this is with the use of a process manager.
The basic idea is to have your individual steps send a message back to a central process that keeps track of which steps are done. When that main process receives notice that all of the steps are done, it lets the system move on to the next thing.
The details of this approach are fairly complex, but I do have a blog post that covers the core of a process manager from a JavaScript/NodeJS perspective.
You should be able to find something like a "process manager" or "saga", as they are sometimes called, within your language and RabbitMQ framework of choice. If not, you should be able to write one for your process without too much trouble, as described in my blog post.
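As an illustration, here is a minimal in-memory version using the plain Java RabbitMQ client (amqp-client 5.x). The queue names, message format, and task count are assumptions, and a production process manager would persist the pending set (e.g. in the database, as the question suggests) rather than hold it in memory:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a process manager: remember which task ids are outstanding,
// listen on a completion queue, and move on when the set drains.
public class CompletionTracker {
    public static void main(String[] args) throws Exception {
        Set<String> pending = ConcurrentHashMap.newKeySet();
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.queueDeclare("tasks", true, false, false, null);
            channel.queueDeclare("task-completions", true, false, false, null);

            // Publish the small tasks, remembering each task id.
            for (int i = 0; i < 10; i++) {
                String taskId = "task-" + i;
                pending.add(taskId);
                channel.basicPublish("", "tasks", null,
                        taskId.getBytes(StandardCharsets.UTF_8));
            }

            // Workers are assumed to reply with the id of the task they finished.
            DeliverCallback onCompletion = (tag, delivery) -> {
                String taskId = new String(delivery.getBody(), StandardCharsets.UTF_8);
                pending.remove(taskId);
                if (pending.isEmpty()) {
                    System.out.println("All tasks complete - moving to the next step");
                }
            };
            channel.basicConsume("task-completions", true, onCompletion, tag -> { });

            Thread.sleep(60_000); // keep the demo alive while replies arrive
        }
    }
}
```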
I'm using NServiceBus to handle some asynchronous tasks. Occasionally I have a task where I need to process 10,000 records, so this takes a few hours.
My problem is that when I handle these records all together, I cannot use NServiceBus's default transaction handling.
Also - if I split these records up into 10,000 smaller messages, they will clog up MSMQ for a few hours, and users who are expecting functions to take a few minutes will be waiting hours.
Is there a way in NServiceBus to prioritise different messages?
I'd consider breaking it down into smaller batches (not necessarily one message per record) and having a separate endpoint service specifically for this process so that other work is not held up. If you break it into batches and care about when they all complete, I'd recommend using a saga to track that state.
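A rough illustration of that idea follows; the batch size is arbitrary, and the tracking logic is sketched as plain Java for brevity (in practice it would live in NServiceBus saga state):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a large record set into batch messages for a dedicated
// endpoint, and count completions so the follow-up fires exactly once.
public class BatchedImport {
    static final int BATCH_SIZE = 250;
    private int expectedBatches;
    private int completedBatches;

    // Split the records into batch-sized chunks, one message per chunk.
    <T> List<List<T>> split(List<T> records) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += BATCH_SIZE) {
            batches.add(records.subList(i, Math.min(i + BATCH_SIZE, records.size())));
        }
        expectedBatches = batches.size();
        return batches;
    }

    // Called once per "batch done" message from the processing endpoint.
    synchronized void onBatchCompleted() {
        completedBatches++;
        if (completedBatches == expectedBatches) {
            System.out.println("All batches processed - continue the workflow");
        }
    }
}
```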
Scenario:
We have a WCF workflow with a client that does NOT use transaction flow.
The workflow contains several sequential TransactedReceiveScopes (using content-based correlation).
The TransactedReceiveScopes contain custom db operations.
Observations:
When we run SQL profiler against the first call, we see all the custom db calls, and the SaveInstance call in the profile trace.
We've noticed that, even though the SendReply is at the very end of the TransactedReceiveScope, sometimes the SendReply occurs a good 10 seconds before the transaction gets committed.
We tried changing the TimeToPersist and TimeToUnload to zero, but that had no effect. (The trace shows the SaveInstance happening immediately anyway; rather, the commit seems to be delayed.)
Questions:
Are our observations correct?
At what point is the transaction committed? Is this like garbage collection - i.e., does it commit some time later when the system isn't busy?
Is there any way to control the commit delay, or is the only way to do this to use transaction flow from the client (and then it should all commit when the client commits, including the persist)?
The TransactedReceiveScope commits the transaction when the body is completed, but as all execution is done through the scheduler, that could be some time later. It is not related to garbage collection, and there is no real way to influence it other than to avoid a busy machine and a lot of other parallel activities that could also be in the execution queue.
I wish to develop an application in VB.NET that provides the following functionality, and I hope you can give me some pointers on which direction to take.
I need some kind of “server” component which sits in the background, monitoring requests from users and performing various tasks. (This component could be installed locally or centrally.)
The users submit an instruction to the “server” to perform a certain task at a designated date and time (or to perform the task straight away).
The “server” would perform the task at the desired date and time and inform the user of the result.
I have thought of using a central database to which the user writes the instructions. The “server” could read from the database to obtain the instructions, and write the result back to the database.
I want a fast reaction to the instructions, so the “server” would have to poll the database every few seconds; I fear this may be detrimental to performance. Also, how do I get the server to perform the task at the desired time?
Again, checking all outstanding tasks against the current time is not very efficient, so I thought about utilising the Windows Scheduler, but I am not sure of the best way of integrating this functionality.
I would be grateful for any ideas, pointers or suggestions.
Have you looked at quartz.net? It's a scheduling framework which might be useful to you.
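For example, scheduling a one-off task at a given date and time looks roughly like this (shown with the Java original of Quartz; quartz.net mirrors the API closely, and the job class and trigger time here are placeholders):

```java
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

import java.util.Date;

public class SchedulerDemo {

    // The work to perform at the designated time.
    public static class UserTaskJob implements Job {
        @Override
        public void execute(JobExecutionContext ctx) {
            System.out.println("Performing the user's task now");
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(UserTaskJob.class)
                .withIdentity("user-task", "requests")
                .build();

        // Fire once at the requested date/time (here: one minute from now).
        Trigger trigger = TriggerBuilder.newTrigger()
                .startAt(new Date(System.currentTimeMillis() + 60_000))
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}
```

Quartz can persist its job store (optionally in a database) and handles the timing for you, which sidesteps both the frequent polling and the Windows Scheduler integration.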
We have a similar system where we work, utilising a webservice to accept requests, run them when required, and notify callers with the results if necessary.
In our case the callers were other applications and not people.
The web service consisted of the following methods: (rough version, not exact)
int AddJob(string jobType, string input, datetime startTime) // schedules job and sets timer to call StartJobs when needed, and then returns job id
void GetResults(int jobId, out string status, out string output) // gets results (status="queued / running / completed / failed")
void StartJobs() //called via a timer as needed to kick off scheduled jobs
We also built in checks to limit how many jobs could run simultaneously and whether they could retry if they failed, and to email admins if any job fails its last attempt.
Our version is much more comprehensive than this, with the jobs actually being web services themselves, support for simultaneous runs, and a built-in workflow so jobs can wait on others, but maybe it will give you some ideas. It's not a trivial project, but it was fun to implement!
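Folding the out-parameters into a small result type, the service surface described above might look like this in Java (names are taken from the description; purely illustrative):

```java
import java.time.LocalDateTime;

// Sketch of the job-scheduling web service described above.
public interface JobService {

    enum Status { QUEUED, RUNNING, COMPLETED, FAILED }

    record JobResult(Status status, String output) { }

    // Schedules the job, arms a timer to call startJobs() when due,
    // and returns the new job id.
    int addJob(String jobType, String input, LocalDateTime startTime);

    // Returns the current status and, once finished, the output.
    JobResult getResults(int jobId);

    // Invoked by a timer as needed to kick off scheduled jobs.
    void startJobs();
}
```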
I have a procedure written in PL/Java that sends out updates over JMS in my Postgres database.
What I would like to do is have that function called on an interval (every 15 seconds) internally in the database (preferably not from an outside process). Is this possible? Any ideas?
If you need no external access, you are presumably able to modify the database design so that you don't need the update at all. Can you explain more about what the update is doing?
As depesz said, you could use either cron or pgAgent, but they only go down to a one-minute granularity, not 15 seconds. Sleeping inside the stored procedure until the next iteration is not an option either, because you would hold a transaction open for all that time, which is a really bad idea.
Strict answer: it is not possible. Since you don't want an outside process, and PostgreSQL doesn't support jobs, you are out of luck.
If you'll reconsider using outside processes, then you most likely want something like cron, or better yet pgAgent.
On the other hand: what do you need to do that has to happen every 15 seconds? This seems like a design problem.
First, you'll spend the least amount of effort if you just go with a cron job.
However, if you were starting from scratch: you are trying to periodically replicate rows from your database, so I think you are looking for a replication queue.
The PGQ project (used for Londiste replication; both are part of Skype's SkyTools) has a queue that you can use independently. When configuring it, you set a maximum event count and a loop delay before batched events are generated, so you can get batches spaced by no more than 15 seconds. You then have to produce the events that will be batched, using a trigger that calls pgq.insert_event, and consume the queues. The consumer can call your PL/Java stored proc; you'll have to rewrite the procedure to send everything in the batch instead of scanning the base table for new events.
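A hedged sketch of such a consumer loop over JDBC; the queue name, consumer name, and connection details are placeholders, while pgq.next_batch, pgq.get_batch_events, and pgq.finish_batch are PGQ's SQL API:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of a PGQ consumer: fetch a batch, hand each event to the
// JMS-sending procedure, then acknowledge the batch.
public class PgqConsumer {
    public static void main(String[] args) throws SQLException, InterruptedException {
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass")) {
            while (true) {
                Long batchId = null;
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT pgq.next_batch('update_queue', 'jms_sender')")) {
                    ResultSet rs = ps.executeQuery();
                    if (rs.next()) {
                        batchId = (Long) rs.getObject(1); // null when no batch is ready
                    }
                }
                if (batchId == null) {
                    Thread.sleep(1000);
                    continue;
                }
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT ev_id, ev_data FROM pgq.get_batch_events(?)")) {
                    ps.setLong(1, batchId);
                    ResultSet rs = ps.executeQuery();
                    while (rs.next()) {
                        // Here you would invoke the PL/Java proc / JMS send.
                        System.out.println("sending event " + rs.getLong("ev_id")
                                + ": " + rs.getString("ev_data"));
                    }
                }
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT pgq.finish_batch(?)")) {
                    ps.setLong(1, batchId);
                    ps.execute();
                }
            }
        }
    }
}
```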
As far as I know, PostgreSQL doesn't support scheduled tasks. You'll need to use a script with cron or at (depending on your operating system).
Sounds like you're doing some sort of replication? Every 15 seconds sounds like a lot of updates. Could you set up a trigger (or a number of triggers) instead of polling?
If you are using JMS, why not just have the task wait for input on the queue?
Per your depesz comment, you have a PL/Java stored procedure that "flushes out database tables (updates) as java objects". Since you want it to run at 15-second intervals, it must be processing a batch of updates each time. Rather than processing a batch of updates in a stored procedure every 15 seconds, why not process them one at a time, when they happen, via an AFTER UPDATE trigger, and eliminate the need for a timed interval? If you are aggregating data from multiple tables to build your objects, then add the triggers to your uppermost tables only.
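A hedged sketch of that wiring, run as DDL over JDBC; the table, trigger, and function names are placeholders, and the PL/Java method reference is an assumption about your existing code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: point an AFTER UPDATE trigger at a PL/Java function so each
// change is pushed over JMS as it happens, instead of on a timer.
public class InstallTrigger {
    public static void main(String[] args) throws SQLException {
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
             Statement st = db.createStatement()) {
            // PL/Java trigger function; the Java method name is hypothetical.
            st.execute(
                "CREATE OR REPLACE FUNCTION notify_update() RETURNS trigger " +
                "AS 'com.example.JmsSender.onRowUpdated' LANGUAGE java");
            // Fire it once per updated row on the uppermost table.
            st.execute(
                "CREATE TRIGGER orders_updated AFTER UPDATE ON orders " +
                "FOR EACH ROW EXECUTE PROCEDURE notify_update()");
        }
    }
}
```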
In my case the problem was that the agent couldn't authenticate to the database, so after I made all connections from localhost trusted, the service started successfully and the job works fine.
For more information about the error, look in the Windows Event Viewer (or its equivalent on Unix-based systems). See my config file: C:\Program Files\PostgreSQL\10\data\pg_hba.conf