Distributed JobRunr using single data source - kotlin

I want to create a scheduler using JobRunr that will run in two different server. This Scheduler will select data from SQL database and will call an api endpoint. How can I make sure these 2 schedulers running in 2 different server will not pick same data from database ?
Main concern is they should not call the API with duplicate data from 2 different server.
As per documentation JobRunr will push the job in the queue first, but I am wondering how one scheduler queue will know that the same data has not picked by other scheduler in different server, is they any extra locking mechanism I need to maintain ?

JobRunr will run any job only once - the locking is already done in JobRunr itself.

Related

Multiple service instances using Hangfire (shared tasks/objects), is it possible?

I need to run multiple instances of the same service, with the same database, for redundancy reason.
I found some question about "Hangfire multiple instances" but for a differenct purpose then mine: usually about running multiple instances for different tasks on the same database, or similar to this.
I need to know if there are problems of concurrency when 2 or more instances of Hangfire use the same Database (we want to use MongoDB) and if this is the solution to make the service resilient.
The goal is to have instance that take care of all the jobs when another instance goes down.
Any suggestion wellcome for covering this scenario.
In our environment, we have a replica set used by about 10 Hangfire servers. If there are multiple Hangfire servers servicing the same queue, it means they will share the load and whichever Hangfire server checks the queue first, picks up the job and continues. If you remove all but 1 server, the jobs will continue (as long as there are enough workers otherwise they will remain queued until a worker is available).
To answer your question, yes, you can have 2 or more Hangfire servers using the same MongoDB. MongoDB provides multi-threading support so its safe to have various servers accessing the same database backend. If you have two servers, both will be active and if one instance goes off line, other instance (based on queues) will continue to process the jobs in queue.
Keep in mind, Hangfire servers processes the jobs in Specific Queues. If both servers are part of the same queue then you are load balancing the jobs among the two servers. If they are part of different queues, then you read about that scenario where each Hangfire instance processes different jobs (because they are part of different queues).
Read about configuring Job Queues here

How to avoid DB deadlocks when multiple Kafka messages produced for same item?

We have 2 difference web applications. lets named them A and B.
When user change analyse of item in A app, A app do stuff things and produce a kafka mesage.
A rest API in B app consume the message via Confluent http sink connector.
The rest API in B app call SQL Stored Procedure that update records with transaction.
When (happens a lot) the user changing analyze of the same item in A app constantly- a deadlock caused in the DB. because the SP still works on records when another call for same item reach.
what is the best practice to handle this issue?
manage some global list with current items(IDs) enter to SP and remove them when SP finish? handle it on DB? other suggestion?
some relevant info:
the apps are ASP .Net Core.
stored in load balancing envoirment(AWS).
Any relevant answer is appreciated.
Thanks!
Make sure the same item is always published with the same key (eg use the hashcode of the item). This ensures that all requests from app A will go on the same topic partition.
In app B make sure the procedure call is done in the consumer polling thread (don't spawn a new thread) so that all procedure calls for the same item will be guaranteed to execute sequentially.
This will resolve deadlocks at the cost of performance. For multiple items you can scale horizontally with multiple consumers (as long as you have plenty of partitions). If performance on repeated requests for the same item is too slow then you have a more complex design issue to address.

Transactions in Redis - Read operations in another database

We are trying to implement caching for our multi-tenant application. We are planning to create new Redis DB for each tenant.
We have one scenario where we need to use Redis Transactions. While going through this post https://redis.io/topics/transactions, we found that
All the commands in a transaction are serialized and executed
sequentially. It can never happen that a request issued by another
client is served in the middle of the execution of a Redis
transaction. This guarantees that the commands are executed as a
single isolated operation.
Is this read blocking will only apply to database level or at full instance level?
The guarantee you quoted applies to the instance, not the database. A command for DB 2 will not run in the middle of a transaction for DB 1.
You can find more information about multiple databases (including an argument by the creator of Redis against using them at all) in this question.

What is the practice for scheduling multiple inter-dependent SQL Server Agent jobs?

The way my team currently schedules jobs is through the SQL Server Job Agent. Many of these jobs have dependencies on other internal servers which in turn have their own SQL Server Jobs that need to be run to keep their data up to date.
This has created dependencies in the start time and length of each of our SQL Server Jobs. Job A might depend on Job B finishing, so we schedule Job B a certain estimated time in advance to Job A. All of this process is very subjective and not scalable, as we add more jobs and servers which create more dependencies.
I would love to get out of the business of subjectively scheduling these jobs and hoping that the dominos fall in the right order. I am wondering what the accepted practices for scheduling SQL Server jobs are. Do people use SSIS to chain jobs together? Is there tooling already built into the SQL Server Job Agent to handle this?
What is the accepted way to handle the scheduling of multiple SQL Server jobs with dependencies on each other?
I have used Control-M before to schedule multiple inter-dependent jobs in different environment. Control-M generally works by using batch files (from what I remember) to execute SSIS packages.
We had a complicated environment hosting 2 data warehouses side by side (1 International and 1 US Local). There were jobs that were dependent on other jobs and those jobs on others and so on, but by using Control-M we could easily decide on the dependency (It has a really nice and intuitive GUI). Other tool that comes to my mind is Tidal Scheduler.
There is no set standard for job scheduling, but I think its safe to say that job schedules depend entirely on what an organization needs. For example Finance jobs might be dependent on Sales and Sales on Inventory and so on. But the point is, if you need to have job inter dependency, using a third party software such as Control-M is a safe bet. It can control jobs on different environments and give you real sense of the company wide job control.
We too had the requirement to manage dependencies between multiple agent jobs - after looking at various 3rd party tools and discounting them for various reasons (mainly down to the internal constraints relating to the use of 3rd party software) we decided to create our own solution.
The solution centres around a configuration database that holds details about processes (jobs) that need to run and how they are grouped (batches), along with the dependencies between processes.
Summary of configuration tables used:
Batch - highlevel definition of a group of related processes, includes metadata such as max concurrent processes, and current batch instance etc.
Process - meta data relating to a process (job) such as name, max wait time, earliest run time, status (enabled / disabled), batch (what batch the process belongs to), process job name etc.
Batch Instance - the active instance of a given batch
Process Instance - active instances of processes for a given batch
Process Dependency - dependency matrix
Batch Instance Status - lookup for batch instance status
Process Instance Status - loolup for process instance status
Each batch has 2 control jobs - START BATCH and UPDATE BATCH. The 1st deals with starting all processes that belong to it and the 2nd is the last to run in any given batch and deals with updating the outcome statuses.
Each process has an agent job associated with it that gets executed by the START BATCH job - processes have a capped concurrency (defined in the batch configuration) so processes are started up to a max of x at a time and then START BATCH waits until a free slot becomes available before starting the next process.
The process agent job steps call a templated SSIS package that deals with the actual ETL work and with the decision making around whether the process needs to run and has to wait for dependencies etc.
We are currently looking to move to a Service Broker solution for greater flexibility and control.
Anyway, probably too much detail and not enough example here so VS2010 project available on request.
I'm not sure how much this will help, but we ended up creating an email solution for scheduling.
We built an email reader that accesses an exchange mailbox. As jobs finish, they send an email to the mail reader to start another job. The other nice part, is that most applications have email notifications built in, so there really isn't much in the way of custom programming.
We really only built it in the first place to handle data files coming in from lots of other partners. It was much easier to give them an email address rather than setting them up with an ftp site, etc.
The mail reader app now has grown to include basic filtering, time of day scheduling, use of semaphores to prevent concurrent jobs, etc. It really works great.

Database Job Scheduling

I have a procedure written in PLJava that sends out updates over JMS in my postgres database.
What I would like to do is have that function called on an interval (every 15 seconds) internally in the database (preferably not from an outside process). Is this possible? Any ideas?
If you need no external access, you are presumably able to modify the database design so that you don't need the update at all. Can you explain more about what the update is doing?
As depesz said, you could use either cron or pgAgent, but they are only able to go down to a one minute granularity, not 15 seconds. Considering sleeping inside the stored procedure until the next iteration is not a good idea, because you will have an open transaction for all that time which is a really bad idea.
Strict answer: it is not possible. Since you don't want outside process, and PostgreSQL doesn't support jobs - you are out of luck.
If you'll reconsider using outside processes, then you're most likely want something like cron, or better yet pgagent.
On absolutely other hand - what do you need to do that has to happen every 30 seconds? this seems like a problem with design.
First, you'll spend the least amount of effort if you just go with a cron job.
However, if you were starting from scracth: You are trying to periodically replicate rows from your database. I think you are looking at a replication queue.
The PGQ project (used for Londiste replication, both from Skype's SkyTools) has a queue that you can use independently. When configuring it, you set a maximum event count, and a loop delay, before batched events are generated. You can get batches spaced by no more than 15 seconds that way. You now have to produce the events that will be batched, using a trigger that calls pgq.insert_event; and consume the queues. The consumer can call your PL/Java stored proc; you'll have to rewrite the procedure to send everything in the batch instead of scanning the base table for new events.
As far as I know postgresql doesn't support scheduled tasks. You'll need to use a script with cron or at (depending on your operating system.)
Sounds like you're doing sort of replication? Every 15s sounds like a lot of updates. Could you setup a trigger (or a number of triggers) instead of polling?
If you are using JMS why not just have th task wait for input on the queue?
Per your depesz comment, you have a PL/Java stored procedure that "flushes out database tables (updates) as java objects". Since you want it to run in 15 second intervals, it must be processing a batch of updates each time. Rather than processing a batch of updates in a stored procedure every 15 seconds, why not process them one at a time when they happen via an after update trigger and eliminate the need for a timed interval. If you are aggregrating data from multiple tables to build your objects than add the triggers to you upper most tables only.
In my case the problem was that agent couldn't authorize to database so after I've made all connections trusted from localhost the service started successfully and job works fine
for more information about error you should see into windows event viewer or eq in unix based system. see my config file C:\Program Files\PostgreSQL\10\data\pg_hba.conf