Let's say I have a table with some jobs that need to be executed, that is read by many instances of the same service.
I have a status column, and each instance queries for the first row that is not yet taken (filtering by status) and then changes that status within the same transaction.
How do I prevent multiple instances reading the same job and then updating its status and executing it simultaneously?
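To make the race concrete, here is a rough sketch of the pattern described above in SQL Server-style syntax (the jobs table, its status and job_id columns, and the status values are assumptions):

DECLARE @job_id int;

BEGIN TRANSACTION;

-- Step 1: find the first job that is not taken yet
SELECT TOP (1) @job_id = job_id
FROM jobs
WHERE status = 'NEW';

-- Step 2: mark it as taken
UPDATE jobs
SET status = 'IN_PROGRESS'
WHERE job_id = @job_id;

COMMIT;

Nothing in this version stops two instances from both running step 1 before either runs step 2, so both can claim the same job - which is exactly the race being asked about.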
I have a set of tasks, with some tasks being more important than others
Each task does some work on one or more databases.
These tasks are assigned to workers that will perform the task (threads in an application poll a table).
When the worker has done the task, it sets the value back to null to signal it can accept work again.
When assigning the tasks to the workers, I would like to impose an upper limit on the number of database connections that can be used at any one time - so a task that uses a database that is currently at its limit will not be assigned to a worker.
I can get the number of database connections available by subtracting the connections used by tasks that are currently assigned to workers from each database's limit.
My problem is this, how do I select tasks that can run, in order of importance, based on the number of database connections available, without doing it row by row?
I'm hoping the example below illustrates my problem:
On the right are the available database connections, decreasing as we go down the list of tasks in order of importance.
If I'm selecting them in order of the importance of a task, then the connections available to the next task depend on whether the previous one was selected, which depends on whether there was space for all of its database connections.
In the case above, task 7 can run only because task 6 couldn't.
Also, task 8 can't run because task 5 took the last connection to database C, as it's a more important task.
Question:
Is there a way to work this out without using while loops and doing it row by row?
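For reference, the row-by-row approach I'm trying to avoid would look roughly like this (purely illustrative; the tasks, task_databases and database_limits tables and their columns are made up):

DECLARE @remaining TABLE (db_name sysname PRIMARY KEY, free_connections int);
INSERT INTO @remaining (db_name, free_connections)
SELECT db_name, connection_limit FROM database_limits;

DECLARE @selected TABLE (task_id int PRIMARY KEY);
DECLARE @task_id int;

DECLARE task_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT task_id FROM tasks WHERE worker_id IS NULL ORDER BY importance DESC;

OPEN task_cursor;
FETCH NEXT FROM task_cursor INTO @task_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- The task fits only if none of its databases is already out of connections
    IF NOT EXISTS (SELECT 1
                   FROM task_databases td
                   JOIN @remaining r ON r.db_name = td.db_name
                   WHERE td.task_id = @task_id
                     AND r.free_connections < 1)
    BEGIN
        INSERT INTO @selected (task_id) VALUES (@task_id);

        -- Take one connection on every database this task touches
        UPDATE r
        SET free_connections = free_connections - 1
        FROM @remaining r
        JOIN task_databases td ON td.db_name = r.db_name
        WHERE td.task_id = @task_id;
    END;
    FETCH NEXT FROM task_cursor INTO @task_id;
END;
CLOSE task_cursor;
DEALLOCATE task_cursor;

SELECT task_id FROM @selected;   -- the tasks that can be assigned right now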
I have a Task Queue table that stores tasks that need to be executed, and I am using multi-threading: each thread gets the top 1 record from that table and then executes that task. I am getting the top 1 record from the Task Queue and then deleting that record. So, for example, if another thread reads before the previous thread deletes the task that it picked, both threads may pick the same task. I want to know if there is a way to stop other threads reading from the database until my current thread deletes the task that it picked?
Rather than doing a SELECT followed by a DELETE, you may instead perform a DELETE with an OUTPUT clause. The OUTPUT clause produces a result set, but you're now obtaining that result set directly from the DELETE, so it's a single atomic operation - two independent executions will not produce the same output row.
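A minimal sketch of that pattern (the TaskQueue table and its columns are assumptions; READPAST additionally lets concurrent callers skip a row another thread is in the middle of deleting):

WITH next_task AS (
    SELECT TOP (1) *
    FROM dbo.TaskQueue WITH (ROWLOCK, READPAST)
    ORDER BY task_id               -- whatever defines "top 1" for you
)
DELETE FROM next_task
OUTPUT deleted.task_id, deleted.payload;

Each caller either gets one row that no other caller will ever see, or an empty result set once the queue is drained.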
I have a table in Oracle.
I am creating multiple Batch jobs.
Each Batch job is inserting some number of records in the table.
I wanted to know whether these insert statements are executed sequentially?
Within a particular job, if it runs in one transaction, the answer is 'yes'. But if you want to determine the sequence across all jobs that start at the same time, it all depends on the particular situation and on your jobs' implementation. For example, if we have two jobs and one of them starts earlier than the other, but the first one needs more time to collect data, then we can't say that the first job's inserts will be done earlier than the second's. There are many factors that affect the order in which records are inserted. So if it's critical for you to control the order, you should implement that consistency yourself, using timestamp checks or object locks.
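As a hedged illustration of the "implement it yourself" idea, you can stamp each row with its own ordering value instead of relying on insert order (the names here are made up):

CREATE SEQUENCE batch_insert_seq;

INSERT INTO target_table (id, payload, insert_seq, inserted_at)
VALUES (:id, :payload, batch_insert_seq.NEXTVAL, SYSTIMESTAMP);

-- Readers then ORDER BY insert_seq (or inserted_at) rather than assuming the
-- rows arrived in any particular physical order.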
I have a list of 100 entries that I want to process with multiple threads. Each thread will take up to 20 entries to process.
I'm currently using global temp tables to store the entries that meet certain criteria -- I also do not want threads to overlap on the entries they process.
How do I do this (preventing the overlap)?
Thanks!
If on 11g, I'd use the SELECT ... FOR UPDATE SKIP LOCKED.
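For example, each thread could claim its batch along these lines (a sketch only; the entries table, its processed flag and the batch size of 20 are assumptions taken from the question):

DECLARE
  CURSOR c_entries IS
    SELECT id
      FROM entries
     WHERE processed = 'N'
       FOR UPDATE SKIP LOCKED;      -- rows already locked by other threads are skipped
  TYPE t_id_list IS TABLE OF entries.id%TYPE;
  l_ids t_id_list;
BEGIN
  OPEN c_entries;
  FETCH c_entries BULK COLLECT INTO l_ids LIMIT 20;   -- claim up to 20 entries
  CLOSE c_entries;
  -- process l_ids, mark them processed, then COMMIT (the commit releases the row locks)
END;
/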
If on a previous version, I'd use Advanced Queuing to populate a queue with the primary key values of the entries to be processed, and have your threads dequeue those keys to process those records. Because the dequeue can be (but doesn't have to be, if memory serves) within the processing transactional scope, the dequeue commits or rolls back with the processing, and no two threads can get the same records to process.
There are two issues here, so let's handle them separately:
How do you split some work among several threads/sessions?
You could use Advanced Queuing or the SKIP LOCKED feature as suggested by Adam.
You could also use a column that contains processing information, for example a STATE column that is empty when not processed. Each thread would start work on a row with:
UPDATE your_table
   SET state = 'P'        -- mark the row as claimed by this thread
 WHERE state IS NULL      -- only consider rows nobody has claimed yet
   AND ROWNUM = 1         -- claim a single row
RETURNING id INTO :id;    -- hand the claimed row's id back to the caller
At this point the thread would commit to prevent other threads from being blocked. Then you would do your processing and select another row when you're done.
Alternatively, you could also split the work beforehand and assign each process with a range of ids that need to be processed.
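For example, a simple modulo split keeps the slices disjoint so the threads never compete for the same rows (the bind variables are illustrative):

SELECT id
  FROM your_table
 WHERE state IS NULL
   AND MOD(id, :thread_count) = :thread_index;   -- threads 0 .. N-1 each get their own slice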
How will temporary tables behave with multiple threads?
Most likely each thread will have its own Oracle session (else you couldn't run queries in parallel). This means that each thread will have its own virtual copy of the temporary table. If you stored data in this table beforehand, the threads will not be able to see it (the temp table will always be empty at the beginning of a session).
You will need regular tables if you want to store data accessible to multiple sessions. Temporary tables are fine for storing data that is private to a single session, for example intermediate data in a complex process.
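In Oracle the definition of a global temporary table is shared, but the rows are private to each session, e.g. (work_list is just an illustrative name):

CREATE GLOBAL TEMPORARY TABLE work_list (
  id NUMBER
) ON COMMIT PRESERVE ROWS;   -- rows survive commits, but each session sees only its own rows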
Easiest will be to use DBMS_SCHEDULER to schedule a job for every row that you want to process. You have to pass a key to a permanent table to identify the row that you want to process, or put the full row in the arguments for the job, since a temporary table's content is not visible across sessions. The number of concurrent jobs is controlled by the resource manager, mostly limited by the number of CPUs.
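A rough sketch of that approach (work_items and process_item are placeholders for your own table and procedure):

BEGIN
  FOR r IN (SELECT id FROM work_items) LOOP
    DBMS_SCHEDULER.CREATE_JOB(
      job_name            => 'PROCESS_ITEM_' || r.id,
      job_type            => 'STORED_PROCEDURE',
      job_action          => 'process_item',
      number_of_arguments => 1,
      enabled             => FALSE);               -- arguments must be set before enabling
    DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE(
      job_name            => 'PROCESS_ITEM_' || r.id,
      argument_position   => 1,
      argument_value      => TO_CHAR(r.id));       -- the key of the row to process
    DBMS_SCHEDULER.ENABLE('PROCESS_ITEM_' || r.id);
  END LOOP;
END;
/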
Why would you want to process row by row anyway? Set operations are in most cases a lot faster.
I have a challenge that I am trying to solve and I can't work out from the documentation or the examples if SSIS is suitable for my problem.
I have 2 tables (jobs and tasks). Jobs represent a large piece of work, while tasks are tied to jobs. There will typically be anything from 1 task per job to 1,000,000 tasks per job. Each task has a column storing the job_id. The job_id in the jobs table is the primary key.
Every N hours, I want to do the following:
Take all of the job rows where the jobs have completed since I last ran (based on having an end_time value and that value being within the time between now and when I last ran) and add these to the jobs table in the 'query' database.
Copy all of the tasks that have a job_id from the jobs that were included in step 1 into the tasks table in the 'query' database.
Basically, I want to be able to regularly update my query database, but I only want to include completed jobs (hence the requirement of an end_time) and tasks from those completed jobs.
This is likely to be done 2 - 3 times per day so that users are able to query an almost-up-to-date copy of the live data.
Is SSIS suitable for this task, and if so, can you please point me to some documentation that shows how a column from the results of one step is used as the criteria for a second step?
Thanks in advance...
Sure SSIS can do that.
If you want to be sure that the child records are moved, then use a query as your data flow source for the second data flow. You insert the records into the main table in the first data flow. Then you use a query that picks any records in the source child table that are not in the destination child table and that have records in the parent destination table. This way you catch any changes to existing closed records as well (you know there will be some; someone will close a job too soon, then reopen it and add something to it).
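As a sketch, the source query for that second data flow could look something like this (database, schema and key names are all assumptions, and it presumes both databases are reachable from one connection; otherwise use a Lookup in the data flow instead):

SELECT t.*
FROM source_db.dbo.tasks AS t
WHERE EXISTS (SELECT 1                      -- the parent job has already been copied
                FROM query_db.dbo.jobs AS j
               WHERE j.job_id = t.job_id)
  AND NOT EXISTS (SELECT 1                  -- and the task itself has not
                    FROM query_db.dbo.tasks AS qt
                   WHERE qt.task_id = t.task_id);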
Alternatively, you can add the records you are moving to a staging table. Then join to that table when doing the dataflow for the child tables. This will ensure that exactly the records you moved are the ones the child tables are populated for.
Or if you are in a denormalized data warehouse, just write a query that joins the parent and child tables together with a where clause on end date is not null. Of course, don't forget to check for records that aren't currently in the data warehouse.
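For the denormalized case, that query is essentially (names assumed again):

SELECT j.job_id, j.end_time, t.*
  FROM jobs j
  JOIN tasks t ON t.job_id = j.job_id
 WHERE j.end_time IS NOT NULL                        -- completed jobs only
   AND NOT EXISTS (SELECT 1                          -- skip rows already loaded
                     FROM warehouse_job_tasks w
                    WHERE w.task_id = t.task_id);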