I have a table that contains order numbers and order States (IN_PROGRESS, CANCELED, READY_FOR_PROC). I need to write a query that would return any single string in the state READY_FOR_PROC.The problem is that this query will be executed by multiple threads. And everyone should get a record that has not yet been processed by other threads (without duplicates).I tried to do this with SELECT FOR UPDATE skip locked and rownum=1, but then all executed queries except one return empty (if the first thread blocked the record for a long time). How do I write such a query?
I use Oracle if it's important
Change your skip locked slightly to do this:
cursor C is SELECT ... FROM ... FOR UPDATE SKIP LOCKED
then in your code, you just fetch
FETCH C INTO ...
It is the rownum=1 that is causing your issue.
Related
I have an application that selects row with a particular status and then starts processing these rows. However some long running processing can cause a new instance of my program to be started, selecting the same rows again because it haven't had time to update the status yet. So I'm thinking of selecting my rows and then updating the status to something else so they cannot be selected again. I have done some searching and got the impression that the following should work, but it fails.
UPDATE table SET status = 5 WHERE status in
(SELECT TOP (10) * FROM table WHERE status = 1)
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
TLDR: Is it possible to both select and update rows at the same time? (The order doesn't really matter)
You can use an output clause to update the rows and then return them as if they were selected.
As for updating the top 100 rows only, a safe approach is to use a cte to select the relevant rows first (I assumed that column id can be used to order the rows):
with cte as (select top (100) * from mytable where status = 1 order by id)
update cte set status = 5 output inserted.*
You can directly go for UPDATE statement. It will generate exclusive lock on the row and other concurrent transactions cannot read this row.
More information on locks
Exclusive locks: Exclusive locks are used to lock data being modified by one transaction thus preventing modifications by other
concurrent transactions. You can read data held by exclusive lock only
by specifying a NOLOCK hint or using a read uncommitted isolation
level. Because DML statements first need to read the data they want to
modify you'll always find Exclusive locks accompanied by shared locks
on that same data.
UPDATE TOP(10) table
SET status = 5 WHERE status =1
I have a big dataset in Oracle which I need to process some of the rows which has a column PROCESSED= 0.
I have multiple instances of an application which will read 1 row at a time and perform the processing. To avoid multiple threads to access to the same row - I am using
SELECT * FROM FOO WHERE ROWNUM = 1 FOR UPDATE
If I execute the above query, the first thread is locking the row and the other rows are not able to fetch any rows as the ROWUM = 1 is already locked by the first thread. What I am trying to achieve is to fetch the "next unlocked" row.
Is there an efficient way to do it via SQL?
Looks like SKIP LOCKED is what are you looking for.
See documentation
select * from foo for update skip locked
will select only those rows which are not locked by other transactions
I have a parallel process that is using queue table in PostgreSQL. Logic is:
Begin transaction.
Mark 100 unprocessed records with some random generated ID.
Commit.
Run some heavy app logic that takes some time and is processing queue records with generated ID in step 2.
Update 100 processed records with success/bad status.
Up to 20 threads are doing those steps.
However, sometimes when I'm trying to do 2 step with query:
UPDATE QUEUE_TABLE
SET QUEUE_TXN_GUID=$RANDOM_GUID,
QUEUE_STATUS=1
WHERE QUEUE_ROW_GUID IN
(SELECT QUEUE_ROW_GUID from QUEUE_TABLE
WHERE QUEUE_STATUS IS NULL OR QUEUE_STATUS = -1
LIMIT 100 FOR UPDATE SKIP LOCKED) RETURNING QUEUE_ROW_GUID
I got error deadlock detected.
Query that I'm using in step 5 is
UPDATE QUEUE_TABLE SET CDC_QUEUE_REZ_STATUS=$STATUS WHERE CDC_QUEUE_REZ_TXN_GUID=$RANDOM_GUID;
I don't know why I'm getting this strange deadlock, with FOR UPDATE SKIP LOCKED in first update subquery.
The reason of the issue is the fact that there are duplicates in QUEUE_ROW_GUID. Select locks some rows but then query updates not those rows that were locked. That's why concurrently running query may try to update the same rows as this one. So the SKIP LOCKED does not work in this case.
Given that update of rows may happen in different order the first query (that tries to update say row 1 and row 2) may first update row 1 and then try to update row 2 but waits on lock. Concurrently running query (that tries to update 1 and 2 as well) already updated row 2 and waits for lock for row 1. Hence the deadlock.
You need to use unique identifiers to update rows after they are locked.
Currently, we are updating many rows at the same time using this statement:
update my_table set field = 'value' where id in (<insert ids here>);
My worry is, it might cause a deadlock with another query that we run in intervals:
select my_table where field = 'value' for update order by id;
The query above will fetch multiple rows.
Is this scenario possible?
Just a bit of background:
We added the order by id before since when we run the query above multiple times at the same time, we were having random deadlocks due to different orders by that query.
We were wondering if this applies to update statements as well.
Yes, these can deadlock. To avoid this, run the select ... for update order by id in the same transaction immediately before the update. This will lock all rows affected and avoid any other transaction from running the same select ... for update query.
I am not saying consolidate the same two tasks. I am saying use the same locking select in both.
we have some persistent data in an application, that is queried from a server and then stored in a database so we can keep track of additional information. Because we do not want to query when an object is used in the memory we do an select for update so that other threads that want to get the same data will be blocked.
I am not sure how select for update handles non-existing rows. If the row does not exist and another thread tries to do another select for update on the same row, will this thread be blocked until the other transaction finishes or will it also get an empty result set? If it does only get an empty result set is there any way to make it block as well, for example by inserting the missing row immediately?
EDIT:
Because there was a remark, that we might lock too much, here some more details on the concrete usage in our case. In reduced pseudocode our programm flow looks like this:
d = queue.fetch();
r = SELECT * FROM table WHERE key = d.key() FOR UPDATE;
if r.empty() then
r = get_data_from_somewhere_else();
new_r = process_stuff( r );
if Data was present then
update row to new_r
else
insert new_r
This code is run in multiple thread and the data that is fetched from the queue might be concerning the same row in the database (hence the lock). However if multiple threads are using data that needs the same row, then these threads need to be sequentialized (order does not matter). However this sequentialization fails, if the row is not present, because we do not get a lock.
EDIT:
For now I have the following solution, which seems like an ugly hack to me.
select the data for update
if zero rows match then
insert some dummy data // this will block if multiple transactions try to insert
if insertion failed then
// somebody beat us at the race
select the data for update
do processing
if data was changed then
update the old or dummy data
else
rollback the whole transaction
I am neither 100% sure however that this actually solves the problem, nor does this solution seem good style. So if anybody has to offer something more usable this would be great.
I am not sure how select for update handles non-existing rows.
It doesn't.
The best you can do is to use an advisory lock if you know something unique about the new row. (Use hashtext() if needed, and the table's oid to lock it.)
The next best thing is a table lock.
That being said, your question makes it sound like you're locking way more than you should. Only lock rows when you actually need to, i.e. write operations.
Example solution (i haven't found better :/)
Thread A:
BEGIN;
SELECT pg_advisory_xact_lock(42); -- database semaphore arbitrary ID
SELECT * FROM t WHERE id = 1;
DELETE FROM t WHERE id = 1;
INSERT INTO t (id, value) VALUES (1, 'thread A');
SELECT 1 FROM pg_sleep(10); -- only for race condition simulation
COMMIT;
Thread B:
BEGIN;
SELECT pg_advisory_xact_lock(42); -- database semaphore arbitrary ID
SELECT * FROM t WHERE id = 1;
DELETE FROM t WHERE id = 1;
INSERT INTO t (id, value) VALUES (1, 'thread B');
SELECT 1 FROM pg_sleep(10); -- only for race condition simulation
COMMIT;
Causes always correct order of transactions execution.
Looking at the code added in the second edit, it looks right.
As for it looking like a hack, there's a couple options - basically it's all about moving the database logic to the database.
One is simply to put the whole select for update, if not exist then insert logic in a function, and do select get_object(key1,key2,etc) instead.
Alternatively, you could make an insert trigger that will ignore attempts to add an entry if it already exists, and simply do an insert before you do the select for update. This does have more potential to interfere with other code already in place, though.
(If I remember to, I'll edit and add example code later on when I'm in a position to check what I'm doing.)