waitlist in postgres, deadlock - sql

I'm trying to create a waiting list in Postgres. Minimal code:
CREATE TABLE IF NOT EXISTS applies(
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
user_id SMALLINT REFERENCES users(id) ON DELETE CASCADE,
service_id SMALLINT REFERENCES services(id) ON DELETE CASCADE,
status SMALLINT DEFAULT 0, --0: new 1: waitlisted, 2: applied
PRIMARY KEY (user_id, service_id)
-- ...
);
CREATE INDEX ON applies (service_id);
status is important, because I want to be able to notify users when they move from waitlisted to applied. But I don't want to notify them merely because they are in the first n positions if they were never waitlisted at all.
Services are limited to a given number of users. There are multiple ways the system can decide which users get the service. The simplest is first come, first served. That's the reason it has two phases (but it shouldn't change the use case in any way).
Possible requests are:
Insert: a user applies/enrolls for a service
Inserting (user_id, service_id, CURRENT_TIMESTAMP, status: 0 (new))
Updating status to 1 (waitlisted) or 2 (applied) based on the COUNT(*) of rows with the same service_id
Remove: a user cancels his/her application for a service
Removing the given application
If there's someone in waitlisted status, moving them to applied and sending a notification about it
My first try was a naive implementation. Let's say the service limit is 10.
1/2 OnAdd:
UPDATE applies
SET status = (
SELECT CASE WHEN COUNT(*) <= 10 THEN 2 ELSE 1 END
FROM applies
WHERE service_id = 7918
AND created <= '2021-08-16 16:20:34.161274+00:00:00'
)
WHERE user_id = 5070
AND service_id = 7918
RETURNING status
2/2 OnRemove:
SELECT user_id
FROM applies
WHERE status = 1
AND service_id = 7915
ORDER BY created
LIMIT 1
and then (I know they could be joined)
UPDATE applies
SET status = 2
WHERE status = 1
AND user_id = 5063
AND service_id = 7915
It worked for a sequential test, but the multi-threading test showed cases where there were more than 10 rows in applied status.
So I put them in a transaction, first started with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, then REPEATABLE READ; both gave me a lot of ERROR #40001 could not serialize access due to concurrent update. Then I tried the same with READ COMMITTED. It was much better than the naive version, but it ended in over-application as well.
Then I started using FOR NO KEY UPDATE in the selects instead, but they quickly ended in a deadlock every time. I searched a lot on deadlocks, but couldn't find anything useful.
So I came up with a version where OnAdd and OnRemove had very similar queries, differing only in how user_id is selected, and where I didn't add FOR UPDATE. I had to change the initial Insert so the default status is waitlisted.
OnAdd:
UPDATE applies
SET status = 2
WHERE service_id = 7860
AND 10 > (
SELECT COUNT(*)
FROM (
SELECT service_id, user_id
FROM applies
WHERE service_id = 7860
AND status = 2
FOR NO KEY UPDATE
) as newstate)
AND user_id = 5012 RETURNING status
OnRemove:
UPDATE applies
SET status = 2
WHERE service_id = 7863
AND 10 > (
SELECT COUNT(*)
FROM (
SELECT service_id, user_id
FROM applies
WHERE service_id = 7863
AND status = 2
FOR NO KEY UPDATE
) as newstate)
AND user_id = (
SELECT user_id
FROM applies
WHERE service_id = 7863
And status = 1
ORDER BY created
LIMIT 1
)
RETURNING user_id
But in the multi-threading test it ended up in a deadlock too.
Edit:
As melcher suggested below, I added a column instead of counting. Not in a separate table, but inside services.
I added a transaction to OnAdd and OnRemove that starts with
SELECT *
FROM services
WHERE id = ?
FOR NO KEY UPDATE
Sometimes the multithreading test under-applied, so I moved the Remove into the same transaction as OnRemove, and that finally works.

Based on my understanding of what you're trying to do and how databases work - you're going to need a shared resource that gets locked OnAdd.
The reason is that two threads that are both trying to 'add' at the same time must compete for a shared resource, so that only one of them wins and the other fails. You cannot accomplish your goal using a count of rows.
One solution would be a locks table:
CREATE TABLE IF NOT EXISTS service_locks(
service_id SMALLINT PRIMARY KEY REFERENCES services(id) ON DELETE CASCADE,
applied_count SMALLINT NOT NULL DEFAULT 0
);
And then:
Open a transaction (if outside of a procedure)
Acquire an exclusive/write lock on the service's row in the lock table
IF the limit/constraint is satisfied (e.g. applied_count < 10) then...
Mark the applies row as applied: UPDATE applies SET status = 2
Update the lock table (e.g. SET applied_count = applied_count + 1)
Commit the transaction
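In SQL the steps above might look roughly like this (a sketch, not the answerer's exact code; the table name service_locks, the ids, and the limit of 10 are assumptions):

```sql
BEGIN;

-- Serialize concurrent adds for this service by locking its counter row
SELECT applied_count
FROM service_locks
WHERE service_id = 7918
FOR UPDATE;

-- If the limit is satisfied, mark the application as applied...
UPDATE applies
SET status = 2
WHERE user_id = 5070
  AND service_id = 7918
  AND (SELECT applied_count FROM service_locks
       WHERE service_id = 7918) < 10;

-- ...and bump the counter only while the limit holds
UPDATE service_locks
SET applied_count = applied_count + 1
WHERE service_id = 7918
  AND applied_count < 10;

COMMIT;
```

In application code you would check the row count of the first UPDATE and only increment the counter when it is 1; the row lock taken by the initial SELECT ... FOR UPDATE is what prevents two concurrent adds from both seeing applied_count = 9.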

Related

How make one step from two steps in SQL - users working simultaneously

Currently I have a solution for my problem implemented in one system, but I think a better solution exists; I just haven't found it yet. I hope someone can suggest a solution or point me in some direction.
I have a system for splitting up different kinds of jobs. Each job is stored in a PostgreSQL database, in a table "jobs", where one row is one job (which can be done multiple times). Each job has an attribute for how many times it can be done (how many users can work on it). So my jobs table works like this:
ID_JOB  NAME            DONE  HAS_TO_BE_DONE
1       Puzzle Solving  2     3
2       Washing Dishes  1     3
When a user comes and asks for a job:
The user gets the id of a job that meets the condition DONE < HAS_TO_BE_DONE
For the given job, the value DONE is incremented (+1)
It works, but when more users are working simultaneously I have to LOCK the database every time (isolation: SERIALIZABLE), even for reading.
If I did not use database locking, a problem could occur:
User #1 comes and is in step 1 (he knows the id of the job he got); at the same time (a few milliseconds later) User #2 comes, asks for a job, goes through step 1, and gets the same job as User #1, because User #1 hasn't completed step 2 yet (and hasn't incremented DONE). This is the reason I have to completely lock the database.
Does anyone know how to improve this method?
Currently it works, but when 100 users are working simultaneously, the database locking slows the application down. The problem is the two steps: in the first step I find a job for the user, and in the second step I increment the value.
If anyone would come with better solution I would be happy. Thank you
The simplest approach is to check and update in a single statement.
update jobs
set done = done + 1
where done < has_to_be_done
and id = ?
If it updates a row, there was a job to be done. If it doesn't, there was no job to be done.
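If the caller also needs the updated counters, not just the row count, Postgres can return them in the same statement. A sketch:

```sql
update jobs
set done = done + 1
where done < has_to_be_done
  and id = ?
returning done, has_to_be_done
```

No row returned means no job was available; a returned row also tells you how many slots remain for that job.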
But there's a problem. You only have two statuses: job needs to be done, job is done. What if 5 users are all working on the same job, but it only needs to be done twice? That's wasteful. You need three statuses:
queued
in progress
done
You could add these as columns to the table and increment/decrement them.
create table jobs (
id bigserial primary key,
name text not null,
queued integer not null default 0 check (queued >= 0),
in_progress integer not null default 0 check (in_progress >= 0),
done integer not null default 0 check (done >= 0)
);
Queue a job.
update jobs
set queued = queued + 1
where id = ?
Start a job. If it updates a row, you got a job. If it doesn't, there are no jobs in the queue.
update jobs
set queued = queued - 1,
in_progress = in_progress + 1
where queued > 0
and id = ?
Finish a job.
update jobs
set in_progress = in_progress - 1,
done = done + 1
where in_progress > 0
and id = ?
Or, for even finer control, split it into two tables: jobs and a job queue.
create table jobs (
id bigserial primary key,
name text not null
);
create table job_queue (
id bigserial primary key,
job_id bigint not null references jobs(id) on delete cascade
);
jobs

id  name
1   Puzzle Solving
2   Washing Dishes

job_queue

id  job_id
3   1
5   2
6   2
Queue a job by inserting it.
insert into job_queue (job_id) values (?)
Finish a job by deleting one entry. It's not necessary to first check. If a row is deleted, you finished a job.
delete from job_queue
where id = (
select id from job_queue
where job_id = ?
limit 1
)
We can combine the two. Flags and a job_queue table. We can store more information like who is doing the work, when it was queued, and so on.
create table job_queue (
id bigserial primary key,
job_id bigint not null references jobs(id) on delete cascade,
status text not null default 'queued',
user_id bigint references users(id) on delete set null,
created_at timestamp not null default current_timestamp,
updated_at timestamp not null default current_timestamp
);
Queue a job.
insert into job_queue (job_id) values (?)
Start a job. Again, no check necessary. Instead, check if a row was updated.
update job_queue
set status = 'in progress', user_id = ?, updated_at = current_timestamp
where id = (
select id
from job_queue
where status = 'queued'
limit 1
)
returning id
Finish a job. Use the ID returned from the update if you have it.
update job_queue
set status = 'done', updated_at = current_timestamp
where id = ?
Or find a queued job you have in progress and update that.
update job_queue
set status = 'done', updated_at = current_timestamp
where id = (
select id
from job_queue
where user_id = ?
and job_id = ?
and status = 'in progress'
limit 1
)
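One caveat with the start-a-job query above: two concurrent workers can select the same queued id, and the loser simply blocks and then re-updates the same row. On Postgres 9.5 and later this can be avoided with FOR UPDATE SKIP LOCKED in the subquery (my addition, not part of the original answer):

```sql
update job_queue
set status = 'in progress', user_id = ?, updated_at = current_timestamp
where id = (
    select id
    from job_queue
    where status = 'queued'
    limit 1
    for update skip locked
)
returning id
```

Each worker locks the row it picks, and other workers skip locked rows instead of waiting, so every worker claims a distinct queue entry.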

SQL - Update all rows with certain condition

Hi, I want to write a query which does the following:
When I execute the query, all rows are set to 0 except a certain id given in the WHERE clause (which is supposed to be set to 1; only one row can be active at a time).
tbl_menu
id | serial
starter | varchar
plate | varchar
beverage | varchar
status | smallint
So if I have one row with status 1 and everything else 0, when I execute this query, the id I choose will change to 1 and the others to 0.
Also, only one row should have status = 1.
Try this:
UPDATE tbl_menu
set status = CASE WHEN id = 4 THEN 1 ELSE 0 END;
I think you have two choices with the design as it is.
1) Do it with two easy queries.
2) Write one more complicated query with a case statement.
I personally like easy:
UPDATE tblmenu SET status = 0 WHERE status = 1;
UPDATE tblmenu SET status = 1 WHERE id = n;
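If you go with the two easy queries, it's worth wrapping them in a transaction so no other session observes the intermediate moment with no active row (a sketch; n is a placeholder for the chosen id):

```sql
BEGIN;
UPDATE tblmenu SET status = 0 WHERE status = 1;
UPDATE tblmenu SET status = 1 WHERE id = n;
COMMIT;
```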
Although, having said that, I think a better approach is this...
Get rid of your status column
Create a new table called, say, tblstatus, with one column: id
It holds one record, containing the id of the active record
id is a foreign key to your main table
Now all you have to do is:
UPDATE tblstatus SET id = n;
Faster, easier, more robust...
This is a basic update statement. But if you could give more info on the column names and the condition on which you want to do the updates, it would be helpful.
UPDATE tbl_menu SET (column) = value
WHERE (condition)
This doesn't solve the problem of the update, but it does solve the problem of enforcing that (at most) one row is active. Use a partial unique index:
create unique index unq_menu_active on tblmenu(status) where status = 1;

Make sure to get different results for two workers on a PostgresqlDB

I have a Postgres DB, and in one table I have server IPs. Now I want a bunch of workers to go and check the servers in my DB. But I want to make sure that no two workers get the same server IP: as soon as one IP is selected, it should be unavailable to all the other workers.
CREATE TABLE server
(
id SERIAL,
ip VARCHAR(15) NOT NULL,
created INTEGER NOT NULL,
modified INTEGER NOT NULL
)
Sample data:
1 | 192.168.5.22 | 1476086120 | 1476086120
2 | 192.168.5.29 | 1476084147 | 1476084147
Worker 1: SELECT ip FROM server LIMIT 1 -> 192.168.5.22
Worker 2: SELECT ip FROM server LIMIT 1 -> 192.168.5.22 // but I'd rather have the result for id = 2 here, since id = 1 was already selected by Worker 1
Can anyone please point me into a good direction on how to achieve this?
Regular locking won't do, as far as I can tell, since it only applies to INSERT, UPDATE, and DELETE, and all I do is SELECT.
Maybe there is a way of selecting and updating in one step, but I haven't found anything related to that.
So some help here would be greatly appreciated.
Thanks
You could use UPDATE ... RETURNING as in the following example:
UPDATE iptable
SET processed = TRUE
WHERE id IN (SELECT id FROM iptable
WHERE NOT processed
LIMIT 1)
RETURNING ip_address;
Here id is the primary key.
A partial index ON iptable(id) WHERE NOT processed may speed up the query.
Alternatively, you can use SELECT ... FOR UPDATE and UPDATE in one transaction:
START TRANSACTION;
SELECT id
FROM iptable
WHERE NOT processed
LIMIT 1
FOR UPDATE;
UPDATE iptable SET processed = TRUE WHERE id = ...;
COMMIT;
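On PostgreSQL 9.5 and later you can also add SKIP LOCKED, so concurrent workers don't queue up behind the same row but immediately grab the next unprocessed one (a sketch, assuming the same iptable/processed schema as above):

```sql
UPDATE iptable
SET processed = TRUE
WHERE id = (
    SELECT id
    FROM iptable
    WHERE NOT processed
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING ip_address;
```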

T-SQL Fixing Row Order / Sequence

Overview:
I have a page in my application that allows a user to set the priority of a list of records. These records are essentially tasks or goals that the team will need to complete. The order they are given is determined by the user dragging and dropping them into place. Once the user is done and saves the changes, it loops over the data and updates the records in the database with their respective order.
Issue:
The problem I am now running into is on a separate page where the user can delete a record that may have a priority assigned to it. Since the saving and calculation of priority is done on another page, deleting a record causes gaps in the sequence.
Example:
Projects that are prioritized:
A (1)
B (2)
C (3)
D (4)
E (5)
F (6)
G (7)
Project gets deleted from another page / function:
A (1)
B (2)
C (3)
D (4)
F (6)
G (7)
Question:
I need some type of function I can run after a record is deleted to "repair/re-sync" this number sequence. Essentially the single priority column needs to be updated; in the example, F would need to be updated to 5 and G to 6 in order to fix the sequence gap.
What I have tried:
Mostly researching a way to solve this, to be honest. I was thinking of putting all the records into a temp table with an auto-increment number and using that to update the priority level, but I feel like there is probably a simpler solution and wanted some opinions.
After you delete, you simply need to update the rows with a priority greater than the one just deleted. So if your column name is Priority and you delete the row where Priority = 5, you do something like this. Notice you don't need loops here.
Update YourTable
set Priority = Priority - 1
where Priority > 5
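To avoid a window where another reader sees the gap, the delete and the renumbering can go in one transaction (a sketch; YourTable and the priority value 5 are taken from the example above):

```sql
BEGIN TRANSACTION;

DELETE FROM YourTable WHERE Priority = 5;

UPDATE YourTable
SET Priority = Priority - 1
WHERE Priority > 5;

COMMIT;
```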
You could use a trigger, which fires automatically when a delete is done. You can always take the removed values from the deleted pseudo-table.
More info
SQL Server ON DELETE Trigger
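A minimal sketch of such a trigger, assuming single-row deletes and the YourTable/Priority names from the answer above:

```sql
CREATE TRIGGER trg_YourTable_Delete
ON YourTable
AFTER DELETE
AS
BEGIN
    -- Close the gap left by the deleted row;
    -- the 'deleted' pseudo-table holds the removed row(s)
    UPDATE t
    SET Priority = t.Priority - 1
    FROM YourTable t
    WHERE t.Priority > (SELECT MIN(Priority) FROM deleted);
END
```

For multi-row deletes the decrement logic would need to count how many deleted rows precede each remaining row, so the whole-table ROW_NUMBER() renumbering below is the more robust option.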
As others have suggested, if you're only deleting one at a time, just update the numbers immediately after deletion. If you want to renumber the whole table at once (perhaps there have been multiple deletions), you can use ROW_NUMBER() and a CTE:
DECLARE @Project TABLE (
Name nvarchar(100)
, Priority int
);
-- Projects that are prioritized
INSERT @Project ( Name, Priority ) VALUES
('A', 1)
, ('B', 2)
, ('C', 3)
, ('D', 4)
, ('E', 5)
, ('F', 6)
, ('G', 7)
;
SELECT * FROM @Project;
-- Projects get deleted
DELETE @Project WHERE Name = 'E';
DELETE @Project WHERE Name = 'C';
SELECT * FROM @Project;
-- Renumber
WITH P AS
(
SELECT Name, Priority, ROW_NUMBER() OVER ( ORDER BY Priority ) AS NewPriority
FROM @Project
)
UPDATE P SET Priority = NewPriority
;
-- Ta da!
SELECT * FROM @Project;
Why not just set the Priority in the procedure that extracts the records from the database? That way you never need to change the priority, because it's always computed on the fly when you read the rows. The only issue is that when you save a new sequence after the user reorders the records, you delete all the records and add them back in the correct sequence, with an auto-incrementing primary key (pk).
I don't know your sequence, but this example should give you the idea.
Select Record,
(Select count(*) From table
Where pk <= t.pk) Priority
From Table t
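On SQL Server 2005 and later, the same on-the-fly priority can be computed with a window function instead of a correlated count (same assumed Record/pk names as above):

```sql
SELECT Record,
       ROW_NUMBER() OVER (ORDER BY pk) AS Priority
FROM Table t
```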

Issue with UPDATE QUERY using EXISTS clause causing slow Tablespace scans in DB2

Our workplace has a database with a client table that holds 5 million records. Each time a client is updated, another row is added to a client_history table that holds 100 million records. All columns in the Client table are indexed. Only the Primary Key (ID), Foreign Key (FK_Client_ID) and Creation Timestamp in the Client History table are indexed.
I've been asked to update several hundred thousand client records, but only if the corresponding client history record indicates that the client record has not been updated since a certain date (e.g. 19th September 2012).
I've written an SQL update query that uses an EXISTS clause. I've been told by the DBA's that I shouldn't use an EXISTS clause, as this would trigger a tablespace scan that would slow down execution of the query. This is obviously an issue when updating several hundred thousand client records -
UPDATE Client_History SET Surname = 'MisterX',
Update_Timestamp = CURRENT_TIMESTAMP
WHERE (FK_Client_ID = 123 AND ID = 456)
AND NOT EXISTS
(SELECT *
FROM Client
WHERE Client.Client_Id = Client_History.FK_Client_ID
AND Client_History.Update_Timestamp > TIMESTAMP('2012-09-21-00:00:00')
AND Client_History.Update_Timestamp < TIMESTAMP('4000-12-31-00:00:00')
AND Client_History.Creation_Timestamp < NAME.Update_Timestamp);
Can anyone think of a better solution?
A shot in the dark: try hoisting all the constants up into the main query (where they belong)
UPDATE Client_History ch
SET Surname = 'MisterX'
, Update_Timestamp = CURRENT_TIMESTAMP
WHERE ch.FK_Client_ID = 123
AND ch.ID = 456
AND ch.Update_Timestamp > TIMESTAMP('2012-09-21-00:00:00')
AND ch.Update_Timestamp < TIMESTAMP('4000-12-31-00:00:00')
AND ch.Creation_Timestamp < NAME.Update_Timestamp
AND NOT EXISTS (
SELECT *
FROM Client cl
WHERE cl.Client_Id = ch.FK_Client_ID
)
;
BTW: what is NAME? Some kind of pseudo-table, like Oracle's dual?