PostgreSQL INSERT with statements

I'm using a postgres database and my problem includes two tables, simplified versions of them are below.
CREATE TABLE events(
id SERIAL PRIMARY KEY NOT NULL,
max_persons INTEGER NOT NULL
);
and
CREATE TABLE requests(
id SERIAL PRIMARY KEY NOT NULL,
confirmed BOOLEAN NOT NULL,
creation_time TIMESTAMP DEFAULT NOW(),
event_id INTEGER NOT NULL /*foreign key*/
);
There are n events, and each event can have up to events.max_persons participants. New requests need to be confirmed and are valid for up to 30 minutes. After that period, requests that were not confirmed are ignored.
Now what I want to do is only insert a new request, when the sum of all confirmed requests and all requests that are still valid, but not confirmed, is less than events.max_persons.
I already have a query to select a single event. Here is a simplified version of it, just to give you an idea of how it should work:
SELECT
e.id,
SUM(CASE WHEN r.confirmed THEN 1 ELSE 0 END) AS number_confirmed,
SUM(CASE WHEN r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE') AND NOT r.confirmed THEN 1 ELSE 0 END) AS number_reserved,
e.max_persons
FROM events e, requests r
WHERE e.id = ?
AND r.event_id = e.id
AND (r.confirmed OR r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE'))
GROUP BY e.id, e.max_persons
HAVING SUM(CASE WHEN r.confirmed OR r.creation_time > (CURRENT_TIMESTAMP - INTERVAL '30 MINUTE') THEN 1 ELSE 0 END) < e.max_persons
Is it possible to achieve this with a single INSERT command?

You could do that like this:
INSERT INTO requests
SELECT * FROM (VALUES (...)) AS t
WHERE ...
and write a WHERE clause that is only true if your condition is satisfied.
But there is a fundamental problem with that approach, namely that it is subject to a race condition.
If two such statements run at the same time, both may find the condition satisfied, but when each one has added its row and commits, the condition can be violated. That is because none of the statements can see the effects of the other one before they commit.
There are two solutions for this:
Lock the table before you test and insert. That is simple, but very bad for concurrency.
Use SERIALIZABLE transactions throughout. Then a conflict causes a serialization error; the failed statement has to be retried, and on retry it will find the condition violated.
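For the schema in the question, a minimal sketch of that conditional INSERT might look like the following (the $1 event-id placeholder is an assumption); note it is still subject to the race condition described above:

```sql
-- Sketch only: insert a new unconfirmed request for event $1 when the
-- number of confirmed plus still-valid unconfirmed requests is below
-- the event's limit. Still racy without locking or SERIALIZABLE.
INSERT INTO requests (confirmed, event_id)
SELECT false, $1
WHERE (SELECT count(*)
       FROM requests r
       WHERE r.event_id = $1
         AND (r.confirmed
              OR r.creation_time > current_timestamp - INTERVAL '30 minutes'))
      < (SELECT e.max_persons FROM events e WHERE e.id = $1);
```

If the condition fails, the SELECT produces no row and nothing is inserted; the client can check the reported row count to tell the two cases apart.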

Related

INPUT VALUE depending on the table rows

I want to input a new row in a table with the following design
CREATE TABLE DMZ
(
DDM date NOT NULL,
NDM int NOT NULL,
PR int NOT NULL,
CONSTRAINT PK_DMZ PRIMARY KEY(NDM)
);
PR can only be 1 or 2, which I defined as a constraint (1 if the document is for income, 2 if it is for consumption). NDM is a document number (actually the id in my case).
ALTER TABLE DMZ
ADD CONSTRAINT PR CHECK (PR IN (1,2));
I filled it with some handwritten data
INSERT INTO DMZ VALUES('2014.01.04', 20, 1);
INSERT INTO DMZ VALUES('2014.01.04', 21, 1);
INSERT INTO DMZ VALUES('2014.01.04', 22, 2);
There are two rows, where PR = 1, and only one where PR = 2. I want to write a script to INSERT a new row like this
INSERT INTO DMZ(DDM, PR) VALUES(GETDATE(), X)
For X, I want something like "count the rows where PR = 1 and the rows where PR = 2; if there are more rows where PR = 1, use PR = 2 in the newly inserted row, and if there are more rows where PR = 2, use PR = 1".
P.S.: This is a recreation of my deleted answer; I hope it's clear now. To those who asked why I am doing such nonsense: it is part of a list of tasks I HAVE to perform. I tried to do it, but I don't know how to handle the PR part.
EDIT: I managed to write what I needed, but I am getting the following error: "Cannot perform an aggregate function on an expression containing an aggregate or a subquery."
INSERT INTO DMZ(ddm, pr)
SELECT COUNT(CASE WHEN (COUNT(CASE WHEN PR = 1 THEN 1 ELSE 0 END)> COUNT(CASE WHEN PR = 2 THEN 1 ELSE 0 END)) THEN 1 ELSE 2 END) AS pr, GETDATE() as ddm
FROM DMZ
Try doing an INSERT SELECT statement with a CASE expression that checks your PR counts, using SUM and CASE in a subquery:
INSERT INTO DMZ (DDM, NDM, PR)
SELECT GETDATE() AS DDM,
a.NDM AS NDM,
CASE WHEN a.PR_1_Count > a.PR_2_Count
THEN 2
ELSE 1
END AS PR
FROM (SELECT
MAX(NDM) + 1 AS NDM,
SUM(CASE WHEN PR = 1 THEN 1 ELSE 0 END) AS PR_1_Count,
SUM(CASE WHEN PR = 2 THEN 1 ELSE 0 END) AS PR_2_Count
FROM DMZ) a
Note: If you want an actual count to be inserted, remove your CONSTRAINT for the PR check and change the CASE statement from THEN 2 to THEN PR_2_Count and THEN 1 to THEN PR_1_Count.
Also, I've computed an NDM column value in my demo because your column is set to NOT NULL; I assume you'll handle that.
Update: Per your comment below, I've updated the syntax to include MAX(NDM) + 1. I would, however, suggest adding a new NDM IDENTITY column to replace your current NDM column, so that the database generates your PK for you instead of you generating the value yourself.
Identity columns can be used for generating key values. The identity property on a column guarantees the following:
Each new value is generated based on the current seed & increment.
Each new value for a particular transaction is different from other concurrent transactions on the table.
The identity property on a column does not guarantee the following:
Uniqueness of the value - uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.

waitlist in postgres, deadlock

I'm trying to create a waiting list in Postgres. Minimal code:
CREATE TABLE IF NOT EXISTS applies(
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
user_id SMALLINT REFERENCES users(id) ON DELETE CASCADE,
service_id SMALLINT REFERENCES services(id) ON DELETE CASCADE,
status SMALLINT DEFAULT 0, --0: new 1: waitlisted, 2: applied
PRIMARY KEY (user_id, service_id)
-- ...
);
CREATE INDEX ON applies (service_id);
status is important, because I want to be able to notify users when they move from waitlisted to applied. But I don't want to notify them merely for being in the first n positions if they were never waitlisted at all.
Services are limited to a given number of users. There are multiple choices for how the system decides which user gets the service. The simplest is first come, first served. That's the reason it has two phases (but that shouldn't change the use case in any way).
Possible requests are:
Insert: a user applies/enrolls for a service
Inserting (user_id, service_id, CURRENT_TIMESTAMP, state: 0 (new))
Updating state to 1 (waitlisted) or 2 (applied) based on the COUNT(*) of rows with the same service_id
Remove: a user cancels his/her application for a service
Removing the given application
If there's someone in waitlisted state, move to applied, and send notification about it
My first try was a naive implementation. Let's say the service limit is 10.
1/2 OnAdd:
UPDATE applies
SET status = (
SELECT CASE WHEN COUNT(*) <= 10 THEN 2 ELSE 1 END
FROM applies
WHERE service_id = 7918
AND created <= '2021-08-16 16:20:34.161274+00:00:00'
)
WHERE user_id = 5070
AND service_id = 7918
RETURNING status
2/2 OnRemove:
SELECT user_id
FROM applies
WHERE status = 1
AND service_id = 7915
ORDER BY created
LIMIT 1
and then (I know they could be joined)
UPDATE applies
SET status = 2
WHERE status = 1
AND user_id = 5063
AND service_id = 7915
It worked for a sequential test, but the multi-threading test showed cases when there were more than 10 rows with applied state.
So I put them in a transaction started with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, then tried REPEATABLE READ; both gave me a lot of ERROR #40001 could not serialize access due to concurrent update. Then I tried the same with READ COMMITTED. It was much better than the naive version, but it ended in over-applications as well.
Then I started using FOR NO KEY UPDATE in the selects instead, but they always ended in a deadlock very soon. I searched a lot on deadlocks but couldn't find anything useful.
So I came up with a version where OnAdd and OnRemove had very similar queries, differing only in how user_id is selected, and where I didn't add FOR UPDATE. I had to change the Insert so that the default state is waitlisted.
OnAdd:
UPDATE applies
SET status = 2
WHERE service_id = 7860
AND 10 > (
SELECT COUNT(*)
FROM (
SELECT service_id, user_id
FROM applies
WHERE service_id = 7860
AND status = 2
FOR NO KEY UPDATE
) as newstate)
AND user_id = 5012 RETURNING status
OnRemove:
UPDATE applies
SET status = 2
WHERE service_id = 7863
AND 10 > (
SELECT COUNT(*)
FROM (
SELECT service_id, user_id
FROM applies
WHERE service_id = 7863
AND status = 2
FOR NO KEY UPDATE
) as newstate)
AND user_id = (
SELECT user_id
FROM applies
WHERE service_id = 7863
And status = 1
ORDER BY created
LIMIT 1
)
RETURNING user_id
But in the multi-threading test it ended up in a deadlock too.
Edit:
As melcher suggested below, I added a column instead of counting. Not in a separate table, but inside services.
I added a transaction to OnAdd and OnRemove that starts with
SELECT *
FROM services
WHERE id = ?
FOR NO KEY UPDATE
Sometimes the multithreading test under-applied. So I joined Remove into the same transaction as OnRemove, and that finally works.
Based on my understanding of what you're trying to do and how databases work - you're going to need a shared resource that gets locked OnAdd.
The reason is that two threads that are both trying to 'add' at the same time must compete for a shared resource so that only one of them wins and the other errors / fails. You cannot accomplish your goal using a count of rows.
One solution would be a locks table:
CREATE TABLE IF NOT EXISTS service_locks(
service_id SMALLINT REFERENCES services(id) ON DELETE CASCADE,
applied_count SMALLINT
);
And then:
Open transaction (if outside of a procedure)
Acquire an exclusive/write lock on the service's row in the lock table
IF the limit/constraint is satisfied (e.g. applied_count < 10) then...
Mark the "applies" as applied UPDATE applies SET status = 2
Update lock table (e.g. SET applied_count = applied_count + 1)
Commit transaction
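Spelled out as SQL, the steps above might look roughly like this (the lock-table name service_locks, the limit of 10, and the $n placeholders are assumptions; the check that step 2 actually updated a row belongs in application code):

```sql
BEGIN;

-- 1. serialize concurrent adders for this service on one lock row
SELECT applied_count
FROM service_locks
WHERE service_id = $1
FOR UPDATE;

-- 2. apply only while the service is under its limit
UPDATE applies
SET status = 2
WHERE user_id = $2
  AND service_id = $1
  AND (SELECT applied_count
       FROM service_locks
       WHERE service_id = $1) < 10;

-- 3. if step 2 reported one updated row, bump the counter
UPDATE service_locks
SET applied_count = applied_count + 1
WHERE service_id = $1;

COMMIT;
```

Because every adder first locks the same service_locks row, two concurrent adds for the same service cannot both pass the limit check.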

if-then-else construction in complex stored procedure

I am relatively new to SQL queries and I was wondering how to create a complex stored procedure. My database runs on SQL Server.
I have a table customer (id, name) and a table customer_events (id, customer_id, timestamp, action_type). I want to add a calculated field customer_status to table customer which is
0: (if there is no event for this customer in customer_events) or (the most recent event is > 5 minutes ago)
1: if the most recent event is < 5 minutes ago and action_type=0
2: if the most recent event is < 5 minutes ago and action_type=1
Can I use if-then-else constructions or should I solve this challenge differently?
As you mentioned in comments, you actually want to add a field to a select query, and in a general sense what you want is a CASE statement. They work like this:
SELECT field1,
field2,
CASE
WHEN some_condition THEN some_result
WHEN another_condition THEN another_result
END AS field_alias
FROM table
Applied to your specific scenario, well, it's not totally straightforward. You're certainly going to need to left join your customer_events table, and you also want to aggregate to find the most recent event, along with that event's action type. Once you have that information, the CASE statement is straightforward.
It's always hard to write SQL without access to your data, but something like:
SELECT c.id,
c.name,
CASE
WHEN e.id IS NULL OR DATEDIFF(minute,e.timestamp,GETDATE())>=5 THEN 0
WHEN DATEDIFF(minute,e.timestamp,GETDATE())<5 AND e.action_type=0 THEN 1
WHEN DATEDIFF(minute,e.timestamp,GETDATE())<5 AND e.action_type=1 THEN 2
END AS customer_status
FROM customer c
LEFT JOIN (
SELECT id, customer_id, action_type,
RANK() OVER(PARTITION BY customer_id ORDER BY timestamp DESC) AS r
FROM customer_events
) e
ON c.id=e.customer_id AND e.r=1
The core of this is the subquery in the middle: it uses a rank function to number each event per customer_id, ordered by timestamp descending. Every record with a rank of 1 is therefore the most recent for that customer. Thereafter, you simply join it onto the customer table and use it to determine the right value for customer_status.
Presuming you get the event info into "Most_Recent_Event_Mins_Ago" (NULL if there is none):
SELECT Id, Name,
CASE
WHEN Most_Recent_Event_Mins_Ago IS NULL OR Most_Recent_Event_Mins_Ago >= 5 THEN 0
WHEN Most_Recent_Event_Mins_Ago < 5 AND Action_type = 0 THEN 1
WHEN Most_Recent_Event_Mins_Ago < 5 AND Action_type = 1 THEN 2
...other scenarios
ELSE yourDefaultValueForStatus
END as Status
FROM customer
WHERE
...
...
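One way to get Most_Recent_Event_Mins_Ago in the first place is a correlated lookup of each customer's latest event; a SQL Server sketch (column names follow the question; the alias e and the derived column names are assumptions):

```sql
-- For each customer, fetch the single most recent event (or NULLs if
-- there is none) and compute how many minutes ago it happened.
SELECT c.id,
       c.name,
       DATEDIFF(minute, e.[timestamp], GETDATE()) AS Most_Recent_Event_Mins_Ago,
       e.action_type
FROM customer c
OUTER APPLY (
    SELECT TOP 1 ce.[timestamp], ce.action_type
    FROM customer_events ce
    WHERE ce.customer_id = c.id
    ORDER BY ce.[timestamp] DESC
) e;
```

OUTER APPLY keeps customers with no events (all e.* columns come back NULL), which is exactly what the IS NULL branch of the CASE above expects.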

How to select records using table relationships and timestamps?

Another newbie PostgreSQL question.
I have something like this:
CREATE TABLE users (
userID bigserial primary key,
name varchar(50) NOT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
)
CREATE TABLE session (
sessionID bigserial primary key,
userID int NOT NULL,
lastAction timestamp NULL DEFAULT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
)
CREATE TABLE action (
actionID bigserial primary key,
sessionID int NOT NULL,
lastAction timestamp NULL DEFAULT NULL,
created timestamp NULL DEFAULT CURRENT_TIMESTAMP
)
A user can have many sessions, each with multiple session actions.
Each user has sessions which expire, in which case a new one is inserted and any action they take is catalogued there.
My question is: how do I go about grabbing actions only for a given user, only from his sessions, and only if they happened within the last 1 day, 2 days, a week, a month, or for all time?
I've looked at the docs and I think interval() is what I'm looking for but I only really know how to expire sessions:
(part of a join here) e.lastAction >= now() - interval '4 hours'
That one either returns me what I need or it doesn't. But how do I make it return all the records that have been created since 1 day ago, 2 days ago, etc. SQL syntax and logic is still a bit confusing.
So in an ideal world I'd want to ask a question like: how many actions has this user taken in the last 2 days? I have the relationships and timestamps created, but in writing a query I've been met with failure.
I'm not sure which timestamp you want from the actions table -- the created or the last action timestamp. In any case, the query you want is a basic join, where you filter on the user id and the time stamp:
select a.*
from action a join
     session s
     on a.sessionID = s.sessionID
where s.userID = v_userid and
      a.created >= now() - interval '1 day';
If you want the number of transactions in the past two days, you would use aggregation:
select count(*)
from action a join
     session s
     on a.sessionID = s.sessionID
where s.userID = v_userid and
      a.created >= now() - interval '2 days';
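If you want all the windows (1 day, 2 days, a week, a month, all time) in one pass, the aggregate FILTER clause (PostgreSQL 9.4+) is convenient; a sketch using the table and column names from the question's schema:

```sql
-- One row with an action count per time window for a single user.
SELECT count(*) FILTER (WHERE a.created >= now() - interval '1 day')   AS last_day,
       count(*) FILTER (WHERE a.created >= now() - interval '2 days')  AS last_2_days,
       count(*) FILTER (WHERE a.created >= now() - interval '7 days')  AS last_week,
       count(*) FILTER (WHERE a.created >= now() - interval '1 month') AS last_month,
       count(*)                                                        AS all_time
FROM action a
JOIN session s ON a.sessionID = s.sessionID
WHERE s.userID = v_userid;
```

Each FILTER restricts one aggregate independently, so the table is scanned once instead of once per window.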

Selecting batches of rows

I have a table widget_events that records event_what events occurring to
widget widget_id on date event_when. It's possible for the same event to
occur multiple times to the same widget on the same day. For this reason,
column event_id is used as primary key to distinguish such rows. Here is
the table declaration:
CREATE TABLE widget_events
(
event_id int4 UNIQUE NOT NULL,
event_when date NOT NULL,
event_what text NOT NULL,
widget_id int4 REFERENCES widgets (widget_id) NOT NULL,
PRIMARY KEY (event_id)
);
The client application processes events in batches, where each batch consists
of all events for one widget on one date. However, the application has no
previous knowledge of which widgets and dates are stored in widget_events.
One possible solution is to start by selecting one random row from
widget_events (using SQL's LIMIT), and then do another query for all
rows with the same widget_id and widget_when. After this batch is
processed, those rows can be deleted from widget_events, and we go back
to the first step. The algorithm stops when the first step reports that
there is no more random row to return.
My question is whether there is a faster, more elegant way to do this.
Is it possible in SQL (in particular the SQL understood by PostgreSQL)
to return each distinct batch in a single query?
To select the distinct batches (one widget on one date):
select distinct widget_id
, event_when
from widget_events
Or you could pick up a single batch in one query, like:
select batch.*
from widget_events batch
join (
select widget_id
, event_when
from widget_events
limit 1
) filter
on filter.widget_id = batch.widget_id
and filter.event_when = batch.event_when
Why don't you just return the rows, ordered by widget and date:
select *
from widget_events we
order by widget_id, event_when, event_id
Ordering by widget_id first means that all rows of the same batch (one widget on one date) end up on consecutive rows.
Your logic can then just look for when the widget or date changes to determine where one batch ends and the next begins. You could even put this into the select, if you wanted:
select *,
(case when lag(event_when) over (partition by widget_id order by event_when, event_id)
is distinct from event_when then 1 else 0
end) as isFirst,
(case when lead(event_when) over (partition by widget_id order by event_when, event_id)
is distinct from event_when then 1 else 0
end) as isLast
from widget_events we
order by widget_id, event_when, event_id
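The select-then-delete loop from the question can also be collapsed into a single statement per batch; a PostgreSQL sketch (assuming, as in the question, that it is safe to delete the rows as soon as the client receives them):

```sql
-- Grab one arbitrary batch (one widget, one date), delete its rows,
-- and hand them all back to the client in a single round trip.
WITH pick AS (
    SELECT widget_id, event_when
    FROM widget_events
    LIMIT 1
)
DELETE FROM widget_events we
USING pick
WHERE we.widget_id = pick.widget_id
  AND we.event_when = pick.event_when
RETURNING we.*;
```

The client repeats this statement until it returns no rows, at which point every batch has been processed.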