Isolation level for two statements in the same transaction vs. a single statement - sql

I have the following table:
DROP TABLE IF EXISTS event;
CREATE TABLE event(
    kind VARCHAR NOT NULL,
    num  INTEGER NOT NULL
);
ALTER TABLE event ADD PRIMARY KEY (kind, num);
The idea is that I want to use the num column to maintain separate increment counters for each kind of event. So num is like a dedicated auto-increment sequence for each different kind.
Assuming multiple clients/threads write events (potentially of the same kind) into that table concurrently, is there any difference in the required transaction isolation level between (a) executing the following block:
BEGIN TRANSACTION;
DO
$do$
DECLARE
    nextNum INTEGER;
BEGIN
    SELECT COALESCE(MAX(num), -1) + 1 INTO nextNum FROM event WHERE kind = 'A';
    INSERT INTO event(kind, num) VALUES ('A', nextNum);
END;
$do$;
COMMIT;
... and (b) combining the select and insert into a single statement:
INSERT INTO event(kind, num)
(SELECT 'A', COALESCE(MAX(num),-1)+1 FROM event WHERE kind='A');
From some tests I ran, it seems that in both cases I need the serializable transaction isolation level. What's more, even with serializable isolation, my code has to be prepared to retry due to the following error in highly concurrent situations:
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
In other words, merging the select into the insert does not seem to confer any benefit in terms of atomicity or allow any lower/more lenient transaction isolation level to be set. Am I missing anything?
(This question is broadly related in that it asks for a PostgreSQL pattern to facilitate the generation of multiple sequences inside a table. To be clear, I am not asking for the right pattern to do that sort of thing; I just want to understand whether the block of two statements is in any way different from a single merged INSERT/SELECT statement.)

The problem with the task is possible concurrent write access. Merging SELECT and INSERT into one statement reduces the time frame for possible conflicts to a minimum and is the superior approach in any case. The potential for conflicts is still there. And yes, serializable transaction isolation is one possible (if expensive) solution.
Typically (but that's not what you are asking), the best solution is not to try what you are trying. Gapless sequential numbers are a pain in databases with concurrent write access. If possible, use a serial column instead, which gives you unique ascending numbers - with possible gaps. You can eliminate gaps later, or dynamically in a VIEW. Details:
Serial numbers per group of rows for compound key
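For illustration, a minimal sketch of that serial-plus-VIEW idea (the column event_id and the view name are illustrative names, not from the question; the per-kind number is derived on the fly instead of being stored):
DROP TABLE IF EXISTS event;
CREATE TABLE event(
    event_id serial PRIMARY KEY,  -- unique, ascending, possibly with gaps
    kind     varchar NOT NULL
);
CREATE VIEW event_numbered AS
SELECT kind,
       row_number() OVER (PARTITION BY kind ORDER BY event_id) - 1 AS num,  -- gapless, starts at 0 like the original
       event_id
FROM event;
Writers simply run INSERT INTO event(kind) VALUES ('A'); with no special isolation level, and readers get gapless per-kind numbers from the view.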
Aside: you don't need parentheses around the SELECT:
INSERT INTO event(kind, num)
SELECT 'A', COALESCE(MAX(num) + 1, 0) FROM event WHERE kind='A';
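If you do keep the counter-in-the-table approach together with serializable isolation, the single-statement variant can be wrapped like this (a sketch; the retry on serialization failure, SQLSTATE 40001, still has to live in the calling code):
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO event(kind, num)
SELECT 'A', COALESCE(MAX(num) + 1, 0) FROM event WHERE kind='A';
COMMIT;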

Related

SELECT, check and INSERT if condition is met

I have the following tables: Tour, Customer, TourCustomerXRef. Assume every Tour has a capacity Cap. Now a request arrives at the backend for a booking.
What I need to do is:
SELECT and count() all of the entries in TourCustomerXRef where the tourid=123
In the program code (or the query?): If count() < Cap
True: INSERT into TourCustomerXRef, return success
False: return an error
However, the backend API might be called concurrently. This could result in the SELECT statement returning a count under the maximum capacity both times, so the INSERT gets executed multiple times (even though there is just one place left).
What would be the best prevention for the above case? Setting the transaction isolation level to SERIALIZABLE, or to REPEATABLE READ?
I'm worried that the application logic (even though it's just a simple if) could hurt performance, since read and write locks would not allow the API to execute queries that just want to select the data without inserting anything, right?
(Using MariaDB)
You can try using the locking clause FOR UPDATE in the SELECT (MySQL example; FOR UPDATE goes at the end of the statement):
START TRANSACTION;
SELECT * FROM TourCustomerXRef WHERE ... FOR UPDATE;
INSERT ...;
COMMIT;
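A fuller sketch of that pattern against the tables from the question might look as follows (assuming InnoDB; the column names tourid and customerid and the literal ids are placeholders). Locking the parent Tour row serializes bookings for the same tour, while plain SELECTs that don't insert anything are not blocked:
START TRANSACTION;
-- Lock the tour row so concurrent bookings for the same tour queue up here.
SELECT Cap FROM Tour WHERE tourid = 123 FOR UPDATE;
-- As long as every booking takes this lock first, the count below cannot be
-- changed by a concurrent booking before we commit.
SELECT COUNT(*) FROM TourCustomerXRef WHERE tourid = 123;
-- Application code: if count < Cap, insert; otherwise roll back and report "fully booked".
INSERT INTO TourCustomerXRef (tourid, customerid) VALUES (123, 456);
COMMIT;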

Can an INSERT-SELECT query be subject to race conditions?

I have this query that attempts to add a row to the balances table for every row in the totals table that has no corresponding balance yet. The query is run in a transaction using the default isolation level on PostgreSQL.
INSERT INTO balances (account_id, currency, amount)
SELECT t.account_id, t.currency, 0
FROM balances AS b
RIGHT OUTER JOIN totals AS t USING (account_id, currency)
WHERE b.id IS NULL
I have a UNIQUE constraint on balances (account_id, currency). I'm worried that I will get into a race condition that will lead to duplicate key errors if multiple sessions execute this query concurrently. I've seen many questions on that subject, but they all seem to involve either subqueries, multiple queries or pgSQL functions.
Since I'm not using any of those in my query, is it free from race conditions? If it isn't, how can I fix it?
Yes, it will fail with a "duplicate key value violates unique constraint" error. What I do is place the insertion code in a try/except block, and when the exception is thrown I catch it and retry. It's that simple. Unless the application has a huge number of users it will work flawlessly.
In your query the default isolation level is enough since it is a single insert statement and there is no risk of phantom reads.
Notice that even when setting the isolation level to serializable the try/except block is not avoidable. From the manual about serializable:
like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures
The default transaction level is Read Committed. Phantom reads are possible in this level (see Table 13.1). While you are protected from seeing any weird effects in the totals table were you to update the totals, you are not protected from phantom reads in the balances table.
What this means can be explained by looking at a transaction similar to yours that runs the outer-join query twice (and only queries, does not insert anything). The fact that a balance is missing is not guaranteed to stay the same between the two "peeks" at the balances table. The sudden appearance of a balance that wasn't there when the same transaction looked for the first time is called a "phantom read".
In your case, several concurrent statements can see that a balance is missing and nothing prevents them trying to insert it and error out.
To rule out phantom reads (and to fix your query), you need to set the SERIALIZABLE isolation level prior to running your query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
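Put together, a minimal sketch of the whole transaction (the retry handling quoted above still has to live in the calling code, keyed on SQLSTATE 40001):
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO balances (account_id, currency, amount)
SELECT t.account_id, t.currency, 0
FROM balances AS b
RIGHT OUTER JOIN totals AS t USING (account_id, currency)
WHERE b.id IS NULL;
COMMIT;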

PostgreSql upsert: which explicit locking method?

From this question, I want to use the following method to perform upserts in a PostgreSQL table:
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
I understand I have to perform it in a transaction and that I should use an explicit locking method. I have been reading the PostgreSQL documentation, but I am still hesitating between three types of locks (the difference between them is not crystal clear to me):
ROW EXCLUSIVE
SHARE ROW EXCLUSIVE
EXCLUSIVE
I would like to avoid having to retry the transaction. I am not expecting much concurrency on the rows, though this could happen from time to time. There might be simultaneous attempts at deleting and upserting the same row in rare cases.
You need a self-exclusive lock type that also excludes all DML commands.
The correct lock type for this operation is EXCLUSIVE. It conflicts with all other lock modes, including itself, except ACCESS SHARE.
ACCESS SHARE is taken by SELECT. All other commands require higher locking levels. So your upsert transaction will then block everything except SELECT, which is what you want.
See the docs on explicit locking.
So:
BEGIN;
LOCK TABLE ... IN EXCLUSIVE MODE;
...
(The historical, and awful, naming of some table level locks using the word ROW does not ease understanding of PostgreSQL's lock types).
BTW, your application should check the rowcount returned from the UPDATE. If it's non-zero, it can skip the INSERT.
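Putting that together with the statements from the question, a minimal sketch might look like this (the table is renamed to mytable here because "table" is a reserved word; the other names are the placeholders from the question):
BEGIN;
-- Blocks all concurrent writers (but not plain SELECTs) until we commit.
LOCK TABLE mytable IN EXCLUSIVE MODE;
UPDATE mytable SET field = 'C', field2 = 'Z' WHERE id = 3;
-- Only needed if the UPDATE reported 0 rows affected.
INSERT INTO mytable (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE id = 3);
COMMIT;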

How to exclude one statement from current Sql transaction for Sql id generator?

I would like to implement an id generator so that I have unique record identification across multiple tables and can assign ids to structures of new records formed on the client side.
The usual obvious and standard answer is a Guid, but I want to use an int because of space efficiency and human readability.
It's OK to have gaps in the id sequence - these will happen with unfinished transactions, lost client connections and so on.
For the implementation I would have a table Counters with a field NextId int, and increment that counter any time an id is requested. I may increment it by more than 1 when I need a range of ids for multiple or bulk inserts.
To avoid locking bottlenecks when updating the Counters table, I need to make id requests atomic and outside of any other transaction. So my question is: how do I do that?
It's not a problem at the application level - it can make one atomic transaction to get a pool of ids and then use those ids in another, bigger transaction to insert records.
But what do I do if I want to get new ids inside a stored procedure or trigger?
If I wrap the UPDATE Counters SET NextId = NextId + 1 request in a nested transaction (begin tran ... commit tran), that does not exclude it from locking until the outer, big transaction ends.
Is there any way to exclude that one SQL statement from the current transaction, so that its locks are released as soon as the statement ends and it does not participate in a rollback if the outer transaction is rolled back?
You need to use a second connection. You cannot have multiple transactions at once per connection.
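What runs on that second connection can be a single atomic statement. A minimal sketch, assuming SQL Server (the begin tran ... commit tran syntax in the question suggests it) and a single-row Counters table; the OUTPUT clause returns the reserved range in the same statement:
-- Runs in its own connection/transaction, so its lock is released immediately.
DECLARE @count int = 10;  -- size of the id range to reserve
UPDATE Counters
SET NextId = NextId + @count
OUTPUT deleted.NextId AS RangeStart, inserted.NextId AS RangeEnd;
-- Ids RangeStart .. RangeEnd - 1 are now reserved for this caller.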

Is bulk insert atomic?

I have a table with an auto-increment primary key. I want to insert a bunch of rows into it and get the keys for each of them without additional queries.
START TRANSACTION;
INSERT INTO table (value) VALUES (x),(y),(z);
SELECT LAST_INSERT_ID() AS last_id;
COMMIT;
Could MySQL guarantee, that all data would be inserted in one continuous ordered flow, so I can easily compute id's for each element?
id(z) = last_id;
id(y) = last_id - 1;
id(x) = last_id - 2;
If you begin a transaction and then insert data into a table, that whole table will become locked to that transaction (unless you start playing with transaction isolation levels and/or locking hints).
That is basically the point of transactions: to prevent external operations from altering (in any way) that which you are operating on.
This kind of "Table Lock" is the default behaviour in most cases.
In unusual circumstances you can find that the RDBMS has had certain options set that mean the 'normal' default (Table Locking) behaviour is not what happens for that particular install. If this is the case, you should be able to override the default by specifying that you want a Table Lock as part of the INSERT statement.
EDIT:
I have a lot of experience with MS SQL Server and patchy experience with many other RDBMSs. In all that time I have never found any guarantee that an insert will occur in a specific order.
One reason for this is that the SELECT portion of the INSERT may be computed in parallel. That would mean the data to be inserted is ready out-of-order.
Equally where particularly large volumes of data are being inserted, the RDBMS may identify that the new data will span several pages of memory or disk space. Again, this may lead to parallelised operation.
As far as I am aware, MySQL has a ROW_NUMBER()-type function that you can specify in the SELECT query and whose result you can store in the database. You would then be able to rely upon that field (constructed by you) rather than the auto-increment id field (constructed by the RDBMS).
As far as I know, this will work in pretty much all SQL engines out there.
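A minimal sketch of that idea, assuming MySQL 8.0+ (which has the ROW_NUMBER() window function); target_table, staging and the seq column are hypothetical names, since the question's table and source data are only placeholders:
-- seq is an ordinary column you control, unlike the auto-increment id.
INSERT INTO target_table (value, seq)
SELECT s.value,
       ROW_NUMBER() OVER (ORDER BY s.value) AS seq
FROM staging AS s;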