PostgreSQL upsert: which explicit locking method?

From this question, I want to use the following method to perform upserts in a PostgreSQL table:
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
I understand I have to perform it in a transaction and that I should use an explicit locking method. I have been reading the PostgreSQL documentation, but I am still hesitating between three lock types (the difference between them is not crystal clear to me):
ROW EXCLUSIVE
SHARE ROW EXCLUSIVE
EXCLUSIVE
I would like to avoid having to retry the transaction. I am not expecting much concurrency on the rows, though this could happen from time to time. There might be simultaneous attempts at deleting and upserting the same row in rare cases.

You need a self-exclusive lock type that also excludes all DML commands.
The correct lock type for this operation is EXCLUSIVE. It conflicts with all other lock modes except ACCESS SHARE, including itself.
ACCESS SHARE is taken by SELECT. All other commands require higher locking levels. So your upsert transaction will block everything except SELECT, which is what you want.
See the docs on explicit locking.
So:
BEGIN;
LOCK TABLE ... IN EXCLUSIVE MODE;
...
(The historical, and awful, naming of some table-level locks using the word ROW does not make PostgreSQL's lock types any easier to understand.)
BTW, your application should check the rowcount returned from the UPDATE. If it's non-zero, it can skip the INSERT.
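Putting it together, a minimal sketch of the whole transaction (using a hypothetical table name mytable, since table itself is a reserved word):
BEGIN;
LOCK TABLE mytable IN EXCLUSIVE MODE;
UPDATE mytable SET field = 'C', field2 = 'Z' WHERE id = 3;
-- if the UPDATE reported a non-zero row count, the INSERT can be skipped
INSERT INTO mytable (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE id = 3);
COMMIT;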


Some confusion on the description of read consistency in Oracle

Below is a short description of read consistency from the Oracle Concepts Guide.
What counts as a SQL statement here: just one SQL statement? Or also a PL/SQL block or stored procedure? Can anyone provide a counterexample that illustrates an inconsistent read?
read consistency
A consistent view of data seen by a user. For example, in statement-level read
consistency the set of data seen by a SQL statement remains constant throughout
statement execution.
A "statement" in this context is one DML statement: a single SELECT, INSERT, UPDATE, DELETE, MERGE.
It is not a PL/SQL block. Similarly, multiple executions of the same DML statement (say, within a PL/SQL loop) are separate "statements". If you need consistency over multiple statements or within a PL/SQL block, you can achieve that using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE or SET TRANSACTION READ ONLY. Both introduce limitations.
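For instance, a minimal sketch of multi-statement consistency using a read-only transaction (hypothetical table and column names):
SET TRANSACTION READ ONLY;
SELECT COUNT(*) FROM big_table;
-- this second query sees the same snapshot as the COUNT above
SELECT SUM(amount) FROM big_table;
COMMIT;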
An opposite example of an inconsistent read would be as follows.
Starting conditions: table BIG_TABLE has 10 million rows.
User A at 10:00:
SELECT COUNT(*) FROM BIG_TABLE;
User B at 10:01:
DELETE FROM BIG_TABLE WHERE ID >= 9000000; -- delete the last million rows
User B at 10:02:
COMMIT;
User A at 10:03: query completes:
COUNT(*)
--------------
9309129
That is wrong. User A should have gotten either 10 million rows or 9 million rows. At no point were there 9,309,129 committed rows in the table. What has happened is that user A read 309,129 of the rows user B was deleting before Oracle actually processed the deletion (or before the COMMIT). Then, after user B's delete/commit, user A's query stopped seeing the deleted rows and stopped counting them.
This sort of problem is impossible in Oracle, thanks to its implementation of Multiversion Read Consistency.
In Oracle, in the above situation, as it encountered blocks that had rows deleted (and committed) by user B, user A's query would have used the UNDO data to reconstruct what those blocks looked like at 10:00 -- the time when user A's query started.
That's basically it -- Oracle statements operate on a version of the database as it existed at a single point in time. This point in time is almost always the time when the statement started. There are some exceptional cases involving updates when that point in time is moved "mid statement". But it is always consistent as of one point in time or another.

Postgres - Deadlock while updating a column that is also part of where clause

My colleague at work and I were wondering: if, during an UPDATE, a column is being updated while the same column is used in the WHERE clause, is there a chance of deadlock?
For ex:
UPDATE EMPLOYEES
SET DEPT_ID = NULL
WHERE DEPT_ID = 13;
So if the table EMPLOYEES contains about a million records, is there a chance of a deadlock?
There is no chance of a deadlock at all. Not only will a single query never deadlock itself in Postgres (see comments), there is also no chance of a deadlock in combination with the same query in a concurrent transaction.
The minimum "requirements" for a deadlock:
At least two competing concurrent transactions.
Each must lock a resource that the other will try to access later.
Each must then try to access a resource already locked by the other transaction, so that both end up waiting for each other to finish (as sketched below).
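For illustration, a minimal sketch of the classic out-of-order case in Postgres (hypothetical table t with rows id = 1 and id = 2):
-- session 1:
BEGIN;
UPDATE t SET v = 1 WHERE id = 1;
-- session 2:
BEGIN;
UPDATE t SET v = 2 WHERE id = 2;
-- session 1 now blocks, waiting for session 2's row lock:
UPDATE t SET v = 1 WHERE id = 2;
-- session 2 would in turn wait for session 1; Postgres detects the
-- cycle and aborts one of the two transactions with a deadlock error:
UPDATE t SET v = 2 WHERE id = 1;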
In theory, two concurrent, identical calls like the one you display have the potential for a deadlock if there are multiple rows with the same DEPT_ID. Since there is no ORDER BY for an UPDATE, it can take exclusive row locks on the rows to update in any arbitrary order. Two identical commands might start with different rows and end up deadlocking each other.
In practice, this is not going to happen, because both concurrent updates will take locks in the same order, removing any potential for deadlocks. We would need additional concurrent transactions, or more commands in the same transaction, trying to lock resources out of order.
But all of this is completely unrelated to the fact that a column to be updated is also in the WHERE clause. (Even if indexes on the column are involved.) Due to the MVCC model of Postgres, it writes a new row version anyway, no matter which columns are actually updated.
If you do run into deadlocks involving out-of-order row locks, you can solve them using SELECT .. FOR UPDATE with a deterministic ORDER BY in a subquery:
Avoiding PostgreSQL deadlocks when performing bulk update and delete operations
Postgres, update and lock ordering
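A minimal sketch of that technique against the EMPLOYEES table from the question (assuming a hypothetical primary key column emp_id):
UPDATE employees e
SET    dept_id = NULL
FROM  (SELECT emp_id
       FROM   employees
       WHERE  dept_id = 13
       ORDER  BY emp_id
       FOR    UPDATE) sub
WHERE  e.emp_id = sub.emp_id;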

When does SQL exclusively lock a row in an UPDATE statement?

Can a race condition occur in SQL under these conditions?
If I have this SQL update running in one thread call it statement 1:
UPDATE Items
SET Flag = 'B'
WHERE Flag = 'A';
And this SQL update running in another call it statement 2:
UPDATE Items
SET Flag = 'C'
WHERE Flag = 'A';
Is it possible for each thread to read the same record where Flag equals 'A' and write the record with its own value, such that statement 1 writes it first and then statement 2 overwrites it, or vice versa?
The answer to this question depends on when the database exclusively locks the update. Does it happen before it finds the records, or after it finds the records and evaluates the WHERE clause?
First, there are three lock contexts:
Database level lock
Table level lock
Row level lock
Then you have four lock modes:
IX
IS
X
S
IX and IS locks are "intention" locks. These locks are held before acquiring other types of locks. X locks are exclusive (write) locks and S locks are shared (read) locks.
These locks (IX, IS, X, or S) can be taken at any context level. An X lock at the database level will block all other operations in the database, for example. This is the type of lock that SQLite takes: an S lock is taken for the entire database during reads, and an X lock is taken for the entire database during writes. Writes will wait for any S locks to be released and will block new S and X locks until the write lock is released. This provides a serializable transaction isolation level.
For MySQL, the locking depends on the storage engine. MyISAM will take X and S locks on entire (sets of) tables. X locks will wait on existing S or X locks and block new locks. New X locks will be given higher priority in the queue, moved ahead of new S locks. This behavior can be changed by setting LOW_PRIORITY_UPDATES, which could result in write starvation because writes will be de-prioritized in favor of reads.
It is possible in MySQL to lock the entire database against writes using 'FLUSH TABLES WITH READ LOCK'.
InnoDB locks rows as they are encountered via an index read: the index records that are traversed are locked. InnoDB uses special locks called 'gap' locks to ensure the REPEATABLE-READ transaction isolation level. Locks are held on index entries, so if a table is not well indexed for an UPDATE query, many rows will be locked. Note that InnoDB does not take S locks for normal SELECT queries; it uses row versioning, not row-level locking, for consistent snapshots.
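For instance, a plain SELECT in InnoDB reads a snapshot without taking row locks, while locking reads must be requested explicitly (a sketch, hypothetical table t):
SELECT * FROM t WHERE id = 1;                     -- snapshot read, no row locks
SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- takes an S lock on the row
SELECT * FROM t WHERE id = 1 FOR UPDATE;          -- takes an X lock on the row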
When acquiring X locks, the database needs to detect deadlocks. Consider the following:
>connection 1
start transaction;
update T set c = c + 1 order by id asc;
>connection 2
start transaction;
update T set c = c - 1 order by id desc;
In a row-locking model, these two statements cannot both complete successfully. The first would wait forever to acquire locks the second holds, and vice versa. The database will pick one of the connections to roll back; InnoDB will pick the connection that has made the fewest changes. MyISAM will lock the whole table for whichever connection acquires the lock first, and the second will run after the first completes.
The simple example you give will be resolved by X locks at any context (database, table, or row). If two connections begin at exactly the same time, both running updates that try to update the same row, both will attempt to acquire an X lock. Only one connection can acquire it, and it is not possible to determine exactly which one will. The other connection will have to wait until the lock is released before it can acquire the X lock. Keep in mind that if the row was locked by a DELETE or UPDATE, the waiter might end up not acquiring a lock after waiting, because there is nothing left in the database to lock.
In your example, the first UPDATE will acquire the X lock; the second UPDATE will then wait on the X lock and will eventually execute, but match no rows.
An exclusive lock, used for data-modification operations such as INSERT, UPDATE, or DELETE, will be used in this scenario.
An exclusive lock ensures that multiple updates cannot be made to the same resource at the same time.
You will not get a race condition in this scenario.
If you have a more complex scenario involving multiple tables, then you may get race conditions or deadlocks. There are many ways to avoid this: simplifying and separating queries, etc.
You can also apply hints to queries that tell SQL Server what type of lock to use.
http://msdn.microsoft.com/en-us/library/aa213026(v=sql.80).aspx
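For example, a minimal sketch of forcing an exclusive table lock via a hint (using the Items table from the question):
UPDATE Items WITH (TABLOCKX)
SET Flag = 'B'
WHERE Flag = 'A';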
Sounds like you should read about locking. SQL Server has a complex set of logic and will take either table- or row-level locks based on the number of rows it estimates will require updates. Unless you specifically tell it which you want, it can even vary from query to query. Usually, if you are modifying a small subset of the table, it will choose a row-level lock.
SQL Server is designed with ACID in mind: it writes changes to its logs before performing any actual updates to the data. This allows failed updates to be rolled back and provides consistency between queries (like the one you're asking about). You can perform dirty reads to get around locking issues; however, you cannot prevent SQL Server from locking inserted, updated, and/or deleted records.
SQL Server Locking
EDIT: Here is an article about ACID.
ACID - Wikipedia
All SQL databases pretty much guarantee that such a collision will not occur. When locking occurs depends on whether locking is at the table, partition, page, or row level -- or whether you have turned such locking off in your database.
What can happen, if you have concurrent update statements and multiple rows being updated, is that some rows are updated by the first statement and some by the second.
In general, I think of the WHERE clause as being evaluated to select the row set, then the rows are locked one at a time, updated, and unlocked. However, this depends on the type of locking. With row-at-a-time locking, the scenario above could continue with the values flipping back and forth.
If you are concerned about this situation, use table level locking to force serialization when concurrent update requests are being processed.

Is bulk insert atomic?

I have a table with an auto-increment primary key. I want to insert a bunch of data into it and get the key of each row without additional queries.
START TRANSACTION;
INSERT INTO table (value) VALUES (x),(y),(z);
SELECT LAST_INSERT_ID() AS last_id;
COMMIT;
Can MySQL guarantee that all rows will be inserted in one continuous, ordered sequence, so I can easily compute the id of each element?
id(z) = last_id;
id(y) = last_id - 1;
id(x) = last_id - 2;
If you begin a transaction and then insert data into a table, the whole table will become locked to that transaction (unless you start playing with transaction isolation levels and/or locking hints).
That is basically the point of transactions: to prevent external operations from altering (in any way) that which you are operating on.
This kind of "table lock" is the default behaviour in most cases.
In unusual circumstances you can find that the RDBMS has had certain options set that mean the 'normal' default (table-locking) behaviour is not what happens for that particular install. If this is the case, you should be able to override the default by specifying that you want a table lock as part of the INSERT statement, as sketched below.
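In SQL Server syntax, for example, such an override might look like this (a sketch with hypothetical table names):
INSERT INTO target WITH (TABLOCK) (value)
SELECT value FROM source;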
EDIT:
I have a lot of experience with MS SQL Server and patchy experience with many other RDBMSs. In none of that time have I found any guarantee that an insert will occur in a specific order.
One reason for this is that the SELECT portion of the INSERT may be computed in parallel. That would mean the data to be inserted is ready out-of-order.
Equally where particularly large volumes of data are being inserted, the RDBMS may identify that the new data will span several pages of memory or disk space. Again, this may lead to parallelised operation.
As far as I am aware, MySQL has a ROW_NUMBER() type of function that you could include in the SELECT query and whose result you can store in the database. You would then be able to rely upon that field (constructed by you) but not the auto-increment id field (constructed by the RDBMS).
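A minimal sketch of that idea (assuming MySQL 8.0+, which has the ROW_NUMBER() window function, and hypothetical table names):
INSERT INTO t (seq, value)
SELECT ROW_NUMBER() OVER (ORDER BY value), value
FROM staging;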
As far as I know, this will work in pretty much all SQL engines out there.

Getting deadlocks in MySQL

We're getting deadlocks in MySQL, which is very frustrating. It isn't because of exceeding a lock timeout, as the deadlocks happen instantly when they occur. Here's the SQL code that is executing on 2 separate threads (with 2 separate connections from the connection pool) that produces a deadlock:
UPDATE Sequences SET Counter = LAST_INSERT_ID(Counter + 1) WHERE Sequence IS NULL
Sequences table has 2 columns: Sequence and Counter
The LAST_INSERT_ID allows us to retrieve this updated counter value, as per MySQL's recommendation. That works perfectly for us, but we get these deadlocks! Why are we getting them, and how can we avoid them?
Thanks so much for any help with this.
EDIT: this is all in a transaction (required since I'm using Hibernate), and AUTO_INCREMENT doesn't make sense here. I should've been clearer. The Sequences table holds many sequences (in our case about 100 million of them). I need to increment a counter and retrieve that value. AUTO_INCREMENT plays no role in any of this; this has nothing to do with ids or PRIMARY KEYs.
Wrap your SQL statements in a transaction. If you aren't using a transaction, you will get a race condition on LAST_INSERT_ID.
But really, you should make the counter field AUTO_INCREMENT, so you let MySQL handle this.
A third solution is to use LOCK TABLES to lock the sequence table so no other process can access it concurrently. This is probably the slowest solution unless you are using InnoDB.
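A minimal sketch of that approach (the UPDATE is the statement from the question):
LOCK TABLES Sequences WRITE;
UPDATE Sequences SET Counter = LAST_INSERT_ID(Counter + 1) WHERE Sequence IS NULL;
SELECT LAST_INSERT_ID() AS next_value;
UNLOCK TABLES;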
Deadlocks are a normal part of any transactional database, and can occur at any time. Generally, you are supposed to write your application code to handle them, as there is no surefire way to guarantee that you will never get a deadlock. That being said, there are situations that increase the likelihood of deadlocks occurring, such as the use of large transactions, and there are things you can do to mitigate their occurrence.
First thing, you should read this manual page to get a better understanding of how you can avoid them.
Second, if all you're doing is updating a counter, you should really, really, really be using an AUTO_INCREMENT column for Counter rather than relying on a "select then update" process, which as you have seen is a race condition that can produce deadlocks. Essentially, the AUTO_INCREMENT property of your table column will act as a counter for you.
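A minimal sketch of what that could look like (hypothetical table name; note the asker's edit above explains why this may not fit their case):
CREATE TABLE counter_demo (
  id BIGINT AUTO_INCREMENT PRIMARY KEY
);
INSERT INTO counter_demo () VALUES ();
SELECT LAST_INSERT_ID();  -- the freshly allocated counter value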
Finally, I'm going to assume that you have that update statement inside a transaction, as this would produce frequent deadlocks. If you want to see it in action, try the experiment listed here. That's exactly what's happening with your code... two threads are attempting to update the same records at the same time before one of them is committed. Instant deadlock.
Your best solution is to figure out how to do it without a transaction, and AUTO_INCREMENT will let you do that.
No other SQL involved? Seems a bit unlikely to me.
The 'WHERE Sequence IS NULL' clause probably causes a full table scan, causing read locks to be acquired on every row/page/... .
This becomes a problem if (your particular engine does not use MVCC and) there was an INSERT that preceded your UPDATE within the same transaction. That INSERT would have acquired an exclusive lock on some resource (row/page/...), which causes any other thread's attempt to acquire a read lock to wait. So two connections can each do their INSERT first, causing each of them to hold an exclusive lock on some small portion of the table, and then both try to do your UPDATE, requiring each of them to acquire a read lock on the entire table.
I managed to do this using a MyISAM table for the sequences.
I then have a function called getNextCounter that does the following:
performs a SELECT sequence_value FROM sequences where sequence_name = 'test';
performs the update: UPDATE sequences SET sequence_value = LAST_INSERT_ID(last_retrieved_value + 1) WHERE sequence_name = 'test' AND sequence_value = last_retrieved_value;
repeats in a loop until both queries are successful, then retrieves the last insert id.
As it is a MyISAM table, it won't be part of your transaction, so the operation won't cause any deadlocks. A sketch of the loop follows.
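A minimal sketch of getNextCounter as SQL, with the driver logic in comments (:last_value stands for the value read in step 1):
-- step 1: read the current value and remember it as :last_value
SELECT sequence_value FROM sequences WHERE sequence_name = 'test';
-- step 2: optimistic update; matches 0 rows if another session
-- incremented the counter in the meantime
UPDATE sequences
SET sequence_value = LAST_INSERT_ID(:last_value + 1)
WHERE sequence_name = 'test'
  AND sequence_value = :last_value;
-- step 3: if the UPDATE matched 0 rows, loop back to step 1;
-- otherwise fetch the newly assigned value:
SELECT LAST_INSERT_ID();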