Is 'FOR UPDATE' needed in SELECT statements when using SERIALIZABLE isolation?

say i have code like this in a SERIALIZABLE transaction:
users = "SELECT * FROM users WHERE account = 'x'"
for (u of users) {
    if (condition)
        "UPDATE users SET foo = 'bar' WHERE id = u.id"
}
do i need to use SELECT FOR UPDATE instead of SELECT?
if yes, then what is the point of serializable isolation above? according to serializable isolation, the current code should work fine.
if no, then what is the point of SELECT FOR UPDATE? is it only useful for lower isolation levels?

From a functional point of view, there is no difference: the SERIALIZABLE level guarantees that the transaction is internally and externally consistent. From an operational point of view, however, there is a difference: the FOR UPDATE clause immediately locks the rows against any concurrent serializable transaction (waiting for any prior locks to be released). Without it, no read locks are taken (most DBs nowadays, including pg, use snapshot reads to satisfy repeatable reads), but if an unserializable interleaving occurs (which always involves a write), one of the transactions will be rolled back and will need to be retried.
So how is it different? Well, let's assume your transaction takes 10 minutes to complete but you know beforehand which rows will be updated. Locking them in advance makes the 10-minute transaction wait for any locks before starting, eliminating the risk that it is rolled back after 5 minutes. It will then lock out any concurrent transactions affecting the same rows until it finishes, instead of those transactions possibly being rolled back.
So, with judicious use of immediate locking semantics, one can reduce the number of concurrency failures/rollbacks. On the other hand, indiscriminate use of immediate locking semantics (i.e. on rows that will not be affected) can kill concurrency.
To sum up: in SERIALIZABLE mode only, the two are functionally equal but operationally different; judicious use can be good, and if in doubt, don't use it at all.
For lower isolation levels, FOR UPDATE is functionally relevant, as it allows e.g. snapshot concurrency to achieve serializability by way of explicit locking directives.
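For illustration, here is a minimal sketch of the questioner's loop with immediate locking, assuming PostgreSQL syntax (the literal id is a hypothetical stand-in for u.id):

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Lock the candidate rows up front; concurrent writers now block here
-- instead of causing a serialization failure later.
SELECT * FROM users WHERE account = 'x' FOR UPDATE;

-- ... the application decides per row whether the condition holds ...
UPDATE users SET foo = 'bar' WHERE id = 42;

COMMIT;

Either way the code should still be prepared to retry on a serialization failure; the FOR UPDATE merely makes that outcome less likely for these rows.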

Related

How is Oracle ACID compliant when it does not honour 'isolation' property fully?

Claim: Oracle does not honour isolation property in ACID properties.
As per Wikipedia page on ACID
"Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."
This can happen only if the transactions are serializable. Yes, Oracle has a transaction level called Serializable but it is not true serializability and is only snapshot isolation.
Read https://blog.dbi-services.com/oracle-serializable-is-not-serializable/
An excerpt from Wiki page of Snapshot isolation (https://en.wikipedia.org/wiki/Snapshot_isolation)
"In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle".
There are weaker isolation levels but they are not sufficient to guarantee that the sequence of transactions would lead to the outcome that would be obtained if they were executed sequentially. To guarantee it, serializability is a must.
Q1) Since Oracle does not provide it (its serializability is not a true one), it does not honor isolation 100 percent. How can it then be called ACID compliant?
Q2) Looks like Oracle was treated here leniently with regard to isolation. Is this leniency extended to other databases as well?
Q3) If we take an unforgiving stance and say (isolation means 100 percent isolation - no less is accepted), won't Oracle's claim of being ACID compliant fall to pieces? What about other relational databases? Will they be able to make the cut or will they fall short like Oracle?
SQL Server has SERIALIZABLE in addition to SNAPSHOT. But, at least in SQL Server, for most practical purposes SERIALIZABLE is useless, as it's too expensive and not really effective. And you use special constructs for the few transactions that actually need to be serialized (i.e. run one at a time).
SERIALIZABLE is too expensive because transaction ordering is accomplished by some combination of eliminating concurrency and generating run-time failures (deadlocks). Both of which are very expensive and troublesome.
SERIALIZABLE is not really effective, because it doesn't actually accomplish complete transaction isolation. To do so would require every transaction to exclusively lock all data it reads, to prevent two transactions from reading the same data and then writing.
The classic example is where two sessions run
SELECT salary FROM emp where id = 1
and then, compute a new value based on the existing in the client, and then
UPDATE emp SET salary = :newSalary WHERE id = 1
The only way to make this work right is to place an exclusive lock on the first read, so a second session can't read too.
In Oracle this is accomplished with SELECT ... FOR UPDATE, and in SQL Server with an UPDLOCK hint. Or with an explicit "application lock": Oracle's DBMS_LOCK, or SQL Server's sp_getapplock.
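A sketch of the fixed pattern in Oracle-style syntax (:newSalary is a bind variable, as above):

-- Take an exclusive row lock on the read, so a concurrent session
-- blocks here instead of reading the same stale salary.
SELECT salary FROM emp WHERE id = 1 FOR UPDATE;

-- ... the client computes :newSalary from the value just read ...

UPDATE emp SET salary = :newSalary WHERE id = 1;
COMMIT;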

What's the point of using high isolation-level when in autocommit mode?

I don't see a reason why anything higher than READ COMMITTED is useful in autocommit mode.
Autocommit ends the transaction after each query, which in turn releases the acquired locks on the selected data. If the locks don't survive across multiple queries, you can't do consistent reads. So a higher isolation level in autocommit mode only causes more data to be locked => BAD
Is that correct?
You seem to assume that a single statement is always safe from concurrency issues just by the fact that it is one statement. This is not true. Let's look at an example to see this intuitively. Compare the following two transactions:
--A
select * from T where ID = 1
select * from T where ID = 2
--B
select * from T where ID IN (1, 2)
Cramming the two reads into a single statement does not avoid any concurrency problems (at least not in all RDBMSes and storage engines). The two transactions have identical locking and consistency properties in SQL Server, for example. Some other RDBMSes use MVCC for each statement, and MVCC does not provide serializability. Only under serializability are you always safe from concurrency problems.
Whether you use one statement or two makes no difference, and the same goes for autocommit or not.
Note that neither version is serializable under READ COMMITTED. So you see: there is a reason not to use READ COMMITTED and autocommit at the same time.
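To make the anomaly concrete, here is one possible interleaving under lock-based READ COMMITTED (a sketch; the val column is a hypothetical addition to T):

-- Session 1 (autocommit):
SELECT * FROM T WHERE ID IN (1, 2);
-- ...reads row 1 and releases its shared lock...

-- Session 2 (autocommit), meanwhile:
UPDATE T SET val = val + 1 WHERE ID IN (1, 2);
-- ...commits immediately.

-- Session 1 then reads row 2: its single result set now mixes the old
-- row 1 with the new row 2, a combination that never existed in the table.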
I can answer regarding MySQL implementation, which may have different implementation details compared to other brands like Oracle or PostgreSQL, etc.
You're right, you should just use READ COMMITTED if you can. In MySQL, it creates fewer locks than REPEATABLE READ, and if you use only autocommit, then you don't need the consistent transaction read view of REPEATABLE READ.
In MySQL, REPEATABLE READ is the default transaction isolation level. That's kind of a pity, since it creates a bit more overhead.
You are not at risk of a single statement seeing different results when reading the same row twice, as #MK supposes. Even in READ COMMITTED, each statement is made "atomic" by creating a short-lived transaction read view. Yet it doesn't block concurrent updates. That's the magic of MVCC!
See also http://www.mysqlperformanceblog.com/2012/08/28/differences-between-read-committed-and-repeatable-read-transaction-isolation-levels/
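If you want to follow that advice, the default can be changed per session or server-wide; a minimal sketch in standard MySQL syntax:

-- For the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or as the server-wide default (requires privileges):
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;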

Some clarifications on different Isolation level in database transaction?

Below is the statement written from Wikipedia's Isolation article about REPEATABLE READS
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data)
until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
My question here is: when does the transaction begin and end, respectively?
If we take the example of non-repeatable reads under the REPEATABLE READS isolation level at the same link, as per my understanding transaction 1 begins when its first query is fired, i.e. SELECT * FROM users WHERE id = 1. The DBMS will keep the lock on the users table until the transaction ends. By "ends" I mean when the connection gets rolled back or committed, not on the completion of SELECT * FROM users WHERE id = 1. Till that time transaction 2 will wait. Right?
Question 2: Now consider the isolation levels and their behaviour as given below (at the same link)
Isolation level     Dirty reads    Non-repeatable reads    Phantoms
Read Uncommitted    may occur      may occur               may occur
Read Committed      -              may occur               may occur
Repeatable Read     -              -                       may occur
Serializable        -              -                       -
As per my understanding the most reliable is Serializable, then Repeatable Read, and then Read Committed, but I have still seen applications using Read Committed. Is that because the performance of Serializable and Repeatable Read is bad in comparison to Read Committed, since under Serializable execution is effectively sequential and a transaction has to wait for another transaction to release its locks? Right? So, to get the best of all three, can we use the Read Committed isolation level with SELECT FOR UPDATE (to achieve repeatable reads)? I'm also not sure how we could prevent phantom reads, if we wanted to, at the Read Committed isolation level?
Oracle does not support the REPEATABLE READ isolation level. However, SQL Server does - and it does place locks on all rows selected by the transaction until it ends (i.e. it's committed or rolled back). So you are correct: this will indeed make other transactions wait (if they are updating the locked data) and can be detrimental to concurrency.
As for question 2: Yes, the higher the isolation level, the worse your concurrent transactions will perform because they have to wait for more locks to be released. I am not sure what you mean by "getting the best of all three" by using SELECT FOR UPDATE because SELECT FOR UPDATE will place row locks on all selected rows.
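For what the asker proposes, a minimal sketch in PostgreSQL-style syntax (hedged: whether this is the "best of all three" depends entirely on the workload):

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- The row lock makes re-reads of this row repeatable for this transaction,
-- but it does not prevent phantoms for a broader predicate.
SELECT * FROM users WHERE id = 1 FOR UPDATE;

-- ... re-reading id = 1 here returns the same, locked data ...

COMMIT;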
And finally, here's a quote from Oracle's manual on phantom reads:
[phantom reads occur when] a transaction reruns a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
For example, a transaction queries the number of employees. Five minutes later it performs the same query, but now the number has increased by one because another user inserted a record for a new hire. More data satisfies the query criteria than before, but unlike in a fuzzy read the previously read data is unchanged.
Reference:
Data Concurrency and Consistency (Oracle)
SET TRANSACTION ISOLATION LEVEL (SQL Server)

Does it make sense to begin a transaction where there will be only some data-retrieval operations?

Does it make sense to begin a transaction where there will be only some data-retrieval operations and no UPDATE or INSERT will occur?
Thanks!
Not normally.
If you have 2 SELECTs, they could become inconsistent in the fraction of a second between the reads.
A transaction won't fix this under SQL Server/Sybase-style locking, because read locks will be released. So you'd need to use a higher isolation level, which will affect concurrency (potentially quite seriously).
The trade off between "tiny risk of inconsistent data" and "loss of performance" is up to you.
Starting a transaction in that case ensures that the data seen will be consistent; some other process will not be able to update the rows you are looking at such that the second SELECT sees something different from the first when it should be seeing the same thing.
Yes, to ensure transaction-level read consistency.
Ensuring Repeatable Reads with Read-Only Transactions
By default, the consistency model for Oracle guarantees statement-level read consistency, but does not guarantee transaction-level read consistency (repeatable reads). If you want transaction-level read consistency, and if your transaction does not require updates, then you can specify a read-only transaction. After indicating that your transaction is read-only, you can execute as many queries as you like against any database table, knowing that the results of each query in the read-only transaction are consistent with respect to a single point in time.
http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg08sql.htm
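For example, in Oracle (the syntax the quoted manual describes; emp is the table from the earlier example):

SET TRANSACTION READ ONLY;

SELECT COUNT(*) FROM emp;
-- ... any number of queries here see one consistent snapshot ...
SELECT SUM(salary) FROM emp;

COMMIT;  -- ends the read-only transaction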

MySQL: Transactions vs Locking Tables

I'm a bit confused with transactions vs locking tables to ensure database integrity and make sure a SELECT and UPDATE remain in sync and no other connection interferes with it. I need to:
SELECT * FROM table WHERE (...) LIMIT 1
if (condition passes) {
// Update row I got from the select
UPDATE table SET column = "value" WHERE (...)
... other logic (including INSERT some data) ...
}
I need to ensure that no other queries will interfere and perform the same SELECT (reading the 'old value' before that connection finishes updating the row).
I know I can default to LOCK TABLES table to just make sure that only 1 connection is doing this at a time, and unlock it when I'm done, but that seems like overkill. Would wrapping that in a transaction do the same thing (ensuring no other connection attempts the same process while another is still processing)? Or would a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE be better?
Locking tables prevents other DB users from affecting the rows/tables you've locked. But locks, in and of themselves, will NOT ensure that your logic comes out in a consistent state.
Think of a banking system. When you pay a bill online, there's at least two accounts affected by the transaction: Your account, from which the money is taken. And the receiver's account, into which the money is transferred. And the bank's account, into which they'll happily deposit all the service fees charged on the transaction. Given (as everyone knows these days) that banks are extraordinarily stupid, let's say their system works like this:
$balance = "GET BALANCE FROM your ACCOUNT";
if ($balance < $amount_being_paid) {
charge_huge_overdraft_fees();
}
$balance = $balance - $amount_being paid;
UPDATE your ACCOUNT SET BALANCE = $balance;
$balance = "GET BALANCE FROM receiver ACCOUNT"
charge_insane_transaction_fee();
$balance = $balance + $amount_being_paid
UPDATE receiver ACCOUNT SET BALANCE = $balance
Now, with no locks and no transactions, this system is vulnerable to various race conditions, the biggest of which is multiple payments being performed on your account, or the receiver's account, in parallel. While your code has your balance retrieved and is doing the huge_overdraft_fees() and whatnot, it's entirely possible that some other payment will be running the same type of code in parallel. It'll also retrieve your balance (say, $100) and do its own deduction (the $30 they're screwing you over with, while you're taking out the $20 you're paying), and now both code paths hold two different balances: $80 and $70. Depending on which one finishes last, you'll end up with either of those two balances in your account, instead of the $50 you should have ended up with ($100 - $20 - $30). In this case, "bank error in your favor".
Now, let's say you use locks. Your bill payment ($20) hits the pipe first, so it wins and locks your account record. Now you've got exclusive use, and can deduct the $20 from the balance, and write the new balance back in peace... and your account ends up with $80 as is expected. But... uhoh... You try to go update the receiver's account, and it's locked, and locked longer than the code allows, timing out your transaction... We're dealing with stupid banks, so instead of having proper error handling, the code just pulls an exit(), and your $20 vanishes into a puff of electrons. Now you're out $20, and you still owe $20 to the receiver, and your telephone gets repossessed.
So... enter transactions. You start a transaction, you debit your account $20, you try to credit the receiver with $20... and something blows up again. But this time, instead of exit(), the code can just do rollback, and poof, your $20 is magically added back to your account.
In the end, it boils down to this:
Locks keep anyone else from interfering with any database records you're dealing with. Transactions keep any "later" errors from interfering with "earlier" things you've done. Neither alone can guarantee that things work out ok in the end. But together, they do.
in tomorrow's lesson: The Joy of Deadlocks.
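Put together, the banking example above might look like this (a sketch in generic SQL; the accounts table is a hypothetical stand-in for the pseudocode's accounts):

START TRANSACTION;

-- Lock both rows up front (exclusive row locks), so no other payment
-- can read-modify-write them while we work.
SELECT balance FROM accounts WHERE id = 'you'      FOR UPDATE;
SELECT balance FROM accounts WHERE id = 'receiver' FOR UPDATE;

UPDATE accounts SET balance = balance - 20 WHERE id = 'you';
UPDATE accounts SET balance = balance + 20 WHERE id = 'receiver';

COMMIT;  -- both updates become visible together;
-- on any error before this point, ROLLBACK undoes them both.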
I've started to research the same topic for the same reasons as you indicated in your question. I was confused by the answers given on SO, as they were partial answers that didn't provide the big picture. After I read a couple of documentation pages from different RDBMS providers, these are my takeaways:
TRANSACTIONS
Statements are database commands, mainly to read and modify the data in the database. Transactions are the scope of single or multiple statement executions. They provide two things:
A mechanism which guarantees that all statements in a transaction are executed correctly, or, in case of a single error, any data modified by those statements is reverted to its last correct state (i.e. rolled back). What this mechanism provides is called atomicity.
A mechanism which guarantees that concurrent read statements can view the data without the occurrence of some or all of the phenomena described below.
Dirty read: A transaction reads data written by a concurrent uncommitted transaction.
Nonrepeatable read: A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
Phantom read: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
Serialization anomaly: The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
What this mechanism provides is called isolation, and the mechanism which lets statements choose which phenomena must not occur in a transaction is called isolation levels.
As an example, this is the isolation-level / phenomena table for PostgreSQL:
Isolation level     Dirty read                Non-repeatable read    Phantom read              Serialization anomaly
Read Uncommitted    allowed, but not in PG    possible               possible                  possible
Read Committed      not possible              possible               possible                  possible
Repeatable Read     not possible              not possible           allowed, but not in PG    possible
Serializable        not possible              not possible           not possible              not possible
If any of the described promises is broken by the database system, the changes are rolled back and the caller is notified about it.
How these mechanisms are implemented to provide these guaranties is described below.
LOCK TYPES
Exclusive locks: When an exclusive lock is acquired over a resource, no other lock (shared or exclusive) can be acquired over that resource. Exclusive locks are always acquired before a modifying statement (INSERT, UPDATE or DELETE) and released after the transaction finishes. To explicitly acquire exclusive locks before a modifying statement you can use hints like FOR UPDATE (PostgreSQL, MySQL) or UPDLOCK (T-SQL).
Shared locks: Multiple shared locks can be acquired over a resource. However, shared locks and exclusive locks cannot be acquired at the same time over a resource. Shared locks might or might not be acquired before a read statement (SELECT, JOIN) depending on the database's implementation of isolation levels.
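For example, both lock types can be taken explicitly on reads; a sketch in PostgreSQL syntax (the accounts table is hypothetical, and MySQL's older spelling of the shared form is LOCK IN SHARE MODE):

BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- exclusive row lock
SELECT * FROM accounts WHERE id = 2 FOR SHARE;   -- shared row lock
COMMIT;  -- both locks are released here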
LOCK RESOURCE RANGES
Row: the single row the statement executes on.
Range: a specific range based on the condition given in the statement (SELECT ... WHERE).
Table: the whole table. (Mostly used to prevent deadlocks on big statements like batch updates.)
As an example, the default shared-lock behaviour of reads at the different isolation levels of SQL Server:
Read Uncommitted    no shared locks taken for reads
Read Committed      shared locks released as soon as each row has been read
Repeatable Read     shared locks held until the end of the transaction
Serializable        shared range locks held until the end of the transaction
DEADLOCKS
One of the downsides of the locking mechanism is deadlocks. A deadlock occurs when a statement enters a waiting state because a requested resource is held by another waiting statement, which in turn is waiting for another resource held by yet another waiting statement. In such a case the database system detects the deadlock and terminates one of the transactions. Careless use of locks can increase the chance of deadlocks, but they can occur even without human error.
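The textbook case is two transactions locking the same two rows in opposite order; a sketch (the accounts table is hypothetical):

-- Session 1:
BEGIN;
UPDATE accounts SET balance = 0 WHERE id = 1;  -- locks row 1

-- Session 2:
BEGIN;
UPDATE accounts SET balance = 0 WHERE id = 2;  -- locks row 2

-- Session 1:
UPDATE accounts SET balance = 0 WHERE id = 2;  -- blocks, waiting on session 2

-- Session 2:
UPDATE accounts SET balance = 0 WHERE id = 1;  -- deadlock detected;
                                               -- the DB rolls one session back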
SNAPSHOTS (DATA VERSIONING)
This is an isolation mechanism which provides a statement with a copy of the data taken at a specific time.
Statement beginning: provides the statement with a copy of the data taken at the beginning of the statement's execution. It also helps the rollback mechanism by keeping this data until the transaction has finished.
Transaction beginning: provides the statement with a copy of the data taken at the beginning of the transaction.
All of those mechanisms together provide consistency.
As for optimistic and pessimistic locks, these are just names for the two broad approaches to the concurrency problem.
Pessimistic concurrency control:
A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.
Optimistic concurrency control:
In optimistic concurrency control, users do not lock data when they read it. When a user updates data, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction is lower than the cost of locking data when read.
For example, by default PostgreSQL uses snapshots to make sure the read data hasn't changed, and rolls back if it has, which is an optimistic approach. SQL Server, however, uses read locks by default to provide these guarantees.
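Optimistic control is also often implemented by hand with a version column; a minimal sketch (the version column is an assumption, not part of any quoted schema):

-- Read without locking, remembering the version seen.
SELECT balance, version FROM accounts WHERE id = 1;

-- Write only if nobody changed the row in the meantime.
UPDATE accounts
SET balance = 80, version = version + 1
WHERE id = 1 AND version = 7;   -- 7 = the version read above

-- If this updates 0 rows, another writer got there first:
-- roll back and retry, as the quoted description says.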
The implementation details may change depending on the database system you choose. However, according to the database standards, they all need to provide the stated transaction guarantees in one way or another using these mechanisms. If you want to know more about the topic or about specific implementation details, below are some useful links.
SQL-Server - Transaction Locking and Row Versioning Guide
PostgreSQL - Transaction Isolation
PostgreSQL - Explicit Locking
MySQL - Consistent Nonlocking Reads
MySQL - Locking
Understanding Isolation Levels (Video)
You want a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE inside a transaction, as you said, since normally SELECTs, no matter whether they are in a transaction or not, will not lock a table. Which one you choose would depend on whether you want other transactions to be able to read that row while your transaction is in progress.
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
START TRANSACTION WITH CONSISTENT SNAPSHOT will not do the trick for you, as other transactions can still come along and modify that row. This is mentioned right at the top of the link below.
If other sessions simultaneously
update the same table [...] you may
see the table in a state that never
existed in the database.
http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html
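Applied to the question's own pattern, a minimal sketch assuming InnoDB (the (...) placeholders and names are the question's own):

START TRANSACTION;

-- Exclusively lock the matched row; another connection running the same
-- SELECT ... FOR UPDATE blocks here until we commit.
SELECT * FROM table WHERE (...) LIMIT 1 FOR UPDATE;

-- if (condition passes):
UPDATE table SET column = 'value' WHERE (...);
-- ... other logic (including the INSERT) ...

COMMIT;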
Transactions and locks are different concepts. However, transactions use locks to help them follow the ACID principles.
If you want the table to prevent others from reading/writing at the same point in time while you are reading/writing, you need a lock.
If you want to make sure of data integrity and consistency, you had better use transactions.
I think this mixes up the concept of isolation levels in transactions with locks. Please look up transaction isolation levels; SERIALIZABLE should be the level you want.
I had a similar problem when attempting an IF NOT EXISTS ... check and then performing an INSERT, which caused a race condition when multiple threads were updating the same table.
I found the solution to the problem here: How to write INSERT IF NOT EXISTS queries in standard SQL
I realise this does not directly answer your question, but the same principle of performing a check and an insert as a single statement is very useful; you should be able to modify it to perform your update.
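The single-statement trick from that link looks roughly like this (a sketch; table_name, id and col are hypothetical, and a UNIQUE key on id is assumed to make it fully race-free):

-- Check and insert in one atomic statement:
INSERT INTO table_name (id, col)
SELECT 1, 'value' FROM dual
WHERE NOT EXISTS (SELECT 1 FROM table_name WHERE id = 1);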
I'd use a
START TRANSACTION WITH CONSISTENT SNAPSHOT;
to begin with, and a
COMMIT;
to end with.
Anything you do in between is isolated from the other users of your database, provided your storage engine supports transactions (which InnoDB does).
You are confusing locks and transactions. They are two different things in an RDBMS. Locks prevent concurrent operations, while transactions focus on data isolation. Check out this great article for clarification and a graceful solution.