Difference between "read commited" and "repeatable read" in SQL Server - sql

These two isolation levels seem very alike to me. Could someone please describe, with some nice examples, what the main difference is?

Read committed is an isolation level that guarantees that any data read was committed at the moment it is read. It simply restricts the reader from seeing any intermediate, uncommitted, 'dirty' data. It makes no promise whatsoever that if the transaction re-issues the read, it will find the same data; data is free to change after it was read.
Repeatable read is a higher isolation level that, in addition to the guarantees of the read committed level, also guarantees that any data read cannot change: if the transaction reads the same data again, it will find the previously read data in place, unchanged, and available to read.
The next isolation level, serializable, makes an even stronger guarantee: in addition to everything repeatable read guarantees, it also guarantees that no new data can be seen by a subsequent read.
Say you have a table T with a column C that holds one row, say with the value '1'. And consider you have a simple task like the following:
-- Run this under the isolation level being compared, e.g.:
-- SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM T;           -- first read
WAITFOR DELAY '00:01:00';  -- give concurrent transactions a minute to modify T
SELECT * FROM T;           -- second read
COMMIT;
That is a simple task that issues two reads from table T, with a delay of one minute between them.
Under READ COMMITTED, the second SELECT may return any data. A concurrent transaction may update the record, delete it, or insert new records. The second SELECT will always see the new data.
Under REPEATABLE READ, the second SELECT is guaranteed to display at least the rows that were returned from the first SELECT, unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted or changed.
Under SERIALIZABLE, the second SELECT is guaranteed to see exactly the same rows as the first. No row can be changed or deleted, and no new rows can be inserted by a concurrent transaction.
If you follow the logic above, you can quickly realize that SERIALIZABLE transactions, while they may make life easy for you, completely block every possible concurrent operation, since they require that nobody can modify, delete, or insert any row. The default transaction isolation level of the .NET System.Transactions scope is serializable, and this usually explains the abysmal performance that results.
And finally, there is also the SNAPSHOT isolation level. SNAPSHOT isolation level makes the same guarantees as serializable, but not by requiring that no concurrent transaction can modify the data. Instead, it forces every reader to see its own version of the world (its own 'snapshot'). This makes it very easy to program against as well as very scalable as it does not block concurrent updates. However, that benefit comes with a price: extra server resource consumption.
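As a side note, SNAPSHOT is opt-in in SQL Server: the database has to have row versioning enabled before a session can request it. A minimal sketch, assuming a database named MyDb:
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- then, in any session against that database:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;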
Supplemental reads:
Isolation Levels in the Database Engine
Concurrency Effects
Choosing Row Versioning-based Isolation Levels

Repeatable Read
The state of the database is maintained from the start of the transaction. If you retrieve a value in session1, then update that value in session2, retrieving it again in session1 will return the same results. Reads are repeatable.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
Read Committed
Within the context of a transaction, you will always retrieve the most recently committed value. If you retrieve a value in session1, update it in session2, then retrieve it in session1 again, you will get the value as modified in session2. It reads the last committed row.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Bob
Makes sense?

Simply put, the answer according to my reading and understanding of this thread and @remus-rusanu's answer is based on this simple scenario:
There are two transactions, A and B.
Transaction B is reading table X.
Transaction A is writing to table X.
Transaction B is reading table X again.
ReadUncommitted: Transaction B can read uncommitted data from Transaction A, so it may see different rows depending on A's writes. No locks at all.
ReadCommitted: Transaction B can read ONLY committed data from Transaction A, so it may see different rows depending on A's committed writes only. Could we call it a simple lock?
RepeatableRead: Transaction B will read the same data (rows) whatever Transaction A is doing. But Transaction A can change other rows. Row-level block.
Serializable: Transaction B will read the same rows as before, and Transaction A cannot write to the table. Table-level block.
Snapshot: every transaction works on its own copy (its own snapshot); each one has its own view. (A quick reference for choosing these levels follows below.)
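For quick reference, this is how a session requests each of these levels in SQL Server (SNAPSHOT additionally requires ALLOW_SNAPSHOT_ISOLATION to be ON for the database, as shown earlier):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- the default
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;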

Old question which has an accepted answer already, but I like to think of these two isolation levels in terms of how they change the locking behavior in SQL Server. This might be helpful for those who are debugging deadlocks like I was.
READ COMMITTED (default)
Shared locks are taken in the SELECT and then released when the SELECT statement completes. This is how the system can guarantee that there are no dirty reads of uncommitted data. Other transactions can still change the underlying rows after your SELECT completes and before your transaction completes.
REPEATABLE READ
Shared locks are taken in the SELECT and then released only after the transaction completes. This is how the system can guarantee that the values you read will not change during the transaction (because they remain locked until the transaction finishes).
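A minimal two-session sketch that makes the difference visible, reusing the table T from the earlier example:
-- Session 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM T;   -- shared locks taken and held until the transaction ends
-- Session 2, run now, blocks here until session 1 commits:
--   UPDATE T SET C = 2;
COMMIT;            -- shared locks released; session 2's UPDATE proceeds
Under READ COMMITTED, the shared locks would already have been released when the first SELECT completed, so session 2's UPDATE would not block.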

Trying to explain this doubt with simple diagrams.
Read Committed: in this isolation level, Transaction T1 will read the updated value of X committed by Transaction T2.
Repeatable Read: in this isolation level, Transaction T1 will not see the changes committed by Transaction T2.

I think this picture can also be useful; it helps me as a reference when I want to quickly remember the differences between isolation levels (thanks to kudvenkat on YouTube).

Please note that the 'repeatable' in repeatable read applies to a tuple, not to the entire table. Under the ANSI isolation levels, the phantom read anomaly can still occur: reading a table with the same WHERE clause twice may return different result sets. Literally, it's not repeatable.
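A hypothetical transcript in the style of the earlier examples, assuming a names table that holds the single row 'Aaron', shows the phantom under ANSI REPEATABLE READ:
session1> BEGIN;
session1> SELECT COUNT(*) FROM names WHERE firstname LIKE 'A%';
1
session2> BEGIN;
session2> INSERT INTO names (id, firstname) VALUES (8, 'Alice');
session2> COMMIT;
session1> SELECT COUNT(*) FROM names WHERE firstname LIKE 'A%';
2
The second COUNT sees a phantom row, even though every row read the first time is unchanged.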

My observation on the initially accepted solution:
Under RR (MySQL's default): if a transaction is open and a SELECT has been fired, another transaction canNOT delete any row belonging to the previous read's result set until the previous transaction is committed (in fact, the DELETE statement in the new transaction will just hang); however, the next transaction can delete all rows from the table without any trouble. By the way, the next READ in the previous transaction will still see the old data until its transaction is committed.

Related

How can this SQL command cause phantom read?

In short, my professor said the following transactions are susceptible to phantom reads if they're both left at the default isolation level (READ COMMITTED):
-- Transaction 1
BEGIN;
UPDATE product SET price = 100 WHERE type='Electronics';
COMMIT;
-- Transaction 2
BEGIN;
UPDATE product SET price = 100 WHERE price < 100;
COMMIT;
I can't really seem to figure out how a phantom read could happen.
He also said that, to fix this, you'd have to set the second transaction to REPEATABLE READ.
So... why? How could a phantom read happen here, and why does REPEATABLE READ fix it?
EDIT: could this be the case?
Say we have an initial product P that has type = Electronics AND price = 1049.
T1 would begin, and add P to the set of rows to consider.
T2 would begin, and ignore P (its price is below 1050).
T1 would increment its price to 1100, and COMMIT.
Now T2 should update its row set and include P.
But since in READ COMMITTED a transaction gets an updated snapshot only if changes are made to rows within the set it is considering, the change goes unnoticed.
T2, therefore, simply ignores P, and COMMITs.
This scenario was suggested by the example I found in the PostgreSQL docs, on the isolation level page, in the read committed paragraph.
Do you think this is a possible scenario and, hopefully, what my professor meant?
A phantom read means that if you run the same SELECT twice in a transaction, the second one could get different results than the first.
In the words of the SQL standard:
SQL-transaction T1 reads the set of rows N that satisfy some <search condition>.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1.
If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
This can be caused by concurrent data modifications like the ones you quote at low isolation levels. This is because each query will see all committed data, even if they were committed after the transaction started.
You could also speak of phantom reads in the context of an UPDATE statement, since it also reads from the table. Then the same UPDATE can affect different rows if it is run twice.
However, it makes no sense to speak of phantom reads in the context of the two statements in your question: the second one modifies the column it is searching for, so the second execution will read different rows whether or not there are concurrent data modifications.
Note: The SQL standard does not require that REPEATABLE READ transactions prevent phantom reads — this is only guaranteed with SERIALIZABLE isolation.
In PostgreSQL phantom reads are already impossible at REPEATABLE READ isolation, because it uses snapshots that guarantee a stable view of the database.
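A quick way to see this in PostgreSQL, reusing the product table from the question (a sketch, not part of the original answer):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM product WHERE price < 100;  -- snapshot taken at first query
-- rows inserted and committed by other sessions from now on stay invisible
SELECT count(*) FROM product WHERE price < 100;  -- same result: no phantoms
COMMIT;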
This might help
https://en.wikipedia.org/wiki/Isolation_(database_systems)
Non-repeatable reads
A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
The non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT, or when the acquired locks on affected rows are released as soon as the SELECT operation is performed. Under the multiversion concurrency control method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.
Phantom reads
A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read. This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of non-repeatable reads, where Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERTs) new rows in the target table which fulfil that WHERE clause.

Atomic update in SQL with two queries

Assume that I want to do an atomic update of a column with two queries:
SELECT num FROM table WHERE id = ?
Put num inside variable x
UPDATE table SET num = x + 1 WHERE id = ?
And I run these two queries inside a transaction with isolation level repeatable read.
Is it possible to have a deadlock situation?
Let's say there are two transactions running.
T1 reads the value num and acquires a shared lock on this row.
T2 reads the value num and acquires a shared lock on this row.
T1 now wants the exclusive lock to write, but cannot get it because T2 holds a shared lock.
T2 now wants the exclusive lock to write, but cannot get it because T1 holds a shared lock.
We have a deadlock here (a sketch of this scenario follows below).
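For reference, a minimal T-SQL sketch of that scenario (the table and column names are hypothetical); in SQL Server, taking an update lock at read time is the usual way to avoid this lock-conversion deadlock:
-- Both sessions run this block concurrently:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT num FROM t WHERE id = 1;      -- shared lock, held until COMMIT
-- ... the application computes x = num + 1 ...
UPDATE t SET num = 2 WHERE id = 1;   -- needs an exclusive lock; deadlocks if
                                     -- the other session holds its shared lock
COMMIT;
-- Deadlock-free variant: take an update lock up front, so the two readers
-- serialize instead of both holding shared locks:
-- SELECT num FROM t WITH (UPDLOCK) WHERE id = 1;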
But when I tried it in code with Spring Data JPA, there was no deadlock. Instead I got a race condition, and the final incremented value was less than expected. If I move to isolation level serializable, then I do get a deadlock, and there is no race condition; the final incremented value is correct.
I thought serializable only added range locks. Why does the deadlock happen at the serializable isolation level and not at repeatable read, with the two queries above?
PS: Don't tell me to just use one single UPDATE statement; I am trying to learn how to manually do an atomic update with two statements and learn more about transactions here.

Generating a sequence number in a PostgreSQL database - concurrency & isolation levels

Similar question: Update SQL with consecutive numbering
I want to be able to generate a sequence number by incrementing a column num in a table called SeqNum. The SeqNum table layout:
|num|
|===|
| 0 |
The query being run:
BEGIN TRANSACTION;
UPDATE SeqNum
SET num = num + 1;
SELECT num FROM SeqNum;
COMMIT TRANSACTION;
My question is: if I have multiple processes running this query at the same time with a READ COMMITTED isolation level, will the SELECT clause always return a unique, updated value? I'm assuming this will be consistent and no two processes would ever return the same num... Obviously this is all running in the one transaction. If it wasn't in a transaction, I would expect it to potentially return duplicate values.
I'm not sure how the behavior changes (if at all) depending on the isolation level.
In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there are only three distinct isolation levels, which correspond to the levels Read Committed, Repeatable Read, and Serializable. When you select the level Read Uncommitted you really get Read Committed ... [1]
In the read committed isolation level, dirty reads are not possible per the standard, which means these transactions cannot read data written by a concurrent uncommitted transaction. That can only happen in the read uncommitted isolation level per the standard (but won't happen in PostgreSQL: the four isolation levels only define which phenomena must not happen, they do not define which phenomena must happen).
In short, your SELECT clause won't always return a unique value. Neither will it if you rewrite it as UPDATE ... RETURNING ..., but the time window will be really small, so the chances of multiple transactions returning the same value are much lower.
But luckily for you, the one thing in PostgreSQL that isn't affected by transactions is the sequence:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values. [2]
Because sequences are non-transactional, changes made by setval are not undone if the transaction rolls back. [2]
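A minimal sketch of the sequence-based approach (the sequence name is hypothetical):
-- Create once:
CREATE SEQUENCE seq_num;
-- Each process then obtains a unique number, concurrency-safe and
-- independent of the transaction's isolation level:
SELECT nextval('seq_num');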

Some clarifications on different isolation levels in database transactions?

Below is the statement from Wikipedia's Isolation article about REPEATABLE READ:
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data)
until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
My question here is: when does the transaction begin and end, respectively?
If we take the example of non-repeatable reads with the REPEATABLE READ isolation level at the same link, as per my understanding transaction 1 begins when the first query is fired, i.e. SELECT * FROM users WHERE id = 1. The DBMS will keep the lock on the users table until the transaction ends. By 'end' I mean when the connection gets rolled back or committed, not the completion of SELECT * FROM users WHERE id = 1. Until that time, Transaction 2 will wait. Right?
Question 2: Now if we consider the isolation levels and their behaviour as given below (at the same link):
| Isolation level  | Dirty reads | Non-repeatable reads | Phantoms  |
| Read Uncommitted | may occur   | may occur            | may occur |
| Read Committed   | -           | may occur            | may occur |
| Repeatable Read  | -           | -                    | may occur |
| Serializable     | -           | -                    | -         |
As per my understanding, the most reliable is Serializable, then Repeatable Read, and then Read Committed, but I have still seen applications using Read Committed. Is that because the performance of Serializable and Repeatable Read is bad in comparison to Read Committed, since under serializable execution becomes sequential and a transaction has to wait for another transaction to release its locks? Right? So, to get the best of all three, we could use the Read Committed isolation level with SELECT FOR UPDATE (to achieve repeatable reads). I am not sure how we could avoid phantom reads, if we wanted to, under the read committed isolation level?
Oracle does not support the REPEATABLE READ isolation level. However, SQL Server does - and it does place locks on all rows selected by the transaction until it ends (ie: it's committed or rolled back). So you are correct, this will indeed make other transactions wait (if they are updating the locked data) and can be detrimental to concurrency.
As for question 2: Yes, the higher the isolation level, the worse your concurrent transactions will perform because they have to wait for more locks to be released. I am not sure what you mean by "getting the best of all three" by using SELECT FOR UPDATE because SELECT FOR UPDATE will place row locks on all selected rows.
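For illustration, the pattern the question alludes to, using the users table from the Wikipedia example (PostgreSQL-style transaction syntax; FOR UPDATE exists in Oracle and PostgreSQL alike):
BEGIN;
-- The row lock is taken at read time and held until COMMIT, so the value
-- cannot change under us: a repeatable read within READ COMMITTED.
SELECT * FROM users WHERE id = 1 FOR UPDATE;
COMMIT;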
And finally, here's a quote from Oracle's manual on phantom reads:
[phantom reads occur when] a transaction reruns a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
For example, a transaction queries the number of employees. Five minutes later it performs the same query, but now the number has increased by one because another user inserted a record for a new hire. More data satisfies the query criteria than before, but unlike in a fuzzy read the previously read data is unchanged.
Reference:
Data Concurrency and Consistency (Oracle)
SET TRANSACTION ISOLATION LEVEL (SQL Server)

READ UNCOMMITTED and Estimates

From time to time, I want to run a stored procedure to get a rough estimate of how many records in two or three different tables satisfy some criteria. If, during this estimate, new records are added, deleted, or updated, there is not really a problem (I just want a rough estimate). That being so, I can afford to run this process using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. However, I have two questions about this:
1) Since I am only using SELECT COUNT(*) instructions, do I really need to wrap these statements in a BEGIN/COMMIT TRANSACTION block?
2) Do I need to SET TRANSACTION ISOLATION LEVEL READ COMMITTED back at the end of the stored procedure, or will this be reset automatically once its execution ends?
No. Reads don't need to be in a transaction.
The SET is scoped only to the stored procedure. See my answer here: Is it okay if from within one stored procedure I call another one that sets a lower transaction isolation level? However, you'd use the NOLOCK hint rather than SET: SELECT COUNT(*) FROM myTable WITH (NOLOCK).
If you want an approximate count without WHERE filters, then use sys.dm_db_partition_stats. See my answer here: Fastest way to count exact number of rows in a very large table?
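For instance, a sketch of that DMV approach, reusing the hypothetical myTable from above:
-- Approximate row count without scanning the table:
SELECT SUM(ps.row_count) AS approx_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.myTable')
  AND ps.index_id IN (0, 1);  -- 0 = heap, 1 = clustered index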
1) No. SQL Server uses an implicit transaction for each statement if you do not specify a transaction scope. You do not have to add an explicit transaction scope to make SET TRANSACTION ISOLATION LEVEL work.
2) You don't have to reset it to the original. That will be taken care of by SQL Server. Please refer to this SO entry: Transaction Isolation Level Scopes