In short, my professor said the following transactions are susceptible to phantom reads if they are both left at the default isolation level (READ COMMITTED):
BEGIN;
UPDATE product SET price = 100 WHERE type='Electronics';
COMMIT;
BEGIN;
UPDATE product SET price = 100 WHERE price < 100;
COMMIT;
I can't figure out how a phantom read could happen.
He also said that, to fix this, you'd have to set the second transaction to REPEATABLE READ.
So... why? How could a phantom read happen here, and why does REPEATABLE READ fix it?
EDIT: could this be the case?
Say we have an initial product P that has type=Electronics AND price=1049
T1 would begin, and add P to the set of rows to consider.
T2 would begin, and ignore P (its price is below 1050).
T1 would increment its price to 1100, and COMMIT.
Now T2 should update its row set and include P.
But since in READ COMMITTED a transaction gets an updated snapshot only if changes are made to rows that were within the set it is considering, the change goes unnoticed.
T2, therefore, simply ignores P, and COMMITs.
This scenario was suggested by the example I found in postgresql docs,
on the isolation level page, read committed paragraph.
Do you think this is a possible scenario and, hopefully, what my professor meant?
A phantom read means that if you run the same SELECT twice in a transaction, the second one could get different results than the first.
In the words of the SQL standard:
SQL-transaction T1 reads the set of rows N that satisfy some <search condition>.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1.
If SQL-transaction T1 then repeats the initial read with the same
<search condition>, it obtains a different collection of rows.
This can be caused by concurrent data modifications like the ones you quote at low isolation levels. This is because each query will see all committed data, even if they were committed after the transaction started.
You could also speak of phantom reads in the context of an UPDATE statement, since it also reads from the table. Then the same UPDATE can affect different rows if it is run twice.
However, it makes no sense to speak of phantom reads in the context of the two statements in your question: The second one modifies the column it is searching for, so the second execution will read different rows, no matter if there are concurrent data modifications or not.
Note: The SQL standard does not require that REPEATABLE READ transactions prevent phantom reads — this is only guaranteed with SERIALIZABLE isolation.
In PostgreSQL phantom reads are already impossible at REPEATABLE READ isolation, because it uses snapshots that guarantee a stable view of the database.
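To make the snapshot argument concrete, here is a minimal Python sketch of a toy versioned store. It is not PostgreSQL's implementation: `ToyStore`, `ToyTxn` and `run` are hypothetical names, the snapshot is taken at BEGIN (PostgreSQL takes it at the first statement), and a concurrent writer commits a whole new table state at once. It only illustrates why a per-statement snapshot can show a phantom row while a per-transaction snapshot cannot.

```python
class ToyStore:
    """Keeps every committed table state; the list index is the commit number."""

    def __init__(self, rows):
        self.versions = [dict(rows)]

    def commit(self, rows):
        # Toy simplification: a writer commits a complete new table state.
        self.versions.append(dict(rows))


class ToyTxn:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        # Assumption: the transaction snapshot is fixed at BEGIN.
        self.begin_version = len(store.versions) - 1

    def select_ids(self, predicate):
        if self.level == "READ COMMITTED":
            # Fresh snapshot for every statement.
            version = len(self.store.versions) - 1
        else:
            # "REPEATABLE READ": one snapshot for the whole transaction.
            version = self.begin_version
        rows = self.store.versions[version]
        return sorted(pid for pid, price in rows.items() if predicate(price))


def run(level):
    store = ToyStore({1: 50, 2: 150})          # product id -> price
    txn = ToyTxn(store, level)
    first = txn.select_ids(lambda price: price < 100)
    store.commit({1: 50, 2: 150, 3: 80})       # a concurrent INSERT commits
    second = txn.select_ids(lambda price: price < 100)
    return first, second


print(run("READ COMMITTED"))    # ([1], [1, 3])  the new row is a phantom
print(run("REPEATABLE READ"))   # ([1], [1])     stable view
```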
This might help
https://en.wikipedia.org/wiki/Isolation_(database_systems)
Non-repeatable reads
A non-repeatable read occurs when, during the
course of a transaction, a row is retrieved twice and the values
within the row differ between reads.
Non-repeatable reads phenomenon may occur in a lock-based concurrency
control method when read locks are not acquired when performing a
SELECT, or when the acquired locks on affected rows are released as
soon as the SELECT operation is performed. Under the multiversion
concurrency control method, non-repeatable reads may occur when the
requirement that a transaction affected by a commit conflict must roll
back is relaxed.
A phantom read occurs when, in the course of a transaction, new rows
are added or removed by another transaction to the records being read.
This can occur when range locks are not acquired on performing a
SELECT ... WHERE operation. The phantom reads anomaly is a special
case of Non-repeatable reads when Transaction 1 repeats a ranged
SELECT ... WHERE query and, between both operations, Transaction 2
creates (i.e. INSERT) new rows (in the target table) which fulfil that
WHERE clause.
While reading transaction isolation levels from wikipedia, I got confused by the isolation phenomena "dirty reads" and "non-repeatable reads." Both mean that if t1 selects some data, t2 modifies that same data, and then t1 reads the data again, t1 will see modified data. So what are the differences?
Dirty reads: you see uncommitted changes.
Non-repeatable reads: you see successfully committed changes when performing the same query multiple times.
The first one is the evil one and should be avoided in most cases, because you may see rows in an inconsistent intermediate state, while the second one is OK for many applications.
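The distinction can be sketched with a toy single-row model in Python. `ToyRow` and the two demo functions are hypothetical and only model visibility (no locking, no real DBMS): a dirty read returns a value that is later rolled back and never existed in committed form, while a non-repeatable read returns two different values that were both committed.

```python
class ToyRow:
    """One row with a committed value and at most one uncommitted write."""

    def __init__(self, value):
        self.committed = value
        self.pending = None  # uncommitted value written by another transaction

    def write(self, value):
        self.pending = value

    def commit(self):
        if self.pending is not None:
            self.committed = self.pending
            self.pending = None

    def rollback(self):
        self.pending = None

    def read(self, level):
        # Only READ UNCOMMITTED is allowed to see the pending write.
        if level == "READ UNCOMMITTED" and self.pending is not None:
            return self.pending
        return self.committed


def dirty_read_demo():
    row = ToyRow("Aaron")
    row.write("Bob")                       # writer has not committed
    seen = row.read("READ UNCOMMITTED")    # reader sees the uncommitted value
    row.rollback()                         # writer rolls back
    return seen, row.read("READ UNCOMMITTED")


def non_repeatable_read_demo():
    row = ToyRow("Aaron")
    first = row.read("READ COMMITTED")
    row.write("Bob")
    during = row.read("READ COMMITTED")    # uncommitted write stays invisible
    row.commit()
    second = row.read("READ COMMITTED")    # committed change is now visible
    return first, during, second


print(dirty_read_demo())           # ('Bob', 'Aaron')
print(non_repeatable_read_demo())  # ('Aaron', 'Aaron', 'Bob')
```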
What's the difference between these two transaction levels: READ WRITE and ISOLATION LEVEL SERIALIZABLE?
As I understand it, READ WRITE allows dirty reads, while ISOLATION LEVEL SERIALIZABLE either prevents other users from changing the data (I think I'm mistaken here) or only reads the data that was available at the beginning of the transaction (so I don't see data that other users changed during my transaction).
You can find detailed information about this topic on the oracle site.
Basically READ COMMITTED allows "nonrepeatable reads" and "phantom reads", while both are prohibited in SERIALIZABLE.
If non-repeatable reads are permitted, the same SELECT query inside the same transaction might return different results depending on when the query is issued. Other parallel transactions may change the data, and these changes might become visible inside your transaction.
If phantom reads are permitted, it can happen that when you issue the same SELECT query twice inside one transaction, and another transaction inserts rows into the table in parallel, these rows become visible inside your transaction, but only in the result set of the second SELECT. So the same SELECT statement might return, for example, 5 rows the first time and 10 rows the second time it is executed.
Both properties are similar, but the first only says something about data which may change, while the second says something about additional rows which might be returned.
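One way to keep the two apart is to compare the two result sets of the repeated SELECT. The sketch below is a hypothetical helper (`classify` is not part of any database API) that follows the description in this answer: rows from the first read that changed or disappeared count as a non-repeatable read, and rows that only show up in the second read count as a phantom read. Treating a disappeared row as a non-repeatable read is an assumption of this sketch; some definitions file it under phantoms.

```python
def classify(first, second):
    """Compare two reads of the same query, given as {row_id: value} dicts."""
    anomalies = set()
    for row_id, value in first.items():
        # A previously read row changed or vanished.
        if row_id not in second or second[row_id] != value:
            anomalies.add("non-repeatable read")
    # A row appeared that the first read did not return.
    if any(row_id not in first for row_id in second):
        anomalies.add("phantom read")
    return anomalies


print(classify({1: "a"}, {1: "b"}))          # {'non-repeatable read'}
print(classify({1: "a"}, {1: "a", 2: "c"}))  # {'phantom read'}
print(classify({1: "a"}, {1: "a"}))          # set()
```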
Below is the statement written from Wikipedia's Isolation article about REPEATABLE READS
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data)
until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
My question is when the transaction begins and ends.
Take the non-repeatable reads example under the REPEATABLE READ isolation level at the same link. As I understand it, transaction 1 begins
when the first query is fired, i.e. SELECT * FROM users WHERE id = 1. The DBMS will keep the lock on the users table until the transaction ends.
By "ends" I mean when the connection is rolled back or committed, not when SELECT * FROM users WHERE id = 1 completes. Until then,
Transaction 2 will wait, right?
Question 2: Now consider the isolation levels and their behaviour as given below (at the same link):
Isolation level     Dirty reads   Non-repeatable reads   Phantoms
Read Uncommitted    may occur     may occur              may occur
Read Committed      -             may occur              may occur
Repeatable Read     -             -                      may occur
Serializable        -             -                      -
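The standard matrix of isolation levels and phenomena can also be written down as a small lookup, which makes it easy to ask "what is the weakest level that rules out X?". This is a minimal Python sketch; `MAY_OCCUR` and `weakest_level_preventing` are hypothetical names, and the table encodes the SQL-standard levels only, not what a particular DBMS actually implements.

```python
# Phenomena each SQL-standard isolation level still permits,
# listed from weakest to strongest level.
MAY_OCCUR = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(phenomena):
    """Return the first (weakest) level that permits none of the phenomena."""
    unwanted = set(phenomena)
    for level, permitted in MAY_OCCUR.items():  # dicts keep insertion order
        if not (permitted & unwanted):
            return level
    return None


print(weakest_level_preventing({"dirty read"}))           # READ COMMITTED
print(weakest_level_preventing({"non-repeatable read"}))  # REPEATABLE READ
print(weakest_level_preventing({"phantom read"}))         # SERIALIZABLE
```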
As I understand it, Serializable is the most reliable, then Repeatable Read, then Read Committed, yet I have seen applications using Read Committed. Is that because
Serializable and Repeatable Read perform worse than Read Committed, since under Serializable execution is sequential and
a transaction has to wait for another transaction to release its locks? So, to get the best of all three, can we use
Read Committed with SELECT FOR UPDATE (to achieve repeatable reads)? I am not sure how we could deal with phantom reads, if we wanted to, under the Read Committed
isolation level.
Oracle does not support the REPEATABLE READ isolation level. However, SQL Server does - and it does place locks on all rows selected by the transaction until it ends (ie: it's committed or rolled back). So you are correct, this will indeed make other transactions wait (if they are updating the locked data) and can be detrimental to concurrency.
As for question 2: Yes, the higher the isolation level, the worse your concurrent transactions will perform because they have to wait for more locks to be released. I am not sure what you mean by "getting the best of all three" by using SELECT FOR UPDATE because SELECT FOR UPDATE will place row locks on all selected rows.
And finally, here's a quote from Oracle's manual on phantom reads:
[phantom reads occur when] a transaction reruns a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
For example, a transaction queries the number of employees. Five minutes later it performs the same query, but now the number has increased by one because another user inserted a record for a new hire. More data satisfies the query criteria than before, but unlike in a fuzzy read the previously read data is unchanged.
Reference:
Data Concurrency and Consistency (Oracle)
SET TRANSACTION ISOLATION LEVEL (SQL Server)
My answer to my question Is a half-written values reading prevented when SELECT WITH (NOLOCK) hint? cites a script illustrating catching non-atomic reads (SELECTs) of partly-updated values in SQL Server.
Is such non-atomic (partly updated, inserted, deleted) value reading problem specific to SQL Server?
Is it possible in other DBMS-es?
Update:
Not long ago I believed that the READ UNCOMMITTED transaction isolation level (also achieved through the WITH (NOLOCK) hint in SQL Server) permitted reading (from other transactions) uncommitted values (or committed ones, if not yet changed), but not partly modified (partly updated, partly inserted, partly deleted) values.
Update2:
The first two answers diverted the discussion into attacking the READ UNCOMMITTED (isolation level) phenomena specified by the ANSI/ISO SQL-92 specifications.
This question is not about this.
Is non-atomicity of a value (not a row!) compliant with READ UNCOMMITTED and dirty reads at all?
I believed that READ UNCOMMITTED implied reading uncommitted rows in their entirety, but not partly modified values.
Does the definition of "dirty read" include the possibility of non-atomic value modification?
Is it a bug or by design?
Or is it by the ANSI SQL-92 definition of "dirty read"? I believed that a "dirty read" meant atomically reading uncommitted rows, not non-atomically modified values...
Is it possible in other DBMS-es?
As far as I know the only other databases that allow READ UNCOMMITTED are DB2, Informix and MySQL when using a non-transactional engine.
All hell would break loose if atomic statements were in fact not atomic.
I can answer this for MSSQL - all single statements are atomic, "dirty reads" refers to the
possibility of reading a "phantom row" that might not exist after TX is committed/rolled back.
There is a difference between Atomicity and READ COMMITTED if the implementation of the latter relies on locking.
Consider transactions A and B. Transaction A is a single SELECT for all records with a status of 'pending' (perhaps a full scan on a very large table so it takes several minutes).
At 3:01 transaction A reads record R1 in the database and sees its status is 'New' so doesn't return it or lock it.
At 3:02 transaction B updates record R1 from 'New' to 'Pending' and record R2000 from 'New' to 'Pending' (single statement)
At 3:03 transaction B commits
At 3:04 transaction A reads record R2000, sees it is 'Pending' and committed and returns it (and locks it).
In this situation, the select in transaction A has only seen part of Transaction B, violating atomicity. Technically though, the select has only returned committed records.
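The timeline above can be replayed with a tiny Python sketch. It is a toy model, not a real lock manager: `scan_pending` and `demo` are hypothetical names, the table is a plain dict, and transaction B's single two-row UPDATE is applied all at once while A's scan is between R1 and R2000.

```python
def scan_pending(table, concurrent_commit, commit_after_index):
    """Scan rows in key order and return ids whose status is 'Pending'.

    concurrent_commit is applied atomically right after the scan has
    processed the row at position commit_after_index.
    """
    seen = []
    for index, row_id in enumerate(sorted(table)):
        if table[row_id] == "Pending":
            seen.append(row_id)          # committed at the moment it is read
        if index == commit_after_index:
            table.update(concurrent_commit)  # transaction B commits here
    return seen


def demo():
    table = {"R1": "New", "R2000": "New"}
    update_by_b = {"R1": "Pending", "R2000": "Pending"}
    # A reads R1 (still 'New'), then B commits, then A reads R2000.
    return scan_pending(table, update_by_b, commit_after_index=0)


print(demo())  # ['R2000']  A saw only half of B's single statement
```

Every value the scan returned was committed when it was read, yet the result matches no single committed state of the table: before B it would be empty, after B it would contain both rows.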
Databases relying on locking reads suffer from this problem because the only solution would be to lock the entirety of the table(s) being read so no-one can update any records in any of them. This would make it impractical for any concurrent activity.
In practice, most OLTP applications have very quick transactions operating on very small data volumes (relative to the database size), and concurrent operations tend to hit different 'slices' of data so the situation occurs very rarely. Even if it does happen, it doesn't necessarily result in a noticeable problem and even when it does they are very hard to reproduce and fixing them would require a whole new architecture. In short, despite being a theoretical problem, in practice it often isn't worth worrying about.
That said, an architect should be aware of the potential issue, be able to assess the risk for a particular application and determine alternatives.
That's one reason why SQL Server added non-locking consistent reads in 2005.
Database theory requires that in all isolation levels, the individual UPDATE or INSERT statements are atomic. Their intermediate results should not be visible to read uncommitted transactions. This has been stated in a paper by a group of well-known database experts. http://research.microsoft.com/apps/pubs/default.aspx?id=69541
However, as read uncommitted results are not considered transactionally consistent by definition, it is possible that implementations may contain bugs that result in part-updated row sets to be returned and these bugs have not been noticed in tests because of the difficulty to determine the validity of the returned result sets.
I think the above isolation levels are so alike. Could someone please describe with some nice examples what the main difference is?
Read committed is an isolation level that guarantees that any data read was committed at the moment it is read. It simply restricts the reader from seeing any intermediate, uncommitted, 'dirty' data. It makes no promise whatsoever that if the transaction re-issues the read, it will find the same data; data is free to change after it was read.
Repeatable read is a higher isolation level that, in addition to the guarantees of the read committed level, also guarantees that any data read cannot change: if the transaction reads the same data again, it will find the previously read data in place, unchanged, and available to read.
The next isolation level, serializable, makes an even stronger guarantee: in addition to everything repeatable read guarantees, it also guarantees that no new data can be seen by a subsequent read.
Say you have a table T with a column C with one row in it, say it has the value '1'. And consider you have a simple task like the following:
BEGIN TRANSACTION;
SELECT * FROM T;
WAITFOR DELAY '00:01:00'
SELECT * FROM T;
COMMIT;
That is a simple task that issues two reads from table T, with a delay of 1 minute between them.
Under READ COMMITTED, the second SELECT may return any data. A concurrent transaction may update the record, delete it, or insert new records. The second SELECT will always see the new data.
Under REPEATABLE READ, the second SELECT is guaranteed to display at least the rows that were returned from the first SELECT, unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted or changed.
Under SERIALIZABLE, the second SELECT is guaranteed to see exactly the same rows as the first. No row can be changed or deleted, and no new rows can be inserted, by a concurrent transaction.
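The three guarantees can be restated as a check on the two result sets. The Python sketch below is a hypothetical helper (`second_read_allowed` is not a SQL Server API); it encodes only the rules described in this answer, with each result set given as a `{row_id: value}` dict.

```python
def second_read_allowed(level, first, second):
    """Could `second` follow `first` for the same SELECT at this level?"""
    if level == "READ COMMITTED":
        return True  # anything committed may show up
    # Every row from the first read is still there, unchanged.
    kept = all(
        row_id in second and second[row_id] == value
        for row_id, value in first.items()
    )
    if level == "REPEATABLE READ":
        return kept  # extra rows are tolerated
    if level == "SERIALIZABLE":
        return first == second  # exactly the same rows
    raise ValueError(f"unknown isolation level: {level}")


first = {1: "1"}                       # table T with one row, C = '1'
with_new_row = {1: "1", 2: "2"}        # a concurrent INSERT committed
with_changed_row = {1: "9"}            # a concurrent UPDATE committed

print(second_read_allowed("READ COMMITTED", first, with_changed_row))   # True
print(second_read_allowed("REPEATABLE READ", first, with_changed_row))  # False
print(second_read_allowed("REPEATABLE READ", first, with_new_row))      # True
print(second_read_allowed("SERIALIZABLE", first, with_new_row))         # False
```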
If you follow the logic above you can quickly realize that SERIALIZABLE transactions, while they may make life easy for you, are always completely blocking every possible concurrent operation, since they require that nobody can modify, delete nor insert any row. The default transaction isolation level of the .Net System.Transactions scope is serializable, and this usually explains the abysmal performance that results.
And finally, there is also the SNAPSHOT isolation level. SNAPSHOT isolation level makes the same guarantees as serializable, but not by requiring that no concurrent transaction can modify the data. Instead, it forces every reader to see its own version of the world (its own 'snapshot'). This makes it very easy to program against as well as very scalable as it does not block concurrent updates. However, that benefit comes with a price: extra server resource consumption.
Supplemental reads:
Isolation Levels in the Database Engine
Concurrency Effects
Choosing Row Versioning-based Isolation Levels
Repeatable Read
The state of the database is maintained from the start of the transaction. If you retrieve a value in session1, then update that value in session2, retrieving it again in session1 will return the same results. Reads are repeatable.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
Read Committed
Within the context of a transaction, you will always retrieve the most recently committed value. If you retrieve a value in session1, update it in session2, then retrieve it in session1 again, you will get the value as modified in session2. It reads the last committed row.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Bob
Makes sense?
Based on my reading of this thread and of @remus-rusanu's answer, the answer comes down to this simple scenario:
There are two transactions A and B.
Transaction B is reading Table X
Transaction A is writing in table X
Transaction B is reading again in Table X.
ReadUncommitted: Transaction B can read uncommitted data from Transaction A, so it could see different rows depending on A's writes. No locks at all.
ReadCommitted: Transaction B can read ONLY committed data from Transaction A, so it could see different rows depending only on A's COMMITTED writes. Could we call it a simple lock?
RepeatableRead: Transaction B will read the same data (rows) whatever Transaction A is doing, but Transaction A can change other rows. Row-level blocking.
Serializable: Transaction B will read the same rows as before, and Transaction A cannot read or write in the table. Table-level blocking.
Snapshot: every transaction has its own copy and works on it. Each one has its own view.
Old question which has an accepted answer already, but I like to think of these two isolation levels in terms of how they change the locking behavior in SQL Server. This might be helpful for those who are debugging deadlocks like I was.
READ COMMITTED (default)
Shared locks are taken in the SELECT and then released when the SELECT statement completes. This is how the system can guarantee that there are no dirty reads of uncommitted data. Other transactions can still change the underlying rows after your SELECT completes and before your transaction completes.
REPEATABLE READ
Shared locks are taken in the SELECT and then released only after the transaction completes. This is how the system can guarantee that the values you read will not change during the transaction (because they remain locked until the transaction finishes).
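The difference in lock lifetime can be sketched with a toy lock table in Python. This is not SQL Server's lock manager: `ToyLockTable` and `writer_blocked_after_select` are hypothetical names, and only shared locks on single rows are modelled. It shows a writer T2 being free to update right after T1's SELECT under READ COMMITTED, but blocked until T1 ends under REPEATABLE READ.

```python
class ToyLockTable:
    """Tracks which transactions hold a shared lock on which row."""

    def __init__(self):
        self.shared = {}  # row id -> set of transaction names

    def select(self, txn, row, level):
        self.shared.setdefault(row, set()).add(txn)  # lock taken for the read
        if level == "READ COMMITTED":
            # Released as soon as the SELECT statement completes.
            self.shared[row].discard(txn)
        # REPEATABLE READ: the lock stays until end_transaction().

    def can_update(self, txn, row):
        # An update needs no other transaction to hold a shared lock.
        holders = self.shared.get(row, set()) - {txn}
        return not holders

    def end_transaction(self, txn):
        # COMMIT or ROLLBACK releases everything the transaction holds.
        for holders in self.shared.values():
            holders.discard(txn)


def writer_blocked_after_select(level):
    locks = ToyLockTable()
    locks.select("T1", "row7", level)
    blocked_while_t1_open = not locks.can_update("T2", "row7")
    locks.end_transaction("T1")
    blocked_after_t1_ends = not locks.can_update("T2", "row7")
    return blocked_while_t1_open, blocked_after_t1_ends


print(writer_blocked_after_select("READ COMMITTED"))   # (False, False)
print(writer_blocked_after_select("REPEATABLE READ"))  # (True, False)
```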
Trying to explain this doubt with simple diagrams.
Read Committed: Here in this isolation level, Transaction T1 will be reading the updated value of the X committed by Transaction T2.
Repeatable Read: In this isolation level, Transaction T1 will not consider the changes committed by the Transaction T2.
I think this picture can also be useful, it helps me as a reference when I want to quickly remember the differences between isolation levels (thanks to kudvenkat on youtube)
Please note that the "repeatable" in repeatable read refers to a tuple, not to the entire table. At the ANSI REPEATABLE READ isolation level, the phantom read anomaly can still occur, which means that reading a table twice with the same WHERE clause may return different result sets. Taken literally, the read is not repeatable.
My observation on initial accepted solution.
Under RR (the MySQL default): if a transaction is open and a SELECT has been fired, another transaction can NOT delete any row belonging to the earlier result set until the first transaction is committed (in fact, the DELETE statement in the new transaction will just hang). However, the second transaction can delete all rows from the table without any trouble. By the way, a later read in the first transaction will still see the old data until that transaction is committed.