What's the difference between these two transaction levels: READ WRITE and ISOLATION LEVEL SERIALIZABLE?
As I understand it, READ WRITE allows dirty reads, while ISOLATION LEVEL SERIALIZABLE either prevents other users from changing the data (I suspect I'm mistaken here) or only reads the data that was available at the beginning of the transaction (i.e., it does not see data that other users have changed during my transaction).
You can find detailed information about this topic on the Oracle site.
Basically, READ COMMITTED allows "nonrepeatable reads" and "phantom reads", while both are prohibited in SERIALIZABLE.
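As a side note, in Oracle READ WRITE is a transaction access mode (the opposite of READ ONLY), not an isolation level, so the two clauses in the question are not direct alternatives. A minimal sketch of how each is set:

-- Access mode: READ WRITE is the default (the opposite is READ ONLY);
-- it says nothing about isolation.
SET TRANSACTION READ WRITE;

-- Isolation level: must be the first statement of the transaction;
-- the default is READ COMMITTED.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;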
If non-repeatable reads are permitted, the same SELECT query inside the same transaction might return different results depending on when the query is issued. Other parallel transactions may change the data, and these changes might become visible inside your transaction.
If phantom reads are permitted, it can happen that when you issue the same SELECT query twice inside one transaction, and another transaction inserts rows into the table in parallel, those rows become visible inside your transaction, but only in the result set of the second SELECT. So the same SELECT statement might return, for example, 5 rows the first time and 10 rows the second time it is executed.
Both properties are similar, but the first says something about existing data which may change, while the second says something about additional rows which might be returned.
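To make that concrete, here is a minimal two-session sketch of both anomalies under READ COMMITTED, assuming a hypothetical product table with just id and price columns:

-- Session A:
BEGIN;
SELECT price FROM product WHERE id = 1;          -- returns 50

-- Session B (meanwhile, in a separate connection):
BEGIN;
UPDATE product SET price = 80 WHERE id = 1;      -- changes an existing row
INSERT INTO product (id, price) VALUES (2, 40);  -- adds a row matching price < 100
COMMIT;

-- Session A (continuing the same transaction):
SELECT price FROM product WHERE id = 1;          -- returns 80: a non-repeatable read
SELECT * FROM product WHERE price < 100;         -- row 2 now appears: a phantom read
COMMIT;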
In short, my professor said the following two transactions are susceptible to a phantom read if they are both left at the default isolation level (READ COMMITTED):
BEGIN;
UPDATE product SET price = 100 WHERE type='Electronics';
COMMIT;
BEGIN;
UPDATE product SET price = 100 WHERE price < 100;
COMMIT;
I can't really figure out how a phantom read could happen here.
He also said that, to fix this, you'd have to set the second transaction to REPEATABLE READ.
So... why? How could a phantom read happen here, and why does REPEATABLE READ fix it?
EDIT: could this be the case?
Say we have an initial product P that has type=Electronics AND price=1049.
T1 would begin, and add P to the set of rows to consider.
T2 would begin, and ignore P (its price is below 1050).
T1 would raise its price to 1100, and COMMIT.
Now T2 should update its row set and include P.
But since in READ COMMITTED a transaction only gets an updated snapshot when changes are made to rows within the set it is considering, the change goes unnoticed.
T2, therefore, simply ignores P and COMMITs.
This scenario was suggested by the example I found in the PostgreSQL docs, on the isolation level page, in the Read Committed paragraph.
Do you think this is a possible scenario and, hopefully, what my professor meant?
A phantom read means that if you run the same SELECT twice in a transaction, the second one could get different results than the first.
In the words of the SQL standard:
SQL-transaction T1 reads the set of rows N that satisfy some <search condition>.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1.
If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
This can be caused by concurrent data modifications like the ones you quote at low isolation levels. This is because each query sees all data committed before the query began, even if it was committed after the transaction started.
You could also speak of phantom reads in the context of an UPDATE statement, since it also reads from the table. Then the same UPDATE can affect different rows if it is run twice.
However, it makes no sense to speak of phantom reads in the context of the two statements in your question: the second one modifies the column it is searching on, so the second execution will read different rows whether or not there are concurrent data modifications.
Note: The SQL standard does not require that REPEATABLE READ transactions prevent phantom reads — this is only guaranteed with SERIALIZABLE isolation.
In PostgreSQL phantom reads are already impossible at REPEATABLE READ isolation, because it uses snapshots that guarantee a stable view of the database.
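For illustration, a rough two-session sketch of the question's statements under PostgreSQL's REPEATABLE READ, assuming the two UPDATEs touch at least one common row:

-- Session 1 (T1), default READ COMMITTED:
BEGIN;
UPDATE product SET price = 100 WHERE type = 'Electronics';

-- Session 2 (T2):
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE product SET price = 100 WHERE price < 100;  -- blocks on a row T1 has updated

-- Session 1:
COMMIT;

-- Session 2 then aborts rather than re-evaluating its predicate on changed rows:
-- ERROR: could not serialize access due to concurrent update
-- T2 has to ROLLBACK and retry from scratch, so it never acts on a stale row set.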
This might help
https://en.wikipedia.org/wiki/Isolation_(database_systems)
Non-repeatable reads: A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
The non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT, or when the acquired locks on affected rows are released as soon as the SELECT operation is performed. Under the multiversion concurrency control method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.
A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read. This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of non-repeatable reads: Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERTs) new rows in the target table which fulfil that WHERE clause.
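A minimal sketch of that ranged case, using the users table from the Wikipedia article referenced above:

-- Transaction 1:
BEGIN;
SELECT * FROM users WHERE age BETWEEN 10 AND 30;  -- returns, say, 2 rows

-- Transaction 2, in a separate session:
BEGIN;
INSERT INTO users (id, name, age) VALUES (3, 'Bob', 27);
COMMIT;

-- Transaction 1 repeats the ranged query:
SELECT * FROM users WHERE age BETWEEN 10 AND 30;  -- 3 rows now: a phantom read
COMMIT;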
Consider the following query executed in PostgreSQL 9.1 (or 9.2):
SELECT * FROM foo WHERE bar = true
Suppose it's a fairly long-running query (e.g. taking a minute).
Suppose that at the start of the query there are 5 million records for which bar = true holds, and that during this query another transaction adds and removes rows in the foo table and updates the bar field of some existing rows.
Will any of this affect the outcome of the SELECT query shown above?
I know about transaction-isolation and visibility between separate statements in a single transaction, but what about a single statement that's running?
No.
Due to the MVCC model, only tuples that are visible at query start will be used in a single SELECT. Details are in the manual here:
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed.
Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT.
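A rough way to observe this behaviour, assuming the foo table from the question (and that bar is its only required column) and two separate connections:

-- Session A, in READ COMMITTED (the default):
BEGIN;
SELECT count(*) FROM foo WHERE bar = true;  -- snapshot is taken as this statement starts

-- Session B commits while session A's count is still running:
INSERT INTO foo (bar) VALUES (true);
-- The running count in session A does NOT see this row.

-- Session A, next statement in the same transaction:
SELECT count(*) FROM foo WHERE bar = true;  -- fresh snapshot: B's row is now counted
COMMIT;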
The query will be a read-consistent view of the data as of the start of the query. In PostgreSQL, the documentation on Multi-Version Concurrency Control (MVCC) explains how it is done (multiple versions of a record exist in the table). In Oracle, the System Change Number (SCN) is used along with "before-images" of modified data. Here is an old doc, Transaction Processing in PostgreSQL, with the section "non-overwriting storage management". But take a look at MVCC.
Or read the chapter on MVCC in the PostgreSQL docs.
Below is a statement from Wikipedia's Isolation article about REPEATABLE READ:
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
My question here is: when does the transaction begin and end, respectively?
If we take the example of non-repeatable reads under the REPEATABLE READ isolation level at the same link, as per my understanding transaction 1 begins when the first query is fired, i.e. SELECT * FROM users WHERE id = 1. The DBMS will keep the lock on the users table until the transaction ends. By "ends" I mean when the connection is committed or rolled back, not when SELECT * FROM users WHERE id = 1 completes. Until that time, transaction 2 will have to wait. Right?
Question 2: Now if we consider the isolation levels and their behaviour as given below (at the same link):
Isolation level     Dirty reads   Non-repeatable reads   Phantoms
Read Uncommitted    may occur     may occur              may occur
Read Committed      -             may occur              may occur
Repeatable Read     -             -                      may occur
Serializable        -             -                      -
As per my understanding, the most reliable is Serializable, then Repeatable Read, and then Read Committed, but I have still seen applications using Read Committed. Is that because the performance of Serializable and Repeatable Read is bad in comparison to Read Committed, since under Serializable execution becomes effectively sequential and a transaction has to wait for another transaction to release its locks? Right? So, to get the best of all three, can we use the Read Committed isolation level together with SELECT FOR UPDATE (to achieve repeatable reads)? I am also not sure how we could prevent phantom reads, if we wanted to, under the Read Committed isolation level.
Oracle does not support the REPEATABLE READ isolation level. However, SQL Server does, and it places locks on all rows selected by the transaction until it ends (i.e., until it is committed or rolled back). So you are correct: this will indeed make other transactions wait (if they are updating the locked data) and can be detrimental to concurrency.
As for question 2: Yes, the higher the isolation level, the worse your concurrent transactions will perform because they have to wait for more locks to be released. I am not sure what you mean by "getting the best of all three" by using SELECT FOR UPDATE because SELECT FOR UPDATE will place row locks on all selected rows.
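To illustrate what SELECT FOR UPDATE does and does not buy you under Read Committed, a minimal sketch with a hypothetical orders table:

BEGIN;  -- READ COMMITTED
SELECT * FROM orders WHERE status = 'PENDING' FOR UPDATE;
-- The rows returned are now row-locked: concurrent UPDATEs/DELETEs on them block,
-- so re-reading these particular rows inside this transaction is stable.
-- A concurrent INSERT of a new 'PENDING' order is NOT blocked, however,
-- so phantom rows are still possible at this level.
COMMIT;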
And finally, here's a quote from Oracle's manual on phantom reads:
[phantom reads occur when] a transaction reruns a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
For example, a transaction queries the number of employees. Five minutes later it performs the same query, but now the number has increased by one because another user inserted a record for a new hire. More data satisfies the query criteria than before, but unlike in a fuzzy read the previously read data is unchanged.
Reference:
Data Concurrency and Consistency (Oracle)
SET TRANSACTION ISOLATION LEVEL (SQL Server)
From time to time, I want to run a stored procedure to get a rough estimate of how many records in two or three different tables satisfy some criteria. If new records are added, deleted or updated during this estimate, there is not really a problem (I just want a rough estimate). That being so, I can afford to run this process using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. However, I have two questions about this:
1) Since I am only using SELECT COUNT(*) statements, do I really need to wrap them in a BEGIN/COMMIT TRANSACTION block?
2) Do I need to SET TRANSACTION ISOLATION LEVEL READ COMMITTED back in the end of the stored procedure, or will this be automatically set once its execution ends?
No. Reads don't need to be in a transaction.
The SET is scoped only to the stored procedure. See my answer here: Is it okay if from within one stored procedure I call another one that sets a lower transaction isolation level?. However, you'd use the NOLOCK hint rather than SET: SELECT COUNT(*) FROM myTable WITH (NOLOCK).
If you want an approximate count without WHERE filters, then use sys.dm_db_partition_stats. See my answer here: Fastest way to count exact number of rows in a very large table?
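A sketch of both suggestions, assuming a table named dbo.myTable:

-- Dirty-read count (may include uncommitted rows):
SELECT COUNT(*) FROM dbo.myTable WITH (NOLOCK);

-- Approximate count straight from metadata, with no table scan at all:
SELECT SUM(row_count) AS approx_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.myTable')
  AND index_id IN (0, 1);  -- 0 = heap, 1 = clustered index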
1) No. SQL Server uses an implicit transaction for each statement if you do not specify a transaction scope. You do not have to add an explicit transaction scope for SET TRANSACTION ISOLATION LEVEL to work.
2) You don't have to reset it to the original value; SQL Server takes care of that. Please refer to this SO entry: Transaction Isolation Level Scopes.
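A small sketch of that scoping behaviour; the procedure name is made up, and DBCC USEROPTIONS reports the session's current isolation level:

DBCC USEROPTIONS;  -- 'isolation level' row shows 'read committed' (the session default)
GO
CREATE PROCEDURE dbo.RoughEstimate AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM dbo.myTable;  -- runs with dirty reads permitted
END;
GO
EXEC dbo.RoughEstimate;
DBCC USEROPTIONS;  -- back to 'read committed': the SET did not leak out of the procedure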
I have a SQL Server query (using the LLBL ORM, if that is important to the question) that is doing a large fetch on a set of related tables.
This query is being performed in a transaction with the Isolation Level of Repeatable Read, and is being filtered by two 'state' columns in the main table of the query.
Will the records being 'write locked' only be those that match the filter in the main table, or will all records effectively be write locked until the fetch has completed? I am guessing the latter will be required to ensure no new records are added to the result set during the transaction.
to ensure no new records are added to the result set during the transaction
This requires the Serializable isolation level. Repeatable Read only ensures that the rows already read can be read again later in the transaction; it does not prevent a concurrent transaction from inserting new rows, and those new rows will be visible in the original transaction after the inserting transaction commits. Under Serializable isolation the locks extend to ranges, so no new records that satisfy the filter will appear. Depending on the table schema (the indexes available), the restriction may extend to the entire table(s).
You should also give some serious consideration to doing everything under Snapshot Isolation; it resolves almost all known anomalies, but is more expensive in resources, see Row Versioning Resource Usage.
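If you do look into Snapshot Isolation, a minimal setup sketch (database and table names are placeholders):

-- One-time database setting:
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, per transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- Reads see a consistent snapshot as of the start of the transaction and take
-- no shared locks, so readers and writers do not block each other
-- (at the cost of version-store usage in tempdb).
SELECT * FROM dbo.MainTable WHERE state1 = 'A' AND state2 = 'B';
COMMIT;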