I have a SQL Server query (using the LLBL ORM, if that is important to the question) that is doing a large fetch on a set of related tables.
This query is being performed in a transaction with the Isolation Level of Repeatable Read, and is being filtered by two 'state' columns in the main table of the query.
Will the records being 'write locked' only be those that match the filter in the main table, or will all records be effectively write locked until the fetch has completed? I am guessing the latter will be required to ensure no new records are added to the result set during the transaction.
to ensure no new records are added to the result set during the transaction
That requires the Serializable isolation level. Repeatable Read only ensures that rows already read can be read again later in the transaction with the same values; it does not prevent a concurrent transaction from inserting new rows, and those new rows will be visible in the original transaction once the inserting transaction commits. Under Serializable the locks extend to ranges, so no new records that satisfy the filter can appear. Depending on the table schema (the indexes available) the restriction may extend to the entire table(s).
You should also give serious consideration to doing everything under Snapshot Isolation; this resolves almost all known anomalies, but is more expensive in resources, see Row Versioning Resource Usage.
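As a rough illustration of the difference, here is a minimal T-SQL sketch; the table and column names (Orders, StateA, StateB, MyDatabase) are made up, and in practice LLBL would issue the SELECT for you:

-- Under Repeatable Read, rows already read keep their values, but new matching rows can still appear.
-- Under Serializable, key-range locks also block inserts that would satisfy the filter.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT * FROM Orders WHERE StateA = 1 AND StateB = 2;  -- range locks held until commit
-- ... rest of the fetch ...
COMMIT;

-- Snapshot isolation must first be enabled at the database level:
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;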
I have a program I wrote for a group that is constantly writing, reading, and deleting rows in our SQL Server, per session and per user. I do not need to worry about what is being written or deleted, as the data written or deleted by one individual will never be needed by another. Each user's writes are separated by a unique ID, and all queries are based on that unique ID.
I want to be able to write/delete rows from the same table by multiple users at the same time. I know I can set up the session to be able to read while data is being written using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.
The question:
That said, can I have multiple users write/delete from the same table at the same time?
Currently, my only idea if this is not possible is to have the tool write to temp tables per user session, but constantly creating and deleting temp tables hundreds of times a day doesn't seem like an efficient option.
Yes, you can make this multi-tenant approach work fine. (A short sketch of the index and query pattern follows the list below.)
Ensure the leading column of all indexes is UserId, so a query for one user never needs to scan rows belonging to a different user.
Ensure all queries have an equality predicate on UserId, and verify the execution plans to confirm they are seeking on it.
Ensure no use of the serializable isolation level, as this can take range locks affecting adjacent users.
Ensure that row locking is not disabled on any index, and restrict DML operations to <= 5,000 rows (to prevent lock escalation).
Consider using read committed snapshot isolation for your reading queries.
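A minimal sketch of that pattern; the table, index, column, and database names (UserData, IX_UserData_UserId_CreatedAt, CreatedAt, MyDatabase) are hypothetical:

CREATE TABLE UserData (
    UserId    INT           NOT NULL,
    CreatedAt DATETIME2     NOT NULL,
    Payload   NVARCHAR(MAX) NULL
);
-- UserId as the leading index column keeps each user's rows together
CREATE CLUSTERED INDEX IX_UserData_UserId_CreatedAt ON UserData (UserId, CreatedAt);

-- every query seeks on UserId with an equality predicate
SELECT Payload FROM UserData WHERE UserId = 42 AND CreatedAt >= '2024-01-01';
DELETE FROM UserData WHERE UserId = 42;  -- keep each batch under ~5,000 rows

-- optional: enable read committed snapshot isolation for readers
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;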
In short, my professor said the following transactions are susceptible to a phantom read if they're both left at the default isolation level (READ COMMITTED):
BEGIN;
UPDATE product SET price = 100 WHERE type='Electronics';
COMMIT;
BEGIN;
UPDATE product SET price = 100 WHERE price < 100;
COMMIT;
I can't really seem to be able to figure how a phantom read could happen.
He also said that, to fix this, you'd have to set the second transaction to REPEATABLE READ
So... why? How could a phantom read happen here, and why does REPEATABLE READ fix it?
EDIT: could this be the case?
Say we have an initial product P that has type=Electronics AND price=1049
T1 would begin, and add P to the set of rows to consider.
T2 would begin, and ignore P (its price is below 1050).
T1 would increment its price to 1100, and COMMITs.
Now T2 should update its row set and include P.
But since in READ COMMITTED a transaction will get an updated snapshot only if changes are made to rows within the set it is considering, the change goes unnoticed.
T2, therefore, simply ignores P, and COMMITs.
This scenario was suggested by the example I found in the PostgreSQL docs, on the isolation level page, in the Read Committed paragraph.
Do you think this is a possible scenario and, hopefully, what my professor meant?
A phantom read means that if you run the same SELECT twice in a transaction, the second one could get different results than the first.
In the words of the SQL standard:
SQL-transaction T1 reads the set of rows N that satisfy some <search condition>.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1.
If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
This can be caused by concurrent data modifications like the ones you quote at low isolation levels, because each query sees all committed data, even data that was committed after the transaction started.
You could also speak of phantom reads in the context of an UPDATE statement, since it also reads from the table. Then the same UPDATE can affect different rows if it is run twice.
However, it makes no sense to speak of phantom reads in the context of the two statements in your question: the second one modifies the column it is searching for, so the second execution will read different rows whether or not there are concurrent data modifications.
Note: The SQL standard does not require that REPEATABLE READ transactions prevent phantom reads — this is only guaranteed with SERIALIZABLE isolation.
In PostgreSQL phantom reads are already impossible at REPEATABLE READ isolation, because it uses snapshots that guarantee a stable view of the database.
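To make the phenomenon concrete, here is a minimal sketch in PostgreSQL using the product table from your question and two concurrent sessions:

-- session 1 (READ COMMITTED, the default)
BEGIN;
SELECT count(*) FROM product WHERE price < 100;   -- returns, say, 5

-- session 2, meanwhile (commits immediately under autocommit)
INSERT INTO product (type, price) VALUES ('Electronics', 50);

-- session 1, same transaction
SELECT count(*) FROM product WHERE price < 100;   -- now returns 6: a phantom row
COMMIT;

-- under REPEATABLE READ (or SERIALIZABLE) in PostgreSQL, both SELECTs would return 5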
This might help
https://en.wikipedia.org/wiki/Isolation_(database_systems)
Non-repeatable reads: A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.

The non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT, or when the acquired locks on affected rows are released as soon as the SELECT operation is performed. Under the multiversion concurrency control method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.

A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read. This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERTs) new rows (in the target table) which fulfil that WHERE clause.
What are the differences between these two transaction levels: READ WRITE and ISOLATION LEVEL SERIALIZABLE?
As I understand it, READ WRITE allows dirty reads, while ISOLATION LEVEL SERIALIZABLE prevents the data from being changed by other users (I think I'm mistaken here), or only reads the data that was available at the beginning of the transaction (it doesn't see data that has been changed by other users during my transaction).
You can find detailed information about this topic on the Oracle site.
Basically READ COMMITTED allows "nonrepeatable reads" and "phantom reads", while both are prohibited in SERIALIZABLE.
If non-repeatable reads are permitted, the same SELECT query inside the same transaction might return different results depending on when the query is issued. Other parallel transactions may change the data, and those changes might become visible inside your transaction.
If phantom reads are permitted, it can happen that when you issue the same SELECT query twice inside one transaction, and another transaction inserts rows into the table in parallel, those rows become visible inside your transaction, but only in the result set of the second SELECT. So the same SELECT statement might return, for example, 5 rows the first time and 10 rows the second time it is executed.
Both properties are similar, but the first says something about data which may change, while the second says something about additional rows which might be returned.
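In Oracle you choose the mode at the start of each transaction; a minimal sketch (SET TRANSACTION must be the first statement of the transaction, and the product table here is just a placeholder name):

-- default: READ WRITE transaction, statements run at READ COMMITTED isolation
SET TRANSACTION READ WRITE;
SELECT * FROM product WHERE price < 100;
COMMIT;

-- serializable: the whole transaction sees a snapshot taken at its start
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM product WHERE price < 100;   -- repeating this later returns the same rows
COMMIT;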
Consider the following query executed in PostgreSQL 9.1 (or 9.2):
SELECT * FROM foo WHERE bar = true
Suppose it's a fairly long running query (e.g. taking a minute).
Suppose that at the start of the query there are 5 million records for which bar = true holds, and that while this query runs another transaction adds and removes rows in the foo table and updates the bar field of some existing rows.
Will any of this affect the outcome of the above shown select query?
I know about transaction-isolation and visibility between separate statements in a single transaction, but what about a single statement that's running?
No.
Due to the MVCC model only tuples that are visible at query start will be used in a single SELECT. Details in the manual here:
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT.
Emphasis mine.
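A quick way to convince yourself, as a sketch; foo is the table from your question, the one-minute runtime is just assumed, and the id column used in the UPDATE is hypothetical:

-- session 1: the long-running statement
BEGIN;
SELECT count(*) FROM foo WHERE bar = true;   -- assume this takes about a minute

-- session 2, while the statement above is still running (autocommit)
INSERT INTO foo (bar) VALUES (true);
UPDATE foo SET bar = false WHERE id = 123;   -- hypothetical primary key column

-- session 1: the running count(*) is unaffected by session 2's changes;
-- only a SELECT issued after those commits would see them
COMMIT;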
The query will see a read-consistent view of the data as of the start of the query. In PostgreSQL, the documentation on Multi-Version Concurrency Control (MVCC) explains how this is done (multiple versions of a record exist in the table). In Oracle, the System Change Number (SCN) is used along with "before images" of modified data. Here is an old doc, Transaction Processing in PostgreSQL, with the section "non-overwriting storage management". But take a look at MVCC.
Or read the chapter on MVCC in the PostgreSQL docs.
Below is the statement from Wikipedia's Isolation article about REPEATABLE READ:
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
My question here is: when does the transaction begin and end, respectively?
If we take the example of non-repeatable reads with the REPEATABLE READ isolation level at the same link, as per my understanding Transaction 1 begins when the first query is fired, i.e. SELECT * FROM users WHERE id = 1. The DBMS will keep the lock on the users table until the transaction ends. By 'end' I mean when the connection is rolled back or committed, not the completion of SELECT * FROM users WHERE id = 1. Until that time, Transaction 2 will wait, right?
Question 2: Now, if we consider the isolation levels and their behaviour as given below (at the same link):
Isolation level     Dirty reads    Non-repeatable reads    Phantoms
Read Uncommitted    may occur      may occur               may occur
Read Committed      -              may occur               may occur
Repeatable Read     -              -                       may occur
Serializable        -              -                       -
As per my understanding, the most reliable is Serializable, then Repeatable Read, and then Read Committed, but I have still seen applications using Read Committed. Is that because the performance of Serializable and Repeatable Read is worse than Read Committed, since under Serializable execution is effectively sequential and a transaction may have to wait for another transaction to release its locks? Right? So, to get the best of all three, can we use the Read Committed isolation level with SELECT FOR UPDATE (to achieve repeatable reads)? I am not sure how we could prevent phantom reads, if we wanted to, under the Read Committed isolation level.
Oracle does not support the REPEATABLE READ isolation level. However, SQL Server does, and it does place locks on all rows selected by the transaction until it ends (i.e. it is committed or rolled back). So you are correct: this will indeed make other transactions wait (if they are updating the locked data) and can be detrimental to concurrency.
As for question 2: Yes, the higher the isolation level, the worse your concurrent transactions will perform because they have to wait for more locks to be released. I am not sure what you mean by "getting the best of all three" by using SELECT FOR UPDATE because SELECT FOR UPDATE will place row locks on all selected rows.
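To illustrate the SQL Server behaviour with the example from your question, here is a sketch using two sessions; the users table and its age column are taken from the Wikipedia example:

-- session 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM users WHERE id = 1;   -- shared lock on the row is held until COMMIT/ROLLBACK

-- session 2, meanwhile: this UPDATE blocks until session 1 ends
UPDATE users SET age = 21 WHERE id = 1;

-- session 1
COMMIT;   -- only now can session 2's UPDATE proceed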
And finally, here's a quote from Oracle's manual on phantom reads:
[phantom reads occur when] a transaction reruns a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
For example, a transaction queries the number of employees. Five minutes later it performs the same query, but now the number has increased by one because another user inserted a record for a new hire. More data satisfies the query criteria than before, but unlike in a fuzzy read the previously read data is unchanged.
Reference:
Data Concurrency and Consistency (Oracle)
SET TRANSACTION ISOLATION LEVEL (SQL Server)