This is quite a basic and somewhat strange question, I guess. Suppose I have a stored procedure that contains an INSERT (or MERGE) statement, followed by a SELECT statement.
Can I always assume that the INSERT statement has finished writing/committing data when I run SELECT? Is it to be expected that the SELECT statement (sometimes) doesn't select all recently inserted rows? If so, what options do I have to make the SELECT statement wait for the INSERT statement to have finished (in a stored procedure) or include possibly uncommitted data?
If it is in the same session it will see it , whether committed or not, unless it has been rolled back.
Once committed other sessions can see it.
If 'select' is from different session and you want read uncommited data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Related
Given an UPDATE execution which takes 5 minutes or so, what happens when SELECT tries to retrieve data from the same table? For different Transaction Isolation Levels and SELECT WITH (NOLOCK), does SELECT wait for UPDATE? If not, does SELECT return old data (data before the UPDATE) or part of the currently inserted records (such as 50% of the records currently being inserted) ?
If found the following question, but it only describes what happens when you execute and UPDATE during a long SELECT.
SQL Server - does [SELECT] lock [UPDATE]?
I am using MS SQL Server 2012. Hopefully, this behaviour is consistent for different implementations.
This post by Gavin Draper explains it quite well and contains some example query's.
SQL Server Isolation Levels By Example
Isolation levels in SQL Server control the way locking works between
transactions.
SQL Server 2008 supports the following isolation levels
Read Uncommitted
Read Committed (The default)
Repeatable Read
Serializable
Snapshot
Before I run through each of these in detail you may want to create a
new database to run the examples, run the following script on the new
database to create the sample data. Note : You’ll also want to drop
the IsolationTests table and re-run this script before each example to
reset the data.
CREATE TABLE IsolationTests
(
Id INT IDENTITY,
Col1 INT,
Col2 INT,
Col3 INTupdate te
)
INSERT INTO IsolationTests(Col1,Col2,Col3)
SELECT 1,2,3
UNION ALL SELECT 1,2,3
UNION ALL SELECT 1,2,3
UNION ALL SELECT 1,2,3
UNION ALL SELECT 1,2,3
UNION ALL SELECT 1,2,3
UNION ALL SELECT 1,2,3
Also before we go any further it is important to understand these two
terms….
Dirty Reads – This is when you read uncommitted data, when doing this there is no guarantee that data read will ever be committed
meaning the data could well be bad.
Phantom Reads – This is when data that you are working with has been changed by another transaction since you first read it in.
This
means subsequent reads of this data in the same transaction could
well be different.
Read Uncommitted
This is the lowest isolation level there is. Read uncommitted causes
no shared locks to be requested which allows you to read data that is
currently being modified in other transactions. It also allows other
transactions to modify data that you are reading.
As you can probably imagine this can cause some unexpected results in
a variety of different ways. For example data returned by the select
could be in a half way state if an update was running in another
transaction causing some of your rows to come back with the updated
values and some not to.
To see read uncommitted in action lets run Query1 in one tab of
Management Studio and then quickly run Query2 in another tab before
Query1 completes.
Query1
BEGIN TRAN
UPDATE IsolationTests SET Col1 = 2
--Simulate having some intensive processing here with a wait
WAITFOR DELAY '00:00:10'
ROLLBACK
Query2
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM IsolationTests
Notice that Query2 will not wait for Query1 to finish, also more
importantly Query2 returns dirty data. Remember Query1 rolls back all
its changes however Query2 has returned the data anyway, this is
because it didn't wait for all the other transactions with exclusive
locks on this data it just returned what was there at the time.
There is a syntactic shortcut for querying data using the read
uncommitted isolation level by using the NOLOCK table hint. You
could change the above Query2 to look like this and it would do the
exact same thing.
SELECT * FROM IsolationTests WITH(NOLOCK)
Read Committed
This is the default isolation level and means selects will only return
committed data. Select statements will issue shared lock requests
against data you’re querying this causes you to wait if another
transaction already has an exclusive lock on that data. Once you have
your shared lock any other transactions trying to modify that data
will request an exclusive lock and be made to wait until your Read
Committed transaction finishes.
You can see an example of a read transaction waiting for a modify
transaction to complete before returning the data by running the
following Queries in separate tabs as you did with Read Uncommitted.
Query1
BEGIN TRAN
UPDATE Tests SET Col1 = 2
--Simulate having some intensive processing here with a wait
WAITFOR DELAY '00:00:10'
ROLLBACK
Query2
SELECT * FROM IsolationTests
Notice how Query2 waited for the first transaction to complete before
returning and also how the data returned is the data we started off
with as Query1 did a rollback. The reason no isolation level was
specified is because Read Committed is the default isolation level for
SQL Server. If you want to check what isolation level you are running
under you can run DBCC useroptions. Remember isolation levels are
Connection/Transaction specific so different queries on the same
database are often run under different isolation levels.
Repeatable Read
This is similar to Read Committed but with the additional guarantee
that if you issue the same select twice in a transaction you will get
the same results both times. It does this by holding on to the shared
locks it obtains on the records it reads until the end of the
transaction, This means any transactions that try to modify these
records are forced to wait for the read transaction to complete.
As before run Query1 then while its running run Query2
Query1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM IsolationTests
WAITFOR DELAY '00:00:10'
SELECT * FROM IsolationTests
ROLLBACK
Query2
UPDATE IsolationTests SET Col1 = -1
Notice that Query1 returns the same data for both selects even though
you ran a query to modify the data before the second select ran. This
is because the Update query was forced to wait for Query1 to finish
due to the exclusive locks that were opened as you specified
Repeatable Read.
If you rerun the above Queries but change Query1 to Read Committed you
will notice the two selects return different data and that Query2 does
not wait for Query1 to finish.
One last thing to know about Repeatable Read is that the data can
change between 2 queries if more records are added. Repeatable Read
guarantees records queried by a previous select will not be changed or
deleted, it does not stop new records being inserted so it is still
very possible to get Phantom Reads at this isolation level.
Serializable
This isolation level takes Repeatable Read and adds the guarantee that
no new data will be added eradicating the chance of getting Phantom
Reads. It does this by placing range locks on the queried data. This
causes any other transactions trying to modify or insert data touched
on by this transaction to wait until it has finished.
You know the drill by now run these queries side by side…
Query1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM IsolationTests
WAITFOR DELAY '00:00:10'
SELECT * FROM IsolationTests
ROLLBACK
Query2
INSERT INTO IsolationTests(Col1,Col2,Col3)
VALUES (100,100,100)
You’ll see that the insert in Query2 waits for Query1 to complete
before it runs eradicating the chance of a phantom read. If you change
the isolation level in Query1 to repeatable read, you’ll see the
insert no longer gets blocked and the two select statements in Query1
return a different amount of rows.
Snapshot
This provides the same guarantees as serializable. So what's the
difference? Well it’s more in the way it works, using snapshot doesn't
block other queries from inserting or updating the data touched by the
snapshot transaction. Instead row versioning is used so when data is
changed the old version is kept in tempdb so existing transactions
will see the version without the change. When all transactions that
started before the changes are complete the previous row version is
removed from tempdb. This means that even if another transaction has
made changes you will always get the same results as you did the first
time in that transaction.
So on the plus side your not blocking anyone else from modifying the
data whilst you run your transaction but…. You’re using extra
resources on the SQL Server to hold multiple versions of your changes.
To use the snapshot isolation level you need to enable it on the
database by running the following command
ALTER DATABASE IsolationTests
SET ALLOW_SNAPSHOT_ISOLATION ON
If you rerun the examples from serializable but change the isolation
level to snapshot you will notice that you still get the same data
returned but Query2 no longer waits for Query1 to complete.
Summary
You should now have a good idea how each of the different isolation
levels work. You can see how the higher the level you use the less
concurrency you are offering and the more blocking you bring to the
table. You should always try to use the lowest isolation level you can
which is usually read committed.
READ UNCOMMITTED: The SELECT can read all kinds of nasty inconsistencies. Old rows, new rows, duplicate rows, missing rows. It can also totally error out with the famous "data movement" error.
READ COMMITTED: Will block without snapshot isolation. Will return the old state with snapshot isolation in perfect consistency.
REPEATABLE READ/SERIALIZABLE: Will block.
SNAPSHOT: Will return the old state with snapshot isolation in perfect consistency.
It sounds like you should read a few concurrency tutorials. I have written these brief facts to get you started. To really understand what's going on to the point that you can make predictions (that come true) you need to go deeper than an answer on Stack Overflow can provide.
Most of the time, you want to use SNAPSHOT for read-only transactions. It takes away all concurrency concerns. Be aware that it has a few drawbacks.
I have read the explanations when a commit may be neccessary after a select statement for DB2 and MySQL:
Is a commit needed on a select query in DB2?
Should I commit after a single select
My question is when and why would it be important to commit after executing a select statement using Oracle?
If you did a SELECT ... FOR UPDATE; you would need a COMMIT or ROLLBACK to release the records held for update. Otherwise, I can't think of any reason to do this.
there are only a few situations that I can think of that you may want to commit after a select.
if your select is joining on database links, a transaction will be created. if you attempt to close this link, you'd get an error unless you committed/rolled back the transaction.
select for update (as DCookie says) to release the locks.
to remove an serialized isolation level if set or to add one, if you've been selecting from db links prior to invoking this.
I've noticed that MS SQL may begin another transaction just before a previous transaction is complete (or committed). Is there a way how we can ensure a transaction must complete first before the next transaction begins?
My problem is that I want to perform an SQL SELECT almost immediately after an SQL INSERT. What I'm seeing right now is; when the SELECT statement is run; it does not return the (very) recently inserted data.
As I traced this scenario using SQL profiler, I've noticed that the SQL INSERT and SELECT performs simultaneously, as in the SELECT occurs before the INSERT is completed.
Is there a way to fix this problem of mine? thanks!
From the sounds of it, you're looking for the OUTPUT clause
From the examples in the documentation http://msdn.microsoft.com/en-us/library/ms177564.aspx
DECLARE #MyTableVar table( NewScrapReasonID smallint,
Name varchar(50),
ModifiedDate datetime);
INSERT Production.ScrapReason
OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
INTO #MyTableVar
VALUES (N'Operator error', GETDATE());
You can run your transactions in SERIALIZABLE isolation level. In this way you will ensure that the select will be performed after the insert. In lower isolation levels, the select is performed in paralell and returns the snapshot of the data - the way it is seen with all transactions completed before the one that issues select has been started.
I'm guessing you want to get an auto-generated identifier back after the insert? I'm not sure the MSSQL way to do this, but in PostgreSQL, there is INSERT ... RETURNING extension to solve exactly this problem.
http://www.postgresql.org/docs/9.1/static/sql-insert.html
Are you locked into MSSQL?
How do I select all rows for a table, their isn't part of any transaction that hasn't committed yet?
Example:
Let's say,
Table T has 10 rows.
User A is doing a transaction with some queries:
INSERT INTO T (...)
SELECT ...
FROM T
// doing other queries
Now, here comes the tricky part:
What if User B, in the time between User A inserted the row and the transaction was committed, was updating a list in the system with a select on Table T.
I only want that the SELECT User B is using returned the 10 rows(all rows from the table, that can't later be rolled back). How do I do this, if it's even possible?
I have tried setting the isolationlevel on the transaction and adding "WITH(NOLOCK)" "WITH(READUNCOMMITTED)" to the query without any luck.
The query either return all 11 records or it's waiting for the transaction to commit, and that's not what I need.
Any tips is much appriciated, thanks.
You need to use (default) read committed isolation level and the READPAST hint to skip rows locked as they are not committed (rather than being blocked waiting for the locks to be released)
This does rely on the INSERT taking out rowlocks though. If it takes out page locks you will be back to being blocked. Example follows
Connection 1
IF OBJECT_ID('test_readpast') IS NULL
BEGIN
CREATE TABLE test_readpast(i INT PRIMARY KEY CLUSTERED)
INSERT INTO test_readpast VALUES (1)
END
BEGIN TRAN
INSERT INTO test_readpast
WITH(ROWLOCK)
--WITH(PAGLOCK)
VALUES (2)
SELECT * FROM sys.dm_tran_locks WHERE request_session_id=##SPID
WAITFOR DELAY '00:01';
ROLLBACK
Connection 2
SELECT i
FROM test_readpast WITH (readpast)
Snapshot isolation ?
Either I or else the three people who have answered early have misread/ misinterpreted your question, so I have given a link so you can determine for yourself.
Actually, read uncommitted and nolock are the same. They mean you get to see rows that have not been committed yet.
If you run at the default isolation level, read committed, you will not see new rows that have not been committed. This should work by default, but if you want to be sure, prefix your select with set transaction isolation level read committed.
I have a random question. If I were to do a sql select and while the sql server was querying my request someone else does a insert statement... could that data that was inputted in that insert statement also be retrieved from my select statement?
Queries are queued, so if the SELECT occurs before the INSERT there's no possibility of seeing the newly inserted data.
Using default isolation levels, SELECT is generally given higher privilege over others but still only reads COMMITTED data. So if the INSERT data has not been committed by the time the SELECT occurs--again, you wouldn't see the newly inserted data. If the INSERT has been committed, the subsequent SELECT will include the newly inserted data.
If the isolation level allowed reading UNCOMMITTED (AKA dirty) data, then yes--a SELECT occurring after the INSERT but before the INSERT data was committed would return that data. This is not recommended practice, because UNCOMMITTED data could be subject to a ROLLBACK.
If the SELECT statement is executed before the INSERT statement, the selected data will certainly not include the new inserted data.
What happens in MySQL with MyISAM, the default engine, is that all INSERT statements require a table lock; as a result, once an INSERT statement is executed, it first waits for all existing SELECTs to complete before locking the table, performs the INSERT, and then unlocks it.
For more information, see: Internal Locking Methods in the MySQL manual
No, a SELECT that is already executing that the moment of the INSERT will never gather new records that did not exist when the SELECT statement started executing.
Also if you use the transactional storage engine InnoDB, you can be assured that your SELECT will not include rows that are currently being inserted. That's the purpose of transaction isolation, or the "I" in ACID.
For more details see http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html because there are some nuances about read-committed and read-uncommitted transaction isolation modes.
I don't know particulars for MySQL, but in SQL Server it would depend on if there were any locking hints used, and the default behavior for locks. You have a couple of options:
Your SELECT locks the table, which means the INSERT won't process until your select is finished.
Your SELECT is able to do a "dirty read" which means the transaction doesn't care if you get slightly out-of-date data, and you miss the INSERT
Your SELECT is able to do a "dirty read" but the INSERT happens before the SELECT hits that row, and you get the result that was added.
The only way you do that is with a "dirty read".
Take a look at MYSql's documentation on TRANSACTION ISOLATION LEVELS to get a better understanding of what that is.