Check if table data has changed? - sql

I am pulling the data from several tables and then passing the data to a long running process. I would like to be able to record what data was used for the process and then query the database to check if any of the tables have changed since the process was last run.
Is there a method of solving this problem that should work across all sql databases?
One possible solution that I've thought of is having a separate table that is only used for keeping track of whether the data has changed since the process was run. The table contains a "stale" flag. When I start running the process, stale is set to false. If any creation, update, or deletion occurs in any of the tables on which the operation depends, I set stale to true. Is this a valid solution? Are there better solutions?
One concern with my solution is situations like this:
One user starts inserting a new row into one of the tables. Stale gets set to true, but the new row has not actually been added yet. Another user has simultaneously started the long running process, pulling the data from the tables and setting the flag to false. The row is finally added. Now the data used for the process is out of date but the flag indicates it is not stale. Would transactions be able to solve this problem?
This is some SQL for my idea. Not sure if it works, but just to give you a better idea of what I was thinking:
-- First transaction reads the data and sets the flag to false
UPDATE flag SET stale = false
-- Second transaction updates the data and sets the flag to true
UPDATE data SET val = 15 WHERE ID = 10
UPDATE flag SET stale = true
I do not have much experience with transactions or handwriting SQL, so there are probably issues with this. From what I understand, two serializable transactions cannot be interleaved. Please correct me if I'm wrong.
Is there a way to accomplish this with only the first transaction? The process will be run rarely, but the updates to the data table will occur more frequently, so it would be nice to not lock up the data table when performing updates.
Also, is the SET TRANSACTION ISOLATION syntax specific to MS?

The stale flag will probably work, but a timestamp would be better since it provides more metadata about the age of the records which could be used to tune your queries, e.g., only pull data that is over 5 minutes old.
To address your concern about inserting a row at the same time a query is run, transactions with an appropriate isolation level will help. For row inserts, updates, and selects, at least use a transaction with an isolation level that prevents dirty reads so that no other connections can see the updated data until the transaction is committed.
If you are strongly concerned about the case where an update happens at the same time as a record pull, you could use the REPEATABLE READ or even SERIALIZABLE isolation levels, but this will slow DB access down.
Your SQL Server sample should work. For other databases, here's an example that works in PostgreSQL:
Transaction 1
-- run queries that update the tables, then set last_updated column
UPDATE sometable SET last_updated = now() WHERE id = 1;
Transaction 2
-- select data from tables, then set last_queried column
UPDATE sometable SET last_queried = now() WHERE id = 1;
If transaction 1 starts, and then transaction 2 starts before transaction 1 has completed, transaction 2 will block on the update, and then will throw an error when transaction 1 is committed. If transaction 2 starts first, and transaction 1 starts before it has finished, then transaction 1 will error. Your application code or process should be able to handle those errors.
Other databases use similar syntax - MySQL (with InnoDB plugin) requires you to set the isolation level before you start the transaction.
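A variation on the stale flag and timestamp ideas is a version counter that the writing transaction itself increments. The following is only a minimal sketch, using Python's built-in sqlite3 module as a stand-in database; the change_log table, the trigger names, and the helper functions are hypothetical and not part of the question. Because the counter is bumped inside the same transaction as the data change, a reader that fetches the counter and the data in one transaction should get a matching pair, which is the case the question worries about.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements below control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER);
    -- Hypothetical bookkeeping table: one counter per tracked table.
    CREATE TABLE change_log (name TEXT PRIMARY KEY, version INTEGER NOT NULL);
    INSERT INTO change_log VALUES ('data', 0);

    -- Hypothetical triggers: every write to `data` bumps the counter
    -- as part of the writer's own transaction.
    CREATE TRIGGER data_ins AFTER INSERT ON data
    BEGIN UPDATE change_log SET version = version + 1 WHERE name = 'data'; END;
    CREATE TRIGGER data_upd AFTER UPDATE ON data
    BEGIN UPDATE change_log SET version = version + 1 WHERE name = 'data'; END;
    CREATE TRIGGER data_del AFTER DELETE ON data
    BEGIN UPDATE change_log SET version = version + 1 WHERE name = 'data'; END;

    INSERT INTO data VALUES (10, 1);
""")


def current_version(conn):
    sql = "SELECT version FROM change_log WHERE name = 'data'"
    return conn.execute(sql).fetchall()[0][0]


def take_snapshot(conn):
    """Read the counter and the rows inside one explicit transaction."""
    conn.execute("BEGIN")
    try:
        version = current_version(conn)
        rows = conn.execute("SELECT id, val FROM data ORDER BY id").fetchall()
    finally:
        conn.execute("COMMIT")
    return version, rows


def is_stale(conn, seen_version):
    """True if any tracked write committed since the snapshot was taken."""
    return current_version(conn) != seen_version


seen_version, rows = take_snapshot(conn)   # input for the long-running process
stale_before = is_stale(conn, seen_version)
conn.execute("UPDATE data SET val = 15 WHERE id = 10")  # some later write
stale_after = is_stale(conn, seen_version)
print(stale_before, stale_after)
```

Storing the version that was read, rather than resetting a shared flag, means the reader never writes to the bookkeeping table, so two readers cannot overwrite each other's state.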


Unexpected behaviour of the Serializable isolation level

Test setup
I have a SQL Server 2014 and a simple table MyTable that contains columns Code (int) and Data (nvarchar(50)), no indexes created for this table.
I have 4 records in the table in the following manner:
1, First
2, Second
3, Third
4, Fourth
Then I run the following query in a transaction:
DELETE FROM dbo.MyTable
WHERE dbo.MyTable.Code = 2
I have one affected row and I don't issue either Commit or Rollback.
Next I start yet another transaction:
SELECT TOP 10 Code, Data
FROM dbo.MyTable
WHERE Code = 3
At this step the transaction with the SELECT query hangs waiting for completion of the transaction with the DELETE query.
My question
I don't understand why the transaction with the SELECT query is waiting for the transaction with the DELETE query. In my understanding, the deleted row (with code 2) has nothing to do with the selected row (with code 3), and as far as I understand the SERIALIZABLE isolation level, SQL Server shouldn't lock the entire table in this case. Maybe this happens because the minimum lock granularity for SERIALIZABLE is a page? But that would produce inconsistent behavior when selecting rows from other pages if the table had more rows, say 1000000 (rows on other pages wouldn't be locked). Please help me figure out why the locking takes place in my case.
Under locking READ COMMITTED, REPEATABLE READ, or SERIALIZABLE a SELECT query must place Shared (S) locks for every row the query plan actually reads. The locks can be placed either at the row-level, page-level, or table-level. Additionally SERIALIZABLE will place locks on ranges, so that no other session could insert a matching row while the lock is held.
And because you have "no indexes created for this table", this query:
SELECT TOP 10 Code, Data
FROM dbo.MyTable
WHERE Code = 3
Has to be executed with a table scan, and it must read all the rows (even those with Code=2) to determine whether they qualify for the SELECT.
This is one reason why you should almost always use Row-Versioning, either by setting the database to READ COMMITTED SNAPSHOT, or by coding read-only transactions to use SNAPSHOT isolation.
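To make the table-scan point concrete, here is a small sketch that uses SQLite (through Python's sqlite3 module) as a stand-in, since it can print its query plan without a server. SQLite's locking is not the same as SQL Server's, so this only illustrates the access path: without an index every row is visited, while with an index on Code the engine can go straight to the matching key. The index name ix_mytable_code is an assumption for illustration; the question's table has no index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Code INTEGER, Data TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "First"), (2, "Second"), (3, "Third"), (4, "Fourth")],
)


def plan(conn, sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT Code, Data FROM MyTable WHERE Code = 3"

plan_without_index = plan(conn, query)   # full scan: every row is read

# Hypothetical index, not present in the question's setup.
conn.execute("CREATE INDEX ix_mytable_code ON MyTable (Code)")
plan_with_index = plan(conn, query)      # index search on Code

print(plan_without_index)
print(plan_with_index)
```

In SQL Server, a comparable index on Code would typically let the SELECT lock only the matching key range instead of touching the row that the DELETE has locked, but that should be confirmed against the actual execution plan.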

How to implement Serializable Isolation Level in SQL Server

I need to implement a serializable isolation level in SQL Server but I've tried many ways and I don't get it.
I need to lock 1 row in one transaction (it doesn't matter if it locks the complete table), so that another transaction can't even select (read) the row.
The last thing I tried:
For transaction 1:
SELECT code FROM table1 WHERE code = 1
-- Here I select in another instance the same row
For transaction 2:
SELECT code FROM table1 WHERE code = 1
I would expect transaction 2 to wait until transaction 1 commits, but transaction 2 returns the row.
Can anyone explain what I am missing?
SQL Server conforms to the strict definition of a Serializable query. That is, there must be a result that can logically be generated IF both queries ran in serial order - Transaction 1 finishing before Transaction 2 can start, or vice versa.
This results in some effects that can be different than you would expect. There is a great explanation of the Serializable isolation level over at that makes clear some of what this logical serializability ends up meaning. (Very helpful site, that one.)
For your above queries, there is no logical requirement to prevent the second query from reading the same row as the first query. No matter in what order the queries are run, they will both return the same data without modifying it. Since the Query Analyzer can identify this, there is no reason to place a read lock on the data. However, if one of the queries performed an update on the data, then (warning - logic assumption here, since I don't actually know the internals of how SQL Server handles this) the QA would set a stronger lock on the selected rows.
TL;DR - SQL Server wants to minimize blocking, so it uses logical analysis to see what types of locks are needed for a serializable isolation level, and it (tries to) use the minimum number and strength of locks needed to achieve its goal.
Now that we've dealt with that - there are only two ways that I can think of to lock a row so that no one else can read it: using XLOCK + TABLOCK (locking the whole table - not a recommended practice) or having some form of a field on each row that is updated when you start your process - something like an SPID field, or a bit flag for Locked. When you update it within your transaction, only SELECTs with NOLOCK hints will be able to read it.
Clearly, neither of these are optimal. I recommend the "This row is busy - go away" flag, as that's probably the approach I would take for an (almost) absolute lock on a row.
According to the documentation:
SERIALIZABLE Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the
current transaction until the current transaction completes.
If you're not making any changes to data with an INSERT, UPDATE, or DELETE inside transaction 1, SQL will release the Shared Lock after the read completes.
What you might want to try is adding a table hint to prevent the row lock from being released until the end of transaction 1.
WHERE code = 1
Maybe you can solve this with some hack like this?
UPDATE someTableForThisHack set val = CASE WHEN val = 1 THEN 0 else 1 End
SELECT code from table1.....
So you create a table someTableForThisHack and insert one row to it.

Row locks on select

I have a stored procedure which does the following:
selects top N from table
sets these rows as processed
returns these rows to the client
Here is roughly how I am doing it in Sybase ASE:
set rowcount @count
begin tran get_items
insert into #temp_table
select item
from available_item
where is_processed = 0
update available_item
set is_processed = 1
from available_item, #temp_table
where available_item.item = #temp_table.item
-- select the processed items...
commit tran get_items
I am wondering whether there is a race condition here. If two separate processes execute this stored procedure at the same time, could they select and mark processed the same data? Or does having it in a transaction stop this?
If not, is there a way to hold locks on selected rows?
Some of the details will depend on your table's locking scheme. Allpages, pages, and row level locking will have different impacts on your ability to run concurrent updates on a single table. I am assuming a page/row level scheme to allow for concurrency.
Your query will grab an initial shared page/row lock, which will be upgraded to an update lock, which will then be followed by an exclusive row lock on the updated pages/rows. No other processes will be able to make changes to the selected pages/rows for the duration of the transaction, but another process could read the selected rows prior to your update, which could lead to some inconsistency.
To get around this possibility, you can set the transaction's isolation level to either level 2 (repeatable read) or level 3 (serializable). You may want to read up on the specifics of each level to decide which one you want to enforce, and the trade-offs associated with it.
In your transaction, you would use it like this:
set rowcount @count
set transaction isolation level 2
Note that, depending on the number of records you grab in your query, you could trigger a lock escalation, which could prevent your concurrent transactions from executing even if they are not looking at the same rows as your first transaction. By default, the server will attempt to escalate to a table lock if it acquires locks on more than 200 pages/rows. This threshold can be changed to an explicit value, or to a range of values and a percentage, and is configurable at the server, database, or table level.
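A different way to avoid the read-then-mark race is to reverse the order: mark the rows first in a single UPDATE, tagging them with a token unique to the caller, and then read back only the rows carrying that token. This is a swapped-in technique, not the procedure from the question, and the sketch below runs against SQLite through Python's sqlite3 module rather than Sybase ASE. The claimed_by column and the claim_items function are hypothetical names.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.executescript("""
    CREATE TABLE available_item (
        item         TEXT PRIMARY KEY,
        is_processed INTEGER NOT NULL DEFAULT 0,
        claimed_by   TEXT   -- hypothetical column, not in the original schema
    );
    INSERT INTO available_item (item) VALUES ('a'), ('b'), ('c'), ('d'), ('e');
""")


def claim_items(conn, count):
    """Mark up to `count` unprocessed items and return exactly those items."""
    token = uuid.uuid4().hex            # identifies this caller's batch
    conn.execute("BEGIN IMMEDIATE")     # SQLite-specific: take the write lock now
    try:
        conn.execute(
            """UPDATE available_item
                  SET is_processed = 1, claimed_by = ?
                WHERE item IN (SELECT item FROM available_item
                                WHERE is_processed = 0
                                ORDER BY item
                                LIMIT ?)""",
            (token, count),
        )
        rows = conn.execute(
            "SELECT item FROM available_item WHERE claimed_by = ? ORDER BY item",
            (token,),
        ).fetchall()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [row[0] for row in rows]


first = claim_items(conn, 2)
second = claim_items(conn, 2)
third = claim_items(conn, 2)
print(first, second, third)
```

Since the rows are selected and marked by one statement, a second caller cannot pick up rows in the gap between a select and an update; it either sees them already marked or waits for the write lock.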
Relevant Documentation:
Transaction: Maintaining Data Consistency and Recovery
Performance and Tuning Series: Locking and Concurrency Control
Transact-SQL Users Guide 15.7 > Transactions: Maintaining Data Consistency and Recovery

SELECT during a lengthy UPDATE - What happens to SELECT for different Transaction Isolation Levels and SELECT WITH (NOLOCK)?

Given an UPDATE execution which takes 5 minutes or so, what happens when SELECT tries to retrieve data from the same table? For different Transaction Isolation Levels and SELECT WITH (NOLOCK), does SELECT wait for UPDATE? If not, does SELECT return old data (data before the UPDATE) or part of the currently inserted records (such as 50% of the records currently being inserted) ?
I found the following question, but it only describes what happens when you execute an UPDATE during a long SELECT.
SQL Server - does [SELECT] lock [UPDATE]?
I am using MS SQL Server 2012. Hopefully, this behaviour is consistent for different implementations.
This post by Gavin Draper explains it quite well and contains some example queries.
SQL Server Isolation Levels By Example
Isolation levels in SQL Server control the way locking works between
transactions.
SQL Server 2008 supports the following isolation levels
Read Uncommitted
Read Committed (The default)
Repeatable Read
Serializable
Snapshot
Before I run through each of these in detail you may want to create a
new database to run the examples, run the following script on the new
database to create the sample data. Note : You’ll also want to drop
the IsolationTests table and re-run this script before each example to
reset the data.
CREATE TABLE IsolationTests
(
    Col1 INT,
    Col2 INT,
    Col3 INT
)
INSERT INTO IsolationTests(Col1,Col2,Col3)
SELECT 1,2,3
Also before we go any further it is important to understand these two
terms:
Dirty Reads – This is when you read uncommitted data. When doing this there is no guarantee that the data read will ever be committed,
meaning the data could well be bad.
Phantom Reads – This is when data that you are working with has been changed by another transaction since you first read it in. This
means subsequent reads of this data in the same transaction could
well be different.
Read Uncommitted
This is the lowest isolation level there is. Read uncommitted causes
no shared locks to be requested which allows you to read data that is
currently being modified in other transactions. It also allows other
transactions to modify data that you are reading.
As you can probably imagine this can cause some unexpected results in
a variety of different ways. For example data returned by the select
could be in a half way state if an update was running in another
transaction causing some of your rows to come back with the updated
values and some not to.
To see read uncommitted in action let's run Query1 in one tab of
Management Studio and then quickly run Query2 in another tab before
Query1 completes.
Query1
BEGIN TRAN
UPDATE IsolationTests SET Col1 = 2
--Simulate having some intensive processing here with a wait
WAITFOR DELAY '00:00:10'
ROLLBACK
Query2
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM IsolationTests
Notice that Query2 will not wait for Query1 to finish and, more
importantly, Query2 returns dirty data. Remember Query1 rolls back all
its changes, yet Query2 has returned the data anyway. This is
because it didn't wait for the other transactions holding exclusive
locks on this data; it just returned what was there at the time.
There is a syntactic shortcut for querying data using the read
uncommitted isolation level by using the NOLOCK table hint. You
could change the above Query2 to look like this and it would do the
exact same thing.
SELECT * FROM IsolationTests WITH (NOLOCK)
Read Committed
This is the default isolation level and means selects will only return
committed data. Select statements will issue shared lock requests
against the data you're querying; this causes you to wait if another
transaction already has an exclusive lock on that data. Once you have
your shared lock any other transactions trying to modify that data
will request an exclusive lock and be made to wait until your Read
Committed transaction finishes.
You can see an example of a read transaction waiting for a modify
transaction to complete before returning the data by running the
following Queries in separate tabs as you did with Read Uncommitted.
Query1
BEGIN TRAN
UPDATE IsolationTests SET Col1 = 2
--Simulate having some intensive processing here with a wait
WAITFOR DELAY '00:00:10'
ROLLBACK
Query2
SELECT * FROM IsolationTests
Notice how Query2 waited for the first transaction to complete before
returning and also how the data returned is the data we started off
with as Query1 did a rollback. The reason no isolation level was
specified is because Read Committed is the default isolation level for
SQL Server. If you want to check what isolation level you are running
under you can run DBCC useroptions. Remember isolation levels are
Connection/Transaction specific so different queries on the same
database are often run under different isolation levels.
Repeatable Read
This is similar to Read Committed but with the additional guarantee
that if you issue the same select twice in a transaction you will get
the same results both times. It does this by holding on to the shared
locks it obtains on the records it reads until the end of the
transaction. This means any transactions that try to modify these
records are forced to wait for the read transaction to complete.
As before, run Query1, then while it's running run Query2.
Query1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM IsolationTests
WAITFOR DELAY '00:00:10'
SELECT * FROM IsolationTests
ROLLBACK
Query2
UPDATE IsolationTests SET Col1 = -1
Notice that Query1 returns the same data for both selects even though
you ran a query to modify the data before the second select ran. This
is because the Update query was forced to wait for Query1 to finish
due to the shared locks that Query1 holds until its transaction ends
under Repeatable Read.
If you rerun the above Queries but change Query1 to Read Committed you
will notice the two selects return different data and that Query2 does
not wait for Query1 to finish.
One last thing to know about Repeatable Read is that the data can
change between 2 queries if more records are added. Repeatable Read
guarantees records queried by a previous select will not be changed or
deleted, it does not stop new records being inserted so it is still
very possible to get Phantom Reads at this isolation level.
Serializable
This isolation level takes Repeatable Read and adds the guarantee that
no new data will be added eradicating the chance of getting Phantom
Reads. It does this by placing range locks on the queried data. This
causes any other transactions trying to modify or insert data touched
on by this transaction to wait until it has finished.
You know the drill by now: run these queries side by side…
Query1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM IsolationTests
WAITFOR DELAY '00:00:10'
SELECT * FROM IsolationTests
ROLLBACK
Query2
INSERT INTO IsolationTests(Col1,Col2,Col3)
VALUES (100,100,100)
You’ll see that the insert in Query2 waits for Query1 to complete
before it runs eradicating the chance of a phantom read. If you change
the isolation level in Query1 to repeatable read, you’ll see the
insert no longer gets blocked and the two select statements in Query1
return a different amount of rows.
Snapshot
This provides the same guarantees as serializable. So what's the
difference? Well it’s more in the way it works, using snapshot doesn't
block other queries from inserting or updating the data touched by the
snapshot transaction. Instead row versioning is used so when data is
changed the old version is kept in tempdb so existing transactions
will see the version without the change. When all transactions that
started before the changes are complete the previous row version is
removed from tempdb. This means that even if another transaction has
made changes you will always get the same results as you did the first
time in that transaction.
So on the plus side you're not blocking anyone else from modifying the
data whilst you run your transaction, but you're using extra
resources on the SQL Server to hold multiple versions of your changes.
To use the snapshot isolation level you need to enable it on the
database by running the following command
If you rerun the examples from serializable but change the isolation
level to snapshot you will notice that you still get the same data
returned but Query2 no longer waits for Query1 to complete.
You should now have a good idea how each of the different isolation
levels work. You can see how the higher the level you use the less
concurrency you are offering and the more blocking you bring to the
table. You should always try to use the lowest isolation level you can
which is usually read committed.
READ UNCOMMITTED: The SELECT can read all kinds of nasty inconsistencies. Old rows, new rows, duplicate rows, missing rows. It can also totally error out with the famous "data movement" error.
READ COMMITTED: Will block without snapshot isolation. Will return the old state with snapshot isolation in perfect consistency.
SNAPSHOT: Will return the old state with snapshot isolation in perfect consistency.
It sounds like you should read a few concurrency tutorials. I have written these brief facts to get you started. To really understand what's going on to the point that you can make predictions (that come true) you need to go deeper than an answer on Stack Overflow can provide.
Most of the time, you want to use SNAPSHOT for read-only transactions. It takes away all concurrency concerns. Be aware that it has a few drawbacks.
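The "returns the old state" behaviour of row versioning can be tried out without a SQL Server instance. The sketch below uses SQLite in WAL mode through Python's sqlite3 module as a rough analogue: it is not SQL Server's SNAPSHOT implementation, and the file name demo.db and the helper read_col1 are made up for the example. A reader that has already started its transaction keeps seeing the value from when it started, even though a writer commits a change in the meantime, and the writer is not blocked.

```python
import os
import sqlite3
import tempfile


def read_col1(conn):
    return conn.execute("SELECT Col1 FROM IsolationTests").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # WAL mode needs a file-backed database

    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE IsolationTests (Col1 INTEGER)")
    writer.execute("INSERT INTO IsolationTests VALUES (1)")

    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")              # reader's transaction starts here
    first_read = read_col1(reader)       # the first read fixes the reader's view

    # The writer commits a change while the reader's transaction is open.
    writer.execute("UPDATE IsolationTests SET Col1 = 2")

    second_read = read_col1(reader)      # still the old value
    reader.execute("COMMIT")
    after_commit = read_col1(reader)     # a new transaction sees the change

    reader.close()
    writer.close()

print(first_read, second_read, after_commit)
```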

Does a SQL UPDATE operation read data to "local memory"?

This answer quotes this Technet article which explains the two interpretations of lost updates:
A lost update can be interpreted in one of two ways. In the first scenario, a lost update is considered to have taken place when data that has been updated by one transaction is overwritten by another transaction, before the first transaction is either committed or rolled back. This type of lost update cannot occur in SQL Server 2005 because it is not allowed under any transaction isolation level.
The other interpretation of a lost update is when one transaction (Transaction #1) reads data into its local memory, and then another transaction (Transaction #2) changes this data and commits its change. After this, Transaction #1 updates the same data based on what it read into memory before Transaction #2 was executed. In this case, the update performed by Transaction #2 can be considered a lost update.
So it looks like the difference is that in the first scenario the whole update happens out of "local memory" while in the second one there's "local memory" used and this makes a difference.
Suppose I have the following code:
UPDATE MagicTable SET MagicColumn = MagicColumn + 10 WHERE SomeCondition
Does this involve "local memory"? Is it prone to the first or to the second interpretation of lost updates?
I suppose it would come under the second interpretation.
However the way this type of UPDATE is implemented in SQL Server a lost update is still not possible. Rows read for the update are protected with a U lock (converted to X lock when the row is actually updated).
U locks are not compatible with other U locks (or X locks)
So at all isolation levels if two concurrent transactions were to run this statement then one of them would end up blocked behind the other transaction's U lock or X lock and would not be able to proceed until that transaction completes.
Therefore it is not possible for lost updates to occur with this pattern in SQL Server at any isolation level.
To achieve a lost update you would need to do something like
DECLARE @MagicColumn INT;
/*Two concurrent transactions can both read the same pre-update value*/
SELECT @MagicColumn = MagicColumn FROM MagicTable WHERE SomeCondition
UPDATE MagicTable SET MagicColumn = @MagicColumn + 10 WHERE SomeCondition
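The difference between the two patterns can be reproduced deterministically by interleaving the steps by hand. The sketch below uses SQLite through Python's sqlite3 module and a single connection, so it only simulates two sessions (called A and B here) and says nothing about SQL Server's U locks. The Id column and the helper names are assumptions added for the example.

```python
import sqlite3


def make_table():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE MagicTable (Id INTEGER PRIMARY KEY, MagicColumn INTEGER)"
    )
    conn.execute("INSERT INTO MagicTable VALUES (1, 0)")
    return conn


def current_value(conn):
    sql = "SELECT MagicColumn FROM MagicTable WHERE Id = 1"
    return conn.execute(sql).fetchall()[0][0]


# Pattern 1: read into a local variable, then write it back.
# Both simulated sessions read before either one writes.
conn = make_table()
seen_by_a = current_value(conn)
seen_by_b = current_value(conn)
write_back = "UPDATE MagicTable SET MagicColumn = ? WHERE Id = 1"
conn.execute(write_back, (seen_by_a + 10,))
conn.execute(write_back, (seen_by_b + 10,))   # overwrites A's increment
read_then_write_result = current_value(conn)
conn.close()

# Pattern 2: let the UPDATE statement compute the new value itself.
conn = make_table()
increment = "UPDATE MagicTable SET MagicColumn = MagicColumn + 10 WHERE Id = 1"
conn.execute(increment)   # session A
conn.execute(increment)   # session B
single_statement_result = current_value(conn)
conn.close()

print(read_then_write_result, single_statement_result)
```

Two increments of 10 leave 10 in the first pattern and 20 in the second, which is the lost update described in the second interpretation.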