I have a stored procedure which does the following:
selects top N from table
sets these rows as processed
returns these rows to the client
Here is roughly how I am doing it in Sybase ASE:
set rowcount @count
begin tran get_items
insert into #temp_table
select item
from available_item
where is_processed = 0
update available_item
set is_processed = 1
from available_item, #temp_table
where available_item.item = #temp_table.item
-- select the processed items...
commit tran
I am wondering whether there is a race condition here. If two separate processes execute this stored procedure at the same time, could they select and mark processed the same data? Or does having it in a transaction stop this?
If not, is there a way to hold locks on selected rows?
Some of the details will depend on your table's locking scheme. Allpages, datapages, and datarows locking will have different impacts on your ability to run concurrent updates against a single table. I am assuming a datapages/datarows scheme here, since it allows for more concurrency.
Your query will grab an initial shared page/row lock, which will be upgraded to an update lock, which will then be followed by an exclusive row lock on the updated pages/rows. No other processes will be able to make changes to the selected pages/rows for the duration of the transaction, but another process could read the selected rows prior to your update, which could lead to some inconsistency.
To get around this possibility, you can set the isolation level in the transaction to either isolation level 2 (repeatable reads) or isolation level 3 (serializable reads). You may want to read up on the specifics of each level to decide which you want to enforce, and the trade-offs associated with it.
In your transaction, you would use it like this:
set rowcount @count
set transaction isolation level 2
...
Something to note is that depending on the number of records you grab in your query, you could trigger a lock promotion, which could prevent your concurrent transactions from executing even if they are not touching the same rows as your first transaction. By default, the server will attempt to escalate to a table lock once it has acquired locks on more than 200 pages/rows. This threshold can be changed, either to an explicit value or to a range of values and a percentage, and is configurable at the server, database, or table level.
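If a single call can mark more than a couple hundred rows, you may want to raise that threshold for this table. A sketch, assuming a datarows-locked table and ASE's stock sp_setrowlockpromote procedure (the water marks and percentage below are illustrative, not recommendations; datapages-locked tables use sp_setpglockpromote instead):

-- promote to a table lock only after 2000 row locks, instead of 200
exec sp_setrowlockpromote "table", available_item, 2000, 2000, 100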
Relevant Documentation:
Transactions: Maintaining Data Consistency and Recovery
Performance and Tuning Series: Locking and Concurrency Control
Transact-SQL Users Guide 15.7 > Transactions: Maintaining Data Consistency and Recovery
Test setup
I have SQL Server 2014 and a simple table MyTable that contains the columns Code (int) and Data (nvarchar(50)), with no indexes created for this table.
I have 4 records in the table in the following manner:
1, First
2, Second
3, Third
4, Fourth
Then I run the following query in a transaction:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
DELETE FROM dbo.MyTable
WHERE dbo.MyTable.Code = 2
I have one affected row and I don't issue either Commit or Rollback.
Next I start yet another transaction:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT TOP 10 Code, Data
FROM dbo.MyTable
WHERE Code = 3
At this step the transaction with the SELECT query hangs waiting for completion of the transaction with the DELETE query.
My question
I don't understand why the transaction with the SELECT query waits for the transaction with the DELETE query. In my understanding, the deleted row (with Code 2) has nothing to do with the selected row (with Code 3), and as far as I understand the specifics of the SERIALIZABLE isolation level, SQL Server shouldn't lock the entire table in this case. Maybe this happens because the minimum locking granularity for SERIALIZABLE is a page? Then it could produce inconsistent behavior for selecting rows from other pages if the table had more rows, say 1,000,000 (rows on other pages wouldn't be locked then). Please help me figure out why the locking takes place in my case.
Under locking READ COMMITTED, REPEATABLE READ, or SERIALIZABLE, a SELECT query must place Shared (S) locks for every row the query plan actually reads. The locks can be placed at the row, page, or table level. Additionally, SERIALIZABLE places locks on ranges, so that no other session can insert a matching row while the lock is held.
And because you have "no indexes created for this table", this query:
SELECT TOP 10 Code, Data
FROM dbo.MyTable
WHERE Code = 3
has to be executed with a table scan, and it must read all the rows (even those with Code = 2) to determine whether they qualify for the SELECT.
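While both transactions are open, you can watch this from a third connection. A minimal sketch, assuming the standard sys.dm_tran_locks DMV and a placeholder database name MyDb; the DELETE session shows granted exclusive locks, and the blocked SELECT shows a shared-lock request with request_status = 'WAIT':

SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('MyDb');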
This is one reason why you should almost always use Row-Versioning, either by setting the database to READ COMMITTED SNAPSHOT, or by coding read-only transactions to use SNAPSHOT isolation.
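For example, a minimal sketch of switching the database to row-versioned reads (MyDb is a placeholder; the change needs a moment with no other active connections to the database):

ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

With versioning on, the blocked SELECT above would read the last committed version of the rows instead of waiting. Alternatively, an index on Code (the name here is illustrative) would let that particular query seek straight to the qualifying row rather than scan every row:

CREATE INDEX IX_MyTable_Code ON dbo.MyTable (Code);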
I need to implement the SERIALIZABLE isolation level in SQL Server, but I've tried many ways and I can't get it to work.
I need to lock one row in a transaction (it doesn't matter if it locks the complete table), so that another transaction can't even select the row (no reads at all).
The last thing I tried:
For transaction 1:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT code FROM table1 WHERE code = 1
-- Here I select in another instance the same row
COMMIT TRAN
For transaction 2:
BEGIN TRAN
SELECT code FROM table1 WHERE code = 1
COMMIT TRAN
I would expect transaction 2 to wait until transaction 1 commits the operation, but transaction 2 gives me the row immediately.
Can anyone explain what I am missing?
SQL Server conforms to the strict definition of serializability: there must be a result that could logically be generated if both transactions ran in serial order, transaction 1 finishing before transaction 2 starts, or vice versa.
This results in some effects that can be different than you would expect. There is a great explanation of the Serializable isolation level over at SQLPerformance.com that makes clear some of what this logical serializability ends up meaning. (Very helpful site, that one.)
For your queries above, there is no logical requirement to prevent the second query from reading the same row as the first query. No matter in which order the queries run, they will both return the same data without modifying it. Since the engine can identify this, there is no reason to place a read lock on the data. However, if one of the queries performed an update on the data, then (warning - logical assumption here, since I don't actually know the internals of how SQL Server handles this) the engine would take a stronger lock on the selected rows.
TL;DR - SQL Server wants to minimize blocking, so it uses logical analysis to see what types of locks are needed for a serializable isolation level, and it (tries to) use the minimum number and strength of locks needed to achieve its goal.
Now that we've dealt with that - there are only two ways that I can think of to lock a row so that no one else can read it: using XLOCK + TABLOCK (locking the whole table - not a recommended practice) or having some form of a field on each row that is updated when you start your process - something like an SPID field, or a bit flag for Locked. When you update it within your transaction, only SELECTs with NOLOCK hints will be able to read it.
Clearly, neither of these are optimal. I recommend the "This row is busy - go away" flag, as that's probably the approach I would take for an (almost) absolute lock on a row.
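A minimal sketch of that flag approach, assuming a hypothetical Locked bit column added to the table; the uncommitted UPDATE holds an exclusive row lock, so ordinary readers block until the COMMIT, exactly as described above:

BEGIN TRAN
-- claim the row; the exclusive lock from this update blocks
-- ordinary readers (everything except NOLOCK) until commit
UPDATE table1 SET Locked = 1 WHERE code = 1

-- ... do the work that needs the row kept private ...

UPDATE table1 SET Locked = 0 WHERE code = 1
COMMIT TRAN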
According to the documentation:
SERIALIZABLE
Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
If you're not making any changes to data with an INSERT, UPDATE, or DELETE inside transaction 1, SQL Server will release the shared lock after the read completes.
What you might want to try is adding a table hint to prevent the row lock from being released until the end of transaction 1.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT code
FROM table1 WITH(ROWLOCK, HOLDLOCK)
WHERE code = 1
COMMIT TRAN
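If you need plain SELECTs from other sessions to block as well, another variant worth trying takes an exclusive rather than shared row lock and holds it to the end of the transaction. Note that SQL Server can still serve some reads without taking row locks at all when it can prove the data is unchanged, so treat this as a sketch to test against your workload:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT code
FROM table1 WITH (XLOCK, ROWLOCK, HOLDLOCK)
WHERE code = 1
COMMIT TRAN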
Maybe you can solve this with some hack like this?
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE someTableForThisHack SET val = CASE WHEN val = 1 THEN 0 ELSE 1 END
SELECT code from table1.....
COMMIT TRANSACTION
So you create a table someTableForThisHack and insert one row into it. The dummy UPDATE makes both transactions writers of the same row, so under SERIALIZABLE the second transaction must wait for (or will conflict with) the first, instead of both proceeding as pure readers.
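For completeness, the one-time setup might look like this (the bit column type is an assumption):

CREATE TABLE someTableForThisHack (val bit NOT NULL);
INSERT INTO someTableForThisHack (val) VALUES (0);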
I believe that each SELECT statement in SQL Server will cause either a Shared or Key lock to be placed. But will it place that same type of lock when it is in a transaction? Will Shared or Key locks allow other processes to read the same records?
For example I have the following logic
Begin Trans
-- select data that is needed for the next 2 statements
SELECT * FROM table1 where id = 1000; -- Assuming this returns 10, 20, 30
-- insert data that was read from the first query
INSERT INTO table2 (a,b,c) VALUES(10, 20, 30);
-- update table 3 with data found in the first query
UPDATE table3
SET d = 10,
e = 20,
f = 30;
COMMIT;
At this point, will my SELECT statement still create a shared or key lock, or will it get escalated to an exclusive lock? Will other transactions be able to read records from table1, or will they wait until my transaction is committed before they are able to select from it?
In an application, does it make sense to move the SELECT statement outside of the transaction and keep just the insert/update in one transaction?
A SELECT will always place a shared lock - unless you use the WITH (NOLOCK) hint (then no lock will be placed), use a READ UNCOMMITTED transaction isolation level (same thing), or unless you specifically override it with query hints like WITH (XLOCK) or WITH (UPDLOCK).
A shared lock allows other reading processes to also acquire a shared lock and read the data - but they prevent exclusive locks (for insert, delete, update operations) from being acquired.
In this case, with just three rows selected, there will be no lock escalation (that only happens when more than 5000 locks are being acquired by a single transaction).
Depending on the transaction isolation level, those shared locks are held for different amounts of time. With READ COMMITTED, the default level, the lock is released immediately after the data has been read, while with the REPEATABLE READ or SERIALIZABLE levels, the locks are held until the transaction is committed or rolled back.
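On the last question: if the values you read drive the later insert/update, it is usually safer to keep the SELECT inside the transaction and take update locks up front, so two concurrent transactions cannot both read the same values and then write conflicting changes. A sketch against the tables from the question, using the standard UPDLOCK hint:

BEGIN TRAN;

-- UPDLOCK takes an update lock instead of a shared lock: plain readers
-- are not blocked, but a second transaction asking for UPDLOCK on the
-- same rows must wait until this one commits
SELECT * FROM table1 WITH (UPDLOCK) WHERE id = 1000;

INSERT INTO table2 (a, b, c) VALUES (10, 20, 30);

UPDATE table3 SET d = 10, e = 20, f = 30;

COMMIT;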
I am pulling the data from several tables and then passing the data to a long running process. I would like to be able to record what data was used for the process and then query the database to check if any of the tables have changed since the process was last run.
Is there a method of solving this problem that should work across all sql databases?
One possible solution that I've thought of is having a separate table that is only used for keeping track of whether the data has changed since the process was run. The table contains a "stale" flag. When I start running the process, stale is set to false. If any creation, update, or deletion occurs in any of the tables on which the operation depends, I set stale to true. Is this a valid solution? Are there better solutions?
One concern with my solution is situations like this:
One user starts inserting a new row into one of the tables. Stale gets set to true, but the new row has not actually been added yet. Another user has simultaneously started the long running process, pulling the data from the tables and setting the flag to false. The row is finally added. Now the data used for the process is out of date but the flag indicates it is not stale. Would transactions be able to solve this problem?
EDIT:
This is some SQL for my idea. Not sure if it works, but just to give you a better idea of what I was thinking:
-- First transaction reads the data and sets the flag to false
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE flag SET stale = 0
SELECT * FROM DATATABLE
COMMIT TRANSACTION

-- Second transaction updates the data and sets the flag to true
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE data SET val = 15 WHERE ID = 10
UPDATE flag SET stale = 1
COMMIT TRANSACTION
I do not have much experience with transactions or hand-writing SQL, so there are probably issues with this. From what I understand, two serializable transactions cannot be interleaved. Please correct me if I'm wrong.
Is there a way to accomplish this with only the first transaction? The process will be run rarely, but the updates to the data table will occur more frequently, so it would be nice to not lock up the data table when performing updates.
Also, is the SET TRANSACTION ISOLATION LEVEL syntax specific to MS SQL Server?
The stale flag will probably work, but a timestamp would be better since it provides more metadata about the age of the records which could be used to tune your queries, e.g., only pull data that is over 5 minutes old.
To address your concern about inserting a row at the same time a query is run, transactions with an appropriate isolation level will help. For row inserts, updates, and selects, at least use a transaction with an isolation level that prevents dirty reads so that no other connections can see the updated data until the transaction is committed.
If you are strongly concerned about the case where an update happens at the same time as a record pull, you could use the REPEATABLE READ or even SERIALIZABLE isolation levels, but this will slow DB access down.
Your SQL Server sample should work. For other databases, here's an example that works in PostgreSQL:
Transaction 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- run queries that update the tables, then set last_updated column
UPDATE sometable SET last_updated = now() WHERE id = 1;
COMMIT;
Transaction 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- select data from tables, then set last_queried column
UPDATE sometable SET last_queried = now() WHERE id = 1;
COMMIT;
If transaction 1 starts, and then transaction 2 starts before transaction 1 has completed, transaction 2 will block on the update and then throw an error when transaction 1 is committed. If transaction 2 starts first, and transaction 1 starts before it has finished, then transaction 1 will get the error. Your application code or process should be able to handle those errors.
Other databases use similar syntax - MySQL (with the InnoDB storage engine) requires you to set the isolation level before you start the transaction.
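For example, in MySQL with InnoDB the level is set as a separate statement and applies to the next transaction started in the session:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
-- ... read and update the tables ...
COMMIT;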
Considering a forum table and many users simultaneously inserting messages into it, how safe is this transaction?
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
DECLARE @LastMessageId SMALLINT
SELECT @LastMessageId = MAX(MessageId)
FROM Discussions
WHERE ForumId = @ForumId AND DiscussionId = @DiscussionId
INSERT INTO Discussions
(ForumId, DiscussionId, MessageId, ParentId, MessageSubject, MessageBody)
VALUES
(@ForumId, @DiscussionId, @LastMessageId + 1, @ParentId, @MessageSubject, @MessageBody)
IF @@ERROR = 0
BEGIN
COMMIT TRANSACTION
RETURN 0
END
ROLLBACK TRANSACTION
RETURN 1
Here I read the last MessageId and increment it. I can't use an Identity field because it needs to be incremented per group of messages, not for every message inserted into the table.
Your transaction should be quite safe indeed - check out the MSDN docs on the SERIALIZABLE transaction level:
SERIALIZABLE
Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.
The main problem with this transaction isolation level is that it puts a pretty heavy load on the server and serializes (as the name implies) access, so performance and scalability will suffer; with very high numbers of users, you'll possibly get lots of timeouts for users waiting for a transaction to finish.
So using the more lightweight approach of a global message id as INT IDENTITY is definitely much better!
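If you do have to keep the MAX(MessageId) + 1 scheme, one middle ground to consider is taking an update lock on the range you read, so two concurrent inserts for the same discussion serialize on the SELECT instead of both acquiring shared range locks and then deadlocking on the INSERT. A sketch with the standard UPDLOCK and HOLDLOCK hints (HOLDLOCK gives this one statement SERIALIZABLE-style range locking without setting the session level; ISNULL handles the first message in a discussion):

BEGIN TRANSACTION

DECLARE @LastMessageId SMALLINT

-- UPDLOCK serializes concurrent readers of this range;
-- HOLDLOCK keeps the range locked until the transaction ends
SELECT @LastMessageId = MAX(MessageId)
FROM Discussions WITH (UPDLOCK, HOLDLOCK)
WHERE ForumId = @ForumId AND DiscussionId = @DiscussionId

INSERT INTO Discussions
(ForumId, DiscussionId, MessageId, ParentId, MessageSubject, MessageBody)
VALUES
(@ForumId, @DiscussionId, ISNULL(@LastMessageId, 0) + 1, @ParentId, @MessageSubject, @MessageBody)

COMMIT TRANSACTION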