Following up on https://stackoverflow.com/a/16553083/14731...
I understand that it is important to maintain a consistent locking order for tables in order to reduce the frequency of deadlocks, and that this affects both UPDATE and SELECT statements [1]. But does the same hold true for read-only rows?
If a row is populated once at initialization time and never modified again, does it really matter in what order we access it?
Given two transactions, T1 and T2, and two read-only rows, R1 and R2:
T1 reads R1, then R2
T2 reads R2, then R1
Can the transactions deadlock, even if I use SERIALIZABLE transaction isolation?
[1] For example: under REPEATABLE_READ isolation, if T1 SELECTs R1 then R2 while T2 UPDATEs R2 then R1, a deadlock may occur.
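To make the footnote concrete, here is a minimal two-session sketch (SQL Server syntax; the table T, its Val column, and the row IDs are hypothetical):

-- Session 1 (T1): REPEATABLE READ holds S-locks until commit
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT Val FROM T WHERE ID = 1;  -- S-lock on R1, held
SELECT Val FROM T WHERE ID = 2;  -- waits for Session 2's X-lock on R2

-- Session 2 (T2), running concurrently:
BEGIN TRAN;
UPDATE T SET Val = Val + 1 WHERE ID = 2;  -- X-lock on R2, held
UPDATE T SET Val = Val + 1 WHERE ID = 1;  -- needs an X-lock on R1, blocked by T1's S-lock: deadlock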
CLARIFICATION: This question is not RDBMS-specific. I am under the impression that no implementation can deadlock on read-only rows. If you have a counter-example (for a concrete vendor), please post an answer demonstrating as much and I will accept it. Alternatively, post a list of all concrete implementations that you can prove will not deadlock (and the most complete list will get accepted).
This question is impossible to answer for all possible RDBMSs, because the locking strategy is an implementation detail. That said, useful RDBMSs share some common characteristics:
For SELECT statements without locking hints (FOR UPDATE, WITH (UPDLOCK), ...), any reasonable RDBMS will not take write locks. It might take read locks; at least SQL Server does so under SERIALIZABLE, except on Hekaton tables.
Read locks never conflict with one another, so no deadlock is possible if only reads are being executed.
Read-only rows can however cause deadlocks even if they are never written to. In SQL Server,
UPDATE T SET SomeCol = 1 WHERE ID = 10 AND SomeCol = 0
will take a U-lock on the row with ID 10. If SomeCol is not 0, the lock is released immediately and nothing is written. But a U-lock is a lock type that can conflict with other locks and therefore lead to a deadlock. Had the row with ID 10 not been present, no deadlock would have been possible.
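A minimal two-session sketch of such a conflict on rows that are never written, assuming a hypothetical table T with rows ID 10 and ID 20; the UPDLOCK hint is used here only to make the window deterministic, since update locks taken with it are held until the end of the transaction:

-- Session 1
BEGIN TRAN;
SELECT SomeCol FROM T WITH (UPDLOCK) WHERE ID = 10;  -- U-lock on row 10, held
SELECT SomeCol FROM T WITH (UPDLOCK) WHERE ID = 20;  -- waits for Session 2

-- Session 2, running concurrently in the opposite order:
BEGIN TRAN;
SELECT SomeCol FROM T WITH (UPDLOCK) WHERE ID = 20;  -- U-lock on row 20, held
SELECT SomeCol FROM T WITH (UPDLOCK) WHERE ID = 10;  -- waits for Session 1: deadlock

Neither session ever modifies a row; two incompatible U-locks requested in opposite orders are enough.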
I have a table that looks like this:
Id GroupId
1 G1
2 G1
3 G2
4 G2
5 G2
It should at any time be possible to read all of the rows (committed only). When an update happens, I want the updating transaction to lock on GroupId, i.e. at any given time there should be only one transaction attempting an update per GroupId.
It should ideally still be possible to read all committed rows (i.e. other transactions/ordinary reads that do not try to acquire the per-group update lock should still be able to read).
The reason I want to do this is that an update cannot rely on "outdated" data. I.e. I do some calculations in a transaction, and after the first transaction has read the rows, another transaction must not be able to edit the row with Id 1 or add a new row with the same GroupId (even though the first transaction never modifies the row itself, it depends on its value).
Another "nice to have" requirement is that sometimes I need the same guarantee across groups, i.e. the update transaction would have to lock two groups at the same time. (This is not a dynamic number of groups; it is always exactly 2.)
Here are some ideas. I don't think any of them is perfect - you will need to give yourself a set of use cases and try them. Some of the situations I tried after applying the locks:
SELECTs with the WHERE filter as another group
SELECTs with the WHERE filter as the locked group
UPDATEs on the table with the WHERE clause as another group
UPDATEs on the table where the ID (not the GrpId!) was not locked
UPDATEs on the table where the row was locked (e.g., IDs 1 and 2)
INSERTs into the table with that GrpId
I have the funny feeling that none of these will be 100%, but the most likely answer is the second one (setting the transaction isolation level). It will probably lock more than desired, but it will give you the isolation you need.
Also, one thing to remember: if you lock many rows (e.g., there are thousands of rows with the GrpId you want), SQL Server can escalate the lock to a full-table lock. (I believe the tipping point is around 5000 locks, but I'm not sure.)
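If escalation does bite, it can be turned off per table from SQL Server 2008 onward (the table name is hypothetical):

-- Stop SQL Server from escalating row/page locks on this table to a table lock.
-- Use with care: holding many fine-grained locks costs lock memory.
ALTER TABLE dbo.YourTable SET (LOCK_ESCALATION = DISABLE);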
Old-school hackjob
At the start of your transaction, update all the relevant rows somehow e.g.,
BEGIN TRAN;

-- The self-assignment changes nothing, but it still takes exclusive
-- row locks on every G1 row for the remainder of the transaction.
UPDATE YourTable
SET GrpId = GrpId
WHERE GrpId = N'G1';

-- Do other stuff

COMMIT TRAN;
Nothing else can use them because (bravo!) they are writes inside an open transaction.
Convenient - set isolation level
See https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver15#isolation-levels-in-the-
Before your transaction, set the isolation level high e.g., SERIALIZABLE.
You may want to read all the relevant rows at the start of your transaction (e.g., SELECT GrpId FROM YourTable WHERE GrpId = N'G1') to lock them against updates.
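A minimal sketch of that approach, assuming an index on GrpId (without one, SQL Server may lock far more than the group):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;

-- Under SERIALIZABLE this takes key-range locks, blocking both updates
-- to the G1 rows and inserts of new rows with GrpId = 'G1'.
SELECT Id, GrpId
FROM YourTable
WHERE GrpId = N'G1';

-- ... do the calculations and updates here ...

COMMIT TRAN;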
Flexible but requires a lot of coding
Use resource locking with sp_getapplock and sp_releaseapplock.
These are used to lock resources, not tables or rows.
What is a resource? Anything you want it to be. In this case, I'd suggest 'Grp1', 'Grp2', etc. It doesn't actually lock rows. Instead, you ask (via sp_getapplock, or APPLOCK_TEST) whether you can get the resource lock. If so, continue; if not, stop.
Any code referring to these tables needs to be reviewed and potentially modified to ask whether it is allowed to run. If something doesn't ask for permission and just goes ahead, there are no actual locks stopping it (other than any transactions you've explicitly specified).
You also need to ensure that errors are handled appropriately (e.g., the applock is still released) and that blocked processes are retried.
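A minimal sketch of the pattern ('Grp1' is just the suggested resource string, so treat it as hypothetical):

BEGIN TRAN;

DECLARE @rc INT;
-- Ask for an exclusive lock on the logical resource 'Grp1'.
-- Return codes >= 0 mean granted; negative means timeout (-1),
-- cancelled (-2), deadlock victim (-3) or error (-999).
EXEC @rc = sp_getapplock
    @Resource    = 'Grp1',
    @LockMode    = 'Exclusive',
    @LockOwner   = 'Transaction',
    @LockTimeout = 10000;  -- milliseconds

IF @rc < 0
BEGIN
    ROLLBACK TRAN;
    RAISERROR('Could not lock Grp1', 16, 1);
    RETURN;
END;

-- ... read and update the G1 rows here ...

COMMIT TRAN;  -- a transaction-owned applock is released automatically

For the "lock 2 groups" requirement, acquire both applocks in a consistent order (e.g., alphabetical) so that two cross-group updaters cannot deadlock on each other; sp_releaseapplock is only needed for session-owned locks.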
Given the example below, and assuming that TABLE1 contains 1 million records:
SELECT * INTO TMP_TABLEA FROM TABLE1
SELECT * INTO TMP_TABLEB FROM TABLE1
INSERT INTO TMP_TABLEC (COLUMN1) SELECT COLUMN1 FROM TABLE1
Questions:
Assuming the queries are executed at the same time, will TABLE1 be locked? Will they cause any blocking?
Does it significantly affect the execution performance of each query?
In SQL Server, readers never block readers. So no, none of those statements will block each other: although they all write to tables, the tables they write to are different.
The first statement will take exclusive locks on TMP_TABLEA, but it will put shared locks on TABLE1 under the default isolation level.
The second statement will take exclusive locks on TMP_TABLEB, but it will put shared locks on TABLE1 under the default isolation level.
The third statement will take exclusive locks (on rows, pages, or the whole object) on TMP_TABLEC, but it will put shared locks on TABLE1 under the default isolation level.
Obviously it affects performance, as you are asking SQL Server to do three things at the same time. However, it is faster to execute all three statements concurrently on three connections than to execute them serially on a single connection.
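If you want to watch this happen, run one of the statements and, from another window, inspect the lock DMV (the session id 53 is hypothetical; substitute the spid of the session running the SELECT INTO):

-- Expect S (shared) locks on TABLE1's resources and X/IX
-- (exclusive/intent-exclusive) locks on the target table.
SELECT resource_type,
       resource_associated_entity_id,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE request_session_id = 53;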
Assuming the queries are executed at the same time, will TABLE1 be locked? Will they cause any blocking?
Answer: Yes, but only shared locks (this is the default READ COMMITTED behaviour) will be applied to TABLE1, and they are released as the rows are read one by one. As shared locks are compatible with each other, they do not block the other readers.
Does it significantly affect the execution performance of each query?
Answer: Yes, it does. But you can improve the performance a lot by using some kind of partitioning on TABLE1, so that the work can leverage the multiple CPU cores you are trying to use. Also, make sure that TABLE1 is indexed correctly.
As there is no write operation involved on TABLE1, blocking won't occur:
during a read operation, a shared lock is taken, which does not block other readers;
during a write operation, an exclusive lock is taken, which does block.
But it will definitely cause some slowness, because TABLE1 has to share its available resources across n concurrent operations.
I'm currently working on benchmarking different isolation levels in SQL Server 2008, but right now I'm stuck on what seems to be a trivial deadlocking problem that I can't figure out. Hopefully someone here can offer advice (I'm a novice at SQL).
I currently have two types of transactions (to demonstrate dirty reads, but that's irrelevant):
Transaction Type A: Select all rows from Table A.
Transaction Type B: Set value 'cost' = 0 in all rows in Table A, then rollback immediately.
I currently run a thread pool of 1000 threads issuing 10,000 transactions, where each thread randomly chooses between executing Transaction Type A and Transaction Type B. However, I'm getting a ton of deadlocks, even with forced row locking.
I assume the deadlocks occur because of the order in which row locks are acquired - that is, if both Type A and Type B 'scan' Table A in the same order, e.g. from top to bottom, such deadlocks cannot occur. However, I'm having trouble figuring out how to get SQL Server to maintain row ordering during SELECT and UPDATE statements.
Any tips? First-time poster on Stack Overflow, so please be gentle :-)
EDIT: The isolation level is purposely set to READ_COMMITTED to show that it eliminates dirty reads (and it does). Deadlocks occur only at levels equal to or higher than READ_COMMITTED; obviously no deadlocks occur at READ_UNCOMMITTED.
EDIT 2: These transactions are being run on a fresh instance of AdventureWorks LT on SQL Server 2008R2.
If you start a transaction that updates all the rows (Type B) and then roll it back, locks need to be held on all rows for the entire transaction. Even though you have row-level locks, each lock must be held until the transaction ends.
You may see fewer deadlocks with page-level or table-level locking, because these are easier for SQL Server to handle, but you will still need to hold those locks for as long as the transaction is ongoing.
When you are designing a highly concurrent system, you should avoid queries that lock the whole table. I recommend the following Microsoft guide for understanding locks and reducing their impact:
http://technet.microsoft.com/en-us/library/cc966413.aspx
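Separately, whatever the ordering, a benchmark like this should treat being chosen as a deadlock victim (error 1205) as retryable. A minimal sketch for the Type B transaction, runnable on SQL Server 2008 (TableA and cost are the names from the question's description):

DECLARE @retries INT = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE TableA SET cost = 0;  -- Transaction Type B
        ROLLBACK TRAN;               -- rolled back on purpose, per the benchmark
        BREAK;                       -- finished without being deadlocked
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRAN;
        IF ERROR_NUMBER() = 1205     -- chosen as the deadlock victim
            SET @retries = @retries - 1;
        ELSE
        BEGIN
            DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);  -- re-raise anything else (THROW needs 2012+)
            BREAK;
        END;
    END CATCH;
END;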
My answer to my own question, Is a half-written values reading prevented when SELECT WITH (NOLOCK) hint?, cites a script illustrating how to catch non-atomic reads (SELECTs) of partly updated values in SQL Server.
Is such non-atomic reading of (partly updated, inserted, or deleted) values specific to SQL Server?
Is it possible in other DBMSes?
Update:
Not long ago, I believed that the READ UNCOMMITTED transaction isolation level (also achieved through the WITH(NOLOCK) hint in SQL Server) permitted reading (from other transactions) uncommitted values (or committed ones, if not yet changed), but not partly modified (partly updated, partly inserted, partly deleted) values.
Update 2:
The first two answers diverted the discussion toward the READ UNCOMMITTED (isolation level) phenomena specified by the ANSI/ISO SQL-92 standard.
This question is not about this.
Is non-atomicity of a value (not of a row!) compliant with READ UNCOMMITTED and with "dirty reads" at all?
I believed that READ UNCOMMITTED implied reading uncommitted rows in their entirety, but not partly modified values.
Does the definition of "dirty read" include the possibility of non-atomic value modification?
Is it a bug or by design?
Or is it permitted by the ANSI SQL-92 definition of "dirty read"? I believed that a "dirty read" meant atomically reading uncommitted rows, not reading non-atomically modified values...
Is it possible in other DBMSes?
As far as I know the only other databases that allow READ UNCOMMITTED are DB2, Informix and MySQL when using a non-transactional engine.
All hell would break loose if atomic statements were in fact not atomic.
I can answer this for MSSQL: all single statements are atomic, and "dirty reads" refers to the possibility of reading a "phantom row" that might no longer exist after the transaction is committed or rolled back.
There is a difference between Atomicity and READ COMMITTED if the implementation of the latter relies on locking.
Consider transactions A and B. Transaction A is a single SELECT for all records with a status of 'pending' (perhaps a full scan on a very large table so it takes several minutes).
At 3:01 transaction A reads record R1 in the database and sees its status is 'New' so doesn't return it or lock it.
At 3:02 transaction B updates record R1 from 'New' to 'Pending' and record R2000 from 'New' to 'Pending' (single statement)
At 3:03 transaction B commits
At 3:04 transaction A reads record R2000, sees it is 'Pending' and committed and returns it (and locks it).
In this situation, the SELECT in transaction A has seen only part of transaction B's work, violating atomicity. Technically, though, the SELECT has returned only committed records.
Databases that rely on locking reads suffer from this problem because the only cure would be to lock the entirety of the table(s) being read, so that no one can update any records in them; that would make any concurrent activity impractical.
In practice, most OLTP applications have very quick transactions operating on very small data volumes (relative to the database size), and concurrent operations tend to hit different 'slices' of data, so the situation arises very rarely. Even when it does happen, it doesn't necessarily cause a noticeable problem; and when it does, such problems are very hard to reproduce, and fixing them would require a whole new architecture. In short, despite being a real theoretical problem, in practice it is often not worth worrying about.
That said, an architect should be aware of the potential issue, be able to assess the risk for a particular application and determine alternatives.
That's one reason why SQL Server added non-locking consistent reads in 2005.
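Concretely, that feature is the READ_COMMITTED_SNAPSHOT database option; when it is on, plain READ COMMITTED statements read row versions instead of taking shared locks, so the SELECT in transaction A would see either all of transaction B's update or none of it (the database name here is hypothetical):

-- Make the default READ COMMITTED level use versioned, non-locking reads.
-- Requires (near-)exclusive access to the database to switch on.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;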
Database theory requires that, at all isolation levels, individual UPDATE or INSERT statements be atomic: their intermediate results should not be visible even to READ UNCOMMITTED transactions. This has been stated in a paper by a group of well-known database experts: http://research.microsoft.com/apps/pubs/default.aspx?id=69541
However, as READ UNCOMMITTED results are by definition not transactionally consistent, it is possible that implementations contain bugs that return partly updated row sets, and that these bugs have gone unnoticed in testing because of the difficulty of judging the validity of the returned result sets.
When should I use this statement:
DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;

And when should I just do:

DELETE TOP (@count)
FROM ProductInfo
WHERE ProductId = @productId_for_del;
The WITH (ROWLOCK) hint instructs the database to keep locks at row scope; that is, the database will avoid escalating locks to page or table scope.
You use the hint when only a single row or a few rows will be affected by the query, to keep the lock from covering rows that will not be deleted. That lets other queries read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that deletes a lot of rows, it may degrade performance, as the database will try to avoid escalating the locks to a larger scope even when that would have been more efficient.
Normally you shouldn't need to add such hints, because the database knows what kind of lock to use. It's only when you get performance problems because the database made the wrong decision that you should add hints like this to a query.
ROWLOCK is a query hint that should be used with caution (as should all query hints).
Omitting it will likely still result in exactly the same behaviour, and providing it does not guarantee that only row locks will be used - it is only a hint, after all. Unless you have a very in-depth knowledge of lock contention, chances are the optimizer will pick the best possible locking strategy; these things are usually best left to the database engine to decide.
ROWLOCK means that SQL Server will lock only the affected rows, and not the entire table or the pages where the rows are stored, when performing the delete. This only matters for other sessions using the table at the same time as your delete is running.
If a table lock is used, all queries against the table will wait until your delete has completed; with a row lock, only SELECTs reading the specific rows being deleted will be made to wait.
Deleting TOP N rows, where N is large, will most likely lock the table in any case.
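If you do need to delete many rows while keeping locks small, a common pattern (sketched here with the question's names and a hypothetical batch size and parameter value) is to delete in batches small enough to stay below the lock-escalation threshold:

DECLARE @productId_for_del INT = 42;  -- hypothetical parameter value
DECLARE @batch INT = 1000;            -- small enough to avoid lock escalation

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM ProductInfo WITH (ROWLOCK)
    WHERE ProductId = @productId_for_del;

    IF @@ROWCOUNT < @batch BREAK;  -- last (partial) batch deleted
END;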
SQL Server defaults to page locks. This is the most efficient way for SQL Server to process multiple data sets. But SQL Server is not always multi-user friendly, so you may need to incorporate locking methods to keep data flowing in and out of the database. This is why people approach the problem with locking hints.
If everyone designed their database tables so that everything processed each row at page width, the system would be very fast. But no one spends that detailed amount of time.
So you might see people use WITH (NOLOCK) on their SELECT statements and WITH (ROWLOCK) on their UPDATE and DELETE statements. An INSERT does not matter because it will lock the page automatically. Sometimes, by using WITH (ROWLOCK), you can get better multi-user (multiple-connection) performance.
The problem with WITH (NOLOCK) is that you can get back both the committed record already sitting in the DB and the dirty record that is about to update it - a double return of records to your SELECT statement. If you know the personality of your system and how data runs through it, though, you can use WITH (NOLOCK) to your advantage quite a bit.
When do you know to use WITH (ROWLOCK)? When your system isn't letting users play nice with each other in the same table or record. That said, rewrite and tune the query first, and adjust your locking only as a last resort.
But as a DBA, always blame the developer's code; it is your solemnly sworn duty to do so. If you are the developer writing this code, just blame yourself.