Microsoft SQL, understand transaction locking on low level - sql

I had a weird case when I got a deadlock between a select statement that joined two tables and a transaction that performed multiple updates on those two tables.
I am coming from a Java world, so I thought that using a transaction will lock all the tables in it, but what I come to understand now is that the lock will only be requested when you access the table from your transaction and if someone else is doing a heavy select on that table during that time you might get a deadlock. Just to be fair I also must say that I had multiple connections making that same sequence of calls where they performed a heavy query on two tables and then created a transaction to update those tables, so whatever age case you might think of, there is a big chance I been running into it.
With that been said, can you please provide a low level explanation in what situations you might get a deadlock between a select statement and a transaction?

Related

Lock SQL Table/Row and using NOLOCK/READPAST [duplicate]

This question already has answers here:
Effect of NOLOCK hint in SELECT statements
(5 answers)
SQL 2008+ NOLOCK vs READPAST Considerations for Reporting Accuracy
(3 answers)
Closed last year.
I am the end-user of a highly updated Microsoft SQL Server DB containing dozens of tables with hunreds of millions of rows each.
A banking DB is a good example for what I am working with, with the exception that in my DB UPDATE statement are rearly used and INSERT statements are used frequently (once a row as entered a table, it rarely changes).
I, personally, not using any UPDATE/INSERT statement, only SELECT statement (with complex WHERE/ JOIN/ CROSS/ GROUP clues).
I have some questions about locking and using NOLOCK/READPAST.
1.how can I know if a query I am using is locking only a row or the entire table?
for example, I noticed this query didn't locked other users from inserting new data to the table:
SELECT *
FROM Table
while this query did:
SELECT COUNT(Date)
FROM Table
This is of course just examples, not actual full queris I am using.
As I mentioned, rows rarely changing so locking a row isn't concerning me but locking a table is highly concerning.
2.I would like to know the risks of using NOLOCK/READPAST in my queries (to revoke any concern I might have about locking a table from updating).
I searched about it a lot but I could not find a full answer.
I dont care If by using NOLOCK/READPAST I might get past data (that again, data i rarely changes) or I might miss some newly added data.
I did read in a couple of places that using NOLOCK might cause duplicate data/ corrupted data, this is a problem for me.
3.what exactly is the diffrent between READPASY and NOLOCK? which one is "safer" regarding the concerns mentiond above
Thank you.
This is highly dependent on your servers settings. Generally speaking, you want to lock records, even when you are just reading them because you don't want data to change while you are reading it. This isn't just something that affects updated records, but also inserts. You can learn more about read commits and snapshop isolation here:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server
Both NOLOCK/READPAST should be avoided at all cost. There are a very small handful of scenarios where these make sense, but they are exceedingly rare. You are better off optimizing your query to perform better and reduce the amount of records being locked and the time that the records spend being locked. One case that I can see NOLOCK being useful would be a log table that only has inserts, and your query doesn't join the data to other tables, AND a dirty record wouldn't cause problems.
NOLOCK doesn't lock records that it reads. The risk here is that records you are reading can literally change mid read. This means you can begin reading a record and get some values for some columns before the update was made and some column values from after the update. If another transaction rolls back you could end up reading records that were never actually committed to the database.
READPAST skips any rows that are locked. If another query runs and the criteria causes rows 1-25 of 100 to be locked while you are querying the same data you are only going to see records 26-100. To your query locked rows don't exist.
Great article with the details:
https://www.mssqltips.com/sqlservertip/4468/compare-sql-server-nolock-and-readpast-table-hints/
You would be far better served by spending time learning to optimize your queries to reduce the number of records they need to lock, and improving the performance so that the amount of time those locks exist is kept to a minimum.

Locking database table between query and insert

Forgive me if this is a silly question (I'm new to databases and SQL), but is it possible to lock a table, similar to the lock keyword in C#, so that I can query the database to see if a condition is met, then insert a row afterwards while ensuring the state of the table has not changed between the two actions?
In this case, I have a table transactions which has two columns: user and product. This is a many-to-one relationship; multiple users can have the same product. However, the number of products is limited.
When a user adds a product to their account, I want to first check if the total number of items with the same product value to see if it is under a certain threshold, then add the transaction afterwards. However, since this is a multithreaded application, multiple transactions can come in at the same time. I want to make sure that one of these is rejected, and one succeeds, such that the number of transactions with the same product value can never be higher than the limit.
Rough pseudo-code for what I am trying to do:
my_user, my_product = ....
my_product_count = 0
for each transaction in transactions:
if product == my_product:
my_product_count += 1
if my_product_count < LIMIT:
insert my_user, my_product into transactions
return SUCCESS
else:
return FAILURE
I am using SQLAlchemy with SQLite3, if that matters.
Not needed if you do both operations in a transaction - which is supported by databases. Databases do maintain locks to guarantee transactional integrity. In fact that is one of the four pillars of what a database does - they are called ACID guaranetees (for (Atomicity, Consistency, Isolation, Durability).
So, in your case, to ensure consistence you would make both operations in one transaction and seat the transaction parameters in such a way to block reads on the already read rows.
SQL locking is WAY more powerfull than the lock statement because, among other things, databases per definition have multiple threads (users) hitting the same data - something that is exceedingly rare in programming (where same data access is avoided in multi threaded programming as much as possible).
I suggest a good book about SQL - because you need to simply LEARN some fundamental concepts at one point, or you will make mistakes that cost money.
Transactions allow you to use multiple SQL statements atomically.
(SQLite implements transactions by locking the entire database, but the exact mechanism doesn't matter, and you might want to use another database later anyway.)
However, you don't even need to bother with explicit transactions if your desired algorithm can be handled with single SQL statement, like this:
INSERT INTO transactions(user, product)
SELECT #my_user, #my_product
WHERE (SELECT COUNT(*)
FROM transactions
WHERE product = #my_product) < #LIMIT;

Minimizing deadlocks with purposely contrived + highly concurrent transactions?

I'm currently working on benchmarking different isolation levels in SQL Server 2008 -- but right now I'm stuck on what seems to be a trivial deadlocking problem, but I can't seem to figure it out. Hopefully someone here can offer advice (I'm a novice to SQL)
I currently have two types of transactions (to demonstrate dirty reads, but that's irrelevant):
Transaction Type A: Select all rows from Table A.
Transaction Type B: Set value 'cost' = 0 in all rows in Table A, then rollback immediately.
I currently run a threadpool of 1000 threads and 10,000 transactions, where each thread randomly chooses between executing Transaction Type A and Transaction Type B. However, I'm getting a ton of deadlocks even with forced row locking.
I assume that the deadlocks are occurring because of the row ordering of locks being acquired -- that is, if both Type A and Type B 'scan' table A in the same ordering, e.g. from top to bottom, such deadlocks cannot occur. However, I'm having trouble figuring out how to get SQL Server to maintain row ordering during SELECT and UPDATE statements.
Any tips? First time poster to stackoverflow, so please be gentle :-)
EDIT: The isolation level is purposely set to READ_COMMITTED to show that it eliminates dirty reads (and it does). Deadlocks only occur on any level equal to or higher than READ_COMMITTED; obviously no deadlocks occur on READ_UNCOMMITTED.
EDIT 2: These transactions are being run on a fresh instance of AdventureWorks LT on SQL Server 2008R2.
If you are starting a transaction to update all the rows, type B, and then rollback the transaction, the lock will need to be held for that entire transaction on all rows. Even though you have row level locks the lock needs to be held for the entire transaction.
You may see less deadlocks if you have page level or table level locking because these are easier to handle for Sql Server, but you will still need to hold these locks on the whole whilst the transaction is ongoing.
When you are designing a highly concurrent system you should avoid queries that lock the whole table. I recommend the following MicroSoft guide for understanding locks and reducing their impact:
http://technet.microsoft.com/en-us/library/cc966413.aspx

SQL transaction affecting a big amount of rows

The situation is as follows:
A big production client/server system where one central database table has a certain column that has had NULL as default value but now has 0 as default value. But all the rows created before that change of course still have value as null and that generates a lot of unnecessary error messages in this system.
Solution is of course simple as that:
update theTable set theColumn = 0 where theColumn is null
But I guess it's gonna take a lot of time to complete this transaction? Apart from that, will there be any other issues I should think of before I do this? Will this big transaction block the whole database, or that particular table during the whole update process?
This particular table has about 550k rows and 500k of them has null value and will be affected by the above sql statement.
The impact on the performance of other connected clients depends on:
How fast the servers hardware is
How many indexes containing the column your update statement has to update
Which transaction isolation settings the other clients connect to the database
The db engine will acquire write locks, so when your clients only need read access to the table, it should not be a big problem.
500.000 records sounds not too much for me, but as i said, the time and resources the update takes depends on many factors.
Do you have a similar test system, where you can try out the update?
Another solution is to split the one big update into many small ones and call them in a loop.
When you have clients writing frequently to that table, your update statement might get blocked "forever". I have seen databases where performing the update row by row was the only way of getting the update through. But that was a table with about 200.000.000 records and about 500 very active clients!
it's gonna take a lot of time to complete this transaction
there's no definite way to say this. Depends a lot on the hardware, number of concurrent sessions, whether the table has got locks, the number of interdependent triggers et al.
Will this big transaction block the whole database, or that particular table during the whole update process
If the "whole database" is dependent on this table then it might.
will there be any other issues I should think of before I do this
If the table has been locked by other transaction - you might run into a row-lock situation. In rare cases, perhaps a dead lock situation. Best would be to ensure that no one is utilizing the table, check for any pre-exising locks and then run the statement.
Locking issues are vendor specific.
Asuming no triggers on the table, half a million rows is not much for a dediated database server even with many indexes on the table.

What is the purpose of ROWLOCK on Delete and when should I use it?

Ex)
When should I use this statement:
DELETE TOP (#count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = #productId_for_del;
And when should be just doing:
DELETE TOP (#count)
FROM ProductInfo
WHERE ProductId = #productId_for_del;
The with (rowlock) is a hint that instructs the database that it should keep locks on a row scope. That means that the database will avoid escalating locks to block or table scope.
You use the hint when only a single or only a few rows will be affected by the query, to keep the lock from locking rows that will not be deleted by the query. That will let another query read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that will delete a lot of rows, it may degrade the performance as the database will try to avoid escalating the locks to a larger scope, even if it would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only in situations where you get performance problems because the database made the wrong decision, that you should add such hints to a query.
Rowlock is a query hint that should be used with caution (as is all query hints).
Omitting it will likely still result in the exact same behaviour and providing it will not guarantee that it will only use a rowlock, it is only a hint afterall. If you do not have a very in depth knowledge of lock contention chances are that the optimizer will pick the best possible locking strategy, and these things are usually best left to the database engine to decide.
ROWLOCK means that SQL will lock only the affected row, and not the entire table or the page in the table where the data is stored when performing the delete. This will only affect other people reading from the table at the same time as your delete is running.
If a table lock is used it will cause all queries to the table to wait until your delete has completed, with a row lock only selects reading the specific rows will be made to wait.
Deleting top N where N is a number of rows will most likely lock the table in any case.
SQL Server defaults to page locks. This is the most efficient way for SQL server to process multiple date sets. But SQL server is not multi-user friendly sometimes; therefore you may need to incorporate locking methods so you can get your data to flow in and out of the database. This is why people approach that problem by using locking hints.
If everyone designed there database tables so that everything processed each row at page width - the system would be very fast. But no one spends that detailed amount of time.
So, you might see people use with(nolock) on their SELECT statements and the use of with(rowlock) on their UPDATE and DELETE statements. An INSERT does not matter because it will lock the PAGE automatically. Sometimes by using with(rowlock), you can get better multi-user (multiple user connections) performance.
The problem with(nolock) is that you can return the committed record sitting there in the DB already, plus the dirty record that is about to update the sitting record; thus a double return of records to your SELECT statement. If you know the personality of your system on how the data runs through it, you can use with(nolock) to your advantage quite a bit though.
When do you know when to use with(rowlock)? When your system isn't letting user play nice with each other in the same table / record. Though, query re-write / tune first and then adjust your locking as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do such. If you are the developer writing this code, just blame yourself.