Lock SQL Table/Row and using NOLOCK/READPAST [duplicate] - sql

This question already has answers here:
Effect of NOLOCK hint in SELECT statements
(5 answers)
SQL 2008+ NOLOCK vs READPAST Considerations for Reporting Accuracy
(3 answers)
Closed last year.
I am an end user of a frequently updated Microsoft SQL Server database containing dozens of tables with hundreds of millions of rows each.
A banking database is a good example of what I am working with, except that in my database UPDATE statements are rarely used and INSERT statements are used frequently (once a row has entered a table, it rarely changes).
I personally don't run any UPDATE/INSERT statements, only SELECT statements (with complex WHERE/JOIN/CROSS/GROUP clauses).
I have some questions about locking and using NOLOCK/READPAST.
1. How can I tell whether a query I am running locks only a row or the entire table?
For example, I noticed this query didn't block other users from inserting new data into the table:
SELECT *
FROM Table
while this query did:
SELECT COUNT(Date)
FROM Table
These are of course just examples, not the actual full queries I am using.
As I mentioned, rows rarely change, so locking a row doesn't concern me, but locking a table is highly concerning.
2. I would like to know the risks of using NOLOCK/READPAST in my queries (to remove any concern that my queries lock a table against updates).
I searched a lot but could not find a complete answer.
I don't care if, by using NOLOCK/READPAST, I might get stale data (again, the data rarely changes) or miss some newly added data.
I did read in a couple of places that using NOLOCK might cause duplicate or corrupted data; this is a problem for me.
3. What exactly is the difference between READPAST and NOLOCK? Which one is "safer" regarding the concerns mentioned above?
Thank you.

This is highly dependent on your server's settings. Generally speaking, you want to lock records even when you are just reading them, because you don't want data to change while you are reading it. This affects not just updated records but also inserts. You can learn more about read committed and snapshot isolation here:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server
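For reference, the row-versioning options described in that article are switched on per database. The sketch below is a minimal example that assumes a hypothetical database named YourDb and table dbo.YourTable; test it first, because row versioning keeps old row versions in tempdb and READ_COMMITTED_SNAPSHOT changes how every READ COMMITTED query in the database behaves.

```sql
-- Hypothetical names: YourDb, dbo.YourTable. Run in the context of YourDb.

-- Option 1: allow explicit SNAPSHOT transactions.
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A reporting session then sees committed data as of the point its transaction
-- first reads data, without taking shared locks on the rows it reads.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM dbo.YourTable;
COMMIT TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- back to the default

-- Option 2: make the default READ COMMITTED level use row versions for every query.
-- ROLLBACK IMMEDIATE rolls back other sessions' open transactions so the change can apply.
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```

Option 2 needs no code changes in the queries themselves, which is why it is often the first thing tried for read/write blocking.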
Both NOLOCK/READPAST should be avoided at all costs. There are a very small handful of scenarios where these make sense, but they are exceedingly rare. You are better off optimizing your query to perform better and reduce the number of records being locked and the time those records spend locked. One case where I can see NOLOCK being useful would be a log table that only has inserts, where your query doesn't join the data to other tables AND a dirty record wouldn't cause problems.
NOLOCK doesn't lock records that it reads. The risk here is that records you are reading can literally change mid read. This means you can begin reading a record and get some values for some columns before the update was made and some column values from after the update. If another transaction rolls back you could end up reading records that were never actually committed to the database.
READPAST skips any rows that are locked. If another query runs and its criteria cause rows 1-25 of 100 to be locked while you are querying the same data, you are only going to see records 26-100. To your query, locked rows don't exist.
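Side by side, the two hints are written like this. A minimal sketch, assuming a hypothetical table dbo.YourTable; the comments summarise the trade-offs described above.

```sql
-- dbo.YourTable is a hypothetical name.

-- NOLOCK (same as READUNCOMMITTED): takes no shared locks and ignores other
-- sessions' exclusive locks, so it can return uncommitted rows and, if rows move
-- during the scan, can read a row twice or miss it.
SELECT COUNT(*) AS rows_nolock
FROM dbo.YourTable WITH (NOLOCK);

-- READPAST: reads only committed data, but silently skips rows (and pages) that
-- another session currently has locked, so those rows are missing from the result.
SELECT COUNT(*) AS rows_readpast
FROM dbo.YourTable WITH (READPAST);
```

With no concurrent writers, both queries return the same count; the differences only show up while other sessions hold locks.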
Great article with the details:
https://www.mssqltips.com/sqlservertip/4468/compare-sql-server-nolock-and-readpast-table-hints/
You would be far better served by spending time learning to optimize your queries to reduce the number of records they need to lock, and improving the performance so that the amount of time those locks exist is kept to a minimum.
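On question 1 (row versus table locks), one way to see what a query actually locks is the sys.dm_tran_locks DMV. The sketch below assumes a test system and a hypothetical table dbo.YourTable with a Date column. It uses the REPEATABLEREAD hint only so the shared locks are still held when the DMV is queried; under the default READ COMMITTED level they are normally released as soon as each row has been read.

```sql
-- Run on a TEST system; dbo.YourTable and its Date column are hypothetical names.
BEGIN TRANSACTION;

-- REPEATABLEREAD keeps the shared locks until the transaction ends,
-- so they are still visible when we inspect them below.
SELECT COUNT([Date])
FROM dbo.YourTable WITH (REPEATABLEREAD);

-- Summarise the locks this session holds.
-- resource_type KEY or RID = row locks, PAGE = page locks,
-- OBJECT with request_mode S or X = the whole table is locked
-- (OBJECT with IS or IX is only an "intent" lock and does not block inserts).
SELECT resource_type,
       request_mode,
       request_status,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode, request_status
ORDER BY resource_type, request_mode;

ROLLBACK TRANSACTION;
```

To watch someone else's long-running query instead, run only the DMV query from a second session and replace @@SPID with that query's session_id.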

Related

Selecting 80% of rows and table lock

One of my colleagues came to me with this statement:
Having a SELECT on a table that fetches 80% of the rows, with a WHERE clause on an indexed column, will lock the whole table. So to avoid that, add WITH (NOLOCK) to your FROM clause.
His only argument was: "Believe me, I've experienced it myself." I cannot find proper documentation for this.
As far as I know, WITH (NOLOCK) only affects the table by letting UPDATEs and INSERTs occur while selecting, and that can lead to dirty reads.
Is my colleague's assumption correct?
I think you're referring to lock escalation, https://technet.microsoft.com/en-us/library/ms184286(v=sql.105).aspx , combined with a table scan caused by an index with bad selectivity, and some possibilities for blocking.
If the statistics on a nonclustered index show that the number of rows returned from a table for a specific value exceeds some threshold, the optimizer will choose a table scan to find the corresponding rows instead of an index seek with corresponding bookmark lookups, because lookups are slow in large numbers.
I typically tell people that you want that percentage to be 5% or lower, but sometimes it will still index seek up to 10% or so. At 80%, it's definitely going to table scan.
Also, since the query is doing a table scan, it has to be able to acquire some kind of lock on every single row in the table. If any other queries are running updates, or otherwise preventing locks from being acquired on even a single row, the query will have to wait.
With lock escalation, it's not a percentage but a specific magic number: 5,000. A query generally starts reading rows using row locks. If a single statement acquires 5,000 or more locks on one table, SQL Server will try to escalate them from row and/or page locks to a full table lock.
This is where blocking, and sometimes deadlocks, can happen, because another query may be trying to do the same thing.
These locks don't necessarily have anything to do with inserts/updates.
This is an actual thing. No, this does not mean that you should use NOLOCK. You'd be much better off looking at READPAST, TABLOCK, or TABLOCKX, https://msdn.microsoft.com/en-us/library/ms187373.aspx , if you're having issues with deadlocks.
Do not do any of these things just out of habit and only look into them for specific instances with highly transactional tables that are experiencing actual problems.
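If escalation does turn out to be the cause of blocking on one specific table, SQL Server 2008 and later also let you control it per table. A minimal sketch, assuming a hypothetical dbo.YourTable; the same caution applies, since without escalation a statement holds many more individual locks (and lock memory).

```sql
-- Hypothetical table name; SQL Server 2008 or later.
-- TABLE   (default): escalate row/page locks straight to a table lock.
-- AUTO    : on a partitioned table, escalate to the partition instead of the table.
-- DISABLE : prevents escalation in most cases.
ALTER TABLE dbo.YourTable SET (LOCK_ESCALATION = AUTO);

-- Check the current setting.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = N'YourTable';
```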
By default writers have priority and readers will wait on writers to finish. WITH (NOLOCK) will allow readers to read uncommitted data, avoiding waits on writers to finish. For read-only queries against very large tables, this is OK if you are querying data such as an old partition, or pulling back data that is not going to change often and whose changes are not critical to the presentation of the data. This is the same as using the SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED directive in stored procedures.
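The equivalence in that last sentence looks like this in practice. A minimal sketch, assuming a hypothetical dbo.YourTable; the difference is scope, since the hint covers one table reference while the SET statement covers every read that follows in the session.

```sql
-- dbo.YourTable is a hypothetical name.

-- Table hint: applies to this one table reference only.
SELECT COUNT(*) FROM dbo.YourTable WITH (NOLOCK);

-- Session setting: applies to every table read by later statements in this session.
-- Inside a stored procedure, the level reverts to the caller's when the procedure returns.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.YourTable;

-- Return to the default level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```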

Are there any performance issues when inserting into a large SQL Server table which is being queried?

I use SQL Server. I have a large table (millions of rows), and I iterate through it (SELECT .. WHERE ..). This is a long operation (and I assume it can't be made shorter).
So what I am asking is whether there will be any problems inserting data into that table while the SELECT is in progress. If yes, what should I do to reduce them? Same question for UPDATE commands (with indexed parameters, of course).
Yes, you will have performance, and more specifically, locking and blocking issues. If your SELECT statements are using indexes, which they should be, these indexes will be locked every time that you INSERT data into the table. Since the table is relatively large, the lock will probably be long enough to block your SELECT statements, and deadlocks are likely as well.
This might be a scenario where you need to re-evaluate your table structure, and possibly even consider denormalizing to avoid this.
You might also consider enabling row versioning-based isolation levels, assuming that you can thoroughly test the rest of your system to understand the impact.
The answer is yes, absolutely. A simple solution (if it's an acceptable trade-off within your application) is to specify the NOLOCK locking hint, e.g.:
SELECT * FROM your_table WITH (NOLOCK);
The tradeoff is that you won't get a consistent read, but in many cases this isn't problem.
It's generally not a good idea to have long-running queries on a database with frequent updates. This decreases performance significantly because of locking.
It might be a good idea to look into data warehouses and see if that is something you could use. That would let you keep the transactions in a separate database and bulk load from it into another database that holds your warehouse.
This would greatly improve performance for both inserts and queries. The transactional database could have no indexes, and the warehouse could have all the indexes you want.
You could also put the warehouse in a column store database. That would give you the best query time with minimal effort, because there isn't any need to create indexes in a column store; all you would have to do is design the schema properly. The drawback of column stores, however, is that inserts, updates and deletes are very slow compared to row-oriented databases. But bulk loading from the transactional database should do the trick. If you require the data to be very up to date, you could bulk load every few minutes. If you just need data from the previous day, you could bulk load into the warehouse each night.
The possibilities are endless. If you want to look into column store warehouses, you could try MonetDB. It's an open-source column store, so you could try it out and see if it suits you.
Do not assume execution time can't be shorter. If you query a date range, an index on date is a must!
Solve your problem by indexing the date field:
-- please use correct names for your_table and date_field --
CREATE INDEX index_name ON your_table (date_field);
Warehousing, as per @Gisli, is a good option: build a copy of the data elsewhere, and run your long-running queries there, freeing up the "main" database for OLTP processing.
If this is not an option, you can mess around with snapshot isolation (something I know about, but have never worked with personally). Essentially, this will take a "snapshot" of the database at the point in time you start the query, and will execute the query as if no subsequent changes were made to the database, even if changes are made to the database while the query is running. More importantly, any such changes are "real" and permanent. Think of it like a short-term branching of your database.
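The index only helps if the query's date filter can use it for a seek. A minimal sketch, reusing the placeholder names your_table and date_field; the January 2024 range is an arbitrary example.

```sql
-- your_table and date_field are placeholder names; the dates are arbitrary examples.
-- Comparing the bare column lets the optimizer seek on the index.
-- The half-open range covers every time of day in January 2024.
SELECT *
FROM your_table
WHERE date_field >= '20240101'
  AND date_field <  '20240201';

-- By contrast, wrapping the column in a function, e.g.
--   WHERE YEAR(date_field) = 2024 AND MONTH(date_field) = 1
-- generally prevents an index seek and falls back to a scan.
```

Even with a seekable predicate, if the range returns a large share of the table, the optimizer may still prefer a scan because of the bookmark lookups needed for SELECT *.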
The duration of the branch (snapshot) is where I get weak. I believe you can have the snapshot last for the duration of the query, which means you'd (possibly) never be able to get the same results for a given query twice (if the data changes while you are running it); or you can create a "saved" snapshot that can be re-used over and over until you get around to deleting it. Be wary with this, you don't want your system to get cluttered up with old forgotten branches of past data!
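The re-usable "saved" snapshot described above most closely matches a database snapshot, which is a separate feature from snapshot isolation: a read-only, point-in-time copy of a whole database. A minimal sketch, assuming hypothetical names (YourDb, its logical data file YourDb_Data, and the file path); before SQL Server 2016 SP1 this feature required Enterprise edition.

```sql
-- Hypothetical names: YourDb is the source database, YourDb_Data is its logical
-- data file name (see sys.database_files), and the path is only an example.
-- Every data file of the source needs a matching entry; this assumes just one.
CREATE DATABASE YourDb_Snapshot
ON ( NAME = YourDb_Data, FILENAME = 'D:\Snapshots\YourDb_Snapshot.ss' )
AS SNAPSHOT OF YourDb;

-- Long-running reports read the frozen copy; writers in YourDb are not blocked by them.
SELECT COUNT(*) FROM YourDb_Snapshot.dbo.YourTable;

-- Drop it when done, so old snapshots don't pile up.
DROP DATABASE YourDb_Snapshot;
```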
There is no PROBLEM. SQL Server is built to deal with this kind of situation; you just need to set the correct isolation level on the transactions.
There are several possible scenarios. For example, if you don't mind reading data that is being inserted, set your isolation level to READ UNCOMMITTED on your read transaction. If you are inserting values in one range and reading values in another range, you can use SERIALIZABLE.
Take a look at the possible isolation levels:
http://msdn.microsoft.com/en-us/library/ms173763.aspx

SQL transaction affecting a big amount of rows

The situation is as follows:
A big production client/server system where one central database table has a certain column that used to have NULL as its default value but now has 0. But all the rows created before that change of course still have NULL in that column, and that generates a lot of unnecessary error messages in this system.
The solution is, of course, as simple as:
update theTable set theColumn = 0 where theColumn is null
But I guess it's going to take a lot of time to complete this transaction? Apart from that, are there any other issues I should think of before I do this? Will this big transaction block the whole database, or that particular table, during the whole update process?
This particular table has about 550k rows, and 500k of them have a NULL value and will be affected by the above SQL statement.
The impact on the performance of other connected clients depends on:
How fast the servers hardware is
How many indexes contain the column your UPDATE statement changes
Which transaction isolation level the other clients use when they connect to the database
The db engine will acquire write locks, so when your clients only need read access to the table, it should not be a big problem.
500,000 records does not sound like much to me, but as I said, the time and resources the update takes depend on many factors.
Do you have a similar test system where you can try out the update?
Another solution is to split the one big update into many small ones and call them in a loop.
When you have clients writing frequently to that table, your update statement might get blocked "forever". I have seen databases where performing the update row by row was the only way of getting it through. But that was a table with about 200,000,000 records and about 500 very active clients!
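The "many small updates in a loop" approach can be sketched like this for the statement in the question. The batch size of 4,000 is an assumption, chosen to stay below the roughly 5,000-lock escalation threshold for typical rows; tune it on a test system. Run it in autocommit mode (not inside an explicit transaction), so each batch commits and releases its locks before the next one starts.

```sql
-- dbo.theTable / theColumn come from the question; the batch size is a guess to tune.
DECLARE @batch_size int = 4000;
DECLARE @rows_affected int = 1;

WHILE @rows_affected > 0
BEGIN
    -- Each UPDATE runs as its own autocommit transaction,
    -- so its locks are released before the next batch starts.
    UPDATE TOP (@batch_size) dbo.theTable
    SET theColumn = 0
    WHERE theColumn IS NULL;

    -- Capture the row count immediately; the loop ends when no NULLs are left.
    SET @rows_affected = @@ROWCOUNT;
END;
```

Without an index on theColumn, each batch may have to scan the table to find the next NULL rows, so smaller batches trade shorter lock durations for more total work.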
it's gonna take a lot of time to complete this transaction
There's no definite way to say this. It depends a lot on the hardware, the number of concurrent sessions, whether the table has locks on it, the number of interdependent triggers, etc.
Will this big transaction block the whole database, or that particular table during the whole update process
If the "whole database" is dependent on this table then it might.
will there be any other issues I should think of before I do this
If the table has been locked by another transaction, your update will be blocked until that lock is released. In rare cases, you may even hit a deadlock. Best would be to ensure that no one is using the table, check for any pre-existing locks, and then run the statement.
Locking issues are vendor specific.
Assuming no triggers on the table, half a million rows is not much for a dedicated database server, even with many indexes on the table.

What is the purpose of ROWLOCK on Delete and when should I use it?

For example:
When should I use this statement:
DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;
And when should I just be doing:
DELETE TOP (@count)
FROM ProductInfo
WHERE ProductId = @productId_for_del;
The WITH (ROWLOCK) is a hint that instructs the database that it should keep locks on a row scope. That means that the database will avoid escalating locks to page or table scope.
You use the hint when only a single or only a few rows will be affected by the query, to keep the lock from locking rows that will not be deleted by the query. That will let another query read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that will delete a lot of rows, it may degrade the performance as the database will try to avoid escalating the locks to a larger scope, even if it would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only in situations where you get performance problems because the database made the wrong decision, that you should add such hints to a query.
ROWLOCK is a query hint that should be used with caution (as are all query hints).
Omitting it will likely still result in exactly the same behaviour, and providing it will not guarantee that only row locks are used; it is only a hint, after all. If you do not have a very in-depth knowledge of lock contention, chances are that the optimizer will pick the best possible locking strategy, and these things are usually best left to the database engine to decide.
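Since ROWLOCK is only a hint, the reliable way to know what a DELETE locked is to look. A minimal sketch for a test system, reusing ProductInfo and ProductId from the question; the values 10 and 42 are hypothetical, and the transaction is rolled back so nothing is actually deleted.

```sql
-- ProductInfo / ProductId come from the question; the values are hypothetical.
DECLARE @count int = 10, @productId_for_del int = 42;

BEGIN TRANSACTION;

DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;

-- Which locks did the DELETE really take?
-- KEY/RID = row locks, PAGE = page locks, OBJECT in mode X = the whole table.
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode;

-- Undo the test delete.
ROLLBACK TRANSACTION;
```

Running the same test without the hint shows whether it changes anything for your data and indexes.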
ROWLOCK means that SQL will lock only the affected row, and not the entire table or the page in the table where the data is stored when performing the delete. This will only affect other people reading from the table at the same time as your delete is running.
If a table lock is used it will cause all queries to the table to wait until your delete has completed, with a row lock only selects reading the specific rows will be made to wait.
Deleting top N where N is a number of rows will most likely lock the table in any case.
SQL Server chooses the lock granularity (row, page or table) dynamically; this is usually the most efficient way for SQL Server to process multiple data sets. But SQL Server is sometimes not multi-user friendly; therefore you may need to incorporate locking methods so you can get your data to flow in and out of the database. This is why people approach that problem by using locking hints.
If everyone designed their database tables so that everything processed each row at page width, the system would be very fast. But no one spends that amount of detailed time.
So, you might see people use with(nolock) on their SELECT statements and the use of with(rowlock) on their UPDATE and DELETE statements. An INSERT does not matter because it will lock the PAGE automatically. Sometimes by using with(rowlock), you can get better multi-user (multiple user connections) performance.
The problem with with(nolock) is that you can return the committed record sitting there in the DB already, plus the dirty record that is about to update the sitting record, resulting in a double return of records to your SELECT statement. If you know the personality of your system and how the data runs through it, you can use with(nolock) to your advantage quite a bit, though.
How do you know when to use with(rowlock)? When your system isn't letting users play nice with each other in the same table/record. Even then, rewrite/tune the query first and adjust your locking only as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do such. If you are the developer writing this code, just blame yourself.

SQL Server 2005 efficient delete

My issue at hand is that I need to remove about 60M records from a table without causing deadlocks with other processes that use this table. At this point I'm almost done removing the records using a while loop that only processes about 1M records at a time however it's taken all day.
Q1: What is the optimal way to remove large quantities of data from a table, keeping the table online and minimal impact to other resources that need to use this table in MS SQL Server 2005?
Q2: Is there a way to implement the individual row locking (rather than table locking) in SQL Server like they have in Oracle? (Note answering this may answer Q1).
A2: So as @Remus Rusanu informed me, there is a way to do row-level locking with a delete.
See this thread, the original poster actually did some tests and posted the most efficient method. An MVP did originally chime in with an option to actually insert the data you want to retain into a temp table and then truncate the original table and reinsert.
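The MVP's keep/truncate/reinsert idea can be sketched as follows. This is a minimal sketch under stated assumptions: BIG_TABLE, CreatedDate and the cutoff date come from the answers here; it assumes no identity, computed or rowversion columns (an identity column would need SET IDENTITY_INSERT and an explicit column list) and no foreign keys referencing BIG_TABLE (TRUNCATE is not allowed then). Unlike the batched delete, the table is unavailable to other sessions while this runs.

```sql
-- Names and cutoff come from the answers; see the assumptions in the lead-in.
SET XACT_ABORT ON;  -- any error rolls back the whole transaction
BEGIN TRANSACTION;

-- 1. Copy the rows to keep (everything the DELETE would NOT remove, including NULL dates).
SELECT *
INTO #keep
FROM dbo.BIG_TABLE
WHERE CreatedDate > '20080630' OR CreatedDate IS NULL;

-- 2. Empty the table; TRUNCATE logs page deallocations instead of individual rows
--    and takes a schema-modification lock on the whole table.
TRUNCATE TABLE dbo.BIG_TABLE;

-- 3. Put the kept rows back (same column order, since #keep was created from BIG_TABLE).
INSERT INTO dbo.BIG_TABLE
SELECT * FROM #keep;

COMMIT TRANSACTION;
DROP TABLE #keep;
```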
I just did something like that recently. I simply created a SQL Server job that ran every 10 minutes, deleting a million rows each time. Code follows:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Note: READ UNCOMMITTED only affects reads; the DELETE still takes exclusive locks on the rows it removes.
DELETE TOP (1000000) FROM BIG_TABLE WHERE CreatedDate <= '20080630';
The table in question had about 900 million rows to start with. I didn't notice any significant performance issues.
The most efficient way is to use partition switching, see Transferring Data Efficiently by Using Partition Switching. The drawback is that it requires planning ahead in how the partitions are deployed.
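Assuming the table is already partitioned on the delete criterion, the switch itself is short. A minimal sketch with hypothetical setup: dbo.BIG_TABLE partitioned on CreatedDate, partition 1 holding only rows to be removed, and dbo.BIG_TABLE_staging an empty table with identical columns and indexes on the same filegroup as that partition.

```sql
-- Hypothetical setup; see the assumptions in the lead-in.
-- The switch is a metadata-only operation, so it completes almost instantly
-- regardless of how many rows partition 1 holds.
ALTER TABLE dbo.BIG_TABLE
SWITCH PARTITION 1 TO dbo.BIG_TABLE_staging;

-- The old rows now live in the staging table and can be discarded
-- without touching BIG_TABLE.
TRUNCATE TABLE dbo.BIG_TABLE_staging;
```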
If partition switching is not available, then the answer depends on the actual table schema. You'd better post the actual schema (including all indexes and, most importantly, the clustered key definition) and the criteria that qualify the delete candidates.
As for Q2, SQL Server had row level locking since mid 90s, I don't know what you're actually asking.