SQL Server Table Locking/Blocking

Given the example below, assuming that TABLE1 contains 1 million records:
SELECT * INTO TMP_TABLEA FROM TABLE1
SELECT * INTO TMP_TABLEB FROM TABLE1
INSERT INTO TMP_TABLEC (COLUMN1) SELECT COLUMN1 FROM TABLE1
Question;
Considering that the queries are executed at the same time, will TABLE1 be locked? Or will they cause blocking in any way?
Does it significantly affect the execution performance of each query?

In SQL Server, readers never block readers. So no, none of those statements block each other: although they all write to tables, the tables they write to are different.
The first statement will take exclusive locks on TMP_TABLEA, but only shared locks on TABLE1 under the default isolation level.
The second statement will take exclusive locks on TMP_TABLEB, but only shared locks on TABLE1 under the default isolation level.
The third statement will take exclusive locks (on rows, pages, or the whole object) on TMP_TABLEC, but only shared locks on TABLE1 under the default isolation level.
Obviously it affects performance, as you are asking SQL Server to do three things at the same time. However, it is faster to execute all three statements concurrently using three connections than to execute them serially on a single connection.
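If you want to see this for yourself, you can watch the locks while the statements run. A minimal sketch using the sys.dm_tran_locks DMV, run from a separate session (filtering on the current database is an assumption about your setup):

SELECT resource_type, request_mode, request_status, request_session_id
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();
-- expect S locks related to TABLE1 and X/IX locks on the TMP_ tables while the copies run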

Considering that the queries are executed at the same time, will TABLE1 be locked? Or will they cause blocking in any way?
Answer: Yes, but shared locks (taken under the default READ COMMITTED isolation level) will be applied on TABLE1, and they are released as rows are read one by one. As they are shared locks, they do not block other readers.
Does it significantly affect the execution performance of each query?
Answer: Yes, it does. But you can improve the performance a lot by using some kind of partitioning on TABLE1, so that it can leverage the multicore CPUs you are trying to use. Also, make sure that you have set up the indexes on TABLE1 correctly.
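On the indexing point, even a simple covering index can help the third query, since the INSERT ... SELECT only reads COLUMN1. A sketch (the index name is an assumption):

CREATE NONCLUSTERED INDEX IX_TABLE1_COLUMN1
ON TABLE1 (COLUMN1);
-- lets the INSERT ... SELECT scan a narrow index instead of the whole table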

As there is no write operation involved on TABLE1, blocking won't occur.
During a read operation, a shared lock is taken, which means no blocking.
During a write operation, an exclusive lock is taken, which means blocking.
But it will definitely cause slowness, because TABLE1 is sharing its available resources among 'n' concurrent operations.

Related

Selecting 80% of rows and table lock

One of my colleagues came to me with this statement:
Having a SELECT on a table that fetches 80% of the rows, while having a
WHERE clause on a column with an index, will lock the table. So to avoid that, add a WITH (NOLOCK) in your FROM clause.
His only argument was: "Believe me, I've experienced it myself." I cannot find proper documentation for this.
As far as I know, WITH (NOLOCK) only affects the table by letting UPDATE and INSERT occur while selecting, and that can lead to dirty reads.
Is my colleague's assumption correct?
I think you're referring to lock escalation (https://technet.microsoft.com/en-us/library/ms184286(v=sql.105).aspx), combined with a table scan caused by an index with bad selectivity, and some possibilities for blocking.
If the statistics on a non-clustered index show that the number of rows returned from a table for a specific value exceeds some threshold, the optimizer will choose a table scan to find the corresponding rows instead of an index seek with corresponding bookmark lookups, because bookmark lookups are slow in large quantities.
I typically tell people that you want that percentage to be 5% or lower, but sometimes it will still index seek up to 10% or so. At 80%, it's definitely going to table scan.
Also, since the query is doing a table scan, it has to be able to acquire some kind of lock on every single row in the table. If any other queries are running that perform updates, or otherwise prevent locks from being acquired on even a single row, the query will have to wait.
With lock escalation, it's not a percentage, but instead a specific magic number of 5,000. A query generally starts reading rows using row locks. If a single query reads 5,000 or more rows, it will escalate the locks that it is using against the table from row and/or page locks to full table locks.
This is when deadlocks happen, because another query may be trying to do the same thing.
These locks don't necessarily have anything to do with inserts/updates.
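If escalation itself turns out to be the culprit, it can also be controlled per table rather than per query. A sketch (dbo.MyTable is a placeholder name; disabling escalation trades table locks for higher lock memory usage):

ALTER TABLE dbo.MyTable SET (LOCK_ESCALATION = DISABLE);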
This is an actual thing. No, this does not mean that you should use NOLOCK. You'd be much better off looking at READPAST, TABLOCK, or TABLOCKX (https://msdn.microsoft.com/en-us/library/ms187373.aspx) if you're having issues with deadlocks.
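For example, READPAST simply skips rows that are currently locked rather than reading them dirty. A sketch (dbo.Orders and the predicate are hypothetical):

SELECT OrderId, Status
FROM dbo.Orders WITH (READPAST)
WHERE Status = 'pending';
-- rows locked by concurrent writers are skipped, not dirty-read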
Do not do any of these things just out of habit; only look into them for specific instances with highly transactional tables that are experiencing actual problems.
By default, writers have priority and readers will wait on writers to finish. WITH (NOLOCK) will allow readers to read uncommitted data, avoiding waits on writers to finish. For read-only queries against very large tables, this is OK if you are querying data such as an old partition, or pulling back data that is not going to change often and where changes are not critical to the presentation of the data. This is the same as using the SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED directive in stored procedures.
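The session-level equivalent would look something like this (the table name is a placeholder):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.OldPartitionTable;
-- every read in this session now behaves as if WITH (NOLOCK) were applied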

PostgreSql upsert: which explicit locking method?

From this question, I want to use the following method to perform upserts in a PostgreSQL table:
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
I understand I have to perform it in a transaction and that I should use an explicit locking method. I have been reading the PostgreSQL documentation, but I am still hesitating between three types of locks (the difference between them is not crystal clear to me):
ROW EXCLUSIVE
SHARE ROW EXCLUSIVE
EXCLUSIVE
I would like to avoid having to retry the transaction. I am not expecting much concurrency on the rows, though this could happen from time to time. There might be simultaneous attempts at deleting and upserting the same row in rare cases.
You need a self-exclusive lock type that also excludes all DML commands.
The correct lock type for this operation is EXCLUSIVE. It conflicts with all other lock modes, including itself, except ACCESS SHARE.
ACCESS SHARE is taken by SELECT. All other commands require higher locking levels. So your upsert transaction will then block everything except SELECT, which is what you want.
See the docs on explicit locking.
So:
BEGIN;
LOCK TABLE ... IN EXCLUSIVE MODE;
...
(The historical, and awful, naming of some table level locks using the word ROW does not ease understanding of PostgreSQL's lock types).
BTW, your application should check the rowcount returned from the UPDATE. If it's non-zero, it can skip the INSERT.
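Putting it all together, a minimal sketch of the whole transaction (mytable stands in for the question's table name):

BEGIN;
LOCK TABLE mytable IN EXCLUSIVE MODE;  -- blocks all concurrent writers, allows plain SELECTs
UPDATE mytable SET field = 'C', field2 = 'Z' WHERE id = 3;
INSERT INTO mytable (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE id = 3);
COMMIT;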

What is the purpose of ROWLOCK on Delete and when should I use it?

Example:
When should I use this statement:
DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;
And when should I just do:
DELETE TOP (@count)
FROM ProductInfo
WHERE ProductId = @productId_for_del;
The WITH (ROWLOCK) is a hint that instructs the database to keep locks at row scope. That means the database will avoid escalating locks to page or table scope.
You use the hint when only a single row or only a few rows will be affected by the query, to keep the lock from covering rows that will not be deleted by the query. That lets another query read unrelated rows at the same time, instead of having to wait for the delete to complete.
If you use it on a query that will delete a lot of rows, it may degrade performance, as the database will try to avoid escalating the locks to a larger scope even when that would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only in situations where you get performance problems because the database made the wrong decision, that you should add such hints to a query.
ROWLOCK is a query hint that should be used with caution (as should all query hints).
Omitting it will likely still result in the exact same behaviour, and providing it will not guarantee that only row locks are used; it is only a hint, after all. If you do not have very in-depth knowledge of lock contention, chances are the optimizer will pick the best possible locking strategy, and these things are usually best left to the database engine to decide.
ROWLOCK means that SQL Server will lock only the affected rows, and not the entire table or the page where the data is stored, when performing the delete. This will only affect other sessions reading from the table while your delete is running.
If a table lock is used, all queries against the table will have to wait until your delete has completed; with a row lock, only selects reading the specific rows will be made to wait.
Deleting the top N rows, where N is a large number, will most likely lock the table in any case.
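If that's a concern, the usual mitigation is to delete in small batches so that each statement stays well under the 5,000-lock escalation threshold discussed earlier. A sketch (the variable values are placeholders):

DECLARE @batch INT = 1000;        -- keep each delete below the escalation threshold
DECLARE @productId INT = 42;      -- stands in for @productId_for_del
WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM ProductInfo WITH (ROWLOCK)
    WHERE ProductId = @productId;
    IF @@ROWCOUNT = 0 BREAK;      -- stop once nothing is left to delete
END;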
SQL Server picks the lock granularity on its own by default, often using page locks, as that is the most efficient way for it to process multiple data sets. But SQL Server is not always multi-user friendly; therefore you may need to adjust the locking so you can keep data flowing in and out of the database. This is why people approach that problem by using locking hints.
If everyone designed their database tables so that everything processed each row at page width, the system would be very fast. But no one spends that detailed amount of time.
So you might see people use WITH (NOLOCK) on their SELECT statements and WITH (ROWLOCK) on their UPDATE and DELETE statements. An INSERT does not matter, because it will lock the page automatically. Sometimes, by using WITH (ROWLOCK), you can get better multi-user (multiple concurrent connections) performance.
The problem with WITH (NOLOCK) is that you can return both the committed record already sitting in the DB and the dirty record that is about to update it; thus a double return of records to your SELECT statement. If you know the personality of your system and how the data runs through it, you can use WITH (NOLOCK) to your advantage quite a bit, though.
How do you know when to use WITH (ROWLOCK)? When your system isn't letting users play nice with each other in the same table / record. That said, rewrite and tune the query first, and adjust your locking only as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do so. If you are the developer writing this code, just blame yourself.

When it comes to updating all rows in a table, does the method of locking matter for performance?

This question is a follow-up to this one.
The SQL in question was
UPDATE stats SET visits = (visits+1)
And the question is: for the purposes of performance, does it matter whether you lock all the rows in stats, as opposed to locking the table stats? Or whether the database uses a page lock rather than a table/row lock?
There is no predicate on this. Any self-respecting DB engine should work this out and realise that all rows need to be updated.
Generally, don't second-guess the DB engine: performance is subjectively the same.
Personally,
I'd not use table or locking hints unless I have to and know why I'm doing it.
I'd not issue a query like this anyway from an application without a WHERE clause
In theory you should lock the table, because 1 lock is cheaper than 1M locks.
Many DBs, though, will promote locks for operations like this. As they see the locks expanding, they'll automatically promote to page and table locks.
But, as with anything, "it depends", and it's better to be specific and lock the table yourself.
Edit:
sigh
Postgres example (note that LOCK TABLE only works inside a transaction block):
BEGIN;
LOCK TABLE mytable IN EXCLUSIVE MODE;
UPDATE mytable SET field = field + 1;
COMMIT;
Here's the deal: this is going to happen ANYWAY. The LOCK TABLE command makes it more explicit, and ensures that your intent, locking the table, is clear and that the lock is acquired before the update takes place.
Would I do this on a 10 row table? No.
Would I do this on a database that I KNEW I had exclusive access to? No, there's no need.
Would I do this on an operational database, on a table with a large number of rows? You bet.
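For comparison, SQL Server would express the same explicit intent with a table hint instead of a LOCK TABLE statement. A sketch using the stats table from the question:

UPDATE stats WITH (TABLOCKX)  -- take one exclusive table lock up front
SET visits = visits + 1;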

Best practices for multithreaded processing of database records

I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.
I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.
Where should I start?
The pattern I'd use is as follows:
Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
Each task would do a query such as:
UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10
Where 10 is the "batch size".
Then each task does a SELECT to find out which rows it has "locked" for processing, and processes those
After each row is complete, you set lockedby and locktime back to NULL
All this is done in a loop for as many batches as exist.
A cron job or scheduled task periodically resets the "lockedby" of any row whose locktime is too long ago, as those rows were presumably claimed by a task that has hung or crashed. Another worker will then pick them up.
The LIMIT 10 is MySQL-specific, but other databases have equivalents. The ORDER BY is important to avoid the query being nondeterministic.
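Spelled out in MySQL, the whole pattern might look like this (taskstable and the column names come from the steps above; the worker id, row id, and timeout are placeholders):

-- claim a batch of rows for this worker
UPDATE taskstable
SET lockedby = 'worker-1', locktime = NOW()
WHERE lockedby IS NULL
ORDER BY id
LIMIT 10;

-- find out which rows this worker now owns
SELECT * FROM taskstable WHERE lockedby = 'worker-1';

-- release a row once it has been processed
UPDATE taskstable
SET lockedby = NULL, locktime = NULL
WHERE id = 123;

-- periodic sweep: free rows claimed by hung or crashed workers
UPDATE taskstable
SET lockedby = NULL, locktime = NULL
WHERE lockedby IS NOT NULL
  AND locktime < NOW() - INTERVAL 10 MINUTE;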
Although I understand the intention, I would disagree with going to row-level locking immediately. This will hurt your response times and may actually make your situation worse. If, after testing, you are seeing concurrency issues with APL, you should make an iterative move to "datapage" locking first!
To answer this question properly, more information would be required about the table structure and the indexes involved, but to explain further:
DOL datarow locking uses a lot more locks than allpage/page-level locking. The overhead of managing all those locks, and hence the decrease in available memory due to requests for more lock structures within the cache, will decrease performance and counter any gains you may have made by moving to a more concurrent approach.
Test your approach first without the move, on APL (all-page locking, the default); then, if issues are seen, move to DOL (datapage first, then datarow). Keep in mind that when you switch a table to DOL, all response times on that table become slightly worse, the table uses more space, and the table becomes more prone to fragmentation, which requires regular maintenance.
So, in short: don't move to datarows straight off. Try your concurrency approach first; then, if there are issues, use datapage locking first and datarows only as a last resort.
You should enable row level locking on the table with:
CREATE TABLE mytable (...) LOCK DATAROWS
Then you:
Begin the transaction
Select your row with FOR UPDATE option (which will lock it)
Do whatever you want.
No other process can do anything to this row until the transaction ends.
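A minimal sketch of that sequence (generic SQL; the id value and the PROCESS_IND update are assumptions based on the question):

BEGIN TRANSACTION;
SELECT * FROM mytable WHERE id = 123 FOR UPDATE;  -- locks the row until the transaction ends
-- ... do whatever processing you want ...
UPDATE mytable SET process_ind = 'Y' WHERE id = 123;
COMMIT;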
P. S. Some mention overhead problems that can result from using LOCK DATAROWS.
Yes, there is overhead, though I'd hardly call it a problem for a table like this.
But if you switch to DATAPAGES, a lock covers a whole page (2K by default), and processes whose rows reside in the same page will not be able to run concurrently.
If we are talking about a table with dozens of rows being locked at once, there will hardly be any noticeable performance drop.
Process concurrency is of much more importance for a design like this.
The most obvious way is locking; if your database doesn't have locks, you can implement it yourself by adding a "Locked" field.
One way to reduce contention is to randomize the access to unprocessed items, so that instead of competing for the first item, workers distribute their access randomly.
Convert the procedure to a single SQL statement and process multiple rows as a single batch. This is how databases are supposed to work.
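For instance, if the per-row processing can be expressed in SQL, the whole job collapses into one set-based statement (a sketch; UPPER(input_col) stands in for whatever the real computation is):

UPDATE mytable
SET result_col = UPPER(input_col),  -- placeholder for the real per-row computation
    process_ind = 'Y'
WHERE process_ind = 'N';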