SQL Server table locking within a stored procedure without using a transaction

I need to execute a stored procedure which includes three queries referring to the same table:
MERGE …
SELECT X FROM TABLE WHERE { BLA BLA }
UPDATE TABLE SET Y = … WHERE { BLA BLA }
The stored procedure should be thread-safe and should execute as an atomic operation.
Currently I am using a transaction with the SERIALIZABLE isolation level and a WITH (XLOCK, TABLOCK) hint on every query.
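Roughly, the procedure looks like the sketch below (the procedure, table and column names are placeholders, not my real schema):

CREATE PROCEDURE dbo.usp_AtomicWork
    @Id int,
    @Value int
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;

    -- 1) MERGE under an exclusive table lock
    MERGE dbo.MyTable WITH (XLOCK, TABLOCK) AS t
    USING (SELECT @Id AS Id, @Value AS Value) AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN UPDATE SET t.Value = s.Value
    WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (s.Id, s.Value);

    -- 2) SELECT with the same lock requested
    SELECT t.Value
    FROM dbo.MyTable AS t WITH (XLOCK, TABLOCK)
    WHERE t.Id = @Id;

    -- 3) UPDATE with the same lock requested
    UPDATE t
    SET t.Value = t.Value + 1
    FROM dbo.MyTable AS t WITH (XLOCK, TABLOCK)
    WHERE t.Id = @Id;

    COMMIT TRANSACTION;
END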
Is there a way to sustain a table lock for the duration of the stored procedure without using a transaction, which causes a performance penalty?
Cheers,
Doron

Whenever a developer chooses TABLOCKX or (XLOCK, TABLOCK), they have instantly lost the right to ask questions about performance.
It is not the transaction that causes the performance penalty. It is holding the locks. So your question really is:
Is there a way to produce performance penalties without the effect of producing performance penalties?
The answer is no, there is no such way.

Doron, what you want to accomplish can only be done using a transaction. I don't understand why you say a transaction causes a performance overhead; that is simply not true, unless what you mean is that your transaction takes e.g. 10 seconds, and in those 10 seconds other transactions are blocked.
Now, I regularly work on and design databases that have to sustain around 80K transactions per second, and doing this you learn a few tricks. What I would suggest is that you take a step back and re-evaluate your query and table architecture. If this is a highly transactional database, the first thing I suggest is to get rid of any foreign key constraints; they are a performance hit on any transactional DB.
The other thing is to look at indexes: do you have the right ones, and are you perhaps over-indexing tables that have to be inserted into and updated? That can cause massive performance impacts!
If you cannot re-architect the tables, may I suggest thinking outside the box a little: select the data you want (with NOLOCK) into temp tables, and then perform your merges and updates from there.
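As a rough sketch of that idea (all table and column names below are made up; adjust to your own schema):

SELECT src.Id, src.Value
INTO #staging
FROM dbo.SourceTable AS src WITH (NOLOCK)
WHERE src.Value IS NOT NULL;

-- Merge from the temp table, so the lock on the real target table stays short
MERGE dbo.TargetTable AS t
USING #staging AS s
    ON t.Id = s.Id
WHEN MATCHED THEN UPDATE SET t.Value = s.Value
WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (s.Id, s.Value);

DROP TABLE #staging;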
Perhaps if you give me a more concrete example, I can assist more.
But for now, tell me what you can and cannot do. Hope it helps!

Related

Are there any performance issues when inserting into a large SQL Server table which is being queried?

I use SQL Server. I have a large table with millions of rows, and I iterate through them (SELECT .. WHERE ..). This is a long operation (and I assume it can't be made shorter).
So what I am asking is whether there will be any problems inserting data into that table while the SELECT is in progress. If yes, what should I do to reduce that? Same question for the UPDATE command (with indexed parameters, of course).
Yes, you will have performance issues, and more specifically, locking and blocking issues. If your SELECT statements are using indexes, which they should be, those indexes will be locked every time you INSERT data into the table. Since the table is relatively large, the locks will probably be held long enough to block your SELECT statements, and deadlocks are likely as well.
This might be a scenario where you need to re-evaluate your table structure, and possibly even consider denormalizing to avoid this.
You might also consider enabling row versioning-based isolation levels, assuming that you can thoroughly test the rest of your system to understand the impact.
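A minimal sketch of what enabling that might look like, assuming a database named YourDb (a placeholder name):

-- Statement-level row versioning: readers under READ COMMITTED use row versions
-- instead of shared locks, so readers stop blocking on writers.
-- Note: the ALTER typically needs (near-)exclusive access to the database to complete.
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;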
The answer is yes, absolutely. A simple solution (if it's an acceptable trade-off within your application) is to specify the NOLOCK locking hint, e.g.:
SELECT * FROM table WITH (NOLOCK)
The trade-off is that you won't get a consistent read, but in many cases this isn't a problem.
It's generally not a good idea to have long-running queries on a database with frequent updates. This decreases performance significantly because of locking.
It might be a good idea to look into data warehousing and see if that is something you could use. That would enable you to keep the transactions in one database and bulk-load from it into another database that holds your warehouse.
This would greatly improve performance for both inserts and queries. The transactional database could have no indexes, and the warehouse could have all the indexes you want.
You could also put the warehouse in a column-store database. That would give you the best query times with minimal effort, because there is no need to create indexes in a column store; all you have to do is design the schema properly. The drawback with column stores, however, is that inserts, updates and deletes are very slow compared to row-oriented relational databases. But bulk loading from the transactional database should do the trick. If you need the data to be very up to date, you could bulk-load every few minutes; if you just need data from the previous day, you could bulk-load into the warehouse each night.
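As a rough T-SQL illustration of a nightly load (all database and table names here are made up, and a real load into a separate warehouse would more likely use bcp, SSIS or the warehouse's own bulk loader):

INSERT INTO WarehouseDb.dbo.SalesFact (OrderId, Amount, OrderDate)
SELECT OrderId, Amount, OrderDate
FROM OltpDb.dbo.Orders WITH (NOLOCK)
WHERE OrderDate >= DATEADD(DAY, -1, CAST(GETDATE() AS date))  -- from yesterday's midnight
  AND OrderDate <  CAST(GETDATE() AS date);                   -- up to today's midnight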
The possibilities are endless. If you want to look into column-store warehouses, you could try MonetDB. It's an open-source column store, so you could try it out and see if it suits you.
Do not assume the execution time can't be shorter. If you query a date range, an index on the date column is a must!
Solve your problem by indexing on the date field:
-- please use correct names for your_table and date_field --
CREATE INDEX index_name ON your_table (date_field);
Warehousing, as per @Gisli's answer, is a good option: build a copy of the data elsewhere, and run your long-running queries there, freeing up the "main" database for OLTP processing.
If this is not an option, you can mess around with snapshot isolation (something I know about, but have never worked with personally). Essentially, this takes a "snapshot" of the database at the point in time you start the query, and executes the query as if no subsequent changes were made to the database, even if changes are made while the query is running. Importantly, any such changes are still "real" and permanent. Think of it like a short-term branching of your database.
The duration of the branch (snapshot) is where I get hazy. I believe you can have the snapshot last for the duration of the query, which means you'd (possibly) never get the same results for a given query twice (if the data changes while you are running it); or you can create a "saved" snapshot that can be re-used over and over until you get around to deleting it. Be wary with this: you don't want your system to get cluttered up with old, forgotten branches of past data!
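A minimal sketch of transaction-scoped snapshot isolation in SQL Server, assuming a database named YourDb and a table named dbo.BigTable (both placeholders):

-- One-time database setting
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the session running the long query
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM dbo.BigTable;  -- sees the data as of the start of the transaction
COMMIT TRANSACTION;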
There is no problem here. SQL Server is built to deal with this kind of situation; you just need to set the correct isolation level on the transactions.
There are several possible scenarios. For example, if you don't mind reading data that is still being inserted, set the isolation level to READ UNCOMMITTED on your read transaction. If you are inserting values in one range and reading values in another range, you can use SERIALIZABLE.
Take a look at the possible isolation levels:
http://msdn.microsoft.com/en-us/library/ms173763.aspx
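For the first scenario, a minimal sketch (the table name and filter are placeholders):

-- In the session that only reads
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.BigTable WHERE CreatedAt >= '20120101';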

What is the purpose of ROWLOCK on Delete and when should I use it?

For example, when should I use this statement:
DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;
And when should I just do:
DELETE TOP (@count)
FROM ProductInfo
WHERE ProductId = @productId_for_del;
The WITH (ROWLOCK) is a hint that instructs the database to keep its locks at row scope. That means the database will avoid escalating the locks to page or table scope.
You use the hint when only a single row or only a few rows will be affected by the query, to keep the lock from covering rows that will not be deleted by the query. That will let another query read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that will delete a lot of rows, it may degrade performance, as the database will try to avoid escalating the locks to a larger scope even when that would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only in situations where you get performance problems because the database made the wrong decision that you should add such hints to a query.
ROWLOCK is a query hint that should be used with caution (as should all query hints).
Omitting it will likely still result in exactly the same behaviour, and providing it does not guarantee that only row locks will be used; it is only a hint, after all. If you do not have very in-depth knowledge of lock contention, chances are the optimizer will pick the best possible locking strategy, and these things are usually best left to the database engine to decide.
ROWLOCK means that SQL will lock only the affected row, and not the entire table or the page in the table where the data is stored when performing the delete. This will only affect other people reading from the table at the same time as your delete is running.
If a table lock is used, it will cause all queries against the table to wait until your delete has completed; with a row lock, only selects reading the specific rows will be made to wait.
Deleting top N where N is a number of rows will most likely lock the table in any case.
SQL Server defaults to page locks. This is the most efficient way for SQL Server to process multiple data sets. But SQL Server is not always multi-user friendly; therefore you may need to incorporate locking methods so you can get your data to flow in and out of the database. This is why people approach that problem by using locking hints.
If everyone designed their database tables so that everything processed each row at page width, the system would be very fast. But no one spends that detailed amount of time.
So you might see people use WITH (NOLOCK) on their SELECT statements and WITH (ROWLOCK) on their UPDATE and DELETE statements. An INSERT does not matter because it will lock the page automatically. Sometimes, by using WITH (ROWLOCK), you can get better multi-user (multiple user connection) performance.
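A small illustration of that pattern (table and column names are hypothetical):

-- Readers: take no shared locks, at the cost of possible dirty reads
SELECT OrderId, Status
FROM dbo.Orders WITH (NOLOCK)
WHERE CustomerId = 42;

-- Writers: ask the engine to keep the write locks at row granularity
UPDATE dbo.Orders WITH (ROWLOCK)
SET Status = 'Shipped'
WHERE OrderId = 1001;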
The problem with WITH (NOLOCK) is that you can return both the committed record already sitting in the DB and the dirty record that is about to update it; thus your SELECT statement can return the same record twice. If you know the personality of your system and how the data runs through it, you can use WITH (NOLOCK) to your advantage quite a bit, though.
When do you know to use WITH (ROWLOCK)? When your system isn't letting users play nice with each other in the same table/record. Still, rewrite and tune the query first, and adjust your locking only as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do so. If you are the developer writing this code, just blame yourself.

When it comes to updating all rows in a table, does the method of locking matter for performance?

This question is a follow-up to this one.
The SQL in question was
UPDATE stats SET visits = (visits+1)
And the question is: for performance purposes, does it matter whether you lock all the rows in stats or lock the table stats itself? Or whether the database uses a page lock rather than a table/row lock?
There is no predicate on this query. Any self-respecting DB engine should work this out and realise that all rows need to be updated.
Generally, don't second-guess the DB engine: performance is subjectively the same.
Personally,
I'd not use table or locking hints unless I have to and know why I'm doing it.
I'd not issue a query like this anyway from an application without a WHERE clause
In theory you should lock the table, because 1 lock is cheaper than 1M locks.
Many DBs, though, will promote locks for operations like this. As they see the locks expanding, they'll automatically promote to page and table locks.
But, as with anything, "it depends", and it's better to be specific and lock the table yourself.
Edit:
sigh
Postgres example:
BEGIN;
LOCK TABLE mytable IN EXCLUSIVE MODE;
UPDATE mytable SET field = field + 1;
COMMIT;
Here's the deal: this locking is going to happen ANYWAY. The LOCK TABLE command just makes it explicit and ensures that your intent, locking the table, is clear before the process takes place.
Would I do this on a 10 row table? No.
Would I do this on a database that I KNEW I had exclusive access to? No, there's no need.
Would I do this on an operational database, on a table with a large number of rows? You bet.

Best practice for archiving a huge table of over 1,000,000,000 rows

I'm using SQL Server 2005. There is an audit trail table containing over 1,000,000,000 rows. I'm planning to archive this table. Even when I do a simple SELECT with NOLOCK, I can still see blocking (probably I/O blocking with other processes?). So, are there any best practices for this kind of situation?
For a table that large you will want to find an effective sharding/partitioning strategy. Archiving in this sense tends to be a form of partitioning, but not a good one, since you often want to query over the current and archived data anyway. In the worst case you end up with a SELECT over a UNION of the archive and current tables, which is worse than if you hadn't split them at all.
You will often do better by finding some other means to slice the data, say by record type or something. But if you are going to split it by date, make absolutely sure you won't query over the archive+current data set.
Also, SQL Server 2005+ doesn't enable MVCC by default. It can do so, however, if you enable what MS calls Snapshot Isolation. See Serializable vs. Snapshot Isolation Level.
The effect of not having this enabled is that an uncommitted INSERT or UPDATE will block a SELECT in another transaction until the first transaction commits or rolls back. That can cause unnecessary locks and limit your scalability.
Create a backup of the database and restore it in the archive location.
Selecting 1 billion rows all at once is going to strain the server no matter how you do it.
Do it in batches instead, say 1000 rows at a time. The bcp tool does this automatically. Or use SSIS to copy the data into another database; it does pretty much the same thing.
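As a rough sketch of a batched copy in T-SQL, assuming an ever-increasing AuditId key and an ArchiveDb target database (all names are placeholders):

DECLARE @lastId bigint, @maxId bigint;
SET @lastId = 0;
SELECT @maxId = MAX(AuditId) FROM dbo.AuditTrail;

WHILE @lastId < @maxId
BEGIN
    -- Copy a slice of roughly 1000 keys per iteration to keep each statement short
    INSERT INTO ArchiveDb.dbo.AuditTrail (AuditId, LogDate, Detail)
    SELECT AuditId, LogDate, Detail
    FROM dbo.AuditTrail
    WHERE AuditId > @lastId AND AuditId <= @lastId + 1000;

    SET @lastId = @lastId + 1000;
END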

Best practices for multithreaded processing of database records

I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.
I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.
Where should I start?
The pattern I'd use is as follows:
Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
Each task would do a query such as:
UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10
Where 10 is the "batch size".
Then each task does a SELECT to find out which rows it has "locked" for processing, and processes those
After each row is complete, you set lockedby and locktime back to NULL
All this is done in a loop for as many batches as exist.
A cron job or scheduled task periodically resets the "lockedby" of any row whose locktime is too long ago, as those rows were presumably claimed by a task which has since hung or crashed. Someone else will then pick them up.
The LIMIT 10 is MySQL-specific, but other databases have equivalents. The ORDER BY is important to avoid the query being nondeterministic.
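For reference, a hypothetical SQL Server equivalent of the claim query above (table and column names taken from the example; everything else is an assumption). UPDATE TOP does not support ORDER BY, so an ordered CTE is used instead; READPAST lets concurrent workers skip rows that another worker has already locked:

DECLARE @workerId varchar(50);
SET @workerId = HOST_NAME() + '-' + CAST(@@SPID AS varchar(10));  -- machine + session id

WITH batch AS (
    SELECT TOP (10) id, lockedby, locktime
    FROM taskstable WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE lockedby IS NULL
    ORDER BY id
)
UPDATE batch
SET lockedby = @workerId,
    locktime = GETUTCDATE();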
Although I understand the intention, I would disagree with going to row-level locking immediately. This will reduce your response time and may actually make your situation worse. If, after testing, you are seeing concurrency issues with APL, you should make an iterative move to "datapage" locking first!
To really answer this question properly more information would be required about the table structure and the indexes involved, but to explain further.
DOL (datarow) locking uses a lot more locks than allpage/page-level locking. The overhead of managing all those locks, and hence the decrease in available memory due to requests for more lock structures within the cache, will decrease performance and counter any gains you may get from moving to a more concurrent approach.
Test your approach first without the change, on APL (all-page locking, the default); then, if issues are seen, move to DOL (datapages first, then datarows). Keep in mind that when you switch a table to DOL, all responses on that table become slightly worse, the table uses more space, and the table becomes more prone to fragmentation, which requires regular maintenance.
So, in short: don't move to datarows straight off. Try your concurrency approach first; then, if there are issues, use datapage locking first and, as a last resort, datarows.
You should enable row level locking on the table with:
CREATE TABLE mytable (...) LOCK DATAROWS
Then you:
Begin the transaction
Select your row with FOR UPDATE option (which will lock it)
Do whatever you want.
No other process can do anything to this row until the transaction ends.
P.S. Some mention overhead problems that can result from using LOCK DATAROWS. Yes, there is overhead, though I'd hardly call it a problem for a table like this.
But if you switch to DATAPAGES, then locking one row effectively locks the whole page (2K by default), and processes whose rows reside in the same page will not be able to run concurrently.
If we are talking about a table with a dozen rows being locked at once, there will hardly be any noticeable performance drop.
Process concurrency is of much more importance for a design like this.
The most obvious way is locking. If your database doesn't have locks, you could implement them yourself by adding a "Locked" field.
One way to simplify the concurrency is to randomize access to unprocessed items, so that instead of competing for the first item, the processes distribute their access randomly.
Convert the procedure to a single SQL statement and process multiple rows as a single batch. This is how databases are supposed to work.
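A sketch of that single-statement approach in T-SQL; only the PROCESS_IND column comes from the question, while the table name, batch size and hints are assumptions. The OUTPUT clause returns the claimed rows, so the caller gets the whole batch back in one round trip:

UPDATE TOP (1000) r
SET PROCESS_IND = 'Y'
OUTPUT inserted.*
FROM dbo.Records AS r WITH (ROWLOCK, READPAST)
WHERE r.PROCESS_IND = 'N';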