What does this do? - sql

Once in a while, I need to clear out the anonymous user profiles from the database. A colleague has suggested I use this procedure because it allows a little breathing space from time to time for other procedures to run.
WHILE EXISTS (SELECT * FROM aspnet_users WITH (NOLOCK)
WHERE userID IN (SELECT UserID FROM #AspnetUsersToDelete))
BEGIN
SET ROWCOUNT 1000
DELETE FROM aspnet_users WHERE userID IN (SELECT UserID FROM #AspnetUsersToDelete )
print 'aspnet_Users deleted: ' + CONVERT(varchar(255), ##ROWCOUNT)
SET ROWCOUNT 0
WAITFOR DELAY '00:00:01'
END
This is the first time I've seen the NOLOCK keyword used and the logic for the rowcount seems backwards to me. Does anyone else use a similar sort of technique for providing windows in long running procedures and is this the best way of doing things?

Any time I anticipate deleting a very large number of rows, I'll do something similar to this to keep transaction batch sizes reasonable.
For SQL Server 2005+, you could use DELETE TOP (1000)... instead of the SET ROWCOUNT statements. I usually do:
SELECT NULL; /* Fudge ##ROWCOUNT value for first time in loop */
WHILE (##ROWCOUNT <> 0) BEGIN
DELETE TOP (1000)
...
END /* WHILE */

The SET ROWCOUNT 1000 means it will only process one thousand rows in the following statements (i.e., DELETE statement). SET ROWCOUNT 0 means each statement processes however many rows are relevant.
So basically, over all it deletes one thousand rows, waits a second, deletes another thousand, and continues that until there are no more to delete.
The WITH (NOLOCK) prevents the data from being locked, meaning that multiple queries running simultaneously can access the data. This allows your query to be a little faster. For more information about NOLOCK, consult the following link:
http://www.mollerus.net/tom/blog/2008/03/using_mssqls_nolock_for_faster_queries.html

(NOLOCK) allows dirty reads. Basically, there is a chance that if you are reading data out of the table while it is in the process of being updated, you could read the wrong data. You can also read data that has been modified by transactions that have not been committed yet as well as a slew of other problems.
Best practice is not to use NOLOCK unless you are reading from tables that really don't change (such as a table containing states) or from a data warehouse type DB that is not constantly updated.

Related

Select query hangs when a large number of rows have an update lock

I am designing a program that will read a queue table. I am using locking to make sure that multiple instances do not pull the same row.
I am locking rows like this:
BEGIN TRANSACTION
UPDATE top(10) Source with (ROWLOCK, READPAST, updlock)
SET status = status + 1
With another connection I read the rows like this:
SELECT COUNT(*) FROM Source WITH (ROWLOCK, READPAST, updlock)
The count from the select statement does not include the rows I have locked. This is exactly what I want.
This works fine when I pick the top 10 rows, or 100, or even 1000. Somewhere around 4,690 (it's not consistent) the select begins to hang until the transaction is committed. It's not just slow; it waits for the transaction to end.
This is a test. My real query will not be using top. It will use a join which also causes the problem when too many rows are locked.
Any ideas on what may cause this?
Is there a better way to have multiple instances read a table and not have conflicts?

SQL Server - READPAST, UPDLOCK update method?

We're in need of yet another massive update which as it is, would require downtime because of the risk of extensive locking problems. Basically, we'd like to update hundreds of millions of rows during business hours.
Now, reducing the updates to manageable < 5000 batch sizes helps, but I was wondering if it was feasible to create a template to only read and lock available rows, udpate them, and move on to the next batch? The idea is that this way we could patch some 95% of the data with minimal risk, after which the remaining set of data would be small enough to just update at once during a slower period while watching out for locks.
Yes, I know this sounds weird, but bear with me. How would one go about doing this?
I was thinking of something like this:
WHILE ##ROWCOUNT > 0
BEGIN
UPDATE TOP (5000) T
SET T.VALUE = 'ASD'
FROM MYTABLE T
JOIN (SELECT TOP 5000 S.ID
FROM MYTABLE S WITH (READPAST, UPDLOCK)
WHERE X = Y AND Z = W etc...) SRC
ON SRC.ID = T.ID
END
Any ideas? Basically, the last thing I want is for this query to get stuck in other potentially long-running transactions, or to do the same to others in return. So what I'm looking for here is a script that will skip locked rows, update what it can with minimal risk for getting involved in locks or deadlocks, so it can be safely run for the hour or so during uptime.
Just add WITH (READPAST) to the table for single-table updates:
UPDATE TOP (5000) MYTABLE WITH (READPAST)
SET VALUE = 'ASD'
WHERE X = Y AND Z = W etc...
If you are lucky enough to have a single table involved you can just add WITH (READPAST) and the UPDATE itself will add an exclusive lock on just the rows that get updated.
If there is more than one table involved, it may become more complicated. Also be very careful of the WHERE clause because that could add more load than expected - the first few batches are fine but become progressively worse if scanning the whole table is necessary to find enough rows to satisfy the TOP. You might want to consider a short timeout value for each batch.

Running large queries in the background MS SQL

I am using MS SQL Server 2008
i have a table which is constantly in use (data is always changing and inserted to it)
it contains now ~70 Mill rows,
I am trying to run a simple query over the table with a stored procedure that should properly take a few days,
I need the table to keep being usable, now I executed the stored procedure and after a while every simple select by identity query that I try to execute on the table is not responding/running too much time that I break it
what should I do?
here is how my stored procedure looks like:
SET NOCOUNT ON;
update SOMETABLE
set
[some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE
[some_col] = 243
even if i try it with this on the where clause (with an 'and' logic..) :
ID_COL > 57000000 and ID_COL < 60000000 and
it still doesn't work
BTW- SomeFunction does some simple mathematics actions and looks up rows in another table that contains about 300k items, but is never changed
From my perspective your server has a serious performance problem. Even if we assume that none of the records in the query
select some_col with (nolock) where id_col between 57000000 and 57001000
was in memory, it shouldn't take 21 seconds to read the few pages sequentially from disk (your clustered index on the id_col should not be fragmented if it's an auto-identity and you didn't do something stupid like adding a "desc" to the index definition).
But if you can't/won't fix that, my advice would be to make the update in small packages like 100-1000 records at a time (depending on how much time the lookup function consumes). One update/transaction should take no more than 30 seconds.
You see each update keeps an exclusive lock on all the records it modified until the transaction is complete. If you don't use an explicit transaction, each statement is executed in a single, automatic transaction context, so the locks get released when the update statement is done.
But you can still run into deadlocks that way, depending on what the other processes do. If they modify more than one record at a time, too, or even if they gather and hold read locks on several rows, you can get deadlocks.
To avoid the deadlocks, your update statement needs to take a lock on all the records it will modify at once. The way to do this is to place the single update statement (with only the few rows limited by the id_col) in a serializable transaction like
IF ##TRANCOUNT > 0
-- Error: You are in a transaction context already
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
-- Insert Loop here to work "x" through the id range
BEGIN TRANSACTION
UPDATE SOMETABLE
SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE [some_col] = 243 AND id_col BETWEEN x AND x+500 -- or whatever keeps the update in the small timerange
COMMIT
-- Next loop
-- Get all new records while you where running the loop. If these are too many you may have to paginate this also:
BEGIN TRANSACTION
UPDATE SOMETABLE
SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE [some_col] = 243 AND id_col >= x
COMMIT
For each update this will take an update/exclusive key-range lock on the given records (but only them, because you limit the update through the clustered index key). It will wait for any other updates on the same records to finish, then get it's lock (causing blocking for all other transactions, but still only for the given records), then update the records and release the lock.
The last extra statement is important, because it will take a key range lock up to "infinity" and thus prevent even inserts on the end of the range while the update statement runs.

Updating 100k records in one update query

Is it possible, or recommended at all, to run one update query, that will update nearly 100k records at once?
If so, how can I do that? I am trying to pass an array to my stored proc, but it seems not to work, this is my SP:
CREATE PROCEDURE [dbo].[UpdateAllClients]
#ClientIDs varchar(max)
AS
BEGIN
DECLARE #vSQL varchar(max)
SET #vSQL = 'UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (' + #ClientIDs + ')';
EXEC(#vSQL);
END
I have not idea whats not working, but its just not updating the relevant queries.
Anyone?
The UPDATE is reading your #ClientIDs (as a Comma Separated Value) as a whole. To illustrate it more, you are doing like this.
assume the #ClientIDs = 1,2,3,4,5
your UPDATE command is interpreting it like this
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN ('1,2,3,4,5')';
and not
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (1,2,3,4,5)';
One suggestion to your question is by using subquery on your UPDATE, example
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
SELECT ID
FROM tableName
-- where condtion
)
Hope this makes sense.
A few notes to be aware of.
Big updates like this can lock up the target table. If > 5000 rows are affected by the operation, the individual row locks will be promoted to a table lock, which would block other processes. Worth bearing in mind if this could cause an issue in your scenario. See: Lock Escalation
With a large number of rows to update like this, an approach I'd consider is (basic):
bulk insert the 100K Ids into a staging table (e.g. from .NET, use SqlBulkCopy)
update the target table, using a join onto the above staging table
drop the staging table
This gives some more room for controlling the process, but breaking the workload up into chunks and doing it x rows at a time.
There is a limit for the number of items you pass to 'IN' if you are giving an array.
So, if you just want to update the whole table, skip the IN condition.
If not specify an SQL inside IN. That should do the job
The database will very likely reject that SQL statement because it is too long.
When you need to update so many records at once, then maybe your database schema isn't appropriate. Maybe the LastUpdate datum should not be stored separately for each client but only once globally or only once for a constant group of clients?
But it's hard to recommend a good course of action without seeing the whole picture.
What version of sql server are you using? If it is 2005+ I would recommend using TVPs (table valued parameters - http://msdn.microsoft.com/en-us/library/bb510489.aspx). The transfer of data will be faster (as opposed to building a huge string) and your query would look nicer:
update c
set lastupdate=getdate()
from clients c
join #mytvp t on c.Id = t.Id
Each SQL statement on its own is a transaction statement . This means sql server is going to grab locks for all these million of rows .It can really degrade the performance of a table .So you really don’t tend to update a table which has million of rows in it which hurts the performance.So the workaround is to set rowcount before DML operation
set rowcount=100
UPDATE Clients SET LastUpdate=GETDATE()
WHERE ID IN ('1,2,3,4,5')';
set rowcount=0
or from SQL server 2008 you can parametrize Top keyword
Declare #value int
set #value=100000
again:
UPDATE top (#value) Clients SET LastUpdate=GETDATE()
WHERE ID IN ('1,2,3,4,5')';
if ##rowcount!=0 goto again
See how fast the above query takes and then adjust and change the value of the variable .You need to break the tasks for smaller units as suggested in the above answers
Method 1:
Split the #clientids with delimiters ','
put in array and iterate over that array
update clients table for each id.
OR
Method 2:
Instead of taking #clientids as a varchar2, follow below steps
create object type table for ids and use join.
For faster processing u can also create index on clientid as well.

NHibernate: Select MAX value concurrently

Suppose I need to select max value as order number. Thus I'll select MAX(number), assign number to order, and save changes to database. However, how do I prevent others from messing with the number? Will transactions do? Something like:
ordersRepository.StartTransaction();
order.Number = ordersRepository.GetMaxNumber() + 1;
ordersRepository.Commit();
Will the code above "lock" changes so that order numbers are read/write only by one DB client? Given that transactions are plain NHibernate ones, and GetMaxNumber just does SELECT MAX(Number) FROM Orders.
Using an ITransaction with IsolationLevel.Serializable should do the job. Be careful of table contention, though. If you've got high frequency updates on the table, things could slow down big time. You might want to profile the hit on the db when using GetMaxNumber().
I had to do something similar to generate custom IDs for high concurrency usage. My solution moved the ID generation into the database, and used a separate Counter table to hold the max values.
Using a separate Counter table has a couple of plus points:
It removes the contention on the Order table
It's usually faster
If it's small enough, It can be pinned into memory
I also used a stored proc to return the next available ID:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
UPDATE [COUNTER] SET Value = Value + 1 WHERE COUNTER_ID = #counterId
COMMIT TRAN
RETURN [NEW_VALUE]
Hope that helps.