CURSOR vs. UPDATE - SQL

A company uses a SQL Server database to store information about its customers and its business transactions. A new area code has been introduced for your city. The area code 111 remains the same for telephone numbers with prefixes that are less than 500. The numbers with prefixes that are 500 and greater will be assigned an area code of 222. All telephone numbers in the Phone column in the Customers table are stored as char(12) strings in the format '999-999-9999'. I must make the appropriate changes to the Customers table as quickly as possible using the least administrative effort. Which one should I use?
a.
UPDATE Customers SET Phone = '222-' + SUBSTRING(Phone,5,8)
FROM Customers WHERE SUBSTRING(Phone,1,3) = '111'
AND SUBSTRING(Phone,5,3) >= 500
b.
DECLARE PhoneCursor CURSOR FOR
SELECT Phone FROM Customers
WHERE SUBSTRING(Phone,1,3) = '111'
AND SUBSTRING(Phone,5,3) >= 500
OPEN PhoneCursor
FETCH NEXT FROM PhoneCursor
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE Customers
SET Phone = '222-' + SUBSTRING(Phone,5,8)
WHERE CURRENT OF PhoneCursor
FETCH NEXT FROM PhoneCursor
END
CLOSE PhoneCursor
DEALLOCATE PhoneCursor

The big update will hold a transaction against the database for, potentially, a long time... locking things up and causing all kinds of havoc.
For this, I would recommend a cursor to spread that load out over a period of time.
I've also done a 'chunked' update... something like this:
DECLARE @Done bit = 0
WHILE @Done = 0
BEGIN
UPDATE TOP (10000) Customers
SET Phone = '222-' + SUBSTRING(Phone,5,8)
FROM Customers
WHERE SUBSTRING(Phone,1,3) = '111'
AND SUBSTRING(Phone,5,3) >= 500
IF @@ROWCOUNT = 0
BEGIN
SET @Done = 1
END
END

The cursor would be far slower on any large dataset; we're talking hours versus seconds or milliseconds. What it does do is avoid locking up the database against other users for as long at any one time.
That is why, with a large dataset, the batched approach can be best.
In general I would try the plain set-based approach first and run it during off hours if need be. If that proves too slow, I would then try the batched set-based approach that processes a chunk of records at a time.
If you have to fall back to a cursor running one record at a time, there is probably something massively wrong with your database design. Cursors are generally the approach of last resort; do not use them for inserts, updates, or deletes.

Related

T-SQL ways to avoid potentially updating the same row based on subquery results

I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource-intensive, but it can take a while with all the parsing and SMTP overhead, etc.
To speed things up I can easily run multiple instances of the mailer process over multiple servers, but I worry that if two were to start at almost the same time they might still overlap a bit and send the same records.
Simplified for the question, my table looks something like this, with each record holding the data for one email.
queueItem
======================
queueItemID PK
...data...
processed bit
priority int
queuedStart datetime
rowLockName varchar
rowLockDate datetime
Batch 1 (Server 1)
starts at 12:00PM
lock/reserve the first 5000 rows (1-5000)
select the newly reserved rows
begin work
Batch 2 (Server 2)
starts at 12:15PM
lock/reserve the next 5000 rows (5001-10000)
select the newly reserved rows
begin work
To lock the rows I have been using the following:
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000
update queueItem
set rowLockName = @lockName,
rowLockDate = getdate()
where queueitemID in (
select top(@batchsize) queueitemID
from queueItem
where processed = 0
and rowLockName is null
and queuedStart <= getdate()
order by priority, queueitemID
)
If I'm not mistaken, the query starts by executing the SELECT subquery and then locks the rows in preparation for the update. This is fast, but not instantaneous.
My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not be completed yet, so Batch 2's SELECT would see the records as still available and attempt (and succeed) to overwrite Batch 1's reservation (a sort of race condition?).
I have run some tests and so far haven't seen them overlap. Is this a valid concern that will come back to haunt me at the worst possible time?
Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
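The concern is valid in principle: two UPDATEs can interleave their subquery reads. One way to close that window, sketched below under the assumption of SQL Server 2005 or later, is to put UPDLOCK and READPAST hints on the subquery (both are real SQL Server table hints) so a second claimer skips rows the first has already locked, and to use an OUTPUT clause so the claimed rows come back from the same atomic statement instead of a separate SELECT.
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000
update queueItem
set rowLockName = @lockName,
rowLockDate = getdate()
output inserted.queueItemID -- return the claimed rows atomically, no second SELECT needed
where queueItemID in (
select top(@batchsize) queueItemID
from queueItem with (updlock, readpast, rowlock) -- lock rows as we claim them; skip rows another batch holds
where processed = 0
and rowLockName is null
and queuedStart <= getdate()
order by priority, queueItemID
)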

More Efficient Way to Copy Large Data from One Table to Another

We have a few tables with over 300,000 records, and it takes several hours for records to be transferred/copied from one table to another. Is there a more efficient and effective way than using cursors and copying each record one by one when dealing with large quantities of data?
Code:
open SOMETBL
fetch SOMETBL into @key1,@key2,@key3,@key4
while(@@fetch_status = 0)
begin
SELECT @key1InMapping = count(*) FROM SOMEOTHERDB.dbo.tblSOMETBLping WHERE fldEServicesKey = @key1
SELECT @eServiceTypeKey = fldAServiceTypeKey FROM SOMEOTHERDB.dbo.tblAServiceType WHERE fldAServiceTypeNumber = @key4
if (@eServiceTypeKey is null or @eServiceTypeKey = 0)
set @eServiceTypeKey = 50
if @key1InMapping > 0
begin
update SOMEOTHERDB.dbo.tblSOMETBLping set fldAServiceTypeKey = @eServiceTypeKey where fldEServicesKey = @key1
-- print 'post='+convert(varchar,@key2) + ' :key1='+convert(varchar,@key1)+ ' :serviceTypeKey='+convert(varchar,@eServiceTypeKey)+' : serviceTypeNum='+convert(varchar,@key4)
end
fetch SOMETBL into @key1,@key2,@key3,@key4
end
close SOMETBL
300,000 records is a small amount of data in database terms, but it is far too large to be processed with a cursor; cursors really should not be used for more than a couple of hundred records. Frankly, once you learn to write set-based code, it is generally shorter and takes less time to write than a cursor anyway. So I would never use a cursor as a first choice; it is a technique of last resort. You should not be thinking about inserting, updating, or deleting one record at a time in a cursor; use set-based operations. With 300,000 records you might also consider a combination of set-based code and looping, where you process a group of records at a time (say 10,000) rather than cursoring through one at a time.
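For illustration, here is a hedged set-based sketch of the cursor above. It assumes the SOMETBL cursor iterates over a source table, called SourceTable here as a made-up stand-in, whose columns supplied the values in @key1 and @key4; the real table and join columns would come from the cursor's DECLARE, which isn't shown.
-- Sketch only: SourceTable, key1, and key4 are hypothetical stand-ins
-- for whatever the SOMETBL cursor actually selects.
UPDATE m
SET m.fldAServiceTypeKey = COALESCE(NULLIF(t.fldAServiceTypeKey, 0), 50)
FROM SOMEOTHERDB.dbo.tblSOMETBLping AS m
JOIN SourceTable AS s
ON s.key1 = m.fldEServicesKey -- only rows that exist in the mapping get updated
LEFT JOIN SOMEOTHERDB.dbo.tblAServiceType AS t
ON t.fldAServiceTypeNumber = s.key4 -- a missing or 0 type key falls back to 50, as in the cursor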
Check out the following for details on how to change a cursor to set-based code:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them

What does this do?

Once in a while, I need to clear out the anonymous user profiles from the database. A colleague has suggested I use this procedure because it allows a little breathing space from time to time for other procedures to run.
WHILE EXISTS (SELECT * FROM aspnet_users WITH (NOLOCK)
WHERE userID IN (SELECT UserID FROM #AspnetUsersToDelete))
BEGIN
SET ROWCOUNT 1000
DELETE FROM aspnet_users WHERE userID IN (SELECT UserID FROM #AspnetUsersToDelete )
print 'aspnet_Users deleted: ' + CONVERT(varchar(255), @@ROWCOUNT)
SET ROWCOUNT 0
WAITFOR DELAY '00:00:01'
END
This is the first time I've seen the NOLOCK keyword used, and the logic for the rowcount seems backwards to me. Does anyone else use a similar technique for providing windows in long-running procedures, and is this the best way of doing things?
Any time I anticipate deleting a very large number of rows, I'll do something similar to this to keep transaction batch sizes reasonable.
For SQL Server 2005+, you could use DELETE TOP (1000)... instead of the SET ROWCOUNT statements. I usually do:
SELECT NULL; /* Fudge @@ROWCOUNT value for first time in loop */
WHILE (@@ROWCOUNT <> 0) BEGIN
DELETE TOP (1000)
...
END /* WHILE */
SET ROWCOUNT 1000 caps every following statement (here, the DELETE) at one thousand rows; SET ROWCOUNT 0 restores the default, so each statement processes however many rows qualify.
So basically, overall it deletes one thousand rows, waits a second, deletes another thousand, and continues until there are no more to delete.
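Put together against the tables from the question, the DELETE TOP variant might look like the sketch below. It captures @@ROWCOUNT into a variable immediately after the DELETE, since statements such as PRINT reset it, and it assumes SQL Server 2005+ for DELETE TOP (n).
DECLARE @rows int
SET @rows = 1 /* force the first pass through the loop */
WHILE @rows > 0
BEGIN
DELETE TOP (1000) FROM aspnet_users
WHERE userID IN (SELECT UserID FROM #AspnetUsersToDelete)
SET @rows = @@ROWCOUNT /* capture before anything else resets it */
WAITFOR DELAY '00:00:01' /* breathing space for other activity */
END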
WITH (NOLOCK) makes the SELECT read without taking shared locks, so it neither blocks nor is blocked by concurrent writers, which can make the query a little faster. For more information about NOLOCK, consult the following link:
http://www.mollerus.net/tom/blog/2008/03/using_mssqls_nolock_for_faster_queries.html
(NOLOCK) allows dirty reads. Basically, there is a chance that if you read data out of the table while it is in the process of being updated, you could read the wrong data. You can also read data that has been modified by transactions that have not been committed yet, along with a slew of other problems.
Best practice is not to use NOLOCK unless you are reading from tables that really don't change (such as a table containing states) or from a data-warehouse-type DB that is not constantly updated.
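A quick two-session sketch of what a dirty read looks like (assuming aspnet_users has a UserName column, as the standard ASP.NET membership table does):
-- Session A: update inside a transaction that is still open
BEGIN TRAN
UPDATE aspnet_users SET UserName = 'temp' WHERE userID = 1
-- Session B: the NOLOCK read sees the uncommitted 'temp' value
SELECT UserName FROM aspnet_users WITH (NOLOCK) WHERE userID = 1
-- Session A: roll back; Session B has now read data that never officially existed
ROLLBACK TRAN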

NHibernate: Select MAX value concurrently

Suppose I need to select the max value to use as an order number. So I'll select MAX(Number), assign it to the order, and save the changes to the database. However, how do I prevent others from messing with the number in the meantime? Will transactions do? Something like:
ordersRepository.StartTransaction();
order.Number = ordersRepository.GetMaxNumber() + 1;
ordersRepository.Commit();
Will the code above "lock" changes so that order numbers are read and written by only one DB client at a time? Assume the transactions are plain NHibernate ones and GetMaxNumber just does SELECT MAX(Number) FROM Orders.
Using an ITransaction with IsolationLevel.Serializable should do the job. Be careful of table contention, though. If you've got high frequency updates on the table, things could slow down big time. You might want to profile the hit on the db when using GetMaxNumber().
I had to do something similar to generate custom IDs for high concurrency usage. My solution moved the ID generation into the database, and used a separate Counter table to hold the max values.
Using a separate Counter table has a few plus points:
It removes the contention on the Order table
It's usually faster
If it's small enough, it can be pinned into memory
I also used a stored proc to return the next available ID:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
DECLARE @newValue int
UPDATE [COUNTER]
SET @newValue = Value = Value + 1 -- increment and capture the new value in one statement
WHERE COUNTER_ID = @counterId
COMMIT TRAN
RETURN @newValue
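Calling it might look like this (GetNextCounterValue is a made-up name for the proc above; the counter ID is an arbitrary example value):
DECLARE @orderNumber int
EXEC @orderNumber = dbo.GetNextCounterValue @counterId = 1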
Hope that helps.

sql table cell modified by multiple threads at the same time

If you have a table BankAccount with a column Amount, and the value of this column for a specific row can be modified by multiple threads at the same time, then the last one to set the value will win.
How do you usually handle this kind of situation?
UPDATE: I heard that in MSSQL there is an update lock, UPDLOCK, that locks the table (or the rows) being updated. Could I use it here somehow?
An UPDATE statement that references the current value would prevent overwriting. So, instead of doing something like
SELECT Amount FROM BankAccount WHERE account_id = 1
(it comes back as 350 and you want to subtract 50)...
UPDATE BankAccount SET Amount = 300 WHERE account_id = 1
do
UPDATE BankAccount SET Amount = Amount - 50 WHERE account_id = 1
You cannot have several threads modifying the same data at exactly the same time: it will always be the last one to set the value that "wins".
If the problem is that several threads read and set the value at almost the same time, and the reads and writes don't arrive in the right order, the solution is to use transactions:
start a transaction
read the value
set the new value
commit the transaction
This ensures the read and the write are done consistently: with an appropriate isolation level or locking hint, no other thread can modify the data between your read and your write.
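A hedged T-SQL sketch of that read-modify-write pattern, using the UPDLOCK hint the question asks about (account_id = 1 and the 50-unit debit are just the example values from earlier in this thread):
BEGIN TRAN
DECLARE @amount int
-- UPDLOCK holds an update lock on the row until commit, so no other
-- transaction can sneak in between the read and the write
SELECT @amount = Amount FROM BankAccount WITH (UPDLOCK) WHERE account_id = 1
UPDATE BankAccount SET Amount = @amount - 50 WHERE account_id = 1
COMMIT TRAN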
Quoting the Wikipedia page about database transactions:
A database transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
To provide isolation between programs accessing a database concurrently. Without isolation the programs' outcomes are typically erroneous.
You usually use transactions to overcome this.
Have a look at Database transaction.
You should have a database function/procedure that performs the operations on Amount. This function/procedure should return whether the operation succeeded or failed (for example, you want to take $1000, but the current Amount is only $550, so the operation cannot proceed).
Example in T-SQL:
UPDATE BankAccount SET Amount = Amount - 1000 WHERE BankAccountID = 12345 AND Amount >= 1000
RETURN @@ROWCOUNT
If the amount was changed, the return value will be 1; otherwise 0.
Now you can safely run this procedure (from several threads, too):
DECLARE @Result_01 int, @Result_02 int, @Result_03 int
EXEC @Result_01 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = 1000
EXEC @Result_02 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = 15
EXEC @Result_03 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = -2000, @MinAmount = 600
EDIT:
Whole procedure in T-SQL:
CREATE PROC ChangeBankAccountAmount
@BankAccountID int,
@ChangeAmount int,
@MinAmount int = 0
AS BEGIN
IF @ChangeAmount >= 0
UPDATE BankAccount SET Amount = Amount + @ChangeAmount WHERE BankAccountID = @BankAccountID
ELSE
-- for withdrawals, only proceed if the resulting balance stays at or above the minimum
UPDATE BankAccount SET Amount = Amount + @ChangeAmount WHERE BankAccountID = @BankAccountID AND Amount + @ChangeAmount >= @MinAmount
RETURN @@ROWCOUNT
END
Of course, the int datatype is not good for money; you should change it to the datatype used in your table.