We are attempting to run a DELETE statement inside a WHILE loop (to avoid a large transaction log when deleting many rows) as follows:
WHILE (@@ROWCOUNT > 0)
BEGIN
DELETE TOP (250000)
FROM
MYDATABASE.MYSCHEMA.MYTABLE
WHERE
MYDATABASE.MYSCHEMA.MYTABLE.DATE_KEY = 20160301
END
When this command is executed inside a new SQL Server Management Studio connection in our development environment, it deletes rows in blocks of 250K, which is the expected behavior.
When this command is executed in the same way on our test server, we get the message
Command completed successfully
That is, the WHILE loop was not entered when the statement was run.
After some additional investigation, we have found that the behavior also varies depending on the database that we connect to. So if the code is run (in our test environment) while SQL Server Management Studio is connected to MYDATABASE, the DELETE statement does not run. If we run the code while connected to SOME_OTHER_DATABASE, it does.
We partially suspect that the value of @@ROWCOUNT is not reliable, and may be different for different connections. But when we run the code multiple times for each database & server combination, we see behavior that is 100% consistent. So random initial values of @@ROWCOUNT do not appear to explain things.
Any suggestions as to what could be going on here? Thanks for your help!
Edit #1
For those asking about the initial value of @@ROWCOUNT and where it is coming from, we're not sure. But in some cases @@ROWCOUNT is definitely being initialized to some value above zero, as the code works on a fresh connection as-is.
Edit #2
For those proposing the declaration of our own variable, for our particular application we are executing SQL commands via a programming language wrapper which only allows for the execution of one statement at a time (i.e., one semicolon).
We have previously tried to establish the value of @@ROWCOUNT by executing one delete statement prior to the loop:
Statement #1:
DELETE TOP (250000)
FROM
MYDATABASE.MYSCHEMA.MYTABLE
WHERE
MYDATABASE.MYSCHEMA.MYTABLE.DATE_KEY = 20160301
Statement #2 (@@ROWCOUNT is presumably now 250,000):
WHILE (@@ROWCOUNT > 0)
BEGIN
DELETE TOP (250000)
FROM
MYDATABASE.MYSCHEMA.MYTABLE
WHERE
MYDATABASE.MYSCHEMA.MYTABLE.DATE_KEY = 20160301
END
However, whatever is causing @@ROWCOUNT to take on a different value on start-up is also affecting the value between commands. So in some cases the second statement never executes.
You should not use a variable before you have set its value. That is equally true for system variables.
The code that you have is very dangerous. Someone could add something like SELECT 'Here I am in the loop' after the DELETE and the loop would break, because @@ROWCOUNT would then reflect that SELECT rather than the DELETE.
A better approach? Use your own variable:
DECLARE @RC int;
WHILE (@RC > 0 OR @RC IS NULL)
BEGIN
DELETE TOP (250000)
FROM MYDATABASE.MYSCHEMA.MYTABLE
WHERE MYDATABASE.MYSCHEMA.MYTABLE.DATE_KEY = 20160301;
SET @RC = @@ROWCOUNT;
END;
Where are you getting your initial @@ROWCOUNT from? You're never going to enter that block, because @@ROWCOUNT would be expected to be zero on a fresh connection, so you'd never enter the loop. Also, deleting in 250K batches doesn't reduce the total amount logged - every deletion is still written to the transaction log - so the loop only helps the log if something (e.g. the SIMPLE recovery model, with checkpoints between batches) lets log space be reused; otherwise there's no benefit (and some penalty) for doing this within a loop.
Have you traced the session? Since @@ROWCOUNT returns the number of rows affected by the prior statement in the session, I would guess that either the last query SSMS executes as part of establishing the session returns a different number of rows in the two environments, or that you have a logon trigger in one of the environments whose last statement returns a different number of rows. Either way, a trace should tell you exactly why the behavior is different.
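As a starting point for that investigation, the catalog views can tell you whether any logon triggers exist. A minimal sketch (sys.server_triggers and sys.server_trigger_events are standard catalog views; whether anything turns up is of course environment-specific):

-- Server-scoped triggers; a row with event type LOGON is a logon trigger
-- whose statements run on every new connection and can set @@ROWCOUNT.
SELECT t.name, t.is_disabled, e.type_desc
FROM sys.server_triggers AS t
JOIN sys.server_trigger_events AS e ON e.object_id = t.object_id;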
Fundamentally, though, it makes no sense to refer to @@ROWCOUNT before you run the statement that you are interested in getting a count for. It's easy enough to fix this using a variable:
DECLARE @cnt int = -1;
WHILE (@cnt != 0)
BEGIN
DELETE TOP (250000)
FROM MYDATABASE.MYSCHEMA.MYTABLE
WHERE MYDATABASE.MYSCHEMA.MYTABLE.DATE_KEY = 20160301;
SET @cnt = @@ROWCOUNT;
END
Related
I have a stored procedure that is called from business code. This code uses parallelism, so multiple instances of this SP could be running at the same time depending on some conditions.
There is some logic in this SP that I want to execute only once. I have a table (let's call it HISTORY) that holds a UID for the run and a DATETIME when this portion of the code is executed. Here's my flow:
SP BEGIN
-- some logic
IF certain conditions are met, check if HISTORY does not have an entry for the UID
1. Add an entry in HISTORY for the current UID
2. Run the once only code
SP END
The issue is that, at times, the logic above still gets executed multiple times if different instances reach that part at the same time. What can I do to ensure that it only runs once?
Thank you!
BEGIN TRANSACTION;
INSERT [HISTORY](UID, ...)
SELECT @UID, ...
WHERE NOT EXISTS (
SELECT * FROM [HISTORY] WITH (HOLDLOCK) WHERE UID = @UID
);
IF @@ROWCOUNT = 1
BEGIN
-- we inserted, do logic that should run only once
END;
COMMIT;
HOLDLOCK (equivalent to running the transaction under SERIALIZABLE, but more granular) ensures that no other transaction running in parallel can insert an entry in HISTORY for that UID; any transaction that tries to do so will block until the first INSERT is finished and then return (since a row already exists). Ensure that an index on UID exists, otherwise it will lock a lot more than is healthy for performance.
Getting code like this right is always tricky, so make sure to test it in practice as well by stress-testing concurrent inserts for the same (and different) UID.
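For the index, something like this minimal sketch works (it reuses the HISTORY/UID names from the question; the index name is made up), and making it unique both narrows the HOLDLOCK range lock and enforces the one-row-per-UID invariant as a safety net:

-- Unique index on UID: the WHERE NOT EXISTS probe locks a single key range
-- instead of the whole table, and duplicates are rejected even if the code regresses.
CREATE UNIQUE NONCLUSTERED INDEX IX_HISTORY_UID
ON [HISTORY] (UID);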
I know that, if I execute a single SQL statement that UPDATEs or DELETEs some data, it will return the number of rows affected.
But if I have multiple SQL statements in a SQL script, and I want to know the number of rows affected by the last statement executed, will it still return that automatically, or do I need a
SELECT @@ROWCOUNT
at the end of the script?
The code in question is not a Stored Procedure. Rather, it is a parameterized SQL script stored in an arbitrary location, executed using the ExecuteStoreCommand function in Entity Framework, as in:
var numberOfRowsAffected = context.ExecuteStoreCommand<int>(mySqlScript, parameters);
It depends on the NOCOUNT setting when executing your query (or queries).
If NOCOUNT is ON, then no DONE_IN_PROC messages will be returned.
If NOCOUNT is OFF (the default setting), then DONE_IN_PROC messages will be returned (e.g. row counts).
Both of these situations are different from executing
SELECT @@ROWCOUNT;
which will return a result set with a single scalar value, distinct from a DONE_IN_PROC message. This occurs regardless of the setting of NOCOUNT.
I believe that SELECT @@ROWCOUNT is sometimes used to make Entity Framework "play" with more complex T-SQL statements because EF both:
requires a count for post-validation, and
will accept a scalar number result set as a substitute for a DONE_IN_PROC message.
It's important that SELECT @@ROWCOUNT; is executed immediately after the last query statement, because many statements reset @@ROWCOUNT and would therefore yield an unexpected result.
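Putting those pieces together, a minimal sketch (the table and column names here are hypothetical):

SET NOCOUNT ON;  -- suppress the DONE_IN_PROC count messages
UPDATE dbo.MyTable
SET SomeColumn = SomeColumn + 1
WHERE SomeKey = 42;
-- Must come immediately after the statement of interest,
-- before anything else can reset @@ROWCOUNT:
SELECT @@ROWCOUNT AS RowsAffected;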
Just to be specific on the answer part: you would need to add SELECT @@ROWCOUNT to return the number of rows affected by the last statement.
I think the confusion might be due to the rows reported in the SSMS window while executing a query. By default, SSMS shows the number of rows affected for every SQL statement, but it returns that count as a message, not as a result set.
@@ROWCOUNT will automatically return the number of rows affected by the last statement.
Please find the MSDN link here:
https://msdn.microsoft.com/en-us/library/ms187316.aspx
I am using SQL Server 2008.
I have a table A which accepts many inserts/updates per second. After an insert or update I want to get the number of rows affected.
INSERT INTO A (ID) VALUES (1)
IF @@ROWCOUNT = 0
PRINT 'NO ROWS AFFECTED'
While the query is being executed, the same query may be called again by the application. So what happens if, at that moment, another execution is between the INSERT and the IF block?
Do you think @@ROWCOUNT may give a wrong result for that reason?
Or is it always safe in its context?
Yes, it's safe. It always refers to the previous statement in the current session and scope.
BUT
if you want to use the number of rows affected afterwards, save it to a variable first, because the IF statement resets @@ROWCOUNT
INSERT INTO A (ID) VALUES (1)
DECLARE @rc INT = @@ROWCOUNT
IF @rc = 0
PRINT 'NO ROWS AFFECTED'
ELSE
SELECT @rc AS RowsAffected
@@ROWCOUNT is both scope and connection safe.
In fact, it reads only the last statement row count for that connection and scope. The full rules are here on MSDN (cursors, DML, EXECUTE etc)
To use it in subsequent statements, you need to store it in a local variable.
You must preserve the @@ROWCOUNT value in a local variable, otherwise after the IF statement its value will reset to zero:
DECLARE @rowCount int = @@ROWCOUNT;
IF @rowCount = 0
PRINT 'NO ROWS AFFECTED'
Other than that, yes, it is safe.
Short answer: Yes.
However, it is worth seeing the question in perspective, for a deeper understanding of why the answer is so clearly yes.
SQL Server is designed to handle concurrent access correctly; without this property it would be useless in any multiuser scenario. From the server's point of view it does not matter whether the concurrent access comes from one multithreaded application or from several applications being used by multiple users at once.
In this respect @@ROWCOUNT is only the tip of the iceberg; there is much more and deeper functionality that must behave correctly when concurrent access is in the picture.
The most practical part of this area is transaction management and transaction isolation.
I'm running a set of SQL queries and they are not reporting the rows affected until all the queries have run. Is there any way I can get incremental feedback?
Example:
DECLARE @HowManyLastTime int
SET @HowManyLastTime = 1
WHILE @HowManyLastTime <> 2400000
BEGIN
SET @HowManyLastTime = @HowManyLastTime + 1
print(@HowManyLastTime)
END
This doesn't show the count till the loop has finished. How do i make it show the count as it runs?
To flush record counts and other data to the client, you'll want to use RAISERROR with NOWAIT. Related questions and links:
PRINT statement in T-SQL
http://weblogs.sqlteam.com/mladenp/archive/2007/10/01/SQL-Server-Notify-client-of-progress-in-a-long-running.aspx
In SSMS this will work as expected. With other clients, you might not get a response from the client until the query execution is complete.
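Applied to the loop in the question, a sketch looks like this (severity 0 is informational and doesn't abort the batch; the modulo check is just my addition to avoid flushing millions of messages):

DECLARE @HowManyLastTime int
DECLARE @msg nvarchar(30)
SET @HowManyLastTime = 1
WHILE @HowManyLastTime <> 2400000
BEGIN
SET @HowManyLastTime = @HowManyLastTime + 1
IF @HowManyLastTime % 100000 = 0
BEGIN
SET @msg = CAST(@HowManyLastTime AS nvarchar(30))
RAISERROR (@msg, 0, 1) WITH NOWAIT  -- flushes to the client immediately
END
END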
SQL tends to be set-based, and you are thinking procedurally and trying to make it act that way. It really doesn't make sense to do this in SQL.
I would ask your motivation for doing this, and whether there is anything better that can be tried.
I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode.
CREATE PROCEDURE UpdateRemoteServer
pre-processing
get cursor with ID's of records to be updated
while on cursor
process the item
No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.
Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement
EXEC UpdateRemoteServer
is treated as a single transaction. How can I make each item process as a complete, separate transaction?
Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in 2008.
EXEC procedure does not create a transaction. A very simple test will show this:
create procedure usp_foo
as
begin
select @@trancount;
end
go
exec usp_foo;
The @@trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If you have a transaction started when entering UpdateRemoteServer it means somebody started that transaction, I can't say who.
That being said, using remote servers and DTC to update items is going to perform quite badly. Is the other server also SQL Server 2005 at least? Maybe you can queue the requests to update and use messaging between the local and remote server, and have the remote server perform the updates based on the info from the message. It would perform significantly better because both servers only have to deal with local transactions, and you get much better availability due to the loose coupling of queued messaging.
Updated
Cursors actually don't start transactions. Typical cursor-based batch processing iterates with a cursor and groups the updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log flush throughput due to larger transaction size) and jobs can be interrupted and resumed without losing everything. A simplified version of a batch processing loop typically looks like this:
create procedure usp_UpdateRemoteServer
as
begin
declare @id int, @batch int;
set nocount on;
set @batch = 0;
declare crsFoo cursor
forward_only static read_only
for
select object_id
from sys.objects;
open crsFoo;
begin transaction
fetch next from crsFoo into @id;
while @@fetch_status = 0
begin
-- process here
declare @transactionId int;
SELECT @transactionId = transaction_id
FROM sys.dm_tran_current_transaction;
print @transactionId;
set @batch = @batch + 1
if @batch > 10
begin
commit;
print @@trancount;
set @batch = 0;
begin transaction;
end
fetch next from crsFoo into @id;
end
commit;
close crsFoo;
deallocate crsFoo;
end
go
exec usp_UpdateRemoteServer;
I omitted the error handling part (BEGIN TRY/BEGIN CATCH) and the fancy @@fetch_status checks (static cursors actually don't need them anyway). This demo code shows that during the run several different transactions are started (different transaction IDs). Many batches also set a transaction savepoint at each item processed so they can safely skip an item that causes an exception, using a pattern similar to the one in my link, but this does not apply to distributed transactions since savepoints and DTC don't mix.
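The savepoint pattern referenced there looks roughly like this inside the loop body (a sketch; dbo.ProcessItem is a hypothetical placeholder for the per-item work, and production code should also check XACT_STATE() in the CATCH block):

-- Per-item savepoint: an error rolls back just this item,
-- while the outer batch transaction stays alive.
SAVE TRANSACTION ItemStart;
BEGIN TRY
EXEC dbo.ProcessItem @id;  -- hypothetical per-item procedure
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION ItemStart;  -- undo this item only
PRINT ERROR_MESSAGE();           -- log and skip, then let the loop continue
END CATCH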
EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.
Stu
ORIGINAL ANSWER:
The specific symptom you describe is due to the fact that a cursor opens a transaction by default, therefore no matter how you work it, you're gonna have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).
As others are pointing out, cursors SUCK. You don't need them for 99.9999% of the time.
You really have two options if you want to do this at the database level with SQL Server:
Use SSIS to perform your operation; very fast, but may not be available to you in your particular flavor of SQL Server.
Because you're dealing with remote servers, and you're worried about connectivity, you may have to use a looping mechanism, so use WHILE instead and commit one batch at a time. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
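A minimal sketch of that shape (all object names are hypothetical; each batch is its own transaction, so an interruption loses at most one batch, and the flag column lets the job resume where it left off):

DECLARE @rc int;
SET @rc = 1;
WHILE @rc > 0
BEGIN
BEGIN TRANSACTION;
UPDATE TOP (1000) i
SET i.Processed = 1      -- stand-in for "process the item"
FROM RemoteServer.RemoteDb.dbo.Items AS i
WHERE i.Processed = 0;
SET @rc = @@ROWCOUNT;    -- capture before COMMIT resets it
COMMIT;
END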
Stu
Are you running this only from within SQL Server, or from an app? If the latter, get the list to be processed, then loop in the app to process only the subsets required.
Then the transaction will be handled by your app, and will only lock the items being updated (or the pages the items are in).
NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them, but never ever do one record at a time. Do set-based inserts instead and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert, update, or delete and it isn't handling at least 1000 rows in each statement (not one at a time), you are doing the wrong thing. Cursors are an extremely poor practice for such things.
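For contrast, a set-based version of "update each remote item from a local table" can be a single statement (a sketch; all table and column names are hypothetical):

-- One statement instead of a per-row loop; the optimizer handles the whole set.
UPDATE r
SET r.Status = l.Status
FROM RemoteServer.RemoteDb.dbo.Items AS r
JOIN dbo.LocalItems AS l ON l.ItemID = r.ItemID
WHERE r.Status <> l.Status;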
Just an idea ..
Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
Process those
Hopefully, this will be the end of the transaction.
Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple count(..) to see if there are items, or have the procedure return true indicating that there is more work to do).
Don't know if this works, but maybe the idea is helpful.