Cursor and stored procedure optimisation

The following stored proc was written some time ago and now requires modification.
Unable to contact the original developer, I had a look. To me this proc seems over-complicated. Couldn't it be done with a straightforward UPDATE? Can anyone justify the use of CURSOR here?
ALTER PROCEDURE [settle_Stage1]
    @settleBatch int
AS
DECLARE @refDate datetime;
DECLARE @dd int;
DECLARE @uid int;
DECLARE trans_cursor CURSOR FOR
    SELECT uid, refDate FROM tblTransactions WHERE (settle IS NULL ) AND (state IN ( 21, 31, 98, 99 ))
OPEN trans_cursor
FETCH FROM trans_cursor INTO @uid, @refDate
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @dd = DATEDIFF( day, @refDate, getDate())
    IF ( @dd >= '1' )
    BEGIN
        UPDATE tblTransactions
        SET settle = @settleBatch WHERE uid = @uid
    END
    FETCH FROM trans_cursor INTO @uid, @refDate
END
CLOSE trans_cursor
DEALLOCATE trans_cursor

You are right: this looks like "procedural SQL", written by someone who probably doesn't get SQL and set operations.
Converting this to a set-based query should help performance.
A cursor is not needed and is indeed overcomplicating the stored procedure.

If there are triggers involved that would blow up when multiple rows are updated at once, then you would want to iterate. But that would still not justify using an actual CURSOR.
Doing single updates takes row locks rather than the page or table locks that a set-based update could take. Since this makes each transaction smaller, the programmer may have been attempting to avoid deadlocks that were caused by a large update.
NOTE: I am not advocating this method, I am only suggesting reasons.

Simply looking at it, I don't see any reason at all why this isn't done in a single UPDATE. Maybe (and it's a maaaaaybe) if there are too many records to update, then this could be a reason. In any case, I would simply replace it with:
UPDATE tblTransactions
SET settle = @settleBatch
WHERE settle IS NULL
  AND [state] IN (21, 31, 98, 99)
  AND DATEDIFF( day, refDate, getDate()) >= 1
Edited following @Martin Smith's comment.

If running one record at a time is too slow, and a single update causes blocking and too much growth of the transaction log, the third alternative is to batch process. Use a set-based query, but run it in a loop of 1000 records at a time (you may have to experiment to find the optimum batch size).
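
A minimal sketch of that batched variant, assuming the tblTransactions columns from the question and SQL Server 2005 or later (for UPDATE TOP). The @batchSize value and the sample @settleBatch value are illustrative and not taken from the original procedure; inside settle_Stage1 the parameter would be used directly.

```sql
DECLARE @settleBatch int, @batchSize int, @rows int;
SET @settleBatch = 42;     -- hypothetical batch id; must not be NULL or the loop never ends
SET @batchSize   = 1000;   -- starting point only, tune by experiment
SET @rows        = 1;

WHILE @rows > 0
BEGIN
    -- Each UPDATE is its own autocommit transaction, so locks and log usage stay small.
    UPDATE TOP (@batchSize) tblTransactions
    SET settle = @settleBatch
    WHERE settle IS NULL
      AND [state] IN (21, 31, 98, 99)
      AND DATEDIFF(day, refDate, GETDATE()) >= 1;

    -- Capture the count immediately; rows already updated no longer match settle IS NULL.
    SET @rows = @@ROWCOUNT;
END
```

The loop stops when an UPDATE touches no rows, because every row it has already settled drops out of the WHERE clause.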

Related

Best SQL method to update a column based on date ranges from another column.

My code is below. As I am new to SQL, I am having a hard time figuring out which method would be simplest for updating a column based on a difference between dates. Basically, what I want to do is: if the date falls between today and today minus 7 days, update the symbology_bbl column to Week1. The following has been updated.
USE [databasename]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE or Alter PROCEDURE gisuser.GetTheDate
AS
BEGIN
    SET NOCOUNT ON;
    Declare @rowcount int
    Declare @editedDate datetime
    Declare @Symbology_BBL nvarchar(25)
    Declare mycursor cursor FORWARD_ONLY READ_ONLY LOCAL FOR
        select objectID, Edited_Date, Symbology_BBL
        from [tablename] order by objectid asc
    Open mycursor
    fetch next from mycursor
    into @rowcount, @editedDate, @Symbology_BBL
    while @@FETCH_STATUS = 0
    Begin
        --if edited_date is from 11/21/2018 to 11/28/2018
        begin
            set @Symbology_BBL = 'Week1'
        end
        --elseif edited_date is from 11/15/2018 to 11/20/2018
        begin
            set @Symbology_BBL = 'Week2'
        end
        else
        begin
            set @Symbology_BBL = 'Greater than Week3'
        end
        --*******************************************************************************
        Update [tablename]
        set symbology_bbl = @Symbology_BBL
        fetch next from mycursor
        into @rowcount, @editedDate, @Symbology_BBL
    End
    Close mycursor
    deallocate mycursor
END
Thanks for the help in advance.

Your query is running slowly, and here's why.
Cursors perform terribly, and this is the RBAR (row by agonizing row) methodology.
Here is what your cursor is currently doing, and why it's taking a long time (aside from blocking from the locks that are needed for updates, indexes, yada yada).
select count(Symbology_BBL) from tableName
Whatever number is returned here, your cursor is
- executing / looping that many times, and
- setting Symbology_BBL = 'usa' for every single row, every single time.
So basically, if there were 1000 rows in that table, you are doing an update on every row, 1000 times. This makes no sense whatsoever, or at least it is about the least performant way you could structure your update. What you most likely want is an UPDATE with a JOIN, but you haven't provided enough to determine that.
Also, you could get a slight boost using FAST_FORWARD instead of READ_ONLY FORWARD_ONLY, to which you should at least have added LOCAL STATIC, since cursors are global by default (and a global cursor is unnecessary in your use case). But Erik Darling shows how FAST_FORWARD could prevent the query from going parallel (sneaky Microsoft), and thus FORWARD_ONLY with LOCAL STATIC could be faster. Again, add LOCAL STATIC for most cursors.
EDIT
Based on your comment and edit here is the simplest method...
update tablename
set Symbology_BBL = case
        when last_edited_date between GETUTCDATE() - 7 and GETUTCDATE()
            then 'Week 1'
        when last_edited_date between GETUTCDATE() - 14 and GETUTCDATE() - 8
            then 'Week 2'
        else 'Greater Than Week 3'
    end

This update has no WHERE clause, so it will write to all rows.
Update [tablename]
set symbology_bbl = @Symbology_BBL

Is this sql update guaranteed to be atomic?

I have the following sql:
UPDATE Customer SET Count=1 WHERE ID=1 AND Count=0
SELECT @@ROWCOUNT
I need to know if this is guaranteed to be atomic.
If 2 users try this simultaneously, will only one succeed and get a return value of 1? Do I need to use a transaction or something else in order to guarantee this?
The goal is to get a unique 'Count' for the customer. Collisions in this system will almost never happen, so I am not concerned with the performance if a user has to query again (and again) to get a unique Count.
EDIT:
The goal is to not use a transaction if it is not needed. Also, this logic is run very infrequently (up to 100 times per day), so I wanted to keep it as simple as possible.

It may depend on the sql server you are using. However for most, the answer is yes. I guess you are implementing a lock.

Using SQL Server (v 11.0.6020), this is indeed an atomic operation, as best as I can determine.
I wrote some test stored procedures to try to test this logic:
-- Attempt to update a Customer row with a new Count, returns
-- The current count (used as customer order number) and a bit
-- which determines success or failure. If @Success is 0, re-run
-- the query and try again.
CREATE PROCEDURE [dbo].[sp_TestUpdate]
(
    @Count INT OUTPUT,
    @Success BIT OUTPUT
)
AS
BEGIN
    DECLARE @NextCount INT
    SELECT @Count=Count FROM Customer WHERE ID=1
    SET @NextCount = @Count + 1
    UPDATE Customer SET Count=@NextCount WHERE ID=1 AND Count=@Count
    SET @Success=@@ROWCOUNT
END
And:
-- Loop (many times) trying to get a number and insert it into another
-- table. Execute this loop concurrently in several different windows
-- using SSMS.
CREATE PROCEDURE [dbo].[sp_TestLoop]
AS
BEGIN
    DECLARE @Iterations INT
    DECLARE @Counter INT
    DECLARE @Count INT
    DECLARE @Success BIT
    SET @Iterations = 40000
    SET @Counter = 0
    WHILE (@Counter < @Iterations)
    BEGIN
        SET @Counter = @Counter + 1
        EXEC sp_TestUpdate @Count = @Count OUTPUT , @Success = @Success OUTPUT
        IF (@Success=1)
        BEGIN
            INSERT INTO TestImage (ImageNumber) VALUES (@Count)
        END
    END
END
This code ran, creating unique sequential ImageNumber values in the TestImage table. This indicates that the above SQL update call is indeed atomic. Neither procedure guaranteed that every update attempt succeeded, but they did guarantee that no duplicates were created and no numbers were skipped.
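
The "re-run the query and try again" step from the comment above could be sketched as a bounded retry loop. This assumes the sp_TestUpdate procedure shown earlier exists in the dbo schema; the @Attempts cap of 10 and the error text are illustrative choices, not part of the original test.

```sql
DECLARE @Count int, @Success bit, @Attempts int;
SET @Success  = 0;
SET @Attempts = 0;

-- Retry until the conditional UPDATE inside sp_TestUpdate wins, or give up after 10 tries.
WHILE @Success = 0 AND @Attempts < 10
BEGIN
    SET @Attempts = @Attempts + 1;
    EXEC dbo.sp_TestUpdate @Count = @Count OUTPUT, @Success = @Success OUTPUT;
END

IF @Success = 1
    SELECT @Count AS ClaimedCount;   -- the value this caller now owns
ELSE
    RAISERROR('Could not claim a Count after %d attempts.', 16, 1, @Attempts);
```

The cap also keeps the loop from spinning forever if the Customer row with ID = 1 does not exist.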

Can you save and insert SQL Query Messages in a table?

I have a feeling this is an extremely newbie question, but it's hard to find the answer as anything to do with logging points me to SQL errors and issues. If not that, then the answer is querying the entire log to sift through.
When I insert data into an existing table via T-SQL, how can I save or reference the Query Message for that specific statement? That way I can take the Query Message and insert the result into a log table that records how many records were inserted, how long the statement took, and so on.
I'm using SQL Server 2008 R2 and these SQL statements are stored procedures inserting data and updating data. I want to ensure every step of the process is logged and inserted into a specific log table with details about that step of the process.
Thanks for your help on this (I'm assuming) newbie question. I'm still learning MSSQL.

DECLARE @dt DATETIME2(7), @duration INT, @rowcount INT;
SET @dt = SYSDATETIME();
INSERT dbo.foo(bar) VALUES('x');
SELECT @rowcount = @@ROWCOUNT, @duration = DATEDIFF(MICROSECOND, @dt, SYSDATETIME());
INSERT dbo.LoggingTable(duration,row_count) SELECT @duration, @rowcount;
In 2005 or lower, you can't get quite that precise, e.g.
DECLARE @dt DATETIME, ...
SET @dt = GETDATE();
...
... , @duration = DATEDIFF(MILLISECOND, @dt, GETDATE());

Different Parameter Value Results In Slow Query

I have an sproc in SQL Server 2008. It basically builds a string, and then runs the query using EXEC():
SELECT * FROM [dbo].[StaffRequestExtInfo] WITH(nolock,readuncommitted)
WHERE [NoteDt] < @EndDt
  AND [NoteTypeCode] = @RequestTypeO
  AND ([FNoteDt] >= @StartDt AND [FNoteDt] <= @EndDt)
  AND [FStaffID] = @StaffID
  AND [FNoteTypeCode]<>@RequestTypeC
ORDER BY [LocName] ASC,[NoteID] ASC,[CNoteDt] ASC
All but @RequestTypeO and @RequestTypeF are passed in as sproc parameters. The other two are built from a parameter into local variables. Normally, the query runs in under one second. However, for one particular value of @StaffID, the execution plan is different and about 30x slower. In either case, the amount of data returned is generally the same, but execution time goes way up.
I tried to recompile the sproc. I also tried to "copy" @StaffID into a local @LocalStaffID. Neither approach made any difference.
Any ideas?
UPDATE: Tried to drop specific plans using:
DECLARE @ph VARBINARY(64), @pt VARCHAR(128), @sql VARCHAR(1024)
DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT p.plan_handle
    FROM sys.[dm_exec_cached_plans] p
    CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
    WHERE t.text LIKE N'%cms_selectStaffRequests%'
OPEN cur
FETCH NEXT FROM cur INTO @ph
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @pt = master.dbo.fn_varbintohexstr(@ph)
    PRINT 'DBCC FREEPROCCACHE(' + @pt + ')'
    SET @sql = 'DBCC FREEPROCCACHE(' + @pt + ')'
    EXEC(@sql)
    FETCH NEXT FROM cur INTO @ph
END
CLOSE cur
DEALLOCATE cur
Either the wrong plans were dropped, or the same plans ended up being recreated, but it had no effect.

Check the distribution/frequency/cardinality of the values in column FStaffID, and review your indexes. It may be that you have one staff member doing 50% of the work (probably the DBA :) and that may change how the optimizer chooses which indexes to use and how the data is read.
Alternatively, the execution plan generated by the dynamic code may be being saved and re-used, resulting in a poorly performing query (like HLGEM says). I'm not up on the details, but SQL 2008 has more ways to confuse you while doing this than its predecessors.

Doing an UPDATE STATISTICS ... WITH FULLSCAN on the main base table in the query resulted in the "slow" value not being associated with a slow plan.
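
For reference, a sketch of what that statistics refresh might look like. The table name dbo.StaffRequest is a hypothetical stand-in for the main base table, which the post does not name, and the dbo schema on the procedure is also an assumption.

```sql
-- Rebuild every statistic on the (hypothetically named) base table by reading all rows,
-- rather than the default sample.
UPDATE STATISTICS dbo.StaffRequest WITH FULLSCAN;

-- Optional: mark the procedure so its next execution compiles a fresh plan
-- against the new statistics.
EXEC sp_recompile N'dbo.cms_selectStaffRequests';
```

A full scan can be expensive on a large table, so it is usually run outside peak hours.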

SQL Batched Delete

I have a table in SQL Server 2005 which has approx 4 billion rows in it. I need to delete approximately 2 billion of these rows. If I try and do it in a single transaction, the transaction log fills up and it fails. I don't have any extra space to make the transaction log bigger. I assume the best way forward is to batch up the delete statements (in batches of ~ 10,000?).
I can probably do this using a cursor, but is there a standard/easy/clever way of doing this?
P.S. This table does not have an identity column as a PK. The PK is made up of an integer foreign key and a date.

You can 'nibble' the deletes, which also means that you don't put a massive load on the database. If your t-log backups run every 10 mins, then you should be OK to run this once or twice over the same interval. You can schedule it as a SQL Agent job.
Try something like this:
DECLARE @count int
SET @count = 10000
DELETE FROM table1
WHERE table1id IN (
    SELECT TOP (@count) table1id
    FROM table1
    WHERE x='y'
)

What distinguishes the rows you want to delete from those you want to keep? Will this work for you:
while exists (select 1 from your_table where <your_condition>)
    delete top(10000) from your_table
    where <your_condition>

In addition to putting this in a batch with a statement to truncate the log, you also might want to try these tricks:
1. Add criteria that match the first column in your clustered index, in addition to your other criteria.
2. Drop any indexes from the table and put them back after the delete is done, if that's possible and won't interfere with anything else going on in the DB, but KEEP the clustered index.
For the first point above, for example, if your PK is clustered, then find a range that approximately matches the number of rows you want to delete in each batch and use that:
DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
SELECT @start_id = MIN(id), @max_id = MAX(id) FROM My_Table
SET @interval = 100000 -- You need to determine the right number here
SET @end_id = @start_id + @interval
WHILE (@start_id <= @max_id)
BEGIN
    DELETE FROM My_Table WHERE id BETWEEN @start_id AND @end_id AND <your criteria>
    SET @start_id = @end_id + 1
    SET @end_id = @end_id + @interval
END

Sounds like this is a one-off operation (I hope, for your sake) and you don't need to be able to go back to a state that's halfway through this batched delete. If that's the case, why don't you just switch to the SIMPLE recovery model before running it and then back to FULL when you're done?
This way the transaction log won't grow as much. This might not be ideal in most situations, but I don't see anything wrong with it here (assuming, as above, that you don't need to go back to a state in between your deletes).
You can do this in your script with something like:
ALTER DATABASE myDB SET RECOVERY SIMPLE -- before the delete
ALTER DATABASE myDB SET RECOVERY FULL -- after the delete
Alternatively, you can set up a job to shrink the transaction log at a given interval while your delete is running. This is kinda bad, but I reckon it'd do the trick.

Well, if you were using SQL Server Partitioning, say based on the date column, you could have switched out the partitions that are no longer required. A consideration for a future implementation, perhaps.
I think the best option may be, as you say, to delete the data in smaller batches rather than in one hit, so as to avoid any potential blocking issues.
You could also consider the following method:
1. Copy the data to keep into a temporary table.
2. Truncate the original table to purge all data.
3. Move everything from the temporary table back into the original table.
Your indexes would also be rebuilt as the data was added back to the original table.

I would do something similar to the temp table suggestions, but I'd select the rows you want to keep into a new permanent table, drop the original table and then rename the new one. This should have a relatively low tran log impact. Obviously, remember to recreate any indexes that are required on the new table after you've renamed it.
Just my two penn'orth.
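
A rough sketch of the select-into, drop, and rename sequence described above. All names here are hypothetical: dbo.My_Table with a two-column key (fk_id, entry_date) stands in for the real table, and the date cutoff stands in for whatever separates the rows to keep. SELECT INTO is minimally logged only under the SIMPLE or BULK_LOGGED recovery model.

```sql
-- 1. Copy only the rows to keep into a new permanent table.
SELECT *
INTO dbo.My_Table_keep
FROM dbo.My_Table
WHERE entry_date >= '20080101';   -- hypothetical "keep" condition
GO

-- 2. Drop the original and give the copy its name.
DROP TABLE dbo.My_Table;
GO
EXEC sp_rename 'dbo.My_Table_keep', 'My_Table';
GO

-- 3. Recreate the primary key (and any other indexes) on the renamed table.
ALTER TABLE dbo.My_Table
    ADD CONSTRAINT PK_My_Table PRIMARY KEY CLUSTERED (fk_id, entry_date);
GO
```

Note that this needs enough free data-file space to hold the kept rows alongside the original table until the drop, and that permissions, foreign keys, and triggers on the original table are not carried over by SELECT INTO.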

Here is my example:
-- configure script
-- Script limits - transaction per commit (default 10,000)
-- And time to allow script to run (in seconds, default 2 hours)
--
DECLARE @MAX INT
DECLARE @MAXT INT
--
-- $MAX, $MAXT, $TABLE and $WHERE are substituted by shell script.
-- $TABLE and $WHERE are spliced into the statements directly, because
-- T-SQL does not accept a variable as a table name or as a predicate.
--
SET @MAX = $MAX
SET @MAXT = $MAXT
-- step 1 - Main loop
DECLARE @continue INT
-- deleted in one transaction
DECLARE @deleted INT
-- deleted total in script
DECLARE @total INT
SET @total = 0
DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
SET @interval = @MAX
SELECT @start_id = MIN(id), @max_id = MAX(id) from $TABLE
-- timing
DECLARE @start DATETIME
DECLARE @now DATETIME
DECLARE @timee INT
SET @start = GETDATE()
--
SET @continue = 1
IF OBJECT_ID (N'EntryID', 'U') IS NULL
BEGIN
    CREATE TABLE EntryID (startid INT)
    INSERT INTO EntryID(startid) VALUES(@start_id)
END
ELSE
BEGIN
    SELECT @start_id = startid FROM EntryID
END
-- compute the first range only after a saved start id may have been restored
SET @end_id = @start_id + @interval
WHILE (@continue = 1 AND @start_id <= @max_id)
BEGIN
    PRINT 'Start issued: ' + CONVERT(varchar(19), GETDATE(), 120)
    BEGIN TRANSACTION
    DELETE
    FROM $TABLE
    WHERE id BETWEEN @start_id AND @end_id AND $WHERE
    SET @deleted = @@ROWCOUNT
    UPDATE EntryID SET EntryID.startid = @end_id + 1
    COMMIT
    PRINT 'Deleted issued: ' + STR(@deleted) + ' records. ' + CONVERT(varchar(19), GETDATE(), 120)
    SET @total = @total + @deleted
    SET @start_id = @end_id + 1
    SET @end_id = @end_id + @interval
    IF @end_id > @max_id
        SET @end_id = @max_id
    SET @now = GETDATE()
    SET @timee = DATEDIFF (second, @start, @now)
    if @timee > @MAXT
    BEGIN
        PRINT 'Time limit exceeded for the script, exiting'
        SET @continue = 0
    END
    -- ELSE
    -- BEGIN
    -- SELECT @total 'Removed now', @timee 'Total time, seconds'
    -- END
END
SELECT @total 'Removed records', @timee 'Total time sec' , @start_id 'Next id', @max_id 'Max id', @continue 'COMPLETED? '
SELECT * from EntryID next_start_id
GO

The short answer is, you can't delete 2 billion rows without incurring some kind of major database downtime.
Your best option may be to copy the data to a temp table and truncate the original table, but this will fill your tempDB and would use no less logging than deleting the data.
You will need to delete as many rows as you can until the transaction log fills up, then truncate it each time. The answer provided by Stanislav Kniazev could be modified to do this by increasing the batch size and adding a call to truncate the log file.

I agree with the people who want you to loop over a smaller set of records; this will be faster than trying to do the whole operation in one step. You may need to experiment with the number of records you include in the loop. About 2000 at a time seems to be the sweet spot in most of the tables I do large deletes from, although a few need smaller amounts, like 500. It depends on the number of foreign keys, the size of the record, triggers, etc., so it really will take some experimenting to find what you need. It also depends on how heavily the table is used. A heavily accessed table will need each iteration of the loop to run for a shorter amount of time. If you can run during off hours, or best yet in single-user mode, then you can have more records deleted in one loop.
If you don't think you can do this in one night during off hours, it might be best to design the loop with a counter and only do a set number of iterations each night until it is done.
Further, if you use an implicit transaction rather than an explicit one, you can kill the loop query at any time and records already deleted will stay deleted, except those in the current round of the loop. That is much faster than trying to roll back half a million records because you've brought the system to a halt.
It is usually a good idea to back up a database immediately before undertaking an operation of this nature.
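
A minimal sketch of the counter-limited loop described above, assuming SQL Server 2005 or later (for DELETE TOP). The table name, the delete condition, and the values of @batchSize and @maxIterations are all hypothetical and would need tuning for the real table.

```sql
DECLARE @batchSize int, @maxIterations int, @i int, @rows int;
SET @batchSize     = 2000;   -- starting point, adjust by experiment
SET @maxIterations = 5000;   -- cap on the work done in one nightly run
SET @i    = 0;
SET @rows = 1;

WHILE @rows > 0 AND @i < @maxIterations
BEGIN
    -- No explicit BEGIN TRANSACTION: each DELETE commits on its own, so killing
    -- the script only rolls back the batch that is currently running.
    DELETE TOP (@batchSize) FROM dbo.My_Table
    WHERE entry_date < '20080101';   -- hypothetical delete condition

    SET @rows = @@ROWCOUNT;
    SET @i = @i + 1;
END
```

Rerunning the same script on the following nights picks up where the previous run stopped, since rows already deleted no longer match the condition.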
