How to optimize this T-SQL script by avoiding the loop?

I use the following SQL query to update MyTable. The code takes between 5 and 15 minutes as long as the row count is <= 100,000,000, but when the row count exceeds 100,000,000 it takes exponentially longer. How can I change this code to be set-based instead of using a WHILE loop?
DECLARE @startTime DATETIME
DECLARE @batchSize INT
DECLARE @iterationCount INT
DECLARE @i INT
DECLARE @from INT
DECLARE @to INT
SET @batchSize = 10000
SET @i = 0

SELECT @iterationCount = COUNT(*) / @batchSize
FROM MyTable
WHERE LitraID = 8175
  AND id BETWEEN 100000000 AND 300000000

WHILE @i <= @iterationCount
BEGIN
    BEGIN TRANSACTION T

    SET @startTime = GETDATE()
    SET @from = @i * @batchSize
    SET @to = (@i + 1) * @batchSize - 1

    ;WITH data AS
    (
        SELECT DoorsReleased, ROW_NUMBER() OVER (ORDER BY id) AS Row
        FROM MyTable
        WHERE LitraID = 8175
          AND id BETWEEN 100000000 AND 300000000
    )
    UPDATE data
    SET DoorsReleased = ~DoorsReleased
    WHERE Row BETWEEN @from AND @to

    SET @i = @i + 1

    COMMIT TRANSACTION T
END

One of your issues is that the SELECT statement in the loop fetches all records for LitraID = 8175 and assigns row numbers, and only then filters in the UPDATE. This happens on every iteration.
One way around this would be to get all the ids for the update before entering the loop, storing them in a temporary table. Then you can write a query similar to the one you have, but joining to this table of ids.
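A minimal sketch of that temp-table approach, assuming id is unique (the #ids table name and its index are illustrative, not from the original post):
-- materialise the target ids once, with a contiguous row number to batch on
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
INTO #ids
FROM MyTable
WHERE LitraID = 8175
  AND id BETWEEN 100000000 AND 300000000;

CREATE UNIQUE CLUSTERED INDEX IX_ids ON #ids (rn);

DECLARE @batchSize int = 10000;
DECLARE @from int = 1;

WHILE EXISTS (SELECT 1 FROM #ids WHERE rn >= @from)
BEGIN
    UPDATE t
    SET t.DoorsReleased = ~t.DoorsReleased
    FROM MyTable t
    JOIN #ids i ON i.id = t.id
    WHERE i.rn BETWEEN @from AND @from + @batchSize - 1;

    SET @from = @from + @batchSize;
END

DROP TABLE #ids;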
However, there is an even easier way if you know approximately how many records have LitraID = 8175 and they are spread throughout the table rather than bunched together in a narrow id range.
DECLARE @batchSize INT
DECLARE @minId INT
DECLARE @maxId INT
SET @batchSize = 10000 -- adjust according to how frequently LitraID = 8175 occurs; larger numbers if infrequent
SET @minId = 100000000

WHILE @minId <= 300000000
BEGIN
    SET @maxId = @minId + @batchSize - 1
    IF @maxId > 300000000
    BEGIN
        SET @maxId = 300000000
    END

    BEGIN TRANSACTION T

    UPDATE MyTable
    SET DoorsReleased = ~DoorsReleased
    WHERE LitraID = 8175 -- keep this filter, or rows with other LitraID values get flipped too
      AND id BETWEEN @minId AND @maxId

    COMMIT TRANSACTION T

    SET @minId = @maxId + 1
END
This will use the value of id to control the loop, meaning you don't need the extra step to calculate @iterationCount. It uses small batches so that the table isn't locked for long periods. It doesn't have any unnecessary SELECT statements, and the WHERE clause in the update is efficient assuming id has an index.
It won't have exactly the same number of records updated in every transaction, but there's no reason it needs to.

This will eliminate the loop:
UPDATE MyTable
SET DoorsReleased = ~DoorsReleased
WHERE LitraID = 8175
  AND id BETWEEN 100000000 AND 300000000
  AND DoorsReleased IS NOT NULL -- if DoorsReleased is nullable
  -- AND DoorsReleased <> ~DoorsReleased
If you are set on looping: the code below will NOT work. I thought ~ was part of the column name, but it is the bitwise NOT operator, so the condition DoorsReleased <> ~DoorsReleased never filters anything out:
SELECT 1;
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE TOP (100000) MyTable
    SET DoorsReleased = ~DoorsReleased
    WHERE LitraID = 8175
      AND id BETWEEN 100000000 AND 300000000
      AND ( DoorsReleased <> ~DoorsReleased
            OR ( DoorsReleased IS NULL AND ~DoorsReleased IS NOT NULL )
          )
END
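To see why this never terminates: ~ flips every bit, so for any non-null integer x, ~x is never equal to x. A quick illustration:
-- ~5 = -6, which is never equal to 5, so the filter above matches
-- every non-null row on every pass, and TOP (100000) keeps
-- re-flipping the same rows forever
SELECT 5 AS x, ~5 AS bitwise_not_x; -- returns 5, -6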
Inside a transaction I don't think looping has value, as the transaction log cannot clear until the transaction commits. And a batch size of 10,000 is small.
As stated in a comment, if you want to loop then drive it off id rather than ROW_NUMBER(); recomputing row numbers in every iteration is expensive.
You might also be able to use OFFSET.
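A minimal sketch of OFFSET-based batching (assuming SQL Server 2012+ for OFFSET/FETCH; note that later pages get slower, because each iteration still scans past all the skipped rows):
DECLARE @offset int = 0;
DECLARE @batch  int = 100000;

WHILE 1 = 1
BEGIN
    -- page through the qualifying ids in a deterministic order
    UPDATE t
    SET t.DoorsReleased = ~t.DoorsReleased
    FROM MyTable t
    JOIN (
        SELECT id
        FROM MyTable
        WHERE LitraID = 8175
          AND id BETWEEN 100000000 AND 300000000
        ORDER BY id
        OFFSET @offset ROWS FETCH NEXT @batch ROWS ONLY
    ) AS page ON page.id = t.id;

    IF @@ROWCOUNT = 0 BREAK;
    SET @offset = @offset + @batch;
END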

Related

Fetch set of records in sequence until n rows

I have a table with almost 1.7M rows. I have to fetch a set of 100 rows and perform an operation on them, and once the first set of 100 is completed, do the same operation for rows 101 to 200, and so on for all the rows in the table. I do have a Rownumber column as well. What would be the best approach to accomplish this?
I had the same problem in an SSIS package, and I solved it like this:
DECLARE @i INT, @RowsLimit INT, @Count INT
SET @i = 1
SET @RowsLimit = 100
SELECT @Count = COUNT(*) FROM yourTable

WHILE @i <= CEILING(1.0 * @Count / @RowsLimit) -- <= plus CEILING so the last partial batch is not skipped; or you can hardcode the iteration count
BEGIN
    SELECT *
    FROM yourTable
    ORDER BY yourKeyColumn -- OFFSET/FETCH requires an ORDER BY; use a deterministic key column
    OFFSET (@RowsLimit * (@i - 1)) ROWS
    FETCH NEXT @RowsLimit ROWS ONLY

    SET @i = @i + 1
END
Inside the WHILE you can place your logic.

When executing a stored procedure, the loop inside the procedure is running infinitely

USE CIS111_BookStoreMC
GO
IF OBJECT_ID('spAssetInfo') IS NOT NULL
    DROP PROC spAssetInfo
GO
CREATE PROC spAssetInfo
AS
SELECT AssetID, Description, Cost, PurchaseDate
INTO #temptable
FROM Assets

ALTER TABLE #temptable ADD CompleteDepreciationYear DATE

DECLARE @value MONEY;
SET @value = 0.00;
DECLARE @j INT;
DECLARE @depreciationNum INT;
SET @j = 1;

WHILE (@j <= 14)
BEGIN
    SET @depreciationNum = 0;
    SET @value = (SELECT Cost FROM Assets WHERE AssetID = @j);
    SET @j = @j + 1;

    WHILE (@value > 0)
    BEGIN
        SET @value = @value - (@value * 0.2);
        SET @depreciationNum = @depreciationNum + 1;
    END

    INSERT INTO #temptable
        (CompleteDepreciationYear)
    VALUES
        (DATEADD(year, @depreciationNum, CAST((SELECT PurchaseDate FROM Assets WHERE AssetID = @j - 1) AS DATE)))
END

SELECT * FROM #temptable
I have been trying to figure this out for hours. Basically I am trying to show an asset inventory with the PurchaseDate and the date each item is completely depreciated; an asset depreciates 20% per year. I tried creating a temporary table, copying some of the Assets table columns into it, and then adding a column for the date when the asset is completely depreciated.
For some reason, when I execute the procedure with EXEC spAssetInfo, the query runs forever.
I forgot to include it in the screenshot, but I also have
SELECT * FROM #temptable
at the end to show the table when the procedure is executed.
The problem is this part:
WHILE (@value > 0)
Trace it and you will see that this condition is never false, so the inner loop never ends. The value loses 20% on each pass, so it only approaches zero:
value = 5
value = 5 * 0.8 = 4
value = 4 * 0.8 = 3.2
value = 3.2 * 0.8 = 2.56
...and because MONEY keeps only four decimal places, once @value reaches 0.0001 the result 0.0001 - 0.00002 = 0.00008 rounds right back to 0.0001 on assignment, so the value stops shrinking entirely and you have a loop ;)
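A minimal fix, as a sketch: stop once less than one cent of value remains (the 0.01 threshold is an assumption, not something from the original post):
DECLARE @value MONEY = 5.00;        -- example starting cost
DECLARE @depreciationNum INT = 0;

-- stop at < $0.01 instead of > 0; a strict "> 0" test can never
-- become false because the value only approaches zero
WHILE (@value >= 0.01)
BEGIN
    SET @value = @value - (@value * 0.2);  -- lose 20% per year
    SET @depreciationNum = @depreciationNum + 1;
END

SELECT @depreciationNum AS YearsToFullDepreciation;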

How do I update a SQL table in batches?

I already spent some time trying to figure this out, but I am still somewhat stuck and I can't really find the solution online, as I think I am missing the keywords.
I want to update a SQL table in batches, meaning I have a few million entries and want to update rows 0-999, then 1000-1999, and so on step by step, to avoid a huge database lock.
This is what I found:
DECLARE @Rows INT,
        @BatchSize INT;
SET @BatchSize = 2500;
SET @Rows = @BatchSize;

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    LEFT JOIN DB2 db2
        ON db1.attr2 = db2.attr2

    SET @Rows = @@ROWCOUNT;
END;
I simplified my statement a little bit as you can see, but it should still be clear how I approached the whole problem.
However, this thing loops forever, and the output shows it changed many more rows than there are in the table.
I checked the same loop with a SELECT statement inside later on and found that it simply selects the first @BatchSize rows of the table over and over, even though I thought it would progress through the table with every iteration.
How can I change this so it actually progresses by @BatchSize rows every iteration instead of targeting the same rows every time?
You need some limiting factor to decide which rows are hit in each iteration. Generally you would use an id field. There are lots of ways to approach it, but here is one way:
DECLARE @MinID int = 1;
DECLARE @MaxID int = 2500;
DECLARE @Rows int = 1;
DECLARE @Batchsize int = 2500;

WHILE (@Rows > 0)
BEGIN
    UPDATE db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    LEFT JOIN DB2 db2 ON db1.attr2 = db2.attr2
    WHERE db1.ID BETWEEN @MinID AND @MaxID -- note: the loop stops at the first id range touching no rows, so this assumes reasonably dense ids

    SET @Rows = @@ROWCOUNT
    SET @MinID = @MinID + @Batchsize
    SET @MaxID = @MaxID + @Batchsize
END
Replace db1.ID with whatever field works best in your table schema.
Note, your approach would work if you had some kind of WHERE clause on the update query that prevented the same rows from being returned.
Ex. UPDATE table SET id = 1 WHERE id = 2 won't pull the same rows in a second execution
One way to do it is using a CTE with ROW_NUMBER():
DECLARE @BatchSize int = 2500,
        @LastRowUpdated int = 0,
        @Count int;

SELECT @Count = COUNT(*) FROM db1;

WHILE @LastRowUpdated < @Count
BEGIN
    -- the CTE must be declared inside the loop: a CTE only exists
    -- for the single statement that immediately follows it
    ;WITH CTE AS
    (
        SELECT attr,
               attr2,
               ROW_NUMBER() OVER (ORDER BY attr, attr2) AS RN
        FROM db1
    )
    UPDATE c
    SET attr = db2.attr
    FROM CTE c
    LEFT JOIN DB2 db2 ON c.attr2 = db2.attr2
    WHERE c.RN > @LastRowUpdated
      AND c.RN <= @LastRowUpdated + @BatchSize;

    SET @LastRowUpdated += @BatchSize;
END
This will update 2500 records each step of the loop.
You are just updating the same rows over and over. You need an AND with a <> condition so already-updated rows drop out of the join.
Left join? If you really want to assign NULL values then use a separate update (see the sketch after the code below).
DECLARE @Rows INT,
        @BatchSize INT;
SET @BatchSize = 2500;
SET @Rows = @BatchSize;

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP (@BatchSize) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    JOIN DB2 db2
        ON db1.attr2 = db2.attr2
       AND db1.attr <> db2.attr

    SET @Rows = @@ROWCOUNT;
END;
And you can do this:
SELECT 1;
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE TOP (2000) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    JOIN DB2 db2
        ON db1.attr2 = db2.attr2
       AND db1.attr <> db2.attr
END;
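For the separate NULL-assignment pass mentioned above, a sketch (assumption: rows with no match in DB2 really should have attr cleared):
-- hypothetical second pass: clear attr where DB2 has no matching row;
-- the attr IS NOT NULL filter keeps it restartable, because rows that
-- are already NULL drop out on later executions
UPDATE db1
SET db1.attr = NULL
FROM DB1 db1
LEFT JOIN DB2 db2 ON db1.attr2 = db2.attr2
WHERE db2.attr2 IS NULL
  AND db1.attr IS NOT NULL;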

Batch deletion correctly formatted?

I have multiple tables with millions of rows in them. To be safe and not overflow the transaction log, I am deleting them in batches of 100,000 rows at a time. I first filter based on date, and then delete all rows older than a certain date.
To do this I am creating a table variable in my stored procedure which holds the IDs of the rows that need to be deleted:
I then insert into that table and delete the rows from the target tables using loops. This runs successfully, but it is extremely slow. Is this being done correctly? Is this the fastest way to do it?
DECLARE @FILL_ID_TABLE TABLE (
    FILL_ID varchar(16)
)

DECLARE @TODAYS_DATE date
SELECT @TODAYS_DATE = GETDATE()

-- This deletes all data older than 2 weeks ago from today
DECLARE @_DATE date
SET @_DATE = DATEADD(WEEK, -2, @TODAYS_DATE)

DECLARE @BatchSize int
SELECT @BatchSize = 100000

BEGIN TRAN FUTURE_TRAN
BEGIN TRY
    INSERT INTO @FILL_ID_TABLE
    SELECT DISTINCT ID
    FROM dbo.ID_TABLE
    WHERE CREATED < @_DATE

    SELECT @BatchSize = 100000
    WHILE @BatchSize <> 0
    BEGIN
        DELETE TOP (@BatchSize) FROM TABLE1
        OUTPUT DELETED.* INTO dbo.TABLE1_ARCHIVE
        WHERE ID IN (SELECT ROLLUP_ID FROM @FILL_ID_TABLE)

        SET @BatchSize = @@ROWCOUNT
    END

    SELECT @BatchSize = 100000
    WHILE @BatchSize <> 0
    BEGIN
        DELETE TOP (@BatchSize) FROM TABLE2
        OUTPUT DELETED.* INTO dbo.TABLE2_ARCHIVE
        WHERE ID IN (SELECT FILL_ID FROM @FILL_ID_TABLE)

        SET @BatchSize = @@ROWCOUNT
    END

    PRINT 'Succeed'
    COMMIT TRANSACTION FUTURE_TRAN
END TRY
BEGIN CATCH
    PRINT 'Failed'
    ROLLBACK TRANSACTION FUTURE_TRAN
END CATCH
Try a join instead of a subquery:
DELETE TOP (@BatchSize) T1
OUTPUT DELETED.* INTO dbo.TABLE1_ARCHIVE
FROM TABLE1 AS T1
JOIN @FILL_ID_TABLE AS FIL ON FIL.ROLLUP_ID = T1.ID
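A sketch of how that join-based delete slots into the batching loop. Note the table variable declares a FILL_ID column, so that is the name used here; the ROLLUP_ID references look like a leftover from a different table:
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (100000) T1
    OUTPUT DELETED.* INTO dbo.TABLE1_ARCHIVE
    FROM TABLE1 AS T1
    JOIN @FILL_ID_TABLE AS FIL ON FIL.FILL_ID = T1.ID

    SET @Rows = @@ROWCOUNT;
END
Committing each batch separately, rather than wrapping everything in one FUTURE_TRAN transaction, would also let the log clear between batches, which was the original goal of batching.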

Query not working fine in while loop

I have a WHILE loop where I am trying to insert.
DECLARE @CurrentOffer int = 121
DECLARE @OldestOffer int = 115
DECLARE @MinClubcardID bigint = 0
DECLARE @MaxClubcardID bigint = 1000

WHILE 1 = 1
BEGIN
    INSERT INTO Temp WITH (TABLOCK)
    SELECT TOP (100) clubcard
    FROM TempClub WITH (NOLOCK)
    WHERE ID BETWEEN @MinClubcardID AND @MaxClubcardID

    DECLARE @sql varchar(8000)

    WHILE @OldestOffer <= @CurrentOffer
    BEGIN
        PRINT @CurrentOffer
        PRINT @OldestOffer

        SET @sql = 'delete from Temp where Clubcard in (select Clubcard from ClubTransaction_' + CONVERT(varchar, @CurrentOffer) + ' with (nolock))'
        PRINT (@sql)
        EXEC (@sql)

        SET @CurrentOffer = @CurrentOffer - 1

        IF @OldestOffer = @CurrentOffer
        BEGIN
            -- my logic
        END
    END
END
My loop always checks only the first 100 records of TempClub. The TempClub table has 3000 records, and I need to check all 3000 clubcards against the ClubTransaction_121, ClubTransaction_120, and ClubTransaction_119 tables.
The SELECT query in your loop returns only the top 100 items
SELECT top (100) clubcard from TempClub ...
If you want to retrieve all items, remove the top (100) part of your statement
SELECT clubcard from TempClub ...
In order to do batch-type processing, you need to set @MinClubcardID to the last ID processed plus 1 and include an ORDER BY ID to ensure that the records are returned in order (a sketch of that keyset approach is at the end of this answer).
But... I wouldn't use the approach of using the primary key as my "index". What you're looking for is a basic pagination pattern. In SQL Server 2005+, Microsoft introduced the ROW_NUMBER() function, which makes pagination a lot easier.
For example:
DECLARE @T TABLE (clubcard INT)
DECLARE @start INT
SET @start = 0

WHILE (1 = 1)
BEGIN
    INSERT @T (clubcard)
    SELECT TOP 100 clubcard
    FROM
    (
        SELECT clubcard,
               ROW_NUMBER() OVER (ORDER BY ID) AS num
        FROM dbo.TempClub
    ) AS t
    WHERE num > @start
    ORDER BY num -- without this, TOP 100 is not guaranteed to take the next 100 rows

    IF (@@ROWCOUNT = 0) BREAK;

    -- update counter
    SET @start = @start + 100
    -- process records found
    -- make sure temp table is empty
    DELETE FROM @T
END
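For comparison, the keyset approach mentioned at the start of this answer might look like this (a sketch; it assumes TempClub.ID is unique and indexed):
DECLARE @T TABLE (clubcard INT, ID bigint)
DECLARE @LastID bigint = 0

WHILE 1 = 1
BEGIN
    DELETE FROM @T

    -- fetch the next 100 rows strictly after the last ID processed
    INSERT INTO @T (clubcard, ID)
    SELECT TOP (100) clubcard, ID
    FROM dbo.TempClub
    WHERE ID > @LastID
    ORDER BY ID

    IF @@ROWCOUNT = 0 BREAK

    SELECT @LastID = MAX(ID) FROM @T

    -- process the batch in @T here
END
Unlike OFFSET or ROW_NUMBER paging, each iteration seeks straight to the next key, so late batches cost the same as early ones.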