Better alternative for Cursor using While loop to iterate Sql - sql

Good day,
I am trying to fetch dates from one of my tables, which contains millions of records, and store each one in a variable using a cursor. For each date fetched, I insert records into the database for that particular date, using a WHILE loop. The WHILE loop turns out to be very slow: execution takes hours to complete. I am including part of the query for clarity.
declare @tranDateCursor cursor;
declare @today as date;
begin
set @tranDateCursor = cursor
for
select distinct transferDate
from transactions
where [type] = 'customer'
open @tranDateCursor
fetch next from @tranDateCursor
into @today
while @@fetch_status = 0
begin
declare @yesterday as date
-- most recent customer transfer date before @today
set @yesterday = (
select top(1) transferDate
from (
select distinct(transferDate) AS transferDate
from transactions
where transferDate < @today
and [type] = 'customer'
) as ODC_DATES
order by ODC_DATES.transferDate desc
)
insert into transactions([type],transferDate)
select 'customer'
,@today
from transactions xt
right outer join x_itransaction as y
on y.customer_account = xt.customer_account
AND y.transferDate = @yesterday
AND xt.transferDate = @today
where xt.transactionId is null
and y.transferDate = @yesterday
and y.[type] ='customer'
-- advance to the next date; without this fetch the loop never ends
fetch next from @tranDateCursor
into @today
end
close @tranDateCursor
deallocate @tranDateCursor
end
I have tried using table variables with a WHILE loop instead of the cursor, but that also took a very long time to run. Is there a better alternative to a WHILE loop in this particular scenario?
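One possible set-based alternative is sketched below. It pairs every distinct customer date with the date before it and then performs the insert once for all pairs. This is only a sketch under assumptions that the question does not confirm: SQL Server 2012 or later (for LAG), a transactionId column that is never NULL, and an insert that really only writes [type] and transferDate as in the excerpt. The CTE names customerDates and datePairs are made up for illustration.

```sql
-- Sketch only: assumes SQL Server 2012+ (LAG) and that the excerpt shows the whole insert.
WITH customerDates AS (
    -- same date list the cursor iterates over
    SELECT DISTINCT transferDate
    FROM transactions
    WHERE [type] = 'customer'
),
datePairs AS (
    -- each date together with the previous customer date (NULL for the first date)
    SELECT transferDate AS today,
           LAG(transferDate) OVER (ORDER BY transferDate) AS yesterday
    FROM customerDates
)
INSERT INTO transactions ([type], transferDate)
SELECT 'customer', p.today
FROM datePairs AS p
INNER JOIN x_itransaction AS y
    ON y.transferDate = p.yesterday
   AND y.[type] = 'customer'
WHERE NOT EXISTS (
    -- replaces the RIGHT JOIN ... WHERE xt.transactionId IS NULL anti-join
    SELECT 1
    FROM transactions AS xt
    WHERE xt.customer_account = y.customer_account
      AND xt.transferDate = p.today
);
```

The first date has no predecessor, so its yesterday is NULL and it produces no rows, which matches what the loop does on its first iteration. An index on transactions (transferDate, customer_account) would probably matter more than the rewrite itself, but that is worth confirming against the actual execution plan.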

Related

How to improve while loop insert performance in sql server?

Here is my SQL query. It inserts about 6,500 rows from a temp table, but it takes more than 15 minutes. How can I improve this? Thanks
ALTER proc [dbo].[Process_bill]
@userid varchar(10),
@remark nvarchar(500),
@tdate date ,
@pdate date
as
BEGIN
IF OBJECT_ID('tempdb.dbo..#temptbl_bill', 'U') IS NOT NULL
DROP TABLE #temptbl_bill;
CREATE TABLE #temptbl_bill (
RowID int IDENTITY(1, 1),
------------
)
-- insert into temp table
DECLARE @NumberRecords int, @RowCounter int
DECLARE @batch INT
SET @batch = 300
SET @NumberRecords = (SELECT COUNT(*) FROM #temptbl_bill)
SET @RowCounter = 1
SET NOCOUNT ON
BEGIN TRANSACTION
WHILE @RowCounter <= @NumberRecords
BEGIN
declare @clid int
declare @hlid int
declare @holdinNo nvarchar(150)
declare @clientid nvarchar(100)
declare @clientName nvarchar(50)
declare @floor int
declare @radius nvarchar(50)
declare @bill money
declare @others money
declare @frate int
declare @due money
DECLARE @fine money
DECLARE @rebate money
IF @RowCounter > 0 AND ((@RowCounter % @batch = 0) OR (@RowCounter = @NumberRecords))
BEGIN
COMMIT TRANSACTION
PRINT CONCAT('Transaction #', CEILING(@RowCounter/ CAST(@batch AS FLOAT)), ' committed (', @RowCounter,' rows)');
BEGIN TRANSACTION
END;
-- multiple select
-- insert to destination table
Print 'RowCount -' +cast(@RowCounter as varchar(20)) + 'batch -' + cast(@batch as varchar(20))
SET @RowCounter = @RowCounter + 1;
END
COMMIT TRANSACTION
PRINT CONCAT('Transaction #', CEILING(@RowCounter/ CAST(@batch AS FLOAT)), ' committed (',
@RowCounter,' rows)');
SET NOCOUNT OFF
DROP TABLE #temptbl_bill
END
GO
As has been said in comments, the loop is completely unnecessary. The way to improve the performance of any loop is to remove it completely. Loops are a last resort in SQL.
As far as I can tell your insert can be written with a single statement:
INSERT tbl_bill(clid, hlid, holdingNo,ClientID, ClientName, billno, date_month, unit, others, fine, due, bill, rebate, remark, payment_date, inserted_by, inserted_date)
SELECT clid = c.id,
hlid = h.id,
h.holdinNo ,
c.cliendID,
clientName = CAST(c.clientName AS NVARCHAR(50)),
BillNo = CONCAT(h.holdinNo, MONTH(@tdate), YEAR(@tdate)),
date_month = @tDate,
unit = 0,
others = CASE WHEN h.hfloor = 0 THEN rs.frate * (h.hfloor - 1) ELSE 0 END,
fine = bs.FineRate * b.Due / 100,
due = b.Due,
bill = @bill, -- This is declared but never assigned
rebate = bs.rebate,
remark = @remark,
payment_date = @pdate,
inserted_by = @userid,
inserted_date = GETDATE()
FROM ( SELECT id, clientdID, ClientName
FROM tbl_client
WHERE status = 1
) AS c
INNER JOIN
( SELECT id, holdinNo, [floor], connect_radius
FROM tx_holding
WHERE status = 1
AND connect_radius <> '0'
AND type = 'Residential'
) AS h
ON c.id = h.clid
LEFT JOIN tbl_radius_setting AS rs
ON rs.radius= CONVERT(real,h.connect_radius)
AND rs.status = 1
AND rs.type = 'Non-Govt.'
LEFT JOIN tbl_bill_setting AS bs
ON bs.Status = 1
LEFT JOIN
( SELECT hlid,
SUM(netbill) AS Due
FROM tbl_bill AS b
WHERE date_month < @tdate
AND (b.ispay = 0 OR b.ispay IS NULL)
GROUP BY hlid
) AS b
ON b.hlid = h.id
WHERE NOT EXISTS
( SELECT 1
FROM tbl_bill AS tb
WHERE EOMONTH(@tdate) = EOMONTH(date_month)
AND tb.holdingNo = h.holdinNo
AND (tb.update_by IS NOT NULL OR tb.ispay=1)
);
Please take this with a pinch of salt. It was quite hard work trying to piece together the logic, so it may need some minor tweaks and corrections.
As well as adapting this to work as a single statement, I have made a number of modifications to your existing code:
Swapped NOT IN for NOT EXISTS to avoid any issues with null records. If holdingNo is not nullable, they are equivalent; if holdingNo is nullable, NOT EXISTS is safer - Not Exists Vs Not IN
The join syntax you are using was replaced 27 years ago, so I switched from ANSI-89 join syntax to ANSI-92. - Bad habits to kick : using old-style JOINs
Changed predicates of YEAR(date_month) = YEAR(@tDate) AND MONTH(date_month) = MONTH(@tDate) to become EOMONTH(@tdate) = EOMONTH(date_month). These are logically the same, but EOMONTH is sargable, whereas MONTH and YEAR are not.
Then a few further links/suggestions that are directly related to changes I have made
Although I removed the WHILE loop, don't fall into the trap of thinking a WHILE loop is better than a cursor. A properly declared cursor will outperform a WHILE loop like yours - Bad Habits to Kick : Thinking a WHILE loop isn't a CURSOR
The general consensus is that prefixing object names is not a good idea. It should either be obvious from the context if an object is a table/view or function/procedure, or it should be irrelevant - i.e. There is no need to distinguish between a table or a view, and in fact, we may wish to change from one to the other, so having the prefix makes things worse, not better.
The average ratio of time spent reading code to time spent writing code is around 10:1, so it is worth the effort to format your code as you write it so that it is easy to read. This is hugely subjective with SQL, and I would not recommend any particular conventions, but I cannot believe for a second that you find your original code free-flowing and easy to read. It took me about 10 minutes just to unravel the first insert statement.
EDIT
The above is not correct: EOMONTH() is not sargable, so it does not perform any better than YEAR(x) = YEAR(y) AND MONTH(x) = MONTH(y), although it is still a bit simpler. If you want a truly sargable predicate, you will need to create a start and end date from @tdate, so you can use:
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101')
to get the first day of the month for @tdate, then almost the same formula, but adding the months to 1st February 1900 rather than 1st January, to get the start of the next month:
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201')
So the following:
DECLARE @Tdate DATE = '2019-10-11';
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101'),
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201');
Will return 1st October and 1st November respectively. Putting this back in your original query would give:
WHERE NOT EXISTS
( SELECT 1
FROM tbl_bill AS tb
WHERE date_month >= DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000101')
AND date_month < DATEADD(MONTH, DATEDIFF(MONTH, '19000101', @tdate), '19000201')
AND tb.holdingNo = h.holdinNo
AND (tb.update_by IS NOT NULL OR tb.ispay=1)
);

SQL query loops twice through each row?

When I trigger my stored procedure from a web app, it loops twice and creates two identical entries of the same row. I cannot work out why :/
The query is supposed to INSERT (re-schedule) all submitted rows. It uses a cursor to go through each row and SELECT, then INSERT, the correct data from/for each row.
Here is my SQL:
CREATE PROCEDURE [cil].[executeCIL_updateComplDate_And_ReSchedule]
@equipID INT,
@date DATE,
@ip VARCHAR(15)
AS
/* add completion date */
UPDATE cil.schedule
SET completionDate = CAST(GETUTCDATE() AS SMALLDATETIME),
complIP = @ip
WHERE schedule.id IN (SELECT schedule.id
FROM cil.schedule
LEFT JOIN cil.task ON cil.schedule.taskFK = cil.task.id
--WHERE CAST(scheduledDate AS DATE)<=CAST(GetDate() AS DATE)
WHERE CAST(scheduledDate as DATE) = @date
AND completionDate IS NULL
AND result IS NOT NULL
AND equipFK = @equipID);
/* reschedule tasks */
DECLARE @nextTaskID AS INT;
DECLARE @nextScheduledDate AS DATETIME2(6);
DECLARE @nextRotaCycle AS INT;
DECLARE db_cursor CURSOR FOR
SELECT taskFK, scheduledDate, rotaCycle
FROM cil.schedule
LEFT JOIN cil.task ON cil.schedule.taskFK = cil.task.id
WHERE completionDate = CAST(GETUTCDATE() AS SMALLDATETIME)
AND equipFK = @equipID;
OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @nextTaskID, @nextScheduledDate, @nextRotaCycle;
WHILE @@FETCH_STATUS = 0
BEGIN
--Do stuff with scalar values
INSERT INTO cil.schedule (taskFK, scheduledDate, rotaCycle)
VALUES (@nextTaskID,
DATEADD(dd, @nextRotaCycle, @nextScheduledDate),
@nextRotaCycle)
FETCH NEXT FROM db_cursor INTO @nextTaskID, @nextScheduledDate,
@nextRotaCycle;
END;
END;
CLOSE db_cursor;
DEALLOCATE db_cursor;
GO
It seems that the duplicates are caused by the join in the cursor's query.
If you run this query on its own in SSMS, does it return only one row per schedule entry?
SELECT taskFK, scheduledDate, rotaCycle
FROM cil.schedule
LEFT JOIN cil.task ON cil.schedule.taskFK=cil.task.id
WHERE completionDate=CAST(GETUTCDATE() AS SMALLDATETIME)
AND equipFK=@equipID ;
If it returns each row twice, the doubles come from the task table.
If that table is not needed by the query, consider removing the join.
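To check that suspicion, a grouped version of the same query can show which rows come back more than once. This is a diagnostic sketch only: the value assigned to @equipID is a placeholder, the copies column name is made up, and the query has to be run in the same minute as the UPDATE because completionDate is compared against the current time rounded to SMALLDATETIME.

```sql
DECLARE @equipID INT;
SET @equipID = 1;  -- placeholder: use the equipment ID you are testing with

-- Any row returned here is fetched by the cursor more than once.
SELECT taskFK, scheduledDate, rotaCycle, COUNT(*) AS copies
FROM cil.schedule
LEFT JOIN cil.task ON cil.schedule.taskFK = cil.task.id
WHERE completionDate = CAST(GETUTCDATE() AS SMALLDATETIME)
  AND equipFK = @equipID
GROUP BY taskFK, scheduledDate, rotaCycle
HAVING COUNT(*) > 1;
```

If this returns nothing, the join is probably not the cause, and it may be worth checking whether the web app calls the procedure twice within the same minute.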

How can I roll up rate-of-return into net asset value in SQL?

Given the starting value @pStartingValue and a table that contains rorDate and ror, what is the most efficient way to get the NAV at each date using just T-SQL?
This is mathematically trivial and simple in code. I currently have a naive SQL implementation that relies on cursors.
On the first date, the NAV is @pStartingValue * ror
On every subsequent date, it is the previously calculated NAV * ror, or equivalently @pStartingValue * every previous ror
How would you do this efficiently using only MSSQL 2005+?
DECLARE @rorDate DATE
DECLARE @getDate CURSOR
DECLARE @lastNAV as DECIMAL(19,7)
DECLARE @datedRoR as float
DECLARE @NAVTotals TABLE
(
NAV DECIMAL(19,7),
navDate DATE
)
SET @lastNAV = 100
SET @getDate = CURSOR FOR
SELECT
p.[DATE]
FROM
performance p
ORDER BY
p.[DATE]
OPEN @getDate
FETCH NEXT
FROM @getDate INTO @rorDate
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT
@datedRoR = b.finalNetReturn
FROM
performance b
WHERE
b.date = @rorDate
INSERT INTO @NAVTotals (NAV, navDate)
VALUES (@lastNAV * (1 + @datedRoR), @rorDate)
SELECT
@lastNAV = c.NAV
FROM
@NAVTotals c
WHERE
c.navDate = @rorDate
FETCH NEXT
FROM @getDate INTO @rorDate
END
CLOSE @getDate
DEALLOCATE @getDate
select * from @NAVTotals
You'll have to do some testing to see if the performance improves but this is a way to do that same thing without using a cursor. It's untested so you'll want to make sure to test it. I also cast b.finalNetReturn as a float, if it's already a float you can remove that part.
DECLARE @lastNAV as DECIMAL(19,7)
SET @lastNAV = 100
DECLARE @NAVTotals TABLE
(
NAV DECIMAL(19,7),
navDate DATE
);
INSERT INTO @NAVTotals (navDate)
SELECT [DATE]
FROM performance
ORDER BY [DATE] ASC;
UPDATE NT
SET @lastNAV = Nav = (@lastNAV * (1.0 +
(Cast((SELECT b.finalNetReturn
FROM performance b
WHERE b.date = NT.navDate) AS FLOAT))))
FROM @NAVTotals NT;
SELECT * FROM @NAVTotals ORDER BY navDate;
By dropping the lastNAV variable into the update statement you can update both. It works similar to:
a = a + 1
There is an example of this same approach here. Including some good numbers that compare the efficiency of the approach to other approaches such as cursors.
Perhaps I'm not understanding it correctly, but you don't even need a stored proc to achieve this.
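-- T-SQL has no PRODUCT aggregate; EXP(SUM(LOG(x))) is equivalent as long as every 1 + finalNetReturn is positive
SELECT p.[DATE] AS navDate
, @pStartingValue * EXP(SUM(LOG(1 + b.finalNetReturn))) AS NAV
FROM performance p
INNER JOIN performance b
ON b.[DATE] <= p.[DATE]
GROUP BY p.[DATE]
ORDER BY p.[DATE]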
However, there are a few oddities that I don't understand.
How come there is no range limit for p.[DATE]?
Does the "performance" table really have only one asset?
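Since T-SQL has no built-in PRODUCT aggregate, a running product is usually written as EXP(SUM(LOG(x))). On SQL Server 2012 or later the sum can be a windowed running total, which avoids the self-join. The sketch below assumes one row per date in performance, that 1 + finalNetReturn is always positive, and a starting value of 100 taken from the question's cursor code. It does not meet the question's 2005+ requirement, and because it works in FLOAT it will not round to DECIMAL(19,7) at every step the way the cursor does.

```sql
-- Sketch only: needs SQL Server 2012+ and 1 + finalNetReturn > 0 on every row.
DECLARE @pStartingValue FLOAT;
SET @pStartingValue = 100;  -- starting NAV, as in the cursor version

SELECT p.[DATE] AS navDate,
       @pStartingValue
         * EXP(SUM(LOG(1 + p.finalNetReturn))
               OVER (ORDER BY p.[DATE] ROWS UNBOUNDED PRECEDING)) AS NAV
FROM performance AS p
ORDER BY p.[DATE];
```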

How to make a cursor faster

I have written this cursor for a commission report. Commissions come in one table and the records are in another. I match the two based on certain criteria (no exact match is available). The problem is that the records table contains duplicates. When I match commissions with the records table, these duplicates can be picked up, so the rep gets paid more than they should. The commission table also contains duplicates, but those are valid because they simply mean an account got paid for 2 months.
I wrote this query, but it takes 5+ minutes to run. I have 50,000 records in the records table and 100,000 in the commission table. Is there any way I can improve this cursor?
/* just preparation of cursor, this is not time consuming */
CREATE TABLE #result
(
repid INT,
AccountNo VARCHAR(100),
supplier VARCHAR(15),
CompanyName VARCHAR(200),
StartDate DATETIME,
EndDate DATETIME,
Product VARCHAR(25),
commodity VARCHAR(25),
ContractEnd DATETIME,
EstUsage INT,
EnrollStatus VARCHAR(10),
EnrollDate DATETIME,
ActualEndDate DATETIME,
MeterStart DATETIME,
MeterEnd DATETIME,
ActualUsage INT
)
DECLARE @AccountNo VARCHAR(100)
DECLARE @supplier VARCHAR(10)
DECLARE @commodity VARCHAR(15)
DECLARE @meterstart DATETIME
DECLARE @meterEnd DATETIME
DECLARE @volume FLOAT
DECLARE @RepID INT
DECLARE @Month INT
DECLARE @Year INT
SET @repID = 80
SET @Month = 1
SET @year = 2012
/* the actual cursor */
DECLARE commission_cursor CURSOR FOR
SELECT AccountNo,
supplier,
commodity,
meterStart,
MeterEnd,
Volume
FROM commission
WHERE Datepart(m, PaymentDate) = @Month
AND Datepart(YYYY, PaymentDate) = @Year
OPEN commission_cursor
FETCH next FROM commission_cursor INTO @AccountNo, @supplier, @commodity, @MeterStart, @MeterEnd, @Volume;
WHILE @@fetch_status = 0
BEGIN
IF EXISTS (SELECT id
FROM Records
WHERE AccountNo = @AccountNo
AND supplier = @supplier
AND Commodity = @commodity
AND RepID = @repID)
INSERT INTO #result
SELECT TOP 1 RepID,
AccountNo,
Supplier,
CompanyName,
[Supplier Start Date],
[Supplier End Date],
Product,
Commodity,
[customer end date],
[Expected Usage],
EnrollStatus,
ActualStartDate,
ActualEndDate,
@meterstart,
@MeterEnd,
@volume
FROM Records
WHERE AccountNo = @AccountNo
AND supplier = @supplier
AND Commodity = @commodity
AND RepID = @repID
AND @MeterStart >= Dateadd(dd, -7, ActualStartDate)
AND @meterEnd <= Isnull(Dateadd(dd, 30, ActualEndDate), '2015-12-31')
FETCH next FROM commission_cursor INTO @AccountNo, @supplier, @commodity, @MeterStart, @MeterEnd, @Volume;
END
SELECT *
FROM #result
/* clean up */
CLOSE commission_cursor
DEALLOCATE commission_cursor
DROP TABLE #result
I have read the answers to "How to make a T-SQL Cursor faster?", and what I took from them is to rewrite this as a set-based query. I do have another query that uses a join and is lightning fast. The problem is that it cannot differentiate between the duplicates in my records table.
Is there anything I can do to make it faster? This is my primary question. If not, is there an alternative way to do it?
I specifically need help with:
Will using views or stored procedures help?
Is there a way I can use caching in the cursor to make it faster?
Any other options in syntax?
The very first option is to set the least resource intensive options for your cursor:
declare commission_cursor cursor
local static read_only forward_only
for
Next is to investigate whether you need a cursor at all. In this case I think you can do the same with a single pass and no loops:
;WITH x AS
(
SELECT
rn = ROW_NUMBER() OVER (PARTITION BY r.AccountNo, r.Supplier, r.Commodity, r.RepID
ORDER BY r.ActualEndDate DESC),
r.RepID,
r.AccountNo,
r.Supplier,
r.CompanyName,
StartDate = r.[Supplier Start Date],
EndDate = r.[Supplier End Date],
r.Product,
r.Commodity,
ContractEnd = r.[customer end date],
EstUsage = r.[Expected Usage],
r.EnrollStatus,
EnrollDate = r.ActualStartDate,
r.ActualEndDate,
c.MeterStart,
c.MeterEnd,
ActualUsage = c.Volume
FROM dbo.commission AS c
INNER JOIN dbo.Records AS r
ON c.AccountNo = r.AccountNo
AND c.Supplier = r.Supplier
AND c.Commodity = r.Commodity
AND c.RepID = r.RepID
WHERE
c.PaymentDate >= DATEADD(MONTH, @Month-1, CONVERT(CHAR(4), @Year) + '0101')
AND c.PaymentDate < DATEADD(MONTH, @Month, CONVERT(CHAR(4), @Year) + '0101')
AND r.RepID = @RepID
)
SELECT RepID, AccountNo, Supplier, CompanyName, StartDate, EndDate,
Product, Commodity, ContractEnd, EstUsage, EnrollStatus, EnrollDate,
ActualEndDate, MeterStart, MeterEnd, ActualUsage
FROM x
WHERE rn = 1 --ORDER BY something;
If this is still slow, then the cursor probably wasn't the problem - the next step will be investigating what indexes might be implemented to make this query more efficient.
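As a starting point for that index investigation, the two indexes below cover the date-range filter on commission and the matching columns on Records. This is a sketch, not a recommendation: the index names are hypothetical, the column lists are inferred only from the queries in this thread, and existing indexes or other workloads may call for something different. The execution plan should be compared before and after.

```sql
-- Hypothetical index names; column lists are inferred from the queries in this thread.

-- Supports the range filter on PaymentDate and carries the columns the query reads.
CREATE NONCLUSTERED INDEX IX_commission_PaymentDate
    ON dbo.commission (PaymentDate)
    INCLUDE (AccountNo, Supplier, Commodity, MeterStart, MeterEnd, Volume);

-- Supports the lookup of matching records for one rep.
CREATE NONCLUSTERED INDEX IX_Records_Rep_Account
    ON dbo.Records (RepID, AccountNo, Supplier, Commodity);
```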
Temp tables are your friend
The way I solved my problem (merging data from two tables and removing duplicates in a complex fashion, all extremely fast) was to use a temporary table. This is what I did:
Create a #temp table and fetch the merged data from both tables into it. Make sure you include the ID fields from both tables even if you do not require them. This will help remove duplicates.
Now you can do all sorts of calculations on this table. To remove duplicates from table B, just remove duplicate table B IDs. To remove duplicates from table A, just remove duplicate table A IDs. There is more complexity to the problem, but this is probably the best way to solve it and make it considerably faster if cursors are too expensive. In my case the cursor was taking 5+ minutes. The #temp table query takes about 5 seconds, and it has a lot more calculations in it.
When I applied Aaron's solution, the cursor did not get any faster. The second query was faster but did not give me the correct answer, so in the end I used temp tables. This is my own answer.

How to determine when a time stamp does not exist in a table

I have a table that receives data on an hourly basis. Part of this import process writes the timestamp of the import to the table. My question is, how can I build a query to produce a result set of the periods of time when the import did not write to the table?
My first thought is to have a table of static int and just do an outer join and look for nulls on the right side, but this seems kind of sloppy. Is there a more dynamic way to produce a result set for the times the import failed based on the timestamp?
This is a MS SQL 2000 box.
Update: I think I've got it. The two answers already provided are great, but instead what I'm working on is a function that returns a table of the values I am looking for for a given time frame. Once I get it finished I'll post the solution here.
Here's a slightly modified solution from this article in my blog:
Flattening timespans: SQL Server
DECLARE @t TABLE
(
q_start DATETIME NOT NULL,
q_end DATETIME NOT NULL
)
DECLARE @qs DATETIME
DECLARE @qe DATETIME
DECLARE @ms DATETIME
DECLARE @me DATETIME
DECLARE cr_span CURSOR FAST_FORWARD
FOR
SELECT s_timestamp AS q_start,
DATEADD(minute, 1, s_timestamp) AS q_end
FROM [20090611_timespans].t_span
ORDER BY
q_start
OPEN cr_span
FETCH NEXT
FROM cr_span
INTO @qs, @qe
SET @ms = @qs
SET @me = @qe
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT
FROM cr_span
INTO @qs, @qe
IF @qs > @me
BEGIN
INSERT
INTO @t
VALUES (@ms, @me)
SET @ms = @qs
END
SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
END
IF @ms IS NOT NULL
BEGIN
INSERT
INTO @t
VALUES (@ms, @me)
END
CLOSE cr_span
-- release the cursor as well as closing it
DEALLOCATE cr_span
This returns the consecutive ranges during which updates did happen (with one-minute resolution).
If you have an index on your timestamp field, you may issue the following query:
SELECT *
FROM records ro
WHERE NOT EXISTS
(
SELECT NULL
FROM records ri
WHERE ri.timestamp >= DATEADD(minute, -1, ro.timestamp)
AND ri.timestamp < ro.timestamp
)
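The query above returns the rows that begin a run of imports. To list the missing periods directly, a complementary sketch is to return each import that is not followed by another one within the expected interval, together with the next import that does exist. The one-hour threshold is an assumption based on the question's hourly schedule, and the records table and timestamp column names are carried over from the query above. It uses only correlated subqueries, so it should also run on SQL Server 2000.

```sql
-- Sketch only: the one-hour threshold and the records/[timestamp] names are assumptions.
SELECT ro.[timestamp] AS last_import_before_gap,
       (SELECT MIN(rn.[timestamp])
        FROM records rn
        WHERE rn.[timestamp] > ro.[timestamp]) AS first_import_after_gap
FROM records ro
WHERE NOT EXISTS
      (   -- no import arrived within the hour after this one
          SELECT NULL
          FROM records ri
          WHERE ri.[timestamp] > ro.[timestamp]
            AND ri.[timestamp] <= DATEADD(hour, 1, ro.[timestamp])
      )
  AND EXISTS
      (   -- skip the most recent import, which has nothing after it yet
          SELECT NULL
          FROM records rl
          WHERE rl.[timestamp] > ro.[timestamp]
      )
ORDER BY ro.[timestamp];
```

If imports do not land exactly on the hour, widening the threshold slightly (for example to 90 minutes) avoids reporting ordinary jitter as a failure.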
I was thinking something like this:
select 'Start' MissingStatus, o1.LastUpdate MissingStart
from Orders o1
left join Orders o2
on o1.LastUpdate between
dateadd(ss,1,o2.LastUpdate) and dateadd(hh,1,o2.LastUpdate)
where o2.LastUpdate is null
union all
select 'End', o1.LastUpdate MissingEnd
from Orders o1
left join Orders o2
on o1.LastUpdate between
dateadd(hh,-1,o2.LastUpdate) and dateadd(ss,-1,o2.LastUpdate)
where o2.LastUpdate is null
order by 2