How to make a cursor faster - sql

I have wrote this cursor for commission report. What happens is commission comes in one table, the records are another table. I match two based on certain critera (there is not exact match available). The problem is there are duplicates where records exist. When I match commission with the records table, it can result picking up these duplicates. Thus the rep gets paid more. On the other hand, there are duplicates in commission table also but those are valid beause they simple mean an account got paid for 2 months.
I wrote this query but it takes 5+ minutes to run. I have 50,000 records in records table and 100,000 in commission table. Is there any way I an improve this cursor?
/* just preparation of cursor, this is not time consuming */
CREATE TABLE #result
(
repid INT,
AccountNo VARCHAR(100),
supplier VARCHAR(15),
CompanyName VARCHAR(200),
StartDate DATETIME,
EndDate DATETIME,
Product VARCHAR(25),
commodity VARCHAR(25),
ContractEnd DATETIME,
EstUsage INT,
EnrollStatus VARCHAR(10),
EnrollDate DATETIME,
ActualEndDate DATETIME,
MeterStart DATETIME,
MeterEnd DATETIME,
ActualUsage INT
)
DECLARE #AccountNo VARCHAR(100)
DECLARE #supplier VARCHAR(10)
DECLARE #commodity VARCHAR(15)
DECLARE #meterstart DATETIME
DECLARE #meterEnd DATETIME
DECLARE #volume FLOAT
DECLARE #RepID INT
DECLARE #Month INT
DECLARE #Year INT
SET #repID = 80
SET #Month = 1
SET #year = 2012
/* the actual cursor */
DECLARE commission_cursor CURSOR FOR
SELECT AccountNo,
supplier,
commodity,
meterStart,
MeterEnd,
Volume
FROM commission
WHERE Datepart(m, PaymentDate) = #Month
AND Datepart(YYYY, PaymentDate) = #Year
OPEN commission_cursor
FETCH next FROM commission_cursor INTO #AccountNo, #supplier, #commodity, #MeterStart, #MeterEnd, #Volume;
WHILE ##fetch_status = 0
BEGIN
IF EXISTS (SELECT id
FROM Records
WHERE AccountNo = #AccountNo
AND supplier = #supplier
AND Commodity = #commodity
AND RepID = #repID)
INSERT INTO #result
SELECT TOP 1 RepID,
AccountNo,
Supplier,
CompanyName,
[Supplier Start Date],
[Supplier End Date],
Product,
Commodity,
[customer end date],
[Expected Usage],
EnrollStatus,
ActualStartDate,
ActualEndDate,
#meterstart,
#MeterEnd,
#volume
FROM Records
WHERE AccountNo = #AccountNo
AND supplier = #supplier
AND Commodity = #commodity
AND RepID = #repID
AND #MeterStart >= Dateadd(dd, -7, ActualStartDate)
AND #meterEnd <= Isnull(Dateadd(dd, 30, ActualEndDate), '2015-12-31')
FETCH next FROM commission_cursor INTO #AccountNo, #supplier, #commodity, #MeterStart, #MeterEnd, #Volume;
END
SELECT *
FROM #result
/* clean up */
CLOSE commission_cursor
DEALLOCATE commission_cursor
DROP TABLE #result
I have read answer to How to make a T-SQL Cursor faster?, for that what I get is rewrite this query in table form. But I do have another query which uses join and is lightening fast. The problem is, it can not differentiate between the dups in my records table.
Is there anything I can do to make is faster. This is primary question. If not, do you have any alternative way to do it.
I specifically need help with
Will using Views or store procedure help
I there a way I can use cache in Cursor to make it faster
Any other option in syntax

The very first option is to set the least resource intensive options for your cursor:
declare commission_cursor cursor
local static read_only forward_only
for
Next is to investigate whether you need a cursor at all. In this case I think you can do the same with a single pass and no loops:
;WITH x AS
(
SELECT
rn = ROW_NUMBER() OVER (PARTITION BY r.AccountNo, r.Supplier, r.Commodity, r.RepID
ORDER BY r.ActualEndDate DESC),
r.RepID,
r.AccountNo,
r.Supplier,
r.CompanyName,
StartDate = r.[Supplier Start Date],
EndDate = r.[Supplier End Date],
r.Product,
r.Commodity,
ContractEnd = r.[customer end date],
EstUsage = r.[Expected Usage],
r.EnrollStatus,
EnrollDate = r.ActualStartDate,
r.ActualEndDate,
c.MeterStart,
c.MeterEnd,
ActualUsage = c.Volume
FROM dbo.commission AS c
INNER JOIN dbo.Records AS r
ON c.AccountNo = r.AccountNo
AND c.Supplier = r.Supplier
AND c.Commodity = r.Commodity
AND c.RepID = r.RepID
WHERE
c.PaymentDate >= DATEADD(MONTH, #Month-1, CONVERT(CHAR(4), #Year) + '0101')
AND c.PaymentDate < DATEADD(MONTH, 1, CONVERT(CHAR(4), #Year) + '0101')
AND r.RepID = #RepID
)
SELECT RepID, AccountNo, Supplier, CompanyName, StartDate, EndDate,
Product, Commodity, ContractEnd, EstUsage, EnrollStatus, EnrollDate,
ActualEndDate, MeterStart, MeterEnd, ActualUsage
FROM x
WHERE rn = 1 --ORDER BY something;
If this is still slow, then the cursor probably wasn't the problem - the next step will be investigating what indexes might be implemented to make this query more efficient.

Temp tables are your friend
The way I solved my problem, merging data from two tables, removed duplicates in complex fashion and everything extremely fast was to use temporary table. This is what I did
Create a #temp table, fetch the merged data from both the tables. Make sure you include ID fields in both tables even if you do not required it. This will help remove duplicates.
Now you can do all sort of calculation on this table. Remove duplicates from table B, just remove duplicate table B IDs. Remove duplicates from table A, just remove duplicate table A Ids. There is more complexity to the problem but at least this is probably the best way to solve your problem and make it considerably faster if cursors are too expensive and takes considerable time to calculate. In my case it was taking +5 min. The #temp table query about about 5 sec, which had a lot more calculations in it.
While applying Aaron solution, the cursor did not get any faster. The second query was faster but it did not give me the correct answer, so finally I used temp tables. This is my own answer.

Related

How to improve while loop insert performance in sql server?

Here is my SQL Query. It's insert almost 6500+ row from temp table. But its takes 15+ mins! . How can i improve this ? Thanks
ALTER proc [dbo].[Process_bill]
#userid varchar(10),
#remark nvarchar(500),
#tdate date ,
#pdate date
as
BEGIN
IF OBJECT_ID('tempdb.dbo..#temptbl_bill', 'U') IS NOT NULL
DROP TABLE #temptbl_bill;
CREATE TABLE #temptbl_bill (
RowID int IDENTITY(1, 1),
------------
)
// instert into temp table
DECLARE #NumberRecords int, #RowCounter int
DECLARE #batch INT
SET #batch = 300
SET #NumberRecords = (SELECT COUNT(*) FROM #temptbl_bill)
SET #RowCounter = 1
SET NOCOUNT ON
BEGIN TRANSACTION
WHILE #RowCounter <= #NumberRecords
BEGIN
declare #clid int
declare #hlid int
declare #holdinNo nvarchar(150)
declare #clientid nvarchar(100)
declare #clientName nvarchar(50)
declare #floor int
declare #radius nvarchar(50)
declare #bill money
declare #others money
declare #frate int
declare #due money
DECLARE #fine money
DECLARE #rebate money
IF #RowCounter > 0 AND ((#RowCounter % #batch = 0) OR (#RowCounter = #NumberRecords))
BEGIN
COMMIT TRANSACTION
PRINT CONCAT('Transaction #', CEILING(#RowCounter/ CAST(#batch AS FLOAT)), ' committed (', #RowCounter,' rows)');
BEGIN TRANSACTION
END;
// multiple select
// insert to destination table
Print 'RowCount -' +cast(#RowCounter as varchar(20)) + 'batch -' + cast(#batch as varchar(20))
SET #RowCounter = #RowCounter + 1;
END
COMMIT TRANSACTION
PRINT CONCAT('Transaction #', CEILING(#RowCounter/ CAST(#batch AS FLOAT)), ' committed (',
#RowCounter,' rows)');
SET NOCOUNT OFF
DROP TABLE #temptbl_bill
END
GO
As has been said in comments, the loop is completely unnecessary. The way to improve the performance of any loop is to remove it completely. Loops are a last resort in SQL.
As far as I can tell your insert can be written with a single statement:
INSERT tbl_bill(clid, hlid, holdingNo,ClientID, ClientName, billno, date_month, unit, others, fine, due, bill, rebate, remark, payment_date, inserted_by, inserted_date)
SELECT clid = c.id,
hlid = h.id,
h.holdinNo ,
c.cliendID,
clientName = CAST(c.clientName AS NVARCHAR(50)),
BillNo = CONCAT(h.holdinNo, MONTH(#tdate), YEAR(#tdate)),
date_month = #tDate,
unit = 0,
others = CASE WHEN h.hfloor = 0 THEN rs.frate * (h.hfloor - 1) ELSE 0 END,
fine = bs.FineRate * b.Due / 100,
due = b.Due,
bill = #bill, -- This is declared but never assigned
rebate = bs.rebate,
remark = #remark,
payment_date = #pdate,
inserted_by = #userid,
inserted_date = GETDATE()
FROM ( SELECT id, clientdID, ClientName
FROM tbl_client
WHERE status = 1
) AS c
INNER JOIN
( SELECT id, holdinNo, [floor], connect_radius
FROM tx_holding
WHERE status = 1
AND connect_radius <> '0'
AND type = 'Residential'
) AS h
ON c.id = h.clid
LEFT JOIN tbl_radius_setting AS rs
ON rs.radius= CONVERT(real,h.connect_radius)
AND rs.status = 1
AND rs.type = 'Non-Govt.'
LEFT JOIN tbl_bill_setting AS bs
ON bs.Status = 1
LEFT JOIN
( SELECT hlid,
SUM(netbill) AS Due
FROM tbl_bill AS b
WHERE date_month < #tdate
AND (b.ispay = 0 OR b.ispay IS NULL)
GROUP BY hlid
) AS b
ON b.hlid = h.id
WHERE NOT EXISTS
( SELECT 1
FROM tbl_bill AS tb
WHERE EOMONTH(#tdate) = EOMONTH(date_month)
AND tb.holdingNo = h.holdinNo
AND (tb.update_by IS NOT NULL OR tb.ispay=1)
);
Please take this with a pinch of salt, it was quite hard work trying to piece together the logic, so it may need some minor tweaks and corrections
As well as adapting this to work as a single statement, I have made a number of modifications to your existing code:
Swapped NOT IN for NOT EXISTS to avoid any issues with null records. If holdingNo is nullable, they are equivalent, if holdingNo is nullable, NOT EXISTS is safer - Not Exists Vs Not IN
The join syntax you are using was replaced 27 years ago, so I switched from ANSI-89 join syntax to ANSI-92. - Bad habits to kick : using old-style JOINs
Changed predicates of YEAR(date_month) = YEAR(#tDate) AND MONTH(date_month) = MONTH(#tDate) to become EOMONTH(#tdate) = EOMONTH(date_month). These are syntactically the same, but EOMONTH is Sargable, whereas MONTH and YEAR are not.
Then a few further links/suggestions that are directly related to changes I have made
Although I removed the while lopp, don't fall into the trap of thinking this is better than a cursor. A properly declared cursor will out perform a while loop like yours - Bad Habits to Kick : Thinking a WHILE loop isn't a CURSOR
The general consensus is that prefixing object names is not a good idea. It should either be obvious from the context if an object is a table/view or function/procedure, or it should be irrelevant - i.e. There is no need to distinguish between a table or a view, and in fact, we may wish to change from one to the other, so having the prefix makes things worse, not better.
The average ratio of time spent reading code to time spent writing code is around 10:1 - It is therefore worth the effort to format your code when you are writing it so that it is easy to read. This is hugely subjective with SQL, and I would not recommend any particular conventions, but I cannot believe for a second you find your original code free flowing and easy to read. It took me about 10 minutes just unravel the first insert statement.
EDIT
The above is not correct, EOMONTH() is not sargable, so does not perform any better than YEAR(x) = YEAR(y) AND MONTH(x) = MONTH(y), although it is still a bit simpler. If you want a truly sargable predicate you will need to create a start and end date using #tdate, so you can use:
DATEADD(MONTH, DATEDIFF(MONTH, '19000101', #tdate), '19000101')
to get the first day of the month for #tdate, then almost the same forumla, but add months to 1st February 1900 rather than 1st January to get the start of the next month:
DATEADD(MONTH, DATEDIFF(MONTH, '19000201', #tdate), '19000201')
So the following:
DECLARE #Tdate DATE = '2019-10-11';
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', #tdate), '19000101'),
DATEADD(MONTH, DATEDIFF(MONTH, '19000201', #tdate), '19000201');
Will return 1st October and 1st November respectively. Putting this back in your original query would give:
WHERE NOT EXISTS
( SELECT 1
FROM tbl_bill AS tb
WHERE date_month >= DATEADD(MONTH, DATEDIFF(MONTH, '19000101', #tdate), '19000101'),
AND date_month < DATEADD(MONTH, DATEDIFF(MONTH, '19000201', #tdate), '19000201')
AND tb.holdingNo = h.holdinNo
AND (tb.update_by IS NOT NULL OR tb.ispay=1)
);

Better alternative for Cursor using While loop to iterate Sql

Good Day,
I am trying to fetch dates from one of my table, containing records in millions, in database and save them in a variable using Cursor. After fetching dates i am inserting records in db at that particular date. For this I am using While Loop. It turns out that While loop is really slowing down the performance, it is taking about hours to complete execution. I am including a part of a query for further clearance.
declare #tranDateCursor cursor;
declare #today as date;
begin
set #tranDateCursor = cursor
for
select distinct transferDate
from transactions
where [type] = 'customer'
open #tranDateCursor
fetch next from #tranDateCursor
into #today
while ##fetch_status = 0
begin
declare #yesterday as date
set #yesterday = (
select top(1) transferDate
from (
select distinct(transferDate) AS transferDate
from transactions
where transferDate < #today
and [type] = 'customer'
) as ODC_DATES
order by ODC_DATES.transferDate desc
)
insert into transactions([type],transferDate)
select 'customer'
,#today
from transactions xt
right outer join x_itransaction as y
on y.customer_account = xt.customer_account
AND y.transferDate = #yesterday
AND xt.transferDate = #today
where xt.transactionId is null
and y.transferDate = #yesterday
and y.[type] ='customer'
end
end
I have tried using Table Variables instead of CURSOR with WHILE loop, but it turns out that it was too taking very much time to run. My concern is that, Is there a better alternative for a while loop in this particular scenario?

SQL Server improve query performance #Temp Table, Bulk insert

I am working on an old auditing stored procedure that is taking a long time to get data causing timeouts on the system. We have managed to get the time down from over 20 minutes to running just over a minute, this is still too long.
I am running on SQL Server 2008 R2.
My question is is there anything I could do to improve the speed of the query? In particular the temporary tables and the a bulk insert statement.
SELECT
dbo.[Audit Result Entry Detail].PK_ID,
+ every other value in the table (all values required)
INTO
#temp5
FROM
dbo.[Audit Result Entry Detail]
An INNER JOIN then occurs on dbo.[Audit Register] another select occurs and added to another temporary table #result is created.
Result temporary table gets the old and new result values for comparison. Below I have provided what event occurs on #temp5. Note these are just snippets; the stored procedure is way too big to post everything.
SELECT
RED.[FK_RegisterID],
total of a 106 rows are selected here :(
FROM
#temp5 AS RED
LEFT JOIN
#temp5 AS OLD ON RED.IdentityColumn = OLD.IdentityColumn
LEFT JOIN
[Audit Register] AS REG ON REG.PK_ID = RED.FK_RegisterID
SELECT MAX(PK_ID)
OVER (PARTITION BY FK_RegisterID) AS MAX_PK_ID
FROM #temp5
DROP TABLE #temp5
The second part of my question is a bulk insert into this table created at the very top of the stored procedure
DECLARE #AuditOverView TABLE
(
FK_RegisterID INT,
Audit_Date DATETIME,
ContextUser NVARCHAR(30),
Original_Value NVARCHAR(255),
New_Value NVARCHAR(255),
Assay NVARCHAR(200),
Inst NVARCHAR(40),
LotLevel NVARCHAR(3),
Lot_ExpiryDate NVARCHAR(10),
Lot_Number NCHAR(50),
Audit_Type NVARCHAR(10),
Partnumber INT,
[Type] NVARCHAR(50),
SubType NVARCHAR(50)
)
The insert statement below:
INSERT INTO #AuditOverView
SELECT DISTINCT
t.FK_RegisterID,
t.Audit_Date,
t.ContextUser,
CONVERT(NVARCHAR, Original_Date_Run, 127) AS Original_Value,
CONVERT(NVARCHAR, New_Date_Run, 127) AS New_Value,
t.Assay AS 'Assay',
Instrument AS 'Inst',
'1' AS 'LotLevel',
Lot1_Expiry_Date AS 'Lot_ExpiryDate',
Lot1Number AS 'Lot_Number',
t.Audit_Type,
part_number AS Partnumber,
'QC Result' AS [Type],
'Result Date' AS SubType
FROM
#Result AS t
INNER JOIN
(SELECT MAX(Audit_Date) AS DATE, FK_RegisterID
FROM #Result
GROUP BY FK_RegisterID) dt ON t.FK_RegisterID = dt.FK_RegisterID
AND t.Audit_Date = dt.DATE
WHERE
RTRIM(Lot1Number) != ''
AND (CONVERT(NVARCHAR, Original_Date_Run, 120) != CONVERT(NVARCHAR, New_Date_Run, 120)
OR t.Audit_Type = 'i'
OR t.Audit_Type = 'd')
This insert statement occurs about 50 times and the only line changes occur are:
'Result Date' AS SubType
CONVERT(NVARCHAR, Original_Date_Run, 120) != CONVERT(NVARCHAR, New_Date_Run, 120)
I can provide more information if needed. Any help would be much appreciated to improve performance.

SQL query with start and end dates - what is the best option?

I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering to figure out what the best answer might be for ways to set up schema, queries, stuff.
So... I need to know the start and end dates for something and a value that is associated with in ID during that time frame. SO... we can the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define better? I filled the first table with about 100,000 rows of data and it takes about 10-12 seconds to set up in the format of the second table depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
this is what i used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare #dt datetime
declare #i int
declare #id int
set #id = 1
declare #rowcount int
set #rowcount = 0
declare #numrows int
while (#rowcount<100000)
begin
set #i = 1
set #dt = getdate()
set #numrows = Cast(((5 + 1) - 1) *
Rand() + 1 As tinyint)
while #i<=#numrows
begin
insert into #x values (#id, dateadd(d,#i,#dt), #i)
set #i = #i + 1
end
set #rowcount = #rowcount + #numrows
set #id = #id + 1
print #rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where sysdate between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
for anyone who can use LEAD Analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the 1st table (that uses only 1 date column) would be much much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x

SQL server udf not working

I'm now all day on a fairly simple udf. It's below. When I paste the select statement into a query, it runs as expected... when I execute the entire function, I get "0" every time. As you know there aren't a ton of debugging options, so it's hard to see what value are/ aren't being set as it executes. The basic purpose of it is to make sure stock data exists in a daily pricing table. So I can check by how many days' data I'm checking for, the ticker, and the latest trading date to check. A subquery gets me the correct trading dates, and I use "IN" to pull data out of the pricing and vol table... if the count of what comes back is less than the number of days I'm checking, no good. If it does, we're in business. Any help would be great, I'm a newb that is punting at this point:
ALTER FUNCTION dbo.PricingVolDataAvailableToDateProvided
(#Ticker char,
#StartDate DATE,
#NumberOfDaysBack int)
RETURNS bit
AS
BEGIN
DECLARE #Result bit
DECLARE #RecordCount int
SET #RecordCount = (
SELECT COUNT(TradeDate) AS Expr1
FROM (SELECT TOP (100) PERCENT TradeDate
FROM tblDailyPricingAndVol
WHERE ( Symbol = #Ticker )
AND ( TradeDate IN (SELECT TOP (#NumberOfDaysBack)
CAST(TradingDate AS DATE) AS Expr1
FROM tblTradingDays
WHERE ( TradingDate <= #StartDate )
ORDER BY TradingDate DESC) )
ORDER BY TradeDate DESC) AS TempTable )
IF #RecordCount = #NumberOfDaysBack
SET #Result = 1
ELSE
SET #Result = 0
RETURN #Result
END
#Ticker char seems suspect.
If you don't declare a length in the parameter definition it defaults to char(1) so quite likely your passed in tickers are being silently truncated - hence no matches.
SELECT TOP (100) PERCENT TradeDate ... ORDER BY TradeDate DESC
in the derived table is pointless but won't affect the result.