Good Day,
I am trying to fetch dates from one of my table, containing records in millions, in database and save them in a variable using Cursor. After fetching dates i am inserting records in db at that particular date. For this I am using While Loop. It turns out that While loop is really slowing down the performance, it is taking about hours to complete execution. I am including a part of a query for further clearance.
declare #tranDateCursor cursor;
declare #today as date;
begin
set #tranDateCursor = cursor
for
select distinct transferDate
from transactions
where [type] = 'customer'
open #tranDateCursor
fetch next from #tranDateCursor
into #today
while ##fetch_status = 0
begin
declare #yesterday as date
set #yesterday = (
select top(1) transferDate
from (
select distinct(transferDate) AS transferDate
from transactions
where transferDate < #today
and [type] = 'customer'
) as ODC_DATES
order by ODC_DATES.transferDate desc
)
insert into transactions([type],transferDate)
select 'customer'
,#today
from transactions xt
right outer join x_itransaction as y
on y.customer_account = xt.customer_account
AND y.transferDate = #yesterday
AND xt.transferDate = #today
where xt.transactionId is null
and y.transferDate = #yesterday
and y.[type] ='customer'
end
end
I have tried using Table Variables instead of CURSOR with WHILE loop, but it turns out that it was too taking very much time to run. My concern is that, Is there a better alternative for a while loop in this particular scenario?
How to create a stored procedure in SQL Server 2012 that returns a list of courses running between 2 months?
I've written code something like this:
create procedure final_RTrainerqualification
(#TrainerID char(10),
#Coursecode char(4) OUTPUT,
#qualcode nvarchar(30),
#coursedate datetime output)
DECLARE #MinDate DATE = '20160401',
#MaxDate DATE = '20160601';
SELECT coursedate
FROM dbo.RTrainerqualification
WHERE coursedate >= #MinDate
AND coursedate < #MaxDate;`
It should be returning the list of courses that run between these 2 dates mentioned but I am new to stored procedure so my question is how do I assign coursedates to the courses and make it return the list?
Edit- used T-SQL to recreate the code
#Coursecount smallint
declare #Coursedatebeg datetime, #Coursedateend datetime, #CourseCode char(4),#TrainerID char(10);
select #Coursedatebeg = '2015-04-20'
select #Coursedateend = '2015-06-20';
while #Coursedatebeg <= #Coursedateend
begin
select #CourseCode = #Coursedateend;
select #Coursecount = count(*) from RCourseInstance
where CourseCode between 'R222' and 'R224';
if #CourseCount <> 0
begin
Print 'Courses running between April and June 2015 ' ;
select Coursedate,CourseCode from RCourseInstance as t
inner join Coursedate as d on t.Coursedate = d.Coursedate
inner join CourseCode as c on c.CourseCode = t.CourseCode
where CourseCode between 'R222' and 'R224';
end
else
print 'No courses are running between these dates ' ;
set #Coursedatebeg = #Coursedatebeg + 2;
end
It is returning the print statement but also declaring that invalid object name Coursedate
what have I done incorrectly here?
Firstly, use ISO date strings to avoid regional settings issues. You are only returning the coursedate in the SELECT. If you want further information such as the coursecode then add it into the SELECT. i.e.
DECLARE #MinDate DATE = '2016-04-01',
#MaxDate DATE = '2016-06-01'
SELECT coursedate, coursecode, *
FROM dbo.RTrainerqualification
WHERE coursedate >= #MinDate
AND coursedate < #MaxDate;
The OUTPUT variables are not required - these are needed when you want to return a single value rather than a list.
This is fundamental SQL stuff so I would strongly recommend you do some reading on this. There is a simple stored procedure tutorial here to start off with.
I have the following query, but it is not giving any regard to the in the p.created_by =#searchBy where clause, how to correct it so that the results would be filtered according #searchBy too.
ALTER PROC [dbo].[Rptcashcollectionouter] #branchId INT,
#searchBy INT,
#strDate DATETIME=NULL,
#endDate DATETIME=NULL
AS
BEGIN
SELECT DISTINCT p.created_on AS paid_date
FROM reading re
JOIN billing_gen bg ON re.id = bg.reading_id
JOIN customer_registration cr ON bg.account_number = cr.account_number
JOIN payment p ON bg.bill_number = p.bill_number
JOIN customer_category cc ON cr.customer_category_id = cc.id
WHERE p.created_by = #searchBy
AND ( ( #strDate IS NULL )
OR Cast(Floor(Cast(p.created_on AS FLOAT)) AS DATETIME) >=
Cast(Floor(Cast(#strDate AS FLOAT)) AS DATETIME) )
AND ( ( #endDate IS NULL )
OR Cast(Floor(Cast(p.created_on AS FLOAT)) AS DATETIME) <=
Cast(Floor(Cast(#endDate AS FLOAT)) AS DATETIME) )
AND cr.branch_id = #branchId
ORDER BY p.created_on ASC;
END;
Check the value inside your procedure as below.
SELECT #branchId, #searchBy, #strDate,#endDate
And, then try to run the SQL manually with the same value. Also, make sure you have data in your table for your criteria.
Also, what exactly you are trying here ?
Cast(Floor(Cast(p.created_on AS FLOAT)) AS DATETIME)
While executing procedure, make sure you are passing properly value.
Print out all values that are coming (Just for testing).
#searchBy INT is of integer type. But i think "p.created_by =#searchBy" is a type of datetime or date , so it may also conflicts here, or display wrong result. In below line. p.created_by is treating as a datetime or date and #searchby in integer.
WHERE p.created_by = #searchBy
I have wrote this cursor for commission report. What happens is commission comes in one table, the records are another table. I match two based on certain critera (there is not exact match available). The problem is there are duplicates where records exist. When I match commission with the records table, it can result picking up these duplicates. Thus the rep gets paid more. On the other hand, there are duplicates in commission table also but those are valid beause they simple mean an account got paid for 2 months.
I wrote this query but it takes 5+ minutes to run. I have 50,000 records in records table and 100,000 in commission table. Is there any way I an improve this cursor?
/* just preparation of cursor, this is not time consuming */
CREATE TABLE #result
(
repid INT,
AccountNo VARCHAR(100),
supplier VARCHAR(15),
CompanyName VARCHAR(200),
StartDate DATETIME,
EndDate DATETIME,
Product VARCHAR(25),
commodity VARCHAR(25),
ContractEnd DATETIME,
EstUsage INT,
EnrollStatus VARCHAR(10),
EnrollDate DATETIME,
ActualEndDate DATETIME,
MeterStart DATETIME,
MeterEnd DATETIME,
ActualUsage INT
)
DECLARE #AccountNo VARCHAR(100)
DECLARE #supplier VARCHAR(10)
DECLARE #commodity VARCHAR(15)
DECLARE #meterstart DATETIME
DECLARE #meterEnd DATETIME
DECLARE #volume FLOAT
DECLARE #RepID INT
DECLARE #Month INT
DECLARE #Year INT
SET #repID = 80
SET #Month = 1
SET #year = 2012
/* the actual cursor */
DECLARE commission_cursor CURSOR FOR
SELECT AccountNo,
supplier,
commodity,
meterStart,
MeterEnd,
Volume
FROM commission
WHERE Datepart(m, PaymentDate) = #Month
AND Datepart(YYYY, PaymentDate) = #Year
OPEN commission_cursor
FETCH next FROM commission_cursor INTO #AccountNo, #supplier, #commodity, #MeterStart, #MeterEnd, #Volume;
WHILE ##fetch_status = 0
BEGIN
IF EXISTS (SELECT id
FROM Records
WHERE AccountNo = #AccountNo
AND supplier = #supplier
AND Commodity = #commodity
AND RepID = #repID)
INSERT INTO #result
SELECT TOP 1 RepID,
AccountNo,
Supplier,
CompanyName,
[Supplier Start Date],
[Supplier End Date],
Product,
Commodity,
[customer end date],
[Expected Usage],
EnrollStatus,
ActualStartDate,
ActualEndDate,
#meterstart,
#MeterEnd,
#volume
FROM Records
WHERE AccountNo = #AccountNo
AND supplier = #supplier
AND Commodity = #commodity
AND RepID = #repID
AND #MeterStart >= Dateadd(dd, -7, ActualStartDate)
AND #meterEnd <= Isnull(Dateadd(dd, 30, ActualEndDate), '2015-12-31')
FETCH next FROM commission_cursor INTO #AccountNo, #supplier, #commodity, #MeterStart, #MeterEnd, #Volume;
END
SELECT *
FROM #result
/* clean up */
CLOSE commission_cursor
DEALLOCATE commission_cursor
DROP TABLE #result
I have read answer to How to make a T-SQL Cursor faster?, for that what I get is rewrite this query in table form. But I do have another query which uses join and is lightening fast. The problem is, it can not differentiate between the dups in my records table.
Is there anything I can do to make is faster. This is primary question. If not, do you have any alternative way to do it.
I specifically need help with
Will using Views or store procedure help
I there a way I can use cache in Cursor to make it faster
Any other option in syntax
The very first option is to set the least resource intensive options for your cursor:
declare commission_cursor cursor
local static read_only forward_only
for
Next is to investigate whether you need a cursor at all. In this case I think you can do the same with a single pass and no loops:
;WITH x AS
(
SELECT
rn = ROW_NUMBER() OVER (PARTITION BY r.AccountNo, r.Supplier, r.Commodity, r.RepID
ORDER BY r.ActualEndDate DESC),
r.RepID,
r.AccountNo,
r.Supplier,
r.CompanyName,
StartDate = r.[Supplier Start Date],
EndDate = r.[Supplier End Date],
r.Product,
r.Commodity,
ContractEnd = r.[customer end date],
EstUsage = r.[Expected Usage],
r.EnrollStatus,
EnrollDate = r.ActualStartDate,
r.ActualEndDate,
c.MeterStart,
c.MeterEnd,
ActualUsage = c.Volume
FROM dbo.commission AS c
INNER JOIN dbo.Records AS r
ON c.AccountNo = r.AccountNo
AND c.Supplier = r.Supplier
AND c.Commodity = r.Commodity
AND c.RepID = r.RepID
WHERE
c.PaymentDate >= DATEADD(MONTH, #Month-1, CONVERT(CHAR(4), #Year) + '0101')
AND c.PaymentDate < DATEADD(MONTH, 1, CONVERT(CHAR(4), #Year) + '0101')
AND r.RepID = #RepID
)
SELECT RepID, AccountNo, Supplier, CompanyName, StartDate, EndDate,
Product, Commodity, ContractEnd, EstUsage, EnrollStatus, EnrollDate,
ActualEndDate, MeterStart, MeterEnd, ActualUsage
FROM x
WHERE rn = 1 --ORDER BY something;
If this is still slow, then the cursor probably wasn't the problem - the next step will be investigating what indexes might be implemented to make this query more efficient.
Temp tables are your friend
The way I solved my problem, merging data from two tables, removed duplicates in complex fashion and everything extremely fast was to use temporary table. This is what I did
Create a #temp table, fetch the merged data from both the tables. Make sure you include ID fields in both tables even if you do not required it. This will help remove duplicates.
Now you can do all sort of calculation on this table. Remove duplicates from table B, just remove duplicate table B IDs. Remove duplicates from table A, just remove duplicate table A Ids. There is more complexity to the problem but at least this is probably the best way to solve your problem and make it considerably faster if cursors are too expensive and takes considerable time to calculate. In my case it was taking +5 min. The #temp table query about about 5 sec, which had a lot more calculations in it.
While applying Aaron solution, the cursor did not get any faster. The second query was faster but it did not give me the correct answer, so finally I used temp tables. This is my own answer.
I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering to figure out what the best answer might be for ways to set up schema, queries, stuff.
So... I need to know the start and end dates for something and a value that is associated with in ID during that time frame. SO... we can the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define better? I filled the first table with about 100,000 rows of data and it takes about 10-12 seconds to set up in the format of the second table depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
this is what i used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare #dt datetime
declare #i int
declare #id int
set #id = 1
declare #rowcount int
set #rowcount = 0
declare #numrows int
while (#rowcount<100000)
begin
set #i = 1
set #dt = getdate()
set #numrows = Cast(((5 + 1) - 1) *
Rand() + 1 As tinyint)
while #i<=#numrows
begin
insert into #x values (#id, dateadd(d,#i,#dt), #i)
set #i = #i + 1
end
set #rowcount = #rowcount + #numrows
set #id = #id + 1
print #rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where sysdate between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
for anyone who can use LEAD Analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the 1st table (that uses only 1 date column) would be much much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x