SQL Calculate 20 business days, remove bank hols and reference to another query

SQL Calculate 20 business days, remove bank hols and reference to another query - sql

Sorry I am quite new to creating functions in SQL Server 2008 R2. I have largely been able to get by using T-SQL statements.
However I need to create a report that returns records with a date (program start date), that part is simple enough, however for each row date I want to calculate a target completion date based on 20 business days. I also want to avoid counting bank holidays too. I have a table named dCalendar which holds every day for the last few year with flags saying whether each day is a workday or bank holiday.
I have found lots of stuff on how to calculate the number of business days between two dates but this is more tricky.
I have created this function
ALTER function [warehouse].[MS_fnAddBusinessDays]
(#StartDate datetime,
#nDays int)
RETURNS TABLE
AS
RETURN
(SELECT calDt
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt
FROM
warehouse.dCalendar
WHERE
(calDt >= #StartDate)
AND (weekDayFg = 1)
AND (BankHolidayFg = 0)) AS Results
WHERE
(rownumber = #nDays)
and can call it using the following
SELECT *
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt, BankHolidayFg, weekDayFg, dateStr
FROM
warehouse.dCalendar
WHERE
(calDt >= CONVERT(DATETIME, '2016-12-11 00:00:00', 102))
AND (weekDayFg = 1) AND (BankHolidayFg = 0)) AS TblResults
WHERE
(rownumber = 20)
I just cannot work out how to embed this within the following example where progStartDate is the date i want to calculate the target date from
SELECT
end_user.fContactVwDn.client_no,
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
SUM(CASE WHEN comtContactMeetingType = 'Visit' THEN 1 ELSE 0 END) AS InitialVisitTotal,
MAX(CASE WHEN comtContactMeetingType = 'Visit' THEN contPlannedFromDate ELSE 0 END) AS InitialVisitDate
FROM
end_user.fContactVwDn
INNER JOIN
warehouse.fContactProgramme ON end_user.fContactVwDn.contKy = warehouse.fContactProgramme.contKy
INNER JOIN
end_user.fProgrammeVwDn ON warehouse.fContactProgramme.progGuid = end_user.fProgrammeVwDn.progprogGuid
GROUP BY
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate, end_user.fContactVwDn.client_no
HAVING
(end_user.fProgrammeVwDn.prtyProgrammeType = 'Application')
AND (end_user.fProgrammeVwDn.progStartDate > CONVERT(DATETIME, '2016-07-01 00:00:00', 102))
Any help would be much appreciated.

The function looks OK. I would add TOP, though, to avoid scanning the whole Calendar table. I hope, calDt is a primary key.
ALTER function [warehouse].[MS_fnAddBusinessDays]
(#StartDate datetime,
#nDays int)
RETURNS TABLE
AS
RETURN
(
SELECT calDt
FROM
(
SELECT TOP(#nDays)
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt
FROM
warehouse.dCalendar
WHERE
(calDt >= #StartDate)
AND (weekDayFg = 1)
AND (BankHolidayFg = 0)
ORDER BY calDt ASC
) AS Results
WHERE
rownumber = #nDays
)
This function is inline table-valued function, which means that it returns a table, not a scalar value. This is good, because scalar functions usually make queries slow, but inline (single-statement) table-valued functions can be inlined by the optimiser.
To call such function use CROSS APPLY. It was originally introduced to SQL Server specifically for calling table-valued functions, but it can do much more (it is so called lateral join).
I wrapped your original query in a CTE to make the final query readable.
WITH
CTE
AS
(
SELECT
end_user.fContactVwDn.client_no,
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
SUM(CASE WHEN comtContactMeetingType = 'Visit' THEN 1 ELSE 0 END) AS InitialVisitTotal,
MAX(CASE WHEN comtContactMeetingType = 'Visit' THEN contPlannedFromDate ELSE 0 END) AS InitialVisitDate
FROM
end_user.fContactVwDn
INNER JOIN warehouse.fContactProgramme ON end_user.fContactVwDn.contKy = warehouse.fContactProgramme.contKy
INNER JOIN end_user.fProgrammeVwDn ON warehouse.fContactProgramme.progGuid = end_user.fProgrammeVwDn.progprogGuid
GROUP BY
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
end_user.fContactVwDn.client_no
HAVING
(end_user.fProgrammeVwDn.prtyProgrammeType = 'Application')
AND (end_user.fProgrammeVwDn.progStartDate > CONVERT(DATETIME, '2016-07-01 00:00:00', 102))
)
SELECT
CTE.client_no,
CTE.prtyProgrammeType,
CTE.progStartDate,
CTE.InitialVisitTotal,
CTE.InitialVisitDate,
F.calDt
FROM
CTE
CROSS APPLY [warehouse].[MS_fnAddBusinessDays](CTE.progStartDate, 20) AS F
;

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!

The other answers miss some of the requirement..
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over the time:
DECLARE #Year INT = '2000';
DECLARE #YearCnt INT = 50 ;
DECLARE #StartDate DATE = DATEFROMPARTS(#Year, '01','01')
DECLARE #EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, #YearCnt, #StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, #StartDate, #EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, #StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [Date]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a rownumbering to your table, it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, CTE is cleaner) then we simply left join it to the calendar table. This means that for any date you don't have data, you still have the calendar date. For any days you do have data, the rownumber rn being a condition of the join means that only the first datetime from each day is present in the results
Note: something is wonky about your question . You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number - this can't be right, so I've guessed at fixing it but I suspect you have translated your query into something for SO, perhaps to hide real column names.. Also your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone is coming back and saying "hi your query doesn't work" then it turns out that they damaged it trying to translate it back to their own db, because they hid the real column names in the question..
FInally, do not use functions on table data in a where clause; it generally kills indexing. Always try and find a way of leaving table data alone. Want all of january? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc - SQLserver at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run

You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1

Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
results:

SQL - Adding conditions to SELECT

I have a table which has a timestamp and inCycle status of a machine. I'm using two CTE's and doing an INNER JOIN on row number so I can easily compare the timestamp of one row to the next. I have the DATEDIFF working and now I need to look at the inCycle status. Basically, if the inCycleThis and inCycleNext both = 1, I need to add it to an InCycle total.
Similarly (Shown table will make this clear):
incycleThis/next = 0,1 = not in cycle
incycleThis/next = 0,0 = not in cycle
incycleThis/next = 1,1 = in cycle
If I was doing this client side, this would be pretty simple. I need to do this in a stored procedure though due to there being a lot of records. I'd love to use an 'IF' in the SELECT section, but it seems that's not how it works.
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
This is what I have so far:
WITH History_CTE (DT, MID, FRO, IC, RowNum)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
),
History2_CTE (DT2, MID2, FRO2, IC2, RowNum2)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
)
SELECT DT as 'TimeStamp'
,DT2 as 'TimeStamp Next Row'
,MID
,FRO
,IC as 'InCycle this'
,IC2 as 'InCycle next'
,RowNum
,DATEDIFF(s, History2_CTE.DT2, History_CTE.DT) AS 'Diff_seconds'
FROM History_CTE
INNER JOIN
History2_CTE ON History_CTE.RowNum = History2_CTE.RowNum2 + 1

Consider adding a third CTE to first conditionally calculate your needed value. Then aggregate for final statement. Recall CTEs can reference previously defined CTEs. Be sure to always quailfy columns with table aliases in JOIN queries.
WITH
... first two ctes...
, sub AS (
SELECT h1.DT AS 'TimeStamp'
, h2.DT2 AS 'TimeStamp Next Row'
, h1.MID
, h1.FRO
, h1.IC AS 'InCycle this'
, h2.IC2 AS 'InCycle next'
, h1.RowNum
, DATEDIFF(s, h2.DT2, h1.DT) AS 'Diff_seconds'
, CASE
WHEN (h1.IC = 1 AND h2.IC2 = 1) OR (h1.IC= 1 AND h2.IC2 = 0)
THEN DATEDIFF(s, h2.DT2, h1.DT)
END AS 'IC_Diff_seconds'
FROM History_CTE h1
INNER JOIN History2_CTE h2
ON h1.RowNum = h2.RowNum2 + 1
)
SELECT SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
And if needing to add groupings, incorporate GROUP BY:
SELECT h1.MID
, h1.FRO
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY h1.MID
, h1.FRO
Even aggregate calculations by day:
SELECT CONVERT(date, [TimeStamp]) AS [Day]
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY CONVERT(date, [TimeStamp])

The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
As I understand your question, you just need to sum the difference betwen the timestamp of "in cycle" rows and the timestamp of the next row.
select machineid,
sum(datediff(s, dateandtime, lead_dateandtime)) as total_in_time
from (
select h.*,
lead(dateandtime) over(partition by machineid order by dateandtime) as lead_dateandtime
from history h
) h
where inclycle = 1
group by machineid

group by issue in sql

i'm trying to get in a new column the sessions who are between 08:00 and 18:00. You can see my last CASE in the CTE. For each date there should be a new column "TotalRestrictedSessions" which indicate how many session were on that particular date. If there are none, in this case i have to write 0. I suspect that my problem is when i convert the DATE?
WITH ParkeonCTE
AS
(
SELECT
OccDate = CONVERT(DATE, OC.LocalStartTime),
TotalOccSessions = COUNT(OC.SessionId),
AuthorityId,
TotalOccDuration = ISNULL(SUM(OC.DurationMinutes),0),
TotalNumberOfOverstay = SUM(CAST(OC.IsOverstay AS INT)),
TotalMinOfOverstays = ISNULL(SUM(OC.OverStayDurationMinutes),0),
(CASE
WHEN OC.OspId IS NULL THEN 'OffStreet' ELSE 'OnStreet'
END
) AS ParkingContextType,
(CASE
WHEN CAST(OC.LocalStartTime AS TIME) >= '08:00:00' AND CAST(OC.LocalStartTime AS TIME) <=
'18:00:00'
THEN COUNT(OC.SessionId)
END
) AS TotalRestrictedSessions
FROM Analytics.OccupancySessions AS OC
WHERE OC.AuthorityId IS NOT NULL
GROUP BY CONVERT(DATE,OC.LocalStartTime), OC.AuthorityId,OC.OspId
)
SELECT OC.OccDate,
OC.ParkingContextType,
OC.AuthorityId,
OC.TotalRestrictedSessions,
SUM(OC.TotalOccSessions) AS TotalOccSessions,
AVG(OC.TotalOccDuration) AS AvgOccMinutesDuration, -- wrong
SUM(OC.TotalOccDuration) AS TotalOccDuration,
SUM(OC.TotalNumberOfOverstay) AS TotalNumberOfOverstay,
SUM(OC.TotalMinOfOverstays) AS TotalMinOfOverstays,
CAST(AVG(OC.TotalMinOfOverstays) AS decimal(10,2)) AS AvgMinOfOverstays -- wrong
FROM ParkeonCTE AS OC
GROUP BY OC.OccDate, OC.AuthorityId, OC.ParkingContextType
ORDER BY OC.OccDate DESC

You just need to move your aggregation outside of your CASE expression, called conditional aggregation.
SUM(CASE
WHEN CAST(OC.LocalStartTime AS TIME) >= '08:00:00'
AND CAST(OC.LocalStartTime AS TIME) <= '18:00:00'
THEN 1
ELSE 0
END
) AS TotalRestrictedSessions
Generally, you should include the current query results and your desired results in your question to make it easier to figure out where the issues are.

Need to calc start and end date from single effective date

I am trying to write SQL to calculate the start and end date from a single date called effective date for each item. Below is a idea of how my data looks. There are times when the last effective date for an item will be in the past so I want the end date for that to be a year from today. The other two items in the table example have effective dates in the future so no need to create and end date of a year from today.
I have tried a few ways but always run into bad data. Below is an example of my query and the bad results
select distinct tb1.itemid,tb1.EffectiveDate as startdate
, case
when dateadd(d,-1,tb2.EffectiveDate) < getdate()
or tb2.EffectiveDate is null
then getdate() +365
else dateadd(d,-1,tb2.EffectiveDate)
end as enddate
from #test tb1
left join #test as tb2 on (tb2.EffectiveDate > tb1.EffectiveDate
or tb2.effectivedate is null) and tb2.itemid = tb1.itemid
left join #test tb3 on (tb1.EffectiveDate < tb3.EffectiveDate
andtb3.EffectiveDate <tb2.EffectiveDate or tb2.effectivedate is null)
and tb1.itemid = tb3.itemid
left join #test tb4 on tb1.effectivedate = tb4.effectivedate \
and tb1.itemid = tb4.itemid
where tb1.itemID in (62741,62740, 65350)
Results - there is an extra line for 62740
Bad Results
I expect to see below since the first two items have a future end date no need to create an end date of today + 365 but the last one only has one effective date so we have to calculate the end date.

I think I've read your question correctly. If you could provide your expected output it would help a lot.
Test Data
CREATE TABLE #TestData (itemID int, EffectiveDate date)
INSERT INTO #TestData (itemID, EffectiveDate)
VALUES
(62741,'2016-06-25')
,(62741,'2016-06-04')
,(62740,'2016-07-09')
,(62740,'2016-06-25')
,(62740,'2016-06-04')
,(65350,'2016-05-28')
Query
SELECT
a.itemID
,MIN(a.EffectiveDate) StartDate
,MAX(CASE WHEN b.MaxDate > GETDATE() THEN b.MaxDate ELSE CONVERT(date,DATEADD(yy,1,GETDATE())) END) EndDate
FROM #TestData a
JOIN (SELECT itemID, MAX(EffectiveDate) MaxDate FROM #TestData GROUP BY itemID) b
ON a.itemID = b.itemID
GROUP BY a.itemID
Result
itemID StartDate EndDate
62740 2016-06-04 2016-07-09
62741 2016-06-04 2016-06-25
65350 2016-05-28 2017-06-24

This should do it:
SELECT itemid
,effective_date AS "Start"
,(SELECT MIN(effective_date)
FROM effective_date_tbl
WHERE effective_date > edt.effective_date
AND itemid = edt.itemid) AS "End"
FROM effective_date_tbl edt
WHERE effective_date <
(SELECT MAX(effective_date) FROM effective_date_tbl WHERE itemid = edt.itemid)
UNION ALL
SELECT itemid
,effective_date AS "Start"
,(SYSDATE + 365) AS "End"
FROM effective_date_tbl edt
WHERE 1 = ( SELECT COUNT(*) FROM effective_date_table WHERE itemid = edt.itemid )
ORDER BY 1, 2, 3;

I did this exercise for Items that have multiple EffectiveDate in the table
you can create this view
CREATE view [VW_TESTDATA]
AS ( SELECT * FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Item,CONVERT(datetime,EffectiveDate,110)) AS ID, Item, DATA
FROM MyTable ) AS Q
)
so use a select to compare the same Item
select * from [VW_TESTDATA] as A inner join [VW_TESTDATA] as B on A.Item = B.Item and A.id = B.id-1
in this way you always minor and major Date
I did not understand how to handle dates with only one Item , but it seems the simplest thing and can be added to this query with a UNION ALL, because the view not cover individual Item
You also need to figure out how to deal with Item with two equal EffectiveDate

you should use the case when statement..
[wrong query because a misunderstand of the requirements]
SELECT
ItemID AS Item,
StartDate,
CASE WHEN EndDate < Sysdate THEN Sysdate + 365 ELSE EndDate END AS EndDate
FROM
(
SELECT tabStartDate.ItemID, tabStartDate.EffectiveDate AS StartDate, tabEndDate.EffectiveDate AS EndDate
FROM TableItems tabStartDate
JOIN TableItems tabEndDate on tabStartDate.ItemID = tabEndDate.ItemID
) TableDatesPerItem
WHERE StartDate < EndDate
update after clarifications in the OP and some comments
I found a solution quite portable, because it doesn't make use of partioning but endorses on a sort of indexing rule that make to correspond the dates of each item with others with the same id, in order of time's succession.
The portability is obviously related to the "difficult" part of query, while row numbering mechanism and conversion go adapted, but I think that it isn't a problem.
I sended a version for MySql that it can try on SQL Fiddle..
Table
CREATE TABLE ITEMS
(`ItemID` int, `EffectiveDate` Date);
INSERT INTO ITEMS
(`ItemID`, `EffectiveDate`)
VALUES
(62741, DATE(20160625)),
(62741, DATE(20160604)),
(62740, DATE(20160709)),
(62740, DATE(20160625)),
(62740, DATE(20160604)),
(62750, DATE(20160528))
;
Query
SELECT
RESULT.ItemID AS ItemID,
DATE_FORMAT(RESULT.StartDate,'%m/%d/%Y') AS StartDate,
CASE WHEN RESULT.EndDate < CURRENT_DATE
THEN DATE_FORMAT((CURRENT_DATE + INTERVAL 365 DAY),'%m/%d/%Y')
ELSE DATE_FORMAT(RESULT.EndDate,'%m/%d/%Y')
END AS EndDate
FROM
(
SELECT
tabStartDate.ItemID AS ItemID,
tabStartDate.StartDate AS StartDate,
tabEndDate.EndDate
,tabStartDate.IDX,
tabEndDate.IDX AS IDX2
FROM
(
SELECT
tabStartDateIDX.ItemID AS ItemID,
tabStartDateIDX.EffectiveDate AS StartDate,
#rownum:=#rownum+1 AS IDX
FROM ITEMS AS tabStartDateIDX
ORDER BY tabStartDateIDX.ItemID, tabStartDateIDX.EffectiveDate
)AS tabStartDate
JOIN
(
SELECT
tabEndDateIDX.ItemID AS ItemID,
tabEndDateIDX.EffectiveDate AS EndDate,
#rownum:=#rownum+1 AS IDX
FROM ITEMS AS tabEndDateIDX
ORDER BY tabEndDateIDX.ItemID, tabEndDateIDX.EffectiveDate
)AS tabEndDate
ON tabStartDate.ItemID = tabEndDate.ItemID AND (tabEndDate.IDX - tabStartDate.IDX = ((select count(*) from ITEMS)+1) )
,(SELECT #rownum:=0) r
UNION
(
SELECT
tabStartDateSingleItem.ItemID AS ItemID,
tabStartDateSingleItem.EffectiveDate AS StartDate,
tabStartDateSingleItem.EffectiveDate AS EndDate
,0 AS IDX,0 AS IDX2
FROM ITEMS AS tabStartDateSingleItem
Group By tabStartDateSingleItem.ItemID
HAVING Count(tabStartDateSingleItem.ItemID) = 1
)
) AS RESULT
;

SQL Table-value function optimisation / improvement

We've got a query that is taking a very long time to complete with a large dataset. I think I've tracked it down to a table-value function in the SQL server.
The query is designed to return the difference in printing usage between two dates. So if a printer had usage of 100 at date x and 200 at date y a row needs to be returned which reflects that it has had a usage change of 100.
These readings are taken periodically (but not every day) and stored in a table called MeterReadings. The code for the table-value function is below. This is then called from another SQL query which joins the returned table on a devices table with an inner join to get extra device information.
Any advise as to how to optimise the below would be appreciated.
ALTER FUNCTION [dbo].[DeviceUsage]
-- Add the parameters for the stored procedure here
( #StartDate DateTime , #EndDate DateTime )
RETURNS table
AS
RETURN
(
SELECT MAX(dbo.MeterReadings.ScanDateTime) AS MX,
MAX(dbo.MeterReadings.DeviceTotal - reading.DeviceTotal) AS TotalDiff,
MAX(dbo.MeterReadings.TotalCopy - reading.TotalCopy) AS CopyDiff,
MAX(dbo.MeterReadings.TotalPrint - reading.TotalPrint) AS PrintDiff,
MAX(dbo.MeterReadings.TotalScan - reading.TotalScan) AS ScanDiff,
MAX(dbo.MeterReadings.TotalFax - reading.TotalFax) AS FaxDiff,
MAX(dbo.MeterReadings.TotalMono - reading.TotalMono) AS MonoDiff,
MAX(dbo.MeterReadings.TotalColour - reading.TotalColour) AS ColourDiff,
MIN(reading.ScanDateTime) AS MN, dbo.MeterReadings.DeviceID
FROM dbo.MeterReadings INNER JOIN (SELECT * FROM dbo.MeterReadings WHERE
(dbo.MeterReadings.ScanDateTime > #StartDate) AND
(dbo.MeterReadings.ScanDateTime < #EndDate) )
AS reading ON dbo.MeterReadings.DeviceID = reading.DeviceID
WHERE (dbo.MeterReadings.ScanDateTime > #StartDate) AND (dbo.MeterReadings.ScanDateTime < #EndDate)
GROUP BY dbo.MeterReadings.DeviceID);

On the assumption that a value can only ever increase over time, it can certainly be simplified.
SELECT
DeviceID,
MIN(ScanDateTime) AS MN,
MAX(ScanDateTime) AS MX,
MAX(DeviceTotal ) - MIN(DeviceTotal) AS TotalDiff,
MAX(TotalCopy ) - MIN(TotalCopy ) AS CopyDiff,
MAX(TotalPrint ) - MIN(TotalPrint ) AS PrintDiff,
MAX(TotalScan ) - MIN(TotalScan ) AS ScanDiff,
MAX(TotalFax ) - MIN(TotalFax ) AS FaxDiff,
MAX(TotalMono ) - MIN(TotalMono ) AS MonoDiff,
MAX(TotalColour ) - MIN(TotalColour) AS ColourDiff
FROM
dbo.MeterReadings
WHERE
ScanDateTime > #StartDate
AND ScanDateTime < #EndDate
GROUP BY
DeviceID
This assumes that if you have reading on dates 1, 3, 5, 7, 9 and you want to report on 2 -> 8 then you want reading 7 - reading 3. I would have thought you wanted reading 7 - reading 1?
The above query should be fine for relatively small ranges. If you have Huge ranges of time, the MAX() - MIN() will be operating on large numbers of rows. This can then possibly be improved even further with the following (with correlated sub-queries to lookup just the two rows that you want).
As a side benefit, this also works even if the values can go down as well as up.
(I assume the existance of a Device table for a simpler query and faster performance.)
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
finish.DeviceTotal - start.DeviceTotal AS TotalDiff,
finish.TotalCopy - start.TotalCopy AS CopyDiff,
finish.TotalPrint - start.TotalPrint AS PrintDiff,
finish.TotalScan - start.TotalScan AS ScanDiff,
finish.TotalFax - start.TotalFax AS FaxDiff,
finish.TotalMono - start.TotalMono AS MonoDiff,
finish.TotalColour - start.TotalColour AS ColourDiff
FROM
dbo.Device AS device
INNER JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MIN(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > #startDate
AND ScanDateTime < #endDate)
INNER JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > #startDate
AND ScanDateTime < #endDate)
This can also be modified to pick up the start as being the first date on or before #startDate, if required.
EDIT: Modification to pick the start reading as being for the first date on or before #startDate.
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
COALESCE(finish.DeviceTotal, 0) - COALESCE(start.DeviceTotal, 0) AS TotalDiff,
COALESCE(finish.TotalCopy , 0) - COALESCE(start.TotalCopy , 0) AS CopyDiff,
COALESCE(finish.TotalPrint , 0) - COALESCE(start.TotalPrint , 0) AS PrintDiff,
COALESCE(finish.TotalScan , 0) - COALESCE(start.TotalScan , 0) AS ScanDiff,
COALESCE(finish.TotalFax , 0) - COALESCE(start.TotalFax , 0) AS FaxDiff,
COALESCE(finish.TotalMono , 0) - COALESCE(start.TotalMono , 0) AS MonoDiff,
COALESCE(finish.TotalColour, 0) - COALESCE(start.TotalColour, 0) AS ColourDiff
FROM
dbo.Device AS device
LEFT JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < #startDate)
LEFT JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < #endDate)

Your query seems to compute a cross-product of all readings in a time range for each particular device. This works semantically because the MIN and MAX aggregates don't care about duplicates. But this is very slow. If you are comparing 100 dates with themselves you need to process 10,000 rows.
I suggest you calculate the MIN and MAX values for each metric/column over the entire time interval and then subtract them. That way you don't need to join and you need a single pass ofer the data. Like this:
select Diff = MAX(col) - MIN(col)
from readings
group by DeviceID

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Calculate 20 business days, remove bank hols and reference to another query - sql

Related

Delete the records repeated by date, and keep the oldest

SQL - Adding conditions to SELECT

group by issue in sql

Need to calc start and end date from single effective date

SQL Table-value function optimisation / improvement

Categories

Resources