Any ideas on building a SQL Server (2008) query that will give me, say, the "date-specific prices for an item based on the default, or the override where one exists"?
So a Default table might look like this - columns Price, StartDate, EndDate (yyyy-M-d):
Default: $10, 2010-1-1, 2010-2-1
The Override table like this:
Override: $12, 2010-1-5, 2010-1-8
And the query would return:
Result: $10, 2010-1-1, 2010-1-4
$12, 2010-1-5, 2010-1-8
$10, 2010-1-9, 2010-2-1
Probably I'd wrap this in a stored proc or function and call it for a specific date range.
Something like:
SELECT
D.Price, ISNULL(O.StartDate, D.StartDate), ISNULL(O.EndDate, D.EndDate)
FROM
Default D
LEFT JOIN
Override O ON D.Price= O.Price
The proper design is to have only one table. You don't want the override table at all. Just keep everything in a single table, constrained by the date range. The query becomes much simpler as well.
Your table structure becomes
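CREATE TABLE Rates
(ID INT NOT NULL,
Rate DECIMAL(10,2) NOT NULL, -- precision and scale are an assumption; plain DECIMAL keeps no fractional digits
FromDate DATE NOT NULL,
ToDate DATE NOT NULL,
CONSTRAINT PK_RATES PRIMARY KEY (ID, FromDate, ToDate))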
Then the query becomes
SELECT Rate FROM Rates WHERE ID = @ID AND FromDate = (SELECT MAX(FromDate) FROM Rates WHERE ID = @ID AND FromDate <= @Date) AND ToDate = (SELECT MIN(ToDate) FROM Rates WHERE ID = @ID AND ToDate >= @Date)
I think you're going to have to do something like this (assuming that you want to keep the override dates separate, and that you want to avoid anything procedural):
define and populate a utility table, listing each individual date which could be relevant to you
construct a SELECT that "marks" each date from this utility table as either i) belonging to the override dates or ii) belonging to the default dates
group the results of this SELECT by date and "mark"
join these results back to the respective price info
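As a rough illustration of those four steps outside the database, here is a minimal Python sketch. The in-memory lists stand in for the Default and Override tables, and the names price_on and effective_prices are made up for this example; it only mirrors the logic and is not the T-SQL itself.

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical in-memory stand-ins for the Default and Override tables:
# (price, start date, end date), both dates inclusive.
defaults = [(10, date(2010, 1, 1), date(2010, 2, 1))]
overrides = [(12, date(2010, 1, 5), date(2010, 1, 8))]


def price_on(day, rows):
    """Return the price of the first row whose inclusive range covers day, else None."""
    for price, start, end in rows:
        if start <= day <= end:
            return price
    return None


def effective_prices(defaults, overrides, first_day, last_day):
    # Step 1: the "utility table" listing every individual date in the range.
    n_days = (last_day - first_day).days + 1
    days = [first_day + timedelta(days=i) for i in range(n_days)]

    # Step 2: mark each date as override if an override covers it, else default.
    marked = []
    for day in days:
        price, source = price_on(day, overrides), "override"
        if price is None:
            price, source = price_on(day, defaults), "default"
        marked.append((day, source, price))

    # Steps 3 and 4: collapse runs of consecutive dates that share mark and price.
    result = []
    for (source, price), run in groupby(marked, key=lambda m: (m[1], m[2])):
        run = list(run)
        if price is not None:  # skip dates that no table covers
            result.append((price, run[0][0], run[-1][0]))
    return result


rows = effective_prices(defaults, overrides, date(2010, 1, 1), date(2010, 2, 1))
for price, start, end in rows:
    print(price, start, end)
```

For the sample data in the question this yields the three rows asked for: $10 up to 2010-01-04, $12 from 2010-01-05 to 2010-01-08, and $10 again from 2010-01-09.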
A bit late, but I was looking for a solution to a similar problem and did not find an answer, so I tried to cook one up.
One solution is to adjust the first set of data (the defaults) so that it fits around the second set (the overrides, labelled Overlap in the diagrams below), and then, as the last step, union the adjusted defaults with the override data.
So we need to adjust the default set, and that requires several steps.
Assumptions:
The default periods don't overlap one another.
The override periods don't overlap one another.
Gaps between override periods must be longer than one day, so that adjusting a default period never affects another override period. The override values themselves are not lost, because we get them back in the last step.
For default periods that are partially overlapped (the overlap is larger than or equal to a default period):
If a period's start date falls inside an override period, we change the start date to the end of that override period plus one day.
Default:             |<-------- P1 ------>|
Overlap: |<-----------O1------>|
==================================================================
Output:                         |<---P1-->|
If a period's end date falls inside an override period, we change the end date to the start of that override period minus one day.
Default: |<-------- P1 ------>|
Overlap:             |<-----------O1------>|
==================================================================
Output:  |<---P1--->|
Remove any default period that is completely covered by an override period.
Default:    |<--- P1 --->|
Overlap: |<--------O1------>|
==================================================================
Output:  nothing
For default periods that are broken up by override periods (the overlap is smaller than the period):
Get the first part (from the beginning of the period to the first override period).
Get the intermediate parts, that is, from the end of each override period to the start of the next one.
Get the last part (from the last override period to the end of the period).
Default: |<---------------------------- P1 ---------------------------->|
Overlap:         |<---O1--->|        |<---O2-->|        |<--O3->|
==================================================================
Output:  |<-P1->|            |<-P1->|           |<-P1->|         |<-P1->|
Finally, merge the data and get the result.
Overlapping types that we are considering:
Default: |<---- P1 --->|
Overlap:                |<-----------O1------>|
==================================================================
Output:  |<---- P1 --->||<-----------O1------>|
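The cases drawn above all come down to subtracting the override periods from one default period. A minimal Python sketch of that subtraction follows, under the same assumptions (inclusive dates, overrides that do not overlap one another). The function name subtract_overrides is hypothetical, and the sample dates are taken from the T-SQL test data used further down.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def subtract_overrides(period, overrides):
    """Split one inclusive (start, end) default period around override periods.

    Returns the pieces of the default period that no override covers.
    """
    start, end = period
    pieces = []
    cursor = start  # first day of the default period not yet accounted for
    for o_start, o_end in sorted(overrides):
        if o_end < cursor or o_start > end:
            continue  # this override does not touch what is left of the period
        if o_start > cursor:
            pieces.append((cursor, o_start - ONE_DAY))  # part before the override
        cursor = max(cursor, o_end + ONE_DAY)  # resume the day after the override
        if cursor > end:
            break  # the rest of the period is covered
    if cursor <= end:
        pieces.append((cursor, end))  # tail after the last override
    return pieces


overrides = [
    (date(2017, 2, 12), date(2017, 3, 25)),
    (date(2017, 3, 28), date(2017, 5, 15)),
    (date(2017, 5, 18), date(2017, 5, 20)),
]

# March is cut on both sides, April is fully covered, May is broken in two.
march = subtract_overrides((date(2017, 3, 1), date(2017, 3, 31)), overrides)
april = subtract_overrides((date(2017, 4, 1), date(2017, 4, 30)), overrides)
may = subtract_overrides((date(2017, 5, 1), date(2017, 5, 31)), overrides)
print(march, april, may, sep="\n")
```

The set-based T-SQL that follows reaches the same pieces with joins instead of a cursor-style loop.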
Let's build the T-SQL:
With [System] as (
Select 1 [RowNum],cast('Jan 01,2017' as date) [StartDate],dateadd(day,-1,cast('Feb 01,2017' as date)) [EndDate],0500 [Value] union all
Select 2 [RowNum],cast('Feb 01,2017' as date) [StartDate],dateadd(day,-1,cast('Mar 01,2017' as date)) [EndDate],0700 [Value] union all
Select 3 [RowNum],cast('Mar 01,2017' as date) [StartDate],dateadd(day,-1,cast('Apr 01,2017' as date)) [EndDate],0900 [Value] union all
Select 4 [RowNum],cast('Apr 01,2017' as date) [StartDate],dateadd(day,-1,cast('May 01,2017' as date)) [EndDate],0700 [Value] union all
Select 5 [RowNum],cast('May 01,2017' as date) [StartDate],dateadd(day,-1,cast('Jun 01,2017' as date)) [EndDate],0900 [Value] union all
Select 6 [RowNum],cast('Jun 01,2017' as date) [StartDate],dateadd(day,-1,cast('Jul 01,9999' as date)) [EndDate],1500 [Value]
),Overrides as (
Select 1 [RowNum],cast('Feb 12,2017' as date) [StartDate],cast('Mar 25,2017' as date) [EndDate],1 [Value] union all
Select 2 [RowNum],cast('Mar 28,2017' as date) [StartDate],cast('May 15,2017' as date) [EndDate],2 [Value] union all
Select 3 [RowNum],cast('May 18,2017' as date) [StartDate],cast('May 20,2017' as date) [EndDate],3 [Value] union all
Select 4 [RowNum],cast('Jun 05,2017' as date) [StartDate],cast('Jun 08,2017' as date) [EndDate],4 [Value] union all
Select 5 [RowNum],cast('Jun 09,2017' as date) [StartDate],cast('Jun 16,2017' as date) [EndDate],5 [Value] union all
Select 6 [RowNum],cast('Jun 17,2017' as date) [StartDate],cast('Jun 22,2017' as date) [EndDate],6 [Value] union all
Select 7 [RowNum],cast('Jun 23,2017' as date) [StartDate],cast('Jun 27,2017' as date) [EndDate],7 [Value]
),PrepareOverridePeriods as (--if override periods have no gaps between them, we need to merge them
Select p1.StartDate, p1.EndDate
from Overrides p1
left join Overrides p2 on p1.StartDate = DATEADD(day,1,p2.EndDate)
where p2.StartDate is null
union all
select p1.StartDate,p2.EndDate
from PrepareOverridePeriods p1
inner join Overrides p2 on p1.EndDate = DATEADD(day,-1,p2.StartDate)
),OverridePeriods as (
select ROW_NUMBER() over (order by StartDate) [RowNum],StartDate,MAX(EndDate) as EndDate
from PrepareOverridePeriods group by StartDate
),AdjustedPeriods as (
select s.RowNum,'Adj.' [type]
,isnull(dateadd(day,1,ShiftRight.EndDate),s.StartDate) [StartDate]
,isnull(dateadd(day,-1,ShiftLeft.StartDate),s.EndDate) [EndDate]
,s.Value
from System s
left outer join OverridePeriods ShiftRight on s.StartDate between ShiftRight.StartDate and ShiftRight.EndDate
left outer join OverridePeriods ShiftLeft on s.EndDate between ShiftLeft.StartDate and ShiftLeft.EndDate
left outer join OverridePeriods RemovePeriod on s.StartDate between RemovePeriod.StartDate and RemovePeriod.EndDate and s.EndDate between RemovePeriod.StartDate and RemovePeriod.EndDate
where RemovePeriod.StartDate is null
),SmallOverrides as ( --TODO: change SystemCalculated to AdjustSystemCalculatedPeriods
select ROW_NUMBER() over (partition by s.RowNum order by o.StartDate ) [RowNum],
o.RowNum [OverrideRowNum],o.StartDate [OverrideStartDate],o.EndDate [OverrideEndDate],s.Value [Value]
,s.RowNum [SystemRowNum],s.StartDate [SystemStartDate],s.EndDate [SystemEndDate]
from OverridePeriods o
inner join AdjustedPeriods s on o.StartDate between s.StartDate and s.EndDate and o.EndDate between s.StartDate and s.EndDate
)
--,FirstAndLastParts as (
--select [SystemRowNum],[type]
-- ,case when [type]='First' then min([SystemStartDate]) else dateadd(day,1,max(OverrideEndDate)) end [StartDate]
-- ,case when [type]='First' then dateadd(day,-1,min(OverrideStartDate)) else max([SystemEndDate]) end [EndDate]
-- ,min(Value) [Value]
-- from (select *,'First' [type] from SmallOverrides o union all
-- select *,'Last' [type] from SmallOverrides o) data
-- group by [SystemRowNum],[type]
--)
,FirstParts as (
select [SystemRowNum],'First' [type]
,min([SystemStartDate]) [StartDate]
,dateadd(day,-1,min(OverrideStartDate)) [EndDate]
,min(Value) [Value]
from SmallOverrides
group by [SystemRowNum]
),LastParts as (
select [SystemRowNum],'Last' [type]
,dateadd(day,1,max(OverrideEndDate)) [StartDate]
,max([SystemEndDate]) [EndDate]
,min(Value) [Value]
from SmallOverrides
group by [SystemRowNum]
),IntermediatParts as (
select s.SystemRowNum [RowNum],'Inter.' [type]
,dateadd(day,1,s.OverrideEndDate) [StartDate]
,dateadd(day,-1,e.OverrideStartDate) [EndDate]
,s.Value
from SmallOverrides s
left outer join SmallOverrides e on e.SystemRowNum=s.SystemRowNum and s.RowNum+1=e.RowNum
where e.RowNum is not null --remove the first and lasts
),AdjustedPeriodsFiltered as (--remove blocks that are broken to smaller pieces
select s.*
from AdjustedPeriods s
left outer join OverridePeriods o on o.StartDate between s.StartDate and s.EndDate and o.EndDate between s.StartDate and s.EndDate
where o.StartDate is null
),AllParts as (
select * from IntermediatParts union all --order by SystemRowNum,OverrideStartDate
select * from FirstParts union all
select * from LastParts union all
select * from AdjustedPeriodsFiltered
),Merged as (
select [RowNum],[type] [Source],StartDate,EndDate,Value,'System' [RecordType] from AllParts
union all
select [RowNum],'override' [Source],StartDate,EndDate,Value,'Override' [RecordType] from Overrides
)
select * from Merged order by StartDate
Can we adjust the data set in a different way? Yes. Another approach is to collect all the start and end dates of the default periods and the override periods, reconstruct a new set of periods from them, link those to the default values, and merge the result with the overrides.
The same assumptions apply, and the steps are as below:
With [System] as (
Select 1 [RowNum],cast('Jan 01,2017' as date) [StartDate],dateadd(day,-1,cast('Feb 01,2017' as date)) [EndDate],0500 [Value] union all
Select 2 [RowNum],cast('Feb 01,2017' as date) [StartDate],dateadd(day,-1,cast('Mar 01,2017' as date)) [EndDate],0700 [Value] union all
Select 3 [RowNum],cast('Mar 01,2017' as date) [StartDate],dateadd(day,-1,cast('Apr 01,2017' as date)) [EndDate],0900 [Value] union all
Select 4 [RowNum],cast('Apr 01,2017' as date) [StartDate],dateadd(day,-1,cast('May 01,2017' as date)) [EndDate],0700 [Value] union all
Select 5 [RowNum],cast('May 01,2017' as date) [StartDate],dateadd(day,-1,cast('Jun 01,2017' as date)) [EndDate],0900 [Value] union all
Select 6 [RowNum],cast('Jun 01,2017' as date) [StartDate],dateadd(day,-1,cast('Jul 01,9999' as date)) [EndDate],1500 [Value]
),Overrides as (
Select 1 [RowNum],cast('Feb 12,2017' as date) [StartDate],cast('Mar 25,2017' as date) [EndDate],1 [Value] union all
Select 2 [RowNum],cast('Mar 28,2017' as date) [StartDate],cast('May 15,2017' as date) [EndDate],2 [Value] union all
Select 3 [RowNum],cast('May 18,2017' as date) [StartDate],cast('May 20,2017' as date) [EndDate],3 [Value] union all
Select 4 [RowNum],cast('Jun 05,2017' as date) [StartDate],cast('Jun 08,2017' as date) [EndDate],4 [Value] union all
Select 5 [RowNum],cast('Jun 09,2017' as date) [StartDate],cast('Jun 16,2017' as date) [EndDate],5 [Value] union all
Select 6 [RowNum],cast('Jun 17,2017' as date) [StartDate],cast('Jun 22,2017' as date) [EndDate],6 [Value] union all
Select 7 [RowNum],cast('Jun 23,2017' as date) [StartDate],cast('Jun 27,2017' as date) [EndDate],7 [Value]
),PrepareOverridePeriods as (--if override periods have no gaps between them, we need to merge them
Select p1.StartDate, p1.EndDate
from Overrides p1
left join Overrides p2 on p1.StartDate = DATEADD(day,1,p2.EndDate)
where p2.StartDate is null
union all
select p1.StartDate,p2.EndDate
from PrepareOverridePeriods p1
inner join Overrides p2 on p1.EndDate = DATEADD(day,-1,p2.StartDate)
),OverridePeriods as (
select ROW_NUMBER() over (order by StartDate) [RowNum],StartDate,MAX(EndDate) as EndDate
from PrepareOverridePeriods group by StartDate
)
,AllDates as (
select ROW_NUMBER() over (order by [Date]) [RowNum],data.Date from (
select dateadd(day,-1,OverridePeriods.StartDate) [Date] from OverridePeriods union all
select dateadd(day,+1,OverridePeriods.EndDate) [Date] from OverridePeriods union all
select StartDate [Date] from [System] union all
select EndDate [Date] from [System] ) as data
)
,NewPeriods as (
select sy.RowNum, s.[Date] [StartDate],n.[Date] [EndDate] ,sy.Value
from AllDates s
left outer join AllDates n on n.RowNum=s.RowNum+1
left outer join OverridePeriods o on s.[Date] between o.StartDate and o.EndDate and n.[Date] between o.StartDate and o.EndDate
left outer join [System] sy on s.[Date] between sy.StartDate and sy.EndDate
where
s.RowNum % 2 =1 and o.StartDate is null--group it by 2 and remove overridden areas
)
,Merged2 as (
select [RowNum], StartDate,EndDate,Value,'System' [RecordType] from NewPeriods
union all
select [RowNum], StartDate,EndDate,Value,'Override' [RecordType] from Overrides
)
select * from Merged2 order by StartDate
I'm sure there may be another way to achieve the required result with some recursive approach, but for now this works for me.
As a last step we could merge adjacent results that have the same value, but I don't think that was requested.
Related
I have a table which has the following columns: DeskID *, ProductID *, Date *, Amount (where the columns marked with * make the primary key). The products in use vary over time, as represented in the image below.
Table format on the left, and a (hopefully) intuitive representation of the data on the right for one desk
The objective is to have the sum of the latest amounts of products by desk and date, including products which are no longer in use, over a date range.
e.g. using the data above the desired table is:
So on the 1st Jan, the sum is 1 of Product A
On the 2nd Jan, the sum is 2 of A and 5 of B, so 7
On the 4th Jan, the sum is 1 of A (out of use, so take the value from the 3rd), 5 of B, and 2 of C, so 8 in total
etc.
I have tried using a partition on the desk and product, ordered by date, to get the most recent value, and turned the following code into a function (Function1 below) with a @date Date parameter:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum' from (
    select @date 'Date', t.DeskID, t.ProductID, t.Amount
    , row_number() over (partition by t.DeskID, t.ProductID order by t.Date desc) as roworder
    from Table1 t
    where 1 = 1
    and t.Date <= @date
) t
where t.roworder = 1
group by t.DeskID
And then using a utility calendar table and cross apply to get the required values over a time range, as below
select * from Calendar c
cross apply Function1(c.CalendarDate)
where c.CalendarDate >= '20190101' and c.CalendarDate <= '20191009'
This has the expected results, but is far too slow. Currently each desk uses around 50 products, and the products roll every month, so after just 5 years each desk has a history of ~3000 products, which causes the whole thing to grind to a halt. (Roughly 30 seconds for a range of a single month)
Is there a better approach?
Changing your function to the following should be faster:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum'
FROM (SELECT m.DeskID, m.ProductID, MAX(m.[Date]) AS MaxDate
      FROM Table1 m
      where m.[Date] <= @date
      -- GROUP BY is needed because DeskID and ProductID are selected next to MAX()
      GROUP BY m.DeskID, m.ProductID) d
INNER JOIN Table1 t
    ON d.DeskID=t.DeskID
    AND d.ProductID=t.ProductID
    and t.[Date] = d.MaxDate
group by t.DeskID
The performance of TVF usually suffers. The following removes the TVF completely:
-- DROP TABLE Table1;
CREATE TABLE Table1 (DeskID int not null, ProductID nvarchar(32) not null, [Date] Date not null, Amount int not null, PRIMARY KEY ([Date],DeskID,ProductID));
INSERT Table1(DeskID,ProductID,[Date],Amount)
VALUES (1,'A','2019-01-01',1),(1,'A','2019-01-02',2),(1,'B','2019-01-02',5),(1,'A','2019-01-03',1)
,(1,'B','2019-01-03',4),(1,'C','2019-01-03',3),(1,'B','2019-01-04',5),(1,'C','2019-01-04',2),(1,'C','2019-01-05',2)
GO
DECLARE @StartDate date=N'2019-01-01';
DECLARE @EndDate date=N'2019-01-05';
;WITH cte_p
AS
(
    -- every desk/product pair seen up to the end of the range
    SELECT DISTINCT DeskID,ProductID
    FROM Table1
    WHERE [Date] <= @EndDate
),
cte_a
AS
(
    -- anchor: latest known amount of each product on the start date
    SELECT @StartDate AS [Date], p.DeskID, p.ProductID, ISNULL(a.Amount,0) AS Amount
    FROM (
        SELECT t.DeskID, t.ProductID
        , MAX(t.Date) AS FirstDate
        FROM Table1 t
        WHERE t.Date <= @StartDate
        GROUP BY t.DeskID, t.ProductID) f
    INNER JOIN Table1 a
        ON f.DeskID=a.DeskID
        AND f.ProductID=a.ProductID
        AND f.[FirstDate]=a.[Date]
    RIGHT JOIN cte_p p
        ON p.DeskID=a.DeskID
        AND p.ProductID=a.ProductID
    UNION ALL
    -- next day has a new row: take the new amount
    SELECT DATEADD(DAY,1,a.[Date]) AS [Date], t.DeskID, t.ProductID, t.Amount
    FROM Table1 t
    INNER JOIN cte_a a
        ON t.DeskID=a.DeskID
        AND t.ProductID=a.ProductID
        AND t.[Date] > a.[Date]
        AND t.[Date] <= DATEADD(DAY,1,a.[Date])
    WHERE a.[Date]<@EndDate
    UNION ALL
    -- next day has no new row: carry the previous amount forward
    SELECT DATEADD(DAY,1,a.[Date]) AS [Date], a.DeskID, a.ProductID, a.Amount
    FROM cte_a a
    WHERE NOT EXISTS(SELECT 1 FROM Table1 t
        WHERE t.DeskID=a.DeskID
        AND t.ProductID=a.ProductID
        AND t.[Date] > a.[Date]
        AND t.[Date] <= DATEADD(DAY,1,a.[Date]))
    AND a.[Date]<@EndDate
)
SELECT [Date], DeskID, SUM(Amount)
FROM cte_a
GROUP BY [Date], DeskID;
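To make the carry-forward logic easier to check by hand, here is a minimal Python sketch of the same idea as a single pass over the dates. It uses the sample rows from the Table1 script above; the names apply_changes and daily_sums are hypothetical and only illustrate the technique, not the query plan SQL Server would choose.

```python
from collections import defaultdict
from datetime import date, timedelta

# Same sample rows as the Table1 script above: (DeskID, ProductID, Date, Amount).
rows = [
    (1, "A", date(2019, 1, 1), 1), (1, "A", date(2019, 1, 2), 2),
    (1, "B", date(2019, 1, 2), 5), (1, "A", date(2019, 1, 3), 1),
    (1, "B", date(2019, 1, 3), 4), (1, "C", date(2019, 1, 3), 3),
    (1, "B", date(2019, 1, 4), 5), (1, "C", date(2019, 1, 4), 2),
    (1, "C", date(2019, 1, 5), 2),
]


def apply_changes(changes, latest, totals):
    """Replace each product's latest amount and adjust the desk total by the difference."""
    for desk, product, amount in changes:
        totals[desk] += amount - latest.get((desk, product), 0)
        latest[(desk, product)] = amount


def daily_sums(rows, start, end):
    """Sum the latest known amount of every product, per desk and day."""
    by_date = defaultdict(list)
    for desk, product, day, amount in rows:
        by_date[day].append((desk, product, amount))

    latest = {}                # (desk, product) -> most recent amount seen so far
    totals = defaultdict(int)  # desk -> sum of the values currently in `latest`

    # Rows dated before the requested range still seed the carried-forward state.
    for day in sorted(d for d in by_date if d < start):
        apply_changes(by_date[day], latest, totals)

    result = []
    day = start
    while day <= end:
        apply_changes(by_date.get(day, []), latest, totals)
        for desk in sorted(totals):
            result.append((day, desk, totals[desk]))
        day += timedelta(days=1)
    return result


sums = daily_sums(rows, date(2019, 1, 1), date(2019, 1, 5))
for day, desk, total in sums:
    print(day, desk, total)
```

Each source row is touched once, so the work grows with the number of rows plus the number of days, rather than with rows times days.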
Initial Question
Given the following dataset paired with a dates table:
MembershipId | ValidFromDate | ValidToDate
==========================================
0001 | 1997-01-01 | 2006-05-09
0002 | 1997-01-01 | 2017-05-12
0003 | 2005-06-02 | 2009-02-07
How many Memberships were open on any given day or timeseries of days?
Initial Answer
Following this question being asked here, this answer provided the necessary functionality:
select d.[Date]
,count(m.MembershipID) as MembershipCount
from DIM.[Date] as d
left join Memberships as m
on(d.[Date] between m.ValidFromDateKey and m.ValidToDateKey)
where d.CalendarYear = 2016
group by d.[Date]
order by d.[Date];
though a commenter remarked that "there are other approaches when the non-equijoin takes too long."
Followup
As such, what would the equijoin only logic look like to replicate the output of the query above?
Progress So Far
From the answers provided so far I have come up with the below, which outperforms on the hardware I am working with across 3.2 million Membership records:
declare @s date = '20160101';
declare @e date = getdate();
with s as
(
    select d.[Date] as d
    ,count(s.MembershipID) as s
    from dbo.Dates as d
    join dbo.Memberships as s
    on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
)
,e as
(
    select d.[Date] as d
    ,count(e.MembershipID) as e
    from dbo.Dates as d
    join dbo.Memberships as e
    on d.[Date] = e.ValidToDateKey
    group by d.[Date]
),c as
(
    select isnull(s.d,e.d) as d
    ,sum(isnull(s.s,0) - isnull(e.e,0)) over (order by isnull(s.d,e.d)) as c
    from s
    full join e
    on s.d = e.d
)
select d.[Date]
,c.c
from dbo.Dates as d
left join c
on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
;
Following on from that, to split this aggregate into constituent groups per day I have the following, which is also performing well:
declare @s date = '20160101';
declare @e date = getdate();
with s as
(
    select d.[Date] as d
    ,s.MembershipGrouping as g
    ,count(s.MembershipID) as s
    from dbo.Dates as d
    join dbo.Memberships as s
    on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
    ,s.MembershipGrouping
)
,e as
(
    select d.[Date] as d
    ,e.MembershipGrouping as g
    ,count(e.MembershipID) as e
    from dbo.Dates as d
    join dbo.Memberships as e
    on d.[Date] = e.ValidToDateKey
    group by d.[Date]
    ,e.MembershipGrouping
),c as
(
    select isnull(s.d,e.d) as d
    ,isnull(s.g,e.g) as g
    ,sum(isnull(s.s,0) - isnull(e.e,0)) over (partition by isnull(s.g,e.g) order by isnull(s.d,e.d)) as c
    from s
    full join e
    on s.d = e.d
    and s.g = e.g
)
select d.[Date]
,c.g
,c.c
from dbo.Dates as d
left join c
on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
,c.g
;
Can anyone improve on the above?
If most of your membership validity intervals are longer than a few days, have a look at the answer by Martin Smith. That approach is likely to be faster.
When you take calendar table (DIM.[Date]) and left join it with Memberships, you may end up scanning the Memberships table for each date of the range. Even if there is an index on (ValidFromDate, ValidToDate), it may not be super useful.
It is easy to turn it around.
Scan the Memberships table only once and for each membership find those dates that are valid using CROSS APPLY.
Sample data
DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);
INSERT INTO @T VALUES
(1, '1997-01-01', '2006-05-09'),
(2, '1997-01-01', '2017-05-12'),
(3, '2005-06-02', '2009-02-07');
DECLARE @RangeFrom date = '2006-01-01';
DECLARE @RangeTo date = '2006-12-31';
Query 1
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >= Memberships.ValidFromDate
            AND dbo.Calendar.dt <= Memberships.ValidToDate
            AND dbo.Calendar.dt >= @RangeFrom
            AND dbo.Calendar.dt <= @RangeTo
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE);
OPTION(RECOMPILE) is not really needed; I include it in all queries when I compare execution plans, to be sure that I'm getting the latest plan when I play with the queries.
When I looked at the plan of this query I saw that the seek on Calendar.dt was using only ValidFromDate and ValidToDate; @RangeFrom and @RangeTo were pushed to the residual predicate. That is not ideal. The optimiser is not smart enough to calculate the maximum of two dates (ValidFromDate and @RangeFrom) and use that date as the starting point of the seek.
It is easy to help the optimiser:
Query 2
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >=
                CASE WHEN Memberships.ValidFromDate > @RangeFrom
                THEN Memberships.ValidFromDate
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <=
                CASE WHEN Memberships.ValidToDate < @RangeTo
                THEN Memberships.ValidToDate
                ELSE @RangeTo END
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;
In this query the seek is optimal and doesn't read dates that may be discarded later.
Finally, you may not need to scan the whole Memberships table.
We need only those rows where the given range of dates intersects with the valid range of the membership.
Query 3
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >=
                CASE WHEN Memberships.ValidFromDate > @RangeFrom
                THEN Memberships.ValidFromDate
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <=
                CASE WHEN Memberships.ValidToDate < @RangeTo
                THEN Memberships.ValidToDate
                ELSE @RangeTo END
    ) AS CA
WHERE
    Memberships.ValidToDate >= @RangeFrom
    AND Memberships.ValidFromDate <= @RangeTo
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;
Two intervals [a1;a2] and [b1;b2] intersect when
a2 >= b1 and a1 <= b2
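The intersection rule is easy to sanity-check in isolation. Below is a minimal Python sketch of it; the function name intersects is made up for this example, and the sample values reuse the first membership and the 2006 range from the sample data above.

```python
from datetime import date


def intersects(a1, a2, b1, b2):
    """True when the inclusive intervals [a1; a2] and [b1; b2] share at least one day."""
    return a2 >= b1 and a1 <= b2


range_from, range_to = date(2006, 1, 1), date(2006, 12, 31)

# Membership 1 (1997-01-01 to 2006-05-09) reaches into 2006, so it intersects.
inside = intersects(date(1997, 1, 1), date(2006, 5, 9), range_from, range_to)

# A hypothetical membership that ended on 2005-12-31 does not.
before = intersects(date(1997, 1, 1), date(2005, 12, 31), range_from, range_to)

# Touching on a single day still counts, because both ends are inclusive.
touching = intersects(date(2006, 12, 31), date(2007, 6, 30), range_from, range_to)

print(inside, before, touching)
```

This is the same pair of comparisons that the WHERE clause of Query 3 applies to ValidToDate and ValidFromDate.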
These queries assume that Calendar table has an index on dt.
You should try and see what indexes are better for the Memberships table.
For the last query, if the table is rather large, most likely two separate indexes on ValidFromDate and on ValidToDate would be better than one index on (ValidFromDate, ValidToDate).
You should try different queries and measure their performance on the real hardware with real data. Performance may depend on the data distribution, how many memberships there are, what are their valid dates, how wide or narrow is the given range, etc.
I recommend using a great tool called SQL Sentry Plan Explorer to analyse and compare execution plans. It is free. It shows a lot of useful stats, such as execution time and number of reads for each query. The screenshots above are from this tool.
On the assumption your date dimension contains all dates contained in all membership periods you can use something like the following.
The join is an equi join so can use hash join or merge join not just nested loops (which will execute the inside sub tree once for each outer row).
Assuming index on (ValidToDate) include(ValidFromDate) or reverse this can use a single seek against Memberships and a single scan of the date dimension. The below has an elapsed time of less than a second for me to return the results for a year against a table with 3.2 million members and general active membership of 1.4 million (script)
DECLARE @StartDate DATE = '2016-01-01',
        @EndDate DATE = '2016-12-31';
WITH MD
AS (SELECT Date,
           SUM(Adj) AS MemberDelta
    FROM Memberships
    CROSS APPLY (VALUES ( ValidFromDate, +1),
                        --Membership count decremented day after the ValidToDate
                        (DATEADD(DAY, 1, ValidToDate), -1) ) V(Date, Adj)
    WHERE
        --Members already expired before the time range of interest can be ignored
        ValidToDate >= @StartDate
        AND
        --Members whose membership starts after the time range of interest can be ignored
        ValidFromDate <= @EndDate
    GROUP BY Date),
MC
AS (SELECT DD.DateKey,
           SUM(MemberDelta) OVER (ORDER BY DD.DateKey ROWS UNBOUNDED PRECEDING) AS CountOfNonIgnoredMembers
    FROM DIM_DATE DD
    LEFT JOIN MD
        ON MD.Date = DD.DateKey)
SELECT DateKey,
       CountOfNonIgnoredMembers AS MembershipCount
FROM MC
WHERE DateKey BETWEEN @StartDate AND @EndDate
ORDER BY DateKey
Demo (uses extended period as the calendar year of 2016 isn't very interesting with the example data)
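The delta-and-running-total idea behind that query can be sketched in a few lines of Python. This is only an illustration of the technique on the three sample memberships from the question; the name open_memberships is hypothetical, and unlike the T-SQL it seeds the running total with memberships already open at the start of the range instead of summing from the start of the date dimension.

```python
from collections import Counter
from datetime import date, timedelta

# Sample memberships from the question: (ValidFromDate, ValidToDate), inclusive.
memberships = [
    (date(1997, 1, 1), date(2006, 5, 9)),
    (date(1997, 1, 1), date(2017, 5, 12)),
    (date(2005, 6, 2), date(2009, 2, 7)),
]


def open_memberships(memberships, start, end):
    """Count memberships open on each day from start to end inclusive."""
    deltas = Counter()  # day -> net change in the number of open memberships
    running = 0
    for valid_from, valid_to in memberships:
        if valid_to < start or valid_from > end:
            continue  # never open inside the requested range
        if valid_from < start:
            running += 1  # already open when the range begins
        else:
            deltas[valid_from] += 1
        # The count drops on the day after the last valid day.
        deltas[valid_to + timedelta(days=1)] -= 1

    counts = {}
    day = start
    while day <= end:
        running += deltas[day]  # a Counter returns 0 for days without changes
        counts[day] = running
        day += timedelta(days=1)
    return counts


counts = open_memberships(memberships, date(2006, 5, 8), date(2006, 5, 11))
for day, n in counts.items():
    print(day, n)
```

Membership 0001 is still counted on its last valid day, 2006-05-09, and drops out on the 10th, which is why the decrement is placed one day after ValidToDate.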
One approach is to first use an INNER JOIN to find the set of matches and COUNT() to project MemberCount GROUPed BY DateKey, then UNION ALL with the same set of dates, with a 0 on that projection for the count of members for each date. The last step is to SUM() the MemberCount of this union, and GROUP BY DateKey. As requested, this avoids LEFT JOIN and NOT EXISTS. As another member pointed out, this is not an equi-join, because we need to use a range, but I think it does what you intend.
This will serve up 1 year's worth of data with around 100k logical reads. On an ordinary laptop with a spinning disk, from cold cache, it serves 1 month in under a second (with correct counts).
Here is an example that creates 3.3 million rows of random duration. The query at the bottom returns one month's worth of data.
--Stay quiet for a moment
SET NOCOUNT ON
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
--Clean up if re-running
DROP TABLE IF EXISTS DIM_DATE
DROP TABLE IF EXISTS FACT_MEMBER
--Date dimension
CREATE TABLE DIM_DATE
(
DateKey DATE NOT NULL
)
--Membership fact
CREATE TABLE FACT_MEMBER
(
MembershipId INT NOT NULL
, ValidFromDateKey DATE NOT NULL
, ValidToDateKey DATE NOT NULL
)
--Populate Date dimension from 2001 through end of 2018
DECLARE @startDate DATE = '2001-01-01'
DECLARE @endDate DATE = '2018-12-31'
;WITH CTE_DATE AS
(
    SELECT @startDate AS DateKey
    UNION ALL
    SELECT
        DATEADD(DAY, 1, DateKey)
    FROM
        CTE_DATE AS D
    WHERE
        D.DateKey < @endDate
)
INSERT INTO
    DIM_DATE
    (
        DateKey
    )
SELECT
    D.DateKey
FROM
    CTE_DATE AS D
OPTION (MAXRECURSION 32767)
--Populate Membership fact with members having a random membership length from 1 to 36 months
;WITH CTE_DATE AS
(
    SELECT @startDate AS DateKey
    UNION ALL
    SELECT
        DATEADD(DAY, 1, DateKey)
    FROM
        CTE_DATE AS D
    WHERE
        D.DateKey < @endDate
)
,CTE_MEMBER AS
(
SELECT 1 AS MembershipId
UNION ALL
SELECT MembershipId + 1 FROM CTE_MEMBER WHERE MembershipId < 500
)
,
CTE_MEMBERSHIP
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY NEWID()) AS MembershipId
, D.DateKey AS ValidFromDateKey
FROM
CTE_DATE AS D
CROSS JOIN CTE_MEMBER AS M
)
INSERT INTO
FACT_MEMBER
(
MembershipId
, ValidFromDateKey
, ValidToDateKey
)
SELECT
M.MembershipId
, M.ValidFromDateKey
, DATEADD(MONTH, FLOOR(RAND(CHECKSUM(NEWID())) * (36-1)+1), M.ValidFromDateKey) AS ValidToDateKey
FROM
CTE_MEMBERSHIP AS M
OPTION (MAXRECURSION 32767)
--Add clustered Primary Key to Date dimension
ALTER TABLE DIM_DATE ADD CONSTRAINT PK_DATE PRIMARY KEY CLUSTERED
(
DateKey ASC
)
--Index
--(Optimize in your spare time)
DROP INDEX IF EXISTS SK_FACT_MEMBER ON FACT_MEMBER
CREATE CLUSTERED INDEX SK_FACT_MEMBER ON FACT_MEMBER
(
ValidFromDateKey ASC
, ValidToDateKey ASC
, MembershipId ASC
)
RETURN
--Start test
--Emit stats
SET STATISTICS IO ON
SET STATISTICS TIME ON
--Establish range of dates
DECLARE
    @rangeStartDate DATE = '2010-01-01'
    , @rangeEndDate DATE = '2010-01-31'
--UNION the count of members for a specific date range with the "zero" set for the same range, and SUM() the counts
;WITH CTE_MEMBER
AS
(
    SELECT
        D.DateKey
        , COUNT(*) AS MembershipCount
    FROM
        DIM_DATE AS D
        INNER JOIN FACT_MEMBER AS M ON
            M.ValidFromDateKey <= @rangeEndDate
            AND M.ValidToDateKey >= @rangeStartDate
            AND D.DateKey BETWEEN M.ValidFromDateKey AND M.ValidToDateKey
    WHERE
        D.DateKey BETWEEN @rangeStartDate AND @rangeEndDate
    GROUP BY
        D.DateKey
    UNION ALL
    SELECT
        D.DateKey
        , 0 AS MembershipCount
    FROM
        DIM_DATE AS D
    WHERE
        D.DateKey BETWEEN @rangeStartDate AND @rangeEndDate
)
SELECT
M.DateKey
, SUM(M.MembershipCount) AS MembershipCount
FROM
CTE_MEMBER AS M
GROUP BY
M.DateKey
ORDER BY
M.DateKey ASC
OPTION (RECOMPILE, MAXDOP 1)
Here's how I'd solve this problem with equijoin:
--data generation
declare @Membership table (MembershipId varchar(10), ValidFromDate date, ValidToDate date)
insert into @Membership values
('0001', '1997-01-01', '2006-05-09'),
('0002', '1997-01-01', '2017-05-12'),
('0003', '2005-06-02', '2009-02-07')
declare @startDate date, @endDate date
select @startDate = MIN(ValidFromDate), @endDate = max(ValidToDate) from @Membership
--in order to use equijoin I need all days between min date and max date from Membership table (both columns)
;with cte as (
    select @startDate [date]
    union all
    select DATEADD(day, 1, [date]) from cte
    where [date] < @endDate
)
--in this query, we will assign value to each day:
--one, if project started on that day
--minus one, if project ended on that day
--then, it's enough to (cumulative) sum all this values to get how many projects were ongoing on particular day
select [date],
    sum(case when [DATE] = ValidFromDate then 1 else 0 end +
        case when [DATE] = ValidToDate then -1 else 0 end)
    over (order by [date] rows between unbounded preceding and current row)
from cte [c]
left join @Membership [m]
on [c].[date] = [m].ValidFromDate or [c].[date] = [m].ValidToDate
option (maxrecursion 0)
Here's another solution:
--data generation
declare @Membership table (MembershipId varchar(10), ValidFromDate date, ValidToDate date)
insert into @Membership values
('0001', '1997-01-01', '2006-05-09'),
('0002', '1997-01-01', '2017-05-12'),
('0003', '2005-06-02', '2009-02-07')
;with cte as (
    select CAST('2016-01-01' as date) [date]
    union all
    select DATEADD(day, 1, [date]) from cte
    where [date] < '2016-12-31'
)
--started on or before the day, minus ended before the day
--(<= on ValidFromDate so that a membership counts on its first day, matching the inclusive BETWEEN in the question)
select [date],
    (select COUNT(*) from @Membership where ValidFromDate <= [date]) -
    (select COUNT(*) from @Membership where ValidToDate < [date]) [ongoing]
from cte
option (maxrecursion 0)
Pay attention: I think @PittsburghDBA is right in saying that the current query returns a wrong result.
The last day of membership is not counted, so the final sum is lower than it should be.
I have corrected that in this version.
This should improve a bit on your current progress:
declare @s date = '20160101';
declare @e date = getdate();
with
x as (
    select d, sum(c) c
    from (
        select ValidFromDateKey d, count(MembershipID) c
        from Memberships
        group by ValidFromDateKey
        union all
        -- dateadd needed to count last day of membership too!!
        select dateadd(dd, 1, ValidToDateKey) d, -count(MembershipID) c
        from Memberships
        group by ValidToDateKey
    )x
    group by d
),
c as
(
    select d, sum(x.c) over (order by d) as c
    from x
)
select d.day, c cnt
from calendar d
left join c on d.day = c.d
where d.day between @s and @e
order by d.day;
First of all, your query yields '1' as MembershipCount even if no active membership exists for the given date.
You should return SUM(CASE WHEN m.MembershipID IS NOT NULL THEN 1 ELSE 0 END) AS MembershipCount.
For optimal performance create an index on Memberships(ValidFromDateKey, ValidToDateKey, MembershipId) and another on DIM.[Date](CalendarYear, DateKey).
With that done, the optimal query shall be:
DECLARE @CalendarYear INT = 2000
SELECT dim.DateKey, SUM(CASE WHEN con.MembershipID IS NOT NULL THEN 1 ELSE 0 END) AS MembershipCount
FROM
    DIM.[Date] dim
    LEFT OUTER JOIN (
        SELECT ValidFromDateKey, ValidToDateKey, MembershipID
        FROM Memberships
        WHERE
            ValidFromDateKey <= CONVERT(DATETIME, CONVERT(VARCHAR, @CalendarYear) + '1231')
            AND ValidToDateKey >= CONVERT(DATETIME, CONVERT(VARCHAR, @CalendarYear) + '0101')
    ) con
    ON dim.DateKey BETWEEN con.ValidFromDateKey AND con.ValidToDateKey
WHERE dim.CalendarYear = @CalendarYear
GROUP BY dim.DateKey
ORDER BY dim.DateKey
Now, for your last question: what would the equijoin equivalent of this query be?
There is no way to rewrite this as an equijoin.
An equijoin doesn't imply using JOIN syntax. It implies using an equality predicate, whatever the syntax.
Your query needs a range comparison, so equality doesn't apply: a BETWEEN or similar is required.
I have the following query, which takes in the opps and calculates the duration and revenue for each month. However, for some locations where there is no data, some months are missing. Essentially, I would like all months to appear for each location and record type. I tried a left outer join on the calendar, but that didn't seem to work either.
Here is the query:
;With DateSequence( [Date] ) as
(
Select CAST(@fromdate as DATE) as [Date]
union all
Select CAST(dateadd(day, 1, [Date]) as Date)
from DateSequence
where Date < @todate
)
INSERT INTO CalendarTemp (Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
DELETE FROM CalendarTemp WHERE DayOfWeek IN ('Saturday', 'Sunday');
SELECT
AccountId
,AccountName
,Office
,Stage = (CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,RecordType= (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Start_Date
,End_Date
,Probability
,Estimated_Revenue_Won = ISNULL(Amount, 0)
,ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) AS Row
--,Revenue_Per_Day = CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,YEAR(c.Date) as year
,MONTH(c.Date) as Month
,c.MonthName
--, ISNULL(CAST(Sum((Amount)/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0)) as money),0) As RevenuePerMonth
FROM SF_Extracted_Opps o
LEFT OUTER JOIN CalendarTemp c on o.Start_Date <= c.Date AND o.End_Date >= c.Date
WHERE
Start_Date <= @todate AND End_Date >= @fromdate
AND Office IN (@Location)
AND recordtypeid IN ('LAS1')
GROUP BY
AccountId
,AccountName
,Office
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,(CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Amount
--, CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,Start_Date
,End_Date
,Probability
,YEAR(c.Date)
,Month(c.Date)
,c.MonthName
,dbo.CalculateNumberOFWorkDays(Start_Date, End_Date)
ORDER BY Office
, (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
, [Start_Date], Month(c.Date), AccountName, Row;
I tried adding another left outer join to this, using it as a subquery and joining to the calendar on year and month, but that did not seem to work either. Suggestions would be extremely appreciated.
--Date Calendar for each location:
;With DateSequence( [Date], Location) as
(
Select CAST(@fromdate as DATE) as [Date], oo.Office as location
union all
Select CAST(dateadd(day, 1, [Date]) as Date), oo.Office as location
from DateSequence dts
join Opportunity_offices oo on 1 = 1
where Date < @todate
)
--select result
INSERT INTO CalendarTemp (Location,Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
location,
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
You have your LEFT JOIN backwards. If you want all records from CalendarTemp and only those that match from SF_Extracted_Opps, then CalendarTemp should be the table on the left. Alternatively, you can switch LEFT JOIN to RIGHT JOIN and it should be fixed. The other issue is that your WHERE clause uses columns from your SF_Extracted_Opps table, which effectively turns the join back into an INNER JOIN.
Here is one way to fix it:
SELECT
.....
FROM
CalendarTemp c
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND End_Date >= @fromdate
AND o.Office IN (#Location)
AND o.recordtypeid IN ('LAS1')
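The difference between filtering the opportunity columns in WHERE and filtering them in ON can be checked on a toy dataset. This is a sketch using Python's built-in `sqlite3` module, not the poster's schema: the `calendar` and `opps` tables and their columns are hypothetical, reduced to one row per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (month INTEGER);
    CREATE TABLE opps (name TEXT, office TEXT, month INTEGER);
    INSERT INTO calendar VALUES (1), (2), (3);
    INSERT INTO opps VALUES ('A', 'NY', 1), ('B', 'LA', 2);
""")

# Filter on the right-hand table in WHERE: unmatched calendar rows have
# office = NULL, fail the predicate, and disappear.
where_rows = conn.execute("""
    SELECT c.month, o.name
    FROM calendar c
    LEFT JOIN opps o ON o.month = c.month
    WHERE o.office = 'NY'
    ORDER BY c.month
""").fetchall()

# Same filter moved into ON: every calendar row survives.
on_rows = conn.execute("""
    SELECT c.month, o.name
    FROM calendar c
    LEFT JOIN opps o ON o.month = c.month AND o.office = 'NY'
    ORDER BY c.month
""").fetchall()

print(where_rows)
print(on_rows)
```

The first query returns only month 1, while the second returns all three months with NULL in place of the missing opportunity.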
Another issue you might run into: because you remove weekends from your CalendarTemp table, not all dates are represented. I would test with the weekends both left in and removed and see whether you get different results.
This line:
AND o.Start_Date <= @todate AND End_Date >= @fromdate
should not be needed either, because the dates are already limited by the line before it and by the values in your CalendarTemp table.
A note about your calendar table: you don't have to go back and delete those records. Simply filter on the day of the week in the WHERE clause of the SELECT that populates the table.
Edit: for all offices, you can cross join your offices table with your CalendarTemp table. Do this in your final query, not in the CTE that builds the calendar. The problem with doing it in the CTE is that the CTE is recursive, so you would have to do it in both the anchor and the recursive member definitions.
SELECT
.....
FROM
CalendarTemp c
CROSS JOIN Opportunity_offices oo
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND End_Date >= @fromdate
AND oo.office = o.Office
AND o.recordtypeid IN ('LAS1')
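As a rough check of the cross-join approach, the sketch below uses Python's built-in `sqlite3` module with hypothetical, heavily simplified tables (`calendar`, `offices`, `opps`). It only illustrates that every office gets every month, including months with no opportunities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (month INTEGER);
    CREATE TABLE offices (office TEXT);
    CREATE TABLE opps (office TEXT, month INTEGER, amount INTEGER);
    INSERT INTO calendar VALUES (1), (2);
    INSERT INTO offices VALUES ('LA'), ('NY');
    INSERT INTO opps VALUES ('NY', 1, 100);
""")

# calendar x offices builds the full grid; the LEFT JOIN then attaches
# opportunities where they exist and leaves NULLs elsewhere.
rows = conn.execute("""
    SELECT oo.office, c.month, COALESCE(SUM(o.amount), 0) AS revenue
    FROM calendar c
    CROSS JOIN offices oo
    LEFT JOIN opps o ON o.month = c.month AND o.office = oo.office
    GROUP BY oo.office, c.month
    ORDER BY oo.office, c.month
""").fetchall()

for row in rows:
    print(row)
```

LA has no opportunities at all in this sample, yet it still appears once per month with a revenue of 0.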
I'm trying to find the average quantity on hand of my inventory over a date range that starts at the parameter @StartDate, by averaging the ending quantity from each day. I have three tables: a part table, a part transaction table, and a warehouse table, mocked up below.
PartNum | PartNum TranDate TranQty | PartNum OnHandQty
---------- | ------------------------------------ | --------------------
P1 | P1 6/28/2016 5 | P1 30
P2 | P1 6/26/2016 3 | P2 2
| P1 6/26/2016 -1 |
| P1 6/15/2016 2 |
| P2 6/15/2016 1 |
If today is 6/30/2016 and @StartDate = 6/1/2016, I expect a result like:
PartNum AverageOnHand
------------------------
P1 22.9
P2 1.5
However, I don't know what function would best allow me to get to an appropriate weighted sum which I could divide by the difference in dates. Is there a SumProduct function or similar that I can use here? My code, so far, is below:
select
[Part].[PartNum] as [Part_PartNum],
(max(PartWhse.OnHandQty)*datediff(day,max(PartTran.TranDate),Constants.Today)) as [Calculated_WeightedSum],
(WeightedSum/DATEDIFF(day, @StartDate, Constants.Today)) as [Calculated_AverageOnHand]
from Erp.Part as Part
right outer join Erp.PartTran as PartTran on
Part.PartNum = PartTran.PartNum
inner join Erp.PartWhse as PartWhse on
Part.PartNum = PartWhse.PartNum
group by [Part].[PartNum]
Here is an interesting method for SQL Server 2012+:
;WITH cte AS (
SELECT
p.PartNum
,CAST(t.TranDate AS DATE) AS TranDate
,i.OnHandQty
--,SUM(SUM(t.TranQty)) OVER (PARTITION BY p.PartNum ORDER BY CAST(t.TranDate AS DATE) DESC) AS InventoryChange
,i.OnHandQty - SUM(SUM(t.TranQty)) OVER (PARTITION BY p.PartNum ORDER BY CAST(t.TranDate AS DATE) DESC) AS InventoryOnDate
,DATEDIFF(day,
CAST(ISNULL(LAG(MAX(TranDate)) OVER (PARTITION BY p.PartNum ORDER BY CAST(t.TranDate AS DATE) ASC),@StartDate) AS DATE)
,CAST(t.TranDate AS DATE)
) AS DaysAtInventory
FROM
#Parts p
LEFT JOIN #Transact t
ON p.PartNum = t.PartNum
LEFT JOIN #Inventory i
ON p.PartNum = i.PartNum
GROUP BY
p.PartNum
,CAST(t.TranDate AS DATE)
,i.OnHandQty
)
SELECT
PartNum
,(SUM(ISNULL(DaysAtInventory,0) * ISNULL(InventoryOnDate,0))
+ ((DATEDIFF(day,MAX(TranDate),CAST(GETDATE() AS DATE)) + 1) * ISNULL(MAX(OnHandQty),0)))
/((DATEDIFF(day,CAST(@StartDate AS DATE),CAST(GETDATE() AS DATE)) + 1) * 1.00) AS AvgDailyInventory
FROM
cte
GROUP BY
PartNum
This one actually gave me 22.9 for P1, but 1.53333 for P2. The trailing 333 is introduced because one day has to be put somewhere, so I counted it at the current inventory.
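The expected figures can be reproduced by hand by walking backwards from the current quantity. Below is a minimal Python sketch of that arithmetic, not a translation of the query above; the function name `average_on_hand` is hypothetical, and it assumes the on-hand quantity is the closing quantity for today and that every day from the start date through today (inclusive) counts once.

```python
from datetime import date, timedelta


def average_on_hand(current_qty, transactions, start, today):
    """Average end-of-day quantity for each day in [start, today].

    current_qty: quantity on hand at the end of `today`.
    transactions: list of (tran_date, qty_change) pairs.
    """
    # Net change per calendar day.
    change_on = {}
    for tran_date, qty in transactions:
        change_on[tran_date] = change_on.get(tran_date, 0) + qty

    total = 0
    n_days = 0
    qty = current_qty
    day = today
    while day >= start:
        total += qty                    # closing quantity for `day`
        qty -= change_on.get(day, 0)    # undo that day's changes to get the previous close
        n_days += 1
        day -= timedelta(days=1)
    return total / n_days


start, today = date(2016, 6, 1), date(2016, 6, 30)
p1 = average_on_hand(
    30,
    [(date(2016, 6, 28), 5), (date(2016, 6, 26), 3),
     (date(2016, 6, 26), -1), (date(2016, 6, 15), 2)],
    start, today,
)
p2 = average_on_hand(2, [(date(2016, 6, 15), 1)], start, today)
print(round(p1, 4), round(p2, 4))  # 22.9 1.5333
```

For P1 the daily closing quantities are 21 for 14 days, 23 for 11 days, 25 for 2 days and 30 for 3 days, which sums to 687 over 30 days.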
Here is a previous method I answered with. With this one it is a little easier to conceptualize the data. I would be curious about performance differences between the two methods.
Some of these steps can be combined to be a little more concise, but this works (although I got 22.6, not .1 or .9). I rounded everything to a whole date while doing this so that you don't have to worry about the beginning and end of day.
DECLARE @StartDate DATETIME = '6/1/2016'
;WITH cteDates AS (
SELECT @StartDate AS d
UNION ALL
SELECT
d + 1 AS d
FROM
cteDates c
WHERE c.d + 1 <= CAST(CAST(GETDATE() AS DATE) AS DATETIME)
--get dates to today beginning of day
)
, ctePartsDaysCross AS (
SELECT
d.d
,p.PartNum
,ISNULL(i.OnHandQty,0) AS OnHandQty
FROM
cteDates d
CROSS JOIN #Parts p
LEFT JOIN #Inventory i
ON p.PartNum = i.PartNum
)
, cteTransactsQuantityByDate AS (
SELECT
CAST(t.TranDate AS DATE) as d
,t.PartNum
,TranQty = SUM(t.TranQty)
FROM
#Transact t
GROUP BY
CAST(t.TranDate AS DATE)
,t.PartNum
)
,cteDailyInventory AS (
SELECT
c.d
,c.PartNum
,c.OnHandQty - SUM(ISNULL(t.TranQty,0)) OVER (PARTITION BY c.PartNum ORDER BY c.d DESC) AS DailyOnHand
FROM
ctePartsDaysCross c
LEFT JOIN cteTransactsQuantityByDate t
ON c.d = t.d
AND c.PartNum = t.PartNum
)
SELECT
PartNum
,AVG(CAST(DailyOnHand AS DECIMAL(6,3)))
FROM
cteDailyInventory
GROUP BY
PartNum
Here is the test data:
IF OBJECT_ID('tempdb..#Parts') IS NOT NULL
BEGIN
DROP TABLE #Parts
END
IF OBJECT_ID('tempdb..#Transact') IS NOT NULL
BEGIN
DROP TABLE #Transact
END
IF OBJECT_ID('tempdb..#Inventory') IS NOT NULL
BEGIN
DROP TABLE #Inventory
END
CREATE TABLE #Parts (
PartNum CHAR(2)
)
CREATE TABLE #Transact (
AutoId INT IDENTITY(1,1) NOT NULL
,PartNum CHAR(2)
,TranDate DATETIME
,TranQty INT
)
CREATE TABLE #Inventory (
PartNum CHAR(2)
,OnHandQty INT
)
INSERT INTO #Parts (PartNum) VALUES ('P1'),('P2'),('P3')
INSERT INTO #Transact (PartNum, TranDate, TranQty)
VALUES ('P1','6/28/2016',5),('P1','6/26/2016',3),('P1','6/26/2016',-1)
,('P1','6/15/2016',2) ,('P2','6/15/2016',1)
INSERT INTO #Inventory (PartNum, OnHandQty) VALUES ('P1',30),('P2',2)
I am thinking a single recursive CTE might be simpler. I might post that as an update.
Reverse the transactions to compute daily quantities. Add in the missing dates and look backward to the most recent date to fill in the daily quantities. I think I'm going to try for a better solution than this one.
http://rextester.com/JLD19862
with trn as (
select PartNum, TranDate, TranQty from PartTran
union all
select PartNum, cast('20160601' as date), 0 from PartWhse
union all
select PartNum, cast('20160630' as date), 0 from PartWhse
), qty as (
select
t.PartNum, t.TranDate,
-- assumes that end date corresponds with OnHandQty
min(w.OnHandQty) + sum(t.TranQty)
- sum(sum(t.TranQty))
over (partition by t.PartNum order by t.TranDate desc) as DailyOnHand,
coalesce(
lead(t.TranDate) over (partition by t.PartNum order by t.TranDate),
dateadd(day, 1, t.TranDate)
) as NextTranDate
-- if lead() isn't available...
-- coalesce(
-- (
-- select min(t2.TranDate) from trn as t2
-- where t2.PartNum = t.PartNum and t2.TranDate > t.TranDate
-- ),
-- dateadd(day, 1, t.TranDate)
-- ) as NextTranDate
from PartWhse as w inner join trn as t on t.PartNum = w.PartNum
where t.TranDate between '20160601' and '20160630'
group by t.PartNum, t.TranDate
)
select
PartNum,
sum(datediff(day, TranDate, NextTranDate) * DailyOnHand) * 1.00
/ sum(datediff(day, TranDate, NextTranDate)) as DailyAvg
from qty
group by PartNum;
I was able to solve this with a sum. First, I multiplied the final quantity on hand by the number of days in the range. Next, I multiplied each change in inventory by the number of days from @StartDate until the TranDate and subtracted those products from the first figure.
select
[Part].[PartNum] as [Part_PartNum],
(max(PartWhse.OnHandQty)*datediff(day,@StartDate,Constants.Today)-
sum(PartTran.TranQty*datediff(day,@StartDate,PartTran.TranDate))) as [Calculated_WeightedSum],
(WeightedSum/DATEDIFF(day, @StartDate, Constants.Today)) as [Calculated_AverageOnHand]
from Erp.Part as Part
right outer join Erp.PartTran as PartTran on
Part.PartNum = PartTran.PartNum
inner join Erp.PartWhse as PartWhse on
Part.PartNum = PartWhse.PartNum
group by [Part].[PartNum]
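The closed-form sum can be checked with plain arithmetic. The Python sketch below is only an illustration of the formula, with a hypothetical function name and an `inclusive` flag that is not part of the query; it assumes today is 6/30/2016 and uses the P1 sample data from the question. A transaction on date t is missing from the closing quantity of every day before t, which is DATEDIFF(day, start, t) days, so it is subtracted that many times.

```python
from datetime import date


def average_on_hand_closed_form(current_qty, transactions, start, today, inclusive=True):
    """Average on hand as (final qty * days - sum of qty change * days before it) / days."""
    span = (today - start).days                  # same value as DATEDIFF(day, start, today)
    n_days = span + 1 if inclusive else span     # inclusive counts both the start day and today
    weighted = current_qty * n_days - sum(
        qty * (tran_date - start).days for tran_date, qty in transactions
    )
    return weighted / n_days


start, today = date(2016, 6, 1), date(2016, 6, 30)
p1_trans = [(date(2016, 6, 28), 5), (date(2016, 6, 26), 3),
            (date(2016, 6, 26), -1), (date(2016, 6, 15), 2)]

incl = average_on_hand_closed_form(30, p1_trans, start, today, inclusive=True)
excl = average_on_hand_closed_form(30, p1_trans, start, today, inclusive=False)
print(round(incl, 4), round(excl, 2))
```

With 30 days (both endpoints counted) the result is 687 / 30 = 22.9, the figure expected in the question. Using the bare date difference of 29 as both the multiplier and the divisor gives 657 / 29, roughly 22.66, so the choice of day count matters.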
Thanks for your help everyone! You really helped me think it through.
I have hospital patient admission data in Microsoft SQL Server r2 that looks something like this:
PatientID   AdmitDate         DischargeDate
Jones       1-jan-13 01:37    1-jan-13 17:45
Smith       1-jan-13 02:12    2-jan-13 02:14
Brooks      4-jan-13 13:54    5-jan-13 06:14
I would like to count the number of patients in the hospital day by day and hour by hour, i.e.:
1-jan-13 00:00    0
1-jan-13 01:00    0
1-jan-13 02:00    1
1-jan-13 03:00    2
I also need the result to include the hours when there are no patients admitted.
However, I can't create tables, so making a reference table listing all the hours and days is out.
Any suggestions?
To solve this problem, you need a list of date-hours. The following gets this from the admit dates cross joined to a table with 24 hours. The table of 24 hours is calculated from information_schema.columns, a trick for getting small sequences of numbers in SQL Server.
The rest is just a join between the admissions table and this list of hours. This version counts the patients at the hour, so someone admitted and discharged in the same hour, for instance, is not counted. And in general someone is not counted until the next hour after they are admitted:
with dh as (
-- one row per hour of every day that has at least one admission
select DATEADD(hour, seqnum - 1, thedatehour ) as DateHour
from (select distinct cast(cast(AdmitDate as DATE) as datetime) as thedatehour
from Admissions a
) a cross join
(select ROW_NUMBER() over (order by (select NULL)) as seqnum
from INFORMATION_SCHEMA.COLUMNS
) hours
where hours.seqnum <= 24
)
select dh.DateHour, COUNT(*) as NumPatients
from dh join
Admissions a
on dh.DateHour between a.AdmitDate and a.DischargeDate
group by dh.DateHour
order by 1
This also assumes that there are admissions on every day. That seems like a reasonable assumption. If not, a calendar table would be a big help.
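The counting rule (a patient is present at hour h when AdmitDate <= h <= DischargeDate) can be checked against the sample rows from the question. This is a minimal Python sketch with a hypothetical function name, `hourly_census`; it generates the hours independently of the admissions so that empty hours still appear.

```python
from datetime import datetime, timedelta


def hourly_census(stays, first_hour, last_hour):
    """Count patients present at each whole hour in [first_hour, last_hour].

    stays: list of (admit, discharge) datetimes.
    A patient counts at hour h when admit <= h <= discharge.
    """
    counts = []
    h = first_hour
    while h <= last_hour:
        present = sum(1 for admit, discharge in stays if admit <= h <= discharge)
        counts.append((h, present))
        h += timedelta(hours=1)
    return counts


stays = [
    (datetime(2013, 1, 1, 1, 37), datetime(2013, 1, 1, 17, 45)),   # Jones
    (datetime(2013, 1, 1, 2, 12), datetime(2013, 1, 2, 2, 14)),    # Smith
    (datetime(2013, 1, 4, 13, 54), datetime(2013, 1, 5, 6, 14)),   # Brooks
]
census = hourly_census(stays, datetime(2013, 1, 1, 0), datetime(2013, 1, 1, 3))
for hour, n in census:
    print(hour, n)
```

The first four hours of 1-jan-13 come out as 0, 0, 1, 2, matching the output requested in the question. In SQL, keeping the empty hours corresponds to an outer join from the hour list to the admissions and counting a patient column rather than rows.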
Here is one (ugly) way:
;WITH DayHours AS
(
SELECT 0 DayHour
UNION ALL
SELECT DayHour+1
FROM DayHours
WHERE DayHour+1 <= 23
)
SELECT B.AdmitDate, A.DayHour, COUNT(DISTINCT PatientID) Patients
FROM DayHours A
CROSS JOIN (SELECT DISTINCT CONVERT(DATE,AdmitDate) AdmitDate
FROM YourTable) B
LEFT JOIN YourTable C
ON B.AdmitDate = CONVERT(DATE,C.AdmitDate)
AND A.DayHour = DATEPART(HOUR,C.AdmitDate)
GROUP BY B.AdmitDate, A.DayHour
This is a bit messy and includes a temp table with the test data you provided:
CREATE TABLE #HospitalPatientData (PatientId NVARCHAR(MAX), AdmitDate DATETIME, DischargeDate DATETIME)
INSERT INTO #HospitalPatientData
SELECT 'Jones.', '1-jan-13 01:37:00.000', '1-jan-13 17:45:00.000' UNION
SELECT 'Smith', '1-jan-13 02:12:00.000', '2-jan-13 02:14:00.000' UNION
SELECT 'Brooks.', '4-jan-13 13:54:00.000', '5-jan-13 06:14:00.000'
;WITH DayHours AS
(
SELECT 0 DayHour
UNION ALL
SELECT DayHour+1
FROM DayHours
WHERE DayHour+1 <= 23
),
HospitalPatientData AS
(
SELECT CONVERT(nvarchar(max),AdmitDate,103) as AdmitDate ,DATEPART(hour,(AdmitDate)) as AdmitHour, COUNT(PatientID) as CountOfPatients
FROM #HospitalPatientData
GROUP BY CONVERT(nvarchar(max),AdmitDate,103), DATEPART(hour,(AdmitDate))
),
Results AS
(
SELECT MAX(h.AdmitDate) as Date, d.DayHour
FROM HospitalPatientData h
INNER JOIN DayHours d ON d.DayHour=d.DayHour
GROUP BY AdmitDate, CountOfPatients, DayHour
)
SELECT r.*, COUNT(h.PatientId) as CountOfPatients
FROM Results r
LEFT JOIN #HospitalPatientData h ON CONVERT(nvarchar(max),AdmitDate,103)=r.Date AND DATEPART(HOUR,h.AdmitDate)=r.DayHour
GROUP BY r.Date, r.DayHour
ORDER BY r.Date, r.DayHour
DROP TABLE #HospitalPatientData
This may get you started:
BEGIN TRAN
DECLARE @pt TABLE
(
PatientID VARCHAR(10)
, AdmitDate DATETIME
, DischargeDate DATETIME
)
INSERT INTO @pt
( PatientID, AdmitDate, DischargeDate )
VALUES ( 'Jones', '1-jan-13 01:37', '1-jan-13 17:45' ),
( 'Smith', '1-jan-13 02:12', '2-jan-13 02:14' )
, ( 'Brooks', '4-jan-13 13:54', '5-jan-13 06:14' )
DECLARE @StartDate DATETIME = '20130101'
, @FutureDays INT = 7
;
WITH dy
AS ( SELECT TOP (@FutureDays)
ROW_NUMBER() OVER ( ORDER BY name ) dy
FROM sys.columns c
) ,
hr
AS ( SELECT TOP 24
ROW_NUMBER() OVER ( ORDER BY name ) hr
FROM sys.columns c
)
SELECT refDate, COUNT(p.PatientID) AS PtCount
FROM ( SELECT DATEADD(HOUR, hr.hr - 1,
DATEADD(DAY, dy.dy - 1, @StartDate)) AS refDate
FROM dy
CROSS JOIN hr
) ref
LEFT JOIN @pt p ON ref.refDate BETWEEN p.AdmitDate AND p.DischargeDate
GROUP BY refDate
ORDER BY refDate
ROLLBACK