Get sum of entries over last 6 months (incomplete months) - sql

My data looks something like this
ProductNumber | YearMonth | Number
1 201803 1
1 201804 3
1 201810 6
2 201807 -3
2 201809 5
Now what I want to have is add an additional entry "6MSum" which is the sum of the last 6 months per ProductNumber (not the last 6 entries).
Please be aware the YearMonth data is not complete, for every ProductNumber there are gaps in between so I cant just use the last 6 entries for the sum. The final result should look something like this.
ProductNumber | YearMonth | Number | 6MSum
1 201803 1 1
1 201804 3 4
1 201810 6 9
2 201807 -3 -3
2 201809 5 2
Additionally I don't want to insert the sum to the table but instead use it in a query like:
SELECT [ProductNumber],[YearMonth],[Number],
6MSum = CONVERT(INT,SUM...)
FROM ...
I found a lot off solutions that use a "sum over period" but only for the last X entries and not for the actual conditional statement of "YearMonth within last 6 months".
Any help would be much appreciated!
Its a SQL Database
EDIT/Answer
It seems to be the case that the gaps within the months have to be filled with data, afterwards something like
sum(Number) OVER (PARTITION BY category
ORDER BY year, week
ROWS 6 PRECEDING) AS 6MSum
Should work.
Reference to the solution : https://dba.stackexchange.com/questions/181773/sum-of-previous-n-number-of-columns-based-on-some-category

You could go the OUTER APPLY route. The following produces your required results exactly:
-- prep data
SELECT
ProductNumber , YearMonth , Number
into #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
-- output
SELECT
ProductNumber
,YearMonth
,Number
,[6MSum]
FROM #t t
outer apply (
SELECT
sum(number) as [6MSum]
FROM #t it
where
it.ProductNumber = t.ProductNumber
and it.yearmonth <= t.yearmonth
and t.yearmonth - it.yearmonth between 0 and 6
) tt
drop table #t

Use outer apply and convert yearmonth to a date, something like this:
with t as (
select t.*,
convert(date, convert(varchar(255), yearmonth) + '01')) as ymd
from yourtable t
)
select t.*, t2.sum_6m
from t outer apply
(select sum(t2.number) as sum_6m
from t t2
where t2.productnumber = t.productnumber and
t2.ymd <= t.ymd and
t2.ymd > dateadd(month, -6, ymd)
) t2;

Just to provide one more option. You can use DATEFROMPARTS to build valid dates from the YearMonth value and then search for values within date ranges.
Testable here: https://rextester.com/APJJ99843
SELECT
ProductNumber , YearMonth , Number
INTO #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
SELECT *
,[6MSum] = (SELECT SUM(number) FROM #t WHERE
ProductNumber = t.ProductNumber
AND DATEFROMPARTS(LEFT(YearMonth,4),RIGHT(YearMonth,2),1) --Build a valid start of month date
BETWEEN
DATEADD(MONTH,-6,DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Build a valid start of month date 6 months back
AND DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Build a valid end of month date
FROM #t t
DROP TABLE #t

So a working query (provided by a colleauge of mine) can look like this
SELECT [YearMonth]
,[Number]
,[ProductNumber]
, (Select Sum(Number) from [...] DPDS_1 where DPDS.ProductNumber =
DPDS_1.ProductNumber and DPDS_1.YearMonth <= DPDS.YearMonth and DPDS_1.YearMonth >=
convert (int, left (convert (varchar, dateadd(mm, -6, DPDS.YearMonth + '01'), 112),
6)))FROM [...] DPDS

Related

TSQL - dates overlapping - number of days

I have the following table on SQL Server:
ID
FROM
TO
OFFER NUMBER
1
2022.01.02
9999.12.31
1
1
2022.01.02
2022.02.10
2
2
2022.01.05
2022.02.15
1
3
2022.01.02
9999.12.31
1
3
2022.01.15
2022.02.20
2
3
2022.02.03
2022.02.25
3
4
2022.01.16
2022.02.05
1
5
2022.01.17
2022.02.13
1
5
2022.02.05
2022.02.13
2
The range includes the start date but excludes the end date.
The date 9999.12.31 is given (comes from another system), but we could use the last day of the current quarter instead.
I need to find a way to determine the number of days when the customer sees exactly one, two, or three offers. The following picture shows the method upon id 3:
The expected results should be like (without using the last day of the quarter):
ID
# of days when the customer sees only 1 offer
# of days when the customer sees 2 offers
# of days when the customer sees 3 offers
1
2913863
39
0
2
41
0
0
3
2913861
24
17
4
20
0
0
5
19
8
0
I've found this article but it did not enlighten me.
Also I have limited privileges that is I am not able to declare a variable for example so I need to use "basic" TSQL.
Please provide a detailed explanation besides the code.
Thanks in advance!
The following will (for each ID) extract all distinct dates, construct non-overlapping date ranges to test, and will count up the number of offers per range. The final step is to sum and format.
The fact that the start dates are inclusive and the end dates are exclusive while sometimes non-intuitive for the human, actually works well in algorithms like this.
DECLARE #Data TABLE (Id INT, FromDate DATETIME, ToDate DATETIME, OfferNumber INT)
INSERT #Data
VALUES
(1, '2022-01-02', '9999-12-31', 1),
(1, '2022-01-02', '2022-02-10', 2),
(2, '2022-01-05', '2022-02-15', 1),
(3, '2022-01-02', '9999-12-31', 1),
(3, '2022-01-15', '2022-02-20', 2),
(3, '2022-02-03', '2022-02-25', 3),
(4, '2022-01-16', '2022-02-05', 1),
(5, '2022-01-17', '2022-02-13', 1),
(5, '2022-02-05', '2022-02-13', 2)
;
WITH Dates AS ( -- Gather distinct dates
SELECT Id, Date = FromDate FROM #Data
UNION --(distinct)
SELECT Id, Date = ToDate FROM #Data
),
Ranges AS ( --Construct non-overlapping ranges (The ToDate = NULL case will be ignored later)
SELECT ID, FromDate = Date, ToDate = LEAD(Date) OVER(PARTITION BY Id ORDER BY Date)
FROM Dates
),
Counts AS ( -- Calculate days and count offers per date range
SELECT R.Id, R.FromDate, R.ToDate,
Days = DATEDIFF(DAY, R.FromDate, R.ToDate),
Offers = COUNT(*)
FROM Ranges R
JOIN #Data D ON D.Id = R.Id
AND D.FromDate <= R.FromDate
AND D.ToDate >= R.ToDate
GROUP BY R.Id, R.FromDate, R.ToDate
)
SELECT Id
,[Days with 1 Offer] = SUM(CASE WHEN Offers = 1 THEN Days ELSE 0 END)
,[Days with 2 Offers] = SUM(CASE WHEN Offers = 2 THEN Days ELSE 0 END)
,[Days with 3 Offers] = SUM(CASE WHEN Offers = 3 THEN Days ELSE 0 END)
FROM Counts
GROUP BY Id
The WITH clause introduces Common Table Expressions (CTEs) which progressively build up intermediate results until a final select can be made.
Results:
Id
Days with 1 Offer
Days with 2 Offers
Days with 3 Offers
1
2913863
39
0
2
41
0
0
3
2913861
24
17
4
20
0
0
5
19
8
0
Alternately, the final select could use a pivot. Something like:
SELECT Id,
[Days with 1 Offer] = ISNULL([1], 0),
[Days with 2 Offers] = ISNULL([2], 0),
[Days with 3 Offers] = ISNULL([3], 0)
FROM (SELECT Id, Offers, Days FROM Counts) C
PIVOT (SUM(Days) FOR Offers IN ([1], [2], [3])) PVT
ORDER BY Id
See This db<>fiddle for a working example.
Find all date points for each ID. For each date point, find the number of overlapping.
Refer to comments within query
with
dates as
(
-- get all date points
select ID, theDate = FromDate from offers
union -- union to exclude any duplicate
select ID, theDate = ToDate from offers
),
cte as
(
select ID = d.ID,
Date_Start = d.theDate,
Date_End = LEAD(d.theDate) OVER (PARTITION BY ID ORDER BY theDate),
TheCount = c.cnt
from dates d
cross apply
(
-- Count no of overlapping
select cnt = count(*)
from offers x
where x.ID = d.ID
and x.FromDate <= d.theDate
and x.ToDate > d.theDate
) c
)
select ID, TheCount, days = sum(datediff(day, Date_Start, Date_End))
from cte
where Date_End is not null
group by ID, TheCount
order by ID, TheCount
Result :
ID
TheCount
days
1
1
2913863
1
2
39
2
1
41
3
1
2913861
3
2
29
3
3
12
4
1
20
5
1
19
5
2
8
To get to the required format, use PIVOT
dbfiddle demo

Grouping sql rows by weeks

I have a table
DATE Val
01-01-2020 1
01-02-2020 3
01-05-2020 2
01-07-2020 8
01-13-2020 3
...
I want to summarize these values by the following Sunday. For example, in the above example:
1-05-2020, 1-12-2020, and 1-19-2020 are Sundays, so I want to summarize these by those dates.
The final result should be something like
DATE SUM
1-05-2020 6 //(01-01-2020 + 01-02-2020 + 01-05-2020)
1-12-2020 8
1-19-2020 3
I wasn't certain if the best place to start would be to create a temp calendar table, and then try to join backwards based on that? Or if there was an easier way involving DATEDIFF. Any help would be appreciated! Thanks!
Here's a solution that uses DATEADD & DATEPART to calculate the closest Sunday.
With a correction for a different setting of ##datefirst.
(Since the datepart weekday values are different depending on the DATEFIRST setting)
Sample data:
create table #TestTable
(
Id int identity(1,1) primary key,
[Date] date,
Val int
);
insert into #TestTable
([Date], Val)
VALUES
('2020-01-01', 1)
, ('2020-01-02', 3)
, ('2020-01-05', 2)
, ('2020-01-07', 8)
, ('2020-01-13', 3)
;
Query:
WITH CTE_DATA AS
(
SELECT [Date], Val
, DATEADD(day,
((7-(##datefirst+datepart(weekday, [Date])-1)%7)%7),
[Date]) AS Sunday
FROM #TestTable
)
SELECT
Sunday AS [Date],
SUM(Val) AS [Sum]
FROM CTE_DATA
GROUP BY Sunday
ORDER BY Sunday;
Date | Sum
:--------- | --:
2020-01-05 | 6
2020-01-12 | 8
2020-01-19 | 3
db<>fiddle here
Extra:
Apparently the trick of adding the difference of weeks from day 0 to day 6 also works independently from the DATEFIRST setting.
So this query will return the same result for the sample data.
WITH CTE_DATA AS
(
SELECT [Date], Val
, CAST(DATEADD(week, DATEDIFF(week, 0, DATEADD(day, -1, [Date])), 6) AS DATE) AS Sunday
FROM #TestTable
)
SELECT
Sunday AS [Date],
SUM(Val) AS [Sum]
FROM CTE_DATA
GROUP BY Sunday
ORDER BY Sunday;
The subtraction of 1 day makes sure that if the date is already a Sunday that it isn't calculated to the next Sunday.
Here is a way to do it:
nb:1-13-2020 wont show cuz its not a sunday
with cte as
(
select cast('01-01-2020'as Date) as Date, 1 as Val
union select '01-02-2020' , 3
union select '01-05-2020' , 2
union select '01-07-2020' , 8
)
select Date, max(dateadd(dd,number,Date)), sum(distinct Val) as SUM
from master..spt_values a inner join cte on Date <= dateadd(dd,number,Date)
where type = 'p'
and year(dateadd(dd,number,Date))=year(Date)
and DATEPART(dw,dateadd(dd,number,Date)) = 7
group by Date
Output:
Date (No column name) SUM
2020-01-01 2020-12-26 1
2020-01-02 2020-12-26 3
2020-01-05 2020-12-26 2
2020-01-07 2020-12-26 8
Here is a simple solution. Putting your values into a temporary table and viewing the results on that table:
DECLARE #dates TABLE
(
mDATE DATE,
Val INT,
Sunday DATE
)
INSERT INTO #dates (mDATE,Val) VALUES
('01-01-2020',1),('01-02-2020',3),('01-05-2020',2),('01-07-2020',8),('01-13-2020',3)
UPDATE #dates
SET Sunday = dateadd(week, datediff(week, 0, mDATE), 6)
SELECT Sunday,SUM(Val) AS Val FROM #dates
GROUP BY Sunday
OUTPUT:
Sunday Val
2020-01-05 4
2020-01-12 10
2020-01-19 3

Fill up date gap by month

I have table of products and their sales quantity in months.
Product Month Qty
A 2018-01-01 5
A 2018-02-01 3
A 2018-05-01 5
B 2018-08-01 10
B 2018-10-01 12
...
I'd like to first fill in the data gap between each product's min and max dates like below:
Product Month Qty
A 2018-01-01 5
A 2018-02-01 3
A 2018-03-01 0
A 2018-04-01 0
A 2018-05-01 5
B 2018-08-01 10
B 2018-09-01 0
B 2018-10-01 12
...
Then I would need to perform an accumulation of each product's sales quantity by month.
Product Month total_Qty
A 2018-01-01 5
A 2018-02-01 8
A 2018-03-01 8
A 2018-04-01 8
A 2018-05-01 13
B 2018-08-01 10
B 2018-09-01 10
B 2018-10-01 22
...
I fumbled over the "cross join" clause, however it seems to generate some unexpected results for me. Could someone help to give a hint how I can achieve this in SQL?
Thanks a lot in advance.
I think a recursive CTE is a simple way to do this. The code is just:
with cte as (
select product, min(mon) as mon, max(mon) as end_mon
from t
group by product
union all
select product, dateadd(month, 1, mon), end_mon
from cte
where mon < end_mon
)
select cte.product, cte.mon, coalesce(qty, 0) as qty
from cte left join
t
on t.product = cte.product and t.mon = cte.mon;
Here is a db<>fiddle.
Hi i think this example can help you and perform what you excepted :
CREATE TABLE #MyTable
(Product varchar(10),
ProductMonth DATETIME,
Qty int
);
GO
CREATE TABLE #MyTableTempDate
(
FullMonth DATETIME
);
GO
INSERT INTO #MyTable
SELECT 'A', '2019-01-01', 214
UNION
SELECT 'A', '2019-02-01', 4
UNION
SELECT 'A', '2019-03-01', 50
UNION
SELECT 'B', '2019-01-01', 214
UNION
SELECT 'B', '2019-02-01', 10
UNION
SELECT 'C', '2019-04-01', 150
INSERT INTO #MyTableTempDate
SELECT '2019-01-01'
UNION
SELECT '2019-02-01'
UNION
SELECT '2019-03-01'
UNION
SELECT '2019-04-01'
UNION
SELECT '2019-05-01'
UNION
SELECT '2019-06-01'
UNION
SELECT '2019-07-01';
------------- FOR NEWER SQL SERVER VERSION > 2005
WITH MyCTE AS
(
SELECT T.Product, T.ProductMonth AS 'MMonth', T.Qty
FROM #MyTable T
UNION
SELECT T.Product, TD.FullMonth AS 'MMonth', 0 AS 'Qty'
FROM #MyTable T, #MyTableTempDate TD
WHERE NOT EXISTS (SELECT 1 FROM #MyTable TT WHERE TT.Product = T.Product AND TD.FullMonth = TT.ProductMonth)
)
-- SELECT * FROM MyCTE;
SELECT Product, MMonth, Qty, SUM( Qty) OVER(PARTITION BY Product ORDER BY Product
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as 'TotalQty'
FROM MyCTE
ORDER BY Product, MMonth ASC;
DROP TABLE #MyTable
DROP TABLE #MyTableTempDate
I have other way to perform this in lower SQL Server Version (like 2005 and lower)
It's a SELECT on SELECT if it's your case let me know and i provide some other example.
You can create the months with a recursive CTE
DECLARE #MyTable TABLE
(
ProductID CHAR(1),
Date DATE,
Amount INT
)
INSERT INTO #MyTable
VALUES
('A','2018-01-01', 5),
('A','2018-02-01', 3),
('A','2018-05-01', 5),
('B','2018-08-01', 10),
('B','2018-10-01', 12)
DECLARE #StartDate DATE
DECLARE #EndDate DATE
SELECT #StartDate = MIN(Date), #EndDate = MAX(Date) FROM #MyTable
;WITH dates AS (
SELECT #StartDate AS Date
UNION ALL
SELECT DATEADD(Month, 1, Date)
FROM dates
WHERE Date < #EndDate
)
SELECT A.ProductID, d.Date, COALESCE(Amount,0) AS Amount, COALESCE(SUM(Amount) OVER(PARTITION BY A.ProductID ORDER BY A.ProductID, d.Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),0) AS Total
FROM
(
SELECT ProductID, MIN(date) as DateStart, MAX(date) as DateEnd
FROM #MyTable
GROUP BY ProductID -- As I read in your comments that you need different min and max dates per product
) A
JOIN dates d ON d.Date >= A.DateStart AND d.Date <= A.DateEnd
LEFT JOIN #MyTable T ON A.ProductID = T.ProductID AND T.Date = d.Date
ORDER BY A.ProductID, d.Date
Try this below
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
;WITH CTE(Product,[Month],Qty)
AS
(
SELECT 'A','2018-01-01', 5 UNION ALL
SELECT 'A','2018-02-01', 3 UNION ALL
SELECT 'A','2018-05-01', 5 UNION ALL
SELECT 'B','2018-08-01', 10 UNION ALL
SELECT 'D','2018-10-01', 12
)
SELECT ct.Product,[MonthDays],ct.Qty
INTO #Temp
FROM
(
SELECT c.Product,[Month],
ISNULL(Qty,0) AS Qty
FROM CTE c
)ct
RIGHT JOIN
(
SELECT -- This code is to get month data
CONVERT(VARCHAR(10),'2018-'+ RIGHT('00'+CAST(MONTH(DATEADD(MM, s.number, CONVERT(DATETIME, 0)))AS VARCHAR),2) +'-01',120) AS [MonthDays]
FROM master.dbo.spt_values s
WHERE [type] = 'P' AND s.number BETWEEN 0 AND 11
)DT
ON dt.[MonthDays] = ct.[Month]
SELECT
MAX(Product)OVER(ORDER BY [MonthDays])AS Product,
[MonthDays],
ISNULL(Qty,0) Qty,
SUM(ISNULL(Qty,0))OVER(ORDER BY [MonthDays]) As SumQty
FROM #Temp
Result
Product MonthDays Qty SumQty
------------------------------
A 2018-01-01 5 5
A 2018-02-01 3 8
A 2018-03-01 0 8
A 2018-04-01 0 8
A 2018-05-01 5 13
A 2018-06-01 0 13
A 2018-07-01 0 13
B 2018-08-01 10 23
B 2018-09-01 0 23
D 2018-10-01 12 35
D 2018-11-01 0 35
D 2018-12-01 0 35
First of all, i would divide month and year to get easier with statistics.
I will give you an example query, not based on your table but still helpful.
--here i create the table that will be used as calendar
Create Table MA_MonthYears (
Month int not null ,
year int not null
PRIMARY KEY ( month, year) )
--/////////////////
-- here i'm creating a procedure to fill the ma_monthyears table
declare #month as int
declare #year as int
set #month = 1
set #year = 2015
while ( #year != 2099 )
begin
insert into MA_MonthYears(Month, year)
select #month, #year
if #month < 12
set #month=#month+1
else
set #month=1
if #month = 1
set #year = #year + 1
end
--/////////////////
--here you are the possible result you are looking for
select SUM(Ma_saledocdetail.taxableamount) as Sold, MA_MonthYears.month , MA_MonthYears.year , item
from MA_MonthYears left outer join MA_SaleDocDetail on year(MA_SaleDocDetail.DocumentDate) = MA_MonthYears.year
and Month(ma_saledocdetail.documentdate) = MA_MonthYears.Month
group by MA_SaleDocDetail.Item, MA_MonthYears.year , MA_MonthYears.month
order by MA_MonthYears.year , MA_MonthYears.month

Getting a date + 3 days ( using specific date) SQL

I'm trying to get the next available day after a result set.
This is the query I'm using but is totally wrong:
SELECT DateID = ROW_NUMBER() over (order by B.Date_Key) , B.ClosingDate, C.dates AS RecDay
FROM DIM_DATE B JOIN [dbo].[WorkDay_Calendar] C on C.dates = DATEADD(DAY,3, B.ClosingDate) WHERE YEAR(B.ClosingDate) >= '2018'
AND C.[Sentday] = 0 and C.[RecDay] = 0
This query is
retrieving the RecDay when Closingdate +3 days = to Sentday AND What I want is
when Closingdate + 3(Sentday) then pick the next RecDay,
something like C.dates = DATEADD(DAY,3(Sentday), B.ClosingDate).
This is how are looking my tables:
Dim_Date table
WorkDay_Calendar Table
Notice that when Sentday and RecDay are valid when = 0 if 1 is not valid because is a weekend or holiday.
Based on this information for example if I pick from the Dim_Date table 2018-02-02 as one of the Closingdate then the RecDay should be:
DateID RecDay
------------------------
1 2018-02-07
And with the current query is retrieving this which is totally wrong:
DateID RecDay
-----------------------
1 2018-02-05
Graphic explanation below and please follow the 0 in Bold:
More output examples:
Using the dates below as ClosingDate:
Date_Key ClosingDate:
38284 2018-07-24
38287 2018-01-10
38290 2018-03-08
38291 2018-07-13
38293 2018-02-08
Using the same order of the ClosingDates these should be the outputs, I incluided the ClosingDate column so you can follow the order:
OUTPUTS:
DateID ClosingDate RecDay (output)
1 2018-07-24 2018-07-30
2 2018-01-10 2018-01-16
3 2018-03-08 2018-03-13
4 2018-07-13 2018-07-18
5 2018-02-08 2018-02-13
I'm not sure If if followed you correctly, but based on your condition, you want to check the date dimension table based on calendar table. If ClosingDate + 3 days is equal to SentDay then you need to get the ReceiveDay. if that's what you need. then try this out :
UPDATED
SELECT
ROW_NUMBER() OVER (ORDER BY Date_key) DateID,
ClosingDateOLD,
C.Dates
FROM (
SELECT
Date_key,
ClosingDate AS ClosingDateOLD,
CASE
WHEN DATENAME(dw, DATEADD(DAY, 4, ClosingDate)) IN ('Saturday') THEN DATEADD(DAY, 6, ClosingDate)
WHEN DATENAME(dw, DATEADD(DAY, 4, ClosingDate)) IN ('Sunday') THEN DATEADD(DAY, 5, ClosingDate)
ELSE DATEADD(DAY, 5, ClosingDate)
END AS ClosingDate
FROM
#DIM_DATE
WHERE
ClosingDate IS NOT NULL
) D
JOIN #Calendar C ON C.Dates = ClosingDate
As I understand the requirements it would be something like this.
I am posting a full working example in case somebody wants to take a crack at this.
create table #DIM_DATE
(
DateKey int
, ClosingDate date
)
insert #DIM_DATE values
(1, NULL)
, (2, '2018-01-02')
, (3, NULL)
, (4, NULL)
create table #CalendarTable
(
ID int
, SentDay date
, ReceiveDay date
)
insert #CalendarTable values
(1, '2018-01-03', '2018-01-02')
, (2, '2018-01-04', '2018-01-03')
, (3, '2018-01-05', '2018-01-08')
SELECT DateID = ROW_NUMBER() over (order by d.DateKey)
, ct.ReceiveDay
FROM #DIM_DATE d
join #CalendarTable ct on ct.SentDay = dateadd(day, 3, d.ClosingDate)
drop table #DIM_DATE
drop table #CalendarTable

SQL: CTE query Speed

I am using SQL Server 2008 and am trying to increase the speed of my query below. The query assigns points to patients based on readmission dates.
Example: A patient is seen on 1/2, 1/5, 1/7, 1/8, 1/9, 2/4. I want to first group visits within 3 days of each other. 1/2-5 are grouped, 1/7-9 are grouped. 1/5 is NOT grouped with 1/7 because 1/5's actual visit date is 1/2. 1/7 would receive 3 points because it is a readmit from 1/2. 2/4 would also receive 3 points because it is a readmit from 1/7. When the dates are grouped the first date is the actual visit date.
Most articles suggest limiting the data set or adding indexes to increase speed. I have limited the amount of rows to about 15,000 and added a index. When running the query with 45 test visit dates/ 3 test patients, the query takes 1.5 min to run. With my actual data set it takes > 8 hrs.
How can I get this query to run < 1 hr? Is there a better way to write my query? Does my Index look correct? Any help would be greatly appreciated.
Example expected results below query.
;CREATE TABLE RiskReadmits(MRN INT, VisitDate DATE, Category VARCHAR(15))
;CREATE CLUSTERED INDEX Risk_Readmits_Index ON RiskReadmits(VisitDate)
;INSERT RiskReadmits(MRN,VisitDate,CATEGORY)
VALUES
(1, '1/2/2016','Inpatient'),
(1, '1/5/2016','Inpatient'),
(1, '1/7/2016','Inpatient'),
(1, '1/8/2016','Inpatient'),
(1, '1/9/2016','Inpatient'),
(1, '2/4/2016','Inpatient'),
(1, '6/2/2016','Inpatient'),
(1, '6/3/2016','Inpatient'),
(1, '6/5/2016','Inpatient'),
(1, '6/6/2016','Inpatient'),
(1, '6/8/2016','Inpatient'),
(1, '7/1/2016','Inpatient'),
(1, '8/1/2016','Inpatient'),
(1, '8/4/2016','Inpatient'),
(1, '8/15/2016','Inpatient'),
(1, '8/18/2016','Inpatient'),
(1, '8/28/2016','Inpatient'),
(1, '10/12/2016','Inpatient'),
(1, '10/15/2016','Inpatient'),
(1, '11/17/2016','Inpatient'),
(1, '12/20/2016','Inpatient')
;WITH a AS (
SELECT
z1.VisitDate
, z1.MRN
, (SELECT MIN(VisitDate) FROM RiskReadmits WHERE VisitDate > DATEADD(day, 3, z1.VisitDate)) AS NextDay
FROM
RiskReadmits z1
WHERE
CATEGORY = 'Inpatient'
), a1 AS (
SELECT
MRN
, MIN(VisitDate) AS VisitDate
, MIN(NextDay) AS NextDay
FROM
a
GROUP BY
MRN
), b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a
JOIN b
ON a.VisitDate = b.NextDay
), c AS (
SELECT
MRN,
VisitDate
, (SELECT MAX(VisitDate) FROM b WHERE b1.VisitDate > VisitDate AND b.MRN = b1.MRN) AS PreviousVisitDate
FROM
b b1
)
SELECT distinct
c1.MRN,
c1.VisitDate
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN PreviousVisitDate
ELSE NULL
END AS ReAdmissionFrom
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN 3
ELSE 0
END AS Points
FROM
c c1
ORDER BY c1.MRN
Expected Results:
MRN VisitDate ReAdmissionFrom Points
1 2016-01-02 NULL 0
1 2016-01-07 2016-01-02 3
1 2016-02-04 2016-01-07 3
1 2016-06-02 NULL 0
1 2016-06-06 2016-06-02 3
1 2016-07-01 2016-06-06 3
1 2016-08-01 NULL 0
1 2016-08-15 2016-08-01 3
1 2016-08-28 2016-08-15 3
1 2016-10-12 NULL 0
1 2016-11-17 NULL 0
1 2016-12-20 NULL 0
oops I changed the names of a few cte's (and the post messed up what was code)
It should be like this:
b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)
I'm going to take a wild guess here and say you want to change the b cte to
have AND a.MRN = b.MRN as a second condition in the second select query like this:
, b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
firstVisitAndFollowUp
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
visitsDistance3daysOrMore AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)