SQL - need to determine implicit end dates for supplied begin dates

Consider the following:
CREATE TABLE Members
(
MemberID CHAR(10)
, GroupID CHAR(10)
, JoinDate DATETIME
)
INSERT Members VALUES ('1', 'A', '2010-01-01')
INSERT Members VALUES ('1', 'C', '2010-09-05')
INSERT Members VALUES ('1', 'B', '2010-04-15')
INSERT Members VALUES ('1', 'B', '2010-10-10')
INSERT Members VALUES ('1', 'A', '2010-06-01')
INSERT Members VALUES ('1', 'D', '2010-11-30')
What would be the best way to select from this table, determining the implied "LeaveDate", producing the following data set:
MemberID GroupID JoinDate LeaveDate
1 A 2010-01-01 2010-04-14
1 B 2010-04-15 2010-05-31
1 A 2010-06-01 2010-09-04
1 C 2010-09-05 2010-10-09
1 B 2010-10-10 2010-11-29
1 D 2010-11-30 NULL
As you can see, a member is assumed to have no lapse in membership. The [LeaveDate] for each member status period is assumed to be the day prior to the next chronological [JoinDate] that can be found for that member in a different group. Of course this is a simplified illustration of my actual problem, which includes a couple more categorization/grouping columns and thousands of different members with [JoinDate] values stored in no particular order.

Something like this, perhaps? Self-join, and select the minimum joining date that is greater than the joining date of the current row - i.e. the leave date plus one day - then subtract one day from it.
You may need to adjust the date arithmetic for your particular RDBMS.
SELECT
m1.*
, MIN( m2.JoinDate ) - INTERVAL 1 DAY AS LeaveDate
FROM
Members m1
LEFT JOIN
Members m2
ON m2.MemberID = m1.MemberID
AND m2.JoinDate > m1.JoinDate
GROUP BY
m1.MemberID
, m1.GroupID
, m1.JoinDate
ORDER BY
m1.MemberID
, m1.JoinDate
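The INTERVAL arithmetic above is MySQL-flavored. On SQL Server, the same self-join could be sketched with DATEADD instead (assuming the Members table from the question):

```sql
-- SQL Server variant: subtract one day from the next chronological JoinDate
SELECT
m1.MemberID
, m1.GroupID
, m1.JoinDate
, DATEADD(DAY, -1, MIN(m2.JoinDate)) AS LeaveDate
FROM
Members m1
LEFT JOIN
Members m2
ON m2.MemberID = m1.MemberID
AND m2.JoinDate > m1.JoinDate
GROUP BY
m1.MemberID
, m1.GroupID
, m1.JoinDate
ORDER BY
m1.MemberID
, m1.JoinDate
```

The LEFT JOIN keeps the last row for each member; its MIN(m2.JoinDate) is NULL, so DATEADD returns NULL and the final LeaveDate comes out as NULL, as required.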

Standard (ANSI) SQL solution:
SELECT memberid,
groupid,
joindate,
lead(joindate) OVER (PARTITION BY memberid ORDER BY joindate ASC) AS leave_date
FROM members
ORDER BY joindate ASC
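Note that lead(joindate) returns the next JoinDate itself; the sample output wants the day before that. A sketch of the same query with the one-day adjustment (interval syntax varies by RDBMS; SQL Server would use DATEADD(DAY, -1, ...) instead):

```sql
SELECT memberid,
groupid,
joindate,
lead(joindate) OVER (PARTITION BY memberid ORDER BY joindate ASC)
    - INTERVAL '1' DAY AS leave_date
FROM members
ORDER BY memberid, joindate ASC
```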


Remove duplicates from single field only in rollup query

I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.
I am writing a query for Cognos 11 which summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for individual locations and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM expected value where the location is distinct).
Below is a simplified version of the query. Is this even possible or will I have to write a separate query in Cognos and make it two reports that will have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.
SELECT COALESCE(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END, 'Grand Total') "Row Labels"
,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
,COUNT(DISTINCT location) "Count of Locations"
,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
FROM audit_table
WHERE audit_datetime <= #prompt('EndDate') # AND audit_datetime >= #prompt('StartDate') #
GROUP BY ROLLUP(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END)
ORDER BY 1 ASC
This is what I'm hoping to end up with:
Thanks for any help!
Have you tried taking a look at the OVER clause in SQL? It allows you to use windowed functions within a result set so that you can get aggregates based on specific conditions. This would probably help, since you seem to be trying to get a summation of data based on a different grouping within a larger grouping.
For example, let's say we have the below dataset:
group1 group2 val dateadded
----------- ----------- ----------- -----------------------
1 1 1 2020-11-18
1 1 1 2020-11-20
1 2 10 2020-11-18
1 2 10 2020-11-20
2 3 100 2020-11-18
2 3 100 2020-11-20
2 4 1000 2020-11-18
2 4 1000 2020-11-20
Using a single query we can return both the sums of "val" over "group1" as well as the summation of the first (based on datetime) "val" records in "group2":
create table #table (group1 int, group2 int, val int, dateadded datetime)
insert into #table values (1, 1, 1, getdate())
insert into #table values (1, 1, 1, dateadd(day, 1, getdate()))
insert into #table values (1, 2, 10, getdate())
insert into #table values (1, 2, 10, dateadd(day, 1, getdate()))
insert into #table values (2, 3, 100, getdate())
insert into #table values (2, 3, 100, dateadd(day, 1, getdate()))
insert into #table values (2, 4, 1000, getdate())
insert into #table values (2, 4, 1000, dateadd(day, 1, getdate()))
select t.group1, sum(t.val) as group1_sum, group2_first_val_sum
from #table t
inner join
(
select group1, sum(group2_first_val) as group2_first_val_sum
from
(
select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
from #table
) y
where rownumber = 1
group by group1
) x on t.group1 = x.group1
group by t.group1, x.group2_first_val_sum
This returns the below result set:
group1 group1_sum group2_first_val_sum
----------- ----------- --------------------
1 22 11
2 2200 1100
The innermost subquery in the joined table numbers the rows in the data set based on "group2", resulting in the records having either a "1" or a "2" in the "rownumber" column, since there are only 2 records in each "group2".
The next subquery takes that data, filters out any rows that are not the first (rownumber = 1), and sums the "val" data.
The main query gets the sum of "val" in each "group1" from the main table and then joins on the subqueried table to get the "val" sum of only the first records in each "group2".
There are more efficient ways to write this, such as moving the summation of the "group1" values to a subquery in the SELECT statement to get rid of one of the nested table subqueries, but I wanted to show how to do it without subqueries in the SELECT statement.
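Since this answer opens by recommending the OVER clause, here is a sketch of how the same result set could be produced with window functions alone, with no joined subquery (an alternative formulation, not part of the original answer):

```sql
-- Number each group2's rows by date, then use windowed SUMs over group1:
-- every row contributes to group1_sum, but only rownumber = 1 rows
-- contribute to group2_first_val_sum.
select distinct
group1
, sum(val) over (partition by group1) as group1_sum
, sum(case when rownumber = 1 then val else 0 end)
      over (partition by group1) as group2_first_val_sum
from
(
select group1, val, row_number() over (partition by group2 order by dateadded) as rownumber
from #table
) y
```

Against the sample data above this yields the same two rows (22/11 and 2200/1100), with DISTINCT collapsing the per-row window results down to one row per "group1".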
Have you tried putting the DISTINCT at the bottom, like this?
(SUM(NVL(variance_cost,0)) / SUM(NVL(expected_cost,0))) "Variance",
COUNT(DISTINCT location) "Count of Locations"
FROM audit_table

T-SQL, rows where a specific column does not change over a date range?

I have a data table structured like so:
ID Date purchaseType
01 03-01-18 apple
01 04-01-18 apple
02 05-01-18 spinach
01 05-01-18 apple
02 06-01-18 spinach
02 07-01-18 apple
...
I want to look at all Id's where the purchase type was the same over 3 months. That is to say, the results I would get from the above table would be:
ID purchaseType Length(months)
01 apple 3
...
and ID=02 is not included, as in the third month, the purchase type was switched to apple from spinach. I hope this makes sense!
Edit: There is always a record per month and ID, and there should not be any duplicate records (that is to say, one purchase type per ID per month). It is always on the first of the month.
Edit2: I have tried doing something along the lines of
select Min(Date) as 'Min', max(date) as 'Max',ID,purchaseType
From someTableName
GROUP BY ID, purchaseType
but not sure where to take it from here
Edit 3: I don't need a specific date range. Just whether, for an ID X, there at any point existed a 3-month period where the purchase type did not change.
Here is my contribution, for requirements known to date.
(I needed to change ID as string to Idnum as int.)
CREATE TABLE #my_table
( IDnum INTEGER
, DATEp DATE
, PurchaseType VARCHAR(10) )
INSERT INTO #my_table VALUES( '01', '03-01-18', 'apple' );
INSERT INTO #my_table VALUES( '01', '04-01-18', 'apple' );
INSERT INTO #my_table VALUES( '02', '05-01-18', 'spinach' );
INSERT INTO #my_table VALUES( '01', '05-01-18', 'apple' );
INSERT INTO #my_table VALUES( '02', '06-01-18', 'spinach' );
INSERT INTO #my_table VALUES( '02', '07-01-18', 'apple' );
SELECT
M1.IDnum
, M1.PurchaseType
FROM #my_table AS M1
INNER JOIN #my_table AS M2
ON M1.IDnum = M2.IDnum
AND DATEADD(MONTH, 1, M1.DATEp) = M2.DATEp
INNER JOIN #my_table AS M3
ON M1.IDnum = M3.IDnum
AND DATEADD(MONTH, 2, M1.DATEp) = M3.DATEp
WHERE M1.PurchaseType = M2.PurchaseType
AND M1.PurchaseType = M3.PurchaseType
GROUP BY
M1.IDnum
, M1.PurchaseType
-- RESULT:
-- IDnum PurchaseType
-- 1 apple
Based on the comments, I understand the problem like this: you want to find 3 months or more with no gap and no product change. If a product was purchased, there will be an item in that month.
First you have gaps -- or hills and valleys. There is a trick to do this: you take two row numbers -- one that increments every month and another that increments within each value -- and the difference of these two gives you "groups". Then you look at the max and the min per group.
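A sketch of that two-row-number trick in T-SQL, using the column names from the question and the table name from the asker's attempt (assuming, per the edits, exactly one row per ID per month):

```sql
;with numbered as
(
select ID, [Date], purchaseType,
       -- increments once per month for the ID...
       row_number() over (partition by ID order by [Date]) as rn_all,
       -- ...and this one increments per (ID, purchaseType)
       row_number() over (partition by ID, purchaseType order by [Date]) as rn_type
from someTableName
)
select ID, purchaseType, count(*) as LengthMonths
from numbered
group by ID, purchaseType, rn_all - rn_type  -- constant within one unbroken run
having count(*) >= 3
```

For the sample data this keeps only (ID 01, apple, 3): patient 02's spinach run is only two months long, and its final apple month is a run of one.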
prior answer
SELECT DISTINCT ID, PURCHASE_TYPE
FROM (
SELECT ID, PURCHASE_TYPE,
ROW_NUMBER() OVER (PARTITION BY ID, PURCHASE_TYPE ORDER BY DATE) AS RN
FROM your_table_name_goes_here
) X
WHERE RN >= 3

How to check the overlapping time intervals from one type 2 SCD dimension

I have a problem identifying and fixing some records having overlapping time intervals, for an SCD type 2 dimension.
What I have is:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11-03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
I want to find (first) all the Johns having overlapping time periods, in a table with lots and lots of Johns, and then figure out a way to correct those overlapping periods. For the latter I know there are functions like LAG and LEAD which can handle that, but it eludes me how to find the overlaps in the first place.
Any hints?
Regards,
[ 1 ] The following query will return overlapping time ranges:
SELECT *,
(
SELECT *
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM #Dimension1 x
Full script:
CREATE TABLE #Dimension1 (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDate DATE NOT NULL,
endDate DATE NOT NULL,
CHECK(startDate < endDate)
);
INSERT #Dimension1
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL
SELECT 'John', 5, '2017-01-11', '2017-04-22';
SELECT *,
(
SELECT *
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM #Dimension1 x
Demo here
[ 2 ] In order to find distinct groups of time ranges with overlapping original rows, I would use the following approach:
-- Edit 1
CREATE TABLE #Groups (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDateNew DATE NOT NULL,
endDateNew DATE NOT NULL,
CHECK(startDateNew < endDateNew)
);
INSERT #Groups
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew
FROM #Dimension1 x
OUTER APPLY (
SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
) z
-- End of Edit 1
-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM #Groups a
) b
) c
OUTER APPLY (
SELECT d.Uid AS Overlapping_Uid
FROM #Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
) e
-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM #Groups a
) b
) c
OUTER APPLY (
SELECT (
SELECT d.Uid AS Overlapping_Uid
FROM #Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
FOR XML RAW, TYPE
) AS XmlCol
) e
Note: Last range used in my example is 'John', 5, '2017-01-11', '2017-04-22'; and not 'John', 5, '2017-01-10', '2017-04-22';. Also, data type used is DATE and not DATETIME[2][OFFSET].
I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.
We can think of four possible overlap scenarios:
|---------|              |---------|    no overlap

|---------|
      |---------|                       1st end and 2nd start overlap

      |---------|
|---------|                             1st start and 2nd end overlap

|---------|
   |---|                                2nd completely contained inside 1st
                                        (could be 1st inside 2nd also)
SELECT DISTINCT
t1.Uid
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.Uid <> t2.Uid AND
t1.startDate <= t2.endDate AND
t2.startDate <= t1.endDate
WHERE
t1.Bkey = 'John' AND t2.Bkey = 'John'
This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.
We can achieve this by doing a self-join of the emp table.
A.EMP_ID != B.EMP_ID ensures the same row is not joined with itself.
The remaining comparison clauses check whether either row's start date or end date falls within the other row's date range.
create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date);
insert into emp values('John', 1, '1990-01-01', '2017-01-10');
insert into emp values( 'John', 2, '2016-11-03', '2016-11-14');
insert into emp values( 'John', 3, '2016-11-14', '2016-12-29');
insert into emp values( 'John', 4, '2016-12-29', '2017-01-10');
insert into emp values( 'John', 5, '2017-01-11', '2017-04-22');
commit;
with A as (select * from EMP),
B as (select * from EMP)
select A.* from A,B where A.EMP_ID != B.EMP_ID
and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE
and (A.START_DATE between B.START_DATE and B.END_DATE
or A.END_DATE between B.START_DATE and B.END_DATE);

TSQL - Run date comparison for "duplicates"/false positives on initial query?

I'm pretty new to SQL and am working on pulling some data from several very large tables for analysis. The data is basically triggered events for assets on a system. The events all have a created_date (datetime) field that I care about.
I was able to put together the query below to get the data I need (YAY):
SELECT
event.efkey
,event.e_id
,event.e_key
,l.l_name
,event.created_date
,asset.a_id
,asset.asset_name
FROM event
LEFT JOIN asset
ON event.a_key = asset.a_key
LEFT JOIN l
ON event.l_key = l.l_key
WHERE event.e_key IN (350, 352, 378)
ORDER BY asset.a_id, event.created_date
However, while this gives me the data for the specific events I want, I still have another problem. Assets can trigger these events repeatedly, which can result in large numbers of "false positives" for what I'm looking at.
What I need to do is go through the result set of the query above and remove any events for an asset that occur closer than N minutes together (say 30 minutes for this example). So IF the asset_ID is the same AND the event.created_date is within 30 minutes of another event for that asset in the set THEN I want that removed. For example:
For the following records
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-01 12:35:31
a_id 1124 created 2016-02-01 12:40:33
a_id 1124 created 2016-02-01 12:45:42
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I'd want to return only:
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I tried referencing this and this but I can't make the concepts there work for me. I know I probably need to do a SELECT * FROM (my existing query) but I can't seem to do that without ending up with tons of "multi-part identifier can't be bound" errors (and I have no experience creating temp tables, my attempts at that have failed thus far). I also am not exactly sure how to use DATEDIFF as the date filtering function.
Any help would be greatly appreciated! If you could dumb it down for a novice (or link to explanations) that would also be helpful!
This is a trickier problem than it initially appears. The hard part is capturing the previous good row and removing the next bad rows but not allowing those bad rows to influence whether or not the next row is good. Here is what I came up with. I've tried to explain what is going on with comments in the code.
--sample data since I don't have your table structure and your original query won't work for me
create table #events
(
id int,
timestamp datetime
)
--note that I changed some of your sample data to test some different scenarios
insert into #events values( 1124, '2016-02-01 12:30:30')
insert into #events values( 1124, '2016-02-01 12:35:31')
insert into #events values( 1124, '2016-02-01 12:40:33')
insert into #events values( 1124, '2016-02-01 13:05:42')
insert into #events values( 1124, '2016-02-02 12:30:30')
insert into #events values( 1124, '2016-02-02 13:00:30')
insert into #events values( 1115, '2016-02-01 12:30:30')
--using a cte here to split the result set of your query into groups
--by id (you would want to partition by whatever criteria you use
--to determine that rows are talking about the same event)
--the row_number function gets the row number for each row within that
--id partition
--the over clause specifies how to break up the result set into groups
--(partitions) and what order to put the rows in within that group so
--that the numbering stays consistent
;with orderedEvents as
(
select id, timestamp, row_number() over (partition by id order by timestamp) as rn
from #events
--you would replace #events here with your query
)
--using a second recursive cte here to determine which rows are "good"
--and which ones are not.
, previousGoodTimestamps as
(
--this is the "seeding" part of the recursive cte where I pick the
--first rows of each group as being a desired result. Since they
--are the first in each group, I know they are good. I also assign
--their timestamp as the previous good timestamp since I know that
--this row is good.
select id, timestamp, rn, timestamp as prev_good_timestamp, 1 as is_good
from orderedEvents
where rn = 1
union all
--this is the recursive part of the cte. It takes the rows we have
--already added to this result set and joins those to the "next" rows
--(as defined by our ordering in the first cte). Then we output
--those rows and do some calculations to determine if this row is
--"good" or not. If it is "good" we set it's timestamp as the
--previous good row timestamp so that rows that come after this one
--can use it to determine if they are good or not. If a row is "bad"
--we just forward along the last known good timestamp to the next row.
--
--We also determine if a row is good by checking if the last good row
--timestamp plus 30 minutes is less than or equal to the current row's
--timestamp. If it is then the row is good.
select e2.id
, e2.timestamp
, e2.rn
, last_good_timestamp.timestamp
, case
when dateadd(mi, 30, last_good_timestamp.timestamp) <= e2.timestamp then 1
else 0
end
from previousGoodTimestamps e1
inner join orderedEvents e2 on e2.id = e1.id and e2.rn = e1.rn + 1
--I used a cross apply here to calculate the last good row timestamp
--once. I could have used two identical subqueries above in the select
--and case statements, but I would rather not duplicate the code.
cross apply
(
select case
when e1.is_good = 1 then e1.timestamp --if the last row is good, just use its timestamp
else e1.prev_good_timestamp --the last row was bad, forward on what it had for the last good timestamp
end as timestamp
) last_good_timestamp
)
select *
from previousGoodTimestamps
where is_good = 1 --only take the "good" rows
Links to MSDN for some of the more complicated things here:
CTEs and Recursive CTEs
CROSS APPLY
-- Sample data.
create table #Samples ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into #Samples ( A_Id, CreatedDate ) values
( 1124, '2016-02-01 12:30:30' ),
( 1124, '2016-02-01 12:35:31' ),
( 1124, '2016-02-01 12:40:33' ),
( 1124, '2016-02-01 12:45:42' ),
( 1124, '2016-02-02 12:30:30' ),
( 1124, '2016-02-02 13:00:30' ),
( 1125, '2016-02-01 12:30:30' );
select * from #Samples;
-- Calculate the windows of 30 minutes before and after each CreatedDate and check for conflicts with other rows.
with Ranges as (
select Id, A_Id, CreatedDate,
DateAdd( minute, -30, S.CreatedDate ) as RangeStart, DateAdd( minute, 30, S.CreatedDate ) as RangeEnd
from #Samples as S )
select Id, A_Id, CreatedDate, RangeStart, RangeEnd,
-- Check for a conflict with another row with:
-- the same A_Id value and an earlier CreatedDate that falls inside the +/-30 minute range.
case when exists ( select 42 from #Samples where A_Id = R.A_Id and CreatedDate < R.CreatedDate and R.RangeStart < CreatedDate and CreatedDate < R.RangeEnd ) then 1
else 0 end as Conflict
from Ranges as R;

Drop rows identified within moving time window

I have a dataset of hospitalisations ('spells') - 1 row per spell. I want to drop any spells recorded within a week after another (there could be multiple) - the rationale being that they're likely symptomatic of the same underlying cause. Here is some play data:
create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);
insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);
Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.
For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.
The issue is with patient 1. Spell date 9 is identified for dropping as it is close to spell date 5; however, as spell date 5 is close to spell date 1 it should be dropped, therefore allowing spell date 9 to live...
So, it seems a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to really picture how to do it. Can anyone help? I should add that I'm using Teradata which has more restrictions than most with recursive SQL (only UNION ALL sets allowed I believe).
It's cursor logic: check one row after the other to see if it fits your rules, so recursion is the easiest (maybe the only) way to solve your problem.
To get a decent performance you need a Volatile Table to facilitate this row-by-row processing:
CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn) AS
(
SELECT r.*
,ROW_NUMBER() -- needed to facilitate the join
OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
(
SELECT vt.*
,eventdate AS startdate
FROM vt
WHERE rn = 1 -- start with the first row
UNION ALL
SELECT vt.*
-- check if type = 1 or more than 7 days from the last eventdate
,CASE WHEN vt.eventdate > cte.startdate + 7
OR vt.exac_type = 1
THEN vt.eventdate -- new start date
ELSE cte.startdate -- keep old date
END
FROM vt JOIN cte
ON vt.patid = cte.patid
AND vt.rn = cte.rn + 1 -- proceed to next row
)
SELECT *
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate
I think the key to solving this is getting the first date more than 7 days from the current date and then doing a recursive subquery:
with recursive rrs as (
select s.*,
(select min(s2.eventdate)
from hif_user.rzb_recurse_src s2
where s2.patid = s.patid and
s2.eventdate > s.eventdate + 7
) as eventdate7
from hif_user.rzb_recurse_src s
),
cte as (
select patid, min(eventdate) as eventdate, min(eventdate7) as eventdate7
from rrs
group by patid
union all
select cte.patid, cte.eventdate7, rrs.eventdate7
from cte join
rrs
on rrs.patid = cte.patid and
rrs.eventdate = cte.eventdate7
)
select cte.patid, cte.eventdate
from cte;
If you want additional columns, then join in the original table at the last step.
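For instance, the final SELECT of the query above could be replaced with something like this to bring the type column back in (a sketch; it matches back on patid and eventdate):

```sql
select cte.patid, cte.eventdate, src.type
from cte
join hif_user.rzb_recurse_src src
on src.patid = cte.patid
and src.eventdate = cte.eventdate;
```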