Counting records given criteria - sql

I give up! I've been trying to make this work for some time now but I can't get the logic and/or code right.
What I'm trying to do is to count ongoing cases for every date in a defined period. I have a table looking basically like this:
CaseId OpenDate CloseDate
1 01JAN2014 05JAN2014
2 02JAN2014 04JAN2014
3 02JAN2014 .
4 03JAN2014 04JAN2014
5 06JAN2014 08JAN2014
6 07JAN2014 .
I created a data set iterating dates from today-30 to today as comparative dates (CompDate).
Definition of ongoing case is (CompDate <= OpenDate and CompDate < CloseDate) or CloseDate = .
My plan was to join the dates with the table and get something like this
CompDate OngoingCases
01JAN2014 1
02JAN2014 3
03JAN2014 4
04JAN2014 2
05JAN2014 1
06JAN2014 2
07JAN2014 3
So far I've come up with this code, which gives me something else:
proc sql;
create table Ongoing as
select distinct
a.CompDate,
count(distinct case when (a.CompDate <= datepart(b.caseopendate) and (a.CompDate < datepart(b.caseclosedate) or b.caseclosedate = . )) then b.caseid end) as Cases
from List_of_dates as a
left outer join dcms_cases as b
on a.Date
where
.
.
group by a.Date
;
quit;

I think all you need is to join the two tables, applying your date condition in the ON clause. The following SQL will do the trick:
proc sql;
create table Ongoing as
select a.CompDate, count(b.CaseId) as OngoingCases
from List_of_dates a
left join dcms_cases b
on b.OpenDate<=a.CompDate and (b.CloseDate>a.CompDate or b.CloseDate=.)
group by a.CompDate
;
quit;
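As a rough check of the join above, here is a minimal sketch that runs the same ON condition against the question's sample data. It is not SAS: it uses Python's built-in sqlite3, stores dates as ISO strings, and uses NULL where SAS has a missing value (.). The names `SAMPLE_CASES` and `count_ongoing` are made up for the illustration.

```python
import sqlite3
from datetime import date, timedelta

# Sample rows from the question; None stands in for the SAS missing value (.)
SAMPLE_CASES = [
    (1, "2014-01-01", "2014-01-05"),
    (2, "2014-01-02", "2014-01-04"),
    (3, "2014-01-02", None),
    (4, "2014-01-03", "2014-01-04"),
    (5, "2014-01-06", "2014-01-08"),
    (6, "2014-01-07", None),
]

def count_ongoing(cases, first_day, last_day):
    """Return [(CompDate, OngoingCases)] for every day from first_day to last_day."""
    con = sqlite3.connect(":memory:")
    con.execute("create table dcms_cases (CaseId int, OpenDate text, CloseDate text)")
    con.execute("create table List_of_dates (CompDate text)")
    con.executemany("insert into dcms_cases values (?, ?, ?)", cases)
    n_days = (last_day - first_day).days + 1
    con.executemany(
        "insert into List_of_dates values (?)",
        [((first_day + timedelta(days=d)).isoformat(),) for d in range(n_days)],
    )
    rows = con.execute(
        """
        select a.CompDate, count(b.CaseId) as OngoingCases
        from List_of_dates a
        left join dcms_cases b
          on b.OpenDate <= a.CompDate
         and (b.CloseDate > a.CompDate or b.CloseDate is null)
        group by a.CompDate
        order by a.CompDate
        """
    ).fetchall()
    con.close()
    return rows

result = count_ongoing(SAMPLE_CASES, date(2014, 1, 1), date(2014, 1, 7))
for comp_date, ongoing in result:
    print(comp_date, ongoing)
```

Because the condition sits in the ON clause of a left join, a date with no ongoing cases still appears in the output with a count of 0.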

I made a few assumptions:
1) You meant that the comp date is on or after the open date and before the close date, or that the close date is missing.
2) You want the final result set to show how many ongoing cases there were per open date rather than per comp date, since by your description the comp date would be one static value.
The code below uses your sample data, stored in a data set named 'AA1'.
Data AA1;
set AA1;
Format CompDate date9. Ongoing $3.;
CompDate = today() - 100; /*Change this to whatever your criteria is for the comparison*/
If ((CompDate >= opendate) and (CompDate < closedate)) or (closedate = .) then Ongoing = 'Y'; else Ongoing = 'N';
run;
/*Sort table in order to do a count of the ongoing cases*/
proc sort data = AA1;
by opendate caseID; /* the BY opendate step below needs the data sorted by opendate */
run;
/*Count how many ongoing cases exist based on the Open date values */
data AA1(rename= (count=OngoingCases));
set AA1;
count + 1;
by opendate;
if first.opendate then count = 1;
run;
/*Clean up to keep the variables you want in your result set */
data final;
set AA1(keep=opendate ongoingCases);
run;

One way is to iterate over your date range for every record in your dataset, and output records which satisfy your criteria...
%LET START = today()-30 ;
%LET END = today() ;
data want ;
set have ;
do CompDate = &START to &END ;
OngoingCase = (OpenDate <= CompDate < CloseDate)
or (OpenDate <= CompDate and missing(CloseDate)) ;
if OngoingCase then output ;
end ;
format CompDate date9. ;
run ;
proc summary data=want nway ;
class CompDate ;
var OngoingCase ;
output out=case_sum (drop=_:) sum= ;
run ;
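The same loop-then-summarise idea can be sketched outside SAS. This is only an illustration in plain Python: `ongoing_by_day` is a hypothetical helper, `None` plays the role of a missing CloseDate, and the rows are the question's sample data.

```python
from collections import Counter
from datetime import date, timedelta

def ongoing_by_day(cases, start, end):
    """cases: iterable of (open_date, close_date_or_None). Returns {day: count}."""
    counts = Counter()
    for open_date, close_date in cases:
        day = start
        while day <= end:
            # Ongoing: opened on or before the day and not closed yet (or never closed)
            if open_date <= day and (close_date is None or day < close_date):
                counts[day] += 1
            day += timedelta(days=1)
    return dict(counts)

cases = [
    (date(2014, 1, 1), date(2014, 1, 5)),
    (date(2014, 1, 2), date(2014, 1, 4)),
    (date(2014, 1, 2), None),
    (date(2014, 1, 3), date(2014, 1, 4)),
    (date(2014, 1, 6), date(2014, 1, 8)),
    (date(2014, 1, 7), None),
]
summary = ongoing_by_day(cases, date(2014, 1, 1), date(2014, 1, 7))
for day in sorted(summary):
    print(day.isoformat(), summary[day])
```

As with the data step above, a day on which nothing is ongoing produces no row at all, so it is simply absent from the summary rather than shown as 0.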

I modified the current query slightly and moved the CASE condition to the WHERE clause.
proc sql;
create table Ongoing as
select distinct
a.CompDate,
count(distinct b.caseid ) as Cases
from List_of_dates as a
left outer join dcms_cases as b
on a.Date
where (a.CompDate <= datepart(b.caseopendate) and a.CompDate < datepart(b.caseclosedate))
or b.caseclosedate = .
group by a.Date
;
quit;

Related

SQL Rowwise comparison between groups

Question
The following is a snippet of my data:
Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);
Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');
db<>fiddle.
I want to use T-SQL to get a count of how many persons fulfil the following criteria at least once - multiples should also count as one:
For a person:
One of the dates in 'started' (say s1) is larger than at least one of the dates in 'stopped' (say e1)
s1 and e1 are in the same year, to be set manually - e.g. '2021-01-01' until '2022-01-01'
Example expected response
If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:
s1 = '2016-10-11', and e1 = '2016-10-10'.
Why can't I do this myself
The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.
Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.
select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01'
then started end) >
min(case when stopped >= '2016-01-01' and stopped < '2017-01-01'
then stopped end);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833
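To try the conditional aggregation with different date ranges, a small sketch follows. It assumes SQLite (via Python's sqlite3) instead of SQL Server and stores the dates as ISO strings; `EMPS` and `persons_with_restart` are names invented for the example. The count the question asks for is the length of the returned list.

```python
import sqlite3

# Sample rows from the question
EMPS = [
    ("p1", "2015-10-10", "2016-10-10"),
    ("p1", "2016-10-11", "2017-10-11"),
    ("p1", "2017-10-12", "2018-10-13"),
    ("p2", "2019-11-13", "2019-11-13"),
    ("p2", "2019-11-14", "2020-10-14"),
    ("p3", "2020-07-15", "2021-08-15"),
    ("p3", "2021-08-16", "2022-08-16"),
]

def persons_with_restart(rows, range_start, range_end):
    """Persons whose latest start in [range_start, range_end) is later than
    their earliest stop in the same range."""
    con = sqlite3.connect(":memory:")
    con.execute("create table emps (person text, started text, stopped text)")
    con.executemany("insert into emps values (?, ?, ?)", rows)
    found = con.execute(
        """
        select person
        from emps
        group by person
        having max(case when started >= :lo and started < :hi then started end)
             > min(case when stopped >= :lo and stopped < :hi then stopped end)
        order by person
        """,
        {"lo": range_start, "hi": range_end},
    ).fetchall()
    con.close()
    return [person for (person,) in found]

matches = persons_with_restart(EMPS, "2016-01-01", "2017-01-01")
print(matches, len(matches))
```

A person with no start or no stop inside the range gets NULL on that side of the comparison, so the HAVING condition is not true and the person is left out.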
I would use a correlated EXISTS subquery against the same table (a semi-join); it should be pretty much the most performant, all things being equal.
select Count(distinct e.person)
from emps e
where e.started >= '20160101' and e.started < '20170101'
and exists (
select * from emps e2
where e2.person = e.person
and e2.stopped < e.started
and e2.stopped >= '20160101' and e2.stopped < '20170101'
);
You said you plan to set the dates manually, so this works by filtering the start dates in one CTE and the end dates in another. We then calculate the min/max for each and use them in the WHERE clause of the final query.
with min_max_start as (
select person,
min(started) as min_start, --obsolete
max(started) as max_start
from emps
where started >= '2016-01-01' and started < '2017-01-01'
group by person
),
min_max_end as (
select person,
min(stopped) as min_stop,
max(stopped) as max_stop --obsolete
from emps
where stopped >= '2016-01-01' and stopped < '2017-01-01'
group by person
)
select count(distinct e.person)
from emps e
join min_max_start mms
on e.person = mms.person
join min_max_end mme
on e.person = mme.person
where mms.max_start> mme.min_stop
Output: 1
Try the following:
With CTE as
(
Select D.person, D.started, T.stopped,
case
when Year(D.started) = Year(T.stopped) and D.started > T.stopped
then 1
else 0
end as chk
From
(Select person, started From Emps Where started >= '2016-01-01') D
Join
(Select person, stopped From Emps Where stopped < '2017-01-01') T
On D.person = T.person
)
Select Count(Distinct person) as CNT
From CTE
Where chk = 1;
To get the employee list who met the criteria use the following on the CTE instead of the above Select Count... query:
Select person, started, stopped
From CTE
Where chk = 1;
See a demo from db<>fiddle.

SQL repetitive code in where clause, how to insert the whole where into variable

I write a lot of queries with the same WHERE clause. I wish I could create a variable to insert each time for a query.
My query:
select distinct order_external_status
from analytics.dwh_orders_details dod
where dod.merchant_id = 7797
and order_type = 'pre_live'
and order_date >= '2019-09-10' and order_date <= '2019-09-24';
Next query with the same WHERE:
select dod.order_id,
oc.*
from analytics.dwh_orders_details dod
left join analytics.dwh_oc_all_details oc
on dod.order_id = oc.order_id
where dod.merchant_id = 7797
and order_type = 'pre_live'
and order_date >= '2019-09-10' and order_date <= '2019-09-24';
I can have 10 to 15 queries like that in a day. It would be nice if I could put the WHERE clause in a variable and write it just once. For now we use Redshift; we will move to Snowflake soon, if it matters.
The database does not allow me to create views or temp tables...
You can create a view and use the view in your queries:
create view v_myview as
select dod.*
from analytics.dwh_orders_details dod
where dod.merchant_id = 7797 and
dod.order_type = 'pre_live' and
dod.order_date >= '2019-09-10' and
dod.order_date <= '2019-09-24';
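Here is a small sketch of how the view would be reused by both queries from the question. It runs on SQLite through Python's sqlite3 rather than Redshift, drops the `analytics.` schema prefix, and the rows and the `note` column are invented purely for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table dwh_orders_details (
    order_id int, merchant_id int, order_type text,
    order_date text, order_external_status text);
create table dwh_oc_all_details (order_id int, note text);

insert into dwh_orders_details values
    (1, 7797, 'pre_live', '2019-09-12', 'approved'),
    (2, 7797, 'pre_live', '2019-09-30', 'declined'),  -- outside the date range
    (3, 7797, 'live',     '2019-09-15', 'approved'),  -- wrong order_type
    (4, 1234, 'pre_live', '2019-09-15', 'pending'),   -- wrong merchant
    (5, 7797, 'pre_live', '2019-09-24', 'pending');
insert into dwh_oc_all_details values (1, 'a'), (5, 'b'), (3, 'c');

-- The shared WHERE clause is written once, inside the view
create view v_myview as
select dod.*
from dwh_orders_details dod
where dod.merchant_id = 7797
  and dod.order_type = 'pre_live'
  and dod.order_date >= '2019-09-10'
  and dod.order_date <= '2019-09-24';
""")

# First query from the question, now without its own WHERE clause
statuses = sorted(
    row[0] for row in con.execute("select distinct order_external_status from v_myview")
)

# Second query from the question, joining from the view instead of the base table
joined = con.execute(
    """
    select v.order_id, oc.note
    from v_myview v
    left join dwh_oc_all_details oc on v.order_id = oc.order_id
    order by v.order_id
    """
).fetchall()

print(statuses)
print(joined)
```

Changing the merchant or the date range then means recreating one view instead of editing every query.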

correct count of grouped results

I have a procedure:
ALTER PROCEDURE [dbo].[GetActualFeedbackQueueTree]
@dtNow datetime
as
BEGIN
select
count(f.Id) as [Total],
f.AccountCode,
f.AccountName,
f.Utc,
f2.CityCode,
f2.CityName
from
InnerPortal.Feedback.QueueFeedback f
left join
InnerPortal.Feedback.QueueFeedback f2
on
f2.AccountCode = f.AccountCode
where
(f.Done is null or f.Done = 0) and
(f.Busy is NULL or f.Busy = 0) and
((DATEPART(hour, DATEADD(HOUR, f.Utc, @dtNow)) >= 9 ) and
(DATEPART(hour, DATEADD(HOUR, f.Utc, @dtNow)) <= 20))
group by
f.AccountCode, f2.CityCode,
f2.CityName, f.AccountName, f.Utc
END
I group rows by AccountName and by CityName, so the result is something like a tree. The problem is that [Total] is not calculated correctly.
When I run a select for a specific AccountCode, the count is much lower than what the procedure returns. For example:
select count(f.Id) from Feedback.QueueFeedback f where f.AccountCode = '01507'
returns 16, but the procedure result is 256.
The goal is to get the count of collected rows per account. How can I make it work correctly?
Thanks.
Software: T-Sql, Ms Sql server 2012
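Pretty sure you want
count(distinct f.Id)
The self-join on AccountCode pairs every row of an account with every row of the same account, which would explain 16 rows turning into 16 x 16 = 256.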
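The effect of the self-join is easy to reproduce on a toy table. The sketch below assumes SQLite via Python's sqlite3 and a cut-down, hypothetical QueueFeedback table with four rows for one account, so the inflated count is 4 x 4 = 16 instead of the question's 256.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table QueueFeedback (Id int, AccountCode text, CityCode text)")
con.executemany(
    "insert into QueueFeedback values (?, ?, ?)",
    [(i, "01507", "C1") for i in range(1, 5)],  # 4 rows for a single account
)

# Same join shape as the procedure: every f row matches all f2 rows of its account
plain, distinct = con.execute(
    """
    select count(f.Id), count(distinct f.Id)
    from QueueFeedback f
    left join QueueFeedback f2 on f2.AccountCode = f.AccountCode
    group by f.AccountCode, f2.CityCode
    """
).fetchone()

print(plain, distinct)  # 16 4
```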

Better way to calculate utilisation

I have a rather complicated (and very inefficient) way of getting utilisation from a large list of periods (Code below).
Currently I'm running this for a period of 8 weeks and it's taking between 30 and 40 seconds to return data.
I need to run this regularly for periods of 6 months, 1 year and two years which will obviously take a massive amount of time.
Is there a smarter way to run this query to lower the number of table scans?
I have tried several ways of joining the data, all seem to return junk data.
I've tried to comment the code as much as I can but if anything is unclear let me know.
Table Sizes:
[Stock] ~12,000 records
[Contitems] ~90,000 records
Pseudocode for clarity:
For each week between Start and End:
Get list of unique items active between dates (~12,000 rows)
For each unique item
Loop through ContItems table (~90,000 rows)
Return matches
Group
Group
Return results
The Code
DECLARE @WEEKSTART DATETIME; -- Used to pass start of period to search
DECLARE @WEEKEND DATETIME; -- Used to pass end of period to search
DECLARE @DC DATETIME; -- Used to increment dates
DECLARE @INT INT; -- days to increment for each iteration (7 = weeks)
DECLARE @TBL TABLE(DT DATETIME, SG VARCHAR(20), SN VARCHAR(50), TT INT, US INT); -- Return table
SET @WEEKSTART = '2012-05-01'; -- Set start of period
SET @WEEKEND = '2012-06-25'; -- Set end of period
SET @DC = @WEEKSTART; -- Start counter at first date
SET @INT = 7; -- Set increment to weeks
WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
SET @DC = DATEADD(D,@INT,@DC); -- Add 7 days to the counter
INSERT INTO @TBL (DT, SG, SN, TT, US) -- Insert results from subquery into return table
SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
FROM
(
SELECT STK.GRPCODE, 1 AS TOTSTK, CASE (SELECT COUNT(*)
FROM ContItems -- Contains list of hires with a start and end date
WHERE STK.ITEMNO = ContItems.ITEMNO -- unique item reference
AND ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) -- Hires starting before end of week searching
AND (ContItems.DOCDATE#5 >= @DC -- Hires ending after start of week searching
OR ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000')) -- Or hire is still active
WHEN 0 THEN 0 -- None found return zero
WHEN NULL THEN 0 -- NULL return zero
ELSE 1 END AS USED -- Otherwise return 1
FROM Stock STK -- List of unique items
WHERE [UNIQUE] = 1 AND [TYPE] != 4 -- Business rules
AND DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid between selected week
) SUB
INNER JOIN SubGrp -- Used to get 'pretty' names
ON SUB.GRPCODE = SubGrp.CODE
GROUP BY SUB.GRPCODE, SubGrp.NAME
END
-- Next section gets data from temp table
SELECT SG, SN, SUM(TT) AS TOT, SUM(US) AS USED, CAST(SUM(US) AS FLOAT) / CAST(SUM(TT) AS FLOAT) AS UTIL
FROM @TBL
GROUP BY SG, SN
ORDER BY TOT DESC
I have two suggestions.
First, rewrite the query to move the "select" statement from the case statement to the from clause:
SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
FROM (SELECT STK.GRPCODE, 1 AS TOTSTK,
(CASE WHEN ContGrp.cnt > 0 THEN 1 -- At least one matching hire: return 1
ELSE 0 -- None found (cnt is NULL after the outer join): return zero
END) AS USED
FROM Stock STK left outer join -- List of unique items
(SELECT itemno, COUNT(*) as cnt
FROM ContItems -- Contains list of hires with a start and end date
WHERE ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) AND -- Hires starting before end of week searching
(ContItems.DOCDATE#5 >= @DC OR -- Hires ending after start of week searching
ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000' -- Or hire is still active
)
group by ITEMNO
) ContGrp
on STK.ITEMNO = ContGrp.ITEMNO
WHERE [UNIQUE] = 1 AND [TYPE] != 4 AND -- Business rules
DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid between selected week
) SUB INNER JOIN SubGrp -- Used to get 'pretty' names
ON SUB.GRPCODE = SubGrp.CODE
GROUP BY SUB.GRPCODE, SubGrp.NAME
In doing this, I found something suspicious. The case statement is operating at the level of "ItemNo", but the grouping is by "GrpCode". So, the "Count(*)" is really returning the sum at the group level. Is this what you intend?
The second is to dispense with the WHILE loop, if you have multiple weeks. To do this, you just need to convert DatePurch to an appropriate week. However, if the code usually runs on just one or two weeks, this effort may not help very much.
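One possible way to drop the WHILE loop is to generate the week boundaries as rows and join the stock to them, so all weeks are computed in a single statement. The sketch below shows that idea only under heavy simplifications: it runs on SQLite via Python's sqlite3, uses NULL instead of the '1899-12-30' sentinel, renames DOCDATE#5 to DOCDATE5, leaves out the [UNIQUE]/[TYPE] business rules and the SubGrp lookup, treats each week as the half-open interval [wk_start, wk_end), and uses invented sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Stock (ITEMNO int, GRPCODE text, DATEPURCH text, DATESOLD text);
create table ContItems (ITEMNO int, DELDATE text, DOCDATE5 text);

insert into Stock values
    (1, 'G1', '2012-01-01', null),
    (2, 'G1', '2012-01-01', null),
    (3, 'G2', '2012-05-10', null);     -- bought during the second week
insert into ContItems values
    (1, '2012-05-02', '2012-05-04'),   -- hire that starts and ends in week 1
    (2, '2012-05-07', null);           -- hire still active (no end date)
""")

UTILISATION_SQL = """
with recursive weeks(wk_start, wk_end) as (
    select :first, date(:first, '+7 days')
    union all
    select wk_end, date(wk_end, '+7 days') from weeks where wk_end < :last
)
select wk_start, GRPCODE, count(*) as TOT, sum(used) as USED
from (
    select w.wk_start, s.GRPCODE,
           -- 1 if the item has at least one hire overlapping the week, else 0
           exists (
               select 1 from ContItems c
               where c.ITEMNO = s.ITEMNO
                 and c.DELDATE < w.wk_end
                 and (c.DOCDATE5 is null or c.DOCDATE5 >= w.wk_start)
           ) as used
    from weeks w
    join Stock s
      on s.DATEPURCH < w.wk_start
     and (s.DATESOLD is null or s.DATESOLD >= w.wk_end)
)
group by wk_start, GRPCODE
order by wk_start, GRPCODE
"""

rows = con.execute(
    UTILISATION_SQL, {"first": "2012-05-01", "last": "2012-05-22"}
).fetchall()
for row in rows:
    print(row)
```

Whether this is faster than the loop on the real tables depends on the indexes on ITEMNO and the date columns, so it would need to be measured.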
Well, start by replacing the DATEADD calls in the WHERE clauses.
You already have
SET @DC = DATEADD(D,@INT,@DC);
Why not declare another local variable for the deletion date:
WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
SET @DC = DATEADD(D,@INT,@DC);
DECLARE @DeletionDate DATETIME = DATEADD(MS,-2,DATEADD(D,@INT,@DC));
And use it in the case statement:
CASE (SELECT COUNT(*) .... AND ContItems.DELDATE <= @DeletionDate ....
And also in the outer where clause...
Then you need to make sure that you have correctly indexed your tables.

Comparing Two Sets of Date Ranges in SQL

I have two sets of data with different date ranges.
Tbl 1:
ID, Date_Start, Date_End
1, 2010-01-01, 2010-01-09
1, 2010-01-10, 2010-01-19
1, 2010-01-30, 2010-01-31
Tbl 2:
ID, Date_Start, Date_End
1, 2010-01-01, 2010-01-04
1, 2010-01-08, 2010-01-17
1, 2010-01-30, 2010-01-31
I'd like to find the cases where date ranges in Tbl 1 are not entirely covered by date ranges in Tbl 2. So for instance, in this example, I'd like output that looks something like this --
Output:
ID, Gap_Start, Gap_End
1, 2010-01-05, 2010-01-07
1, 2010-01-18, 2010-01-19
Date ranges will never overlap within a table. To do this, I'm using either DB2 SQL or SAS. Unfortunately, the datasets are big enough (millions of records) that I can't just brute force it.
Thank you!
Following on from Jon of All Trades' approach, this is a more complete solution. The crucial features are:
Use an auxiliary calendar table, which is just a list of all dates.
From the calendar table, JOIN to Tbl1 to get a list of dates which are in range.
Also do an anti-JOIN to Tbl2 to get only the dates which aren't in Tbl2's ranges.
I've enclosed those results in a Common Table Expression (CTE) called OutDates.
Define another CTE based on OutDates to get just the dates which start a gap; call this EarliestDates.
Define another CTE based on OutDates to get just the dates which end a gap; call this LatestDates.
JOIN EarliestDates and LatestDates to put each gap into a single row.
WITH
OutDates(ID, dt) AS
( SELECT Tbl1.ID, Calendar.dt FROM Calendar
INNER JOIN Tbl1 ON Calendar.dt BETWEEN Tbl1.Date_Start AND Tbl1.Date_End
LEFT OUTER JOIN Tbl2 ON Tbl2.ID = Tbl1.ID -- compare ranges only within the same ID
AND Calendar.dt BETWEEN Tbl2.Date_Start AND Tbl2.Date_End
WHERE Tbl2.ID IS NULL
)
,
EarliestDates AS
( SELECT earliest.ID, earliest.dt FROM OutDates earliest
LEFT OUTER JOIN OutDates nonesuch_earlier ON nonesuch_earlier.ID = earliest.ID
AND DATEADD(day, -1, earliest.dt) = nonesuch_earlier.dt
WHERE nonesuch_earlier.ID IS NULL
)
,
LatestDates AS
( SELECT latest.ID, latest.dt FROM OutDates latest
LEFT OUTER JOIN OutDates nonesuch_later ON nonesuch_later.ID = latest.ID
AND DATEADD(day, 1, latest.dt) = nonesuch_later.dt
WHERE nonesuch_later.ID IS NULL
)
SELECT rangestart.ID, rangestart.dt AS Gap_Start, rangeend.dt AS Gap_End
FROM EarliestDates rangestart JOIN LatestDates rangeend
ON rangestart.ID = rangeend.ID AND rangestart.dt <= rangeend.dt
LEFT OUTER JOIN EarliestDates nonesuch_inner1
ON nonesuch_inner1.ID = rangestart.ID AND nonesuch_inner1.dt <= rangeend.dt AND nonesuch_inner1.dt > rangestart.dt
LEFT OUTER JOIN LatestDates nonesuch_inner2
ON nonesuch_inner2.ID = rangestart.ID AND nonesuch_inner2.dt >= rangestart.dt AND nonesuch_inner2.dt < rangeend.dt
WHERE nonesuch_inner1.dt IS NULL AND nonesuch_inner2.dt IS NULL
This is a working implementation using SQL Server syntax for the common table expressions, but it should be easy to convert to DB2 syntax. To be honest, I don't know how well it will scale; I've only tested it with a very small dataset.
I don't think there is an efficient and general solution for all cases. Under certain circumstances, however, we can figure out some efficient ones. For instance, the code below assumes that: (1) datasets one and two have the same set of ids in the same order; and (2) the range of possible dates is relatively short (assumed here to be all the dates in the year 2010 only). Notice that one input range may generate two gaps.
/* test data */
data one;
input id1 (start1 finish1) (:anydtdte.);
format start1 finish1 e8601da.;
cards;
1 2010-01-01 2010-01-09
1 2010-01-10 2010-01-19
1 2010-01-30 2010-01-31
2 2010-01-02 2010-01-10
;
run;
data two;
input id2 (start2 finish2) (:anydtdte.);
format start2 finish2 e8601da.;
cards;
1 2010-01-01 2010-01-04
1 2010-01-08 2010-01-17
1 2010-01-30 2010-01-31
2 2010-01-05 2010-01-06
;
run;
/* assumptions:
(1) datasets one and two have the same set of ids in the same
sorted order;
(2) only possible dates are in the year of 2010
*/
%let minDate = %sysevalf('01jan2010'd - 1);
%let maxDate = %sysevalf('31dec2010'd + 1);
data gaps;
array inRange[&minDate:&maxDate] _temporary_;
array covered[&minDate:&maxDate] _temporary_;
do i = &minDate to &maxDate; inRange[i] = 0; covered[i] = 0; end;
do until (last.id1);
set one;
by id1;
do i = start1 to finish1; inRange[i] = 1; end;
end;
do until (last.id2);
set two;
by id2;
do i = start2 to finish2; covered[i] = 1; end;
end;
format startGap finishGap e8601da.;
startGap = .;
finishGap = .;
do i = &minDate+1 to &maxDate;
if inRange[i] and not covered[i] and missing(startGap) then startGap = i;
if (covered[i] or not inRange[i]) and not missing(startGap) and not covered[i-1] then do;
finishGap = i - 1;
output;
call missing(startGap, finishGap);
keep id1 startGap finishGap;
end;
end;
run;
/* check */
proc print data=gaps noobs;
run;
/* on lst
id1 startGap finishGap
1 2010-01-05 2010-01-07
1 2010-01-18 2010-01-19
2 2010-01-02 2010-01-04
2 2010-01-07 2010-01-10
*/
This is not a complete solution, as it returns a list of dates rather than ranges, but maybe it will be of use:
SELECT
R1.ID, D.Date
FROM
#Ranges1 AS R1
INNER JOIN Dates AS D ON D.Date BETWEEN R1.StartDate AND R1.EndDate
EXCEPT
SELECT
R2.ID, D.Date
FROM
#Ranges2 AS R2
INNER JOIN Dates AS D ON D.Date BETWEEN R2.StartDate AND R2.EndDate
Note that this solution requires a dates table: a table with one record per day, for all the dates you're likely to use. It has the advantages of being succinct, and handling overlapping date ranges (not necessary in your case, but maybe for the next guy).
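To go from the list of dates to the ranges the question asks for, the EXCEPT result can be collapsed into runs of consecutive days. The sketch below is one way to do that, assuming SQLite via Python's sqlite3 in place of the original database, ISO date strings, tables named Ranges1/Ranges2/Dates, and two hypothetical helpers, `uncovered_dates` and `to_ranges`. The data is the question's sample.

```python
import sqlite3
from datetime import date, timedelta

TBL1 = [(1, "2010-01-01", "2010-01-09"), (1, "2010-01-10", "2010-01-19"),
        (1, "2010-01-30", "2010-01-31")]
TBL2 = [(1, "2010-01-01", "2010-01-04"), (1, "2010-01-08", "2010-01-17"),
        (1, "2010-01-30", "2010-01-31")]

def uncovered_dates(tbl1, tbl2, cal_start, cal_end):
    """Dates covered by tbl1 but not by tbl2, as sorted (id, iso_date) rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table Ranges1 (ID int, StartDate text, EndDate text);
        create table Ranges2 (ID int, StartDate text, EndDate text);
        create table Dates (Date text);
    """)
    con.executemany("insert into Ranges1 values (?, ?, ?)", tbl1)
    con.executemany("insert into Ranges2 values (?, ?, ?)", tbl2)
    n_days = (cal_end - cal_start).days + 1
    con.executemany(
        "insert into Dates values (?)",
        [((cal_start + timedelta(days=i)).isoformat(),) for i in range(n_days)],
    )
    rows = con.execute("""
        select R1.ID, D.Date
        from Ranges1 as R1
        inner join Dates as D on D.Date between R1.StartDate and R1.EndDate
        except
        select R2.ID, D.Date
        from Ranges2 as R2
        inner join Dates as D on D.Date between R2.StartDate and R2.EndDate
        order by 1, 2
    """).fetchall()
    con.close()
    return rows

def to_ranges(rows):
    """Collapse sorted (id, iso_date) rows into (id, gap_start, gap_end) runs."""
    ranges = []
    for id_, iso in rows:
        day = date.fromisoformat(iso)
        if ranges and ranges[-1][0] == id_ and ranges[-1][2] + timedelta(days=1) == day:
            ranges[-1][2] = day             # the day extends the current run
        else:
            ranges.append([id_, day, day])  # the day starts a new run
    return [(i, start.isoformat(), end.isoformat()) for i, start, end in ranges]

gaps = to_ranges(uncovered_dates(TBL1, TBL2, date(2010, 1, 1), date(2010, 12, 31)))
for gap in gaps:
    print(gap)
```

The collapsing step is done in Python here only to keep the SQL identical to the query above; with millions of records it would be worth doing it in the database instead.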
For what it's worth, this is the method I ended up using. I think you could do it in pure SQL, but it got horrifically ugly and difficult to debug.
Step 1 -- I consolidated the date ranges in both datasets. This means that something like
ID, Start_Date, End_Date
1, 2010-01-01, 2010-01-31
1, 2010-02-01, 2010-02-28
got transformed into this --
ID, Start_Date, End_Date
1, 2010-01-01, 2010-02-28.
The query I used to produce this was --
WITH Cte_recomb (Id, Start_date, End_date, Hopcount) AS
(SELECT Id,
Start_date,
End_date,
1 AS Hopcount
FROM Table1
UNION ALL
SELECT Cte_recomb.Id,
Cte_recomb.Start_date,
Table1.End_date,
(Cte_recomb.Hopcount + 1) AS Hopcount
FROM Cte_recomb, Table1
WHERE (Cte_recomb.Id = Table1.Id) AND
(Cte_recomb.End_date + 1 day = Table1.Start_date)),
Cte_maxenddate AS
(SELECT Id,
Start_date,
Max (End_date) AS End_date
FROM Cte_recomb
GROUP BY Id, Start_date
ORDER BY Id, Start_date)
SELECT Maxend.*
FROM Cte_maxenddate AS Maxend
LEFT JOIN
Cte_recomb AS Nextrec
ON (Nextrec.Id = Maxend.Id) AND
(Nextrec.Start_date < Maxend.Start_date) AND
(Nextrec.End_date >= Maxend.End_date)
WHERE Nextrec.Id IS NULL;
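For comparison, the consolidation in Step 1 can also be sketched as a sort-and-sweep outside the database. This is a different technique from the recursive query above, shown only to make the intended result concrete; `consolidate` is a hypothetical helper and it relies on the stated fact that ranges never overlap within a table.

```python
from datetime import date, timedelta

def consolidate(ranges):
    """Merge ranges of the same id where one ends the day before the next starts.

    ranges: iterable of (id, start_date, end_date), non-overlapping within an id.
    """
    merged = []
    for id_, start, end in sorted(ranges):
        last = merged[-1] if merged else None
        if last and last[0] == id_ and last[2] + timedelta(days=1) == start:
            last[2] = end                    # adjacent: extend the previous range
        else:
            merged.append([id_, start, end])  # gap or new id: start a new range
    return [tuple(r) for r in merged]

example = consolidate([
    (1, date(2010, 2, 1), date(2010, 2, 28)),
    (1, date(2010, 1, 1), date(2010, 1, 31)),
])
print(example)
```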
Step 2 --
I produced another dataset that created a record for every overlap between the two datasets. You'll need an additional step to find cases where a given record in Table1 doesn't have a corresponding record in Table2 at all.
SELECT Table1.Id,
Table1.Start_date AS Table1_start_date,
Table1.End_date AS Table1_end_date,
Table2.Start_date AS Table2_start_date,
Table2.End_date AS Table2_end_date
FROM Table1
INNER JOIN
Table2
ON (Table1.Id = Table2.Id) AND
( (Table1.Start_date BETWEEN Table2.Start_date AND Table2.End_date) OR
(Table2.Start_date BETWEEN Table1.Start_date AND Table1.End_date)) AND
( (Table1.Start_date <> Table2.Start_date) OR
(Table1.End_date <> Table2.End_date))
ORDER BY Table1.Id, Table1.Start_date, Table2.Start_date;
Step 3 --
I take the above dataset, and run the following SAS job. I tried to do this in pure SQL with recursive queries, but it got uglier and uglier every time I looked at it.
Data Table1_Gaps;
Set Table1_Compare;
By ID Table1_Start_Date Table2_Start_Date;
format Gap_Start_Date yymmdd10.;
format Gap_End_Date yymmdd10.;
format Old_Start_Date yymmdd10.;
format Old_End_Date yymmdd10.;
Retain Old_Start_Date Old_End_Date;
IF (Table2_End_Date = .) then do;
Gap_Start_Date = Table1_Start_Date;
Gap_End_Date = Table1_End_Date;
output;
end;
else do;
If (Table2_Start_Date > Table1_Start_Date) then do;
if first.Table1_Start_Date then do;
Gap_Start_Date = Table1_Start_Date;
Gap_End_Date = Table2_Start_Date - 1;
output;
end;
else do;
Gap_Start_Date = Old_End_Date + 1;
Gap_End_Date = Table2_Start_Date - 1;
output;
end;
end;
If (Table2_End_Date < Table1_End_Date) then do;
if Last.Table1_Start_Date then do;
Gap_Start_Date = Table2_End_Date + 1;
Gap_End_Date = Table1_End_Date;
output;
end;
end;
end;
Old_Start_Date = Table2_Start_Date;
Old_End_Date = Table2_End_Date;
drop Old_Start_Date Old_End_Date;
run;
I haven't verified it entirely yet, but this approach does seem to have given me the results I wanted. Any thoughts?