I have a table named x. The data is as follows.
Account_num start_dt end_dt
A111326 02/01/2016 02/11/2016
A111326 02/12/2016 03/05/2016
A111326 03/02/2016 03/16/2016
A111331 02/28/2016 02/29/2016
A111331 02/29/2016 03/29/2016
A999999 08/25/2015 08/25/2015
A999999 12/19/2015 12/22/2015
A222222 11/06/2015 11/10/2015
A222222 05/16/2016 05/17/2016
Both A111326 and A111331 should be identified as contiguous data, and A999999 and A222222 should be identified as discontinuous data.
In my code I currently use the following query to identify discontinuous data, but it also erroneously identifies A111326 as discontinuous. Please help me modify the code below so that A111326 is not identified as discontinuous. Thanks in advance for your help.
(SELECT account_num
FROM (SELECT account_num,
(MAX (
END_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
START_DT,
(LEAD (
START_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
END_DT
FROM x
WHERE (START_DT + 1) <=
(END_DT - 1))
WHERE START_DT < END_DT);
Oracle Setup:
CREATE TABLE accounts ( Account_num, start_dt, end_dt ) AS
SELECT 'A', DATE '2016-02-01', DATE '2016-02-11' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-02-12', DATE '2016-03-05' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-03-02', DATE '2016-03-16' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-28', DATE '2016-02-29' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-29', DATE '2016-03-29' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-08-25', DATE '2015-08-25' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-12-19', DATE '2015-12-22' FROM DUAL UNION ALL
SELECT 'D', DATE '2015-11-06', DATE '2015-11-10' FROM DUAL UNION ALL
SELECT 'D', DATE '2016-05-16', DATE '2016-05-17' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-01', DATE '2016-01-02' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-05', DATE '2016-01-06' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-03', DATE '2016-01-07' FROM DUAL;
Query:
WITH times ( account_num, dt, lvl ) AS (
SELECT Account_num, start_dt - 1, 1 FROM accounts
UNION ALL
SELECT Account_num, end_dt, -1 FROM accounts
)
, totals ( account_num, dt, total ) AS (
SELECT account_num,
dt,
SUM( lvl ) OVER ( PARTITION BY Account_num ORDER BY dt, lvl DESC )
FROM times
)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM totals
GROUP BY Account_Num
ORDER BY Account_Num;
Output:
ACCOUNT_NUM IS_CONTIGUOUS
----------- -------------
A Y
B Y
C N
D N
E Y
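The query above can be read as a sweep over start and end events: each start (shifted back one day so that adjacent ranges touch) counts +1, each end counts -1, and the running total drops to 0 only when no range is open. An account is contiguous when that happens at most once, at the very end. Below is a minimal Python sketch of the same idea, processing events one at a time; `is_contiguous` and the `accounts` dictionary are illustrative names, not part of the original answer.

```python
from datetime import date, timedelta

def is_contiguous(ranges):
    """Return True if the (start, end) date ranges form one unbroken span."""
    events = []
    for start, end in ranges:
        events.append((start - timedelta(days=1), 1))   # start event, shifted back a day
        events.append((end, -1))                        # end event
    # Same ordering idea as ORDER BY dt, lvl DESC: by date, starts before ends
    events.sort(key=lambda e: (e[0], -e[1]))
    total = 0
    zeros = 0
    for _, lvl in events:
        total += lvl
        if total == 0:          # no range is open at this point
            zeros += 1
    return zeros <= 1

# Sample data taken from the Oracle setup above (accounts A, C and E)
accounts = {
    "A": [(date(2016, 2, 1), date(2016, 2, 11)),
          (date(2016, 2, 12), date(2016, 3, 5)),
          (date(2016, 3, 2), date(2016, 3, 16))],
    "C": [(date(2015, 8, 25), date(2015, 8, 25)),
          (date(2015, 12, 19), date(2015, 12, 22))],
    "E": [(date(2016, 1, 1), date(2016, 1, 2)),
          (date(2016, 1, 5), date(2016, 1, 6)),
          (date(2016, 1, 3), date(2016, 1, 7))],
}
for name, ranges in accounts.items():
    print(name, "Y" if is_contiguous(ranges) else "N")
```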
Alternative Query:
(It's exactly the same method just using UNPIVOT rather than UNION ALL.)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM (
SELECT Account_num,
SUM( lvl ) OVER ( PARTITION BY Account_Num
ORDER BY CASE lvl WHEN 1 THEN dt - 1 ELSE dt END,
lvl DESC
) AS total
FROM accounts
UNPIVOT ( dt FOR lvl IN ( start_dt AS 1, end_dt AS -1 ) )
)
GROUP BY Account_Num
ORDER BY Account_Num;
WITH cte AS (
SELECT
AccountNumber
,CASE
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) IS NULL THEN 0
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) >= Start_Dt - 1 THEN 0
ELSE 1
END as discontiguous
FROM
#Table
)
SELECT
AccountNumber
,CASE WHEN SUM(discontiguous) > 0 THEN 'discontiguous' ELSE 'contiguous' END
FROM
cte
GROUP BY
AccountNumber;
One of your problems is that your desired contiguous result also includes overlapping date ranges in your example data set. For example, A111326 has a row that starts on 3/2/2016, but the row before it ends on 3/5/2016, meaning the two overlap by 3 days.
I need a SELECT in Oracle SQL that finds data with overlapping dates, limited to the period from exactly one year ago until today. ID_FORMULAR is not a UNIQUE value, and I need to include only data with overlapping dates where ID_FORMULAR is UNIQUE.
My code:
SELECT T1.*
FROM VISITORS T1, VISITORS T2
WHERE ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.FROM_DATE >= t2.FROM_DATE
AND t1.FROM_DATE <= t2.TO_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
OR ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.TO_DATE >= t2.FROM_DATE
AND t1.TO_DATE <= t2.TO_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
OR ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.TO_DATE >= t2.TO_DATE
AND t1.FROM_DATE <= t2.FROM_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
It is not working correctly. Any help?
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row processing:
SELECT *
FROM (
SELECT *
FROM visitors
WHERE created_date >= ADD_MONTHS(TRUNC(CURRENT_DATE), -12)
AND created_date < TRUNC(CURRENT_DATE) + 1
)
MATCH_RECOGNIZE(
ORDER BY from_date
ALL ROWS PER MATCH
PATTERN (any_row overlap+)
DEFINE
overlap AS PREV(id_formular) != id_formular
AND PREV(to_date) >= from_date
)
Which, for the sample data:
CREATE TABLE visitors (id_formular, created_date, from_date, to_date) AS
SELECT 1, DATE '2022-08-01', DATE '2022-08-01', DATE '2022-08-03' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-02', DATE '2022-08-04' FROM DUAL UNION ALL
SELECT 3, DATE '2022-08-01', DATE '2022-08-03', DATE '2022-08-05' FROM DUAL UNION ALL
SELECT 1, DATE '2022-08-01', DATE '2022-08-06', DATE '2022-08-06' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-07', DATE '2022-08-09' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-08', DATE '2022-08-10' FROM DUAL UNION ALL
SELECT 1, DATE '2022-08-01', DATE '2022-08-09', DATE '2022-08-11' FROM DUAL;
Outputs:
FROM_DATE ID_FORMULAR CREATED_DATE TO_DATE
--------- ----------- ------------ ---------
01-AUG-22           1 01-AUG-22    03-AUG-22
02-AUG-22           2 01-AUG-22    04-AUG-22
03-AUG-22           3 01-AUG-22    05-AUG-22
08-AUG-22           2 01-AUG-22    10-AUG-22
09-AUG-22           1 01-AUG-22    11-AUG-22
I don't quite understand the question. What confuses me is that you need only rows where the ID is unique. If the ID is unique, then there is no other row to overlap with. Anyway, let's suppose that the sample data is like below:
WITH
tbl AS
(
SELECT 0 "ID", DATE '2021-07-01' "CREATED", DATE '2021-07-01' "DATE_FROM", DATE '2021-07-13' "DATE_TO" FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-01', DATE '2021-12-01', DATE '2021-12-03' FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-04', DATE '2021-12-04', DATE '2021-12-14' FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-12', DATE '2021-12-12', DATE '2021-12-29' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-04', DATE '2022-08-04', DATE '2022-08-10' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-11', DATE '2022-08-11', DATE '2022-08-21' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-21', DATE '2022-08-21', DATE '2022-08-29' FROM DUAL UNION ALL
SELECT 3, DATE '2022-08-11', DATE '2022-08-11', DATE '2022-08-29' FROM DUAL UNION ALL
SELECT 4, DATE '2022-08-14', DATE '2022-08-14', DATE '2022-08-14' FROM DUAL UNION ALL
SELECT 4, DATE '2022-08-29', DATE '2022-08-14', DATE '2022-08-29' FROM DUAL
)
We can add some columns that tell us whether the ID is unique, the order of appearance of rows with the same ID, the end date of the previous row for the same ID, and whether a row overlaps the previous row of the same ID. Here is the code (it uses analytic functions with a windowing clause):
SELECT
ID "ID",
CASE WHEN Count(*) OVER (PARTITION BY ID ORDER BY ID) = 1 THEN 'Y' ELSE 'N' END "IS_UNIQUE",
Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "ID_ORDER_NO",
CREATED "CREATED",
DATE_FROM "DATE_FROM",
DATE_TO "DATE_TO",
CASE
WHEN Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1
THEN Null
ELSE
First_Value(DATE_TO) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
END "PREVIOUS_END_DATE",
CASE
WHEN Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1
THEN 'N'
ELSE
CASE
WHEN DATE_FROM <= First_Value(DATE_TO) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
THEN 'Y'
ELSE 'N'
END
END "OVERLAPS"
FROM
TBL
WHERE
CREATED BETWEEN ADD_MONTHS(TRUNC(SYSDATE, 'dd'), -12) And TRUNC(SYSDATE, 'dd')
Here is the resulting dataset...
/* R e s u l t
ID IS_UNIQUE ID_ORDER_NO CREATED DATE_FROM DATE_TO PREVIOUS_END_DATE OVERLAPS
---------- --------- ----------- --------- --------- --------- ----------------- --------
1 N 1 01-DEC-21 01-DEC-21 03-DEC-21 N
1 N 2 04-DEC-21 04-DEC-21 14-DEC-21 03-DEC-21 N
1 N 3 12-DEC-21 12-DEC-21 29-DEC-21 14-DEC-21 Y
2 N 1 04-AUG-22 04-AUG-22 10-AUG-22 N
2 N 2 11-AUG-22 11-AUG-22 21-AUG-22 10-AUG-22 N
2 N 3 21-AUG-22 21-AUG-22 29-AUG-22 21-AUG-22 Y
3 Y 1 11-AUG-22 11-AUG-22 29-AUG-22 N
4 N 1 14-AUG-22 14-AUG-22 14-AUG-22 N
4 N 2 29-AUG-22 14-AUG-22 29-AUG-22 14-AUG-22 Y
*/
This dataset can then be used to get the rows and columns that you are after. You can filter it, do other calculations (such as the number of overlapping days), get the number of rows per ID, and so on.
Regards...
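The PREVIOUS_END_DATE and OVERLAPS columns above boil down to one comparison per row: within each ID, sorted by start and end date, a row overlaps when it starts on or before the end date of the row just before it. A small Python sketch of that comparison, assuming rows arrive as plain tuples (`flag_overlaps` and `sample` are hypothetical names used only for illustration):

```python
from collections import defaultdict
from datetime import date

def flag_overlaps(rows):
    """rows: iterable of (id, date_from, date_to).

    Returns (id, date_from, date_to, previous_end, overlaps) tuples, where
    previous_end is the date_to of the preceding row for the same id
    (None for the first row) and overlaps is True when the row starts
    on or before that date.
    """
    by_id = defaultdict(list)
    for rec_id, date_from, date_to in rows:
        by_id[rec_id].append((date_from, date_to))
    flagged = []
    for rec_id in sorted(by_id):
        previous_end = None
        # Same ordering as ORDER BY DATE_FROM, DATE_TO within one ID
        for date_from, date_to in sorted(by_id[rec_id]):
            overlaps = previous_end is not None and date_from <= previous_end
            flagged.append((rec_id, date_from, date_to, previous_end, overlaps))
            previous_end = date_to
    return flagged

# The three rows with ID 2 from the sample data above
sample = [
    (2, date(2022, 8, 4), date(2022, 8, 10)),
    (2, date(2022, 8, 11), date(2022, 8, 21)),
    (2, date(2022, 8, 21), date(2022, 8, 29)),
]
for row in flag_overlaps(sample):
    print(row)
```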
I'm trying to build a query for the following scenario:
1. Group records by license ID and get the min and max dates.
2. For a given license ID, if there are two earliest start dates, then the start date of that ID has to be updated to the latest start date in that grouping.
Since I'm new to SQL, I need help satisfying condition 2. Any help is greatly appreciated. Thanks.
Actual data
LicenseID StartDate  EndDate
100       4/3/2000   3/1/2013
100       4/3/2000   2/2/2017
100       3/1/2013   1/23/2015
100       1/23/2015  2/2/2017
100       2/2/2017   2/9/2018
100       2/2/2017   12/18/2018
100       12/18/2018 2/16/2021
Expected output
LicenseID StartDate  EndDate
100       12/18/2018 2/16/2021
Here's one option; read comments within code.
Sample data:
with test (id, start_date, end_date) as
  (select 100, date '2000-04-03', date '2013-03-01' from dual union all
   select 100, date '2000-04-03', date '2017-02-02' from dual union all
   select 100, date '2018-12-18', date '2021-02-16' from dual
  ),
Query begins here:
-- rank start dates per each ID
temp as
  (select id,
          min(start_date) over (partition by id) min_sd,
          max(start_date) over (partition by id) max_sd,
          rank() over (partition by id order by start_date) rnk_sd,
          --
          max(end_date) over (partition by id) max_ed
   from test
  ),
-- count number of the 1st start dates
temp2 as
  (select id,
          sum(case when rnk_sd = 1 then 1 else 0 end) cnt_sd
   from temp
   group by id
  )
-- if number of the 1st start dates is 1, take MIN_SD. Otherwise, take MAX_SD
select distinct
       b.id,
       case when b.cnt_sd = 1 then a.min_sd else a.max_sd end start_date,
       a.max_ed end_date
from temp2 b join temp a on a.id = b.id;
Result:
ID         START_DATE END_DATE
---------- ---------- ----------
100        12/18/2018 02/16/2021
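The rule in the comments above (one earliest start date: keep the minimum start; more than one: take the maximum start; always take the maximum end) can be restated in a few lines of Python. This is only a sketch of the rule for a single license ID; `summarize_license` and `license_100` are illustrative names, and the rows are the seven from the question.

```python
from datetime import date

def summarize_license(rows):
    """rows: list of (start_date, end_date) for one license ID.

    Returns (start, end): start is the earliest start date when it occurs
    exactly once, otherwise the latest start date; end is the latest end date.
    """
    starts = [start for start, _ in rows]
    earliest = min(starts)
    start = earliest if starts.count(earliest) == 1 else max(starts)
    end = max(end for _, end in rows)
    return start, end

license_100 = [
    (date(2000, 4, 3), date(2013, 3, 1)),
    (date(2000, 4, 3), date(2017, 2, 2)),
    (date(2013, 3, 1), date(2015, 1, 23)),
    (date(2015, 1, 23), date(2017, 2, 2)),
    (date(2017, 2, 2), date(2018, 2, 9)),
    (date(2017, 2, 2), date(2018, 12, 18)),
    (date(2018, 12, 18), date(2021, 2, 16)),
]
print(summarize_license(license_100))
```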
This can filter them:
WITH sample_data AS
(
SELECT 100 AS LicenseID, TO_DATE('04/03/2000','MM/DD/YYYY') AS StartDate, TO_DATE('03/01/2013','MM/DD/YYYY') AS EndDate FROM DUAL UNION ALL
SELECT 100, TO_DATE('04/03/2000','MM/DD/YYYY'), TO_DATE('02/02/2017','MM/DD/YYYY') FROM DUAL UNION ALL
SELECT 100, TO_DATE('03/01/2013','MM/DD/YYYY'), TO_DATE('01/23/2015','MM/DD/YYYY') FROM DUAL UNION ALL
SELECT 100, TO_DATE('01/23/2015','MM/DD/YYYY'), TO_DATE('02/02/2017','MM/DD/YYYY') FROM DUAL UNION ALL
SELECT 100, TO_DATE('02/02/2017','MM/DD/YYYY'), TO_DATE('02/09/2018','MM/DD/YYYY') FROM DUAL UNION ALL
SELECT 100, TO_DATE('02/02/2017','MM/DD/YYYY'), TO_DATE('12/18/2018','MM/DD/YYYY') FROM DUAL UNION ALL
SELECT 100, TO_DATE('12/18/2018','MM/DD/YYYY'), TO_DATE('02/16/2021','MM/DD/YYYY') FROM DUAL
)
SELECT dat.licenseID, CASE WHEN dups.licenseID IS NOT NULL THEN MAX(StartDate)
ELSE MIN(StartDate)
END,
CASE WHEN dups.licenseID IS NOT NULL THEN MAX(EndDate)
ELSE MIN(EndDate)
END
FROM sample_data dat
LEFT OUTER JOIN (SELECT COUNT(1), sd.LicenseID
FROM sample_data sd
INNER JOIN (SELECT MIN(StartDate) AS StartDate, LicenseID
FROM sample_data
GROUP BY LicenseID) mins
ON sd.LicenseID = mins.LicenseID AND sd.startDate = mins.StartDate
GROUP BY sd.LicenseID
HAVING COUNT(1) > 1) dups
ON dups.LicenseID = dat.licenseID
GROUP BY dat.licenseID, dups.licenseID;
You can use:
SELECT licenseid,
MAX(startdate) AS startdate,
MAX(enddate) KEEP (DENSE_RANK LAST ORDER BY startdate) AS enddate
FROM table_name
GROUP BY licenseid
HAVING COUNT(*) KEEP (DENSE_RANK FIRST ORDER BY startdate) > 1;
or:
SELECT licenseid,
max_startdate AS startdate,
max_enddate As enddate
FROM (
SELECT licenseid,
RANK()
OVER (PARTITION BY licenseid ORDER BY startdate) AS rnk,
ROW_NUMBER()
OVER (PARTITION BY licenseid, startdate ORDER BY enddate) AS rn,
MAX(startdate)
OVER (PARTITION BY licenseid) AS max_startdate,
MAX(enddate)
KEEP (DENSE_RANK LAST ORDER BY startdate)
OVER (PARTITION BY licenseid) AS max_enddate
FROM table_name t
)
WHERE rnk = 1
AND rn = 2;
Which, for the sample data:
CREATE TABLE table_name (licenseid, startdate, enddate) AS
SELECT 100, DATE'2000-04-03', DATE'2013-03-01' FROM DUAL UNION ALL
SELECT 100, DATE'2000-04-03', DATE'2017-02-02' FROM DUAL UNION ALL
SELECT 100, DATE'2013-03-01', DATE'2015-01-23' FROM DUAL UNION ALL
SELECT 100, DATE'2015-01-23', DATE'2017-02-02' FROM DUAL UNION ALL
SELECT 100, DATE'2017-02-02', DATE'2018-02-09' FROM DUAL UNION ALL
SELECT 100, DATE'2017-02-02', DATE'2018-12-18' FROM DUAL UNION ALL
SELECT 100, DATE'2018-12-18', DATE'2021-02-16' FROM DUAL;
Both output:
LICENSEID STARTDATE           ENDDATE
--------- ------------------- -------------------
      100 2018-12-18 00:00:00 2021-02-16 00:00:00
If you do want to perform an UPDATE of that second row then:
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
max_startdate,
max_enddate
FROM (
SELECT RANK()
OVER (PARTITION BY licenseid ORDER BY startdate) AS rnk,
ROW_NUMBER()
OVER (PARTITION BY licenseid, startdate ORDER BY enddate) AS rn,
MAX(startdate)
OVER (PARTITION BY licenseid) AS max_startdate,
MAX(enddate)
KEEP (DENSE_RANK LAST ORDER BY startdate)
OVER (PARTITION BY licenseid) AS max_enddate
FROM table_name t
)
WHERE rnk = 1
AND rn = 2
)src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
UPDATE
SET startdate = src.max_startdate,
enddate = src.max_enddate;
I have a table that holds the balance for each account on a daily basis. I need to know how to find all the accounts that have been negative for more than a certain number of days.
Sample data-
Accountid Date Balance
1000 01/01/2020 -1.00
1000 01/02/2020 -1.00
1000 01/03/2020 -1.00
1001 01/01/2020 -20.00
1001 01/02/2020 -20.00
1003 01/01/2020 15.00
1003 01/02/2020 16.00
I need to query all the accounts that have been negative for more than 2 days
You could query the days with negative balances, group by the account ID and then count how many rows you got in the having clause:
SELECT AccountID
FROM mytable
WHERE balance < 0
GROUP BY AccountID
HAVING COUNT(*) > 2
If you want to consider only consecutive days then:
SELECT AccountId
FROM (
SELECT Accountid, DateTime, Balance,
SUM( has_changed_sign )
OVER ( PARTITION BY AccountId ORDER BY DateTime )
AS grp
FROM (
SELECT Accountid, DateTime, Balance,
CASE
WHEN SIGN( balance )
= LAG( SIGN( Balance ) )
OVER ( PARTITION BY AccountId ORDER BY DateTime )
THEN 0
ELSE 1
END AS has_changed_sign
FROM table_name t
)
WHERE Balance < 0
)
GROUP BY AccountID, grp
HAVING COUNT(*) > 2
So, for the test data:
CREATE TABLE table_name ( Accountid, DateTime, Balance ) AS
SELECT 1000, DATE '2020-01-01', -1.00 FROM DUAL UNION ALL -- 3 consecutive -ve days
SELECT 1000, DATE '2020-01-02', -1.00 FROM DUAL UNION ALL
SELECT 1000, DATE '2020-01-03', -1.00 FROM DUAL UNION ALL
SELECT 1000, DATE '2020-01-04', +1.00 FROM DUAL UNION ALL
SELECT 1001, DATE '2020-01-01', -20.00 FROM DUAL UNION ALL -- Only 2 negative
SELECT 1001, DATE '2020-01-02', -20.00 FROM DUAL UNION ALL
SELECT 1001, DATE '2020-01-03', +20.00 FROM DUAL UNION ALL
SELECT 1001, DATE '2020-01-04', +20.00 FROM DUAL UNION ALL
SELECT 1002, DATE '2020-01-01', -1.00 FROM DUAL UNION ALL -- 3 negative days but
SELECT 1002, DATE '2020-01-02', -1.00 FROM DUAL UNION ALL -- only 2 consecutive
SELECT 1002, DATE '2020-01-03', +1.00 FROM DUAL UNION ALL
SELECT 1002, DATE '2020-01-04', -1.00 FROM DUAL UNION ALL
SELECT 1003, DATE '2020-01-01', +15.00 FROM DUAL UNION ALL -- All positive
SELECT 1003, DATE '2020-01-02', +16.00 FROM DUAL UNION ALL
SELECT 1003, DATE '2020-01-03', +17.00 FROM DUAL UNION ALL
SELECT 1003, DATE '2020-01-04', +18.00 FROM DUAL;
This outputs:
| ACCOUNTID |
| --------: |
| 1000 |
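The consecutive-days query groups each account's rows into runs that share the same sign and then counts the rows in each negative run. A rough Python equivalent, assuming one row per account per day and counting rows rather than calendar days (`accounts_with_negative_streak` and its `min_days` parameter are names made up for this sketch):

```python
from itertools import groupby

def accounts_with_negative_streak(rows, min_days=3):
    """rows: iterable of (account_id, day, balance).

    Returns the sorted account ids that have at least min_days
    consecutive rows with a negative balance.
    """
    by_account = {}
    for account_id, day, balance in sorted(rows, key=lambda r: (r[0], r[1])):
        by_account.setdefault(account_id, []).append(balance)

    result = []
    for account_id, balances in by_account.items():
        # groupby splits the ordered balances into runs of negative / non-negative
        negative_runs = (
            sum(1 for _ in run)
            for is_negative, run in groupby(balances, key=lambda b: b < 0)
            if is_negative
        )
        if max(negative_runs, default=0) >= min_days:
            result.append(account_id)
    return sorted(result)

# Accounts 1000 and 1002 from the test data above (ISO date strings sort correctly)
rows = [
    (1000, "2020-01-01", -1.0), (1000, "2020-01-02", -1.0),
    (1000, "2020-01-03", -1.0), (1000, "2020-01-04", 1.0),
    (1002, "2020-01-01", -1.0), (1002, "2020-01-02", -1.0),
    (1002, "2020-01-03", 1.0), (1002, "2020-01-04", -1.0),
]
print(accounts_with_negative_streak(rows))
```

Here "more than 2 days" is expressed as `min_days=3`, matching `HAVING COUNT(*) > 2`.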
If you only want more than 2 days then you could simply use LAG:
SELECT DISTINCT
AccountID
FROM (
SELECT AccountID,
balance,
LAG( balance, 1 ) OVER ( PARTITION BY AccountID ORDER BY DateTime )
AS balance_1_day_ago,
LAG( balance, 2 ) OVER ( PARTITION BY AccountID ORDER BY DateTime )
AS balance_2_days_ago
FROM table_name
)
WHERE balance < 0
AND balance_1_day_ago < 0
AND balance_2_days_ago < 0;
But that isn't going to scale well if you want to check over a larger period as the query is quickly going to become very large.
Try this.
Select accountid, count(date) from table
Where balance < 0
Group by accountid
Having count(date) >2
Use a filter in a WHERE clause to get only the negative balances, then group by the account ID and in a HAVING clause check for the count of distinct days being greater than your limit of days.
SELECT accountid
FROM elbat
WHERE balance < 0
GROUP BY accountid
HAVING count(DISTINCT date) > 2;
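If you want all columns, use row_number() with PARTITION BY, which also takes the order of the dates into account. Because rn cannot be referenced in the WHERE clause of the query that defines it, filter on it in an outer query:
Select Accountid, Date, Balance, rn
from (Select Accountid, Date, Balance,
row_number() over (Partition by
Accountid order by Date)
rn from table
Where balance<0)
Where rn>2 ;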
I want to look at the type of the next (lead) row, and if it is the same as the type of the current row, merge the dates of those rows into one row.
I have the below table:
id start_dt end_dt type
1 1/1/19 2/21/19 cross
1 2/22/19 6/5/19 cross
1 6/6/19 8/31/19 cross
1 9/1/19 10/3/19 AAAA
1 10/4/19 10/4/19 cross
1 10/5/19 10/6/19 AAAA
1 10/7/19 10/10/19 AAAA
1 10/11/19 12/31/99 cross
Expected Results:
id start_dt end_dt type
1 1/1/19 8/31/19 cross
1 9/1/19 10/3/19 AAAA
1 10/4/19 10/4/19 cross
1 10/5/19 10/10/19 AAAA
1 10/11/19 12/31/99 cross
How can I get my output to look like the expected results?
I have tested with lead, lag, rank and case expressions, but came up with nothing worth adding here. Am I on the right path?
This is a gaps-and-islands problem. One option for solving it uses the row_number() analytic function:
select min(start_dt) as startdate, max(end_dt) as enddate, type
from
(
with t(id, start_dt, end_dt,type) as
(
select 1, date'2019-01-01', date'2019-02-21', 'cross' from dual union all
select 1, date'2019-02-22', date'2019-06-05', 'cross' from dual union all
select 1, date'2019-06-06', date'2019-08-31', 'cross' from dual union all
select 1, date'2019-09-01', date'2019-10-03', 'AAAA' from dual union all
select 1, date'2019-10-04', date'2019-10-04', 'cross' from dual union all
select 1, date'2019-10-05', date'2019-10-06', 'AAAA' from dual union all
select 1, date'2019-10-07', date'2019-10-10', 'AAAA' from dual union all
select 1, date'2019-10-11', date'2019-12-31', 'cross' from dual
)
select type,
row_number() over (partition by id, type order by end_dt) as rn1,
row_number() over (partition by id order by end_dt) as rn2,
start_dt, end_dt
from t
) tt
group by type, rn1 - rn2
order by enddate;
STARTDATE ENDDATE TYPE
--------- --------- -----
01-JAN-19 31-AUG-19 cross
01-SEP-19 03-OCT-19 AAAA
04-OCT-19 04-OCT-19 cross
05-OCT-19 10-OCT-19 AAAA
11-OCT-19 31-DEC-19 cross
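The trick in the query above is that `rn1 - rn2` stays constant while consecutive rows share a type and changes as soon as a row of another type is interleaved, so `(type, rn1 - rn2)` labels each island. A Python sketch of that numbering for a single id, using the dates from the question with the last end date shortened to 2019-12-31 as in the answer (`islands_by_row_number` is an illustrative name):

```python
from collections import defaultdict
from datetime import date

def islands_by_row_number(rows):
    """rows: list of (start_dt, end_dt, type) for a single id.

    Returns merged (start_dt, end_dt, type) islands ordered by end date.
    """
    ordered = sorted(rows, key=lambda r: r[1])      # ORDER BY end_dt
    seen_per_type = defaultdict(int)
    islands = {}
    for rn2, (start, end, kind) in enumerate(ordered, start=1):
        seen_per_type[kind] += 1
        rn1 = seen_per_type[kind]                   # row number within the type
        key = (kind, rn1 - rn2)                     # constant inside one island
        lo, hi = islands.get(key, (start, end))
        islands[key] = (min(lo, start), max(hi, end))
    merged = [(lo, hi, kind) for (kind, _), (lo, hi) in islands.items()]
    return sorted(merged, key=lambda r: r[1])

data = [
    (date(2019, 1, 1), date(2019, 2, 21), "cross"),
    (date(2019, 2, 22), date(2019, 6, 5), "cross"),
    (date(2019, 6, 6), date(2019, 8, 31), "cross"),
    (date(2019, 9, 1), date(2019, 10, 3), "AAAA"),
    (date(2019, 10, 4), date(2019, 10, 4), "cross"),
    (date(2019, 10, 5), date(2019, 10, 6), "AAAA"),
    (date(2019, 10, 7), date(2019, 10, 10), "AAAA"),
    (date(2019, 10, 11), date(2019, 12, 31), "cross"),
]
for island in islands_by_row_number(data):
    print(island)
```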
I actually think this is a pretty good case for Oracle's Pattern Matching Functionality.
with t(id, start_dt, end_dt,type) as
(
select 1, date'2019-01-01', date'2019-02-21', 'cross' from dual union all
select 1, date'2019-02-22', date'2019-06-05', 'cross' from dual union all
select 1, date'2019-06-06', date'2019-08-31', 'cross' from dual union all
select 1, date'2019-09-01', date'2019-10-03', 'AAAA' from dual union all
select 1, date'2019-10-04', date'2019-10-04', 'cross' from dual union all
select 1, date'2019-10-05', date'2019-10-06', 'AAAA' from dual union all
select 1, date'2019-10-07', date'2019-10-10', 'AAAA' from dual union all
select 1, date'2019-10-11', date'2019-12-31', 'cross' from dual
)
SELECT *
FROM t
MATCH_RECOGNIZE(ORDER BY start_dt
MEASURES a.id AS ID,
A.start_dt AS START_DT,
NVL(LAST(B.end_dt), A.end_dt) AS END_DT,
a.type AS TYPE
PATTERN (A B*)
DEFINE B AS start_dt > PREV(start_dt) AND type = PREV(type));
If you want to look at adjacent rows to find groups that can combine, then I recommend lag() to find where groups start and a cumulative sum on that:
select id, type, min(start_dt), max(end_dt)
from (select t.*,
sum(case when prev_end_dt >= start_dt - 1 then 0 else 1 end) over (partition by id, type order by start_dt) as grp
from (select t.*,
lag(end_dt) over (partition by id, type order by start_dt) as prev_end_dt
from t
) t
) t
group by id, type, grp
order by id, min(start_dt);
In particular, this will find cases where the type does not change but there is a gap in the time frames, as shown by this db<>fiddle for id = 2.
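The lag-plus-cumulative-sum approach can be sketched in Python as follows: within each (id, type) partition ordered by start date, a row joins the current group when the previous row's end date reaches at least the day before its start, and otherwise opens a new group. `merge_touching_ranges` is a hypothetical helper name, and the id = 2 rows in the sample are invented here to show a gap within one type.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def merge_touching_ranges(rows):
    """rows: iterable of (id, type, start_dt, end_dt) tuples."""
    partitions = defaultdict(list)
    for rec_id, kind, start, end in rows:
        partitions[(rec_id, kind)].append((start, end))

    merged = []
    for (rec_id, kind), ranges in partitions.items():
        ranges.sort()                       # ORDER BY start_dt within the partition
        prev_end = None                     # plays the role of lag(end_dt)
        for start, end in ranges:
            if prev_end is not None and prev_end >= start - ONE_DAY:
                # Touches or overlaps the previous row: extend the open group
                _, _, grp_start, grp_end = merged[-1]
                merged[-1] = (rec_id, kind, grp_start, max(grp_end, end))
            else:
                merged.append((rec_id, kind, start, end))
            prev_end = end
    return sorted(merged, key=lambda r: (r[0], r[2]))

sample = [
    (1, "cross", date(2019, 1, 1), date(2019, 2, 21)),
    (1, "cross", date(2019, 2, 22), date(2019, 6, 5)),
    (1, "AAAA", date(2019, 9, 1), date(2019, 10, 3)),
    (2, "cross", date(2019, 1, 1), date(2019, 1, 10)),
    (2, "cross", date(2019, 2, 1), date(2019, 2, 10)),   # gap, same type
]
for row in merge_touching_ranges(sample):
    print(row)
```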
I have a table with emplid and end_date columns. For each emplid I want the max end_date. If at least one end_date is null, I want null to be returned as the max. So in this example:
emplid end_date
1 05/04/2019
1 05/10/2019
1 null
2 05/04/2019
2 05/10/2019
I want as result:
emplid end_date
1 null
2 05/10/2019
I tried something like
select emplid,
CASE
WHEN MAX(NVL(end_Date,'01/01/3000'))='01/01/3000' THEN null
ELSE end_date
END as end_dt
from people
group by emplid
then I get a group-by error.
Maybe it is very easy, but I can't figure out how to get what I want.
with s(id, dt) as (
select 1, to_date('05/04/2019', 'dd/mm/yyyy') from dual union all
select 1, to_date('05/10/2019', 'dd/mm/yyyy') from dual union all
select 1, null from dual union all
select 2, to_date('05/04/2019', 'dd/mm/yyyy') from dual union all
select 2, to_date('05/10/2019', 'dd/mm/yyyy') from dual)
select id, decode(count(dt), count(*), max(dt)) max_dt
from s
group by id;
ID MAX_DT
---------- -----------------------------
1
2 2019-10-05 00:00:00
I would simply do:
select emplid,
(case when count(*) = count(end_date)
then max(end_date)
end) as max_end_date
from t
group by emplid;
There is no reason to introduce a "magic" maximum value (even if it is correct).
The first expression in the case simply asks "does the number of non-NULL end_date values match the number of rows?".
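The same check, written out in Python as a sketch: `count(*)` corresponds to the number of values in the group, `count(end_date)` to the number of non-null values, and the maximum is returned only when the two agree. `max_or_none` and `people` are illustrative names, and the dates are read as day/month/year.

```python
from datetime import date

def max_or_none(values):
    """Return the largest value, or None if any value in the group is None."""
    non_null = [v for v in values if v is not None]
    if len(non_null) != len(values):     # count(end_date) <> count(*)
        return None
    return max(non_null, default=None)

people = {
    1: [date(2019, 4, 5), date(2019, 10, 5), None],
    2: [date(2019, 4, 5), date(2019, 10, 5)],
}
for emplid, end_dates in people.items():
    print(emplid, max_or_none(end_dates))
```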
Try this
SELECT
EMPLID,
CASE WHEN END_DATE = DATE '3000-01-01' THEN NULL ELSE END_DATE END AS END_DT
FROM
(
SELECT EMPLID, MAX(END_DATE) AS END_DATE FROM
(
SELECT EMPLID, NVL(END_DATE, DATE '3000-01-01') AS END_DATE FROM PEOPLE
)
GROUP BY EMPLID
);
A CASE expression that references a non-aggregated column does not work with GROUP BY. You have to get the max value using GROUP BY first and then evaluate the NULL placeholder in an outer query. Try the query below.
select empid, CASE WHEN eDate = DATE '3000-12-01' THEN null ELSE edate end end_dt from (
select empid, MAX(NVL(eDate, DATE '3000-12-01')) eDate
from
(select 1 empid, sysdate-100 edate from dual union all
select 1 empid, sysdate-10 edate from dual union all
select 1 empid, null edate from dual union all
select 2 empid, sysdate-105 edate from dual union all
select 2 empid, sysdate-1 edate from dual ) datad
group by empid);