SQL select case group by

I have a table 'LIST_USERS'.
Table Description -
USER_ID NUMBER(8)
LOGIN_ID VARCHAR2(8)
CREATE_DATE TIMESTAMP(6)
LOGIN_DATE TIMESTAMP(6)
Table data -
USER_ID LOGIN_ID CREATE_DATE LOGIN_DATE
---------------------------------------------------
101 test1 04/24/2016 null
102 test1 04/24/2016 04/29/2016
103 test2 04/25/2016 null
104 test2 04/26/2016 null
105 test3 04/27/2016 04/28/2016
106 test3 04/27/2016 04/29/2016
107 test4 04/28/2016 04/29/2016
987 test5 04/29/2016 null
109 test5 04/29/2016 null
108 test5 04/29/2016 04/29/2016
Condition: I need to fetch USER_ID and LOGIN_ID from the 'LIST_USERS' table, one row per LOGIN_ID, based on the max LOGIN_DATE. If LOGIN_DATE is null, I need to get the record based on the max CREATE_DATE.
I need to get the result below:
USER_ID LOGIN_ID
---------------------
102 test1
104 test2
106 test3
107 test4
108 test5
I am using the query below, but it gives me only LOGIN_ID and 'Login_Or_Create_Date', while I need USER_ID and LOGIN_ID. Is there a way I can get USER_ID as well, as in the result shown above?
select LOGIN_ID,
(case when max(LOGIN_DATE) is null then max(CREATE_DATE)
else max(LOGIN_DATE) end) as Login_Or_Create_Date
from LIST_USERS
group by LOGIN_ID;

Try this:
SELECT USER_ID, LOGIN_ID
FROM (
SELECT USER_ID, LOGIN_ID,
ROW_NUMBER() OVER (PARTITION BY LOGIN_ID
ORDER BY COALESCE(LOGIN_DATE, CREATE_DATE) DESC,
USER_ID) AS rn -- USER_ID breaks ties, e.g. all test5 rows coalesce to 04/29/2016
FROM LIST_USERS) t
WHERE t.rn = 1;
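To see the ROW_NUMBER approach run end to end, here is a sketch ported to SQLite and driven from Python's sqlite3 module. Using SQLite (3.25 or later, for window functions) instead of Oracle is an assumption made only so the example runs anywhere; dates are stored as ISO-8601 text, which sorts correctly. The USER_ID tiebreaker matters for test5: all three of its rows coalesce to 2016-04-29, so without it ROW_NUMBER may return any of 987, 109 or 108.

```python
import sqlite3

# Sample data from the question; NULL login dates are Python None.
ROWS = [
    (101, "test1", "2016-04-24", None),
    (102, "test1", "2016-04-24", "2016-04-29"),
    (103, "test2", "2016-04-25", None),
    (104, "test2", "2016-04-26", None),
    (105, "test3", "2016-04-27", "2016-04-28"),
    (106, "test3", "2016-04-27", "2016-04-29"),
    (107, "test4", "2016-04-28", "2016-04-29"),
    (987, "test5", "2016-04-29", None),
    (109, "test5", "2016-04-29", None),
    (108, "test5", "2016-04-29", "2016-04-29"),
]

QUERY = """
SELECT user_id, login_id
FROM (
    SELECT user_id, login_id,
           ROW_NUMBER() OVER (
               PARTITION BY login_id
               -- latest login date, falling back to the create date;
               -- user_id breaks ties so the result is deterministic
               ORDER BY COALESCE(login_date, create_date) DESC, user_id
           ) AS rn
    FROM list_users
) t
WHERE t.rn = 1
ORDER BY user_id
"""

def latest_user_per_login(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE list_users (user_id INTEGER, login_id TEXT, "
                 "create_date TEXT, login_date TEXT)")
    conn.executemany("INSERT INTO list_users VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(latest_user_per_login(ROWS))
```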

Sounds like a job for keep dense_rank:
select min(user_id) keep (dense_rank last order by coalesce(login_date, create_date))
as user_id,
login_id
from list_users
group by login_id
order by user_id;
The last keeps the rows with the latest login/create date; coalesce() takes the login date first and falls back to the create date if that is null (you could use nvl() instead, of course). If several rows tie on that date, as the three test5 rows do (each coalesces to 04/29/2016), min(user_id) picks the lowest ID among them, which is why 108 is returned. You could also use first with order by ... desc; the result is the same as long as the coalesced date is never null (and it looks like it shouldn't be), but last feels more intuitive when you want the latest date.
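Oracle's KEEP (DENSE_RANK LAST ...) has no direct equivalent in most other engines. As a hedged illustration of what it keeps, the sketch below emulates it in SQLite via Python's sqlite3 (assuming SQLite 3.25+ for window functions): DENSE_RANK() = 1 over a descending order keeps every row tied for the latest coalesced date, and MIN(user_id) then chooses among the ties, just as min(...) does inside the KEEP aggregate. This is a swapped-in technique, not the Oracle syntax itself.

```python
import sqlite3

# A subset of the question's data: test5 has a three-way tie on the coalesced date.
ROWS = [
    (105, "test3", "2016-04-27", "2016-04-28"),
    (106, "test3", "2016-04-27", "2016-04-29"),
    (987, "test5", "2016-04-29", None),
    (109, "test5", "2016-04-29", None),
    (108, "test5", "2016-04-29", "2016-04-29"),
]

# DENSE_RANK() = 1 over a DESC ordering keeps every row tied for the latest
# coalesced date (the rows DENSE_RANK LAST keeps); MIN(user_id) picks one.
QUERY = """
SELECT MIN(user_id) AS user_id, login_id
FROM (
    SELECT user_id, login_id,
           DENSE_RANK() OVER (
               PARTITION BY login_id
               ORDER BY COALESCE(login_date, create_date) DESC
           ) AS dr
    FROM list_users
) ranked
WHERE dr = 1
GROUP BY login_id
ORDER BY login_id
"""

def keep_dense_rank_last(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE list_users (user_id INTEGER, login_id TEXT, "
                 "create_date TEXT, login_date TEXT)")
    conn.executemany("INSERT INTO list_users VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(keep_dense_rank_last(ROWS))  # [(106, 'test3'), (108, 'test5')]
```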
Demo using your data in a CTE:
with list_users(user_id, login_id, create_date, login_date) as (
select 101, 'test1', date '2016-04-24', null from dual
union all select 102, 'test1', date '2016-04-24', date '2016-04-29' from dual
union all select 103, 'test2', date '2016-04-25', null from dual
union all select 104, 'test2', date '2016-04-26', null from dual
union all select 105, 'test3', date '2016-04-27', date '2016-04-28' from dual
union all select 106, 'test3', date '2016-04-27', date '2016-04-29' from dual
union all select 107, 'test4', date '2016-04-28', date '2016-04-29' from dual
)
select min(user_id) keep (dense_rank last order by coalesce(login_date, create_date))
as user_id,
login_id
from list_users
group by login_id
order by user_id;
USER_ID LOGIN
---------- -----
102 test1
104 test2
106 test3
107 test4
And with your modified data:
with list_users(user_id, login_id, create_date, login_date) as (
select 101, 'test1', date '2016-04-24', null from dual
union all select 102, 'test1', date '2016-04-24', date '2016-04-29' from dual
union all select 103, 'test2', date '2016-04-25', null from dual
union all select 104, 'test2', date '2016-04-26', null from dual
union all select 105, 'test3', date '2016-04-27', date '2016-04-28' from dual
union all select 106, 'test3', date '2016-04-27', date '2016-04-29' from dual
union all select 107, 'test4', date '2016-04-28', date '2016-04-29' from dual
union all select 987, 'test5', date '2016-04-29', null from dual
union all select 109, 'test5', date '2016-04-29', null from dual
union all select 108, 'test5', date '2016-04-29', date '2016-04-29' from dual
)
select min(user_id) keep (dense_rank last order by coalesce(login_date, create_date))
as user_id,
login_id
from list_users
group by login_id
order by user_id;
USER_ID LOGIN
---------- -----
102 test1
104 test2
106 test3
107 test4
108 test5

Related

Group historical data

I'm stuck with the following problem and need help:
An object has properties that are calculated every day.
They are stored in a key-value historical table.
A property is mistakenly stored even if it did not change.
I need a query that will group this data set by "actual values":
If a value was not changed during several days it is output as one row.
If value A was changed to B then back to A, then A, B, A should be output by the query (first A and second A are different date intervals).
Here is a dataset example.
with obj_val_hist as
(
select 123 obj_id, 'k_1' key, 'A' value_, to_date('01.01.2021', 'DD.MM.YYYY') start_dt, to_date('01.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('02.01.2021', 'DD.MM.YYYY') start_dt, to_date('02.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('03.01.2021', 'DD.MM.YYYY') start_dt, to_date('03.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('04.01.2021', 'DD.MM.YYYY') start_dt, to_date('04.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('05.01.2021', 'DD.MM.YYYY') start_dt, to_date('05.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('06.01.2021', 'DD.MM.YYYY') start_dt, to_date('06.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('07.01.2021', 'DD.MM.YYYY') start_dt, to_date('07.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('08.01.2021', 'DD.MM.YYYY') start_dt, to_date('08.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('09.01.2021', 'DD.MM.YYYY') start_dt, to_date('09.01.2021', 'DD.MM.YYYY') end_dt from dual
)
select * from obj_val_hist where obj_id = 123;
Data set:
obj_id | key | value | start_date | end_date
-----: | :-- | :---- | :--------- | :---------
123 | k_1 | A | 01.01.2021 | 01.01.2021
123 | k_1 | A | 02.01.2021 | 02.01.2021
123 | k_1 | A | 03.01.2021 | 03.01.2021
123 | k_1 | B | 04.01.2021 | 04.01.2021
123 | k_1 | B | 05.01.2021 | 05.01.2021
123 | k_1 | B | 06.01.2021 | 06.01.2021
123 | k_1 | A | 07.01.2021 | 07.01.2021
123 | k_1 | A | 08.01.2021 | 08.01.2021
123 | k_1 | A | 09.01.2021 | 09.01.2021
Expected result:
obj_id | key | value | start_date | end_date
-----: | :-- | :---- | :--------- | :---------
123 | k_1 | A | 01.01.2021 | 03.01.2021
123 | k_1 | B | 04.01.2021 | 06.01.2021
123 | k_1 | A | 07.01.2021 | 09.01.2021
This table contains values for a million objects.
It is queried by obj_id and has an index on it.
Performance is a key point, so using stored functions is most probably not an option.
This query will be a small part of a big view that is used by an external system.
I expected that there would be an analytic function suited to such a problem:
something like dense_rank, but ordering by one column (start_dt) and incrementing its value whenever another column (value_) changes.
But I didn't find one.
You may use match_recognize for this, which can also handle gaps in dates and is quite efficient and natural to read:
create table t (
obj_id
, key_
, value_
, start_date
, end_date
)
as
select 123, 'k_1', 'A', to_date('01.01.2021', 'dd.mm.yyyy'), to_date('01.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('02.01.2021', 'dd.mm.yyyy'), to_date('02.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('03.01.2021', 'dd.mm.yyyy'), to_date('03.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('04.01.2021', 'dd.mm.yyyy'), to_date('04.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('05.01.2021', 'dd.mm.yyyy'), to_date('05.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('06.01.2021', 'dd.mm.yyyy'), to_date('06.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('07.01.2021', 'dd.mm.yyyy'), to_date('07.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('08.01.2021', 'dd.mm.yyyy'), to_date('08.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('09.01.2021', 'dd.mm.yyyy'), to_date('09.01.2021', 'dd.mm.yyyy') from dual union all
/*Let's skip 10.01*/
select 123, 'k_1', 'A', to_date('11.01.2021', 'dd.mm.yyyy'), to_date('11.01.2021', 'dd.mm.yyyy') from dual union all
/*And extend the validity period for one record*/
select 123, 'k_1', 'A', to_date('12.01.2021', 'dd.mm.yyyy'), to_date('13.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('14.01.2021', 'dd.mm.yyyy'), to_date('14.01.2021', 'dd.mm.yyyy') from dual;
select *
from t
match_recognize (
/*For each ID and KEY*/
partition by obj_id, key_
order by start_date asc
/*Output attributes*/
measures
/*start_date of the first row in match group*/
final first(start_date) as min_start_date,
/*end_date of the last row in match group*/
final last(end_date) as max_end_date,
/*value itself as it is constant for match group*/
value_ as val
/*First row and any consecutive matches*/
pattern (init A*)
define
/*Consecutive rows are those with the same value in value_
and a start_date exactly 1 day after the end_date of the previous row
*/
A as prev(value_) = value_
and prev(end_date) + 1 = start_date
);
OBJ_ID | KEY_ | MIN_START_DATE | MAX_END_DATE | VAL
-----: | :--- | :------------- | :----------- | :--
123 | k_1 | 01-JAN-21 | 03-JAN-21 | A
123 | k_1 | 04-JAN-21 | 06-JAN-21 | B
123 | k_1 | 07-JAN-21 | 09-JAN-21 | A
123 | k_1 | 11-JAN-21 | 14-JAN-21 | A
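SQLite and most other engines lack MATCH_RECOGNIZE, so to make the pattern's behaviour concrete here is a hedged pure-Python sketch of what PATTERN (init A*) with that DEFINE does: rows are taken per obj_id/key in start_date order, and a row extends the current match only if it has the same value as the previous row and starts exactly one day after that row's end. The tuple layout and the d() helper are illustrative assumptions; the data is the same table, including the skipped 10.01 and the extended 12.01 record.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def match_runs(rows):
    """rows: (obj_id, key, value, start, end) tuples.
    Returns (obj_id, key, min_start, max_end, value), one tuple per match."""
    matches = []
    prev = None
    # PARTITION BY obj_id, key_ ORDER BY start_date
    for row in sorted(rows, key=lambda r: (r[0], r[1], r[3])):
        obj_id, key, value, start, end = row
        is_a = (prev is not None
                and prev[:2] == (obj_id, key)       # same partition
                and prev[2] == value                # prev(value_) = value_
                and prev[4] + ONE_DAY == start)     # prev(end_date) + 1 = start_date
        if is_a:
            # A row: extend the match; final last(end_date) is this row's end
            matches[-1][3] = end
        else:
            # init row: its start is first(start_date) of a new match
            matches.append([obj_id, key, start, end, value])
        prev = row
    return [tuple(m) for m in matches]

def d(day):
    return date(2021, 1, day)

DATA = [(123, "k_1", v, d(s), d(e)) for v, s, e in [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 3),
    ("B", 4, 4), ("B", 5, 5), ("B", 6, 6),
    ("A", 7, 7), ("A", 8, 8), ("A", 9, 9),
    ("A", 11, 11),   # 10.01 skipped
    ("A", 12, 13),   # extended validity period
    ("A", 14, 14),
]]

for m in match_runs(DATA):
    print(m)
```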
If you indeed have data for every day, you can use the following relatively simple logic. The subquery calculates where the value changes; the outer query then calculates the end date by looking at the start date in the next row:
select obj_id, key, value_, start_dt,
coalesce(lead(start_dt) over (partition by obj_id, key order by start_dt) - interval '1' day, max_end_dt)
from (select ovh.*,
lag(value_) over (partition by obj_id, key order by start_dt) as prev_value_,
max(end_dt) over (partition by obj_id, key) as max_end_dt
from obj_val_hist ovh
where obj_id = 123
) ovh
where prev_value_ is null or prev_value_ <> value_;
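Here is the change-point query ported to SQLite through Python's sqlite3 module, as a sketch (assumes SQLite 3.25+; the key column is renamed key_ and dates are ISO text, both assumptions for portability). LAG flags the rows whose value differs from the previous day; because the outer WHERE runs before the outer window functions, LEAD then sees only those change-point rows, and the day before the next change point becomes the end date.

```python
import sqlite3

QUERY = """
SELECT obj_id, key_, value_, start_dt,
       COALESCE(
           -- day before the next change point, or the last end date if none
           date(LEAD(start_dt) OVER (PARTITION BY obj_id, key_ ORDER BY start_dt), '-1 day'),
           max_end_dt
       ) AS end_dt
FROM (
    SELECT h.*,
           LAG(value_) OVER (PARTITION BY obj_id, key_ ORDER BY start_dt) AS prev_value_,
           MAX(end_dt) OVER (PARTITION BY obj_id, key_) AS max_end_dt
    FROM obj_val_hist h
    WHERE obj_id = 123
) h
-- keep only rows whose value differs from the previous row's value
WHERE prev_value_ IS NULL OR prev_value_ <> value_
ORDER BY start_dt
"""

def collapse_daily(values):
    """values: one value per day, starting 2021-01-01 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obj_val_hist (obj_id INTEGER, key_ TEXT, value_ TEXT, "
                 "start_dt TEXT, end_dt TEXT)")
    rows = [(123, "k_1", v, f"2021-01-{day:02d}", f"2021-01-{day:02d}")
            for day, v in enumerate(values, start=1)]
    conn.executemany("INSERT INTO obj_val_hist VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in collapse_daily("AAABBBAAA"):
    print(row)
```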
However, your data suggests that you could have a much more complicated problem. You have two dates in the row, a start date and end date. These could, in theory, overlap or have gaps. You can handle that by assigning groups when a new key/value pair starts and then aggregating:
select obj_id, key, value_, min(start_dt), max(end_dt)
from (select ovh.*,
sum(case when prev_end_dt >= start_dt - interval '1' day then 0 else 1 end) over (partition by obj_id, key order by start_dt) as grp
from (select ovh.*,
max(end_dt) over (partition by obj_id, key, value_
order by start_dt
range between unbounded preceding and interval '1' day preceding
) as prev_end_dt
from obj_val_hist ovh
) ovh
) ovh
group by obj_id, key, value_, grp;
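To make the group-then-aggregate logic easier to follow, here is a hedged pure-Python sketch of the same steps for a single obj_id/key. Assumptions: rows are (value, start, end) tuples with distinct start dates, and the quadratic lookup is for readability, not performance. It uses the gap-containing data from the MATCH_RECOGNIZE answer (a missing 10.01 and a 12.01 to 13.01 record).

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def group_intervals(rows):
    """Group (value, start, end) rows for one obj_id/key into islands.

    Mirrors the SQL: prev_end = latest end among same-value rows starting at
    least one day earlier; a row opens a new group unless prev_end reaches the
    day before its start; grp is a running count of those openings.
    """
    rows = sorted(rows, key=lambda r: r[1])      # ORDER BY start_dt
    groups = {}                                  # (value, grp) -> [min_start, max_end]
    grp = 0
    for value, start, end in rows:
        # max(end_dt) OVER (PARTITION BY value_ ... RANGE ... 1 DAY PRECEDING)
        earlier_ends = [e for v, s, e in rows if v == value and s <= start - ONE_DAY]
        prev_end = max(earlier_ends) if earlier_ends else None
        # SUM(CASE WHEN prev_end_dt >= start_dt - 1 THEN 0 ELSE 1 END) OVER (...)
        if prev_end is None or prev_end < start - ONE_DAY:
            grp += 1
        bounds = groups.setdefault((value, grp), [start, end])
        bounds[0] = min(bounds[0], start)        # MIN(start_dt)
        bounds[1] = max(bounds[1], end)          # MAX(end_dt)
    # GROUP BY value_, grp, listed in start order
    return sorted(((v, s, e) for (v, _), (s, e) in groups.items()), key=lambda t: t[1])

def d(day):
    return date(2021, 1, day)

DATA = [(v, d(s), d(e)) for v, s, e in [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 3),
    ("B", 4, 4), ("B", 5, 5), ("B", 6, 6),
    ("A", 7, 7), ("A", 8, 8), ("A", 9, 9),
    ("A", 11, 11), ("A", 12, 13), ("A", 14, 14),
]]

for g in group_intervals(DATA):
    print(g)
```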

How to get min and max from 2 tables in SQL

I am trying to get the start date from the minimum ID (ID=1) and the end date from the maximum ID (ID=3), but I am not sure how to retrieve them. Following is my data:
Table1 and Table2 are the source tables. I am trying to get output like the third table.
My requirement is to get the start date from the first record and the end date from the last record. We can recognize the first and last records by the ID field: the minimum ID is the first record and the maximum ID is the last record.
Please help me!
Here's one option, presuming you use Oracle (since you mention Oracle SQL Developer). The x inline view selects:
start_date which belongs to name with the lowest ID column value for that name (i.e. first_value partition by name order by id)
end_date which belongs to name with the highest ID column value for that name (i.e. first_value partition by name order by id DESC)
with
-- sample data
t1 (pid, name) as
(select 123, 'xyz' from dual union all
select 234, 'pqr' from dual
),
t2 (id, name, start_date, end_date) as
(select 1, 'xyz', date '2020-01-01', date '2020-07-20' from dual union all
select 2, 'xyz', date '2020-02-01', date '2020-05-30' from dual union all
select 3, 'xyz', date '2020-06-30', date '2020-07-30' from dual union all
--
select 1, 'pqr', date '2020-04-30', date '2020-09-30' from dual union all
select 2, 'pqr', date '2020-05-30', date '2020-09-30' from dual union all
select 3, 'pqr', date '2020-06-30', date '2020-07-01' from dual
)
select a.pid,
x.name,
max(x.start_date) start_date,
max(x.end_date) end_date
from t1 a join
(
-- start_date: always for the lowest T2.ID value row
-- end_date : always for the highest T2.ID value row
select b.name,
first_value(b.start_date) over (partition by b.name order by b.id ) start_date,
first_value(b.end_date) over (partition by b.name order by b.id desc) end_date
from t2 b
) x
on a.name = x.name
group by a.pid,
x.name
order by a.pid;
PID NAME START_DATE END_DATE
---------- ---- ---------- ----------
123 xyz 01/01/2020 07/30/2020
234 pqr 04/30/2020 07/01/2020
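A sketch of the same FIRST_VALUE approach in SQLite via Python's sqlite3 (assumes SQLite 3.25+ and ISO-8601 date strings). Because every row of a name carries the same two FIRST_VALUE results, the MAX in the outer query only collapses duplicates; it does not compare different dates.

```python
import sqlite3

T1 = [(123, "xyz"), (234, "pqr")]
T2 = [
    (1, "xyz", "2020-01-01", "2020-07-20"),
    (2, "xyz", "2020-02-01", "2020-05-30"),
    (3, "xyz", "2020-06-30", "2020-07-30"),
    (1, "pqr", "2020-04-30", "2020-09-30"),
    (2, "pqr", "2020-05-30", "2020-09-30"),
    (3, "pqr", "2020-06-30", "2020-07-01"),
]

QUERY = """
SELECT a.pid, x.name, MAX(x.start_date) AS start_date, MAX(x.end_date) AS end_date
FROM t1 a
JOIN (
    SELECT b.name,
           -- start_date of the lowest id, end_date of the highest id
           FIRST_VALUE(b.start_date) OVER (PARTITION BY b.name ORDER BY b.id) AS start_date,
           FIRST_VALUE(b.end_date) OVER (PARTITION BY b.name ORDER BY b.id DESC) AS end_date
    FROM t2 b
) x ON a.name = x.name
GROUP BY a.pid, x.name
ORDER BY a.pid
"""

def first_last_dates():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (pid INTEGER, name TEXT)")
    conn.execute("CREATE TABLE t2 (id INTEGER, name TEXT, start_date TEXT, end_date TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", T1)
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", T2)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in first_last_dates():
    print(row)
```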

Oracle 18c - Complex sql

I have a table with following columns:
Emp_ID Number
Emp_flag Varchar2(1)
Date_1 Date
Date_2 Date
create_date Date
There is no PK on this table, and there are many duplicate records for each Emp_ID.
What I need to know is when a new Date_1 is entered (from Null to a date, or from one date to a different date) and on what date that happened.
I can’t just look at a single record to compare Date_1 with create_date because there are many times in the many records for a given Emp_ID when the Date_1 is simply “copied” to the new record. A Date_1 may have been originally entered on 02/15/2019 with a value of 02/01/2019. Now let’s say Date_2 gets added on 02/12/2020. So the table looks like this:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y Null Null 1/18/2018
123 Y 02/1/2019 Null 02/15/2019
123 Y 02/1/2019 02/12/2021 02/12/2020
I need a SQL query that would tell me that Emp_ID 123 had a Date_1 of 02/1/2019 entered on 02/15/2019 and NOT pick up any other record.
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y 02/1/2019 Null 02/15/2019
Example 2 (notice date_1 is different):
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 3:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 4:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/16/2019
Expected output: No records.
You can use the Lag function to check whether the previous value of date_1 existed or not.
SELECT x.emp_id,
x.date_1,
x.create_date AS first_date_with_date_1
FROM (
SELECT t.emp_id,
t.create_date,
t.date_1,
LAG(t.date_1) OVER (PARTITION BY t.emp_id ORDER BY t.create_date) AS last_date_1
FROM your_table t
) x
WHERE x.date_1 IS NOT NULL
AND x.last_date_1 IS NULL
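A sketch of the LAG approach in SQLite via Python's sqlite3, using the question's Examples 1 and 2 (assumes SQLite 3.25+ and ISO date strings). Note what the filter does: it returns only the row where Date_1 goes from NULL to a value, so for Emp_ID 456 the later change from 10/1/2019 to 11/2/2019 is not reported.

```python
import sqlite3

ROWS = [
    # Example 1 (Emp_ID 123)
    (123, "Y", None, None, "2018-01-18"),
    (123, "Y", "2019-02-01", None, "2019-02-15"),
    (123, "Y", "2019-02-01", "2021-02-12", "2020-02-12"),
    # Example 2 (Emp_ID 456)
    (456, "Y", None, None, "2018-01-18"),
    (456, "Y", "2019-10-01", None, "2019-02-15"),
    (456, "Y", "2019-11-02", "2021-02-12", "2020-02-12"),
]

QUERY = """
SELECT x.emp_id, x.date_1, x.create_date AS first_date_with_date_1
FROM (
    SELECT t.emp_id, t.create_date, t.date_1,
           -- Date_1 of the previous record for the same employee
           LAG(t.date_1) OVER (PARTITION BY t.emp_id ORDER BY t.create_date) AS last_date_1
    FROM your_table t
) x
WHERE x.date_1 IS NOT NULL
  AND x.last_date_1 IS NULL
ORDER BY x.emp_id
"""

def first_date_1_entries(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (emp_id INTEGER, emp_flag TEXT, "
                 "date_1 TEXT, date_2 TEXT, create_date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(first_date_1_entries(ROWS))
```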
Test for all cases:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 101, 'Y', null, null, date '2018-01-18' from dual union all
select 101, 'Y', date '2019-02-01', null, date '2019-02-15' from dual union all
select 101, 'Y', date '2019-02-01', date '2021-02-12', date '2019-02-16' from dual union all
select 102, 'Y', null, null, date '2018-01-18' from dual union all
select 102, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 102, 'Y', date '2019-02-11', date '2021-02-12', date '2019-02-16' from dual union all
select 103, 'Y', null, null, date '2018-01-18' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-11', date '2021-02-21', date '2020-12-02' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1
from t )
where date_1 <> nvl(prev_dt1, date_1 - 1);
Result:
EMP_ID EMP_FLAG DATE_1 DATE_2 CREATE_DATE
---------- -------- ----------- ----------- -----------
101 Y 2019-02-01 2019-02-15
102 Y 2019-02-10 2019-02-15
102 Y 2019-02-11 2021-02-12 2019-02-16
103 Y 2019-02-10 2019-02-15
103 Y 2019-02-11 2021-02-21 2020-12-02
Edit:
"When there is more than one record with no change in Date_1, it should not return a record for that Emp_id."
In this case Date_1 is already set in the first row (id 104). If you want to hide rows in such a case, use:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 104, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 104, 'Y', date '2019-02-10', null, date '2019-02-16' from dual union all
select 105, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 105, 'Y', null, null, date '2019-02-16' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1,
row_number() over (partition by emp_id order by create_date) rn
from t )
where (date_1 is not null and prev_dt1 is null and rn > 1)
or date_1 <> prev_dt1
or (date_1 is null and prev_dt1 is not null);
I also added the case where the previous date was set and now it is null (id 105). If that is not possible, or you don't want it, remove the last line of the where clause.
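A sketch of the edited query in SQLite via Python's sqlite3 (assumes SQLite 3.25+ and ISO date strings). It combines the answer's id 101 rows with the 104 and 105 rows: 104 yields nothing because its Date_1 was set on the first row and never changed, and 105 yields only the row where Date_1 becomes NULL.

```python
import sqlite3

ROWS = [
    (101, "Y", None, None, "2018-01-18"),
    (101, "Y", "2019-02-01", None, "2019-02-15"),
    (101, "Y", "2019-02-01", "2021-02-12", "2019-02-16"),
    (104, "Y", "2019-02-10", None, "2019-02-15"),
    (104, "Y", "2019-02-10", None, "2019-02-16"),
    (105, "Y", "2019-02-10", None, "2019-02-15"),
    (105, "Y", None, None, "2019-02-16"),
]

QUERY = """
SELECT emp_id, emp_flag, date_1, date_2, create_date
FROM (
    SELECT emp_id, emp_flag, date_1, date_2, create_date,
           LAG(date_1) OVER (PARTITION BY emp_id ORDER BY create_date) AS prev_dt1,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY create_date) AS rn
    FROM t
)
WHERE (date_1 IS NOT NULL AND prev_dt1 IS NULL AND rn > 1)  -- NULL -> date, not on the first row
   OR date_1 <> prev_dt1                                    -- date -> different date
   OR (date_1 IS NULL AND prev_dt1 IS NOT NULL)             -- date -> NULL
ORDER BY emp_id, create_date
"""

def date_1_changes(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (emp_id INTEGER, emp_flag TEXT, date_1 TEXT, "
                 "date_2 TEXT, create_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in date_1_changes(ROWS):
    print(row)
```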
You can use the lag function instead of lead here:
with tableA as
(
select 456 as Emp_ID, 'Y' as Emp_flag, CAST(NULL as date) as Date_1, CAST(NULL as date) as Date_2, DATE '2018-01-18' as Create_date from dual union all
select 456, 'Y', DATE '2019-10-01', NULL, DATE '2019-02-15' from dual union all
select 456, 'Y', DATE '2019-11-02', DATE '2021-02-12', DATE '2020-02-12' from dual)
select x.Emp_ID,x.Emp_flag,x.Date_1,x.Date_2,x.Create_date
from
(select a.*
,lag(a.date_1) Over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1<>COALESCE(x.lag_date, DATE '2100-01-01')
This returns rows only when date_1 changes. Since comparisons with NULL never evaluate to true, I have replaced a NULL previous value with 1/1/2100. Hope this helps.
Edit:
I checked a sample like the one you mentioned, and it does seem to work. If it's not working, kindly share the expected result and the result you are getting:
with tableA as
(
select 456 as Emp_ID, 'Y' as Emp_flag, CAST(NULL as date) as Date_1, CAST(NULL as date) as Date_2, DATE '2018-01-18' as Create_date from dual union all
select 456, 'Y', DATE '2019-10-01', NULL, DATE '2019-02-15' from dual union all
select 456, 'Y', DATE '2019-10-01', DATE '2021-02-12', DATE '2020-02-12' from dual)
select x.Emp_ID,x.Emp_flag,x.Date_1,x.Date_2,x.Create_date
from
(select a.*
,lag(a.date_1) Over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1<>COALESCE(x.lag_date, DATE '2100-01-01')

Month counts between dates

I have the table below. I need to count how many IDs were active in a given month. I am thinking I'll need to create a row for each month an ID was active, so that the ID can be counted in each month. A row should also be generated for the month containing the term_dt.
active_dt term_dt id
1/1/2018 (null) 101
1/1/2018 5/15/2018 102
3/1/2018 6/1/2018 103
1/1/2018 4/25/2018 104
Apparently this is a "count number of overlapping intervals" problem. The algorithm goes like this:
Create a sorted list of all start and end points
Calculate a running sum over this list, add one when you encounter a start and subtract one when you encounter an end
If two points are the same, perform the subtractions first
You will end up with a list of all points where the sum changed
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
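The running-sum algorithm described above can be sketched in plain Python (an assumption for illustration; the answer's query is SQL Server). Equal (date, delta) events are applied together, like peers in the default RANGE frame of SUM(val) OVER (ORDER BY date, val). As in the SQL outline, this records the running total after each date's events, so a month with no start or end (February here) is absent, and April shows 3 (after id 104 ends on the 25th) even though four IDs were active earlier that month. Treat it as the rough outline the answer describes, not a finished month-by-month count.

```python
from datetime import date
from itertools import groupby

FAR_FUTURE = date(2099, 1, 1)  # open-ended rows, as in the SQL Server query

def running_totals_by_month(rows):
    """rows: (active_dt, term_dt or None, id) tuples.
    Returns {(year, month): max running total seen at an event in that month}."""
    events = []
    for active, term, _ in rows:
        events.append((active, 1))                 # a start adds one
        events.append((term or FAR_FUTURE, -1))    # an end subtracts one
    events.sort()                                  # by date, then -1 before +1
    result = {}
    running = 0
    for (day, delta), same in groupby(events):
        running += delta * len(list(same))
        key = (day.year, day.month)
        result[key] = max(result.get(key, running), running)
    return result

ROWS = [
    (date(2018, 1, 1), None, 101),
    (date(2018, 1, 1), date(2018, 5, 15), 102),
    (date(2018, 3, 1), date(2018, 6, 1), 103),
    (date(2018, 1, 1), date(2018, 4, 25), 104),
]
print(running_totals_by_month(ROWS))
```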
I was toying with a simpler query for Oracle that seemed to do the trick:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
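A sketch of the calendar-join idea in SQLite via Python's sqlite3. Two assumptions for portability: a recursive CTE generates the 2018 month starts instead of sys.odcivarchar2list, and a fixed far-future date stands in for current_date so the result does not depend on when it runs. A month counts an ID when the month's first day falls between active_dt and term_dt.

```python
import sqlite3

ROWS = [
    ("2018-01-01", None, 101),
    ("2018-01-01", "2018-05-15", 102),
    ("2018-03-01", "2018-06-01", 103),
    ("2018-01-01", "2018-04-25", 104),
]

QUERY = """
WITH RECURSIVE candidates(month_start) AS (
    SELECT '2018-01-01'
    UNION ALL
    SELECT date(month_start, '+1 month') FROM candidates
    WHERE month_start < '2018-12-01'
)
SELECT c.month_start, COUNT(*) AS active_ids
FROM candidates c
JOIN sample_data d
  -- open-ended rows (NULL term_dt) are treated as still active
  ON c.month_start BETWEEN d.active_dt AND COALESCE(d.term_dt, '9999-12-31')
GROUP BY c.month_start
ORDER BY c.month_start
"""

def count_active_months(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_data (active_dt TEXT, term_dt TEXT, id INTEGER)")
    conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for month, count in count_active_months(ROWS):
    print(month, count)
```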
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.
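SQLite has no CONNECT BY, so as a rough equivalent of the hierarchical query here is a recursive CTE that expands each ID into one row per month. Assumptions: SQLite via Python's sqlite3, active dates on the first of the month (where the month arithmetic matches months_between), and a hypothetical fixed cutoff of 2018-10-31 in place of SYSDATE so the output is reproducible. With that cutoff the counts line up with the output shown above.

```python
import sqlite3

ROWS = [
    (101, "2018-01-01", None),
    (102, "2018-01-01", "2018-05-15"),
    (103, "2018-03-01", "2018-06-01"),
    (104, "2018-01-01", "2018-04-25"),
]

# WITH RECURSIVE stands in for CONNECT BY LEVEL: each id gets rows n = 0, 1, 2, ...
# while active_dt + (n + 1) months is still on or before term_dt.
QUERY = """
WITH RECURSIVE months(id, active_dt, term_dt, n) AS (
    SELECT id, active_dt, COALESCE(term_dt, :cutoff), 0 FROM your_table
    UNION ALL
    SELECT id, active_dt, term_dt, n + 1 FROM months
    WHERE date(active_dt, '+' || (n + 1) || ' months') <= term_dt
)
SELECT date(active_dt, 'start of month', '+' || n || ' months') AS active_month,
       COUNT(*) AS num_active_ids
FROM months
GROUP BY active_month
ORDER BY active_month
"""

def active_ids_per_month(rows, cutoff):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, active_dt TEXT, term_dt TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"cutoff": cutoff}).fetchall()
    conn.close()
    return result

for month, count in active_ids_per_month(ROWS, "2018-10-31"):
    print(month, count)
```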

Finding missing dates in a sequence

I have the following table with ID and DATE:
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
In this table I have three customer IDs (123, 456 and 789) and a date column showing which months they worked.
I want to find out which of the customers have a gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014
To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately. As @mikey suggested, you can count the number of months and look at the first and last dates to see how many months that range spans.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;
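A sketch of the count-versus-span check in SQLite via Python's sqlite3 (the work_date column name is an assumption, since DATE would need quoting). SQLite has no months_between, so year * 12 + month serves as a consecutive month index; counting distinct indexes also copes with several records in one month, like the trunc(month, 'MM') variant.

```python
import sqlite3

WORKED = {
    123: ["2015-07-01", "2015-06-01", "2015-05-01", "2015-04-01",
          "2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01"],
    456: ["2014-11-01", "2014-10-01", "2014-09-01", "2014-08-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
    789: ["2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
}

# An ID has a gap when it worked fewer distinct months than its span covers.
QUERY = """
SELECT id
FROM (
    SELECT id,
           CAST(strftime('%Y', work_date) AS INTEGER) * 12
         + CAST(strftime('%m', work_date) AS INTEGER) AS month_no
    FROM your_table
)
GROUP BY id
HAVING COUNT(DISTINCT month_no) != MAX(month_no) - MIN(month_no) + 1
ORDER BY id
"""

def ids_with_gaps(worked):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, work_date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                     [(i, d) for i, dates in worked.items() for d in dates])
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result

print(ids_with_gaps(WORKED))  # [123, 456]
```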
Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
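The same LEAD logic in SQLite via Python's sqlite3, as a sketch (assumes SQLite 3.25+, ISO date strings, and a hypothetical work_date column instead of the quoted "DATE"); date(x, '+1 month') plays the role of ADD_MONTHS.

```python
import sqlite3

WORKED = {
    123: ["2015-07-01", "2015-06-01", "2015-05-01", "2015-04-01",
          "2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01"],
    456: ["2014-11-01", "2014-10-01", "2014-09-01", "2014-08-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
    789: ["2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
}

QUERY = """
SELECT id, MIN(missing_date) AS first_absent_date
FROM (
    SELECT id,
           CASE
               -- next worked month is exactly one month later: no gap here
               WHEN LEAD(work_date) OVER (PARTITION BY id ORDER BY work_date)
                    = date(work_date, '+1 month') THEN NULL
               -- last worked month: nothing after it to compare
               WHEN LEAD(work_date) OVER (PARTITION BY id ORDER BY work_date)
                    IS NULL THEN NULL
               -- otherwise the month after this one is missing
               ELSE date(work_date, '+1 month')
           END AS missing_date
    FROM your_table
)
GROUP BY id
HAVING COUNT(missing_date) > 0
ORDER BY id
"""

def first_absent_dates(worked):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, work_date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                     [(i, d) for i, dates in worked.items() for d in dates])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(first_absent_dates(WORKED))
```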
You could use the Lag() function to see whether records have been skipped for a particular date. Lag() compares the data in the current row with the previous row, so if we order by DATE we can easily compare rows and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This partitions the entries by ID and orders each partition by date. If a customer was always present, there would be no gap in their dates, so anyone with a date difference greater than 1 had a gap. You could tweak this as per your requirement.
EDIT: Looking more closely at the answers above, I noticed that you are storing only the first day of every month (in mm/dd/yyyy format), so the query above can be tweaked as follows:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
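A sketch of the tweaked LAG query in SQLite via Python's sqlite3 (assumes SQLite 3.25+, ISO date strings in date_, and the answer's table name trial). julianday() differences replace Oracle date subtraction, and date(prev_date, 'start of month', '+1 month') gives the same first-of-next-month value as last_day(PREV_DATE) + 1.

```python
import sqlite3

WORKED = {
    123: ["2015-07-01", "2015-06-01", "2015-05-01", "2015-04-01",
          "2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01"],
    456: ["2014-11-01", "2014-10-01", "2014-09-01", "2014-08-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
    789: ["2014-09-01", "2014-08-01", "2014-07-01", "2014-06-01",
          "2014-05-01", "2014-04-01", "2014-03-01"],
}

QUERY = """
SELECT id, date_, prev_date,
       -- first day of the month after the previous worked month
       date(prev_date, 'start of month', '+1 month') AS absent_date
FROM (
    SELECT id, date_,
           LAG(date_) OVER (PARTITION BY id ORDER BY date_) AS prev_date,
           julianday(date_)
             - julianday(LAG(date_) OVER (PARTITION BY id ORDER BY date_)) AS date_diff
    FROM trial
)
-- consecutive month starts are at most 31 days apart
WHERE date_diff > 31
ORDER BY id, date_
"""

def absent_after_gap(worked):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trial (id INTEGER, date_ TEXT)")
    conn.executemany("INSERT INTO trial VALUES (?, ?)",
                     [(i, d) for i, dates in worked.items() for d in dates])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in absent_after_gap(WORKED):
    print(row)
```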