Oracle 18c - Complex SQL

I have a table with following columns:
Emp_ID Number
Emp_flag Varchar2(1)
Date_1 Date
Date_2 Date
create_date Date
There is no PK on this table; there are many records with duplicate Emp_IDs.
What I need to know is: when a new Date_1 is entered (so from NULL to a date, or from one date to a different date), on what date did that happen?
I can’t just look at a single record and compare Date_1 with create_date, because in the many records for a given Emp_ID the Date_1 is often simply “copied” to the new record. A Date_1 may have been originally entered on 02/15/2019 with a value of 02/01/2019. Now let’s say Date_2 gets added on 02/12/2020. So the table looks like this:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y Null Null 1/18/2018
123 Y 02/1/2019 Null 02/15/2019
123 Y 02/1/2019 02/12/2021 02/12/2020
I need a SQL query that would tell me that Emp_ID 123 had a Date_1 of 02/1/2019 entered on 02/15/2019 and NOT pick up any other record.
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y 02/1/2019 Null 02/15/2019
Example 2 (notice date_1 is different):
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 3:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 4:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/16/2019
Expected output: No records.

You can use the LAG function to check whether a previous value of date_1 existed or not:
SELECT x.emp_id,
x.date_1,
x.create_date AS first_date_with_date_1
FROM (
SELECT t.emp_id,
t.create_date,
t.date_1,
LAG(t.date_1) OVER (PARTITION BY t.emp_id ORDER BY t.create_date) AS last_date_1
FROM your_table t
) x
WHERE x.date_1 IS NOT NULL
AND x.last_date_1 IS NULL

Test for all cases:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 101, 'Y', null, null, date '2018-01-18' from dual union all
select 101, 'Y', date '2019-02-01', null, date '2019-02-15' from dual union all
select 101, 'Y', date '2019-02-01', date '2021-02-12', date '2019-02-16' from dual union all
select 102, 'Y', null, null, date '2018-01-18' from dual union all
select 102, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 102, 'Y', date '2019-02-11', date '2021-02-12', date '2019-02-16' from dual union all
select 103, 'Y', null, null, date '2018-01-18' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-11', date '2021-02-21', date '2020-12-02' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1
from t )
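-- nvl(prev_dt1, date_1 - 1) substitutes a value that can never equal date_1,
-- so the first non-null date_1 per emp_id (where prev_dt1 is null) is kept;
-- rows where date_1 itself is null fail the <> comparison and are filtered out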
where date_1 <> nvl(prev_dt1, date_1 - 1);
Result:
EMP_ID  EMP_FLAG  DATE_1      DATE_2      CREATE_DATE
------  --------  ----------  ----------  -----------
   101  Y         2019-02-01              2019-02-15
   102  Y         2019-02-10              2019-02-15
   102  Y         2019-02-11  2021-02-12  2019-02-16
   103  Y         2019-02-10              2019-02-15
   103  Y         2019-02-11  2021-02-21  2020-12-02
Edit:
when there is more than one record with no change in Date_1, it should not return a record for that Emp_ID
In this case date_1 is set in the first row (id 104). If you want to hide rows in such a case, use:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 104, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 104, 'Y', date '2019-02-10', null, date '2019-02-16' from dual union all
select 105, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 105, 'Y', null, null, date '2019-02-16' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1,
row_number() over (partition by emp_id order by create_date) rn
from t )
where (date_1 is not null and prev_dt1 is null and rn > 1)
or date_1 <> prev_dt1
or date_1 is null and prev_dt1 is not null;
I also added the case where the previous date was set and is now null (id 105). If that is not possible, or you don't want it, then remove the last line of the WHERE clause.

You can use the LAG function (instead of LEAD) here:
with tableA as
(
select 456 as Emp_ID, 'Y' as Emp_flag, CAST(NULL as date) as Date_1, CAST(NULL as date) as Date_2, DATE '2018-01-18' as Create_date from dual union
select 456, 'Y', DATE '2019-10-01', NULL, DATE '2019-02-15' from dual union
select 456, 'Y', DATE '2019-11-02', DATE '2021-02-12', DATE '2020-02-12' from dual)
select x.Emp_ID, x.Emp_flag, x.Date_1, x.Date_2, x.Create_date
from
(select a.*
,lag(a.date_1) over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1 <> COALESCE(x.lag_date, DATE '2100-01-01')
This will give out values only when there is a change in date_1. Since NULL comparisons won't work, I have replaced the NULLs with the date 2100-01-01. Hope this helps.
Edit:
I checked with a sample like you mentioned and it does seem to be working. If it's not working, kindly share the expected output and the actual result you are getting:
with tableA as
(
select 456 as Emp_ID, 'Y' as Emp_flag, CAST(NULL as date) as Date_1, CAST(NULL as date) as Date_2, DATE '2018-01-18' as Create_date from dual union
select 456, 'Y', DATE '2019-10-01', NULL, DATE '2019-02-15' from dual union
select 456, 'Y', DATE '2019-10-01', DATE '2021-02-12', DATE '2020-02-12' from dual)
select x.Emp_ID, x.Emp_flag, x.Date_1, x.Date_2, x.Create_date
from
(select a.*
,lag(a.date_1) over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1 <> COALESCE(x.lag_date, DATE '2100-01-01')
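As an aside, the sentinel date can be avoided entirely with a NULL-safe comparison. A minimal sketch reusing the same tableA and lag_date aliases as above (DECODE treats two NULLs as equal, so a row comes back only when date_1 genuinely differs from the previous value):
select x.Emp_ID, x.Emp_flag, x.Date_1, x.Date_2, x.Create_date
from
(select a.*
,lag(a.date_1) over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null
and decode(x.date_1, x.lag_date, 1, 0) = 0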

Related

Fetch record with max number in one column except if date in that column is > today

I have a problem with fetching few exceptions from DB.
Example, table b:
sn  v_num  start_date  end_date
1   001    01-01-2019  31-12-2099
1   002    01-01-2021  31-01-2022
1   003    01-02-2022  31-12-2099
2   001    01-01-2022  31-12-2099
2   002    01-07-2022  31-07-2022
2   003    01-08-2022  31-12-2099
Expected output:
sn  v_num  start_date  end_date
1   003    01-02-2022  31-12-2099
2   001    01-01-2022  31-12-2099
Currently I'm here:
SELECT * FROM table a, table b
WHERE a.sn = b.sn
AND b.v_num = (SELECT max (v_num) FROM b WHERE a.sn = b.sn)
but obviously that is not good because of a few cases like the one with sn = 2.
In conclusion, I need to get, for each sn, the record where v_num is max (95% of them in the DB), except when the start_date of that max-v_num record is > today.
Filter using start_date <= TRUNC(SYSDATE) then use the ROW_NUMBER analytic function:
SELECT *
FROM (
SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY sn ORDER BY v_num DESC) AS rn
FROM "TABLE" a
WHERE start_date <= TRUNC(SYSDATE)
)
WHERE rn = 1;
If the start_date has a time component then you can use start_date < TRUNC(SYSDATE) + INTERVAL '1' DAY to get all the values for today from 00:00:00 to 23:59:59.
If you can have ties for the maximum and want to return all the ties then you can use the RANK analytic function instead of ROW_NUMBER.
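For example, a minimal sketch combining both notes (the time-tolerant filter and RANK to keep ties), against the same sample table as below:
SELECT *
FROM (
SELECT a.*,
RANK() OVER (PARTITION BY sn ORDER BY v_num DESC) AS rnk
FROM "TABLE" a
WHERE start_date < TRUNC(SYSDATE) + INTERVAL '1' DAY
)
WHERE rnk = 1;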
Which, for the sample data:
CREATE TABLE "TABLE" (sn, v_num, start_date, end_date) AS
SELECT 1, '001', DATE '2022-01-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 1, '002', DATE '2022-01-01', DATE '2022-01-31' FROM DUAL UNION ALL
SELECT 1, '003', DATE '2022-02-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 2, '001', DATE '2022-01-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 2, '002', DATE '2022-07-01', DATE '2022-07-31' FROM DUAL UNION ALL
SELECT 2, '003', DATE '2022-08-01', DATE '2099-12-31' FROM DUAL;
Outputs:
SN  V_NUM  START_DATE           END_DATE             RN
1   003    2022-02-01 00:00:00  2099-12-31 00:00:00  1
2   001    2022-01-01 00:00:00  2099-12-31 00:00:00  1
db<>fiddle here

Group historical data

I'm stuck with the following problem and need help:
An object has properties that are calculated every day.
They are stored in a key-value historical table.
A property is mistakenly stored even if it was not changed.
I need a query that will group this data set by "actual values":
If a value was not changed for several days, it is output as one row.
If value A was changed to B and then back to A, then A, B, A should be output by the query (the first A and the second A are different date intervals).
Here is a dataset example.
with obj_val_hist as
(
select 123 obj_id, 'k_1' key, 'A' value_, to_date('01.01.2021', 'DD.MM.YYYY') start_dt, to_date('01.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('02.01.2021', 'DD.MM.YYYY') start_dt, to_date('02.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('03.01.2021', 'DD.MM.YYYY') start_dt, to_date('03.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('04.01.2021', 'DD.MM.YYYY') start_dt, to_date('04.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('05.01.2021', 'DD.MM.YYYY') start_dt, to_date('05.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('06.01.2021', 'DD.MM.YYYY') start_dt, to_date('06.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('07.01.2021', 'DD.MM.YYYY') start_dt, to_date('07.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('08.01.2021', 'DD.MM.YYYY') start_dt, to_date('08.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('09.01.2021', 'DD.MM.YYYY') start_dt, to_date('09.01.2021', 'DD.MM.YYYY') end_dt from dual
)
select * from obj_val_hist where obj_id = 123;
Data set:
obj_id  key  value  start_date  end_date
123     k_1  A      01.01.2021  01.01.2021
123     k_1  A      02.01.2021  02.01.2021
123     k_1  A      03.01.2021  03.01.2021
123     k_1  B      04.01.2021  04.01.2021
123     k_1  B      05.01.2021  05.01.2021
123     k_1  B      06.01.2021  06.01.2021
123     k_1  A      07.01.2021  07.01.2021
123     k_1  A      08.01.2021  08.01.2021
123     k_1  A      09.01.2021  09.01.2021
Expected result:
obj_id  key  value  start_date  end_date
123     k_1  A      01.01.2021  03.01.2021
123     k_1  B      04.01.2021  06.01.2021
123     k_1  A      07.01.2021  09.01.2021
This table contains values for a million objects.
It is queried by obj_id and has an index on it.
Performance is a key point so using stored functions is most probably not an option.
This query will be a small part of a big view that is used by an external system.
I expected that there should be an analytic function suited for such a problem.
Something like dense_rank but with the possibility to order by one column (start_dt) but increase value when another column (value_) gets a different value.
But I didn't find one.
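(For reference, the classic analytic trick for exactly this is the "difference of row numbers" gaps-and-islands pattern. A minimal sketch against the obj_val_hist sample above; note that it groups consecutive equal values by row order only, so unlike the match_recognize answer below it does not take gaps between dates into account.)
select obj_id, key, value_, min(start_dt) start_dt, max(end_dt) end_dt
from (
select h.*,
row_number() over (partition by obj_id, key order by start_dt)
- row_number() over (partition by obj_id, key, value_ order by start_dt) as grp
from obj_val_hist h
where obj_id = 123
)
group by obj_id, key, value_, grp
order by min(start_dt);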
You may use match_recognize for this, which can also handle gaps in dates and is quite efficient and natural to read:
create table t (
obj_id
, key_
, value_
, start_date
, end_date
)
as
select 123, 'k_1', 'A', to_date('01.01.2021', 'dd.mm.yyyy'), to_date('01.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('02.01.2021', 'dd.mm.yyyy'), to_date('02.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('03.01.2021', 'dd.mm.yyyy'), to_date('03.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('04.01.2021', 'dd.mm.yyyy'), to_date('04.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('05.01.2021', 'dd.mm.yyyy'), to_date('05.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('06.01.2021', 'dd.mm.yyyy'), to_date('06.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('07.01.2021', 'dd.mm.yyyy'), to_date('07.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('08.01.2021', 'dd.mm.yyyy'), to_date('08.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('09.01.2021', 'dd.mm.yyyy'), to_date('09.01.2021', 'dd.mm.yyyy') from dual union all
/*Let's skip 10.01*/
select 123, 'k_1', 'A', to_date('11.01.2021', 'dd.mm.yyyy'), to_date('11.01.2021', 'dd.mm.yyyy') from dual union all
/*And extend validity period for some record*/
select 123, 'k_1', 'A', to_date('12.01.2021', 'dd.mm.yyyy'), to_date('13.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('14.01.2021', 'dd.mm.yyyy'), to_date('14.01.2021', 'dd.mm.yyyy') from dual;
select *
from t
match_recognize (
/*For each ID and KEY*/
partition by obj_id, key_
order by start_date asc
/*Output attributes*/
measures
/*start_date of the first row in match group*/
final first(start_date) as min_start_date,
/*end_date of the last row in match group*/
final last(end_date) as max_end_date,
/*value itself as it is constant for match group*/
value_ as val
/*First row and any consecutive matches*/
pattern (init A*)
define
/*Consecutive rows are those which have the same value in the value_ field
and start_date of the next row is not farther than
1 day from end_date of the previous row
*/
A as prev(value_) = value_
and prev(end_date) + 1 = start_date
)
OBJ_ID | KEY_ | MIN_START_DATE | MAX_END_DATE | VAL
-----: | :--- | :------------- | :----------- | :--
123 | k_1 | 01-JAN-21 | 03-JAN-21 | A
123 | k_1 | 04-JAN-21 | 06-JAN-21 | B
123 | k_1 | 07-JAN-21 | 09-JAN-21 | A
123 | k_1 | 11-JAN-21 | 14-JAN-21 | A
db<>fiddle here
If you indeed have data every day, then you can use the following relatively simple logic. The subquery calculates when the value changes. The outer query then calculates the end date by looking at the date in the next row:
select obj_id, key, value_, start_dt,
coalesce(lead(start_dt) over (partition by obj_id, key order by start_dt) - interval '1' day, max_end_dt)
from (select ovh.*,
lag(value_) over (partition by obj_id, key order by start_dt) as prev_value_,
max(end_dt) over (partition by obj_id, key) as max_end_dt
from obj_val_hist ovh
where obj_id = 123
) ovh
where prev_value_ is null or prev_value_ <> value_;
However, your data suggests that you could have a much more complicated problem. You have two dates in the row, a start date and end date. These could, in theory, overlap or have gaps. You can handle that by assigning groups when a new key/value pair starts and then aggregating:
select obj_id, key, value_, min(start_dt), max(end_dt)
from (select ovh.*,
sum(case when prev_end_dt >= start_dt - interval '1' day then 0 else 1 end) over (partition by obj_id, key order by start_dt) as grp
from (select ovh.*,
max(end_dt) over (partition by obj_id, key, value_
order by start_dt
range between unbounded preceding and interval '1' day preceding
) as prev_end_dt
from obj_val_hist ovh
) ovh
) ovh
group by obj_id, key, value_, grp;
Here is a db<>fiddle.

Oracle SQL - Find origin ID of autoincrement column

There's a table in my ERP database that has data about certain events. It has the start date, the end date, and a column that shows whether the event is a continuation of a previous one (sequential_id references unique_id). Here's an example:
unique_id  start_date  end_date    sequential_id
001        2021-01-01  2021-01-15
002        2021-02-01  2021-02-16  001
003        2021-03-01  2021-03-17  002
004        2021-03-10  2021-03-11
005        2021-03-19
In the example above, rows 001, 002 and 003 are all part of the same event, and 004/005 are unique events, with no sequences. How can I group the data in a way that the output is like this:
origin_id  start_date  end_date
001        2021-01-01  2021-03-17
004        2021-03-10  2021-03-11
005        2021-03-19
I've tried using group by, but due to sequential_id being auto incremental, it didn't work.
Thanks in advance.
You can use the modern match_recognize clause, which is well suited for such tasks:
Pattern Recognition With MATCH_RECOGNIZE
DBFiddle
select *
from t
match_recognize(
order by unique_id
measures
first(unique_id) start_unique_id,
first(start_date) start_date,
last(end_date) end_date
pattern (strt nxt*)
define nxt as sequential_id=prev(unique_id)
);
You can use a hierarchical query for this:
with a (unique_id, start_date, end_date, sequential_id) as (
select '001', date '2021-01-01', date '2021-01-15', null from dual union all
select '002', date '2021-02-01', date '2021-02-16', '001' from dual union all
select '003', date '2021-03-01', date '2021-03-17', '002' from dual union all
select '004', date '2021-03-10', date '2021-03-11', null from dual union all
select '005', date '2021-03-19', null, null from dual
)
, b as (
select
connect_by_root(unique_id) as unique_id
, connect_by_root(start_date) as start_date
, end_date
, connect_by_isleaf as l
from a
start with sequential_id is null
connect by prior unique_id = sequential_id
)
select
unique_id
, start_date
, end_date
from b
where l = 1
order by 1 asc
UNIQUE_ID | START_DATE | END_DATE
:-------- | :--------- | :--------
001 | 01-JAN-21 | 17-MAR-21
004 | 10-MAR-21 | 11-MAR-21
005 | 19-MAR-21 | null
db<>fiddle here
This is a graph-walking problem, so you can use a recursive CTE:
with cte (unique_id, start_date, end_date, start_unique_id) as (
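-- anchor: chain starts, i.e. rows whose sequential_id does not reference another row (including NULL)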
select unique_id, start_date, end_date, unique_id
from t
where not exists (select 1 from t t2 where t.sequential_id = t2.unique_id)
union all
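-- recursive step: follow each chain forward via sequential_id, carrying the originating unique_id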
select t.unique_id, t.start_date, t.end_date, cte.start_unique_id
from cte join
t
on cte.unique_id = t.sequential_id
)
select start_unique_id, min(start_date), max(end_date)
from cte
group by start_Unique_id;
Here is a db<>fiddle.

How to join two tables to determine date ranges when one table contains (id, start_date) and another contains (id, end_date)

I'm new to SQL; I hope you guys don't find this silly. I'm working with two tables here: one contains start dates and the other contains end dates. Entries do not follow a sequence, and duplicates are possible.
**TABLE 1**
id start_date
1 2019-04-23
1 2019-06-05
1 2019-06-05
1 2019-10-29
1 2019-12-16
2 2019-01-05
3 2020-02-01
**TABLE 2**
id end_date
1 2019-04-23
1 2019-06-05
1 2019-06-06
1 2019-06-06
1 2019-07-24
1 2019-10-16
2 2020-01-04
**EXPECTED OUTPUT**
id start_date end_date
1 2019-04-23 2019-06-05
1 2019-10-29 null
2 2019-01-05 2020-01-04
3 2020-02-01 null
You can use union all and aggregation with some window functions:
with table1 as (
select 1 as id, date('2019-04-23') as start_date union all
select 1, '2019-06-05' union all
select 1, '2019-06-05' union all
select 1, '2019-10-29' union all
select 1, '2019-12-16' union all
select 2, '2019-01-05' union all
select 3, '2020-02-01'
),
table2 as (
SELECT 1 as id, DATE('2019-04-23') as end_date union all
SELECT 1, '2019-06-05' union all
select 1, '2019-06-06' union all
select 1, '2019-06-06' union all
select 1, '2019-07-24' union all
select 1, '2019-10-16' union all
select 2, '2020-01-04'
)
select id, min(start_date), end_date
from (select id, start_date,
first_value(end_date ignore nulls) over (partition by id order by DATE_DIFF(coalesce(start_date, end_date), CURRENT_DATE, day) RANGE between 1 following and unbounded following) as end_date
from ((select id, start_date, null as end_date
from table1
) union all
(select id, null as start_date, end_date
from table2
)
) se
)
group by id, end_date
having min(start_date) is not null;
Why do you have multiple records with the same id (I'm assuming id is a primary key)? My suggestion would be to make the ids unique and create a foreign key constraint in the end-dates table (since there can't be an end date without a start date), and use the foreign key relationship to retrieve the desired results, e.g. SELECT S.start_date, E.end_date FROM table1 S JOIN table2 E ON S.id = E.table1_fk
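A rough sketch of what that restructuring could look like (hypothetical table and column names, assuming a single start date per id):
-- hypothetical schema: one row per id, end dates reference it via a foreign key
CREATE TABLE table1 (
id NUMBER PRIMARY KEY,
start_date DATE NOT NULL
);
CREATE TABLE table2 (
table1_fk NUMBER NOT NULL REFERENCES table1 (id),
end_date DATE NOT NULL
);
SELECT s.id, s.start_date, e.end_date
FROM table1 s
LEFT JOIN table2 e ON e.table1_fk = s.id;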
Below is for BigQuery Standard SQL
#standardSQL
-- turn the '9999-01-01' sentinel back into NULL
SELECT id, start_date, IF(end_date = '9999-01-01', NULL, end_date) end_date
FROM (
  -- for each (id, start_date) keep the earliest candidate end_date
  SELECT id, start_date, ARRAY_AGG(end_date ORDER BY end_date LIMIT 1)[OFFSET(0)] end_date
  FROM (
    -- pair every start_date with every end_date of the same id;
    -- end dates not strictly later than the start date get the '9999-01-01' sentinel
    SELECT id, start_date, IF(start_date < end_date, end_date, '9999-01-01') end_date
    FROM `project.dataset.table1`
    LEFT JOIN `project.dataset.table2`
    USING (id)
  )
  GROUP BY id, start_date
)
If applied to the sample data from your question, the result is:
Row id start_date end_date
1 1 2019-04-23 2019-06-05
2 1 2019-06-05 2019-06-06
3 1 2019-10-29 null
4 1 2019-12-16 null
5 2 2019-01-05 2020-01-04
6 3 2020-02-01 null
Note: quick and not optimized, but it looks like it produces the desired result.

Finding missing dates in a sequence

I have the following table with ID and DATE:
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
In this table, I have three customer IDs (123, 456, 789) and a date column which shows which months they worked.
I want to find out which of the customers have a gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014
To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately, and as #mikey suggested you can count the number of months and look at the first and last date to see how many months that spans.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;
Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
You could use the LAG() function to see whether records have been skipped for a particular date or not. LAG() basically helps in comparing the data in the current row with the previous row. So if we order by DATE, we can easily compare and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This groups all the entries by ID and then arranges the records by date. If a customer was always present, there would not be a gap in their dates. So anyone who has a date difference greater than 1 had a gap. You could tweak this as per your requirement.
EDIT: Looking more closely at the data above, I just observed that you are storing dates in mm/dd/yyyy format and only the first date of every month. So the above query can be tweaked as:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;