I'm stuck with the following problem and need help:
An object has properties that are calculated every day.
They are stored in a key-value historical table.
A property is mistakenly stored even if it has not changed.
I need a query that groups this data set by "actual values":
If a value did not change over several days, it is output as one row.
If value A was changed to B and then back to A, then A, B, A should be output by the query (the first A and the second A are different date intervals).
Here is an example dataset:
with obj_val_hist as
(
select 123 obj_id, 'k_1' key, 'A' value_, to_date('01.01.2021', 'DD.MM.YYYY') start_dt, to_date('01.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('02.01.2021', 'DD.MM.YYYY') start_dt, to_date('02.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('03.01.2021', 'DD.MM.YYYY') start_dt, to_date('03.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('04.01.2021', 'DD.MM.YYYY') start_dt, to_date('04.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('05.01.2021', 'DD.MM.YYYY') start_dt, to_date('05.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'B' value_, to_date('06.01.2021', 'DD.MM.YYYY') start_dt, to_date('06.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('07.01.2021', 'DD.MM.YYYY') start_dt, to_date('07.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('08.01.2021', 'DD.MM.YYYY') start_dt, to_date('08.01.2021', 'DD.MM.YYYY') end_dt from dual union all
select 123 obj_id, 'k_1' key, 'A' value_, to_date('09.01.2021', 'DD.MM.YYYY') start_dt, to_date('09.01.2021', 'DD.MM.YYYY') end_dt from dual
)
select * from obj_val_hist where obj_id = 123;
Data set:
obj_id | key | value | start_date | end_date
-----: | :-- | :---- | :--------- | :---------
123 | k_1 | A | 01.01.2021 | 01.01.2021
123 | k_1 | A | 02.01.2021 | 02.01.2021
123 | k_1 | A | 03.01.2021 | 03.01.2021
123 | k_1 | B | 04.01.2021 | 04.01.2021
123 | k_1 | B | 05.01.2021 | 05.01.2021
123 | k_1 | B | 06.01.2021 | 06.01.2021
123 | k_1 | A | 07.01.2021 | 07.01.2021
123 | k_1 | A | 08.01.2021 | 08.01.2021
123 | k_1 | A | 09.01.2021 | 09.01.2021
Expected result:
obj_id | key | value | start_date | end_date
-----: | :-- | :---- | :--------- | :---------
123 | k_1 | A | 01.01.2021 | 03.01.2021
123 | k_1 | B | 04.01.2021 | 06.01.2021
123 | k_1 | A | 07.01.2021 | 09.01.2021
This table contains values for millions of objects.
It is queried by obj_id and has an index on it.
Performance is a key point, so using stored functions is most probably not an option.
This query will be a small part of a big view that is used by an external system.
I expected there to be an analytic function suited to such a problem.
Something like dense_rank but with the possibility to order by one column (start_dt) but increase value when another column (value_) gets a different value.
But I didn't find one.
You can use MATCH_RECOGNIZE for this. It also handles gaps in dates, and it is quite efficient and natural to read:
create table t (
obj_id
, key_
, value_
, start_date
, end_date
)
as
select 123, 'k_1', 'A', to_date('01.01.2021', 'dd.mm.yyyy'), to_date('01.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('02.01.2021', 'dd.mm.yyyy'), to_date('02.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('03.01.2021', 'dd.mm.yyyy'), to_date('03.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('04.01.2021', 'dd.mm.yyyy'), to_date('04.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('05.01.2021', 'dd.mm.yyyy'), to_date('05.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'B', to_date('06.01.2021', 'dd.mm.yyyy'), to_date('06.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('07.01.2021', 'dd.mm.yyyy'), to_date('07.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('08.01.2021', 'dd.mm.yyyy'), to_date('08.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('09.01.2021', 'dd.mm.yyyy'), to_date('09.01.2021', 'dd.mm.yyyy') from dual union all
/*Let's skip 10.01*/
select 123, 'k_1', 'A', to_date('11.01.2021', 'dd.mm.yyyy'), to_date('11.01.2021', 'dd.mm.yyyy') from dual union all
/*And extend the validity period for some record*/
select 123, 'k_1', 'A', to_date('12.01.2021', 'dd.mm.yyyy'), to_date('13.01.2021', 'dd.mm.yyyy') from dual union all
select 123, 'k_1', 'A', to_date('14.01.2021', 'dd.mm.yyyy'), to_date('14.01.2021', 'dd.mm.yyyy') from dual;
select *
from t
match_recognize (
/*For each ID and KEY*/
partition by obj_id, key_
order by start_date asc
/*Output attributes*/
measures
/*start_date of the first row in match group*/
final first(start_date) as min_start_date,
/*end_date of the last row in match group*/
final last(end_date) as max_end_date,
/*value itself as it is constant for match group*/
value_ as val
/*First row and any consecutive matches*/
pattern (init A*)
define
/*Consecutive rows are those which have the same value in the value_ field
and whose start_date is exactly one day after
the end_date of the previous row
*/
A as prev(value_) = value_
and prev(end_date) + 1 = start_date
);
OBJ_ID | KEY_ | MIN_START_DATE | MAX_END_DATE | VAL
-----: | :--- | :------------- | :----------- | :--
123 | k_1 | 01-JAN-21 | 03-JAN-21 | A
123 | k_1 | 04-JAN-21 | 06-JAN-21 | B
123 | k_1 | 07-JAN-21 | 09-JAN-21 | A
123 | k_1 | 11-JAN-21 | 14-JAN-21 | A
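To check which input rows ended up in which group, the same pattern can be run with ALL ROWS PER MATCH and the MATCH_NUMBER() function. This is a minimal sketch that assumes only the first nine daily rows of the sample above, generated here with a CONNECT BY row generator instead of the CREATE TABLE:

```sql
with t as (
  -- compact stand-in for the first nine rows of table t: A, A, A, B, B, B, A, A, A
  select 123 obj_id, 'k_1' key_,
         substr('AAABBBAAA', level, 1) value_,
         date '2021-01-01' + level - 1 start_date,
         date '2021-01-01' + level - 1 end_date
  from dual
  connect by level <= 9
)
select obj_id, key_, start_date, value_, grp
from t
match_recognize (
  partition by obj_id, key_
  order by start_date
  measures match_number() as grp   -- sequential number of the match within the partition
  all rows per match               -- keep every input row instead of one row per group
  pattern (init a*)
  define
    a as prev(value_) = value_
     and prev(end_date) + 1 = start_date
)
order by start_date;
```

For this data, GRP is 1 for 01.01 to 03.01, 2 for 04.01 to 06.01 and 3 for 07.01 to 09.01, which is exactly the grouping that the ONE ROW PER MATCH version collapses.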
If you indeed have data every day, then you can use the following relatively simple logic. The subquery calculates when the value changes. The outer query then calculates the end date by looking at the start date in the next row:
select obj_id, key, value_, start_dt,
coalesce(lead(start_dt) over (partition by obj_id, key order by start_dt) - interval '1' day, max_end_dt) as end_dt
from (select ovh.*,
lag(value_) over (partition by obj_id, key order by start_dt) as prev_value_,
max(end_dt) over (partition by obj_id, key) as max_end_dt
from obj_val_hist ovh
where obj_id = 123
) ovh
where prev_value_ is null or prev_value_ <> value_;
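The same change flag can also be turned into a group number: a running SUM of "value changed" flags goes up by one at every change, which emulates the "dense_rank that increments when value_ changes" asked for in the question. This sketch regenerates the question's nine daily rows with a CONNECT BY generator (an assumption for brevity) and then aggregates per group:

```sql
with obj_val_hist as (
  -- compact version of the question's data: A, A, A, B, B, B, A, A, A
  select 123 obj_id, 'k_1' key,
         substr('AAABBBAAA', level, 1) value_,
         date '2021-01-01' + level - 1 start_dt,
         date '2021-01-01' + level - 1 end_dt
  from dual
  connect by level <= 9
), flagged as (
  select ovh.*,
         -- 1 on the first row and wherever value_ differs from the previous row
         case when lag(value_) over (partition by obj_id, key order by start_dt) = value_
              then 0 else 1 end as chg
  from obj_val_hist ovh
), grouped as (
  select flagged.*,
         -- running total of change flags = group number (1,1,1,2,2,2,3,3,3)
         sum(chg) over (partition by obj_id, key order by start_dt) as grp
  from flagged
)
select obj_id, key, value_, min(start_dt) as start_dt, max(end_dt) as end_dt
from grouped
group by obj_id, key, value_, grp
order by obj_id, key, min(start_dt);
```

This returns the three rows of the expected result (A, B, A). Like the query above, it assumes one row per day; gaps in the dates are not detected.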
However, your data suggests that you could have a much more complicated problem. You have two dates in the row, a start date and end date. These could, in theory, overlap or have gaps. You can handle that by assigning groups when a new key/value pair starts and then aggregating:
select obj_id, key, value_, min(start_dt) as start_dt, max(end_dt) as end_dt
from (select ovh.*,
sum(case when prev_end_dt >= start_dt - interval '1' day then 0 else 1 end) over (partition by obj_id, key order by start_dt) as grp
from (select ovh.*,
max(end_dt) over (partition by obj_id, key, value_
order by start_dt
range between unbounded preceding and interval '1' day preceding
) as prev_end_dt
from obj_val_hist ovh
) ovh
) ovh
group by obj_id, key, value_, grp;
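To see how this second query treats overlapping and non-contiguous rows, here it is run against a small hypothetical data set: an A row from 01.01 to 03.01, an overlapping A row from 02.01 to 05.01, and a separate A row on 08.01 after a gap.

```sql
with obj_val_hist as (
  -- hypothetical rows: the first two overlap, the third starts after a gap
  select 123 obj_id, 'k_1' key, 'A' value_, date '2021-01-01' start_dt, date '2021-01-03' end_dt from dual union all
  select 123, 'k_1', 'A', date '2021-01-02', date '2021-01-05' from dual union all
  select 123, 'k_1', 'A', date '2021-01-08', date '2021-01-08' from dual
)
select obj_id, key, value_, min(start_dt) as start_dt, max(end_dt) as end_dt
from (select ovh.*,
             -- a new group starts unless an earlier row of the same value reaches the day before
             sum(case when prev_end_dt >= start_dt - interval '1' day then 0 else 1 end)
               over (partition by obj_id, key order by start_dt) as grp
      from (select ovh.*,
                   max(end_dt) over (partition by obj_id, key, value_
                                     order by start_dt
                                     range between unbounded preceding and interval '1' day preceding
                                    ) as prev_end_dt
            from obj_val_hist ovh
           ) ovh
     ) ovh
group by obj_id, key, value_, grp
order by min(start_dt);
```

The two overlapping rows merge into one interval (01.01 to 05.01), and the row on 08.01 stays separate because 05.01 is more than a day before it.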
I have a table with the following columns:
Emp_ID Number
Emp_flag Varchar2(1)
Date_1 Date
Date_2 Date
create_date Date
There is no PK on this table, and there are many records with duplicate Emp_ID values.
What I need to know is when a new Date_1 was entered (from NULL to a date, or from one date to a different date) and on what date that happened.
I can’t just look at a single record and compare Date_1 with create_date, because in many of the records for a given Emp_ID, Date_1 is simply “copied” to the new record. A Date_1 may have been originally entered on 02/15/2019 with a value of 02/01/2019. Now let’s say Date_2 gets added on 02/12/2020. So the table looks like this:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y Null Null 1/18/2018
123 Y 02/1/2019 Null 02/15/2019
123 Y 02/1/2019 02/12/2021 02/12/2020
I need a SQL query that would tell me that Emp_ID 123 had a Date_1 of 02/1/2019 entered on 02/15/2019 and NOT pick up any other record.
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
123 Y 02/1/2019 Null 02/15/2019
Example 2 (notice date_1 is different):
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 3:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y Null Null 1/18/2018
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Expected output:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 11/2/2019 02/12/2021 02/12/2020
Example 4:
Emp_ID Emp_flag Date_1 Date_2 Create_Date
456 Y 10/1/2019 Null 02/15/2019
456 Y 10/1/2019 Null 02/16/2019
Expected output: No records.
You can use the Lag function to check whether the previous value of date_1 existed or not.
SELECT x.emp_id,
x.date_1,
x.create_date AS first_date_with_date_1
FROM (
SELECT t.emp_id,
t.create_date,
t.date_1,
LAG(t.date_1) OVER (PARTITION BY t.emp_id ORDER BY t.create_date) AS last_date_1
FROM your_table t
) x
WHERE x.date_1 IS NOT NULL
AND x.last_date_1 IS NULL;
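The query above only catches a change from NULL to a date. To also catch a change from one date to a different date (Example 2), the comparison has to be NULL-safe. One option, sketched here on the data of Example 2, is DECODE, which treats two NULLs as equal:

```sql
with t (emp_id, emp_flag, date_1, date_2, create_date) as (
  select 456, 'Y', cast(null as date), cast(null as date), date '2018-01-18' from dual union all
  select 456, 'Y', date '2019-10-01',  null,               date '2019-02-15' from dual union all
  select 456, 'Y', date '2019-11-02',  date '2021-02-12',  date '2020-02-12' from dual
)
select emp_id, emp_flag, date_1, date_2, create_date
from (
  select t.*,
         lag(date_1) over (partition by emp_id order by create_date) as prev_date_1
  from t
)
where date_1 is not null
  -- DECODE returns 0 when date_1 equals prev_date_1 (NULLs compare equal), else 1
  and decode(date_1, prev_date_1, 0, 1) = 1;
```

This returns the 10/1/2019 and 11/2/2019 rows. Note that a first row which already has a Date_1 is still reported, so Example 4 would need the extra row-number check shown in the next answer.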
Test for all cases:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 101, 'Y', null, null, date '2018-01-18' from dual union all
select 101, 'Y', date '2019-02-01', null, date '2019-02-15' from dual union all
select 101, 'Y', date '2019-02-01', date '2021-02-12', date '2019-02-16' from dual union all
select 102, 'Y', null, null, date '2018-01-18' from dual union all
select 102, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 102, 'Y', date '2019-02-11', date '2021-02-12', date '2019-02-16' from dual union all
select 103, 'Y', null, null, date '2018-01-18' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 103, 'Y', date '2019-02-11', date '2021-02-21', date '2020-12-02' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1
from t )
where date_1 <> nvl(prev_dt1, date_1 - 1);
Result:
EMP_ID EMP_FLAG DATE_1 DATE_2 CREATE_DATE
---------- -------- ----------- ----------- -----------
101 Y 2019-02-01 2019-02-15
102 Y 2019-02-10 2019-02-15
102 Y 2019-02-11 2021-02-12 2019-02-16
103 Y 2019-02-10 2019-02-15
103 Y 2019-02-11 2021-02-21 2020-12-02
Edit:
"When there is more than one record with no change in Date_1, it should not return a record for that Emp_id."
In this case date_1 is already set in the first row (id 104). If you want to hide rows in such a case, use:
with t(emp_id, emp_flag, date_1, date_2, create_date) as (
select 104, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 104, 'Y', date '2019-02-10', null, date '2019-02-16' from dual union all
select 105, 'Y', date '2019-02-10', null, date '2019-02-15' from dual union all
select 105, 'Y', null, null, date '2019-02-16' from dual )
select emp_id, emp_flag, date_1, date_2, create_date
from (
select emp_ID, emp_flag, date_1, date_2, create_date,
lag(date_1) over (partition by emp_id order by create_date) prev_dt1,
row_number() over (partition by emp_id order by create_date) rn
from t )
where (date_1 is not null and prev_dt1 is null and rn > 1)
or date_1 <> prev_dt1
or date_1 is null and prev_dt1 is not null;
I also added the case where the previous date was set and is now null (id 105). If that cannot happen, or you don't want it, remove the last condition.
You can use the lag function instead of lead here:
with tableA as
(
select 456 as Emp_ID,'Y' as Emp_flag,CAST(NULL as date) as Date_1,CAST(NULL as date) as Date_2,DATE '2018-01-18' as Create_date from dual union
select 456,'Y',DATE '2019-10-01',Null,DATE '2019-02-15' from dual union
select 456,'Y',DATE '2019-11-02',DATE '2021-02-12',DATE '2020-02-12' from dual)
select x.Emp_ID,x.Emp_flag,x.Date_1,x.Date_2,x.Create_date
from
(select a.*
,lag(a.date_1) Over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1<>COALESCE(x.lag_date,DATE '2100-01-01');
This returns rows only when date_1 changes. Since comparisons with NULL never evaluate to true, I have replaced NULL with 1/1/2100. Hope this helps.
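A quick check of why the sentinel is needed: in Oracle, a comparison with NULL evaluates to UNKNOWN, and CASE (like WHERE) only takes a branch whose condition is TRUE.

```sql
-- '<>' against NULL is UNKNOWN, not TRUE, so the ELSE branch is taken
select case when date '2019-10-01' <> cast(null as date)
            then 'true'
            else 'not true'
       end as cmp
from dual;
```

This returns 'not true', which is why the first non-null date_1 (whose lag_date is NULL) would be dropped without the COALESCE.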
Edit:
I checked a sample like the one you mentioned, and it does seem to work. If it's not working, kindly share the expected result and the result you are getting:
with tableA as
(
select 456 as Emp_ID,'Y' as Emp_flag,CAST(NULL as date) as Date_1,CAST(NULL as date) as Date_2,DATE '2018-01-18' as Create_date from dual union
select 456,'Y',DATE '2019-10-01',Null,DATE '2019-02-15' from dual union
select 456,'Y',DATE '2019-10-01',DATE '2021-02-12',DATE '2020-02-12' from dual)
select x.Emp_ID,x.Emp_flag,x.Date_1,x.Date_2,x.Create_date
from
(select a.*
,lag(a.date_1) Over (partition by a.Emp_ID order by a.create_date) as lag_date
from tableA a) x
where x.date_1 is not null and x.date_1<>COALESCE(x.lag_date,DATE '2100-01-01');
I have an SQL problem I'm not able to solve.
First of all, an SQL fiddle with it: http://sqlfiddle.com/#!4/fe7b07/2
As you see, I fill the table with some dates, which are bound to some ID. Those dates are day by day. So for this example, we'd have something like this, if we only look at January:
The timeline spans from 2020-01-01 to 2020-01-31; the blocks are the dates in the database. So this would be the simple SELECT * FROM days output.
What I now want is to fill in some days to this output. These would span from timeline_begin to MIN(date_from); and from MAX(date_from) to timeline_end.
I'll mark these red in the following picture:
The orange span does not need to be added, but if your solution adds it too, that is also OK.
Ok, so far so good.
For this I created the SELECT * FROM minmax, which will select the MIN(date_from) and MAX(date_from) for every id_othertable. Still no magic involved.
What I struggle with now is creating those days for every id_othertable while also joining the data they have (in this fiddle, it's just the some_info field).
I tried to write this in the SELECT * FROM days_before query, but I just can't get it to work. I read about the magical CONNECT BY, which on its own creates dates row by row, but I can't manage to join it to my data from the former table. Every time I join the info, I only get one row per id_othertable, not all the dates I need.
So the ideal solution I'm looking for would be to have three select queries:
SELECT * FROM days which select dates out of the database
SELECT * FROM days_before which will show the dates before MIN(date_from) of query 1
SELECT * FROM days_after for dates after MAX(date_from) of query 1
And in the end I'd UNION those three queries to have them all combined.
I hope I explained my problem well enough. If you need any information or further explanation, please don't hesitate to ask.
EDIT 1: I created a pastebin with some example data: https://pastebin.com/jskrStpZ
Bear in mind that only the first query has actual information from the database, the other two have created data. Also, this example output only has data for id_othertable = 1, so the actual query should also have the information for id_othertable = 2, 3.
EDIT 2: just for clarification, the field date_to is just a simple date_from + 1 day.
If you have denormalised data, it's quite simple:
with bas as (
select 1 id_other_table, to_date('2020-01-05', 'YYYY-MM-DD') date_from, to_date('2020-01-06', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-06', 'YYYY-MM-DD') date_from, to_date('2020-01-07', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-07', 'YYYY-MM-DD') date_from, to_date('2020-01-08', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-10', 'YYYY-MM-DD') date_from, to_date('2020-01-11', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-11', 'YYYY-MM-DD') date_from, to_date('2020-01-12', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-12', 'YYYY-MM-DD') date_from, to_date('2020-01-13', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 2 id_other_table, to_date('2020-01-10', 'YYYY-MM-DD') date_from, to_date('2020-01-11', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 2 id_other_table, to_date('2020-01-11', 'YYYY-MM-DD') date_from, to_date('2020-01-12', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 2 id_other_table, to_date('2020-01-12', 'YYYY-MM-DD') date_from, to_date('2020-01-13', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 3 id_other_table, to_date('2020-01-20', 'YYYY-MM-DD') date_from, to_date('2020-01-21', 'YYYY-MM-DD') date_to, 'friend' some_info from dual
union all select 3 id_other_table, to_date('2020-01-21', 'YYYY-MM-DD') date_from, to_date('2020-01-22', 'YYYY-MM-DD') date_to, 'friend' some_info from dual
union all select 3 id_other_table, to_date('2020-01-22', 'YYYY-MM-DD') date_from, to_date('2020-01-23', 'YYYY-MM-DD') date_to, 'friend' some_info from dual)
, ad as (select date '2020-01-01' - 1 + level all_dates from dual connect by level <= 31)
select distinct some_info,all_dates from bas,ad where (some_info,all_dates) not in (select some_info,date_from from bas);
If you have longer date ranges, or if the query run time matters, another solution would be better, but it is harder to debug because it is quite hard to get the orange time slot.
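The hard-coded 31 in the ad subquery only fits 31-day months. A sketch of a generator that adapts to the month length, assuming the first day of the month is supplied as a literal (it could equally be a bind variable):

```sql
with params as (
  -- hypothetical parameter row: first day of the month to generate
  select date '2020-02-01' as month_start from dual
)
-- CONNECT BY without PRIOR on a one-row source yields LEVEL = 1, 2, ... until the condition fails
select month_start + level - 1 as day
from params
connect by level <= last_day(month_start) - month_start + 1;
```

For February 2020 this produces 29 rows, one per day.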
If you want the dates per id that are not in the database then you can use the LEAD analytic function:
WITH dates ( id, date_from, date_to ) AS (
SELECT id_othertable,
DATE '2020-01-01',
MIN( date_from )
FROM some_dates
WHERE date_to > DATE '2020-01-01'
AND date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
GROUP BY id_othertable
UNION ALL
SELECT id_othertable,
date_to,
LEAD( date_from, 1, ADD_MONTHS( DATE '2020-01-01', 1 ) )
OVER ( PARTITION BY id_othertable ORDER BY date_from )
FROM some_dates
WHERE date_to > DATE '2020-01-01'
AND date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
)
SELECT id,
date_from,
date_to
FROM dates
WHERE date_from < date_to
ORDER BY id, date_from;
so for the test data:
CREATE TABLE some_dates ( id_othertable, date_from, date_to, some_info ) AS
SELECT 1, DATE '2020-01-05', DATE '2020-01-06', 'hello1' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-06', DATE '2020-01-07', 'hello2' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-07', DATE '2020-01-08', 'hello3' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-10', DATE '2020-01-13', 'hello4' FROM DUAL UNION ALL
SELECT 2, DATE '2020-01-10', DATE '2020-01-13', 'my' FROM DUAL UNION ALL
SELECT 3, DATE '2020-01-20', DATE '2020-01-23', 'friend' FROM DUAL UNION ALL
SELECT 4, DATE '2019-12-31', DATE '2020-01-05', 'before' FROM DUAL UNION ALL
SELECT 4, DATE '2020-01-30', DATE '2020-02-02', 'after' FROM DUAL UNION ALL
SELECT 5, DATE '2019-12-31', DATE '2020-01-10', 'only_before' FROM DUAL UNION ALL
SELECT 6, DATE '2020-01-15', DATE '2020-02-01', 'only_after' FROM DUAL UNION ALL
SELECT 7, DATE '2019-12-31', DATE '2020-02-01', 'exclude_all' FROM DUAL;
this outputs:
ID | DATE_FROM | DATE_TO
-: | :--------- | :---------
1 | 2020-01-01 | 2020-01-05
1 | 2020-01-08 | 2020-01-10
1 | 2020-01-13 | 2020-02-01
2 | 2020-01-01 | 2020-01-10
2 | 2020-01-13 | 2020-02-01
3 | 2020-01-01 | 2020-01-20
3 | 2020-01-23 | 2020-02-01
4 | 2020-01-05 | 2020-01-30
5 | 2020-01-10 | 2020-02-01
6 | 2020-01-01 | 2020-01-15
If you want only the days before, then add this filter to the final WHERE clause:
AND date_from = DATE '2020-01-01'
and, similarly, if you want only the days after, then add:
AND date_to = ADD_MONTHS( DATE '2020-01-01', 1 )
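Instead of running separately filtered queries, each gap can also be labelled in one pass. This sketch uses only the id_othertable = 1 rows of the test data above, and the labels 'before', 'between' and 'after' are illustrative names:

```sql
with some_dates (id_othertable, date_from, date_to) as (
  select 1, date '2020-01-05', date '2020-01-06' from dual union all
  select 1, date '2020-01-06', date '2020-01-07' from dual union all
  select 1, date '2020-01-07', date '2020-01-08' from dual union all
  select 1, date '2020-01-10', date '2020-01-13' from dual
), dates (id, date_from, date_to) as (
  -- gap from the start of the month to the first range
  select id_othertable, date '2020-01-01', min(date_from)
  from some_dates
  where date_to > date '2020-01-01'
    and date_from < add_months(date '2020-01-01', 1)
  group by id_othertable
  union all
  -- gap from the end of each range to the start of the next (or the end of the month)
  select id_othertable, date_to,
         lead(date_from, 1, add_months(date '2020-01-01', 1))
           over (partition by id_othertable order by date_from)
  from some_dates
  where date_to > date '2020-01-01'
    and date_from < add_months(date '2020-01-01', 1)
)
select id, date_from, date_to,
       case when date_from = date '2020-01-01' then 'before'
            when date_to = add_months(date '2020-01-01', 1) then 'after'
            else 'between'
       end as gap_type
from dates
where date_from < date_to
order by id, date_from;
```

For this data it returns 2020-01-01 to 2020-01-05 ('before'), 2020-01-08 to 2020-01-10 ('between') and 2020-01-13 to 2020-02-01 ('after').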
If you want to specify the start date and number of months duration then use named bind parameters:
WITH dates ( id, date_from, date_to ) AS (
SELECT id_othertable,
:start_date,
MIN( date_from )
FROM some_dates
WHERE date_to > :start_date
AND date_from < ADD_MONTHS( :start_date, :number_months )
GROUP BY id_othertable
UNION ALL
SELECT id_othertable,
date_to,
LEAD( date_from, 1, ADD_MONTHS( :start_date, :number_months ) )
OVER ( PARTITION BY id_othertable ORDER BY date_from )
FROM some_dates
WHERE date_to > :start_date
AND date_from < ADD_MONTHS( :start_date, :number_months )
)
SELECT id,
date_from,
date_to
FROM dates
WHERE date_from < date_to
ORDER BY id, date_from;
Generate the whole date range with a CONNECT BY row generator, then join your table to it with a partitioned outer join on id_othertable:
select date_from, nvl(date_to, date_from +1) date_to, id_othertable, some_info
from (
select date '2020-01-01' + level - 1 as date_from
from dual
connect by level <= date '2020-01-31' - date '2020-01-01' + 1 ) gen
natural left join some_dates partition by (id_othertable);
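Because the partitioned outer join keeps every generated day for every id_othertable, the days missing from some_dates are simply the rows without a match. A sketch with a few hypothetical rows, written with an explicit ON condition instead of NATURAL:

```sql
with some_dates (id_othertable, date_from, date_to, some_info) as (
  select 1, date '2020-01-05', date '2020-01-06', 'hello' from dual union all
  select 1, date '2020-01-06', date '2020-01-07', 'hello' from dual union all
  select 2, date '2020-01-10', date '2020-01-11', 'my'    from dual
), gen as (
  -- one row per day of January 2020 (31 rows)
  select date '2020-01-01' + level - 1 as date_from
  from dual
  connect by level <= date '2020-01-31' - date '2020-01-01' + 1
)
select sd.id_othertable, g.date_from
from gen g
-- repeat the outer join once per id_othertable, filling in the id on unmatched days
left outer join some_dates sd partition by (sd.id_othertable)
  on (sd.date_from = g.date_from)
where sd.date_to is null
order by sd.id_othertable, g.date_from;
```

For this data it returns 29 days for id 1 and 30 days for id 2.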
I have the following table with ID and DATE columns:
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
In this table, I have three customer IDs (123, 456 and 789) and a date column which shows which months they worked.
I want to find out which of the customers have gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014
To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately, and, as @mikey suggested, you can count the number of months and look at the first and last dates to see how many months that spans.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;
Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
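The query above returns only the first missing month per ID. If every gap is of interest, the same LEAD comparison can report each missing range. A sketch with a small hypothetical data set for ID 123 (months present: June, July, September and December 2014):

```sql
with your_table (id, "DATE") as (
  -- hypothetical monthly rows: August, October and November 2014 are missing
  select 123, date '2014-06-01' from dual union all
  select 123, date '2014-07-01' from dual union all
  select 123, date '2014-09-01' from dual union all
  select 123, date '2014-12-01' from dual
)
select id,
       add_months("DATE", 1)   as first_missing,  -- month right after a worked month
       add_months(next_dt, -1) as last_missing    -- month right before the next worked month
from (
  select id, "DATE",
         lead("DATE") over (partition by id order by "DATE") as next_dt
  from your_table
)
where next_dt > add_months("DATE", 1)   -- more than one month to the next row means a gap
order by id, first_missing;
```

For this data it returns two rows: 2014-08-01 to 2014-08-01, and 2014-10-01 to 2014-11-01.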
You could use the LAG() function to see whether records have been skipped for a particular date. LAG() compares the data in the current row with a previous row, so if we order by DATE we can easily find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This partitions the entries by id and orders the records by date within each id. If a customer is always present, there is no gap in their dates, so anyone with a date difference greater than 1 has a gap. You can tweak this as per your requirement.
EDIT: Looking more closely at the answers above, I noticed that you store dates in mm/dd/yyyy format and only the first date of every month. So, the above query can be tweaked as:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
Let's say I have the following database table:
id | from | to
1 | 01-JAN-2015 | 03-MAR-2015
1 | 04-MAR-2015 | 31-AUG-2015
1 | 01-SEP-2015 | 31-DEC-2015
2 | 01-JAN-2015 | 30-JUN-2015
2 | 01-NOV-2015 | 31-DEC-2015
And I want to summarise the records with the same id that are continuous in time into one single row covering the full time frame, as follows:
id | from | to
1 | 01-JAN-2015 | 31-DEC-2015
2 | 01-JAN-2015 | 30-JUN-2015
2 | 01-NOV-2015 | 31-DEC-2015
So, because the time frames are sequential and have no gaps between them, the 3 rows for id 1 could be converted into 1 single row with the minimum from date and the maximum to date. The 2 rows for id 2 would remain the same as the time frames are not continuous.
I'm thinking of doing this using a loop through a cursor, but I might be complicating things.
Any better ideas? Perhaps with SQL queries only?
You can do it using hierarchical queries, something like this:
select id, min(root_dt_from) dt_from, dt_to
from (select id, dt_from, dt_to, level, connect_by_isleaf, connect_by_root(dt_from) root_dt_from
from t
where connect_by_isleaf = 1
connect by prior id = id and prior (dt_to + 1) = dt_from
)
group by id, dt_to;
Sample execution:
with t as (
select 1 id, to_date('01-JAN-2015', 'DD-MON-YYYY') dt_from, to_date('03-MAR-2015', 'DD-MON-YYYY') dt_to from dual union all
select 1 id, to_date('04-MAR-2015', 'DD-MON-YYYY') dt_from, to_date('31-AUG-2015', 'DD-MON-YYYY') dt_to from dual union all
select 1 id, to_date('01-SEP-2015', 'DD-MON-YYYY') dt_from, to_date('31-DEC-2015', 'DD-MON-YYYY') dt_to from dual union all
select 2 id, to_date('01-JAN-2015', 'DD-MON-YYYY') dt_from, to_date('30-JUN-2015', 'DD-MON-YYYY') dt_to from dual union all
select 2 id, to_date('01-NOV-2015', 'DD-MON-YYYY') dt_from, to_date('31-DEC-2015', 'DD-MON-YYYY') dt_to from dual
) -- end of sample data
select id, min(root_dt_from) dt_from, dt_to
from (select id, dt_from, dt_to, level, connect_by_isleaf, connect_by_root(dt_from) root_dt_from
from t
where connect_by_isleaf = 1
connect by prior id = id and prior (dt_to + 1) = dt_from
)
group by id, dt_to;
ID DT_FROM DT_TO
---------- ----------- -----------
1 01-JAN-2015 31-DEC-2015
2 01-NOV-2015 31-DEC-2015
2 01-JAN-2015 30-JUN-2015
You can do this in stages with a few analytic and aggregate functions:
with t1(id, from_dt, to_dt) as (
select 1, to_date('01-JAN-2015', 'dd-mon-rrrr'), to_date('03-MAR-2015', 'dd-mon-rrrr') from dual union all
select 1, to_date('04-MAR-2015', 'dd-mon-rrrr'), to_date('31-AUG-2015', 'dd-mon-rrrr') from dual union all
select 1, to_date('01-SEP-2015', 'dd-mon-rrrr'), to_date('31-DEC-2015', 'dd-mon-rrrr') from dual union all
select 2, to_date('01-JAN-2015', 'dd-mon-rrrr'), to_date('30-JUN-2015', 'dd-mon-rrrr') from dual union all
select 2, to_date('01-NOV-2015', 'dd-mon-rrrr'), to_date('31-DEC-2015', 'dd-mon-rrrr') from dual
), t2 as (
select id
, from_dt
, to_dt
, from_dt-lag(to_dt,1,from_dt-1) over (partition by id order by to_dt) dst
, row_number() over (partition by id order by to_dt) rn
from t1
), t3 as (
select id
, from_dt
, to_dt
, sum(dst) over (partition by id order by rn) - rn grp
from t2
)
select id
, min(from_dt) from_dt
, max(to_dt) to_dt
from t3
group by id, grp;
The first stage T1 is just recreating your data. In T2 I subtract the lag of to_dt from from_dt to find the distance (dst) between consecutive records and generate row_number for each record (rn). In T3 I subtract rn from the running sum of dst to generate a group id (grp). Finally in the output stage I take the min and max of from_dt and to_dt respectively grouping by ID and grp columns.
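The distance-based group id assumes the ranges never overlap (an overlap gives a distance of 0 or less, which can make separate islands share a group id). A common variant, shown here as a sketch on the same sample data, swaps the distance for a "starts a new island" flag that compares from_dt with the latest to_dt seen so far:

```sql
with t1 (id, from_dt, to_dt) as (
  select 1, date '2015-01-01', date '2015-03-03' from dual union all
  select 1, date '2015-03-04', date '2015-08-31' from dual union all
  select 1, date '2015-09-01', date '2015-12-31' from dual union all
  select 2, date '2015-01-01', date '2015-06-30' from dual union all
  select 2, date '2015-11-01', date '2015-12-31' from dual
), flagged as (
  select t1.*,
         -- 0 if an earlier range ends on or after the day before this one starts, else 1
         -- (the first row has no earlier range, so the comparison is UNKNOWN and gives 1)
         case when from_dt <= max(to_dt) over (partition by id order by from_dt, to_dt
                                               rows between unbounded preceding and 1 preceding) + 1
              then 0 else 1 end as new_grp
  from t1
), grouped as (
  select flagged.*,
         -- running count of island starts = island number
         sum(new_grp) over (partition by id order by from_dt, to_dt
                            rows unbounded preceding) as grp
  from flagged
)
select id, min(from_dt) as from_dt, max(to_dt) as to_dt
from grouped
group by id, grp
order by 1, 2;
```

For the sample data this gives one row for id 1 (01-JAN-2015 to 31-DEC-2015) and two rows for id 2, the same as the query above.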
You can try some analytic functions here, which can really simplify the scenario. I hope the snippet below helps. Let me know if you run into any issues.
SELECT C.ID,
MIN(C.FRM_DT) FRM_DT,
MAX(C.TO_DT) TO_DT
FROM
(SELECT B.ID,
B.FRM_DT,
B.TO_DT,
-- running count of range starts gives one group number per continuous block
SUM(B.NEW_GRP) OVER(PARTITION BY B.ID ORDER BY B.TO_DT) GRP
FROM
(SELECT A.ID,
A.FRM_DT,
A.TO_DT,
-- 0 when the range starts the day after the previous range ends, otherwise 1
CASE
WHEN A.FRM_DT = LAG(A.TO_DT+1) OVER(PARTITION BY A.ID ORDER BY A.TO_DT)
THEN 0
ELSE 1
END NEW_GRP
FROM
(SELECT 1 AS ID,
TO_DATE('01/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('03/03/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 1 AS ID,
TO_DATE('03/04/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('07/31/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 1 AS ID,
TO_DATE('08/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('12/31/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 2 AS ID,
TO_DATE('01/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('06/30/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 2 AS ID,
TO_DATE('11/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('12/31/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 3 AS ID,
TO_DATE('01/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('03/14/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 3 AS ID,
TO_DATE('03/15/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('11/30/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 3 AS ID,
TO_DATE('12/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('12/31/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 4 AS ID,
TO_DATE('02/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('05/30/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
UNION
SELECT 4 AS ID,
TO_DATE('06/01/2015', 'MM/DD/YYYY') FRM_DT,
TO_DATE('12/31/2015', 'MM/DD/YYYY') TO_DT
FROM DUAL
)A
)B
)C
GROUP BY C.ID,
C.GRP;
-----------------------------------OUTPUT------------------------------------------
ID FRM_DT TO_DT
4 02/01/2015 05/30/2015
4 06/01/2015 12/31/2015
1 01/01/2015 12/31/2015
2 01/01/2015 06/30/2015
2 11/01/2015 12/31/2015
3 01/01/2015 12/31/2015
-----------------------------------OUTPUT------------------------------------------
I am trying to select a record by looking at both the start date and the end date. What I need to do is pick the max start date, then only return a result for that max date if the end date has a value.
I hope the images below help clarify this a bit more. This is in Oracle-based SQL.
Example #2
I can, so far, either return all the records or incorrectly return a record in scenario #2 but I've yet to figure out the best way to make this work. I would greatly appreciate any assistance.
Thank you!
I would use an analytic function:
with sample_data as (select 1 id, 1 grp_id, to_date('01/01/2015', 'dd/mm/yyyy') st_dt, to_date('23/01/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 2 id, 1 grp_id, to_date('24/02/2015', 'dd/mm/yyyy') st_dt, to_date('15/02/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 3 id, 1 grp_id, to_date('17/03/2015', 'dd/mm/yyyy') st_dt, to_date('30/03/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 4 id, 2 grp_id, to_date('01/01/2015', 'dd/mm/yyyy') st_dt, to_date('17/01/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 5 id, 2 grp_id, to_date('21/01/2015', 'dd/mm/yyyy') st_dt, to_date('23/03/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 6 id, 2 grp_id, to_date('14/04/2015', 'dd/mm/yyyy') st_dt, to_date('16/05/2015', 'dd/mm/yyyy') ed_dt from dual union all
select 7 id, 2 grp_id, to_date('28/05/2015', 'dd/mm/yyyy') st_dt, null ed_dt from dual),
res as (select id,
grp_id,
st_dt,
ed_dt,
max(st_dt) over (partition by grp_id) max_st_dt
from sample_data)
select id,
grp_id,
st_dt,
ed_dt
from res
where st_dt = max_st_dt
and ed_dt is not null;
ID GRP_ID ST_DT ED_DT
---------- ---------- ---------- ----------
3 1 17/03/2015 30/03/2015
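An alternative that avoids the extra inline view is the aggregate KEEP (DENSE_RANK LAST ...) clause, which evaluates an aggregate only over the rows with the highest st_dt in each group. A sketch on a subset of the sample data above:

```sql
with sample_data (id, grp_id, st_dt, ed_dt) as (
  select 1, 1, date '2015-01-01', date '2015-01-23' from dual union all
  select 3, 1, date '2015-03-17', date '2015-03-30' from dual union all
  select 6, 2, date '2015-04-14', date '2015-05-16' from dual union all
  select 7, 2, date '2015-05-28', cast(null as date) from dual
)
select grp_id,
       -- values taken from the row(s) with the latest st_dt in each grp_id
       max(id)    keep (dense_rank last order by st_dt) as id,
       max(st_dt)                                       as st_dt,
       max(ed_dt) keep (dense_rank last order by st_dt) as ed_dt
from sample_data
group by grp_id
-- drop the group when the latest row has no end date
having max(ed_dt) keep (dense_rank last order by st_dt) is not null;
```

This returns only id 3 for grp_id 1; grp_id 2 is excluded because its latest row has a NULL ed_dt.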
This would be one of the simplest ways.
select * from
(
select apay_id,
max(start_dt) OVER (PARTITION BY account_number) max_start_dt,
start_dt,
end_dt
from sample
)
where
start_dt=max_start_dt
and end_dt is not null
The idea is to get the maximum start_dt and the corresponding end_dt,
and then filter out the result if end_dt is null.
SQL Fiddle
Database Schema
create table sample
(apay_id number(7),
account_number number(7),
start_dt date,
end_dt date);
Sample1
insert into sample values(554433, 123456, '15-Aug-15', null);
insert into sample values(112266, 123456, '21-Jul-15', '31-Aug-15');
insert into sample values(733221, 123456, '29-Jun-15', '31-Jul-15');
Output for Sample1
No rows
Sample2
insert into sample values(554433, 123456, '15-Aug-15', '11-Nov-15');
insert into sample values(112266, 123456, '21-Jul-15', '31-Aug-15');
insert into sample values(733221, 123456, '29-Jun-15', '31-Jul-15');
Output for Sample2
| APAY_ID | MAX_START_DT | END_DT |
|---------|--------------------------|----------------------------|
| 554433 | August, 15 2015 00:00:00 | November, 11 2015 00:00:00 |
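select * from (select * from (select apay_id, end_dt from sample order by start_dt desc) where rownum = 1) where end_dt is not null
I think this can also work, as long as the end_dt check is applied after the latest row has been picked; filtering on end_dt first would return the latest row that has an end date, which is wrong for Sample1.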