There are a lot of solutions to similar questions, but they are based on only one date column.
I would like to know if there is a better way to solve this. I am attaching my solution, but I find it a little bit complicated; if you know a better approach, please post it.
Here is a table of orders with start and end dates for 2 items.
I would like to print at least 2 consecutive rows based on date and item.
ITEM , START , END
1. A, 01.01.2020, 31.01.2020
2. A, 01.02.2020, 31.03.2020
3. B, 01.02.2020, 30.04.2020
4. A, 01.05.2020, 30.06.2020
5. B, 01.06.2020, 31.07.2020
6. B, 01.09.2020, 30.09.2020
7. A, 01.08.2020, 31.10.2020
8. B, 01.10.2020, 31.10.2020
9. B, 01.11.2020, 31.12.2020
The output should be rows 1 and 2 for item A and rows 6, 8 and 9 for item B.
Here is my approach:
with pool as (
  select item, start_date, end_date,
         nvl(lag(end_date, 1) over (partition by item order by end_date), start_date - 1) prev_end_date
  from orders
),
pool2 as (
  select item, start_date, end_date,
         sum(case when prev_end_date + 1 = start_date then 0 else 1 end)
           over (partition by item order by start_date) grp
  from pool
)
select item, start_date, end_date
from (
  select item, start_date, end_date, grp,
         count(grp) over (partition by item, grp) cnt
  from pool2
)
where cnt >= 2;
Hmmm . . . use lag() and lead() to see the next/previous values and check if they match:
select o.*
from (select o.*,
             lag(end_date) over (partition by item order by start_date) as prev_end,
             lead(start_date) over (partition by item order by start_date) as next_start
      from orders o
     ) o
where start_date = prev_end + interval '1' day or
      end_date = next_start - interval '1' day;
-- create table and insert rows for test
Create table order_overlap (id number, item varchar2(1), start_date date , end_date date );
insert into order_overlap(id,start_date, end_date, item) values( 1,to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '31.01.2020', 'dd.mm.yyyy'), 'A');
insert into order_overlap(id,start_date, end_date, item) values( 2, to_date('01.02.2020', 'dd.mm.yyyy'), to_date( '31.03.2020', 'dd.mm.yyyy'), 'A');
insert into order_overlap(id,start_date, end_date, item) values( 3, to_date('01.02.2020', 'dd.mm.yyyy'), to_date( '30.04.2020', 'dd.mm.yyyy'), 'B');
insert into order_overlap(id,start_date, end_date, item) values( 4, to_date('01.05.2020', 'dd.mm.yyyy'), to_date( '30.06.2020', 'dd.mm.yyyy'), 'A');
insert into order_overlap(id,start_date, end_date, item) values( 5, to_date('01.06.2020', 'dd.mm.yyyy'), to_date( '31.07.2020', 'dd.mm.yyyy'), 'B');
insert into order_overlap(id,start_date, end_date, item) values( 6, to_date('01.09.2020', 'dd.mm.yyyy'), to_date( '30.09.2020', 'dd.mm.yyyy'), 'B');
insert into order_overlap(id,start_date, end_date, item) values( 7, to_date('01.08.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'), 'A');
insert into order_overlap(id,start_date, end_date, item) values( 8, to_date('01.10.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'), 'B');
insert into order_overlap(id,start_date, end_date, item) values( 9, to_date('01.11.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'), 'B');
-- I did something a little bit different, but maybe you will like it.
-- I joined consecutive rows into one - so if you have
-- A 01.01.2020 - 31.01.2020
-- A 01.02.2020 - 28.02.2020
-- you get one record:
-- A 01.01.2020 - 28.02.2020
select item, min(start_date) start_date , max(end_date) end_date, count(*)
from (
select item, start_date, end_date,
case when lead(start_date) over(partition by item order by start_date) = end_date + 1
OR lag(end_date) over(partition by item order by end_date) + 1 = start_date
then 0
else rownum
end continuity
from order_overlap )
group by item, continuity
order by item, start_date;
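If you only want the merged groups that cover at least two of the original rows (as the question asks), a HAVING clause on the count that is already selected can be bolted on. A sketch based on the query above; note it assumes, like the original, that each item has at most one contiguous chain (which holds for the sample data):
select item, min(start_date) start_date, max(end_date) end_date, count(*) cnt
from (
  select item, start_date, end_date,
         case when lead(start_date) over(partition by item order by start_date) = end_date + 1
                or lag(end_date) over(partition by item order by end_date) + 1 = start_date
              then 0
              else rownum
         end continuity
  from order_overlap )
group by item, continuity
having count(*) >= 2
order by item, start_date;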
You can simply use MATCH_RECOGNIZE to perform a row-by-row comparison and to only return the groups of rows which match the pattern:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY item
ORDER BY start_date, end_date
ALL ROWS PER MATCH
PATTERN ( FIRST_ROW NEXT_ROWS+ )
DEFINE
NEXT_ROWS AS (
NEXT_ROWS.START_DATE = PREV( END_DATE ) + INTERVAL '1' DAY
)
)
So, for your sample data:
CREATE TABLE table_name ( ITEM, START_DATE, END_DATE ) AS
SELECT 'A', DATE '2020-01-01', DATE '2020-01-31' FROM DUAL UNION ALL
SELECT 'A', DATE '2020-02-01', DATE '2020-03-31' FROM DUAL UNION ALL
SELECT 'B', DATE '2020-02-01', DATE '2020-04-30' FROM DUAL UNION ALL
SELECT 'A', DATE '2020-05-01', DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 'B', DATE '2020-06-01', DATE '2020-07-31' FROM DUAL UNION ALL
SELECT 'B', DATE '2020-09-01', DATE '2020-09-30' FROM DUAL UNION ALL
SELECT 'A', DATE '2020-08-01', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'B', DATE '2020-10-01', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'B', DATE '2020-11-01', DATE '2020-12-31' FROM DUAL;
This outputs:
ITEM | START_DATE | END_DATE
:--- | :--------- | :---------
A | 2020-01-01 | 2020-01-31
A | 2020-02-01 | 2020-03-31
B | 2020-09-01 | 2020-09-30
B | 2020-10-01 | 2020-10-31
B | 2020-11-01 | 2020-12-31
db<>fiddle here
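If you would rather have each contiguous chain merged into a single row (like the earlier answer that combines consecutive rows), the same pattern can be used with ONE ROW PER MATCH and MEASURES; a sketch (the measure names are my own):
SELECT *
FROM table_name
MATCH_RECOGNIZE (
  PARTITION BY item
  ORDER BY start_date, end_date
  MEASURES
    FIRST( start_date ) AS grp_start_date,
    LAST( end_date )    AS grp_end_date,
    COUNT(*)            AS cnt
  ONE ROW PER MATCH
  PATTERN ( FIRST_ROW NEXT_ROWS+ )
  DEFINE
    NEXT_ROWS AS (
      NEXT_ROWS.START_DATE = PREV( END_DATE ) + INTERVAL '1' DAY
    )
)
For the question's sample data this should return one merged row per chain: A from 2020-01-01 to 2020-03-31, and B from 2020-09-01 to 2020-12-31.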
Related
I am using Oracle and trying to retrieve the total number of days a person was out of the office during the year. I have 2 tables involved:
Statuses
1 - Active
2 - Out of the Office
3 - Other
ScheduleHistory
RecordID - primary key
PersonID
PreviousStatusID
NextStatusID
DateChanged
I can easily find when the person went on vacation and when they came back, using
SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND NextStatusID = 2
and
SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND PreviousStatusID = 2
But in case a person went on vacation more than once, how can I calculate the total number of days the person was out of the office? Is it possible to do programmatically, given only the PersonID?
Here is some sample data:
RecordID PersonID PreviousStatusID NextStatusID DateChanged
-----------------------------------------------------------------------------
1 111 1 2 03/11/2020
2 111 2 1 03/13/2020
3 111 1 3 04/01/2020
4 111 3 1 04/07/2020
5 111 1 2 06/03/2020
6 111 2 1 06/05/2020
7 111 1 2 09/14/2020
8 111 2 1 09/17/2020
So from the data above, for the year 2020 for PersonID 111 the query should return 7
Try this:
with aux1 AS (
SELECT
a.*,
to_date(datechanged, 'MM/DD/YYYY') - LAG(to_date(datechanged, 'MM/DD/YYYY')) OVER(
PARTITION BY personid
ORDER BY
recordid
) lag_date
FROM
ScheduleHistory a
)
SELECT
personid,
SUM(lag_date) tot_days_ooo
FROM
aux1
WHERE
previousstatusid = 2
GROUP BY
personid;
If you want the total days (or weekdays) for each year (and to account for periods that cross a year boundary) then:
WITH date_ranges ( personid, status, start_date, end_date ) AS (
SELECT personid,
nextstatusid,
datechanged,
LEAD(datechanged, 1, datechanged) OVER(
PARTITION BY personid
ORDER BY datechanged
)
FROM table_name
),
split_year_ranges ( personid, year, start_date, end_date, max_date ) AS (
SELECT personid,
TRUNC( start_date, 'YY' ),
start_date,
LEAST(
end_date,
ADD_MONTHS( TRUNC( start_date, 'YY' ), 12 )
),
end_date
FROM date_ranges
WHERE status = 2
UNION ALL
SELECT personid,
end_date,
end_date,
LEAST( max_date, ADD_MONTHS( end_date, 12 ) ),
max_date
FROM split_year_ranges
WHERE end_date < max_date
)
SELECT personid,
EXTRACT( YEAR FROM year) AS year,
SUM( end_date - start_date ) AS total_days,
SUM(
( TRUNC( end_date, 'IW' ) - TRUNC( start_date, 'IW' ) ) * 5 / 7
+ LEAST( end_date - TRUNC( end_date, 'IW' ), 5 )
- LEAST( start_date - TRUNC( start_date, 'IW' ), 5 )
) AS total_weekdays
FROM split_year_ranges
GROUP BY personid, year
ORDER BY personid, year
Which, for the sample data:
CREATE TABLE table_name ( RecordID, PersonID, PreviousStatusID, NextStatusID, DateChanged ) AS
SELECT 1, 111, 1, 2, DATE '2020-03-11' FROM DUAL UNION ALL
SELECT 2, 111, 2, 1, DATE '2020-03-13' FROM DUAL UNION ALL
SELECT 3, 111, 1, 3, DATE '2020-04-01' FROM DUAL UNION ALL
SELECT 4, 111, 3, 1, DATE '2020-04-07' FROM DUAL UNION ALL
SELECT 5, 111, 1, 2, DATE '2020-06-03' FROM DUAL UNION ALL
SELECT 6, 111, 2, 1, DATE '2020-06-05' FROM DUAL UNION ALL
SELECT 7, 111, 1, 2, DATE '2020-09-14' FROM DUAL UNION ALL
SELECT 8, 111, 2, 1, DATE '2020-09-17' FROM DUAL UNION ALL
SELECT 9, 222, 1, 2, DATE '2019-12-31' FROM DUAL UNION ALL
SELECT 10, 222, 2, 2, DATE '2020-12-01' FROM DUAL UNION ALL
SELECT 11, 222, 2, 2, DATE '2021-01-02' FROM DUAL;
Outputs:
PERSONID | YEAR | TOTAL_DAYS | TOTAL_WEEKDAYS
-------: | ---: | ---------: | -------------:
     111 | 2020 |          7 |              7
     222 | 2019 |          1 |              1
     222 | 2020 |        366 |            262
     222 | 2021 |          1 |              1
db<>fiddle here
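For reference on the total_weekdays expression above: TRUNC( date, 'IW' ) truncates to the Monday of the ISO week, so the expression counts Monday-to-Friday days in the half-open range [start_date, end_date). A minimal standalone check (the dates are illustrative only):
SELECT ( TRUNC( DATE '2020-06-05', 'IW' ) - TRUNC( DATE '2020-06-01', 'IW' ) ) * 5 / 7
       + LEAST( DATE '2020-06-05' - TRUNC( DATE '2020-06-05', 'IW' ), 5 )
       - LEAST( DATE '2020-06-01' - TRUNC( DATE '2020-06-01', 'IW' ), 5 ) AS weekdays
FROM dual;
-- 2020-06-01 is a Monday and 2020-06-05 is a Friday, so this returns 4
-- (Mon, Tue, Wed, Thu; the end date itself is not counted, matching end_date - start_date)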
Provided no vacation crosses a year boundary
with grps as (
SELECT sh.*,
row_number() over (partition by PersonID, NextStatusID order by DateChanged) grp
FROM ScheduleHistory sh
WHERE NextStatusID in (1,2) and 3 not in (NextStatusID, PreviousStatusID)
), durations as (
SELECT PersonID, min(DateChanged) DateChanged, max(DateChanged) - min(DateChanged) duration
FROM grps
GROUP BY PersonID, grp
)
SELECT PersonID, sum(duration) days_out
FROM durations
GROUP BY PersonID;
db<>fiddle
year_span is used to split an interval that spans two years into two separate records
H1 adds a row number per PersonID to get the right sequence for each person
H2 gets the periods for each status change and extracts the 1st day of the year of the interval end
H3 splits records that span two years and calculates the right date_start and date_end for each interval
H calculates the days elapsed in each interval for each year
the final query sums up the records to get the output
EDIT
If you need workdays instead of total days, you should not use total_days/7*5 because it is a bad approximation and in some cases gives weird results.
I have posted a solution that jumps from Fridays to Mondays here
with
statuses (sid, sdescr) as (
select 1, 'Active' from dual union all
select 2, 'Out of the Office' from dual union all
select 3, 'Other' from dual
),
ScheduleHistory(RecordID, PersonID, PreviousStatusID, NextStatusID , DateChanged) as (
select 1, 111, 1, 2, date '2020-03-11' from dual union all
select 2, 111, 2, 1, date '2020-03-13' from dual union all
select 3, 111, 1, 3, date '2020-04-01' from dual union all
select 4, 111, 3, 1, date '2020-04-07' from dual union all
select 5, 111, 1, 2, date '2020-06-03' from dual union all
select 6, 111, 2, 1, date '2020-06-05' from dual union all
select 7, 111, 1, 2, date '2020-09-14' from dual union all
select 8, 111, 2, 1, date '2020-09-17' from dual union all
SELECT 9, 222, 1, 2, date '2019-12-31' from dual UNION ALL
SELECT 10, 222, 2, 2, date '2020-12-01' from dual UNION ALL
SELECT 11, 222, 2, 2, date '2021-01-02' from dual
),
year_span (n) as (
select 1 from dual union all
select 2 from dual
),
H1 AS (
SELECT ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY RecordID) PID, H.*
FROM ScheduleHistory H
),
H2 as (
SELECT
H1.*, H2.DateChanged DateChanged2,
EXTRACT(YEAR FROM H2.DateChanged) - EXTRACT(YEAR FROM H1.DateChanged) + 1 Y,
trunc(H2.DateChanged,'YEAR') Y2
FROM H1 H1
LEFT JOIN H1 H2 ON H1.PID = H2.PID-1 AND H1.PersonID = H2.PersonID
),
H3 AS (
SELECT Y, N, H2.PID, H2.RecordID, H2.PersonID, H2.NextStatusID,
CASE WHEN Y=1 THEN H2.DateChanged ELSE CASE WHEN N=1 THEN H2.DateChanged ELSE Y2 END END D1,
CASE WHEN Y=1 THEN H2.DateChanged2 ELSE CASE WHEN N=1 THEN Y2 ELSE H2.DateChanged2 END END D2
FROM H2
JOIN year_span N ON N.N <=Y
),
H AS (
SELECT PersonID, NextStatusID, EXTRACT(year FROM d1) Y, d2-d1 D
FROM H3
)
select PersonID, sdescr Status, Y, sum(d) d
from H
join statuses s on NextStatusID = s.sid
group by PersonID, sdescr, Y
order by PersonID, sdescr, Y
output
PersonID Status Y d
111 Active 2020 177
111 Other 2020 6
111 Out of the Office 2020 7
222 Out of the Office 2019 1
222 Out of the Office 2020 366
222 Out of the Office 2021 1
check the fiddle here
I have an SQL problem that I'm not able to solve.
First of all, an SQL fiddle with it: http://sqlfiddle.com/#!4/fe7b07/2
As you see, I fill the table with some dates, which are bound to some ID. Those dates are day by day. So for this example, we'd have something like this, if we only look at January:
The timelines span from 2020-01-01 to 2020-01-31; the blocks are the dates in the database. So this would be the simple SELECT * FROM days output.
What I now want is to fill in some days into this output. These would span from timeline_begin to MIN(date_from), and from MAX(date_from) to timeline_end.
I'll mark these red in the following picture:
The orange span does not necessarily need to be added as well, but if your solution covers it too, that would also be ok.
Ok, so far so good.
For this I created the SELECT * FROM minmax, which will select the MIN(date_from) and MAX(date_from) for every id_othertable. Still no magic involved.
What I struggle with now is creating those days for every id_othertable, while also joining the data they have on them (in this fiddle, it's just the some_info field).
I tried to write this in the SELECT * FROM days_before query, but I just can't get it to work. I read about the magical CONNECT BY clause, which will on its own create dates line by line, but I can't manage to join my data from the former table. Every time I join the info, I only get one line per id_othertable, not all those dates I need.
So the ideal solution I'm looking for would be to have three select queries:
SELECT * FROM days which select dates out of the database
SELECT * FROM days_before which will show the dates before MIN(date_from) of query 1
SELECT * FROM days_after for dates after MAX(date_from) of query 1
And in the end I'd UNION those three queries to have them all combined.
I hope I explained my problem well enough. If you need any information or further explanation, please don't hesitate to ask.
EDIT 1: I created a pastebin with some example data: https://pastebin.com/jskrStpZ
Bear in mind that only the first query has actual information from the database, the other two have created data. Also, this example output only has data for id_othertable = 1, so the actual query should also have the information for id_othertable = 2, 3.
EDIT 2: just for clarification, the field date_to is just a simple date_from + 1 day.
If you have denormalised data, it's quite simple:
with bas as (
select 1 id_other_table, to_date('2020-01-05', 'YYYY-MM-DD') date_from, to_date('2020-01-06', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-06', 'YYYY-MM-DD') date_from, to_date('2020-01-07', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-07', 'YYYY-MM-DD') date_from, to_date('2020-01-08', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-10', 'YYYY-MM-DD') date_from, to_date('2020-01-11', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-11', 'YYYY-MM-DD') date_from, to_date('2020-01-12', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 1 id_other_table, to_date('2020-01-12', 'YYYY-MM-DD') date_from, to_date('2020-01-13', 'YYYY-MM-DD') date_to, 'hello' some_info from dual
union all select 2 id_other_table, to_date('2020-01-10', 'YYYY-MM-DD') date_from, to_date('2020-01-11', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 2 id_other_table, to_date('2020-01-11', 'YYYY-MM-DD') date_from, to_date('2020-01-12', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 2 id_other_table, to_date('2020-01-12', 'YYYY-MM-DD') date_from, to_date('2020-01-13', 'YYYY-MM-DD') date_to, 'my' some_info from dual
union all select 3 id_other_table, to_date('2020-01-20', 'YYYY-MM-DD') date_from, to_date('2020-01-21', 'YYYY-MM-DD') date_to, 'friend' some_info from dual
union all select 3 id_other_table, to_date('2020-01-21', 'YYYY-MM-DD') date_from, to_date('2020-01-22', 'YYYY-MM-DD') date_to, 'friend' some_info from dual
union all select 3 id_other_table, to_date('2020-01-22', 'YYYY-MM-DD') date_from, to_date('2020-01-23', 'YYYY-MM-DD') date_to, 'friend' some_info from dual)
, ad as (select trunc(sysdate,'YYYY') -1 + level all_dates from dual connect by level <= 31)
select distinct some_info,all_dates from bas,ad where (some_info,all_dates) not in (select some_info,date_from from bas)
If you have longer date ranges or you care about the time the query needs, another solution would be better, but that one is harder to debug, because it's quite hard to get the orange time slot.
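If the hard-coded 31 days and the reliance on SYSDATE's year are a concern, the calendar generator can instead be driven by explicit bounds; a small sketch (the January 2020 bounds are just an example):
select date '2020-01-01' + level - 1 as all_dates
from dual
connect by level <= date '2020-02-01' - date '2020-01-01'   -- 31 rows: 2020-01-01 .. 2020-01-31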
If you want the dates per id that are not in the database then you can use the LEAD analytic function:
WITH dates ( id, date_from, date_to ) AS (
SELECT id_othertable,
DATE '2020-01-01',
MIN( date_from )
FROM some_dates
WHERE date_to > DATE '2020-01-01'
AND date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
GROUP BY id_othertable
UNION ALL
SELECT id_othertable,
date_to,
LEAD( date_from, 1, ADD_MONTHS( DATE '2020-01-01', 1 ) )
OVER ( PARTITION BY id_othertable ORDER BY date_from )
FROM some_dates
WHERE date_to > DATE '2020-01-01'
AND date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
)
SELECT id,
date_from,
date_to
FROM dates
WHERE date_from < date_to
ORDER BY id, date_from;
so for the test data:
CREATE TABLE some_dates ( id_othertable, date_from, date_to, some_info ) AS
SELECT 1, DATE '2020-01-05', DATE '2020-01-06', 'hello1' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-06', DATE '2020-01-07', 'hello2' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-07', DATE '2020-01-08', 'hello3' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-10', DATE '2020-01-13', 'hello4' FROM DUAL UNION ALL
SELECT 2, DATE '2020-01-10', DATE '2020-01-13', 'my' FROM DUAL UNION ALL
SELECT 3, DATE '2020-01-20', DATE '2020-01-23', 'friend' FROM DUAL UNION ALL
SELECT 4, DATE '2019-12-31', DATE '2020-01-05', 'before' FROM DUAL UNION ALL
SELECT 4, DATE '2020-01-30', DATE '2020-02-02', 'after' FROM DUAL UNION ALL
SELECT 5, DATE '2019-12-31', DATE '2020-01-10', 'only_before' FROM DUAL UNION ALL
SELECT 6, DATE '2020-01-15', DATE '2020-02-01', 'only_after' FROM DUAL UNION ALL
SELECT 7, DATE '2019-12-31', DATE '2020-02-01', 'exlude_all' FROM DUAL;
this outputs:
ID | DATE_FROM | DATE_TO
-: | :--------- | :---------
1 | 2020-01-01 | 2020-01-05
1 | 2020-01-08 | 2020-01-10
1 | 2020-01-13 | 2020-02-01
2 | 2020-01-01 | 2020-01-10
2 | 2020-01-13 | 2020-02-01
3 | 2020-01-01 | 2020-01-20
3 | 2020-01-23 | 2020-02-01
4 | 2020-01-05 | 2020-01-30
5 | 2020-01-10 | 2020-02-01
6 | 2020-01-01 | 2020-01-15
db<>fiddle here
If you want the days before then filter on:
WHERE date_from = DATE '2020-01-01'
and, similarly, if you want the days after then filter on:
WHERE date_to = ADD_MONTHS( DATE '2020-01-01', 1 )
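For example, the "days before" variant written out in full is just the query above with that extra filter; a sketch against the some_dates table:
WITH dates ( id, date_from, date_to ) AS (
  SELECT id_othertable,
         DATE '2020-01-01',
         MIN( date_from )
  FROM   some_dates
  WHERE  date_to > DATE '2020-01-01'
  AND    date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
  GROUP BY id_othertable
  UNION ALL
  SELECT id_othertable,
         date_to,
         LEAD( date_from, 1, ADD_MONTHS( DATE '2020-01-01', 1 ) )
           OVER ( PARTITION BY id_othertable ORDER BY date_from )
  FROM   some_dates
  WHERE  date_to > DATE '2020-01-01'
  AND    date_from < ADD_MONTHS( DATE '2020-01-01', 1 )
)
SELECT id, date_from, date_to
FROM   dates
WHERE  date_from < date_to
AND    date_from = DATE '2020-01-01'   -- keep only the gap before each id's first range
ORDER BY id, date_from;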
If you want to specify the start date and number of months duration then use named bind parameters:
WITH dates ( id, date_from, date_to ) AS (
SELECT id_othertable,
:start_date,
MIN( date_from )
FROM some_dates
WHERE date_to > :start_date
AND date_from < ADD_MONTHS( :start_date, :number_months )
GROUP BY id_othertable
UNION ALL
SELECT id_othertable,
date_to,
LEAD( date_from, 1, ADD_MONTHS( :start_date, :number_months ) )
OVER ( PARTITION BY id_othertable ORDER BY date_from )
FROM some_dates
WHERE date_to > :start_date
AND date_from < ADD_MONTHS( :start_date, :number_months )
)
SELECT id,
date_from,
date_to
FROM dates
WHERE date_from < date_to
ORDER BY id, date_from;
Select the whole range using a CONNECT BY generator, then join your table partitioned by id (a partitioned outer join).
select date_from, nvl(date_to, date_from +1) date_to, id_othertable, some_info
from (
select date '2020-01-01' + level - 1 as date_from
from dual
connect by level <= date '2020-01-31' - date '2020-01-01' ) gen
natural left join some_dates partition by (id_othertable)
sqlfiddle
How do I create logic to combine multiple records that have continuous date ranges into a single row?
The following sample data:
Member_key start_date end_date
1 1/1/2017 1/31/2017
1 2/1/2017 2/28/2017
1 3/1/2017 3/31/2017
2 1/1/2017 1/31/2017
2 3/1/2017 3/31/2017
would end up returning the following result set
1 1/1/2017 3/31/2017
2 1/1/2017 1/31/2017
2 3/1/2017 3/31/2017
I found the following link to be very helpful and I am sure I am on the right track, but I am running into errors when trying to convert the code to Hive SQL:
http://betteratoracle.com/posts/35-collapsing-continuous-ranges-into-single-rows
Here's where I am getting stuck (the 2nd to last line below, with the ORDER BY in my max(grp) over ...):
with data as(
select
member_key,
case
when datediff(start_date, lag(end_date) over (partition by member_key order by start_date asc)) <= 1 then
null
else
row_number() over ()
end grp,
start_date,
end_date
from default.eligibility_span_test
order by member_key, start_date
)
select member_key, start_date, end_date
, max(grp) over (order by member_key, start_date) sequence
from data
here are the insert statements I am using to add data to a test table:
insert into default.eligibility_span_test values (1, '2017-01-01','2017-01-31');
insert into default.eligibility_span_test values (1, '2017-02-01', '2017-02-28');
insert into default.eligibility_span_test values (1, '2017-03-01', '2017-03-31');
insert into default.eligibility_span_test values (2, '2017-01-01', '2017-01-31');
insert into default.eligibility_span_test values (2, '2017-03-01', '2017-03-31');
Can you try the below query -
with eligibility_span_test as
(
select 1 as Member_key, from_unixtime(unix_timestamp('2017-01-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-01-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 1 as Member_key, from_unixtime(unix_timestamp('2017-02-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-02-28', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 1 as Member_key, from_unixtime(unix_timestamp('2017-03-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-03-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 2 as Member_key, from_unixtime(unix_timestamp('2017-01-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-01-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
union
select 2 as Member_key, from_unixtime(unix_timestamp('2017-03-01', 'yyyy-MM-dd'), 'yyyy-MM-dd') as start_date, from_unixtime(unix_timestamp('2017-03-31', 'yyyy-MM-dd'), 'yyyy-MM-dd') end_date
),
res as (select member_key, month(start_date) - row_number() over (partition by member_key order by start_date) as groupBy, start_date, end_date from eligibility_span_test)
select member_key, min(start_date), max(end_date) from res group by groupBy, member_key;
The above query collapses consecutive start and end dates into a single row per member_key, and leaves non-consecutive ranges as separate rows.
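If the ranges do not always line up with whole calendar months (or they cross a year boundary), the month(start_date) - row_number() trick breaks down. A hedged sketch of the more general gaps-and-islands form, essentially finishing the lag()/datediff() approach from the question (Hive windowing syntax assumed):
with flagged as (
  select member_key, start_date, end_date,
         -- 1 starts a new group; 0 means this row continues the previous range
         case when datediff(start_date,
                            lag(end_date) over (partition by member_key order by start_date)) <= 1
              then 0 else 1 end as new_grp
  from default.eligibility_span_test
),
grouped as (
  select member_key, start_date, end_date,
         -- running sum of the flags gives a group id per contiguous run
         sum(new_grp) over (partition by member_key order by start_date) as grp
  from flagged
)
select member_key, min(start_date) as start_date, max(end_date) as end_date
from grouped
group by member_key, grp;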
The events table looks like
event_type value timestamp
2 2 06-06-2016 14:00:00
2 7 06-06-2016 13:00:00
2 2 06-06-2016 12:00:00
3 3 06-06-2016 14:00:00
3 9 06-06-2016 13:00:00
4 9 06-06-2016 13:00:00
My goal is to filter the event types that occur at least twice, subtract the two most recent values, and show the result by event_type.
The end result would be
event_type value
2 -5
3 -6
I was able to filter the events that occurred at least twice and order them by event_type based on timestamp desc.
The difficult part for me is subtracting the two most recent values and showing the result by event_type.
DB / SQL experts, please help.
You can use a query like this:
SELECT event_type, diff
FROM (
SELECT event_type, value, "timestamp", rn,
value - LEAD(value) OVER (PARTITION BY event_type
ORDER BY "timestamp" DESC) AS diff
FROM (
SELECT event_type, value, "timestamp",
COUNT(*) OVER (PARTITION BY event_type) AS cnt,
ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY "timestamp" DESC) AS rn
FROM mytable) AS t
WHERE cnt >=2 AND rn <= 2 ) AS s
WHERE rn = 1
The innermost subquery uses:
Window function COUNT with PARTITION BY clause, so as to calculate the population of each event_type slice.
Window function ROW_NUMBER so as to get the two latest records within each event_type slice.
The mid-level query uses LEAD window function, so as to calculate the difference between the first and the second records. The outermost query simply returns this difference.
Demo here
This example is only for Oracle.
Test data:
with t(event_type,
value,
timestamp) as
(select 2, 2, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 7, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 2, to_timestamp('06-06-2016 12:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 3, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 4, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual)
Query:
select event_type,
max(value) keep(dense_rank first order by rn) - max(value) keep(dense_rank last order by rn) as value
from (select event_type,
row_number() over(partition by event_type order by timestamp desc) rn,
value
from t) t
where rn in (1, 2)
group by event_type
having count (*) >= 2
I have a table named x. The data is as follows.
Acccount_num start_dt end_dt
A111326 02/01/2016 02/11/2016
A111326 02/12/2016 03/05/2016
A111326 03/02/2016 03/16/2016
A111331 02/28/2016 02/29/2016
A111331 02/29/2016 03/29/2016
A999999 08/25/2015 08/25/2015
A999999 12/19/2015 12/22/2015
A222222 11/06/2015 11/10/2015
A222222 05/16/2016 05/17/2016
Both A111326 and A111331 should be identified as contiguous data, and A999999 and A222222 should be identified as discontinuous data. In my code I currently use the following query to identify discontinuous data, but A111326 is also erroneously identified as discontinuous. Please help me modify the code below so that A111326 is not identified as discontinuous data. Thanks in advance for your help.
(SELECT account_num
   FROM (SELECT account_num,
                MAX( END_DT )    OVER ( PARTITION BY account_num ORDER BY START_DT ) START_DT,
                LEAD( START_DT ) OVER ( PARTITION BY account_num ORDER BY START_DT ) END_DT
           FROM x
          WHERE ( START_DT + 1 ) <= ( END_DT - 1 ) )
  WHERE START_DT < END_DT);
Oracle Setup:
CREATE TABLE accounts ( Account_num, start_dt, end_dt ) AS
SELECT 'A', DATE '2016-02-01', DATE '2016-02-11' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-02-12', DATE '2016-03-05' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-03-02', DATE '2016-03-16' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-28', DATE '2016-02-29' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-29', DATE '2016-03-29' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-08-25', DATE '2015-08-25' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-12-19', DATE '2015-12-22' FROM DUAL UNION ALL
SELECT 'D', DATE '2015-11-06', DATE '2015-11-10' FROM DUAL UNION ALL
SELECT 'D', DATE '2016-05-16', DATE '2016-05-17' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-01', DATE '2016-01-02' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-05', DATE '2016-01-06' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-03', DATE '2016-01-07' FROM DUAL;
Query:
WITH times ( account_num, dt, lvl ) AS (
SELECT Account_num, start_dt - 1, 1 FROM accounts
UNION ALL
SELECT Account_num, end_dt, -1 FROM accounts
)
, totals ( account_num, dt, total ) AS (
SELECT account_num,
dt,
SUM( lvl ) OVER ( PARTITION BY Account_num ORDER BY dt, lvl DESC )
FROM times
)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM totals
GROUP BY Account_Num
ORDER BY Account_Num;
Output:
ACCOUNT_NUM IS_CONTIGUOUS
----------- -------------
A Y
B Y
C N
D N
E Y
Alternative Query:
(It's exactly the same method just using UNPIVOT rather than UNION ALL.)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM (
SELECT Account_num,
SUM( lvl ) OVER ( PARTITION BY Account_Num
ORDER BY CASE lvl WHEN 1 THEN dt - 1 ELSE dt END,
lvl DESC
) AS total
FROM accounts
UNPIVOT ( dt FOR lvl IN ( start_dt AS 1, end_dt AS -1 ) )
)
GROUP BY Account_Num
ORDER BY Account_Num;
WITH cte AS (
SELECT
AccountNumber
,CASE
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) IS NULL THEN 0
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) >= Start_Dt - 1 THEN 0
ELSE 1
END as discontiguous
FROM
#Table
)
SELECT
AccountNumber
,CASE WHEN SUM(discontiguous) > 0 THEN 'discontiguous' ELSE 'contiguous' END
FROM
cte
GROUP BY
AccountNumber;
One of your problems is that your desired "contiguous" result also includes overlapping date ranges in your example data set. For example, A111326 has a row that starts on 3/2/2016 while the previous row ends on 3/5/2016, meaning they overlap by 3 days.
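If it helps, a minimal sketch that lists the rows which overlap their predecessor (written against the accounts table from the Oracle setup above; the strict > keeps merely adjacent rows out of the result):
SELECT account_num, start_dt, end_dt, prev_end_dt
FROM (
  SELECT account_num, start_dt, end_dt,
         LAG( end_dt ) OVER ( PARTITION BY account_num ORDER BY start_dt ) AS prev_end_dt
  FROM accounts
)
WHERE prev_end_dt > start_dt;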