Create time intervals based on values in one column / SQL Oracle - sql

I need to write a query that returns time intervals from a table that has an attribute for (almost) every day.
The original table looks like the following:
Person | Date | Date_Type
-------|------------|----------
Sam | 01.06.2020 | Vacation
Sam | 02.06.2020 | Vacation
Sam | 03.06.2020 | Work
Sam | 04.06.2020 | Work
Sam | 05.06.2020 | Work
Frodo | 01.06.2020 | Work
Frodo | 02.06.2020 | Work
.....
And the desired result should look like:
Person | Date_Interval | Date_Type
-------|-----------------------|----------
Sam | 01.06.2020-02.06.2020 | Vacation
Sam | 03.06.2020-05.06.2020 | Work
Frodo | 01.06.2020-02.06.2020 | Work
.....
Will be grateful for any idea :)

This reads like a gaps-and-islands problem. Here is one approach:
-- DATE is a reserved word in Oracle; the column is quoted here, assuming it was created as "DATE"
select person, min("DATE") startdate, max("DATE") enddate, date_type
from (
    select t.*,
        row_number() over(partition by person order by "DATE") rn1,
        row_number() over(partition by person, date_type order by "DATE") rn2
    from mytable t
) t
group by person, date_type, rn1 - rn2
This also works if not all dates are contiguous (since you stated that you have almost all dates, I understood you don't have them all).
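To see why the difference of the two row numbers identifies an island, here is a minimal Python sketch of the same idea outside the database. The `islands` helper and the in-memory `rows` list are illustrative assumptions, not part of the answer; the data is the sample from the question.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (person, day, date_type)
rows = [
    ("Sam", date(2020, 6, 1), "Vacation"),
    ("Sam", date(2020, 6, 2), "Vacation"),
    ("Sam", date(2020, 6, 3), "Work"),
    ("Sam", date(2020, 6, 4), "Work"),
    ("Sam", date(2020, 6, 5), "Work"),
    ("Frodo", date(2020, 6, 1), "Work"),
    ("Frodo", date(2020, 6, 2), "Work"),
]

def islands(rows):
    """Group rows by (person, date_type, rn1 - rn2), like the SQL above."""
    rn1 = defaultdict(int)  # row number per person
    rn2 = defaultdict(int)  # row number per (person, date_type)
    groups = defaultdict(list)
    for person, day, date_type in sorted(rows, key=lambda r: (r[0], r[1])):
        rn1[person] += 1
        rn2[person, date_type] += 1
        # The difference stays constant while the date_type does not change.
        key = (person, date_type, rn1[person] - rn2[person, date_type])
        groups[key].append(day)
    return sorted((p, min(d), max(d), t) for (p, t, _), d in groups.items())

result = islands(rows)
for person, start, end, date_type in result:
    print(person, start, end, date_type)
```

Because only row numbers are compared, a missing calendar day does not start a new island here, which matches the remark about non-contiguous dates.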

This is a type of gaps-and-islands problem.
To get adjacent days with the same date_type, you can subtract a sequence. It will be constant for adjacent days. Then you can aggregate:
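-- DATE is a reserved word in Oracle; the column is quoted here, assuming it was created as "DATE"
select person, date_type, min("DATE"), max("DATE")
from (select t.*,
             row_number() over (partition by person, date_type
                                order by "DATE") as seqnum
      from t
     ) t
group by person, date_type, ("DATE" - seqnum);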
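The "date minus sequence number" trick can be sketched in Python as follows. This is only an illustration under stated assumptions: `islands_by_offset` is a hypothetical helper, and the extra row for 8 June is invented here to show what happens when a day is missing.

```python
from collections import defaultdict
from datetime import date, timedelta

# (person, day, date_type); the 8 June row is hypothetical, added to create a gap
rows = [
    ("Sam", date(2020, 6, 1), "Vacation"),
    ("Sam", date(2020, 6, 2), "Vacation"),
    ("Sam", date(2020, 6, 3), "Work"),
    ("Sam", date(2020, 6, 4), "Work"),
    ("Sam", date(2020, 6, 5), "Work"),
    ("Sam", date(2020, 6, 8), "Work"),
]

def islands_by_offset(rows):
    """Group rows by (person, date_type, day - seqnum)."""
    seqnum = defaultdict(int)
    groups = defaultdict(list)
    for person, day, date_type in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        seqnum[person, date_type] += 1
        # For consecutive days, day and seqnum both grow by one,
        # so their difference is the same date for the whole island.
        anchor = day - timedelta(days=seqnum[person, date_type])
        groups[person, date_type, anchor].append(day)
    return sorted((p, t, min(d), max(d)) for (p, t, _), d in groups.items())

result = islands_by_offset(rows)
for row in result:
    print(*row)
```

With this variant, the missing days before 8 June start a new island, so the choice between the two approaches depends on whether gaps in the calendar should split an interval.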

One of the simplest methods is to use MATCH_RECOGNIZE to perform a row-by-row comparison and aggregation:
SELECT *
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY Person
  ORDER BY "DATE"
  MEASURES
    FIRST( "DATE" ) AS start_date,
    LAST( "DATE" ) AS end_date,
    FIRST( Date_Type ) AS date_type
  ONE ROW PER MATCH
  -- last_date is not defined, so it matches any row and closes the interval
  PATTERN ( successive_dates* last_date )
  DEFINE
    successive_dates AS (
      FIRST( Date_Type ) = NEXT( Date_Type )
      AND MAX( "DATE" ) + INTERVAL '1' DAY = NEXT( "DATE" )
    )
);
Which, for the sample data:
CREATE TABLE table_name ( Person, "DATE", Date_Type ) AS
SELECT 'Sam', DATE '2020-06-01', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-02', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-03', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-04', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-05', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-01', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-02', 'Work' FROM DUAL;
Outputs:
PERSON | START_DATE | END_DATE | DATE_TYPE
:----- | :------------------ | :------------------ | :--------
Frodo | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Work
Sam | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Vacation
Sam | 2020-06-03 00:00:00 | 2020-06-05 00:00:00 | Work

Related

Stack several rows into one with date condition

I've got raw data from table with information about clients. Information comes from different sources, so it causes duplicates but with different dates:
id | pp | type | start_dt | end_dt
100| 1 | Y | 01.05.19 | 01.10.20
100| 1 | Y | 10.08.20 | 01.10.20
100| 1 | N | 01.10.20 | 02.12.21
100| 1 | N | 13.12.20 | 02.12.21
100| 1 | Y | 02.12.21 | 02.12.26
100| 1 | Y | 20.12.21 | 20.12.26
For example, rows 2, 4 and 6 in this table have a start date that falls between the start_dt and end_dt of the previous row. Each of them is a duplicate, but I need to combine the minimum start date and the maximum end date of both rows for each type.
Note that the first two rows and the last two rows have the same id, pp and type, but I need to stack them separately because of the timeline.
What I want to get (continuous timeline for a client is a key):
id | pp | type | start_dt | end_dt | cnt
100| 1 | Y | 01.05.19 | 01.10.20 | 2
100| 1 | N | 01.10.20 | 02.12.21 | 2
100| 1 | Y | 02.12.21 | 20.12.26 | 2
I'm using PL/SQL. I think it could be solved with window functions, but I can't figure out which ones to use.
I tried GROUP BY with a HAVING ... > 1 condition, but that stacks all four rows of the same type (rows 1, 2 and 5, 6) into one. I need two separate rows for that type, preserving the continuous timeline of dates for the client.
From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id, pp
ORDER BY start_dt
MEASURES
FIRST(type) AS type,
FIRST(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
PATTERN (overlapping* last_row)
DEFINE
overlapping AS type = NEXT(type)
AND MAX(end_dt) >= NEXT(start_dt)
)
Which, for the sample data:
CREATE TABLE table_name (id, pp, type, start_dt, end_dt) AS
SELECT 100, 1, 'Y', DATE '2019-05-01', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2020-08-10', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-10-01', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-12-13', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-02', DATE '2026-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-20', DATE '2026-12-20' FROM DUAL;
Outputs:
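ID | PP | TYPE | START_DT | END_DT | CNT
--: | -: | :--- | :------------------ | :------------------ | --:
100 | 1 | Y | 2019-05-01 00:00:00 | 2020-10-01 00:00:00 | 2
100 | 1 | N | 2020-10-01 00:00:00 | 2021-12-02 00:00:00 | 2
100 | 1 | Y | 2021-12-02 00:00:00 | 2026-12-20 00:00:00 | 2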
If you want to use analytic and aggregation functions then it is a bit more complicated:
SELECT id, pp, type,
MIN(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
FROM (
SELECT id, pp, type, start_dt, end_dt,
SUM(grp_change) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
) AS grp
FROM (
SELECT t.*,
CASE
WHEN start_dt <= MAX(end_dt) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS grp_change
FROM table_name t
)
)
GROUP BY id, pp, type, grp
ORDER BY id, pp, start_dt
fiddle
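The running-maximum comparison in the analytic version can be hard to read in SQL, so here is a minimal Python sketch of the same logic. It is an illustration only: `merge_overlaps` is a hypothetical helper, and the rows are the sample data from the question.

```python
from datetime import date
from itertools import groupby

# (id, pp, type, start_dt, end_dt) from the question
rows = [
    (100, 1, "Y", date(2019, 5, 1), date(2020, 10, 1)),
    (100, 1, "Y", date(2020, 8, 10), date(2020, 10, 1)),
    (100, 1, "N", date(2020, 10, 1), date(2021, 12, 2)),
    (100, 1, "N", date(2020, 12, 13), date(2021, 12, 2)),
    (100, 1, "Y", date(2021, 12, 2), date(2026, 12, 2)),
    (100, 1, "Y", date(2021, 12, 20), date(2026, 12, 20)),
]

def partition_key(row):
    """Same partition as the SQL: id, pp, type."""
    return row[:3]

def merge_overlaps(rows):
    merged = []
    ordered = sorted(rows, key=lambda r: (partition_key(r), r[3]))
    for key, part in groupby(ordered, key=partition_key):
        current = None  # [start, running max of end, row count]
        for *_, start, end in part:
            if current is not None and start <= current[1]:
                # Overlaps everything seen so far in this group (grp_change = 0).
                current[1] = max(current[1], end)
                current[2] += 1
            else:
                # No overlap with the running maximum: a new group starts.
                if current is not None:
                    merged.append((*key, *current))
                current = [start, end, 1]
        merged.append((*key, *current))
    return sorted(merged, key=lambda r: (r[0], r[1], r[3]))

result = merge_overlaps(rows)
for row in result:
    print(*row)
```

Comparing against the running maximum of end_dt, rather than only the previous row's end_dt, matters when an early row spans several later ones.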
I prefer this version, because comparing "type = next(type)" without "type" being in the "order by" may lead to errors:
select *
from table_name
match_recognize(
  partition by id, pp, type
  order by start_dt, end_dt
  measures first(start_dt) as start_dt, max(end_dt) as end_dt, count(*) as n
  pattern (merged* strt)
  define
    merged as max(end_dt) >= next(start_dt)
)

Oracle SQL - Find origin ID of autoincrement column

There's a table on my ERP database that has data about certain events. It has the start date, end date and a column shows if the event is a continuation of a previous one (sequential_id references unique_id). Here's an example:
unique_id | start_date | end_date | sequential_id
----------|------------|------------|--------------
001 | 2021-01-01 | 2021-01-15 |
002 | 2021-02-01 | 2021-02-16 | 001
003 | 2021-03-01 | 2021-03-17 | 002
004 | 2021-03-10 | 2021-03-11 |
005 | 2021-03-19 | |
In the example above, rows 001, 002 and 003 are all part of the same event, and 004/005 are unique events, with no sequences. How can I group the data in a way that the output is like this:
origin_id | start_date | end_date
----------|------------|-----------
001 | 2021-01-01 | 2021-03-17
004 | 2021-03-10 | 2021-03-11
005 | 2021-03-19 |
I've tried using group by, but due to sequential_id being auto incremental, it didn't work.
Thanks in advance.
You can use MATCH_RECOGNIZE, which is well suited to this kind of task (see "Pattern Recognition With MATCH_RECOGNIZE"):
select *
from t
match_recognize(
  -- Without ORDER BY the row order is not deterministic.
  -- Assumption: a continuation row directly follows its predecessor when ordered by unique_id.
  order by unique_id
  measures
    first(unique_id) start_unique_id,
    first(start_date) start_date,
    last(end_date) end_date
  pattern (strt nxt*)
  define nxt as sequential_id = prev(unique_id)
);
You can use hierarchical query for this:
with a (unique_id, start_date, end_date, sequential_id) as (
select '001', date '2021-01-01', date '2021-01-15', null from dual union all
select '002', date '2021-02-01', date '2021-02-16', '001' from dual union all
select '003', date '2021-03-01', date '2021-03-17', '002' from dual union all
select '004', date '2021-03-10', date '2021-03-11', null from dual union all
select '005', date '2021-03-19', null, null from dual
)
, b as (
select
connect_by_root(unique_id) as unique_id
, connect_by_root(start_date) as start_date
, end_date
, connect_by_isleaf as l
from a
start with sequential_id is null
connect by prior unique_id = sequential_id
)
select
unique_id
, start_date
, end_date
from b
where l = 1
order by 1 asc
UNIQUE_ID | START_DATE | END_DATE
:-------- | :--------- | :--------
001 | 01-JAN-21 | 17-MAR-21
004 | 10-MAR-21 | 11-MAR-21
005 | 19-MAR-21 | null
This is a graph-walking problem, so you can use a recursive CTE:
with cte (unique_id, start_date, end_date, start_unique_id) as (
select unique_id, start_date, end_date, unique_id
from t
where not exists (select 1 from t t2 where t.sequential_id = t2.unique_id)
union all
select t.unique_id, t.start_date, t.end_date, cte.start_unique_id
from cte join
t
on cte.unique_id = t.sequential_id
)
select start_unique_id, min(start_date), max(end_date)
from cte
group by start_Unique_id;
Here is a db<>fiddle.
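As a rough illustration of the graph walk that the hierarchical query and the recursive CTE perform, the following Python sketch follows each row's sequential_id back to its origin and then aggregates per origin. The `find_origin` and `summarize` helpers are hypothetical names, the data is the sample from the question, and start_date is assumed to be non-null.

```python
from datetime import date

# unique_id -> (start_date, end_date, sequential_id), sample data from the question
events = {
    "001": (date(2021, 1, 1), date(2021, 1, 15), None),
    "002": (date(2021, 2, 1), date(2021, 2, 16), "001"),
    "003": (date(2021, 3, 1), date(2021, 3, 17), "002"),
    "004": (date(2021, 3, 10), date(2021, 3, 11), None),
    "005": (date(2021, 3, 19), None, None),
}

def find_origin(events, unique_id):
    """Follow sequential_id links until a row without a predecessor is reached."""
    seen = set()
    while events[unique_id][2] is not None:
        if unique_id in seen:
            raise ValueError("cycle in sequential_id links")
        seen.add(unique_id)
        unique_id = events[unique_id][2]
    return unique_id

def summarize(events):
    """Return (origin_id, min start_date, max end_date) per chain."""
    chains = {}
    for unique_id, (start, end, _) in events.items():
        origin = find_origin(events, unique_id)
        lo, hi = chains.get(origin, (None, None))
        lo = start if lo is None else min(lo, start)
        if end is not None:  # ignore missing end dates, as SQL MAX() ignores NULL
            hi = end if hi is None else max(hi, end)
        chains[origin] = (lo, hi)
    return [(origin, *chains[origin]) for origin in sorted(chains)]

summary = summarize(events)
for row in summary:
    print(*row)
```

Walking from every row up to its root is simple but repeats work on long chains; the SQL versions walk down from the roots instead, which visits each row once.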

How to do a query on Oracle SQL to get time intervals, grouping by specific fields

I love a good challenge, but this one has been breaking my head for too long. :)
I'm trying to build a query that returns date intervals, grouping the information by one field.
Let me try to explain it in a simple way.
We have this table:
I need to get the intervals a soldier spent on each ranking, so the end result I need to get should be something like this:
As you can see the soldier can be promoted/demoted along the time.
Any suggestion on how to build a query to do this?
THANK YOU!
From Oracle 12, you can use MATCH_RECOGNIZE:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY start_date, end_date
MEASURES
FIRST( name ) AS name,
FIRST( ranking ) AS ranking,
FIRST( start_date ) AS start_date,
LAST( end_Date ) AS end_Date
PATTERN ( same_rank+ )
DEFINE same_rank AS FIRST( ranking ) = ranking
)
Which, for the sample data:
CREATE TABLE table_name ( id, name, ranking, start_date, end_date ) AS
SELECT 1001, 'Jones', 'Lieutenant', DATE '2000-03-20', DATE '2002-08-15' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2002-08-16', DATE '2003-03-18' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2003-03-19', DATE '2004-06-01' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2004-06-02', DATE '2004-10-01' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2004-10-02', DATE '2005-04-20' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2005-04-21', DATE '2007-02-20' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2007-02-21', DATE '2008-10-22' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2008-10-23', DATE '2010-01-26' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2010-01-27', DATE '2013-11-25' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2013-11-26', DATE '2014-05-11' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2014-05-12', DATE '2016-04-22' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'General', DATE '2016-04-23', DATE '2020-10-10' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'General', DATE '2020-10-11', DATE '2020-11-30' FROM DUAL;
Outputs:
ID | NAME | RANKING | START_DATE | END_DATE
---: | :---- | :--------- | :------------------ | :------------------
1001 | Jones | Lieutenant | 2000-03-20 00:00:00 | 2004-10-01 00:00:00
1001 | Jones | Captain | 2004-10-02 00:00:00 | 2007-02-20 00:00:00
1001 | Jones | Major | 2007-02-21 00:00:00 | 2010-01-26 00:00:00
1001 | Jones | Captain | 2010-01-27 00:00:00 | 2014-05-11 00:00:00
1001 | Jones | Major | 2014-05-12 00:00:00 | 2016-04-22 00:00:00
1001 | Jones | General | 2016-04-23 00:00:00 | 2020-11-30 00:00:00
This is a type of gaps-and-islands problem. You want to find groups of adjacent rows with the same ranking, which you can do using lag() to get the previous end date for the same ranking and then a cumulative sum to keep track of the changes:
select soldier_id, soldier_name, ranking,
       min(start_date), max(end_date)
from (select t.*,
             sum(case when prev_end_date = start_date - interval '1' day then 0 else 1 end)
                 over (partition by soldier_id order by start_date) as island
      from (select t.*,
                   lag(end_date) over (partition by soldier_id, ranking order by start_date) as prev_end_date
            from t
           ) t
     ) t
group by soldier_id, soldier_name, ranking, island;
Note: This assumes that the soldier_name does not change over time for a given soldier. If that is something you need to deal with, then ask a new question with appropriate sample data and desired results.

First effective date from a table sql

There is a table asg_table with columns effective_start_date and effective_end_date.
asg_table:
asg_number | effective_start_date | effective_end_date | location | department | action_code
1 | 01-jan-2019 | 20-jan-2019 | | HR | HIRE
1 | 21-JAN-2019 | 18-FEB-2019 | Vietnam | HR | CHANGE_ASG
1 | 19-FEB-2019 | 31-DEC-4712 | Vietnam | Management | CHANGE_ASG
2 | 02-mar-2019 | 29-Apr-2019 | Peru | hr | HIRE
2 | 30-Apr-2019 | 31-dec-4712 | Vietnam | HR | CHANGE_ASG
I want to create a query to find the first effective_start_date of the employee when the action_code is HIRE, and the location is null.
Is there a function to do so ?
It depends on what order you want to perform the filtering:
Oracle Setup:
CREATE TABLE asg_table ( asg_number, effective_start_date, effective_end_date, location, department, action_code ) AS
SELECT 1, DATE '2019-01-01', DATE '2019-01-20', NULL, 'HR', 'HIRE' FROM DUAL UNION ALL
SELECT 1, DATE '2019-01-21', DATE '2019-02-18', 'Vietnam', 'HR', 'CHANGE_ASG' FROM DUAL UNION ALL
SELECT 1, DATE '2019-02-19', DATE '4712-12-31', 'Vietnam', 'Management', 'CHANGE_ASG' FROM DUAL UNION ALL
SELECT 2, DATE '2019-03-02', DATE '2019-04-29', 'Peru', 'HR', 'HIRE' FROM DUAL UNION ALL
SELECT 2, DATE '2019-04-30', DATE '2019-05-01', 'Vietnam', 'HR', 'CHANGE_ASG' FROM DUAL UNION ALL
SELECT 2, DATE '2019-05-01', DATE '2019-06-01', 'Vietnam', 'HR', 'FIRE' FROM DUAL UNION ALL
SELECT 2, DATE '2019-06-01', DATE '4712-12-31', NULL, 'HR', 'HIRE' FROM DUAL;
Query 1:
If you want to filter where location is NULL and action_code is HIRE first and then find the earliest effective_start_date for each asg_number then:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY asg_number ORDER BY effective_start_date ASC ) rn
FROM asg_table t
WHERE location IS NULL
AND action_code = 'HIRE'
)
WHERE rn = 1
Output:
ASG_NUMBER | EFFECTIVE_START_DATE | EFFECTIVE_END_DATE | LOCATION | DEPARTMENT | ACTION_CODE | RN
---------: | :------------------- | :----------------- | :------- | :--------- | :---------- | -:
1 | 2019-01-01 | 2019-01-20 | null | HR | HIRE | 1
2 | 2019-06-01 | 4712-12-31 | null | HR | HIRE | 1
Query 2:
If you want to find the earliest effective_start_date for each asg_number first and then keep that row only when location is NULL and action_code is HIRE, then:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY asg_number ORDER BY effective_start_date ASC ) rn
FROM asg_table t
)
WHERE location IS NULL
AND action_code = 'HIRE'
AND rn = 1
Output:
ASG_NUMBER | EFFECTIVE_START_DATE | EFFECTIVE_END_DATE | LOCATION | DEPARTMENT | ACTION_CODE | RN
---------: | :------------------- | :----------------- | :------- | :--------- | :---------- | -:
1 | 2019-01-01 | 2019-01-20 | null | HR | HIRE | 1
db<>fiddle here
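The difference between the two queries is only the order of filtering and ranking. A small Python sketch of that difference, using the relevant columns of the setup data above (the helper names `is_null_hire` and `first_per_asg` are assumptions made for illustration):

```python
from datetime import date

# (asg_number, effective_start_date, location, action_code) from the setup above
rows = [
    (1, date(2019, 1, 1), None, "HIRE"),
    (1, date(2019, 1, 21), "Vietnam", "CHANGE_ASG"),
    (1, date(2019, 2, 19), "Vietnam", "CHANGE_ASG"),
    (2, date(2019, 3, 2), "Peru", "HIRE"),
    (2, date(2019, 4, 30), "Vietnam", "CHANGE_ASG"),
    (2, date(2019, 5, 1), "Vietnam", "FIRE"),
    (2, date(2019, 6, 1), None, "HIRE"),
]

def is_null_hire(row):
    """location IS NULL AND action_code = 'HIRE'"""
    return row[2] is None and row[3] == "HIRE"

def first_per_asg(rows):
    """Keep the row with the earliest start date per asg_number (rn = 1)."""
    first = {}
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        first.setdefault(row[0], row)
    return list(first.values())

# Query 1: filter first, then take the earliest row per asg_number.
query1 = first_per_asg([r for r in rows if is_null_hire(r)])
# Query 2: take the earliest row per asg_number, then filter it.
query2 = [r for r in first_per_asg(rows) if is_null_hire(r)]

print([r[:2] for r in query1])
print([r[:2] for r in query2])
```

Query 1 returns a row for asg_number 2 because its later re-hire qualifies, while Query 2 drops asg_number 2 because its very first row has a location.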
I hope there is some column to identify the employee.
Try this:
Select * from
(Select t.*,
Row_number() over
(partition by t.employee_id order by t.effective_start_date) as rn
From asg_table t
Where t.location is null
AND t.action_code = 'HIRE')
Where rn = 1;
Cheers!!
If you would like to get all records along with first_effective_start_date as a separate column (partitioned per asg_number, so that each employee gets its own first date):
SELECT MIN(effective_start_date) OVER (PARTITION BY asg_number) AS first_effective_start_date, a.*
FROM asg_table a
WHERE location IS NULL
AND action_code = 'HIRE';
Cheers!!
No need for a function:
SELECT * FROM TABLENAME WHERE EFFECTIVE_START_DATE IN (CASE WHEN ACTION_CODE='HIRE' AND LOCATION IS NULL THEN EFFECTIVE_START_DATE END)
On top of the above query, use the ROW_NUMBER analytic function to get the first EFFECTIVE_START_DATE.
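-- String literals take single quotes in Oracle, and LIMIT is not Oracle syntax;
-- the aggregate already returns a single row.
SELECT MIN(effective_start_date) AS first_start_date
FROM asg_table
WHERE location IS NULL
AND action_code = 'HIRE';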
Enjoy!

SQL find effective price of the products based on the date

I have a table with four columns: id, validFrom, validTo and price.
This table contains the price of an article and the duration when that price is effective.
| id| validFrom | validTo | price
|---|-----------|-----------|---------
| 1 | 01-01-17 | 10-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
Now, for these inputs, my query output should be:
| id| validFrom | validTo | price
|---|-----------|----------|-------
| 1 | 01-01-17 | 03-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
| 1 | 10-01-17 | 10-01-17 | 30000
I can compare the dates and check whether rows with the same id have overlapping dates, but I have no idea how to split them into non-overlapping ranges. Also, I am not allowed to use PL/SQL.
Is this possible using only SQL?
Oracle Setup:
CREATE TABLE prices ( id, validFrom, validTo, price ) AS
SELECT 1, DATE '2017-01-01', DATE '2017-01-10', 30000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-04', DATE '2017-01-09', 20000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11', DATE '2017-01-15', 10000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-16', DATE '2017-01-18', 15000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-17', DATE '2017-01-20', 40000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-21', DATE '2017-01-24', 28000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-23', DATE '2017-01-26', 23000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-26', DATE '2017-01-26', 17000 FROM DUAL;
Query:
WITH daily_prices ( id, dt, price, duration ) AS (
-- Unroll the price ranges to individual days
SELECT id,
d.COLUMN_VALUE,
price,
validTo - validFrom
FROM prices p,
TABLE(
CAST(
MULTISET(
SELECT p.validFrom + LEVEL - 1
FROM DUAL
CONNECT BY p.validFrom + LEVEL - 1 <= p.validTo
)
AS SYS.ODCIDATELIST
)
) d
),
min_daily_prices ( id, dt, price ) AS (
-- Where a day falls between multiple ranges group them so the price
-- is for the shortest duration offer and if there are two equally short
-- durations then take the minimum price
SELECT id,
dt,
MIN( price ) KEEP ( DENSE_RANK FIRST ORDER BY duration )
FROM daily_prices
GROUP BY id, dt
),
group_changes ( id, dt, price, has_changed_group ) AS (
-- Find when the price changes or a day is skipped which means a new price
-- group is beginning
SELECT id,
dt,
price,
CASE WHEN dt = LAG( dt ) OVER ( PARTITION BY id ORDER BY dt ) + 1
AND price = LAG( price ) OVER ( PARTITION BY id ORDER BY dt )
THEN 0
ELSE 1
END
FROM min_daily_prices
),
groups ( id, dt, price, grp ) AS (
-- Calculate unique indexes (per id) for each group of price ranges
SELECT id,
dt,
price,
SUM( has_changed_group ) OVER ( PARTITION BY id ORDER BY dt )
FROM group_changes
)
SELECT id,
MIN( dt ) AS validFrom,
MAX( dt ) AS validTo,
MIN( price ) AS price
FROM groups
GROUP BY id, grp
ORDER BY id, validFrom;
Output:
ID VALIDFROM VALIDTO PRICE
---------- -------------------- -------------------- ----------
1 01-JAN-2017 00:00:00 03-JAN-2017 00:00:00 30000
1 04-JAN-2017 00:00:00 09-JAN-2017 00:00:00 20000
1 10-JAN-2017 00:00:00 10-JAN-2017 00:00:00 30000
1 11-JAN-2017 00:00:00 15-JAN-2017 00:00:00 10000
1 16-JAN-2017 00:00:00 18-JAN-2017 00:00:00 15000
1 19-JAN-2017 00:00:00 20-JAN-2017 00:00:00 40000
1 21-JAN-2017 00:00:00 22-JAN-2017 00:00:00 28000
1 23-JAN-2017 00:00:00 25-JAN-2017 00:00:00 23000
1 26-JAN-2017 00:00:00 26-JAN-2017 00:00:00 17000
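The query works in three steps: unroll each range into days, pick one price per day (shortest range first, then lowest price), and regroup consecutive days that share a price. A minimal Python sketch of those steps for a single id, run on the two rows from the question; `effective_prices` is a hypothetical helper written for illustration, not a translation of the exact SQL:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def effective_prices(ranges):
    """ranges: (valid_from, valid_to, price) tuples for one id, dates inclusive."""
    # Steps 1 and 2: unroll to days and keep the best offer per day.
    best = {}  # day -> (duration, price)
    for valid_from, valid_to, price in ranges:
        candidate = ((valid_to - valid_from).days, price)
        day = valid_from
        while day <= valid_to:
            # Tuple comparison: shortest duration wins, then the lowest price.
            if day not in best or candidate < best[day]:
                best[day] = candidate
            day += ONE_DAY
    # Step 3: merge consecutive days that carry the same price.
    result = []
    for day in sorted(best):
        price = best[day][1]
        if result and result[-1][2] == price and result[-1][1] + ONE_DAY == day:
            result[-1][1] = day
        else:
            result.append([day, day, price])
    return [tuple(r) for r in result]

# The two rows from the question
question_rows = [
    (date(2017, 1, 1), date(2017, 1, 10), 30000),
    (date(2017, 1, 4), date(2017, 1, 9), 20000),
]
split = effective_prices(question_rows)
for valid_from, valid_to, price in split:
    print(valid_from, valid_to, price)
```

Unrolling to individual days is easy to reason about but grows with the length of the ranges, so very long validity periods (for example an open-ended end date) would need a boundary-based approach instead.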