Stack several rows into one with date condition - sql

I've got raw data from a table with information about clients. The information comes from different sources, which causes duplicates with different dates:
id | pp | type | start_dt | end_dt
100| 1 | Y | 01.05.19 | 01.10.20
100| 1 | Y | 10.08.20 | 01.10.20
100| 1 | N | 01.10.20 | 02.12.21
100| 1 | N | 13.12.20 | 02.12.21
100| 1 | Y | 02.12.21 | 02.12.26
100| 1 | Y | 20.12.21 | 20.12.26
For example, in this table rows 2, 4 and 6 each have a start date between the "start_dt" and "end_dt" of the previous row. Each one is a duplicate, but I need to combine the min start date and max end date from both rows for the type.
FYI: the first two rows and the last two rows have the same id, pp and type, but I need to stack them separately because of the timeline.
What I want to get (a continuous timeline for a client is key):
id | pp | type | start_dt | end_dt | cnt
100| 1 | Y | 01.05.19 | 01.10.20 | 2
100| 1 | N | 01.10.20 | 02.12.21 | 2
100| 1 | Y | 02.12.21 | 20.12.26 | 2
I'm using PL/SQL. I think it could be solved by window functions, but I can't figure out which functions to use.
I tried to solve it with group by and having > 1, but in that case it stacks the four rows with the same type (rows 1, 2 and 5, 6) into one. I need two separate rows for that type, while keeping a continuous timeline of dates for one client.

From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id, pp
ORDER BY start_dt
MEASURES
FIRST(type) AS type,
FIRST(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
PATTERN (overlapping* last_row)
DEFINE
overlapping AS type = NEXT(type)
AND MAX(end_dt) >= NEXT(start_dt)
)
Which, for the sample data:
CREATE TABLE table_name (id, pp, type, start_dt, end_dt) AS
SELECT 100, 1, 'Y', DATE '2019-05-01', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2020-08-10', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-10-01', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-12-13', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-02', DATE '2026-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-20', DATE '2026-12-20' FROM DUAL;
Outputs:
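ID | PP | TYPE | START_DT | END_DT | CNT
--: | -: | :--- | :------------------ | :------------------ | --:
100 | 1 | Y | 2019-05-01 00:00:00 | 2020-10-01 00:00:00 | 2
100 | 1 | N | 2020-10-01 00:00:00 | 2021-12-02 00:00:00 | 2
100 | 1 | Y | 2021-12-02 00:00:00 | 2026-12-20 00:00:00 | 2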
If you want to use analytic and aggregation functions then it is a bit more complicated:
SELECT id, pp, type,
MIN(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
FROM (
SELECT id, pp, type, start_dt, end_dt,
SUM(grp_change) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
) AS grp
FROM (
SELECT t.*,
CASE
WHEN start_dt <= MAX(end_dt) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS grp_change
FROM table_name t
)
)
GROUP BY id, pp, type, grp
ORDER BY id, pp, start_dt
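To make the "flag a new group, then running-sum the flags" idea easier to follow, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the SQL answer: the function name `merge_overlaps` and the tuple layout are assumptions made for the example.

```python
from datetime import date
from itertools import groupby

def merge_overlaps(rows):
    """Merge overlapping rows; rows are (id, pp, type, start_dt, end_dt) tuples."""
    part_key = lambda r: (r[0], r[1], r[2])                     # PARTITION BY id, pp, type
    ordered = sorted(rows, key=lambda r: (part_key(r), r[3]))   # ORDER BY start_dt
    merged = []
    for (id_, pp, typ), part in groupby(ordered, key=part_key):
        start = end = None
        cnt = 0
        for _, _, _, row_start, row_end in part:
            if cnt and row_start <= end:
                # grp_change = 0: the row starts inside the running max end date
                end = max(end, row_end)
                cnt += 1
            else:
                # grp_change = 1: close the previous group and start a new one
                if cnt:
                    merged.append((id_, pp, typ, start, end, cnt))
                start, end, cnt = row_start, row_end, 1
        merged.append((id_, pp, typ, start, end, cnt))
    # Same final ordering as ORDER BY id, pp, start_dt
    return sorted(merged, key=lambda r: (r[0], r[1], r[3]))

sample = [
    (100, 1, "Y", date(2019, 5, 1), date(2020, 10, 1)),
    (100, 1, "Y", date(2020, 8, 10), date(2020, 10, 1)),
    (100, 1, "N", date(2020, 10, 1), date(2021, 12, 2)),
    (100, 1, "N", date(2020, 12, 13), date(2021, 12, 2)),
    (100, 1, "Y", date(2021, 12, 2), date(2026, 12, 2)),
    (100, 1, "Y", date(2021, 12, 20), date(2026, 12, 20)),
]
result = merge_overlaps(sample)
```

For the sample rows this yields three merged rows (Y, N, Y), each with a count of 2, matching the expected result in the question.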

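I prefer this version because comparing "type = next(type)" without "type" being in the "order by" may lead to errors: when two rows share the same start_dt, the order of rows with different types is not deterministic.
select *
from table_name
match_recognize(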
partition by id, pp, type
order by start_dt,end_dt
measures first(start_dt) as start_dt, max(end_dt) as end_dt, count(*) as n
pattern (merged* strt)
define
merged as max(end_dt) >= next(start_dt)
)

Related

Oracle SQL count dates that do not exist between range of dates

I have range of dates:
| date |
| -------- |
| 1/1/2022 |
| 2/1/2022 |
| 3/1/2022 |
| 5/1/2022 |
| 6/1/2022 |
| 7/1/2022 |
| 8/1/2022 |
| 10/1/2022 |
I want to find the dates that are missing between these dates, in this case 4/1 and 9/1, and get the count of them, in this case 2. In other words, I want the count of dates that do not exist within a specific range of dates. How can I achieve that?
select (max(date) - min(date) + 1) - count(distinct date)
from table_name
https://dbfiddle.uk/cSKZloYA
(max(date) - min(date) + 1) will give the total number of days in the range.
count(distinct date) will be the number of existing (different) days in the table.
The difference between these is the number of non-existing days.
Note: date is a reserved word, so if it's the actual column name, it has to be delimited as "date". (https://en.wikipedia.org/wiki/List_of_SQL_reserved_words)
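The arithmetic can be checked outside the database with a short Python sketch. This is only an illustration: the function name `count_missing_days` is made up for the example, and the question's dates are assumed to be consecutive days in January 2022.

```python
from datetime import date

def count_missing_days(dates):
    """Days in the min..max range minus the distinct days that actually exist."""
    distinct = set(dates)                                   # count(distinct date)
    total_days = (max(distinct) - min(distinct)).days + 1   # max(date) - min(date) + 1
    return total_days - len(distinct)

# Assumed reading of the question's data: days 1-3, 5-8 and 10 of January 2022
existing = [date(2022, 1, d) for d in (1, 2, 3, 5, 6, 7, 8, 10)]
missing = count_missing_days(existing)
```

Here the range spans 10 days and 8 of them exist, so `missing` is 2.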
You can use the LAG analytic function to find the previous date and then work out the number of days difference and if it is more than 1 then you have that many missing days:
SELECT SUM(missing_dates) AS num_missing
FROM (
SELECT GREATEST("DATE" - LAG("DATE") OVER (ORDER BY "DATE") - 1, 0)
AS missing_dates
FROM table_name
);
Which, for the sample data:
CREATE TABLE table_name ("DATE") AS
SELECT DATE '2020-01-01' FROM DUAL UNION ALL
SELECT DATE '2020-01-02' FROM DUAL UNION ALL
SELECT DATE '2020-01-03' FROM DUAL UNION ALL
SELECT DATE '2020-01-05' FROM DUAL UNION ALL
SELECT DATE '2020-01-06' FROM DUAL UNION ALL
SELECT DATE '2020-01-07' FROM DUAL UNION ALL
SELECT DATE '2020-01-08' FROM DUAL UNION ALL
SELECT DATE '2020-01-10' FROM DUAL;
Outputs:
NUM_MISSING
2
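The LAG-based calculation can be sketched in Python as a sum over gaps between neighbouring dates. This is an illustration only; the function name `count_missing_by_gaps` is hypothetical.

```python
from datetime import date

def count_missing_by_gaps(dates):
    """Sum, over consecutive date pairs, the days skipped between them."""
    ordered = sorted(dates)                      # ORDER BY "DATE"
    return sum(
        max((cur - prev).days - 1, 0)            # GREATEST("DATE" - LAG("DATE") - 1, 0)
        for prev, cur in zip(ordered, ordered[1:])
    )

sample = [date(2020, 1, d) for d in (1, 2, 3, 5, 6, 7, 8, 10)]
num_missing = count_missing_by_gaps(sample)
```

The `max(..., 0)` guard plays the same role as GREATEST: duplicate dates give a difference of zero days, which would otherwise contribute -1.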

Create time intervals based on values in one column / SQL Oracle

I need to create a query that returns time intervals from a table that has attributes for (almost) every day.
The original table looks like the following:
Person | Date | Date_Type
-------|------------|----------
Sam | 01.06.2020 | Vacation
Sam | 02.06.2020 | Vacation
Sam | 03.06.2020 | Work
Sam | 04.06.2020 | Work
Sam | 05.06.2020 | Work
Frodo | 01.06.2020 | Work
Frodo | 02.06.2020 | Work
.....
And the desired result should look like:
Person | Date_Interval | Date_Type
-------|-----------------------|----------
Sam | 01.06.2020-02.06.2020 | Vacation
Sam | 03.06.2020-05.06.2020 | Work
Frodo | 01.06.2020-02.06.2020 | Work
.....
Will be grateful for any idea :)
This reads like a gaps-and-islands problem. Here is one approach:
select person, min(date) startdate, max(date) enddate, date_type
from (
select t.*,
row_number() over(partition by person order by date) rn1,
row_number() over(partition by person, date_type order by date) rn2
from mytable t
) t
group by person, date_type, rn1 - rn2
This also works if not all dates are contiguous (since you stated that you have almost all dates, I understood you don't have them all).
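To see why `rn1 - rn2` identifies an island, here is a small Python sketch of the same bookkeeping. It is an illustration under assumptions: the function name `islands_by_row_numbers` and the `(person, day, day_type)` tuple layout are invented for the example.

```python
from collections import defaultdict
from datetime import date

def islands_by_row_numbers(rows):
    """Group rows by (person, day_type, rn1 - rn2); rows are (person, day, day_type)."""
    rn1 = defaultdict(int)   # row_number() over (partition by person order by date)
    rn2 = defaultdict(int)   # row_number() over (partition by person, date_type order by date)
    groups = {}
    for person, day, day_type in sorted(rows, key=lambda r: (r[0], r[1])):
        rn1[person] += 1
        rn2[(person, day_type)] += 1
        key = (person, day_type, rn1[person] - rn2[(person, day_type)])
        lo, hi = groups.get(key, (day, day))
        groups[key] = (min(lo, day), max(hi, day))
    return sorted((p, lo, hi, t) for (p, t, _), (lo, hi) in groups.items())

sample = [
    ("Sam", date(2020, 6, 1), "Vacation"),
    ("Sam", date(2020, 6, 2), "Vacation"),
    ("Sam", date(2020, 6, 3), "Work"),
    ("Sam", date(2020, 6, 4), "Work"),
    ("Sam", date(2020, 6, 5), "Work"),
    ("Frodo", date(2020, 6, 1), "Work"),
    ("Frodo", date(2020, 6, 2), "Work"),
]
intervals = islands_by_row_numbers(sample)
```

The difference stays constant while the type does not change from one row to the next. Note that only row order matters here: if a calendar day is missing inside a run of the same type, the run is still reported as one interval.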
This is a type of gaps-and-islands problem.
To get adjacent days with the same date_type, you can subtract a sequence. It will be constant for adjacent days. Then you can aggregate:
select person, date_type, min(date), max(date)
from (select t.*,
row_number() over (partition by person, date_type
order by date) as seqnum
from t
) t
group by person, date_type, (date - seqnum);
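A minimal Python sketch of the "date minus sequence number" trick, for illustration only. The function name `islands_by_date_minus_seq` and the tuple layout are assumptions, and the sample deliberately includes a skipped day to show that it starts a new interval.

```python
from collections import defaultdict
from datetime import date, timedelta

def islands_by_date_minus_seq(rows):
    """Group rows by (person, day_type, day - seqnum); rows are (person, day, day_type)."""
    seq = defaultdict(int)   # row_number() over (partition by person, date_type order by date)
    groups = {}
    for person, day, day_type in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        seq[(person, day_type)] += 1
        # Adjacent days shift by one while seqnum also grows by one,
        # so the difference is constant inside an island.
        anchor = day - timedelta(days=seq[(person, day_type)])
        key = (person, day_type, anchor)
        lo, hi = groups.get(key, (day, day))
        groups[key] = (min(lo, day), max(hi, day))
    return sorted((p, t, lo, hi) for (p, t, _), (lo, hi) in groups.items())

# Hypothetical data: 2020-06-05 is missing, so Work splits into two islands
sample = [
    ("Sam", date(2020, 6, 3), "Work"),
    ("Sam", date(2020, 6, 4), "Work"),
    ("Sam", date(2020, 6, 6), "Work"),
]
intervals = islands_by_date_minus_seq(sample)
```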
One of the simplest methods is to use MATCH_RECOGNIZE to perform a row-by-row comparison and aggregation:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY Person
ORDER BY "DATE"
MEASURES
FIRST( "DATE" ) AS start_date,
LAST( "DATE") AS end_date,
FIRST( Date_Type ) AS date_type
ONE ROW PER MATCH
PATTERN ( successive_dates* last_date )
DEFINE
SUCCESSIVE_DATES AS (
FIRST( Date_Type ) = NEXT( Date_Type )
AND MAX( "DATE" ) + INTERVAL '1' DAY = NEXT( "DATE")
)
);
Which, for the sample data:
CREATE TABLE table_name ( Person, "DATE", Date_Type ) AS
SELECT 'Sam', DATE '2020-06-01', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-02', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-03', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-04', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-05', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-01', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-02', 'Work' FROM DUAL;
Outputs:
PERSON | START_DATE | END_DATE | DATE_TYPE
:----- | :------------------ | :------------------ | :--------
Frodo | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Work
Sam | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Vacation
Sam | 2020-06-03 00:00:00 | 2020-06-05 00:00:00 | Work

Row for each date from start date to end date

What I'm trying to do is take a record that looks like this:
Start_DT End_DT ID
4/5/2013 4/9/2013 1
and change it to look like this:
DT ID
4/5/2013 1
4/6/2013 1
4/7/2013 1
4/8/2013 1
4/9/2013 1
It can be done in Python, but I am not sure if it is possible with Oracle SQL. I am having a difficult time making this work. Any help would be appreciated.
Thanks
Use a recursive subquery-factoring clause:
WITH ranges ( start_dt, end_dt, id ) AS (
SELECT start_dt, end_dt, id
FROM table_name
UNION ALL
SELECT start_dt + INTERVAL '1' DAY, end_dt, id
FROM ranges
WHERE start_dt + INTERVAL '1' DAY <= end_dt
)
SELECT start_dt,
id
FROM ranges;
Which for your sample data:
CREATE TABLE table_name ( start_dt, end_dt, id ) AS
SELECT DATE '2013-04-05', DATE '2013-04-09', 1 FROM DUAL
Outputs:
START_DT | ID
:------------------ | -:
2013-04-05 00:00:00 | 1
2013-04-06 00:00:00 | 1
2013-04-07 00:00:00 | 1
2013-04-08 00:00:00 | 1
2013-04-09 00:00:00 | 1
CONNECT BY LEVEL is useful for these problems. Suppose the first CTE, named "table_DT", stands in for your table; you can then use the select statement after it.
with table_DT as (
select
to_date('4/5/2013','mm/dd/yyyy') as Start_DT,
to_date('4/9/2013', 'mm/dd/yyyy') as End_DT,
1 as ID
from dual
)
select
Start_DT + (level-1) as DT,
ID
from table_DT
connect by level <= End_DT - Start_DT +1
;
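For comparison with the Python approach mentioned in the question, here is a minimal sketch of the same expansion. It is illustrative only; the function name `expand_range` is an assumption, and `level` mirrors the 1-based LEVEL pseudocolumn.

```python
from datetime import date, timedelta

def expand_range(start_dt, end_dt, id_):
    """Return one (dt, id) row per day from start_dt to end_dt inclusive."""
    n_days = (end_dt - start_dt).days + 1          # End_DT - Start_DT + 1
    return [
        (start_dt + timedelta(days=level - 1), id_)  # Start_DT + (level - 1)
        for level in range(1, n_days + 1)
    ]

rows = expand_range(date(2013, 4, 5), date(2013, 4, 9), 1)
```

For the sample record this produces five rows, from 2013-04-05 through 2013-04-09.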

Convert 1 column to 2 columns in Oracle SQL

We get the data in the following format, which I am able to split using a regular-expression query. The data is the start and end dates of tasks, concatenated with pipes.
Data:
|2020/04/26|2020/05/02|2020/05/03|2020/05/10
Query:
select REGEXP_SUBSTR (:p, '[^|]+', 1, level) as periods from dual
connect by level <= length (regexp_replace(:p, '[^|]+'))
Result:
2020/04/26
2020/05/02
2020/05/03
2020/05/10
We need to separate it into start date and end date. The number of start date and end date combinations is dynamic, but there will always be an end date for each start date, so we won't get nulls.
Expected Result
START DATE END DATE
2020/04/26 2020/05/02
2020/05/03 2020/05/10
Thanks in advance.
You could do arithmetics and conditional aggregation:
select
max(case when mod(lvl, 2) = 0 then periods end) start_date,
max(case when mod(lvl, 2) = 1 then periods end) end_date
from (
select
regexp_substr (:p, '[^|]+', 1, level) as periods,
level - 1 as lvl
from dual
connect by level <= length (regexp_replace(:p, '[^|]+'))
) t
group by trunc(lvl / 2)
For the sample data, this outputs:
START_DATE | END_DATE
:--------- | :---------
2020/04/26 | 2020/05/02
2020/05/03 | 2020/05/10
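The pairing logic (even positions are start dates, odd positions are end dates, and `trunc(lvl / 2)` identifies the pair) can be sketched in Python. This is an illustration only; the function name `split_pairs` is hypothetical.

```python
def split_pairs(p):
    """Split a pipe-delimited string into (start_date, end_date) string pairs."""
    periods = [tok for tok in p.split("|") if tok]   # like regexp_substr(:p, '[^|]+', 1, level)
    pairs = {}
    for lvl, period in enumerate(periods):           # lvl = level - 1
        slot = pairs.setdefault(lvl // 2, [None, None])  # group by trunc(lvl / 2)
        slot[lvl % 2] = period                       # mod(lvl, 2): 0 = start, 1 = end
    return [tuple(pairs[k]) for k in sorted(pairs)]

result = split_pairs("|2020/04/26|2020/05/02|2020/05/03|2020/05/10")
```

The values stay as strings here, just as in the SQL query, which never converts them to dates.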
A solution that will work if you have one or more input rows (whereas your hierarchical query will generate exponentially increasing numbers of duplicate rows if you input more than one row of data to it).
Convert pairs of dates to XML and then use XMLTABLE to convert:
SELECT id,
x.*
FROM test_data t
CROSS JOIN
XMLTABLE(
( LTRIM(
REGEXP_REPLACE(
t.value,
'\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})',
',<row><start>\1</start><end>\2</end></row>'
),
','
)
)
COLUMNS
start_date DATE PATH '/row/start',
end_date DATE PATH '/row/end'
) x
So, for your test data:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '|2020/04/26|2020/05/02|2020/05/03|2020/05/10' FROM DUAL UNION ALL
SELECT 2, '|2020/06/01|2020/06/02' FROM DUAL
This outputs:
ID | START_DATE | END_DATE
-: | :--------- | :--------
1 | 26-APR-20 | 02-MAY-20
1 | 03-MAY-20 | 10-MAY-20
2 | 01-JUN-20 | 02-JUN-20
Or, if you only have a single input, you can split your data on pairs:
SELECT REGEXP_SUBSTR ( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})', 1, level, NULL, 1 ) as start_date,
REGEXP_SUBSTR ( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})', 1, level, NULL, 2 ) as end_date
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})' )
Which outputs:
START_DATE | END_DATE
:--------- | :---------
2020/04/26 | 2020/05/02
2020/05/03 | 2020/05/10
Or use:
SELECT *
FROM XMLTABLE(
( LTRIM(
REGEXP_REPLACE(
:p,
'\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})',
',<row><start>\1</start><end>\2</end></row>'
),
','
)
)
COLUMNS
start_date DATE PATH '/row/start',
end_date DATE PATH '/row/end'
)

SQL find effective price of the products based on the date

I have a table with four columns: id, validFrom, validTo and price.
This table contains the price of an article and the duration when that price is effective.
| id| validFrom | validTo | price
|---|-----------|-----------|---------
| 1 | 01-01-17 | 10-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
Now, for these inputs in my table, my query output should be:
| id| validFrom | validTo | price
|---|-----------|----------|-------
| 1 | 01-01-17 | 03-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
| 1 | 10-01-17 | 10-01-17 | 30000
I can compare the dates and check if products with same id have overlapping dates but I have no idea how to split those dates into non-overlapping dates. Also I am not allowed to use PL/SQL.
Is this possible using only SQL ?
Oracle Setup:
CREATE TABLE prices ( id, validFrom, validTo, price ) AS
SELECT 1, DATE '2017-01-01', DATE '2017-01-10', 30000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-04', DATE '2017-01-09', 20000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11', DATE '2017-01-15', 10000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-16', DATE '2017-01-18', 15000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-17', DATE '2017-01-20', 40000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-21', DATE '2017-01-24', 28000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-23', DATE '2017-01-26', 23000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-26', DATE '2017-01-26', 17000 FROM DUAL;
Query:
WITH daily_prices ( id, dt, price, duration ) AS (
-- Unroll the price ranges to individual days
SELECT id,
d.COLUMN_VALUE,
price,
validTo - validFrom
FROM prices p,
TABLE(
CAST(
MULTISET(
SELECT p.validFrom + LEVEL - 1
FROM DUAL
CONNECT BY p.validFrom + LEVEL - 1 <= p.validTo
)
AS SYS.ODCIDATELIST
)
) d
),
min_daily_prices ( id, dt, price ) AS (
-- Where a day falls between multiple ranges group them so the price
-- is for the shortest duration offer and if there are two equally short
-- durations then take the minimum price
SELECT id,
dt,
MIN( price ) KEEP ( DENSE_RANK FIRST ORDER BY duration )
FROM daily_prices
GROUP BY id, dt
),
group_changes ( id, dt, price, has_changed_group ) AS (
-- Find when the price changes or a day is skipped which means a new price
-- group is beginning
SELECT id,
dt,
price,
CASE WHEN dt = LAG( dt ) OVER ( PARTITION BY id ORDER BY dt ) + 1
AND price = LAG( price ) OVER ( PARTITION BY id ORDER BY dt )
THEN 0
ELSE 1
END
FROM min_daily_prices
),
groups ( id, dt, price, grp ) AS (
-- Calculate unique indexes (per id) for each group of price ranges
SELECT id,
dt,
price,
SUM( has_changed_group ) OVER ( PARTITION BY id ORDER BY dt )
FROM group_changes
)
SELECT id,
MIN( dt ) AS validFrom,
MAX( dt ) AS validTo,
MIN( price ) AS price
FROM groups
GROUP BY id, grp
ORDER BY id, validFrom;
Output:
ID VALIDFROM VALIDTO PRICE
---------- -------------------- -------------------- ----------
1 01-JAN-2017 00:00:00 03-JAN-2017 00:00:00 30000
1 04-JAN-2017 00:00:00 09-JAN-2017 00:00:00 20000
1 10-JAN-2017 00:00:00 10-JAN-2017 00:00:00 30000
1 11-JAN-2017 00:00:00 15-JAN-2017 00:00:00 10000
1 16-JAN-2017 00:00:00 18-JAN-2017 00:00:00 15000
1 19-JAN-2017 00:00:00 20-JAN-2017 00:00:00 40000
1 21-JAN-2017 00:00:00 22-JAN-2017 00:00:00 28000
1 23-JAN-2017 00:00:00 25-JAN-2017 00:00:00 23000
1 26-JAN-2017 00:00:00 26-JAN-2017 00:00:00 17000
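The four steps of the query (unroll to days, pick the shortest-duration offer per day, detect changes, regroup) can be traced with a minimal Python sketch. It is only an illustration of the logic under the same tie-breaking rule as the query; the function name `effective_prices` and the tuple layout are assumptions.

```python
from datetime import date, timedelta

def effective_prices(prices):
    """prices: (id, valid_from, valid_to, price) tuples with inclusive date ranges."""
    # Steps 1 and 2: unroll each range to days and keep, per (id, day),
    # the offer with the shortest duration, then the lowest price on ties.
    best = {}
    for id_, valid_from, valid_to, price in prices:
        duration = (valid_to - valid_from).days
        for offset in range(duration + 1):
            day = valid_from + timedelta(days=offset)
            candidate = (duration, price)
            if (id_, day) not in best or candidate < best[(id_, day)]:
                best[(id_, day)] = candidate
    # Steps 3 and 4: walk the days in order and extend the current range
    # while the id and price are unchanged and no day is skipped.
    result = []
    for (id_, day), (_, price) in sorted(best.items()):
        last = result[-1] if result else None
        if (last and last[0] == id_ and last[3] == price
                and last[2] + timedelta(days=1) == day):
            result[-1] = (id_, last[1], day, price)
        else:
            result.append((id_, day, day, price))
    return result

jan = lambda d: date(2017, 1, d)
sample = [
    (1, jan(1), jan(10), 30000), (1, jan(4), jan(9), 20000),
    (1, jan(11), jan(15), 10000), (1, jan(16), jan(18), 15000),
    (1, jan(17), jan(20), 40000), (1, jan(21), jan(24), 28000),
    (1, jan(23), jan(26), 23000), (1, jan(26), jan(26), 17000),
]
ranges = effective_prices(sample)
```

For the setup data above this returns the same nine ranges as the query output. Like the query, it expands every range to individual days, so it is best suited to ranges that are not very long.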