There's a table in my ERP database that holds data about certain events. It has the start date, the end date and a column that shows whether the event is a continuation of a previous one (sequential_id references unique_id). Here's an example:
unique_id | start_date | end_date   | sequential_id
----------|------------|------------|--------------
001       | 2021-01-01 | 2021-01-15 | null
002       | 2021-02-01 | 2021-02-16 | 001
003       | 2021-03-01 | 2021-03-17 | 002
004       | 2021-03-10 | 2021-03-11 | null
005       | 2021-03-19 | null       | null
In the example above, rows 001, 002 and 003 are all part of the same event, while 004 and 005 are standalone events with no sequences. How can I group the data so that the output looks like this:
origin_id | start_date | end_date
----------|------------|------------
001       | 2021-01-01 | 2021-03-17
004       | 2021-03-10 | 2021-03-11
005       | 2021-03-19 | null
I've tried using GROUP BY, but because sequential_id is auto-incremental, it didn't work.
Thanks in advance.
You can use the modern MATCH_RECOGNIZE clause, which is well suited to such tasks:
Pattern Recognition With MATCH_RECOGNIZE
DBFiddle
select *
from t
match_recognize(
  order by unique_id  -- rows must be ordered so prev() refers to the preceding event
  measures
    first(unique_id) start_unique_id,
    first(start_date) start_date,
    last(end_date) end_date
  pattern (strt nxt*)
  define nxt as sequential_id = prev(unique_id)
);
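For reference, here is a minimal test setup for the query above, assuming the table is simply named t as in the query (the real table name and column types in the ERP schema may differ):

-- hypothetical setup matching the question's sample data
create table t (
  unique_id     varchar2(3),
  start_date    date,
  end_date      date,
  sequential_id varchar2(3)
);

insert into t values ('001', date '2021-01-01', date '2021-01-15', null);
insert into t values ('002', date '2021-02-01', date '2021-02-16', '001');
insert into t values ('003', date '2021-03-01', date '2021-03-17', '002');
insert into t values ('004', date '2021-03-10', date '2021-03-11', null);
insert into t values ('005', date '2021-03-19', null, null);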
You can use a hierarchical query for this:
with a (unique_id, start_date, end_date, sequential_id) as (
select '001', date '2021-01-01', date '2021-01-15', null from dual union all
select '002', date '2021-02-01', date '2021-02-16', '001' from dual union all
select '003', date '2021-03-01', date '2021-03-17', '002' from dual union all
select '004', date '2021-03-10', date '2021-03-11', null from dual union all
select '005', date '2021-03-19', null, null from dual
)
, b as (
select
connect_by_root(unique_id) as unique_id
, connect_by_root(start_date) as start_date
, end_date
, connect_by_isleaf as l
from a
start with sequential_id is null
connect by prior unique_id = sequential_id
)
select
unique_id
, start_date
, end_date
from b
where l = 1
order by 1 asc
UNIQUE_ID | START_DATE | END_DATE
:-------- | :--------- | :--------
001 | 01-JAN-21 | 17-MAR-21
004 | 10-MAR-21 | 11-MAR-21
005 | 19-MAR-21 | null
db<>fiddle here
This is a graph-walking problem, so you can use a recursive CTE:
with cte (unique_id, start_date, end_date, start_unique_id) as (
select unique_id, start_date, end_date, unique_id
from t
where not exists (select 1 from t t2 where t.sequential_id = t2.unique_id)
union all
select t.unique_id, t.start_date, t.end_date, cte.start_unique_id
from cte join
t
on cte.unique_id = t.sequential_id
)
select start_unique_id, min(start_date), max(end_date)
from cte
group by start_unique_id;
Here is a db<>fiddle.
Related
I have a problem fetching a few exceptions from the DB.
Example, table b:
sn | v_num | start_date | end_date
---|-------|------------|-----------
1  | 001   | 01-01-2019 | 31-12-2099
1  | 002   | 01-01-2021 | 31-01-2022
1  | 003   | 01-02-2022 | 31-12-2099
2  | 001   | 01-01-2022 | 31-12-2099
2  | 002   | 01-07-2022 | 31-07-2022
2  | 003   | 01-08-2022 | 31-12-2099
Expected output:
sn | v_num | start_date | end_date
---|-------|------------|-----------
1  | 003   | 01-02-2022 | 31-12-2099
2  | 001   | 01-01-2022 | 31-12-2099
Currently I'm here:
SELECT * FROM table a, table b
WHERE a.sn = b.sn
AND b.v_num = (SELECT max (v_num) FROM b WHERE a.sn = b.sn)
but obviously that is not good because of the few cases like the one with sn = 2.
In conclusion, I need to get one record per sn where v_num is the maximum (which covers about 95% of them in the DB), except when the start_date of that max-v_num record is greater than today.
Filter using start_date <= TRUNC(SYSDATE) then use the ROW_NUMBER analytic function:
SELECT *
FROM (
SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY sn ORDER BY v_num DESC) AS rn
FROM "TABLE" a
WHERE start_date <= TRUNC(SYSDATE)
)
WHERE rn = 1;
If the start_date has a time component then you can use start_date < TRUNC(SYSDATE) + INTERVAL '1' DAY to get all the values for today from 00:00:00 to 23:59:59.
If you can have ties for the maximum and want to return all the ties then you can use the RANK analytic function instead of ROW_NUMBER.
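For illustration, the two variations would look something like this (same hypothetical "TABLE" as above; each differs from the main query in a single line):

-- variation: start_date may contain a time component
SELECT *
FROM (
  SELECT a.*,
         ROW_NUMBER() OVER (PARTITION BY sn ORDER BY v_num DESC) AS rn
  FROM   "TABLE" a
  WHERE  start_date < TRUNC(SYSDATE) + INTERVAL '1' DAY  -- includes today up to 23:59:59
)
WHERE rn = 1;

-- variation: return all rows that tie for the maximum v_num
SELECT *
FROM (
  SELECT a.*,
         RANK() OVER (PARTITION BY sn ORDER BY v_num DESC) AS rn
  FROM   "TABLE" a
  WHERE  start_date <= TRUNC(SYSDATE)
)
WHERE rn = 1;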
The original query, for the sample data:
CREATE TABLE "TABLE" (sn, v_num, start_date, end_date) AS
SELECT 1, '001', DATE '2022-01-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 1, '002', DATE '2022-01-01', DATE '2022-01-31' FROM DUAL UNION ALL
SELECT 1, '003', DATE '2022-02-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 2, '001', DATE '2022-01-01', DATE '2099-12-31' FROM DUAL UNION ALL
SELECT 2, '002', DATE '2022-07-01', DATE '2022-07-31' FROM DUAL UNION ALL
SELECT 2, '003', DATE '2022-08-01', DATE '2099-12-31' FROM DUAL;
Outputs:
SN | V_NUM | START_DATE          | END_DATE            | RN
-: | :---- | :------------------ | :------------------ | -:
 1 | 003   | 2022-02-01 00:00:00 | 2099-12-31 00:00:00 |  1
 2 | 001   | 2022-01-01 00:00:00 | 2099-12-31 00:00:00 |  1
db<>fiddle here
I love a good challenge, but this one has been breaking my head for too long. :)
I'm trying to build a query to get dates intervals, grouping the information by one field.
Let me try to explain it in a simple way.
We have the table reproduced as sample data in the answer below.
I need to get the intervals a soldier spent at each ranking, so the end result should look like the output shown in that answer.
As you can see, the soldier can be promoted/demoted over time.
Any suggestion on how to build a query to do this?
THANK YOU!
From Oracle 12, you can use MATCH_RECOGNIZE:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY start_date, end_date
MEASURES
FIRST( name ) AS name,
FIRST( ranking ) AS ranking,
FIRST( start_date ) AS start_date,
LAST( end_Date ) AS end_Date
PATTERN ( same_rank+ )
DEFINE same_rank AS FIRST( ranking ) = ranking
)
Which, for the sample data:
CREATE TABLE table_name ( id, name, ranking, start_date, end_date ) AS
SELECT 1001, 'Jones', 'Lieutenant', DATE '2000-03-20', DATE '2002-08-15' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2002-08-16', DATE '2003-03-18' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2003-03-19', DATE '2004-06-01' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Lieutenant', DATE '2004-06-02', DATE '2004-10-01' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2004-10-02', DATE '2005-04-20' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2005-04-21', DATE '2007-02-20' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2007-02-21', DATE '2008-10-22' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2008-10-23', DATE '2010-01-26' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2010-01-27', DATE '2013-11-25' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Captain', DATE '2013-11-26', DATE '2014-05-11' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'Major', DATE '2014-05-12', DATE '2016-04-22' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'General', DATE '2016-04-23', DATE '2020-10-10' FROM DUAL UNION ALL
SELECT 1001, 'Jones', 'General', DATE '2020-10-11', DATE '2020-11-30' FROM DUAL;
Outputs:
ID | NAME | RANKING | START_DATE | END_DATE
---: | :---- | :--------- | :------------------ | :------------------
1001 | Jones | Lieutenant | 2000-03-20 00:00:00 | 2004-10-01 00:00:00
1001 | Jones | Captain | 2004-10-02 00:00:00 | 2007-02-20 00:00:00
1001 | Jones | Major | 2007-02-21 00:00:00 | 2010-01-26 00:00:00
1001 | Jones | Captain | 2010-01-27 00:00:00 | 2014-05-11 00:00:00
1001 | Jones | Major | 2014-05-12 00:00:00 | 2016-04-22 00:00:00
1001 | Jones | General | 2016-04-23 00:00:00 | 2020-11-30 00:00:00
db<>fiddle here
This is a type of gaps-and-islands problem. You want to find groups of adjacent rows with the same ranking, which you can do by using lag() to get the previous end date for the same ranking and then a cumulative sum to keep track of the changes:
select soldier_id, soldier_name, ranking,
min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date = start_date - interval '1' day then 0 else 1 end)
    over (partition by soldier_id order by start_date) as island
from (select t.*,
lag(end_date) over (partition by soldier_id, ranking order by start_date) as prev_end_date
from t
) t
) t
group by soldier_id, soldier_name, ranking, island;
Note: This assumes that the soldier_name does not change over time for a given soldier. If that is something you need to deal with, then ask a new question with appropriate sample data and desired results.
I need to create a query that will return time intervals from a table that has attributes for (almost) every day.
The original table looks like the following:
Person | Date | Date_Type
-------|------------|----------
Sam | 01.06.2020 | Vacation
Sam | 02.06.2020 | Vacation
Sam | 03.06.2020 | Work
Sam | 04.06.2020 | Work
Sam | 05.06.2020 | Work
Frodo | 01.06.2020 | Work
Frodo | 02.06.2020 | Work
.....
And the desired output should look like:
Person | Date_Interval | Date_Type
-------|-----------------------|----------
Sam | 01.06.2020-02.06.2020 | Vacation
Sam | 03.06.2020-05.06.2020 | Work
Frodo | 01.06.2020-02.06.2020 | Work
.....
Will be grateful for any idea :)
This reads like a gaps-and-islands problem. Here is one approach:
select person, min(date) startdate, max(date) enddate, date_type
from (
select t.*,
row_number() over(partition by person order by date) rn1,
row_number() over(partition by person, date_type order by date) rn2
from mytable t
) t
group by person, date_type, rn1 - rn2
This also works if not all dates are contiguous (since you stated that you have almost all dates, I understood you don't have them all).
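For illustration, here is how rn1, rn2 and their difference would line up for Sam's rows in the sample data; the difference stays constant within each island, which is why it can be used as a grouping key:

Person | Date       | Date_Type | rn1 | rn2 | rn1 - rn2
-------|------------|-----------|-----|-----|----------
Sam    | 01.06.2020 | Vacation  | 1   | 1   | 0
Sam    | 02.06.2020 | Vacation  | 2   | 2   | 0
Sam    | 03.06.2020 | Work      | 3   | 1   | 2
Sam    | 04.06.2020 | Work      | 4   | 2   | 2
Sam    | 05.06.2020 | Work      | 5   | 3   | 2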
This is a type of gaps-and-islands problem.
To get adjacent days with the same date_type, you can subtract a sequence. It will be constant for adjacent days. Then you can aggregate:
select person, date_type, min(date), max(date)
from (select t.*,
row_number() over (partition by person, date_type
order by date) as seqnum
from t
) t
group by person, date_type, (date - seqnum);
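As a quick sanity check against the sample data: Sam's Work rows on 03.06, 04.06 and 05.06 get seqnum 1, 2 and 3, so date - seqnum is 02.06.2020 for all three, while both of his Vacation rows yield 31.05.2020; each constant value marks one island.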
One of the simplest methods is to use MATCH_RECOGNIZE to perform a row-by-row comparison and aggregation:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY Person
ORDER BY "DATE"
MEASURES
FIRST( "DATE" ) AS start_date,
LAST( "DATE") AS end_date,
FIRST( Date_Type ) AS date_type
ONE ROW PER MATCH
PATTERN ( successive_dates* last_date )
DEFINE
SUCCESSIVE_DATES AS (
Date_Type = NEXT( Date_Type )
AND "DATE" + INTERVAL '1' DAY = NEXT( "DATE" )
)
-- last_date is intentionally left undefined, so it matches any single row and
-- captures the final day of each island (including single-day islands)
);
Which, for the sample data:
CREATE TABLE table_name ( Person, "DATE", Date_Type ) AS
SELECT 'Sam', DATE '2020-06-01', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-02', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-03', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-04', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-05', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-01', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-02', 'Work' FROM DUAL;
Outputs:
PERSON | START_DATE | END_DATE | DATE_TYPE
:----- | :------------------ | :------------------ | :--------
Frodo  | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Work
Sam    | 2020-06-01 00:00:00 | 2020-06-02 00:00:00 | Vacation
Sam    | 2020-06-03 00:00:00 | 2020-06-05 00:00:00 | Work
db<>fiddle here
I'm new to SQL, so I hope you don't find this silly. I'm working with two tables: one contains start dates and the other contains end dates. Entries do not follow a sequence, and duplicates are possible.
**TABLE 1**
id start_date
1 2019-04-23
1 2019-06-05
1 2019-06-05
1 2019-10-29
1 2019-12-16
2 2019-01-05
3 2020-02-01
**TABLE 2**
id end_date
1 2019-04-23
1 2019-06-05
1 2019-06-06
1 2019-06-06
1 2019-07-24
1 2019-10-16
2 2020-01-04
**EXPECTED OUTPUT**
id start_date end_date
1 2019-04-23 2019-06-05
1 2019-10-29 null
2 2019-01-05 2020-01-04
3 2020-02-01 null
You can use union all and aggregation with some window functions:
with table1 as (
select 1 as id, date('2019-04-23') as start_date union all
select 1, '2019-06-05' union all
select 1, '2019-06-05' union all
select 1, '2019-10-29' union all
select 1, '2019-12-16' union all
select 2, '2019-01-05' union all
select 3, '2020-02-01'
),
table2 as (
SELECT 1 as id, DATE('2019-04-23') as end_date union all
SELECT 1, '2019-06-05' union all
select 1, '2019-06-06' union all
select 1, '2019-06-06' union all
select 1, '2019-07-24' union all
select 1, '2019-10-16' union all
select 2, '2020-01-04'
)
select id, min(start_date), end_date
from (select id, start_date,
first_value(end_date ignore nulls) over (
    partition by id
    order by DATE_DIFF(coalesce(start_date, end_date), CURRENT_DATE, day)
    RANGE between 1 following and unbounded following
) as end_date
from ((select id, start_date, null as end_date
from table1
) union all
(select id, null as start_date, end_date
from table2
)
) se
)
group by id, end_date
having min(start_date) is not null;
Why do you have multiple records with the same id (I am assuming id is a primary key)? My suggestion would be to make the ids unique, create a foreign key constraint in the end-dates table (since there can't be an end date without a start date), and use the foreign key relationship to retrieve the desired results, e.g. SELECT S.start_date, E.end_date FROM table1 S JOIN table2 E ON S.id = E.table1_fk
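A rough sketch of that suggestion in generic SQL (the table1_fk column and all names/types are assumptions following the comment above, not the asker's actual schema):

-- hypothetical schema illustrating the foreign-key suggestion
CREATE TABLE table1 (
  id         INT PRIMARY KEY,   -- one row per start-date record
  start_date DATE NOT NULL
);

CREATE TABLE table2 (
  id         INT PRIMARY KEY,
  table1_fk  INT NOT NULL REFERENCES table1 (id),  -- an end date must belong to a start date
  end_date   DATE NOT NULL
);

-- the join from the comment then becomes:
SELECT s.start_date, e.end_date
FROM table1 s
JOIN table2 e ON s.id = e.table1_fk;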
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, start_date, IF(end_date = '9999-01-01', NULL, end_date) end_date
FROM (
SELECT id, start_date, ARRAY_AGG(end_date ORDER BY end_date LIMIT 1)[OFFSET(0)] end_date
FROM (
SELECT id, start_date, IF(start_date < end_date, end_date, '9999-01-01') end_date
FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
USING (id)
)
GROUP BY id, start_date
)
If applied to the sample data from your question, the result is:
Row id start_date end_date
1 1 2019-04-23 2019-06-05
2 1 2019-06-05 2019-06-06
3 1 2019-10-29 null
4 1 2019-12-16 null
5 2 2019-01-05 2020-01-04
6 3 2020-02-01 null
Note: quick and not optimized, but it looks like it produces the desired result.
I have a table with four columns: id, validFrom, validTo and price.
This table contains the price of an article and the period during which that price is effective.
| id| validFrom | validTo | price
|---|-----------|-----------|---------
| 1 | 01-01-17 | 10-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
Now, for these inputs, my query output should be:
| id| validFrom | validTo | price
|---|-----------|----------|-------
| 1 | 01-01-17 | 03-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
| 1 | 10-01-17 | 10-01-17 | 30000
I can compare the dates and check whether products with the same id have overlapping date ranges, but I have no idea how to split them into non-overlapping ranges. Also, I am not allowed to use PL/SQL.
Is this possible using only SQL ?
Oracle Setup:
CREATE TABLE prices ( id, validFrom, validTo, price ) AS
SELECT 1, DATE '2017-01-01', DATE '2017-01-10', 30000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-04', DATE '2017-01-09', 20000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11', DATE '2017-01-15', 10000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-16', DATE '2017-01-18', 15000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-17', DATE '2017-01-20', 40000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-21', DATE '2017-01-24', 28000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-23', DATE '2017-01-26', 23000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-26', DATE '2017-01-26', 17000 FROM DUAL;
Query:
WITH daily_prices ( id, dt, price, duration ) AS (
-- Unroll the price ranges to individual days
SELECT id,
d.COLUMN_VALUE,
price,
validTo - validFrom
FROM prices p,
TABLE(
CAST(
MULTISET(
SELECT p.validFrom + LEVEL - 1
FROM DUAL
CONNECT BY p.validFrom + LEVEL - 1 <= p.validTo
)
AS SYS.ODCIDATELIST
)
) d
),
min_daily_prices ( id, dt, price ) AS (
-- Where a day falls between multiple ranges group them so the price
-- is for the shortest duration offer and if there are two equally short
-- durations then take the minimum price
SELECT id,
dt,
MIN( price ) KEEP ( DENSE_RANK FIRST ORDER BY duration )
FROM daily_prices
GROUP BY id, dt
),
group_changes ( id, dt, price, has_changed_group ) AS (
-- Find when the price changes or a day is skipped which means a new price
-- group is beginning
SELECT id,
dt,
price,
CASE WHEN dt = LAG( dt ) OVER ( PARTITION BY id ORDER BY dt ) + 1
AND price = LAG( price ) OVER ( PARTITION BY id ORDER BY dt )
THEN 0
ELSE 1
END
FROM min_daily_prices
),
groups ( id, dt, price, grp ) AS (
-- Calculate unique indexes (per id) for each group of price ranges
SELECT id,
dt,
price,
SUM( has_changed_group ) OVER ( PARTITION BY id ORDER BY dt )
FROM group_changes
)
SELECT id,
MIN( dt ) AS validFrom,
MAX( dt ) AS validTo,
MIN( price ) AS price
FROM groups
GROUP BY id, grp
ORDER BY id, validFrom;
Output:
ID VALIDFROM VALIDTO PRICE
---------- -------------------- -------------------- ----------
1 01-JAN-2017 00:00:00 03-JAN-2017 00:00:00 30000
1 04-JAN-2017 00:00:00 09-JAN-2017 00:00:00 20000
1 10-JAN-2017 00:00:00 10-JAN-2017 00:00:00 30000
1 11-JAN-2017 00:00:00 15-JAN-2017 00:00:00 10000
1 16-JAN-2017 00:00:00 18-JAN-2017 00:00:00 15000
1 19-JAN-2017 00:00:00 20-JAN-2017 00:00:00 40000
1 21-JAN-2017 00:00:00 22-JAN-2017 00:00:00 28000
1 23-JAN-2017 00:00:00 25-JAN-2017 00:00:00 23000
1 26-JAN-2017 00:00:00 26-JAN-2017 00:00:00 17000