Need a join between different rows of a table - sql

I have a table named projects. It has 3 columns: task_id, start_date and end_date.
It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.
If the End_Dates of tasks are consecutive, then they are part of the same project.
I need the start and end dates of the projects, listed by the number of days each project took to complete, in ascending order. If more than one project has the same number of completion days, then order by the start date of the project.
So far I have only extracted 1 project with a triple self-join, but I cannot list the other projects. Any idea how to use a more general JOIN here?
input:
Task_ID Start_Date End_Date
----------- ---------- ----------
1 2015-10-01 2015-10-02
2 2015-10-02 2015-10-03
3 2015-10-03 2015-10-04
4 2015-10-13 2015-10-14
5 2015-10-14 2015-10-15
6 2015-10-28 2015-10-29
7 2015-10-30 2015-10-31
output:
start_date end_date
---------- ----------
2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04
my query:
select p3.start_date,p1.end_date
from projects p1,projects p2, projects p3
where p1.start_Date=p2.end_date and p2.start_date=p3.end_date
my query output:
start_date end_date
---------- ----------
2015-10-01 2015-10-04

This is a gaps-and-islands problem. You can solve it by identifying where each island starts, which you can do with lag():
select min(start_date), max(end_date)
from (select t.*,
             sum(case when prev_end_date = start_date then 0 else 1 end) over (order by start_date) as grp
      from (select t.*,
                   lag(end_date) over (order by start_date) as prev_end_date
            from projects t
           ) t
     ) t
group by grp
order by count(*), min(start_date);
The innermost subquery fetches the previous row's end date. The middle subquery flags the start of an "island", which occurs when the previous row's end date is not the start_date of the current row; the running sum of those flags numbers the islands. Because every task lasts exactly one day, count(*) per island equals the project's duration in days, which gives the requested ordering.
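As a quick way to try the idea, here is a minimal sketch that runs the same lag() plus running-sum pattern against the question's sample rows in an in-memory SQLite database (window functions need SQLite 3.25 or later). The column aliases and the Python wrapper are assumptions made for illustration; ordering by COUNT(*) relies on the stated guarantee that each task lasts exactly one day.

```python
import sqlite3

# Sample rows from the question; ISO-8601 text dates sort correctly as strings.
rows = [
    (1, "2015-10-01", "2015-10-02"),
    (2, "2015-10-02", "2015-10-03"),
    (3, "2015-10-03", "2015-10-04"),
    (4, "2015-10-13", "2015-10-14"),
    (5, "2015-10-14", "2015-10-15"),
    (6, "2015-10-28", "2015-10-29"),
    (7, "2015-10-30", "2015-10-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (task_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", rows)

query = """
SELECT MIN(start_date) AS project_start, MAX(end_date) AS project_end
FROM (
    -- Running sum of the start flags numbers the islands.
    SELECT flagged.*,
           SUM(is_island_start) OVER (ORDER BY start_date) AS grp
    FROM (
        -- A row starts an island when the previous end date is not its start date.
        SELECT p.*,
               CASE WHEN LAG(end_date) OVER (ORDER BY start_date) = start_date
                    THEN 0 ELSE 1 END AS is_island_start
        FROM projects AS p
    ) AS flagged
) AS grouped
GROUP BY grp
-- One row per day, so COUNT(*) is the project length in days.
ORDER BY COUNT(*), project_start
"""

projects = conn.execute(query).fetchall()
for project_start, project_end in projects:
    print(project_start, project_end)
```

The first task has no previous row, so the comparison with NULL falls through to the ELSE branch and correctly marks it as an island start.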

Related

SQL Select with grouping and replacing a column

I have a requirement to retrieve rows in a select query where END_DATE is set to the next EFFECTIVE_DATE minus 1 day among the records with the same key (CARD_NBR in this case).
I have tried using GROUP BY, but I am not able to get the desired output. Could someone please point me in the right direction? The record with the most recent effective date should keep END_DATE as 9999-12-31.
Table:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
-------- --------- -------------- ----------
12345    1         2021-01-01     9999-12-31
12345    2         2021-01-25     9999-12-31
12345    3         2021-02-15     9999-12-31
67899    1         2021-03-01     9999-12-31
67899    2         2021-04-02     9999-12-31
67899    3         2021-05-24     9999-12-31
Output:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
-------- --------- -------------- ----------
12345    1         2021-01-01     2021-01-24
12345    2         2021-01-25     2021-02-14
12345    3         2021-02-15     9999-12-31
67899    1         2021-03-01     2021-04-01
67899    2         2021-04-02     2021-05-23
67899    3         2021-05-24     9999-12-31
You can use lead():
select t.*,
       lead(effective_date - interval '1 day', 1, end_date) over (partition by card_nbr order by effective_date) as imputed_end_date
from t;
Date manipulations are highly database-dependent so this uses Standard SQL syntax. You can incorporate this into an update, but the best approach also depends on the database.
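To make the lead() approach concrete, here is a minimal sketch against the question's sample rows in an in-memory SQLite database. It assumes SQLite's DATE(x, '-1 day') modifier in place of the interval arithmetic, and it passes end_date as the default so the latest row per card keeps 9999-12-31; the table name t and the Python wrapper are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (card_nbr INTEGER, seriel_no INTEGER, "
    "effective_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (12345, 1, "2021-01-01", "9999-12-31"),
        (12345, 2, "2021-01-25", "9999-12-31"),
        (12345, 3, "2021-02-15", "9999-12-31"),
        (67899, 1, "2021-03-01", "9999-12-31"),
        (67899, 2, "2021-04-02", "9999-12-31"),
        (67899, 3, "2021-05-24", "9999-12-31"),
    ],
)

query = """
SELECT card_nbr, seriel_no, effective_date,
       -- Next effective date minus one day; fall back to the row's own
       -- end_date when there is no later row for the same card.
       LEAD(DATE(effective_date, '-1 day'), 1, end_date)
           OVER (PARTITION BY card_nbr ORDER BY effective_date) AS imputed_end_date
FROM t
ORDER BY card_nbr, effective_date
"""

imputed = conn.execute(query).fetchall()
for row in imputed:
    print(row)
```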
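SQLite supports window functions as of version 3.25, so you can use the code below to get your result. It numbers the rows per card and joins each row to the next one; the names T1, SRL_NO, START_DT and END_DT stand in for your table and columns.
SELECT A.CARD_NBR,
       A.SRL_NO,
       A.START_DT,
       -- next start date minus one day, or the row's own end date if there is no next row
       COALESCE(DATE(B.START_DT, '-1 day'), A.END_DT) AS END_DT
FROM
(
    SELECT A.CARD_NBR,
           A.SRL_NO,
           A.START_DT,
           A.END_DT,
           ROW_NUMBER() OVER(PARTITION BY A.CARD_NBR ORDER BY A.SRL_NO ASC) RNUM1
    FROM T1 A
) A
LEFT JOIN
(
    SELECT B.CARD_NBR,
           B.SRL_NO,
           B.START_DT,
           B.END_DT,
           ROW_NUMBER() OVER(PARTITION BY B.CARD_NBR ORDER BY B.SRL_NO ASC) RNUM1
    FROM T1 B
) B
ON A.CARD_NBR = B.CARD_NBR
AND A.RNUM1 + 1 = B.RNUM1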

How to generate series using start and end date and quarters on postgres

I have a table like the one shown below. For each row, I want to split the value evenly across the 3 months of its quarter, and repeat that for every matching quarter between the start and end dates (the last two columns).
I am familiar with generate_series and intervals in Postgres, but I am having a hard time getting what I want.
My table has an ID column that groups rows together, a quarter column that indicates which quarter the row references for the ID, a value column that is the value for the whole quarter (and every such quarter in the date range), and start_date and end_date columns indicating the date range. Here is a sample:
ID quarter value start_date end_date
1 2 152 2019-11-07 2050-12-30
1 1 785 2019-11-07 2050-12-30
2 2 152 2019-03-05 2050-12-30
2 1 785 2019-03-05 2050-12-30
3 4 41 2018-06-12 2050-12-30
3 3 50 2018-06-12 2050-12-30
3 2 88 2018-06-12 2050-12-30
3 1 29 2018-06-12 2050-12-30
4 2 1607 2018-12-17 2050-12-30
4 1 4803 2018-12-17 2050-12-30
Here is my desired output (for ID 1):
ID quarter value start_date end_date
1 2 152/3 2020-04-01 2020-07-01
1 1 785/3 2020-01-01 2020-04-01
1 2 152/3 2021-04-01 2021-07-01
1 1 785/3 2021-01-01 2021-04-01
The start_date in the output is the start of the first full quarter after the start_date in the first table. I need the series to be generated from the start_date to the end_date of the first table.
You can do this by using the GENERATE_SERIES function and passing in the start and end date for each unique (by ID) row and setting the interval to 3 months. Then join the result back with your original table on both ID and quarter.
Here's an example (note original_data is what I've called your first table):
WITH
quarters_table AS (
    SELECT
        t.ID,
        (EXTRACT('month' FROM t.quarter_date) - 1)::INT / 3 + 1 AS quarter,
        t.quarter_date::DATE AS start_date,
        COALESCE(
            -- partition and order explicitly so LEAD() returns the next quarter of the same ID
            LEAD(t.quarter_date) OVER (PARTITION BY t.ID ORDER BY t.quarter_date),
            DATE_TRUNC('quarter', t.original_end_date) + INTERVAL '3 months'
        )::DATE AS end_date
    FROM (
        SELECT
            original_record.ID,
            original_record.end_date AS original_end_date,
            GENERATE_SERIES(
                DATE_TRUNC('quarter', original_record.start_date),
                DATE_TRUNC('quarter', original_record.end_date),
                INTERVAL '3 months'
            ) AS quarter_date
        FROM (
            SELECT DISTINCT ON (original_data.ID)
                original_data.ID,
                original_data.start_date,
                original_data.end_date
            FROM
                original_data
            ORDER BY
                original_data.ID
        ) AS original_record
    ) AS t
)
SELECT
    quarters_table.ID,
    quarters_table.quarter,
    original_data.value::DOUBLE PRECISION / 3 AS value,
    quarters_table.start_date,
    quarters_table.end_date
FROM
    quarters_table
INNER JOIN
    original_data
ON
    quarters_table.ID = original_data.ID
    AND quarters_table.quarter = original_data.quarter;
Sample output:
id | quarter | value | start_date | end_date
----+---------+------------------+------------+------------
1 | 1 | 261.666666666667 | 2020-01-01 | 2020-04-01
1 | 2 | 50.6666666666667 | 2020-04-01 | 2020-07-01
1 | 1 | 261.666666666667 | 2021-01-01 | 2021-04-01
1 | 2 | 50.6666666666667 | 2021-04-01 | 2021-07-01
For completeness, here's the original_data table I've used in testing:
WITH
original_data AS (
    SELECT
        1 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    UNION ALL
    SELECT
        1 AS ID,
        1 AS quarter,
        785 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    UNION ALL
    SELECT
        2 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-03-05'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    -- ...
)
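For readers who want to check the quarter arithmetic outside the database, the same steps (truncate to the quarter, step by three months, keep only the quarters that have a row, divide the value by 3) can be sketched in plain Python. The helper names quarter_start, add_quarter and spread are hypothetical and not part of the SQL answer; the input is the two rows for ID 1.

```python
from datetime import date


def quarter_start(d: date) -> date:
    """First day of the quarter containing d, like DATE_TRUNC('quarter', d)."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def add_quarter(d: date) -> date:
    """Advance a quarter-start date by three months."""
    month = d.month + 3
    return date(d.year + (month - 1) // 12, (month - 1) % 12 + 1, 1)


def spread(rows, start: date, end: date):
    """Yield (quarter, value / 3, quarter_start, next_quarter_start) for every
    quarter between start and end whose quarter number appears in rows."""
    values = dict(rows)  # quarter number -> value for the whole quarter
    current, last = quarter_start(start), quarter_start(end)
    while current <= last:
        quarter = (current.month - 1) // 3 + 1
        if quarter in values:
            yield quarter, values[quarter] / 3, current, add_quarter(current)
        current = add_quarter(current)


# Rows for ID 1 from the question, as (quarter, value) pairs.
id1 = list(spread([(2, 152), (1, 785)], date(2019, 11, 7), date(2050, 12, 30)))
for row in id1[:4]:
    print(row)
```

The series starts at 2019-10-01 (quarter 4), which has no row for ID 1, so the first emitted row is quarter 1 of 2020.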
This is one way to go about it, shown for the output you've outlined. You can then add more conditions to the CASE/WHEN for additional quarters.
SELECT
    ID,
    Quarter,
    Value / 3.0 AS "Value",  -- 3.0 avoids integer division
    CASE
        WHEN Quarter = 1 THEN '2020-01-01'
        WHEN Quarter = 2 THEN '2020-04-01'
    END AS "Start_Date",
    CASE
        WHEN Quarter = 1 THEN '2020-04-01'
        WHEN Quarter = 2 THEN '2020-07-01'
    END AS "End_Date"
FROM
    your_table  -- placeholder for your table name

adjust date overlaps within a group

I have this table and I want to set END_DATE to one day before the next ST_DATE whenever the date ranges overlap within a group of ID.
TABLE HAVE
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-20
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-05-10
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17
Below is what I'm looking for
TABLE WANT
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-17 =====changed to a day less than the next ST_DATE due to some sort of overlap
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-04-08 =====changed to a day less than the next ST_DATE due to some sort of overlap
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17
Maybe you can use LEAD() for this. Initial idea:
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
;
-- result
ID ST_DATE END_DATE NEXTSTART
---------- --------- --------- ---------
1 01-JAN-20 01-FEB-20 10-MAY-20
1 10-MAY-20 20-MAY-20 18-MAY-20
1 18-MAY-20 19-JUN-20 11-NOV-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 10-MAY-99 09-APR-99
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00 09-JUN-00
3 09-JUN-00 16-AUG-00 17-AUG-00
3 17-AUG-00 17-FEB-09
Once you have the next start date and the end_date side by side (as it were),
you can use CASE ... for adjusting the dates as you need them.
select ilv.id, ilv.st_date
, case
when ilv.end_date > ilv.nextstart_ then
to_char( ilv.nextstart_ - 1 ) || ' <- modified end date'
else
to_char( ilv.end_date )
end dt_modified
from (
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
) ilv
;
ID ST_DATE DT_MODIFIED
---------- --------- ---------------------------------------
1 01-JAN-20 01-FEB-20
1 10-MAY-20 17-MAY-20 <- modified end date
1 18-MAY-20 19-JUN-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 08-APR-99 <- modified end date
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00
3 09-JUN-00 16-AUG-00
3 17-AUG-00 17-FEB-09
If two "windows" for the same id have the same start date, then the problem doesn't make sense. So, let's assume that the problem makes sense - that is, the combination (id, st_date) is unique in the inputs.
Then, the problem can be formulated as follows: for each id, order rows by st_date ascending. Then, for each row, if its end_date is less than the following st_date, return the row as is. Otherwise, replace end_date with the following st_date minus 1. Fetching the following st_date can be done with the analytic lead() function.
A solution might look like this:
select id, st_date,
       least(end_date, lead(st_date, 1, end_date + 1)
                         over (partition by id order by st_date) - 1) as end_date
from   have
;
The bit about end_date + 1 in the lead function handles the last row for each id. For such rows there is no "next" row, so the default application of lead will return null. The default can be overridden by using the third parameter to the function.
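The same least()/lead() idea can be tried on the sample rows with an in-memory SQLite database. This is only a sketch under a few assumptions: SQLite has no least(), so the two-argument scalar MIN() stands in for it, DATE(x, '+1 day') and DATE(x, '-1 day') replace Oracle's date arithmetic, and the alias adjusted_end_date is made up for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, st_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-02-01"),
        (1, "2020-05-10", "2020-05-20"),
        (1, "2020-05-18", "2020-06-19"),
        (1, "2020-11-11", "2020-12-01"),
        (2, "1999-03-09", "1999-05-10"),
        (2, "1999-04-09", "2000-05-10"),
        (3, "1999-04-09", "2000-05-10"),
        (3, "2000-06-09", "2000-08-16"),
        (3, "2000-08-17", "2009-02-17"),
    ],
)

query = """
SELECT id, st_date,
       -- Smaller of the row's own end date and (next start date - 1 day).
       -- The LEAD default of end_date + 1 day makes the last row per id
       -- compare against its own end date, so it is left unchanged.
       MIN(end_date,
           DATE(LEAD(st_date, 1, DATE(end_date, '+1 day'))
                    OVER (PARTITION BY id ORDER BY st_date),
                '-1 day')) AS adjusted_end_date
FROM have
ORDER BY id, st_date
"""

adjusted = conn.execute(query).fetchall()
for row in adjusted:
    print(row)
```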

How to join multiple rows by continue from and to id columns in oracle

I have a scenario where I need to find the overall start date and end date across multiple rows that are tied together by continued-from and continued-to ID fields in Oracle.
The result should look like this:
ID STARTDATE ENDDATE
-- ---------- ----------
3 01/01/1000 12/31/9999
The source data looks like this:
ID STARTDATE ENDDATE CONT_FROM_ID CONT_TO_ID
-- ---------- ---------- ------------ -----------
1 01/01/1000 10/10/1999 NULL 2
2 10/10/1999 11/11/2000 1 3
3 11/11/2000 12/31/9999 2 NULL
Oracle's hierarchical query syntax makes it easy to walk the tree from parent to child. The analytical lead() and lag() functions track the next and previous IDs.
select c23.id
, c23.startdate
, c23.enddate
, lag(c23.id) over (partition by p23.id order by c23.id) as cont_from_id
, lead(c23.id) over (partition by p23.id order by c23.id) as cont_to_id
from p23
join c23 on p23.startdate <= c23.startdate
and p23.enddate >= c23.enddate
order by c23.id
/
Here is a test using your sample data:
SQL> select c23.id
          , c23.startdate
          , c23.enddate
          , lag(c23.id) over (partition by p23.id order by c23.id) as cont_from_id
          , lead(c23.id) over (partition by p23.id order by c23.id) as cont_to_id
     from p23
     join c23 on p23.startdate <= c23.startdate
     and p23.enddate >= c23.enddate
     order by c23.id
     /
        ID STARTDATE ENDDATE   CONT_FROM_ID CONT_TO_ID
---------- --------- --------- ------------ ----------
         1 01-JAN-00 10-OCT-99                       2
         2 10-OCT-99 11-NOV-00            1          3
         3 11-NOV-00 31-DEC-99            2
SQL>
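For the direction the question asks about (collapsing a chain of rows into one start/end pair), walking the continuation links can be sketched with a recursive CTE. This is not the Oracle query above: it is a separate illustration run in SQLite, using a hypothetical table name periods and ISO-formatted dates, that follows cont_to_id from the row with no predecessor and carries the first start date along.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE periods (id INTEGER, startdate TEXT, enddate TEXT, "
    "cont_from_id INTEGER, cont_to_id INTEGER)"
)
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1000-01-01", "1999-10-10", None, 2),
        (2, "1999-10-10", "2000-11-11", 1, 3),
        (3, "2000-11-11", "9999-12-31", 2, None),
    ],
)

query = """
WITH RECURSIVE chain (id, startdate, enddate, cont_to_id) AS (
    -- Anchor: rows that do not continue from anything start a chain.
    SELECT id, startdate, enddate, cont_to_id
    FROM periods
    WHERE cont_from_id IS NULL
    UNION ALL
    -- Step: move to the continuation row, keeping the chain's first start date.
    SELECT p.id, chain.startdate, p.enddate, p.cont_to_id
    FROM chain
    JOIN periods AS p ON p.id = chain.cont_to_id
)
-- The last link of each chain carries the overall start and end dates.
SELECT id, startdate, enddate
FROM chain
WHERE cont_to_id IS NULL
"""

merged = conn.execute(query).fetchall()
print(merged)
```

In Oracle the same walk could be written with CONNECT BY or a recursive WITH clause; the SQLite form is used here only so the sketch is self-contained.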

How to identify and aggregate sequence from start and end dates

I'm trying to identify a consecutive sequence in dates, per person, as well as sum amount for that sequence. My records table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1300
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date and end_date, but I don't have a way to aggregate the amount. I'm assuming the answer will involve some sort of ROW_NUMBER() window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an OVER clause, but ordinary functions may not. There are nevertheless ways to apply window functions to this task. Here's a way to identify the sequences in your data (as presented):
SELECT
  person
  ,MAX(sequence_start_date)
    OVER (
      PARTITION BY person
      ORDER BY start_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "sequence_start_date"
  ,MIN(sequence_end_date)
    OVER (
      PARTITION BY person
      ORDER BY start_date
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    AS "sequence_end_date"
  ,amount
FROM
  (
    SELECT
      person
      ,start_date
      ,end_date
      ,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' >= start_date
        THEN date '0001-01-01'
        ELSE start_date
      END AS "sequence_start_date"
      ,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' <= end_date
        THEN NULL
        ELSE end_date
      END AS "sequence_end_date"
      ,amount
    FROM records
    order by person, start_date
  ) sq_part
ORDER BY person, sequence_start_date
That relies on MAX() and MIN() instead of COALESCE(), and it applies window framing to get the appropriate scope for each of those within each partition. Results:
person sequence_start_date sequence_end_date amount
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 500
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 100
1 October, 05 2015 00:00:00 October, 07 2015 00:00:00 2000
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 300
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 1000
3 April, 23 2015 00:00:00 April, 23 2015 00:00:00 900
Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (person, start_date) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.
And now you have a way to identify the sequences: they are characterized by the triple person, sequence_start_date, sequence_end_date. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:
SELECT
person,
sequence_start_date,
sequence_end_date,
SUM(amount) AS "amount"
FROM ( <above query> ) sq
GROUP BY person, sequence_start_date, sequence_end_date
Of course you need both dates as grouping columns if you're going to select them.
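As a runnable point of comparison, here is a sketch that reaches the aggregated result on the question's sample rows in SQLite. Note that it swaps the MAX()/MIN() framing for a different technique, a running count of sequence starts, and it treats a row as continuing the sequence when it starts no later than one day after the previous row's end (so abutting and overlapping rows are merged). The aliases is_start, seq and total_amount are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (person INTEGER, start_date TEXT, end_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "2015-09-10", "2015-09-11", 500),
        (1, "2015-09-11", "2015-09-12", 100),
        (1, "2015-09-13", "2015-09-14", 200),
        (1, "2015-10-05", "2015-10-07", 2000),
        (2, "2015-10-05", "2015-10-05", 300),
        (2, "2015-10-06", "2015-10-06", 1000),
        (3, "2015-04-23", "2015-04-23", 900),
    ],
)

query = """
SELECT person,
       MIN(start_date) AS sequence_start_date,
       MAX(end_date)   AS sequence_end_date,
       SUM(amount)     AS total_amount
FROM (
    -- Running count of start flags gives each sequence a number per person.
    SELECT flagged.*,
           SUM(is_start) OVER (PARTITION BY person ORDER BY start_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS seq
    FROM (
        -- A row continues the sequence if it starts at most one day after
        -- the previous row's end; the first row per person always starts one.
        SELECT r.*,
               CASE WHEN start_date <= DATE(LAG(end_date) OVER (PARTITION BY person
                                                                 ORDER BY start_date),
                                            '+1 day')
                    THEN 0 ELSE 1 END AS is_start
        FROM records AS r
    ) AS flagged
) AS numbered
GROUP BY person, seq
ORDER BY person, sequence_start_date
"""

sequences = conn.execute(query).fetchall()
for row in sequences:
    print(row)
```

Because the flag only looks at the immediately preceding row, this sketch assumes no row is fully contained inside a much longer earlier row; data like that would need a running maximum of end_date instead of a plain LAG().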
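Why not:
select a1.person, a1.sequence_start_date, a1.sequence_end_date,
       sum(rx.amount) as amount
from (EXISTING_QUERY) a1
left join records rx
  on rx.person = a1.person
 and rx.start_date >= a1.sequence_start_date
 and rx.end_date <= a1.sequence_end_date
group by a1.person, a1.sequence_start_date, a1.sequence_end_date
Here EXISTING_QUERY must return one row per sequence (for example via SELECT DISTINCT); otherwise the join counts each amount more than once.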