Compare multiple date ranges - sql

I am using iReport 3.0.0, PostgreSQL 9.1. For a report I need to compare date ranges from invoices with date ranges in filters and print for every invoice code if a filter range is covered, partially covered, etc. To complicate things, there can be multiple date ranges per invoice code.
Table Invoices
ID Code StartDate EndDate
1 111 1.5.2012 31.5.2012
2 111 1.7.2012 20.7.2012
3 111 25.7.2012 31.7.2012
4 222 1.4.2012 15.4.2012
5 222 18.4.2012 30.4.2012
Examples
Filter: 1.5.2012. - 5.6.2012.
Result that I need to get is:
code 111 - partialy covered
code 222 - invoice missing
Filter: 1.5.2012. - 31.5.2012.
code 111 - fully covered
code 222 - invoice missing
Filter: 1.6.2012. - 30.6.2012.
code 111 - invoice missing
code 222 - invoice missing

After clarification in comment.
Your task as I understand it:
Check for all supplied individual date ranges (filter) whether they are are covered by the combined date ranges of sets of codes in your table (invoice).
It can be done with plain SQL, but it is not a trivial task. The steps could be:
Supply date ranges as filters.
Combine date ranges in invoice table per code.
Can result in one or more ranges per code.
Look for overlaps between filters and combined invoices
Classify: fully covered / partially covered.
Can result in one full coverage, one or two partial coverages or no coverage.
Reduce to maximum level of coverage.
Display one row for every combination of (filter, code) with the resulting coverage, in a sensible sort order
Ad hoc filter ranges
WITH filter(filter_id, startdate, enddate) AS (
VALUES
(1, '2012-05-01'::date, '2012-06-05'::date) -- list filters here.
,(2, '2012-05-01', '2012-05-31')
,(3, '2012-06-01', '2012-06-30')
)
SELECT * FROM filter;
Or put them in a (temporary) table and use the table instead.
Combine overlapping / adjacent date ranges per code
WITH a AS (
SELECT code, startdate, enddate
,max(enddate) OVER (PARTITION BY code ORDER BY startdate) AS max_end
-- Calculate the cumulative maximum end of the ranges sorted by start
FROM invoice
), b AS (
SELECT *
,CASE WHEN lag(max_end) OVER (PARTITION BY code
ORDER BY startdate) + 2 > startdate
-- Compare to the cumulative maximum end of the last row.
-- Only if there is a gap, start a new group. Therefore the + 2.
THEN 0 ELSE 1 END AS step
FROM a
), c AS (
SELECT code, startdate, enddate, max_end
,sum(step) OVER (PARTITION BY code ORDER BY startdate) AS grp
-- Members of the same date range end up in the same grp
-- If there is a gap, the grp number is incremented one step
FROM b
)
SELECT code, grp
,min(startdate) AS startdate
,max(enddate) AS enddate
FROM c
GROUP BY 1, 2
ORDER BY 1, 2
Alternative final SELECT (may be faster or not, you'll have to test):
SELECT DISTINCT code, grp
,first_value(startdate) OVER w AS startdate
,last_value(enddate) OVER w AS enddate
FROM c
WINDOW W AS (PARTITION BY code, grp ORDER BY startdate
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY 1, 2;
Combine to one query
WITH
-- supply one or more filter values
filter(filter_id, startdate, enddate) AS (
VALUES
(1, '2012-05-01'::date, '2012-06-05'::date) -- cast values in first row
,(2, '2012-05-01', '2012-05-31')
,(3, '2012-06-01', '2012-06-30')
)
-- combine date ranges per code
,a AS (
SELECT code, startdate, enddate
,max(enddate) OVER (PARTITION BY code ORDER BY startdate) AS max_end
FROM invoice
), b AS (
SELECT *
,CASE WHEN (lag(max_end) OVER (PARTITION BY code ORDER BY startdate)
+ 2) > startdate THEN 0 ELSE 1 END AS step
FROM a
), c AS (
SELECT code, startdate, enddate, max_end
,sum(step) OVER (PARTITION BY code ORDER BY startdate) AS grp
FROM b
), i AS ( -- substitutes original invoice table
SELECT code, grp
,min(startdate) AS startdate
,max(enddate) AS enddate
FROM c
GROUP BY 1, 2
)
-- match filters
, x AS (
SELECT f.filter_id, i.code
,bool_or(f.startdate >= i.startdate
AND f.enddate <= i.enddate) AS full_cover
FROM filter f
JOIN i ON i.enddate >= f.startdate
AND i.startdate <= f.enddate -- only overlapping
GROUP BY 1,2
)
SELECT f.*, i.code
,CASE x.full_cover
WHEN TRUE THEN 'fully covered'
WHEN FALSE THEN 'partially covered'
ELSE 'invoice missing'
END AS covered
FROM (SELECT DISTINCT code FROM i) i
CROSS JOIN filter f -- all combinations of filter and code
LEFT JOIN x USING (filter_id, code) -- join in overlapping
ORDER BY filter_id, code;
Tested and works for me on PostgreSQL 9.1.

Related

SQL Redshift conditional running sum based on the running sum result

I have a table with different invoice values (column "invoice") aggregated by month (column "date") and partner (column "partner"). I need to have as output a running sum that starts when my invoice value is negative and continues until the running sum becomes positive. Below is the representation of what would be the result I need:
In the date 2022-05-01, the invoice value is 100, so it should not be considered in the running sum. In the date 2022-06-01, the invoice value is negative (-250) so the running sum should start until it turns into a positive value (which happens in the date 2022-09-01). The same logic should happen continuously: In the date 2022-11-01 the running sum starts again and goes until 2023-01-01 when it becomes positive again.
What SQL query could get me the output I need?
I tried to perform a partition sum based on the partner and ordered by date when the invoice value is negative, however the running sum starts and stop in the negative values and don't consider the positive values in the sequence.
Select
date
,partner
,case when invoice > 0 then 1 else 0 end as index
,sum(invoice) over (partition by partner,index order by date)
from table t
The logic you describe requires iterating over the dataset. Window functions do not provide this feature: in SQL, we would use a recursive query to address this problem.
Redshift recently added support for recursive CTEs: yay! Consider:
with recursive
data (date, partner, invoice, rn) as (
select date, partner, invoice,
row_number() over(partition by partner order by date) rn
from mytable
),
cte (date, partner, invoice, rn, rsum) as (
select date, partner, invoice, rn, invoice
from data
where rn = 1
union all
select d.date, d.partner, d.invoice, d.rn,
case when c.rsum >= 0 and c.rn > 1 then d.invoice else c.rsum + d.invoice end
from cte c
inner join data d on d.partner = c.partner and d.rn = c.rn + 1
)
select * from cte order by partner, date

Rank the closest date, but not if date has already been used in previous row

Complicated to summarize the issue in the title, and in text so check the IMG for visual explanation.
I've got an issue joining two tables. The date from the first table (startdatetable) should get the next/closest date in the other table (enddatetable). This is to 99% easy done with rank, because most rows have a startdate that can find the next enddate, and before another enddate is available there is a new startdate.
However, if there are two startdates and only one enddate available the startdates will join the same enddate.
What I'm trying to do is, that if a date has been used in the row before, it should not be used in the next row.
The rows I want are highlighted.
The SQL I started out with looked like
select *
from (
select rank() over (partition by tid order by enddate asc) as rnk
, id, startdate, enddate
from startdateTable
inner join enddateTable on startdateTable.ID = enddateTable.id
and enddateTable.enddate > startdateTable.startdate
) q
where q.rnk = 1
This gets me the following result. The last row should instead get the 2100 date, since the 2020-06 date has been used in the previous row.
If you have two tables and you want to align them, then you can use row_number():
select s.id, s.startdate, e.enddate
from (select s.*, row_number() over (partition by id order by startdate) as seqnum
from startdateTable
) s join
(select e.*, row_number() over (partition by id order by enddate) as seqnum
from enddateTable
) e
on e.id = s.id and e.seqnum = s.seqnum

Finding the commencement date of a new project

Interested in a challenging SQL problem, read ahead:
For the data set below, I'm trying to find a logic which identifies the commencement date of a new project for each employee.
Data Set
The logic to identify commencement date of new project is that:
An employee will not have any date record prior to the present one in a 14 day time frame.
Project windows only last 14 days after the commencement. The first record falling outside such a window will be counted as the start of the next project.
What is needed
Both Redshift/ Postgres solutions accepted.
Please note Redshift doesn't support recursive CTEs or RANGE keyword in window frame.
Thanks for reading.
For Postgresql, including the CTE (DataSet) for the dataset, here you go:
WITH RECURSIVE TimeLine(Employee, ProjectID, ProjectStartDate, Date, DateRank) AS (
SELECT Employee, 1, Date, Date, DateRank
FROM DataSetWithRank
WHERE DateRank = 1
UNION ALL
SELECT T.Employee,
T.ProjectID + CASE When D.Date >= T.ProjectStartDate+14 THEN 1 Else 0 END,
CASE When D.Date >= T.ProjectStartDate+14 THEN D.Date Else T.ProjectStartDate END,
D.Date, D.DateRank
FROM TimeLine T
JOIN DataSetWithRank D ON D.Employee = T.Employee AND D.DateRank = T.DateRank + 1
), DataSet(Employee,Date) AS (
SELECT UNNEST(ARRAY['Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1','Employee1']),
UNNEST(ARRAY['2018-01-01','2018-01-03','2018-01-05','2018-01-08','2018-01-11','2018-01-13','2018-01-14','2018-01-16','2018-01-18','2018-01-21','2018-01-22','2018-01-24','2018-01-25','2018-01-27','2018-01-29']::date[])
UNION
SELECT UNNEST(ARRAY['Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2','Employee2']),
UNNEST(ARRAY['2018-01-03','2018-01-05','2018-01-07','2018-01-10','2018-01-13','2018-01-15','2018-01-16','2018-01-18','2018-01-20','2018-01-23','2018-01-24','2018-01-26','2018-01-27','2018-01-29','2018-01-31']::date[])
), DataSetWithRank AS (
SELECT *, DENSE_RANK() OVER (PARTITION BY Employee ORDER BY Date) AS DateRank
FROM DataSet
)
SELECT Employee,
'Project ' || ProjectID AS "Project #",
Date,
DENSE_RANK() OVER (PARTITION BY Employee, ProjectID ORDER BY Date) AS Rank,
CASE WHEN Date = ProjectStartDate THEN 'Y' ELSE NULL END AS Is_New
FROM TimeLine

Find the date after a gap in date range in sql

I have these date ranges that represent start and end dates of subscription. There are no overlaps in date ranges.
Start Date End Date
1/5/2015 - 1/14/2015
1/15/2015 - 1/20/2015
1/24/2015 - 1/28/2015
1/29/2015 - 2/3/2015
I want to identify delays of more than 1 day between any subscription ending and a new one starting. e.g. for the data above, i want the output: 1/24/2015 - 1/28/2015.
How can I do this using a sql query?
Edit : Also there can be multiple gaps in the subscription date ranges but I want the date range after the latest one.
You do this using a left join or not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.enddate = dateadd(day, -1, t.startdate)
);
Note that this will also give you the first record in the sequence . . . which, strictly speaking, matches the conditions. Here is one solution to that problem:
select t.*
from t cross join
(select min(startdate) as minsd from t) as x
where not exists (select 1
from t t2
where t2.enddate = dateadd(day, -1, t.startdate)
) and
t.startdate <> minsd;
You can also approach this with window functions:
select t.*
from (select t.*,
lag(enddate) over (order by startdate) as prev_enddate,
min(startdate) over () as min_startdate
from t
) t
where minstartdate <> startdate and
enddate <> dateadd(day, -1, startdate);
Also note that this logic assumes that the time periods do not overlap. If they do, a clearer problem statement is needed to understand what you are really looking for.
You can achieve this using window function LAG() that would get value from previous row in ordered set for later comparison in WHERE clause. Then, in WHERE you just apply your "gapping definition" and discard the first row.
SQL FIDDLE - Test it!
Sample data:
create table dates(start_date date, end_date date);
insert into dates values
('2015-01-05','2015-01-14'),
('2015-01-15','2015-01-20'),
('2015-01-24','2015-01-28'), -- gap
('2015-01-29','2015-02-03'),
('2015-02-04','2015-02-07'),
('2015-02-09','2015-02-11'); -- gap
Query
SELECT
start_date,
end_date
FROM (
SELECT
start_date,
end_date,
LAG(end_date, 1) OVER (ORDER BY start_date) AS prev_end_date
FROM dates
) foo
WHERE
start_date IS DISTINCT FROM ( prev_end_date + 1 ) -- compare current row start_date with previous row end_date + 1 day
AND prev_end_date IS NOT NULL -- discard first row, which has null value in LAG() calculation
I assume that there are no overlaps in your data and that there are unique values for each pair. If that's not the case, you need to clarify this.

Find date ranges between large gaps and ignore smaller gaps

I have a column of a mostly continous unique dates in ascending order. Although the dates are mostly continuos, there are some gaps in the dates of less than 3 days, others have more than 3 days.
I need to create a table where each record has a start date and an end date of the range that includes a gap of 3 days or less. But a new record has to be generated if the gap is longer than 3 days.
so if dates are:
1/2/2012
1/3/2012
1/4/2012
1/15/2012
1/16/2012
1/18/2012
1/19/2012
I need:
1/2/2012 1/4/2012
1/15/2012 1/19/2012
You can do something like this:
WITH CTE_Source AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY DT) RN
FROM dbo.Table1
)
,CTE_Recursion AS
(
SELECT *, 1 AS Grp
FROM CTE_Source
WHERE RN = 1
UNION ALL
SELECT src.*, CASE WHEN DATEADD(DD,3,rec.DT) < src.DT THEN rec.Grp + 1 ELSE Grp END AS Grp
FROM CTE_Source src
INNER JOIN CTE_Recursion rec ON src.RN = rec.RN +1
)
SELECT
MIN(DT) AS StartDT, MAX(DT) AS EndDT
FROM CTE_Recursion
GROUP BY Grp
First CTE is just to assign continuous numbers for all rows in order to join them later. Then using recursive CTE you can join on each next row assigning groups if date difference is larger than 3 days. In the end just group by grouping column and select desired results.
SQLFiddle DEMO