SQL: Merge Overlapping Date Ranges

I have two tables, both containing date ranges. The first table contains the default record ID that applies during each date range:
STARTDATE | ENDDATE | RECORDID
__________________________________________________
2022/Nov/01 00:00 | 2022/Nov/30 00:00 | 10
2022/Dec/01 00:00 | 2022/Dec/31 00:00 | 16
The second table contains the override record ID, which replaces the default record ID for the date range specified. If there isn't an override for the date in question, the default record represents that date range:
STARTDATE | ENDDATE | RECORDID
__________________________________________________
2022/Nov/14 00:00 | 2022/Nov/16 00:00 | 12
2022/Dec/06 00:00 | 2022/Dec/20 00:00 | 18
The override table records always fall within the dates covered by dates in the default table.
The end result for the above scenario should look like this:
STARTDATE | ENDDATE | RECORDID
__________________________________________________
2022/Nov/01 00:00 | 2022/Nov/14 00:00 | 10
2022/Nov/14 00:00 | 2022/Nov/16 00:00 | 12
2022/Nov/16 00:00 | 2022/Nov/30 00:00 | 10
2022/Dec/01 00:00 | 2022/Dec/06 00:00 | 16
2022/Dec/06 00:00 | 2022/Dec/20 00:00 | 18
2022/Dec/20 00:00 | 2022/Dec/31 00:00 | 16
Is there a way to do this with SQL Server 2019? I made an attempt but it was filled with clumsy loops and the end result was incorrect and would not have been efficient at all.

Try the following query. There may be a simpler way to do it, but this works:
/*
left join the two tables to get the overlapped date ranges.
the left join is used in case there is no default date range
in the default table that overlapped with the override table date range.
*/
with overlapped_ranges as
(
    select D.StartDate def_stdt, D.EndDate def_endt, D.RecordId def_rid,
           O.StartDate ovr_stdt, O.EndDate ovr_endt, O.RecordId ovr_rid
    from override_records O left join default_records D
        on O.StartDate <= D.EndDate and O.EndDate >= D.StartDate
),
/*
from the joined tables, get all of the date range boundaries as values
using cross-apply. you could run (select * from get_all_ranges_boundaries)
to see the output of this query.
*/
get_all_ranges_boundaries as
(
    select OV.*, dt
    from overlapped_ranges OV
    cross apply
    (
        select *
        from (values (def_stdt), (def_endt), (ovr_stdt), (ovr_endt)) as T(dt)
    ) all_dates_boundaries
),
/*
now we have all the date range boundaries, so the start date, end date values
will be the date boundary as the start date and the next date boundary as the end date.
i.e.
Values('2022-11-01','2022-11-30','2022-11-14','2022-11-16') will be
('2022-11-01'-'2022-11-14'),('2022-11-14'-'2022-11-16'),('2022-11-16'-'2022-11-30')
*/
stDate_enDate as
(
    select dt StartDate,
           lead(dt) over (partition by def_stdt, def_endt, ovr_stdt, ovr_endt order by dt) EndDate,
           case
               when dt >= ovr_stdt and lead(dt) over (partition by def_stdt, def_endt, ovr_stdt, ovr_endt order by dt) <= ovr_endt
               then ovr_rid else def_rid
           end as RecordId -- check if the populated date range occurs within the override date range, if yes select override id, else select default id.
    from get_all_ranges_boundaries
)
select StartDate, EndDate, RecordId
from stDate_enDate
where EndDate is not null and -- the last date range boundary has no lead value, this is to exclude the last row for each overlapped pair.
      StartDate is not null and -- exclude null values where an override range does not overlap with default ranges. (from the left join query)
      StartDate <> EndDate -- exclude rows where StartDate = EndDate i.e. ('2022-11-01','2022-11-05','2022-11-05','2022-11-16')
order by StartDate, EndDate
See demo.
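To check the expected output independently of the database, the same splitting logic can be sketched in Python. This is only an illustration under stated assumptions: the function name merge_ranges and the tuple layout are hypothetical, and overrides are assumed not to overlap each other. Because the query above starts from override_records, a default range with no override at all would not appear in its result; the sketch walks the default ranges instead, so such a range is returned unchanged.

```python
from datetime import date

def merge_ranges(defaults, overrides):
    """Split each default range around the overrides that fall inside it.

    Both arguments are lists of (start, end, record_id) tuples.
    Assumption: overrides do not overlap each other.
    """
    result = []
    for d_start, d_end, d_id in sorted(defaults):
        cursor = d_start  # first point of the default range not yet emitted
        inside = sorted(o for o in overrides if o[0] < d_end and o[1] > d_start)
        for o_start, o_end, o_id in inside:
            # Clip the override to the default range, in case it sticks out.
            o_start, o_end = max(o_start, d_start), min(o_end, d_end)
            if cursor < o_start:
                result.append((cursor, o_start, d_id))  # default part before the override
            result.append((o_start, o_end, o_id))       # the override itself
            cursor = o_end
        if cursor < d_end:
            result.append((cursor, d_end, d_id))        # remaining default tail
    return result

defaults = [(date(2022, 11, 1), date(2022, 11, 30), 10),
            (date(2022, 12, 1), date(2022, 12, 31), 16)]
overrides = [(date(2022, 11, 14), date(2022, 11, 16), 12),
             (date(2022, 12, 6), date(2022, 12, 20), 18)]

merged = merge_ranges(defaults, overrides)
for row in merged:
    print(row)
```

With the sample data from the question this yields the six rows listed as the desired result.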

Related

Postgres SQL Join on Nearest less than quarter end

I have table 1
ID | public_date
1 | 1992-06-03
2 | 2000-12-15
Table 2 is a series of the quarter end dates in a range
Date
1995-12-31
1996-03-31
..
..
2000-12-31
I would like to have the result table as
ID | date | public_date
1 | 1995-12-31 | 1992-06-03
1 | 1996-03-31 | 1992-06-03
1 | 1996-06-30 | 1992-06-03
...
...
1 | 2000-12-31 | 2000-12-15
Basically, assign the public date to the nearest quarter end date. Currently, I have this query
SELECT DISTINCT ON (x."date")
x."date", r.public_date
FROM quarter_end_series as x
LEFT JOIN public_time r ON r.public_date <= x."date"
where x.date >= '1995-12-31 00:00:00'
ORDER BY x."date", r.outlookdate desc;
But this query took 4 hours. Is there any way to do it more efficiently?
Try a subquery:
select pt.*,
(select qes.date
from quarter_end_series qes
where qes.date <= pt.public_date
order by qes.date desc
limit 1
) as quarter_end_date
from public_time pt;
Include an index on quarter_end_series(date).
This saves the sorting on a large amount of data -- which should make this more performant.
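The reason an index helps is that a sorted structure lets each lookup jump straight to the nearest earlier value, instead of pairing every row with every candidate and sorting the lot. A minimal Python sketch of that lookup follows, taking the direction from the question (the latest public_date on or before each quarter end). The helper name latest_on_or_before is hypothetical, and the sample dates are the ones shown in the question.

```python
from bisect import bisect_right
from datetime import date

def latest_on_or_before(sorted_dates, target):
    """Return the latest date in sorted_dates that is <= target, or None."""
    pos = bisect_right(sorted_dates, target)  # binary search, like an index seek
    return sorted_dates[pos - 1] if pos else None

public_dates = sorted([date(1992, 6, 3), date(2000, 12, 15)])
quarter_ends = [date(1995, 12, 31), date(1996, 3, 31), date(2000, 12, 31)]

assigned = {q: latest_on_or_before(public_dates, q) for q in quarter_ends}
for quarter_end, public_date in assigned.items():
    print(quarter_end, public_date)
```

Each lookup costs a binary search over the sorted dates, which is roughly the work a B-tree index saves the database.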
I guess your quarter ends are the same fixed dates every year, like:
1995-12-31
1996-03-31
1996-06-30
1996-09-30
1996-12-31
.... and so on
If so, just find the closest date among the fixed quarter dates.
If quarter_end_series does not contain the same dates every year, you can try a subquery instead of the join, like below:
SELECT DISTINCT ON (x."date")
x."date", (SELECT r.public_date FROM public_time r ORDER BY abs(x."date"::date - r.public_date::date) ASC limit 1) as public_date
FROM quarter_end_series as x
where x."date" >= '1995-12-31 00:00:00'
ORDER BY x."date";

Calculating cumulative sum with date filtering in PostgreSQL

I have table users with the following values:
id | created_at
-------+---------------------
20127 | 2015-01-31 04:23:46
21468 | 2015-02-04 07:50:34
21571 | 2015-02-04 08:23:50
20730 | 2015-03-12 10:20:16
19955 | 2015-03-30 07:44:35
20148 | 2015-04-17 13:03:26
21552 | 2015-05-07 19:00:00
20145 | 2015-06-02 03:12:46
21467 | 2015-06-03 13:21:51
21074 | 2015-07-03 19:00:00
I want to:
find the cumulative sum for number of users over time (return count of users for every day in the date range, not just for the days that exist in the database)
be able to filter that sum by date, so if I put the date that is after some row, that row should be included in the cumulative sum (everything before the range specified should be included in the first sum, it shouldn't start counting from 0 at the beginning of the range specified)
return results grouped by each day in epoch format
I'm trying to achieve this with the following SQL:
SELECT extract(epoch from created_at)::bigint,
sum(count(id)::integer) OVER (ORDER BY created_at)
FROM data_users
WHERE created_at IS NOT NULL
GROUP BY created_at
But it's not working as expected since I can't add filtering by date here, without excluding records from the cumulative sum. Also it doesn't take into account days that have been missed (those for which the users don't exist).
Any help greatly appreciated.
As far as I understand your question a simple query with GROUP BY should be enough. You can use a left outer join with GENERATE_SERIES() to get all dates in the range. If you have the start and end date of the range, you can use this:
SELECT EXTRACT(EPOCH FROM d)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM GENERATE_SERIES(start, end, '1 DAY'::INTERVAL) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d
GROUP BY 1 ORDER BY 1
You can determine start and end from your table, too:
SELECT EXTRACT(EPOCH FROM d.date)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
(SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;
This returns:
date_part | coalesce
------------+----------
1422662400 | 1
1422748800 | 0
1422835200 | 0
1422921600 | 0
1423008000 | 2
1423094400 | 0
1423180800 | 0
...
1435536000 | 0
1435622400 | 0
1435708800 | 0
1435795200 | 0
1435881600 | 1
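With this query all days before the start date collapse into a single first row, so that row carries the count of everything created before the range (start is a placeholder for your start date):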
SELECT EXTRACT(EPOCH FROM GREATEST(d.date, start))::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
(SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;
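The queries above return a count per day, while the question asks for a running total that already includes the users created before the start of the range. The intended result can be sketched in Python to make the requirement concrete. The function name cumulative_users and the sample dates are assumptions for illustration, and the epoch conversion is left out for brevity.

```python
from collections import Counter
from datetime import date, timedelta

def cumulative_users(created_dates, start, end):
    """Return (day, running_total) for every day from start to end inclusive.

    Users created before `start` are already included in the first total.
    """
    per_day = Counter(created_dates)
    total = sum(n for day, n in per_day.items() if day < start)
    rows = []
    day = start
    while day <= end:
        total += per_day.get(day, 0)  # days without users keep the previous total
        rows.append((day, total))
        day += timedelta(days=1)
    return rows

created = [date(2015, 1, 31), date(2015, 2, 4), date(2015, 2, 4), date(2015, 3, 12)]
rows = cumulative_users(created, date(2015, 2, 3), date(2015, 2, 6))
for row in rows:
    print(row)
```

The first row starts at 1 rather than 0 because the user created on 2015-01-31 precedes the requested range.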

Discard existing dates that are included in the result, SQL Server

In my database I have a Reservation table and it has three columns Initial Day, Last Day and the House Id.
I want to count the total days and omit the days that are repeated, for example:
+-------------+------------+------------+
| | Results | |
+-------------+------------+------------+
| House Id | InitialDay | LastDay |
+-------------+------------+------------+
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-22 |
| 20 | 2017-09-18 | 2017-09-22 |
+-------------+------------+------------+
Notice that House Id 1 has two rows, and the dates of the first row fall within the date interval of the second row. In total the number of days should be 5, because the first row shouldn't be counted: those days already exist in the second.
The reason why this is happening is that each house has two rooms, and different persons can stay in that house on the same dates.
My question is: how can I omit those cases, and only count the real days the house was occupied?
If you are using SQL Server 2012 or higher you can use LAG() to get the previous final date and adjust the initial date:
with ReservationAdjusted as (
select *,
lag(LastDay) over(partition by HouseID order by InitialDay, LastDay) as PreviousLast
from Reservation
)
select HouseId,
sum(case when PreviousLast>LastDay then 0 -- fully contained in the previous reservation
when PreviousLast>=InitialDay then datediff(day,PreviousLast,LastDay) -- overlap
else datediff(day,InitialDay,LastDay)+1 -- no overlap
end) as Days
from ReservationAdjusted
group by HouseId
The cases are:
The reservation is fully included in the previous reservation: we only need to compare end dates because the previous row is obtained by ordering by InitialDay, LastDay, so the previous start date is always less than or equal to the current start date.
The current reservation overlaps with the previous one: in this case we adjust the start and don't add 1 (the initial day is already counted). This case includes the one where the previous end is equal to the current start (a one-day overlap).
There is no overlap: we just calculate the difference and add 1 to also count the initial day.
Note that we don't need an extra condition for the first reservation of a HouseID, because by default the LAG() function returns NULL when there isn't a previous row, and comparisons with NULL are never true, so the row falls through to the ELSE branch.
Sample input and output:
| HouseId | InitialDay | LastDay |
|---------|------------|------------|
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 1 | 2017-09-21 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-27 |
| 19 | 2017-09-24 | 2017-09-26 |
| 19 | 2017-09-29 | 2017-09-30 |
| 20 | 2017-09-19 | 2017-09-22 |
| 20 | 2017-09-22 | 2017-09-26 |
| 20 | 2017-09-24 | 2017-09-27 |
| HouseId | Days |
|---------|------|
| 1 | 5 |
| 19 | 12 |
| 20 | 9 |
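One thing worth checking with the LAG() approach: it only sees the immediately preceding row, so a short reservation nested inside an earlier, longer one can hide that longer reservation's end date from the row after it. A variant that tracks the latest day counted so far avoids this. The Python sketch below illustrates that variant under stated assumptions; the function name occupied_days and the tuple layout are hypothetical, and both dates are treated as inclusive.

```python
from collections import defaultdict
from datetime import date

def occupied_days(reservations):
    """Count distinct occupied days per house.

    `reservations` holds (house_id, first_day, last_day) rows, both days inclusive.
    """
    by_house = defaultdict(list)
    for house_id, first_day, last_day in reservations:
        by_house[house_id].append((first_day, last_day))

    totals = {}
    for house_id, spans in by_house.items():
        days = 0
        covered_until = None  # latest day already counted for this house
        for first_day, last_day in sorted(spans):
            if covered_until is None or first_day > covered_until:
                days += (last_day - first_day).days + 1   # no overlap
                covered_until = last_day
            elif last_day > covered_until:
                days += (last_day - covered_until).days   # count only the new tail
                covered_until = last_day
            # otherwise the row is fully covered and adds nothing
        totals[house_id] = days
    return totals

sample = [
    (1, date(2017, 9, 18), date(2017, 9, 20)),
    (1, date(2017, 9, 18), date(2017, 9, 22)),
    (1, date(2017, 9, 21), date(2017, 9, 22)),
    (19, date(2017, 9, 18), date(2017, 9, 27)),
    (19, date(2017, 9, 24), date(2017, 9, 26)),
    (19, date(2017, 9, 29), date(2017, 9, 30)),
    (20, date(2017, 9, 19), date(2017, 9, 22)),
    (20, date(2017, 9, 22), date(2017, 9, 26)),
    (20, date(2017, 9, 24), date(2017, 9, 27)),
]
totals = occupied_days(sample)
print(totals)
```

On the sample input this gives the same totals as the table above (5, 12 and 9 days). The two approaches differ only when a later reservation starts inside a long reservation that is not the immediately preceding row.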
select HouseId, min(InitialDay), max(LastDay)
from Reservation
group by HouseId
If I understood correctly!
Try out and let me know how it works out for you.
Ted.
While thinking through your question I came across the wonder that is the idea of a Calendar table. You'd use this code to create one, with whatever range of dates you want for your calendar. Code is from http://blog.jontav.com/post/9380766884/calendar-tables-are-incredibly-useful-in-sql
declare @start_dt as date = '1/1/2010';
declare @end_dt as date = '1/1/2020';
declare @dates as table (
    date_id date primary key,
    date_year smallint,
    date_month tinyint,
    date_day tinyint,
    weekday_id tinyint,
    weekday_nm varchar(10),
    month_nm varchar(10),
    day_of_year smallint,
    quarter_id tinyint,
    first_day_of_month date,
    last_day_of_month date,
    start_dts datetime,
    end_dts datetime
)
while @start_dt < @end_dt
begin
    insert into @dates(
        date_id, date_year, date_month, date_day,
        weekday_id, weekday_nm, month_nm, day_of_year, quarter_id,
        first_day_of_month, last_day_of_month,
        start_dts, end_dts
    )
    values(
        @start_dt, year(@start_dt), month(@start_dt), day(@start_dt),
        datepart(weekday, @start_dt), datename(weekday, @start_dt), datename(month, @start_dt), datepart(dayofyear, @start_dt), datepart(quarter, @start_dt),
        dateadd(day,-(day(@start_dt)-1),@start_dt), dateadd(day,-(day(dateadd(month,1,@start_dt))),dateadd(month,1,@start_dt)),
        cast(@start_dt as datetime), dateadd(second,-1,cast(dateadd(day, 1, @start_dt) as datetime))
    )
    set @start_dt = dateadd(day, 1, @start_dt)
end
select *
into Calendar
from @dates
Once you have a calendar table your query is as simple as:
select distinct r.House_id, c.date_id
from Reservation as r
inner join Calendar as c
on
c.date_id >= r.InitialDay
and c.date_id <= r.LastDay
This gives you a row for each unique day each house was occupied. If you need a sum of how many days each house was occupied it becomes:
select a.House_id, count(a.House_id) as Days_occupied
from
(select distinct t.House_id, c.date_id
from so_test as t
inner join Calendar as c
on
c.date_id >= t.InitialDay
and c.date_id <= t.LastDay) as a
group by a.House_id
Create a table of all the possible dates and then join it to the Reservations table so that you have a list of all days between InitialDay and LastDay. Like this:
DECLARE @i date
DECLARE @last date
CREATE TABLE #temp (Date date)
-- the calendar must span from the earliest InitialDay to the latest LastDay
SELECT @i = MIN(InitialDay) FROM Reservation
SELECT @last = MAX(LastDay) FROM Reservation
WHILE @i <= @last
BEGIN
INSERT INTO #temp VALUES(@i)
SET @i = DATEADD(day, 1, @i)
END
SELECT HouseID, COUNT(*) FROM
(
SELECT DISTINCT HouseID, Date FROM Reservation
LEFT JOIN #temp
ON Reservation.InitialDay <= #temp.Date
AND Reservation.LastDay >= #temp.Date
) AS a
GROUP BY HouseID
DROP TABLE #temp

Select record by given date from and date till.

I have the records below:
StartDate | EndDate | ID
---------------------------------
25-12-2016 30-12-2016 0
01-01-2017 05-01-2017 1
10-01-2017 12-01-2017 2
01-02-2017 05-02-2017 3
Given a date range from 02-01-2017 to 11-01-2017, how do we select the records whose StartDate to EndDate period overlaps that range?
I would expect the result table below:
StartDate | EndDate | ID
------------------------------
01-01-2017 05-01-2017 1
10-01-2017 12-01-2017 2
So, basically you are asking how to check if two date ranges overlap.
The way to do this is to check that one starts before the other ends, while the other starts before one ends. You can see a visualization in the overlap tag wiki.
Your query should be something like this:
SELECT StartDate, EndDate, ID
FROM YourTable
WHERE StartDate <= '11-01-2017'
AND EndDate >= '02-01-2017'
Try the below query,
DECLARE @V_START_DATE DATETIME = '2017-01-02'
,@V_END_DATE DATETIME = '2017-01-11'
SELECT *
FROM #TABLE
WHERE StartDate BETWEEN @V_START_DATE AND @V_END_DATE
OR EndDate BETWEEN @V_START_DATE AND @V_END_DATE
Try the below Query
SELECT * FROM DateRanges
WHERE StartDate BETWEEN '02-01-2017' and '11-01-2017'
OR ENDdate BETWEEN '02-01-2017' and '11-01-2017'
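The two styles of predicate in the answers above are not equivalent. The BETWEEN form only tests whether one of the record's endpoints lies inside the requested range, so a record that starts before the range and ends after it would be missed, while the "starts before the other ends" form catches it. A small Python sketch of both tests, with hypothetical function names and the dates from the question:

```python
from datetime import date

def overlaps(start, end, range_start, range_end):
    """True when [start, end] and [range_start, range_end] share at least one day."""
    return start <= range_end and end >= range_start

def endpoint_between(start, end, range_start, range_end):
    """BETWEEN-style test: true only when an endpoint falls inside the range."""
    return range_start <= start <= range_end or range_start <= end <= range_end

records = [
    (date(2016, 12, 25), date(2016, 12, 30), 0),
    (date(2017, 1, 1), date(2017, 1, 5), 1),
    (date(2017, 1, 10), date(2017, 1, 12), 2),
    (date(2017, 2, 1), date(2017, 2, 5), 3),
]
query = (date(2017, 1, 2), date(2017, 1, 11))

matched = [rid for start, end, rid in records if overlaps(start, end, *query)]
print(matched)

# A record spanning the whole requested range: only the overlap test finds it.
spanning = (date(2016, 12, 1), date(2017, 3, 1))
print(overlaps(*spanning, *query), endpoint_between(*spanning, *query))
```

For the sample data both tests select IDs 1 and 2; they diverge only for a spanning record like the last one.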

sql how to add missing week to the table

I have a table with this data: the date when the employees reported and the week start date (Monday) for that week. They did not work on all dates; for example, there is no data for the week of Christmas. Is there a way I can add the missing week, so that I still have the week start date for each and every week? The report date can be null.
I cannot declare variables
This is what I have
and this is what i want to add the missing week
Query
SQLFIDDLEEXAMPLE:
CREATE TABLE tb
(
d1 date,
d2 date
);
INSERT INTO tb
(d1, d2)
VALUES
('2015-12-10', '2015-12-07'),
('2015-12-15', '2015-12-14'),
('2015-12-29', '2015-12-28'),
('2016-01-05', '2016-01-04');
SET DATEFIRST 1
INSERT INTO tb
( d1, d2 )
select null, DATEADD(day,number,'2015-01-01')
FROM master..spt_values t1
LEFT JOIN tb t2
ON DATEADD(day,number,'2015-01-01') = t2.d2
WHERE type = 'P'
AND DATEADD(day,number,'2015-01-01') >= '2015-12-01'
AND DATEADD(day,number,'2015-01-01') <= '2016-01-04'
AND DATEPART(weekday,DATEADD(day,number,'2015-01-01')) = 1
AND t2.d2 is null
SELECT *
FROM tb
Result:
| d1 | d2 |
|------------|------------|
| 2015-12-10 | 2015-12-07 |
| 2015-12-15 | 2015-12-14 |
| 2015-12-29 | 2015-12-28 |
| 2016-01-05 | 2016-01-04 |
| (null) | 2015-12-21 |
You can create a new Calendar/Weeks table containing all the weeks in the year. This table should be populated in advance.
You can then make a reference from your data table to this calendar table (by id or week/year).
Your report should be based on the calendar table with an outer join to your data table.
This way your report will contain all weeks even if some weeks don't have any data.
EDIT: You would need a new table like this:
Week:
| Start date | End date |
| 12/07/15 | 12/13/15 |
| 12/14/15 | 12/20/15 |
| 12/21/15 | 12/27/15 |
etc...
Assume that the #weekly_calendar table contains your valid work weeks (e.g., for Dec 2015). By the way, the syntax is for MSSQL; you should specify which database you are using.
You can also dynamically create the calendar on run-time. This is just to show the concept in an easy to understand way.
-- week start dates
-- 2015-12-01
-- 2015-12-07
-- 2015-12-14
-- 2015-12-21
-- 2015-12-28
create table #weekly_calendar (
week_start_date datetime,
week_end_date datetime
)
Assuming that #report_date contains the report date of the employee.
-- report dates
-- 2015-12-02
-- 2015-12-15
-- 2015-12-29
create table #report_date (
report_date datetime
)
This is how you display the unreported dates.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and w.week_end_date
If you do not have the week_end_date, you can derive it from week_start_date. Again, this assumes your work days run from Monday to Friday.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and DATEADD(dd, 6-(DATEPART(dw, w.week_start_date)), w.week_start_date)
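The calendar-plus-outer-join idea can be tried outside the database to confirm which weeks come back empty. The Python sketch below generates one row per Monday and attaches the report dates that fall in that week, or None when there are none. The function name weeks_with_reports is hypothetical, the report dates are the d1 values from the SQL Fiddle example above, and weeks are assumed to run Monday to Sunday.

```python
from datetime import date, timedelta

def weeks_with_reports(report_dates, first_monday, last_monday):
    """Return (week_start, report_date) rows, with None for weeks without a report."""
    rows = []
    week_start = first_monday
    while week_start <= last_monday:
        week_end = week_start + timedelta(days=6)
        hits = sorted(d for d in report_dates if week_start <= d <= week_end)
        if hits:
            rows.extend((week_start, d) for d in hits)
        else:
            rows.append((week_start, None))  # the outer join's NULL row
        week_start += timedelta(days=7)
    return rows

reports = [date(2015, 12, 10), date(2015, 12, 15), date(2015, 12, 29), date(2016, 1, 5)]
rows = weeks_with_reports(reports, date(2015, 12, 7), date(2016, 1, 4))
for row in rows:
    print(row)
```

The week starting 2015-12-21 is the only one returned with None, matching the missing Christmas week in the question.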