Determining the Length of a Date Range Overlap - sql

I have two date ranges. One of them never changes: it is a measurement period (Date Range A). I need to find out whether the second date range (Date Range B) overlaps with A for at least 6 months. What would be the best way to go about this?
I have considered comparing the start and end dates of the two ranges in different ways depending on how the two intersect, but have yet to hit upon a solid methodology.

You didn't say which database you use, so I worked out an example in PostgreSQL:
create table a (
since date,
upto date
);
insert into a (since, upto) values ('2017-01-01', '2017-10-25');
create table b (
since date,
upto date
);
insert into b (since, upto) values ('2017-04-24', '2018-01-15');
insert into b (since, upto) values ('2017-06-01', '2018-01-15');
insert into b (since, upto) values ('2016-08-15', '2017-10-04');
select a.*, b.*
from a, b
-- the overlap runs from the later start to the earlier end;
-- keep the pairs where that span covers at least 6 months
where least(a.upto, b.upto) >= greatest(a.since, b.since) + interval '6 month';
Result:
since upto since upto
------------------------------------------------------------
2017-01-01 2017-10-25 2017-04-24 2018-01-15
2017-01-01 2017-10-25 2016-08-15 2017-10-04
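One way to reason about the check, independent of the database: the overlap starts at the later of the two start dates and ends at the earlier of the two end dates. The Python sketch below applies that idea to the sample rows. The helper names (add_months, overlaps_at_least) are made up for illustration, and clamping the day at month end is an assumption about how "6 months" should be counted.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def overlaps_at_least(a, b, months=6):
    """True when the (since, upto) ranges a and b share at least `months` months."""
    start = max(a[0], b[0])  # the overlap starts at the later start
    end = min(a[1], b[1])    # and ends at the earlier end
    return end >= add_months(start, months)

range_a = (date(2017, 1, 1), date(2017, 10, 25))
ranges_b = [
    (date(2017, 4, 24), date(2018, 1, 15)),
    (date(2017, 6, 1), date(2018, 1, 15)),
    (date(2016, 8, 15), date(2017, 10, 4)),
]
matches = [b for b in ranges_b if overlaps_at_least(range_a, b)]
for since, upto in matches:
    print(since, upto)
```

The first and third B ranges qualify; the second overlaps A for less than six months.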

Related

Back-filling time-series data with previous time's values

I have a table of time-series data that has some gaps in the series. An example of the data is below:
Date
Value
2022-11-17
1
2022-11-14
2
I want to insert rows for the dates between the existing rows (2022-11-15, 2022-11-16) that have the value of the latest date before the date being inserted (the 2022-11-14 row).
I started by using an imperative solution in my application programming language but I'm convinced there must be a way to do this in SQL.
INSERT INTO mytable -- 5
SELECT
generate_series( -- 1
mydate + 1, -- 2
lead(mydate) OVER (ORDER BY mydate) - 1, -- 3
interval '1 day'
)::date as gs,
t.myvalue -- 4
FROM mytable t;
1. Use generate_series() to generate the date series.
2. The start of the series is the day after the row's date value.
3. The end of the series is the day before the next row's date value. The lead() window function is used here to access the next row.
4. Use the generated dates from the function and the value of the current row for the newly generated rows.
5. Finally, insert them into your table.
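For comparison, the same steps can be written imperatively. This is only a sketch in Python on the two sample rows from the question; back_fill is a hypothetical helper name, not part of any library.

```python
from datetime import date, timedelta

def back_fill(rows):
    """Return the rows missing between consecutive dates, carrying the earlier value forward."""
    ordered = sorted(rows)  # oldest date first
    filled = []
    for (current_date, value), (next_date, _) in zip(ordered, ordered[1:]):
        day = current_date + timedelta(days=1)  # first missing day
        while day < next_date:                  # stop the day before the next row
            filled.append((day, value))
            day += timedelta(days=1)
    return filled

rows = [(date(2022, 11, 17), 1), (date(2022, 11, 14), 2)]
new_rows = back_fill(rows)
for day, value in new_rows:
    print(day, value)
```

Like the SQL version, the last row produces nothing because it has no following row.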

Returning Sum of all rows that fit date criteria

I've really searched but couldn't find an answer to this one. I have a simple table in my Postgres DB:
start_date | end_date | amount
Because the dates aren't continuous and because of the nature of needing a "snapshot date" I'm using a generate_series to create a separate table and attempting to join the two. What I need is that for every row where the date in my generate_series table falls between the start and end date in my table I will sum all of those rows and put that amount next to the date in my generate_series table.
I am not sure how I join the two tables and I feel I need to have some kind of loop that loops through all relevant rows and sums them. Ideally the solution would all be in SQL so that I can plug into Looker as a derived table without the need for pre ETLs.
Any help/thoughts would be greatly appreciated
Thanks
I hope I understood correctly that this is a join of two tables. The sample includes one date that is not in any range; it gets a sum of 0.
SQL:
select date, coalesce(sum(amount),0) as sum from
(select dates.date as date, snap.amount as amount
from dates left join snap
on
dates."date" > snap."start date"
and dates."date" < snap."end date") as a
group by date
order by date
;
Input:
Table dates
date
2021-01-02T00:00:00Z
2021-01-09T00:00:00Z
2021-01-13T00:00:00Z
2021-01-20T00:00:00Z
Table snap
start date | end date | amount
2021-01-01T00:00:00Z | 2021-01-10T00:00:00Z | 2
2021-01-08T00:00:00Z | 2021-01-15T00:00:00Z | 5
Output:
date | sum
2021-01-02T00:00:00Z | 2
2021-01-09T00:00:00Z | 7
2021-01-13T00:00:00Z | 5
2021-01-20T00:00:00Z | 0
DDL:
CREATE TABLE snap
("start date" timestamp, "end date" timestamp, "amount" int)
;
INSERT INTO snap
("start date", "end date", "amount")
VALUES
('2021-01-01 00:00:00', '2021-01-10 00:00:00', 2),
('2021-01-08 00:00:00', '2021-01-15 00:00:00', 5)
;
CREATE TABLE dates
("date" timestamp)
;
INSERT INTO dates
("date")
VALUES
('2021-01-02 00:00:00'),
('2021-01-09 00:00:00'),
('2021-01-13 00:00:00'),
('2021-01-20 00:00:00')
;
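To sanity-check what the join and GROUP BY compute, here is a small Python sketch over the same sample data. snapshot_sums is a hypothetical helper; it uses the same strict comparisons as the query, so a date equal to a start or end date is not counted.

```python
from datetime import datetime

snap = [
    (datetime(2021, 1, 1), datetime(2021, 1, 10), 2),
    (datetime(2021, 1, 8), datetime(2021, 1, 15), 5),
]
dates = [
    datetime(2021, 1, 2),
    datetime(2021, 1, 9),
    datetime(2021, 1, 13),
    datetime(2021, 1, 20),
]

def snapshot_sums(dates, snap):
    """For each snapshot date, sum the amounts of all rows whose range contains it."""
    return {
        d: sum(amount for start, end, amount in snap if start < d < end)
        for d in dates
    }

sums = snapshot_sums(dates, snap)
for d, total in sums.items():
    print(d.isoformat(), total)
```

A date that falls in no range gets 0 because sum() of an empty sequence is 0, which is the role coalesce plays in the SQL.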

How do I add number of days from original dates in new column

Hopefully a quick one on BigQuery
I've tried intervals and days but can't quite seem to get what I want. For each row of the example table below I want an adjacent value in a new column that just adds 42 days to the original date and time (the time is always 00:00:00, if that helps).
Desired output below:
original_datetime | date_time_plus_42_days
2016-04-01T00:00:00 | plus 42 days to left column
2016-05-04T00:00:00 | plus 42 days to left column
2018-05-17T00:00:00 | plus 42 days to left column
2019-09-01T00:00:00 | plus 42 days to left column
2016-04-01T00:00:00 | plus 42 days to left column
Consider also the approach below, which uses the interval data type explicitly:
select original_datetime,
original_datetime + interval 42 day as date_time_plus_42_days
from your_table
It can be applied to the sample data in your question like this:
with your_table as (
select datetime '2016-04-01T00:00:00' original_datetime union all
select '2016-05-04T00:00:00' union all
select '2018-05-17T00:00:00' union all
select '2019-09-01T00:00:00' union all
select '2016-04-01T00:00:00'
)
The benefit of using the interval data type is that you can add multiple units in one shot, for example not just days but also hours, as in the example below:
select original_datetime,
original_datetime + make_interval(day => 42, hour => 5) as date_time_plus_42_days
from your_table
The function you are looking for is called DATETIME_ADD. It is documented in the BigQuery datetime functions reference.
For instance:
WITH table AS (
SELECT DATETIME("2016-04-01T00:00:00") AS datetime)
SELECT
datetime,
DATETIME_ADD(datetime, INTERVAL 42 DAY) as datetime_plus_42
FROM table;
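As a quick cross-check of the values to expect, the same arithmetic can be done outside BigQuery. This is only a Python sketch with the sample datetimes from the question; plus_42_days is a made-up name.

```python
from datetime import datetime, timedelta

original_datetimes = [
    datetime(2016, 4, 1),
    datetime(2016, 5, 4),
    datetime(2018, 5, 17),
    datetime(2019, 9, 1),
    datetime(2016, 4, 1),
]

# Adding a 42-day timedelta plays the role of "+ interval 42 day" / DATETIME_ADD.
plus_42_days = [dt + timedelta(days=42) for dt in original_datetimes]

for original, shifted in zip(original_datetimes, plus_42_days):
    print(original.isoformat(), shifted.isoformat())
```

The first row, 2016-04-01T00:00:00, becomes 2016-05-13T00:00:00.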

SQL - Select query not displaying all dates

I have a table that has a start date, an end date, and the pay period information according to the start and end dates. When I try to find the pay period information with a date range, the very first pay period information does not show in the result.
For example, when I run the following query:
select *
FROM PayPeriod
where start_date between '2020-12-01' and '2020-12-21'
I should see the following result:
Start_date End_date Pay_period
2020-11-22 2020-12-05 2020-12-wk1
2020-12-06 2020-12-19 2020-12-wk3
2020-12-20 2021-01-02 2021-01-wk1
Instead, I get:
Start_date End_date Pay_period
2020-12-06 2020-12-19 2020-12-wk3
2020-12-20 2021-01-02 2021-01-wk1
The date range and the pay period that includes '2020-12-01' is omitted. Why isn't it showing, and how do I correct this?
I think I see what you want: you need to intersect two time intervals.
To find intersecting intervals of two tables (say TableA and TableB; I use tables as the more general case, to distinguish the two intervals by their role), compare the start date of each one with the end date of the other:
TableA.start_date < TableB.end_date
and TableB.start_date < TableA.end_date
This rule applies when your intervals are continuous, i.e. the end_date of one period is "just before" (as with real numbers) the start_date of the next, so every item in an interval satisfies start_date <= item_date < end_date. For discrete intervals (like days, where a one-day period has identical start_date and end_date values) use <= in the intersection condition.
So, your query will look like
DECLARE @period_from date = CONVERT(date, '2020-12-01', 23),
        @period_to date = CONVERT(date, '2020-12-21', 23);
select *
FROM PayPeriod
where start_date < @period_to /*or <= depending on inclusion of end_date*/
and @period_from < end_date /*or <= depending on inclusion of end_date*/
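The intersection rule can be checked against the three pay periods from the question. The sketch below is Python rather than SQL, periods_overlapping is a hypothetical helper, and it assumes the day-granular case, so it uses the inclusive <= form.

```python
from datetime import date

pay_periods = [
    (date(2020, 11, 22), date(2020, 12, 5), "2020-12-wk1"),
    (date(2020, 12, 6), date(2020, 12, 19), "2020-12-wk3"),
    (date(2020, 12, 20), date(2021, 1, 2), "2021-01-wk1"),
]

def periods_overlapping(periods, period_from, period_to):
    """Keep the periods that share at least one day with [period_from, period_to]."""
    return [
        p for p in periods
        if p[0] <= period_to and period_from <= p[1]  # each start vs. the other end
    ]

hits = periods_overlapping(pay_periods, date(2020, 12, 1), date(2020, 12, 21))
for start_date, end_date, pay_period in hits:
    print(start_date, end_date, pay_period)
```

All three periods are returned, including the one that started on 2020-11-22, because it ends after 2020-12-01.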
The query returns exactly what you asked for; it is working correctly. In your query you put the date range condition on start_date:
where start_date between '2020-12-01' and '2020-12-21'
The first row that you expect has start_date = 2020-11-22. This date is not in the range you specified in the condition.
If you want the first row in the result set, you need to change the condition.
If you have to put the condition on the start date, you have to make the date range in the condition wider. For example:
SELECT *
FROM PayPeriod
WHERE start_date between '2020-11-21' and '2020-12-21'
Adjust the condition to match your application's requirements.

How to find first free time in reservations table in PostgreSql

Reservation table contains reservations start dates, start hours and durations.
Start hour is by half hour increments in working hours 8:00 .. 18:00 in work days.
Duration is also by half hour increments in day.
CREATE TABLE reservation (
startdate date not null, -- start date
starthour numeric(4,1) not null , -- start hour 8 8.5 9 9.5 .. 16.5 17 17.5
duration Numeric(3,1) not null, -- duration by hours 0.5 1 1.5 .. 9 9.5 10
primary key (startdate, starthour)
);
The table structure can be changed if required.
How can I find the first free half hour in the table that is not reserved?
E.g. if the table contains
startdate starthour duration
14 9 1 -- ends at 9:59
14 10 1.5 -- ends at 11:29, i.e. there is a 30 minute gap before the next one
14 12 2
14 16 2.5
result should be:
starthour duration
11.5 0.5
A PostgreSQL 9.2 window function should probably be used to find
the first row whose starthour is greater than the previous row's starthour + duration.
How do I write a select statement that returns this information?
Postgres 9.2 has range types and I would recommend using them.
create table reservation (reservation tsrange);
insert into reservation values
('[2012-11-14 09:00:00,2012-11-14 10:00:00)'),
('[2012-11-14 10:00:00,2012-11-14 11:30:00)'),
('[2012-11-14 12:00:00,2012-11-14 14:00:00)'),
('[2012-11-14 16:00:00,2012-11-14 18:30:00)');
ALTER TABLE reservation ADD EXCLUDE USING gist (reservation WITH &&);
"EXCLUDE USING gist" creates an index that prevents overlapping entries from being inserted. You can use the following query to find gaps (a variant of vyegorov's query):
with gaps as (
select
upper(reservation) as start,
lead(lower(reservation),1,upper(reservation)) over (ORDER BY reservation) - upper(reservation) as gap
from (
select *
from reservation
union all values
('[2012-11-14 00:00:00, 2012-11-14 08:00:00)'::tsrange),
('[2012-11-14 18:00:00, 2012-11-15 00:00:00)'::tsrange)
) as x
)
select * from gaps where gap > '0'::interval;
'union all values' masks out the non-working hours, so reservations can only be made between 8:00 and 18:00.
Here is the result:
start | gap
---------------------+----------
2012-11-14 08:00:00 | 01:00:00
2012-11-14 11:30:00 | 00:30:00
2012-11-14 14:00:00 | 02:00:00
Documentation links:
- http://www.postgresql.org/docs/9.2/static/rangetypes.html "Range Types"
- https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf
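The gap search can also be sketched outside the database: walk the reservations in start order and report every stretch where the next reservation starts after the latest end seen so far. The Python below uses the four reservations from the insert statement; find_gaps and the 08:00 to 18:00 working day are assumptions taken from the question, not from any library.

```python
from datetime import datetime

def find_gaps(reservations, day_start, day_end):
    """Return (start, length) for each free slot between day_start and day_end."""
    gaps = []
    cursor = day_start  # end of the occupied time seen so far
    for start, end in sorted(reservations):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, end)
    if cursor < day_end:  # free time left after the last reservation
        gaps.append((cursor, day_end - cursor))
    return gaps

reservations = [
    (datetime(2012, 11, 14, 9, 0), datetime(2012, 11, 14, 10, 0)),
    (datetime(2012, 11, 14, 10, 0), datetime(2012, 11, 14, 11, 30)),
    (datetime(2012, 11, 14, 12, 0), datetime(2012, 11, 14, 14, 0)),
    (datetime(2012, 11, 14, 16, 0), datetime(2012, 11, 14, 18, 30)),
]
gaps = find_gaps(
    reservations,
    datetime(2012, 11, 14, 8, 0),
    datetime(2012, 11, 14, 18, 0),
)
for start, length in gaps:
    print(start, length)
```

The first element of gaps is the first free slot of the day.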
Maybe not the best query, but it does what you want:
WITH
times AS (
SELECT startdate sdate,
startdate + (floor(starthour)||'h '||
((starthour-floor(starthour))*60)||'min')::interval shour,
startdate + (floor(starthour)||'h '||
((starthour-floor(starthour))*60)||'min')::interval
+ (floor(duration)||'h '||
((duration-floor(duration))*60)||'min')::interval ehour
FROM reservation),
gaps AS (
SELECT sdate,shour,ehour,lead(shour,1,ehour)
OVER (PARTITION BY sdate ORDER BY shour) - ehour as gap
FROM times)
SELECT * FROM gaps WHERE gap > '0'::interval;
Some notes:
It would be better not to separate the time and the date of the event. If you have to, then use standard types;
If it is not possible to use standard types, create a function that converts numeric hours into a time format.
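To illustrate the conversion the second note refers to, here is a minimal Python sketch that turns a start date, a numeric start hour and a numeric duration into start and end timestamps. to_timestamps is a hypothetical name; inside the database this would be a SQL function instead.

```python
from datetime import date, datetime, timedelta

def to_timestamps(startdate, starthour, duration):
    """Convert (date, numeric start hour, numeric duration in hours) to (start, end) timestamps."""
    midnight = datetime.combine(startdate, datetime.min.time())
    start = midnight + timedelta(hours=float(starthour))  # 9.5 -> 09:30
    end = start + timedelta(hours=float(duration))
    return start, end

start, end = to_timestamps(date(2012, 11, 14), 10, 1.5)
print(start, end)
```

With real timestamps in hand, the gap between one reservation's end and the next one's start is a plain subtraction.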