Splitting an interval that spans multiple days in PostgreSQL - sql

I have a PostgreSQL table containing a start timestamp and a duration.
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00
2018-01-04 09:00:00 | 2 days 16 hours
What I would like is to have the interval split at day boundaries, like this:
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 01:00:00
2018-01-03 00:00:00 | 03:00:00
2018-01-04 09:00:00 | 15:00:00
2018-01-05 00:00:00 | 24:00:00
2018-01-06 00:00:00 | 24:00:00
2018-01-07 00:00:00 | 01:00:00
I have been playing with generate_series(), width_bucket(), and the range functions, but I still can't find a workable solution. Is there an existing or working solution?

Not sure about all edge cases, but this seems to work:
t=# with c as (
      select *, min(t) over (), max(t+i) over (), tsrange(date_trunc('day',t), t+i) tr
      from t
    )
    , mid as (
      select distinct t, i, g, tr
           , case when g < t then t else g end tt
      from c
      right outer join (
        select generate_series(date_trunc('day',min), date_trunc('day',max), '1 day') g from c
      ) e on g <# tr
      order by 3,1
    )
    select tt
         , i
         , case when tt + '1 day' > upper(tr) and t < g then upper(tr)::time::interval
                when upper(tr) - lower(tr) < '1 day' then i
                else g + '1 day' - tt
           end
    from mid
    order by tt;
tt | i | case
---------------------+-----------------+----------
2018-01-01 15:00:00 | 06:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00 | 01:00:00
2018-01-03 00:00:00 | 04:00:00 | 03:00:00
2018-01-04 09:00:00 | 2 days 16:00:00 | 15:00:00
2018-01-05 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-06 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-07 00:00:00 | 2 days 16:00:00 | 01:00:00
(7 rows)
Also, please keep in mind that timestamp without time zone can trip you up when comparing timestamps...
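For what it's worth, a shorter sketch of the same idea: generate one row per calendar day that the interval touches, then clamp the boundaries with GREATEST/LEAST. It assumes the same table t(t timestamp, i interval) used above, and it is untested against edge cases such as zero-length intervals:

SELECT GREATEST(t.t, d.day_start) AS ts_split,
       LEAST(t.t + t.i, d.day_start + interval '1 day')
         - GREATEST(t.t, d.day_start) AS i_split
FROM t
CROSS JOIN LATERAL generate_series(
         date_trunc('day', t.t),
         date_trunc('day', t.t + t.i),
         interval '1 day') AS d(day_start)
-- drop the zero-length tail that appears when an interval ends exactly at midnight
WHERE LEAST(t.t + t.i, d.day_start + interval '1 day') > GREATEST(t.t, d.day_start)
ORDER BY 1;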

Related

SQL - Aggregate Timeseries Table (HourOfDay, Val) to Average Value of HourOfDay by Weekday (e.g. Avg of Mondays 10:00-11:00, 11:00-12:00, ..., Tue...)

So far I have written an SQL query that gives me a table containing the number of customers handled for each hour of the day, given an arbitrary start and end datetime value (from the Grafana interface). The result might span many weeks. My goal is to build an hourly heatmap by weekday with averaged values.
How do I aggregate those customers per hour to show the average value of each hour per weekday?
So let's say I have 24 values per day over 19 days. How do I aggregate so I get 24 values for each of Mon, Tue, Wed, Thu, Fri, Sat, Sun, each hour representing the average value across those days?
Also, only data from full weeks should be used, so leading and trailing days that are not part of a fully represented week should be stripped (so the same number of each weekday contributes to the average).
Here is a segment on how the return of my SQL query looks so far. (hour of each day, number of customers):
...
2021-12-13 11:00:00 | 0
2021-12-13 12:00:00 | 3
2021-12-13 13:00:00 | 4
2021-12-13 14:00:00 | 4
2021-12-13 15:00:00 | 7
2021-12-13 16:00:00 | 17
2021-12-13 17:00:00 | 12
2021-12-13 18:00:00 | 18
2021-12-13 19:00:00 | 15
2021-12-13 20:00:00 | 8
2021-12-13 21:00:00 | 10
2021-12-13 22:00:00 | 1
2021-12-13 23:00:00 | 0
2021-12-14 00:00:00 | 0
2021-12-14 01:00:00 | 0
2021-12-14 02:00:00 | 0
2021-12-14 03:00:00 | 0
2021-12-14 04:00:00 | 0
2021-12-14 05:00:00 | 0
2021-12-14 06:00:00 | 0
2021-12-14 07:00:00 | 0
2021-12-14 08:00:00 | 0
2021-12-14 09:00:00 | 0
2021-12-14 10:00:00 | 12
2021-12-14 11:00:00 | 12
2021-12-14 12:00:00 | 19
2021-12-14 13:00:00 | 11
2021-12-14 14:00:00 | 11
2021-12-14 15:00:00 | 12
2021-12-14 16:00:00 | 9
2021-12-14 17:00:00 | 2
...
So (schematically, example data) startDate 2021-12-10 11:00 to endDate 2021-12-31 17:00
-------------------------------
...
Mon 2021-12-13 12:00 | 3
Mon 2021-12-13 13:00 | 4
Mon 2021-12-13 14:00 | 4
...
Mon 2021-12-20 12:00 | 1
Mon 2021-12-20 13:00 | 6
Mon 2021-12-20 14:00 | 2
...
Mon 2021-12-27 12:00 | 2
Mon 2021-12-27 13:00 | 2
Mon 2021-12-27 14:00 | 3
...
-------------------------------
into this:
strip leading Fri 10., Sat 11., Sun 12.
strip trailing Tue 28., Wed 29., Thu 30., Fri 31.
average hours per weekday
-------------------------------
...
Mon 12:00 | 2
Mon 13:00 | 4
Mon 14:00 | 3
...
Tue 12:00 | x
Tue 13:00 | y
Tue 14:00 | z
...
-------------------------------
My approach so far:
WITH CustomersPerHour AS (
    SELECT dateadd(hour, datediff(hour, 0, Systemdatum), 0) AS DayHour, COUNT(*) AS C
    FROM CustomerList
    WHERE CustomerID > 0
      AND Datum BETWEEN '2021-12-10T11:00:00Z' AND '2021-12-31T17:00:00Z'
      AND EntryID IN (62, 65)
      AND CustomerID IN (SELECT * FROM udf_getActiveUsers())
    GROUP BY dateadd(hour, datediff(hour, 0, Systemdatum), 0)
)
-- add null values on missing data/insert missing hours
SELECT DATEDIFF(second, '1970-01-01', dt.Date) AS time, C AS Customers
FROM dbo.udf_generateHoursTable('2021-12-03T18:14:56Z', '2022-03-13T18:14:56Z') AS dt
LEFT JOIN CustomersPerHour cPh ON dt.Date = cPh.DayHour
ORDER BY time ASC
Hi, the simplest solution is just to do what you have written in your example: create a custom base for the aggregation.
So the first step is to prepare your data in an aggregated table with date & hour precision and the customer count.
Then create the base.
This is an example of the basic idea:
-- EXAMPLE
SELECT
DATENAME(WEEKDAY, GETDATE()) + ' ' + CAST(DATEPART(HOUR, GETDATE()) AS varchar(8)) + ':00'
-- OUTPUT: Sunday 21:00
You can concatenate the data and then use it in the GROUP BY clause.
Adjust this query for your use case:
SELECT
DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00' as base
,SUM(...) as sum_of_whatever
,AVG(...) as avg_of_whatever
FROM <YOUR_AGG_TABLE>
GROUP BY DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00'
This creates the base exactly as you wanted.
You can use this logic to create other desired aggregation bases.
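To sketch how that base plugs into your own query: keep the CustomersPerHour CTE, fill the missing hours with 0 via the hours helper you already use, and then average per weekday and hour. The date bounds below are placeholders standing in for the trimmed full-week range (Mon 2021-12-13 through Sun 2021-12-26 in your example), so treat this as an untested sketch:

WITH CustomersPerHour AS (
    SELECT dateadd(hour, datediff(hour, 0, Systemdatum), 0) AS DayHour, COUNT(*) AS C
    FROM CustomerList
    WHERE CustomerID > 0
      AND Datum BETWEEN '2021-12-13T00:00:00Z' AND '2021-12-26T23:59:59Z'  -- full weeks only (placeholder bounds)
      AND EntryID IN (62, 65)
      AND CustomerID IN (SELECT * FROM udf_getActiveUsers())
    GROUP BY dateadd(hour, datediff(hour, 0, Systemdatum), 0)
), FilledHours AS (
    -- insert missing hours as 0 so empty hours pull the average down
    SELECT dt.Date AS DayHour, COALESCE(cPh.C, 0) AS C
    FROM dbo.udf_generateHoursTable('2021-12-13T00:00:00Z', '2021-12-26T23:00:00Z') AS dt
    LEFT JOIN CustomersPerHour cPh ON dt.Date = cPh.DayHour
)
SELECT DATENAME(WEEKDAY, DayHour) AS weekday,
       DATEPART(HOUR, DayHour)    AS hourOfDay,
       AVG(CAST(C AS float))      AS avgCustomers
FROM FilledHours
GROUP BY DATENAME(WEEKDAY, DayHour), DATEPART(HOUR, DayHour)
ORDER BY MIN(DATEPART(WEEKDAY, DayHour)), hourOfDay;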

SQL insert values from previous date if specific date information is missing

I have got the following table.
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
As you can see, the dates 2020-30-05 and 2020-31-05 are missing from this table. So it is necessary to fill these dates with the 2020-29-05 information, grouped by Group. As a result the final output should look like this:
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-30-05 00:00:00 A 68
2020-30-05 00:00:00 B 1.9
2020-30-05 00:00:00 C 1.19
2020-31-05 00:00:00 A 68
2020-31-05 00:00:00 B 1.9
2020-31-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
I tried to do it in the following way:
create a temporary table (table B) with only the dates for the period 2020-28-05 till 2020-03-06, and then use a left merge, so that the new dates come back as null (in order to then use a CASE WHEN null to fill in the last value). However, it does not work, because when merging I get a null only once per date (but each date should appear 3 times, because of the groups). This is only part of a larger dataset; can you help me get the necessary output?
PS: I use Vertica.
It's Vertica. And Vertica has the TIMESERIES clause, which seems to match exactly what you need:
Out of a time series like yours, with irregular intervals between the rows, or with longer gaps in an otherwise regular time series, it creates a regular time series, with the rows spaced by the interval you specify in the AS sub-clause of the TIMESERIES clause itself. TS_FIRST_VALUE() and TS_LAST_VALUE() are functions that rely on that clause and return the value deduced from the input rows at each generated timestamp. That value can be obtained 'const', that is, from the row in the original row set closest to the generated timestamp, or 'linear', that is, interpolated between the original row just before and the original row just after the generated timestamp. For your needs, you would use the constant variant. See here:
WITH
-- your input ....
input(tmstmp,grp,nbr) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
-- real query here ...
SELECT
ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out tmstmp | grp | nbr
-- out ---------------------+-----+-------
-- out 2020-05-28 00:00:00 | A | 55.00
-- out 2020-05-28 00:00:00 | B | 1.09
-- out 2020-05-28 00:00:00 | C | 1.80
-- out 2020-05-29 00:00:00 | A | 68.00
-- out 2020-05-29 00:00:00 | B | 1.90
-- out 2020-05-29 00:00:00 | C | 1.19
-- out 2020-05-30 00:00:00 | A | 68.00
-- out 2020-05-30 00:00:00 | B | 1.90
-- out 2020-05-30 00:00:00 | C | 1.19
-- out 2020-05-31 00:00:00 | A | 68.00
-- out 2020-05-31 00:00:00 | B | 1.90
-- out 2020-05-31 00:00:00 | C | 1.19
-- out 2020-06-01 00:00:00 | A | 10.00
-- out 2020-06-01 00:00:00 | B | 15.00
-- out 2020-06-01 00:00:00 | C | 0.88
-- out 2020-06-02 00:00:00 | A | 22.00
-- out 2020-06-02 00:00:00 | B | 15.00
-- out 2020-06-02 00:00:00 | C | 13.00
-- out 2020-06-03 00:00:00 | A | 66.00
-- out 2020-06-03 00:00:00 | B | 88.00
-- out 2020-06-03 00:00:00 | C | 99.00
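For what it's worth, your original calendar-plus-carry-forward idea can also be made to work; the missing step is to cross join the calendar with the distinct groups so that every (date, group) pair exists before the left join. A rough, untested sketch, where 'input' is the same sample relation as in the query above and cal(d) is a hypothetical one-row-per-day calendar table like the one you described (I believe Vertica's LAST_VALUE accepts IGNORE NULLS):

SELECT
    cal.d AS tmstmp
  , g.grp
  , LAST_VALUE(t.nbr IGNORE NULLS)
      OVER (PARTITION BY g.grp ORDER BY cal.d) AS nbr
FROM cal
CROSS JOIN (SELECT DISTINCT grp FROM input) g   -- one calendar row per group: the step that was missing
LEFT JOIN input t
       ON t.tmstmp::DATE = cal.d
      AND t.grp = g.grp
ORDER BY 1, 2;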

Oracle SQL List Intervals

I need to create new interval rows based on a start datetime column and an end datetime column.
My statement looks like this currently
select id,
startdatetime,
enddatetime
from calls
The result looks like this:
id startdatetime enddatetime
1 01/01/2020 00:00:00 01/01/2020 04:00:00
I would like a result like this
id startdatetime enddatetime Intervals
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 03:00:00
Thanking you in advance
p.s. I'm new to SQL
You can use a recursive sub-query factoring clause to loop and incrementally add an hour:
WITH times ( id, startdatetime, enddatetime, intervals ) AS (
SELECT id,
startdatetime,
enddatetime,
startdatetime
FROM calls c
UNION ALL
SELECT id,
startdatetime,
enddatetime,
intervals + INTERVAL '1' HOUR
FROM times
WHERE intervals + INTERVAL '1' HOUR <= enddatetime
)
SELECT *
FROM times;
outputs:
ID | STARTDATETIME | ENDDATETIME | INTERVALS
-: | :------------------ | :------------------ | :------------------
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 00:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 01:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 02:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 03:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 04:00:00
db<>fiddle here
You can use a hierarchical query as follows:
WITH CALLS (ID, STARTDATETIME, ENDDATETIME)
  AS ( SELECT 1,
              TO_DATE('01/01/2020 00:00:00', 'dd/mm/rrrr hh24:mi:ss'),
              TO_DATE('01/01/2020 04:00:00', 'dd/mm/rrrr hh24:mi:ss')
         FROM DUAL)
-- Your query starts from here
SELECT
    ID,
    STARTDATETIME,
    ENDDATETIME,
    STARTDATETIME + ( COLUMN_VALUE / 24 ) AS INTERVALS
FROM
    CALLS C
    CROSS JOIN TABLE ( CAST(MULTISET(
        SELECT LEVEL - 1
        FROM DUAL
        CONNECT BY LEVEL <= TRUNC(24 * (ENDDATETIME - STARTDATETIME))
    ) AS SYS.ODCINUMBERLIST) )
ORDER BY INTERVALS;
ID STARTDATETIME ENDDATETIME INTERVALS
---------- ------------------- ------------------- -------------------
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 03:00:00
Cheers!!

How to generate series for date range with minutes interval in oracle?

In Postgres, the query below works using the generate_series function:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
The query below also works in Oracle, but only with a step of whole days:
select to_date('2019-03-01','YYYY-MM-DD') + rownum -1 as dates
from all_objects
where rownum <= to_date('2019-03-06','YYYY-MM-DD')-to_date('2019-03-01','YYYY-MM-DD')+1
I want the same result in Oracle for the query below:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
Use a hierarchical query:
SELECT DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE AS dates
FROM DUAL
CONNECT BY DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE <= DATE '2019-04-01';
Output:
| DATES |
| :------------------ |
| 2019-03-01 00:00:00 |
| 2019-03-01 00:30:00 |
| 2019-03-01 01:00:00 |
| 2019-03-01 01:30:00 |
| 2019-03-01 02:00:00 |
| 2019-03-01 02:30:00 |
| 2019-03-01 03:00:00 |
| 2019-03-01 03:30:00 |
| 2019-03-01 04:00:00 |
| 2019-03-01 04:30:00 |
| 2019-03-01 05:00:00 |
| 2019-03-01 05:30:00 |
...
| 2019-03-31 19:30:00 |
| 2019-03-31 20:00:00 |
| 2019-03-31 20:30:00 |
| 2019-03-31 21:00:00 |
| 2019-03-31 21:30:00 |
| 2019-03-31 22:00:00 |
| 2019-03-31 22:30:00 |
| 2019-03-31 23:00:00 |
| 2019-03-31 23:30:00 |
| 2019-04-01 00:00:00 |
db<>fiddle here
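If you prefer the recursive style over CONNECT BY, a recursive WITH clause (available since Oracle 11gR2) should give the same series; a minimal, untested sketch:

WITH dates (d) AS (
  SELECT TIMESTAMP '2019-03-01 00:00:00' FROM DUAL
  UNION ALL
  SELECT d + INTERVAL '30' MINUTE
  FROM dates
  WHERE d + INTERVAL '30' MINUTE <= TIMESTAMP '2019-04-01 00:00:00'
)
SELECT d AS dates
FROM dates;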

SQL query hourly for each day

I have a question that seems quite complex. I need to know what happens in a session on a given day, at a certain time.
Briefly, I have a table that shows all sessions for a given area. Each session has a start date, a start time, and an end time.
You can see in this table:
idArea | idSession | startDate | startTime | endTime
1 | 1 | 2013-01-01 | 1900-01-01 09:00:00 | 1900-01-01 12:00:00
1 | 2 | 2013-01-01 | 1900-01-01 14:00:00 | 1900-01-01 15:00:00
1 | 3 | 2013-01-04 | 1900-01-01 09:00:00 | 1900-01-01 13:00:00
1 | 4 | 2013-01-07 | 1900-01-01 10:00:00 | 1900-01-01 12:00:00
1 | 5 | 2013-01-07 | 1900-01-01 13:00:00 | 1900-01-01 18:00:00
1 | 6 | 2013-01-08 | 1900-01-01 10:00:00 | 1900-01-01 12:00:00
Then I also have a table that lists all times of day, i.e. every half hour (I created this table specifically for this requirement; if someone has a better idea, I will try to adapt).
idHour | Hour
1 | 1900-01-01 00:00:00
2 | 1900-01-01 00:30:00
3 | 1900-01-01 01:00:00
............................
4 | 1900-01-01 09:00:00
5 | 1900-01-01 09:30:00
6 | 1900-01-01 10:00:00
7 | 1900-01-01 10:30:00
............................
In the end, what I want to present is this:
startDate | startTime | SessionID
2013-01-01 | 1900-01-01 09:00:00 | 1
2013-01-01 | 1900-01-01 09:30:00 | 1
2013-01-01 | 1900-01-01 10:00:00 | 1
2013-01-01 | 1900-01-01 10:30:00 | 1
2013-01-01 | 1900-01-01 11:00:00 | 1
2013-01-01 | 1900-01-01 11:30:00 | 1
2013-01-01 | 1900-01-01 12:00:00 | 1
2013-01-01 | 1900-01-01 14:00:00 | 1
2013-01-01 | 1900-01-01 14:30:00 | 1
2013-01-01 | 1900-01-01 15:00:00 | 1
This table is only for idSession=1; what I want is the same for all sessions. If there are no sessions for a given day, it can return NULL.
The hard part of this query or procedure is that it has to show me all the days of the month in which there are sessions for that area.
For this, I already used this query:
;WITH t1 AS
(
SELECT
startDate,
DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate), '1900-01-01') firstInMonth,
DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate) + 1, '1900-01-01')) lastInMonth,
COUNT(*) cnt
FROM
#SessionsPerArea
WHERE
idArea = 1
GROUP BY
startDate
), calendar AS
(
SELECT DISTINCT
DATEADD(DAY, c.number, t1.firstInMonth) d
FROM
t1
JOIN
master..spt_values c ON type = 'P'
AND DATEADD(DAY, c.number, t1.firstInMonth) BETWEEN t1.firstInMonth AND t1.lastInMonth
)
SELECT
d date,
cnt Session
FROM
calendar c
LEFT JOIN
t1 ON t1.startDate = c.d
It is quite complex; if anyone has an easier way to do this, that would be excellent.
If I understand correctly, this is simply a join between the calendar table and #SessionsPerArea, with the right conditions:
select spa.StartDate, c.hour as StartTime, spa.idSession as SessionId
from calendar c join
#SessionsPerArea spa
on c.hour between spa.startTime and spa.EndTime
The join is matching all times between the start and end times in the data, and then returning the values.
I think maybe you simply need an outer join between calendar and #SessionsPerArea...so all the days in the calendar table are returned regardless of a match to the #SessionsPerArea table?
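A minimal sketch of that outer-join variant, reusing the column names from the query above (untested; the idArea filter is only an assumed placeholder):

select spa.StartDate, c.hour as StartTime, spa.idSession as SessionId
from calendar c
left join #SessionsPerArea spa
  on c.hour between spa.startTime and spa.EndTime
 and spa.idArea = 1        -- assumed filter; adjust or drop as needed
order by spa.StartDate, c.hour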