GROUP BY data by time range in PostgreSQL

I want to group data by time range. For example, I have a start_date and an end_date, and I want
to split the range between them into 25 sub-ranges and get the sum of value for each of the 25.
A simplified view of my table:
select * from t1
where time between start_date and end_date
Table t1 contains:
time 2019-10-01 value 50
time 2019-10-01 value 50
time 2019-10-02 value 50
time 2019-10-02 value 50
time 2019-10-02 value 50
time 2019-10-02 value 50
time 2019-10-03 value 50
time 2019-10-04 value 50
time 2019-10-05 value 50
time 2019-10-05 value 50
time 2019-10-05 value 50
start_date 2019-10-01
end_date 2019-10-25
I want to use the generate_series function to split the range into:
2019-10-01
2019-10-02
2019-10-03
2019-10-04
2019-10-05
2019-10-06
2019-10-07
2019-10-08
2019-10-09
2019-10-10
2019-10-11
2019-10-12
2019-10-13
2019-10-14
2019-10-15
2019-10-16
2019-10-17
2019-10-18
2019-10-19
2019-10-20
2019-10-21
2019-10-22
2019-10-23
2019-10-24
2019-10-25
and then sum value for each of these 25 days, so that:
2019-10-01 has value 100
2019-10-02 has value 200

I am going to recommend a lateral join:
select d.dt, s.total_value
from generate_series(date '2019-10-01', date '2019-10-25', interval '1' day) d(dt)
left join lateral (
    -- one small aggregate per generated day; t1 is the table from the question
    select coalesce(sum(value), 0) as total_value
    from t1
    where t1.time >= d.dt
      and t1.time < d.dt + interval '1' day
) s on true;
A lateral join can have better performance than overall aggregation, particularly with an index on (time, value).
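A minimal sketch of the index mentioned above, assuming the table is t1 as in the question. The index name t1_time_value_idx is made up for illustration, and "time" is quoted only to keep it from being read as a type name:

```sql
-- Hypothetical index name; columns are the ones the lateral subquery filters on and sums.
CREATE INDEX t1_time_value_idx ON t1 ("time", value);
```

With both columns in the index, each per-day subquery may be answered from the index alone, though whether the planner does so depends on the data and on table statistics.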

I understand that you want to generate a list of days, and compute the sum of a column for each:
select d.dt, coalesce(sum(value), 0) total_value
from
generate_series(date'2019-10-01', date'2019-10-25', interval '1' day) as d(dt)
left join mytable t
on t.time >= d.dt
and t.time < d.dt + interval '1' day
group by d.dt
order by d.dt
On dates for which no record is available in your table, total_value will display 0.
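If the 25 ranges are meant as 25 equal-width buckets between arbitrary bounds rather than calendar days, the same left join can be driven by computed bucket boundaries. This is only a sketch: the params and bounds CTEs, the exclusive upper bound, and the output column names are assumptions, not part of the question.

```sql
-- Sketch: split [start_ts, end_ts) into a fixed number of equal-width buckets.
WITH params AS (
    SELECT timestamp '2019-10-01' AS start_ts,
           timestamp '2019-10-26' AS end_ts,    -- exclusive upper bound (assumption)
           25                     AS buckets
),
bounds AS (
    SELECT start_ts + (end_ts - start_ts) / buckets * n       AS lo,
           start_ts + (end_ts - start_ts) / buckets * (n + 1) AS hi
    FROM params, generate_series(0, buckets - 1) AS n
)
SELECT b.lo, b.hi, coalesce(sum(t.value), 0) AS total_value
FROM bounds b
LEFT JOIN t1 t
       ON t."time" >= b.lo
      AND t."time" <  b.hi                      -- half-open bucket [lo, hi)
GROUP BY b.lo, b.hi
ORDER BY b.lo;
```

With these bounds each bucket is exactly one day wide, so the result should match the daily query; changing buckets or the bounds changes the width without touching the join.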

Assuming start_date and end_date are variables, you might want to try the following CTE. It sums value for each generated day. To replace null sums with 0, use coalesce, as pointed out by @GMB in the other answer.
WITH j AS (
SELECT generate_series(DATE '2019-10-01', DATE '2019-10-25', '1 day') AS day)
SELECT j.day, coalesce(sum(value), 0) FROM t1
RIGHT JOIN j ON j.day = time
GROUP BY j.day ORDER BY j.day;
day | coalesce
------------------------+----------
2019-10-01 00:00:00+02 | 100
2019-10-02 00:00:00+02 | 200
2019-10-03 00:00:00+02 | 50
2019-10-04 00:00:00+02 | 50
2019-10-05 00:00:00+02 | 150
2019-10-06 00:00:00+02 | 0
2019-10-07 00:00:00+02 | 0
2019-10-08 00:00:00+02 | 0
2019-10-09 00:00:00+02 | 0
2019-10-10 00:00:00+02 | 0
2019-10-11 00:00:00+02 | 0
2019-10-12 00:00:00+02 | 0
2019-10-13 00:00:00+02 | 0
2019-10-14 00:00:00+02 | 0
2019-10-15 00:00:00+02 | 0
2019-10-16 00:00:00+02 | 0
2019-10-17 00:00:00+02 | 0
2019-10-18 00:00:00+02 | 0
2019-10-19 00:00:00+02 | 0
2019-10-20 00:00:00+02 | 0
2019-10-21 00:00:00+02 | 0
2019-10-22 00:00:00+02 | 0
2019-10-23 00:00:00+02 | 0
2019-10-24 00:00:00+02 | 0
2019-10-25 00:00:00+02 | 0
(25 rows)
EDIT (see comments below):
The same series with a 12-hour interval between the generated elements. Note that a DATE literal discards the time-of-day part, so the series below starts at midnight:
WITH j AS (
SELECT generate_series(DATE '2019-10-01 01:30:00',
DATE '2019-10-03 12:30:00', '12 hours') AS day)
SELECT j.day, coalesce(sum(value),0) FROM t1
RIGHT JOIN j ON j.day = time
GROUP BY j.day ORDER BY j.day;
day | coalesce
------------------------+----------
2019-10-01 00:00:00+02 | 100
2019-10-01 12:00:00+02 | 0
2019-10-02 00:00:00+02 | 200
2019-10-02 12:00:00+02 | 0
2019-10-03 00:00:00+02 | 50
(5 rows)
You can change the parameters inside of the generate_series function as you wish, e.g. 30 minutes, 1 hour, etc.
The same can be done with TIMESTAMP, but the generated values you join on must match the timestamps in your table exactly:
WITH j AS (
SELECT generate_series(TIMESTAMP '2019-10-01 00:00:00',
TIMESTAMP '2019-10-05 12:30:00', '8 hours') AS day)
SELECT j.day, coalesce(sum(value),0) FROM t1
RIGHT JOIN j ON j.day = time
GROUP BY j.day ORDER BY j.day;
day | coalesce
---------------------+----------
2019-10-01 00:00:00 | 100
2019-10-01 08:00:00 | 0
2019-10-01 16:00:00 | 0
2019-10-02 00:00:00 | 200
2019-10-02 08:00:00 | 0
2019-10-02 16:00:00 | 0
2019-10-03 00:00:00 | 50
2019-10-03 08:00:00 | 0
2019-10-03 16:00:00 | 0
2019-10-04 00:00:00 | 50
2019-10-04 08:00:00 | 0
2019-10-04 16:00:00 | 0
2019-10-05 00:00:00 | 150
2019-10-05 08:00:00 | 0
(14 rows)
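Because the equality join only matches rows whose timestamp falls exactly on a generated value, rows with any other time of day are silently dropped. A possible variant, sketched here on the assumption that each generated value marks the start of an 8-hour bucket, joins on a half-open range instead (the column names bucket_start and total_value are illustrative):

```sql
WITH j AS (
    SELECT generate_series(TIMESTAMP '2019-10-01 00:00:00',
                           TIMESTAMP '2019-10-05 12:30:00',
                           INTERVAL '8 hours') AS bucket_start)
SELECT j.bucket_start, coalesce(sum(t1.value), 0) AS total_value
FROM j
LEFT JOIN t1
       ON t1."time" >= j.bucket_start
      AND t1."time" <  j.bucket_start + INTERVAL '8 hours'   -- must match the series step
GROUP BY j.bucket_start
ORDER BY j.bucket_start;
```

The step passed to generate_series and the bucket width in the join condition have to be kept in sync by hand.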

Related

Oracle SQL: to count the records based on fixed time frame (say 15 or 30 minutes)

I have a table similar to
Start time | End Time | User |
09/02/2021 03:01:13 | 09/02/2021 03:45:15 | ABC |
09/02/2021 03:15:20 | 09/02/2021 05:03:20 | XYZ |
09/02/2021 06:03:12 | 09/02/2021 06:15:30 | DEF |
Expecting output:
StDt | EndDt | Count(1)
09/02/2021 00:00:00 | 09/02/2021 01:00:00 | 0
09/02/2021 01:00:00 | 09/02/2021 02:00:00 | 0
09/02/2021 02:00:00 | 09/02/2021 03:00:00 | 0
09/02/2021 03:00:00 | 09/02/2021 04:00:00 | 2
09/02/2021 04:00:00 | 09/02/2021 05:00:00 | 1
09/02/2021 05:00:00 | 09/02/2021 06:00:00 | 0
09/02/2021 06:00:00 | 09/02/2021 07:00:00 | 1
The interval in this example is hourly, but I would like to keep it flexible for 10, 15, or 30 minutes.
I want this written as a single SQL statement.
All I could work out so far is how to generate the range:
select t1.StartDt, t1.EndDt from
(
select
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE') - numtodsinterval(60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as StartDt,
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as EndDt
from dual connect by level <= 24
) t1;
I don't know how to link this to the table above to get the data in the format I require.
You have made a good start. Keep the time values as timestamps inside the subquery and move the TO_CHAR formatting to the main query, where the result is displayed. Then use a correlated subquery with a distinct count to find the rows that overlap each interval, and use a bind variable as the placeholder for the interval length in minutes (60, 30, 15), such as:
SQL> var min number
SQL> exec :min := 60
PL/SQL procedure successfully completed
min
---------
60
SQL> SELECT TO_CHAR(t.StartDt,'DD-MM-YYYY HH24:MI') AS StartDt,
            TO_CHAR(t.EndDt,'DD-MM-YYYY HH24:MI') AS EndDt,
            ( SELECT COUNT(DISTINCT "User")
                FROM tab
               WHERE t.EndDt >= Start_Time
                 AND t.StartDt <= End_Time ) AS Count
       FROM
       (
         SELECT timestamp '2021-02-09 00:00:00' +
                numtodsinterval(rownum * :min, 'MINUTE') -
                numtodsinterval(:min, 'MINUTE') AS StartDt,
                timestamp '2021-02-09 00:00:00' +
                numtodsinterval(rownum * :min, 'MINUTE') AS EndDt
           FROM dual
        CONNECT BY level <= 24
       ) t
      ORDER BY StartDt;
STARTDT ENDDT COUNT
---------------- ---------------- ----------
09-02-2021 00:00 09-02-2021 01:00 0
09-02-2021 01:00 09-02-2021 02:00 0
09-02-2021 02:00 09-02-2021 03:00 0
09-02-2021 03:00 09-02-2021 04:00 2
09-02-2021 04:00 09-02-2021 05:00 1
09-02-2021 05:00 09-02-2021 06:00 1
09-02-2021 06:00 09-02-2021 07:00 1
09-02-2021 07:00 09-02-2021 08:00 0
.....
.....
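Note that CONNECT BY level <= 24 always produces 24 buckets, so with :min set to 15 the generated range would cover only the first six hours of the day. A sketch of the bucket generator that derives the bucket count from :min, assuming :min divides 1440 evenly:

```sql
-- Sketch: cover a full day with buckets of :min minutes (assumes 1440 is a multiple of :min).
SELECT TIMESTAMP '2021-02-09 00:00:00'
         + NUMTODSINTERVAL((LEVEL - 1) * :min, 'MINUTE') AS StartDt,
       TIMESTAMP '2021-02-09 00:00:00'
         + NUMTODSINTERVAL(LEVEL * :min, 'MINUTE')       AS EndDt
  FROM dual
CONNECT BY LEVEL <= 1440 / :min   -- 1440 minutes in a day
 ORDER BY StartDt;
```

This can stand in for the inner subquery of the answer; the outer query and the correlated count stay as they are.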

SQL insert values from previous date if specific date information is missing

I have got the following table.
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
As you can see, the dates 2020-30-05 and 2020-31-05 are missing from this table. These dates need to be filled with the 2020-29-05 values, per group. The final output should look like this:
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-30-05 00:00:00 A 68
2020-30-05 00:00:00 B 1.9
2020-30-05 00:00:00 C 1.19
2020-31-05 00:00:00 A 68
2020-31-05 00:00:00 B 1.9
2020-31-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
I tried the following:
create a temporary table (table B) containing only the dates from 2020-28-05 to 2020-03-06, then left join it to my table so that the new dates come out as null (and then use a CASE on null to fill in the last value). However, it does not work: after the join I get a null row only once per missing date, but there should be three rows per date (one per group). This is only part of a larger dataset. How can I get the required output?
PS: I use Vertica.
It's Vertica, and Vertica has the TIMESERIES clause, which seems to match exactly what you need:
Given a time series like yours, with irregular intervals between rows or with longer gaps in an otherwise regular series, it creates a regular time series whose rows are spaced by the interval you specify in the AS sub-clause of the TIMESERIES clause. TS_FIRST_VALUE() and TS_LAST_VALUE() rely on that clause and return the value deduced from the input rows at each generated timestamp. That value can be obtained as 'const', that is, taken from the row in the original row set closest to the generated timestamp, or as 'linear', that is, interpolated between the original row just before and the original row just after the generated timestamp. For your needs, you would use the constant value. See here:
WITH
-- your input ....
input(tmstmp,grp,nbr) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
-- real query here ...
SELECT
ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out tmstmp | grp | nbr
-- out ---------------------+-----+-------
-- out 2020-05-28 00:00:00 | A | 55.00
-- out 2020-05-28 00:00:00 | B | 1.09
-- out 2020-05-28 00:00:00 | C | 1.80
-- out 2020-05-29 00:00:00 | A | 68.00
-- out 2020-05-29 00:00:00 | B | 1.90
-- out 2020-05-29 00:00:00 | C | 1.19
-- out 2020-05-30 00:00:00 | A | 68.00
-- out 2020-05-30 00:00:00 | B | 1.90
-- out 2020-05-30 00:00:00 | C | 1.19
-- out 2020-05-31 00:00:00 | A | 68.00
-- out 2020-05-31 00:00:00 | B | 1.90
-- out 2020-05-31 00:00:00 | C | 1.19
-- out 2020-06-01 00:00:00 | A | 10.00
-- out 2020-06-01 00:00:00 | B | 15.00
-- out 2020-06-01 00:00:00 | C | 0.88
-- out 2020-06-02 00:00:00 | A | 22.00
-- out 2020-06-02 00:00:00 | B | 15.00
-- out 2020-06-02 00:00:00 | C | 13.00
-- out 2020-06-03 00:00:00 | A | 66.00
-- out 2020-06-03 00:00:00 | B | 88.00
-- out 2020-06-03 00:00:00 | C | 99.00
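To see the difference between the two interpolation modes described above, both can be selected side by side. This is an untested sketch on a two-row excerpt of group A; the FLOAT casts and the column names nbr_const and nbr_linear are assumptions added for illustration.

```sql
WITH
input(tmstmp,grp,nbr) AS (
          SELECT TIMESTAMP '2020-05-29 00:00:00','A',68::FLOAT
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10::FLOAT
)
SELECT
  ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const')  AS nbr_const   -- carries the known value across the gap
, TS_FIRST_VALUE(nbr,'linear') AS nbr_linear  -- interpolates between the surrounding rows
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
```

For the gap-filling asked about in the question, the 'const' column is the one that repeats the 2020-05-29 value on the missing days.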

How to compare current row with previous column next row in sql

Date from Date to
2018-12-11 2019-01-08
2019-01-08 2019-02-09
2019-02-10 2019-03-14
2019-03-17 2019-04-11
2019-04-15 2019-05-16
2019-05-16 2019-06-13
The output should look like this:
Date from Date to Days
2018-12-11 2019-01-08 0
2019-01-08 2019-02-09 1
2019-02-10 2019-03-14 3
2019-03-17 2019-04-11 4
2019-04-15 2019-05-16 0
2019-05-16 2019-06-13 -
To return the difference between two date values in days you could use the DATEDIFF() Function, something like:
SELECT DATEDIFF(DAY, DayFrom, DayTo) AS 'DaysBetween'
FROM DateTable
You want lead() and a date diff function:
select
date_from,
date_to,
datediff(day, date_to, lead(date_from) over(order by date_from)) days
from mytable
datediff() is a SQL Server function. There are equivalents in other RDBMS.
Side note: I would recommend against using a string value (-) for records that do not have a next record, since the other values are numeric (the datatypes in a column must be consistent). null is good enough for this (and is what the above query produces).
Demo on DB Fiddle:
date_from | date_to | days
:------------------ | :------------------ | ---:
11/12/2018 00:00:00 | 08/01/2019 00:00:00 | 0
08/01/2019 00:00:00 | 09/02/2019 00:00:00 | 1
10/02/2019 00:00:00 | 14/03/2019 00:00:00 | 3
17/03/2019 00:00:00 | 11/04/2019 00:00:00 | 4
15/04/2019 00:00:00 | 16/05/2019 00:00:00 | 0
16/05/2019 00:00:00 | 13/06/2019 00:00:00 | null
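As one example of an equivalent in another RDBMS, here is a sketch for PostgreSQL, where subtracting one date from another yields the number of days as an integer. The inline mytable CTE holds the first three sample rows and is there only to make the query self-contained.

```sql
WITH mytable(date_from, date_to) AS (
    VALUES (DATE '2018-12-11', DATE '2019-01-08'),
           (DATE '2019-01-08', DATE '2019-02-09'),
           (DATE '2019-02-10', DATE '2019-03-14')
)
SELECT date_from,
       date_to,
       -- date - date gives an integer day count; the last row has no next row, so it is null
       lead(date_from) OVER (ORDER BY date_from) - date_to AS days
FROM mytable
ORDER BY date_from;
```

If the columns are timestamps rather than dates, the subtraction returns an interval instead, so a cast to date may be needed first.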

How to generate series for date range with minutes interval in oracle?

In Postgres, the query below works using the generate_series function:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
The query below also works in Oracle, but only with a whole-day interval:
select to_date('2019-03-01','YYYY-MM-DD') + rownum -1 as dates
from all_objects
where rownum <= to_date('2019-03-06','YYYY-MM-DD')-to_date('2019-03-01','YYYY-MM-DD')+1
I want the same result in Oracle as the query below gives:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
Use a hierarchical query:
SELECT DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE AS dates
FROM DUAL
CONNECT BY DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE <= DATE '2019-04-01';
Output:
| DATES |
| :------------------ |
| 2019-03-01 00:00:00 |
| 2019-03-01 00:30:00 |
| 2019-03-01 01:00:00 |
| 2019-03-01 01:30:00 |
| 2019-03-01 02:00:00 |
| 2019-03-01 02:30:00 |
| 2019-03-01 03:00:00 |
| 2019-03-01 03:30:00 |
| 2019-03-01 04:00:00 |
| 2019-03-01 04:30:00 |
| 2019-03-01 05:00:00 |
| 2019-03-01 05:30:00 |
...
| 2019-03-31 19:30:00 |
| 2019-03-31 20:00:00 |
| 2019-03-31 20:30:00 |
| 2019-03-31 21:00:00 |
| 2019-03-31 21:30:00 |
| 2019-03-31 22:00:00 |
| 2019-03-31 22:30:00 |
| 2019-03-31 23:00:00 |
| 2019-03-31 23:30:00 |
| 2019-04-01 00:00:00 |
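An alternative to the hierarchical query is a recursive subquery-factoring clause, which Oracle supports from 11g Release 2 onwards. This is a sketch of the same 30-minute series, not part of the original answer; the CTE name series is made up.

```sql
WITH series (dates) AS (
    -- anchor row: start of the range
    SELECT TIMESTAMP '2019-03-01 00:00:00' FROM DUAL
    UNION ALL
    -- recursive step: add 30 minutes until the end of the range is reached
    SELECT dates + INTERVAL '30' MINUTE
    FROM series
    WHERE dates + INTERVAL '30' MINUTE <= TIMESTAMP '2019-04-01 00:00:00'
)
SELECT dates
FROM series
ORDER BY dates;
```

The CONNECT BY form is shorter; the recursive form may be easier to port to other databases that support recursive CTEs.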

Splitting interval overlapping more days in PostgreSQL

I have a PostgreSQL table containing start timestamp and duration time.
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00
2018-01-04 09:00:00 | 2 days 16 hours
What I would like is to have the interval split by day, like this:
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 01:00:00
2018-01-03 00:00:00 | 03:00:00
2018-01-04 09:00:00 | 15:00:00
2018-01-05 00:00:00 | 24:00:00
2018-01-06 00:00:00 | 24:00:00
2018-01-07 00:00:00 | 01:00:00
I have been experimenting with generate_series(), width_bucket(), and the range functions, but I still can't find a workable solution. Is there an existing or working solution?
I'm not sure about all the edge cases, but this seems to work:
t=# with c as (select *,min(t) over (), max(t+i) over (), tsrange(date_trunc('day',t),t+i) tr from t)
, mid as (
select distinct t,i,g,tr
, case when g < t then t else g end tt
from c
right outer join (select generate_series(date_trunc('day',min),date_trunc('day',max),'1 day') g from c) e on g <@ tr order by 3,1
)
select
tt
, i
, case when tt+'1 day' > upper(tr) and t < g then upper(tr)::time::interval when upper(tr) - lower(tr) < '1 day' then i else g+'1 day' - tt end
from mid
order by tt;
tt | i | case
---------------------+-----------------+----------
2018-01-01 15:00:00 | 06:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00 | 01:00:00
2018-01-03 00:00:00 | 04:00:00 | 03:00:00
2018-01-04 09:00:00 | 2 days 16:00:00 | 15:00:00
2018-01-05 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-06 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-07 00:00:00 | 2 days 16:00:00 | 01:00:00
(7 rows)
Also, please keep in mind that timestamp without time zone can give surprising results when comparing timestamps...
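A possibly simpler way to express the same split, sketched here rather than taken from the answer: generate one row per calendar day that each interval touches, then clip the interval to that day with greatest() and least(). The inline src CTE and the names ts, dur, part_start, and part_length are assumptions for illustration.

```sql
WITH src(ts, dur) AS (
    VALUES (TIMESTAMP '2018-01-01 15:00:00', INTERVAL '6 hours'),
           (TIMESTAMP '2018-01-02 23:00:00', INTERVAL '4 hours'),
           (TIMESTAMP '2018-01-04 09:00:00', INTERVAL '2 days 16 hours')
)
SELECT greatest(s.ts, d.day_start)                          AS part_start,
       least(s.ts + s.dur, d.day_start + INTERVAL '1 day')
         - greatest(s.ts, d.day_start)                      AS part_length
FROM src s
CROSS JOIN LATERAL generate_series(date_trunc('day', s.ts),
                                   date_trunc('day', s.ts + s.dur),
                                   INTERVAL '1 day') AS d(day_start)
-- drop the empty part produced when an interval ends exactly at midnight
WHERE greatest(s.ts, d.day_start)
    < least(s.ts + s.dur, d.day_start + INTERVAL '1 day')
ORDER BY part_start;
```

For the three sample rows this should produce the seven parts asked for in the question, with full days reported as a one-day interval.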