How to calculate range in 1 week using Postgres?

tanggal | product
2021-01-01 | bag 1
2021-01-05 | bag 5
2021-01-08 | bag 8
2021-01-11 | bag 11
2021-01-12 | bag 12
2021-01-13 | bag 13
2021-01-14 | bag 14
I have a product table, tbl_product, that stores an input date (tanggal) and a product name for each row.
I want to aggregate the products over one week. How do I write a query that covers a range of 7 days?
This is my query so far:
select tanggal, product from tbl_product
where tanggal > current_date + interval '7' day

You could solve this for arbitrary dates using a generated time series.
For example:
SELECT series::date
FROM generate_series(
(now() - interval '1 week')::date,
now()::date,
'1 day'::interval
) series;
Would result in:
2021-05-26
2021-05-27
2021-05-28
2021-05-29
2021-05-30
2021-05-31
2021-06-01
2021-06-02
which you can join with other tables as you see fit.
For further information on generate_series() and other set-returning functions, check out the documentation.
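To make the join idea concrete outside the database, here is a minimal Python sketch of the same pattern: build an inclusive daily series, then count the rows that fall on each day. The helper `date_series` and the fixed reference date `today` are assumptions made for illustration (the answer uses now()); the rows are the sample data from the question.

```python
from datetime import date, timedelta

def date_series(start, stop, step_days=1):
    """Inclusive list of dates, similar to generate_series(start, stop, '1 day')."""
    days = (stop - start).days
    return [start + timedelta(days=i) for i in range(0, days + 1, step_days)]

# Sample rows from the question: (tanggal, product)
rows = [
    (date(2021, 1, 1), "bag 1"),
    (date(2021, 1, 5), "bag 5"),
    (date(2021, 1, 8), "bag 8"),
    (date(2021, 1, 11), "bag 11"),
    (date(2021, 1, 12), "bag 12"),
    (date(2021, 1, 13), "bag 13"),
    (date(2021, 1, 14), "bag 14"),
]

# Hypothetical reference date standing in for now()::date
today = date(2021, 1, 14)
week = date_series(today - timedelta(weeks=1), today)

# Rough equivalent of LEFT JOIN ... GROUP BY: days without rows keep a count of 0
counts = {d: 0 for d in week}
for tanggal, _product in rows:
    if tanggal in counts:
        counts[tanggal] += 1

total_in_week = sum(counts.values())
print(total_in_week)
```

Note that going back one week and including both endpoints gives eight dates, which matches the eight rows in the output above.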

Related

Generate step time-series in SQL (PostgreSQL)

We are storing data corresponding to rates (ex: electricity price) in a SQL table, such as:
Date | Value
2022-08-25 01:00 | 12.3
2022-09-23 06:12 | 14.5
2022-10-18 05:34 | 9.8
The date interval between two rows is not regular. In this table, 12.3 is the current rate until it's replaced by the new value on September 23rd, when the rate becomes 14.5
From there, we want to generate an hourly time-series, with each value corresponding to the correct rate, such as:
Date | Value
2022-08-25 01:00 | 12.3
2022-08-25 02:00 | 12.3
2022-08-25 03:00 | 12.3
2022-08-25 04:00 | 12.3
2022-08-25 05:00 | 12.3
... | 12.3
2022-09-23 06:12 | 14.5
2022-09-23 07:00 | 14.5
2022-09-23 08:00 | 14.5
... | 14.5
2022-10-18 05:34 | 9.8
... | 9.8
How would you generate such a time series in PostgreSQL?
So you need to do two things: generate the time series with hourly intervals and then check for each interval which value was active during that.
For Postgres I would also create a timestamp range that contains the start and end of the range in which the price is valid (excluding the upper bound). This can be used in a join condition against the generated time series
with time_series ("date") as (
  -- one row per hour between the first and the last rate change
  select g.*
  from (
    select min("date") as start_date, max("date") as end_date
    from the_table
  ) x
  cross join generate_series(x.start_date, x.end_date, interval '1 hour') as g
), ranges as (
  -- '[)' includes the lower bound and excludes the upper bound;
  -- the last row gets an open-ended range because lead() returns null
  select tsrange("date", lead("date") over (order by "date"), '[)') as valid_during,
         value
  from the_table
)
select ts."date",
       r.value
from time_series ts
  join ranges r on r.valid_during @> ts."date"
If you don't really need a "dynamic time series", you can just use generate_series() with a hard-coded start and end which would simplify this a bit.
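For readers who want to check the range logic outside the database, here is a minimal Python sketch of the same idea: generate hourly timestamps, then look up the last rate change at or before each one. It uses a binary search in place of the range join, so it is an analogue of the query rather than a translation; the names `rate_at` and `hourly_series` are made up for this example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Rate changes from the question: (effective from, value), sorted by timestamp
changes = [
    (datetime(2022, 8, 25, 1, 0), 12.3),
    (datetime(2022, 9, 23, 6, 12), 14.5),
    (datetime(2022, 10, 18, 5, 34), 9.8),
]
starts = [ts for ts, _ in changes]

def rate_at(ts):
    """Value of the last change at or before ts (lower bound inclusive)."""
    i = bisect_right(starts, ts) - 1
    if i < 0:
        raise ValueError("timestamp precedes the first rate change")
    return changes[i][1]

def hourly_series(start, end):
    """Hourly timestamps from start up to end (end is included only if it is on the grid)."""
    out = []
    ts = start
    while ts <= end:
        out.append(ts)
        ts += timedelta(hours=1)
    return out

series = [(ts, rate_at(ts)) for ts in hourly_series(starts[0], starts[-1])]
print(len(series), series[0], series[-1])
```

Because the series steps on full hours from 01:00, the change timestamps 06:12 and 05:34 themselves never appear in it; each hour simply picks up whichever rate was active at that moment.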
This is a solution for Postgres. I think it is what you wanted: the generated intervals fall on full hours, and at each rate change the exact timestamp from the original table appears instead of the truncated hour (see the table). This is done by comparing each generated timestamp with the original date truncated to the hour. To make sure the last date appears in the result, I wrapped the LAG window function in COALESCE so that the NULL on the final row is replaced by that row's own date. Hope it doesn't look too hacky.
hourly_interval | value
2022-08-25 01:00:00 | 12.3
2022-08-25 02:00:00 | 12.3
... | ...
2022-09-23 06:00:00 | 12.3
2022-09-23 06:12:00 | 14.5
2022-09-23 07:00:00 | 14.5
... | ...
2022-10-18 05:00:00 | 14.5
2022-10-18 05:34:00 | 9.8
The result has 1303 rows
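WITH cte AS (
  SELECT *,
         -- LAG with offset -1 looks one row ahead; the last row falls back to its own date
         date_trunc('hour', generate_series(date,
                                            COALESCE((LAG(date, -1) OVER (ORDER BY date)), date),
                                            '1 hour')) hourly_interval
  FROM electricity
)
SELECT
  -- on the hour of a rate change, show the exact original timestamp
  CASE WHEN hourly_interval = date_trunc('hour', date)
       THEN date
       ELSE hourly_interval
  END AS hourly_interval,
  value
FROM cte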
Feel free to fiddle around

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format - event_date can never be higher than create_date.
I need to calculate, on a rolling n-day basis (say n = 3), the sum of units where both the create_date and the event_date fall within the same 3-day window. The data is illustrative: each event_date can have more than 500 different create_dates associated with it, and that number isn't constant. Some event_dates may be missing.
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
event_date | create_date | rowid | units
2022-02-01 | 2022-01-20 | 1 | 100
2022-02-01 | 2022-02-01 | 2 | 100
2022-02-02 | 2022-01-21 | 3 | 100
2022-02-02 | 2022-01-23 | 4 | 100
2022-02-02 | 2022-01-31 | 5 | 100
2022-02-02 | 2022-02-02 | 6 | 100
2022-02-03 | 2022-01-30 | 7 | 100
2022-02-03 | 2022-02-01 | 8 | 100
2022-02-03 | 2022-02-03 | 9 | 100
2022-02-05 | 2022-02-01 | 10 | 100
2022-02-05 | 2022-02-03 | 11 | 100
This is the output I need. In brackets I added the rows that should be included in the calculation for each date, but my result only needs the numerical sum. I tried calculating on either date alone, but neither returned the results I needed.
date | units
2022-02-01 | 100 (Row 2)
2022-02-02 | 300 (Row 2,5,6)
2022-02-03 | 400 (Row 2,6,8,9)
2022-02-04 | 200 (Row 6,9)
2022-02-05 | 200 (Row 9,11)
In Python I solved this with a function that loops over the dates and filters a dataframe for each one, but I am struggling to do the same in SQL.
Thank you!
Consider the approach below (BigQuery syntax):
with events_dates as (
select date from (
select min(event_date) min_date, max(event_date) max_date
from your_table
), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg('' || rowid) rows_included
from events_dates
left join your_table
on create_date between date - 2 and date
and event_date between date - 2 and date
group by date
if applied to sample data in your question - output is
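The same windowing logic can be cross-checked outside the database. Below is a minimal Python sketch, not the answerer's code: it walks every calendar day between the first and last event_date (so a missing day such as 2022-02-04 is still reported) and sums units where both dates fall inside the trailing 3-day window. The function name `rolling_units` is an assumption for illustration; unlike the SQL left join, a day with no matching rows yields 0 rather than NULL.

```python
from datetime import date, timedelta

# (event_date, create_date, rowid, units) from the question
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 1, 100),
    (date(2022, 2, 1), date(2022, 2, 1), 2, 100),
    (date(2022, 2, 2), date(2022, 1, 21), 3, 100),
    (date(2022, 2, 2), date(2022, 1, 23), 4, 100),
    (date(2022, 2, 2), date(2022, 1, 31), 5, 100),
    (date(2022, 2, 2), date(2022, 2, 2), 6, 100),
    (date(2022, 2, 3), date(2022, 1, 30), 7, 100),
    (date(2022, 2, 3), date(2022, 2, 1), 8, 100),
    (date(2022, 2, 3), date(2022, 2, 3), 9, 100),
    (date(2022, 2, 5), date(2022, 2, 1), 10, 100),
    (date(2022, 2, 5), date(2022, 2, 3), 11, 100),
]

def rolling_units(rows, window_days=3):
    """Sum units per day where event_date and create_date are both in [day - (n-1), day]."""
    first = min(r[0] for r in rows)
    last = max(r[0] for r in rows)
    result = {}
    day = first
    while day <= last:
        lo = day - timedelta(days=window_days - 1)
        result[day] = sum(
            units
            for event_date, create_date, _rowid, units in rows
            if lo <= event_date <= day and lo <= create_date <= day
        )
        day += timedelta(days=1)
    return result

totals = rolling_units(rows)
for day, units in totals.items():
    print(day, units)
```

On the sample data, 2022-02-03 comes out as 400, since rows 2, 6, 8 and 9 all qualify at 100 units each.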

How to count users group by time interval

I have a table with a user id and a created_at column of type timestamp. I want to count how many users created their account in each 3-hour interval of a given day. So far I have this query, but I can't get the count for each 3-hour interval:
with time_cte AS (
SELECT time_sample from
generate_series('2021-12-01'::date, '2021-12-01'::date + interval '1 day', interval '3 hour')
as time_sample
) SELECT time_sample, count(u.id) FROM time_cte
join users u ON u.created_at::date = '2021-12-01'::date
GROUP BY time_sample;
I am able to get series and count but they are total users count for that day
The output I got
time_sample count
2021-12-01 00:00:00.000000, 4
2021-12-01 03:00:00.000000, 4
2021-12-01 06:00:00.000000, 4
2021-12-01 09:00:00.000000, 4
2021-12-01 12:00:00.000000, 4
2021-12-01 15:00:00.000000, 4
2021-12-01 18:00:00.000000, 4
2021-12-01 21:00:00.000000, 4
2021-12-02 00:00:00.000000, 4
The output I expect is
time_sample count
2021-12-01 00:00:00.000000, 0
2021-12-01 03:00:00.000000, 0
2021-12-01 06:00:00.000000, 3
2021-12-01 09:00:00.000000, 1
2021-12-01 12:00:00.000000, 0
2021-12-01 15:00:00.000000, 0
2021-12-01 18:00:00.000000, 0
2021-12-01 21:00:00.000000, 0
2021-12-02 00:00:00.000000, 0
In PostgreSQL 14 and later you can use the built-in date_bin function.
select
date_bin(interval '3 hours', created_at, date_trunc('day', created_at)) as time_slot,
count(*) as cnt
from users
group by time_slot
order by time_slot;
For PostgreSQL versions before 14 you may use this implementation of date_bin.
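The binning arithmetic itself is small: subtract an origin, floor-divide by the stride, and multiply back. Here is a minimal Python sketch of that idea, with the empty slots filled in as 0 as the question expects. The `created_at` values are hypothetical (the question does not show the users table), and `bin_start` is a name invented for this example, not a library function.

```python
from collections import Counter
from datetime import datetime, timedelta

def bin_start(ts, stride, origin):
    """Start of the stride-wide bin that contains ts, with bins aligned to origin."""
    return origin + ((ts - origin) // stride) * stride

# Hypothetical created_at values, chosen to match the counts the question expects
created_at = [
    datetime(2021, 12, 1, 6, 5),
    datetime(2021, 12, 1, 7, 30),
    datetime(2021, 12, 1, 8, 59),
    datetime(2021, 12, 1, 9, 15),
]

day = datetime(2021, 12, 1)
stride = timedelta(hours=3)
counts = Counter(bin_start(ts, stride, day) for ts in created_at)

# Emit every slot of the day so that empty slots show up with a count of 0
slots = [day + i * stride for i in range(8)]
report = [(slot, counts.get(slot, 0)) for slot in slots]
for slot, n in report:
    print(slot, n)
```

This lists the eight slots that start within the day; the question's expected output also includes the 2021-12-02 00:00 boundary, which would be a ninth slot.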

BigQuery - A way to generate timestamps based on hour/minute/seconds?

Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This generates the dates from 2016-10-18 to 2016-10-19 in one-day intervals:
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
    t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
  ARRAY(
    SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
    -- x counts whole steps, so divide the span by the step size to stop at t1
    FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
  )
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
Epochs seem like the way to go, but this requires converting the date to an epoch first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
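If it helps, the two epoch bounds in that query can be derived rather than looked up. A minimal Python sketch, assuming the intended range is the whole of April 2018 in UTC (which is what the two literals correspond to) and a step of 900 seconds:

```python
from datetime import datetime, timezone

# Assumed bounds: first and last second of April 2018, in UTC
start = datetime(2018, 4, 1, tzinfo=timezone.utc)
end = datetime(2018, 4, 30, 23, 59, 59, tzinfo=timezone.utc)

start_epoch = int(start.timestamp())
end_epoch = int(end.timestamp())

# 900 seconds = 15 minutes; range() excludes its stop value, hence the + 1
stamps = [
    datetime.fromtimestamp(s, tz=timezone.utc)
    for s in range(start_epoch, end_epoch + 1, 900)
]

print(start_epoch, end_epoch, len(stamps))
```

The same arithmetic works for any step expressed in seconds, for example 300 for 5-minute intervals.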

How do I generate a series of hourly averages in MySQL?

I've got data in ten-minute intervals in my table:
2009-01-26 00:00:00 12
2009-01-26 00:10:00 1.1
2009-01-26 00:20:00 11
2009-01-26 00:30:00 0
2009-01-26 00:40:00 5
2009-01-26 00:50:00 3.4
2009-01-26 01:00:00 7
2009-01-26 01:10:00 7
2009-01-26 01:20:00 7.2
2009-01-26 01:30:00 3
2009-01-26 01:40:00 25
2009-01-26 01:50:00 4
2009-01-26 02:00:00 3
2009-01-26 02:10:00 4
etc.
Is it possible to formulate a single SQL-query for MySQL which will return a series of averages over each hour?
In this case it should return:
5.42
8.87
etc.
It's unclear whether you want the average to be aggregated over days or not.
If you want a different average for midnight on the 26th vs midnight on the 27th, then modify Mabwi's query thus:
SELECT AVG( value ) , thetime
FROM hourly_averages
GROUP BY DATE( thetime ), HOUR( thetime )
Note the additional DATE() in the GROUP BY clause. Without this, the query would average together all of the data from 00:00 to 00:59 without regard to the date on which it happened.
This should work:
SELECT AVG( value ) , thetime
FROM hourly_averages
GROUP BY HOUR( thetime )
Here's the result
AVG(value) thetime
5.4166666865349 2009-01-26 00:00:00
8.8666666348775 2009-01-26 01:00:00
3.5 2009-01-26 02:00:00
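As a quick sanity check on those numbers, here is a minimal Python sketch that groups the sample rows from the question by date and hour and averages each group. Keying on both date and hour is a choice made for this sketch, so the same hour on different days would stay separate.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: "timestamp value"
raw = """\
2009-01-26 00:00:00 12
2009-01-26 00:10:00 1.1
2009-01-26 00:20:00 11
2009-01-26 00:30:00 0
2009-01-26 00:40:00 5
2009-01-26 00:50:00 3.4
2009-01-26 01:00:00 7
2009-01-26 01:10:00 7
2009-01-26 01:20:00 7.2
2009-01-26 01:30:00 3
2009-01-26 01:40:00 25
2009-01-26 01:50:00 4
2009-01-26 02:00:00 3
2009-01-26 02:10:00 4
"""

buckets = defaultdict(list)
for line in raw.splitlines():
    stamp, value = line.rsplit(" ", 1)
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Key on (date, hour) so the same hour on different days is not merged
    buckets[(ts.date(), ts.hour)].append(float(value))

averages = {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}
for (day, hour), avg in averages.items():
    print(day, hour, round(avg, 2))
```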
There is also another possibility considering the fact that dates have a string representation in the database:
You can use SUBSTRING(thetime, 1, [len]), extracting the common part of your group. For the example with hourly averages you have the SQL query
SELECT SUBSTRING(thetime, 1, 13) AS hours, AVG(value) FROM hourly_averages GROUP BY hours
By the len parameter you can specify the aggregated time interval considering the MySQL date format yyyy-MM-dd HH:mm:ss[.SS...]:
len = 4: group by years
len = 7: group by months
len = 10: group by days
len = 13: group by hours
len = 16: group by minutes
len = 19: group by seconds
We observed better performance with this method than with the date and time functions, especially when used in JOINs in MySQL 5.7. In MySQL 8, however, both approaches seem to take approximately the same time, at least for grouping.
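The prefix trick is easy to try on plain strings. A minimal Python sketch, assuming timestamps are already formatted as yyyy-MM-dd HH:mm:ss; the mapping `PREFIX_LEN` and the function `average_by_prefix` are names made up for this illustration, and the rows are a subset of the question's sample data.

```python
from collections import defaultdict

# Prefix lengths from the list above, keyed by the unit they group on
PREFIX_LEN = {"year": 4, "month": 7, "day": 10, "hour": 13, "minute": 16, "second": 19}

def average_by_prefix(rows, unit):
    """Average values grouped by the first PREFIX_LEN[unit] characters of the timestamp string."""
    n = PREFIX_LEN[unit]
    groups = defaultdict(list)
    for stamp, value in rows:
        groups[stamp[:n]].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

rows = [
    ("2009-01-26 00:00:00", 12.0),
    ("2009-01-26 00:10:00", 1.1),
    ("2009-01-26 01:00:00", 7.0),
    ("2009-01-26 01:10:00", 7.0),
]

by_hour = average_by_prefix(rows, "hour")
by_day = average_by_prefix(rows, "day")
print(by_hour)
print(by_day)
```

Like the SQL version, this relies entirely on the fixed-width string format, so it breaks if timestamps are stored in a different layout.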