Group timestamped records into 5, 10, 15 minute blocks - SQL

I have minute-by-minute financial records stored in my table in a format like this:
dt | open | high | low | close | vol
---------------------+----------+----------+----------+----------+-------
2018-05-04 15:30:00 | 171.0000 | 171.3000 | 170.9000 | 171.0000 | 42817
2018-05-04 15:29:00 | 170.8000 | 171.0000 | 170.8000 | 170.9500 | 32801
2018-05-04 15:28:00 | 170.8500 | 171.0000 | 170.8000 | 170.8000 | 22991
2018-05-04 15:27:00 | 170.8500 | 170.8500 | 170.7500 | 170.8000 | 40283
2018-05-04 15:26:00 | 170.9500 | 171.0000 | 170.8000 | 170.8500 | 46636
and so on.
I want to group them into blocks of 5 minutes, 10 minutes, 60 minutes, just like candlesticks. Using date_trunc('hour', dt) is not possible, as I want to group them into blocks such as the last 60 minutes, the last 15 minutes, etc.
I am using PostgreSQL.

You should use a GROUP BY with:
floor(extract('epoch' from dt) / 300)
to have your data grouped in 5 minutes intervals. 300 is the number of seconds in 5 minutes. Thus if you want 10 minutes, you'd divide by 600. If you want 1 hour, by 3600.
If you want your intervals to begin at 00, 05, 10, use floor(). If you want them to end at 00, 05, 10, use ceil().
In the SELECT clause, you should re-transform the Unix epoch used in the GROUP BY into a timestamp using
to_timestamp(floor((extract('epoch' from dt) / 300)) * 300) as ts
It's not clear whether you want all the "block" results in the same query; I assumed yes, since you presumably want a candlestick chart. I have also deduced a sensible aggregate function (MIN, MAX, AVG, SUM) for each column based on its name. You might have to adapt this.
Here we go:
SELECT '5 minutes' as block,
to_timestamp(floor((extract('epoch' from dt) / 300)) * 300) as ts,
round(AVG(open),4) as avg_open,
round(MAX(high),4) as max_high,
round(MIN(low),4) as min_low,
round(AVG(close),4) as avg_close,
SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 300)
UNION ALL
SELECT '10 minutes' as block,
to_timestamp(floor((extract('epoch' from dt) / 600)) * 600) as ts,
round(AVG(open),4) as avg_open,
round(MAX(high),4) as max_high,
round(MIN(low),4) as min_low,
round(AVG(close),4) as avg_close,
SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 600)
UNION ALL
SELECT '1 hour' as block,
to_timestamp(floor((extract('epoch' from dt) / 3600)) * 3600) as ts,
round(AVG(open),4) as avg_open,
round(MAX(high),4) as max_high,
round(MIN(low),4) as min_low,
round(AVG(close),4) as avg_close,
SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 3600)
Results:
block      | ts                  | avg_open | max_high | min_low | avg_close | sum_vol
-----------+---------------------+----------+----------+---------+-----------+--------
5 minutes  | 04.05.2018 17:30:00 | 171      | 171,3    | 170,9   | 171       | 42817
5 minutes  | 04.05.2018 17:25:00 | 170,8625 | 171      | 170,75  | 170,85    | 142711
10 minutes | 04.05.2018 17:20:00 | 170,8625 | 171      | 170,75  | 170,85    | 142711
10 minutes | 04.05.2018 17:30:00 | 171      | 171,3    | 170,9   | 171       | 42817
1 hour     | 04.05.2018 17:00:00 | 170,89   | 171,3    | 170,75  | 170,88    | 185528
Test it on REXTESTER

You can use generate_series() to create any range you want
SQL DEMO:
SELECT dd as start_range, dd + '30 min'::interval as end_range
FROM generate_series
( '2018-05-05'::timestamp
, '2018-05-06'::timestamp
, '30 min'::interval) dd
;
Then check whether your records fall within each range, as sketched below.
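A minimal sketch of that check, reusing the table and column names from the question (mytable, dt, open/high/low/close/vol) and the same aggregates as the first answer; adapt as needed:
SELECT g.start_range,
       g.start_range + interval '30 min' AS end_range,
       round(AVG(t.open), 4)  AS avg_open,
       round(MAX(t.high), 4)  AS max_high,
       round(MIN(t.low), 4)   AS min_low,
       round(AVG(t.close), 4) AS avg_close,
       SUM(t.vol)             AS sum_vol
FROM generate_series('2018-05-04'::timestamp,
                     '2018-05-05'::timestamp,
                     interval '30 min') AS g(start_range)
LEFT JOIN mytable t
       ON t.dt >= g.start_range
      AND t.dt <  g.start_range + interval '30 min'
GROUP BY g.start_range
ORDER BY g.start_range;
The LEFT JOIN keeps empty buckets in the output; switch to an inner JOIN if you only want ranges that actually contain rows.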

Related

Calculate the days to reach a certain date - PostgreSQL

I need to create a query that calculates the difference in days until one date reaches another date, something like "how many days until my birthday".
Current_date | Reach_date
2000-01-01 | 2015-01-03
-- Should Return: 2
2015-03-01 | 2021-03-05
-- Should Return: 4
The closest built-in function I found for this was age(), but it returns years, months and days:
select age(current_date,reach_date) from sample_table;
age
-------------------------
3 years 10 mons 1 day
I also tried extract() to get the difference in days, but it only returns the days component of the age() result. In my last sample, instead of returning more than 1,000 days, it returns just 1.
See if this works for you. It checks whether it's a leap year so the difference is calculated correctly, and then uses different logic to compute the difference depending on whether the dates fall in the same year or not.
with cte as
(
SELECT *,
CASE WHEN extract(year from CurrentDate)::INT % 4 = 0
and (extract(year from CurrentDate)::INT % 100 <> 0
or extract(year from CurrentDate)::INT % 400 = 0)
THEN TRUE
ELSE FALSE
END AS isLeapYear,
Extract(day from (Reach_date - CurrentDate)) AS diff_in_days
FROM test
)
SELECT CurrentDate,
Reach_date,
CASE WHEN isLeapYear
THEN
CASE WHEN diff_in_days < 366
THEN diff_in_days
ELSE Extract(day from AGE(Reach_date, CurrentDate))
END
ELSE CASE WHEN diff_in_days < 365
THEN diff_in_days
ELSE Extract(day from AGE(Reach_date, CurrentDate))
END
END AS diff
FROM cte
Test here: SQL Fiddle
SELECT
d_date,
'2021-01-01'::date - '2020-01-01'::date AS diff_2021_minus_2020,
CASE WHEN (date_part('month', d_date)::integer) = 1
AND (date_part('day', d_date)::integer) = 1 THEN
(date_trunc('year', d_date) + interval '1 year')::date - date_trunc('year', d_date)::date
WHEN ((d_date - (date_trunc('year', d_date))::date)) <= 182 THEN
(d_date - (date_trunc('year', d_date))::date)
ELSE
365 - (d_date - (date_trunc('year', d_date))::date)
END AS till_to_birthday
FROM (
VALUES ('2021-12-01'::date),
('2021-06-01'::date),
('2020-01-01'::date),
('2021-01-01'::date),
('2021-09-01'::date),
('2021-11-01'::date),
('2020-06-01'::date)) s (d_date);
returns:
+------------+----------------------+------------------+
| d_date     | diff_2021_minus_2020 | till_to_birthday |
+------------+----------------------+------------------+
| 2021-12-01 | 366                  | 31               |
| 2021-06-01 | 366                  | 151              |
| 2020-01-01 | 366                  | 366              |
| 2021-01-01 | 366                  | 365              |
| 2021-09-01 | 366                  | 122              |
| 2021-11-01 | 366                  | 61               |
| 2020-06-01 | 366                  | 152              |
+------------+----------------------+------------------+
The behavior you got with age() is because extract() only extracts the days component; it won't convert months and years into days for you before extraction.
On SQL Server you could use DATEDIFF(), but in Postgres you have to compute it yourself by subtracting dates, as shown in this answer.
There are also a few examples with all the time units here.
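For reference, a minimal sketch of that subtraction, with hypothetical column names (in the question the first column is literally named Current_date, which would need quoting because it collides with the CURRENT_DATE keyword):
-- A minimal sketch: subtracting two date values in PostgreSQL yields an integer
-- number of days directly, so no extract() is needed. Hypothetical column names.
SELECT reach_date - start_date AS total_days
FROM (VALUES ('2015-03-01'::date, '2021-03-05'::date)) AS t(start_date, reach_date);
-- total_days = 2196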

How do I work out minutes that occurred during office hours (9-5) and out of office hours

I have a datetime field for when an activity starts and an int field with active_time in minutes.
What I want to do is work out, in minutes (and then hours, /60), how much of the activity time fell during work hours (9-5) and how much fell outside of those hours.
E.G.
Data columns
Datetime: '2022-02-28 16:54:00.000 +0000'.
Active_time in minutes: '20'
Desired output:
Activity time in work hours: '6'
Activity time out of work hours: '14'
Can anyone help?
Many thanks.
SELECT start_time, minutes, time_pre_work, work_time, post_work_time
FROM (
SELECT *
,timeadd('minute', minutes, start_time) as time_end
,date_trunc('day', start_time) as day
,timeadd('hour', 8, day) as workday_start
,timeadd('hour', 17, day) as workday_end
,timediff('minute', least(start_time, workday_start), workday_start) as time_pre_work
,timediff('minute', greatest(start_time, workday_start), least(workday_end, time_end)) as work_time
,timediff('minute', greatest(workday_end, workday_end), greatest(workday_end, time_end)) as post_work_time
FROM VALUES
('2022-02-28 16:54:00.000'::timestamp, 20)
t(start_time, minutes)
);
gives:
START_TIME              | MINUTES | TIME_PRE_WORK | WORK_TIME | POST_WORK_TIME
------------------------+---------+---------------+-----------+---------------
2022-02-28 16:54:00.000 | 20      | 0             | 6         | 14
Within-day clipping:
Without correctly bounding multi-day spans, this data:
FROM VALUES
('2022-02-28 16:54:00.000'::timestamp, 20),
('2022-02-28 7:54:00.000'::timestamp, 20),
('2022-02-28 6:54:00.000'::timestamp, 1000)
t(start_time, minutes)
gives:
START_TIME              | MINUTES | TIME_PRE_WORK | WORK_TIME | POST_WORK_TIME
------------------------+---------+---------------+-----------+---------------
2022-02-28 16:54:00.000 | 20      | 0             | 6         | 14
2022-02-28 07:54:00.000 | 20      | 6             | 14        | 0
2022-02-28 06:54:00.000 | 1,000   | 66            | 540       | 394
Across days with daily clipping:
WITH input_data as (
SELECT * FROM VALUES
('2022-02-28 16:54:00.000'::timestamp, 20),
('2022-02-28 7:54:00.000'::timestamp, 20),
('2022-02-28 6:54:00.000'::timestamp, 3000)
t(start_time, minutes)
), range as(
SELECT row_number() over(order by null)-1 as rn
FROM TABLE(generator(ROWCOUNT => 100))
), day_condition as (
SELECT *
,timeadd('minute', minutes, start_time) as time_end
,date_trunc('day', dateadd('day', r.rn, start_time)) as r_day_start
,dateadd('day', 1, r_day_start ) as r_day_end
,greatest(r_day_start, start_time) as clip_start
,least(r_day_end, time_end) as clip_end
-- insert logic for "which day is it and what hours it has here"
,timeadd('hour', 8, r_day_start) as workday_start
,timeadd('hour', 17, r_day_start) as workday_end
FROM input_data i
JOIN range r ON r.rn <= datediff(day, start_time, timeadd('minute', minutes, start_time))
)
SELECT
start_time
,minutes
,r_day_start
--,clip_start
--,clip_end
,timediff('minute', least(clip_start, workday_start), workday_start) as time_pre_work
,timediff('minute', greatest(clip_start, workday_start), least(workday_end, clip_end)) as work_time
,timediff('minute', greatest(workday_end, workday_end), greatest(workday_end, clip_end)) as post_work_time
FROM day_condition
ORDER BY 1,3;
START_TIME              | MINUTES | R_DAY_START             | TIME_PRE_WORK | WORK_TIME | POST_WORK_TIME
------------------------+---------+-------------------------+---------------+-----------+---------------
2022-02-28 06:54:00.000 | 3,000   | 2022-02-28 00:00:00.000 | 66            | 540       | 420
2022-02-28 06:54:00.000 | 3,000   | 2022-03-01 00:00:00.000 | 480           | 540       | 420
2022-02-28 06:54:00.000 | 3,000   | 2022-03-02 00:00:00.000 | 480           | 54        | 0
2022-02-28 07:54:00.000 | 20      | 2022-02-28 00:00:00.000 | 6             | 14        | 0
2022-02-28 16:54:00.000 | 20      | 2022-02-28 00:00:00.000 | 0             | 6         | 14
I couldn't think of a way to approach this using built-in date functions, so this is maybe not the prettiest solution, but the math is there. It converts the start_timestamp/active_time into minutes relative to your business hours.
I subtracted 540 to essentially make 9:00 AM "minute 0", which makes the numbers easier to work with. This makes 5:00 PM "minute 480".
Then it's just a matter of subtracting times within and outside of your business hours.
set startdatetime = '2022-03-23 7:31:00'::timestamp_ntz;
set active_minutes = 125;
set business_start = 0;
set business_end = 480;
select
-540 + (hour($startdatetime) * 60 + minute($startdatetime)) as start_minute,
start_minute + $active_minutes as end_minute,
-- End time (within business hours) - Start time (within business hours). Can't be less than 0
greatest(0, least(end_minute, $business_end) - greatest(start_minute, $business_start)) as minutes_during_business,
-- End - Start. With any "business minutes" ignored. first for pre-work minutes, then for post-work minutes
least(end_minute, $business_start) - least(start_minute, $business_start)
+ greatest(end_minute, $business_end) - greatest(start_minute, $business_end) as minutes_outside_business,
minutes_during_business / 60 as hours_during_business,
minutes_outside_business / 60 as hours_outside_business;
This does not work well if your active minutes span into the business hours of the following day; that would take some extra handling.
You could also add on seconds, and convert all of the hardcoded numbers to seconds if you do want that extra granularity.

Extract 30 minutes from timestamp and group it by 30-minute time intervals - PGSQL

In PostgreSQL I am extracting the hour from the timestamp using the query below.
select count(*) as logged_users, EXTRACT(hour from login_time::timestamp) as Hour
from loginhistory
where login_time::date = '2021-04-21'
group by Hour order by Hour;
And the output is as follows
logged_users | hour
--------------+------
27 | 7
82 | 8
229 | 9
1620 | 10
1264 | 11
1990 | 12
1027 | 13
1273 | 14
1794 | 15
1733 | 16
878 | 17
126 | 18
21 | 19
5 | 20
3 | 21
1 | 22
I want the same output from the same SQL, but for 30-minute intervals. Please suggest.
SELECT to_timestamp((extract(epoch FROM login_time::timestamp)::bigint / 1800) * 1800)::timestamp AS interval_30_min
, count(*) AS logged_users
FROM loginhistory
WHERE login_time::date = '2021-04-21' -- inefficient!
GROUP BY 1
ORDER BY 1;
Extracting the epoch gets the number of seconds since the epoch. Integer division truncates. Multiplying back effectively rounds down, achieving the same as date_trunc() for arbitrary time intervals.
1800 because 30 minutes contain 1800 seconds.
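As a quick sanity check on a single made-up value (with the session TimeZone set to UTC; see the caveat below):
SELECT to_timestamp((extract(epoch FROM timestamp '2021-04-21 10:47:13')::bigint / 1800) * 1800)::timestamp;
-- returns 2021-04-21 10:30:00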
Detailed explanation:
Truncate timestamp to arbitrary intervals
The cast to timestamp makes me wonder about the actual data type of login_time. If it's timestamptz, the cast depends on your current time zone setting and sets you up for surprises if that setting changes. See:
How do I match an entire day to a datetime field?
Subtract hours from the now() function
Ignoring time zones altogether in Rails and PostgreSQL
Depending on the actual data type, and exact definition of your date boundaries, there is a more efficient way to phrase your WHERE clause.
You can change the column on which you're aggregating to use the minute too:
select
count(*) as logged_users,
CONCAT(EXTRACT(hour from login_time::timestamp), '-', CASE WHEN EXTRACT(minute from login_time::timestamp) < 30 THEN 0 ELSE 30 END) as HalfHour
from loginhistory
where login_time::date = '2021-04-21'
group by HalfHour
order by HalfHour;

How to determine number of days in a month in Presto?

I have data with date, userid, and amount. I want to calculate sum(amount) divided by the total number of days in each month. The final result will be presented on a monthly basis.
The table I have looks like this:
date userid amount
2019-01-01 111 10
2019-01-15 112 20
2019-01-20 113 10
2019-02-01 114 30
2019-02-15 111 20
2019-03-01 115 40
2019-03-23 155 50
The desired result is like this:
date avg_qty_sol
Jan-19 1.29
Feb-19 1.79
Mar-19 2.90
avg_qty_sold comes from sum(amount) / total days in the respective month.
E.g. for Jan 2019 the sum of amount is 40 and the total days in Jan is 31, so avg_qty_sold is 40/31.
Currently I'm using CASE WHEN for this. Is there a better approach?
Since Presto 318, this is as easy as:
SELECT day(last_day_of_month(some_date))
See https://trino.io/docs/current/functions/datetime.html#last_day_of_month
Before Presto 318, you can combine date_trunc with EXTRACT:
date_trunc('month', date_value) gives the beginning of the month, while date_add('month', 1, date_trunc('month', date_value)) gives the beginning of the next month
subtracting date values returns an interval day to second
EXTRACT(DAY FROM interval) returns the day portion of the interval. You can also use the day() convenience function instead of EXTRACT(DAY FROM ...); the EXTRACT syntax is more verbose but more standard.
presto:default> SELECT
-> date_value,
-> EXTRACT(DAY FROM (
-> date_add('month', 1, date_trunc('month', date_value)) - date_trunc('month', date_value)))
-> FROM (VALUES DATE '2019-01-15', DATE '2019-02-01') t(date_value);
date_value | _col1
------------+-------
2019-01-15 | 31
2019-02-01 | 28
(2 rows)
A less natural but slightly shorter alternative is to get the day number of the last day of the given month with day(date_add('day', -1, date_add('month', 1, date_trunc('month', date_value)))):
presto:default> SELECT
-> date_value,
-> day(date_add('day', -1, date_add('month', 1, date_trunc('month', date_value))))
-> FROM (VALUES DATE '2019-01-15', DATE '2019-02-01') t(date_value);
date_value | _col1
------------+-------
2019-01-15 | 31
2019-02-01 | 28
(2 rows)
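Putting it together for the original question, a rough sketch of the monthly average, assuming the question's table is named sales, its date column is "date", and you are on Presto 318+ for last_day_of_month():
SELECT date_format(mon, '%b-%y')                        AS month,
       total_amount * 1.0 / day(last_day_of_month(mon)) AS avg_qty_sold
FROM (
    SELECT date_trunc('month', "date") AS mon,  -- month bucket
           SUM(amount)                 AS total_amount
    FROM sales
    GROUP BY 1
) t
ORDER BY mon;
With the sample data this gives roughly 1.29 for Jan-19 (40/31), 1.79 for Feb-19 (50/28) and 2.90 for Mar-19 (90/31), matching the desired output.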

T-SQL: Check temperature over 24 hour period

I have a SQL Server database with two values I'm interested in:
dtime - datetime
temperature - varchar
The table is fed by an external process that takes a building's temperature every 30 minutes. I'm interested in triggering an alert if the temperature exceeds 80 degrees for 48 periods (24 hours).
I think this needs to be an external process that scans the table and sends an alert when this condition is met. I'm struggling with writing the SQL to do this.
EDIT
The data I'm pulling in comes in on a weekly basis. Within this week I need to see whether the temperature exceeded 80 degrees throughout any 24-hour period. The air conditioning could fail at any time and the failure could span two days or more, so I need to check this across multiple days. The temperature is taken every half hour, so during the week I need to check whether there are 48+ consecutive readings where the temperature exceeded 80 degrees.
Sample data:
10/1/2012 12:00:00 AM | 70 | {ok}
10/1/2012 12:30:00 AM | 70 | {ok}
10/1/2012 1:00:00 AM | 70 | {ok}
10/1/2012 1:30:00 AM | 75 | {ok}
10/1/2012 2:00:00 AM | 75 | {ok}
10/1/2012 2:30:00 AM | 80 | {ok}
You can use ALL:
IF 80 < ALL(
SELECT temperature
FROM (
SELECT temperature, dtime,
RN = ROW_NUMBER() OVER (ORDER BY dtime DESC)
FROM dbo.Temps
) X
WHERE RN <= 48
)
SELECT 'ALERT, the last 48 measurements exceeded 80 degrees!'
ELSE
SELECT 'everything is okay';
Fiddle: http://sqlfiddle.com/#!6/4e2c8/7/0
Edit: As Blam mentioned, this can be simplified by using TOP:
IF 80 < ALL(
SELECT TOP 48 temperature
FROM dbo.Temps
ORDER BY dtime DESC
)
SELECT 'ALERT, the last 48 measurements exceeded 80 degrees!'
ELSE
SELECT 'everything is okay';
select 'alert'
where not exists (
select 1
from MyTable
where dtime > dateadd(day, -1, getdate())
and cast(temperature as int) <= 80
)
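For the edited requirement (scanning a whole week for any 24-hour stretch over 80 degrees), a rough sketch using a windowed count over 48 consecutive readings; it assumes one reading every 30 minutes with no gaps and reuses the dbo.Temps table from the first answer:
WITH flagged AS (
    SELECT dtime,
           COUNT(CASE WHEN CAST(temperature AS int) > 80 THEN 1 END)
               OVER (ORDER BY dtime ROWS BETWEEN 47 PRECEDING AND CURRENT ROW) AS hot_readings
    FROM dbo.Temps
)
SELECT dtime AS window_end  -- end of a 24-hour window in which every reading exceeded 80
FROM flagged
WHERE hot_readings = 48;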