How to get average value for each hourly increment that is split into 5 minute intervals - sql

I have an AWS Redshift table that looks like this:
interval_date  interval_time  power  on_status
2022-05-01     00:00          2.65   Y
2022-05-01     00:05          3.92   Y
2022-05-01     00:10          2.05   Y
2022-05-01     00:15          1.85   Y
2022-05-01     00:20          5.92   Y
2022-05-01     00:25          7.52   Y
2022-05-01     00:30          9.84   Y
2022-05-01     00:35          6.84   N
2022-05-01     00:40          5.01   N
2022-05-01     00:45          4.70   N
2022-05-01     00:50          8.57   N
2022-05-01     00:55          1.94   N
2022-05-01     01:00          3.87   Y
The table continues with a row every 5 minutes through 23:55 for each day, up to the current date and time. I am trying to get the average value of power for each hourly interval where on_status equals Y. Each hour label covers the twelve readings that end at that hour: the 12 AM average uses the previous day (4/30/2022) at 23:05 through the current day (5/1/2022) at 00:00, 1 AM uses 00:05 to 01:00, 2 AM uses 01:05 to 02:00, and so on.
I have a basic query that gets me the average for a whole day (for context, the interval_date would be parameterized).
SELECT AVG(power)
FROM table
WHERE on_status = 'Y'
AND interval_date = '2022-05-01';
I am unsure how to partition the interval_time column so that the values are averaged hourly. An idea of the final result I am looking for is:
interval_date  interval_time  power  on_status
2022-05-01     00:00          2.65   Y
2022-05-01     01:00          5.00   Y
2022-05-01     02:00          X      Y
2022-05-01     03:00          X      Y

You didn't specify the type of interval_time, so I'm assuming it is a string. You can parse it with a CASE expression like this:
-- SQL Server syntax (FORMAT, CONVERT, + for concatenation), as in the linked fiddle
SELECT interval_date,
       CASE WHEN SUBSTRING(interval_time,4,2)='00' THEN interval_time   -- already on the hour
            WHEN SUBSTRING(interval_time,1,2)='23' THEN '00:00'         -- 23:05-23:55 wrap to midnight
            ELSE FORMAT(convert(int,SUBSTRING(interval_time,1,2))+1,'00')+':00'  -- round up to the next hour
       END interval_time,
       AVG(power)
FROM mytable
WHERE on_status = 'Y'
GROUP BY interval_date,
         CASE WHEN SUBSTRING(interval_time,4,2)='00' THEN interval_time
              WHEN SUBSTRING(interval_time,1,2)='23' THEN '00:00'
              ELSE FORMAT(convert(int,SUBSTRING(interval_time,1,2))+1,'00')+':00'
         END
Note that to get your target of 5.17, I had to comment out the on_status = 'Y' filter.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b604e4fe6696465aac75676e69b92a47
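To sanity-check the bucketing rule outside the database, here is a minimal Python sketch (not part of the original answer). It assumes each reading belongs to the hour that ends at or after its timestamp, so 00:05 through 01:00 all land in the 01:00 bucket, and 23:05 through 23:55 roll over to 00:00 of the next day. The names ROWS, hour_ending, and hourly_average are made up for illustration; the data is the sample from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (interval_date, interval_time, power, on_status)
ROWS = [
    ("2022-05-01", "00:00", 2.65, "Y"), ("2022-05-01", "00:05", 3.92, "Y"),
    ("2022-05-01", "00:10", 2.05, "Y"), ("2022-05-01", "00:15", 1.85, "Y"),
    ("2022-05-01", "00:20", 5.92, "Y"), ("2022-05-01", "00:25", 7.52, "Y"),
    ("2022-05-01", "00:30", 9.84, "Y"), ("2022-05-01", "00:35", 6.84, "N"),
    ("2022-05-01", "00:40", 5.01, "N"), ("2022-05-01", "00:45", 4.70, "N"),
    ("2022-05-01", "00:50", 8.57, "N"), ("2022-05-01", "00:55", 1.94, "N"),
    ("2022-05-01", "01:00", 3.87, "Y"),
]

def hour_ending(date_str, time_str):
    """Round a reading up to the hour that closes its interval."""
    ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    if ts.minute:
        # 23:05 on one day becomes 00:00 on the next day.
        ts = ts.replace(minute=0) + timedelta(hours=1)
    return ts

def hourly_average(rows, only_on=True):
    """Average power per hour-ending bucket, optionally keeping only on_status = 'Y'."""
    buckets = defaultdict(list)
    for date_str, time_str, power, status in rows:
        if only_on and status != "Y":
            continue
        buckets[hour_ending(date_str, time_str)].append(power)
    return {hour: round(sum(v) / len(v), 2) for hour, v in sorted(buckets.items())}

one_am = datetime(2022, 5, 1, 1, 0)
print(hourly_average(ROWS)[one_am])                 # with the Y filter
print(hourly_average(ROWS, only_on=False)[one_am])  # without the filter
```

On the sample data, the 01:00 bucket averages to 5.00 with the on_status = 'Y' filter and to 5.17 without it.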

Related

CASE in WHERE Clause in Snowflake

I am trying to do a case statement within the where clause in snowflake but I’m not quite sure how should I go about doing it.
What I’m trying to do is, if my current month is Jan, then the where clause for date is between start of previous year and today. If not, the where clause for date would be between start of current year and today.
WHERE
  CASE MONTH(CURRENT_DATE()) = 1 THEN DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND CURRENT_DATE()
  CASE MONTH(CURRENT_DATE()) != 1 THEN DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE()
  END
Appreciate any help on this!
Use a CASE expression that returns -1 if the current month is January or 0 for any other month, so that you can get with DATEADD() a date of the previous or the current year to use in DATE_TRUNC():
WHERE DATE BETWEEN
DATE_TRUNC('YEAR', DATEADD(YEAR, CASE WHEN MONTH(CURRENT_DATE()) = 1 THEN -1 ELSE 0 END, CURRENT_DATE()))
AND
CURRENT_DATE()
I suspect that you don't even need to use CASE here:
WHERE
  (MONTH(CURRENT_DATE()) = 1 AND
   DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND
                CURRENT_DATE()) OR
  (MONTH(CURRENT_DATE()) != 1 AND
   DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE())
So the other answers are quite good, but the answer can be even simpler.
Here is a little table to break down what is happening:
select
row_number() over (order by null) - 1 as rn,
dateadd('day', rn * 5, date_trunc('year',current_date())) as pretend_current_date,
DATEADD(YEAR, -1, pretend_current_date) as pcd_sub1,
month(pretend_current_date) as pcd_month,
DATE_TRUNC(year, iff(pcd_month = 1, pcd_sub1, pretend_current_date)) as _from,
pretend_current_date as _to
from table(generator(ROWCOUNT => 30))
order by rn;
This shows:
RN  PRETEND_CURRENT_DATE  PCD_SUB1    PCD_MONTH  _FROM       _TO
0   2022-01-01            2021-01-01  1          2021-01-01  2022-01-01
1   2022-01-06            2021-01-06  1          2021-01-01  2022-01-06
2   2022-01-11            2021-01-11  1          2021-01-01  2022-01-11
3   2022-01-16            2021-01-16  1          2021-01-01  2022-01-16
4   2022-01-21            2021-01-21  1          2021-01-01  2022-01-21
5   2022-01-26            2021-01-26  1          2021-01-01  2022-01-26
6   2022-01-31            2021-01-31  1          2021-01-01  2022-01-31
7   2022-02-05            2021-02-05  2          2022-01-01  2022-02-05
8   2022-02-10            2021-02-10  2          2022-01-01  2022-02-10
9   2022-02-15            2021-02-15  2          2022-01-01  2022-02-15
10  2022-02-20            2021-02-20  2          2022-01-01  2022-02-20
11  2022-02-25            2021-02-25  2          2022-01-01  2022-02-25
12  2022-03-02            2021-03-02  3          2022-01-01  2022-03-02
13  2022-03-07            2021-03-07  3          2022-01-01  2022-03-07
14  2022-03-12            2021-03-12  3          2022-01-01  2022-03-12
15  2022-03-17            2021-03-17  3          2022-01-01  2022-03-17
16  2022-03-22            2021-03-22  3          2022-01-01  2022-03-22
17  2022-03-27            2021-03-27  3          2022-01-01  2022-03-27
18  2022-04-01            2021-04-01  4          2022-01-01  2022-04-01
19  2022-04-06            2021-04-06  4          2022-01-01  2022-04-06
20  2022-04-11            2021-04-11  4          2022-01-01  2022-04-11
21  2022-04-16            2021-04-16  4          2022-01-01  2022-04-16
22  2022-04-21            2021-04-21  4          2022-01-01  2022-04-21
23  2022-04-26            2021-04-26  4          2022-01-01  2022-04-26
24  2022-05-01            2021-05-01  5          2022-01-01  2022-05-01
25  2022-05-06            2021-05-06  5          2022-01-01  2022-05-06
26  2022-05-11            2021-05-11  5          2022-01-01  2022-05-11
27  2022-05-16            2021-05-16  5          2022-01-01  2022-05-16
28  2022-05-21            2021-05-21  5          2022-01-01  2022-05-21
29  2022-05-26            2021-05-26  5          2022-01-01  2022-05-26
Your logic is asking: "is the current date in the month of January?" If so, take the prior year and truncate to the start of that year; otherwise truncate the current date to the start of its year. The result is the lower bound of the BETWEEN test.
This is the same as subtracting one month from the current date and truncating the result to the year: in January the shifted date falls in December of the previous year, and in every other month it stays in the current year.
Thus there is no need for any IFF or CASE:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE())) AND CURRENT_DATE()
And if you would like to drop some parentheses, CURRENT_DATE can be used without them if you leave it in upper case, so it can be even shorter:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE)) AND CURRENT_DATE
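The equivalence claimed above is easy to check by brute force. The following Python sketch (an illustration, not Snowflake code) compares the CASE-style lower bound with the "subtract one month, then truncate to year" lower bound for every day of a few years. The function names are hypothetical, and only the year of the shifted date is computed, since truncating to the year discards the rest.

```python
from datetime import date, timedelta

def start_with_case(today):
    """Lower bound from the CASE logic: previous year in January, else current year."""
    year = today.year - 1 if today.month == 1 else today.year
    return date(year, 1, 1)

def start_with_month_shift(today):
    """Lower bound from shifting back one month, then truncating to the year."""
    # Zero-based month count since year 0, minus one month.
    months = today.year * 12 + (today.month - 1) - 1
    return date(months // 12, 1, 1)

def all_days(year):
    """Yield every calendar day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day
        day += timedelta(days=1)

mismatches = [
    day
    for year in (2021, 2022, 2024)  # includes a leap year
    for day in all_days(year)
    if start_with_case(day) != start_with_month_shift(day)
]
print(mismatches)  # prints []
```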

Having trouble joining information from two tables using timestamps

I have two tables in BigQuery:
A - Has the exact start and end time of the processes
B - It has the cost per hour of several products consumed by the processes
I need to calculate an estimate of the cost of each process (table A) using the data in table B. I thought of doing this by summing the cost of all products (table B) included in the time period consumed by the process in table A.
So, here is some fake data for the two tables and the desired output:
Process metadata (Table A)
process_name  timestamp_init                  timestamp_end
a             2021-04-01 11:15:44.888153 UTC  2021-04-01 12:25:44.888153 UTC
b             2021-04-01 13:50:17.033498 UTC  2021-04-01 14:50:17.033498 UTC
c             2008-04-02 20:19:36.983747 UTC  2008-04-02 20:58:20.983747 UTC
d             2010-04-02 22:06:10.348753 UTC  2010-04-02 23:08:28.348753 UTC
Platform costs (Table B)
product  usage_start_time         usage_end_time           cost
ax       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  10
b4       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  9
cf       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  25
jw       2021-04-01 14:00:00 UTC  2021-04-01 15:00:00 UTC  125
ki       2021-04-01 20:00:00 UTC  2021-04-01 21:00:00 UTC  180
fr       2021-04-01 22:00:00 UTC  2021-04-01 23:00:00 UTC  250
Desired Results
process_name  total_cost
a             44
b             125
c             180
d             250
I developed the following code:
SELECT a.process_name,
SUM(b.cost) as total_cost
FROM A a,
B b
WHERE b.usage_start_time >= timestamp_trunc(timestamp_add(a.timestamp_init, interval 30 minute), hour)
AND b.usage_end_time <= timestamp_trunc(timestamp_add(a.timestamp_end, interval 30 minute), hour)
GROUP BY a.process_name
Note that I'm rounding the timestamps from table A so it matches the format of table B.
But for some reason I don't know, it is not returning any results. What am I doing wrong?
I'm not sure where the 30 minutes is coming from. The logic for an overlap would be:
SELECT a.process_name,
       SUM(b.cost) as total_cost
FROM A a JOIN
     B b
     ON b.usage_start_time < a.timestamp_end AND
        b.usage_end_time >= a.timestamp_init
GROUP BY a.process_name
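To see what this overlap condition selects, here is a small Python sketch that applies the same two comparisons to the sample rows from the question. It is only an illustration of the join logic, not BigQuery code; PROCESSES, COSTS, and total_costs are hypothetical names, and the microseconds in table A are dropped for brevity.

```python
from datetime import datetime

def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp (all sample values are UTC)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Table A: (process_name, timestamp_init, timestamp_end)
PROCESSES = [
    ("a", ts("2021-04-01 11:15:44"), ts("2021-04-01 12:25:44")),
    ("b", ts("2021-04-01 13:50:17"), ts("2021-04-01 14:50:17")),
    ("c", ts("2008-04-02 20:19:36"), ts("2008-04-02 20:58:20")),
    ("d", ts("2010-04-02 22:06:10"), ts("2010-04-02 23:08:28")),
]

# Table B: (product, usage_start_time, usage_end_time, cost)
COSTS = [
    ("ax", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 10),
    ("b4", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 9),
    ("cf", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 25),
    ("jw", ts("2021-04-01 14:00:00"), ts("2021-04-01 15:00:00"), 125),
    ("ki", ts("2021-04-01 20:00:00"), ts("2021-04-01 21:00:00"), 180),
    ("fr", ts("2021-04-01 22:00:00"), ts("2021-04-01 23:00:00"), 250),
]

def total_costs(processes, costs):
    """Sum the cost rows whose usage window overlaps each process window."""
    totals = {}
    for name, init, end in processes:
        for _product, start, stop, cost in costs:
            # Same test as the ON clause: the cost window starts before the
            # process ends and ends at or after the process starts.
            if start < end and stop >= init:
                totals[name] = totals.get(name, 0) + cost
    return totals

print(total_costs(PROCESSES, COSTS))
```

With the sample rows as posted, processes a and b get 44 and 125. Processes c and d are dated 2008 and 2010, so they overlap no cost row and drop out, just as they would in an inner join.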

BigQuery - A way to generate timestamps based on hour/minute/seconds?

Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This will generate the dates from 2016-10-18 to 2016-10-19 in date intervals
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
    t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
  ARRAY(
    SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
    -- x counts steps, so stop at the number of whole steps between t0 and t1
    FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
  )
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
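As a cross-check on what such a generator should return, here is a rough Python equivalent. It is a sketch under the assumption that the array should start at t0, advance by a fixed number of minutes, and never go past t1; generate_minute_timestamps is a hypothetical name, not a BigQuery function.

```python
from datetime import datetime, timedelta, timezone

def generate_minute_timestamps(t0, t1, minutes):
    """Timestamps from t0 up to t1 (inclusive when reachable), `minutes` apart."""
    total_minutes = int((t1 - t0).total_seconds() // 60)
    steps = total_minutes // minutes  # number of whole steps that fit before t1
    return [t0 + timedelta(minutes=minutes * x) for x in range(steps + 1)]

start = datetime(2018, 4, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 4, 1, 12, 0, tzinfo=timezone.utc)
stamps = generate_minute_timestamps(start, end, 15)
print(len(stamps), stamps[0].isoformat(), stamps[-1].isoformat())
```

Midnight to noon in 15-minute steps gives 49 timestamps, the last one being 12:00 itself.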
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
Epochs seem like the way to go, but this requires converting the dates to epoch seconds first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
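The same epoch arithmetic can be reproduced in Python to confirm the bounds used above. This is an illustrative sketch only: 1522540800 and 1525132799 are the values from the query, and 900 is the step in seconds (15 minutes).

```python
from datetime import datetime, timezone

START, END, STEP = 1522540800, 1525132799, 900  # epoch seconds, inclusive range

# range() excludes its stop value, so add 1 to mimic GENERATE_ARRAY's inclusive end.
stamps = [datetime.fromtimestamp(s, tz=timezone.utc) for s in range(START, END + 1, STEP)]

print(stamps[0].isoformat())   # first timestamp
print(stamps[-1].isoformat())  # last timestamp that fits before END
print(len(stamps))             # number of 15-minute slots
```

The range covers all of April 2018 in UTC: 2880 slots, from 2018-04-01 00:00 to 2018-04-30 23:45.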

Find duration on overlapping time segments in SQL

I am building a query report from two tables: one lists the segment types with a priority ranking, and the other lists all segments with their date, start time, end time, etc. (as shown below). During an overlap, the duration of the lower-ranked segment should not be counted.
Please help, as I am unable to work out a query that gets the duration of each segment excluding the overlap, based on ranking.
table_Rank
Rank Code
1 x
2 y
3 z
4 a
5 b
6 c
7 d
8 r
9 f
Table_Segments
Code Date Start Time End Time Duration
a 18-Jul-15 17:30 17:45 0:15
c 18-Jul-15 18:00 19:00 1:00
y 18-Jul-15 18:45 19:00 0:15
a 18-Jul-15 20:15 20:20 0:05
b 18-Jul-15 23:45 1:00 1:15
z 19-Jul-15 0:30 1:15 0:45
f 19-Jul-15 2:00 3:00 1:00
Table With Ranks = Table_Rank
Table With Data = Table_Segments
What I am trying to achieve is that, when segments overlap, the overlapping span is counted only for the code with the higher rank.
e.g.
Code Date Start Time End Time Duration
b 18-Jul-15 23:45 1:00 1:15
z 19-Jul-15 0:30 1:15 0:45
The actual duration output for b should be 45 minutes, as it has a lower rank than z, and z should be 45 minutes.
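One way to pin down the rule being asked for is a brute-force sketch in Python, which could serve as a reference when testing a SQL solution. It is not a query and not an answer from the thread. It assumes that a smaller Rank number means higher priority, that an end time earlier than the start time means the segment ends on the next day, and that times have minute resolution. RANK, ROWS, to_interval, and effective_minutes are hypothetical names.

```python
from datetime import datetime, timedelta

# table_Rank: smaller number = higher priority (assumption).
RANK = {"x": 1, "y": 2, "z": 3, "a": 4, "b": 5, "c": 6, "d": 7, "r": 8, "f": 9}

# Table_Segments: (code, date, start time, end time)
ROWS = [
    ("a", "18-Jul-15", "17:30", "17:45"),
    ("c", "18-Jul-15", "18:00", "19:00"),
    ("y", "18-Jul-15", "18:45", "19:00"),
    ("a", "18-Jul-15", "20:15", "20:20"),
    ("b", "18-Jul-15", "23:45", "1:00"),
    ("z", "19-Jul-15", "0:30", "1:15"),
    ("f", "19-Jul-15", "2:00", "3:00"),
]

def to_interval(day, start, end):
    """Turn a row into (start, end) datetimes, rolling the end past midnight if needed."""
    fmt = "%d-%b-%y %H:%M"
    begin = datetime.strptime(f"{day} {start}", fmt)
    finish = datetime.strptime(f"{day} {end}", fmt)
    if finish <= begin:
        finish += timedelta(days=1)
    return begin, finish

def effective_minutes(rows):
    """Minutes of each segment not covered by any higher-priority segment."""
    intervals = [(code, *to_interval(day, start, end)) for code, day, start, end in rows]
    result = []
    for code, begin, finish in intervals:
        minutes = 0
        tick = begin
        while tick < finish:
            covered = any(
                RANK[other] < RANK[code] and o_begin <= tick < o_finish
                for other, o_begin, o_finish in intervals
            )
            if not covered:
                minutes += 1
            tick += timedelta(minutes=1)
        result.append((code, minutes))
    return result

for code, minutes in effective_minutes(ROWS):
    print(code, minutes)
```

For the example pair, b keeps 45 of its 75 minutes (23:45 to 0:30) and z keeps its full 45 minutes. Walking minute by minute is slow but keeps the rule obvious.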

how to find the date difference in hours between two records with nearest datetime value and it must be compared in same group

How to find the date difference in hours between two records with nearest datetime value and it must be compared in same group?
Sample Data as follows:
Select * from tblGroup
Group FinishedDatetime
1 03-01-2009 00:00
1 13-01-2009 22:00
1 08-01-2009 03:00
2 01-01-2009 10:00
2 13-01-2009 20:00
2 10-01-2009 10:00
3 27-10-2008 00:00
3 29-10-2008 00:00
Expected Output :
Group FinishedDatetime Hours
1 03-01-2009 00:00 123
1 13-01-2009 22:00 139
1 08-01-2009 03:00 117
2 01-01-2009 10:00 216
2 13-01-2009 20:00 82
2 10-01-2009 10:00 82
3 27-10-2008 00:00 48
3 29-10-2008 00:00 48
Try this:
SELECT t1.[Group], DATEDIFF(HOUR, z.FinishedDatetime, t1.FinishedDatetime)
FROM tblGroup t1
OUTER APPLY (SELECT TOP 1 *
             FROM tblGroup t2
             WHERE t2.[Group] = t1.[Group] AND t2.FinishedDatetime < t1.FinishedDatetime
             ORDER BY FinishedDatetime DESC) z
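For comparison, here is a small Python sketch of the "nearest record in the same group" rule as stated in the question, looking in both directions rather than only at earlier rows. It is an illustration, not SQL Server code. It assumes dates are in dd-mm-yyyy format and reads the third row of group 2 as 10-01-2009 10:00; ROWS and nearest_gap_hours are hypothetical names.

```python
from datetime import datetime

# (Group, FinishedDatetime) rows from the sample data.
ROWS = [
    (1, "03-01-2009 00:00"), (1, "13-01-2009 22:00"), (1, "08-01-2009 03:00"),
    (2, "01-01-2009 10:00"), (2, "13-01-2009 20:00"), (2, "10-01-2009 10:00"),
    (3, "27-10-2008 00:00"), (3, "29-10-2008 00:00"),
]

def nearest_gap_hours(rows):
    """Hours between each row and the closest other row of the same group."""
    parsed = [(group, datetime.strptime(text, "%d-%m-%Y %H:%M")) for group, text in rows]
    gaps = []
    for i, (group, finished) in enumerate(parsed):
        others = [t for j, (g, t) in enumerate(parsed) if g == group and j != i]
        if not others:
            gaps.append(None)  # a group with a single row has no neighbour
            continue
        nearest = min(abs(finished - t) for t in others)
        gaps.append(int(nearest.total_seconds() // 3600))
    return gaps

print(nearest_gap_hours(ROWS))
```

On the sample data this gives 123, 139, 123, 216, 82, 82, 48, 48. That matches the expected output except for the 08-01-2009 03:00 row, where the timestamps as posted give 123 rather than the 117 shown.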