Not output as expected using Tumbling window 2 hours in Stream Analytics - azure-stream-analytics

The expected output of a 2-hour tumbling window is 00:00, 02:00, 04:00, ...
but the output was 01:00, 03:00, 05:00, ...
Likewise, for a 6-hour window the expected output is 00:00, 06:00, 12:00, 18:00, but the output was 03:00, 09:00, 15:00, 21:00.
How can I make it produce the expected output?
This is my query:
SELECT
    GatewayId,
    AVG(HumidityData) AS DataValue
INTO Humidity120m
FROM input
GROUP BY TumblingWindow(hour, 2), GatewayId

SELECT
    GatewayId,
    AVG(HumidityData) AS DataValue
INTO Humidity360m
FROM input
GROUP BY TumblingWindow(hour, 6), GatewayId

This is not supported. The tumbling window partitions the timeline by itself. For more details, you can refer to the MSDN documentation.

Related

Select unique IDs and divide result into X minute intervals based on given timespan

I'm trying to knock some dust off my good old SQL skills, but I'm afraid I need a push in the right direction to transform them into something useful when it comes to BigQuery statements.
I'm currently working with a single table schema looking like this:
In the query I would like to be able to supply the following in my WHERE clause:
1. The date the results should stem from.
2. A time range - in the result example above this range would be from 20:00 to 21:00. If 1. and 2. in this list are merged together, that's also fine.
3. The eventId I would like to find records for.
4. Optionally, the interval frequency - whether it should be divided into e.g. 5, 10 or 15 minute intervals.
I would also like to count the unique userIds for each interval. If one user is present during the entire session, he/she should be taken into the count in every interval.
So think of it as the following:
How many unique users did we have every 5 minutes at event X, between 20:00 and 21:00 on day Y?
How should my query look if I want a result looking (something) like the following pseudo result:
   time_interval          number_of_unique_userIds
1  2022-03-16 20:00:00    10
2  2022-03-16 20:05:00    12
3  2022-03-16 20:10:00    15
4  2022-03-16 20:15:00    20
5  2022-03-16 20:20:00    30
6  ...                    etc.
If the query is executed before the provided end time of the timespan, it should fill out the remaining interval rows with 0 unique userIds.
In the following result the mentioned query was executed earlier than the provided end time - let's say at 20:49:
   time_interval          number_of_unique_userIds
X  2022-03-16 20:50:00    0
X  2022-03-16 20:55:00    0
X  2022-03-16 21:00:00    0
Here's what I have so far, but it gives me several records for the same interval, seemingly one per userId:
SELECT
  -- bucket creationTime down to its 5-minute boundary
  TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) AS time_interval,
  COUNT(DISTINCT userId) AS number_of_unique_userIds
FROM `bigquery.table`
WHERE eventId = 'xyz'
  AND creationTime > '2022-03-16 20:00:00' AND creationTime < '2022-03-16 21:00:00'
GROUP BY time_interval
ORDER BY time_interval DESC
This gives me somewhat what I expect - but the number_of_unique_userIds seems too low, so I'm worried that I'm not getting unique userIds for each interval. What I'm thinking is that userIds counted in the first 5-minute interval are not counted again in the next. So I'm not sure this query is sufficient for my needs. Also, it's not filling the blanks with 0 number_of_unique_userIds.
I hope you can help me out here.
Thanks!
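
One way to get both the per-interval distinct counts and the zero-filled tail is to generate the interval grid first and LEFT JOIN the bucketed events onto it. A minimal sketch, reusing the table and column names from the question (the fixed timestamps stand in for the supplied date/time-range parameters):

WITH intervals AS (
  -- one row per 5-minute bucket in the requested timespan
  SELECT interval_start
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
         TIMESTAMP '2022-03-16 20:00:00',
         TIMESTAMP '2022-03-16 20:55:00',
         INTERVAL 5 MINUTE)) AS interval_start
),
bucketed AS (
  -- assign each event row to its 5-minute bucket
  SELECT
    TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) AS bucket_start,
    userId
  FROM `bigquery.table`
  WHERE eventId = 'xyz'
    AND creationTime >= '2022-03-16 20:00:00'
    AND creationTime <  '2022-03-16 21:00:00'
)
SELECT
  i.interval_start AS time_interval,
  COUNT(DISTINCT b.userId) AS number_of_unique_userIds  -- COUNT over no joined rows yields 0
FROM intervals i
LEFT JOIN bucketed b
  ON b.bucket_start = i.interval_start
GROUP BY i.interval_start
ORDER BY i.interval_start;

A user is counted once in every interval in which they have at least one row, and buckets without events come back as 0 instead of disappearing.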

How to get time difference inside window

I have a table like the following. I would like to get the duration from start to finish for each type. I can get the finish time with the aggregate function max on time.
type  event   attribute  time
A     start   start      2019-04-21 23:58:33.0
A     result  process1   2019-04-22 23:58:33.0
A     result  process2   2019-04-23 23:58:33.0
A     result  process3   2019-04-24 23:58:33.0
B     result  process1   2019-04-26 23:58:33.0
B     start   start      2019-04-25 23:58:33.0
B     result  process2   2019-04-27 23:58:33.0
I created the following queries and joined them:
-- tmp1: the start event per type
select type, event, attribute, time
from table
where event in ('start')

-- tmp2: the latest result event per type
select type, event, attribute, max(time) as time
from table
where event in ('result')
group by type, event, attribute

-- after joining tmp1 and tmp2 on type
select tmp2.time - tmp1.time as duration
But I guess a window function would be useful here. To simplify my query, I'd like to refactor it with a window function.
Is there a good way to achieve this?
Thanks
If you consider the start time to be the min value of time grouped by type, then you don't need a window function, only aggregate functions:
SELECT type
, min(time) AS start
, max(time) AS finish
FROM table
GROUP BY type ;
If you consider the start time as the time associated with the start event, and the finish time as the max time associated with the result event per type, then you can use the FILTER clause on the aggregates:
SELECT type
     , min(time) FILTER (WHERE event = 'start') AS start
     , max(time) FILTER (WHERE event = 'result') AS finish
FROM table
GROUP BY type ;
PS: as stated in the manual, any aggregate function can be used as a window function:
"any built-in or user-defined ordinary aggregate (i.e., not ordered-set or hypothetical-set aggregates) can be used as a window function"
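
Since the question asked for a window-function refactor, the same FILTER trick also works with these aggregates used as window functions, keeping every source row - a sketch, assuming the table and columns from the question:

-- min/max as window aggregates over PARTITION BY type:
-- each row keeps its columns and gains the per-type duration
SELECT type, event, attribute, time
     , max(time) FILTER (WHERE event = 'result') OVER (PARTITION BY type)
     - min(time) FILTER (WHERE event = 'start')  OVER (PARTITION BY type) AS duration
FROM table;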

BQ: Select latest date from multiple columns

Good day, all. I wrote a question relating to this earlier, but now I have encountered another problem.
I have to calculate the timestamp difference between the install_time and contributor_time columns. HOWEVER, I have three contributor_time columns, and I need to select the latest time from those columns first, then subtract it from install_time.
Sample Data
users  install_time  contributor_time_1  contributor_time_2  contributor_time_3
1      8:00          7:45                7:50                7:55
2      10:00         9:15                9:45                9:30
3      11:00         10:30               null                null
For example, in the table above I would need to select contributor_time_3 and subtract it from install_time for user 1. For user 2, I would do the same, but with contributor_time_2.
Sample Results
users  install_time  time_diff_min
1      8:00          5
2      10:00         15
3      11:00         30
The problems I am facing are that 1) the contributor_time columns are in string format and 2) some of them have 'null' string values (which means I cannot cast them to a timestamp).
I created a query, but I am facing an error stating that I cannot subtract a string from a timestamp. So I added safe_cast; however, the time_diff_min results only show when all three contributor_time columns cast to a timestamp. For example, in the sample table above, only the first two rows will pull.
The query I have so far is below:
SELECT
  users,
  install_time,
  TIMESTAMP_DIFF(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), MINUTE) as ctct_min
FROM
  (SELECT
     users,
     install_time,
     safe_cast(contributor_time_1 as timestamp) as contributor_time_1,
     safe_cast(contributor_time_2 as timestamp) as contributor_time_2,
     safe_cast(contributor_time_3 as timestamp) as contributor_time_3
   FROM
     (SELECT
        users,
        install_time,
        case when contributor_time_1 = 'null' then '0' else contributor_time_1 end as contributor_time_1,
        ....
      FROM datasource
Any help to point me in the right direction is appreciated! Thank you in advance!
Consider below
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    greatest(
      parse_time('%H:%M', contributor_time_1),
      parse_time('%H:%M', contributor_time_2),
      parse_time('%H:%M', contributor_time_3)
    ),
    minute) as time_diff_min
from `project.dataset.table`
If applied to the sample data in your question, the output is as expected.
Above can be refactored slightly into below
create temp function latest_time(arr any type) as ((
  select parse_time('%H:%M', val) time
  from unnest(arr) val
  order by time desc
  limit 1
));
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
    minute) as time_diff_min
from `project.dataset.table`
This is less verbose, with no redundant parsing, and gives the same result - so it is just a matter of preference.
You can use greatest():
select t.*,
time_diff(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), minute) as diff_min
from t;
Note: this assumes that the values are never NULL. In BigQuery, greatest() returns NULL if any argument is NULL, so the 'null' strings in your sample data would need to be dealt with first.
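
If those 'null' strings do need handling, the latest_time function from the first answer could simply skip them before picking the latest value - a sketch building on that approach:

-- same as latest_time above, but ignoring the literal 'null' strings,
-- so a row like user 3 still gets a result from its remaining columns
create temp function latest_time(arr any type) as ((
  select parse_time('%H:%M', val) time
  from unnest(arr) val
  where val != 'null'  -- drop the 'null' placeholders
  order by time desc
  limit 1
));
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
    minute) as time_diff_min
from `project.dataset.table`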

Azure stream analytics query using Tumbling window

In our application, multiple IoT devices publish data to IoT Hub. They emit readings for rooms (for example, power usage).
Now we have a requirement to find the total energy consumed in an area in the last hour and log it.
Suppose there is a light bulb which was switched on at 8:00 AM drawing 60 watts, was switched off at 8:20 AM for 10 minutes, and was switched on again dimmed at 8:30 AM, drawing 40 watts. The energy (in watt-hours) consumed between 8 and 9 AM should be:
60*20/60 (for 8:00 AM to 8:20 AM) + 0 (8:20 to 8:30) + 40*30/60 (8:30 to 9:00) = 40 watt-hours.
How can we write a Stream Analytics query (using a tumbling window) to achieve this?
You can use a HoppingWindow to produce an event every minute repeating the latest signal from the device, and then a TumblingWindow to get hourly aggregates.
-- First query produces an event every minute with the latest known value up to 1 hour back
WITH MinuteData AS
(
    SELECT deviceId, TopOne() OVER (ORDER BY ts DESC) AS lastRecord
    FROM input TIMESTAMP BY ts
    GROUP BY deviceId, HoppingWindow(minute, 60, 1)  -- 60-minute window hopping every 1 minute
)
-- Second query averages the 60 one-minute watt samples per device into watt-hours per hour
SELECT
    deviceId,
    SUM(lastRecord.watt) / 60
FROM MinuteData
GROUP BY deviceId, TumblingWindow(hour, 1)

Time averaging non-continuous data with PostgreSQL9.2

I have multiple datasets of real data at 1-second time resolution. This data will often have gaps in the time series where the instrument dropped data, or when the instrument was turned off, resulting in a patchy (albeit still very useful) dataset. The resulting data might look like the following:
Timestamp [timestamp]   datastream1 [double precision]   datastream2 [double precision]   etc.
2011-01-01 00:00:01     153.256                          1255.325
2011-01-01 00:00:02     152.954                          1254.288
2011-01-01 00:00:03     151.738                          1248.951
2011-01-01 00:00:04     150.015                          1249.185
2011-01-01 00:10:08     179.132                          1328.115
2011-01-01 00:10:09     178.051                          1323.125
2011-01-01 00:10:10     180.870                          1336.983
2011-01-04 09:19:02     152.198                          1462.814
2011-01-04 09:19:03     158.014                          1458.122
2011-01-04 09:19:04     156.070                          1464.174
Please note: this data is generally continuous but will have random gaps which must be dealt with.
I need to write code to take the average and stdev over a given time interval, "timeInt", that is able to deal with these gaps. For example, if I wanted a 10-minute average of the data, my required output would be:
Timestamp_10min : avg_data1 : stdev_data1 : count_data1
where avg_data1 would be the average of all the data points within a given 10-minute period, and count_data1 would be the number of points used in the calculation of that average (i.e. 600 if there is no missing data, 300 if every second point is missing, etc.).
This code needs to work with any desired input interval (i.e. x minutes, y days, z weeks, months, years, etc.).
Currently I am only able to output minute averages, using the following code:
CREATE OR REPLACE VIEW "DATATABLE_MIN" AS
SELECT MIN("DATATABLE"."Timestamp") AS "Timestamp_min",
       avg("DATATABLE"."datastream1") AS "datastream1_avg_min",
       stddev("DATATABLE"."datastream1") AS "datastream1_stdev_min",
       count("DATATABLE"."datastream1") AS "datastream1_count_min"
FROM "DATATABLE"
GROUP BY to_char("DATATABLE"."Timestamp", 'YYYY-MM-DD HH24:MI'::text);
Thanks in advance for the help!
To group by 10 minutes, you can use the "epoch":
SELECT MIN(dt."Timestamp") AS "Timestamp_min",
       avg(dt."datastream1") AS "datastream1_avg_min",
       stddev(dt."datastream1") AS "datastream1_stdev_min",
       count(dt."datastream1") AS "datastream1_count_min"
FROM "DATATABLE" dt
GROUP BY trunc(extract(epoch from dt."Timestamp") / (60*10));
The epoch is the number of seconds since a fixed point in the past. If you divide it by 600 and truncate, all rows within the same 10-minute interval get the same number -- which is what you need for the aggregation.
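
Since the question asks for this to work with any desired interval, the 600 can be treated as the interval length in seconds, and the truncated epoch converted back into a timestamp for readable output. A sketch along those lines, using the question's table and column names (on PostgreSQL 14+ date_bin() does this bucketing directly, but the following also works on 9.2):

-- replace 600 with any interval length in seconds (3600 = 1 hour, 86400 = 1 day, ...)
SELECT to_timestamp(trunc(extract(epoch FROM dt."Timestamp") / 600) * 600) AS "Timestamp_bucket",
       avg(dt."datastream1")    AS "datastream1_avg",
       stddev(dt."datastream1") AS "datastream1_stdev",
       count(dt."datastream1")  AS "datastream1_count"
FROM "DATATABLE" dt
GROUP BY 1
ORDER BY 1;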