Azure Stream Analytics: compare the last event with the input data stream for a specific interval

Can anyone help me implement the following scenario with a sample query? I have added my sample code at the end, but it has never worked. I am new to Stream Analytics, and your help will be greatly appreciated.
1) Get the last event of the incoming input, grouped by CARID (by timestamp)
2) Read the speed of the last event for each CARID
3) Compare the last event's speed (for each CARID) with the input data stream received in the last 12 seconds
4) If the last event's speed is lower, send it to the output; otherwise, don't process the last event (again, this should be per CARID)
I used the TumblingWindow function to get the event stream for the past 12 seconds.
Here is the sample code I used:
WITH LastInWindow AS
(
SELECT
CARID,
MAX(cast(timestamp as datetime)) AS LastEventTime
FROM
IoTHubInput
GROUP BY
TumblingWindow(second, 12) , CARID
),
InputLog AS
(
SELECT
CARID,
max(Speed) speed
FROM
IoTHubLog
GROUP BY
TumblingWindow(second, 12),timestamp,CARID
)
SELECT
IoTHubLog.CARID,
IoTHubLog.Speed
INTO Outputlog
FROM
IoTHubLog
INNER JOIN LastInWindow
ON DATEDIFF(second, IoTHubLog, LastInWindow) BETWEEN 0 AND 12
AND cast( IoTHubLog.timestamp as datetime)=LastInWindow.LastEventTime
AND IoTHubLog.CARID = LastInWindow.CARID
INNER JOIN InputLog
ON DATEDIFF(second, IoTHubLog, InputLog) BETWEEN 0 AND 12
AND cast(IoTHubLog.timestamp as datetime)> cast( InputLog.timestamp as datetime)
AND IoTHubLog.CARID = InputLog.CARID
AND InputLog.speed > IoTHubLog.speed
GROUP BY
TumblingWindow(second, 1), IoTHubLog.CARID
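For reference, the four steps in the question can be sketched outside Stream Analytics in plain Python. This is only an illustration of the intended logic, not a Stream Analytics query. The event tuples, the function name slow_last_events, and the reading of step 4 as "the last event's speed is lower than the highest speed among the same car's other events in the preceding 12 seconds" are all assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 12  # look-back interval taken from the question


def slow_last_events(events, window=WINDOW_SECONDS):
    """events: iterable of (car_id, timestamp_seconds, speed) tuples (hypothetical shape).

    Returns {car_id: (timestamp, speed)} for cars whose last event is slower
    than the fastest earlier event within the preceding `window` seconds.
    """
    by_car = defaultdict(list)
    for car_id, ts, speed in events:
        by_car[car_id].append((ts, speed))

    result = {}
    for car_id, rows in by_car.items():
        rows.sort()                     # order each car's events by timestamp
        last_ts, last_speed = rows[-1]  # steps 1 and 2: last event and its speed
        # step 3: earlier events of the same car inside the look-back window
        recent = [s for ts, s in rows[:-1] if last_ts - ts <= window]
        # step 4: keep the last event only if it is slower than the window's maximum
        if recent and last_speed < max(recent):
            result[car_id] = (last_ts, last_speed)
    return result


sample = [
    ("A", 0, 80), ("A", 5, 90), ("A", 10, 70),  # last speed 70 < 90, so A is emitted
    ("B", 0, 40), ("B", 8, 60),                 # last speed 60 is the maximum, so B is skipped
]
print(slow_last_events(sample))
```

If step 4 is meant differently (for example, comparing against the average speed in the window), only the final condition needs to change.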

Create partitions based on column values in sql

I am very new to SQL and query writing, and after a lot of trying, I am asking for help.
As shown in the picture, I want to partition the data on is_late = 1 and show its count (that is, 2), while also capturing the value of last_status where is_late = 0, all displayed in a single row.
The task is to calculate how many times the rider was late and the time he took from the first occurrence of the estimated time to the last_status.
Desired output:
You can use the following query:
SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
last_status,
task_count,
CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
(SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id ORDER BY rider_id) AS task_count,
-- order by creation time so that num = 1 is deterministically the rider's first task
ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
MAX(last_status) OVER(PARTITION BY rider_id ORDER BY rider_id) AS last_status
FROM myTestTable) t
WHERE num = 1

Determining how often and how long a specific event occurs in SQL data

I am curious what the best method would be to find out how often and for how long an event occurred within a set of SQL data that is managed using Microsoft SQL Server Management Studio 17.
Below is a simplified data table to illustrate the type of thing I'd be interested in solving. Say data is collected by a sensor every 100 ms and I want to know how often and for how long the power dropped to 0.
I have a couple of ideas for doing this using CTEs and/or window functions; however, my understanding of these functions doesn't seem to translate to SQL Server Management Studio, as my code keeps raising errors at points that should theoretically be correct.
For example, I thought I could use window functions partitioned by the position number, filtered by the points where power was 0, and then take the difference between the LAST_VALUE and the FIRST_VALUE. However, the environment doesn't recognize these arguments.
I also thought about a CTE that already filters out the points where power was zero, but I couldn't get that to a remotely functional point.
CREATE TABLE SensorData
(
[TimeStamp] DATETIME ,
[Position] INT,
[POWER] INT
);
INSERT INTO SensorData ([TimeStamp], [Position], [Power])
VALUES (4, 1, 59), (101, 1, 60), (207, 1, 50), (321, 1, 58),
(428, 1, 55), (534, 1, 59), (646, 1, 51), (755, 1, 0),
(868, 1, 0), (975, 1, 0), (1081, 1, 0), (1193, 2, 45),
(1307, 2, 52), (1412, 2, 51), (1519, 2, 55), (1629, 2, 58),
(1735, 2, 0), (1851, 2, 0), (1960, 2, 0), (2066, 2, 54);
SELECT *
FROM SensorData;
How the output looks at the end isn't so important. What's important is that I know the number of events where, in this case, the power went to zero, and how long each event lasted (last TimeStamp within the event minus the first TimeStamp).
Any advice would be greatly appreciated!
Doing this in multiple CTEs to keep things nicely organized can be done as follows:
with sensorevents as (
select
[TimeStamp]
, position
, power
, lag(power,1) over (order by timestamp) as prevPower
from SensorData
)
, powerloss as (
select
*
, case when [prevPower] > 0 and power = 0 then 'power loss'
when [prevPower] = 0 and power > 0 then 'power on'
end as status
, case when [prevPower] = 0 then lag(timestamp,1) over (order by timestamp)
end as powerOffTimestamp
, case when [prevPower] > 0 and power = 0 then 0
when [prevPower] = 0 and power > 0 then timestamp - lag(timestamp,1) over (order by timestamp)
end as duration
from Sensorevents
where ([prevPower] > 0 and power = 0)
or
([prevPower] = 0 and power > 0)
)
select
*
from powerloss
where status = 'power on'
The first CTE defines a new column, prevPower, which lets us tell whether we are at an edge where power loss or power restoration occurs. The next CTE uses these edges and window functions again to find, for each power restoration event, the timestamp of the previous event (the loss), and calculates the duration from the timestamp difference.
The last select statement just filters on the power restoration events:
TimeStamp            position  power  prevPower  nextPower  status    powerOffTimestamp    duration
09/04/1903 00:00:00  2         45     0          52         power on  26/01/1902 00:00:00  15/03/1901 00:00:00
29/08/1905 00:00:00  2         54     0                     power on  02/10/1904 00:00:00  28/11/1900 00:00:00
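The edge-detection idea can also be traced in a few lines of Python, which may help when checking the SQL result by hand. This is a sketch under assumptions: the function name power_outages is made up, timestamps are kept as the plain integers from the question's INSERT statement, and rows are ordered by timestamp only (no partitioning by position), as in the query above.

```python
# (timestamp, position, power) rows copied from the question's INSERT statement
rows = [
    (4, 1, 59), (101, 1, 60), (207, 1, 50), (321, 1, 58),
    (428, 1, 55), (534, 1, 59), (646, 1, 51), (755, 1, 0),
    (868, 1, 0), (975, 1, 0), (1081, 1, 0), (1193, 2, 45),
    (1307, 2, 52), (1412, 2, 51), (1519, 2, 55), (1629, 2, 58),
    (1735, 2, 0), (1851, 2, 0), (1960, 2, 0), (2066, 2, 54),
]


def power_outages(rows):
    """Return (power_off_ts, power_on_ts, duration) per outage, ordered by timestamp."""
    outages = []
    prev_power = None  # plays the role of lag(power, 1)
    off_ts = None      # timestamp of the most recent falling edge
    for ts, _position, power in sorted(rows):
        if prev_power is not None:
            if prev_power > 0 and power == 0:
                off_ts = ts  # falling edge: power loss
            elif prev_power == 0 and power > 0 and off_ts is not None:
                outages.append((off_ts, ts, ts - off_ts))  # rising edge: power on
                off_ts = None
        prev_power = power
    return outages


for off_ts, on_ts, duration in power_outages(rows):
    print(off_ts, on_ts, duration)
```

Note that this measures each outage up to the first non-zero reading after it, which is longer than "last zero timestamp minus first zero timestamp".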
I just saw that the other reply partitions by position. To add that to this solution, modify every window function by adding a partition by position clause (it must come before order by):
, lag(power,1) over (partition by position order by timestamp) as prevPower
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=264deed484604cda3ace1fb60d674068
This is a gaps-and-islands problem. You need to assign a group to the "0" records. A handy identifier for the group is the number of non-zero values before the value. The rest is aggregation:
select position, min(timestamp), max(timestamp)
from (select sd.*,
sum(case when power <> 0 then 1 else 0 end) over (partition by position order by timestamp) as grp
from sensordata sd
) sd
where power = 0
group by position, grp;
Note that this assumes that you want the 0s per position.
Specifically because you are looking at power = 0, you can simplify the definition of the group: it is the sum of the power up to that point. This is constant for a group of adjacent rows with power = 0:
select position, min(timestamp), max(timestamp)
from (select sd.*,
sum(power) over (partition by position order by timestamp) as grp
from sensordata sd
) sd
where power = 0
group by position, grp;
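To see why the running count of non-zero rows works as a group identifier, here is a small Python sketch of the same gaps-and-islands grouping on the question's sample data. The function name zero_power_events is hypothetical and the timestamps are treated as plain integers; it illustrates the technique and is not a replacement for the SQL.

```python
from itertools import groupby

# (timestamp, position, power) rows copied from the question's INSERT statement
rows = [
    (4, 1, 59), (101, 1, 60), (207, 1, 50), (321, 1, 58),
    (428, 1, 55), (534, 1, 59), (646, 1, 51), (755, 1, 0),
    (868, 1, 0), (975, 1, 0), (1081, 1, 0), (1193, 2, 45),
    (1307, 2, 52), (1412, 2, 51), (1519, 2, 55), (1629, 2, 58),
    (1735, 2, 0), (1851, 2, 0), (1960, 2, 0), (2066, 2, 54),
]


def zero_power_events(rows):
    """Return (position, first_ts, last_ts) for each run of power == 0."""
    nonzero_seen = {}  # running count of non-zero rows per position (the "grp" column)
    tagged = []        # (position, grp, timestamp) for the zero-power rows only
    for ts, position, power in sorted(rows, key=lambda r: (r[1], r[0])):
        if power != 0:
            nonzero_seen[position] = nonzero_seen.get(position, 0) + 1
        else:
            tagged.append((position, nonzero_seen.get(position, 0), ts))

    events = []
    # zero rows that share (position, grp) form one island
    for (position, _grp), island in groupby(tagged, key=lambda t: (t[0], t[1])):
        stamps = [ts for _, _, ts in island]
        events.append((position, min(stamps), max(stamps)))
    return events


for position, start, end in zero_power_events(rows):
    print(position, start, end, end - start)
```

The number of returned tuples is the number of zero-power events, and end - start is the duration of each one.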

How to get the time difference between two rows?

I have a table like this
Task Event Time
2 opened "2018-12-14 16:23:49.058707+01"
2 closed "2018-12-14 16:24:49.058707+01"
3 opened "2018-12-14 16:25:49.058707+01"
3 Interrupted "2018-12-14 16:26:49.058707+01"
3 closed "2018-12-14 16:27:49.058707+01"
I need to get the data from the table something like this
Task Difference
2 1
The data should be fetched only when a task has exactly 2 events, opened and closed.
In that case, the time difference abs(closed - opened) should be returned.
I cannot figure out how to do it based on the Event column.
This can be done using conditional aggregation.
select task
,max(case when event = 'closed' then time end) - max(case when event = 'opened' then time end) as diff
--The aggregation can also be expressed using FILTER as shown below
--,max(time) FILTER(where event = 'closed') - max(time) FILTER (where event = 'opened')
from tbl
group by task
having count(distinct case when event in ('opened','closed') then event end) = 2
and count(distinct event) = 2
Simplified the very good answer of @VamsiPrabhala
SELECT
task,
MAX(time) - MIN(time) as difference
FROM times
GROUP BY task
HAVING array_agg(event ORDER BY time) = '{"opened","closed"}'
Group by task, but keep only those tasks that have exactly one opened and one closed event (in that order). This is checked by aggregating the events into an array ordered by time.
Because we know that there are only these two events, ordered by time, the first one (MIN) is the opened event and the last one (MAX) is the closed event.
Furthermore:
The difference between two timestamps is always an interval rather than the integer number of minutes you expect. To get the minutes, you'll need:
EXTRACT(EPOCH FROM
MAX(time) - MIN(time)
) / 60 as difference
EXTRACT(EPOCH FROM) converts the interval into seconds. To get minutes, divide it by 60.
Vamsi's solution works, but it is too complicated for my tastes. I would just go for:
select task,
max(time) FILTER(where event = 'closed') - max(time) FILTER (where event = 'opened')
from tbl
group by task
having count(*) = 2 and
min(event) = 'closed' and
max(event) = 'opened';
Or, if we don't want to depend on the string ordering:
having count(*) = 2 and
count(*) filter (where event = 'closed') = 1 and
count(*) filter (where event = 'opened') = 1 ;
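For comparison, the same filter-then-subtract logic can be written in plain Python. This is a sketch under assumptions: the function name open_close_minutes is made up, the rows are the question's sample data with the UTC offset dropped, and a task qualifies only if it has exactly one opened and one closed event and nothing else.

```python
from collections import defaultdict
from datetime import datetime

# (task, event, time) rows from the question; the UTC offset is dropped for brevity
rows = [
    (2, "opened", datetime(2018, 12, 14, 16, 23, 49, 58707)),
    (2, "closed", datetime(2018, 12, 14, 16, 24, 49, 58707)),
    (3, "opened", datetime(2018, 12, 14, 16, 25, 49, 58707)),
    (3, "Interrupted", datetime(2018, 12, 14, 16, 26, 49, 58707)),
    (3, "closed", datetime(2018, 12, 14, 16, 27, 49, 58707)),
]


def open_close_minutes(rows):
    """Return {task: minutes between opened and closed} for tasks with only those two events."""
    by_task = defaultdict(list)
    for task, event, time in rows:
        by_task[task].append((event, time))

    result = {}
    for task, events in by_task.items():
        names = sorted(name for name, _ in events)
        if names == ["closed", "opened"]:  # the HAVING condition: exactly these two events
            times = dict(events)
            delta = times["closed"] - times["opened"]
            result[task] = abs(delta.total_seconds()) / 60  # seconds to minutes
    return result


print(open_close_minutes(rows))
```

Task 3 is excluded because of its extra Interrupted event, mirroring the count(*) = 2 check.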
Yet another option is to break your table into 3 separate derived tables: One for the opened event, one for the closed event, and one for "other" events (e.g. interrupted). Then, you can join those derived tables together to get what you need. For example (using CTEs, although you can of course inline the queries):
WITH
-- sample data
tbl(Task, "Event", Time) AS
(
VALUES
(2, 'opened', '2018-12-14 16:23:49.058707+01'::TIMESTAMP),
(2, 'closed', '2018-12-14 16:24:49.058707+01'::TIMESTAMP),
(3, 'opened', '2018-12-14 16:25:49.058707+01'::TIMESTAMP),
(3, 'interrupted', '2018-12-14 16:26:49.058707+01'::TIMESTAMP),
(3, 'closed', '2018-12-14 16:27:49.058707+01'::TIMESTAMP)
),
-- derived tables
opened AS (SELECT * FROM tbl WHERE "Event" = 'opened'),
closed AS (SELECT * FROM tbl WHERE "Event" = 'closed'),
other AS (SELECT * FROM tbl WHERE "Event" NOT IN ('opened', 'closed'))
SELECT
-- uses @S-Man's EXTRACT function to get minutes from a TIMESTAMP value.
ABS(EXTRACT(epoch FROM (opened.Time - closed.Time)) / 60)
FROM opened
INNER JOIN closed ON
closed.Task = opened.task
-- use LEFT JOIN and NULL to exclude records that have an "other" status.
LEFT JOIN other ON
other.Task = opened.Task
WHERE other.Task IS NULL

How to say how much time ago in text using a date field and SQL

In the JavaScript library moment.js, we can pass in a date and get the relative difference from now in plain English. For example, if I were to input yesterday's date, I would receive the response 'yesterday' (similar to the time stamps in Facebook's news feed).
Has anyone seen any examples of this feature in SQL Server or a comparable technology? I need to understand the logic for converting dates to an English representation similar to moment.js so I can begin constructing the query.
Thank you.
Database engines are grounded in set theory, and this kind of formatting work is explicitly out of scope for them.
Database servers are also often expensive to license, so CPU time on the database is significantly more expensive than CPU time on a web server, application server, or desktop.
Database servers are typically difficult to scale outward, so the database is often a performance bottleneck for a system or application. The more CPU work you can move away from the database, the faster the application can go or the more users it can serve effectively.
Put all three of those together, and the common wisdom is that this work should be done by the calling application. Let the database just return a DateTime value. It's good at that, and can do it while still preserving its expensive and busy CPU. Let a client language, like C# or JavaScript, worry about converting that DateTime value into a string like "Yesterday" or "Tomorrow".
Generally speaking, push the formatting as close to the user/presentation level as possible.
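As a rough illustration of what that client-side conversion could look like, here is a minimal Python sketch. It is not a port of moment.js: the function name time_ago, the fixed 30-day month and 365-day year, and the choice to report "yesterday" for anything between 24 and 48 hours old are all simplifying assumptions.

```python
from datetime import datetime, timedelta


def time_ago(then, now):
    """Return a rough English description of how long before `now` the moment `then` was."""
    seconds = int((now - then).total_seconds())
    if seconds < 0:
        return "in the future"
    if seconds < 1:
        return "just now"
    # (unit name, unit length in seconds), largest first;
    # month and year lengths are approximations
    units = [
        ("year", 365 * 86400),
        ("month", 30 * 86400),
        ("day", 86400),
        ("hour", 3600),
        ("minute", 60),
        ("second", 1),
    ]
    for name, length in units:
        count = seconds // length
        if count >= 1:
            if name == "day" and count == 1:
                return "yesterday"  # elapsed-time approximation, not a calendar-day check
            suffix = "" if count == 1 else "s"
            return f"{count} {name}{suffix} ago"


now = datetime(2018, 6, 1, 12, 0, 0)
print(time_ago(now - timedelta(seconds=3), now))
print(time_ago(now - timedelta(days=1, hours=2), now))
```

The database then only has to return the DateTime column, and the application calls something like this when rendering.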
If you are open to a TVF as a helper function, here is one I use to calculate elapsed time; perhaps something like this:
Example
Declare @YourTable table (SomeDate datetime)
Insert Into @YourTable values
('2015-05-28 16:10:27'),
('2018-05-25 22:15:18'),
('2018-06-01 16:52:18'),
(dateadd(SECOND,-3,GetDate())),
(GetDate())
Select A.SomeDate
,B.*
,TimeAgo = case when years > 0 then concat(years,' years ago') else
case when months > 0 then concat(months,' months ago') else
case when days > 0 then concat(days,' days ago') else
case when hours > 0 then concat(hours,' hours ago') else
case when minutes > 0 then concat(minutes,' minutes ago') else
case when seconds > 0 then concat(seconds,' seconds ago') else 'just now'
end end end end end end
From @YourTable A
Cross Apply [dbo].[tvf-Date-Elapsed] ( A.SomeDate,GetDate()) B
Returns
The UDF if Interested
CREATE FUNCTION [dbo].[tvf-Date-Elapsed] (@D1 DateTime,@D2 DateTime)
Returns Table
Return (
with cteBN(N) as (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cteRN(R) as (Select Row_Number() Over (Order By (Select NULL))-1 From cteBN a,cteBN b,cteBN c),
cteYY(N,D) as (Select Max(R),Max(DateAdd(YY,R,@D1))From cteRN R Where DateAdd(YY,R,@D1)<=@D2),
cteMM(N,D) as (Select Max(R),Max(DateAdd(MM,R,D)) From (Select Top 12 R From cteRN Order By 1) R, cteYY P Where DateAdd(MM,R,D)<=@D2),
cteDD(N,D) as (Select Max(R),Max(DateAdd(DD,R,D)) From (Select Top 31 R From cteRN Order By 1) R, cteMM P Where DateAdd(DD,R,D)<=@D2),
cteHH(N,D) as (Select Max(R),Max(DateAdd(HH,R,D)) From (Select Top 24 R From cteRN Order By 1) R, cteDD P Where DateAdd(HH,R,D)<=@D2),
cteMI(N,D) as (Select Max(R),Max(DateAdd(MI,R,D)) From (Select Top 60 R From cteRN Order By 1) R, cteHH P Where DateAdd(MI,R,D)<=@D2),
cteSS(N,D) as (Select Max(R),Max(DateAdd(SS,R,D)) From (Select Top 60 R From cteRN Order By 1) R, cteMI P Where DateAdd(SS,R,D)<=@D2)
Select [Years] = cteYY.N
,[Months] = cteMM.N
,[Days] = cteDD.N
,[Hours] = cteHH.N
,[Minutes] = cteMI.N
,[Seconds] = cteSS.N
--,[Elapsed] = Format(cteYY.N,'0000')+':'+Format(cteMM.N,'00')+':'+Format(cteDD.N,'00')+' '+Format(cteHH.N,'00')+':'+Format(cteMI.N,'00')+':'+Format(cteSS.N,'00')
From cteYY,cteMM,cteDD,cteHH,cteMI,cteSS
)
--Max 1000 years
--Select * from [dbo].[tvf-Date-Elapsed] ('1991-09-12 21:00:00.000',GetDate())
--Select * from [dbo].[tvf-Date-Elapsed] ('2017-01-01 20:30:15','2018-02-05 22:58:35')

SQL Table-value function optimisation / improvement

We've got a query that is taking a very long time to complete with a large dataset. I think I've tracked it down to a table-valued function in SQL Server.
The query is designed to return the difference in printing usage between two dates. So if a printer had a usage of 100 at date x and 200 at date y, a row needs to be returned which reflects that it has had a usage change of 100.
These readings are taken periodically (but not every day) and stored in a table called MeterReadings. The code for the table-valued function is below. It is then called from another SQL query, which inner-joins the returned table to a devices table to get extra device information.
Any advice on how to optimise the function below would be appreciated.
ALTER FUNCTION [dbo].[DeviceUsage]
-- Add the parameters for the stored procedure here
( @StartDate DateTime , @EndDate DateTime )
RETURNS table
AS
RETURN
(
SELECT MAX(dbo.MeterReadings.ScanDateTime) AS MX,
MAX(dbo.MeterReadings.DeviceTotal - reading.DeviceTotal) AS TotalDiff,
MAX(dbo.MeterReadings.TotalCopy - reading.TotalCopy) AS CopyDiff,
MAX(dbo.MeterReadings.TotalPrint - reading.TotalPrint) AS PrintDiff,
MAX(dbo.MeterReadings.TotalScan - reading.TotalScan) AS ScanDiff,
MAX(dbo.MeterReadings.TotalFax - reading.TotalFax) AS FaxDiff,
MAX(dbo.MeterReadings.TotalMono - reading.TotalMono) AS MonoDiff,
MAX(dbo.MeterReadings.TotalColour - reading.TotalColour) AS ColourDiff,
MIN(reading.ScanDateTime) AS MN, dbo.MeterReadings.DeviceID
FROM dbo.MeterReadings INNER JOIN (SELECT * FROM dbo.MeterReadings WHERE
(dbo.MeterReadings.ScanDateTime > @StartDate) AND
(dbo.MeterReadings.ScanDateTime < @EndDate) )
AS reading ON dbo.MeterReadings.DeviceID = reading.DeviceID
WHERE (dbo.MeterReadings.ScanDateTime > @StartDate) AND (dbo.MeterReadings.ScanDateTime < @EndDate)
GROUP BY dbo.MeterReadings.DeviceID);
On the assumption that a value can only ever increase over time, it can certainly be simplified.
SELECT
DeviceID,
MIN(ScanDateTime) AS MN,
MAX(ScanDateTime) AS MX,
MAX(DeviceTotal ) - MIN(DeviceTotal) AS TotalDiff,
MAX(TotalCopy ) - MIN(TotalCopy ) AS CopyDiff,
MAX(TotalPrint ) - MIN(TotalPrint ) AS PrintDiff,
MAX(TotalScan ) - MIN(TotalScan ) AS ScanDiff,
MAX(TotalFax ) - MIN(TotalFax ) AS FaxDiff,
MAX(TotalMono ) - MIN(TotalMono ) AS MonoDiff,
MAX(TotalColour ) - MIN(TotalColour) AS ColourDiff
FROM
dbo.MeterReadings
WHERE
ScanDateTime > @StartDate
AND ScanDateTime < @EndDate
GROUP BY
DeviceID
This assumes that if you have readings on dates 1, 3, 5, 7, 9 and you want to report on 2 -> 8, then you want reading 7 - reading 3. I would have thought you wanted reading 7 - reading 1?
The above query should be fine for relatively small ranges. If you have huge ranges of time, the MAX() - MIN() will be operating on large numbers of rows. This can then possibly be improved even further with the following (using correlated sub-queries to look up just the two rows that you want).
As a side benefit, this also works even if the values can go down as well as up.
(I assume the existence of a Device table for a simpler query and faster performance.)
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
finish.DeviceTotal - start.DeviceTotal AS TotalDiff,
finish.TotalCopy - start.TotalCopy AS CopyDiff,
finish.TotalPrint - start.TotalPrint AS PrintDiff,
finish.TotalScan - start.TotalScan AS ScanDiff,
finish.TotalFax - start.TotalFax AS FaxDiff,
finish.TotalMono - start.TotalMono AS MonoDiff,
finish.TotalColour - start.TotalColour AS ColourDiff
FROM
dbo.Device AS device
INNER JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MIN(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > @startDate
AND ScanDateTime < @endDate)
INNER JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime > @startDate
AND ScanDateTime < @endDate)
This can also be modified to pick up the start as being the first date on or before @startDate, if required.
EDIT: Modification to pick the start reading as being for the first date on or before @startDate.
SELECT
Device.DeviceID,
start.ScanDateTime AS MN,
finish.ScanDateTime AS MX,
COALESCE(finish.DeviceTotal, 0) - COALESCE(start.DeviceTotal, 0) AS TotalDiff,
COALESCE(finish.TotalCopy , 0) - COALESCE(start.TotalCopy , 0) AS CopyDiff,
COALESCE(finish.TotalPrint , 0) - COALESCE(start.TotalPrint , 0) AS PrintDiff,
COALESCE(finish.TotalScan , 0) - COALESCE(start.TotalScan , 0) AS ScanDiff,
COALESCE(finish.TotalFax , 0) - COALESCE(start.TotalFax , 0) AS FaxDiff,
COALESCE(finish.TotalMono , 0) - COALESCE(start.TotalMono , 0) AS MonoDiff,
COALESCE(finish.TotalColour, 0) - COALESCE(start.TotalColour, 0) AS ColourDiff
FROM
dbo.Device AS device
LEFT JOIN
dbo.MeterReadings AS start
ON start.DeviceID = device.DeviceID
AND start.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < @startDate)
LEFT JOIN
dbo.MeterReadings AS finish
ON finish.DeviceID = device.DeviceID
AND finish.ScanDateTime = (SELECT MAX(ScanDateTime)
FROM dbo.MeterReadings
WHERE DeviceID = device.DeviceID
AND ScanDateTime < @endDate)
Your query seems to compute a cross-product of all readings in a time range for each particular device. This works semantically because the MIN and MAX aggregates don't care about duplicates, but it is very slow. If you are comparing 100 dates with themselves, you need to process 10,000 rows.
I suggest you calculate the MIN and MAX values for each metric/column over the entire time interval and then subtract them. That way you don't need a join, and you need only a single pass over the data. Like this:
select Diff = MAX(col) - MIN(col)
from readings
group by DeviceID
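To make the cost difference concrete, here is a small Python sketch of the single-pass MAX - MIN idea for one counter. The function name usage_per_device and the sample readings are hypothetical, and, like the SQL, it assumes the counter only ever increases within the range.

```python
def usage_per_device(readings, start, end):
    """readings: iterable of (device_id, scan_time, device_total) tuples (hypothetical shape).

    Returns {device_id: max_total - min_total} over start < scan_time < end,
    visiting each reading once instead of pairing readings with each other.
    """
    extremes = {}  # device_id -> (min_total, max_total)
    for device_id, scan_time, total in readings:
        if not (start < scan_time < end):
            continue  # same open interval as the WHERE clause
        low, high = extremes.get(device_id, (total, total))
        extremes[device_id] = (min(low, total), max(high, total))
    return {dev: high - low for dev, (low, high) in extremes.items()}


readings = [
    ("P1", 1, 100), ("P1", 3, 140), ("P1", 5, 170), ("P1", 7, 200), ("P1", 9, 260),
    ("P2", 4, 10), ("P2", 6, 35),
]
print(usage_per_device(readings, 2, 8))
```

With n readings per device in the range, this touches n rows, whereas the self-join in the original function produces n * n row pairs before aggregating.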