In Azure Stream Analytics, grouping by sliding window picks up rows from multiple windows

I have the following queries in Azure Stream Analytics. DataInput1 returns only 1 row (I output it to a blob and can see it), but CalcData appears to process many more rows, as if it were taking rows from multiple sliding windows. When events are spaced out I get the right output, but when events occur close together the sliding window result looks wrong.
WITH DataInput1 AS (SELECT
CONCAT(fqn, '_HealthIndex') AS fqn,
value as value,
count(value) as cntvalue
FROM DataInput
GROUP BY fqn,value,SlidingWindow(Duration( hour, 8 ))
),
CalcData AS
(SELECT
fqn,
count(*) as records,
sum(value) as alm,
100 - sum(case when cast(value as bigint)=19 and cast(cntvalue as bigint) > 1 then 5
when cast(value as bigint)=23 and cast(cntvalue as bigint) > 1 then 5
when cast(value as bigint)=64 and cast(cntvalue as bigint) > 1 then 10
when cast(value as bigint)=72 and cast(cntvalue as bigint) > 1 then 10
when cast(value as bigint)=77 and cast(cntvalue as bigint) > 0 then 5
when cast(value as bigint)=78 and cast(cntvalue as bigint) > 0 then 5
when cast(value as bigint)=83 and cast(cntvalue as bigint) > 16 then 5
when cast(value as bigint)=84 and cast(cntvalue as bigint) > 16 then 5
when cast(value as bigint)=91 and cast(cntvalue as bigint) > 0 then 30
when cast(value as bigint)=92 and cast(cntvalue as bigint) > 1 then 5
when cast(value as bigint)=101 and cast(cntvalue as bigint) > 1 then 15 else 0 end ) as value
,System.TimeStamp as t
from DataInput1 group by fqn,SlidingWindow(Duration( hour, 8 ))
)
Any insight into why CalcData is not taking only the output from DataInput1 would be greatly appreciated.

The CalcData step only takes in data from the output of the DataInput1 step. However, you are grouping your events by a sliding window in DataInput1. A sliding window produces an output every time an event enters or leaves the window, so a single event can be included in multiple sliding windows. To make sure that an event is included in at most one window, consider grouping by a tumbling window instead.
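The difference can be illustrated with a small simulation. This is only a sketch in Python, not Stream Analytics code: the function names are hypothetical, timestamps are plain hour numbers, and the window boundaries are simplified (a sliding window covers the `duration` hours up to and including its end time; tumbling buckets use integer division).

```python
def sliding_window_outputs(event_times, duration):
    """Return (window_end, members) for every point where the window content changes."""
    # A sliding window emits when an event enters (at t) or leaves (at t + duration).
    change_points = sorted(set(event_times) | {t + duration for t in event_times})
    outputs = []
    for end in change_points:
        # Window covers (end - duration, end]; empty windows produce no output.
        members = [t for t in event_times if end - duration < t <= end]
        if members:
            outputs.append((end, members))
    return outputs


def tumbling_window_outputs(event_times, duration):
    """Return (bucket_index, members); each event falls into exactly one bucket."""
    buckets = {}
    for t in event_times:
        buckets.setdefault(t // duration, []).append(t)
    return sorted(buckets.items())


events = [0, 1]  # two events one hour apart
sliding = sliding_window_outputs(events, 8)
tumbling = tumbling_window_outputs(events, 8)
print(sliding)   # each event shows up in more than one output
print(tumbling)  # each event shows up exactly once
```

With two events one hour apart and an 8-hour duration, the sliding version emits three outputs and each event appears in two of them, which is why the downstream step sees more rows than expected.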

Related

Determine if column is increasing or decreasing

I have a data table with a column ORDER that is supposed to indicate whether the values are increasing or decreasing, and another column ORDER_BASIS. However, the data in ORDER is often incorrect in the first place, so I am trying to determine the correct order using ORDER_BASIS.
Here's what the table looks like:
ORDER       ORDER_BASIS
INCREASING  8
INCREASING  16
INCREASING  12
INCREASING  5
INCREASING  1
INCREASING  1
INCREASING  10
INCREASING  16
INCREASING  16
I am trying to achieve this:
ORDER       ORDER_BASIS  CORRECT_ORDER
INCREASING  8            INCREASING
INCREASING  16           INCREASING
INCREASING  12           DECREASING
INCREASING  5            DECREASING
INCREASING  1            DECREASING
INCREASING  1            DECREASING
INCREASING  10           INCREASING
INCREASING  16           INCREASING
INCREASING  16           INCREASING
The first row may use ORDER; the following rows should determine whether the value is increasing or decreasing. If the value did not increase or decrease, it should keep its current status until there is a change in value.
My current logic uses LAG and LEAD:
SELECT
LEAD (ORDER_BASIS, 1, 0) AS NEXT_BASIS,
LAG (ORDER_BASIS, 1, 0) AS PREV_BASIS
FROM
DATA_TABLE
I then created a condition but cannot get it to work correctly:
CASE
WHEN CAST(PREV_BASIS AS int) = 0
OR (CAST(PREV_BASIS AS int) >= CAST(ORDER_BASIS AS int)
AND CAST(NEXT_BASIS AS int) <= CAST(ORDER_BASIS AS int))
THEN ORDER_BASIS
ELSE 'OPPOSITE_DIRECTION'
END AS CORRECT_ORDER
Using SQL Server 2014
If your query has no ORDER BY clause, the order of the rows is not guaranteed and can differ between runs. To solve this problem you need a column that guarantees the initial order of the rows; then the direction can be computed:
select * ,
case when ORDER_BASIS > LAG(ORDER_BASIS,1,-1) over (order by <the column>)
then 'INCREASING'
when ORDER_BASIS = LAG(ORDER_BASIS,1,-1) over (order by <the column>)
then 'No change'
else 'DECREASING' end CORRECT_ORDER
from DATA_TABLE
With sequential processing functions like LAG and LEAD, the sequence is the most important factor to maintain, and it is the one item that was left out of the original post. In SQL Server, window functions operate on their own partition (grouping) and sort criteria, so when visually correlating the data it is important to use the same sort criteria in the outer query as in the window functions.
The following solution can be explored in this fiddle: http://sqlfiddle.com/#!18/5e1ee/31
To validate your input conditions, run the query to output the LAG and LEAD results:
SELECT
[Id],[Order]
, LAG (ORDER_BASIS, 1, NULL) OVER (ORDER BY [Id]) AS PREV_BASIS
, [Order_Basis]
, LEAD (ORDER_BASIS, 1, NULL) OVER (ORDER BY [Id]) AS NEXT_BASIS
FROM DATA_TABLE;
Id  Order       PREV_BASIS  Order_Basis  NEXT_BASIS
1   INCREASING  (null)      8            16
2   INCREASING  8           16           12
3   INCREASING  16          12           5
4   INCREASING  12          5            1
5   INCREASING  5           1            1
6   INCREASING  1           1            10
7   INCREASING  1           10           16
8   INCREASING  10          16           16
9   INCREASING  16          16           (null)
The next issue is that your attempted logic uses both the LAG and the LEAD values. That is not invalid, but comparing against both neighbours is usually done to smooth out a curve or to detect spikes (highs and lows).
It is not necessary to do this via a CTE, but it makes the syntax easier to read for this discussion. The integer casting can be performed within the CTE as well, although in a production environment it might be better to store the ORDER_BASIS column as an integer in the first place.
WITH Records as
(
SELECT
[Id],[Order]
, CAST(LAG (ORDER_BASIS, 1, NULL) OVER (ORDER BY [Id]) AS INT) AS PREV_BASIS
, CAST([Order_Basis] AS INT) AS [Order_Basis]
, CAST(LEAD (ORDER_BASIS, 1, NULL) OVER (ORDER BY [Id]) AS INT) AS NEXT_BASIS
FROM DATA_TABLE
)
SELECT
[Id],[Order],PREV_BASIS,[Order_Basis],NEXT_BASIS
,CASE
WHEN NEXT_BASIS > ORDER_BASIS AND PREV_BASIS > ORDER_BASIS THEN 'LOW'
WHEN NEXT_BASIS < ORDER_BASIS AND PREV_BASIS < ORDER_BASIS THEN 'HIGH'
WHEN ISNULL(PREV_BASIS, ORDER_BASIS) = ORDER_BASIS THEN 'NO CHANGE'
WHEN ISNULL(PREV_BASIS, ORDER_BASIS) >= ORDER_BASIS
AND ISNULL(NEXT_BASIS, ORDER_BASIS) <= ORDER_BASIS
THEN 'DECREASING'
WHEN ISNULL(PREV_BASIS, ORDER_BASIS) <= ORDER_BASIS
AND ISNULL(NEXT_BASIS, ORDER_BASIS) >= ORDER_BASIS
THEN 'INCREASING'
ELSE 'INDETERMINATE'
END AS CORRECT_ORDER
FROM Records
ORDER BY [Id];
Id  Order       PREV_BASIS  Order_Basis  NEXT_BASIS  CORRECT_ORDER
1   INCREASING  (null)      8            16          NO CHANGE
2   INCREASING  8           16           12          HIGH
3   INCREASING  16          12           5           DECREASING
4   INCREASING  12          5            1           DECREASING
5   INCREASING  5           1            1           DECREASING
6   INCREASING  1           1            10          NO CHANGE
7   INCREASING  1           10           16          INCREASING
8   INCREASING  10          16           16          INCREASING
9   INCREASING  16          16           (null)      NO CHANGE
You could extend this by using a LAG comparison again to determine if the NO CHANGE in the middle of the above record set is in fact a low point over a longer period.
If CORRECT_ORDER should only be a function of the previous record, then there is no need to use a LEAD evaluation at all:
WITH Records as
(
SELECT
[ID],[ORDER]
, CAST(LAG (ORDER_BASIS, 1, NULL) OVER (ORDER BY [Id]) AS INT) AS PREV_BASIS
, CAST([ORDER_BASIS] AS INT) AS [ORDER_BASIS]
FROM DATA_TABLE
)
SELECT
[ID],[ORDER],[PREV_BASIS],[ORDER_BASIS]
, CASE WHEN ORDER_BASIS < PREV_BASIS
THEN 'DECREASING'
WHEN ORDER_BASIS > PREV_BASIS
THEN 'INCREASING'
ELSE 'NO CHANGE'
END CORRECT_ORDER
FROM Records;
ID  ORDER       PREV_BASIS  ORDER_BASIS  CORRECT_ORDER
1   INCREASING  (null)      8            NO CHANGE
2   INCREASING  8           16           INCREASING
3   INCREASING  16          12           DECREASING
4   INCREASING  12          5            DECREASING
5   INCREASING  5           1            DECREASING
6   INCREASING  1           1            NO CHANGE
7   INCREASING  1           10           INCREASING
8   INCREASING  10          16           INCREASING
9   INCREASING  16          16           NO CHANGE
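The question's expected output keeps the previous direction whenever the value does not change, rather than reporting NO CHANGE. As a rough illustration of that carry-forward rule (a Python sketch of the logic only, not T-SQL; the function name `correct_order` is made up for this example and the rows are assumed to already be in their intended order):

```python
def correct_order(initial_direction, values):
    """Label each value by comparing it with the previous one, carrying the
    last known direction forward when the value does not change."""
    labels = []
    current = initial_direction  # the first row falls back to the ORDER column
    previous = None
    for value in values:
        if previous is not None:
            if value > previous:
                current = "INCREASING"
            elif value < previous:
                current = "DECREASING"
            # equal values: keep the current direction unchanged
        labels.append(current)
        previous = value
    return labels


order_basis = [8, 16, 12, 5, 1, 1, 10, 16, 16]
labels = correct_order("INCREASING", order_basis)
for value, label in zip(order_basis, labels):
    print(value, label)
```

On the sample data this reproduces the CORRECT_ORDER column from the question. In SQL the same effect needs an extra step on top of LAG, because the label of a row with an unchanged value depends on the most recent row where the value did change.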

See the distribution of secondary requests grouped by time interval in sql

I have the following table:
RequestId  Type  Date        ParentRequestId
1          1     2020-10-15  null
2          2     2020-10-19  1
3          1     2020-10-20  null
4          2     2020-11-15  3
For this example I am interested only in request types 1 and 2, to keep things simple. My task is to query a big database and see the distribution of the secondary requests based on the difference in days between each one and its parent. So the result would look like:
Interval,Percentage
0-7 days,50 %
8-15 days,0 %
16-50 days, 50 %
So for the first line of the expected result we have the request with id 2, and for the third line we have the request with id 4, because their date differences fall in those intervals.
How to achieve this?
I'm using sql server 2014.
We like to see your attempts, but by the looks of it, it seems like you're going to need to treat this table as 2 tables and do a basic GROUP BY, but make it fancy by grouping on a CASE statement.
WITH dateDiffs as (
/* perform our date calculations first, to get that out of the way */
SELECT
DATEDIFF(Day, parent.[Date], child.[Date]) as daysDiff,
1 as rowsFound
FROM (SELECT RequestID, [Date] FROM myTable WHERE Type = 1) parent
INNER JOIN (SELECT ParentRequestID, [Date] FROM myTable WHERE Type = 2) child
ON parent.requestID = child.parentRequestID
)
/* Now group and aggregate and enjoy your maths! */
SELECT
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end as myInterval,
sum(rowsFound) as totalFound,
(select sum(rowsFound) from dateDiffs) as totalRows,
1.0 * sum(rowsFound) / (select sum(rowsFound) from dateDiffs) * 100.00 as percentFound
FROM dateDiffs
GROUP BY
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end;
This seems like basically a join and group by query:
with dates as (
select 0 as lo, 7 as hi, '0-7 days' as grp union all
select 8 as lo, 15 as hi, '8-15 days' union all
select 16 as lo, 50 as hi, '16-50 days'
)
select d.grp,
count(t.RequestId) as cnt, -- counts matched rows only, so empty groups show 0
count(t.RequestId) * 1.0 / sum(count(t.RequestId)) over () as ratio
from dates d left join
(t join
t tp
on tp.RequestId = t.ParentRequestId
)
on datediff(day, tp.date, t.date) between d.lo and d.hi
group by d.grp
order by min(d.lo);
The only trick is generating all the date groups, so you have rows with zero values.
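To check the expected numbers outside the database, the same bucketing can be sketched in Python. This is an illustration only: the function name `distribution` and the tuple layout are assumptions, and percentages are taken over the child requests that fall into one of the listed intervals.

```python
from datetime import date

# (RequestId, Type, Date, ParentRequestId), copied from the question
requests = [
    (1, 1, date(2020, 10, 15), None),
    (2, 2, date(2020, 10, 19), 1),
    (3, 1, date(2020, 10, 20), None),
    (4, 2, date(2020, 11, 15), 3),
]

# (low, high, label): every interval is listed so empty ones still appear
intervals = [(0, 7, "0-7 days"), (8, 15, "8-15 days"), (16, 50, "16-50 days")]


def distribution(rows, buckets):
    """Percentage of child requests per interval of days since the parent."""
    dates_by_id = {request_id: day for request_id, _, day, _ in rows}
    counts = {label: 0 for _, _, label in buckets}
    total = 0
    for _, _, day, parent_id in rows:
        if parent_id is None:
            continue  # parent requests have nothing to compare against
        diff = (day - dates_by_id[parent_id]).days
        for low, high, label in buckets:
            if low <= diff <= high:
                counts[label] += 1
                total += 1
                break
    return {label: (100.0 * n / total if total else 0.0) for label, n in counts.items()}


result = distribution(requests, intervals)
print(result)
```

Request 2 is 4 days after its parent and request 4 is 26 days after its parent, so the two non-empty intervals each receive half of the rows and the middle interval stays at zero.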

How to do this in SQL? (run a query over a window instead of just a single aggregate)

Let's say I have some data with timestamps (yyyy/mm/dd hh:mm:ss) and error states (1 meaning an error has occurred). I have the data loaded in a dataframe I call df and want to compute a new column, Time_To_Next_Error (measured in seconds), based on the timestamp and Error_State.
Timestamp Error_State Time_To_Next_Error
2017-05-10 00:10:50 0 10
2017-05-10 00:10:55 0 5
2017-05-10 00:11:05 1 0
2017-05-10 00:11:10 0 5
2017-05-10 00:11:15 1 0
2017-05-10 00:11:20 0 15
2017-05-10 00:11:25 0 10
2017-05-10 00:11:30 0 5
2017-05-10 00:11:20 1 0
2017-05-10 00:11:20 0 0
For example, for the first observation there are 15 seconds before the first error occurs at 11:05, after which the count starts over from 0 seconds and the next "window" starts.
Is there a way to define a "window" spanning, say, the next 5 rows so I can 'look ahead' and check whether any of those 5 rows satisfies some condition (for example, one of the values is a 1, meaning an Error_State = 1 will happen soon)?
Something like this perhaps:
SELECT
*,
DATEDIFF(second,
timestamp,
MIN(CASE WHEN error > 0 THEN timestamp END) OVER(ORDER BY timestamp ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING)
) as ttne
FROM yourtable
This gets the lowest (soonest) timestamp in the following 5 rows where the error code is greater than 0, and DATEDIFFs it with the timestamp of the current row.
You could adjust the CASE WHEN to apply different logic:
--time to next error code 1
MIN(CASE WHEN error = 1 THEN ...
If there is no error code 1 in the next 5 rows this should result in a null and datediff should also then output a null
Exactly what you're saying -- a window function!
Here's some code, SQL Server style:
DECLARE @tbl TABLE (
ts datetime,
Error_st int
);
INSERT INTO @tbl
VALUES
('2017-05-10 00:10:50', 0),
('2017-05-10 00:10:55', 0),
('2017-05-10 00:11:05', 1),
('2017-05-10 00:11:10', 0),
('2017-05-10 00:11:15', 1),
('2017-05-10 00:11:20', 0),
('2017-05-10 00:11:25', 0),
('2017-05-10 00:11:30', 0),
('2017-05-10 00:11:35', 1),
('2017-05-10 00:11:40', 0);
select *, DATEDIFF(second, ts,
min(CASE WHEN error_st=1 then ts else NULL END)
over (order by ts desc)) as time_to_Next_Err
-- , min(CASE WHEN error_st=1 then ts else NULL END)
-- over (order by ts desc) as NextErrorTS
from @tbl
order by ts
Here we rely on the default behaviour of the windowed version of MIN() in SQL Server:
when an ORDER BY is given, the window is "all previous rows and the current row". Because the ordering is by descending timestamp, that means the current row and every later one. You can control the window and limit it to the "5 previous" rows if you only want to show "close-to-error" situations.
More details here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
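Since the question mentions having the data in a dataframe, the same idea can also be sketched procedurally: walk the rows from latest to earliest and remember the most recent error timestamp seen. This is a plain-Python illustration using the sample rows from the answer above; the function name `time_to_next_error` is hypothetical, and rows are assumed to be sorted by ascending timestamp.

```python
from datetime import datetime


def time_to_next_error(rows):
    """rows: list of (timestamp, error_state) sorted by ascending timestamp.
    Returns seconds until the next error for each row, or None if there is none."""
    result = [None] * len(rows)
    next_error = None
    # Scan backwards so the nearest upcoming error is always known.
    for i in range(len(rows) - 1, -1, -1):
        ts, error_state = rows[i]
        if error_state == 1:
            next_error = ts
        if next_error is not None:
            result[i] = int((next_error - ts).total_seconds())
    return result


def at(clock):
    """Helper for the sample data: build a timestamp on 2017-05-10."""
    return datetime.strptime("2017-05-10 " + clock, "%Y-%m-%d %H:%M:%S")


sample = [
    (at("00:10:50"), 0), (at("00:10:55"), 0), (at("00:11:05"), 1),
    (at("00:11:10"), 0), (at("00:11:15"), 1), (at("00:11:20"), 0),
    (at("00:11:25"), 0), (at("00:11:30"), 0), (at("00:11:35"), 1),
    (at("00:11:40"), 0),
]
seconds = time_to_next_error(sample)
print(seconds)
```

The last row has no later error, so it yields None, which corresponds to the NULL that DATEDIFF returns in the SQL version.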

How to calculate a query result on the basis of percentages

I want to calculate a query result on the basis of percentages, which are set from the admin panel of the website.
There are four statuses:
Gold, Silver, Bronze and Medallion.
My current formula is
select * , isnull( CASE WHEN RowNumber <= (@totaltopPromoters*@Gold/100) Then 1
WHEN RowNumber >= (@totaltopPromoters*@Gold/100) and RowNumber <= (@totaltopPromoters*@Gold/100) + (@totaltopPromoters*@Silver/100) THEN 2
WHEN RowNumber>=(@totaltopPromoters*@Silver/100) and RowNumber<= (@totaltopPromoters*@Gold/100)+(@totaltopPromoters*@Silver/100) + (@totaltopPromoters*@Bronze/100)THEN 3
WHEN RowNumber>=(@totaltopPromoters*@Medallion/100) and RowNumber <= (@totaltopPromoters*@Gold/100)+(@totaltopPromoters*@Silver/100) + (@totaltopPromoters*@Bronze/100)+(@totaltopPromoters*@Medallion/100) THEN 4
end ,0) as TrophyType
Can anyone guide me on this?
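One way to reason about the tiers is as cumulative upper bounds: each tier ends where the running total of the percentages so far ends, so only an upper bound needs checking per tier. The sketch below is in Python rather than T-SQL and is only an assumption about the intended behaviour; the function name `trophy_type` and the example percentages are made up for illustration.

```python
def trophy_type(row_number, total_promoters, percentages):
    """Return the 1-based tier for a row, or 0 if it falls past the last tier.
    percentages are given in tier order, e.g. [gold, silver, bronze, medallion]."""
    upper_bound = 0.0
    for tier, percent in enumerate(percentages, start=1):
        # Each tier ends at the cumulative share of all tiers up to and including it.
        upper_bound += total_promoters * percent / 100
        if row_number <= upper_bound:
            return tier
    return 0


# Hypothetical settings: 10% gold, 20% silver, 30% bronze, 20% medallion
settings = [10, 20, 30, 20]
tiers = [trophy_type(n, 100, settings) for n in (1, 10, 11, 30, 31, 60, 61, 80, 81)]
print(tiers)
```

Because the tiers are tested in order, a row that reaches the second check is already known to be above the first tier's bound, so no lower-bound comparison is needed.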

Sum case when one result bigger than the other

I'm using SQL Report Builder and wish to calculate the percentage within turnaround time (TAT).
My table looks like this:
Name count within tat
jeff 1 1
jeff 1 0
jeff 1 1
jeff 1 0
I would like it to look like this:
Name count within tat
jeff 4 2 (50%)
The code I'm using to calculate within tat is:
case
when (convert(Decimal(10,2),
(cast(datediff(minute,
(CAST(RequestDate AS DATETIME) + CAST(RequestTime AS DATETIME)),
REQUEST_TESTLINES.AuthDateTime)as float)/60/24))) >
EXP_TAT then '1' else '0' end as [withintat]
How can I sum this column ?
Are you looking for something like this?
select name , sum(count) total, sum(within_tat)*100 /sum(count) as percent
from Table1
Group by name
EDIT:
Or, if you want the output exactly as shown in your question, try this:
select name , sum(count) as total, CONCAT(sum(within_tat),' ' ,'(',sum(within_tat)*100 /sum(count), '%',')' ) as percent
from Table1
Group by name
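The aggregation itself is small enough to check by hand. As an illustration of what the grouped query computes (a Python sketch over the sample rows from the question; the function name `summarize` is hypothetical and the flag is assumed to already be numeric 0/1):

```python
from collections import defaultdict

# (name, within_tat) rows as in the question's sample table
rows = [("jeff", 1), ("jeff", 0), ("jeff", 1), ("jeff", 0)]


def summarize(records):
    """Per name: total row count and 'within (percent%)' of flagged rows."""
    totals = defaultdict(lambda: [0, 0])  # name -> [count, within]
    for name, within_tat in records:
        totals[name][0] += 1
        totals[name][1] += within_tat
    return {
        name: (count, f"{within} ({100 * within / count:.0f}%)")
        for name, (count, within) in totals.items()
    }


summary = summarize(rows)
print(summary)
```

Note that the flag has to be a number for the sum to work, which is why the CASE expression in the question should return 1 and 0 rather than the strings '1' and '0'.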
You could wrap it in another SELECT.
SELECT SUM(count), SUM(withintat)
FROM (/*Your original query*/)
Yes, you can use a CASE expression inside SUM(),
but it needs to return a number.
Change your "within tat" to something like:
select case when (convert(Decimal(10,2),
(cast(datediff(minute,
(CAST(RequestDate AS DATETIME) + CAST(RequestTime AS DATETIME)),
REQUEST_TESTLINES.AuthDateTime)as float)/60/24))) >
EXP_TAT
then 1
else 0 end as [withintat]
....
But if you need both the sum and the percentage, you will need to use this value twice, and I am sure you don't want to duplicate the code.
So using your actual query as a subquery and summing over it may be a good idea.
If you really don't want to use it as a subquery, you could use an OUTER APPLY to compute the value of withintat, something like this:
select Name
,count(*) as [count]
,sum(OA.[withintat]) as [withintat]
,sum(OA.[withintat])*100/count(*) as [percent]
,cast(sum(OA.[withintat]) as varchar(max))+' ('+cast(sum(OA.[withintat])*100/count(*) as varchar(max))+' %)' as [withintat_and_percent]
from your_actual_query
outer apply(select case when (convert(Decimal(10,2),
(cast(datediff(minute,
(CAST(RequestDate AS DATETIME) + CAST(RequestTime AS DATETIME)),
REQUEST_TESTLINES.AuthDateTime)as float)/60/24))) > EXP_TAT
then 1
else 0 end as [withintat]
)OA
where ....
I would use an IF in this case (pun aside). I tried to reduce the complexity of your comparison, but without seeing some actual data, it's a best guess.
select
name,
count(name) as count,
concat(
sum(if(datediff(minute,
(cast(RequestDate AS DATETIME) +
cast(RequestTime AS DATETIME)),
REQUEST_TESTLINES.AuthDateTime) / 60 / 24 > EXP_TAT, 1, 0 )),
' (',
format(
(sum(if(datediff(minute,
(cast(RequestDate AS DATETIME) +
cast(RequestTime AS DATETIME)),
REQUEST_TESTLINES.AuthDateTime) / 60 / 24 > EXP_TAT, 1, 0 )
)/count(name))*100,1),
'%)'
) as 'within tat'