I am trying to calculate the mean time to failure for each asset in a job table. At the moment I calculate it as follows:
Previous ID =
CALCULATE (
    MAX ( 'JobTrackDB Job'[JobId] ),
    FILTER (
        'JobTrackDB Job',
        'JobTrackDB Job'[AssetDescriptionID] = EARLIER ( 'JobTrackDB Job'[AssetDescriptionID] )
            && 'JobTrackDB Job'[JobId] < EARLIER ( 'JobTrackDB Job'[JobId] )
    )
)
Then I bring back the last finish time for the current job when the JobStatus is 7 (closed):
Finish Time =
CALCULATE (
    MAX ( 'JobTrackDB JobDetail'[FinishTime] ),
    'JobTrackDB JobDetail'[JobId],
    'JobTrackDB JobDetail'[JobStatus] = 7
)
Then I bring back the previous job's finish time where the JobTypeID is 1 (Response, rather than comparing it to maintenance calls):
Previous Finish =
CALCULATE (
    MAX ( 'JobTrackDB Job'[Finish Time] ),
    FILTER (
        'JobTrackDB Job',
        'JobTrackDB Job'[AssetDescriptionID] = EARLIER ( 'JobTrackDB Job'[AssetDescriptionID] )
            && 'JobTrackDB Job'[Finish Time] < EARLIER ( 'JobTrackDB Job'[Finish Time] )
            && EARLIER ( 'JobTrackDB Job'[JobTypeID] ) = 1
    )
)
Then I calculate the time between failures, where I also disregard erroneous values:
Time between failure =
IF (
    [Previous Finish] = BLANK (),
    BLANK (),
    IF (
        'JobTrackDB Job'[Date Logged] - [Previous Finish] < 0,
        BLANK (),
        'JobTrackDB Job'[Date Logged] - [Previous Finish]
    )
)
The issue is that sometimes the calculation uses previous maintenance jobs even though I specified JobTypeID = 1 in the filter. Also, the current calculation does not take into account the time from the start of records to the first job for each asset, or from the last job until today. I am scratching my head trying to figure it out.
Any ideas???
Thanks,
Brent
Some base measures:
MaxJobID := MAX ( Job[JobID] )
MaxLogDate := MAX ( Job[Date Logged] )
MaxFinishTime := MAX ( JobDetail[Finish Time] )
Intermediate calculations:
ClosedFinishTime := CALCULATE ( [MaxFinishTime], Job[Status] = 7 )
AssetPreviousJobID := CALCULATE (
    [MaxJobID],
    FILTER (
        ALLEXCEPT ( Job, Job[AssetDescriptionID] ),
        Job[JobId] < MAX ( Job[JobID] )
    )
)
PreviousFinishTime := CALCULATE (
    [ClosedFinishTime],
    FILTER (
        ALLEXCEPT ( Job, Job[AssetDescriptionID] ),
        Job[JobId] < MAX ( Job[JobID] )
            && Job[JobType] = 1
    )
)
FailureTime := IF (
    ISBLANK ( [PreviousFinishTime] ),
    0,
    [MaxLogDate] - [PreviousFinishTime]
)
This should at least get you started. If you want to set some sort of "first day", you can replace the 0 in FailureTime with a calc like [MaxLogDate] - [OverallFirstDate], where [OverallFirstDate] could be a calculated measure or a constant.
For something that hasn't failed, you'd want an entirely different measure, since this one is based on lookback only. Something like [Days Since Last Failure], which would just be (basically) TODAY() - [ClosedFinishTime].
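To pin down the semantics the DAX should reproduce, here is a rough cross-check sketch in Python (not part of the model above; the field names are taken from the question, and the job list is a hypothetical extract of the Job table). For each job it finds the finish time of the previous closed response job (JobTypeID = 1) on the same asset and takes the gap to the current job's logged date, discarding negative gaps just as the original [Time between failure] does:

def time_between_failures(jobs):
    # jobs: dicts with AssetDescriptionID, JobId, JobTypeID, DateLogged and
    # FinishTime (FinishTime is set only once a job is closed, status 7).
    # Returns {JobId: DateLogged - previous finish} per the rule above.
    gaps = {}
    last_finish = {}  # most recent closed JobTypeID=1 finish, per asset
    for job in sorted(jobs, key=lambda j: j["JobId"]):
        asset = job["AssetDescriptionID"]
        prev = last_finish.get(asset)
        if prev is not None and job["DateLogged"] >= prev:
            gaps[job["JobId"]] = job["DateLogged"] - prev
        if job["JobTypeID"] == 1 and job.get("FinishTime") is not None:
            last_finish[asset] = job["FinishTime"]
    return gaps

If the measures and the sketch disagree for some asset, the offending row usually shows which filter (status, job type, or ordering) the DAX is missing.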
I was running a test in which I generated data that was 30 days old.
When it was sent to the SA job, all of that input was dropped, but per the settings in the event ordering blade I was expecting it all to be passed through.
Part of the job query contains:
---------------all incoming events storage query
SELECT stream.*
INTO [iot-predict-SA2-ColdStorage]
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
so my expectation is to have everything that was pushed to the SA job land in blob storage.
When I sent events that were only 5 hours old, the input was marked as late (expected) and processed.
Per the screenshot, the first marked area shows the outdated events input but no output (red); the second part shows the late events being processed.
The full query:
WITH AlertsBasedOnMin
AS (
SELECT stream.SensorGuid
,stream.Value
,stream.SensorName
,ref.AggregationTypeFlag
,ref.MinThreshold AS threshold
,ref.Count
,CASE
WHEN (ref.MinThreshold > stream.Value)
THEN 1
ELSE 0
END AS isAlert
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
JOIN [iot-predict-SA2-referenceBlob] ref ON ref.SensorGuid = stream.SensorGuid
WHERE ref.AggregationTypeFlag = 8
)
,AlertsBasedOnMax
AS (
SELECT stream.SensorGuid
,stream.Value
,stream.SensorName
,ref.AggregationTypeFlag
,ref.MaxThreshold AS threshold
,ref.Count
,CASE
WHEN (ref.MaxThreshold < stream.Value)
THEN 1
ELSE 0
END AS isAlert
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
JOIN [iot-predict-SA2-referenceBlob] ref ON ref.SensorGuid = stream.SensorGuid
WHERE ref.AggregationTypeFlag = 16
)
,alertMinMaxUnion
AS (
SELECT *
FROM AlertsBasedOnMin
UNION ALL
SELECT *
FROM AlertsBasedOnMax
)
,alertMimMaxComputed
AS (
SELECT SUM(alertMinMaxUnion.isAlert) AS EventCount
,alertMinMaxUnion.SensorGuid AS SensorGuid
,alertMinMaxUnion.SensorName
FROM alertMinMaxUnion
GROUP BY HoppingWindow(Duration(minute, 1), Hop(second, 30))
,alertMinMaxUnion.SensorGuid
,alertMinMaxUnion.Count
,alertMinMaxUnion.AggregationTypeFlag
,alertMinMaxUnion.SensorName
HAVING SUM(alertMinMaxUnion.isAlert) > alertMinMaxUnion.Count
)
,alertsMimMaxComputedMergedWithReference
AS (
SELECT System.TIMESTAMP [TimeStampUtc]
,computed.EventCount
,0 AS SumValue
,0 AS AvgValue
,0 AS StdDevValue
,computed.SensorGuid
,computed.SensorName
,ref.MinThreshold
,ref.MaxThreshold
,ref.TimeFrameInSeconds
,ref.Count
,ref.GatewayGuid
,ref.SensorType
,ref.AggregationType
,ref.AggregationTypeFlag
,ref.EmailList
,ref.PhoneNumberList
FROM alertMimMaxComputed computed
JOIN [iot-predict-SA2-referenceBlob] ref ON ref.SensorGuid = computed.SensorGuid
)
,alertsAggregatedByFunction
AS (
SELECT Count(1) AS eventCount
,stream.SensorGuid AS SensorGuid
,stream.SensorName
,ref.[Count] AS TriggerThreshold
,SUM(stream.Value) AS SumValue
,AVG(stream.Value) AS AvgValue
,STDEV(stream.Value) AS StdDevValue
,ref.AggregationTypeFlag AS flag
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
JOIN [iot-predict-SA2-referenceBlob] ref ON ref.SensorGuid = stream.SensorGuid
GROUP BY HoppingWindow(Duration(minute, 1), Hop(second, 30))
,ref.AggregationTypeFlag
,ref.[Count]
,ref.MaxThreshold
,ref.MinThreshold
,stream.SensorGuid
,stream.SensorName
HAVING
--as this is alert then this factor will be relevant to all of the aggregated queries
Count(1) >= ref.[Count]
AND (
--average
(
ref.AggregationTypeFlag = 1
AND (
AVG(stream.Value) >= ref.MaxThreshold
OR AVG(stream.Value) <= ref.MinThreshold
)
)
--sum
OR (
ref.AggregationTypeFlag = 2
AND (
SUM(stream.Value) >= ref.MaxThreshold
OR Sum(stream.Value) <= ref.MinThreshold
)
)
--stdev
OR (
ref.AggregationTypeFlag = 4
AND (
STDEV(stream.Value) >= ref.MaxThreshold
OR STDEV(stream.Value) <= ref.MinThreshold
)
)
)
)
,alertsAggregatedByFunctionMergedWithReference
AS (
SELECT System.TIMESTAMP [TimeStampUtc]
,0 AS EventCount
,computed.SumValue
,computed.AvgValue
,computed.StdDevValue
,computed.SensorGuid
,computed.SensorName
,ref.MinThreshold
,ref.MaxThreshold
,ref.TimeFrameInSeconds
,ref.Count
,ref.GatewayGuid
,ref.SensorType
,ref.AggregationType
,ref.AggregationTypeFlag
,ref.EmailList
,ref.PhoneNumberList
FROM alertsAggregatedByFunction computed
JOIN [iot-predict-SA2-referenceBlob] ref ON ref.SensorGuid = computed.SensorGuid
)
,allAlertsUnioned
AS (
SELECT *
FROM alertsAggregatedByFunctionMergedWithReference
UNION ALL
SELECT *
FROM alertsMimMaxComputedMergedWithReference
)
---------------alerts storage query
SELECT *
INTO [iot-predict-SA2-Alerts-ColdStorage]
FROM allAlertsUnioned
---------------alerts to alert events query
SELECT *
INTO [iot-predict-SA2-Alerts-EventStream]
FROM allAlertsUnioned
---------------alerts to stream query
SELECT *
INTO [iot-predict-SA2-TSI-EventStream]
FROM allAlertsUnioned
---------------all incoming events storage query
SELECT stream.*
INTO [iot-predict-SA2-ColdStorage]
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
---------------all incoming events to time insights query
SELECT stream.*
INTO [iot-predict-SA2-TSI-AlertStream]
FROM [iot-predict-SA2-input] stream TIMESTAMP BY stream.UtcTime
Since you are using "TIMESTAMP BY", the Stream Analytics job's event ordering settings take effect. Please check your job's "event ordering" settings, specifically the two below:
Events that arrive late -- the late arrival limit, between 0 seconds and 21 days.
Handling other events -- the error handling policy: drop the event, or adjust the application time to the system clock time.
I guess that, most likely, your late arrival limit was more than 5 hours, which is why those 5-hour-old events could be processed.
You may have already figured out from the above that a Stream Analytics job can only process "old" events up to 21 days late. To work around this limitation, you can consider one of the options below:
Remove TIMESTAMP BY; then all your windowed aggregates will use the enqueue time. This might produce incorrect results given your query logic.
Select "adjust" as the error handling policy. Again, this might produce incorrect results given your query logic.
Shift the application time (stream.UtcTime) to a more recent time by using the DATEADD() function, for example TIMESTAMP BY DATEADD(day, 10, UtcTime). This works well when this is a one-time task and you know the time range of your events (see the client-side sketch after this list).
Use a batch job (outside Stream Analytics) to process the data that is 30 days old.
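If resending the data is an option, the shift from the third option can also be applied client-side before the events reach the input. A minimal sketch, assuming (hypothetically) one JSON event per line with an ISO-8601 UtcTime field; adjust the field name and I/O to your actual payload:

import json
from datetime import datetime, timedelta

def shift_event_times(lines, days):
    # Shift each event's application time forward so it lands inside the
    # job's late-arrival window; equivalent in effect to reading the input
    # with TIMESTAMP BY DATEADD(day, <days>, UtcTime).
    for line in lines:
        event = json.loads(line)
        # tolerate a trailing 'Z' on the timestamp
        t = datetime.fromisoformat(event["UtcTime"].replace("Z", "+00:00"))
        event["UtcTime"] = (t + timedelta(days=days)).isoformat()
        yield json.dumps(event)

# Example: shift a capture of 30-day-old events forward by 30 days.
# with open("old-events.jsonl") as src, open("shifted.jsonl", "w") as dst:
#     for out in shift_event_times(src, days=30):
#         dst.write(out + "\n")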
After a chat with the guys from MS, it emerged that my test needed an extra step.
To have late events processed regardless of the late event settings, we need to start the job in such a way that the late events count as sent after the job was started; so in this particular case, we have to start the SA job using a custom start date and set it to 30 days ago.
I need to reproduce the Kettle DATEDIF function in the R programming language, specifically the 'datedif month' option. I thought reproducing it would be pretty easy, but I ran into some weird behaviour in Pentaho. As an example:
ID      date_1       date_2       monthly_difference_kettle   daydiff_mysql
15943   31/12/2013   28/07/2014   7                           209
15943   31/12/2011   27/07/2012   6                           209
So in Pentaho Kettle I used the Formula step and the function DATEDIF(date2, date1, "m"). As you can see, when I calculate the difference in days in MySQL I get the same number of days (209) for both records; however, when the monthly difference is calculated via the Formula step in Pentaho Kettle I get different results in months (7 and 6, respectively). I don't understand how this is calculated...
Can anyone produce the source code for the 'DATEDIF months' function in Pentaho? I would like to reproduce it in R so that I get exactly the same results.
Thanks in advance,
best regards,
Not sure about MySQL, but I think it is the same: in PostgreSQL, a date difference gives an integer value (in days). That means both rows match exactly in days.
Calculating a month difference is non-trivial. What is a month (28, 30, or 31 days)? Should we count a month that is not complete?
The documentation states: "If there is not a complete month between the dates, 0 will be returned."
But from the source code it is easy to understand how DATEDIF is calculated:
The source code is available on GitHub: https://github.com/pentaho/pentaho-reporting/blob/f7defbcfc0e8f48ad2b139fe9820445f052e0e78/libraries/libformula/src/main/java/org/pentaho/reporting/libraries/formula/function/datetime/DateDifFunction.java
private int addFieldLoop( final GregorianCalendar c, final GregorianCalendar target, final int field ) {
  c.set( Calendar.MILLISECOND, 0 );
  c.set( Calendar.SECOND, 0 );
  c.set( Calendar.MINUTE, 0 );
  c.set( Calendar.HOUR_OF_DAY, 0 );
  target.set( Calendar.MILLISECOND, 0 );
  target.set( Calendar.SECOND, 0 );
  target.set( Calendar.MINUTE, 0 );
  target.set( Calendar.HOUR_OF_DAY, 0 );
  if ( c.getTimeInMillis() == target.getTimeInMillis() ) {
    return 0;
  }
  int count = 0;
  while ( true ) {
    c.add( field, 1 );
    if ( c.getTimeInMillis() > target.getTimeInMillis() ) {
      return count;
    }
    count += 1;
  }
}
It appends 1 month to the start date until it becomes bigger than the end date.
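The detail that explains the two sample rows is that the month is added cumulatively, with the day of month clamped to the target month's length on every step: 31/12/2013 clamps to the 28th in February 2014 and stays on the 28th afterwards, so it reaches 28/07/2014 in 7 steps, while 31/12/2011 clamps to the 29th (2012 is a leap year) and overshoots 27/07/2012 on the 7th step, giving 6. A sketch of the same loop, shown in Python for illustration (it translates to R almost line for line):

import calendar
from datetime import date

def add_one_month(d):
    # Mimics GregorianCalendar.add(Calendar.MONTH, 1): the day of month is
    # clamped to the length of the target month (Jan 31 -> Feb 28/29).
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def datedif_months(start, end):
    # Port of the addFieldLoop above for the "m" option: add one month at a
    # time (cumulatively, with clamping) and count how many additions still
    # land on or before the end date. Using date values matches the zeroed
    # time-of-day fields in the Java code.
    count = 0
    current = add_one_month(start)
    while current <= end:
        count += 1
        current = add_one_month(current)
    return count

print(datedif_months(date(2013, 12, 31), date(2014, 7, 28)))  # 7
print(datedif_months(date(2011, 12, 31), date(2012, 7, 27)))  # 6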
I am using PostgreSQL on Amazon Redshift.
My table is:
drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2018-01-26 00:39:51','2018-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34') --row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I have to compare the start and end times of consecutive records, and if the time difference is < 10 seconds, take the next record's end time, continuing until the last (final) record.
I.e., datediff(seconds, '2018-01-26 00:39:55', '2018-01-26 00:39:56') is < 10 seconds.
I tried this :
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned:
b.start > a.start means that a row will never join with itself
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.
However, there are further logic errors that the query will not successfully handle. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of DISTINCT ON, which returns one row per distinct value of a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to achieve in an SQL query (which is designed to operate on rows of results). I would recommend extracting the data and running it through a simple script (e.g. in Python) that could then output the Start & End combinations you seek.
This is an excellent use case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and only output a record when the desired end-logic has been met.
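For illustration, a minimal sketch of such a script, assuming (hypothetically) that the rows arrive as (APP_nm, start, end1) tuples already sorted by APP_nm and start:

from datetime import timedelta

GAP = timedelta(seconds=10)

def merge_sessions(rows):
    # Fold a row into the open session whenever the gap between the
    # session's end and the row's start is under 10 seconds; otherwise
    # close the session and open a new one.
    sessions = []
    current = None  # (app_nm, session_start, session_end)
    for app_nm, start, end1 in rows:
        if current and current[0] == app_nm and start - current[2] < GAP:
            current = (current[0], current[1], max(current[2], end1))
        else:
            if current:
                sessions.append(current)
            current = (app_nm, start, end1)
    if current:
        sessions.append(current)
    return sessions

# Fed the five sample rows, this returns the two expected sessions:
# ('AFH', ... 00:39:51, ... 00:40:15) and ('AFH', ... 00:40:35, ... 00:41:34).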
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using window functions.
The complete solution might look like this:
SELECT
start AS session_start,
session_end
FROM (
SELECT
start,
end1,
lead(end1, 1)
OVER (
ORDER BY end1) AS session_end,
session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN session_switch = 0 AND reverse_session_switch = 1
THEN 'start'
ELSE 'end' END AS session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN datediff(seconds, end1, lead(start, 1)
OVER (
ORDER BY end1 ASC)) > 10
THEN 1
ELSE 0 END AS session_switch,
CASE WHEN datediff(seconds, lead(end1, 1)
OVER (
ORDER BY end1 DESC), start) > 10
THEN 1
ELSE 0 END AS reverse_session_switch
FROM app_tax
)
AS sessioned
WHERE session_switch != 0 OR reverse_session_switch != 0
UNION
SELECT
start,
end1,
'start'
FROM (
SELECT
start,
end1,
row_number()
OVER (PARTITION BY APP_nm
ORDER BY end1 ASC) AS row_num
FROM APP_Tax
) AS with_row_number
WHERE row_num = 1
) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in): the rows in which the gap between end and start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where the specific switches occur. If you run the subquery by itself it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions, so they are ignored.
with_end - finally, we pair each 'start' row with the end time of the following 'end' row (thus defining the session duration), and remove the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.
I am trying to get how long an activity has been "InProgress" based on the history data I have. Each history record contains the StartTime and the "Stage" of an activity.
Stages flow like this:
Ready
InProgress
Completed
Also there is a stage named "OnHold" which puts an activity on hold. While calculating how long an activity has been "InProgress", I need to subtract the amount of time it was "OnHold".
In the given example you will see that the activity named "MA50665" went "InProgress" at "2014-07-17 13:08:04.013" and was then put on hold at "2014-07-17 13:12:14.473", which is roughly 4 minutes. It then went "InProgress" again at "2014-07-17 13:22:45.503" and was completed around "2014-07-17 13:33:38.513", which is roughly 11 minutes. That means MA50665 was InProgress for about 11+4=15 minutes.
I have a query that gets me close to what I am looking for. It gives me the two records for "MA50665" that I am expecting, but the EndTime for both records comes out as "2014-07-17 13:33:38.513", which is incorrect.
For start time "2014-07-17 13:08:04.013", EndTime should have been "2014-07-17 13:12:14.473" because that is when the "InProgress" stage ends. For the second row, StartTime and EndTime are correct.
How do I say in the query "get the end time for this stage from the next history row of that activity"? I cannot hard-code "+1" in the join.
Here is the SQLFiddle for the table schema and query: http://sqlfiddle.com/#!3/37ef3/4
I think I'm seeing a duplicate row in your example that you say works but has the "+1" in it. Records 5 & 6 seem to be the same but have different end times. Assuming that you are correct, here is a fix for the query.
SELECT ROW_NUMBER()OVER(ORDER BY T1.Seqid, T1.Historyid)Rnumber,
T1.Srid,
T1.Activityid,
T1.Seqid,
T1.Currentactstatus,
T1.Previousactstatus,
T1.Timechanged Statusstarttime,
Endtimehist.Timechanged Statusendtime
FROM T1
LEFT JOIN T1 Endtimehist ON T1.Srid = Endtimehist.Srid
AND T1.Activityid = Endtimehist.Activityid
AND T1.Currentactstatus = Endtimehist.Previousactstatus
WHERE T1.SRId = 'SR50660'
AND T1.Currentactstatus = 'InProgress'
AND T1.Previousactstatus = 'Ready'
AND T1.Historyid < Endtimehist.Historyid --This works, but I cannot hard-code +1 here, as history IDs may appear in random incrementing order
ORDER BY T1.SRId DESC, T1.SeqId, T1.HistoryId
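Whatever query you settle on, the pairing rule is easy to state as a small script, which is handy for sanity-checking the SQL. A rough Python sketch, assuming (hypothetically) the history for one activity arrives as (time_changed, current_status) tuples sorted by time; accruing time only while the status is InProgress excludes the OnHold stretches automatically:

from datetime import timedelta

def time_in_progress(history):
    # history: (time_changed, current_status) tuples for one activity,
    # sorted by time_changed (datetime values).
    total = timedelta()
    since = None  # when the current InProgress stretch began
    for time_changed, status in history:
        if status == "InProgress" and since is None:
            since = time_changed
        elif status != "InProgress" and since is not None:
            total += time_changed - since
            since = None
    return total

# MA50665 from the question: InProgress 13:08:04 -> OnHold 13:12:14 (~4 min),
# then InProgress 13:22:45 -> Completed 13:33:38 (~11 min): total ~15 minutes.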
I have to determine the stopped time of a vehicle that sends its status data back to the server every 30 seconds; this data is stored in a table in a database.
The fields of a status record consist of (vehicleID, ReceiveDate, ReceiveTime, Speed, Location).
Now what I want to do is determine each suspension time: from the point at which the vehicle's speed comes to zero until the vehicle moves again, and so on for the next suspension.
For example, on a given day a given vehicle may have 10 stops, and I must determine the duration of each with a query.
The result can be like this:
id   Recvdate     Rtime   Duration
1    2010-05-01   8:30    45min
1    2010-05-01   12:21   3hour
This is an application of window functions (called analytic functions in Oracle).
Your goal is to assign a "block number" to each sequence of stops. That is, all stops in a sequence (for a vehicle) will have the same block number, and this will be different from all other sequences of stops.
Here is a way to assign the block number:
Create a speed flag that says 1 when speed > 0 and 0 when speed = 0.
Enumerate all the records where the speed flag = 1. These are "blocks".
Do a self-join to put each flag = 0 record in a block (this requires grouping and taking the max blocknum).
Summarize by duration or however you want.
The following code is a sketch of what I mean. It won't fully solve your problem, because you haven't specified how to handle day breaks or what information you want to summarize, and it has an off-by-one error (each sequence of stops includes the previous non-stop record, if any).
with vd as
(
select vd.*,
(case when SpeedFlag = 1
then ROW_NUMBER() over (partition by id, SpeedFlag order by rtime) end) as blocknum
from
(
select vd.*, (case when speed = 0 then 0 else 1 end) as SpeedFlag
from vehicaldata vd
) vd
)
select id, blocknum, COUNT(*) as numrecs, SUM(duration) as duration
from
(
select vd.id, vd.rtime, vd.duration, MAX(vdprev.blocknum) as blocknum
from vd
left outer join vd vdprev
on vd.id = vdprev.id
and vd.rtime > vdprev.rtime
group by vd.id, vd.rtime, vd.duration
) vd
group by id, blocknum
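If it helps to see the target logic outside SQL, here is a rough Python sketch of the same idea, assuming (hypothetically) that the readings for one vehicle arrive as (timestamp, speed) tuples sorted by time; like the query above, it ignores day breaks:

def stop_durations(records):
    # A stop starts at the first speed == 0 reading and ends when the
    # vehicle next reports a non-zero speed.
    stops = []
    stop_start = None
    for t, speed in records:
        if speed == 0 and stop_start is None:
            stop_start = t
        elif speed > 0 and stop_start is not None:
            stops.append((stop_start, t - stop_start))
            stop_start = None
    return stops  # list of (stop start time, duration) pairs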