Checking against a threshold and calculating the time duration - pandas

I have a DataFrame as follows:
Timestamp Signal
2020-01-01T10:25:44.000 - 6.00 20
2020-01-01T10:25:45.000 - 6.00 15
2020-01-01T10:25:46.000 - 6.00 8
2020-01-01T10:25:47.000 - 6.00 17
2020-01-01T10:25:48.000 - 6.00 19
2020-01-01T10:25:49.000 - 6.00 19
The timestamp column is a string and has not been converted to datetime. I want to compare the signal values against a threshold, for example 12, and calculate how long the signal stays above it. For the given dataset, the durations would be [2, 3] seconds, returned as a list/array. How do I do that in Python? Any help is appreciated.

If it is guaranteed that there will be a row for every second, then you can try to count the rows instead of getting a difference in timestamps.
Either way you need to identify consecutive rows above your threshold.
df['above'] = df.Signal.gt(12)
df['stint'] = (df.above.astype(int).diff().fillna(0) != 0).cumsum()
# above is a boolean; cast to int so diff() gives +1 when stepping above 12 and -1 when stepping below
# != 0 marks each step up or down with True
# cumsum() turns those marks into a 'stint ID' of sorts, so we can group by it
Now we can parse the timestamps with df['Timestamp'] = pd.to_datetime(df['Timestamp']) and take the timestamp difference within each stint, OR, in this case, it seems easier to just count the rows:
stints = df.groupby(['stint', 'above']).Signal.size()
stints = stints.xs(True, level='above')  # keep only the stints above the threshold
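Putting the pieces together, here is a minimal end-to-end sketch with the sample data from the question (assuming one row per second and a threshold of 12):
import pandas as pd

df = pd.DataFrame({
    'Timestamp': ['2020-01-01T10:25:44.000', '2020-01-01T10:25:45.000',
                  '2020-01-01T10:25:46.000', '2020-01-01T10:25:47.000',
                  '2020-01-01T10:25:48.000', '2020-01-01T10:25:49.000'],
    'Signal': [20, 15, 8, 17, 19, 19],
})

threshold = 12
df['above'] = df.Signal.gt(threshold)
df['stint'] = (df.above.astype(int).diff().fillna(0) != 0).cumsum()

# one row per second, so the size of each above-threshold stint is its duration in seconds
stints = df.groupby(['stint', 'above']).Signal.size()
durations = stints.xs(True, level='above').tolist()
print(durations)  # [2, 3]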

Related

SQL QUERY for time totalisation calculation

This is an example of the dataset: a day's worth of timestamped samples where the value is 1 while the pump is running and 0 while it is off.
I am looking for an SQL query that calculates the integral of the curve defined by the example data.
Since the value of the curve is a boolean (0-1), calculating the integral using seconds would also result in the value of time in seconds that the pump ran.
The initial interval would be 00:00:00.000 and the final 23:59:59.000.
I guess I have to scroll the data from top to bottom, check the next or previous value, evaluate and do the time differences, but I have no idea how to do it.
In my example: from midnight until 2 o'clock the value is 0; from 2 o'clock until 4 o'clock the pump is on, so 04:00:06 - 02:00:07 = 7,199 s; then it is on from 15 o'clock until 17 o'clock, so 17:00:04 - 15:00:04 = 7,200 s; and finally it is on from 22:00:03 until 23:59:59, because there is no further 0 value.
Basically I should add up the number of seconds where the value is at 1 until the next 0 or 23:59:59.
In a day I might have more than 10000 samplings at different times, not necessarily hourly.
A cursor could scroll through the data: if the next value is 1, add the seconds from date(cursor) to date(next); if the next value is 0, do not add anything.
You can use LAG() to get the previous sample. Then computing the area under the curve should be trivial.
For example:
select t.*,
       pumpon * (localcol - lag(localcol) over (order by localcol)) as area
from t
The total area under the curve could be:
select sum(pumpon * (localcol - lag(localcol) over (order by localcol))) as total_area
from t
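The same area-under-the-curve idea can be prototyped outside the database. Here is a rough pandas sketch (the column names pumpon and localcol follow the query above, and the timestamps are made up); it weights each gap by the pump state at the start of the interval, i.e. the previous sample, which matches how the on-periods are described in the question:
import pandas as pd

# Made-up samples shaped like the question: pump on 02:00:07-04:00:06, 15:00:04-17:00:04, and from 22:00:03 onward
t = pd.DataFrame({
    'localcol': pd.to_datetime(['2020-06-01 00:00:00', '2020-06-01 02:00:07',
                                '2020-06-01 04:00:06', '2020-06-01 15:00:04',
                                '2020-06-01 17:00:04', '2020-06-01 22:00:03',
                                '2020-06-01 23:59:59']),
    'pumpon': [0, 1, 0, 1, 0, 1, 1],
})

gap_seconds = t['localcol'].diff().dt.total_seconds()  # seconds since the previous sample
state_during_gap = t['pumpon'].shift()                 # pump state at the start of each gap
total_on_seconds = (state_during_gap * gap_seconds).sum()
print(total_on_seconds)  # 7199 + 7200 + 7196 = 21595.0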

MariaDB Trigger Calculating Time Difference as a decimal of total hours and tenth of hour always rounding up

I have a table that contains these columns:
start_time datetime
end_time datetime
billable_time decimal(3,1)
The billable_time column is calculated by a trigger on insert and update as the total number of hours, to one decimal place (tenths of an hour); however, I want it to always round the decimal up rather than to the nearest tenth. When creating the trigger, I cannot figure out how to make it always round up. This trigger calculates the value, but it rounds down when that is the nearest tenth:
SET new.billable_time = CAST(TIME_TO_SEC(TIMEDIFF(new.end_time, new.start_time))/3600 as decimal(3,1))
For example:
start time = 2020-06-01 11:00:00
end_time = 2020-06-01 12:22:08
Results in
billable_time = 1.4
But it should be 1.5 if it always rounds up.
I have tried using the CEIL() and CEILING() functions, but they round it up to the nearest integer, which is 2.
I cannot seem to figure out how to make it do what I want. Is it possible to do this in a trigger? Any help would be greatly appreciated. The database is MariaDB v10.4.13.
Thanks in advance.
A simple solution is to add half of a tenth of an hour (0.05) to the value before casting it:
SET new.billable_time = CAST(
TIME_TO_SEC(TIMEDIFF(new.end_time, new.start_time))/3600 + 0.05
AS decimal(3,1)
)
You can also use ceil(): the idea is to first divide by 360 instead of 3600, then round with ceil(), and finally divide by 10:
SET new.billable_time =
CEIL(TIME_TO_SEC(TIMEDIFF(new.end_time, new.start_time))/360) / 10
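A quick sanity check of the ceil() arithmetic outside the database (Python, with a made-up duration of 1 hour 7 minutes, which shows the difference between the two rounding modes):
import math

seconds = 1 * 3600 + 7 * 60           # hypothetical duration of 1:07:00
print(round(seconds / 3600, 1))       # 1.1 -> rounded to the nearest tenth
print(math.ceil(seconds / 360) / 10)  # 1.2 -> any partial tenth is rounded up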

.Net: Read stable weight from RS232

I have a small application that can read weigh scale weights continuously.
I want users to only capture when the weight stabilizes for about 3 seconds.
How can I achieve that?
You need to store the received values with their timestamps in a queue and then calculate the min, max and average over the last three seconds.
First create a class to hold a value and its timestamp, for example called Measure.
Then create another class holding a queue of Measure. Implement functions for adding a measure to the class's internal queue and for calculating the min, max and average over a timespan. A final function can then use min, max and average to decide whether the last measure is close enough to the average within that timespan.
Instead of a queue you could use a data table and then use SQL commands to get the scalars for min, max and average.
If the values are delivered at a constant interval, you can skip the timespan handling and only calculate over the last x values. For example, if the scale delivers a new value every 0.5 seconds, you will have 6 values for the last three seconds.
A FIFO will store the values (use an array with a custom add function, or a queue). To know whether the last values are stable, you need the min, max and average over the last measures. That lets you decide whether the last value is near the average, or whether the spread between min and max is too large.
E.g. measures:
3 4 8 2 5 4 gives min=2, max=8, avg=4.3. The last value is near the average but far from the max.
5 4 6 4 5 5 gives min=4, max=6, avg=4.8. The last value is near the min, max and average, so that seems to be a good last measure.
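The question is about .NET, but as a language-neutral illustration of the idea, here is a minimal sketch of the rolling-window stability check in Python; the 3-second window and the 0.5-unit tolerance are made-up parameters:
from collections import deque
from time import monotonic

class StableWeightDetector:
    """Keeps the last few seconds of readings and reports when they are stable."""

    def __init__(self, window_seconds=3.0, tolerance=0.5):
        self.window_seconds = window_seconds  # how long the weight must hold steady
        self.tolerance = tolerance            # max allowed spread (max - min) in the window
        self.samples = deque()                # (timestamp, weight) pairs, oldest first

    def add(self, weight, timestamp=None):
        now = monotonic() if timestamp is None else timestamp
        self.samples.append((now, weight))
        # Drop readings that have fallen out of the window
        while self.samples and now - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def is_stable(self):
        if len(self.samples) < 2:
            return False
        # The stored readings must actually span (close to) the full window
        if self.samples[-1][0] - self.samples[0][0] < self.window_seconds * 0.9:
            return False
        weights = [w for _, w in self.samples]
        return max(weights) - min(weights) <= self.tolerance

# Usage: feed each value read from the serial port, capture when is_stable() is True
# detector = StableWeightDetector()
# detector.add(parsed_weight)
# if detector.is_stable(): capture the weight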

Inserting Datetime into Predefined 15min Intervals

I have a table with a datetime value and outbound call numbers. I need to round the datetime down to the lower 15-minute interval, which is fine when I use DATEADD(mi, DATEDIFF(mi, 0, [callplacedtime]) / 15 * 15, 0).
What I need to do now is, for example, if my search parameter is between 08:00:00 and 20:00:00, to see '0' for the intervals where there is no data.
At the moment, if there are no records in a specific interval, it simply doesn't show.
The trouble you're having is that when there are no call records with datetimes in a given interval, SQL doesn't know those datetimes exist or need to be shown.
Try using a numbers table to generate all the 15-minute offsets for the day, and then do a LEFT JOIN from those 15-minute values to your current dataset. That way all the times will exist, but if there are no call numbers for a 15-minute block, you'll get a NULL to interpret as you please.
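As an aside, if you post-process the results in Python anyway, the same gap-filling idea can be sketched in pandas by reindexing onto a complete 15-minute grid (the column names here are hypothetical):
import pandas as pd

# Hypothetical per-interval counts as they come back from the database
calls = pd.DataFrame({
    'interval_start': pd.to_datetime(['2020-06-01 08:15:00', '2020-06-01 09:00:00']),
    'call_count': [4, 7],
}).set_index('interval_start')

# Complete grid of 15-minute interval starts between the search parameters
grid = pd.date_range('2020-06-01 08:00:00', '2020-06-01 20:00:00', freq='15min')

# Intervals with no calls become 0 instead of silently disappearing
filled = calls.reindex(grid, fill_value=0)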
I'm not familiar with SQL, but you could have an If/ElseIf/Else statement along the lines of:
If second = 0 then displaysecond = "00"
Elseif 0 < second < 10 then displaysecond = "0" + second
Else displaysecond = second

SQL statement to collect data over granular time span

I have collected CPU usage data on my server using the following SQL Server 2008 table structure:
CREATE TABLE [tbl] (
    [id] BIGINT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    [dt] DATETIME2,   -- date/time when the CPU usage was collected
    [usage] TINYINT   -- value from 0 to 100
)
This data is logged by a service running on the server with a frequency of approx. 5 minutes.
So my goal now is to draw a graph that shows CPU usage over time. For that an end-user is allowed to specify the "from" and "to" dates for the graph, and the graph itself is broken into N data points. (For simplicity, let's say that N is 24.)
I came up with the following SQL statement to retrieve N (or 24) data points of the CPU usage to plot on my graph:
SELECT COUNT([usage]), SUM([usage]) FROM [tbl]
WHERE [dt] >= '_DateDataPoint0_' AND [dt] < '_DateDataPoint1_'
UNION ALL
SELECT COUNT([usage]), SUM([usage]) FROM [tbl]
WHERE [dt] >= '_DateDataPoint1_' AND [dt] < '_DateDataPoint2_'
-- and so on, until the Nth data point
For example, with:
"from" = June 14, 2013, 00:00:00
"to" = June 15, 2013, 00:00:00
I get the following data point date/times:
_DateDataPoint0_ = 2013-06-14 00:00:00
_DateDataPoint1_ = 2013-06-14 01:00:00
_DateDataPoint2_ = 2013-06-14 02:00:00
and so on ...
_DateDataPoint23_ = 2013-06-14 23:00:00
_DateDataPoint24_ = 2013-06-15 00:00:00
So at the end my ASP.NET script receives N pairs of data:
Count_Usage_DataPointN
Sum_Usage_DataPointN
So to get a usage number at a particular data point I do this:
Usage_N = Sum_Usage_DataPointN / Count_Usage_DataPointN
First, I'm curious if this is the correct way of doing it?
Secondly, I'm curious about a strange result from the query above: if I pick a time span of 1 hour, the resulting CPU usage data has higher spikes than for a time span of, say, 1 day or a month. Is this normal? It seems the longer the time span, the flatter my graph becomes.
This is expected because you have a fixed number of data points on your graph.
You have divided your graph into 24 points. Say you want to display 2 days' worth of data: when 48 hours of data are distributed over 24 points, each point averages two hours, and the average value (sum/count) flattens the spikes. To display each individual hourly spike you would need 48 points on your axis.
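A toy illustration of that flattening effect (Python, made-up numbers):
# Hypothetical per-hour CPU readings: mostly idle with one short spike to 95%
samples = [5, 5, 5, 95, 5, 5, 5, 5]

# 8 data points: each point is a single sample, so the spike shows up as 95
fine = samples[:]                                             # [5, 5, 5, 95, 5, 5, 5, 5]

# 2 data points: each point averages 4 samples (sum/count), so the spike shrinks to 27.5
coarse = [sum(samples[i:i + 4]) / 4 for i in range(0, 8, 4)]  # [27.5, 5.0]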