Perdurance of a mean over a threshold - sql

I hope I can make this understandable, sorry if my English isn't perfect.
I have a database composed of dated data (measured every 5 minutes since March 2017).
My boss wants me to work in C# and/or SQL, but I'm still a beginner in both (I've always worked in R).
The goal is to find the moments where the mean (over an hour or more) is above a threshold, and for how long.
I've tried doing this by first computing a moving average:
Select DATEPART(YEAR,[Date]), DATEPART(MONTH,[Date]),
       DATEPART(DAY,[Date]), DATEPART(HOUR,[Date]),
       DATEPART(MINUTE,[Date]) as "minute",
       AVG(Mesure) OVER(Order by [Date]
           ROWS between 11 PRECEDING and CURRENT ROW) as "moving_average"
from My_data_base
where Code_Prm = 920
I do have to keep the WHERE clause, because that's how I select only the values I need to work on.
From here, I don't know whether I can add the "perdurance" of the mean in SQL, for example by grouping together consecutive rows whose average is above X.
Or should I rely on C# with a series of if conditions to get what I want?
Hope this is understandable, thanks
EDIT:
The data is stored in three fields (I don't know if there is a better way to show it):
Date                 Code_prm  Mesure
2017-03-10 11:18:00  920       X
2017-03-10 11:18:00  901       X
2017-03-10 11:18:00  903       X
2017-03-10 11:23:00  920       X
The expected result would be the average over an hour, for example from 11:18 to 12:18, but only where that average is above X (I think I kind of did this with the moving average).
The next step, and what I'm really looking for, is how to tell whether the mean stays above X for more than an hour, and then for how long.
Hour is "any hour" I guess, so 12 rows, since there is a value every 5 minutes, and I'm sure there are no missing values!
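No answer to this one appears in the thread, but the usual SQL technique for "how long does a condition persist" is gaps-and-islands: number all the rows, number only the rows that pass the test, and group on the difference, which stays constant inside each uninterrupted run. A self-contained sketch using SQLite from Python (the question targets SQL Server, but the window-function logic is the same; all data, the threshold, and the table layout here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE My_data_base (Date TEXT, Code_Prm INTEGER, Mesure REAL)")

# Fabricated readings every 5 minutes: low for an hour, high for two
# hours, then low again (values and threshold are invented).
rows = []
for i in range(48):
    minute = i * 5
    ts = f"2017-03-10 {10 + minute // 60:02d}:{minute % 60:02d}:00"
    value = 50.0 if 12 <= i < 36 else 10.0
    rows.append((ts, 920, value))
conn.executemany("INSERT INTO My_data_base VALUES (?, ?, ?)", rows)

THRESHOLD = 30.0

# Step 1: hourly moving average (12 rows = 1 hour at 5-minute spacing).
# Step 2: gaps-and-islands - consecutive rows whose average exceeds the
# threshold share the same (rn - row_number) value, so GROUP BY yields
# one row per uninterrupted run with its start, end, and duration.
runs = conn.execute("""
    WITH avgd AS (
        SELECT Date,
               AVG(Mesure) OVER (ORDER BY Date
                                 ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS mov_avg,
               ROW_NUMBER() OVER (ORDER BY Date) AS rn
        FROM My_data_base
        WHERE Code_Prm = 920
    ),
    flagged AS (
        SELECT Date, rn - ROW_NUMBER() OVER (ORDER BY Date) AS grp
        FROM avgd
        WHERE mov_avg > ?
    )
    SELECT MIN(Date), MAX(Date), COUNT(*) * 5 AS minutes
    FROM flagged
    GROUP BY grp
    ORDER BY 1
""", (THRESHOLD,)).fetchall()
print(runs)
```

Each run longer than 60 minutes is then a "perdurance" of an hour or more, with the length reported directly.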

Related

Trying to understand how to create a payroll query to condense clock ins and outs

I am attempting to teach myself SQL. I am a learn-to-swim-by-jumping-into-the-deep-end kind of guy, and I think I am drowning. I currently know far too little about forming SQL commands beyond basic Select, From, Where, and Group By. I am about to watch some Udemy classes to help, but if I have a real-world example to ground some concepts, I learn much better.
I have some data in a table for when a person clocks in and out for the day. I am looking to condense the whole day into just one clock in time and one clock out time and the total hours of the day.
So I don't know much yet about how all the SQL parts work together to form a query, and I don't understand their order of execution yet.
So far all I can do is select the columns from a table and order them by date and time.
For example I have a table like this:
select Empl_id, AdjClockInDate, AdjClockInTime, AdjClockOutDate, AdjClockOutTime, TotAdjTime
From AttendDet
Where EmplCode = '33'
Order By EmplCode Asc, SearchDate desc, AdjClockInTime Asc
empl_id  AdjClockInDate  AdjClockInTime  AdjClockOutDate  AdjClockOutTime  TotAdjTime
33       07/01/2019      07:00           07/01/2019       12:00            5
33       07/01/2019      12:00           07/01/2019       12:30            .5
33       07/01/2019      12:30           07/01/2019       17:50            5
And what I wanted to get out of it is to group the common dates and empl_id together into a single days over view like this:
empl_id  AdjClockInDate  AdjClockInTime  AdjClockOutDate  AdjClockOutTime  TotAdjTime
33       07/01/2019      07:00           07/01/2019       17:50            10
Ignoring the break period of .5 hours and summing up the non-break hours.
I think I need to sort these by dates and times first, then get the first entry of the clock-in columns and the last entry of the clock-out columns.
This is about as much as I can say, as I don't really know what SQL commands I can use or how to use them.
I am hoping that getting a solution to this problem will greatly help me understand while I go through the courses, so I can make a brain connection to something tangible.
I know this is over my head right now, but seeing it broken down and understanding the parts will help me tremendously.
You seem to want a basic aggregation query:
select Empl_id, AdjClockInDate, MIN(AdjClockInTime), AdjClockOutDate,
MAX(AdjClockOutTime), SUM(TotAdjTime)
from AttendDet
where Empl_id = '33'
group by Empl_id, AdjClockInDate, AdjClockOutDate;
This is a pretty basic query and suggests that you should learn the basics of SQL before diving into the deep end.
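To see what that aggregation does on the sample rows, here's a runnable sketch using SQLite from Python (the thread is dialect-agnostic here; the query is standard SQL). One caveat worth noticing: SUM(TotAdjTime) includes the 0.5-hour break row, so it returns 10.5 rather than the 10 the asker expected; excluding breaks would need an extra filter or a separate column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE AttendDet (
    Empl_id TEXT, AdjClockInDate TEXT, AdjClockInTime TEXT,
    AdjClockOutDate TEXT, AdjClockOutTime TEXT, TotAdjTime REAL)""")
conn.executemany("INSERT INTO AttendDet VALUES (?,?,?,?,?,?)", [
    ("33", "07/01/2019", "07:00", "07/01/2019", "12:00", 5),
    ("33", "07/01/2019", "12:00", "07/01/2019", "12:30", 0.5),
    ("33", "07/01/2019", "12:30", "07/01/2019", "17:50", 5),
])

# MIN/MAX on zero-padded HH:MM strings compare correctly; the GROUP BY
# collapses the day into one row, as in the answer above.
row = conn.execute("""
    SELECT Empl_id, AdjClockInDate, MIN(AdjClockInTime), AdjClockOutDate,
           MAX(AdjClockOutTime), SUM(TotAdjTime)
    FROM AttendDet
    WHERE Empl_id = '33'
    GROUP BY Empl_id, AdjClockInDate, AdjClockOutDate
""").fetchone()
print(row)  # ('33', '07/01/2019', '07:00', '07/01/2019', '17:50', 10.5)
```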

Calculating difference in column value from one row to the next

I am using an Access DB to keep track of utility usage on hundreds of accounts. The meters on these accounts have only one consumption value per month. I need to take this month's reading and subtract the previous month's reading from it to get this month's consumption. I know that SQL Server has LEAD/LAG functions that can calculate those differences. Is there a similar function in Access, or a simple way to subtract the value in one row from the one above it?
Ex. (first column is Billed Date, second is Meter Reading, third is Consumption):
Billed Date  Meter Reading  Consumption
1/26/2014    0              1538
2/25/2014    3163           1625
3/27/2014    4567           1404
4/28/2014    5672           1105
5/26/2014    7065           1393
7/29/2014    8468           1403
I do not quite get some of your results, but I think you want something like:
SELECT Meters.MeterDate,
       Meters.MeterReading,
       (SELECT TOP 1 MeterReading
        FROM Meters m
        WHERE m.MeterDate < Meters.MeterDate
        ORDER BY m.MeterDate DESC) AS LastReading,
       [MeterReading] - Nz([LastReading], 0) AS MonthResult
FROM Meters
ORDER BY Meters.MeterDate;
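The same "previous row via correlated subquery" trick can be sanity-checked outside Access. A sketch using SQLite from Python: IFNULL stands in for Access's Nz(), LIMIT 1 for TOP 1, and the dates are rewritten in ISO form so string comparison sorts them chronologically (data from the question's table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Meters (MeterDate TEXT, MeterReading INTEGER)")
conn.executemany("INSERT INTO Meters VALUES (?, ?)", [
    ("2014-01-26", 0), ("2014-02-25", 3163), ("2014-03-27", 4567),
    ("2014-04-28", 5672), ("2014-05-26", 7065), ("2014-07-29", 8468),
])

# The correlated subquery fetches the previous row's reading; IFNULL
# turns the missing predecessor of the first row into 0.
results = conn.execute("""
    SELECT MeterDate,
           MeterReading,
           MeterReading - IFNULL((SELECT MeterReading FROM Meters m
                                  WHERE m.MeterDate < Meters.MeterDate
                                  ORDER BY m.MeterDate DESC LIMIT 1), 0) AS MonthResult
    FROM Meters
    ORDER BY MeterDate
""").fetchall()
for r in results:
    print(r)
```

The computed differences (0, 3163, 1404, 1105, 1393, 1403) match the question's Consumption column from March onward, which is why the answerer noted not quite getting some of the posted results.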

Force SQLDev to read more rows than it seems to be. SQL Dev, Win7 Ult 64

TIA for any help/advice. This is coursework, so pointers to further reading would be great!
Basically I have built and normalised a DB for a transport company, covering things like driver details, trip and package details.
I'm trying to find the drivers who've spent more than 100 days out in 6 months. Luckily the sample data only covers 6 months, so...
I have this, which works, kind of. The max result given is 27, and I can see from the data that it should be much more, so I'm wondering whether I've unintentionally limited SQL Developer's read size? Say, maybe limited it to only return from the first 100 rows or something?
SELECT driver_first_name, driver_second_name,sum(duration)
from trips
group by driver_first_name, driver_second_name, duration
order by sum(duration) desc
Thanks for your help sufleR!
trip_id  start_date  end_date  duration  driver_first_name  driver_second_name
1234     1/1/12      1/5/12    4         Ahmed              Leer
That's a sample row, so duration is how long they were on a trip. I need to sum up all the durations against a driver's name, then sort the results in descending order.
Hope that makes sense!
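For what it's worth, the likely culprit isn't a read limit at all: having duration in the GROUP BY splits each driver into one group per distinct duration value, so no single SUM can accumulate a driver's full total. A sketch using SQLite from Python with invented data (only the columns that matter here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips (trip_id INTEGER, duration INTEGER,
                driver_first_name TEXT, driver_second_name TEXT)""")
conn.executemany("INSERT INTO trips VALUES (?,?,?,?)", [
    (1, 4, "Ahmed", "Leer"), (2, 4, "Ahmed", "Leer"),
    (3, 9, "Ahmed", "Leer"), (4, 2, "Bea", "Kos"),
])

# Grouping by duration too: Ahmed's rows split into a (duration=4) group
# and a (duration=9) group, so the biggest SUM seen is 9, not 17.
with_dur = conn.execute("""
    SELECT driver_first_name, SUM(duration) FROM trips
    GROUP BY driver_first_name, driver_second_name, duration
    ORDER BY SUM(duration) DESC""").fetchall()

# Grouping by driver only gives the intended per-driver total.
without_dur = conn.execute("""
    SELECT driver_first_name, SUM(duration) FROM trips
    GROUP BY driver_first_name, driver_second_name
    ORDER BY SUM(duration) DESC""").fetchall()
print(with_dur)     # [('Ahmed', 9), ('Ahmed', 8), ('Bea', 2)]
print(without_dur)  # [('Ahmed', 17), ('Bea', 2)]
```

Dropping duration from the GROUP BY in the original query should make the large totals appear.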

identifying trends and classifying using sql

I have a table xyz with three columns: rcvr_id, mth_id and tpv. rcvr_id is an id given to a customer; mth_id stores the month number, calculated as (year - 1900) * 12 + month, so for example Dec 2011 has mth_id 1344, Jan 2012 has 1345, etc. tpv is the customer's transaction amount.
Example table:
rcvr_id  mth_id  tpv
1        1344    23
2        1344    27
3        1344    54
1        1345    98
3        1345    102
...and so on
P.S. If a customer has no transaction in a given month, his row for that month won't exist.
Now, the question: based on transactions for the months 1327 to 1350, I need to classify a customer as steady or sporadic.
Here is a description.
The above image is for one customer; I have millions of customers.
How do I go about it? I have no clue how to identify trends in SQL, or rather how to do it the best way possible.
Also, I am working on Teradata.
OK, I have found out how to get the standard deviation. Now the important question is: how do I set a standard deviation limit on my own? I can't just arbitrarily say "if the standard deviation is above 40% he is sporadic, else steady". I thought of calculating the average standard deviation over all customers and marking a customer sporadic if his is above that, but I feel there could be a better logic.
I would suggest the STDDEV_POP function - a higher value indicates a greater variation in values.
select rcvr_id, STDDEV_POP(tpv)
from yourtable
group by rcvr_id
STDDEV_POP is the function for Standard Deviation
If this doesn't differentiate enough, you may need to look at regression functions and variance.
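The cutoff question can be prototyped anywhere before running it on Teradata. One option, not stated in the thread and offered only as a sketch, is to divide the standard deviation by the mean (the coefficient of variation): being dimensionless, the same cutoff then works for small and large spenders alike. Data and the 0.5 cutoff below are invented for illustration:

```python
from statistics import mean, pstdev

# Fabricated monthly tpv series per customer (rcvr_id -> list of tpv).
customers = {
    1: [23, 98, 101, 97, 105, 99],   # fairly steady spender
    2: [27, 0, 350, 5, 0, 410],      # sporadic spender
}

CUTOFF = 0.5  # invented cutoff, not a recommendation from the thread

labels = {}
for rcvr_id, tpv in customers.items():
    # pstdev is the population standard deviation, like STDDEV_POP.
    cv = pstdev(tpv) / mean(tpv)
    labels[rcvr_id] = "sporadic" if cv > CUTOFF else "steady"
print(labels)
```

In Teradata the same ratio would be STDDEV_POP(tpv) / AVG(tpv) per rcvr_id; the cutoff itself still has to come from inspecting the distribution of that ratio across customers.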

Date range intersection in SQL

I have a table where each row has a start and stop date-time. These can be arbitrarily short or long spans.
I want to query the sum duration of the intersection of all rows with two start and stop date-times.
How can you do this in MySQL?
Or do you have to select the rows that intersect the query start and stop times, then calculate the actual overlap of each row and sum it client-side?
To give an example, using milliseconds to make it clearer.
Some rows:
ROW  START  STOP
1    1010   1240
2    950    1040
3    1120   1121
And we want to know the sum of time that these rows spent between 1030 and 1100.
Let's compute the overlap of each row:
ROW  INTERSECTION
1    70
2    10
3    0
So the sum in this example is 80.
If your example should indeed say 70 in the first row, then, assuming @range_start and @range_end as your condition parameters:
SELECT SUM(LEAST(@range_end, stop) - GREATEST(@range_start, start))
FROM Table
WHERE @range_start < stop AND @range_end > start
Using GREATEST/LEAST and the date functions, you should be able to get what you need operating directly on the date type.
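That formula checks out against the question's sample rows. A runnable sketch using SQLite from Python, where the two-argument scalar MIN()/MAX() functions play the role of MySQL's LEAST()/GREATEST():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (row_id INTEGER, start INTEGER, stop INTEGER)")
conn.executemany("INSERT INTO spans VALUES (?,?,?)", [
    (1, 1010, 1240), (2, 950, 1040), (3, 1120, 1121),
])

range_start, range_end = 1030, 1100

# Clamp each span to the query window, sum the clamped lengths; the
# WHERE clause discards spans that do not overlap the window at all
# (which would otherwise contribute negative lengths).
total = conn.execute("""
    SELECT SUM(MIN(?, stop) - MAX(?, start))
    FROM spans
    WHERE ? < stop AND ? > start
""", (range_end, range_start, range_start, range_end)).fetchone()[0]
print(total)  # 80
```

Row 1 contributes 1100-1030 = 70, row 2 contributes 1040-1030 = 10, and row 3 is filtered out by the WHERE clause, giving the expected 80.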
I fear you're out of luck.
Since you don't know the number of rows that you will be "cumulatively intersecting", you need either a recursive solution or an aggregation operator.
The aggregation operator you need is not an option, because SQL lacks the data type it is supposed to operate on (an interval type, as described in "Temporal Data and the Relational Model").
The recursive solution may be possible, but it is likely to be difficult to write, difficult for other programmers to read, and it is questionable whether the optimizer can turn such a query into an optimal data access strategy.
Or I misunderstood your question.
There's a fairly interesting solution if you know the maximum time you'll ever have. Create a table with all the numbers in it from one to your maximum time.
millisecond
-----------
1
2
3
...
1240
Call it time_dimension (this technique is often used in dimensional modelling in data warehousing.)
Then this:
SELECT COUNT(*)
FROM your_data
INNER JOIN time_dimension
    ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
WHERE time_dimension.millisecond BETWEEN 1030 AND 1100
...will give you the total number of milliseconds of running time between 1030 and 1100.
Of course, whether you can use this technique depends on whether you can safely predict the maximum number of milliseconds that will ever be in your data.
This is often used in data warehousing, as I said; it fits well with some kinds of problems. For example, I've used it for insurance systems, where a total number of days between two dates was needed and where the overall date range of the data was easy to estimate (from the earliest customer date of birth to a date a couple of years into the future, beyond the end date of any policies being sold).
Might not work for you, but I figured it was worth sharing as an interesting technique!
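The join is easy to demo on the question's sample data; a sketch using SQLite from Python. One thing the demo makes visible: the inclusive BETWEEN counts both endpoints of each overlap, so it yields 82 ticks here (71 for row 1, 11 for row 2) rather than the continuous-interval answer of 80.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_data (start INTEGER, stop INTEGER)")
conn.executemany("INSERT INTO your_data VALUES (?,?)",
                 [(1010, 1240), (950, 1040), (1120, 1121)])

# Build time_dimension up to the maximum time present in the data.
conn.execute("CREATE TABLE time_dimension (millisecond INTEGER)")
conn.executemany("INSERT INTO time_dimension VALUES (?)",
                 [(m,) for m in range(1, 1241)])

# Each joined row is one "tick" that falls inside both a data span
# and the query window, so COUNT(*) is the total covered time.
count = conn.execute("""
    SELECT COUNT(*)
    FROM your_data
    INNER JOIN time_dimension
        ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
    WHERE time_dimension.millisecond BETWEEN 1030 AND 1100
""").fetchone()[0]
print(count)  # 82 (inclusive endpoints; the continuous overlap is 80)
```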
After you added the example, it is clear that indeed I misunderstood your question.
You are not "cumulatively intersecting rows".
The steps that will bring you to a solution are:
Intersect each row's start and end points with the given start and end points. This should be doable using CASE expressions or something of that nature, in the style of:
SELECT CASE WHEN startdate < givenstartdate THEN givenstartdate ELSE startdate END AS retainedstartdate,
       CASE WHEN enddate > givenenddate THEN givenenddate ELSE enddate END AS retainedenddate
FROM ...
Cater for NULLs and that sort of stuff as needed.
With retainedstartdate and retainedenddate, use a date function to compute the length of the retained interval (which is the overlap of your row with the given time section).
SELECT the SUM() of those.