How to find where a total condition exist - sql

I am trying to create a report that will show how long an automated sprinkler system has run for. The system is comprised of several sprinklers, with each one keeping track of only itself, and then sends that information to a database. My problem is that each sprinkler has its own run time (I.E. if 5 sprinklers all ran at the same time for 10 minutes, it would report back a total run time of 50 minutes), and I want to know only the net amount of run time - in this example, it would be 10 minutes.
The database is comprised of a time stamp and a boolean, where it records the time stamp every time a sprinkler is shut on or off (its on/off state is indicated by the 1/0 of the boolean).
So, to figure out the total net time the system was on each day - whether it was 1 sprinkler running or all of them - I need to check the database for time frames where no sprinklers were turned at all (or where ANY sprinkler at all was turned on). I would think the beginning of the query would look something like
SELECT * FROM MyTable
WHERE MyBoolean = 0
AND [ ... ]
But I'm not sure what the conditional statements that would follow the AND would be like to check the time stamps.
Is there a query I can send to the database that will report back this format of information?
EDIT:
Here's the table the data is recorded to - it's literally just a name, a boolean, and a datetime of when the boolean was changed, and that's the entire database

Every time a sprinkler turns on the number of running sprinklers increments by 1, and every time one turns off the number decrements by 1. If you transform the data so you get this:
timestamp on/off
07:00:05 1
07:03:10 1
07:05:45 -1
then you have a sequence of events in order; which sprinklers they refer to is irrelevant. (I've changed the zeros to -1 for reasons that will become evident in a moment. You can do this with "(2 * value) - 1")
Now put a running total together:
select a.timestamp, (SELECT SUM(a.on_off)
FROM sprinkler_events b
WHERE b.timestamp <= a.timestamp) as run_total
from sprinkler_events a
order by a.timestamp;
where sprinkler_events is the transformed data I listed above. This will give you:
timestamp run_total
07:00:05 1
07:03:10 2
07:05:45 1
and so on. Every row in this which has a run total of zeros is a time at which all sprinklers were turned off, which I think is what you're looking for. If you need to sum the time they were on or off, you'll need to do additional processing: search for "date difference between consecutive rows" and you'll see solutions for that.

You might consider looking for whether all the sprinklers are currently off. For example:
SELECT COUNT (DISTINCT s._NAME) AS sprinkers_currently_off
FROM (
SELECT
_NAME,
_VALUE,
_TIMESTAMP,
ROW_NUMBER() OVER (PARTITION BY _NAME ORDER BY _TIMESTAMP DESC, _VALUE) AS latest_rec
FROM sprinklers
) s
WHERE
_VALUE = 0
AND latest_rec = 1
The inner query orders the records so that you can get the latest status of all the sprinklers, and the outer query counts how many are currently off. If you have 10 sprinklers you would report them all off when this query returns 10.
You could modify this by applying a date range to the inner query if you wanted to look into the past, but this should get you on the right track.

Related

Query another table with results of an another query that include a csv column

Brief Summary:
I am currently trying to get a count of completed parts that fall within a specific time range, machine number, operation number, and matches the tool number.
For example:
SELECT Sequence, Serial, Operation,Machine,DateTime,value as Tool
FROM tbPartProfile
CROSS APPLY STRING_SPLIT(Tool_Used, ',')
ORDER BY DateTime desc
is running a query which pulls all the instances that a tool has been changed, I am splitting the CSV from Tool_Used column. I am doing this because there can be multiple changes during one operation.
Objective:
This is where the production count come into place. For example, record 1 has a to0l change of 36 on 12/12/2022. I will need to go back in to the table and get the amount of part completed that equals the OPERATION/MACHINE/TOOL and fall between the date range.
For example:
SELECT *
FROM tbPartProfile
WHERE Operation = 20 AND Machine = 1 AND Tool_Used LIKE '%36%'
ORDER BY DateTime desc
For example this query will give me the datetimes the tools LIKE 36 was changed. I will need to take this datetime and compare it previous query and get the sum of all parts that were ran in this TimeRange/Operation/Machine/Tool Used

Rolling average postgres

I am running Postgres 9.2 and I have a large table something like
CREATE TABLE sensor_values
(
ts timestamp with time zone NOT NULL,
value double precision NOT NULL DEFAULT 'NaN'::real,
sensor_id integer NOT NULL
)
I have values coming into the system constantly ie many per minute. I want to maintain a rolling standard deviation / average for the last 200 values so I can determine if new values entering the system are within say 3 standard deviations of the mean. To do so I would need the current standard deviation and mean to be constantly updated for the last 200 values.
As the table can be hundreds of millions of rows I do not want to get the last say 200 rows for a sensor ordered by time and then do vg(value), var_samp(value) for every new value coming in. I and assuming it will be faster to updated the standard deviation and mean.
I have started writing a PL/pgSQL function to update a rolling variance and mean on each new value entering the system for a particular sensor.
I can do this using code pseudo like
newavg = oldavg + (new_value - old_value)/window_size
new_variance += (new_value-old_value)*(new_value-newavg+old_value-oldavg)/(window_size-1)
This is based on
http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/
Basically the window is of size 200 values. The old_value is the first value of the window. When a new value comes in we shift the window forward one. After I get the result I store the following values for the sensor
The first value of the window.
The mean average of the window values.
The variance of the window values.
This way I don't have to constantly get there last 200 value and do a sum etc.I can reuse this values when a new sensor value come in.
My problem is when first running I have no previous window data for a sensor ie the three values above so I have to do it the slow way.
something like
WITH s AS
(SELECT value FROM sensor_values WHERE sensor_values.sensor_id = $1 AND ts >= (NOW() - INTERVAL '2 day')::timestamptz ORDER BY ts DESC LIMIT 200)
SELECT avg(value), var_samp(value) INTO last_window_average, last_window_variance FROM s;
But how could I get the last value (ealiest) to save from that select statement ?
Can I access the first row from s in PL/pgSQL.
I thought PL/pgSQL would be faster / cleaner approach but maybe its better to do this is client code ?
Are there better ways to perform this type on rolling statistic update ?
I assume, that it will not be drastically slow to re-calculated latest 200 entries each time with proper indexing. If you'll do an index, like:
CREATE INDEX i_sensor_values ON sensor_values(sensor_id, ts DESC);
you'll be able to get results fairly quickly doing:
SELECT sum("value") -- add more expressions as required
FROM sensor_values
WHERE sensor_id=$1
ORDER BY ts DESC
LIMIT 200;
You can execute this query in a loop from PL/pgSQL function.
If you'll migrate to 9.3 (or higher) any time soon, you'll be able to also use LATERAL joins for this purpose.
I do not think a covering index will do a good thing here, as table is constantly changing and IndexOnlyScan will not kick in.
It is good to check Loose Index scans also.
P.S. Column name value should be double quoted, as this is an SQL reserved word.

In Crystal Report print only first record in group and leave it summable

I have a table that lists every task an operator completed during a day. This is gathered by a Shop Floor Control program. There is also a column that has the total hours worked that day, this field comes from their time punches. The table looks something like this:
Operator 1 Bestupid 0.5 8 5/12/1986
Operator 1 BeProductive 0.1 8 5/12/1986
Operator 1 Bestupidagain 3.2 8 5/12/1986
Operator 1 Belazy 0.7 8 5/13/1986
Operator 2 BetheBest 1.7 9.25 5/12/1986
I am trying to get an efficiency out of this by summing the process hours and comparing it to the hours worked. The problem is that when I do any kind of summary on the hours worked column it sums EVERY DETAIL LINE.
I have tried:
If Previous (groupingfield) = (groupingfield) Then
HoursWorked = 0
Else
HoursWorked = HoursWorked
I have tried a global three formula trick, but neither of the above leave me with a summable field, I get "A summary has been specified on a non-recurring field"
I currently use a global variable, reset in the group header, but not WhilePrintinganything. However it is missing some records and upon occasion I will get two hoursworked > 0 in the same group :(
Any ideas?
I just want to clarify, I have three groups:
Groups: Work Center --> Operator --> Date
I can summarize the process hours across any group and that's fine. However, the hours worked prints on every detail line even though it really should only print once per Date. Therefore when I summarize the Hours Worked for an operator the total is WAY off because it is adding up 8hours for each entry instead of 8 hours for each day.
Try grouping by the operators. Then create a running total for the process hours that sum for each record and reset on change of group. In the group footer you can display the running total and any other stats for that operator you care to.
Try another running total for the daily hours but pick maximum as the type of summary. Since all the records for the day will have the same hours work the maximum will be correct. Reset with the change of the date group and you should be good to go.

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree, you have to figure out what Peak Load is first before you can start to create reports on it.
The first thing I would do is figure out how I am going to define peak load. Ex. Am I going to look at an hour by hour breakdown.
Next I would do a group by on the CreateDate formated in seconds (no milleseconds). As part of the group by I would do an avg based on number of records.
I don't think you'd need to know the peak hours; you can generate them with SQL, wrapping a the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (Sql Server syntax):
select datepart(dy,createdate) as DayOfYear,
hour(createdate) as Hour,
count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(week,-7,getdate())
group by datepart(dy,createdate), hour(createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
select count(*) as MessagesPerMinute
from message
where CreateDate > dateadd(days,-7,getdate())
group by datepart(dy,createdate),hour(createdate),minute(createdate)
)
Grouping by datepart(dy,...) is an easy way to distinguish between days without worrying about month borders. It works until you select more that a year back, but that would be unusual for performance queries.
warning, these will run slow!
this will group your data into "second" buckets and list them from the most activity to least:
SELECT
CONVERT(char(19),CreateDate,120) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(Char(19),CreateDate,120)
ORDER BY 2 Desc
this will group your data into "minute" buckets and list them from the most activity to least:
SELECT
LEFT(CONVERT(char(19),CreateDate,120),16) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19),CreateDate,120),16)
ORDER BY 2 Desc
I'd take those values and calculate what they want

Simultaneous calls from CDR

I need to come up with an analysis of simultaneus events, when having only starttime and duration of each event.
Details
I've a standard CDR call detail record, that contains among others:
calldate (timedate of each call start
duration (int, seconds of call duration)
channel (a string)
What I need to come up with is some sort of analysys of simultaneus calls on each second, for a given timedate period. For example, a graph of simultaneous calls we had yesterday.
(The problem is the same if we have visitors logs with duration on a website and wish to obtain simultaneous clients for a group of web-pages)
What would your algoritm be?
I can iterate over records in the given period, and fill an array, where each bucket of the array corresponds to 1 second in the overall period. This works and seems to be fast, but if the timeperiod is big (say..1 year), I would need lots of memory (3600x24x365x4 bytes ~ 120MB aprox).
This is for a web-based, interactive app, so my memory footprint should be small enough.
Edit
By simultaneous, I mean all calls on a given second. Second would be my minimum unit. I cannot use something bigger (hour for example) becuse all calls during an hour do not need to be held at the same time.
I would implement this on the database. Using a GROUP BY clause with DATEPART, you could get a list of simultaneous calls for whatever time period you wanted, by second, minute, hour, whatever.
On the web side, you would only have to display the histogram that is returned by the query.
#eric-z-beard: I would really like to be able to implement this on the database. I like your proposal, and while it seems to lead to something, I dont quite fully understand it. Could you elaborate? Please recall that each call will span over several seconds, and each second need to count. If using DATEPART (or something like it on MySQL), what second should be used for the GROUP BY. See note on simultaneus.
Elaborating over this, I found a way to solve it using a temporary table. Assuming temp holds all seconds from tStart to tEnd, I could do
SELECT temp.second, count(call.id)
FROM call, temp
WHERE temp.second between (call.start and call.start + call.duration)
GROUP BY temp.second
Then, as suggested, the web app should use this as a histogram.
You can use a static Numbers table for lots of SQL tricks like this. The Numbers table simply contains integers from 0 to n for n like 10000.
Then your temp table never needs to be created, and instead is a subquery like:
SELECT StartTime + Numbers.Number AS Second
FROM Numbers
You can create table 'simultaneous_calls' with 3 fields: yyyymmdd Char(8),
day_second Number, -- second of the day,
count Number -- count of simultaneous calls
Your web service can take 'count' value from this table and make some statistics.
Simultaneous_calls table will be filled by some batch program which will be started every day after end of the day.
Assuming that you use Oracle, the batch may start a PL/SQL procedure which does the following:
Appends table with 24 * 3600 = 86400 records for each second of the day, with default 'count' value = 0.
Defines the 'day_cdrs' cursor for the query:
Select to_char(calldate, 'yyyymmdd') yyyymmdd,
(calldate - trunc(calldate)) * 24 * 3600 starting_second,
duration duration
From cdrs
Where cdrs.calldate >= Trunc(Sysdate -1)
And cdrs.calldate
Iterates the cursor to increment 'count' field for the seconds of the call:
For cdr in day_cdrs
Loop
Update simultaneos_calls
Set count = count + 1
Where yyyymmdd = cdr.yyyymmdd
And day_second Between cdr.starting_second And cdr.starting_second + cdr.duration;
End Loop;