How to deal with high frequency queries - sql

I have a query that should run every 30 seconds. I want to track whether the Cycle Time, which is displayed in seconds, increases or decreases over a given time range. If it exceeds the accepted limit, I will get a notification.
To be more specific: the aim is a cycle time < 60 s. Because we are working in seconds, a one-time increase or decrease of the cycle time would not be very meaningful, so I take the last ten cycle times and calculate their average. If that average is > 60 seconds, I get notified.
But to make the tracking as accurate as possible, I need the query to run every 30 seconds. Query:
SELECT
    CAST([Cycletime (s)] AS float) AS [Cycletime (s)],
    Machine,
    Date
FROM v_Analysis
WHERE Date >= CAST(GETDATE() AS date)
My question now is how much the 30-second interval affects the performance of the database, and if it does, how we can improve the performance.
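For the averaging step described in the question, a minimal client-side sketch (in Python, with illustrative names; not part of the original setup) of the "average of the last ten cycle times" alert could look like this:

```python
from collections import deque

def make_cycle_monitor(window=10, limit=60.0):
    """Track the last `window` cycle times; flag when their average exceeds `limit` seconds."""
    recent = deque(maxlen=window)

    def record(cycle_time):
        recent.append(cycle_time)
        # Only alert once a full window is available, so a single outlier can't trigger it.
        if len(recent) == window and sum(recent) / window > limit:
            return True  # caller sends the notification
        return False

    return record
```

Each 30-second poll would feed its `[Cycletime (s)]` value into `record()`; the query itself then only has to return the newest rows.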

Related

SUM of last 24 hour scores within a specific range in a sorted set (Redis)

Is there a way to calculate the SUM of scores saved within the last 24 hours while respecting the performance of the Redis server? (Around 1 million new rows are added per day.)
What is the right format to use to store users' timestamps and scores in a sorted set?
Actually I am using this command:
ZADD allscores 1570658561 20
The score is the actual time in seconds, and the other field is the real score.
But there is a problem here! When another user gets the same score (20), it is not added, since that member is already present. Any solution for this problem?
I am thinking of using a Lua script, but there are two headaches:
First, the Lua script will block other commands until it finishes the job, which is not good practice in my case since the script has to run 24/7 while many users fetch data from the Redis cache at the same time (user scores, history info, etc.). Plus, the script has to process the many records saved each day under a specific key, so while it is working, users can't fetch data, and it would run in a loop all the time.
Second, related to the first problem: if I use the timestamp as the score (so I can return 24 hours of data), I cannot store the same member twice.
If you were in my case, how would you deal with this? Thanks
Considering that the data is needed for the last 24 hours (a sliding window) and up to 1 million rows per day are possible, a sorted set cannot compute the sum with high performance.
A high-performance design that also solves your duplicate-score issue:
By trading a little accuracy, you can build a highly performant system by crunching the data within a window.
Sample Input data:
input 1: user 1 wants to add time: 11:10:01 score: 20
input 2: user 2 wants to add time: 11:11:02 score: 20
input 3: user 1 wants to add time: 11:17:04 score: 50
You can have 1-minute, 5-minute, or 1-hour accuracy, and decide the window based on that.
If you accept an approximation at 1-hour granularity, you can do this on insertion:
for input 1 :
INCRBY SCORES_11_hour 20
for input 2:
INCRBY SCORES_11_hour 20
for input 3:
INCRBY SCORES_11_hour 50
To get the data for last 24 hours, you need to sum up only 24 hourly keys.
MGET SCORES_previous_day_12_hour SCORES_previous_day_13_hour SCORES_previous_day_14_hour .... SCORES_current_day_10_hour SCORES_current_day_11_hour
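A minimal Python sketch of the hourly-bucket idea, using a plain dict in place of Redis (in Redis the two operations would be INCRBY and MGET); the key format is illustrative:

```python
from datetime import datetime, timedelta

buckets = {}  # stands in for Redis string counters

def add_score(ts: datetime, score: int) -> None:
    # One counter per hour, so equal scores from different users never collide.
    key = ts.strftime("SCORES_%Y%m%d_%H")
    buckets[key] = buckets.get(key, 0) + score

def last_24h_sum(now: datetime) -> int:
    # Equivalent of MGET over the 24 most recent hourly keys.
    keys = [(now - timedelta(hours=h)).strftime("SCORES_%Y%m%d_%H") for h in range(24)]
    return sum(buckets.get(k, 0) for k in keys)
```

Missing keys simply contribute 0, which matches MGET returning nil for hours with no activity.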
If you accept an approximation of 5 minutes, then on insertion, along with incrementing the hourly key, you also store the 5-minute-window data:
for input 1:
INCRBY SCORES_11_hour 20
INCRBY SCORES_11_hour_10_minutes 20
for input 2:
INCRBY SCORES_11_hour 20
INCRBY SCORES_11_hour_10_minutes 20
for input 3:
INCRBY SCORES_11_hour 50
INCRBY SCORES_11_hour_15_minutes 50
To get the data for the last 24 hours, you need to sum up only 23 hourly keys (the whole hours) + 12 five-minute window keys.
If the time added is based on the current time, you can optimize further (assuming that once the 11th hour has started, the data for the 10th, 9th, and earlier hours won't change at all).
Since you said the system runs 24/7, you can also reuse values computed in previous iterations.
Say the sum is computed during the 11th hour: you already have the values for the past 24 hours.
If it is computed again during the 12th hour, you can reuse the sum of the 22 intermediate hours whose data is unchanged and fetch only the missing 2 hours from Redis.
Similarly further optimisations can be applied based on your need.

Optimize timescale query

I am using the below query with timescaledb to get the 10 minute candles from a ticks database.
SELECT time_bucket('10minute', time) AS min,
first(ticks, time) AS open,
last(ticks, time) AS close,
max(ticks) AS high,
min(ticks) AS low,
last(volume, time)-first(volume, time) AS vol
FROM prices
WHERE asset_code = '".$symbol."'
GROUP BY min
ORDER BY min DESC
LIMIT 100
I want to make sure the query doesn't slow down after some days as the DB grows. At any time I want to run this query only on ticks from the last two days, not the whole table. So I want to know: is there a way to limit the time_bucket query to the last 100,000 ticks in the DB?
I am also using PDO to query the DB.
TimescaleDB uses constraint exclusion to eliminate needing to touch chunks when answering a query. We have some work going on right now to extend the query optimization to more intelligently handle some types of LIMIT queries, as in your example, so that even the above will just touch the necessary chunks.
But for now, there's a very easy fix: use a time predicate in the WHERE clause instead of the LIMIT.
In particular, assuming that you typically have a ticker symbol in each 10 minute interval, and you want 100 intervals:
SELECT time_bucket('10 minutes', time) AS min,
first(ticks, time) AS open,
...
FROM prices
WHERE asset_code = '".$symbol."'
AND time > NOW() - interval '1000 minutes'
GROUP BY min
ORDER BY min DESC
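For illustration, the grouping that time_bucket performs can be mimicked in plain Python; this is a sketch of the candle logic only (field names are assumptions), not TimescaleDB API:

```python
from datetime import datetime

def ten_minute_candles(ticks):
    """ticks: iterable of (timestamp, price, cumulative_volume), assumed time-ordered.
    Returns {bucket_start: candle dict} mirroring the open/close/high/low/vol columns."""
    candles = {}
    for ts, price, volume in ticks:
        # Truncate the timestamp to its 10-minute bucket, like time_bucket('10 minutes', time).
        bucket = ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)
        c = candles.get(bucket)
        if c is None:
            # vol_first keeps the first cumulative volume so vol = last - first.
            c = candles[bucket] = {"open": price, "close": price, "high": price,
                                   "low": price, "vol_first": volume, "vol": 0}
        c["close"] = price
        c["high"] = max(c["high"], price)
        c["low"] = min(c["low"], price)
        c["vol"] = volume - c["vol_first"]
    return candles
```

In production the database should do this work, as in the query above; the sketch just shows what each aggregate computes.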

How to find where a total condition exist

I am trying to create a report that will show how long an automated sprinkler system has run. The system is comprised of several sprinklers, each one keeping track of only itself and sending that information to a database. My problem is that each sprinkler reports its own run time (i.e., if 5 sprinklers all ran at the same time for 10 minutes, the total reported run time would be 50 minutes), and I want to know only the net run time: in this example, 10 minutes.
The database is comprised of a time stamp and a boolean, where the time stamp is recorded every time a sprinkler is switched on or off (the on/off state is indicated by the 1/0 of the boolean).
So, to figure out the total net time the system was on each day, whether it was 1 sprinkler running or all of them, I need to check the database for time frames where no sprinklers were turned on at all (or where any sprinkler was on). I would think the beginning of the query would look something like
SELECT * FROM MyTable
WHERE MyBoolean = 0
AND [ ... ]
But I'm not sure what the conditional statements that would follow the AND would be like to check the time stamps.
Is there a query I can send to the database that will report back this format of information?
EDIT:
Here's the table the data is recorded to: it's literally just a name, a boolean, and a datetime of when the boolean was changed, and that's the entire database.
Every time a sprinkler turns on the number of running sprinklers increments by 1, and every time one turns off the number decrements by 1. If you transform the data so you get this:
timestamp on/off
07:00:05 1
07:03:10 1
07:05:45 -1
then you have a sequence of events in order; which sprinklers they refer to is irrelevant. (I've changed the zeros to -1 for reasons that will become evident in a moment. You can do this with "(2 * value) - 1")
Now put a running total together:
select a.timestamp,
       (select sum(b.on_off)
        from sprinkler_events b
        where b.timestamp <= a.timestamp) as run_total
from sprinkler_events a
order by a.timestamp;
where sprinkler_events is the transformed data I listed above. This will give you:
timestamp run_total
07:00:05 1
07:03:10 2
07:05:45 1
and so on. Every row with a run total of zero is a time at which all sprinklers were turned off, which I think is what you're looking for. If you need to sum the time they were on or off, you'll need to do additional processing: search for "date difference between consecutive rows" and you'll see solutions for that.
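That additional processing (summing the time at least one sprinkler was on) can be sketched in Python over the transformed +1/-1 events:

```python
def net_on_time(events):
    """events: time-ordered list of (timestamp_seconds, delta) where delta is +1 (on) or -1 (off).
    Returns total seconds during which at least one sprinkler was running."""
    total = 0
    running = 0     # the running total from the answer above
    last_ts = None
    for ts, delta in events:
        # The interval since the previous event counts only if something was on during it.
        if running > 0 and last_ts is not None:
            total += ts - last_ts
        running += delta
        last_ts = ts
    return total
```

Five sprinklers on for the same 10 minutes yield 600 seconds, not 3000, because overlapping intervals are counted once.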
You might consider looking for whether all the sprinklers are currently off. For example:
SELECT COUNT (DISTINCT s._NAME) AS sprinkers_currently_off
FROM (
SELECT
_NAME,
_VALUE,
_TIMESTAMP,
ROW_NUMBER() OVER (PARTITION BY _NAME ORDER BY _TIMESTAMP DESC, _VALUE) AS latest_rec
FROM sprinklers
) s
WHERE
_VALUE = 0
AND latest_rec = 1
The inner query orders the records so that you can get the latest status of all the sprinklers, and the outer query counts how many are currently off. If you have 10 sprinklers you would report them all off when this query returns 10.
You could modify this by applying a date range to the inner query if you wanted to look into the past, but this should get you on the right track.
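The same latest-record-per-sprinkler logic can be sketched in Python for illustration (tuple fields mirror the _NAME, _VALUE, _TIMESTAMP columns of the query):

```python
def sprinklers_currently_off(records):
    """records: iterable of (name, value, timestamp); value 0 = off, 1 = on.
    Mirrors the ROW_NUMBER() query: keep each sprinkler's latest record, count the offs."""
    latest = {}
    for name, value, ts in records:
        # Keep only the most recent record per sprinkler, like latest_rec = 1.
        if name not in latest or ts > latest[name][1]:
            latest[name] = (value, ts)
    return sum(1 for value, _ in latest.values() if value == 0)
```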

ESPER: Find Max and Min of 24 hours and check if price goes above the Max of previous 24 hours value

I am unable to solve an Esper problem. I have to calculate the max and min over 24 hours and then check whether the tick price goes above the max of the previous 24 hours (this has to be done on multiple securities). Here is the code I am using, but I am taking a big performance hit and getting the same event fired more than once.
create context GroupSecurity
partition by security from Tick;

context GroupSecurity
select currentData.last, max(groupedData.last)
from Tick as currentData unidirectional,
     Tick.win:time_batch(24 hour) as groupedData
having currentData.last > max(groupedData.last);
How can I improve this code?
The "Tick.win:time_batch(24 hour)" tells the engine to retain in memory all 24 hours of Tick events that may arrive, and only spit these out after 24 hours.
I think a better approach would be to have the engine compute, say, 1-minute maximums, keep those 1-minute maximums for 24 hours, and take the max of those, i.e. retain and build a max from no more than 24*60 rows, where each row keeps a 1-minute max.
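A sketch of that idea in Python, outside Esper (an illustrative class, not Esper API): keep one max per minute and take the max over the retained minutes.

```python
from collections import deque

class RollingDayMax:
    """Retain per-minute maxima for the last 24h (24*60 slots) instead of every tick."""
    def __init__(self, minutes=24 * 60):
        self.minute_maxes = deque(maxlen=minutes)  # old minutes fall off automatically
        self.current_minute = None
        self.current_max = None

    def add(self, minute, price):
        if minute != self.current_minute:
            # The previous minute is closed; roll its max into the window.
            if self.current_max is not None:
                self.minute_maxes.append(self.current_max)
            self.current_minute, self.current_max = minute, price
        else:
            self.current_max = max(self.current_max, price)

    def day_max(self):
        """Max over the closed minutes (excludes the minute still in progress)."""
        return max(self.minute_maxes) if self.minute_maxes else None
```

A new tick then fires an alert only when `price > day_max()`, comparing against the previous window rather than re-aggregating 24 hours of raw ticks.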

How do I add seconds to a timestamp of a start date so that I can get an estimated end date?

I have a Task object with a start_time and an estimated time in seconds. The start_time is stored in the database as a MYSQL DATETIME and the estimated time is stored as seconds. I would like to add the seconds to the start_time to get an estimated finish date.
There is a lot in the link Neville gave, but cutting slightly to the chase...
my_date_time + INTERVAL xx SECOND
I think it is more correct to use DATE_ADD(), but as far as I know there is no performance difference.
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_addtime
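If you prefer to compute the estimate in application code instead of in MySQL, the equivalent in Python is a plain timedelta addition (function name is illustrative):

```python
from datetime import datetime, timedelta

def estimated_end(start_time: datetime, estimated_seconds: int) -> datetime:
    """Client-side equivalent of `start_time + INTERVAL estimated_seconds SECOND`."""
    return start_time + timedelta(seconds=estimated_seconds)
```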