Optimize TimescaleDB query - PDO

I am using the query below with TimescaleDB to get 10-minute candles from a ticks database.
SELECT time_bucket('10minute', time) AS min,
       first(ticks, time) AS open,
       last(ticks, time) AS close,
       max(ticks) AS high,
       min(ticks) AS low,
       last(volume, time) - first(volume, time) AS vol
FROM prices
WHERE asset_code = '".$symbol."'
GROUP BY min
ORDER BY min DESC
LIMIT 100
I want to make sure the query doesn't slow down after some days as the database grows. At any time I want to run this query on ticks from the last two days, not the whole table. So I want to know whether there is a way to limit the time_bucket query to the last 100,000 ticks in the database.
I am also using PDO to query the database.

TimescaleDB uses constraint exclusion to avoid touching chunks that aren't needed to answer a query. We have some work going on right now to extend the query optimization to handle some types of LIMIT queries more intelligently, as in your example, so that even the query above will only touch the necessary chunks.
But for now, there's a very easy fix: use a time predicate in the WHERE clause instead of the LIMIT.
In particular, assuming that the symbol typically has ticks in each 10-minute interval, and you want 100 intervals (i.e. the last 1000 minutes):
SELECT time_bucket('10 minutes', time) AS min,
       first(ticks, time) AS open,
       ...
FROM prices
WHERE asset_code = '".$symbol."'
  AND time > NOW() - interval '1000 minutes'
GROUP BY min
ORDER BY min DESC
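On the PDO side, rather than splicing $symbol into the SQL string, the symbol can be bound as a named parameter. A minimal sketch of the same query with a placeholder (the statement would then be prepared and run with something like $stmt->execute(['symbol' => $symbol])):
SELECT time_bucket('10 minutes', time) AS min,
       first(ticks, time) AS open,
       last(ticks, time) AS close,
       max(ticks) AS high,
       min(ticks) AS low,
       last(volume, time) - first(volume, time) AS vol
FROM prices
WHERE asset_code = :symbol  -- bound by PDO instead of string concatenation
  AND time > NOW() - interval '1000 minutes'
GROUP BY min
ORDER BY min DESC
LIMIT 100;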

Related

SQL: a time-series variant of the "every nth row" problem

I have a table of time-series data, with the columns:
sensor_number (integer primary key)
signal_strength (integer)
signal_time (timestamp)
Each sensor creates 20-30 rows per minute. I need a query that returns, for a given sensor, 1 row per minute (or every 2 minutes, 3 minutes, etc.). A pure SQL approach is to use a window function, with a partition on an expression that rounds the timestamp appropriately (date_trunc() works for the 1-minute case; otherwise I have to do some messy casting). The problem is that the expression blocks the ability to use the index. With 5B rows, that's a killer.
The best alternative I can come up with is a user-defined function that uses a cursor to step through the table in index key order (sensor_number, signal_time) and emits a row every time the timestamp crosses a minute boundary. That's still slow, though. Is there a pure SQL approach that will accomplish this AND utilize the index?
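For reference, the window-function approach described in the question might look roughly like the following sketch; it assumes the table and column names used below, a sensor number of 5, and the 1-minute case where date_trunc() suffices:
-- One row per minute for sensor 5: rank rows within each minute bucket, keep the first.
SELECT sensor_number, signal_strength, signal_time
FROM (
    SELECT t.*,
           row_number() OVER (
               PARTITION BY date_trunc('minute', signal_time)
               ORDER BY signal_time
           ) AS rn
    FROM timeseries t
    WHERE sensor_number = 5
) s
WHERE rn = 1;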
I think if you're returning enough rows, scanning the whole range of rows that match the sensor_number will just be the best plan. The signal_time portion of the index may simply not be helpful at that point, because the database needs to read so many rows anyway.
However, if your time interval is big enough / the number of rows you're returning is small enough, it might be more efficient to hit the index separately for each row you're returning. Something like this (using an interval of 3 minutes and a sensor number of 5 as an example):
WITH range AS (
SELECT
max(signal_time) as max_time,
min(signal_time) as min_time
FROM timeseries
WHERE sensor_number = 5
)
SELECT sample.*
FROM range
JOIN generate_series(min_time, max_time, interval '3 minutes') timestamp ON true
JOIN LATERAL (
SELECT *
FROM timeseries
WHERE sensor_number = 5
AND signal_time >= timestamp
AND signal_time < timestamp + interval '3 minutes'
LIMIT 1
) sample ON true;
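Both plans rely on the composite index described in the question; for completeness, it could be declared like this (the index name is just illustrative):
CREATE INDEX timeseries_sensor_time_idx
    ON timeseries (sensor_number, signal_time);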

How to deal with high frequency queries

I have a query that should run every 30 seconds. I am doing so because I want to track whether the Cycle Time, which is displayed in seconds, decreases or increases for a given time range. If it exceeds the accepted limit, I will get a notification.
Here is a more specific description: the aim is to have a cycle time < 60 s. Because we are working in seconds, a one-time increase or decrease of the cycle time would not be very meaningful, so what I would do is take the last ten cycle times and calculate their average. If this is > 60 seconds I will get notified.
BUT to make the tracking as accurate as possible I need the query to run every 30 seconds. QUERY:
SELECT
CAST([Cycletime (s)] as Float) as [Cycletime (s)]
,Machine
,Date
FROM v_Analysis
WHERE Date >= CAST( GETDATE() AS Date )
My question now is how much the 30-second interval of the query affects the performance of the database, and if it does, how we can improve the performance.
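For what it's worth, the "average of the last ten cycle times" check described above could be sketched roughly as follows; it assumes [Date] is a datetime that orders the readings within each machine:
-- Machines whose average of the last ten cycle times exceeds 60 s
SELECT Machine,
       AVG(CAST([Cycletime (s)] AS float)) AS AvgLast10
FROM (
    SELECT Machine,
           [Cycletime (s)],
           ROW_NUMBER() OVER (PARTITION BY Machine ORDER BY [Date] DESC) AS rn
    FROM v_Analysis
) t
WHERE rn <= 10
GROUP BY Machine
HAVING AVG(CAST([Cycletime (s)] AS float)) > 60;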

Time aggregation in SQL

I have a data-set which includes time {hh,mm,ss} and temperature.
I want to aggregate the temperature with respect to the time.
For each minute in a specific hour there are a number of temperature records, and I want to calculate their average so that I have a single value for each minute.
Thanks in advance.
Use date functions ( http://www.w3schools.com/sql/ ) to get a more general (less precise) time [i.e. hour and minute only], group by that, and use the SQL AVG function to get your average value.
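A minimal sketch of that, assuming MySQL-style date functions and hypothetical names (a table readings with columns reading_time and temperature):
SELECT HOUR(reading_time)   AS hh,
       MINUTE(reading_time) AS mm,
       AVG(temperature)     AS avg_temp   -- one averaged value per minute
FROM readings
GROUP BY HOUR(reading_time), MINUTE(reading_time)
ORDER BY hh, mm;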

SQL Select statement Where time is *:00

I'm attempting to make a filtered table based off an existing table. The current table has rows for every minute of every hour of 24 days, based on locations (tmcs).
I want to filter this table into another table that has just one row per hour for each of the 24 days, based on the locations (tmcs).
Here is the SQL statement that I thought would have done it:
SELECT
    Time_Format(t.time, '%H:00') as time,
    ROUND(AVG(t.avg), 0) as avg,
    tmc, Date, Date_Time
FROM traffic t
GROUP BY time, tmc, Date
The problem is I still get about 247,000 rows affected... and according to simple math I should only have:
Locations (TMCS): 14
Hours in a day: 24
Days tracked: 24
Total = 14 * 24 * 24 = 8,064
My original table has 477,277 rows
When I make a new table off this query I get right around 247,000, which makes no sense, so my query must be wrong.
The reason I used this method instead of a WHERE clause is that I wanted to find the average speed (avg) per hour. This is not mandatory, so I'd be fine with using a WHERE clause on the time, but I just don't know how to do this based off *:00.
Any help would be much appreciated
Fix the GROUP BY so it's standard, rather than relying on the nonstandard MySQL extension:
SELECT
Time_Format(t.time, '%H:00') as time,
ROUND(AVG(t.avg), 0) as avg,
tmc, Date, Date_Time
FROM traffic t
GROUP BY
Time_Format(t.time, '%H:00'), tmc, Date, Date_Time
Run this with SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; to see the errors that other RDBMS will give you and make MySQL work properly
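As for the WHERE-clause alternative the question mentions: if you only want the readings taken exactly on the hour (rather than an hourly average), MySQL's MINUTE() function can filter them directly, e.g.:
SELECT t.*
FROM traffic t
WHERE MINUTE(t.time) = 0;   -- keep only rows whose time is *:00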

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree, you have to figure out what Peak Load is first before you can start to create reports on it.
The first thing I would do is figure out how I am going to define peak load, e.g. am I going to look at an hour-by-hour breakdown?
Next I would do a GROUP BY on the CreateDate formatted in seconds (no milliseconds). As part of the group by I would do an average based on the number of records.
I don't think you'd need to know the peak hours; you can generate them with SQL, wrapping the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (SQL Server syntax):
select datepart(dy, createdate) as DayOfYear,
       datepart(hour, createdate) as Hour,
       count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(week, -1, getdate())
group by datepart(dy, createdate), datepart(hour, createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
    select count(*) as MessagesPerMinute
    from message
    where CreateDate > dateadd(day, -7, getdate())
    group by datepart(dy, createdate), datepart(hour, createdate), datepart(minute, createdate)
) PerMinute
Grouping by datepart(dy, ...) is an easy way to distinguish between days without worrying about month borders. It works until you select more than a year back, but that would be unusual for performance queries.
Warning: these will run slowly!
This will group your data into "second" buckets and list them from the most to the least activity:
SELECT
CONVERT(char(19),CreateDate,120) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(Char(19),CreateDate,120)
ORDER BY 2 Desc
This will group your data into "minute" buckets and list them from the most to the least activity:
SELECT
LEFT(CONVERT(char(19),CreateDate,120),16) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19),CreateDate,120),16)
ORDER BY 2 Desc
I'd take those values and calculate whatever statistics they're asking for.