SQLite query to get the closest datetime - sql

I am trying to write an SQLite statement to get the closest datetime from an user input (from a WPF datepicker). I have a table IRquote(rateId, quoteDateAndTime, quoteValue).
For example, if the user enter 10/01/2000 and the database have only fixing stored for 08/01/2000, 07/01/2000 and 14/01/2000, it would return 08/01/2000, being the closest date from 10/01/2000.
Of course, I'd like it to work not only with dates but also with time.
I tried with this query, but it returns the row with the furthest date, and not the closest one:
SELECT quoteValue FROM IRquote
WHERE rateId = '" + pRefIndexTicker + "'
ORDER BY abs(datetime(quoteDateAndTime) - datetime('" + DateTimeSQLite(pFixingDate) + "')) ASC
LIMIT 1;
Note that I have a function DateTimeSQLite to transform user input to the right format.
I don't get why this does not work.
How could I do it? Thanks for your help

To get the closest date, you will need to use the strftime('%s', datetime) SQLite function.
With this example/demo, you will get the most closest date to your given date.
Note that the date 2015-06-25 10:00:00 is the input datetime that the user selected.
select t.ID, t.Price, t.PriceDate,
abs(strftime('%s','2015-06-25 10:00:00') - strftime('%s', t.PriceDate)) as 'ClosestDate'
from Test t
order by abs(strftime('%s','2015-06-25 10:00:00') - strftime('%s', PriceDate))
limit 1;
SQL explanation:
We use the strftime('%s') - strftime('%s') to calculate the difference, in seconds, between the two dates (Note: it has to be '%s', not '%S'). Since this can be either positive or negative, we also need to use the abs function to make it all positive to ensure that our order by and subsequent limit 1 sections work correct.

If the table is big, and there is an index on the datetime column, this will use the index to get the 2 closest rows (above and below the supplied value) and will be more efficient:
select *
from
( select *
from
( select t.ID, t.Price, t.PriceDate
from Test t
where t.PriceDate <= datetime('2015-06-23 10:00:00')
order by t.PriceDate desc
limit 1
) d
union all
select * from
( select t.ID, t.Price, t.PriceDate
from Test t
where t.PriceDate > datetime('2015-06-23 10:00:00')
order by t.PriceDate asc
limit 1
) a
) x
order by abs(julianday('2015-06-23 10:00:00') - julianday(PriceDate))
limit 1 ;
Tested in SQLfiddle.

Another useful solution is using BETWEEN operator, if you can determine upper and lower bounds for your time/date query. I encountered this solution just recently here in this link. This is what i've used for my application on a time column named t (changing code for date column and date function is not difficult):
select *
from myTable
where t BETWEEN '09:35:00' and '09:45:00'
order by ABS(strftime('%s',t) - strftime('%s','09:40:00')) asc
limit 1
Also, i must correct my comment on above post. I tried a simple examination of speed of these 3 approaches proposed by #BerndLinde, #ypercubeᵀᴹ and me . I have around 500 tables with 150 rows in each and medium hardware in my PC. The result is:
Solution 1 (using strftime) takes around 12 seconds.
Adding index of column t to solution 1 improves speed by around 30% and takes around 8 seconds. I didn't face any improvement for using index of time(t).
Solution 2 also has around 30% of speed improvement over Solution 1 and takes around 8 seconds
Finally, Solution 3 has around 50% improvement and takes around 5.5 seconds. Adding index of column t gives a little more improvement and takes around 4.8 seconds. Index of time(t) has no effect in this solution.
Note: I'm a simple programmer and this is a simple test in .NET code. A real performance test must consider more professional aspects, which i'm not aware of them. There was also some computations in my code, after querying and reading from database. Also, as #ypercubeᵀᴹ states, this result my not work for large amount of data.

Related

Limit result rows for minimal time intervals for PostgreSQL

Background: I am running TeslaMate/Grafana for monitoring my car status, one of the gauges plots the battery level fetched from database. My server is located remotely and running in a Dock from an old NAS, so both query performance and network overhead matters.
I found the koisk page frequently hangs and by investigation, it might caused by the query -- two of the plots returns 10~100k rows of results from database. I want to limit the number of rows returned by SQL queries, as the plots certainly don't have that much precision for drawing such detailed intervals.
I tried to follow this answer and use row_number() to pop only 100-th rows of results, but more complicated issues turned up, that is, the time intervals among rows are not consistent.
The car has 4 status, driving / online / asleep / offline.
If the car is at driving status, the time interval could be less than 200ms as the car pushes the status whenever it has new data.
If the car is at online status, the time interval could be several minutes as the system actively fetches the status from the car.
Even worse, if the system thinks the car is going to sleep and need to stop fetching status (to avoid preventing the car to sleep), the interval could be 40 minutes maximum depend on settings.
If the car is in asleep/offline status, no data is recorded at all.
This obviously makes skipping every n-th rows a bad idea, as for case 2-4 above, lots of data points might missing so that Grafana cannot plot correct graph representing the battery level at satisfactory precision.
I wonder if there's any possible to skip the rows by time interval from a datetime field rather than row_number() without much overhead from the query? i.e., fetch every row with minimal 1000ms from the previous row.
E.g., I have following data in the table, I want the rows returned are row 1, 4 and 5.
row date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
The current (problematic) method I am using is as follows:
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
SELECT date, battery_level, row_number() OVER(ORDER BY date ASC) AS row
FROM (
SELECT battery_level, date
FROM positions
WHERE car_id = $car_id AND $__timeFilter(date)
UNION ALL
SELECT battery_level, date
FROM charges c
JOIN charging_processes p ON p.id = c.charging_process_id
WHERE $__timeFilter(date) AND p.car_id = $car_id) AS data
ORDER BY date ASC) as t
WHERE t.row % 100 = 0;
This method clearly gives problem that only returns alternate rows instead of what I wanted (given the last row reads t.row % 2 = 0)
PS: please ignore the table structures and UNION from the sample code, I haven't dig deep enough to the tables which could be other tweaks but irrelevant to this question anyway.
Thanks in advance!
You can use a recursive CTE:
WITH RECURSIVE rec(cur_row, cur_date) AS (
(
SELECT row, date
FROM t
ORDER BY date
LIMIT 1
)
UNION ALL
(
SELECT row, date
FROM t
JOIN rec
ON t.date >= cur_date + 1000
ORDER BY t.date
LIMIT 1
)
)
SELECT *
FROM rec;
cur_row
cur_date
1
1610000001000
4
1610000002000
5
1610000005000
View on DB Fiddle
Using a function instead would probably be faster:
CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
$$
DECLARE
row t%ROWTYPE;
cur_date BIGINT;
BEGIN
FOR row IN
SELECT *
FROM t
ORDER BY date
LOOP
IF row.date >= cur_date + 1000 OR cur_date IS NULL
THEN
cur_date := row.date;
RETURN NEXT row;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT *
FROM f();
row
date
1
1610000001000
4
1610000002000
5
1610000005000

sqlite3 unixtime interval query by multiplication and division

I have been trying to create a query for my sqlite3 database that provides me with a count of all records at 10 minute intervals between a maximum and minimum time.
I found this answer on the internet, and it seems to work:
select (((`unixtime`)/600000)*600000) as timeslice,
count(*) as mycount from mytable
where
`unixtime` >= 1413902772599
and
`unixtime` <= 1413972793000
group by timeslice;
The result I get is something like this:
timeslice mycount
------------- ----------
1413930000000 9
1413930600000 1013
1413931200000 265
1413932400000 410
1413933000000 643
This seems like sort of a hackish way to go about doing this query. It also doesn't include datapoints that have a zero count, which is an edge-case that I am going to have to fix outside of the database scope (unless there is an SQL solution for this).
Is there a better way to go about this? Are there edge cases for this if I proceed to continue using this query? Will this catastrophically fail under certain scenarios that I'm not considering?
There is no better way to round to multiples of 600000; SQLite has the round() function, but you would still need to convert to/from a value that can be rounded to some decimal fraction.
If you had SQLite 3.8.3 or later, you could use a recursive common table expression to generate the intervals:
WITH RECURSIVE intervals(t) AS (
VALUES(1413902400000)
UNION ALL
SELECT t + 600000
FROM intervals
WHERE t < 1413972000000
)
SELECT intervals.t,
COUNT(*)
FROM intervals
LEFT JOIN MyTable
ON MyTable.unixtime BETWEEN intervals.t
AND intervals.t + 599999
GROUP BY 1;

Enhancing Performance

I'm not as clued up on shortcuts in SQL so I was hoping to utilize the brainpower on here to help speed up a query I'm using. I'm currently using Oracle 8i.
I have a query:
SELECT
NAME_CODE, ACTIVITY_CODE, GPS_CODE
FROM
(SELECT
a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE,
ROW_NUMBER() OVER (PARTITION BY a.GPS_DATE ORDER BY b.ACTIVITY_DATE DESC) AS RN
FROM GPS_TABLE a, ACTIVITY_TABLE b
WHERE a.NAME_CODE = b.NAME_CODE
AND a.GPS_DATE >= b.ACTIVITY_DATE
AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2)
WHERE
RN = 1
and this takes about 7 minutes give or take 10 seconds to run.
Now the GPS_TABLE is currently 6.586.429 rows and continues to grow as new GPS coordinates are put into the system, each day it grows by about 8.000 rows in 6 columns.
The ACTIVITY_TABLE is currently 1.989.093 rows and continues to grow as new activities are put into the system, each day it grows by about 2.000 rows in 31 columns.
So all in all these are not small tables and I understand that there will always be a time hit running this or similar queries. As you can see I'm already limiting it to only the last 2 days worth of data, but anything to speed it up would be appreciated.
Your strongest filter seems to be the filter on the last 2 days of GPS_TABLE. It should filter the GPS_TABLE to about 15k rows. Therefore one of the best candidate for improvement is an index on the column GPS_DATE.
You will find that your filter TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2 is equivalent to a.GPS_DATE > TRUNC(SYSDATE) - 2, therefore a simple index on your column will work if you change the query. If you can't change it, you could add a function-based index on TRUNC(GPS_DATE).
Once you have this index in place, we need to access the rows in ACTIVITY_TABLE. The problem with your join is that we will get all the old activities and therefore a good portion of the table. This means that the join as it is will not be efficient with index scans.
I suggest you define an index on ACTIVITY_TABLE(name_code, activity_date DESC) and a PL/SQL function that will retrieve the last activity in the least amount of work using this index specifically:
CREATE OR REPLACE FUNCTION get_last_activity (p_name_code VARCHAR2,
p_gps_date DATE)
RETURN ACTIVITY_TABLE.activity_code%type IS
l_result ACTIVITY_TABLE.activity_code%type;
BEGIN
SELECT activity_code
INTO l_result
FROM (SELECT activity_code
FROM activity_table
WHERE name_code = p_name_code
AND activity_date <= p_gps_date
ORDER BY activity_date DESC)
WHERE ROWNUM = 1;
RETURN l_result;
END;
Modify your query to use this function:
SELECT a.NAME_CODE,
a.GPS_CODE,
get_last_activity(a.name_code, a.gps_date)
FROM GPS_TABLE a
WHERE trunc(a.GPS_DATE) > trunc(sysdate) - 2
Optimising an SQL query is generally done by:
Add some indexes
Try a different way to get the same information
So, start by adding an index for ACTIVITY_DATE, and perhaps some other fields that are used in the conditions.

SQL: Difference between "BETWEEN" vs "current_date - number"

I am wondering which of the following is the best way to implement and why.
select * from table1 where request_time between '01/18/2012' and '02/17/2012'
and
select * from table1 where request_time > current_date - 30
I ran the two queries through some of my date tables in my database and using EXPLAIN ANALYZE I found these results:
explain analyze
select * from capone.dim_date where date between '01/18/2012' and '02/17/2012'
Total runtime: 22.716 ms
explain analyze
select * from capone.dim_date where date > current_date - 30
Total runtime: 65.044 ms
So it looks like the 1st option is more optimal. Of course this is biased towards my DBMS but these are still the results I got.
The table has dates ranging from 1900 to 2099 so it is rather large, and not just some dinky little table.
Between has the inclusive ranges i.e when you issue a query like id between 2 and 10 the value of 2 and 10 will also be fetched.If you want to eliminate these values use > and <.
Also when indexes are applied say on date column > and < makes a good use of index than between.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5
as bucket group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.