Most efficient way to retrieve data by timestamps - sql

I'm using PostgreSQL 9.2.8.
I have a table like:
CREATE TABLE foo
(
  foo_date timestamp without time zone NOT NULL,
  -- other columns, constraints
);
This table contains about 4,000,000 rows; one day's data is about 50,000 rows.
My goal is to retrieve one day data as fast as possible.
I have created an index like:
CREATE INDEX foo_foo_date_idx
ON foo
USING btree
(date_trunc('day'::text, foo_date));
And now I'm selecting data like this (now() is just an example; I need data from ANY day):
select *
from foo
where date_trunc('day'::text, now()) = date_trunc('day'::text, foo_date);
This query takes about 20 seconds.
Is there any possibility to obtain the same data in less time?

It takes time to retrieve 50,000 rows. 20 seconds seems like a long time, but if the rows are wide, that might be the issue.
You can index foo_date directly and use inequalities, so you might try this version:
create index foo_foo_date_idx2 on foo (foo_date);

select *
from foo f
where f.foo_date >= date_trunc('day', now())
  and f.foo_date < date_trunc('day', now() + interval '1 day');
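To verify that the new index is actually being used (table and index names as above; the exact plan shape depends on your PostgreSQL version and statistics), you can inspect the plan:

```sql
-- Check that the range query uses the plain btree index on foo_date.
-- An "Index Scan using foo_foo_date_idx2" (or a bitmap scan over it)
-- indicates the inequalities are being served from the index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM foo
WHERE foo_date >= date_trunc('day', now())
  AND foo_date < date_trunc('day', now() + interval '1 day');
```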

Related

Postgres partitioned tables with dynamic where clause

I have a partitioned table that I'm using to get the sum of column X for the records matching column Y.
If I use
where logdate >= '2020-01-01' AND logdate < '2020-02-01'
I get a clean query plan and execution, while if I do
where logdate >= (SELECT CURRENT_DATE - 30) AND logdate < '2020-02-01'
really, SELECT CURRENT_DATE - 30 is just an example to show that the initial date is going to be generated dynamically - it traverses all the partitions with bitmap heap scans and index scans, resulting in a query execution about 10 times slower.
Is there something I can do in order to improve the query for that given part (the where clause), or should I give up and instead manipulate the dates in the programming language I'm currently using?
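One common workaround, sketched under the assumption that plan-time pruning is what's missing: build the query with a literal date via dynamic SQL so the planner sees a constant. (PostgreSQL 11+ can often prune partitions at execution time even for stable expressions like CURRENT_DATE - 30, so check your plan first.) The table name logs and column x below are placeholders:

```sql
-- Build the query with a literal start date so the planner can prune
-- partitions at plan time; format's %L quotes the value as a literal.
DO $$
DECLARE
  start_date date := CURRENT_DATE - 30;
  total numeric;
BEGIN
  EXECUTE format(
    'SELECT sum(x) FROM logs WHERE logdate >= %L AND logdate < %L',
    start_date, date '2020-02-01')
  INTO total;
  RAISE NOTICE 'total = %', total;
END $$;
```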

SELECT COUNT(*) WHERE DATE_PART(...) is Slow in PostgreSQL/TimescaleDB

To find the number of rows in a table temperatures that exist for every hour, I'm running a series of SQL queries on my PostgreSQL 11.2 database with TimescaleDB 1.6.0 extension. temperatures is a TimescaleDB hypertable.
For example,
SELECT COUNT(*) FROM temperatures
WHERE DATE_PART('year', "timestamp") = 2020
AND DATE_PART('month', "timestamp") = 2
AND DATE_PART('day', "timestamp") = 2
AND DATE_PART('hour', "timestamp") = 0
AND DATE_PART('minute', "timestamp") = 0
Question: However, this query appears to be very slow (I think), taking about 6-8 seconds per query with no other queries running on this database. The table temperatures contains 11.5 million rows. There are about 100-2000 rows for each hour.
Looking for suggestions on improving the speed of such queries. Thanks!
Don't apply date functions to the timestamp column: this requires repeated computation for each row (5 calls in total) and prevents the database from taking advantage of an existing index on the timestamp column.
This should be faster:
select count(*)
from temperatures
where "timestamp" >= '2020-02-02 00:00:00'::timestamp
  and "timestamp" < '2020-02-02 00:01:00'::timestamp;
This query uses the half-open interval strategy to check the timestamp column against two constant values.
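For the half-open range above to be fast, a btree index on the timestamp column must exist. TimescaleDB normally creates one on a hypertable's time column by default; if yours is missing, a sketch (index name is a placeholder):

```sql
-- Plain btree index on the column (quoted, since "timestamp" is a
-- reserved word); supports the >= / < range scan above.
CREATE INDEX IF NOT EXISTS temperatures_timestamp_idx
  ON temperatures ("timestamp");
```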

Query - find empty interval in series of timestamps

I have a table that stores historical data. A row is inserted into this table every 30 seconds from different types of sources, and obviously there is a timestamp associated with each row.
Let's say my parameter for disservice is 1 hour.
Since I charge for my services based on time, I need to know, for example, whether within a specific month there is a gap between consecutive rows that equals or exceeds my 1-hour interval.
A simplified structure of the table would be like:
tid serial primary key,
tunitd id int,
tts timestamp default now(),
tdescr text
I don't want to write a function that loops through all the records comparing them one by one, as I suppose that would be time- and memory-consuming.
Is there any way to do this directly from SQL maybe using the interval type in PostgreSQL?
Thanks.
This small SQL query will display all gaps with a duration of more than one hour:
select tts, next_tts, next_tts - tts as diff
from (
  select a.tts, min(b.tts) as next_tts
  from test1 a
  inner join test1 b on a.tts < b.tts
  group by a.tts
) as c
where next_tts - tts > interval '1 hour'
order by tts;
SQL Fiddle
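The self-join above compares every row with every later row, which can get expensive on large tables. A sketch of the same gap check using the lead() window function instead (single pass over the data; table and column names from the question):

```sql
-- lead(tts) fetches the next timestamp in tts order; gaps longer than
-- one hour are then a simple filter on the difference.
SELECT tts, next_tts, next_tts - tts AS diff
FROM (
  SELECT tts,
         lead(tts) OVER (ORDER BY tts) AS next_tts
  FROM test1
) g
WHERE next_tts - tts > INTERVAL '1 hour'
ORDER BY tts;
```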

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5-minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping through the results, pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
  ts,
  extract(minute from ts)::integer as minute
from (
  -- generate some sample timestamps, one minute apart
  select current_time + (n || ' minute')::interval as ts
  from generate_series(1, 30) as n
) as timestamps
-- extract the minute and check if it's on a 5-minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time);
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that a computed (expression) index on the WHERE-clause expression, where the value of the expression makes up the index, could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. Hard to find in book stores now, but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4(date_part('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record for every 5 rows:
select *
from (
  select *, row_number() over (order by timecolumn) as rown
  from tbl
) X
where mod(rown, 5) = 1;
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times):
select t.*
from tbl t
inner join (
  select thetimeinterval, max(timecolumn) as timecolumn
  from ( < the time series subquery > ) X
  left join tbl on tbl.timecolumn <= thetimeinterval
  group by thetimeinterval
) y on t.timecolumn = y.timecolumn;
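A concrete version of that pseudocode, filling in generate_series for the time-series subquery (the hour boundaries are placeholders for illustration; table and column names follow the answer above):

```sql
-- For each 5-minute boundary, pick the latest row at or before it,
-- then join back to tbl to get the full record.
SELECT t.*
FROM tbl t
INNER JOIN (
  SELECT g.thetimeinterval, max(tbl.timecolumn) AS timecolumn
  FROM generate_series(
         timestamp '2011-01-01 07:00',
         timestamp '2011-01-01 08:00',
         interval '5 minutes') AS g(thetimeinterval)
  LEFT JOIN tbl ON tbl.timecolumn <= g.thetimeinterval
  GROUP BY g.thetimeinterval
) y ON t.timecolumn = y.timecolumn;
```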
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table
group by bucket
order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or if your readings skip a minute. Instead of min, even better would be to use one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five-minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
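A sketch of that view wrapping (names from the answer above; the date_trunc term is an addition so that buckets from different hours do not collapse into one):

```sql
-- One representative timestamp per 5-minute bucket, exposed as a view.
CREATE VIEW five_minute_samples AS
SELECT min(your_timestamp) AS your_timestamp
FROM your_table
GROUP BY date_trunc('hour', your_timestamp),
         cast(extract(minute from your_timestamp) as integer) / 5;

-- Join the view back to the base table for the full rows.
SELECT t.*
FROM your_table t
JOIN five_minute_samples s ON t.your_timestamp = s.your_timestamp;
```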

How to filter table to date when it has a timestamp with time zone format?

I have a very large dataset - records in the hundreds of millions/billions.
I would like to filter the data in this column - I am only showing 2 of millions of records:
arrival_time
2019-04-22 07:36:09.870+00
2019-06-07 09:46:09.870+00
How can I filter the data in this column to only the date part? As in, I would like to filter where arrival_time is 2019-04-22, as this would give me the first record and any other records with the matching date of 2019-04-22.
I have tried casting the column (arrival_time::date = '2019-04-22'), but this has been costly and does not work well given I have such vast amounts of records.
Sample code is:
select *
from mytable
where arrival_time::timestamp::date = '2019-09-30';
Again, very costly if I cast to date format, as this will be done before the filtering!
Any ideas? I am using PostgreSQL and pgAdmin 4.
This query:
where (arrival_time::timestamp)::date = '2019-09-30'
converts arrival_time to another type. That generally precludes the use of an index and makes it harder for the optimizer to choose the best execution path.
Instead, compare to the same data type:
where arrival_time >= '2019-09-30'::timestamp and
      arrival_time < ('2019-09-30'::timestamp + interval '1 day')
You can try filtering on the upper and lower bounds of that day:
...
WHERE arrival_time >= '2019-04-22'::timestamp
AND arrival_time < '2019-04-23'::timestamp
...
That way, an index on arrival_time should be usable and help improve performance.
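Assuming no suitable index exists yet, a plain btree index on arrival_time (table name from the question) is what makes that range scan cheap:

```sql
-- Supports the >= / < half-open range filter on arrival_time.
CREATE INDEX IF NOT EXISTS mytable_arrival_time_idx
  ON mytable (arrival_time);
```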