Efficient PostgreSQL Query for Mins and Maxes within equal intervals in a time period - sql

I am using Postgres v9.2.6.
I have a system with many devices that take measurements. These measurements are stored in a
table with three fields:
device_id
measurement (Indexed)
time (Indexed)
There could be 10 million measurements in a single year. Most of the time the user is only interested in 100 min/max pairs taken over equal intervals within a certain period, for example the last 24 hours or the last 53 weeks. To get these 100 mins and maxes, the period is divided into 100 equal intervals, and the min and max are extracted from each interval. What would be the most efficient approach to query the data? So far I have tried the following query:
WITH periods AS (
    SELECT time.start AS st,
           time.start + (interval '1 year' / 100) AS en
    FROM generate_series(now() - interval '1 year', now(), interval '1 year' / 100) AS time(start)
)
SELECT *
FROM sample_data
JOIN periods
  ON created_at BETWEEN periods.st AND periods.en
 AND customer_id = 23
WHERE sample_data.id = (
    SELECT id
    FROM sample_data
    WHERE created_at BETWEEN periods.st AND periods.en
    ORDER BY sample ASC
    LIMIT 1
);
This test approach took over a minute for 1 million points on a MacBook Pro.
Thanks...
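For comparison with the correlated subquery above, here is a minimal Python sketch of the bucketing idea itself: assign each row to one of 100 equal-width intervals and track the min and max per interval in a single pass. The function name `min_max_per_interval` and the `(timestamp, value)` row shape are assumptions made for illustration; they are not part of the schema in the question, and this is not a tuned PostgreSQL solution.

```python
from datetime import datetime, timedelta

def min_max_per_interval(rows, start, end, n_intervals=100):
    """Return {bucket_index: (min_value, max_value)} for rows in [start, end).

    rows is an iterable of (timestamp, value) pairs (hypothetical shape).
    """
    width = (end - start) / n_intervals  # timedelta / int -> timedelta
    result = {}
    for ts, value in rows:
        if not (start <= ts < end):
            continue  # outside the requested period
        # timedelta // timedelta -> int: index of the interval containing ts.
        # Clamp in case microsecond rounding of `width` pushes it to n_intervals.
        bucket = min((ts - start) // width, n_intervals - 1)
        lo, hi = result.get(bucket, (value, value))
        result[bucket] = (min(lo, value), max(hi, value))
    return result

# Usage: a 100-hour period split into 100 one-hour intervals.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=100)
rows = [
    (start + timedelta(minutes=30), 5.0),
    (start + timedelta(minutes=45), 2.0),
    (start + timedelta(hours=1), 7.0),
    (start + timedelta(hours=99, minutes=59), 1.0),
]
pairs = min_max_per_interval(rows, start, end)
```

Each row is visited once, so the cost grows with the number of rows in the period rather than with rows times intervals, which is the property a single grouped query would also aim for.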

Sorry about that. It was actually my question, and it looks like the author of this post caught a cold, so I cannot ask him to edit it. I've posted a "more good" question here - Slow PostgreSQL Query for Mins and Maxs within equal intervals in a time period. Could you please close this question?

Related

Averaging a variable over a period of time

I am currently having difficulty formulating this as an SQL query:
I would like to average the values of a column (here, twa) over the 10 minutes ending at the last value in the table, i.e. over the rows that satisfy:
last date - 10 minutes <= date <= last date
I tried to start a first query but it does not show the right answer:
SELECT AVG(twa), horaire FROM OF50 WHERE ((SELECT horaire FROM of50 ORDER BY horaire DESC LIMIT 1)-INTERVAL '1 minutes'>horaire) ORDER BY horaire;
Regards,
Maybe this will do.
with t as (select max(horaire) maxhoraire from of50)
select AVG(of50.twa)
from of50, t
where of50.horaire between t.maxhoraire - interval '1 minute' and t.maxhoraire;
Or even this may do, given that the last value cannot be 'younger' than now and that at least one event happened during the last minute. It is not exactly the same, though: it computes the average over the last minute before now.
select AVG(twa)
from of50
where horaire >= now() - interval '1 minute';
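The difference between the two queries is the anchor of the window: the first ends at the latest row, the second at the current time. A minimal Python sketch of the first variant, assuming rows are `(horaire, twa)` pairs and using a hypothetical helper name `average_over_last_window`:

```python
from datetime import datetime, timedelta

def average_over_last_window(rows, window):
    """Average the values whose timestamp lies in [latest - window, latest].

    rows is a list of (timestamp, value) pairs; returns None if it is empty.
    """
    if not rows:
        return None
    latest = max(ts for ts, _ in rows)  # plays the role of max(horaire)
    selected = [v for ts, v in rows if latest - window <= ts <= latest]
    return sum(selected) / len(selected)

# Usage: a 10-minute window ending at the most recent row.
t0 = datetime(2024, 1, 1, 12, 0)
rows = [
    (t0 - timedelta(minutes=15), 100.0),  # too old, excluded
    (t0 - timedelta(minutes=10), 4.0),    # on the boundary, included
    (t0 - timedelta(minutes=5), 6.0),
    (t0, 8.0),
]
avg = average_over_last_window(rows, timedelta(minutes=10))
```

Passing `timedelta(minutes=1)` instead reproduces the one-minute window used in the queries above.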

Unable to count records for an interval in days

I want to detect how many records are covered by a certain period in a Redshift table, so I queried records for various periods of time. However, I've noticed a strange behavior.
When I try to count the records for, say, 100 days, it returns 0 no matter how many days I put in the query.
SELECT count(*)
FROM main.transaction_data
WHERE tr_date > current_date - interval '100' day;
But when I query the count for several months it returns a valid count.
SELECT count(*)
FROM main.transaction_data
WHERE tr_date > current_date - interval '3 months';
Is the query for a period of 100 days incorrect?

Getting random time interval in postgreSQL

I need random interval time between 0 and (10 days and 5 hours).
My code:
select random() * (interval '10 days 5 hours')
from generate_series(1, 50)
It works as it should, except for a few strange results, like:
0 years 0 mons 7 days 26 hours 10 mins 1.353353 secs
The problem is the 26 hours: it shouldn't be more than 23. And I never get 10 days, which I would like to.
Intervals in Postgres are quite flexible, so hour values greater than 23 do not necessarily roll over to days. Use justify_interval() to return them to the normal "days" and "hours".
So:
select justify_interval(random() * interval '10 day 5 hour')
from generate_series(1, 200)
order by 1 desc;
will return values with appropriate values for days, hours, minutes, and seconds.
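To make the roll-over concrete, here is a small Python sketch of the carry that turns "7 days 26 hours" into "8 days 2 hours". The helper `carry_hours_into_days` is hypothetical and only mimics the day/hour part of the normalization for non-negative fields; it is not PostgreSQL's implementation.

```python
def carry_hours_into_days(days, hours, minutes=0, seconds=0.0):
    """Move every full 24-hour block from the hours field into the days field.

    Assumes all fields are non-negative (an assumption of this sketch).
    """
    extra_days, remaining_hours = divmod(hours, 24)
    return days + extra_days, remaining_hours, minutes, seconds

# Usage: the "strange" value from the question.
normalized = carry_hours_into_days(7, 26, 10, 1.353353)
```

After the carry, the hours field always stays in the range 0 to 23, which is the display the question expected.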
Now, why aren't you getting intervals with more than 10 days? This is simple randomness. If you increase the number of rows to 200 (as above), you'll see them (in all likelihood). If you run the code multiple times, sometimes you'll see none in that range; sometimes you'll see two.
Why? You are asking how often you get a value of 240+ in a range of 245. Those top 5 hours account for about 2% of the range (about 1/50). In other words, a sample of 50 is not big enough: any given sample of 50 random values is likely to be missing one or more 5-hour ranges.
Plus, without justify_interval(), you are likely to miss those anyway because they may show up as 9 days with an hours component larger than 23.
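The sampling argument can be checked with a short calculation. Assuming the values are uniform over 0 to 245 hours and independent, the chance that a single value misses the top 5 hours is 240/245, so the chance that all n values miss it is (240/245)^n. The function name below is made up for this sketch.

```python
def prob_no_sample_in_top(n_samples, top_hours=5.0, total_hours=245.0):
    """Probability that none of n uniform samples lands in the top slice."""
    p_hit = top_hours / total_hours  # 5/245, roughly 0.0204
    return (1.0 - p_hit) ** n_samples

p_50 = prob_no_sample_in_top(50)    # roughly 0.36
p_200 = prob_no_sample_in_top(200)  # roughly 0.016
```

So with 50 rows there is roughly a one-in-three chance of seeing no value above 10 days, while with 200 rows that chance drops below 2%.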
Try this:
select justify_hours(random() * (interval '245 hours'))
FROM generate_series(1, 50)
See Postgres Documentation for an explanation of the justify_* functions.
One option would be to use an interval of one hour, and then multiply by the random number between 0 and 1 coming from the series:
select random() * 245 * interval '1 hour'
from generate_series(1, 50);
I can see that the other answers suggest using justify_interval. If you just want a series of intervals between 0 and 245 hours (245 hours corresponding to 10 days and 5 hours), then my answer should suffice.

Aggregate and calculate total minutes for set of records as productivity

I have a table that lists activities for people, with start / end times for each activity.
How do I get the total number of records for each person?
SELECT NAME,
--sum(startDT- endDT) AS minutes -- stuck here
FROM TABLE1
GROUP BY NAME
You're subtracting end time from start time, which will produce a negative value - try flipping those around (subtract start time from end time). The following will give you the number of records and the total elapsed time for each NAME:
SELECT NAME,
COUNT(*) AS "Records for NAME",
TO_CHAR(NUMTODSINTERVAL(SUM(END_DATE_TIME - START_DATE_TIME), 'DAY')) AS MINUTES
FROM TABLE1
GROUP BY NAME
SQLFiddle here
Share and enjoy.
Assuming that startDT and endDT are both of type date, you were really close. Subtracting two dates gives a difference in days. Multiply by 24 to get a difference in hours, and again by 60 to get minutes:
SELECT NAME,
sum(endDT - startDT)*24*60 AS minutes -- days * 24 * 60 = minutes
FROM TABLE1
GROUP BY NAME
Assuming that your differences aren't always an exact whole number of minutes, you'll either get a non-integer result here (e.g. 12.5 for 12 minutes 30 seconds), or you'll want to round or trunc the sum to get an integer number of minutes.
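The days-to-minutes conversion can be sketched in Python as follows. The row shape `(name, start, end)` and the function name `total_minutes` are assumptions for illustration; the fractional-day difference stands in for what subtracting two dates yields in the query above.

```python
from datetime import datetime

def total_minutes(activities):
    """Sum elapsed minutes per name from (name, start, end) rows."""
    totals = {}
    for name, start, end in activities:
        # Difference expressed in (fractional) days, like endDT - startDT.
        days = (end - start).total_seconds() / 86400
        totals[name] = totals.get(name, 0.0) + days * 24 * 60
    return totals

# Usage: two activities for one person, one for another.
activities = [
    ("Ann", datetime(2024, 1, 1, 9, 0, 0), datetime(2024, 1, 1, 9, 12, 30)),
    ("Ann", datetime(2024, 1, 1, 10, 0, 0), datetime(2024, 1, 1, 10, 30, 0)),
    ("Bob", datetime(2024, 1, 1, 9, 0, 0), datetime(2024, 1, 1, 10, 0, 0)),
]
minutes = total_minutes(activities)
rounded = {name: round(value) for name, value in minutes.items()}
```

Ann's total is about 42.5 minutes, a non-integer, which is why the rounding step at the end may be wanted.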

date_trunc 5 minute interval in PostgreSQL [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What is the fastest way to truncate timestamps to 5 minutes in Postgres?
Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)
I want to aggregate data at 5 minute intervals in PostgreSQL. If I use the date_trunc() function, I can aggregate data at an hourly, monthly, daily, weekly, etc. interval but not a specific interval like 5 minute or 5 days.
select date_trunc('hour', date1), count(*) from table1 group by 1;
How can we achieve this in PostgreSQL?
SELECT date_trunc('hour', date1) AS hour_stump
, (extract(minute FROM date1)::int / 5) AS min5_slot
, count(*)
FROM table1
GROUP BY 1, 2
ORDER BY 1, 2;
You could GROUP BY two columns: a timestamp truncated to the hour and a 5-minute-slot.
The example produces slots 0 - 11. Add 1 if you prefer 1 - 12.
I cast the result of extract() to integer, so the division / 5 truncates fractional digits. The result:
minute 0 - 4 -> slot 0
minute 5 - 9 -> slot 1
etc.
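The slot mapping above can be sketched in Python like this. The helper names are hypothetical; the grouping key mirrors the pair (timestamp truncated to the hour, minute divided by 5 with the fraction dropped).

```python
from collections import Counter
from datetime import datetime

def five_minute_key(ts):
    """Return (hour_stump, min5_slot) for a timestamp."""
    hour_stump = ts.replace(minute=0, second=0, microsecond=0)
    return hour_stump, ts.minute // 5  # integer division drops the fraction

def count_per_slot(timestamps):
    """Count rows per (hour, 5-minute slot), like GROUP BY 1, 2."""
    return Counter(five_minute_key(ts) for ts in timestamps)

# Usage: three rows, two of which share slot 0 of the 10:00 hour.
stamps = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 4, 59),
    datetime(2024, 1, 1, 10, 5, 0),
]
counts = count_per_slot(stamps)
```

As in the SQL version, only slots that actually contain rows appear in the result.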
This query only returns values for those 5-minute slots where values are found. If you want a value for every slot or if you want a running sum over 5-minute slots, consider this related answer:
PostgreSQL: running count of rows for a query 'by minute'
Here's a simple query you can either wrap in a function or cut and paste all over the place:
select now()::timestamp(0), (extract(epoch from now()::timestamptz(0)-date_trunc('d',now()))::int)/60;
It gives you the current time and the number of n-second buckets elapsed since the start of the day, where n = 60 here (one bucket per minute). To bucket every 5 minutes, change that number to 300, and so on. It groups by the seconds since the start of the day. To group by seconds since the start of the year, the hour, or anything else, change the 'd' in date_trunc.
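The same bucket arithmetic, sketched in Python: take the whole seconds elapsed since midnight and divide by the bucket size. The function name `bucket_since_midnight` is made up, and this sketch truncates fractional seconds, which may differ slightly from how the SQL cast rounds.

```python
from datetime import datetime

def bucket_since_midnight(ts, bucket_seconds=60):
    """Index of the bucket_seconds-wide bucket that ts falls in, counted from 00:00."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = int((ts - midnight).total_seconds())  # whole seconds since midnight
    return seconds // bucket_seconds

# Usage: the same timestamp in 1-minute and 5-minute buckets.
ts = datetime(2024, 1, 1, 0, 7, 30)
per_minute = bucket_since_midnight(ts)             # 450 s // 60
per_five_minutes = bucket_since_midnight(ts, 300)  # 450 s // 300
```

With 300-second buckets a full day has 288 buckets, numbered 0 through 287.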