Postgres sql query by time window - sql

I have a table "meterreading" with the columns "timestamp", "value", and "meterId". I would like to get the sum of "value" for each hour, starting at a specific time. So far I have come up with this query, but it fails with an error saying I need to group by timestamp. The timestamps are just integers representing Unix epoch timestamps.
select date_trunc('hour', to_timestamp(timestamp)) as hours, sum(value)
from meterreading
WHERE timestamp >= 1377993600 AND timestamp < 1409595081
group by date_trunc('hours', to_timestamp(timestamp))
order by date_trunc('hours', to_timestamp(timestamp)) asc

select date_trunc('hour', to_timestamp(timestamp)) as hours, sum(value)
from meterreading
WHERE timestamp >= 1377993600 AND timestamp < 1409595081
group by 1
order by 1
or use the exact same expression used in the select list
group by date_trunc('hour', to_timestamp(timestamp));
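Notice 'hour' instead of 'hours'. The select list in the question uses 'hour' while its GROUP BY uses 'hours', so the two expressions do not match. Hence the convenience of the positional reference syntax in GROUP BY: it is clearer and less prone to errors.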
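To make the bucketing concrete, here is a minimal Python sketch of what the grouped query computes. It is an illustration, not part of the original answer: the function name `hourly_sums` and the sample readings are made up, and it assumes hours are truncated in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_sums(readings, start, end):
    """Sum values per hour for epoch timestamps in the half-open range [start, end)."""
    totals = defaultdict(float)
    for ts, value in readings:
        if start <= ts < end:
            # Dropping the remainder modulo 3600 truncates the epoch to the hour (UTC).
            hour_start = ts - ts % 3600
            totals[hour_start] += value
    # Sort by hour, mirroring ORDER BY 1 in the SQL.
    return [
        (datetime.fromtimestamp(hour, tz=timezone.utc), total)
        for hour, total in sorted(totals.items())
    ]


# Hypothetical sample data: (epoch seconds, value)
readings = [(1377993600, 1.5), (1377995400, 2.0), (1377997200, 4.0)]
result = hourly_sums(readings, 1377993600, 1409595081)
```

The first two readings fall into the same hour and are summed together; the third starts a new bucket.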

Related

How do I select a data every second with PostgreSQL?

I've got a SQL query that selects all the data between two dates. Now I would like to add a time scale factor so that, instead of returning all the data, it returns one row every second, minute or hour.
Do you know how I can achieve this?
My query :
"SELECT received_on, $1 FROM $2 WHERE $3 <= received_on AND received_on <= $4", [data_selected, table_name, date_1, date_2]
The table input:
As you can see, there are several rows for the same second; I would like to select only one per second.
If you want to select one row per second, you can use the ROW_NUMBER() function partitioned by received_on, as follows:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY received_on ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
If you want to select one row per minute or per hour, you can use the extract function to get the epoch of received_on (its value in seconds), then divide it by 60 to bucket by minute or by 3600 to bucket by hour. The manual describes the epoch field as follows:
epoch: For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values, the total number of seconds in the interval
Group by minutes:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / 60) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
Group by hours:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / (60*60)) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
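The same "row number 1 per bucket" idea can be sketched in plain Python. This is only an illustration under assumptions: the function name `first_row_per_bucket` and the sample rows are hypothetical, rows are reduced to (received_on, adc_v) pairs, and timestamps are timezone-aware UTC values.

```python
from datetime import datetime, timezone


def first_row_per_bucket(rows, bucket_seconds):
    """Keep, per time bucket, the row with the smallest adc_v (the row with rn = 1)."""
    best = {}
    for received_on, adc_v in rows:
        # floor(epoch / bucket_seconds) identifies the bucket, as in the PARTITION BY.
        bucket = int(received_on.timestamp()) // bucket_seconds
        if bucket not in best or adc_v < best[bucket][1]:
            best[bucket] = (received_on, adc_v)
    return [best[bucket] for bucket in sorted(best)]


utc = timezone.utc
# Hypothetical sample rows: (received_on, adc_v)
rows = [
    (datetime(2022, 7, 29, 15, 52, 0, tzinfo=utc), 2510),
    (datetime(2022, 7, 29, 15, 52, 0, tzinfo=utc), 2509),
    (datetime(2022, 7, 29, 15, 52, 30, tzinfo=utc), 2508),
    (datetime(2022, 7, 29, 15, 53, 5, tzinfo=utc), 2511),
]
per_second = first_row_per_bucket(rows, 1)
per_minute = first_row_per_bucket(rows, 60)
per_hour = first_row_per_bucket(rows, 60 * 60)
```

On ties the sketch keeps the first row it sees, whereas ROW_NUMBER() is free to pick any of the tied rows unless the ORDER BY breaks the tie.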
When there are several rows per second, and you only want one result row per second, you can decide to pick one of the rows for each second. This can be a randomly chosen row or you pick the row with the greatest or least value in a column as shown in Ahmed's answer.
It would be more typical, though, to aggregate your data per second. The columns show figures and you are interested in those figures. Your sample data shows the value 2509 twice and the value 2510 three times for the adc_v column at 2022-07-29, 15:52. Consider what you would like to see. Maybe you don't want this value to go below some boundary, so you show the minimum value MIN(adc_v) to see how low it went in that second. Or you want to see the value that occurred most often in that second, MODE() WITHIN GROUP (ORDER BY adc_v). Or you'd like to see the average value AVG(adc_v). Make this decision for every column, so as to get the information most vital to you.
select
received_on,
min(adc_v),
avg(adc_i),
...
from mytable
group by received_on
order by received_on;
If you want this for another interval, say an hour instead of a second, truncate your received_on column accordingly. E.g.:
select
date_trunc('hour', received_on) as received_hour,
min(adc_v),
avg(adc_i),
...
from mytable
group by date_trunc('hour', received_on)
order by date_trunc('hour', received_on);
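As a rough illustration of how the choice of aggregate changes the result, the sketch below groups rows by hour and computes the minimum, average, and mode of one column. The function name `aggregate_per_hour` and the sample rows are assumptions for the example; the first five values mirror the 2509/2510 counts mentioned above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, mode


def aggregate_per_hour(rows):
    """Group (received_on, adc_v) rows by hour and summarize adc_v per group."""
    groups = defaultdict(list)
    for received_on, adc_v in rows:
        # Equivalent of date_trunc('hour', received_on).
        hour = received_on.replace(minute=0, second=0, microsecond=0)
        groups[hour].append(adc_v)
    return {
        hour: {"min": min(values), "avg": mean(values), "mode": mode(values)}
        for hour, values in sorted(groups.items())
    }


# Hypothetical sample rows: (received_on, adc_v)
rows = [
    (datetime(2022, 7, 29, 15, 52, 0), 2509),
    (datetime(2022, 7, 29, 15, 52, 0), 2509),
    (datetime(2022, 7, 29, 15, 52, 0), 2510),
    (datetime(2022, 7, 29, 15, 52, 0), 2510),
    (datetime(2022, 7, 29, 15, 52, 0), 2510),
    (datetime(2022, 7, 29, 16, 5, 0), 2500),
]
stats = aggregate_per_hour(rows)
```

For the 15:00 bucket the three summaries differ (2509, 2509.6, and 2510), which is why the decision has to be made per column.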

select rows with condition of date presto

I am trying to select the number of impressions per hour for a particular day.
I tried this code:
SELECT
date_trunc('hour', CAST(date_time AS timestamp)) date_time,
COUNT(impression_id) AS count_impression_id
FROM
parquet_db.imp_pixel
WHERE
date_time = '2022-07-27'
LIMIT 100
GROUP BY 1
But I got this error when I add the "where" clause :
line 5:1: mismatched input 'group'. Expecting:
Can you help me to fix it? thanks
LIMIT usually comes last in a SQL query. Also, you should not be using LIMIT without ORDER BY. Use this version:
SELECT DATE_TRUNC('hour', CAST(date_time AS timestamp)) date_time,
COUNT(impression_id) AS count_impression_id
FROM parquet_db.imp_pixel
WHERE CAST(date_time AS date) = DATE '2022-07-27'
GROUP BY 1
ORDER BY <something>
LIMIT 100;
Note that the ORDER BY clause determines which 100 records you get in the result set. Your current (intended) query lets Presto decide on its own which 100 records get returned.
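The order of the steps (filter, group, order, then limit) can be mirrored in a short Python sketch. It is illustrative only: the function name `impressions_per_hour` and the sample timestamps are hypothetical, and ordering by the hour is just one possible choice for the ORDER BY placeholder.

```python
from collections import Counter
from datetime import date, datetime


def impressions_per_hour(timestamps, day, limit=100):
    """Count impressions per hour for one day, ordered by hour, then limited."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        for ts in timestamps
        if ts.date() == day  # the WHERE filter runs before grouping
    )
    # Sorting first makes the limited result deterministic.
    return sorted(counts.items())[:limit]


# Hypothetical sample data
timestamps = [
    datetime(2022, 7, 27, 9, 15),
    datetime(2022, 7, 27, 9, 45),
    datetime(2022, 7, 27, 14, 5),
    datetime(2022, 7, 28, 9, 0),
]
per_hour = impressions_per_hour(timestamps, date(2022, 7, 27))
```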

Aggregate the variable from timestamp on BigQuery

I am planning to calculate the most frequent part_of_day for each user. To do this, I first encode the timestamp as a part_of_day, then aggregate to find the most frequent part_of_day. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to handle the timestamp with ARRAY_AGG; I get an error, so my code structure might be wrong.
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues in your code.
To stay consistent, I created some sample data based on the formats and functions you used. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I renamed the column produced by ARRAY_AGG(), because keeping the same name twice would cause the error "Duplicate column name". Second, a comma was missing after your TIME_TRUNC() function, before COUNT(*). Third, in the inner GROUP BY, you needed to group by Request_Timestamp as well, because it was neither aggregated nor grouped. Lastly, in the outer GROUP BY, you needed to aggregate or group by time. After these corrections, your code will execute without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.
If you want the most frequent part of each day, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, then you need to include that in the queries.
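A small Python sketch of the same two-step logic (classify each request, then take the most frequent class per user) may help. It is an assumption-laden illustration: the function names and sample requests are made up, and requests are reduced to (User_ID, time) pairs.

```python
from collections import Counter, defaultdict
from datetime import time


def part_of_day(t):
    """Mirror the CASE expression; times it does not cover map to None (SQL NULL)."""
    if time(4, 0) <= t <= time(12, 0):
        return "morning"
    if t < time(4, 0) or t > time(20, 0):
        return "night"
    return None  # the CASE has no branch for times after 12:00 up to 20:00


def most_frequent_part_of_day(requests):
    """Return the most frequent part of day for each user."""
    counts = defaultdict(Counter)
    for user_id, t in requests:
        counts[user_id][part_of_day(t)] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Hypothetical sample data: (User_ID, time of the request)
requests = [
    (98, time(5, 30)),
    (98, time(6, 0)),
    (98, time(22, 30)),
    (99, time(22, 30)),
]
result = most_frequent_part_of_day(requests)
```

Note that the CASE expression as written leaves afternoon and early evening unclassified, so such requests are counted under NULL.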

How to get min value in postgres sql

I have a few records and I want to write a query that gives hourly records of each device's battery level.
What I did: I extract the hour from the timestamp field and use the min function to get the lowest value. But since extracting the hour from the timestamp is not an aggregate function, I need to add it to the GROUP BY, which now gives me duplicate records.
Here is my sql:
select extract(hour from observationtime) as hour,
deviceid,
min(batterylevel) as batterylevel
from smartvakt_device_report
where batterylevel!=''
and deviceid!=''
and observationtime between '2016-02-02' and '2016-03-02'
group by observationtime,deviceid
order by observationtime ASC, deviceid ASC
Here is above query output:
Here are actual records:
Can someone suggest how I can remove these duplicates?
Group by deviceid and by the hour, using the same expression extract(hour from observationtime) as in the select list, instead of grouping by the full observationtime.
SELECT
deviceid,
extract(hour from observationtime) AS hour,
min(batterylevel) AS batterylevel
FROM smartvakt_device_report
WHERE
batterylevel!=''
AND deviceid!=''
AND observationtime BETWEEN '2016-02-02'
AND '2016-03-02'
GROUP BY
deviceid,
extract(hour from observationtime)
ORDER BY
extract(hour from observationtime) ASC,
deviceid ASC
Since you are only interested in the hour, when you are grouping, you have to indicate that like this
group by extract(hour from observationtime)
Otherwise, PostgreSQL will only group together rows whose observationtime values are identical. But observationtime contains the time at full resolution, not just the hour.
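The effect of the grouping key can be sketched in Python. This is an illustration with assumptions: the function name and sample reports are hypothetical, and battery levels are treated as integers for simplicity (the comparison with '' in the query suggests the real column is text).

```python
from datetime import datetime


def min_battery_per_device_hour(reports):
    """Lowest battery level per (deviceid, hour of day)."""
    lowest = {}
    for deviceid, observationtime, batterylevel in reports:
        # Grouping key: the device plus the hour only, not the full timestamp.
        key = (deviceid, observationtime.hour)
        if key not in lowest or batterylevel < lowest[key]:
            lowest[key] = batterylevel
    return lowest


# Hypothetical sample data: (deviceid, observationtime, batterylevel)
reports = [
    ("A", datetime(2016, 2, 2, 10, 5), 80),
    ("A", datetime(2016, 2, 2, 10, 40), 75),
    ("A", datetime(2016, 2, 3, 10, 10), 60),
    ("B", datetime(2016, 2, 2, 10, 0), 90),
]
lowest = min_battery_per_device_hour(reports)
```

Because the hour of day ranges from 0 to 23, readings from different days that share an hour land in the same group; if one row per device per hour per day is wanted, the key would need the date as well.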

PostgreSQL: running count of rows for a query 'by minute'

I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Return only minutes with activity
Shortest
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc(), it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the
same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY,
this sets the frame to be all rows from the partition start up
through the current row's last ORDER BY peer.
And that happens to be exactly what you need.
Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) sub
ORDER BY 1;
Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
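The aggregate-then-accumulate approach can be sketched in a few lines of Python. This is only an illustration of the logic, not of the query's performance; the function name `running_count_per_minute` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate


def running_count_per_minute(events):
    """Running total of rows up to and including each minute that has activity."""
    # Step 1: count rows per minute (the subquery with GROUP BY).
    per_minute = Counter(e.replace(second=0, microsecond=0) for e in events)
    minutes = sorted(per_minute)
    # Step 2: cumulative sum over the ordered minutes (the window sum()).
    totals = accumulate(per_minute[m] for m in minutes)
    return list(zip(minutes, totals))


# Hypothetical sample data
events = [
    datetime(2024, 1, 1, 10, 0, 5),
    datetime(2024, 1, 1, 10, 0, 50),
    datetime(2024, 1, 1, 10, 2, 10),
]
running = running_count_per_minute(events)
```

Minutes without any rows (10:01 in the sample) do not appear in the output, which matches the "only minutes with activity" variants.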
Include minutes without activity
Shortest
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in the base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
Fastest
With CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM tbl
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(min(minute), max(minute), interval '1 min')
FROM cte
) m(minute)
LEFT JOIN cte USING (minute)
ORDER BY 1;
Again, aggregate and count rows per minute in the first step. This removes the need for a later DISTINCT.
Different from count(), sum() can return NULL. Default to 0 with COALESCE.
With many rows and an index on "when" this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) c USING (minute)
ORDER BY 1;
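For comparison, here is a Python sketch of the gap-filling variant: it walks every minute between the first and last event, the way generate_series() does, and carries the running total through empty minutes. The function name `running_count_all_minutes` and the sample events are assumptions for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def running_count_all_minutes(events):
    """Running total for every minute between the first and last event."""
    per_minute = Counter(e.replace(second=0, microsecond=0) for e in events)
    if not per_minute:
        return []
    minute, last = min(per_minute), max(per_minute)
    result, total = [], 0
    while minute <= last:
        # Minutes without rows contribute 0, so the total is simply carried forward.
        total += per_minute.get(minute, 0)
        result.append((minute, total))
        minute += timedelta(minutes=1)
    return result


# Hypothetical sample data
events = [
    datetime(2024, 1, 1, 10, 0, 5),
    datetime(2024, 1, 1, 10, 0, 50),
    datetime(2024, 1, 1, 10, 2, 10),
]
filled = running_count_all_minutes(events)
```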