How do I select a data every second with PostgreSQL? - sql

I've got a SQL query that selects every data between two dates and now I would like to add the time scale factor so that instead of returning all the data it returns one data every second, minute or hour.
Do you know how I can achieve it ?
My query :
"SELECT received_on, $1 FROM $2 WHERE $3 <= received_on AND received_on <= $4", [data_selected, table_name, date_1, date_2]
The table input:
As you can see there are several data the same second, I would like to select only one per second

If you want to select data every second, you may use ROW_NUMBER() function partitioned by 'received_on' as the following:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY received_on ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
If you want to select data every minute or hour, you may use the extract function to get the number of seconds in 'received_on' and divide it by 60 to get the minutes or divide it by 3600 to get the hours.
epoch: For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values, the total number of seconds in the interval
Group by minutes:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / 60) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
Group by hours:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / (60*60)) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
See a demo.

When there are several rows per second, and you only want one result row per second, you can decide to pick one of the rows for each second. This can be a randomly chosen row or you pick the row with the greatest or least value in a column as shown in Ahmed's answer.
It would be more typical, though, to aggregate your data per second. The columns show figures and you are interested in those figures. Your sample data shows two times the value 2509 and three times the value 2510 for the adc_v column at 2022-07-29, 15:52. Consider what you would like to see. Maybe you don't want this value go below some boundary, so you show the minimum value MIN(adc_v) to see how low it went in the second. Or you want to see the value that occured most often in the second MODE(adc_v). Or you'd like to see the average value AVG(adc_v). Make this decision for every value, so as to get the informarion most vital to you.
select
received_on,
min(adc_v),
avg(adc_i),
...
from mytable
group by received_on
order by received_on;
If you want this for another interval, say an hour instead of the month, truncate your received_on column accordingly. E.g.:
select
date_trunc('hour', received_on) as received_hour,
min(adc_v),
avg(adc_i),
...
from mytable
group by date_trunc('hour', received_on)
order by date_trunc('hour', received_on);

Related

How to summarize dates in a table with different timestamps

When I ask this question, I get multiple hits on divcode because there are different timestamps in pickdate. Is there a way to get the question to summarize the date so I get 1 value per division?
/*
Söker Vikt och volym Snitt
*/
select TO_CHAR(ROUND(O08T1.pickdate),'YYYY-MM-DD') as date_1,
O08T1.divcode,
sum(O08T1.calcwght) as Vikt, sum(O08T1.calcvol) as Volym,
count(*) as AntalOrder,
avg(O08T1.calcwght) as avg_vikt,
avg(O08T1.calcvol) as avg_Vol
from O08T1
where O08T1.pickdate >= #('Från datum',#DATE)
group by O08T1.pickdate, O08T1.divcode
order by Pickdate DESC
If you want the values grouped by day then don't use GROUP BY pickdate as that will group by each instant and have values down to an accuracy of a second. Instead, GROUP BY TRUNC(pickdate) which will truncate values to the start of the day and will aggregate all values on the same day together (and then, also, use TRUNC in the first line and in the ORDER BY):
select TO_CHAR(TRUNC(pickdate),'YYYY-MM-DD') as date_1,
divcode,
sum(calcwght) as Vikt,
sum(calcvol) as Volym,
count(*) as AntalOrder,
avg(calcwght) as avg_vikt,
avg(calcvol) as avg_Vol
from O08T1
where pickdate >= #('Från datum',#DATE)
group by TRUNC(pickdate),
divcode
order by TRUNC(pickdate) DESC

Delete duplicated rows

I ve got duplicated rows in a temp table mainly because there are some date values which are seconds/miliseconds different to each other.
For example:
2018-08-30 12:30:19.000
2018-08-30 12:30:20.000
This is what causes the duplication.
How can I keep only one of those values? Let s say the higher one?
Thank you.
Well, one method is to use lead():
select t.*
from (select t.*, lead(ts) over (order by ts) as next_ts
from t
) t
where next_ts is null or
datediff(second, ts, next_ts) < 60; -- or whatever threshold you want
You could assign a Row_Number to each value, as follows:
Select *
, Row_Number() over
(partition by ObjectID, cast(date as date)... ---whichever criteria you want to consider duplicates
order by date desc) --assign the latest date to row 1, may want other order criteria if you might have ties on this field
as RN
from MyTable
Then retain only the rows where RN = 1 to remove duplicates. See this answer for examples of how to round your dates to the nearest hour, minute, etc. as needed; I used truncating to the day above as an example.

Time difference between two rows for specified ID

I'm trying to find the time difference in seconds between two rows that have the same ID.
Here's a simple table.
The table is ordered by myid and timestamp. I'm trying to get the total second between two rows that have the same myid.
Here's what I have come up with. The only problem with this query is that it calculates the time difference for all records but not for the same ID.
SELECT DATEDIFF(second, pTimeStamp, TimeStamp), q.*
FROM (
SELECT *,
LAG(TimeStamp) OVER (ORDER BY TimeStamp) pTimeStamp
FROM data
) q
WHERE pTimeStamp IS NOT NULL
This is the output.
I only want the output highlighted in yellow.
Any suggestions?
SQLFIDDLE
The fix is simply a matter of narrowing the window, with PARTITION BY, to rows with the same ID:
SELECT DATEDIFF(second, pTimeStamp, TimeStamp), q.*
FROM (
SELECT *,
LAG(TimeStamp) OVER (PARTITION BY ID ORDER BY TimeStamp) pTimeStamp
FROM data
) q
WHERE pTimeStamp IS NOT NULL

SQL Statement Only latest entry of the day

seems it is too long ago that I needed create own SQL Statements. I have a table (GAS_COUNTER) with timestamps (TS) and values (VALUE).
There are hundreds of entries per day, but I only need the latest of the day. I tried different ways but never get what I need.
Edit
Thanks for the fast replies, but some do not meet my needs (I need the latest value of each day in the table) and some don't work. My best own statement was:
select distinct (COUNT),
from
(select
extract (DAY_OF_YEAR from TS) as COUNT,
extract (YEAR from TS) as YEAR,
extract (MONTH from TS) as MONTH,
extract (DAY from TS) as DAY,
VALUE as VALUE
from GAS_COUNTER
order by COUNT)
but the value is missing. If I put it in the first select all rows return. (logical correct as every line is distinct)
Here an example of the Table content:
TS VALUE
2015-07-25 08:47:12.663 0.0
2015-07-25 22:50:52.155 2.269999999552965
2015-08-10 11:18:07.667 52.81999999284744
2015-08-10 20:29:20.875 53.27999997138977
2015-08-11 10:27:21.49 54.439999997615814
2nd Edit and solution
select TS, VALUE from GAS_COUNTER
where TS in (
select max(TS) from GAS_COUNTER group by extract(DAY_OF_YEAR from TS)
)
This one would give you the very last record:
select top 1 * from GAS_COUNTER order by TS desc
Here is one that would give you last records for every day:
select VALUE from GAS_COUNTER
where TS in (
select max(TS) from GAS_COUNTER group by to_date(TS,'yyyy-mm-dd')
)
Depending on the database you are using you might need to replace/adjust to_date(TS,'yyyy-mm-dd') function. Basically it should extract date-only part from the timestamp.
Select the max value for the timestamp.
select MAX(TS), value -- or whatever other columns you want from the record
from GAS_COUNTER
group by value
Something like this would window the data and give you the last value on the day - but what happens if you get two TS the same? Which one do you want?
select *
from ( select distinct cast( TS as date ) as dt
from GAS_COUNTER ) as gc1 -- distinct days
cross apply (
select top 1 VALUE -- last value on the date.
from GAS_COUNTER as gc2
where gc2.TS < dateadd( day, 1, gc1.dt )
and gc2.TS >= gc1.dt
order by gc2.TS desc
) as x

PostgreSQL: running count of rows for a query 'by minute'

I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Return only minutes with activity
Shortest
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc(), it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the
same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY,
this sets the frame to be all rows from the partition start up
through the current row's last ORDER BY peer.
And that happens to be exactly what you need.
Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference reference to the 1st expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) sub
ORDER BY 1;
Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
Include minutes without activity
Shortest
#GabiMe asked in a comment how to get eone row for every minute in the time frame, including those where no event occured (no row in base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
Fastest
With CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM tbl
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(min(minute), max(minute), interval '1 min')
FROM cte
) m(minute)
LEFT JOIN cte USING (minute)
ORDER BY 1;
Again, aggregate and count rows per minute in the first step, it omits the need for later DISTINCT.
Different from count(), sum() can return NULL. Default to 0 with COALESCE.
With many rows and an index on "when" this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) c USING (minute)
ORDER BY 1;