PostgreSQL select daily max and corresponding hour of occurrence - sql

I have the following table structure, with daily-hourly data:
time_of_ocurrence(timestamp); particles(numeric)
"2012-11-01 00:30:00";191.3
"2012-11-01 01:30:00";46
...
"2013-01-01 02:30:00";319.6
How do I select the DAILY max and THE HOUR in which this max occurs?
I've tried
SELECT date_trunc('hour', time_of_ocurrence) as hora,
MAX(particles)
from my_table WHERE time_of_ocurrence > '2013-09-01'
GROUP BY hora ORDER BY hora
But it doesn't work:
"2013-09-01 00:00:00";34.35
"2013-09-01 01:00:00";33.13
"2013-09-01 02:00:00";33.09
"2013-09-01 03:00:00";28.08
My result should be in this format instead (one max per day, showing the hour):
"2013-09-01 05:00:00";100.35
"2013-09-02 03:30:00";80.13
How can I do that? Thanks!

This type of question comes up frequently on Stack Overflow. Such questions are categorized with the greatest-n-per-group tag, if you want to see other solutions.
edit: I changed the following code to group by day instead of by hour.
Here's one solution:
SELECT t.*
FROM (
SELECT date_trunc('day', time_of_ocurrence) as hora, MAX(particles) AS particles
FROM my_table
GROUP BY hora
) AS _max
INNER JOIN my_table AS t
ON _max.hora = date_trunc('day', t.time_of_ocurrence)
AND _max.particles = t.particles
WHERE time_of_ocurrence > '2013-09-01'
ORDER BY time_of_ocurrence;
This might also show more than one result per day, if more than one row has the max value.
Another solution using window functions that does not show such duplicates:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY date_trunc('day', time_of_ocurrence)
ORDER BY particles DESC) AS _rn
FROM my_table
) AS _max
WHERE _rn = 1
ORDER BY time_of_ocurrence;
If multiple rows have the same max, one row will nevertheless be numbered row 1. If you need specific control over which row is numbered 1, extend the ORDER BY inside the OVER clause with a unique column to break such ties.
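The tie-breaking point can be tried out with a small sketch. It runs the ROW_NUMBER query through Python's built-in sqlite3 module rather than PostgreSQL, so date(...) stands in for date_trunc('day', ...), and the sample rows are invented for illustration. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time_of_ocurrence TEXT, particles REAL)")
# Invented sample data; the two 100.35 rows tie for the maximum on 2013-09-01.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2013-09-01 03:30:00", 28.08),
        ("2013-09-01 05:30:00", 100.35),
        ("2013-09-01 09:30:00", 100.35),
        ("2013-09-02 03:30:00", 80.13),
        ("2013-09-02 04:30:00", 12.5),
    ],
)

# date(...) is SQLite's stand-in for PostgreSQL's date_trunc('day', ...).
# The second ORDER BY key makes the earliest timestamp win a tie.
query = """
SELECT time_of_ocurrence, particles
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY date(time_of_ocurrence)
               ORDER BY particles DESC, time_of_ocurrence ASC
           ) AS _rn
    FROM my_table
) AS _max
WHERE _rn = 1
ORDER BY time_of_ocurrence
"""
daily_max = conn.execute(query).fetchall()
print(daily_max)
```

With the extra sort key, the 05:30 row is the one kept for 2013-09-01. Without it, either of the tied rows could be returned.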

Use window functions:
select distinct
date_trunc('day',time_of_ocurrence) as day,
max(particles) over (partition by date_trunc('day',time_of_ocurrence)) as particles_max_of_day,
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc)
from my_table
order by 1
One edge case here is when the same max number of particles shows up on the same day but in different hours. This version would pick one of them arbitrarily. If you prefer one over the other (always the earlier one, for example) you can add that to the order by clause:
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc, time_of_ocurrence)
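As a rough check of the first_value variant with the extra sort key, here is a sketch that runs an SQLite translation of the query from Python. The translation is an assumption, not the PostgreSQL original: date(...) replaces date_trunc('day', ...), strftime('%Y-%m-%d %H:00:00', ...) replaces date_trunc('hour', ...), the column alias hour_of_max is made up, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time_of_ocurrence TEXT, particles REAL)")
# Invented sample data with a tie for the daily maximum on 2013-09-01.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2013-09-01 09:30:00", 100.35),
        ("2013-09-01 05:30:00", 100.35),
        ("2013-09-01 03:30:00", 28.08),
        ("2013-09-02 03:30:00", 80.13),
    ],
)

# Every row of a day carries the same three values, so DISTINCT
# collapses each day to a single output row.
query = """
SELECT DISTINCT
    date(time_of_ocurrence) AS day,
    MAX(particles) OVER (PARTITION BY date(time_of_ocurrence)) AS particles_max_of_day,
    FIRST_VALUE(strftime('%Y-%m-%d %H:00:00', time_of_ocurrence)) OVER (
        PARTITION BY date(time_of_ocurrence)
        ORDER BY particles DESC, time_of_ocurrence
    ) AS hour_of_max
FROM my_table
ORDER BY 1
"""
per_day = conn.execute(query).fetchall()
print(per_day)
```

Because time_of_ocurrence is the second sort key, the earlier of the two tied hours (05:00) is reported for 2013-09-01.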

Related

BigQuery - Extract last entry of each group

I have one table where multiple records are inserted for each product group. Now I want to extract (SELECT) only the last entries. For more, see the screenshot. The yellow highlighted records should be returned by the select query.
The HAVING MAX and HAVING MIN clause for the ANY_VALUE function is now in preview
HAVING MAX and HAVING MIN were just introduced for some aggregate functions - https://cloud.google.com/bigquery/docs/release-notes#February_06_2023
With them the query can be very simple. Consider the approach below:
select any_value(t having max datetime).*
from your_table t
group by t.id, t.product
if applied to sample data in your question - output is
You might consider below as well
SELECT *
FROM sample_table
QUALIFY DateTime = MAX(DateTime) OVER (PARTITION BY ID, Product);
If you're more familiar with aggregate functions than window functions, the query below might be another option.
SELECT ARRAY_AGG(t ORDER BY DateTime DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM sample_table t
GROUP BY t.ID, t.Product
Query results
You can use a window function to partition by the group key and pick the required row according to an ORDER BY field.
For example:
select * from (
select *,
rank() over (partition by product order by DateTime desc) as rank
from `project.dataset.table`)
where rank = 1
You can use this query to select last record of each group:
Select Top(1) * from Tablename group by ID order by DateTime Desc

Oracle SQL Return First & Last Value From Different Columns By Partition

I need help with a query that will return a single record per partition in the below dataset. I used the DENSE_RANK to get the order and first/last position within each partition, but the problem is that I need to get a single record for each EMPLOYEE ITEM_ID combination which contains:
MIN(START) which is date type with time
SUM(DURATION) which is a number type signifying seconds of activity
MIN ranked value from INIT_STATUS
MAX ranked value from FIN_STATUS
Here is the initial data table, the same data table ordered with rank, and the desired result at the end (see image below):
Also, here is the code used to get the ordered table with rank values:
SELECT T.*,
DENSE_RANK() OVER (PARTITION BY T.EMPLOYEE, T.ITEM_ID ORDER BY T.START) AS D_RANK
FROM TEST_DATA T
ORDER BY T.EMPLOYEE, T.ITEM_ID, T.START;
Use first/last option to find statuses. The rest is classic aggregation:
select employee, item_id, min(start_), sum(duration),
max(init_status) keep (dense_rank first order by start_),
max(fin_status) keep (dense_rank last order by start_)
from test_data t
group by employee, item_id
order by employee, item_id;
start is a reserved word, so I used start_ for my test.
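To make the first/last logic concrete, here is a plain-Python sketch of what the aggregation computes per (employee, item_id) group: the earliest start, the total duration, the init_status of the earliest row, and the fin_status of the latest row. The rows and status values are invented, and the sketch assumes start values are unique within a group (with ties, Oracle's max(...) keep (...) would take the largest status among the tied rows).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (employee, item_id, start, duration_seconds, init_status, fin_status)
rows = [
    ("ann", 1, datetime(2020, 1, 1, 9, 5), 45, "OPEN", "HOLD"),
    ("ann", 1, datetime(2020, 1, 1, 9, 0), 30, "NEW", "OPEN"),
    ("ann", 1, datetime(2020, 1, 1, 9, 9), 10, "HOLD", "DONE"),
    ("bob", 2, datetime(2020, 1, 2, 8, 0), 20, "NEW", "DONE"),
]

# GROUP BY employee, item_id
groups = defaultdict(list)
for row in rows:
    groups[(row[0], row[1])].append(row)

summary = {}
for key, members in groups.items():
    members.sort(key=lambda r: r[2])  # ORDER BY start within the group
    first, last = members[0], members[-1]
    summary[key] = (
        first[2],                      # MIN(start)
        sum(r[3] for r in members),    # SUM(duration)
        first[4],                      # init_status of the first row by start
        last[5],                       # fin_status of the last row by start
    )

for key in sorted(summary):
    print(key, summary[key])
```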

How to select the first observation in a category in PostgreSQL

My table contains different house IDs (dataid), time of observation (readtime), and meter reading.
And the query is as follows:
select *
from university.gas_ert
where readtime between '01/01/2014' and '01/02/2014'
I am trying to get only the first observation of each day for all the dataids within the time span. I have tried GROUP BY, but it doesn't seem to work.
DISTINCT ON could make your query much simpler. Read more in the documentation.
Definition :
Keeps only the first row of each set of rows where the given
expressions evaluate to equal. Note that the “first row” of each set
is unpredictable unless ORDER BY is used to ensure that the desired
row appears first.
SELECT
DISTINCT ON (meter_value) meter_value,
dataid,
readtime
FROM
university.gas_ert
WHERE
readtime between '2014-01-01' and '2014-01-02'
ORDER BY
meter_value,
readtime ASC;
If you want one row for each unique dataid within the time range, you should use the DISTINCT ON construction. The following query will give you a row for each dataid for each day in the range described in the WHERE clause and lets you extend the range if you want to return rows for each day x dataid combination.
select distinct on(dataid, date_trunc('day', readtime)) *
from university.gas_ert
where readtime between '2014-01-01' and '2014-01-02'
order by dataid, date_trunc('day', readtime) asc, readtime asc
You can take a look at window functions to help with this, in particular ROW_NUMBER.
Partition the records by dataid and by day using date_trunc (i.e. without the time component), then number them by readtime ascending:
select *
from (
select *
,row_number() over(partition by a.dataid, date_trunc('day',a.readtime) order by a.readtime asc ) as rnk
from university.gas_ert a
)x
where x.rnk=1
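A small sketch of the "first observation per dataid per day" idea, run through Python's sqlite3 module instead of PostgreSQL. The assumptions here: date(...) replaces date_trunc('day', ...), the table is created without the university schema, the window is partitioned by both dataid and day, and the readings are invented. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gas_ert (dataid INTEGER, readtime TEXT, meter_value REAL)")
# Invented readings for two houses; insertion order is deliberately not chronological.
conn.executemany(
    "INSERT INTO gas_ert VALUES (?, ?, ?)",
    [
        (35, "2014-01-01 06:00:00", 12.0),
        (35, "2014-01-01 00:15:00", 10.0),
        (35, "2014-01-02 01:00:00", 15.0),
        (77, "2014-01-01 03:00:00", 50.0),
        (77, "2014-01-01 02:00:00", 49.0),
    ],
)

# Number the rows of each (dataid, day) group by readtime and keep number 1.
query = """
SELECT dataid, readtime, meter_value
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY dataid, date(readtime)
               ORDER BY readtime ASC
           ) AS rnk
    FROM gas_ert
) AS x
WHERE rnk = 1
ORDER BY dataid, readtime
"""
first_reads = conn.execute(query).fetchall()
print(first_reads)
```

Partitioning by the day alone would instead return a single row per day across all houses.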

PostgreSQL backward intersection & join

I have a survey form with certain questions for a certain facility.
The facility can be monitored (data entry) more than once in a month.
Now I need the latest data (values) for each question,
but if the latest record has no data for a question, I want to fall back to prior records (previous dates) of the same month.
I can get the latest record, but I don't know how to get the previous record of the same month if there is no latest data.
I am using PostgreSQL 10.
Table Structure is
Desired output is
You can try to use ROW_NUMBER window function to make it.
SELECT to_char(date, 'MON') month,
facility,
idquestion,
value
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY facility,idquestion ORDER BY DATE DESC) rn
FROM T
) t1
where rn = 1
demo:db<>fiddle
SELECT DISTINCT
to_char(qdate, 'MON'),
facility,
idquestion,
first_value(value) OVER (PARTITION BY facility, idquestion ORDER BY qdate DESC) as value
FROM questions
ORDER BY facility, idquestion
Using window functions:
first_value(value) OVER ... gives you the first value of a window frame. The frame is a group of facility and idquestion. Within this group the rows are ordered by date DESC. So the very last value is first no matter which date it is
DISTINCT filtered the tied values (e.g. there are two values for facility == 1 and idquestion == 7)
Please notice:
"date" is a reserved word in Postgres. I strongly recommend renaming your column to avoid trouble. Furthermore, lower case is the convention in Postgres and is recommended.

SQL Server LAG() function to calculate differences between rows

I'm new to SQL Server and I've got some doubts about the lag() function.
I have to calculate the average distance (in days) between two user's activities. Then, I have to GROUP BY all the users, calculate all the date differences between rows for each user, and finally select the average of the group.
Just to be clear, I've got this kind of table:
First I have to filter days with activities (activities!=0). Then I have to create this:
And finally, the expected outcome is this one:
I thought this could be a "kind of" code:
select userid, avg(diff)
(SELECT *,DATEDIFF(day, Lag(dateid, 1) OVER(ORDER BY [Userid]),
dateid) as diff
FROM table1
where activities!=0
group by userid) t
group by userid
Of course it doesn't work. I think I also have to do a while loop since rownumber changes for each users.
I hope you can help meeee! thank you very much
You are almost there. Just add partition by userid so the difference is calculated for each userid and order by dateid.
select userid, avg(diff * 1.0) -- * 1.0 avoids integer averaging
from (SELECT t.*
,DATEDIFF(day, Lag(dateid, 1) OVER(PARTITION BY [Userid] ORDER BY [dateid]),dateid) as diff
FROM table1 t
where activities!=0
) t
group by userid
You don't need lag() at all. The average is the maximum minus the minimum divided by one less than the count:
SELECT userid,
DATEDIFF(day, MIN(dateid), MAX(dateid)) * 1.0 / NULLIF(COUNT(*) - 1, 0) as avg_diff
FROM table1
WHERE activities <> 0
GROUP BY userid;
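The reason the shortcut works is that consecutive gaps telescope: their sum is simply the last date minus the first, and n dates have n - 1 gaps. A minimal Python sketch with invented activity dates for a single user shows the two computations agreeing:

```python
from datetime import date

# Invented activity dates for one user (rows where activities != 0).
activity_days = [date(2020, 1, 4), date(2020, 1, 1), date(2020, 1, 11), date(2020, 1, 5)]

days = sorted(activity_days)

# LAG-style approach: difference between each date and the previous one.
gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
avg_from_gaps = sum(gaps) / len(gaps)

# Shortcut: (max - min) divided by one less than the count.
avg_from_span = (days[-1] - days[0]).days / (len(days) - 1)

print(gaps, avg_from_gaps, avg_from_span)
```

A user with a single activity row has no gaps at all, so the divisor is zero. In SQL that case is usually turned into NULL with NULLIF rather than allowed to raise a division error.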