Oracle: getting maximum value of a group? - sql

Given a table like this, what query will return the most recent calibration information for each monitor? In other words, I want to find the maximum date value for each of the monitors. Oracle-specific functionality is fine for my application.
monitor_id  calibration_date  value
----------  ----------------  -----
1           2011/10/22        15
1           2012/01/01        16
1           2012/01/20        17
2           2011/10/22        18
2           2012/01/02        19
The results for this example would look like this:
1           2012/01/20        17
2           2012/01/02        19

I'd tend to use analytic functions
SELECT monitor_id,
host_name,
calibration_date,
value
FROM (SELECT b.monitor_id,
b.host_name,
a.calibration_date,
a.value,
rank() over (partition by b.monitor_id order by a.calibration_date desc) rnk
FROM table_name a,
table_name2 b
WHERE a.some_key = b.some_key)
WHERE rnk = 1
You could also use correlated subqueries though that will be less efficient
SELECT monitor_id,
calibration_date,
value
FROM table_name a
WHERE a.calibration_date = (SELECT MAX(b.calibration_date)
FROM table_name b
WHERE a.monitor_id = b.monitor_id)
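To try both ideas on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for Oracle is an assumption (it needs SQLite 3.25 or later for window functions), and the table name calibrations is made up for the example. Dates are stored as fixed-width text, so string comparison orders them correctly.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calibrations (monitor_id INTEGER, calibration_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO calibrations VALUES (?, ?, ?)",
    [
        (1, "2011/10/22", 15),
        (1, "2012/01/01", 16),
        (1, "2012/01/20", 17),
        (2, "2011/10/22", 18),
        (2, "2012/01/02", 19),
    ],
)

# Analytic version: rank rows within each monitor, newest first, keep rank 1.
ranked = conn.execute(
    """
    SELECT monitor_id, calibration_date, value
    FROM (SELECT c.*,
                 RANK() OVER (PARTITION BY monitor_id
                              ORDER BY calibration_date DESC) AS rnk
          FROM calibrations c)
    WHERE rnk = 1
    ORDER BY monitor_id
    """
).fetchall()

# Correlated subquery version: keep rows whose date equals the monitor's max date.
correlated = conn.execute(
    """
    SELECT monitor_id, calibration_date, value
    FROM calibrations a
    WHERE a.calibration_date = (SELECT MAX(b.calibration_date)
                                FROM calibrations b
                                WHERE b.monitor_id = a.monitor_id)
    ORDER BY monitor_id
    """
).fetchall()

print(ranked)
print(correlated)
```

Both queries return the same two rows on this data. If two rows of one monitor shared the latest date, both versions would return both rows.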

My personal preference is this:
SELECT DISTINCT
monitor_id
,MAX(calibration_date)
OVER (PARTITION BY monitor_id)
AS latest_calibration_date
,FIRST_VALUE(value)
OVER (PARTITION BY monitor_id
ORDER BY calibration_date DESC)
AS latest_value
FROM mytable;
A variation would be to use the FIRST_VALUE syntax for latest_calibration_date as well. Either way works.
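A quick way to check that both variants agree is to run them against the question's sample data. This is only a sketch: it uses Python's sqlite3 module (SQLite 3.25+ assumed for window functions) rather than Oracle, and the table is named mytable as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Oracle (assumption)
conn.execute(
    "CREATE TABLE mytable (monitor_id INTEGER, calibration_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2011/10/22", 15),
        (1, "2012/01/01", 16),
        (1, "2012/01/20", 17),
        (2, "2011/10/22", 18),
        (2, "2012/01/02", 19),
    ],
)

# Variant 1: MAX() OVER for the date, FIRST_VALUE() OVER for the value.
with_max = conn.execute(
    """
    SELECT DISTINCT
           monitor_id,
           MAX(calibration_date) OVER (PARTITION BY monitor_id)
               AS latest_calibration_date,
           FIRST_VALUE(value) OVER (PARTITION BY monitor_id
                                    ORDER BY calibration_date DESC)
               AS latest_value
    FROM mytable
    ORDER BY monitor_id
    """
).fetchall()

# Variant 2: FIRST_VALUE() OVER for both columns, sharing one named window.
with_first_value = conn.execute(
    """
    SELECT DISTINCT
           monitor_id,
           FIRST_VALUE(calibration_date) OVER w AS latest_calibration_date,
           FIRST_VALUE(value) OVER w AS latest_value
    FROM mytable
    WINDOW w AS (PARTITION BY monitor_id ORDER BY calibration_date DESC)
    ORDER BY monitor_id
    """
).fetchall()

print(with_max)
```

The window functions compute the same pair of values on every row of a monitor, which is why DISTINCT is needed to collapse them to one row each.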

The window functions solution should be the most efficient and result in only one table or index scan. The one I am posting here, I think, wins some points for being intuitive and easy to understand. I tested it on SQL Server, where it performed second to window functions, resulting in two index scans.
SELECT T1.monitor_id, T1.calibration_date, T1.value
FROM someTable AS T1
WHERE NOT EXISTS
(
    SELECT *
    FROM someTable AS T2
    WHERE T2.monitor_id = T1.monitor_id AND T2.calibration_date > T1.calibration_date
)
GROUP BY T1.monitor_id, T1.calibration_date, T1.value
And just for the heck of it, here's another one along the same lines, but slower (63% of the cost vs. 37%) than the previous one (again in SQL Server). This one uses a Left Outer Join in the execution plan, whereas the first one uses an Anti-Semi Merge Join:
SELECT T1.monitor_id, T1.calibration_date, T1.value
FROM someTable AS T1
LEFT JOIN someTable AS T2 ON T2.monitor_id = T1.monitor_id AND T2.calibration_date > T1.calibration_date
WHERE T2.monitor_id IS NULL
GROUP BY T1.monitor_id, T1.calibration_date, T1.value
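With the anti-join approach it matters which column is compared. Here is a small sketch (Python's sqlite3 module as a stand-in engine, table name and data made up) where a monitor's newest calibration does not carry its largest value, so comparing on calibration_date and comparing on value pick different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE someTable (monitor_id INTEGER, calibration_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    [
        (1, "2011/10/22", 15),
        (1, "2012/01/01", 20),  # largest value for monitor 1
        (1, "2012/01/20", 12),  # newest row for monitor 1
        (2, "2011/10/22", 18),
        (2, "2012/01/02", 19),
    ],
)

# Hypothetical template: {col} is the column used to decide which row "wins".
NOT_EXISTS_SQL = """
    SELECT T1.monitor_id, T1.calibration_date, T1.value
    FROM someTable AS T1
    WHERE NOT EXISTS (SELECT 1
                      FROM someTable AS T2
                      WHERE T2.monitor_id = T1.monitor_id
                        AND T2.{col} > T1.{col})
    ORDER BY T1.monitor_id
"""

by_date = conn.execute(NOT_EXISTS_SQL.format(col="calibration_date")).fetchall()
by_value = conn.execute(NOT_EXISTS_SQL.format(col="value")).fetchall()

# The LEFT JOIN ... IS NULL form of the same anti-join, compared on the date.
left_join_by_date = conn.execute(
    """
    SELECT T1.monitor_id, T1.calibration_date, T1.value
    FROM someTable AS T1
    LEFT JOIN someTable AS T2
           ON T2.monitor_id = T1.monitor_id
          AND T2.calibration_date > T1.calibration_date
    WHERE T2.monitor_id IS NULL
    ORDER BY T1.monitor_id
    """
).fetchall()

print(by_date)   # newest row per monitor
print(by_value)  # row with the largest value per monitor
```

On the question's own sample data the two comparisons happen to agree, because value grows with the date there.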

select monitor_id, calibration_date, value
from table_name
where (monitor_id, calibration_date) in (
    select monitor_id, max(calibration_date)
    from table_name
    group by monitor_id
)
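One thing worth checking with an IN filter like this is whether it matches on the date alone or on the (monitor_id, date) pair. The sketch below (Python's sqlite3 module, SQLite 3.15+ assumed for row values, data made up) uses a case where one monitor's latest date is an older date for another monitor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (monitor_id INTEGER, calibration_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "2012/01/01", 16),  # not the latest for monitor 1 ...
        (1, "2012/01/20", 17),
        (2, "2011/10/22", 18),
        (2, "2012/01/01", 19),  # ... but this date is the latest for monitor 2
    ],
)

# Matching on the date alone: any row whose date is *some* monitor's maximum.
date_only = conn.execute(
    """
    SELECT monitor_id, calibration_date, value
    FROM table_name
    WHERE calibration_date IN (SELECT MAX(calibration_date)
                               FROM table_name
                               GROUP BY monitor_id)
    ORDER BY monitor_id, calibration_date
    """
).fetchall()

# Matching on the pair: the date must be the maximum for that same monitor.
pair = conn.execute(
    """
    SELECT monitor_id, calibration_date, value
    FROM table_name
    WHERE (monitor_id, calibration_date) IN (SELECT monitor_id, MAX(calibration_date)
                                             FROM table_name
                                             GROUP BY monitor_id)
    ORDER BY monitor_id
    """
).fetchall()

print(date_only)  # includes the extra (1, '2012/01/01', 16) row
print(pair)
```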

Related

Foreach/per-item iteration in SQL

I'm new to SQL and I think I must just be missing something, but I can't find any resources on how to do the following:
I have a table with three relevant columns: id, creation_date, latest_id. latest_id refers to the id of another entry (a newer revision).
For each entry, I would like to find the min creation date of all entries with latest_id = this.id. How do I perform this type of iteration in SQL / reference the value of the current row in an iteration?
select
t.id, min(t2.creation_date) as min_creation_date
from
mytable t
left join
mytable t2 on t2.latest_id = t.id
group by
t.id
You could solve this with a loop, but that's nowhere close to the best strategy. Instead, try this:
SELECT tf.id, tf.Creation_Date
FROM
(
SELECT t0.id, t1.Creation_Date,
row_number() over (partition by t0.id order by t1.creation_date) rn
FROM [MyTable] t0 -- table prime
INNER JOIN [MyTable] t1 ON t1.latest_id = t0.id -- table 1
) tf -- table final
WHERE tf.rn = 1
This connects the id to the latest_id by joining the table to itself. Then it uses a windowing function to help identify the smallest Creation_Date for each match.
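The two answers differ in one visible way: the LEFT JOIN keeps entries that no other entry points to (with a NULL date), while the INNER JOIN version drops them. A minimal sketch with Python's sqlite3 module (SQLite 3.25+ assumed; the table name revisions and the data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revisions (id INTEGER, creation_date TEXT, latest_id INTEGER)"
)
conn.executemany(
    "INSERT INTO revisions VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 3),     # older revision of entry 3
        (2, "2020-02-01", 3),     # older revision of entry 3
        (3, "2020-03-01", None),
        (4, "2020-04-01", None),  # nothing points at entry 4
    ],
)

# Self-join plus GROUP BY: one output row per entry, NULL when nothing matches.
left_join_rows = conn.execute(
    """
    SELECT t.id, MIN(t2.creation_date) AS min_creation_date
    FROM revisions t
    LEFT JOIN revisions t2 ON t2.latest_id = t.id
    GROUP BY t.id
    ORDER BY t.id
    """
).fetchall()

# Self-join plus ROW_NUMBER(): only entries that have at least one match.
row_number_rows = conn.execute(
    """
    SELECT tf.id, tf.creation_date
    FROM (SELECT t0.id, t1.creation_date,
                 ROW_NUMBER() OVER (PARTITION BY t0.id
                                    ORDER BY t1.creation_date) AS rn
          FROM revisions t0
          INNER JOIN revisions t1 ON t1.latest_id = t0.id) tf
    WHERE tf.rn = 1
    ORDER BY tf.id
    """
).fetchall()

print(left_join_rows)
print(row_number_rows)
```

In both queries the join condition plays the role of "this.id": each row of the outer table is matched against the rows that reference it, so no explicit iteration is needed.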

Adding an extra condition to a group-by query makes it slow

I have a query that selects a maximum value for a certain group, e.g.
select *
from mytable
where value in (select max(value) from mytable group by mygroupfield)
This is fast enough (~ 0.1 sec), but when I add an extra condition, e.g.
select *
from mytable
where value in (select max(value) from mytable group by mygroupfield)
AND mygroupfield='something'
the query becomes very slow (~ 4-5 seconds)
Both value and mygroupfield are indexed (non unique, non clustered, if it matters). This table contains ~ 40,000 records.
I know I can do this:
where value in (select max(value) from mytable where mygroupfield='something') (which is also fast), but due to our architecture, that is not an option right now.
Is there a way to speed up this query?
Try a correlated subquery like the example below.
SELECT a.*
FROM mytable AS a
WHERE value IN (
SELECT MAX(value)
FROM mytable AS b
WHERE b.mygroupfield=a.mygroupfield
GROUP by mygroupfield
)
AND mygroupfield='something';
This index may help as well:
CREATE INDEX idx ON dbo.mytable(mygroupfield) INCLUDE(value);
You can try using row_number(), though I'm not sure whether it will serve your purpose:
with cte as
(
select *,row_number() over(partition by mygroupfield order by value desc) as rn
from mytable
where mygroupfield='something'
)
select * from cte where rn=1
I would suggest a correlated subquery:
select t.*
from mytable t
where t.value = (select max(t2.value)
from mytable t2
where t2.mygroupfield = t.mygroupfield
) and
t.mygroupfield = 'something';
In particular, this can take advantage of an index on mytable(mygroupfield, value).
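Besides speed, the correlated form can also return different rows than the original IN query: the uncorrelated subquery produces the maxima of all groups, so a row in 'something' matches if its value equals another group's maximum. A small sketch with Python's sqlite3 module (not SQL Server; the data and the index name idx_group_value are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, mygroupfield TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "something", 5),  # not the max of its group, but equals the max of 'other'
        (2, "something", 9),
        (3, "other", 5),
        (4, "other", 3),
    ],
)

# Composite index suggested above: group column first, then the value.
conn.execute("CREATE INDEX idx_group_value ON mytable (mygroupfield, value)")

# Original shape: the subquery lists the maxima of every group.
uncorrelated = conn.execute(
    """
    SELECT id
    FROM mytable
    WHERE value IN (SELECT MAX(value) FROM mytable GROUP BY mygroupfield)
      AND mygroupfield = 'something'
    ORDER BY id
    """
).fetchall()

# Correlated shape: the subquery only looks at the row's own group.
correlated = conn.execute(
    """
    SELECT t.id
    FROM mytable t
    WHERE t.value = (SELECT MAX(t2.value)
                     FROM mytable t2
                     WHERE t2.mygroupfield = t.mygroupfield)
      AND t.mygroupfield = 'something'
    ORDER BY t.id
    """
).fetchall()

print(uncorrelated)  # rows 1 and 2
print(correlated)    # row 2 only
```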

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID  CName  Tot_Val  PName
-------------------------
1   1      100      P1
2   1      10       P2
3   2      50       P2
4   2      80       P1
Above is my table structure. I want to find the row with the maximum Tot_Val for each CName. For example, IDs 1 and 2 have the same CName but different Tot_Val and PName values, so I expect the row with the higher Tot_Val (ID 1) to be returned for that CName.
Expected result:
ID  CName  Tot_Val  PName
-------------------------
1   1      100      P1
4   2      80       P1
I need the result to look like the one above.
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is the query I have tried, but my problem is that I am not able to bring PName into the result. If I add PName to the select list, the rows multiply: for example, the result is 100 rows, but when I add PName to the select list and the GROUP BY list, it shows 600 rows. That is the problem.
Can someone please help me to resolve this.
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
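For a check that does not depend on an external link, the row-numbering idea can be run locally against the question's four rows. This sketch uses Python's sqlite3 module (SQLite 3.25+ assumed) instead of SQL Server, and names the row-number column rn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)

# Number the rows inside each CName from highest Tot_Val down, keep number 1.
top_rows = conn.execute(
    """
    WITH x AS (
        SELECT mt.ID, mt.CName, mt.Tot_Val, mt.PName,
               ROW_NUMBER() OVER (PARTITION BY mt.CName
                                  ORDER BY mt.Tot_Val DESC) AS rn
        FROM MyTable mt
    )
    SELECT ID, CName, Tot_Val, PName
    FROM x
    WHERE rn = 1
    ORDER BY CName
    """
).fetchall()

for row in top_rows:
    print(row)
```

PName comes along for free here because the filter works on whole rows rather than on a GROUP BY result.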
You can search top-n-per-group for this kind of a query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se, Retrieving n rows per group, or here, Get top 1 row of each group.
CROSS APPLY might be as fast as a correlated subquery, but the correlated subquery often has very good performance (and better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).
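Apart from performance, the approaches behave differently when two rows of a group tie on the maximum: ROW_NUMBER() keeps exactly one row per group, while the "= max" subquery keeps every tied row. A small sketch with Python's sqlite3 module (SQLite 3.25+ assumed; the tie in the data is made up, and ID is added to the ORDER BY only to make the pick deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, "P1"),
        (2, 1, 100, "P2"),  # ties with ID 1 on Tot_Val
        (3, 2, 50, "P2"),
        (4, 2, 80, "P1"),
    ],
)

# Index on (cname, tot_val), as the note above recommends.
conn.execute("CREATE INDEX idx_cname_totval ON t (CName, Tot_Val)")

# ROW_NUMBER(): exactly one row per CName, ties broken by ID.
one_per_group = conn.execute(
    """
    SELECT ID, CName, Tot_Val, PName
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY CName
                                    ORDER BY Tot_Val DESC, ID) AS rn
          FROM t)
    WHERE rn = 1
    ORDER BY CName
    """
).fetchall()

# Correlated subquery: every row whose Tot_Val equals its group's maximum.
all_ties = conn.execute(
    """
    SELECT t.ID, t.CName, t.Tot_Val, t.PName
    FROM t
    WHERE t.Tot_Val = (SELECT MAX(t2.Tot_Val) FROM t t2 WHERE t2.CName = t.CName)
    ORDER BY t.ID
    """
).fetchall()

print(one_per_group)
print(all_ties)
```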

Access SQL Query Top 1 percent group by winsorizing

I need to find the 99th and 1st percentiles of a variable at each date. So far I have managed this only for the overall period. I would like to "loop" the following query (which works, and is basic winsorizing) over each date, like a simple GROUP BY, but GROUP BY does not work with TOP PERCENT.
SELECT Date,ID,Value,
IIf(Value>[upper_threshold],[upper_threshold],IIf(Value<[lower_threshold],
[lower_threshold],Value)) AS winsor_Value
FROM MyTable,
(SELECT [lower_threshold], [upper_threshold] FROM (SELECT MAX(Value) AS
lower_threshold FROM (SELECT TOP 1 PERCENT Value FROM MyTable ORDER BY
Value)) AS t1, (SELECT MIN(Value) AS upper_threshold FROM (SELECT TOP 1
PERCENT Value FROM MyTable ORDER BY Value DESC)));
My data looks like
I have 700 000 rows.
Thanks a lot
I am not sure if the following works in MS Access, but it is worth a try. To get the value at the top 99%:
select t.date,
(select min(t2.value)
from (select top 1 percent t2.*
from t as t2
where t2.date = t.date
order by t2.value desc
) as t2
) as percentile_99
from (select distinct date
from t
) as t;
I do not know if MS Access scoping rules allow you to correlate a subquery more than one level deep. If so the above approach should work for all the percentiles.
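If the nested correlation turns out not to work in Access, the per-date logic itself is simple enough to check outside the database. Below is a rough sketch in plain Python of winsorizing per date. The function name and the tuple layout are made up, and it assumes that TOP n PERCENT keeps the row count rounded up, with at least one row; tie handling in Access may differ.

```python
from collections import defaultdict

def winsorize_by_date(rows, percent=1):
    """Clamp each value to the lower/upper thresholds of its own date.

    rows: iterable of (date, id, value) tuples.
    Returns (date, id, value, winsor_value) tuples in input order.
    """
    rows = list(rows)
    values_by_date = defaultdict(list)
    for date, _id, value in rows:
        values_by_date[date].append(value)

    thresholds = {}
    for date, values in values_by_date.items():
        values.sort()
        # Assumption: "TOP percent PERCENT" keeps ceil(n * percent / 100) rows.
        k = max(1, (len(values) * percent + 99) // 100)
        lower = values[k - 1]  # largest of the k smallest values
        upper = values[-k]     # smallest of the k largest values
        thresholds[date] = (lower, upper)

    result = []
    for date, _id, value in rows:
        lower, upper = thresholds[date]
        result.append((date, _id, value, min(max(value, lower), upper)))
    return result

# Two dates with 200 rows each, on very different scales.
sample = [("2020-01-01", i, i) for i in range(200)]
sample += [("2020-01-02", i, 1000 + i) for i in range(200)]
winsorized = winsorize_by_date(sample)

print(winsorized[0])    # smallest value on the first date, pulled up
print(winsorized[200])  # smallest value on the second date, pulled up
```

Each date gets its own pair of thresholds, which is the part the single overall query cannot express.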

Postgresql extract last row for each id

Suppose I have the following data:
id  date        another_info
1   2014-02-01  kjkj
1   2014-03-11  ajskj
1   2014-05-13  kgfd
2   2014-02-01  SADA
3   2014-02-01  sfdg
3   2014-06-12  fdsA
For each id, I want to extract the latest row:
id  date        another_info
1   2014-05-13  kgfd
2   2014-02-01  SADA
3   2014-06-12  fdsA
How could I manage that?
The most efficient way is to use Postgres' distinct on clause:
select distinct on (id) id, date, another_info
from the_table
order by id, date desc;
If you want a solution that works across databases (but is less efficient) you can use a window function:
select id, date, another_info
from (
select id, date, another_info,
row_number() over (partition by id order by date desc) as rn
from the_table
) t
where rn = 1
order by id;
The solution with a window function is in most cases faster than using a sub-query.
select *
from bar
where (id,date) in (select id,max(date) from bar group by id)
Tested in PostgreSQL and MySQL.
I found this as the fastest solution:
SELECT t1.*
FROM yourTable t1
LEFT JOIN yourTable t2 ON t2.tag_id = t1.tag_id AND t2.value_time > t1.value_time
WHERE t2.tag_id IS NULL
For most scenarios, the most efficient way is to use GROUP BY.
I saw the accepted answer, which claims that using distinct on (id) id is the most efficient way to solve the problem described in the question, but I believe that is quite inaccurate.
Sadly, I couldn't find any helpful insights in the Postgres documentation, but I did find this article, which references a few others and provides examples in which the GROUP BY approach definitely leads to better performance.
We had a discussion about this at work and ran a small experiment on a table that holds data about tags' blinks, with 4,114,692 rows and separate indexes on tag_id and on timestamp.
Here are the queries:
1. Using DISTINCT ON:
select distinct on (tag_id) tag_id, timestamp, some_data
from blinks
order by tag_id, timestamp desc;
2. Using CTE + GROUP BY + join:
with blink_last_timestamp as (
    select tag_id, max(timestamp) as max_timestamp
    from blinks
    group by tag_id
)
select bl.tag_id, bl.max_timestamp, b.some_data
from blink_last_timestamp bl
join blinks b on
    b.tag_id = bl.tag_id and
    b.timestamp = bl.max_timestamp;
The results were unambiguous and favored the second solution for this scenario (which is pretty generic in my opinion): it was about 10 times faster, 1655.991 ms (00:01.656) vs 16723.346 ms (00:16.723), and of course delivered the same data.
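Group by id and use an aggregate function to pick out the last record, then join back to fetch the other columns (adding another_info to the GROUP BY would return every row). For example:
select t.id, t.date, t.another_info
from the_table t
join (select id, max(date) as max_date
      from the_table
      group by id) m
  on m.id = t.id and m.max_date = t.date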
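To see how the GROUP BY columns affect the result on the question's data, here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; the queries themselves are plain SQL). It compares grouping by id and another_info together with grouping by id alone and joining back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, "2014-02-01", "kjkj"),
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (2, "2014-02-01", "SADA"),
        (3, "2014-02-01", "sfdg"),
        (3, "2014-06-12", "fdsA"),
    ],
)

# Grouping by (id, another_info): every distinct pair is its own group,
# so MAX(date) is taken over single rows and nothing is filtered out.
grouped_by_both = conn.execute(
    """
    SELECT id, MAX(date), another_info
    FROM the_table
    GROUP BY id, another_info
    """
).fetchall()

# Grouping by id alone, then joining back to pick up another_info.
joined_back = conn.execute(
    """
    SELECT t.id, t.date, t.another_info
    FROM the_table t
    JOIN (SELECT id, MAX(date) AS max_date
          FROM the_table
          GROUP BY id) m
      ON m.id = t.id AND m.max_date = t.date
    ORDER BY t.id
    """
).fetchall()

print(len(grouped_by_both))  # all six rows come back
print(joined_back)
```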