SQL: Select id from table grouping by max year and name

I have the following data:
ID Year Name
1 2016 A
2 2015 A
3 2014 A
4 2014 B
5 2015 B
6 2010 C
7 2007 D
8 2008 D
9 2006 D
I need to query just the ID of the max date for each name group
Result: [1, 5, 6, 8 ]
which is really:
ID Year Name
1 2016 A
5 2015 B
6 2010 C
8 2008 D
I have the following, but don't know where to go from here
SELECT MAX(year) from table GROUP BY name
Ideally there should be no duplicate year and name groups, but duplicate records are possible. Since they would be duplicates, it would not matter which one is kept.

If you want one row per name then I would recommend distinct on:
select distinct on (name) t.*
from t
order by name, year desc;
If you have duplicates, then one solution is rank():
select id, year, name
from (select t.*, rank() over (partition by name order by year desc) as seqnum
from t
) t
where seqnum = 1;

You could use the row_number() analytic function, partitioned by name and ordered by year and ID descending, to get the max ID for the max year. You haven't said what you'd like to see if ties exist, but this returns exactly one of them (the one with the highest ID).
SELECT *
FROM (SELECT ID
, year
, Name
, row_number() over (PARTITION BY Name ORDER BY Year Desc, ID Desc) RN
FROM tbl) Z
WHERE RN = 1
An alternative way to accomplish this is to use your query as an inline view and simply join it back to the base set.
SELECT *
FROM tbl A
INNER JOIN (SELECT max(year) mYear, name
FROM tbl
GROUP BY name) B
on A.year = B.myear
and A.Name = B.Name
Ties will be displayed. So if you have a name with two records having a max year of 2016, then both records would be returned.
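The difference in tie handling between the two queries above can be checked on the sample data. This is a minimal sketch that assumes an in-memory SQLite database (via Python's sqlite3, with SQLite 3.25 or later for window functions) as a stand-in for the real database; the extra row with ID 10 is hypothetical and only there to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, year INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, year, name) VALUES (?, ?, ?)",
    [(1, 2016, "A"), (2, 2015, "A"), (3, 2014, "A"), (4, 2014, "B"), (5, 2015, "B"),
     (6, 2010, "C"), (7, 2007, "D"), (8, 2008, "D"), (9, 2006, "D")],
)

# Join-back approach: returns every row that ties for the max year of its name.
join_sql = """
SELECT a.id
FROM tbl a
JOIN (SELECT name, MAX(year) AS myear FROM tbl GROUP BY name) b
  ON a.name = b.name AND a.year = b.myear
ORDER BY a.id
"""

# row_number approach: exactly one row per name; the highest id wins a tie.
rn_sql = """
SELECT id
FROM (SELECT id,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY year DESC, id DESC) AS rn
      FROM tbl)
WHERE rn = 1
ORDER BY id
"""

join_ids = [r[0] for r in conn.execute(join_sql)]
rn_ids = [r[0] for r in conn.execute(rn_sql)]

# Hypothetical duplicate (2016, 'A') to see how each query treats a tie.
conn.execute("INSERT INTO tbl (id, year, name) VALUES (10, 2016, 'A')")
join_ids_tie = [r[0] for r in conn.execute(join_sql)]
rn_ids_tie = [r[0] for r in conn.execute(rn_sql)]

print(join_ids, rn_ids)          # [1, 5, 6, 8] [1, 5, 6, 8]
print(join_ids_tie, rn_ids_tie)  # [1, 5, 6, 8, 10] [5, 6, 8, 10]
```

Without ties both queries agree; with the tie, the join returns both IDs 1 and 10 for name A, while row_number keeps only ID 10.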

Related

Get most recent measurement

I have a table that has some measurements, an ID and a date.
The table is built like so
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurement
ID M1 M2
1 1 15
2 1 1
Any ideas?
You can use a correlated subquery with aggregation:
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
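As a quick check, the correlated-subquery version can be run against the sample rows. This is only a sketch, assuming an in-memory SQLite database through Python's sqlite3 as a stand-in for the real one; the integer column types are a guess from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date INTEGER, m1 INTEGER, m2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

# Keep only rows carrying the latest date of their id, then collapse them.
# MAX() skips NULLs, so each measurement picks up its single non-NULL value.
latest = conn.execute("""
    SELECT id, MAX(m1), MAX(m2)
    FROM t
    WHERE t.date = (SELECT MAX(t1.date) FROM t t1 WHERE t1.id = t.id)
    GROUP BY id
    ORDER BY id
""").fetchall()

print(latest)  # [(1, 1, 15), (2, 1, 1)]
```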
Use RANK combined with an aggregation:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rn = 1
GROUP BY ID;
The rank lets us restrict to all records for each ID that share the most recent date (ROW_NUMBER would keep only one of the tied rows and lose the other measurement). Then we aggregate to find the max values for M1 and M2.
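Whether rows tied on the latest date all survive the rn = 1 filter depends on the ranking function. The sketch below checks this on the sample data, assuming an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later) as a stand-in; the helper name latest is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, date INTEGER, m1 INTEGER, m2 INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(1, 2020, 1, None), (1, 2020, None, 15), (1, 2018, 2, None),
     (2, 2019, 1, None), (2, 2019, None, 1)],
)

def latest(rank_fn):
    """Run the CTE query with the given ranking function (hypothetical helper)."""
    sql = f"""
    WITH cte AS (
        SELECT *, {rank_fn}() OVER (PARTITION BY id ORDER BY date DESC) AS rn
        FROM yourTable
    )
    SELECT id, MAX(m1), MAX(m2)
    FROM cte
    WHERE rn = 1
    GROUP BY id
    ORDER BY id
    """
    return conn.execute(sql).fetchall()

# RANK gives every row on the latest date rn = 1, so both measurements survive.
with_rank = latest("RANK")
# ROW_NUMBER gives rn = 1 to only one of the tied rows, so one measurement is lost.
with_row_number = latest("ROW_NUMBER")

print(with_rank)  # [(1, 1, 15), (2, 1, 1)]
```

With ROW_NUMBER, which of the two tied rows is kept is not defined, so each result row ends up with one NULL measurement.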
In standard SQL, you can use lag(ignore nulls):
select id, coalesce(m1, prev_m1), coalesce(m2, prev_m2)
from (select t.*,
lag(m1 ignore nulls) over (partition by id order by date) as prev_m1,
lag(m2 ignore nulls) over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;

How to group by one column, aggregate by another column and get another column as result in postgresql?

This seems something simple, but couldn't find an answer for this question last few hours.
I have a table request_state, where "id" is primary key, it can have multiple entries with same state_id. I want to get the id after grouping by state_id using max datetime.
So I tried this, but it gives error "state_id" must appear in the GROUP BY clause or be used in an aggregate function
select id, state_id, max(datetime)
from request_state
group by id
but when I use following query, I get multiple entries with same state_id.
select id, state_id, max(datetime)
from request_state
group by id, state_id
My table:
id state_id date_time
cef 1 Jan 1
ter 1 Jan 2
ijk 1 Jan 3
uuu 2 Feb 1
rrr 2 Feb 2
This is what I want as my result,
id state_id date_time
__ ________ _________
ijk 1 Jan 3
rrr 2 Feb 2
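You seem to want the following, though note that max(id) and max(datetime) are computed independently and may come from different rows: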
select max(id) as id, state_id, max(datetime)
from request_state
group by state_id;
If you want the row where datetime is maximum for each state, then use distinct on:
select distinct on (state_id) rs.*
from request_state rs
order by state_id, datetime desc;
Try this query:
select id, state_id, date_time from (
select id, state_id, date_time,
row_number() over (partition by state_id order by date_time desc) rn
from tbl
) a where rn = 1
You can use a correlated subquery:
select t.*
from request_state t
where date_time = (select max(date_time) from request_state t1 where t1.state_id = t.state_id);
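To see how picking the latest row per state_id differs from taking independent maxima, here is a small sketch on the sample rows. It assumes an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later) and stores date_time as ISO text; the year 2024 is made up, since the question only gives "Jan 1" and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request_state (id TEXT PRIMARY KEY, state_id INTEGER, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO request_state VALUES (?, ?, ?)",
    [("cef", 1, "2024-01-01"), ("ter", 1, "2024-01-02"), ("ijk", 1, "2024-01-03"),
     ("uuu", 2, "2024-02-01"), ("rrr", 2, "2024-02-02")],
)

# One whole row per state_id: the one with the latest date_time.
latest_rows = conn.execute("""
    SELECT id, state_id, date_time
    FROM (SELECT id, state_id, date_time,
                 ROW_NUMBER() OVER (PARTITION BY state_id ORDER BY date_time DESC) AS rn
          FROM request_state)
    WHERE rn = 1
    ORDER BY state_id
""").fetchall()

# Independent maxima: MAX(id) is the alphabetically largest id in the group,
# which need not be the id of the row holding MAX(date_time).
independent_max = conn.execute("""
    SELECT MAX(id), state_id, MAX(date_time)
    FROM request_state
    GROUP BY state_id
    ORDER BY state_id
""").fetchall()

print(latest_rows)      # [('ijk', 1, '2024-01-03'), ('rrr', 2, '2024-02-02')]
print(independent_max)  # [('ter', 1, '2024-01-03'), ('uuu', 2, '2024-02-02')]
```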

using group by in subquery in sql

how to get around this error :
Unable to use an aggregate or a subquery in an expression used in the
GROUP BY list of a GROUP BY clause.
here is my query :
select Id, name,dayA,monthA,yearA,
sum(x) as x,
(select SUM(x) group by month) as total,
from table_A
group by Id,name,monthA,dAyA,yearA, SUM(x)
in other words :
sample data :
id name dayA monthA yearA x
===========================
1 name1 2 3 2016 4
2 name2 2 3 2016 3
3 name1 2 3 2016 2
Expected result :
id name dayA monthA yearA x total
===================================
1 name1 2 3 2016 4 6
2 name2 2 3 2016 3 3
3 name1 2 3 2016 2 6
Thanks in advance
Your query has more than one problem.
(select SUM(x) group by month) as total: is this from the same table? Probably not, since the column month is not mentioned in your GROUP BY. When you use a subquery in a select list, you must guarantee that it returns only one record.
Based on your sample data and expected results...
create table table_A(
id int,
name varchar(25),
dayA int,
monthA int,
yearA int,
x int
);
insert into table_A
values (1,'name1',2,3,2016,4),
(2,'name2',2,3,2016,3),
(3,'name1',2,3,2016,2);
select ta.id, ta.name, ta.dayA, ta.monthA, ta.yearA, ta.x, total.Total from table_A as ta
left join
(select name, sum(x) as Total from table_A group by name) total on ta.name = total.name
group by
ta.id, ta.name, ta.dayA, ta.monthA, ta.yearA, ta.x, total.name, total.Total
Maybe this is what you want:
select table_A.*, TotalSums.total
from table_A
left join (select name, monthA, dayA, yearA, sum(x) as total from table_A group by name, monthA, dayA, yearA) as TotalSums
on table_A.name = TotalSums.name
and table_A.monthA = TotalSums.monthA
and table_A.dayA = TotalSums.dayA
and table_A.yearA = TotalSums.yearA
order by id
I think this is what you're looking for:
select main.Id, main.name, main.dayA, main.monthA, main.yearA,
sum(main.x) as x,
max(total.total) as total
from table_A as main
join (select SUM(x) total, name, monthA, yearA from table_A group by name, monthA, yearA) as total
on main.name = total.name
and main.monthA = total.monthA
and main.yearA = total.yearA
group by main.Id, main.name, main.monthA, main.dayA, main.yearA
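A different technique from the joins above, not taken from any of the answers, is a window aggregate: SUM(x) OVER (PARTITION BY ...) attaches the group total to every row without a GROUP BY. The sketch below assumes an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later) and partitions by name, monthA and yearA, which is one reading of the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_A (id INTEGER, name TEXT, dayA INTEGER, "
    "monthA INTEGER, yearA INTEGER, x INTEGER)"
)
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "name1", 2, 3, 2016, 4), (2, "name2", 2, 3, 2016, 3), (3, "name1", 2, 3, 2016, 2)],
)

# Each row keeps its own x; the window sum adds the total for its partition.
rows = conn.execute("""
    SELECT id, name, dayA, monthA, yearA, x,
           SUM(x) OVER (PARTITION BY name, monthA, yearA) AS total
    FROM table_A
    ORDER BY id
""").fetchall()

totals = [(r[0], r[-1]) for r in rows]
print(totals)  # [(1, 6), (2, 3), (3, 6)]
```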

How to filter out the first and last entry from a table using RANK?

I've this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value (when the table is sorted on the Date column), and only keep the other values. In case there are only two entries, nothing is returned. (Example for Id = 'c')
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/) but I can't get it to work.
[UPDATE]
All 3 answers seem fine. But I'm not a SQL expert, so my question is which one has the fastest performance if the table has around 800000 rows and there are no indexes on any column.
You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
Here's another approach using min and max for this without needing to use a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
Here is a similar solution with row_number and count :
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )
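The row_number/count idea can be tried on the sample data and compared with an EXISTS formulation. This is a sketch only, assuming an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later) and a table named data with a dat column; it says nothing about performance on 800000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, dat INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
     ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
     ("c", 2011, 4), ("c", 2012, 33)],
)

# Drop the rows numbered 1 (first) and cnt (last) within each id.
rn_sql = """
SELECT id, dat, value
FROM (SELECT id, dat, value,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY dat) AS rnk,
             COUNT(*) OVER (PARTITION BY id) AS cnt
      FROM data)
WHERE rnk NOT IN (1, cnt)
ORDER BY id, dat
"""

# Keep a row only if the same id has both an earlier and a later row.
exists_sql = """
SELECT id, dat, value
FROM data a
WHERE EXISTS (SELECT 1 FROM data b WHERE b.id = a.id AND b.dat < a.dat)
  AND EXISTS (SELECT 1 FROM data b WHERE b.id = a.id AND b.dat > a.dat)
ORDER BY id, dat
"""

by_row_number = conn.execute(rn_sql).fetchall()
by_exists = conn.execute(exists_sql).fetchall()

print(by_row_number)  # [('a', 2001, 3), ('a', 2012, 2), ('b', 2014, 110)]
```

Both forms agree here because dates are unique within each id; with repeated dates they could differ, since row_number breaks ties arbitrarily.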
You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)

SQL Max over multiple versions

I have a table with three columns
Product Version Price
1 1 25
1 2 15
1 3 25
2 1 8
2 2 8
2 3 4
3 1 25
3 2 10
3 3 5
I want to get the max price and the max version by product.
So in the above example the results would have product 1, version 3, price 25; product 2, version 2, price 8.
Can you let me know how I would do this.
I'm on Teradata
If Teradata supports the ROW_NUMBER analytic function:
SELECT
Product,
Version,
Price
FROM (
SELECT
atable.*, /* or specify column names explicitly as necessary */
ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY Price DESC, Version DESC) AS rn
FROM atable
) s
WHERE rn = 1
;
Using Teradata SQL this can be further simplified:
SELECT * FROM atable
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY Product
ORDER BY Price DESC, Version DESC) = 1;
QUALIFY is a Teradata extension to standard SQL. It is similar to HAVING for GROUP BY, but it filters on the result of a window function.
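The ordering logic can be tried outside Teradata. SQLite has no QUALIFY, so this sketch uses the derived-table form with the same ROW_NUMBER ordering, assuming an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later) as a stand-in. It also runs two independent MAX() aggregates for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (product INTEGER, version INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(1, 1, 25), (1, 2, 15), (1, 3, 25),
     (2, 1, 8), (2, 2, 8), (2, 3, 4),
     (3, 1, 25), (3, 2, 10), (3, 3, 5)],
)

# Highest price per product; among rows at that price, the highest version.
best_rows = conn.execute("""
    SELECT product, version, price
    FROM (SELECT product, version, price,
                 ROW_NUMBER() OVER (PARTITION BY product
                                    ORDER BY price DESC, version DESC) AS rn
          FROM atable)
    WHERE rn = 1
    ORDER BY product
""").fetchall()

# Independent maxima: version and price may come from different rows.
independent_max = conn.execute("""
    SELECT product, MAX(version), MAX(price)
    FROM atable
    GROUP BY product
    ORDER BY product
""").fetchall()

print(best_rows)        # [(1, 3, 25), (2, 2, 8), (3, 1, 25)]
print(independent_max)  # [(1, 3, 25), (2, 3, 8), (3, 3, 25)]
```

For product 2 the independent maxima report version 3 at price 8, a combination that does not exist in the table, whereas the ROW_NUMBER form returns version 2.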
SELECT product
, max(version) as version
, max(price) as price
FROM mytable
GROUP BY product
The following code selects product, the highest value of version, and the highest value of price, grouping the rows by product with GROUP BY. The two maxima are computed independently, so they may come from different rows.
SELECT [product], MAX([version]) as [MaxVersion], MAX([price]) as [MaxPrice]
FROM [NameOfTable]
GROUP BY [product]
try this one
select p.Product, MAX(p.Price), (select MAX(Version) from Products where Product = p.Product and Price = MAX(p.price))
from Products as p
group by p.Product
it returns
(Product, price,version)
1 25 3 ,
2 8 2 ,
3 25 1