Consider a PostgreSQL table with fields a-z:
a, b, c ... z
-------------
5, 6, 2 ... 9
5, 6, 3 ... 1
I'd like to group on fields a, b and keep only the records where b is the maximum.
SELECT a, max(b) as b, c, d, e ... z
FROM table
GROUP BY a, b
This works fine, but it's annoying to have to type out all the columns in the SELECT. I'd much rather do something like:
SELECT max(b) as b, *
FROM TABLE
But doing so gives the error:
[42803] ERROR: column "table.id" must appear in the GROUP BY clause or
be used in an aggregate function.
Any idea how to avoid having to type all the column names of a lengthy table when doing a GROUP BY operation?
You can use rank():
select t.*
from (select t.*, rank() over (partition by a order by b desc) as seqnum
from t
) t
where seqnum = 1;
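The rank() approach can be tried without a PostgreSQL server. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in engine (window functions need SQLite 3.25 or newer), a hypothetical three-column table t, and made-up rows. The outer SELECT * shows that no column list has to be typed; the helper column seqnum simply comes along.

```python
import sqlite3

# Hypothetical stand-in for the wide a..z table: only columns a, b, c.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(5, 6, 2), (5, 6, 3), (5, 4, 7), (8, 1, 9), (8, 3, 0)],
)

# rank() restarts at 1 for each value of a, ordered by b descending.
# Tied maxima share rank 1, so all of them survive the filter.
query = """
SELECT *
FROM (SELECT t.*, rank() OVER (PARTITION BY a ORDER BY b DESC) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY a, c
"""
rows = conn.execute(query).fetchall()
print(rows)  # each row is (a, b, c, seqnum)
```

Because rank() keeps ties, group a = 5 returns two rows here; row_number() would keep exactly one.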
Actually, in Postgres, the fastest method is usually distinct on:
select distinct on (a) t.*
from t
order by a, b desc;
With an index on (a, b desc) this should be the fastest method.
Gordon Linoff's answer put me on the right track, namely using DISTINCT ON. This works in Postgres:
SELECT DISTINCT ON (a, b) *
FROM table
ORDER BY a, b DESC
Basically, it keeps only the first row of each distinct (a, b) combination, where "first" is defined by the ORDER BY, so the sort order decides whether you get the first or the last value. Actually surprised this works...
Related
Assuming that I have data like the ones in columns A and B, how can I rank them like column C? I have tried multiple varieties of RANK and NTILE but have been unsuccessful. Thank you.
Note: There are not always 3 rows for each group, it varies.
SQL tables are inherently unordered. There is no way to distinguish between the 1st and 4th rows with the data as you've presented it. You can generate an equivalent result set, but the ordering may differ.
Simple arithmetic may do the trick:
select a,
(row_number() over (order by a) + 2) / 3 as c
from t
order by a, b, c;
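To see what the arithmetic does, here is a sketch using Python's sqlite3 module with a hypothetical single-column table t (SQLite 3.25+ assumed for window functions). Integer division turns row numbers 1-3 into 1, 4-6 into 2, and so on, which is why this trick only fits data where every group has exactly three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in range(1, 8)])

# row_number() gives 1, 2, 3, ...; (n + 2) / 3 with integer division
# maps 1..3 -> 1, 4..6 -> 2, 7..9 -> 3.
query = """
SELECT a, (row_number() OVER (ORDER BY a) + 2) / 3 AS c
FROM t
ORDER BY a
"""
rows = conn.execute(query).fetchall()
print(rows)
```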
A better method uses the b column:
select a,
row_number() over (partition by b order by a) as c
from t
order by a, c;
You can use ntile as below:
Select *, ntile(2) over(order by (Select NULL)) from #data
Instead of (Select NULL), you can provide any other valid ordering column based on your data.
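#data is a SQL Server temporary table; the sketch below swaps in Python's sqlite3 module and a hypothetical table named data with an id column used for ordering (SQLite 3.25+ assumed). It shows how ntile(2) splits five rows: the buckets are as even as possible, with the larger bucket first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; id gives ntile() a deterministic order.
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO data (val) VALUES (?)",
    [("v1",), ("v2",), ("v3",), ("v4",), ("v5",)],
)

query = """
SELECT id, ntile(2) OVER (ORDER BY id) AS bucket
FROM data
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

ntile() always produces a fixed number of buckets, so on its own it does not follow groups of varying size.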
I have a recurring problem.
Let's assume that I have the following columns:
T:A(PK), B, C, D, E
This works:
select A, MAX(B) from T group BY A
But I can't do:
select A, C, MAX(B) from T group BY A
I don't understand why. When it comes to AVG or SUM I get it, but MAX or MIN takes its value from exactly one row.
How to deal with it?
You can use ROW_NUMBER() for that like this:
select A, C, B
from (
select *
, row_number() over (partition by A order by B desc) seq
-- partition by A acts like group by A; order by B desc puts the max(B) row first
from yourTable ) t
where seq = 1;
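A runnable check of the ROW_NUMBER() approach, using Python's sqlite3 module as a stand-in engine (SQLite 3.25+ assumed). The rows are hypothetical, and A is deliberately not a primary key here so that each group has several rows. C comes from the same row as the maximum B, which a plain GROUP BY cannot guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B INTEGER, C TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 10, "x"), (1, 30, "y"), (1, 20, "z"), (2, 5, "p"), (2, 7, "q")],
)

# PARTITION BY A plays the role of GROUP BY A; ORDER BY B DESC puts the
# row holding MAX(B) first, so seq = 1 selects that whole row.
query = """
SELECT A, C, B
FROM (SELECT T.*, row_number() OVER (PARTITION BY A ORDER BY B DESC) AS seq
      FROM T) AS ranked
WHERE seq = 1
ORDER BY A
"""
rows = conn.execute(query).fetchall()
print(rows)
```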
That's because columns included in the select list must also be part of the GROUP BY clause, unless they are used inside an aggregate function. A column can be part of the GROUP BY without appearing in the select list, but the reverse is not possible.
Generally, you put in the SELECT clause only the columns you are grouping by, plus aggregate expressions.
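Try this (SQL Server syntax). It finds the MAX of f3 for each f1 and also returns the column you wanted (f2) without affecting your MAX operation. The ORDER BY inside the CROSS APPLY makes TOP(1) take f2 from a row holding the maximum f3; without it, an arbitrary f2 from the group could be returned.
SELECT m.f1, s.f2, m.maxf3 FROM
(SELECT f1, max(f3) maxf3 FROM t1 GROUP BY f1) m
CROSS APPLY (SELECT TOP(1) f2 FROM t1 WHERE t1.f1 = m.f1 ORDER BY t1.f3 DESC) s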
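SQLite has no CROSS APPLY or TOP, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which plays the same role when only one extra column is needed. It runs on Python's sqlite3 module with hypothetical data; the ordering by f3 descending is what ties the returned f2 to a maximum row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (f1 INTEGER, f2 TEXT, f3 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "b", 9), (2, "c", 4), (2, "d", 1)],
)

# m holds the per-group maximum; the scalar subquery fetches f2 from the
# highest-f3 row of the same group (a portable stand-in for CROSS APPLY TOP(1)).
query = """
SELECT m.f1,
       (SELECT s.f2 FROM t1 AS s
        WHERE s.f1 = m.f1
        ORDER BY s.f3 DESC
        LIMIT 1) AS f2,
       m.maxf3
FROM (SELECT f1, max(f3) AS maxf3 FROM t1 GROUP BY f1) AS m
ORDER BY m.f1
"""
rows = conn.execute(query).fetchall()
print(rows)
```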
Your question isn't very clear in that we aren't sure what you are trying to do.
Assuming you don't actually want a GROUP BY in your main query, but want to return the max of B for each value of column A, you can do it like so:
select A, C,(Select Max(B) from T as T2 WHERE T.A = T2.A) as MaxB from T
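The correlated subquery keeps every row of T and attaches its group's maximum to each one. A quick check with Python's sqlite3 module and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B INTEGER, C TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 10, "x"), (1, 30, "y"), (2, 5, "p")],
)

# Conceptually, the inner query runs once per outer row, matching on A.
query = """
SELECT A, C, (SELECT max(B) FROM T AS T2 WHERE T.A = T2.A) AS MaxB
FROM T
ORDER BY A, C
"""
rows = conn.execute(query).fetchall()
print(rows)
```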
I found this very similar question here, and all I would like to do in addition is group by date. There is a date column present in each sub-query.
concatenating columns from multiple unrelated 1-row resultsets
Each sub-query will contain exactly the same range of dates. I have tried putting GROUP BY in each sub-query and in the outer query, but I can't seem to produce a row combining each sub-query for each date. I am using this query in Hive, but I believe any ANSI SQL will work here. Don't quote me on that, though. My scenario seems to be a minor variation on the answers found in the link I posted, but I can't seem to make it work.
Here is one query posted at the link I attached above:
select A, B, C, D
from ( SELECT SUM(A) as A, SUM(B) as B FROM X ) as U
CROSS JOIN ( SELECT SUM(C) as C, SUM(D) as D FROM Y ) as V
How do I add a GROUP BY to this when each sub-query has a date column? Or is there a better way to achieve the same result?
Is this what you want?
select u.dte, A, B, C, D
from (select dte, SUM(A) as A, SUM(B) as B
from X
group by dte
) u join
(select dte, SUM(C) as C, SUM(D) as D
from Y
group by dte
) v
on u.dte = v.dte;
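A small sketch of the per-date join, using Python's sqlite3 module instead of Hive, with hypothetical tables X and Y and made-up dates. An inner join keeps only dates present in both sub-queries, which is fine here because the question says both cover the same range of dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (dte TEXT, A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE Y (dte TEXT, C INTEGER, D INTEGER)")
conn.executemany("INSERT INTO X VALUES (?, ?, ?)",
                 [("2024-01-01", 1, 2), ("2024-01-01", 3, 4), ("2024-01-02", 5, 6)])
conn.executemany("INSERT INTO Y VALUES (?, ?, ?)",
                 [("2024-01-01", 10, 20), ("2024-01-02", 30, 40), ("2024-01-02", 1, 1)])

# Aggregate each table per date first, then line the aggregates up by date.
query = """
SELECT u.dte, A, B, C, D
FROM (SELECT dte, SUM(A) AS A, SUM(B) AS B FROM X GROUP BY dte) AS u
JOIN (SELECT dte, SUM(C) AS C, SUM(D) AS D FROM Y GROUP BY dte) AS v
  ON u.dte = v.dte
ORDER BY u.dte
"""
rows = conn.execute(query).fetchall()
print(rows)
```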
I was trying to do something like:
SELECT
a, b, c, MAX(d)
FROM
table -- table with 4 columns a, b, c and d
GROUP BY
a, b
I would like to have c as an additional value from the table that I do not want to group by, but that distinguishes rows within groups. My problem is that GROUP BY gives me c from the first row of each group, not from the row that actually has d = MAX(d) in the table.
ORDER BY is applied to the whole result, so it's not an option. Can I achieve that in any other way than sorting the table prematurely (as a subquery) and then applying the grouping? Would that work in every SQL engine? Do standards define such behaviors?
Edit1:
I tested something like:
SELECT
t.*,
MAX(d) AS v
FROM
(SELECT
a, b, c, d
FROM
table
ORDER BY
d DESC) AS t
GROUP BY
a, b
and it works, but I do not think anybody can guarantee that the sort order will also be applied to the grouped rows. Maybe it works this way in MySQL, but how will it behave in Oracle or PostgreSQL?
This is ANSI SQL:
SELECT a,
b,
c,
MAX(d) over (partition by a,b) as max_d
FROM the_table
This will still return all rows from the table. The max value will be repeated for every row that is returned. If you want to get only the rows with the max value, you need to wrap this in a derived table:
select a,b,c,d
from (
SELECT a,
b,
c,
d,
MAX(d) over (partition by a,b) as max_d
FROM the_table
) t
where d = max_d;
That will return multiple rows if the same max value occurs more than once within a group. If you only want a single row per group, you need to use row_number().
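Both variants can be compared on hypothetical data with Python's sqlite3 module (SQLite 3.25+ assumed). Group (1, 1) has two rows tied at d = 9: the MAX(d) OVER filter returns both, while row_number() keeps one. The extra ordering by c inside row_number() is an assumed tie-breaker so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (a INTEGER, b INTEGER, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [(1, 1, "x", 5), (1, 1, "y", 9), (1, 1, "z", 9), (2, 1, "w", 3)],
)

# Keeps every row whose d equals its group's maximum (ties included).
all_max = conn.execute("""
SELECT a, b, c, d
FROM (SELECT a, b, c, d, MAX(d) OVER (PARTITION BY a, b) AS max_d
      FROM the_table) AS t
WHERE d = max_d
ORDER BY a, b, c
""").fetchall()

# Keeps exactly one row per group; c breaks ties (assumed rule).
one_per_group = conn.execute("""
SELECT a, b, c, d
FROM (SELECT a, b, c, d,
             row_number() OVER (PARTITION BY a, b ORDER BY d DESC, c) AS rn
      FROM the_table) AS t
WHERE rn = 1
ORDER BY a, b
""").fetchall()

print(all_max)
print(one_per_group)
```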
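You can join the per-group maximum back to the table, matching on the group columns as well as d:
select x.*, y.c from
(SELECT a, b, MAX(d) as d FROM table GROUP BY a, b) x, (select a, b, c, d from table) y
where x.a = y.a and x.b = y.b and x.d = y.d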
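Why the join condition matters: if the join back to the table matches on d alone, a row from a different (a, b) group that happens to share the same d value also matches. The sketch below uses Python's sqlite3 module and a hypothetical table tbl standing in for table (which is a reserved word) to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 1, "x", 5), (1, 1, "y", 9), (2, 2, "w", 5), (2, 2, "v", 2)],
)

groups = "(SELECT a, b, MAX(d) AS d FROM tbl GROUP BY a, b) AS x"

# Join on d only: group (2, 2) also picks up row "x" from group (1, 1).
d_only = conn.execute(f"""
SELECT x.a, x.b, x.d, y.c FROM {groups}
JOIN tbl AS y ON y.d = x.d
ORDER BY x.a, y.c
""").fetchall()

# Join on the group keys and d: one match per group (unless d ties).
keyed = conn.execute(f"""
SELECT x.a, x.b, x.d, y.c FROM {groups}
JOIN tbl AS y ON y.a = x.a AND y.b = x.b AND y.d = x.d
ORDER BY x.a, y.c
""").fetchall()

print(d_only)
print(keyed)
```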
Given a table [a, b, c, d], I want to select exactly those rows which have max(a) within groups of c,
i.e. rows with a = select max(a) from table group by c
What is the most efficient way to do this? Can I use a PARTITION BY clause?
In the real world there is almost always some clue, a particularity of the problem, that can be exploited in your favor.
Your problem, though, is an idealized case with no such particularity. This query does a full scan of the table and then sorts to find the maximum a:
select a,b,c,d
from(
select
a,
b,
c,
d,
row_number() over (partition by c order by a desc) as rnk_in_group_of_c
from table
) t
where rnk_in_group_of_c = 1;
This query is better than using a subquery to find the max, because the subquery approach may lead to more than one full scan, unwanted nested loops, or other performance issues.
Note that if you want all rows that have the maximum a (i.e. when two rows tie for the maximum), you should use the dense_rank() function instead of row_number().
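A quick comparison of row_number() and dense_rank() on tied data, using Python's sqlite3 module (SQLite 3.25+ assumed) and a hypothetical table tbl. In group c = 1, two rows share the maximum a = 10: dense_rank() returns both, while row_number() returns only one of them, and which one is not defined because the ORDER BY does not break the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT, c INTEGER, d TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(10, "p", 1, "d1"), (10, "q", 1, "d2"), (7, "r", 1, "d3"), (4, "s", 2, "d4")],
)

def top_in_group(rank_fn):
    """Rows ranked 1 within each c (by a descending) using rank_fn."""
    assert rank_fn in ("row_number", "dense_rank")  # only trusted names
    sql = f"""
    SELECT a, b, c, d
    FROM (SELECT a, b, c, d,
                 {rank_fn}() OVER (PARTITION BY c ORDER BY a DESC) AS rnk
          FROM tbl) AS ranked
    WHERE rnk = 1
    ORDER BY c, b
    """
    return conn.execute(sql).fetchall()

print(top_in_group("row_number"))
print(top_in_group("dense_rank"))
```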
Have you tried Oracle's KEEP (DENSE_RANK FIRST ...) clause? It helped me a lot, and the performance can be a lot better.
select
max(a) a,
max(b) keep (dense_rank first order by a desc) b,
c,
max(d) keep (dense_rank first order by a desc) d
from table
group by c
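An ORDER BY may optionally be added after the PARTITION BY, but note that it changes the default window frame, so MAX then becomes a running maximum up to the current row: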
SELECT max(a) OVER (PARTITION BY c) max_c FROM...
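A sketch of how the window frame affects MAX, using Python's sqlite3 module and a hypothetical table tbl with an id column for ordering (SQLite 3.25+ assumed). Without ORDER BY, every row sees its whole partition; with ORDER BY, the default frame ends at the current row, which yields a running maximum; an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame restores the whole-partition result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, a INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 3, 1), (2, 1, 1), (3, 5, 1), (4, 2, 2), (5, 4, 2)],
)

def max_a(window):
    """Return max(a) for each row (listed by id) under the given window."""
    sql = f"SELECT max(a) OVER ({window}) FROM tbl ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

whole = max_a("PARTITION BY c")
running = max_a("PARTITION BY c ORDER BY id")
framed = max_a("PARTITION BY c ORDER BY id "
               "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")
print(whole, running, framed)
```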