Sorted SQL groups

I was trying to do something like:
SELECT
a, b, c, MAX(d)
FROM
table -- table with 4 columns a, b, c and d
GROUP BY
a, b
I would like to include c as an additional value from the table. I do not want to group by it, but it distinguishes rows within groups. My problem is that GROUP BY returns c from the first row of each group rather than from the row that actually holds d = MAX(d).
ORDER BY is applied to the whole result, so it's not an option. Can I achieve that in any other way than sorting the table prematurely (as a subquery) and then applying the grouping? Would that work in every SQL engine? Do standards define such behaviors?
Edit1:
I tested something like:
SELECT
t.*,
MAX(d) AS v
FROM
(SELECT
a, b, c, d
FROM
table
ORDER BY
d DESC) AS t
GROUP BY
a, b
and it works... but I do not think anybody can guarantee that the sort order will also be applied to the group rows... - maybe it works this way in MySQL, but how will it go with Oracle or PostgreSQL?

This is ANSI SQL:
SELECT a,
b,
c,
MAX(d) over (partition by a,b) as max_d
FROM the_table
This will still return all rows from the table. The max value will be repeated for every row that is returned. If you want to get only the rows with the max value, you need to wrap this in a derived table:
select a,b,c,d
from (
SELECT a,
b,
c,
d,
MAX(d) over (partition by a,b) as max_d
FROM the_table
) t
where d = max_d;
That will return multiple rows if the same max value occurs more than once within a group. If you only want a single row for each group, you need to use row_number().
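A minimal sketch of the row_number() variant mentioned above, run through Python's built-in sqlite3 module so it can be tried without a database server. The sample rows are hypothetical, the extra ORDER BY column c is only an assumed tie-breaker, and the script assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data for the_table(a, b, c, d); not from the original post.
rows = [
    (1, 1, "x", 10),
    (1, 1, "y", 30),
    (1, 1, "z", 30),  # ties with "y" on the maximum d
    (2, 1, "p", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (a INTEGER, b INTEGER, c TEXT, d INTEGER)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)

# Number the rows inside each (a, b) group, highest d first.
# The second ORDER BY key (c) is an assumed tie-breaker so the pick is deterministic.
query = """
SELECT a, b, c, d
FROM (
    SELECT a, b, c, d,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d DESC, c) AS rn
    FROM the_table
) t
WHERE rn = 1
ORDER BY a, b
"""
result = conn.execute(query).fetchall()
print(result)
```

Unlike the d = max_d filter, this keeps exactly one row per (a, b) group even when the maximum is tied.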

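You can join the grouped result back to the table. Note that the join has to match on a and b as well as d, otherwise rows from other groups that happen to share the same d value are returned too:
select x.*,y.c from
(SELECT a, b, MAX(d) as d FROM table GROUP BY a, b) x,(select a,b,c,d from table) y
where x.a = y.a and x.b = y.b and x.d = y.d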

Related

SQL: How to extract one row by MAX in another column and DISTINCT in yet another sequence of columns?

Say I have a table with columns A, B, C, and D.
What I want is to:
get all the distinct combinations of A and B that exist in the original table;
for every such combination, extract a SINGLE row that has that combination, along with its C and D values.
There will probably be multiple rows that have that particular combination of A and B. In that case, I still want just one row, and it should be the one with the highest value in the C column.
For example, if in my original table I have A = Male or Female, B = Tall or Short, and C = Age, and D is something else, then I want to end up with a table with 4 rows, each having one of these combinations:
Male, Tall, …, …
Female, Tall, …, ...
Male, Short, …, ...
Female, Short, …, …
where each row should belong to the person with the biggest age, and then their respective D value as well.
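To keep the row with the highest C in each (A, B) group, number the rows with row_number(), ordering by c in descending order (an ascending order would return the row with the lowest C instead):
select a, b, c, d
from (select t.*,
row_number() over (partition by a, b order by c desc) as seqnum
from t
) t
where seqnum = 1;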
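If you are trying to group by columns A and B and get the max, sum, avg or any other aggregate of columns C and D, a simple GROUP BY clause might work. Keep in mind that each aggregate is computed independently, so max(c) and max(d) can come from different rows. For example: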
select a, b, max(c), max(d)
from table
group by a,b
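The difference between aggregating each column and picking one whole row can be sketched as follows, using Python's sqlite3 module. The two sample rows are hypothetical and chosen so that the largest c and the largest d sit in different rows; the window-function query assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER, d TEXT)")
# Hypothetical rows: the oldest person (c = 40) does not have the largest d.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("Male", "Tall", 40, "apple"),
    ("Male", "Tall", 25, "zebra"),
])

# Plain GROUP BY: each aggregate is evaluated on its own column.
aggregated = conn.execute(
    "SELECT a, b, MAX(c), MAX(d) FROM t GROUP BY a, b"
).fetchall()

# row_number(): the whole row with the highest c is kept intact.
per_row = conn.execute("""
    SELECT a, b, c, d
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY c DESC) AS seqnum
          FROM t) ranked
    WHERE seqnum = 1
""").fetchall()

print(aggregated)  # mixes values from two different rows
print(per_row)     # a row that really exists in the table
```

The GROUP BY result combines c = 40 with d = "zebra", a pairing that never occurs in the table, which is why it only fits when independent aggregates are what is wanted.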
You can use row_number():
select a, b, c, d
from (select t.*,
row_number() over (partition by a, b order by c desc) as seqnum
from t
) t
where seqnum = 1;

How to select * in addition to group by?

Consider a PostgreSQL table with fields a-z
a, b, c ... z
-------------
5, 6, 2 ... 9
5, 6, 3 ... 1
I'd like to do a group on fields a,b and keep only records where b was maximum.
SELECT a, max(b) as b, c, d, e ... z
FROM table
GROUP BY a, b
This works fine, but it's annoying to have to type out all the values in SELECT. I'd much rather do something like
SELECT max(b) as b, *
FROM TABLE
But doing so gives error
[42803] ERROR: column "table.id" must appear in the GROUP BY clause or
be used in an aggregate function.
Any idea how to avoid having to type all the column names in a lengthy table when doing a groupby operation?
You can use rank():
select t.*
from (select t.*, rank() over (partition by a order by b desc) as seqnum
from t
) t
where seqnum = 1;
Actually, in Postgres, the fastest method is usually distinct on:
select distinct on (a) t.*
from t
order by a, b desc;
With an index on (a, b desc) this should be the fastest method.
Gordon Linoff's answer put me on the right track, namely using distinct on. This works in Postgres:
SELECT DISTINCT ON (a, b) *
FROM table
ORDER BY a, b DESC
Basically, DISTINCT ON keeps only the first row for each distinct combination of the listed columns, where "first" is decided by the ORDER BY clause, so the sort order determines which row of each group survives. I was actually surprised this works...

concatenating columns from multiple unrelated 1-row resultsets (with group by)

I found this very similar question here and all I would like to do in addition to this is group by date. There is a date column present in each sub-query.
concatenating columns from multiple unrelated 1-row resultsets
Each sub-query contains exactly the same range of dates. I have tried putting GROUP BY in each sub-query, and in the outer query, but I can't seem to produce one row per date that combines the sub-queries. I am using this query in Hive, but I believe any ANSI SQL will work here. Don't quote me on that though. My scenario seems to be a minor variation on the answers found in the link I posted, however I can't seem to make it work.
Here is one query posted at the link I attached above:
select A, B, C, D
from ( SELECT SUM(A) as A, SUM(B) as B FROM X ) as U
CROSS JOIN ( SELECT SUM(C) as C, SUM(D) as D FROM Y ) as V
How do I add a GROUP BY to this when each sub-query has a date column? Or is there a better way to achieve the same result?
Is this what you want?
select u.dte, A, B, C, D
from (select dte, SUM(A) as A, SUM(B) as B
from X
group by dte
) u join
(select dte, SUM(C) as C, SUM(D) as D
from Y
group by dte
) v
on u.dte = v.dte;
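A small sketch of the join above, run against SQLite through Python's sqlite3 module rather than Hive. The tables X and Y, their rows, and the date values are hypothetical sample data; only the query shape comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE X (dte TEXT, A INTEGER, B INTEGER);
CREATE TABLE Y (dte TEXT, C INTEGER, D INTEGER);
-- Hypothetical rows; both tables cover the same two dates.
INSERT INTO X VALUES ('2020-01-01', 1, 2), ('2020-01-01', 3, 4), ('2020-01-02', 5, 6);
INSERT INTO Y VALUES ('2020-01-01', 10, 20), ('2020-01-02', 30, 40), ('2020-01-02', 1, 1);
""")

# Aggregate each table per date first, then join the two one-row-per-date results.
query = """
SELECT u.dte, A, B, C, D
FROM (SELECT dte, SUM(A) AS A, SUM(B) AS B FROM X GROUP BY dte) u
JOIN (SELECT dte, SUM(C) AS C, SUM(D) AS D FROM Y GROUP BY dte) v
  ON u.dte = v.dte
ORDER BY u.dte
"""
combined = conn.execute(query).fetchall()
print(combined)
```

Because this is an inner join, a date that appears in only one of the two sub-queries would be dropped; that is harmless here only because both are stated to cover the same dates.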

Getting row with MAX value together with SUM

I have a PostgreSQL table example with three columns: a INT, b INT, c TEXT.
For each value of a I want the c with the highest value of b, together with the sum of all b. Something like (if there was an ARGMAX function):
SELECT a, ARGMAX(c for MAX(b)), SUM(b) FROM example GROUP BY a
I've found a lot of solutions with varying techniques to get the ARGMAX bit, but none of them seem to use GROUP BY, so I was wondering what the most efficient way would be to capture the SUM (or other aggregate functions) as well.
This can be easily achieved using window functions:
SELECT a, b, c, s
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn,
SUM(b) OVER (PARTITION BY a) AS s
FROM example) AS t
WHERE t.rn = 1
ROW_NUMBER enumerates records within each a partition: the record having the highest b value is assigned a value of 1, next record a value of 2, etc.
SUM(b) OVER (PARTITION BY a) returns the sum of all b within each a partition.
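The query can be tried on a few rows with the sketch below. It uses Python's sqlite3 module instead of PostgreSQL, assumes SQLite 3.25 or later for window functions, and the sample rows and the final ORDER BY are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INTEGER, b INTEGER, c TEXT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO example VALUES (?, ?, ?)", [
    (1, 10, "low"),
    (1, 30, "high"),
    (2, 7, "only"),
])

query = """
SELECT a, b, c, s
FROM (
    SELECT a, b, c,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn,
           SUM(b) OVER (PARTITION BY a) AS s
    FROM example) AS t
WHERE t.rn = 1
ORDER BY a
"""
argmax_with_sum = conn.execute(query).fetchall()
print(argmax_with_sum)
```

For a = 1 the surviving row is the one with b = 30, while s still reports 40, the sum over both rows of that partition.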

In SQL Server, what is the best way to create a union query excluding partial duplicates

I have two tables that have the same schema. I want to create a union of all the fields, but I want to exclude duplicates based on the equality of some, but not all of the fields. What is the best way to achieve this in SQL Server (2008r2)?
I see this sort of answer, but is there a better option?
Thanks for any help.
You might be able to do it with the ROW_NUMBER() function, though as @Tim says it will just discard any differences in all fields not used in the partition. Below, if you have six rows with B and C in common, only one of them will survive regardless of the values in columns A and D. (RANK() would not work with this ORDER BY, because rows that tie on the ordering columns all receive rank 1.)
SELECT *
FROM (
SELECT A, B, C, D,
ROW_NUMBER() OVER(PARTITION BY B, C ORDER BY B, C) AS MYRANK
FROM (
SELECT A, B, C, D
FROM TABLE_A
UNION
SELECT A, B, C, D
FROM TABLE_B
) T1
) T2
WHERE T2.MYRANK = 1
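The sketch below compares ROW_NUMBER() and RANK() for this kind of partial de-duplication. It runs on SQLite through Python's sqlite3 module rather than SQL Server, assumes SQLite 3.25 or later, and uses hypothetical sample rows. Ordering the ROW_NUMBER() variant by A is an assumption made here so that the surviving row is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_A (A INTEGER, B TEXT, C TEXT, D INTEGER);
CREATE TABLE TABLE_B (A INTEGER, B TEXT, C TEXT, D INTEGER);
-- Hypothetical rows: (b1, c1) appears in both tables with different A and D.
INSERT INTO TABLE_A VALUES (1, 'b1', 'c1', 100), (2, 'b2', 'c2', 200);
INSERT INTO TABLE_B VALUES (9, 'b1', 'c1', 999), (3, 'b3', 'c3', 300);
""")

# Same query shape for both functions; only the window function and its ORDER BY vary.
template = """
SELECT A, B, C, D
FROM (
    SELECT A, B, C, D,
           {fn}() OVER (PARTITION BY B, C ORDER BY {order}) AS MYRANK
    FROM (
        SELECT A, B, C, D FROM TABLE_A
        UNION
        SELECT A, B, C, D FROM TABLE_B
    ) T1
) T2
WHERE T2.MYRANK = 1
ORDER BY B, C, A
"""

# ROW_NUMBER() gives every row in a partition a distinct number, so one row survives.
deduplicated = conn.execute(template.format(fn="ROW_NUMBER", order="A")).fetchall()

# RANK() ordered by the partition columns ties every row at 1, so nothing is removed.
rank_ties = conn.execute(template.format(fn="RANK", order="B, C")).fetchall()

print(deduplicated)
print(len(rank_ties))
```

With ROW_NUMBER() the (b1, c1) pair collapses to the row with the smallest A, while the RANK() variant returns all four rows of the union.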