How to return the category with max value for every user in postgresql? - sql

This is the table:
id category value
-- -------- -----
1  A        40
1  B        20
1  C        10
2  A        4
2  B        7
2  C        7
3  A        32
3  B        21
3  C        2
I want the result like this:
id category
-- --------
1  A
2  B
2  C
3  A

For small tables, or for only very few rows per user, a subquery with the window function rank() (as demonstrated by The Impaler) is just fine. The resulting sequential scan over the whole table, followed by a sort, will be the most efficient query plan.
For more than a few rows per user, this gets increasingly inefficient, though.
Typically, you also have a users table holding one distinct row per user. If you don't have it, create it! See:
Is there a way to SELECT n ON (like DISTINCT ON, but more than one of each)
Select first row in each GROUP BY group?
We can leverage that for an alternative query that scales much better - using WITH TIES in a LATERAL JOIN. Requires Postgres 13 or later.
SELECT u.id, t.*
FROM users u
CROSS JOIN LATERAL (
    SELECT t.category
    FROM tbl t
    WHERE t.id = u.id
    ORDER BY t.value DESC
    FETCH FIRST 1 ROWS WITH TIES  -- !
) t;
db<>fiddle here
See:
Get top row(s) with highest value, with ties
Fetching a minimum of N rows, plus all peers of the last row
This can use a multicolumn index to great effect - which must exist, of course:
CREATE INDEX ON tbl (id, value);
Or:
CREATE INDEX ON tbl (id, value DESC);
Even faster index-only scans become possible with:
CREATE INDEX ON tbl (id, value DESC, category);
Or (the optimum for the query at hand):
CREATE INDEX ON tbl (id, value DESC) INCLUDE (category);
This assumes value is defined NOT NULL. Otherwise, use DESC NULLS LAST in both the query and the index. See:
Sort by column ASC, but NULL values first?
To keep users in the result that don't have any rows in table tbl, use LEFT JOIN LATERAL (...) ON true. See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

You can use RANK() to identify the rows you want. Then, filtering is easy. For example:
select *
from (
    select *,
           rank() over(partition by id order by value desc) as rk
    from t
) x
where rk = 1
Result:
id category value rk
--- --------- ------ --
1 A 40 1
2 B 7 1
2 C 7 1
3 A 32 1
See running example at DB Fiddle.
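The rank() approach above can be tried without a PostgreSQL server. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite 3.25 or later, which supports window functions); the table name tbl follows the other answer, and only id and category are selected to match the result the question asks for.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; rank() has the same semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, category TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "A", 40), (1, "B", 20), (1, "C", 10),
        (2, "A", 4), (2, "B", 7), (2, "C", 7),
        (3, "A", 32), (3, "B", 21), (3, "C", 2),
    ],
)

# rank() gives every row tied for the top value the rank 1,
# so user 2 keeps both B and C.
top_rows = conn.execute(
    """
    SELECT id, category
    FROM (
        SELECT id, category,
               rank() OVER (PARTITION BY id ORDER BY value DESC) AS rk
        FROM tbl
    ) AS ranked
    WHERE rk = 1
    ORDER BY id, category
    """
).fetchall()

print(top_rows)
```

Swapping rank() for row_number() would return exactly one row per id, picking arbitrarily between tied rows.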

Related

SQL Select Rows with Highest Version

I am trying to query the following table:
ID ConsentTitle ConsentIdentifier Version DisplayOrder
-- ------------ ----------------- ------- ------------
1  FooTitle     foo1              1       1
2  FooTitle 2   foo1              2       2
3  Bar Title    bar1              1       3
4  Bar Title 2  bar1              2       4
My table has entries with unique ConsentTemplateIdentifier. I want to bring back only the rows with the highest version number for that particular unique Identifier...
ID ConsentTitle ConsentIdentifier Version DisplayOrder
-- ------------ ----------------- ------- ------------
2  FooTitle 2   foo1              2       2
4  Bar Title 2  bar1              2       4
My current query doesn't seem to work. It is telling me:
Column 'ConsentTemplates.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
Select Distinct ID,ConsentTitle, DisplayOrder, ConsentTemplateIdentifier, MAX(Version) as Version
from FooDocs
group by ConsentTemplateIdentifier
How do I select the rows distinctly which have the highest Version number for their respective ConsentTemplateIdentifiers ordered by their display order?
Any help would be really appreciated. I am using SQL Server.
You can do this with CROSS APPLY.
SELECT DISTINCT ca.*
FROM FooDocs fd
CROSS APPLY (SELECT TOP 1 *
             FROM FooDocs
             WHERE ConsentIdentifier = fd.ConsentIdentifier
             ORDER BY Version DESC) ca
If your unique identifier has its own table, you can drive the query from that table and drop the DISTINCT:
SELECT ca.*
FROM ConsentTable ct
CROSS APPLY (SELECT TOP 1 *
             FROM FooDocs
             WHERE ConsentIdentifier = ct.Identifier
             ORDER BY Version DESC) ca
Using CROSS APPLY works, but it effectively invokes the sub-query for every row in your table and then expends effort de-duplicating the results with DISTINCT, resulting in a semi-cartesian-product / triangular-join.
It is usually much more efficient just to use ROW_NUMBER() and avoid the implicit join altogether...
WITH
  sorted_by_version AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY ConsentTemplateIdentifier
      ORDER BY version DESC
    ) AS version_ordinal
  FROM
    ConsentTemplates
)
SELECT
  *
FROM
  sorted_by_version
WHERE
  version_ordinal = 1
ORDER BY
  DisplayOrder
With a slight modification to Derrick's answer, I was able to get the data back the way I wanted to see it using CROSS APPLY:
SELECT DISTINCT ca.*
FROM ConsentTemplates fd
CROSS APPLY (SELECT TOP 1 *
             FROM ConsentTemplates
             WHERE ConsentTemplateIdentifier = fd.ConsentTemplateIdentifier
             ORDER BY Version DESC) ca
ORDER BY DisplayOrder
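For readers who want to check the ROW_NUMBER() variant against the sample rows from the question, here is a small sketch. It runs the query through Python's sqlite3 module rather than SQL Server (an assumption made purely so the example is self-contained; SQLite 3.25 or later is needed for window functions), and it uses the column name ConsentIdentifier as shown in the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ConsentTemplates (
        ID INTEGER PRIMARY KEY,
        ConsentTitle TEXT,
        ConsentIdentifier TEXT,
        Version INTEGER,
        DisplayOrder INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO ConsentTemplates VALUES (?, ?, ?, ?, ?)",
    [
        (1, "FooTitle", "foo1", 1, 1),
        (2, "FooTitle 2", "foo1", 2, 2),
        (3, "Bar Title", "bar1", 1, 3),
        (4, "Bar Title 2", "bar1", 2, 4),
    ],
)

# Number the rows of each identifier from the highest version down,
# then keep only the first row of each group.
latest = conn.execute(
    """
    WITH sorted_by_version AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY ConsentIdentifier
                   ORDER BY Version DESC
               ) AS version_ordinal
        FROM ConsentTemplates
    )
    SELECT ID, ConsentTitle, ConsentIdentifier, Version, DisplayOrder
    FROM sorted_by_version
    WHERE version_ordinal = 1
    ORDER BY DisplayOrder
    """
).fetchall()

for row in latest:
    print(row)
```

The table is read once and no DISTINCT is needed, because each identifier contributes exactly one row with version_ordinal = 1.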

SQL select top rows based on limit

Please help me write the select query below.
Source table
name Amount
-----------
A 2
B 3
C 2
D 7
if limit is 5 then result table should be
name Amount
-----------
A 2
B 3
if limit is 8 then result table
name Amount
-----------
A 2
B 3
C 2
You can use a window function to achieve this:
select name,
       amount
from (
    select t.*,
           sum(amount) over (
               order by name
           ) s
    from your_table t
) t
where s <= 8;
The analytic function sum computes a running total here: it accumulates amount row by row in the order given by order by name.
Once you have the running total up to each row, a simple where clause keeps only the rows whose running total is less than or equal to the given limit.
More on this topic:
The SQL OVER() clause - when and why is it useful?
https://explainextended.com/2009/03/08/analytic-functions-sum-avg-row_number/
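The running-total filter can be checked against both limits from the question. A minimal sketch, assuming SQLite (through Python's sqlite3 module, version 3.25 or later) as a stand-in database; the helper rows_within is a hypothetical name introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("A", 2), ("B", 3), ("C", 2), ("D", 7)],
)


def rows_within(limit):
    """Return the leading rows whose running total stays within limit."""
    return conn.execute(
        """
        SELECT name, amount
        FROM (
            SELECT name, amount,
                   SUM(amount) OVER (ORDER BY name) AS running_total
            FROM your_table
        ) AS t
        WHERE running_total <= ?
        ORDER BY name
        """,
        (limit,),
    ).fetchall()


# Running totals are 2, 5, 7, 14 for A, B, C, D.
print(rows_within(5))
print(rows_within(8))
```

Names are unique in the sample data. With duplicate names, the default window frame would add all rows sharing a name at once, so a unique ordering column is the safer choice.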

query for rows returning the first element of a group in db2

Suppose I have a table filled with the data below, what SQL function or query I should use in db2 to retrieve all rows having the FIRST field FLD_A with value A, the FIRST field FLD_A with value B..and so on?
ID FLD_A FLD_B
1 A 10
2 A 20
3 A 30
4 B 10
5 A 20
6 C 30
I am expecting a table like below; I am aware of grouping done by function GROUP BY but how can I limit the query to return the very first of each group?
Essentially I would like to have the information about the very first row where a new value for FLD_A is appearing for the first time?
ID FLD_A FLD_B
1 A 10
4 B 10
6 C 30
Try this; it works in SQL:
SELECT * FROM Table1
WHERE ID IN (SELECT MIN(ID) FROM Table1 GROUP BY FLD_A)
A good way to approach this problem is with window functions and row_number() in particular:
select t.*
from (select t.*,
             row_number() over (partition by fld_a order by id) as seqnum
      from table1 t
     ) t
where seqnum = 1;
(This is assuming that "first" means "minimum id".)
If you use t.*, this will add one extra column to the output. You can just list the columns you want to avoid this.
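Both answers can be compared on the sample data. The sketch below is an illustration only: it uses SQLite via Python's sqlite3 module instead of Db2 (an assumption, chosen so the example runs anywhere; SQLite 3.25 or later is required), and it lists the columns explicitly so the helper column seqnum stays out of the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, fld_a TEXT, fld_b INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 20), (3, "A", 30),
     (4, "B", 10), (5, "A", 20), (6, "C", 30)],
)

# Variant 1: keep the rows whose id is the smallest id of its group.
first_by_min = conn.execute(
    """
    SELECT id, fld_a, fld_b
    FROM table1
    WHERE id IN (SELECT MIN(id) FROM table1 GROUP BY fld_a)
    ORDER BY id
    """
).fetchall()

# Variant 2: number the rows inside each group and keep number 1.
first_by_rownum = conn.execute(
    """
    SELECT id, fld_a, fld_b
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY fld_a ORDER BY id) AS seqnum
        FROM table1 AS t
    ) AS numbered
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

print(first_by_min)
print(first_by_rownum)
```

Both variants treat "first" as "minimum id" and return the same three rows here.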

How can I find the record with the max value for a group?

I am trying to write a query for a large dataset with many joins and having trouble accomplishing a particular piece without some sort of subquery, which I am trying to avoid.
For an example table with columns ID, Size, Item there may be multiple records with the same ID. I want to return the record per ID which has the largest Size.
ID Size Item
1 5 a
1 10 b
2 3 c
2 6 d
2 11 e
3 2 f
Expected result
ID Size Item
1 10 b
2 11 e
3 2 f
I've tried various group and having approaches without success.
Using a subquery I can do it like this but for a large dataset I'd prefer not to do it this way
select id, size, item
from test
where size = (select max(size) from test t2 where id = test.id)
Any suggestions?
This should satisfy your requirements: For each id, return only the row with the largest size
SELECT test.id, test.size, test.item
FROM test
INNER JOIN (
    SELECT id, MAX(size) AS size
    FROM test
    GROUP BY id
) max_size ON max_size.id = test.id AND max_size.size = test.size
WITH T AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY Size DESC) AS RN
    FROM YourTable
)
SELECT ID,
       Size,
       Item
FROM T
WHERE RN = 1
SELECT id, item, MAX(size)
FROM Test
GROUP BY id, item
Assuming item is the same for every occurrence of that id.
select id, max(size), item
from test
group by id, item
Edit: Ah, the data you just added changes this and my above query no longer applies.
You can keep using your own query (the one with the correlated subquery), but it's necessary to create a composite index on (id, size).
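As a quick check that the asker's correlated subquery and the join against the per-id maximum agree, here is a small sketch on the sample data. It assumes SQLite through Python's sqlite3 module as a stand-in database, and the index name test_id_size_idx is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, size INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 10, "b"), (2, 3, "c"),
     (2, 6, "d"), (2, 11, "e"), (3, 2, "f")],
)
# Composite index as suggested in the last answer (hypothetical name).
conn.execute("CREATE INDEX test_id_size_idx ON test (id, size)")

# The asker's query: compare each row with the maximum size of its id.
correlated = conn.execute(
    """
    SELECT id, size, item
    FROM test
    WHERE size = (SELECT MAX(size) FROM test AS t2 WHERE t2.id = test.id)
    ORDER BY id
    """
).fetchall()

# The join variant: compute the maximum per id once, then join back.
joined = conn.execute(
    """
    SELECT test.id, test.size, test.item
    FROM test
    INNER JOIN (
        SELECT id, MAX(size) AS size
        FROM test
        GROUP BY id
    ) AS max_size
        ON max_size.id = test.id AND max_size.size = test.size
    ORDER BY test.id
    """
).fetchall()

print(correlated)
print(joined)
```

Both forms return every row that ties for the maximum size of an id, so they yield more than one row per id when the largest size is duplicated.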

Fetch the row which has the Max value for a column in SQL Server

I found a question that was very similar to this one, but using features that seem exclusive to Oracle. I'm looking to do this in SQL Server.
I have a table like this:
MyTable
--------------------
MyTableID INT PK
UserID INT
Counter INT
Each user can have multiple rows, with different values for Counter in each row. I need to find the rows with the highest Counter value for each user.
How can I do this in SQL Server 2005?
The best I can come up with is a query that returns the MAX(Counter) for each UserID, but I need the entire row because of other data in this table not shown in my table definition for simplicity's sake.
EDIT: It has come to my attention from some of the answers in this post, that I forgot an important detail. It is possible to have 2+ rows where a UserID can have the same MAX counter value. Example below updated for what the expected data/output should be.
With this data:
MyTableID UserID Counter
--------- ------- --------
1 1 4
2 1 7
3 4 3
4 11 9
5 11 3
6 4 6
...
9 11 9
I want the results below. For duplicate MAX values, select the first occurrence in whatever order SQL Server returns them. Which rows are returned isn't important in this case as long as the UserID/Counter pairs are distinct:
MyTableID UserID Counter
--------- ------- --------
2 1 7
4 11 9
6 4 6
I like to use a Common Table Expression for that case, with a suitable ROW_NUMBER() function in it:
WITH MaxPerUser AS
(
    SELECT
        MyTableID, UserID, Counter,
        ROW_NUMBER() OVER(PARTITION BY userid ORDER BY Counter DESC) AS 'RowNumber'
    FROM dbo.MyTable
)
SELECT MyTableID, UserID, Counter
FROM MaxPerUser
WHERE RowNumber = 1
That partitions the data by UserID, orders it by Counter (descending) for each user, and then numbers the rows starting with 1 for each user. Select only those rows with a RowNumber of 1 and you have your max. values per user.
It's that easy :-) And I get results something like this:
MyTableID UserID Counter
2 1 7
6 4 6
4 11 9
Only one entry per user, no matter how many rows per user happen to have the same max value.
I think this will help you.
SELECT distinct(a.userid), MAX(a.counter) as counter
FROM mytable a INNER JOIN mytable b ON a.mytableid = b.mytableid
GROUP BY a.userid
There are several ways to do this. Take a look at "Including an Aggregated Column's Related Values": several methods are shown, including the performance differences.
Here is one example
select t1.*
from (
    select UserID, max(counter) as MaxCount
    from MyTable
    group by UserID) t2
join MyTable t1 on t2.UserID = t1.UserID
    and t1.counter = t2.MaxCount
Try this... I'm pretty sure this is the only way to truly make sure you get one row per User.
SELECT MT.*
FROM MyTable MT
INNER JOIN (
    SELECT MAX(MID.MyTableId) AS MaxMyTableId,
           MID.UserId
    FROM MyTable MID
    INNER JOIN (
        SELECT MAX(Counter) AS MaxCounter, UserId
        FROM MyTable
        GROUP BY UserId
    ) AS MC
        ON (MID.UserId = MC.UserId
            AND MID.Counter = MC.MaxCounter)
    GROUP BY MID.UserId
) AS MID
    ON (MT.UserId = MID.UserId
        AND MT.MyTableId = MID.MaxMyTableId)
select m.*
from MyTable m
inner join (
    select UserID, max(Counter) as MaxCounter
    from MyTable
    group by UserID
) mm on m.UserID = mm.UserID and m.Counter = mm.MaxCounter
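The answers above differ in how they treat ties, which matters for the edited question. The sketch below illustrates that on the rows listed in the question (the rows elided by "..." are simply left out). It runs on SQLite via Python's sqlite3 module rather than SQL Server 2005, which is an assumption made for portability, and the extra MyTableID tie-breaker in the ORDER BY is an addition of this sketch to make the chosen row deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY, UserID INTEGER, Counter INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 1, 4), (2, 1, 7), (3, 4, 3), (4, 11, 9),
     (5, 11, 3), (6, 4, 6), (9, 11, 9)],
)

# Join against MAX(Counter): every row that ties for the maximum comes back.
join_rows = conn.execute(
    """
    SELECT m.MyTableID, m.UserID, m.Counter
    FROM MyTable AS m
    INNER JOIN (
        SELECT UserID, MAX(Counter) AS MaxCounter
        FROM MyTable
        GROUP BY UserID
    ) AS mm
        ON m.UserID = mm.UserID AND m.Counter = mm.MaxCounter
    ORDER BY m.UserID, m.MyTableID
    """
).fetchall()

# ROW_NUMBER(): exactly one row per user. MyTableID breaks ties here
# (an assumption of this sketch, not part of the original answer).
one_per_user = conn.execute(
    """
    WITH MaxPerUser AS (
        SELECT MyTableID, UserID, Counter,
               ROW_NUMBER() OVER (
                   PARTITION BY UserID
                   ORDER BY Counter DESC, MyTableID
               ) AS RowNumber
        FROM MyTable
    )
    SELECT MyTableID, UserID, Counter
    FROM MaxPerUser
    WHERE RowNumber = 1
    ORDER BY UserID
    """
).fetchall()

print(join_rows)
print(one_per_user)
```

User 11 has two rows with Counter 9, so the join returns both of them while the ROW_NUMBER() query keeps only one.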