How to select the nth row of each group in SQL

I have a table like this:
member_id | book_title
1 | one-one
1 | one-two
1 | one-three
2 | two-one
2 | two-two
2 | two-three
I want to group by member_id and get the 3rd book title of each group, so the result of the query should be:
member_id | book_title
1 | one-three
2 | two-three

Assuming that by "3rd book title" you mean "the third book when sorted alphabetically by title", this can be achieved using window functions:
select member_id, book_title
from (
select member_id, book_title,
row_number() over (partition by member_id order by book_title) as rn
from the_table
) t
where rn = 3;
Note that this won't return members that have fewer than 3 books assigned.
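One way to check the query is to run it against an in-memory SQLite database (SQLite 3.25 or later, which supports window functions) from Python's standard sqlite3 module. This is a sketch under stated assumptions: the id column (recording insertion order) and member 3 (with a single book) are hypothetical additions. Note that in alphabetical order "one-three" sorts before "one-two", so ordering by book_title returns one-two and two-two; reproducing the expected one-three/two-three output needs an ordering column such as the hypothetical id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical column recording insertion order; the question's table lacks it.
conn.execute("create table the_table (id integer primary key, member_id integer, book_title text)")
conn.executemany(
    "insert into the_table (member_id, book_title) values (?, ?)",
    [(1, "one-one"), (1, "one-two"), (1, "one-three"),
     (2, "two-one"), (2, "two-two"), (2, "two-three"),
     (3, "three-one")],  # hypothetical member with fewer than 3 books
)

def nth_per_group(order_col, n):
    # order_col is interpolated into the SQL, so pass only trusted column names.
    sql = f"""
        select member_id, book_title
        from (
            select member_id, book_title,
                   row_number() over (partition by member_id order by {order_col}) as rn
            from the_table
        ) t
        where rn = ?
        order by member_id
    """
    return conn.execute(sql, (n,)).fetchall()

by_title = nth_per_group("book_title", 3)   # alphabetical order
by_insertion = nth_per_group("id", 3)       # insertion order
print(by_title)      # [(1, 'one-two'), (2, 'two-two')]
print(by_insertion)  # [(1, 'one-three'), (2, 'two-three')]
```

Member 3 is absent from both results, which matches the note about members with fewer than 3 books.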

Related

Select row with max value from each group in Oracle SQL [duplicate]

I have a table people containing people, their city and their money balance:
id city_id money
1 1 25
2 1 13
3 2 97
4 2 102
5 2 37
Now I would like to select the richest person from each city. How can I do that using Oracle SQL? The desired result is:
id city_id money
1 1 25
4 2 102
Something like this would be useful:
SELECT * as tmp FROM people GROUP BY city_id HAVING money = MAX(money)
You should be thinking "filtering", not "aggregation", because you want the entire row. You can use a subquery:
select p.*
from people p
where p.money = (select max(p2.money) from people p2 where p2.city_id = p.city_id);
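The correlated subquery is portable, so a rough check in SQLite (via Python's sqlite3 module) behaves the same way. Row 6 below is a hypothetical addition that ties with row 1, to show that this filter keeps every person who shares the top balance in a city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer primary key, city_id integer, money integer)")
conn.executemany(
    "insert into people values (?, ?, ?)",
    [(1, 1, 25), (2, 1, 13), (3, 2, 97), (4, 2, 102), (5, 2, 37),
     (6, 1, 25)],  # row 6 is hypothetical, added to show that ties are kept
)

# Keep a row when its money equals the maximum money in the same city.
richest = conn.execute("""
    select p.*
    from people p
    where p.money = (select max(p2.money) from people p2 where p2.city_id = p.city_id)
    order by p.city_id, p.id
""").fetchall()
print(richest)  # [(1, 1, 25), (6, 1, 25), (4, 2, 102)]
```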
You can use the DENSE_RANK() analytic function: in a subquery, partition by city_id and order by money descending, then keep the rows whose rank equals 1 in the main query. This returns the richest person in each city, including ties (people with the same top balance in a city):
SELECT id, city_id, money
FROM( SELECT p.*,
DENSE_RANK() OVER ( PARTITION BY city_id ORDER BY money DESC ) AS dr
FROM people p )
WHERE dr = 1
You can use RANK(), which is flexible: you can get the richest person or the top N richest people per city:
SELECT
id, city_id, money
FROM (
SELECT
p.* ,RANK() OVER (PARTITION BY city_id ORDER BY money DESC ) as rank_per_city
FROM
people p )
WHERE
rank_per_city = 1
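The "top N" flexibility depends on how ties are numbered, which is where RANK() and DENSE_RANK() differ. The sketch below runs both in SQLite via Python's sqlite3 module; row 6 is a hypothetical person tying with row 4 in city 2. With N = 2, RANK() skips rank 2 after the tie, while DENSE_RANK() also returns the next balance (97).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer primary key, city_id integer, money integer)")
conn.executemany(
    "insert into people values (?, ?, ?)",
    [(1, 1, 25), (2, 1, 13), (3, 2, 97), (4, 2, 102), (5, 2, 37),
     (6, 2, 102)],  # hypothetical tie with id 4
)

def top_n(rank_func, n):
    # rank_func is interpolated into the SQL; pass only "rank" or "dense_rank".
    sql = f"""
        select id, city_id, money
        from (
            select p.*,
                   {rank_func}() over (partition by city_id order by money desc) as r
            from people p
        ) ranked
        where r <= ?
        order by city_id, money desc, id
    """
    return conn.execute(sql, (n,)).fetchall()

top_rank = top_n("rank", 2)
top_dense = top_n("dense_rank", 2)
print(top_rank)   # [(1, 1, 25), (2, 1, 13), (4, 2, 102), (6, 2, 102)]
print(top_dense)  # [(1, 1, 25), (2, 1, 13), (4, 2, 102), (6, 2, 102), (3, 2, 97)]
```

With N = 1 both functions return the same rows; ROW_NUMBER() would instead pick one of the tied people arbitrarily unless a tie-breaker is added.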

Just another SQL case (GROUP BY)

I'm stuck on an SQL problem that I don't know how to solve.
Let's say I have a table like this (concerning estimations on house prices):
estimationID | estimationDate | userID | cityID
1 | '2020-01-01' | 123456 | 987654
2 | '2020-12-01' | 135790 | 975310
...
Here estimationDate is the date when the estimation was made, userID is the ID of the user who made the estimation, and cityID is the ID of the city where the estimation was made.
I need to get the maximum number of estimations made by one user (I don't care which one, I don't need an ID) for each city.
Something like
SELECT cityID,*maximum number of estimations made by one user from this city* FROM estimationsTable GROUP BY cityID
Any idea?
Step by step:
Get the number of estimations per user and city.
Get the maximum of these numbers per city.
The query:
select cityid, max(cnt)
from
(
select cityid, userid, count(*) as cnt
from estimationstable
group by cityid, userid
) counted
group by cityid
order by cityid;
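The two steps can be checked against SQLite through Python's sqlite3 module. The sample rows are hypothetical: in city 10, user 1 made three estimations and user 2 made one; in city 20, users 3 and 4 made two each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table estimationstable (
    estimationid integer primary key, estimationdate text, userid integer, cityid integer)""")
# Hypothetical sample rows as (userid, cityid); the date is irrelevant here.
sample = [(1, 10), (1, 10), (1, 10), (2, 10), (3, 20), (3, 20), (4, 20), (4, 20)]
conn.executemany(
    "insert into estimationstable (estimationdate, userid, cityid) values ('2020-01-01', ?, ?)",
    sample,
)

# Step 1 (inner query): count per city and user. Step 2 (outer query): max per city.
result = conn.execute("""
    select cityid, max(cnt)
    from (
        select cityid, userid, count(*) as cnt
        from estimationstable
        group by cityid, userid
    ) counted
    group by cityid
    order by cityid
""").fetchall()
print(result)  # [(10, 3), (20, 2)]
```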
Try something like the following:
with cte as (
select userid,cityid,count(*) as cnt
from table_name group by userid,cityid
)
, cte2 as (
select *,
row_number() over(partition by cityid order by cnt desc) rn
from cte
) select * from cte2 where rn=1
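This variant also returns which user holds the maximum. A sketch in SQLite via Python's sqlite3 module, reusing hypothetical data where users 3 and 4 tie in city 20: the userid tie-breaker in the window's ORDER BY is an added assumption, since without it ROW_NUMBER() picks one of the tied users arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table estimationstable (userid integer, cityid integer)")
# Hypothetical rows as (userid, cityid).
conn.executemany(
    "insert into estimationstable values (?, ?)",
    [(1, 10), (1, 10), (1, 10), (2, 10), (3, 20), (3, 20), (4, 20), (4, 20)],
)

rows = conn.execute("""
    with cte as (
        select userid, cityid, count(*) as cnt
        from estimationstable
        group by userid, cityid
    ), cte2 as (
        select *,
               -- userid breaks ties so the result is deterministic (an assumption)
               row_number() over (partition by cityid order by cnt desc, userid) as rn
        from cte
    )
    select cityid, userid, cnt from cte2 where rn = 1 order by cityid
""").fetchall()
print(rows)  # [(10, 1, 3), (20, 3, 2)]
```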
Solution 1:
SELECT cityID, MAX(number_of_estimations) AS maximum_number_of_estimations
FROM (SELECT cityID, userID, COUNT(*) AS number_of_estimations
FROM estimationsTable
GROUP BY cityID, userID) AS final_query
GROUP BY cityID;
Solution 2:
Use ORDER BY COUNT(*) DESC together with GROUP BY.
The idea is that the inner query counts all the occurrences, grouped by your IDs, and the outer query gets the maximum of those counts. Alternatively, you use ORDER BY [Field] DESC with GROUP BY, which automatically puts the highest counts at the top.
In BigQuery, I think you can do this without a subquery, because window functions are evaluated after GROUP BY:
select distinct cityid,
first_value(userid) over (partition by cityid order by count(*) desc, userid) as userid,
max(count(*)) over (partition by cityid) as cnt
from estimationstable
group by cityid, userid

SQL MIN(value) matching row in PostgreSQL

I have the following table:
TABLE A:
ID | ID NAME | PRICE | CODE
00001 | B | 1000 | 1
00002 | A | 2000 | 1
00003 | C | 3000 | 1
Here is the SQL I use:
Select Min (ID),
Min (ID NAME),
Sum(PRICE)
From A
GROUP BY CODE
Here is what I get:
ID | ID NAME | PRICE
00001 | A | 6000
As you can see, ID NAME doesn't match the row with the minimum ID. I need them to match.
I would like the query to return the following:
ID | ID NAME | PRICE
00001 | B | 6000
What SQL can I use to get that result?
If you want one row, use limit or fetch first 1 row only:
select a.*
from a
order by a.price asc
fetch first 1 row only;
If, for some reason, you want the sum() of all prices, then you can use window functions:
select a.*, sum(a.price) over () as sum_prices
from a
order by a.price asc
fetch first 1 row only;
You can use the row_number() function:
select min(id), max(case when seq = 1 then id_name end) as id_name, sum(price) as price, code
from (select t.*, row_number() over (partition by code order by id) seq
from a t
) t
group by code;
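The trick is that row_number() marks the row with the smallest id in each code group (seq = 1), and max(case ...) picks that row's name while sum() still aggregates the whole group. A rough check in SQLite via Python's sqlite3 module, assuming the "ID NAME" column is named id_name as in the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id_name is assumed to be the real name of the "ID NAME" column.
conn.execute("create table a (id text, id_name text, price integer, code integer)")
conn.executemany(
    "insert into a values (?, ?, ?, ?)",
    [("00001", "B", 1000, 1), ("00002", "A", 2000, 1), ("00003", "C", 3000, 1)],
)

rows = conn.execute("""
    select min(id), max(case when seq = 1 then id_name end) as id_name,
           sum(price) as price, code
    from (select t.*, row_number() over (partition by code order by id) seq
          from a t
    ) t
    group by code
""").fetchall()
print(rows)  # [('00001', 'B', 6000, 1)]
```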
You can also use a subquery:
select t1.*,t2.* from
(select ID,Name from t where ID= (select min(ID) from t)
) as t1
cross join (select sum(Price) as total from t) as t2
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=a496232b552390a641c0e5c0fae791d1
id name total
1 B 6000

Select by greatest sum, but without the sum in the result

I need to select the top score of all combined attempts by a player and I need to use a WITH clause.
create table scorecard(
id integer primary key,
player_name varchar(20));
create table scores(
id integer references scorecard,
attempt integer,
score numeric,
primary key(id, attempt));
Sample Data for scorecard:
id player_name
1 Bob
2 Steve
3 Joe
4 Rob
Sample data for scores:
id attempt score
1 1 50
1 2 45
2 1 10
2 2 20
3 1 40
3 2 35
4 1 0
4 2 95
The results would simply look like this:
player_name
Bob
Rob
It would only be Bob if Rob had scored less than 95 in total. I've gotten as far as getting the name and the total score in two columns using this:
select scorecard.player_name, sum(scores.score)
from scorecard
left join scores
on scorecard.id= scores.id
group by scorecard.player_name
order by sum(scores.score) desc;
But how do I get just the names of the players with the highest total score (or scores, if tied)?
And remember, it should be using a WITH clause.
Whoever told you to "use a WITH clause" was missing a more efficient solution. To get just the (possibly multiple) winners:
SELECT c.player_name
FROM scorecard c
JOIN (
SELECT id, rank() OVER (ORDER BY sum(score) DESC) AS rnk
FROM scores
GROUP BY 1
) s USING (id)
WHERE s.rnk = 1;
A plain subquery is typically faster than a CTE. If you must use a WITH clause:
WITH top_score AS (
SELECT id, rank() OVER (ORDER BY sum(score) DESC) AS rnk
FROM scores
GROUP BY 1
)
SELECT c.player_name
FROM scorecard c
JOIN top_score s USING (id)
WHERE s.rnk = 1;
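Because rank() is applied to the already aggregated sum(score), tied totals share rank 1. A sketch that runs the WITH version against the question's sample data in SQLite (via Python's sqlite3 module), with an ORDER BY added for stable output; the final update is a hypothetical change that drops Rob's total below 95, which should leave only Bob.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table scorecard (id integer primary key, player_name varchar(20));
    create table scores (
        id integer references scorecard,
        attempt integer,
        score numeric,
        primary key (id, attempt)
    );
    insert into scorecard values (1, 'Bob'), (2, 'Steve'), (3, 'Joe'), (4, 'Rob');
    insert into scores values
        (1, 1, 50), (1, 2, 45), (2, 1, 10), (2, 2, 20),
        (3, 1, 40), (3, 2, 35), (4, 1, 0), (4, 2, 95);
""")

WINNERS = """
    with top_score as (
        select id, rank() over (order by sum(score) desc) as rnk
        from scores
        group by id
    )
    select c.player_name
    from scorecard c
    join top_score s using (id)
    where s.rnk = 1
    order by c.player_name
"""

tied = conn.execute(WINNERS).fetchall()
print(tied)  # [('Bob',), ('Rob',)]

# Hypothetical change: Rob's second attempt scores 90, so his total is 90 < 95.
conn.execute("update scores set score = 90 where id = 4 and attempt = 2")
single = conn.execute(WINNERS).fetchall()
print(single)  # [('Bob',)]
```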
You could add a final ORDER BY c.player_name to get a stable sort order, but that's not requested.
The key feature of the query is that you can run a window function like rank() over the result of an aggregate function. Related:
Postgres window function and group by exception
Get the distinct sum of a joined table column
You can try something like the following:
WITH sumScoresTable AS (
SELECT id, sum(score) AS sum_scores
FROM scores
GROUP BY id
), maxScoresTable AS (
SELECT max(sum_scores) AS max_scores
FROM sumScoresTable
)
SELECT player_name
FROM scorecard
WHERE scorecard.id IN (SELECT sumScoresTable.id
FROM sumScoresTable
WHERE sumScoresTable.sum_scores = (SELECT maxScoresTable.max_scores FROM maxScoresTable));
Try this code:
WITH CTE AS (
SELECT ID, RANK() OVER(ORDER BY SumScore DESC) As R
FROM (
SELECT ID, SUM(score) AS SumScore
FROM scores
GROUP BY ID ) AS s
)
SELECT player_name
FROM scorecard
WHERE ID IN (SELECT ID FROM CTE WHERE R = 1)

ORDER BY ROW_NUMBER

UPDATE: Thanks, everyone. Topic closed; after sleeping on it, I understand everything. =)
I have a problem understanding the OVER clause and the ROW_NUMBER function. I have a simple table with a name and a mark, and I want to calculate the average mark for each name.
SELECT top 1 with ties name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM table
ORDER BY AVG(mark) OVER(PARTITION BY name)
It displays something like this, and I understand why; that is what ROW_NUMBER() does:
name|number
Pete 1
Pete 2
But if I write
SELECT top 1 with ties name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM table
ORDER BY AVG(mark) OVER(PARTITION BY name), number
it will display
name|number
Pete 1
And this time I don't understand how ORDER BY works with ROW_NUMBER() function. Can somebody explain it to me?
You can certainly order by the ROW_NUMBER column, because the SELECT clause is evaluated before the ORDER BY clause, so ORDER BY can reference any column or column alias from the select list. That is why no error was raised: the query is valid.
SELECT name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM #table
ORDER BY number
Evaluates to
name number
---------- --------------------
John 1
pete 1
pete 2
John 2
pete 3
OP's second example of row_number is not correct.
SELECT AVG(mark) OVER(PARTITION BY name), name, ROW_NUMBER() over (PARTITION BY name ORDER BY name) as number
FROM #table
ORDER BY AVG(mark) OVER(PARTITION BY name), number
Returns as expected because AVG is the first sort column followed by number.
AVG(mark) name number
----------- ---------- --------------------
11 pete 1
11 pete 2
11 pete 3
17 John 1
17 John 2
If you change the sort to number DESC, pete is still first, but the row numbers are in descending order:
AVG(mark) name number
----------- ---------- --------------------
11 pete 3
11 pete 2
11 pete 1
17 John 2
17 John 1
SQL Order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
You can't ORDER BY the ROW_NUMBER directly: I don't know why you didn't get an error in this case, but normally you would. Hence the use of derived tables or CTEs:
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) as number,
AVG(mark) OVER (PARTITION BY name) AS nameavg
FROM table
) foo
ORDER BY
nameavg, number
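Whatever one concludes about ordering by the window alias directly, the derived-table form can be checked in SQLite via Python's sqlite3 module. The marks table and its rows are hypothetical, chosen so that pete averages 11 and John averages 17 as in the outputs above; the ROW_NUMBER() numbering within each name is arbitrary, but each name still gets 1..n, so sorting by nameavg, number is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data: pete averages 11, John averages 17.
conn.execute("create table marks (name text, mark integer)")
conn.executemany(
    "insert into marks values (?, ?)",
    [("pete", 10), ("pete", 11), ("pete", 12), ("John", 16), ("John", 18)],
)

rows = conn.execute("""
    select name, number
    from (
        select name,
               row_number() over (partition by name order by name) as number,
               avg(mark) over (partition by name) as nameavg
        from marks
    ) foo
    order by nameavg, number
""").fetchall()
print(rows)  # [('pete', 1), ('pete', 2), ('pete', 3), ('John', 1), ('John', 2)]
```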
However, PARTITION BY name ORDER BY name is meaningless: every row in a partition has the same name, so the ORDER BY imposes no order and the numbering within each partition is arbitrary.
I suspect you want something like this, where ROW_NUMBER is based on AVG:
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (ORDER BY nameavg) AS number
FROM
(
SELECT
name,
AVG(mark) OVER (PARTITION BY name) AS nameavg
FROM table
) foo
) bar
ORDER BY
number
Or, more traditionally, with GROUP BY (each name is collapsed to a single row holding its average):
SELECT
name, number
FROM
(
SELECT
name,
ROW_NUMBER() OVER (ORDER BY nameavg) AS number
FROM
(
SELECT
name,
AVG(mark) AS nameavg
FROM
table
GROUP BY
name
) foo
) bar
ORDER BY
number
You can probably collapse the derived tables foo and bar into a single GROUP BY name query by using
ROW_NUMBER() OVER (ORDER BY AVG(mark))
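In a GROUP BY name query, window functions run over the grouped rows, so each name appears once. Numbering names by their average therefore needs a window without PARTITION BY name; with it, every partition holds a single row and every number is 1. A sketch in SQLite via Python's sqlite3 module, using a hypothetical marks table where pete averages 11 and John averages 17:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data.
conn.execute("create table marks (name text, mark integer)")
conn.executemany(
    "insert into marks values (?, ?)",
    [("pete", 10), ("pete", 11), ("pete", 12), ("John", 16), ("John", 18)],
)

# Number the names by their average mark (no PARTITION BY).
ranked = conn.execute("""
    select name, row_number() over (order by avg(mark)) as number
    from marks
    group by name
    order by number
""").fetchall()

# Same query with PARTITION BY name: one grouped row per partition, so all 1s.
partitioned = conn.execute("""
    select name, row_number() over (partition by name order by avg(mark)) as number
    from marks
    group by name
    order by name
""").fetchall()

print(ranked)       # [('pete', 1), ('John', 2)]
print(partitioned)  # [('John', 1), ('pete', 1)]
```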
But none of this makes sense: I understand that your question is an abstract one about how this works, but it is unclear. It would make more sense if you described what you want in plain English, with sample input and output.