How to SELECT the top 3 based on an attribute value? - sql

There is a table with 4 columns: id, student_name, phone_num, score.
I want to select the top 3 students based on score.
table:
id|student_name |phone_num|score
1 | James | 001350 | 89
2 | Roomi | 123012 | 78
3 | Sibay | 123012 | 65
4 | Ellae | 123012 | 78
5 | Katee | 123012 | 33
As the table shows, two students have the same score,
so they share the same rank.
I tried to use 'LIMIT', but it can only select exactly 3 rows.
SELECT id,student_name,score
FROM table
GROUP BY id,student_name,score
ORDER BY score
LIMIT 3
Expected results:
id|student_name |score
1 | James | 89
2 | Roomi | 78
4 | Ellae | 78
3 | Sibay | 65
Thanks!

You'll want to use a ranking function - I'd recommend Dense Rank.
; with CTE as
(Select ID, Student_Name, Score, Dense_Rank() over (order by score desc) as DR
From Table)
Select *
from CTE
where DR <= 3
To expand on this function:
Dense_Rank will assign tied values the same number, then assign the next highest value the next highest number (in comparison to Rank, which will skip ranks if there are ties). For example:
Value Rank Dense_Rank
1 1 1
3 2 2
3 2 2
7 4 3
8 5 4
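To check the DENSE_RANK query against the rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption for illustration only: the question does not name a database, the table name students is hypothetical, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3


def top_three_by_score():
    """Return every student whose score is among the 3 highest distinct scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students "
        "(id INTEGER, student_name TEXT, phone_num TEXT, score INTEGER)"
    )
    # Rows copied from the question.
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?, ?)",
        [
            (1, "James", "001350", 89),
            (2, "Roomi", "123012", 78),
            (3, "Sibay", "123012", 65),
            (4, "Ellae", "123012", 78),
            (5, "Katee", "123012", 33),
        ],
    )
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT id, student_name, score,
                   DENSE_RANK() OVER (ORDER BY score DESC) AS dr
            FROM students
        )
        SELECT id, student_name, score
        FROM cte
        WHERE dr <= 3
        ORDER BY score DESC, id
        """
    ).fetchall()
    conn.close()
    return rows


for row in top_three_by_score():
    print(row)
```

Four rows come back rather than three, because the two students with a score of 78 share dense rank 2 and the student with 65 still gets dense rank 3.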

You want to use DENSE_RANK here:
WITH cte AS (
SELECT id, student_name, score,
DENSE_RANK() OVER (ORDER BY score DESC) dr
FROM yourTable
)
SELECT id, student_name, score
FROM cte
WHERE dr <= 3
ORDER BY score DESC;
Another way, using a subquery to find the top 3 distinct highest scores:
SELECT id, student_name, score
FROM yourTable
WHERE score IN (SELECT DISTINCT TOP 3 score FROM yourTable ORDER BY score DESC)
ORDER BY score DESC;
This second approach is similar to what you were trying to do.

Related

Django: Is there a way to apply an aggregate function on a window function?

I have already made a raw SQL of this query as a last resort.
I have a gaps-and-islands problem, where I get the respective groups with two ROW_NUMBER() calls. Later on I use a COUNT and a MAX like so:
SELECT id, name, MAX(count)
FROM (
SELECT id, name, COUNT(*)
FROM (
SELECT players.id, players.name,
(ROW_NUMBER() OVER(ORDER BY match_details.id, goals.time) -
ROW_NUMBER() OVER(PARTITION BY match_details.id, players.id ORDER BY match_details.id, goals.time)) AS grp
FROM match_details
JOIN players
ON players.id = match_details.player_id
JOIN goals
ON goals.match_detail_id = match_details.id
ORDER BY match_details.id, goals.time
) AS x
GROUP BY grp, id, name
ORDER BY count DESC
) AS y
GROUP BY id, name
ORDER BY MAX(count) DESC, name
players example:
id | name
----+-------
1 | John
2 | Mark
match_details example:
id | player_id
----+------------
1 | 1
2 | 1
3 | 2
4 | 2
goals example:
id | match_detail_id | time
----+------------------+---------
1 | 1 | 2
2 | 1 | 10
3 | 2 | 2
4 | 3 | 1
5 | 3 | 5
6 | 4 | 6
output example:
id | name | max
----+--------+---------
1 | John | 2
2 | Mark | 2
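The grouping trick in the raw query (the difference of two ROW_NUMBER() values) can be traced in plain Python. This is only a sketch of the SQL logic, not a Django ORM solution; the function name longest_streaks is hypothetical, and the input is the sample data above, already joined and sorted by match_details.id and goals.time.

```python
from collections import Counter, defaultdict

# (match_detail_id, player_id, player_name), one tuple per goal,
# in the order produced by ORDER BY match_details.id, goals.time.
goals = [
    (1, 1, "John"),
    (1, 1, "John"),
    (2, 1, "John"),
    (3, 2, "Mark"),
    (3, 2, "Mark"),
    (4, 2, "Mark"),
]


def longest_streaks(rows):
    """Mimic the two ROW_NUMBER() calls, then COUNT(*) per group and MAX per player."""
    per_partition = defaultdict(int)  # row number within (match, player)
    group_sizes = Counter()           # COUNT(*) ... GROUP BY grp, id, name
    for overall_rn, (match_id, player_id, name) in enumerate(rows, start=1):
        per_partition[(match_id, player_id)] += 1
        # Rows of the same island share the same difference.
        grp = overall_rn - per_partition[(match_id, player_id)]
        group_sizes[(grp, player_id, name)] += 1

    best = {}  # MAX(count) ... GROUP BY id, name
    for (_, player_id, name), size in group_sizes.items():
        key = (player_id, name)
        best[key] = max(best.get(key, 0), size)
    # Sorted by player id here for simplicity; the SQL sorts by MAX(count) DESC, name.
    return sorted((pid, name, size) for (pid, name), size in best.items())


print(longest_streaks(goals))
```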
So far, I have finished the innermost query with the Django ORM, but when I try to annotate over the group, it throws an error:
django.db.utils.ProgrammingError: aggregate function calls cannot contain window function calls
I haven't yet wrapped my head around using Subquery, but I'm also not sure if that would work at all. I do not need to filter over the window function, only use aggregates on it.
Is there a way to solve this with plain Django, or do I have to resort to hybrid raw-ORM queries, or perhaps to django-cte?

How to account for Postgresql rank() ties

I have a table teams with 30 rows that stores a handful of statistics as attributes, for example goals for, goals against, etc. I've created a view that uses rank() and does a good job of ranking the records. Here's an abridged query example and the resulting table:
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_rank
FROM teams;
name | points | point_rank
-----------------------+-----------+----------------
Team 1 | 14 | 1
Team 2 | 11 | 2
Team 3 | 9 | 3
Team 4 | 9 | 3
I would like to add an additional column that returns a boolean based on whether or not the rank is a tie, e.g. Team 3 and Team 4 in this example. It might look something like this:
name | points | point_rank | tie
-----------------------+-----------+----------------+----------------
Team 1 | 14 | 1 | false
Team 2 | 11 | 2 | false
Team 3 | 9 | 3 | true
Team 4 | 9 | 3 | true
Any ideas here? Or am I approaching this incorrectly and abusing rank() here? Thanks in advance!
You could use a CTE and then use the lag/lead functions to check for ties:
with ranked as (
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_rank
FROM teams
)
select name, points, point_rank,
( point_rank = lag(point_rank, 1, -1::bigint) over (order by point_rank)
or point_rank = lead(point_rank, 1, -1::bigint) over (order by point_rank)
) as is_tie
from ranked;
The default value for the lag and lead function is needed for the first and last row, to avoid checking for null there.
Example: https://dbfiddle.uk/-01aFLr4
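For a quick local check of the lag/lead idea, here is a minimal sketch run through Python's sqlite3 module. This is an assumption for illustration: the question targets PostgreSQL, so the ::bigint cast is dropped, and SQLite returns 0/1 instead of a boolean, which the code converts.

```python
import sqlite3


def ranks_with_tie_flag():
    """Rank teams by points and flag ranks shared with a neighbouring row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teams (name TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO teams VALUES (?, ?)",
        [("Team 1", 14), ("Team 2", 11), ("Team 3", 9), ("Team 4", 9)],
    )
    rows = conn.execute(
        """
        WITH ranked AS (
            SELECT name, points,
                   RANK() OVER (ORDER BY points DESC) AS point_rank
            FROM teams
        )
        SELECT name, points, point_rank,
               (point_rank = LAG(point_rank, 1, -1) OVER (ORDER BY point_rank)
                OR point_rank = LEAD(point_rank, 1, -1) OVER (ORDER BY point_rank)
               ) AS is_tie
        FROM ranked
        ORDER BY point_rank, name
        """
    ).fetchall()
    conn.close()
    # Convert SQLite's 0/1 into Python booleans.
    return [(name, points, rank, bool(tie)) for name, points, rank, tie in rows]


for row in ranks_with_tie_flag():
    print(row)
```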
One option would be to place your current query into a common table expression and then use it to identify which ranks are duplicate:
WITH cte AS (
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_rank
FROM teams
)
SELECT cte.name,
cte.points,
cte.point_rank,
CASE WHEN t.point_rank IS NOT NULL THEN 'false' ELSE 'true' END AS tie
FROM cte
LEFT JOIN
(
SELECT point_rank
FROM cte
GROUP BY point_rank
HAVING COUNT(*) = 1
) t
ON cte.point_rank = t.point_rank
SELECT name
, points
, rank() OVER (rrr) AS point_rank
-- , count(*) OVER (ppp) AS ppp_cnt
, rank() OVER (pp2) AS sub_rank
, (COUNT(*) OVER (ppp) > 1) AS is_tie
FROM teams
WINDOW ppp AS (PARTITION BY points )
, pp2 AS (PARTITION BY points ORDER BY ctid )
, rrr AS (ORDER BY points DESC)
ORDER BY points DESC
;
Result (I added two extra rows):
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 6
name | points | point_rank | sub_rank | is_tie
--------+--------+------------+----------+--------
Team_1 | 14 | 1 | 1 | f
Team_2 | 11 | 2 | 1 | f
Team_3 | 9 | 3 | 1 | t
Team_4 | 9 | 3 | 2 | t
Team_5 | 5 | 5 | 1 | t
Team_6 | 5 | 5 | 2 | t
(6 rows)

sql aggregate data

This is not a DBMS-specific question, but a generic SQL problem.
I have this dataset:
userid | objecteid| count
--------------------------
1 | 1 | 12
1 | 2 | 15
1 | 3 | 6
2 | 4 | 30
2 | 1 | 1
2 | 5 | 9
With one query I need to find, for each user, the object with the maximum count.
I'm looking for a result like this:
userid | objecteid| count
--------------------------
1 | 2 | 15
2 | 4 | 30
because object 2 has the max count for user 1 and object 4 has the max count for user 2.
This can easily be solved using window functions.
The following is standard ANSI SQL:
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
max("count") over (partition by userid) as max_cnt
from the_table
) t
where "count" = max_cnt;
If there are two objects with the same count, both will be returned.
Alternatively this can also be done using row_number() instead:
select userid, objecteid, "count"
from (
select userid, objecteid, "count",
row_number() over (partition by userid order by "count" desc) as rn
from the_table
) t
where rn = 1;
Unlike the first query, this will only pick one row if a user has more than one object with the same count. If you want those duplicates returned, use dense_rank() instead of row_number().
SQLFiddle: http://sqlfiddle.com/#!15/f02a9/1
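The difference between the two queries only shows up when a user has a tied maximum. Here is a minimal sketch using Python's sqlite3 module (an assumption, since the question is database-agnostic). The row (1, 7, 15) is a hypothetical addition to the question's data, included only to create a tie.

```python
import sqlite3


def max_per_user():
    """Compare MAX() OVER with ROW_NUMBER() when a user has a tied maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE the_table (userid INTEGER, objecteid INTEGER, "count" INTEGER)'
    )
    conn.executemany(
        "INSERT INTO the_table VALUES (?, ?, ?)",
        [
            (1, 1, 12), (1, 2, 15), (1, 3, 6),
            (2, 4, 30), (2, 1, 1), (2, 5, 9),
            (1, 7, 15),  # hypothetical extra row: ties with object 2 for user 1
        ],
    )
    with_ties = conn.execute(
        """
        SELECT userid, objecteid, "count"
        FROM (
            SELECT userid, objecteid, "count",
                   MAX("count") OVER (PARTITION BY userid) AS max_cnt
            FROM the_table
        ) t
        WHERE "count" = max_cnt
        ORDER BY userid, objecteid
        """
    ).fetchall()
    one_per_user = conn.execute(
        """
        SELECT userid, objecteid, "count"
        FROM (
            SELECT userid, objecteid, "count",
                   ROW_NUMBER() OVER (PARTITION BY userid ORDER BY "count" DESC) AS rn
            FROM the_table
        ) t
        WHERE rn = 1
        ORDER BY userid
        """
    ).fetchall()
    conn.close()
    return with_ties, one_per_user


with_ties, one_per_user = max_per_user()
print(with_ties)     # both tied objects for user 1 appear
print(one_per_user)  # exactly one row per user; which tied object wins is arbitrary
```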
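Try this. The subquery has to be correlated on userid; otherwise a row whose count equals another user's maximum would also be returned:
Select * from tableName t
where t.count = (
Select Max(count)
from tableName
where userid = t.userid
)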

Renumber dynamic column without update in SQL Server

I have this data
5 | Batman
5 | Superman
5 | Wonderwomen
6 | Green Lantern
6 | Green Arrow
7 | Cyborg
When I run a SELECT query, I want to renumber it to:
1 | Batman
1 | Superman
1 | Wonderwomen
2 | Green Lantern
2 | Green Arrow
3 | Cyborg
Thoughts?
EDIT:
Thanks to vittore, I came up with this solution. I'm not sure if my query is good.
I apply ROW_NUMBER() twice. In case my sequence Id jumps, this query will renumber correctly.
WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY id ORDER BY id asc) AS CteId
FROM MyTable
)
SELECT
ROW_NUMBER() OVER(PARTITION BY CteId ORDER BY CteId asc) AS RenumberColumn
FROM cte
RANK function is what you are looking for
select RANK() OVER (ORDER BY id), name
from t
Check row_number() and dense_rank() as well when you read about it.
UPDATE: If you use rank alone, it will not give you the values you want ( 1 1 1 2 2 3 ), but ranked values with gaps ( 1 1 1 4 4 6 ).
So in order to get (1 2 3), group, rank and join:
select a.r, t.name from t
inner join (select id, rank() over (order by id asc) r
from t group by id) a
on t.id = a.id
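Since the answer also points at dense_rank(), here is a minimal sketch showing that DENSE_RANK() alone yields the gap-free numbering without the extra join. It runs through Python's sqlite3 module as an assumption for illustration (the question is about SQL Server), and the column name name is taken from the answer's query.

```python
import sqlite3


def renumber():
    """Renumber ids 5, 6, 7 as 1, 2, 3 using DENSE_RANK()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [
            (5, "Batman"), (5, "Superman"), (5, "Wonderwomen"),
            (6, "Green Lantern"), (6, "Green Arrow"),
            (7, "Cyborg"),
        ],
    )
    rows = conn.execute(
        """
        SELECT DENSE_RANK() OVER (ORDER BY id) AS r, name
        FROM t
        ORDER BY id, name
        """
    ).fetchall()
    conn.close()
    return rows


for row in renumber():
    print(row)
```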
If it's always -4, then:
Select (number-4), name
from table
But I doubt it's that simple.

SQL RANK() versus ROW_NUMBER()

I'm confused about the differences between these. Running the following SQL gets me two identical result sets. Can someone please explain the differences?
SELECT ID, [Description], RANK() OVER(PARTITION BY StyleID ORDER BY ID) as 'Rank' FROM SubStyle
SELECT ID, [Description], ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) as 'RowNumber' FROM SubStyle
You will only see the difference if you have ties within a partition for a particular ordering value.
RANK and DENSE_RANK are deterministic in this case, all rows with the same value for both the ordering and partitioning columns will end up with an equal result, whereas ROW_NUMBER will arbitrarily (non deterministically) assign an incrementing result to the tied rows.
Example: (All rows have the same StyleID so are in the same partition and within that partition the first 3 rows are tied when ordered by ID)
WITH T(StyleID, ID)
AS (SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,2)
SELECT *,
RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS [RANK],
ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS [ROW_NUMBER],
DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS [DENSE_RANK]
FROM T
Returns
StyleID ID RANK ROW_NUMBER DENSE_RANK
----------- -------- --------- --------------- ----------
1 1 1 1 1
1 1 1 2 1
1 1 1 3 1
1 2 4 4 2
You can see that for the three identical rows the ROW_NUMBER increments, the RANK value remains the same then it leaps to 4. DENSE_RANK also assigns the same rank to all three rows but then the next distinct value is assigned a value of 2.
ROW_NUMBER: Returns a unique number for each row starting with 1. For rows that have duplicate values, numbers are arbitrarily assigned.
RANK: Assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.
This article covers an interesting relationship between ROW_NUMBER() and DENSE_RANK() (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() on a SELECT DISTINCT statement, the ROW_NUMBER() will produce distinct values before they are removed by the DISTINCT keyword. E.g. this query
SELECT DISTINCT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... might produce this result (DISTINCT has no effect):
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
+---+------------+
Whereas this query:
SELECT DISTINCT
v,
DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... produces what you probably want in this case:
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
+---+------------+
Note that the ORDER BY clause of the DENSE_RANK() function will need all other columns from the SELECT DISTINCT clause to work properly.
The reason for this is that logically, window functions are calculated before DISTINCT is applied.
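The interaction between DISTINCT and the two functions can be reproduced with a minimal sketch using Python's sqlite3 module. This is an assumption for illustration: the article does not name SQLite, and the aliases rn and dr are hypothetical. The table t holds the same eight values of v as in the tables above.

```python
import sqlite3


def distinct_with_window():
    """Run SELECT DISTINCT with ROW_NUMBER() and with DENSE_RANK() on the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v TEXT)")
    # Values a, a, a, b, c, c, d, e.
    conn.executemany("INSERT INTO t VALUES (?)", [(c,) for c in "aaabccde"])
    with_row_number = conn.execute(
        """
        SELECT DISTINCT v, ROW_NUMBER() OVER (ORDER BY v) AS rn
        FROM t
        ORDER BY v, rn
        """
    ).fetchall()
    with_dense_rank = conn.execute(
        """
        SELECT DISTINCT v, DENSE_RANK() OVER (ORDER BY v) AS dr
        FROM t
        ORDER BY v, dr
        """
    ).fetchall()
    conn.close()
    return with_row_number, with_dense_rank


with_row_number, with_dense_rank = distinct_with_window()
print(len(with_row_number))  # every (v, rn) pair is unique, so DISTINCT removes nothing
print(with_dense_rank)       # duplicates of v share a rank, so DISTINCT collapses them
```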
All three functions in comparison
Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):
SELECT
v,
ROW_NUMBER() OVER (window) row_number,
RANK() OVER (window) rank,
DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v
... you'll get:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
Simple query without partition clause:
select
sal,
RANK() over(order by sal desc) as Rank,
DENSE_RANK() over(order by sal desc) as DenseRank,
ROW_NUMBER() over(order by sal desc) as RowNumber
from employee
Output:
--------|-------|-----------|----------
sal |Rank |DenseRank |RowNumber
--------|-------|-----------|----------
5000 |1 |1 |1
3000 |2 |2 |2
3000 |2 |2 |3
2975 |4 |3 |4
2850 |5 |4 |5
--------|-------|-----------|----------
Quite a bit:
The rank of a row is one plus the number of ranks that come before the row in question.
Row_number is the distinct rank of rows, without any gap in the ranking.
http://www.bidn.com/blogs/marcoadf/bidn-blog/379/ranking-functions-row_number-vs-rank-vs-dense_rank-vs-ntile
Note that all these windowing functions return an integer-like value.
Often the database will choose a BIGINT datatype, which takes much more space than we need. We will rarely need a range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.
Cast the results as a BYTEINT, SMALLINT, or INTEGER.
Modern systems and hardware are powerful enough that you may never see a meaningful extra use of resources, but I think it's best practice.
Look at this example.
CREATE TABLE [dbo].#TestTable(
[id] [int] NOT NULL,
[create_date] [date] NOT NULL,
[info1] [varchar](50) NOT NULL,
[info2] [varchar](50) NOT NULL,
)
Insert some data
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/2/09', 'Red', 'Yellow')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/3/09', 'Orange', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/1/09', 'Yellow', 'Blue')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/5/09', 'Blue', 'Orange')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/2/09', 'Green', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/8/09', 'Red', 'Blue')
Repeat the same values for id 1
INSERT INTO dbo.#TestTable (id, create_date, info1, info2) VALUES (1,
'1/1/09', 'Blue', 'Green')
Look at all the rows
SELECT * FROM #TestTable
Look at your results
SELECT Id,
create_date,
info1,
info2,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY create_date DESC) AS RowId,
RANK() OVER(PARTITION BY Id ORDER BY create_date DESC) AS [RANK]
FROM #TestTable
Compare the RowId and [RANK] columns for the duplicated rows to understand the difference.
I haven't done anything with rank, but I discovered this today with row_number().
select item, name, sold, row_number() over(partition by item order by sold) as row from table_name
This will result in some repeating row numbers since in my case each name holds all items. Each item will be ordered by how many were sold.
+--------+------+-----+----+
|glasses |store1| 30 | 1 |
|glasses |store2| 35 | 2 |
|glasses |store3| 40 | 3 |
|shoes |store2| 10 | 1 |
|shoes |store1| 20 | 2 |
|shoes |store3| 22 | 3 |
+--------+------+-----+----+
Also, pay attention to ORDER BY in PARTITION (Standard AdventureWorks db is used for example) when using RANK.
SELECT as1.SalesOrderID, as1.SalesOrderDetailID,
RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID) rank_same_as_partition,
RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderDetailID) rank_salesorderdetailid
FROM Sales.SalesOrderDetail as1
WHERE SalesOrderID = 43659
ORDER BY SalesOrderDetailID;
Gives result:
SalesOrderID SalesOrderDetailID rank_same_as_partition rank_salesorderdetailid
43659 1 1 1
43659 2 1 2
43659 3 1 3
43659 4 1 4
43659 5 1 5
43659 6 1 6
43659 7 1 7
43659 8 1 8
43659 9 1 9
43659 10 1 10
43659 11 1 11
43659 12 1 12
But if we change the ORDER BY to use OrderQty:
SELECT as1.SalesOrderID, as1.OrderQty,
RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID) rank_salesorderid,
RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.OrderQty) rank_orderqty
FROM Sales.SalesOrderDetail as1
WHERE SalesOrderID = 43659
ORDER BY OrderQty;
Gives:
SalesOrderID OrderQty rank_salesorderid rank_orderqty
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 2 1 7
43659 2 1 7
43659 3 1 9
43659 3 1 9
43659 4 1 11
43659 6 1 12
Notice how the Rank changes when we use OrderQty (rightmost column second table) in ORDER BY and how it changes when we use SalesOrderDetailID (rightmost column first table) in ORDER BY.