How can I get the rank of rows relative to total number of rows based on a field? - sql

I have a scores table that has two fields:
user_id
score
I'm fetching specific rows that match a list of user_id's. How can I determine a rank for each row relative to the total number of rows, based on score? The rows in the result set are not necessarily sequential (the scores will vary widely from one row to the next). I'm not sure if this matters, but user_id is a unique field.
Edit
@Greelmo
I'm already ordering the rows. If I fetch 15 rows, I don't want the rank to be 1-15. I need it to be the position of that row compared against the entire table by the score property. So if I have 200 rows, one row's rank may be 3 and another may be 179 (these are arbitrary #'s for example only).
Edit 2
I'm having some luck with this query, but I actually want to avoid ties
SELECT
s.score
, s.created_at
, u.name
, u.location
, u.icon_id
, u.photo
, (SELECT COUNT(*) + 1 FROM scores WHERE score > s.score) AS rank
FROM
scores s
LEFT JOIN
users u ON u.uID = s.user_id
ORDER BY
s.score DESC
, s.created_at DESC
LIMIT 15
If two or more rows have the same score, I want the latest one (or earliest - I don't care) to be ranked higher. I tried modifying the subquery with AND id > s.id but that ended up giving me an unexpected result set and different ties.

Select S.score, S.created_at, U.name
, U.location, U.icon_id, U.photo
, (Select Count(*) + 1
From scores S2
Where S2.score > S.score
Or (S2.score = S.Score And S2.created_at > S.created_at)
) AS rank
From scores S
Left Join users U
On U.uID = S.user_id
Order By S.score DESC, S.created_at DESC
LIMIT 15
Of course, if it is possible for two scores to have the same created_at date, then you will still get ties and need to determine a third tie-breaker.
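For anyone who wants to try the tie-breaking idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the created_at values and the alias rnk (used instead of rank, which is a keyword in some engines) are made up for illustration. The rank is computed against the whole table even though only three users are fetched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (user_id INTEGER PRIMARY KEY, score INTEGER, created_at TEXT)"
)
# Hypothetical sample data: users 3 and 4 are tied on score.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (1, 50, "2024-01-01"),
        (2, 90, "2024-01-02"),
        (3, 70, "2024-01-03"),
        (4, 70, "2024-01-04"),  # same score as user 3, but newer
        (5, 10, "2024-01-05"),
    ],
)

# Rank = 1 + number of rows in the WHOLE table that sort ahead of this row.
# A row sorts ahead if it has a higher score, or the same score and a
# newer created_at (the tie-breaker).
RANK_SQL = """
SELECT s.user_id,
       (SELECT COUNT(*) + 1
          FROM scores s2
         WHERE s2.score > s.score
            OR (s2.score = s.score AND s2.created_at > s.created_at)) AS rnk
  FROM scores s
 WHERE s.user_id IN (3, 4, 5)   -- only the users we actually fetch
 ORDER BY rnk
"""
ranks = conn.execute(RANK_SQL).fetchall()
print(ranks)
```

With this data, users 3 and 4 get different ranks because the newer row counts as sorting ahead of the older one.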

You could order the data in your query.
SELECT user_id, score FROM table ORDER BY score ASC;
This will give you your data from lowest score to highest.
If this doesn't answer your question, then I don't understand what you're asking.
EDIT
To get the position, while iterating through the database results, just keep a counter.

Related

How to Rank Based on Multiple Columns

I'm trying to score people in Microsoft Access based on the count they have for a particular category.
There are 7 possible categories a person can have against them, and I want to assign each person a score from 1-7, with 1 being assigned to the highest scoring category, 7 being the lowest. They might not have an answer for every category, in which case that category can be ignored.
The aim would be to have an output result as shown in this image:
I've tried a few different things, including partition over and joins, but none have worked. To be honest I think I'm way off the mark with the queries I've been trying. I've tried to write the code in SQL from scratch, and used query builder.
Any help is really appreciated!
As an email can have duplicate counts, you will need two subqueries for this:
SELECT
Score.email,
Score.category,
Score.[Count],
(Select Count(*) From Score As T Where
T.email = Score.email And
T.[Count] >= Score.[Count])-
(Select Count(*) From Score As S Where
S.email = Score.email And
S.[Count] = Score.[Count] And
S.category > Score.category) AS Rank
FROM
Score
ORDER BY
Score.email,
Score.[Count] DESC,
Score.category;
For categories with equal Count values for the same email, the following will rank the records alphabetically descending by Category name (since this is what is shown in your example):
select t.email, t.category, t.count,
(
select count(*) from YourTable u
where t.email = u.email and
((t.count = u.count and t.category <= u.category) or t.count < u.count)
) as rank
from YourTable t
order by t.email, t.count desc, t.category desc
Change both references of YourTable to the name of your table.
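To see the per-email ranking with the category tie-breaker in action, here is a rough sketch run through SQLite from Python rather than Access. The table name Score, the column cnt (standing in for Count) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Score (email TEXT, category TEXT, cnt INTEGER)")
# Hypothetical sample data: categories B and C are tied for a@x.
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?)",
    [
        ("a@x", "A", 5),
        ("a@x", "B", 3),
        ("a@x", "C", 3),
        ("a@x", "D", 1),
        ("b@x", "A", 2),
        ("b@x", "C", 7),
    ],
)

# For each row, count the rows of the same email that sort at or ahead of it:
# a strictly larger count, or an equal count with the same or a later
# category name (so ties rank alphabetically descending).
RANK_SQL = """
SELECT t.email, t.category,
       (SELECT COUNT(*)
          FROM Score u
         WHERE u.email = t.email
           AND (u.cnt > t.cnt
                OR (u.cnt = t.cnt AND u.category >= t.category))) AS rnk
  FROM Score t
 ORDER BY t.email, rnk
"""
ranked = conn.execute(RANK_SQL).fetchall()
for row in ranked:
    print(row)
```

Because the category comparison includes the row itself, every row gets a distinct rank within its email as long as (email, category) is unique.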

Hive Script, DISTINCT with SUM

I am trying to find the distinct teams a player played for in any single season, and then count the number of teams he played for. This is tripping me up, and of course I have a sample down below (the second query). The first one is my failed attempt.
SELECT o.id,o.year,COUNT(DISTINCT(o.team)) b JOIN
(SELECT id, year, team FROM batting
GROUP BY id,year,team
ORDER BY id DESC
LIMIT 25) o
0.id =b.id;
SELECT id, year, team FROM batting
GROUP BY id,year,team
ORDER BY id DESC
LIMIT 25;
produces
Ignore the ^A characters; I think they represent either a space or a comma, just the column separator.
Get the count of teams for each player for each year, order by the count descending, and take the first row:
SELECT id, year, COUNT(DISTINCT(team)) FROM batting
GROUP BY id,year
ORDER BY COUNT(DISTINCT(team)) DESC
LIMIT 1;
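A small sketch of the same grouping, run against SQLite from Python instead of Hive, so syntax details may differ on a real cluster. The batting rows and the alias teams are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (id TEXT, year INTEGER, team TEXT)")
# Hypothetical sample data; p1 has two rows for BOS in 2001 (two stints).
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [
        ("p1", 2001, "NYA"),
        ("p1", 2001, "BOS"),
        ("p1", 2001, "BOS"),
        ("p1", 2002, "BOS"),
        ("p2", 2001, "CHN"),
        ("p2", 2001, "SLN"),
        ("p2", 2001, "HOU"),
    ],
)

# COUNT(DISTINCT team) collapses repeated teams within each (id, year) group.
PER_SEASON_SQL = """
SELECT id, year, COUNT(DISTINCT team) AS teams
  FROM batting
 GROUP BY id, year
 ORDER BY teams DESC, id, year
"""
per_season = conn.execute(PER_SEASON_SQL).fetchall()
most_teams = per_season[0]  # same row that LIMIT 1 would return
print(per_season)
print(most_teams)
```

The extra id, year terms in ORDER BY only make the output order deterministic when two seasons have the same team count.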

Get n grouped categories and sum others into one

I have a table with the following structure:
Contents (
id
name
desc
tdate
categoryid
...
)
I need to do some statistics with the data in this table. For example, I want to get the number of rows per category by grouping, along with the id of that category. I also want to limit the result to n rows in descending order, and if there are more categories available I want to mark them as "Others". So far I have come up with 2 queries to the database:
Select n rows in descending order:
SELECT COALESCE(ca.NAME, 'Unknown') AS label
,ca.id AS catid
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
,catid
ORDER BY data DESC LIMIT 7
Select other rows as one:
SELECT 'Others' AS label
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)
But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?
The specific difficulty here: Queries with one or more aggregate functions in the SELECT list and no GROUP BY clause produce exactly one row, even if no row is found in the underlying table.
There is nothing you can do in the WHERE clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING clause, or in an outer query.
Per documentation:
If a query contains aggregate function calls, but no GROUP BY clause,
grouping still occurs: the result is a single group row (or perhaps no
rows at all, if the single row is then eliminated by HAVING). The same
is true if it contains a HAVING clause, even without any aggregate
function calls or GROUP BY clause.
It should be noted that adding a GROUP BY clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.
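The one-row behaviour is easy to reproduce. Below is a minimal sketch using SQLite through Python's sqlite3 module, with an empty contents table as an assumed stand-in for the "no categories left" case. It filters in an outer query, which is one of the two options mentioned above; the alias agg is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")
# The table is intentionally left empty.

# Aggregate without GROUP BY: exactly one row comes back, even from no input.
plain = conn.execute(
    "SELECT 'Others' AS label, COUNT(id) AS data FROM contents"
).fetchall()

# Excluding that row after the fact, here in an outer query.
filtered = conn.execute(
    """
    SELECT label, data
      FROM (SELECT 'Others' AS label, COUNT(id) AS data FROM contents) AS agg
     WHERE data > 0
    """
).fetchall()

print(plain)     # one row with a zero count
print(filtered)  # no rows
```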
The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
WITH cte AS (
SELECT categoryid, count(*) AS data
, row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
FROM contents
GROUP BY 1
)
( -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM cte
LEFT JOIN category ca ON ca.id = cte.categoryid
WHERE rn <= 7
ORDER BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM cte
WHERE rn > 7 -- only take the rest
HAVING count(*) > 0; -- only if there actually is a rest
-- or: HAVING sum(data) > 0
You need to break ties if multiple categories can have the same count across the 7th / 8th rank. In my example, categories with the smaller categoryid win such a race.
Parentheses are required to include a LIMIT or ORDER BY clause to an individual leg of a UNION query.
You only need to join to table category for the top 7 categories. And it's generally cheaper to aggregate first and join later in this scenario. So don't join in the base query in the CTE (common table expression) named cte, only join in the first SELECT of the UNION query, that's cheaper.
Not sure why you need the COALESCE. If you have a foreign key in place from contents.categoryid to category.id and both contents.categoryid and category.name are defined NOT NULL (like they probably should be), then you don't need it.
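Here is a rough, runnable adaptation of the idea for SQLite (via Python's sqlite3), not the PostgreSQL query itself. SQLite does not accept parenthesized UNION legs, and older versions reject HAVING without GROUP BY, so this sketch carries the row number as a sort key and filters the 'Others' row with a plain WHERE on a helper CTE. The CTE names counts, ranked and rest, the columns pos, n_rest and total, the parameter :n and the sample data are all assumptions. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

TOP_N_SQL = """
WITH counts AS (
    SELECT categoryid, COUNT(*) AS data
      FROM contents
     GROUP BY categoryid
), ranked AS (
    -- ties on the count are broken by the smaller categoryid
    SELECT categoryid, data,
           ROW_NUMBER() OVER (ORDER BY data DESC, categoryid) AS rn
      FROM counts
), rest AS (
    -- always one row; n_rest tells us whether there is a rest at all
    SELECT COUNT(*) AS n_rest, SUM(data) AS total
      FROM ranked
     WHERE rn > :n
)
SELECT label, data
  FROM (
        SELECT r.rn AS pos, COALESCE(ca.name, 'Unknown') AS label, r.data AS data
          FROM ranked r
          LEFT JOIN category ca ON ca.id = r.categoryid
         WHERE r.rn <= :n
        UNION ALL
        SELECT :n + 1, 'Others', total
          FROM rest
         WHERE n_rest > 0          -- only if there actually is a rest
       ) AS combined
 ORDER BY pos
"""


def top_categories(conn, n):
    """Return the top n categories by row count, plus 'Others' if any remain."""
    return conn.execute(TOP_N_SQL, {"n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)", [(1, "news"), (2, "sport"), (3, "tech")]
)
conn.executemany(
    "INSERT INTO contents (categoryid) VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (3,)],
)

print(top_categories(conn, 2))  # two categories plus an 'Others' row
print(top_categories(conn, 3))  # all three categories, no 'Others' row
```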
The odd GROUP BY true
This would work, too:
...
UNION ALL
SELECT NULL , 'Others', sum(data)
FROM cte
WHERE rn > 7
GROUP BY true;
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle demonstrating all.
Related answer with more explanation for the UNION ALL / LIMIT technique:
Sum results of a few queries and then find top 5 in SQL
The quick fix, to make the 'Others' row conditional would be to add a simple HAVING clause to that query.
HAVING COUNT(c.id) > 0
(If there are no other rows in the contents table, then COUNT(c.id) is going to be zero.)
That only answers half the question, how to make the return of that row conditional.
The second half of the question is a little more involved.
To get the whole resultset in one query, you could do something like this
(This is not tested yet, desk checked only. I'm not sure if PostgreSQL accepts a LIMIT clause in an inline view; if it doesn't, we'd need to implement a different mechanism to limit the number of rows returned.)
SELECT COALESCE(t.name,'Others') AS name
, t.catid AS catid
, COUNT(o.id) AS data
FROM contents o
LEFT
JOIN category oa
ON oa.id = o.categoryid
LEFT
JOIN ( SELECT COALESCE(ca.name,'Unknown') AS name
, ca.id AS catid
, COUNT(c.id) AS data
FROM contents c
LEFT
JOIN category ca
ON ca.id = c.categoryid
GROUP
BY COALESCE(ca.name,'Unknown')
, ca.id
ORDER
BY COUNT(c.id) DESC
, ca.id DESC
LIMIT 7
) t
ON ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
GROUP
BY ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
, t.catid
ORDER
BY COUNT(o.id) DESC
, ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL)) DESC
, t.catid DESC
LIMIT 7
The inline view t basically gets the same result as the first query, a list of (up to) 7 id values from category table, or 6 id values from category table and a NULL.
The outer query basically does the same thing, joining content with category, but also doing a check if there's a matching row from t. Because t might be returning a NULL, we have a slightly more complicated comparison, where we want a NULL value to match a NULL value. (MySQL conveniently gives us a shorthand operator for this, the null-safe comparison operator <=>, but I don't think that's available in PostgreSQL, so we have to express it differently.)
a = b OR (a IS NULL AND b IS NULL)
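A quick way to convince yourself how that expression behaves is to evaluate it directly. The sketch below does so in SQLite through Python's sqlite3; the helper name null_safe_equal is made up for illustration. (As far as I know, PostgreSQL also has IS NOT DISTINCT FROM for the same purpose.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def null_safe_equal(a, b):
    """Evaluate  a = b OR (a IS NULL AND b IS NULL)  in SQL."""
    row = conn.execute(
        "SELECT :a = :b OR (:a IS NULL AND :b IS NULL)", {"a": a, "b": b}
    ).fetchone()
    # SQL yields 1 for a match. A mismatch gives 0 or NULL, and a WHERE or
    # ON clause treats both of those as "no match".
    return row[0] == 1


# A plain comparison of two NULLs is NULL (None in Python), not true.
plain_null_compare = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(plain_null_compare)
print(null_safe_equal(None, None))
print(null_safe_equal(1, None))
```

One thing worth noticing: when exactly one side is NULL the expression evaluates to NULL rather than false. That is harmless in a join condition, but in a GROUP BY it produces a NULL group instead of a false group.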
The next bit is getting a GROUP BY to work, we want to group by the 7 values returned by the inline view t, or, if there's not matching value from t, group the "other" rows together. We can get that to happen by using a boolean expression in the GROUP BY clause.
We're basically saying "group by 'if there was a matching row from t'" (true or false) and then group by the row from 't'. Get a count, and then order by the count descending.
This isn't tested, only desk checked.
You can approach this with nested aggregation. The inner aggregation calculates the counts along with a sequential number. You want to take everything whose number is 7 or less and then combine everything else into the others category:
SELECT (case when seqnum <= 7 then label else 'others' end) as label,
(case when seqnum <= 7 then catid end) as catid, sum(cnt) as cnt
FROM (SELECT ca.name AS label, ca.id AS catid, COUNT(c.id) AS cnt,
row_number() over (order by count(c.id) desc) as seqnum
FROM contents c LEFT OUTER JOIN
category ca
ON ca.id = c.categoryid
GROUP BY label, catid
) t
GROUP BY (case when seqnum <= 7 then label else 'others' end),
(case when seqnum <= 7 then catid end)
ORDER BY cnt DESC ;
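A sketch of the nested-aggregation approach, adapted for SQLite and run from Python. The key point is that the sequence number has to run across all categories (no PARTITION BY), otherwise every category would get number 1. The subquery aliases grouped and numbered, the columns name and total, the parameter :n and the sample data are assumptions. SQLite 3.25 or newer is needed for ROW_NUMBER.

```python
import sqlite3

NESTED_SQL = """
SELECT CASE WHEN seqnum <= :n THEN name ELSE 'others' END AS label,
       SUM(cnt) AS total
  FROM (
        -- number the per-category counts from largest to smallest
        SELECT name, cnt,
               ROW_NUMBER() OVER (ORDER BY cnt DESC, name) AS seqnum
          FROM (
                SELECT COALESCE(ca.name, 'Unknown') AS name, COUNT(c.id) AS cnt
                  FROM contents c
                  LEFT JOIN category ca ON ca.id = c.categoryid
                 GROUP BY ca.id, ca.name
               ) AS grouped
       ) AS numbered
 GROUP BY CASE WHEN seqnum <= :n THEN name ELSE 'others' END
 ORDER BY MIN(seqnum)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(1, "news"), (2, "sport"), (3, "tech"), (4, "art")],
)
conn.executemany(
    "INSERT INTO contents (categoryid) VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (3,), (4,)],
)

# Keep the top 2 categories and fold everything else into 'others'.
top_two = conn.execute(NESTED_SQL, {"n": 2}).fetchall()
print(top_two)
```

Unlike the UNION ALL variant, no 'others' row appears when there is nothing left over, because a group only exists if at least one row falls into it.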

Fetch one row per account id from list

I have a table with game scores, allowing multiple rows per account id: scores (id, score, accountid). I want a list of the top 10 scorer ids and their scores.
Can you provide an sql statement to select the top 10 scores, but only one score per account id?
Thanks!
select username, max(score) from usertable group by username order by max(score) desc limit 10;
First limit the selection to the highest score for each account id.
Then take the top ten scores.
SELECT TOP 10 AccountId, Score
FROM Scores s1
WHERE AccountId NOT IN
(SELECT s2.AccountId FROM Scores s2
WHERE s1.AccountId = s2.AccountId and s1.Score < s2.Score)
ORDER BY Score DESC
Try this:
select top 10 username,
max(score)
from usertable
group by username
order by max(score) desc
PostgreSQL has the DISTINCT ON clause, that works this way:
SELECT DISTINCT ON (accountid) id, score, accountid
FROM scoretable
ORDER BY score DESC
LIMIT 10;
I don't think it's standard SQL though, so expect other databases to do it differently.
SELECT accountid, MAX(score) as top_score
FROM Scores
GROUP BY accountid
ORDER BY top_score DESC
LIMIT 0, 10
That should work fine in mysql. It's possible you may need to use 'ORDER BY MAX(score) DESC' instead of that order by - I don't have my SQL reference on hand.
I believe that PostgreSQL (at least 8.3) will require that the DISTINCT ON expressions must match initial ORDER BY expressions. I.E. you can't use DISTINCT ON (accountid) when you have ORDER BY score DESC. To fix this, add it into the ORDER BY:
SELECT DISTINCT ON (accountid) *
FROM scoretable
ORDER BY accountid, score DESC
LIMIT 10;
Using this method allows you to select all the columns in a table. It will only return 1 row per accountid even if there are duplicate 'max' values for score.
This was useful for me, as I was not finding the maximum score (which is easy to do with the max() function) but for the most recent time a score was entered for an accountid.
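One caveat with the DISTINCT ON form: since ORDER BY has to start with accountid, the LIMIT picks the first accounts by id, not the highest scores. To get "one row per account, then the top N by score" you would reduce first and sort afterwards. Below is a sketch of that two-step idea using ROW_NUMBER in SQLite (which has no DISTINCT ON), run from Python. The sample rows, the alias ranked and the :limit parameter are assumptions, and SQLite 3.25 or newer is required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, score INTEGER, accountid INTEGER)"
)
# Hypothetical sample data; account 17 reaches its top score of 75 twice.
conn.executemany(
    "INSERT INTO scores (score, accountid) VALUES (?, ?)",
    [(40, 17), (75, 17), (30, 17), (75, 17), (90, 3), (10, 3), (60, 8)],
)

TOP_PER_ACCOUNT_SQL = """
SELECT accountid, score, id
  FROM (
        -- rn = 1 marks each account's best row; id DESC breaks ties
        SELECT id, score, accountid,
               ROW_NUMBER() OVER (PARTITION BY accountid
                                  ORDER BY score DESC, id DESC) AS rn
          FROM scores
       ) AS ranked
 WHERE rn = 1
 ORDER BY score DESC          -- sort only after reducing to one row each
 LIMIT :limit
"""
top = conn.execute(TOP_PER_ACCOUNT_SQL, {"limit": 2}).fetchall()
print(top)
```

In PostgreSQL the same shape can be had by wrapping the DISTINCT ON query in a subquery and applying ORDER BY score DESC LIMIT 10 on the outside.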

Fetch one row per account id from list, part 2

Not sure how to ask a followup on SO, but this is in reference to an earlier question:
Fetch one row per account id from list
The query I'm working with is:
SELECT *
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
This selects the top scores, and limits results to one row per accountid; their top score.
The last hurdle is that this query is returning multiple rows for accountids that have multiple occurrences of their top score. So if accountid 17 has scores of 40, 75, 30, 75 the query returns both rows with scores of 75.
Can anyone modify this query (or provide a better one) to fix this case, and truly limit it to one row per account id?
Thanks again!
If you're only interested in the accountid and the score, then you can use the simple GROUP BY query given by Paul above.
SELECT accountid, MAX(score)
FROM scores
GROUP BY accountid;
If you need other attributes from the scores table, then you can get other attributes from the row with a query like the following:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
WHERE s2.accountid IS NULL;
But this still gives multiple rows, in your example where a given accountid has two scores matching its maximum value. To further reduce the result set to a single row, for example the row with the latest gamedate, try this:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid
AND s1.score = s3.score AND s1.gamedate < s3.gamedate)
WHERE s2.accountid IS NULL
AND s3.accountid IS NULL;
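Here is a small sketch of both joins on the example from the question (account 17 with scores 40, 75, 30, 75), run in SQLite from Python. The gamedate values and the extra account 5 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, accountid INTEGER, "
    "score INTEGER, gamedate TEXT)"
)
conn.executemany(
    "INSERT INTO scores (accountid, score, gamedate) VALUES (?, ?, ?)",
    [
        (17, 40, "2024-01-01"),
        (17, 75, "2024-01-02"),
        (17, 30, "2024-01-03"),
        (17, 75, "2024-01-04"),  # second occurrence of the top score
        (5, 20, "2024-01-01"),
    ],
)

# Only the first join: keeps every row that has no higher score for its account.
without_tiebreak = conn.execute(
    """
    SELECT s1.accountid, s1.score, s1.gamedate
      FROM scores AS s1
      LEFT OUTER JOIN scores AS s2
        ON s1.accountid = s2.accountid AND s1.score < s2.score
     WHERE s2.accountid IS NULL
     ORDER BY s1.accountid, s1.gamedate
    """
).fetchall()

# Second join added: also require that no equal score has a later gamedate.
with_tiebreak = conn.execute(
    """
    SELECT s1.accountid, s1.score, s1.gamedate
      FROM scores AS s1
      LEFT OUTER JOIN scores AS s2
        ON s1.accountid = s2.accountid AND s1.score < s2.score
      LEFT OUTER JOIN scores AS s3
        ON s1.accountid = s3.accountid AND s1.score = s3.score
       AND s1.gamedate < s3.gamedate
     WHERE s2.accountid IS NULL
       AND s3.accountid IS NULL
     ORDER BY s1.accountid
    """
).fetchall()

print(without_tiebreak)  # account 17 appears twice
print(with_tiebreak)     # one row per account
```

If two top scores could also share the same gamedate, duplicates would come back again, so a unique column such as the primary key is the safer final tie-breaker.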
select accountid, max(score) from scores group by accountid;
If your RDBMS supports them, then an analytic function would be a good approach particularly if you need all the columns of the row.
select ...
from (
select accountid,
score,
...
row_number() over
(partition by accountid
order by score desc) score_rank
from scores)
where score_rank = 1;
The row returned is indeterminate in the case you describe, but you can easily modify the analytic function, for example by ordering on (score desc, test_date desc) to get the more recent of two matching high scores.
Other analytic functions based on rank will achieve a similar purpose.
If you don't mind duplicates then the following would probably be more efficient than your current method:
select ...
from (
select accountid,
score,
...
max(score) over (partition by accountid) max_score
from scores)
where score = max_score;
If you are selecting a subset of columns then you can use the DISTINCT keyword to filter results.
SELECT DISTINCT accountid, score
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
Does your database support distinct? As in select distinct x from y?
This solution works in MS SQL, giving you the whole row.
SELECT *
FROM scores
WHERE scoreid in
(
SELECT max(scoreid)
FROM scores as s2
JOIN
(
SELECT max(score) as maxscore, accountid
FROM scores s1
GROUP BY accountid
) sub ON s2.score = sub.maxscore AND s2.accountid = sub.accountid
GROUP BY s2.score, s2.accountid
)