SQL: Find top-rated article in each category

I have a table articles, with fields id, rating (an integer from 1-10), and category_id (an integer representing to which category it belongs).
How can I, in one MySQL query, find the single article with the highest rating from each category? ORDER BY and LIMIT would usually be how I would find the top-rated article, I suppose, but I'm not sure how to mix that with grouping to get the desired result, if I even can. (A dependent subquery would likely be an easy answer, but ewwww. Is there something better?)
For the following data:
id | category_id | rating
---+-------------+-------
1 | 1 | 10
2 | 1 | 8
3 | 2 | 7
4 | 3 | 5
5 | 3 | 2
6 | 3 | 6
I would like the following to be returned:
id | category_id | rating
---+-------------+-------
1 | 1 | 10
3 | 2 | 7
6 | 3 | 6

Try These
SELECT id, category_id, rating
FROM articles a1
WHERE rating =
(SELECT MAX(a2.rating) FROM articles a2 WHERE a1.category_id = a2.category_id)
OR
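-- Relies on MySQL's non-standard GROUP BY handling; rejected when ONLY_FULL_GROUP_BY is enabled.
SELECT * FROM (SELECT * FROM articles ORDER BY rating DESC) AS a1 GROUP BY a1.category_id;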

You can use a subselect as the target of a FROM clause, too, which reads funny but makes for a slightly easier-to-understand query.
SELECT a1.id, a1.category_id, a1.rating
FROM articles as a1,
(SELECT category_id, max(rating) AS mrating FROM articles AS a2
GROUP BY a2.category_id) AS a_inner
WHERE
a_inner.category_id = a1.category_id AND
a_inner.mrating = a1.rating;
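
To check the derived-table form against the sample data from the question, here is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL (that swap is an assumption made only so the example is self-contained). The explicit JOIN ... ON is equivalent to the comma join above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER, rating INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO articles (id, category_id, rating) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 8), (3, 2, 7), (4, 3, 5), (5, 3, 2), (6, 3, 6)],
)

# Join every article to the maximum rating of its own category.
top_rated = conn.execute(
    """
    SELECT a1.id, a1.category_id, a1.rating
    FROM articles AS a1
    JOIN (SELECT category_id, MAX(rating) AS mrating
          FROM articles
          GROUP BY category_id) AS a_inner
      ON a_inner.category_id = a1.category_id
     AND a_inner.mrating = a1.rating
    ORDER BY a1.category_id
    """
).fetchall()

print(top_rated)
```

Note that if two articles in a category share the top rating, this query returns both of them, so an extra tie-breaker is needed when exactly one row per category is required.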

Related

Django: Is there a way to apply an aggregate function on a window function?

I have already made a raw SQL of this query as a last resort.
I have a gaps-and-islands problem, where I get the respective groups with two ROW_NUMBER() calls. Later on I use a COUNT and a MAX like so:
SELECT id, name, MAX(count)
FROM (
SELECT id, name, COUNT(*)
FROM (
SELECT players.id, players.name,
(ROW_NUMBER() OVER(ORDER BY match_details.id, goals.time) -
ROW_NUMBER() OVER(PARTITION BY match_details.id, players.id ORDER BY match_details.id, goals.time)) AS grp
FROM match_details
JOIN players
ON players.id = match_details.player_id
JOIN goals
ON goals.match_detail_id = match_details.id
ORDER BY match_details.id, goals.time
) AS x
GROUP BY grp, id, name
ORDER BY count DESC
) AS y
GROUP BY id, name
ORDER BY MAX(count) DESC, name
players example:
id | name
----+-------
1 | John
2 | Mark
match_details example:
id | player_id
----+------------
1 | 1
2 | 1
3 | 2
4 | 2
goals example:
id | match_detail_id | time
----+------------------+---------
1 | 1 | 2
2 | 1 | 10
3 | 2 | 2
4 | 3 | 1
5 | 3 | 5
6 | 4 | 6
output example:
id | name | max
----+--------+---------
1 | John | 2
2 | Mark | 2
So far, I have finished the innermost query with the Django ORM, but when I try to annotate over the groups, it throws an error:
django.db.utils.ProgrammingError: aggregate function calls cannot contain window function calls
I haven't yet wrapped my head around using Subquery, but I'm also not sure if that would work at all. I do not need to filter over the window function, only use aggregates on it.
Is there a way to solve this with plain Django, or do I have to resort to hybrid raw-ORM queries, perhaps with django-cte?
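
For reference, the raw SQL in the question works because each aggregate runs in an outer query over a derived table, never directly on the window expression. The sketch below reproduces that nesting on the sample data using Python's sqlite3 module rather than PostgreSQL or the Django ORM (an assumption made so it runs standalone; it needs SQLite 3.25 or newer for window functions). The aliases cnt and max_streak are hypothetical names added for the example; this is not a Django solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE match_details (id INTEGER PRIMARY KEY, player_id INTEGER);
CREATE TABLE goals (id INTEGER PRIMARY KEY, match_detail_id INTEGER, time INTEGER);
INSERT INTO players VALUES (1, 'John'), (2, 'Mark');
INSERT INTO match_details VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO goals VALUES (1, 1, 2), (2, 1, 10), (3, 2, 2), (4, 3, 1), (5, 3, 5), (6, 4, 6);
""")

streaks = conn.execute(
    """
    SELECT id, name, MAX(cnt) AS max_streak      -- aggregate over the counts
    FROM (
        SELECT id, name, COUNT(*) AS cnt         -- size of each island
        FROM (
            -- Difference of two row numbers labels each island (grp).
            SELECT players.id AS id, players.name AS name,
                   ROW_NUMBER() OVER (ORDER BY match_details.id, goals.time)
                 - ROW_NUMBER() OVER (PARTITION BY match_details.id, players.id
                                      ORDER BY match_details.id, goals.time) AS grp
            FROM match_details
            JOIN players ON players.id = match_details.player_id
            JOIN goals ON goals.match_detail_id = match_details.id
        ) AS x
        GROUP BY grp, id, name
    ) AS y
    GROUP BY id, name
    ORDER BY max_streak DESC, name
    """
).fetchall()

print(streaks)
```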

Nested Sql select statement

Can anyone tell me what is wrong with the following SQL query?
Select *,
(SELECT [DiseaseID], COUNT(*) AS [Rank] FROM [DiseaseSymptom] WHERE
([SymptomID] IN(1, 5)) GROUP BY [DiseaseID] ORDER BY [Rank] DESC)
FROM Disease WHERE GenderID in (1, 3)
I have 2 tables. One contains diseases and the gender each is associated with:
Disease
+-----------+-------------------+----------+
| DiseaseID | DiseaseName | GenderID |
+-----------+-------------------+----------+
| 1 | Fever | 3 |
| 2 | Flu | 3 |
| 3 | Lady Disease | 2 |
| 4 | Gentlemen Disease | 1 |
+-----------+-------------------+----------+
Gender 1 = Male, 2 = Female, 3 = Common
And a Symptom Disease Matrix like this
DiseaseSymptom
+-----------+-----------+----------+
| DiseaseID | SymptomID | DissymID |
+-----------+-----------+----------+
| 1 | 1 | 1 |
| 1 | 2 | 3 |
| 1 | 4 | 4 |
| 2 | 1 | 5 |
| 2 | 3 | 9 |
| 2 | 4 | 6 |
| 2 | 5 | 7 |
+-----------+-----------+----------+
I get symptoms from the user, match them in the DiseaseSymptom table, and rank the diseases by the number of symptoms matched (the inner SQL statement).
In the outer statement I simply want to take the result of the inner statement and check whether it belongs to a specific gender. The error I get when I try to run the above query is
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
Subqueries in the SELECT clause must generate only a scalar value, not a result set with multiple columns or rows. If you want both values, put the subquery in the FROM clause (joined on DiseaseID) and refer to the two values in the SELECT clause:
Select d.*, z.DiseaseID, z.[Rank]
FROM Disease d
join (SELECT DiseaseID, COUNT(*) AS [Rank]
FROM DiseaseSymptom
WHERE SymptomID IN (1, 5)
GROUP BY DiseaseID) z
On z.DiseaseID = d.DiseaseID
WHERE d.GenderID in (1, 3)
Order By z.[Rank] DESC
You are using a subquery with group by. Your intention is to have a correlated subquery. The problem is that the subquery is returning more than one row. I think this is what you want:
Select d.*,
(SELECT COUNT(*)
FROM [DiseaseSymptom] ds
WHERE ds.[SymptomID] IN (1, 5) AND ds.DiseaseID = d.DiseaseID
) AS [Rank]
FROM Disease d
WHERE d.GenderID in (1, 3);
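You could use a Common Table Expression (CTE) like this. GenderID lives in Disease, so the CTE has to join to it, and the ORDER BY must go on the outer query:
with cte as (SELECT ds.[DiseaseID], d.GenderID, COUNT(*) AS [Rank]
FROM [DiseaseSymptom] ds JOIN Disease d ON d.DiseaseID = ds.DiseaseID
WHERE ds.[SymptomID] IN (1, 5) GROUP BY ds.[DiseaseID], d.GenderID)
select * FROM cte WHERE GenderID in (1, 3) ORDER BY [Rank] DESC
Hope this helps ;)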
There is really no need to have a nested query, just join and filter
SELECT d.DiseaseID, d.DiseaseName, d.GenderID
, Symptoms = Count(ds.SymptomID)
FROM Disease d
INNER JOIN DiseaseSymptom ds ON d.DiseaseID = ds.DiseaseID
WHERE ds.SymptomID IN (1, 5)
AND d.GenderID IN (1, 3)
GROUP BY d.DiseaseID, d.DiseaseName, d.GenderID
ORDER BY Count(SymptomID) Desc
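
To see what the plain join returns for the sample tables, here is a minimal sketch that runs it through Python's sqlite3 module instead of SQL Server (an assumption made to keep the example self-contained). The T-SQL form Symptoms = Count(...) is written as COUNT(...) AS Symptoms, which SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Disease (DiseaseID INTEGER PRIMARY KEY, DiseaseName TEXT, GenderID INTEGER);
CREATE TABLE DiseaseSymptom (DiseaseID INTEGER, SymptomID INTEGER, DissymID INTEGER PRIMARY KEY);
INSERT INTO Disease VALUES
    (1, 'Fever', 3), (2, 'Flu', 3), (3, 'Lady Disease', 2), (4, 'Gentlemen Disease', 1);
INSERT INTO DiseaseSymptom (DiseaseID, SymptomID, DissymID) VALUES
    (1, 1, 1), (1, 2, 3), (1, 4, 4), (2, 1, 5), (2, 3, 9), (2, 4, 6), (2, 5, 7);
""")

# Rank diseases by how many of the reported symptoms (1 and 5) they match,
# keeping only diseases for gender 1 (male) or 3 (common).
ranked = conn.execute(
    """
    SELECT d.DiseaseID, d.DiseaseName, d.GenderID,
           COUNT(ds.SymptomID) AS Symptoms
    FROM Disease d
    INNER JOIN DiseaseSymptom ds ON d.DiseaseID = ds.DiseaseID
    WHERE ds.SymptomID IN (1, 5)
      AND d.GenderID IN (1, 3)
    GROUP BY d.DiseaseID, d.DiseaseName, d.GenderID
    ORDER BY Symptoms DESC
    """
).fetchall()

print(ranked)
```

Because this is an inner join, diseases with no matching symptom rows (such as disease 4 in the sample data) do not appear in the result at all.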

SQL join problems - users betting on matches

I have the following table:
scores:
user_id | match_id | points
1 | 110 | 4
1 | 111 | 3
1 | 112 | 3
2 | 111 | 2
Users bet on matches and, depending on the result of the match, they are awarded points. Depending on how accurate the bet was, a user is awarded 0, 2, 3 or 4 points for a match.
Now I want to rank the users so that I can see who is in 1st place, 2nd place, etc.
The ranking is ordered first by total_points. If these are equal, it is ordered by the number of times a user has scored 4 points, then by the number of times a user scored 3 points, and so on.
For that I would need the following table:
user_id | total_points | #_of_fours | #_of_threes | #_of_twos
1 | 10 | 1 | 2 | 0
2 | 2 | 0 | 0 | 1
But I can't figure out the join statements that would get me there.
This is as far as I get without help:
SELECT user_id, COUNT( points ) AS #_of_fours FROM scores WHERE points = 4 GROUP BY user_id
Which results in
user_id | #_of_fours
1 | 1
2 | 0
Now I would have to do that for #_of_threes and #_of_twos as well as the total points and join it all together, but I can't figure out how.
BTW I'm using MySQL.
Any help would be really appreciated. Thanks in advance.
SELECT user_id
, sum(points) as total_points
, sum(case when points = 4 then 1 else 0 end) AS `#_of_fours`
, sum(case when points = 3 then 1 else 0 end) AS `#_of_threes`
, sum(case when points = 2 then 1 else 0 end) AS `#_of_twos`
FROM scores
GROUP BY
user_id
Using MySQL syntax, you can use SUM over a boolean expression to count the matching rows easily:
SELECT
user_id,
SUM(points) AS total_points,
SUM(points=4) AS no_of_fours,
SUM(points=3) AS no_of_threes,
SUM(points=2) AS no_of_twos
FROM scores
GROUP BY user_id;
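
Neither query above includes the final ranking order the question describes. A minimal sketch of the full ranking, run through Python's sqlite3 module instead of MySQL (an assumption made so it runs standalone), could look like this. The no_of_* aliases follow the second answer; the ORDER BY is an addition based on the ranking rules in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INTEGER, match_id INTEGER, points INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO scores (user_id, match_id, points) VALUES (?, ?, ?)",
    [(1, 110, 4), (1, 111, 3), (1, 112, 3), (2, 111, 2)],
)

# One pass over the table: conditional sums replace the per-value joins.
ranking = conn.execute(
    """
    SELECT user_id,
           SUM(points) AS total_points,
           SUM(CASE WHEN points = 4 THEN 1 ELSE 0 END) AS no_of_fours,
           SUM(CASE WHEN points = 3 THEN 1 ELSE 0 END) AS no_of_threes,
           SUM(CASE WHEN points = 2 THEN 1 ELSE 0 END) AS no_of_twos
    FROM scores
    GROUP BY user_id
    ORDER BY total_points DESC, no_of_fours DESC, no_of_threes DESC, no_of_twos DESC
    """
).fetchall()

print(ranking)
```

The ELSE 0 branch matters: without it, a user with no matching rows gets NULL instead of 0 for that column.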

Help with optimising SQL query

Hi, I need some help with this problem.
I am working on a web application, and for the database I am using SQLite. Can someone help me with one database query that must be optimized (fast)?
I have table x:
ID | ID_DISH | ID_INGREDIENT
1 | 1 | 2
2 | 1 | 3
3 | 1 | 8
4 | 1 | 12
5 | 2 | 13
6 | 2 | 5
7 | 2 | 3
8 | 3 | 5
9 | 3 | 8
10| 3 | 2
....
ID_DISH is the id of a dish, and ID_INGREDIENT is an ingredient the dish is made of:
so in my case the dish with id 1 is made with the ingredients with ids 2, 3, 8 and 12.
This table has more than 15000 rows, and my question is:
I need a query that fetches the ids of dishes, ordered ascending by the count of ingredients that I haven't yet added to my algorithm.
Example: foo(2,4)
will return rows in this order:
ID_DISH | count(stillMissing)
10 | 2
1 | 3
Dish with id 10 has ingredients with id 2 and 4 and hasn't got 2 more, then is
My query is:
SELECT
t2.ID_dish,
(SELECT COUNT(*) as c FROM dishIngredient as t1
WHERE t1.ID_ingredient NOT IN (2,4)
AND t1.ID_dish = t2.ID_dish
GROUP BY ID_dish) as c
FROM dishIngredient as t2
WHERE t2.ID_ingredient IN (2,4)
GROUP BY t2.ID_dish
ORDER BY c ASC
It works, but it is slow...
select ID_DISH, sum(ID_INGREDIENT not in (2, 4)) stillMissing
from x
group by ID_DISH
having stillMissing != count(*)
order by stillMissing
This is the solution: my previous query took 5 - 20 s, and this one takes about 80 ms.
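
Here is a minimal sketch of how that single-pass query might be wrapped for an arbitrary ingredient list, using Python's sqlite3 module and the ten sample rows of table x shown in the question. The function name still_missing and its parameter have are hypothetical; the HAVING clause repeats the expression rather than the alias, which is only a portability choice.

```python
import sqlite3


def still_missing(conn, have):
    """Return (ID_DISH, stillMissing) for dishes using at least one ingredient in `have`."""
    marks = ", ".join("?" for _ in have)
    sql = f"""
        SELECT ID_DISH,
               SUM(ID_INGREDIENT NOT IN ({marks})) AS stillMissing
        FROM x
        GROUP BY ID_DISH
        -- Drop dishes where every ingredient is missing (no overlap with `have`).
        HAVING SUM(ID_INGREDIENT NOT IN ({marks})) != COUNT(*)
        ORDER BY stillMissing, ID_DISH
    """
    # The placeholder list appears twice, so the values are bound twice.
    return conn.execute(sql, list(have) + list(have)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (ID INTEGER PRIMARY KEY, ID_DISH INTEGER, ID_INGREDIENT INTEGER)")
conn.executemany(
    "INSERT INTO x (ID, ID_DISH, ID_INGREDIENT) VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 3), (3, 1, 8), (4, 1, 12), (5, 2, 13),
     (6, 2, 5), (7, 2, 3), (8, 3, 5), (9, 3, 8), (10, 3, 2)],
)

result = still_missing(conn, [2, 4])
print(result)
```

The speedup reported above is plausible because the table is scanned and grouped once, whereas the original query ran a correlated subquery for each row of the outer query.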
This is from memory, as I don't know the SQL dialect of sqlite.
SELECT DISTINCT T1.ID_DISH, COUNT(T1.ID_INGREDIENT) as COUNT
FROM dishIngredient as T1 LEFT JOIN dishIngredient as T2
ON T1.ID_DISH = T2.ID_DISH
WHERE T2.ID_INGREDIENT IN (2,4)
GROUP BY T1.ID_DISH
ORDER BY T1.ID_DISH

SQL AVG(COUNT(*))?

I'm trying to find out the average number of times a value appears in a column, group it based on another column and then perform a calculation on it.
I have 3 tables a little like this
DVD
ID | NAME
1 | 1
2 | 1
3 | 2
4 | 3
COPY
ID | DVDID
1 | 1
2 | 1
3 | 2
4 | 3
5 | 1
LOAN
ID | DVDID | COPYID
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
4 | 3 | 4
5 | 1 | 5
6 | 1 | 5
7 | 1 | 5
8 | 1 | 2
etc
Basically, I'm trying to find all the copy ids that appear in the loan table fewer times than the average number of times for all copies of that DVD.
So in the example above, copy 5 of dvd 1 appears 3 times, copy 2 twice and copy 1 once so the average for that DVD is 2. I want to list all the copies of that (and each other) dvd that appear less than that number in the Loan table.
I hope that makes a bit more sense...
Thanks
Similar to dotjoe's solution, but using an analytic function to avoid the extra join. May be more or less efficient.
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, copyid, cnt, avg(cnt) over (partition by dvdid) as copy_avg
from loan_copy_total
)
select *
from loan_copy_avg lca
where cnt < copy_avg;
This should work in Oracle:
create view dvd_count_view as
select dvdid, count(1) as howmanytimes
from loan
group by dvdid;
select avg(howmanytimes) from dvd_count_view;
Untested...
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, avg(cnt) as copy_avg
from loan_copy_total
group by dvdid
)
select lct.*, lca.copy_avg
from loan_copy_avg lca
inner join loan_copy_total lct on lca.dvdid = lct.dvdid
and lct.cnt < lca.copy_avg;
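
To check the two-step idea (count per copy, then average those counts per DVD) against the sample LOAN rows, here is a minimal sketch using Python's sqlite3 module rather than Oracle (an assumption made so it runs standalone). It uses a strict less-than comparison, as the question asks, and it only sees copies that appear in loan at least once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (id INTEGER PRIMARY KEY, dvdid INTEGER, copyid INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO loan (id, dvdid, copyid) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4),
     (5, 1, 5), (6, 1, 5), (7, 1, 5), (8, 1, 2)],
)

below_average = conn.execute(
    """
    WITH loan_copy_total AS (
        -- Step 1: number of loans per copy.
        SELECT dvdid, copyid, COUNT(*) AS cnt
        FROM loan
        GROUP BY dvdid, copyid
    ),
    loan_copy_avg AS (
        -- Step 2: average of those counts per DVD.
        SELECT dvdid, AVG(cnt) AS copy_avg
        FROM loan_copy_total
        GROUP BY dvdid
    )
    SELECT lct.dvdid, lct.copyid, lct.cnt, lca.copy_avg
    FROM loan_copy_avg lca
    INNER JOIN loan_copy_total lct ON lca.dvdid = lct.dvdid
    WHERE lct.cnt < lca.copy_avg
    ORDER BY lct.dvdid, lct.copyid
    """
).fetchall()

print(below_average)
```

A copy that was never loaned has no rows in loan, so it is missing from both the counts and the average; including such copies would need a LEFT JOIN from the COPY table.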