How to get only one from each value in a column - sql

I have this query:
SELECT
r.rev_id, rs.name, COUNT(ws.user_id) as likes
FROM
Reviews AS r
LEFT JOIN
Wasliked AS ws ON r.rev_id = ws.rev_id
LEFT JOIN
Restaurants AS rs ON rs.rid = r.rest_id
GROUP BY
rs.name, r.rev_id
ORDER BY
likes DESC
and the result is:
rev_id  name   likes
--------------------
7       rest1  5
10      rest1  3
6       rest1  2
2       rest3  2
1       rest2  2
5       rest3  1
8       rest4  1
But I want the result to be like this:
rev_id  name   likes
--------------------
7       rest1  5
2       rest3  2
1       rest2  2
taking the 3 highest results with different names.
I have already tried to only group by rs.name instead of rs.name,r.rev_id but that causes an error.
Thanks in advance

So you want the top row for each name, limited to three rows overall. This suggests ROW_NUMBER():
SELECT TOP 3 rev_id, name, likes
FROM (SELECT r.rev_id, rs.name, COUNT(ws.user_id) as likes,
ROW_NUMBER() OVER (PARTITION BY rs.name ORDER BY COUNT(ws.user_id)) as seqnum
FROM Reviews r left join
Wasliked ws
on r.rev_id = ws.rev_id left join
Restaurants rs
on rs.rid = r.rest_id
GROUP BY rs.name, r.rev_id
) x
WHERE seqnum = 1
ORDER BY likes desc;

You can also do it like this, if you do not mind repeating the aggregate subquery:
select top 3 t1.*
from (
select r.rev_id, rs.name, count(ws.user_id) as likes
from reviews as r
left join wasliked as ws on r.rev_id=ws.rev_id
left join restaurants as rs on rs.rid=r.rest_id
group by rs.name,r.rev_id
) t1
inner join (
select name, max(likes) as likes
from (
select r.rev_id, rs.name, count(ws.user_id) as likes
from reviews as r
left join wasliked as ws on r.rev_id=ws.rev_id
left join restaurants as rs on rs.rid=r.rest_id
group by rs.name,r.rev_id) tmp
group by name
) t2 on t1.name = t2.name and t1.likes = t2.likes
order by t1.likes desc
Gordon Linoff's answer is the better way to do this. As written, though, it returns the row with the fewest likes per name, because ORDER BY sorts ascending by default. If you change
ROW_NUMBER() OVER (PARTITION BY rs.name ORDER BY COUNT(ws.user_id)) as seqnum
to
ROW_NUMBER() OVER (PARTITION BY rs.name ORDER BY COUNT(ws.user_id) DESC) as seqnum
it will give you the right result.
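To see the effect of the missing DESC, here is a minimal sketch that runs the ROW_NUMBER() query against SQLite from Python. It makes a few assumptions that are not in the question: the per-review counts are loaded directly into a hypothetical table `review_likes` instead of being computed from the three joined tables, `LIMIT 3` stands in for SQL Server's `TOP 3`, and `rev_id DESC` is added only as a tie-breaker so the output is deterministic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# The per-review like counts from the question (output of the GROUP BY query),
# stored in a hypothetical table so the example stays short.
ROWS = [
    (7, "rest1", 5), (10, "rest1", 3), (6, "rest1", 2),
    (2, "rest3", 2), (1, "rest2", 2), (5, "rest3", 1), (8, "rest4", 1),
]

QUERY = """
SELECT rev_id, name, likes
FROM (SELECT rev_id, name, likes,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY likes {direction}) AS seqnum
      FROM review_likes) x
WHERE seqnum = 1
ORDER BY likes DESC, rev_id DESC  -- rev_id is only a deterministic tie-breaker
LIMIT 3                           -- SQLite spelling of TOP 3
"""


def best_per_name(direction):
    """Run the query with the window ordered ASC or DESC."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE review_likes (rev_id INTEGER, name TEXT, likes INTEGER)")
    conn.executemany("INSERT INTO review_likes VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY.format(direction=direction)).fetchall()
    conn.close()
    return rows


print(best_per_name("DESC"))  # most-liked review per restaurant
print(best_per_name("ASC"))   # least-liked review per restaurant
```

With DESC the rows match the result the question asks for; with ASC, rest1 is represented by rev_id 6 (2 likes) instead of rev_id 7 (5 likes).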

Your query has the form "show rows with max(foo)". The first line of attack is GROUP BY, but sometimes you want more information than the aggregate alone. In this case, having computed likes by rev_id and name, you want only those rows having max(likes) for each name. That calls for an existence test:
with T (rev_id, name, likes) as (
SELECT r.rev_id, rs.name, COUNT(ws.user_id) as likes
FROM Reviews as r
left join Wasliked as ws on r.rev_id=ws.rev_id
left join Restaurants as rs on rs.rid=r.rest_id
GROUP BY rs.name,r.rev_id
)
select * from T as L
where exists (
select 1 from T
where name = L.name
group by name
having max(likes) = L.likes
)
order by likes desc
That's about right.
I prefer my version to the others provided so far. It doesn't use the nonstandard top N formulation, and the query is cast in terms of the logical operation you need, i.e. quantification.
With practice, where exists gets easier, and will save you writing many more complicated queries.
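As a rough illustration of the existence-test idea, the sketch below uses a NOT EXISTS variant ("no other row for the same name has more likes") rather than the EXISTS ... HAVING form above, and runs it on SQLite from Python. The table `review_likes` and the extra tied row for rest4 are assumptions added for the example. Note what the test does and does not do: it keeps every row that ties for the maximum, and it does not limit the result to three rows.

```python
import sqlite3

ROWS = [
    (7, "rest1", 5), (10, "rest1", 3), (6, "rest1", 2),
    (2, "rest3", 2), (1, "rest2", 2), (5, "rest3", 1), (8, "rest4", 1),
    (9, "rest4", 1),  # hypothetical extra review that ties with rev_id 8
]

QUERY = """
WITH T AS (SELECT rev_id, name, likes FROM review_likes)
SELECT rev_id, name, likes
FROM T AS L
WHERE NOT EXISTS (              -- keep L only if nothing beats it within its name
    SELECT 1 FROM T AS other
    WHERE other.name = L.name AND other.likes > L.likes
)
ORDER BY likes DESC, rev_id     -- rev_id is only a deterministic tie-breaker
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_likes (rev_id INTEGER, name TEXT, likes INTEGER)")
conn.executemany("INSERT INTO review_likes VALUES (?, ?, ?)", ROWS)
winners = conn.execute(QUERY).fetchall()
conn.close()

for row in winners:
    print(row)
```

Both rest4 reviews come back because they tie at one like each. If exactly one row per name is required, ROW_NUMBER() or an explicit tie-breaker is still needed.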

Related

JOIN 2 tables ORDER BY SUM value

I have 2 tables: 1st is comment, 2nd is rating
SELECT * FROM comment_table a
INNER JOIN (SELECT comment_id, SUM(rating_value) AS total_rating FROM rating_table GROUP BY comment_id) b
ON a.comment_id = b.comment_id
ORDER BY b.total_rating DESC
I tried the SQL above, but it doesn't work!
The goal is to display a list of comments ordered by the total rating points of each comment.
SELECT s.* FROM (
SELECT * FROM comment_table a
INNER JOIN (SELECT comment_id, SUM(rating_value) AS total_rating FROM rating_table GROUP BY comment_id) b
ON a.comment_id = b.comment_id
) AS s
ORDER BY s.total_rating DESC
Nest it inside another SELECT. It will then output the data in the correct order.
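A quick way to check the ordering is to run both forms on a small data set. The sketch below does that with SQLite from Python; the `body` column and all sample rows are assumptions made up for the example, and the column list is spelled out instead of using `SELECT *` to avoid a duplicated `comment_id` column. On SQLite, at least, the direct join and the nested version return the same order, so the nesting may not be what fixes the problem on other engines either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment_table (comment_id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE rating_table (comment_id INTEGER, rating_value INTEGER);
INSERT INTO comment_table VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO rating_table VALUES (1, 2), (1, 1), (2, 5), (3, 4), (3, 3);
""")

# Join each comment to its summed rating and sort by that sum.
DIRECT = """
SELECT a.comment_id, a.body, b.total_rating
FROM comment_table a
INNER JOIN (SELECT comment_id, SUM(rating_value) AS total_rating
            FROM rating_table
            GROUP BY comment_id) b
        ON a.comment_id = b.comment_id
ORDER BY b.total_rating DESC
"""

# Same join wrapped in an outer SELECT, with the ORDER BY on the outside.
NESTED = """
SELECT s.comment_id, s.body, s.total_rating
FROM (SELECT a.comment_id, a.body, b.total_rating
      FROM comment_table a
      INNER JOIN (SELECT comment_id, SUM(rating_value) AS total_rating
                  FROM rating_table
                  GROUP BY comment_id) b
              ON a.comment_id = b.comment_id) AS s
ORDER BY s.total_rating DESC
"""

direct_rows = conn.execute(DIRECT).fetchall()
nested_rows = conn.execute(NESTED).fetchall()
conn.close()

print(direct_rows)
print(nested_rows)
```

Keep in mind that the INNER JOIN drops comments that have no ratings at all.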

Return only the highest-valued row

I'm trying to find a way to return only the highest-valued row from a SQL query.
I have a query that joins two tables and then counts how many times each id matches between them (within 'athlete' the id column is unique).
SELECT t.athlete_id, count(a.id) as 'Number of activities' FROM training_session t
INNER JOIN athlete a ON t.athlete_id = a.id
WHERE t.athlete_id = a.id
GROUP BY a.id
The following table is returned
athlete_id  Number of activities
1           4
2           1
3           1
4           1
5           1
6           1
The problem is that I only want to return the row with the highest number of activities. According to the table above, this should be athlete_id = 1, since it has the most activities.
I would appreciate some pointers on how I could improve my query to return only that row.
Use ORDER BY and LIMIT:
SELECT t.athlete_id, count(*) as `Number of activities`
FROM training_session t INNER JOIN
athlete a
ON t.athlete_id = a.id
GROUP BY t.athlete_id
ORDER BY COUNT(*) DESC
LIMIT 1;
I don't think a JOIN is needed for this query:
SELECT t.athlete_id, COUNT(*) as `Number of activities`
FROM training_session t
GROUP BY t.athlete_id
ORDER BY COUNT(*) DESC
LIMIT 1;
And if you want all rows in the event of ties, then this requires a bit more work. I would recommend ranking functions:
SELECT *
FROM (SELECT t.athlete_id, COUNT(*) as `Number of activities`,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM training_session t
GROUP BY t.athlete_id
) t
WHERE seqnum = 1;
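The difference between LIMIT 1 and RANK() only shows up when there is a tie. Here is a small sketch on SQLite (3.25 or newer) from Python, under a few assumptions: the helper `top_athletes` is made up for the example, the tied data set is hypothetical, the counting is done in an inner subquery before ranking, and `athlete_id` is added to the ORDER BY purely to make the LIMIT 1 result deterministic.

```python
import sqlite3


def top_athletes(sessions):
    """Return (LIMIT 1 result, RANK() = 1 result) for a list of athlete ids,
    one list entry per training session."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_session (id INTEGER PRIMARY KEY, athlete_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO training_session (athlete_id) VALUES (?)",
        [(athlete_id,) for athlete_id in sessions],
    )
    limit_one = conn.execute("""
        SELECT athlete_id, COUNT(*) AS n
        FROM training_session
        GROUP BY athlete_id
        ORDER BY n DESC, athlete_id
        LIMIT 1
    """).fetchall()
    ranked = conn.execute("""
        SELECT athlete_id, n
        FROM (SELECT athlete_id, n, RANK() OVER (ORDER BY n DESC) AS seqnum
              FROM (SELECT athlete_id, COUNT(*) AS n
                    FROM training_session
                    GROUP BY athlete_id) counts) ranked
        WHERE seqnum = 1
        ORDER BY athlete_id
    """).fetchall()
    conn.close()
    return limit_one, ranked


question_data = [1, 1, 1, 1, 2, 3, 4, 5, 6]  # counts from the question's table
tied_data = question_data + [2, 2, 2]        # hypothetical: athlete 2 also reaches 4

print(top_athletes(question_data))
print(top_athletes(tied_data))
```

With the question's data both queries return athlete 1. With the tied data, LIMIT 1 still returns a single row, while the RANK() version returns athletes 1 and 2.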

Making a MAX() query from a subquery with COUNT()

I need to show the name of the boat that made the most trips, so I wrote a query that counts the trips:
SELECT B.IdBoat, COUNT(T.IdTrip)
FROM Trip T INNER JOIN Boat B ON T.IdBoat=B.IdBoat
GROUP BY B.IdBoat
Now I need to show the name of the boat with the most trips. How do I use that query as a subquery, using MAX rather than ORDER BY ... DESC and TOP 1?
Currently got:
SELECT B.Name
FROM Trip T INNER JOIN Boat B ON T.IdBoat=B.IdBoat
WHERE B.IdBoat = MAX( the sub query above)
also tried
SELECT B.Name, T.IdTrip
FROM Boat B INNER JOIN Trip T ON B.IdBoat=T.IdBoat
WHERE B.IdBoat IN (
SELECT MAX(T.NTrips) FROM
(SELECT B.IdBoat AS [IdBoat], COUNT(T.IdTrip) AS [NTrips]
FROM Trip T INNER JOIN Boat B ON B.IdBoat=T.IdBoat
GROUP BY B.Boat) T
GROUP BY T.IdBoat)
The above returned the full count of 3 against the boat's name instead of the correct 2.
I've tried googling and searching Stack Overflow and other sites for this problem, but I can't adapt the solutions I found to my query. Any help is appreciated.
Thank you.
edit 1. As asked, I'll provide some data as to help understand the problem better
Table Boat:
IdBoat | Name
1 | 'SS Sparrow'
2 | 'SS AndaNoMar'
Table Trip
IdTrip | IdBoat
1 | 1
2 | 1
3 | 2
Subquery 1 (COUNT)
IdBoat | NTrips
2 | 1
1 | 2
You can do:
with
x as (
select
b.idBoat,
b.Name,
count(*) as cnt
from trip t
join boat b on b.idBoat = t.idBoat
group by b.idBoat, b.Name
),
m as (
select max(cnt) as max_cnt from x
)
select
x.*
from x
join m on m.max_cnt = x.cnt
SELECT
B.IdBoat,
B.Name,
T.Trips
FROM
Boat AS B
INNER JOIN
(
SELECT
IdBoat,
COUNT(*) AS Trips,
RANK() OVER (ORDER BY COUNT(*) DESC) -- no PARTITION BY: rank across all boats
AS TripsRank
FROM
Trip
GROUP BY
IdBoat
)
AS T
ON T.IdBoat = B.IdBoat
WHERE
T.TripsRank = 1
A better method than either of the other two answers is to use ORDER BY:
SELECT TOP (1) B.IdBoat, B.Name, COUNT(T.IdTrip) as cnt
FROM Trip T INNER JOIN
Boat B
ON T.IdBoat = B.IdBoat
GROUP BY B.IdBoat, B.Name
ORDER BY cnt DESC;
There is no need for subqueries or CTEs or window functions.
If you want ties, then you can use TOP (1) WITH TIES.
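Since the question asks specifically for MAX, here is a minimal sketch of that route, run on SQLite from Python with the sample Boat and Trip rows from the question. It is a slight variation on the CTE answer above: the maximum is compared in a scalar subquery instead of being joined in as a second CTE. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boat (IdBoat INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Trip (IdTrip INTEGER PRIMARY KEY, IdBoat INTEGER);
INSERT INTO Boat VALUES (1, 'SS Sparrow'), (2, 'SS AndaNoMar');
INSERT INTO Trip VALUES (1, 1), (2, 1), (3, 2);
""")

# Count trips per boat once, then keep the boats whose count equals the maximum.
MAX_QUERY = """
WITH x AS (
    SELECT b.IdBoat, b.Name, COUNT(*) AS cnt
    FROM Trip t
    JOIN Boat b ON b.IdBoat = t.IdBoat
    GROUP BY b.IdBoat, b.Name
)
SELECT Name, cnt
FROM x
WHERE cnt = (SELECT MAX(cnt) FROM x)
ORDER BY Name
"""

busiest = conn.execute(MAX_QUERY).fetchall()
conn.close()

print(busiest)
```

If several boats share the maximum, all of them are returned, which is the same behaviour as TOP (1) WITH TIES.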

Display the city name which has most number of branches

I am trying to get the name of the city that has the most branches.
select C.City_name ,count(B.B_Name)
from tblcity C
inner join
tblBranch B
on c.city_id=B.City_id
group by C.City_name
order by count(B.B_Name) desc
The code above gives me the count of branches for each city.
Please help me get the name of the city that has the most branches.
You can add TOP 1 to your query:
select TOP 1 C.City_name ,count(B.B_Name)
from tblcity C
inner join
tblBranch B
on c.city_id=B.City_id
group by C.City_name
order by count(B.B_Name) desc
Use DENSE_RANK():
SELECT
City_Name, cnt
FROM
(
SELECT
c.City_name,
COUNT(b.B_Name) cnt,
DENSE_RANK() OVER (ORDER BY COUNT(b.B_Name) DESC) dr
FROM tblcity c
INNER JOIN tblBranch b
ON b.city_id = c.City_id
GROUP BY c.City_name
) t
WHERE dr = 1;
Using TOP 1 WITH TIES would be another option here, but that is specific to SQL Server.

Select all threads and order by the latest one

Now that my question "Select all forums and get latest post too.. how?" has been answered, I am trying to write a query to select all threads in one particular forum and order them by the date of the latest post (column "updated_at").
This is my structure again:
forums                     forum_threads              forum_posts
-------------------------  -------------------------  -----------
id                         id                         id
parent_forum (NULLABLE)    forum_id                   content
name                       user_id                    thread_id
description                title                      user_id
icon                       views                      updated_at
                           created_at                 created_at
                           updated_at
                           last_post_id (NULLABLE)
I tried writing this query, and it works.. but not as expected: It doesn't order the threads by their last post date:
SELECT DISTINCT ON(t.id) t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC;
How can I solve this one?
I assume you want a single row per thread, not one row for every post.
DISTINCT ON is still the most convenient tool. But the leading ORDER BY items have to match the expressions of the DISTINCT ON clause. If you want to order the result some other way, you need to wrap it into a subquery and add another ORDER BY to the outer query:
SELECT *
FROM (
SELECT DISTINCT ON (t.id)
t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC
) sub
ORDER BY updated_at DESC;
If you are looking for a query without subquery for some unknown reason, this should work, too:
SELECT DISTINCT
t.id
, first_value(u.username) OVER w AS username
, first_value(p.updated_at) OVER w AS updated_at
, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
ORDER BY updated_at DESC;
There is quite a bit going on here:
1. The tables are joined and rows are selected according to JOIN and WHERE clauses.
2. The two instances of the window function first_value() are run (on the same window definition) to retrieve username and updated_at from the latest post per thread. This results in as many identical rows as there are posts in the thread.
3. The DISTINCT step is executed after the window functions and reduces each set to a single instance.
4. ORDER BY is applied last and updated_at references the OUT column (SELECT list), not one of the two IN columns (FROM list) of the same name.
Yet another variant, a subquery with the window function row_number():
SELECT id, username, updated_at, title
FROM (
SELECT t.id
, u.username
, p.updated_at
, t.title
, row_number() OVER (PARTITION BY t.id
ORDER BY p.updated_at DESC) AS rn
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
) sub
WHERE rn = 1
ORDER BY updated_at DESC;
Similar case:
Return records distinct on one column but order by another column
You'll have to test which is faster. Depends on a couple of circumstances.
Forget the DISTINCT ON (note that this returns one row per post rather than one row per thread):
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC;
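DISTINCT ON is PostgreSQL-specific, but the row_number() variant and the plain join above are portable enough to try on SQLite (3.25 or newer) from Python. The sketch below does so under some assumptions: the tables are trimmed to the columns the queries use, and the users, thread titles and dates are made up. Every thread in the sample has at least one post, because engines differ on where NULL sorts under ORDER BY ... DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE forum_threads (id INTEGER PRIMARY KEY, forum_id INTEGER, title TEXT);
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, thread_id INTEGER,
                          user_id INTEGER, updated_at TEXT);
INSERT INTO users VALUES (10, 'alice'), (11, 'bob');
INSERT INTO forum_threads VALUES (1, 3, 'Intro'), (2, 3, 'Rules'), (3, 4, 'Elsewhere');
INSERT INTO forum_posts VALUES
    (1, 1, 10, '2024-01-01'), (2, 1, 11, '2024-01-05'),
    (3, 2, 10, '2024-01-03'), (4, 3, 11, '2024-01-09');
""")

# One row per thread: number each thread's posts newest-first, keep number 1,
# then sort the surviving rows by that latest post date.
LATEST_PER_THREAD = """
SELECT id, username, updated_at, title
FROM (SELECT t.id, u.username, p.updated_at, t.title,
             row_number() OVER (PARTITION BY t.id
                                ORDER BY p.updated_at DESC) AS rn
      FROM forum_threads t
      LEFT JOIN forum_posts p ON p.thread_id = t.id
      LEFT JOIN users u ON u.id = p.user_id
      WHERE t.forum_id = 3) sub
WHERE rn = 1
ORDER BY updated_at DESC
"""

# The plain join without any de-duplication: one row per post.
PER_POST = """
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC
"""

latest = conn.execute(LATEST_PER_THREAD).fetchall()
per_post = conn.execute(PER_POST).fetchall()
conn.close()

print(latest)
print([row[0] for row in per_post])  # thread ids, one entry per post
```

The first query lists each thread of forum 3 once, most recently active first. The second lists thread 1 twice because it has two posts.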