hive select max count by grouping on two fields - sql

I am trying to write a sql query to find Most Popular Artist in each Country. Popular artist is one which has maximum number of rating>=8
Below is table structure,
describe album;
albumid string
album_title string
album_artist string`
describe album_ratings;
userid int
albumid string
rating int
describe cusers;
userid int
state string
country string
Below is one query that I wrote but it is not working.
select album_artist, country, count(rating)
from album, album_ratings, cusers
where album.albumid=album_ratings.albumid
and album_ratings.userid=cusers.userid
and rating>=6
group by country, album_artist
having count(rating) = (
select max(t.cnt)
from (
select count(rating) as cnt
from album, album_ratings, cusers
where album.albumid=album_ratings.albumid
and album_ratings.userid=cusers.userid
and rating>=6
group by country, album_artist
) as t
group by t.country
);

Learn to use proper, explicit JOIN syntax. Never use commas in the FROM clause.
You can do this with window functions:
select *
from (select album_artist, country, count(*) as cnt,
row_number() over (partition by country order by count(*) desc) as seqnum
from album a join
album_ratings ar join
on a.albumid = ar.albumid
cusers u
on ar.userid = u.userid
where rating >= 6
group by country, album_artist
) aru
where seqnum = 1;
If you want ties, use rank() instead of row_number().

You can use window function row_number to find most popular artist in each country (higher rating - more popular):
select *
from (
select c.country,
a.album_artist,
sum(rating) as total_rating,
row_number() over (partition by c.country order by sum(rating) desc) as rn
from cusers c
join album_ratings r on c.userid = r.userid
join album a on r.albumid = a.albumid
where r.rating >= 8
group by c.country,
a.album_artist
) t
where rn = 1;
I assumed sum(rating) instead, because I think rating should be additive.
Also, always use explicit join syntax instead of old comma based join.

Related

For each country, report the movie genre with the highest average rates

For each country, report the movie genre with the highest average ratings, and I am missing only one step that i cant figure it out.
Here's my current code:
SELECT c.code AS c_CODE, menres.genre AS GENRE, AVG(RATE) as AVERAGE_rate,MAX(RATE) AS MAXIMUM_rate, MIN(RATE) AS MINIMUM_rate from movirates
leftJOIN movgenres ON movgenres.movieid = movratings.movieid
left JOIN users ON users.userid = movrates.userid
left JOIN c ON c.code = users.city
LEFT JOIN menres ON movenres.genreid = menres.code
GROUP BY menres.genre , c.code
order by c.code asc, avg(rate) desc, menres.genre desc ;
You can use the ROW_NUMBER window function to assign a unique rank to each of your rows:
partitioned by country code
ordered by descendent average rating
Once you get this ranking, you may want to select all those rows which have the highest average rating (which are the same having the ranking equal to 1).
WITH cte AS (
SELECT c.code AS COUNTRY_CODE,
mg.genre AS GENRE,
AVG(rating) AS AVERAGE_RATING,
MAX(rating) AS MAXIMUM_RATING,
MIN(RATING) AS MINIMUM_RATING
FROM moviesratings r
INNER JOIN moviesgenres g ON g.movieid = r.movieid
INNER JOIN users u ON u.userid = r.userid
INNER JOIN countries c ON c.code = u.country
LEFT JOIN mGenres mg ON mg.code = g.genreid
GROUP BY mg.genre,
c.code
ORDER BY c.code,
AVG(rating) DESC,
mg.genre DESC;
)
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER(
PARTITION BY COUNTRY_CODE,
ORDER BY AVERAGE_RATING) AS rn
FROM cte) ranked_averages
WHERE rn = 1
Note: The code inside the common table expression is equivalent to yours. If you're willing to share your input tables, I may even suggest an improved query.
You should use window function in this case by using rank() then select the first rank only.
with mov_rates(c.code, genre, average, max, min)
as.
select c.code c_code,
e.genre genre,
avg (rate) avg
max (rate) max
min (rate) min
from movrates a
LEFT join movge.nres b on a.movieid = b.movieid
LEFT join users c on a.userid = c.user
LEFT join countr.ies d on c.code = d.code
left join mGenres e on b.genreid = e.code
group by d.country_code, e.x
),
rategenre (rank, c_code, genre, avgrate, max, min)
as
(
select rank() over (partition by c.c order by avgrates asc) rank,
country code,
genre,
average_r.ating,
maximum_rating,
minimum_.ating
from movrate \\just practicing on something
)
selec.t 2
from genre
where rank = 5
Reference:
OVER Clause

Issue with getting the rank of a user based on combined columns in a join table

I have a users table and each user has flights in a flights table. Each flight has a departure and an arrival airport relationship within an airports table. What I need to do is count up the unique airports across both departure and arrival columns (flights.departure_airport_id and flights.arrival_airport_id) for each user, and then assign them a rank via dense_rank and then retrieve the rank for a given user id.
Basically, I need to order all users according to how many unique airports they have flown to or from and then get the rank for a certain user.
Here's what I have so far:
SELECT u.rank FROM (
SELECT
users.id,
dense_rank () OVER (ORDER BY count(DISTINCT (flights.departure_airport_id, flights.arrival_airport_id)) DESC) AS rank
FROM users
LEFT JOIN flights ON users.id = flights.user_id
GROUP BY users.id
) AS u WHERE u.id = 'uuid';
This works, but does not actually return the desired result as count(DISTINCT (flights.departure_airport_id, flights.arrival_airport_id)) counts the combined airport ids and not each unique airport id separately. That's how I understand it works, anyway... I'm guessing that I somehow need to use a UNION join on the airport id columns but can't figure out how to do that.
I'm on Postgres 13.0.
I would recommend a lateral join to unpivot, then aggregation and ranking:
select *
from (
select f.user_id,
dense_rank() over(order by count(distinct a.airport_id) desc) rn
from flights f
cross join lateral (values
(f.departure_airport_id), (f.arrival_airport_id)
) a(airport_id)
group by f.user_id
) t
where user_id = 'uuid'
You don't really need the users table for what you want, unless you do want to allow users without any flight (they would all have the same, highest rank). If so:
select *
from (
select u.id,
dense_rank() over(order by count(distinct a.airport_id) desc) rn
from users u
left join flights f on f.user_id = u.id
left join lateral (values
(f.departure_airport_id), (f.arrival_airport_id)
) a(airport_id) on true
group by u.id
) t
where id = 'uuid'
You're counting the distinct pairs of (departure_airport_id, arrival_airpot_id). As you suggested, you could use union to get a single column of airport IDs (regardless of whether they are departure or arrival airports), and then apply a count on them:
SELECT user_id, DENSE_RANK() OVER (ORDER BY cnt DESC) AS user_rank
FROM (SELECT u.id AS user_id, COALESCE(cnt, 0) AS cnt
FROM users u
LEFT JOIN (SELECT user_id, COUNT DISTINCT(airport_id) AS cnt
FROM (SELECT user_id, departure_airport_id AS airport_id
FROM flights
UNION
SELECT user_id, arrival_airport_id AS airport_id
FROM flights) x
GROUP BY u.id) f ON u.id = f.user_id) t

Selecting rows with the most repeated values at specific column

Problem in general words: I need to select value from one table referenced to the most repeated values in another table.
Tables have this structure:
screenshot
screenshot2
The question is to find country which has the most results from sportsmen related to it.
First, INNER JOIN tables to have relation between result and country
SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id);
Then, I count how much time each country appear
SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id))
GROUP BY country
;
And got this screenshot3
Now it feels like I'm one step away from solution ))
I guess it's possible with one more SELECT FROM (SELECT ...) and MAX() but I can't wrap it up?
ps:
I did it with doubling the query like this but I feel like it's so inefficient if there are millions of rows.
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
)
WHERE highest_participation = (SELECT MAX(highest_participation)
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
))
Also I did it with a view
CREATE VIEW temp AS
SELECT country as country_with_most_participations, COUNT(country) as country_participate_in_#_comp
FROM(
SELECT country, competition_id FROM result
INNER JOIN sportsman USING(sportsman_id)
)
GROUP BY country;
SELECT country_with_most_participations FROM temp
WHERE country_participate_in_#_comp = (SELECT MAX(country_participate_in_#_comp) FROM temp);
But not sure if it's easiest way.
If I understand this correctly you want to rank the countries per competition count and show the highest ranking country (or countries) with their count. I suggest you use RANK for the ranking.
select country, competition_count
from
(
select
s.country,
count(*) as competition_count,
rank() over (order by count(*) desc) as rn
from sportsman s
inner join result r using (sportsman_id)
group by s.country
) ranked_by_count
where rn = 1
order by country;
If the order of the result rows doesn't matter, you can shorten this to:
select s.country, count(*) as competition_count
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by count(*) desc
fetch first rows with ties;
You seem to be overcomplicating this. Starting from your existing join query, you can aggregate, order the results and keep the top row(s) only.
select s.country, count(*) cnt
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by cnt desc
fetch first 1 row with ties
Note that this allows top ties, if any.
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
order by 2 desc
)
where rownum=1

oracle - maximum per group

University Table - UniversityName, UniversityId
Lease Table - LeaseId, BookId, UniversityId, LeaseDate
Book Table - BookId, UniversityId, Category, PageCount.
For each university, I have to find category that had the most number of books leased.
So, something like
UniversityName Category #OfTimesLeased
I have been playing around with it with some success using Dense_Rank etc - but if there is a tie, only one of them shows up, while I want both of them to show up.
Current Query:
select b.UniversityId, MAX(tempTable.type) KEEP (DENSE_RANK FIRST ORDER BY tempTable.counter DESC)
from book b
join
(select count(l.leaseid) AS counter, b.category, b.universityid
from lease l
join book b
on b.bookid =l.bookid AND b.universityid=r.universityid
group by b.category, b.universityid) tempTable
on counterTable.universityid= b.universityid
group by b.universityid
^Unable to solve the tie issue and get the number of leases for the most leased book type.
Try this
WITH CTE AS
(
SELECT UniversityName, Category, Count(*) NumOfTimesLeased
FROM University u
INNER JOIN Book b on u.UniversityId = b.UniversityId
INNER JOIN Lease l on b.bookid = l.bookid and b.UniversityId = l.UniversityId
GROUP BY UniversityName, Category
),
CTE2 AS (
SELECT UniversityName, Category, NumOfTimesLeased,
RANK() OVER (PARTITION BY UniversityName
ORDER BY NumOfTimesLeased DESC) Rnk
FROM CTE)
SELECT * FROM CTE2 WHERE Rnk = 1
You are on the right track with the analytic functions:
select Univerity, Category, NumLeased
from (select t.*,
row_number() over (partition by university order by Numleased desc) as seqnum
from (select l.university, b.category, count(*) as NumLeased
from lease l join
book b
on l.bookid = b.bookid
group by l.university, b.category
) t
) t
where seqnum = 1
I use the row_number() because you only want the one top value. Rank and dense_rank are more useful when you are looking for values other than "1".
If you want the top values to show up when there is a tie, then use dense_rank instead of row_number. The values will be on different rows.

Help with SQL QUERY OF JOIN+COUNT+MAX

I need a help constructung an sql query for mysql database. 2 Table as follows:
tblcities (id,name)
tblmembers(id,name,city_id)
Now I want to retrieve the 'city' details that has maximum number of 'members'.
Regards
SELECT tblcities.id, tblcities.name, COUNT(tblmembers.id) AS member_count
FROM tblcities
LEFT JOIN tblmembers ON tblcities.id = tblmembers.city_id
GROUP BY tblcities.id
ORDER BY member_count DESC
LIMIT 1
Basically: retrieve all cities and count how many members each has, sort by that member count in descending order, making the highest count first - then show only that first city.
Terrible, but that's a way of doing it:
SELECT * FROM tblcities WHERE id IN (
SELECT city_id
FROM tblMembers
GROUP BY city_id
HAVING COUNT(*) = (
SELECT MAX(TOTAL)
FROM (
SELECT COUNT(*) AS TOTAL
FROM tblMembers
GROUP BY city_id
) AS AUX
)
)
That way, if there is a tie, still you'll get all cities with the maximum number of members...
Select ...
From tblCities As C
Join (
Select city_id, Count(*) As MemberCount
From tblMembers
Order By Count(*) Desc
Limit 1
) As MostMembers
On MostMembers.city_id = C.id
select top 1 c.id, c.name, count(*)
from tblcities c, tblmembers m
where c.id = m.city_id
group by c.id, c.name
order by count(*) desc