How to find the second largest value within certain row? - sql

I have two tables, movies(id, name, year, rating) and movies_genres(movie_id, genre). I want to find the ids of the second-highest-rated movies within each genre (not globally), but when I wrote this:
select MG.genre, M.id
from movies_genres MG inner join movies M on MG.movie_id = M.id
where M.rating =
(select max(rating) from
(select rating from movies M2 inner join movies_genres MG2 on M2.id = MG2.movie_id where MG2.genre = MG.genre)
where rating <
(select max(rating) from
(select rating from movies M3 inner join movies_genres MG3 on M3.id = MG3.movie_id where MG3.genre = MG.genre)))
order by MG.genre;
I got an error saying that MG.genre in line 5 is an invalid identifier.

If your database supports window functions (MySQL only does from version 8.0), you can use the row_number window function to get the second highest rated movie for each genre.
select MG.genre, M.id
from movies_genres MG
inner join movies M on MG.movie_id = M.id
inner join (select m.*,mg.*,
row_number() over(partition by mg.genre order by m.rating desc) as rn
from movies m inner join movies_genres mg on mg.movie_id = M.id) x
on x.id = m.id and mg.genre = x.genre
where x.rn = 2;
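A minimal sketch of the window-function approach, run against an in-memory SQLite database from Python (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The sample rows are made up for illustration. It also shows one thing worth knowing: with ties, ROW_NUMBER picks a single arbitrary row per genre, while DENSE_RANK returns every movie sharing the second-highest rating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, year INTEGER, rating REAL);
    CREATE TABLE movies_genres (movie_id INTEGER, genre TEXT);
    -- Hypothetical sample data: movies 2 and 3 tie for second place in 'drama'.
    INSERT INTO movies VALUES
        (1, 'A', 2000, 9.0), (2, 'B', 2001, 8.0),
        (3, 'C', 2002, 8.0), (4, 'D', 2003, 7.0);
    INSERT INTO movies_genres VALUES
        (1, 'drama'), (2, 'drama'), (3, 'drama'), (4, 'drama'),
        (2, 'comedy'), (4, 'comedy');
""")

# Same query shape for both ranking functions; only the function name changes.
QUERY = """
    SELECT genre, id
    FROM (SELECT mg.genre, m.id,
                 {rank_fn}() OVER (PARTITION BY mg.genre
                                   ORDER BY m.rating DESC) AS rn
          FROM movies m
          JOIN movies_genres mg ON mg.movie_id = m.id) x
    WHERE rn = 2
    ORDER BY genre, id
"""

# ROW_NUMBER: exactly one row per genre, even when ratings tie.
with_row_number = conn.execute(QUERY.format(rank_fn="ROW_NUMBER")).fetchall()
# DENSE_RANK: every movie that has the second-highest rating in its genre.
with_dense_rank = conn.execute(QUERY.format(rank_fn="DENSE_RANK")).fetchall()

print(with_row_number)
print(with_dense_rank)
```

If "all second top rated movies" is meant literally, DENSE_RANK is probably the closer match.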

You don't have to make a reference to the outer genre, only to the next select level like this:
Select MG.genre, M.id
From movies_genres MG, movies M
Where M.id = MG.movie_id
And M.rating = (Select max(rating)
From movies M2, movies_genres MG2
Where M2.id = MG2.movie_id
And MG2.genre = MG.genre
And rating < (Select max(rating)
From movies M3, movies_genres MG3
Where M3.id = MG3.movie_id
And MG3.genre = MG2.genre)
)
order by MG.genre;
Edited: oopsie, I've missed one reference to genre: And MG2.genre = MG.genre. Just added.
How does it work? The innermost select returns the max rating of the genre MG2. The middle select returns the highest rating in the outer genre MG that is lower than that maximum, i.e. the second highest rating. The outer select lists all movie ids (and genres) with exactly this rating.
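The three levels described above can be tried out with a small sketch in Python and SQLite. The sample rows are invented, and the query uses explicit joins instead of comma joins, but each subquery still refers only to the level directly above it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, year INTEGER, rating REAL);
    CREATE TABLE movies_genres (movie_id INTEGER, genre TEXT);
    -- Hypothetical sample data.
    INSERT INTO movies VALUES
        (1, 'A', 2000, 9.0), (2, 'B', 2001, 8.0),
        (3, 'C', 2002, 8.0), (4, 'D', 2003, 7.0);
    INSERT INTO movies_genres VALUES
        (1, 'drama'), (2, 'drama'), (3, 'drama'), (4, 'drama'),
        (2, 'comedy'), (4, 'comedy');
""")

second_best = conn.execute("""
    SELECT mg.genre, m.id
    FROM movies_genres mg
    JOIN movies m ON m.id = mg.movie_id
    WHERE m.rating = (
        -- middle level: highest rating in this genre below the genre maximum
        SELECT MAX(m2.rating)
        FROM movies m2
        JOIN movies_genres mg2 ON mg2.movie_id = m2.id
        WHERE mg2.genre = mg.genre
          AND m2.rating < (
              -- innermost level: the genre maximum, correlated to mg2 only
              SELECT MAX(m3.rating)
              FROM movies m3
              JOIN movies_genres mg3 ON mg3.movie_id = m3.id
              WHERE mg3.genre = mg2.genre))
    ORDER BY mg.genre, m.id
""").fetchall()

print(second_best)
```

Because the outer query matches on the rating value, tied movies (ids 2 and 3 in 'drama' here) are all returned.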

Related

For each country, report the movie genre with the highest average rates

For each country, report the movie genre with the highest average rating. I am missing only one step that I can't figure out.
Here's my current code:
SELECT c.code AS c_CODE, menres.genre AS GENRE, AVG(RATE) as AVERAGE_rate, MAX(RATE) AS MAXIMUM_rate, MIN(RATE) AS MINIMUM_rate from movrates
LEFT JOIN movgenres ON movgenres.movieid = movrates.movieid
LEFT JOIN users ON users.userid = movrates.userid
LEFT JOIN c ON c.code = users.city
LEFT JOIN menres ON movgenres.genreid = menres.code
GROUP BY menres.genre, c.code
order by c.code asc, avg(rate) desc, menres.genre desc;
You can use the ROW_NUMBER window function to assign a unique rank to each of your rows:
partitioned by country code
ordered by descending average rating
Once you have this ranking, select the rows with the highest average rating, which are the ones whose rank equals 1.
WITH cte AS (
SELECT c.code AS COUNTRY_CODE,
mg.genre AS GENRE,
AVG(rating) AS AVERAGE_RATING,
MAX(rating) AS MAXIMUM_RATING,
MIN(rating) AS MINIMUM_RATING
FROM moviesratings r
INNER JOIN moviesgenres g ON g.movieid = r.movieid
INNER JOIN users u ON u.userid = r.userid
INNER JOIN countries c ON c.code = u.country
LEFT JOIN mGenres mg ON mg.code = g.genreid
GROUP BY mg.genre,
c.code
)
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER(
PARTITION BY COUNTRY_CODE
ORDER BY AVERAGE_RATING DESC) AS rn
FROM cte) ranked_averages
WHERE rn = 1
Note: The code inside the common table expression is equivalent to yours. If you're willing to share your input tables, I may even suggest an improved query.
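To see the ranking step in isolation, here is a small sketch in Python and SQLite (3.25 or newer for window functions). It assumes a single flattened table named ratings(country_code, genre, rating) with invented rows, standing in for the joined tables of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table replacing the joins of the real schema.
    CREATE TABLE ratings (country_code TEXT, genre TEXT, rating REAL);
    INSERT INTO ratings VALUES
        ('DE', 'comedy', 4.0), ('DE', 'comedy', 2.0),
        ('DE', 'drama', 5.0),  ('DE', 'drama', 4.0),
        ('US', 'comedy', 5.0), ('US', 'drama', 3.0);
""")

top_genres = conn.execute("""
    WITH averages AS (
        SELECT country_code, genre, AVG(rating) AS average_rating
        FROM ratings
        GROUP BY country_code, genre
    )
    SELECT country_code, genre, average_rating
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY country_code
                                    ORDER BY average_rating DESC) AS rn
          FROM averages) ranked
    WHERE rn = 1          -- keep only the best genre per country
    ORDER BY country_code
""").fetchall()

print(top_genres)
```

The descending order inside OVER is what makes rank 1 the highest average; with ascending order the same filter would return the lowest-rated genre instead.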
You should use a window function in this case: compute rank() and then select only the first rank.
with mov_rates (c_code, genre, avg_rate, max_rate, min_rate)
as
(
select d.code,
e.genre,
avg(a.rate),
max(a.rate),
min(a.rate)
from movrates a
left join movgenres b on a.movieid = b.movieid
left join users c on a.userid = c.userid
left join countries d on c.city = d.code
left join mGenres e on b.genreid = e.code
group by d.code, e.genre
),
rategenre (rnk, c_code, genre, avg_rate, max_rate, min_rate)
as
(
select rank() over (partition by c_code order by avg_rate desc),
c_code,
genre,
avg_rate,
max_rate,
min_rate
from mov_rates
)
select *
from rategenre
where rnk = 1
Reference:
OVER Clause

Find all the actors that made more movies with Yash Chopra than any other director

Schema
SELECT p1.pid,
p1.NAME,
Count(movie.mid) AS movieswithyc
FROM person AS p1 natural
JOIN m_cast natural
JOIN movie
JOIN m_director
ON (
movie.mid = m_director.mid)
JOIN person AS p2
ON (
m_director.pid = p2.pid)
WHERE p2.NAME LIKE 'Yash Chopra'
GROUP BY p1.pid
HAVING Count(movie.mid) >ALL
(
SELECT Count(movie.mid)
FROM person AS p3 natural
JOIN m_cast
INNER JOIN movie
JOIN m_director
ON (
movie.mid = m_director.mid)
JOIN person AS p4
ON (
m_director.pid = p4.pid)
where p1.pid = p3.pid
AND p4.NAME NOT LIKE 'Yash Chopra'
GROUP BY p4.pid)
ORDER BY movieswithyc DESC;
I'm not getting the right output; I'm getting zero rows. Can someone modify the above query so that it gives the right output? I have tried various queries without success.
Check this:
SELECT first.actor,
first.count
FROM (SELECT Trim(actor) AS Actor,
Count(*) AS COUNT
FROM m_cast mc
INNER JOIN (SELECT m.mid
FROM movie m) AS m
ON m.mid = Trim(mc.mid)
INNER JOIN (SELECT md.pid,
md.mid
FROM m_director md) AS md
ON md.mid = Trim(mc.mid)
INNER JOIN (SELECT p.pid,
p.NAME AS actor
FROM person p) AS pactor
ON pactor.pid = Trim(mc.pid)
INNER JOIN (SELECT p.pid,
p.NAME AS director
FROM person p) AS pdirector
ON pdirector.pid = Trim(md.pid)
WHERE director LIKE '%Yash Chopra%'
GROUP BY Trim(actor)) first
LEFT JOIN (SELECT actor,
Max(count) AS COUNT
FROM (SELECT DISTINCT Trim(actor) AS Actor,
Count(*) AS COUNT
FROM m_cast mc
INNER JOIN (SELECT m.mid
FROM movie m) AS m
ON m.mid = Trim(mc.mid)
INNER JOIN (SELECT md.pid,
md.mid
FROM m_director md) AS md
ON md.mid = Trim(mc.mid)
INNER JOIN (SELECT p.pid,
p.NAME AS actor
FROM person p) AS pactor
ON pactor.pid = Trim(mc.pid)
INNER JOIN (SELECT p.pid,
p.NAME AS director
FROM person p) AS pdirector
ON pdirector.pid = Trim(md.pid)
WHERE director NOT LIKE '%Yash Chopra%'
GROUP BY Trim(actor),
director)
GROUP BY actor) second
ON first.actor = second.actor
WHERE first.count >= second.count
OR second.actor IS NULL
ORDER BY first.count DESC
You can check the below SQL.
Explanation - The first inline view returns the list of people with the count of their movies directed by 'Yash Chopra'. The second inline view returns the list of people with the count of their movies with other directors. At the end, I keep only those people whose count of movies with 'Yash Chopra' is greater than their count with other directors.
select *
from (select pc.name, count(distinct m.mid) count_movie
from movie m
join m_cast mc on m.mid = mc.mid
join m_director md on m.mid = md.mid
join person pc on mc.pid = pc.pid
join person pd on md.pid = pd.pid
where pd.name = 'YASH CHOPRA'
group by pc.name) lst_yc
join
(select pc.name, count(m.mid) count_movie
from movie m
join m_cast mc on m.mid = mc.mid
join m_director md on m.mid = md.mid
join person pc on mc.pid = pc.pid
join person pd on md.pid = pd.pid
where pd.name != 'YASH CHOPRA'
group by pc.name) lst_wo
on lst_yc.name = lst_wo.name
where lst_yc.count_movie > lst_wo.count_movie
SELECT *
FROM (
SELECT pc.NAME,
Count(DISTINCT Trim(m.mid)) count_movie
FROM movie m
JOIN m_cast mc
ON Trim(m.mid) = Trim(mc.mid)
JOIN m_director md
ON Trim(m.mid) = Trim(md.mid)
JOIN person pc
ON Trim(mc.pid) = Trim(pc.pid)
JOIN person pd
ON trim(md.pid )= Trim(pd.pid) where pd.NAME = 'Yash Chopra' GROUP BY pc.NAME) lst_yc
JOIN
(
SELECT pc.NAME,
count(trim(m.mid)) count_movie
FROM movie m
JOIN m_cast mc
ON trim(m.mid) = trim(mc.mid )
JOIN m_director md
ON trim(m.mid) = trim(md.mid)
JOIN person pc
ON trim(mc.pid) = trim(pc.pid)
JOIN person pd
ON trim(md.pid) = trim(pd.pid)
WHERE pd.NAME != 'Yash Chopra'
GROUP BY pc.NAME) lst_wo
ON lst_yc.NAME = lst_wo.NAME
WHERE lst_yc.count_movie > lst_wo.count_movie
This seems to be the answer as given by Mr. Shantanu.
But do you know why this is taking so long? I ran the query an hour ago and no result has been produced yet.
p2.NAME LIKE 'Yash Chopra' and p1.PID
This is your line from the code.
You should have written it as TRIM(p2.NAME) and TRIM(p1.PID), because the NAME and PID values in your tables contain leading or trailing spaces. If you don't trim them, the comparisons fail and the query returns zero rows, so keep that in mind.
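A minimal sketch of why untrimmed values produce zero rows, using Python and an in-memory SQLite database. The pid and mid values are invented, and the stray spaces are placed deliberately to mimic the kind of padding described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (pid TEXT, name TEXT);
    CREATE TABLE m_director (mid TEXT, pid TEXT);
    -- Hypothetical rows with leading/trailing spaces in the keys and the name.
    INSERT INTO person VALUES (' nm001', ' Yash Chopra');
    INSERT INTO m_director VALUES ('tt001', 'nm001 ');
""")

# Plain comparison: ' nm001' and 'nm001 ' are different strings, so no match.
untrimmed = conn.execute("""
    SELECT COUNT(*)
    FROM m_director md
    JOIN person p ON md.pid = p.pid
    WHERE p.name LIKE 'Yash Chopra'
""").fetchone()[0]

# Trimming both sides of every comparison makes the rows line up.
trimmed = conn.execute("""
    SELECT COUNT(*)
    FROM m_director md
    JOIN person p ON TRIM(md.pid) = TRIM(p.pid)
    WHERE TRIM(p.name) LIKE 'Yash Chopra'
""").fetchone()[0]

print(untrimmed, trimmed)
```

Calling TRIM inside join conditions usually prevents index use, which may be one reason such queries run slowly on large tables; cleaning the columns once with an UPDATE is an alternative if you are allowed to modify the data.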
select t.actor,t.count from ( SELECT actor,count(distinct m.mid) as count
FROM m_cast mc
INNER JOIN (SELECT m.mid
FROM movie m) AS m
ON m.mid = Trim(mc.mid)
INNER JOIN (SELECT md.pid,
md.mid
FROM m_director md) AS md
ON md.mid = Trim(mc.mid)
INNER JOIN (SELECT p.pid,
p.NAME AS actor
FROM person p) AS pactor
ON pactor.pid = Trim(mc.pid)
INNER JOIN (SELECT p.pid,
p.NAME AS director
FROM person p) AS pdirector
ON pdirector.pid = Trim(md.pid)
WHERE director LIKE '%Yash Chopra%'
--and actor like '%Uttam Sodi%'
group by actor) as t
join( SELECT actor,count(distinct m.mid) as count
FROM m_cast mc
INNER JOIN (SELECT m.mid
FROM movie m) AS m
ON m.mid = Trim(mc.mid)
INNER JOIN (SELECT md.pid,
md.mid
FROM m_director md) AS md
ON md.mid = Trim(mc.mid)
INNER JOIN (SELECT p.pid,
p.NAME AS actor
FROM person p) AS pactor
ON pactor.pid = Trim(mc.pid)
INNER JOIN (SELECT p.pid,
p.NAME AS director
FROM person p) AS pdirector
ON pdirector.pid = Trim(md.pid)
WHERE director not LIKE '%Yash Chopra%'
group by actor) as w
where t.actor=w.actor and t.count>=w.count
Hey folks who are new to SQL and trying very hard to solve this question like me: you can find part of a solution (99%) below, as I don't want to interrupt your process of learning. But before going through it, try one last time. I am thankful to the people who have discussed their thoughts on this question in this forum, as they have triggered various ideas in me.
Before going through the solution you can have a look at this video to get an overview of the new keywords used in the code below.
Disclaimer: use trim wherever required.
select actor,movies from
( select mc.pid as actor,
md.pid as director,
p.pid,
count(*) as movies,
rank() over (partition by mc.pid order by count(*) desc) as rn,
p.name
from m_director as md
join
m_cast as mc on md.mid=mc.mid
left join
person as p on md.pid=p.pid and name = 'Yash Chopra'
group by mc.pid,md.pid
) ranked
where rn = 1 and director = 'nm0007181';
Exact solution: to get the exact result, join the output of the query above with the person table to get the names of the actors who were directed by Yash Chopra more often than by any other director.
paila saisravan - data digger
select p.name,h.count
from(select mc.pid as mcpid,md.pid as mdpid,count(mc.MID) as count
from m_cast as mc
join m_director md
on md.MID=mc.MID
group by mc.pid ,md.pid
) h
join person p
on h.mcpid=p.pid
where h.count = (select count(*) as count
from m_cast as mc
join m_director md
on md.mid=mc.mid
where mc.pid=h.mcpid
group by mc.pid,md.pid
order by count(*) desc
limit 1)
and h.mdpid = (select pid
from person
where name like '%Yash Chopra%'
)
order by h.count desc

Sql query returning empty table

I am trying to solve 2 queries
Find all the actors that made more movies with Yash Chopra than any other director
Select b.number, b.actor, b.director
from (select MAX(a.count) as number, a.director, a.actor
from (select count(p.PID) as count, p.PID as actor, md.PID as director
from person as p
left join m_cast as mc on trim(p.PID) = trim(mc.PID)
inner join m_director as md on trim(md.MID) = trim(mc.MID)
group by md.PID, p.PID) as a
group by a.actor) as b
where b.director = (select PID from person where Name = 'Yash Chopra')
Report for each year the percentage of movies in that year with only female actors, and the total number of movies made that year. For example, one answer will be: 1990 31.81 13522, meaning that in 1990 there were 13,522 movies, and 31.81% had only female actors. You do not need to round your answer.
SELECT female_count.year Year,
((female_count.Total_movies_with_only_female_leads)*100)/total_count.Total Percentage
FROM ((SELECT movie.year Year,
count(*) Total_movies_with_only_female_leads
FROM movie
WHERE NOT EXISTS (SELECT *
FROM M_Cast, person
WHERE M_Cast.mid = movie.MID
AND M_Cast.PID = person.PID
AND person.gender = 'Male')
GROUP BY movie.year) female_count,
(SELECT movie.year, count(*) as Total
FROM movie
GROUP BY movie.year) total_count)
WHERE female_count.year = total_count.year
Unfortunately, both queries return an empty table. Can someone help me solve these 2 queries?
I wrote it using CTEs so it is more readable.
First Question:
WITH HowManyMoviesPerActorDirector AS
(select mc.pid as actorpid
,pa.name as actorname
,md.pid as directorpid
,pd.name as directorname
,count(mc.MID) as numberofmovies
from m_cast as mc
inner join m_director md on md.MID=mc.MID
inner join person pa ON mc.PID=pa.PID
inner join person pd ON md.PID=pd.PID
group by mc.pid,pa.name,md.pid,pd.name
)
select h.actorname
,h.directorname
,h.numberofmovies
from HowManyMoviesPerActorDirector h
WHERE h.numberofmovies = (select MAX(h2.numberofmovies)
from HowManyMoviesPerActorDirector h2
where h2.actorpid=h.actorpid
group by h2.actorpid)
AND h.directorname='Yash Chopra'
The second one:
WITH MoviesIncludingGenderFlag AS
( select m.mid
,m.year
,sum(case when p.gender='Female' then 0 else 1 end) as genderflag
from movie m
inner join m_cast mc on mc.mid=m.mid
inner join person p on p.pid=mc.pid
group by m.mid,m.year
), FemaleOnlyMovies AS
( select m.year,count(m.mid) as Total
from MoviesIncludingGenderFlag m
where genderflag=0
group by m.year
), TotalMovies AS
(
select m.year,count(m.mid) as Total
from movie m
group by m.year
)
select TM.year,TM.Total,(COALESCE(FOM.Total,0)*100.0/TM.Total) as percentage
from TotalMovies TM
left join FemaleOnlyMovies FOM ON FOM.year=TM.year
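A small sketch of the second query's idea in Python and SQLite, with invented rows. It assumes gender is stored as 'Female' and 'Male' (the capitalisation used in the question); the CTE names are shortened for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mid TEXT, year INTEGER);
    CREATE TABLE person (pid TEXT, gender TEXT);
    CREATE TABLE m_cast (mid TEXT, pid TEXT);
    -- Hypothetical data: m1 has an all-female cast, m2 is mixed, m3 is all male.
    INSERT INTO movie VALUES ('m1', 1990), ('m2', 1990), ('m3', 1991);
    INSERT INTO person VALUES ('p1', 'Female'), ('p2', 'Male');
    INSERT INTO m_cast VALUES ('m1', 'p1'), ('m2', 'p1'), ('m2', 'p2'), ('m3', 'p2');
""")

report = conn.execute("""
    WITH non_female_counts AS (
        -- per movie: how many cast members are not female
        SELECT m.mid, m.year,
               SUM(CASE WHEN p.gender = 'Female' THEN 0 ELSE 1 END) AS non_female
        FROM movie m
        JOIN m_cast mc ON mc.mid = m.mid
        JOIN person p ON p.pid = mc.pid
        GROUP BY m.mid, m.year
    ),
    female_only AS (
        SELECT year, COUNT(*) AS total
        FROM non_female_counts
        WHERE non_female = 0
        GROUP BY year
    ),
    totals AS (
        SELECT year, COUNT(*) AS total FROM movie GROUP BY year
    )
    SELECT t.year,
           COALESCE(f.total, 0) * 100.0 / t.total AS percentage,
           t.total
    FROM totals t
    LEFT JOIN female_only f ON f.year = t.year
    ORDER BY t.year
""").fetchall()

print(report)
```

The LEFT JOIN with COALESCE keeps years that have no female-only movie, and multiplying by 100.0 avoids integer division.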

Remove duplicate rows from answer of below query

List all directors who directed 10 movies or more, in descending order of the number of movies they directed.
Using DISTINCT before d.name does not help.
result = pd.read_sql_query("""SELECT d.name, count(*) as num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.Pid, d.name
HAVING COUNT(*) >= 10
order by count(*) desc
""", conn)
You must use proper explicit joins between the tables and count on distinct movies:
select
p.name,
count(distinct d.mid) num
from person p
inner join m_director d on d.pid = p.pid
inner join movie m on m.mid = d.mid
group by p.pid, p.name
having num >= 10
order by num desc
You probably have duplicate records in the Person table: people with the same name but different ids. Try grouping just by name and not by id:
result = pd.read_sql_query("""SELECT d.name, count(*) as num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.name
HAVING COUNT(*) >= 10
order by count(*) desc
""", conn)

Shorten a query

I have to write a query that calculates the number of tickets purchased that consist only of movies of a single genre. At the end, I have to return each movie genre and the number of tickets bought for that genre. I have written a query, but I was wondering if it can be made shorter and more compact.
The database schema is as follows:
movies(movieId, movieGenre, moviePrice)
tickets(ticketId, ticketDate, customerId)
details(ticketId, movieId, numOfTickets)
Here is my query:
select m.genre, count(*)
from(select t.ticketId, m.genre
from(select d.ticketId
from(select m.genre, t.ticketId
from tickets t join details d on t.ticketId =
d.ticketId join movies m on d.movieId = m.movieId
group by m.genre, t.ticketId) d
group by d.ticketId
having count(*) = 1) as t join details d on t.ticketId =
d.ticketId join movies m on d.movieId = m.movieId
group by t.ticketId, m.genre) m
group by m.genre;
This runs on a database so I am only able to post sample output:
comedy 29821
action 27857
rom-com 19663
I see no reason to use the tickets table, because the results are neither filtered nor aggregated by ticketDate or customerId. Thus, a shorter query is:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM details d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
HAVING SumNum > 0
ORDER BY m.moviegenre
added 3/28 am
I am not sure what is meant by duplicates in the table details(ticketId, movieId, numOfTickets).
I would expect ticketId to be unique, so what would explain duplicates?
Is the same ticketId being recorded more than once?
Determine which ticketId values are duplicated:
SELECT ticketId, count(*) as cnt
FROM details d
GROUP By ticketId
HAVING count(*) > 1
Determine which "details" rows are duplicated:
SELECT ticketId, movieId, numOfTickets, count(*) as cnt
FROM details d
GROUP By ticketId, movieId, numOfTickets
HAVING count(*) > 1
Then again, it may be that the table movies(movieId, movieGenre, moviePrice) is the one with duplicates.
Determine which movieId values are duplicated:
SELECT movieId, count(*) as cnt
FROM movies m
GROUP BY movieId
HAVING count(*) > 1
Remove duplicates from details:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM
(Select Distinct * From details) d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
ORDER BY m.moviegenre
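If the requirement from the question is read literally (count only tickets whose movies all belong to one genre), one possible shorter form is a single grouped subquery with HAVING COUNT(DISTINCT ...) = 1. This is a sketch in Python and SQLite under that reading, with invented rows and the column names from the posted schema; it is not a drop-in replacement for the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movieId INTEGER, movieGenre TEXT, moviePrice REAL);
    CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER);
    -- Hypothetical data: ticket 100 is comedy only, 101 mixes genres, 102 is action only.
    INSERT INTO movies VALUES (1, 'comedy', 10), (2, 'comedy', 12), (3, 'action', 11);
    INSERT INTO details VALUES
        (100, 1, 2), (100, 2, 1),
        (101, 1, 1), (101, 3, 1),
        (102, 3, 4);
""")

single_genre = conn.execute("""
    SELECT genre, COUNT(*) AS tickets
    FROM (SELECT d.ticketId,
                 MIN(m.movieGenre) AS genre   -- the only genre on this ticket
          FROM details d
          JOIN movies m ON m.movieId = d.movieId
          GROUP BY d.ticketId
          HAVING COUNT(DISTINCT m.movieGenre) = 1) t
    GROUP BY genre
    ORDER BY genre
""").fetchall()

print(single_genre)
```

Since the HAVING clause guarantees one distinct genre per ticket, MIN(m.movieGenre) simply picks that genre; replacing COUNT(*) with a sum of numOfTickets would count seats rather than tickets.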