Shorten a query - sql

I have to write a query that counts the tickets whose movies all belong to a single genre. At the end, I have to return each movie genre and the number of tickets bought for that genre only. I have written a query, but I was wondering whether it can be made shorter and more compact.
The database schema is as follows:
movies(movieId, movieGenre, moviePrice)
tickets(ticketId, ticketDate, customerId)
details(ticketId, movieId, numOfTickets)
Here is my query:
select m.genre, count(*)
from (select t.ticketId, m.genre
      from (select d.ticketId
            from (select m.genre, t.ticketId
                  from tickets t
                  join details d on t.ticketId = d.ticketId
                  join movies m on d.movieId = m.movieId
                  group by m.genre, t.ticketId) d
            group by d.ticketId
            having count(*) = 1) as t
      join details d on t.ticketId = d.ticketId
      join movies m on d.movieId = m.movieId
      group by t.ticketId, m.genre) m
group by m.genre;
This runs on a database so I am only able to post sample output:
comedy 29821
action 27857
rom-com 19663

I see no reason to use the tickets table, because the results are not filtered or aggregated by ticketDate or customerId. Thus, a shorter query is:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM details d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
HAVING SumNum > 0
ORDER BY m.moviegenre
Added 3/28 am:
I am not sure what is meant by duplicates. Are they in the table details(ticketId, movieId, numOfTickets)?
I would expect ticketId to be unique, so what would explain duplicates?
Is the same ticketId being printed more than once?
Find which ticketId values appear more than once:
SELECT ticketId, count(*) as cnt
FROM details d
GROUP By ticketId
HAVING count(*) > 1
Find which whole "details" rows are duplicated:
SELECT ticketId, movieId, numOfTickets, count(*) as cnt
FROM details d
GROUP By ticketId, movieId, numOfTickets
HAVING count(*) > 1
Then again, it may be that the table movies(movieId, movieGenre, moviePrice) is the one with duplicates.
Find which movieId values appear more than once:
SELECT movieId, count(*) as cnt
FROM movies m
GROUP BY movieId
HAVING count(*) > 1
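To see what the duplicate checks above report, here is a minimal sketch that runs the whole-row check against an in-memory SQLite database. The sample rows and the helper name duplicate_detail_rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

def duplicate_detail_rows(conn):
    """Return (ticketId, movieId, numOfTickets, cnt) for rows stored more than once."""
    return conn.execute(
        """
        SELECT ticketId, movieId, numOfTickets, COUNT(*) AS cnt
        FROM details
        GROUP BY ticketId, movieId, numOfTickets
        HAVING COUNT(*) > 1
        ORDER BY ticketId, movieId
        """
    ).fetchall()

# Hypothetical sample data: the row (100, 1, 2) is stored twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER)"
)
conn.executemany(
    "INSERT INTO details VALUES (?, ?, ?)",
    [(100, 1, 2), (100, 1, 2), (100, 3, 1), (101, 2, 1)],
)
print(duplicate_detail_rows(conn))
```

Note that ticket 100 legitimately appears on two different rows (movies 1 and 3), so grouping by ticketId alone would flag it even though only one of its rows is a true duplicate.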
Remove duplicates from details before aggregating:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM
(Select Distinct * From details) d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
ORDER BY m.moviegenre
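If the goal really is what the question's own query does, that is, counting only tickets whose movies all share one genre, then the sums above answer a different question. A possible shorter form is sketched below, assuming the genre column is movieGenre as in the schema listing and using invented sample rows. It keeps tickets with exactly one distinct genre and then counts them per genre.

```python
import sqlite3

# Keep tickets with exactly one distinct genre, then count tickets per genre.
SINGLE_GENRE_SQL = """
SELECT genre, COUNT(*) AS tickets
FROM (SELECT d.ticketId, MIN(m.movieGenre) AS genre
      FROM details d
      JOIN movies m ON d.movieId = m.movieId
      GROUP BY d.ticketId
      HAVING COUNT(DISTINCT m.movieGenre) = 1) AS single_genre
GROUP BY genre
ORDER BY genre
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movies (movieId INTEGER, movieGenre TEXT, moviePrice REAL);
        CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER);
        INSERT INTO movies VALUES (1, 'comedy', 10), (2, 'action', 12), (3, 'comedy', 8);
        -- ticket 100: comedy only; ticket 101: mixed; ticket 102: action only
        INSERT INTO details VALUES
            (100, 1, 2), (100, 3, 1), (101, 1, 1), (101, 2, 1), (102, 2, 3);
        """
    )
    return conn

rows = make_db().execute(SINGLE_GENRE_SQL).fetchall()
print(rows)
```

Since each kept ticket has a single genre, MIN(m.movieGenre) is just a way to pick that genre inside the grouped subquery. The tickets table is not needed here either.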

Related

Use the count result from another query in where condition

I wonder how to combine two of my queries.
I have these 3 tables:
movies
movie_id PK
room_id FK
rooms
room_id PK
seats INTEGER
tickets
ticket_id PK
movie_id FK
In this simplified example, a movie only plays in a room and many tickets are sold for each movie.
I want to query which movies still have seats available.
For that I need to check
(room.seats - all tickets sold for that movie) > 0
If I do this, I get the total of tickets for each movie
SELECT movie_id, COUNT(*)
FROM tickets
GROUP BY movie_id;
And I would like to use that result as a condition in this query:
SELECT movie_id
FROM movies
JOIN rooms ON movies.room_id = rooms.room_id
WHERE (rooms.seats - [THE COUNT OF THE OTHER QUERY]) > 0
Does anyone know whether it is possible to achieve that?
Thank you in advance.
I don't know how to combine the two queries, and it would be nice to understand how to do it.
Let's assume the data looks something like this:
select
a.movie_id,
a.seats as "TOTAL_SEATS",
count(1) as "SEATS_SOLD",
a.seats - count(1) as "AVAILABLE_SEATS",
CASE when a.seats - count(1) > 0 then 'Y' else 'N' End as "SEATS_AVAILABLE"
from
(
select
m.id as movie_id,
r.seats,
t.id
from
movie m,
rooms r,
tickets t
where
m.room_id = r.id
and m.id = t.movie_id
) a
group by
a.movie_id,
a.seats
order by
movie_id asc;
Output of the Query:
If you want only the movie_id values of the movies that still have seats available, the query is as below:
select
b.movie_id
from
(
select
a.movie_id,
a.seats as "TOTAL_SEATS",
count(1) as "SEATS_SOLD",
a.seats - count(1) as "AVAILABLE_SEATS",
CASE when a.seats - count(1) > 0 then 'Y' else 'N' End as "SEATS_AVAILABLE"
from
(
select
m.id as movie_id,
r.seats,
t.id
from
movie m,
rooms r,
tickets t
where
m.room_id = r.id
and m.id = t.movie_id
) a
group by
a.movie_id,
a.seats
) b
where
b.SEATS_AVAILABLE = 'Y';
Output of the query will be:
Queries were tested on Oracle database with the above mentioned data.
SELECT movies.movie_id
FROM movies
JOIN rooms ON movies.room_id = rooms.room_id
LEFT JOIN (SELECT movie_id, COUNT(*) AS C_M
FROM tickets
GROUP BY movie_id) t
ON movies.movie_id = t.movie_id
WHERE (rooms.seats - COALESCE(t.C_M, 0)) > 0
Or use a quantifier (ALL, ANY):
SELECT movie_id
FROM movies
JOIN rooms ON movies.room_id = rooms.room_id
WHERE (rooms.seats -
ANY (SELECT COUNT(*) OVER(PARTITION BY movie_id)
FROM tickets
WHERE tickets.movie_id =movies.movie_id)) > 0
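The point that matters with the LEFT JOIN approach is that the seat check belongs in WHERE rather than in ON (a condition in ON only decides which rows match; it does not drop movies), and that movies with no tickets need COALESCE so their count reads as 0. A minimal sketch against in-memory SQLite, using the question's table layout and invented rows:

```python
import sqlite3

# Movies that still have free seats: room capacity minus tickets sold > 0.
AVAILABLE_SQL = """
SELECT m.movie_id
FROM movies m
JOIN rooms r ON m.room_id = r.room_id
LEFT JOIN (SELECT movie_id, COUNT(*) AS sold
           FROM tickets
           GROUP BY movie_id) t ON t.movie_id = m.movie_id
WHERE r.seats - COALESCE(t.sold, 0) > 0
ORDER BY m.movie_id
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rooms (room_id INTEGER PRIMARY KEY, seats INTEGER);
        CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, room_id INTEGER);
        CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, movie_id INTEGER);
        INSERT INTO rooms VALUES (1, 2), (2, 1), (3, 5);
        INSERT INTO movies VALUES (10, 1), (20, 2), (30, 3);
        -- movie 10: 1 of 2 seats sold; movie 20: sold out; movie 30: no tickets yet
        INSERT INTO tickets VALUES (1, 10), (2, 20);
        """
    )
    return conn

available = [row[0] for row in make_db().execute(AVAILABLE_SQL)]
print(available)
```

Movie 30 is kept only because of COALESCE: without it, the subtraction would yield NULL for movies with no tickets and the row would be filtered out.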

For each country, report the movie genre with the highest average rates

For each country, I need to report the movie genre with the highest average rating. I am missing only one step that I can't figure out.
Here's my current code:
SELECT c.code AS c_CODE, mGenres.genre AS GENRE, AVG(RATE) as AVERAGE_rate, MAX(RATE) AS MAXIMUM_rate, MIN(RATE) AS MINIMUM_rate from movrates
LEFT JOIN movgenres ON movgenres.movieid = movrates.movieid
LEFT JOIN users ON users.userid = movrates.userid
LEFT JOIN c ON c.code = users.city
LEFT JOIN mGenres ON movgenres.genreid = mGenres.code
GROUP BY mGenres.genre, c.code
order by c.code asc, avg(rate) desc, mGenres.genre desc;
You can use the ROW_NUMBER window function to assign a unique rank to each of your rows:
- partitioned by country code
- ordered by descending average rating
Once you have this ranking, select the rows with the highest average rating in each country, which are exactly the rows whose rank equals 1.
WITH cte AS (
    SELECT c.code AS COUNTRY_CODE,
           mg.genre AS GENRE,
           AVG(rating) AS AVERAGE_RATING,
           MAX(rating) AS MAXIMUM_RATING,
           MIN(rating) AS MINIMUM_RATING
    FROM moviesratings r
    INNER JOIN moviesgenres g ON g.movieid = r.movieid
    INNER JOIN users u ON u.userid = r.userid
    INNER JOIN countries c ON c.code = u.country
    LEFT JOIN mGenres mg ON mg.code = g.genreid
    GROUP BY mg.genre,
             c.code
)
SELECT *
FROM (SELECT *,
             ROW_NUMBER() OVER(
                 PARTITION BY COUNTRY_CODE
                 ORDER BY AVERAGE_RATING DESC) AS rn
      FROM cte) ranked_averages
WHERE rn = 1
ORDER BY COUNTRY_CODE
Note: The code inside the common table expression is equivalent to yours. If you're willing to share your input tables, I may even suggest an improved query.
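As a rough illustration of the ranking step, the sketch below runs the same pattern on a single flattened table in SQLite. The table ratings(country, genre, rating) and its rows are assumptions made up for the example, not the asker's schema, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Average per (country, genre), then keep the best-ranked genre per country.
TOP_GENRE_SQL = """
WITH cte AS (
    SELECT country, genre, AVG(rating) AS average_rating
    FROM ratings
    GROUP BY country, genre
)
SELECT country, genre, average_rating
FROM (SELECT cte.*,
             ROW_NUMBER() OVER (PARTITION BY country
                                ORDER BY average_rating DESC) AS rn
      FROM cte) ranked
WHERE rn = 1
ORDER BY country
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (country TEXT, genre TEXT, rating REAL)")
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        [
            ("US", "comedy", 4), ("US", "comedy", 5), ("US", "action", 3),
            ("FR", "drama", 5), ("FR", "comedy", 2), ("FR", "comedy", 4),
        ],
    )
    return conn

top = make_db().execute(TOP_GENRE_SQL).fetchall()
print(top)
```

ROW_NUMBER returns exactly one row per country even when two genres tie on the average; RANK or DENSE_RANK would return all tied genres instead.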
You should use a window function in this case: compute rank() and then select only the first rank.
with mov_rates (c_code, genre, avg_rate, max_rate, min_rate)
as
(
select d.code,
e.genre,
avg(rate),
max(rate),
min(rate)
from movrates a
left join movgenres b on a.movieid = b.movieid
left join users c on a.userid = c.userid
left join countries d on c.country = d.code -- assumes users stores the country code in "country"
left join mGenres e on b.genreid = e.code
group by d.code, e.genre
),
rategenre (rnk, c_code, genre, avg_rate, max_rate, min_rate)
as
(
select rank() over (partition by c_code order by avg_rate desc),
c_code,
genre,
avg_rate,
max_rate,
min_rate
from mov_rates
)
select c_code, genre, avg_rate, max_rate, min_rate
from rategenre
where rnk = 1
Reference:
OVER Clause

Sql query returning empty table

I am trying to solve two queries.
Find all the actors that made more movies with Yash Chopra than with any other director:
Select b.number, b.actor, b.director
from (select MAX(a.count) as number, a.director, a.actor
      from (select count(p.PID) as count, p.PID as actor, md.PID as director
            from person as p
            left join m_cast as mc on trim(p.PID) = trim(mc.PID)
            inner join m_director as md on trim(md.MID) = trim(mc.MID)
            group by md.PID, p.PID) as a
      group by a.actor) as b
where b.director = (select PID from person where Name = 'Yash Chopra')
Report for each year the percentage of movies in that year with only female actors, and the total number of movies made that year. For example, one answer will be: 1990 31.81 13522, meaning that in 1990 there were 13,522 movies and 31.81% had only female actors. You do not need to round your answer.
SELECT female_count.year Year,
       ((female_count.Total_movies_with_only_female_leads)*100)/total_count.Total Percentage
FROM ((SELECT movie.year Year, count(*) Total_movies_with_only_female_leads
       FROM movie
       WHERE NOT EXISTS (SELECT *
                         FROM M_Cast, person
                         WHERE M_Cast.mid = movie.MID
                           and M_Cast.PID = person.PID
                           AND person.gender = 'Male')
       GROUP BY movie.year) female_count,
      (SELECT movie.year, count(*) as Total
       FROM movie
       group by movie.year) total_count)
WHERE female_count.year = total_count.year
Unfortunately, both queries return an empty table. Can someone help me solve these two queries?
I wrote it using CTEs so it is more readable.
First Question:
WITH HowManyMoviesPerActorDirector AS
(select mc.pid as actorpid
       ,pa.name as actorname
       ,md.pid as directorpid
       ,pd.name as directorname
       ,count(mc.MID) as numberofmovies
 from m_cast as mc
 inner join m_director md on md.MID=mc.MID
 inner join person pa ON mc.PID=pa.PID
 inner join person pd ON md.PID=pd.PID
 group by mc.pid, pa.name, md.pid, pd.name
)
select h.actorname
      ,h.directorname
      ,h.numberofmovies
from HowManyMoviesPerActorDirector h
WHERE h.numberofmovies = (select MAX(h2.numberofmovies)
                          from HowManyMoviesPerActorDirector h2
                          where h2.actorpid=h.actorpid)
AND h.directorname='Yash Chopra'
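The comparison against MAX(...) also keeps actors who are tied between Yash Chopra and another director. If "more than any other director" is read strictly, one possible variant compares the Yash Chopra count with the best count among the other directors only. This is a sketch on invented rows in SQLite; the table and column names follow the question, everything else is an assumption.

```python
import sqlite3

# Actors whose movie count with Yash Chopra is strictly greater than
# their count with every other director.
STRICT_SQL = """
WITH counts AS (
    SELECT mc.pid AS actorpid, md.pid AS directorpid,
           COUNT(DISTINCT mc.mid) AS n
    FROM m_cast mc
    JOIN m_director md ON md.mid = mc.mid
    GROUP BY mc.pid, md.pid
)
SELECT pa.name, c.n
FROM counts c
JOIN person pa ON pa.pid = c.actorpid
JOIN person pd ON pd.pid = c.directorpid
WHERE pd.name = 'Yash Chopra'
  AND c.n > COALESCE((SELECT MAX(c2.n)
                      FROM counts c2
                      WHERE c2.actorpid = c.actorpid
                        AND c2.directorpid <> c.directorpid), 0)
ORDER BY pa.name
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (pid TEXT, name TEXT);
        CREATE TABLE m_cast (mid TEXT, pid TEXT);
        CREATE TABLE m_director (mid TEXT, pid TEXT);
        INSERT INTO person VALUES
            ('d1', 'Yash Chopra'), ('d2', 'Other Director'),
            ('a1', 'Actor A'), ('a2', 'Actor B'), ('a3', 'Actor C');
        INSERT INTO m_director VALUES ('m1', 'd1'), ('m2', 'd1'), ('m3', 'd2'), ('m4', 'd2');
        -- Actor A: 2 with Chopra, 1 with the other; Actor B: 1 vs 2; Actor C: Chopra only
        INSERT INTO m_cast VALUES
            ('m1', 'a1'), ('m2', 'a1'), ('m3', 'a1'),
            ('m1', 'a2'), ('m3', 'a2'), ('m4', 'a2'),
            ('m2', 'a3');
        """
    )
    return conn

actors = make_db().execute(STRICT_SQL).fetchall()
print(actors)
```

The COALESCE(..., 0) covers actors who never worked with anyone else. If the real PID and MID values carry stray whitespace, as the trim() calls in the question suggest, the joins would need the same trimming.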
The second one:
WITH MoviesIncludingGenderFlag AS
( select m.mid
        ,m.year
        ,sum(case when p.gender='Female' then 0 else 1 end) as genderflag
  from movie m
  inner join m_cast mc on mc.mid=m.mid
  inner join person p on p.pid=mc.pid
  group by m.mid,m.year
), FemaleOnlyMovies AS
( select m.year,count(m.mid) as Total
  from MoviesIncludingGenderFlag m
  where genderflag=0
  group by m.year
), TotalMovies AS
(
  select m.year,count(m.mid) as Total
  from movie m
  group by m.year
)
select TM.year,TM.Total,(COALESCE(FOM.Total,0)*100.0/TM.Total) as percentage
from TotalMovies TM
left join FemaleOnlyMovies FOM ON FOM.year=TM.year
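A small end-to-end check of the flag idea, run on SQLite with made-up rows. It assumes gender is stored as 'Female' / 'Male' (the question only shows 'Male'), and it treats a movie as female-only when none of its cast rows has a gender other than 'Female'.

```python
import sqlite3

# Per year: percentage of movies whose cast is entirely female, plus total movies.
PERCENT_SQL = """
WITH flagged AS (
    SELECT m.mid, m.year,
           SUM(CASE WHEN p.gender = 'Female' THEN 0 ELSE 1 END) AS non_female
    FROM movie m
    JOIN m_cast mc ON mc.mid = m.mid
    JOIN person p ON p.pid = mc.pid
    GROUP BY m.mid, m.year
),
female_only AS (
    SELECT year, COUNT(*) AS total FROM flagged WHERE non_female = 0 GROUP BY year
),
totals AS (
    SELECT year, COUNT(*) AS total FROM movie GROUP BY year
)
SELECT t.year, COALESCE(f.total, 0) * 100.0 / t.total AS percentage, t.total
FROM totals t
LEFT JOIN female_only f ON f.year = t.year
ORDER BY t.year
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movie (mid TEXT, year INTEGER);
        CREATE TABLE person (pid TEXT, gender TEXT);
        CREATE TABLE m_cast (mid TEXT, pid TEXT);
        INSERT INTO movie VALUES ('m1', 1990), ('m2', 1990), ('m3', 1991);
        INSERT INTO person VALUES ('p1', 'Female'), ('p2', 'Male'), ('p3', 'Female');
        -- m1: two women; m2: one woman, one man; m3: one man
        INSERT INTO m_cast VALUES
            ('m1', 'p1'), ('m1', 'p3'), ('m2', 'p1'), ('m2', 'p2'), ('m3', 'p2');
        """
    )
    return conn

report = make_db().execute(PERCENT_SQL).fetchall()
print(report)
```

Multiplying by 100.0 rather than 100 forces floating-point division; with integer operands many databases would truncate the percentage to a whole number.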

Remove duplicate rows from answer of below query

List all directors who directed 5000 movies or more, in descending order of the number of movies they directed.
Using DISTINCT before d.name does not help.
result = pd.read_sql_query("""SELECT d.name, count(*) as num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.Pid, d.name
HAVING COUNT(*) >= 10
order by count(*) desc
""", conn)
You must use proper explicit joins between the tables and count on distinct movies:
select
p.name,
count(distinct d.mid) num
from person p
inner join m_director d on d.pid = p.pid
inner join movie m on m.mid = d.mid
group by p.pid, p.name
having num >= 10
order by num desc
You probably have duplicate records in the Person table: people with the same name but different ids. Try grouping by name only, not by id:
result = pd.read_sql_query("""SELECT d.name, count(*) as num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.name
HAVING COUNT(*) >= 10
order by count(*) desc
""", conn)
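To see why counting distinct movies can matter, here is a minimal sketch using the standard-library sqlite3 module instead of pandas. The rows are invented, the helper name directors_with_at_least is hypothetical, and the threshold is passed as a parameter because the question text and its query disagree on the value.

```python
import sqlite3

def directors_with_at_least(conn, min_movies):
    """Return (name, distinct movie count) for directors at or above the threshold."""
    return conn.execute(
        """
        SELECT p.name, COUNT(DISTINCT d.mid) AS num
        FROM person p
        JOIN m_director d ON d.pid = p.pid
        GROUP BY p.pid, p.name
        HAVING COUNT(DISTINCT d.mid) >= ?
        ORDER BY num DESC
        """,
        (min_movies,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (pid TEXT, name TEXT);
    CREATE TABLE m_director (mid TEXT, pid TEXT);
    INSERT INTO person VALUES ('p1', 'Dir One'), ('p2', 'Dir Two');
    -- the (m2, p1) row is stored twice, which inflates COUNT(*)
    INSERT INTO m_director VALUES ('m1', 'p1'), ('m2', 'p1'), ('m2', 'p1'), ('m3', 'p2');
    """
)
raw_rows = conn.execute("SELECT COUNT(*) FROM m_director WHERE pid = 'p1'").fetchone()[0]
print(raw_rows, directors_with_at_least(conn, 2))
```

Here COUNT(*) sees three rows for the first director while COUNT(DISTINCT d.mid) reports two movies. Whether the duplicates in the real data sit in m_director or in person is something only the duplicate checks on those tables can tell.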

How to find the second largest value within certain row?

I have two tables, movies(id, name, year, rating) and movies_genres(movie_id, genre). I want to find the ids of the second-highest-rated movies within each genre (not globally), but when I wrote this:
select MG.genre, M.id
from movies_genres MG inner join movies M on MG.movie_id = M.id
where M.rating =
(select max(rating) from
(select rating from movies M2 inner join movies_genres MG2 on M2.id = MG2.movie_id where MG2.genre = MG.genre)
where rating <
(select max(rating) from
(select rating from movies M3 inner join movies_genres MG3 on M3.id = MG3.movie_id where MG3.genre = MG.genre)))
order by MG.genre;
I got an error saying that MG.genre in line 5 is an invalid identifier.
If you are using a database other than mysql you can use row_number window function to get the second highest rated movie for each genre.
select MG.genre, M.id
from movies_genres MG
inner join movies M on MG.movie_id = M.id
inner join (select m.*,mg.*,
row_number() over(partition by mg.genre order by m.rating desc) as rn
from movies m inner join movies_genres mg on mg.movie_id = M.id) x
on x.id = m.id and mg.genre = x.genre
where x.rn = 2;
You don't have to make a reference to the outer genre, only to the next select level like this:
Select MG.genre, M.id
From movies_genres MG, movies M
Where M.id = MG.movie_id
And M.rating = (Select max(rating)
From movies M2, movies_genres MG2
Where M2.id = MG2.movie_id
And MG2.genre = MG.genre
And rating < (Select max(rating)
From movies M3, movies_genres MG3
Where M3.id = MG3.movie_id
And MG3.genre = MG2.genre)
)
order by MG.genre;
Edited: oops, I had missed one reference to genre (And MG2.genre = MG.genre); it is now added.
How does it work? The innermost select returns the maximum rating within genre MG2. The middle select returns the highest rating within genre MG that is lower than that maximum, i.e. the second-highest rating. The outer select lists all movie ids (and genres) with exactly this rating.
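For comparison, the same "second-highest rating per genre, ties included" result can be sketched with DENSE_RANK, which is a swapped-in technique rather than what either answer uses. The example runs on SQLite 3.25 or newer with invented rows; the table and column names are the question's.

```python
import sqlite3

# Rank ratings within each genre; rank 2 is the second-highest distinct rating.
SECOND_BEST_SQL = """
SELECT genre, id
FROM (SELECT mg.genre, m.id,
             DENSE_RANK() OVER (PARTITION BY mg.genre
                                ORDER BY m.rating DESC) AS rnk
      FROM movies m
      JOIN movies_genres mg ON mg.movie_id = m.id) ranked
WHERE rnk = 2
ORDER BY genre, id
"""

def make_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movies (id INTEGER, name TEXT, year INTEGER, rating REAL);
        CREATE TABLE movies_genres (movie_id INTEGER, genre TEXT);
        INSERT INTO movies VALUES
            (1, 'A', 2000, 9.0), (2, 'B', 2001, 8.0),
            (3, 'C', 2002, 8.0), (4, 'D', 2003, 7.0);
        -- drama ratings: 9, 8, 8, 7; action ratings: 8, 7
        INSERT INTO movies_genres VALUES
            (1, 'drama'), (2, 'drama'), (3, 'drama'), (4, 'drama'),
            (2, 'action'), (4, 'action');
        """
    )
    return conn

second_best = make_db().execute(SECOND_BEST_SQL).fetchall()
print(second_best)
```

DENSE_RANK gives tied movies the same rank without leaving gaps, so both drama movies rated 8.0 come back, matching what the nested max(rating) query returns. ROW_NUMBER would instead pick a single row per genre.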