Inner joins with Case statement - sql

Instructions
Write a query to return the number of productive and less-productive
actors. The order of your results doesn't matter.
Definitions
productive: appeared in >= 30 films. less-productive: appeared in
<30 films.
I got this error on my query below
(syntax error at or near "AS" LINE 14: END) AS actor_category ^)
SELECT a.actor_id, MAX(a.first_name), MAX(a.last_name)
FROM actor a
INNER JOIN film_actor fa
ON a.actor_id = fa.actor_id
INNER JOIN film f
ON fa.film_id = f.film_id
(CASE
WHEN a.actor_id >= 30 THEN 'productive'
WHEN a.actor_id <= 30 THEN 'less-productive'
END) AS actor_category
GROUP BY a.actor_id;
This was the answer I was given:
SELECT actor_category,
COUNT(*)
FROM (
SELECT
A.actor_id,
CASE WHEN COUNT(DISTINCT FA.film_id) >= 30 THEN 'productive' ELSE 'less productive' END AS actor_category
FROM actor A
LEFT JOIN film_actor FA
ON FA.actor_id = A.actor_id
GROUP BY A.actor_id
) X
GROUP BY actor_category;
Why does it have to be done this way?

There are plenty of SQL learning documents, but working through a concrete task can teach APPLIED techniques better than a generic "here is how to write a query" guide.
First, decide what you care about: the total number of movies ANY given actor has been in. The film_actor table has exactly that, so get your counts from it first.
select
fa.actor_id,
count(*) TotalMovies
from
film_actor fa
group by
fa.actor_id
The result of this might have
actor_id TotalMovies
1 7
2 34
3 27
4 41
At this point, you don't care WHO was in the movie, just that an actor was in X number of movies. Now that you have THIS query, you can join it to the actor table to pull their names and THEN apply the case/when based on the total movies count.
select
a.first_name,
a.last_name,
PreQueryCnts.TotalMovies,
-- NOW we can apply the case/test
case when PreQueryCnts.TotalMovies >= 30
then 'productive'
else 'less productive' end as Actor_Category
from
(select
fa.actor_id,
count(*) TotalMovies
from
film_actor fa
group by
fa.actor_id ) PreQueryCnts
JOIN actor a
on PreQueryCnts.actor_id = a.actor_id
Although this does not specifically answer your question, you can see how you could get the detail. Now, to get your category totals, reuse the inner query and simply count at the outer level too.
select
-- NOW we can apply the case/test
case when PreQueryCnts.TotalMovies >= 30
then 'productive'
else 'less productive' end as Actor_Category,
count(*) as ActorsInThisCategory
from
(select
fa.actor_id,
count(*) TotalMovies
from
film_actor fa
group by
fa.actor_id ) PreQueryCnts
group by
case when PreQueryCnts.TotalMovies >= 30
then 'productive'
else 'less productive' end
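But there are many ways to write a query, as long as you understand how and where the components come from. Notice the final query does not even care WHO the actor is; the actor table is not part of the query at all. One consequence: an actor with no rows in film_actor is not counted here, whereas the LEFT JOIN version you were given counts such an actor as less productive.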
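To see the two shapes of the query side by side, here is a minimal sketch that runs them with Python's built-in sqlite3 module. The four actors and their film counts (35, 30, 5 and 0) are made up for illustration and are not Sakila data; the SQL follows the queries discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES
        (1, 'ANN', 'ONE'), (2, 'BOB', 'TWO'), (3, 'CAT', 'THREE'), (4, 'DAN', 'FOUR');
""")

# Made-up film counts per actor; actor 4 has no film_actor rows at all.
film_counts = {1: 35, 2: 30, 3: 5}
conn.executemany(
    "INSERT INTO film_actor VALUES (?, ?)",
    [(actor_id, film_id)
     for actor_id, n in film_counts.items()
     for film_id in range(1, n + 1)],
)

# Variant 1: count from film_actor only, categorise each actor, count per category.
film_actor_only = dict(conn.execute("""
    SELECT actor_category, COUNT(*)
    FROM (SELECT actor_id,
                 CASE WHEN COUNT(*) >= 30 THEN 'productive'
                      ELSE 'less productive' END AS actor_category
          FROM film_actor
          GROUP BY actor_id) AS pre_query_cnts
    GROUP BY actor_category
""").fetchall())

# Variant 2: start from actor with a LEFT JOIN so actors without films are kept.
with_left_join = dict(conn.execute("""
    SELECT actor_category, COUNT(*)
    FROM (SELECT a.actor_id,
                 CASE WHEN COUNT(DISTINCT fa.film_id) >= 30 THEN 'productive'
                      ELSE 'less productive' END AS actor_category
          FROM actor a
          LEFT JOIN film_actor fa ON fa.actor_id = a.actor_id
          GROUP BY a.actor_id) AS x
    GROUP BY actor_category
""").fetchall())

print(film_actor_only)
print(with_left_join)
```

On this toy data both variants report 2 productive actors, but only the LEFT JOIN variant counts the actor without films, so it reports 2 less productive actors instead of 1.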

Related

Get all actors who have played in every film category at least twice (Sakila DB)

I'm running queries on Sakila DB, and I would like to get all actors who have played in every film category at least twice.
I'm having trouble implementing the "at least twice" condition in the query, so I would appreciate any help with that.
My try:
SELECT DISTINCT first_name, last_name
FROM actor, film_category, film_actor
WHERE actor.actor_id=film_actor.actor_id AND
film_actor.film_id=film_category.film_id
AND EXISTS(SELECT NULL
FROM film_category
WHERE film_actor.film_id=film_category.film_id
)
HAVING COUNT(film_category.film_id)>1
ORDER BY first_name, last_name
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
This is a complicated query. It starts by counting the number of films for each actor/category, and then makes sure that there are at least two in each and that all categories are covered.
The first part is:
select fa.actor_id, fc.category_id, count(*) as num_films
from film_actor fa join
film_category fc
on fa.film_id = fc.film_id
group by fa.actor_id, fc.category_id;
Next, we'll add the conditions for "at least two films" and "all categories" by using aggregation on this query and a having clause:
select actor_id
from (select fa.actor_id, fc.category_id, count(*) as num_films
from film_actor fa join
film_category fc
on fa.film_id = fc.film_id
group by fa.actor_id, fc.category_id
) ac
group by actor_id
having min(num_films) >= 2 and
count(*) = (select count(*) from category)
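As a rough check of the two HAVING conditions, the sketch below runs the same query with Python's sqlite3 module on a tiny invented dataset (three categories, six films, three actors; none of it is real Sakila data). Actor 1 has two films in every category, actor 2 has only one film in the third category, and actor 3 has no film in the third category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);

    INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama');
    -- Two films per category: films 1-2, 3-4 and 5-6.
    INSERT INTO film_category VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3);
    INSERT INTO film_actor VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
        (2, 1), (2, 2), (2, 3), (2, 4), (2, 5),
        (3, 1), (3, 2), (3, 3), (3, 4);
""")

rows = conn.execute("""
    SELECT actor_id
    FROM (SELECT fa.actor_id, fc.category_id, COUNT(*) AS num_films
          FROM film_actor fa
          JOIN film_category fc ON fa.film_id = fc.film_id
          GROUP BY fa.actor_id, fc.category_id) AS ac
    GROUP BY actor_id
    -- min(...) >= 2 rejects actor 2; the category count rejects actor 3.
    HAVING MIN(num_films) >= 2
       AND COUNT(*) = (SELECT COUNT(*) FROM category)
    ORDER BY actor_id
""").fetchall()

qualifying = [actor_id for (actor_id,) in rows]
print(qualifying)
```

Only actor 1 passes both conditions. The inner query produces no row for a category an actor never appeared in, which is why the second condition compares the number of rows per actor with the total number of categories.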
Use explicit joins and put the HAVING inside the subquery:
SELECT DISTINCT first_name, last_name
FROM actor a join film_actor fa
on a.actor_id=fa.actor_id
join film_category fc on fa.film_id=fc.film_id
WHERE
EXISTS (SELECT NULL
FROM film_category t1 join film_actor fa1
on fa1.film_id=t1.film_id
WHERE fa.actor_id=fa1.actor_id
group by fa1.actor_id,t1.category_id
HAVING COUNT(DISTINCT t1.film_id) > 1
)
ORDER BY first_name, last_name

CodeWars SQL Challenge - Relational division: Find all movies two actors cast in together

Can someone tell me why my solution isn't working for this SQL challenge? Basically, the correct solution should return 4 rows, mine returns 148 rows.
Challenge:
Given film_actor and film tables from the DVD Rental sample database find all movies both Sidney Crowe (actor_id = 105) and Salma Nolte (actor_id = 122) cast in together and order the result set alphabetically.
Solution:
SELECT DISTINCT f.title
FROM film f
INNER JOIN film_actor a ON
f.film_id = a.film_id
INNER JOIN actor c ON
a.actor_id = c.actor_id
WHERE c.last_name IN ('Crowe', 'Nolte')
GROUP BY f.title;
I think the actor IDs are already given. Also, you have to join film_actor twice.
Select f.title
From film f
Inner join film_actor a1
On f.film_id = a1.film_id
Inner join film_actor a2
On f.film_id = a2.film_id
Where a1.actor_id = 105
And a2.actor_id = 122
Order by f.title;
Since you are using c.last_name IN ('Crowe', 'Nolte'), you are displaying all the films either of these two was part of, but the question requires both actors to be in the same film, so you need to use c1.last_name = 'Crowe' and c2.last_name = 'Nolte'.
Basically, we need the count to be exactly 2 for the same film (film_id) to conclude that both of them were cast in that movie together.
So group by film_id and filter on a count of 2.
Voilà!
Hope this helps you guys :)
select film.title from
(select film_id, count(film_id) as cnt from film_actor
where actor_id in (105, 122)
group by film_id
) temp
inner join film
on temp.film_id = film.film_id
where cnt=2
order by title asc
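The two working approaches (joining film_actor twice, and grouping with a count of 2) can be compared on a small example. This is only a sketch using Python's sqlite3 module: the three film titles and the extra actor 7 are invented, and the counting version assumes each (actor_id, film_id) pair appears at most once in film_actor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);

    INSERT INTO film VALUES (1, 'GAMMA'), (2, 'BETA'), (3, 'ALPHA');
    -- Actors 105 and 122 share films 1 and 3; film 2 has 105 and an unrelated actor.
    INSERT INTO film_actor VALUES (105, 1), (122, 1), (105, 2), (7, 2), (105, 3), (122, 3);
""")

# Approach 1: join film_actor twice, once per required actor.
self_join = [title for (title,) in conn.execute("""
    SELECT f.title
    FROM film f
    INNER JOIN film_actor a1 ON f.film_id = a1.film_id
    INNER JOIN film_actor a2 ON f.film_id = a2.film_id
    WHERE a1.actor_id = 105
      AND a2.actor_id = 122
    ORDER BY f.title
""")]

# Approach 2: keep only films where both of the two actors were found.
count_of_two = [title for (title,) in conn.execute("""
    SELECT film.title
    FROM (SELECT film_id, COUNT(film_id) AS cnt
          FROM film_actor
          WHERE actor_id IN (105, 122)
          GROUP BY film_id) AS temp
    INNER JOIN film ON temp.film_id = film.film_id
    WHERE cnt = 2
    ORDER BY title ASC
""")]

# The IN-only filter (no count check) also returns films with just one of the two.
either_actor = [title for (title,) in conn.execute("""
    SELECT DISTINCT f.title
    FROM film f
    INNER JOIN film_actor a ON f.film_id = a.film_id
    WHERE a.actor_id IN (105, 122)
    ORDER BY f.title
""")]

print(self_join, count_of_two, either_actor)
```

Both approaches return ALPHA and GAMMA, while the IN-only filter also returns BETA, which is the same kind of over-counting that produced 148 rows instead of 4.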

SQL Subquery to get actors in more than 30 films

I need to find all the names of actors that have participated in more than 30 films. This below isn't quite doing it. What am I doing wrong here?
SELECT first_name, last_name
FROM actor A
WHERE 30 > (SELECT COUNT(actor_id) FROM film_actor);
Tables involved:
actor: actor_id first_name last_name last_update
film: film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
film_actor: actor_id film_id last_update
Try this:
SELECT a.first_name, a.last_name
FROM actor a
INNER JOIN film_actor fa ON a.actor_id = fa.actor_id
GROUP BY a.actor_id, a.first_name, a.last_name
HAVING COUNT(fa.film_id) > 30
Behaviour explanation of your current query
Your current query has no connection between the actor and film_actor tables, so the subquery returns the same number for every actor. The result is all-or-nothing: every actor from the actor table is shown if the total number of rows in film_actor with a non-null actor_id is less than 30, and no actor is shown otherwise.
Right approach for the task
Use INNER JOIN with GROUP BY and a HAVING clause to only show actors who participated in more than 30 movies.
If a pair of actor_id, film_id in your table film_actor is unique then this query would suffice:
SELECT a.actor_id,a.first_name, a.last_name
FROM actor a
INNER JOIN film_actor fa ON
a.actor_id = fa.actor_id
GROUP BY a.actor_id, a.first_name, a.last_name
HAVING COUNT(fa.film_id) > 30
However, if you are storing several roles an actor could play in a single movie, so that the pair mentioned earlier is not unique, then add distinct to the having clause:
HAVING COUNT(DISTINCT fa.film_id) > 30
Note that I've added actor_id to select and group by clause since two actors can have the same first and last name and you'd probably want to see those cases clearly.
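The difference between the uncorrelated subquery and the grouped join can be reproduced on toy data. The following sketch uses Python's sqlite3 module with three invented actors who have 31, 30 and 3 films; the names and counts are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'ANN', 'ONE'), (2, 'BOB', 'TWO'), (3, 'CAT', 'THREE');
""")

# Made-up film counts per actor: 31, 30 and 3 (64 film_actor rows in total).
film_counts = {1: 31, 2: 30, 3: 3}
conn.executemany(
    "INSERT INTO film_actor VALUES (?, ?)",
    [(actor_id, film_id)
     for actor_id, n in film_counts.items()
     for film_id in range(1, n + 1)],
)

# The question's query: the subquery counts ALL film_actor rows, for every actor.
uncorrelated = conn.execute("""
    SELECT first_name, last_name
    FROM actor a
    WHERE 30 > (SELECT COUNT(actor_id) FROM film_actor)
""").fetchall()

# Grouped join: the count is computed per actor, then filtered with HAVING.
grouped = conn.execute("""
    SELECT a.actor_id, a.first_name, a.last_name
    FROM actor a
    INNER JOIN film_actor fa ON a.actor_id = fa.actor_id
    GROUP BY a.actor_id, a.first_name, a.last_name
    HAVING COUNT(fa.film_id) > 30
""").fetchall()

print(uncorrelated)
print(grouped)
```

With 64 rows in film_actor the uncorrelated version returns nothing, while the grouped version returns only the actor with 31 films. The actor with exactly 30 films is excluded because the comparison is a strict "more than 30".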
Use a correlated subquery instead; try this:
SELECT first_name, last_name
FROM actor a
WHERE (SELECT COUNT(fa.film_id)
FROM film_actor fa
WHERE fa.actor_id = a.actor_id) > 30
This is also a good alternative: when the selected columns are all determined by one key, group by that key only and wrap the remaining columns in an aggregate function, as below:
SELECT
MAX(a.first_name) AS first_name,
MAX(a.last_name) AS last_name
FROM actor a
INNER JOIN film_actor fa ON a.actor_id = fa.actor_id
GROUP BY a.actor_id
HAVING COUNT(fa.film_id) > 30

Count number of ID in a table and select one or all top values(max) in POSTGRESQL

I have a table(film_actor) that holds a relationship between two other tables, film and actor.
I want to count the number of occurrences of actor.id in the table film_actor and then select the top value or values, but can't seem to get it to work.
My query so far is below; it gives me a nice list of all actors and the number of movies they have been in. What I can't seem to do is select the highest value or values. I know I can just limit the results to 1, or top 1, but I need it to be dynamic in case the list has 2 or more actors with the same number of movies.
SELECT actor.first_name, actor.last_name, COUNT(actor.actor_id) AS film_number
FROM actor
INNER JOIN film_actor ON actor.actor_id = film_actor.actor_id
GROUP BY actor.actor_id, actor.first_name, actor.last_name
ORDER BY film_number DESC
I haven't been able to nest MAX() into this, but I'm thinking this is what I need. All tips are welcome!
The best way to write this type of query is to use rank() or dense_rank():
SELECT fa.*
FROM (SELECT a.first_name, a.last_name, COUNT(a.actor_id) AS film_number,
RANK() OVER (ORDER BY COUNT(a.actor_id) DESC) as seqnum
FROM actor a INNER JOIN
film_actor fa
ON a.actor_id = fa.actor_id
GROUP BY a.actor_id, a.first_name, a.last_name
) fa
WHERE seqnum = 1;
I ended up solving this by using HAVING and ALL, see below:
SELECT
actor.first_name, actor.last_name, COUNT(film_actor.actor_id) as Antalfilmer
FROM
film_actor
INNER JOIN actor ON film_actor.actor_id = actor.actor_id
GROUP BY actor.first_name, actor.last_name, actor.actor_id
HAVING COUNT(film_actor.actor_id) >= ALL (SELECT COUNT(film_actor.actor_id) FROM film_actor JOIN actor ON actor.actor_id = film_actor.actor_id GROUP BY actor.first_name, actor.last_name, actor.actor_id)
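For comparison, the "nest MAX()" idea from the question can also be made to work by computing the per-actor counts in a derived table and comparing each actor's count with their maximum. This is a third variant, not the RANK() or ALL queries above, sketched with Python's sqlite3 module (SQLite has no ALL operator, which is one reason to use MAX here). The actors and counts are invented, with two actors deliberately tied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);

    INSERT INTO actor VALUES (1, 'ANN', 'LEE'), (2, 'BOB', 'RAY'), (3, 'CAT', 'KIM');
    -- Actors 1 and 2 are tied with three films each; actor 3 has one.
    INSERT INTO film_actor VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 10), (2, 13), (2, 14),
        (3, 11);
""")

top_actors = conn.execute("""
    SELECT a.first_name, a.last_name, COUNT(*) AS film_number
    FROM actor a
    INNER JOIN film_actor fa ON a.actor_id = fa.actor_id
    GROUP BY a.actor_id, a.first_name, a.last_name
    -- Keep every actor whose count equals the highest per-actor count.
    HAVING COUNT(*) = (SELECT MAX(cnt)
                       FROM (SELECT COUNT(*) AS cnt
                             FROM film_actor
                             GROUP BY actor_id) AS per_actor)
    ORDER BY a.actor_id
""").fetchall()

print(top_actors)
```

Both tied actors are returned, which is the dynamic behaviour a plain LIMIT 1 cannot give.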

SQL Query NOT IN, EXIST

Schemas
Movie(title, year, director, budget, earnings)
Actor(stagename, realname, birthyear)
ActedIn(stagename, title, year, pay)
CanWorkWith(stagename, director)
I need to find all the actors (stagename and realname) that have never worked in a movie that has made a profit (Earnings > budget). SO finding all the bad actors :P
SELECT A.stagename, A.realname
FROM Actor A
WHERE A.stagename NOT IN
(SELECT B.stagename
FROM ActedIN B
WHERE EXIST
(SELECT *
FROM Movie M
WHERE M.earnings > M.budget AND M.title = B.title AND M.year))
Would this find all the actors whose stagename does not appear in the second query? Second query will find all stagenames that acted in movies that made a profit.
Is this correct?
I think you could simplify it a bit, see below:
SELECT DISTINCT A.stagename, A.realname
FROM Actor A
WHERE NOT EXISTS
(SELECT *
FROM Actor B
, Movie M
, ActedIn X
WHERE M.Title = X.Title
AND X.StageName = B.StageName
AND M.earnings > M.budget
AND M.year = X.Year
AND A.StageName = B.StageName)
SELECT
a.stagename,
a.realname
FROM
Actor a
LEFT JOIN
ActedIn b ON a.stagename = b.stagename
LEFT JOIN
Movie c ON b.title = c.title
AND b.year = c.year
AND c.earnings > c.budget
GROUP BY
a.stagename,
a.realname
HAVING
COUNT(c.title) = 0
-No subqueries
-Accounts for actors who never acted in a movie yet
-Access to aggregate functions if needed.
That will work, but just do a join between ActedIn and Movie rather than exist.
Possibly also an outer join may be faster rather than the NOT IN clause, but you would need to run explain plans to be sure.
That would do it. You could also write it like:
SELECT A.stagename, A.realname, SUM(B.pay) AS totalpay
FROM Actor A
INNER JOIN ActedIn B ON B.stagename = A.stagename
LEFT JOIN Movie M ON M.title = B.title AND M.year = B.year AND M.earnings > M.budget
GROUP BY A.stagename, A.realname
HAVING COUNT(M.title) = 0
ORDER BY totalpay DESC
It basically takes the movies that made a profit and uses that as a left join condition; an actor is kept only when none of their rows found a match, i.e. COUNT(M.title) is 0. Note that the INNER JOIN to ActedIn leaves out actors who have not acted in any movie.
I've also added the total pay of said bad actors and ranked them from best to worst paid ;-)
Yes, you have the right idea for using NOT IN, but you're missing half a boolean condition in the second subquery's WHERE clause. I think you intend to use AND M.year = B.year
WHERE M.earnings > M.budget AND M.title = B.title AND M.year = B.year
You can also do this with a few LEFT JOINs, counting the matches on the right side of the join and keeping the actors that have none. This may be faster than the subquery.
SELECT
A.stagename,
A.realname
FROM Actor A
LEFT OUTER JOIN ActedIn B ON A.stagename = B.stagename
LEFT OUTER JOIN Movie M ON B.title = M.title AND B.year = M.year AND M.earnings > M.budget
GROUP BY A.stagename, A.realname
/* COUNT(M.title) = 0 means none of the actor's movies made a profit */
HAVING COUNT(M.title) = 0
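Because "never worked in a profitable movie" is a statement about all of an actor's movies, it helps to test any variant on data that includes an actor with both a profitable and an unprofitable movie. The sketch below does that with Python's sqlite3 module. The rows are invented, and the three queries are illustrations of the patterns discussed in this thread rather than any one poster's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (title TEXT, year INTEGER, director TEXT, budget INTEGER, earnings INTEGER);
    CREATE TABLE Actor (stagename TEXT, realname TEXT, birthyear INTEGER);
    CREATE TABLE ActedIn (stagename TEXT, title TEXT, year INTEGER, pay INTEGER);

    INSERT INTO Movie VALUES ('Hit', 2000, 'D1', 10, 50), ('Flop', 2001, 'D2', 50, 10),
                             ('Dud', 2002, 'D3', 20, 5);
    INSERT INTO Actor VALUES ('Star', 'S', 1970), ('Mixed', 'M', 1975),
                             ('Unlucky', 'U', 1980), ('Newcomer', 'N', 1990);
    -- Mixed has one profitable and one unprofitable movie; Newcomer has none.
    INSERT INTO ActedIn VALUES ('Star', 'Hit', 2000, 100), ('Mixed', 'Hit', 2000, 50),
                               ('Mixed', 'Flop', 2001, 40), ('Unlucky', 'Flop', 2001, 30),
                               ('Unlucky', 'Dud', 2002, 20);
""")

def names(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT EXISTS: no profitable movie exists for this actor.
not_exists = names("""
    SELECT A.stagename FROM Actor A
    WHERE NOT EXISTS (SELECT 1 FROM ActedIn B
                      JOIN Movie M ON M.title = B.title AND M.year = B.year
                      WHERE B.stagename = A.stagename AND M.earnings > M.budget)
    ORDER BY A.stagename
""")

# LEFT JOINs with a per-actor check: zero profitable matches in the whole group.
having_zero = names("""
    SELECT A.stagename FROM Actor A
    LEFT JOIN ActedIn B ON A.stagename = B.stagename
    LEFT JOIN Movie M ON B.title = M.title AND B.year = M.year AND M.earnings > M.budget
    GROUP BY A.stagename
    HAVING COUNT(M.title) = 0
    ORDER BY A.stagename
""")

# Row-level filter: keeps any actor with at least one unmatched row.
row_filter = names("""
    SELECT DISTINCT A.stagename FROM Actor A
    LEFT JOIN ActedIn B ON A.stagename = B.stagename
    LEFT JOIN Movie M ON B.title = M.title AND B.year = M.year AND M.earnings > M.budget
    WHERE M.title IS NULL
    ORDER BY A.stagename
""")

print(not_exists, having_zero, row_filter)
```

On this data the NOT EXISTS and HAVING versions agree on Newcomer and Unlucky. A row-level `WHERE M.title IS NULL` filter also lets Mixed through, because one of Mixed's rows has no profitable match.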