SQL query: NOT IN, EXISTS

Schemas
Movie(title, year, director, budget, earnings)
Actor(stagename, realname, birthyear)
ActedIn(stagename, title, year, pay)
CanWorkWith(stagename, director)
I need to find all the actors (stagename and realname) who have never worked in a movie that made a profit (earnings > budget). So, finding all the bad actors :P
SELECT A.stagename, A.realname
FROM Actor A
WHERE A.stagename NOT IN
(SELECT B.stagename
FROM ActedIN B
WHERE EXIST
(SELECT *
FROM Movie M
WHERE M.earnings > M.budget AND M.title = B.title AND M.year))
Would this find all the actors whose stagename does not appear in the result of the subquery? The subquery should find every stagename that acted in a movie that made a profit.
Is this correct?

I think you could simplify it a bit, see below:
SELECT DISTINCT A.stagename, A.realname
FROM Actor A
WHERE NOT EXISTS
(SELECT *
FROM Actor B
, Movie M
, ActedIn X
WHERE M.Title = X.Title
AND X.StageName = B.StageName
AND M.earnings > M.budget
AND M.year = X.Year
AND A.StageName = B.StageName)
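The NOT EXISTS idea can be checked on a tiny in-memory SQLite database through Python's built-in sqlite3 module. This is a sketch with invented sample rows. It drops the extra Actor B join, assuming stagename uniquely identifies an actor, so correlating ActedIn directly on A.stagename is equivalent.

```python
import sqlite3

# Hypothetical sample data: only 'Hit' (2001) made a profit; 'Hit' (1950) is a different, unprofitable movie.
SCHEMA_AND_DATA = """
CREATE TABLE Movie(title TEXT, year INTEGER, director TEXT, budget INTEGER, earnings INTEGER);
CREATE TABLE Actor(stagename TEXT PRIMARY KEY, realname TEXT, birthyear INTEGER);
CREATE TABLE ActedIn(stagename TEXT, title TEXT, year INTEGER, pay INTEGER);
INSERT INTO Movie VALUES ('Hit', 2001, 'D1', 10, 50), ('Flop', 2002, 'D2', 50, 10), ('Hit', 1950, 'D3', 50, 10);
INSERT INTO Actor VALUES ('Star', 'S. Real', 1970), ('Dud', 'D. Real', 1980),
    ('Mixed', 'M. Real', 1975), ('Newbie', 'N. Real', 1999), ('Oldie', 'O. Real', 1920);
INSERT INTO ActedIn VALUES ('Star', 'Hit', 2001, 100), ('Dud', 'Flop', 2002, 20),
    ('Mixed', 'Hit', 2001, 30), ('Mixed', 'Flop', 2002, 40), ('Oldie', 'Hit', 1950, 5);
"""

# Keep an actor only if no role of theirs matches a profitable movie (same title AND year).
BAD_ACTORS_NOT_EXISTS = """
SELECT A.stagename, A.realname
FROM Actor A
WHERE NOT EXISTS (
    SELECT *
    FROM ActedIn X
    JOIN Movie M ON M.title = X.title AND M.year = X.year
    WHERE X.stagename = A.stagename
      AND M.earnings > M.budget
)
ORDER BY A.stagename
"""

def bad_actors():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(BAD_ACTORS_NOT_EXISTS).fetchall()
    conn.close()
    return rows

print(bad_actors())  # [('Dud', 'D. Real'), ('Newbie', 'N. Real'), ('Oldie', 'O. Real')]
```

Mixed is excluded because one of their roles was in a profitable movie, Newbie is included because they have no roles at all, and Oldie is included because the year is part of the match.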

SELECT
a.stagename,
a.realname
FROM
Actor a
LEFT JOIN
ActedIn b ON a.stagename = b.stagename
LEFT JOIN
Movie c ON b.title = c.title
AND b.year = c.year
AND c.earnings > c.budget
GROUP BY
a.stagename,
a.realname
-- keep only actors none of whose roles matched a profitable movie
HAVING
COUNT(c.title) = 0
-No subqueries
-Includes actors who have not acted in any movie yet
-Access to aggregate functions if needed.
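With this LEFT JOIN pattern it matters whether the "no profitable movie" test is applied per row (WHERE c.title IS NULL) or per actor (HAVING COUNT(c.title) = 0). The sketch below runs both on SQLite through Python's sqlite3 module, using hypothetical sample rows: with the per-row filter, an actor who was in one profitable and one unprofitable movie still shows up, because the unprofitable role produces a row where c.title is NULL.

```python
import sqlite3

# Hypothetical sample data: 'Hit' made a profit, 'Flop' did not.
SCHEMA_AND_DATA = """
CREATE TABLE Movie(title TEXT, year INTEGER, director TEXT, budget INTEGER, earnings INTEGER);
CREATE TABLE Actor(stagename TEXT PRIMARY KEY, realname TEXT, birthyear INTEGER);
CREATE TABLE ActedIn(stagename TEXT, title TEXT, year INTEGER, pay INTEGER);
INSERT INTO Movie VALUES ('Hit', 2001, 'D1', 10, 50), ('Flop', 2002, 'D2', 50, 10);
INSERT INTO Actor VALUES ('Star', 'S. Real', 1970), ('Dud', 'D. Real', 1980),
    ('Mixed', 'M. Real', 1975), ('Newbie', 'N. Real', 1999);
INSERT INTO ActedIn VALUES ('Star', 'Hit', 2001, 100), ('Dud', 'Flop', 2002, 20),
    ('Mixed', 'Hit', 2001, 30), ('Mixed', 'Flop', 2002, 40);
"""

JOINS = """
FROM Actor a
LEFT JOIN ActedIn b ON a.stagename = b.stagename
LEFT JOIN Movie c ON b.title = c.title AND b.year = c.year AND c.earnings > c.budget
"""

# Row-level filter: a single unprofitable role is enough for the actor to survive.
PER_ROW_FILTER = "SELECT a.stagename" + JOINS + """
WHERE c.title IS NULL
GROUP BY a.stagename
ORDER BY a.stagename
"""

# Group-level filter: keep the actor only if no row matched a profitable movie.
PER_ACTOR_FILTER = "SELECT a.stagename" + JOINS + """
GROUP BY a.stagename
HAVING COUNT(c.title) = 0
ORDER BY a.stagename
"""

def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    names = [row[0] for row in conn.execute(sql)]
    conn.close()
    return names

print(run(PER_ROW_FILTER))    # ['Dud', 'Mixed', 'Newbie']
print(run(PER_ACTOR_FILTER))  # ['Dud', 'Newbie']
```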

That will work (once the incomplete M.year condition is fixed), but just do a join between ActedIn and Movie rather than an EXISTS subquery.
An outer join might also be faster than the NOT IN clause, but you would need to compare the execution plans (EXPLAIN) to be sure.

That would do it. You could also write it like:
SELECT A.stagename, A.realname, SUM(B.pay) AS totalpay
FROM Actor A
INNER JOIN ActedIn B ON B.stagename = A.stagename
LEFT JOIN Movie M ON M.title = B.title AND M.year = B.year AND M.earnings > M.budget
GROUP BY A.stagename, A.realname
HAVING COUNT(M.title) = 0
ORDER BY totalpay DESC
It takes the movies that made a profit and uses them as a left join condition; an actor is kept only if none of their roles matched a profitable movie (COUNT(M.title) = 0). Because of the INNER JOIN, actors who have never acted in anything are not listed.
I've also added the total pay of said bad actors and ranked them from best to worst paid ;-)

Yes, you have the right idea in using NOT IN, but you're missing half a boolean condition in the innermost subquery's WHERE clause: AND M.year on its own is incomplete. I think you intend to use AND M.year = B.year (also, the keyword is EXISTS, not EXIST):
WHERE M.earnings > M.budget AND M.title = B.title AND M.year = B.year
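Why the year condition matters shows up as soon as two movies share a title. The sketch below (Python's sqlite3 module, invented rows in which only the 2001 'Hit' made a profit) runs the corrected NOT IN query with and without M.year = B.year: without it, an actor who was only in the unprofitable 1950 'Hit' is wrongly treated as having been in a profitable movie.

```python
import sqlite3

# Hypothetical data: two movies share the title 'Hit'; only the 2001 one made a profit.
SCHEMA_AND_DATA = """
CREATE TABLE Movie(title TEXT, year INTEGER, director TEXT, budget INTEGER, earnings INTEGER);
CREATE TABLE Actor(stagename TEXT PRIMARY KEY, realname TEXT, birthyear INTEGER);
CREATE TABLE ActedIn(stagename TEXT, title TEXT, year INTEGER, pay INTEGER);
INSERT INTO Movie VALUES ('Hit', 2001, 'D1', 10, 50), ('Hit', 1950, 'D2', 50, 10);
INSERT INTO Actor VALUES ('Star', 'S. Real', 1970), ('Oldie', 'O. Real', 1920), ('Newbie', 'N. Real', 1999);
INSERT INTO ActedIn VALUES ('Star', 'Hit', 2001, 100), ('Oldie', 'Hit', 1950, 5);
"""

# The asker's NOT IN / EXISTS structure; {year_condition} is filled in below.
NOT_IN_TEMPLATE = """
SELECT A.stagename
FROM Actor A
WHERE A.stagename NOT IN
  (SELECT B.stagename
   FROM ActedIn B
   WHERE EXISTS
     (SELECT *
      FROM Movie M
      WHERE M.earnings > M.budget AND M.title = B.title {year_condition}))
ORDER BY A.stagename
"""

def bad_actors(match_year):
    year_condition = "AND M.year = B.year" if match_year else ""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    sql = NOT_IN_TEMPLATE.format(year_condition=year_condition)
    names = [row[0] for row in conn.execute(sql)]
    conn.close()
    return names

print(bad_actors(match_year=True))   # ['Newbie', 'Oldie']
print(bad_actors(match_year=False))  # ['Newbie']
```

One caveat with NOT IN in general: if the subquery can return a NULL stagename, the NOT IN test is never true and the query returns no rows at all; NOT EXISTS does not have that problem.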
You can also do this with a few LEFT JOINs, keeping only the actors for whom the right side of the join is NULL on every row. This may be faster than the subquery.
SELECT
A.stagename,
A.realname
FROM Actor A
LEFT OUTER JOIN ActedIn B ON A.stagename = B.stagename
LEFT OUTER JOIN Movie M ON B.title = M.title AND B.year = M.year AND M.earnings > M.budget
GROUP BY A.stagename, A.realname
/* No matching profitable Movie row on any of the actor's rows means they were never in a profitable movie */
HAVING COUNT(M.title) = 0


SELECT DISTINCT query taking too long

My code below takes a very long time to execute. Adding SELECT DISTINCT makes it much slower.
What I'm trying to do is get the unique companies that satisfy these conditions and also calculate how many teams each company has (each user in the auth_user u table has a team_id).
Any help would be amazing; I want to learn how to write better SQL queries. I know that GROUP BY is the better way to do this, but I can't seem to get it to work.
SELECT DISTINCT u.company_id, c.name, c.company_type, c.office_location, (SELECT (COUNT(DISTINCT u.team_id)) FROM auth_user u WHERE u.company_id = c.id GROUP BY u.company_id) as number_of_teams, s.status, h.auto_renewal
FROM auth_user u, companies_company c, subscriptions_subscription s, hubspot_company h
WHERE u.company_id = c.id
AND s.company_id = c.id
AND h.myagi_id = c.id
ORDER BY u.company_id ASC
First of all, refactor your query to use the 1992 JOIN syntax instead of your grandpa's comma-join syntax. (I'm a grandpa and I jumped at using JOIN as soon as it became available.)
SELECT DISTINCT u.company_id, c.name, c.company_type, c.office_location,
count_of_teams_TODO,
s.status, h.auto_renewal
FROM auth_user u
JOIN companies_company c ON u.company_id = c.id
JOIN subscriptions_subscription s ON s.company_id = c.id
JOIN hubspot_company h ON h.myagi_id = c.id
ORDER BY u.company_id ASC;
Then, I believe each user belongs to one team; that is, each user has exactly one value of auth_user.team_id. And you want your result set to show how many teams the company has.
So substitute COUNT(DISTINCT u.team_id) teams for my count_of_teams_TODO placeholder. There's no need for a subquery. But the aggregate function COUNT() needs GROUP BY, and we want to group by company (including the company columns we display), status, and auto_renewal. That gives this:
SELECT c.id AS company_id, c.name, c.company_type, c.office_location,
COUNT(DISTINCT u.team_id) teams,
s.status, h.auto_renewal
FROM auth_user u
JOIN companies_company c ON u.company_id = c.id
JOIN subscriptions_subscription s ON s.company_id = c.id
JOIN hubspot_company h ON h.myagi_id = c.id
GROUP BY c.id, c.name, c.company_type, c.office_location, s.status, h.auto_renewal
ORDER BY c.id ASC;
And that should do it. Study up on GROUP BY and aggregate functions. Every second you spend learning those concepts better will help you.
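As a small GROUP BY exercise, the sketch below runs the grouped query on SQLite through Python's sqlite3 module. Only the table and column names come from the question; the column types and rows are invented, and the company id is selected as c.id AS company_id so the column is not ambiguous. Each company comes out once even though it has several users, because GROUP BY collapses the per-user rows and COUNT(DISTINCT u.team_id) counts the teams within each group.

```python
import sqlite3

# Hypothetical rows: Acme has four users spread over two teams, Beta has one user.
SCHEMA_AND_DATA = """
CREATE TABLE companies_company(id INTEGER PRIMARY KEY, name TEXT, company_type TEXT, office_location TEXT);
CREATE TABLE auth_user(id INTEGER PRIMARY KEY, company_id INTEGER, team_id INTEGER);
CREATE TABLE subscriptions_subscription(company_id INTEGER, status TEXT);
CREATE TABLE hubspot_company(myagi_id INTEGER, auto_renewal INTEGER);
INSERT INTO companies_company VALUES (1, 'Acme', 'client', 'NYC'), (2, 'Beta', 'partner', 'LA');
INSERT INTO auth_user VALUES (1, 1, 10), (2, 1, 10), (3, 1, 11), (4, 1, 11), (5, 2, 20);
INSERT INTO subscriptions_subscription VALUES (1, 'active'), (2, 'trial');
INSERT INTO hubspot_company VALUES (1, 1), (2, 0);
"""

TEAMS_PER_COMPANY = """
SELECT c.id AS company_id, c.name, c.company_type, c.office_location,
       COUNT(DISTINCT u.team_id) AS teams,
       s.status, h.auto_renewal
FROM auth_user u
JOIN companies_company c ON u.company_id = c.id
JOIN subscriptions_subscription s ON s.company_id = c.id
JOIN hubspot_company h ON h.myagi_id = c.id
GROUP BY c.id, c.name, c.company_type, c.office_location, s.status, h.auto_renewal
ORDER BY c.id
"""

def teams_per_company():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(TEAMS_PER_COMPANY).fetchall()
    conn.close()
    return rows

for row in teams_per_company():
    print(row)
# (1, 'Acme', 'client', 'NYC', 2, 'active', 1)
# (2, 'Beta', 'partner', 'LA', 1, 'trial', 0)
```

If a company had two subscription rows with different statuses, it would appear once per status, since status is part of the GROUP BY.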
As far as performance goes, get this working and then ask another question. Tag it with query-optimization and read this before you ask it.

SQL query returning empty table

I am trying to write two queries.
Find all the actors who made more movies with Yash Chopra than with any other director:
Select b.number,b.actor,b.director from (select MAX(a.count) as number,a.director,a.actor from
(select count(p.PID) as count ,p.PID as actor,md.PID as director from person as p left join m_cast
as
mc on trim(p.PID)=trim(mc.PID) inner join m_director as md on trim(md.MID)=trim(mc.MID) group by
md.PID ,p.PID) as a group by a.actor) as b where b.director=(select PID from person where
Name='Yash Chopra')
Report, for each year, the percentage of movies in that year with only female actors, and the total number of movies made that year. For example, one answer row would be 1990 31.81 13522, meaning that in 1990 there were 13,522 movies and 31.81% of them had only female actors. You do not need to round your answer.
SELECT female_count.year Year,
((female_count.Total_movies_with_only_female_leads)*100)/total_count.Total Percentage FROM ((SELECT
movie.year Year,count(*) Total_movies_with_only_female_leads FROM movie WHERE NOT EXISTS ( SELECT *
FROM M_Cast,person WHERE M_Cast.mid = movie.MID and M_Cast.PID = person.PID AND person.gender='Male'
) GROUP BY movie.year) female_count, (SELECT movie.year,count(*) as Total FROM movie group by
movie.year) total_count) WHERE female_count.year=total_count.year
Unfortunately, both queries return an empty table. Can someone help me fix these two queries?
I rewrote both using CTEs so they are more readable.
First Question:
WITH HowManyMoviesPerActorDirector AS
(select mc.pid as actorpid
,pa.name as actorname
,md.pid as directorpid
,pd.name as directorname
,count(mc.MID) as numberofmovies
from m_cast as mc
inner join m_director md on md.MID=mc.MID
inner join person pa ON mc.PID=pa.PID
inner join person pd ON md.PID=pd.PID
group by mc.pid, pa.name, md.pid, pd.name
)
select h.actorname
,h.directorname
,h.numberofmovies
from HowManyMoviesPerActorDirector h
WHERE h.numberofmovies = (select MAX(h2.numberofmovies)
from HowManyMoviesPerActorDirector h2
where h2.actorpid=h.actorpid)
AND h.directorname='Yash Chopra'
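The first CTE query can be tried on SQLite via Python's sqlite3 module. This sketch uses invented rows and the cleaned-up form of the query (plain columns in the GROUP BY, h2.actorpid with its dot, the director's name column called directorname). It counts films per (actor, director) pair and keeps the pair only when it is the actor's maximum and the director is Yash Chopra. An actor who made equally many films with Yash Chopra and another director would also pass the = MAX test; the sample data avoids that case.

```python
import sqlite3

# Hypothetical data: 'Actor One' made 3 films with Yash Chopra and 1 with another director;
# 'Actor Two' made 1 with Yash Chopra and 2 with the other director.
SCHEMA_AND_DATA = """
CREATE TABLE person(PID TEXT, Name TEXT);
CREATE TABLE m_cast(MID TEXT, PID TEXT);
CREATE TABLE m_director(MID TEXT, PID TEXT);
INSERT INTO person VALUES ('d1', 'Yash Chopra'), ('d2', 'Other Director'),
    ('a1', 'Actor One'), ('a2', 'Actor Two');
INSERT INTO m_director VALUES ('m1', 'd1'), ('m2', 'd1'), ('m3', 'd1'), ('m4', 'd2'),
    ('m5', 'd2'), ('m6', 'd1'), ('m7', 'd2');
INSERT INTO m_cast VALUES ('m1', 'a1'), ('m2', 'a1'), ('m3', 'a1'), ('m4', 'a1'),
    ('m5', 'a2'), ('m6', 'a2'), ('m7', 'a2');
"""

QUERY = """
WITH HowManyMoviesPerActorDirector AS (
    SELECT mc.PID AS actorpid, pa.Name AS actorname,
           md.PID AS directorpid, pd.Name AS directorname,
           COUNT(mc.MID) AS numberofmovies
    FROM m_cast mc
    JOIN m_director md ON md.MID = mc.MID
    JOIN person pa ON pa.PID = mc.PID
    JOIN person pd ON pd.PID = md.PID
    GROUP BY mc.PID, pa.Name, md.PID, pd.Name
)
SELECT h.actorname, h.directorname, h.numberofmovies
FROM HowManyMoviesPerActorDirector h
WHERE h.numberofmovies = (SELECT MAX(h2.numberofmovies)
                          FROM HowManyMoviesPerActorDirector h2
                          WHERE h2.actorpid = h.actorpid)
  AND h.directorname = 'Yash Chopra'
"""

def top_yash_chopra_actors():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

print(top_yash_chopra_actors())  # [('Actor One', 'Yash Chopra', 3)]
```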
The second one:
WITH MoviesIncludingGenderFlag AS
( select m.mid
,m.year
-- 0 only when every cast member is female; assumes gender is stored as 'Female'/'Male', capitalized as in the question
,sum(case when p.gender='Female' then 0 else 1 end) as genderflag
from movie m
inner join m_cast mc on mc.mid=m.mid
inner join person p on p.pid=mc.pid
group by m.mid,m.year
), FemaleOnlyMovies AS
( select m.year,count(m.mid) as Total
from MoviesIncludingGenderFlag m
where genderflag=0
group by m.year
), TotalMovies AS
(
select m.year,count(m.mid) as Total
from movie m
group by m.year
)
select TM.year,(COALESCE(FOM.Total,0)*100.0/TM.Total) as percentage,TM.Total
from TotalMovies TM
left join FemaleOnlyMovies FOM ON FOM.year=TM.year
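Here is a sketch that runs the second CTE query (with the comma between the CTEs, m_cast as the table name and count(m.mid)) on SQLite through Python's sqlite3 module, using a handful of invented rows. It assumes gender is stored as 'Female'/'Male'. A movie with no cast rows would be counted in the yearly total but never as female-only, because the inner joins drop it; the question's NOT EXISTS version treats such a movie the other way round.

```python
import sqlite3

# Hypothetical rows: in 1990, m1 and m4 have only female actors (2 of 4 movies); 1991 has none.
SCHEMA_AND_DATA = """
CREATE TABLE movie(MID TEXT, year INTEGER);
CREATE TABLE person(PID TEXT, gender TEXT);
CREATE TABLE m_cast(MID TEXT, PID TEXT);
INSERT INTO movie VALUES ('m1', 1990), ('m2', 1990), ('m3', 1990), ('m4', 1990), ('m5', 1991);
INSERT INTO person VALUES ('p1', 'Female'), ('p2', 'Female'), ('p3', 'Male');
INSERT INTO m_cast VALUES ('m1', 'p1'), ('m1', 'p2'), ('m2', 'p1'), ('m2', 'p3'),
    ('m3', 'p3'), ('m4', 'p2'), ('m5', 'p1'), ('m5', 'p3');
"""

QUERY = """
WITH MoviesIncludingGenderFlag AS (
    -- genderflag counts cast members who are not female
    SELECT m.MID, m.year, SUM(CASE WHEN p.gender = 'Female' THEN 0 ELSE 1 END) AS genderflag
    FROM movie m
    JOIN m_cast mc ON mc.MID = m.MID
    JOIN person p ON p.PID = mc.PID
    GROUP BY m.MID, m.year
), FemaleOnlyMovies AS (
    SELECT year, COUNT(MID) AS Total FROM MoviesIncludingGenderFlag
    WHERE genderflag = 0 GROUP BY year
), TotalMovies AS (
    SELECT year, COUNT(MID) AS Total FROM movie GROUP BY year
)
SELECT TM.year, COALESCE(FOM.Total, 0) * 100.0 / TM.Total AS percentage, TM.Total
FROM TotalMovies TM
LEFT JOIN FemaleOnlyMovies FOM ON FOM.year = TM.year
ORDER BY TM.year
"""

def female_only_share():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

print(female_only_share())  # [(1990, 50.0, 4), (1991, 0.0, 1)]
```

The 100.0 literal keeps the division in floating point; with 100 the integer division in SQLite would truncate the percentage.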

SQL: list actors who acted in a film before 1900 and also in a film after 2000

So I have three tables: actor(id, name), movie(id, name, year) and casts(aid, mid), where aid and mid are the actor id and movie id. My goal is to select all the actors who acted in a film before 1900 and also in a film after 2000.
My query is
select a.id
from actor a, movie m1, casts c1, movie m2, casts c2
where a.id = c1.aid = c2.aid and c1.mid = m1.id and c2.mid = m2.id and
m1.year >2000 and m2.year <1900;
This query took a really long time and didn't seem to produce the right result.
Could someone please help me?
To get actors who were in films during two date ranges, use two subqueries. Something like this:
select yourFields
from yourTables
where actorId in (subquery to get actor ids for one date range)
and actorId in (subquery to get actor ids for second date range)
You can work out the details.
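Filled in against the question's schema, the two-subquery approach could look like the sketch below, run on SQLite through Python's sqlite3 module with invented rows. Because each actor row is tested once, no DISTINCT is needed even when an actor has several qualifying films.

```python
import sqlite3

# Hypothetical data: only actor 1 has a film before 1900 and a film after 2000.
SCHEMA_AND_DATA = """
CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie(id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE casts(aid INTEGER, mid INTEGER);
INSERT INTO actor VALUES (1, 'Old Timer'), (2, 'Only Old'), (3, 'Only New'), (4, 'Mid Century');
INSERT INTO movie VALUES (10, 'Early Film', 1895), (20, 'Modern Film', 2005), (30, 'Middle Film', 1950);
INSERT INTO casts VALUES (1, 10), (1, 20), (2, 10), (3, 20), (4, 30);
"""

# One IN subquery per date range; an actor must appear in both.
TWO_RANGES = """
SELECT a.id, a.name
FROM actor a
WHERE a.id IN (SELECT c.aid FROM casts c JOIN movie m ON m.id = c.mid WHERE m.year < 1900)
  AND a.id IN (SELECT c.aid FROM casts c JOIN movie m ON m.id = c.mid WHERE m.year > 2000)
ORDER BY a.id
"""

def actors_in_both_ranges():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(TWO_RANGES).fetchall()
    conn.close()
    return rows

print(actors_in_both_ranges())  # [(1, 'Old Timer')]
```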
I assume the problem is the expression a.id = c1.aid = c2.aid. It does not mean "all three are equal". In MySQL, for example, = is evaluated left to right, so this first compares a.id with c1.aid and then compares the boolean result (0 or 1) with c2.aid.
You could try this:
select distinct a.id
from actor a
inner join casts c1 on c1.aid = a.id
inner join casts c2 on c2.aid = a.id
inner join movie m1 on c1.mid = m1.id
inner join movie m2 on c2.mid = m2.id
where m1.year >2000 and m2.year <1900;
Or, if you prefer the comma-join syntax with the join conditions in the WHERE clause, just change a.id = c1.aid = c2.aid to a.id = c1.aid and a.id = c2.aid
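The effect of the chained comparison can be reproduced in SQLite, which (like MySQL) groups = from left to right, so a.id = c1.aid = c2.aid means (a.id = c1.aid) = c2.aid. In this sketch (Python's sqlite3 module, invented rows, DISTINCT added to collapse duplicate rows), the chained version wrongly returns actor 3, who only has a film after 2000: for that actor a.id = c1.aid evaluates to 1, and 1 happens to equal c2.aid on actor 1's pre-1900 row.

```python
import sqlite3

# Hypothetical data: only actor 1 has a film before 1900 and a film after 2000.
SCHEMA_AND_DATA = """
CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie(id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE casts(aid INTEGER, mid INTEGER);
INSERT INTO actor VALUES (1, 'Old Timer'), (2, 'Only Old'), (3, 'Only New'), (4, 'Mid Century');
INSERT INTO movie VALUES (10, 'Early Film', 1895), (20, 'Modern Film', 2005), (30, 'Middle Film', 1950);
INSERT INTO casts VALUES (1, 10), (1, 20), (2, 10), (3, 20), (4, 30);
"""

# The asker's join, with the chained comparison.
CHAINED = """
SELECT DISTINCT a.id
FROM actor a, movie m1, casts c1, movie m2, casts c2
WHERE a.id = c1.aid = c2.aid AND c1.mid = m1.id AND c2.mid = m2.id
  AND m1.year > 2000 AND m2.year < 1900
ORDER BY a.id
"""

# The same join with the comparison split into two conditions.
FIXED = """
SELECT DISTINCT a.id
FROM actor a, movie m1, casts c1, movie m2, casts c2
WHERE a.id = c1.aid AND a.id = c2.aid AND c1.mid = m1.id AND c2.mid = m2.id
  AND m1.year > 2000 AND m2.year < 1900
ORDER BY a.id
"""

def actor_ids(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

print(actor_ids(CHAINED))  # [1, 3]
print(actor_ids(FIXED))    # [1]
```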
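This will also work, using INTERSECT. (It is written against a different schema, m_cast(mid, pid) and movie(mid, year), with values apparently stored as text, and it uses 1970 and 1990 as the cut-off years.)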
with first as (
SELECT Trim(mc.pid) as pid1
FROM m_cast mc
WHERE Trim(mc.mid) IN (SELECT Trim(m.mid)
FROM movie m
WHERE Cast(Substr(Trim(m.year), -4) AS INTEGER) < 1970)
),
second as (
SELECT Trim(mc.pid) as pid2
FROM m_cast mc
WHERE Trim(mc.mid) IN (SELECT Trim(m.mid)
FROM movie m
WHERE Cast(Substr(Trim(m.year), -4) AS INTEGER) > 1990)
)
select first.pid1 from first intersect select second.pid2 from second
Your query probably takes a long time to run because of the size of the tables and the number of joins.
It also returns incorrect results because of errors in the query.
The correct query is:
SELECT DISTINCT a1.id, a1.name
FROM
-- Join used to find all movies before 1900
actor AS a1
INNER JOIN casts AS c1
ON a1.id=c1.aid
INNER JOIN movie as m1
on m1.id = c1.mid,
-- Join used to find all movies after 2000
actor as a2
INNER JOIN casts AS c2
ON a2.id=c2.aid
INNER JOIN movie as m2
on m2.id = c2.mid
-- Only display actors that have played before 1900 and after 2000
WHERE m1.year < 1900 AND m2.year > 2000 AND a1.id = a2.id;

Selecting the leading star when joining tables

movie:
id ,title,yr,director,budget,gross
actor:
id,name
casting:
movieid,actorid,ord
I have been asked to: List the films together with the leading star for all 1962 films. [Note: the ord field of casting gives the position of the actor. If ord=1 then this actor is in the starring role]
My answer was this:
select
title, name
from
movie
join
casting on movie.id = casting.movieid
join
actor on actor.id = casting.actorid
where
yr = 1962 and movie.id = casting.movieid and actor.id = casting.actorid and casting.ord = 1
group by
title
My query gets close to the answer, but I have a problem with the ord part: some films have no casting row with ord = 1 (their first actor has ord = 2), so those films do not appear in the output.
How can I make it select the actor with ord = 1, or the one with ord = 2 (but not both), with 1 having higher priority?
I hope someone can help me with this.
select m.title, coalesce(a1.name, a2.name) as actorname
from movie m
left join (select * from casting where ord=1) c1 on c1.movieid = m.id
left join (select * from casting where ord=2) c2 on c2.movieid = m.id
left join actor a1 on a1.id = c1.actorid
left join actor a2 on a2.id = c2.actorid
where m.yr=1962
Hope this gives you the expected result.
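The ord = 1 / ord = 2 fallback can be checked on SQLite with Python's sqlite3 module. In this sketch (invented rows), 'Film B' has no ord = 1 entry, so COALESCE falls back to its ord = 2 actor. Both casting subqueries are joined to the movie itself, and each has its own actor join, so the fallback still works when the ord = 1 row is missing.

```python
import sqlite3

# Hypothetical rows: 'Film B' has no ord = 1 entry, so its lead has ord = 2.
SCHEMA_AND_DATA = """
CREATE TABLE movie(id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting(movieid INTEGER, actorid INTEGER, ord INTEGER);
INSERT INTO movie VALUES (1, 'Film A', 1962), (2, 'Film B', 1962), (3, 'Film C', 1963);
INSERT INTO actor VALUES (1, 'Lead A'), (2, 'Support A'), (3, 'Lead B'), (4, 'Extra B'), (5, 'Lead C');
INSERT INTO casting VALUES (1, 1, 1), (1, 2, 2), (2, 3, 2), (2, 4, 3), (3, 5, 1);
"""

# Prefer the ord = 1 actor; fall back to the ord = 2 actor when there is none.
LEADING_STAR = """
SELECT m.title, COALESCE(a1.name, a2.name) AS actorname
FROM movie m
LEFT JOIN (SELECT * FROM casting WHERE ord = 1) c1 ON c1.movieid = m.id
LEFT JOIN (SELECT * FROM casting WHERE ord = 2) c2 ON c2.movieid = m.id
LEFT JOIN actor a1 ON a1.id = c1.actorid
LEFT JOIN actor a2 ON a2.id = c2.actorid
WHERE m.yr = 1962
ORDER BY m.title
"""

def leading_stars():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(LEADING_STAR).fetchall()
    conn.close()
    return rows

print(leading_stars())  # [('Film A', 'Lead A'), ('Film B', 'Lead B')]
```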
Select M.title, A.name
From Movie M
Cross Apply (Select top 1 C.actorid From Casting C Where C.movieid = M.id Order By C.ord) as FirstCast
Inner join Actor A On A.id = FirstCast.actorid
Where M.yr = 1962
Use CROSS APPLY (http://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx), available in SQL Server, with a correlated subquery that is ordered by the 'ord' column and has a TOP(1) clause.
SELECT
M.title
, CA.name
FROM
movie AS M
CROSS APPLY (
SELECT TOP(1)
A.name
FROM
casting C
INNER JOIN actor A
ON C.actorid = A.id
WHERE
M.id = C.movieid
ORDER BY
C.ord
ASC
) AS CA
WHERE
M.yr = 1962
I have 2 answers:
SELECT movie.title, (select actor.name from actor where actor.id = casting.actorid)
FROM movie
JOIN casting
ON movie.id =casting.movieid
WHERE movie.yr = 1962 and casting.ord = 1
But then I realized that you can chain joins:
SELECT movie.title, actor.name
FROM movie
JOIN casting ON movie.id=casting.movieid
JOIN actor ON casting.actorid = actor.id
WHERE movie.yr = 1962 and casting.ord = 1
The second one is clearly much simpler. (There's no need to nest a SELECT statement).
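To have it select either one, use ... and casting.ord in (1, 2). (XOR gives the same result as OR here, because a single casting row cannot have ord = 1 and ord = 2 at the same time.) That still returns two rows for a film that has both an ord = 1 and an ord = 2 actor.
To keep only the higher-priority actor, restrict each film to its lowest ord instead: ... and casting.ord = (select min(c2.ord) from casting c2 where c2.movieid = movie.id).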
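A portable alternative to both XOR and CROSS APPLY ... TOP(1) is to keep, for each film, the casting row whose ord equals that film's minimum ord. The sketch below runs this on SQLite through Python's sqlite3 module with invented rows; unlike a fixed ord = 1 / ord = 2 fallback, it also handles a film whose first listed actor has ord = 3. If two actors shared a film's lowest ord, both would be returned.

```python
import sqlite3

# Hypothetical rows: 'Film B' starts at ord = 2 and 'Film D' starts at ord = 3.
SCHEMA_AND_DATA = """
CREATE TABLE movie(id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting(movieid INTEGER, actorid INTEGER, ord INTEGER);
INSERT INTO movie VALUES (1, 'Film A', 1962), (2, 'Film B', 1962), (3, 'Film C', 1963), (4, 'Film D', 1962);
INSERT INTO actor VALUES (1, 'Lead A'), (2, 'Support A'), (3, 'Lead B'), (4, 'Extra B'),
    (5, 'Lead C'), (6, 'Only D');
INSERT INTO casting VALUES (1, 1, 1), (1, 2, 2), (2, 3, 2), (2, 4, 3), (3, 5, 1), (4, 6, 3);
"""

# Keep the casting row whose ord is the lowest ord recorded for that film.
LEADING_STAR = """
SELECT movie.title, actor.name
FROM movie
JOIN casting ON movie.id = casting.movieid
JOIN actor ON casting.actorid = actor.id
WHERE movie.yr = 1962
  AND casting.ord = (SELECT MIN(c2.ord) FROM casting c2 WHERE c2.movieid = movie.id)
ORDER BY movie.title
"""

def leading_stars():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(LEADING_STAR).fetchall()
    conn.close()
    return rows

print(leading_stars())  # [('Film A', 'Lead A'), ('Film B', 'Lead B'), ('Film D', 'Only D')]
```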

SQL triple join: ambiguous attribute name on a count

So I want to count a number of books, but the books are stored in two different tables (writes and controls) under the same attribute name, book.
I want to get a result that looks like:
name1 [total number of books of 1]
name2 [total number of books of 2]
I tried this triple join:
SELECT DISTINCT name, count(book)
FROM writes w
LEFT JOIN person p on p.id = w.author
LEFT JOIN book b on b.title = w.book
LEFT JOIN controls l on l.controller=p.id
GROUP BY name
ORDER BY name DESC
But since book exists as an attribute in both writes and controls, the query can't execute.
It only works if I leave out one of the joins, so that book can be identified.
How can I tell the sql engine to count the number of both book attributes together for each person?
Given your database design, you should issue two different queries and then combine them into a single output.
A)
SELECT p.name AS Name, count(w.book) AS Cnt
FROM writes w
JOIN person p on p.id = w.author
GROUP BY p.name
B)
SELECT p.name AS Name, count(l.book) AS Cnt
FROM controls l
JOIN person p on p.id = l.controller
GROUP BY p.name
For your purpose, combine A and B with UNION ALL (a plain UNION would silently merge two identical rows, e.g. for a person who writes 2 books and controls 2 books) and use the result as the data source of a third query:
select T.Name, sum(T.Cnt)
from (A union all B) as T
group by T.Name
order by T.Name
Joining A and B with where A.Name = B.Name instead would drop everyone who appears in only one of the two tables.
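The difference between UNION and UNION ALL here is easy to check on SQLite through Python's sqlite3 module. In this sketch (invented rows; table and column names taken from the question), Ann writes two books and controls two others, so both per-table queries produce the row ('Ann', 2): UNION keeps only one copy and her total drops to 2, while UNION ALL gives the expected 4.

```python
import sqlite3

# Hypothetical rows: Ann writes 2 books and controls 2 others.
SCHEMA_AND_DATA = """
CREATE TABLE person(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE writes(author INTEGER, book TEXT);
CREATE TABLE controls(controller INTEGER, book TEXT);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO writes VALUES (1, 'b1'), (1, 'b2'), (2, 'b3');
INSERT INTO controls VALUES (1, 'b4'), (1, 'b5'), (3, 'b6');
"""

# Per-table counts (queries A and B), combined with the set operator passed in.
PER_TABLE_COUNTS = """
SELECT p.name AS name, COUNT(w.book) AS cnt
FROM writes w JOIN person p ON p.id = w.author
GROUP BY p.name
{set_operator}
SELECT p.name, COUNT(l.book)
FROM controls l JOIN person p ON p.id = l.controller
GROUP BY p.name
"""

def totals(set_operator):
    inner = PER_TABLE_COUNTS.format(set_operator=set_operator)
    sql = f"SELECT name, SUM(cnt) FROM ({inner}) AS t GROUP BY name ORDER BY name"
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(totals("UNION ALL"))  # [('Ann', 4), ('Bob', 1), ('Cy', 1)]
print(totals("UNION"))      # [('Ann', 2), ('Bob', 1), ('Cy', 1)]
```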
WITH T AS
(
SELECT 'WRITES' FROMTABLE, p.name, count(w.book) AS books
FROM writes w
JOIN person p on p.id = w.author
GROUP BY p.name
UNION ALL
SELECT 'CONTROLS' FROMTABLE, p.name, count(c.book) AS books
FROM controls c
JOIN person p on p.id = c.controller
GROUP BY p.name
)
SELECT * FROM T ORDER BY name
Should work.
HTH
This works per distinct author ID, counting how many books each author has written. The pre-aggregation returns one record per author with the number of books by that author. THEN, join to the person table to get the name. The reason I am keeping both the ID and the name of the author is... what if you have two authors named "John Smith", with respective IDs of 123 and 389? You wouldn't want these rolled up into the same person (or do you?).
select
P.ID,
P.Name,
PreAgg.BooksPerAuthor
from
( select
w.author,
count(*) BooksPerAuthor
from
writes w
group by
w.author ) PreAgg
JOIN Person P
on PreAgg.Author = P.id
order by
P.Name