sqlzoo joinII exercise - movie databases 4a - sql

This question is from SQLZoo. I wrote the following code, but I feel it is too redundant:
SELECT year, freq
FROM (SELECT yr AS year,count(title) AS freq
FROM movie, actor, casting
WHERE name= 'John Travolta'
AND movie.id=movieid
AND actor.id=actorid
GROUP BY yr) AS a
WHERE freq=(
SELECT MAX(freq)
FROM (SELECT yr AS year,count(title) AS freq
FROM movie, actor, casting
WHERE name= 'John Travolta'
AND movie.id=movieid
AND actor.id=actorid
GROUP BY yr) AS b
)
Why can't it be written like this?
SELECT year, freq
FROM (SELECT yr AS year,count(title) AS freq
FROM movie, actor, casting
WHERE name= 'John Travolta'
AND movie.id=movieid
AND actor.id=actorid
GROUP BY yr) AS a
WHERE freq=(
SELECT MAX(freq)
FROM a
)

In cases like this it may be helpful to use a CTE (Common Table Expression). Within a single statement, a CTE lets you define a subquery once and refer to it by name more than once. Below, ROW_NUMBER picks out the largest frequency; note that it returns only one row even if several years tie for the maximum. I have also updated the old-school FROM A, B, C WHERE ... to the newer FROM A INNER JOIN B ... syntax. (I'm not 100% sure the JOIN criteria are correct, though.)
WITH a AS
(
SELECT
yr AS year,
COUNT(title) AS freq
FROM
movie
INNER JOIN
casting ON movie.id = casting.movieid
INNER JOIN
actor ON actor.id = casting.actorid
WHERE
name = 'John Travolta'
GROUP BY
yr),
b AS
(
SELECT
year, freq,
ROW_NUMBER() OVER (ORDER BY freq DESC) as RowNum
FROM a
)
SELECT year, freq
FROM b
WHERE RowNum = 1

Whenever you write subqueries, the inner ones are conceptually evaluated first and then the outer queries. In your second query, the alias "a" only names the derived table within the outer FROM clause; it is not a table that a nested subquery can select from, so FROM a refers to something that doesn't exist. That is why the second query gives an error and cannot be used. The first query is syntactically correct.
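To make the scoping point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and SQLite stands in for whatever engine SQLZoo runs, so the exact error text may differ elsewhere. It shows the derived-table alias failing inside the WHERE subquery, and the same query working once a is a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
INSERT INTO actor VALUES (1, 'John Travolta');
-- Made-up rows: two films in 1998, two in 2000, one in 1994
INSERT INTO movie VALUES (1, 'A', 1998), (2, 'B', 1998),
                         (3, 'C', 2000), (4, 'D', 2000), (5, 'E', 1994);
INSERT INTO casting VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1), (5, 1, 1);
""")

# The per-year count that both versions of the query are built on.
per_year = """
SELECT yr AS year, COUNT(title) AS freq
FROM movie
JOIN casting ON movie.id = casting.movieid
JOIN actor ON actor.id = casting.actorid
WHERE name = 'John Travolta'
GROUP BY yr
"""

# Derived-table alias: "a" is not visible as a table inside the WHERE subquery.
try:
    conn.execute(f"SELECT year, freq FROM ({per_year}) AS a "
                 "WHERE freq = (SELECT MAX(freq) FROM a)")
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

# CTE: the name "a" can be referenced twice in the same statement.
busiest = conn.execute(f"""
WITH a AS ({per_year})
SELECT year, freq FROM a
WHERE freq = (SELECT MAX(freq) FROM a)
ORDER BY year
""").fetchall()

print("derived-table version failed with:", alias_error)
print("CTE version returned:", busiest)
```

Comparing against MAX(freq) keeps every year that ties for the maximum, whereas filtering on ROW_NUMBER() = 1 would keep only one of them.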

Related

Get a column without adding it to the group by

select year, gender, max(nHospitalizations) from (select TO_CHAR(i.since, 'YYYY') as year, u.gender, h.name,count(h.name) as nHospitalizations from hospital h
join hospitalization i on i.hospital = h.name
join person u on i.person = u.numberID
group by TO_CHAR(i.since, 'YYYY'), u.gender, h.name)
group by year, gender
order by year desc, gender asc
;
I have this query, and it's pretty much doing what I want it to. However, I want to know the name of the hospital with the most hospitalizations per year and gender. When I add h.name to the select, SQL makes me add it to the outer group by, which means I get the count per year, gender and hospital name, just like in the subquery, instead of the hospital with the most hospitalizations per year and gender. How can I add h.name to the outer query without adding it to the outer group by?
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
You want to use window functions for this:
select *
from (select count(*) as nHospitalizations, to_char(i.since, 'YYYY') as year,
u.gender, h.name,
row_number() over (partition by to_char(i.since, 'YYYY'), u.gender order by count(*) desc) as seqnum
from hospital h join
hospitalization i
on i.hospital = h.name join
person u
on i.person = u.numberID
group by to_char(i.since, 'YYYY'), u.gender, h.name
) x
where seqnum = 1
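As a rough, runnable illustration of the "number the rows within each group, then keep row 1" idea, here is a sketch using Python's sqlite3 module. It makes several assumptions that are not in the original: the three tables are collapsed into one made-up hospitalization table, strftime('%Y', ...) replaces TO_CHAR because SQLite has no TO_CHAR, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified table: one row per hospitalization
CREATE TABLE hospitalization (hospital TEXT, gender TEXT, since TEXT);
INSERT INTO hospitalization VALUES
  ('North', 'F', '2020-01-05'), ('North', 'F', '2020-03-10'),
  ('South', 'F', '2020-06-01'),
  ('South', 'M', '2020-02-02'), ('South', 'M', '2020-09-09'),
  ('North', 'M', '2020-04-04'),
  ('North', 'F', '2021-01-01');
""")

top = conn.execute("""
WITH counts AS (
  -- Step 1: count hospitalizations per year, gender and hospital
  SELECT strftime('%Y', since) AS year, gender, hospital, COUNT(*) AS n
  FROM hospitalization
  GROUP BY strftime('%Y', since), gender, hospital
)
SELECT year, gender, hospital, n
FROM (
  -- Step 2: rank hospitals inside each (year, gender) partition
  SELECT counts.*,
         ROW_NUMBER() OVER (PARTITION BY year, gender ORDER BY n DESC) AS seqnum
  FROM counts
) x
WHERE seqnum = 1          -- Step 3: keep only the top hospital per partition
ORDER BY year DESC, gender ASC
""").fetchall()

for row in top:
    print(row)
```

The hospital name comes along for free because the filter is on the row number rather than on an outer GROUP BY.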

Can't solve a problem using group by function

The task: Obtain a list, in alphabetical order, of actors who've had at least 30 starring roles.
My code:
select name, count(ord=1)
from casting
join actor on actorid=actor.id
where ord=1 and count(ord=1) and exists ( select 1 from casting
where count(movieid)>=30)
group by actorid,name
order by name
It gives me an error: invalid use of group by function.
Join the tables, group by actor and put the condition in the having clause.
select
a.name,
sum(case c.ord when 1 then 1 else 0 end) starringroles
from actor a inner join casting c
on c.actorid = a.id
group by a.id, a.name
having sum(case c.ord when 1 then 1 else 0 end) >= 30
order by a.name
The expression sum(case c.ord when 1 then 1 else 0 end) will count the number of starring roles (with ord = 1).
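The conditional-sum pattern can be tried on a tiny data set. The sketch below uses Python's sqlite3 module with made-up actors and a threshold of 2 instead of 30 (MIN_LEADS is a name introduced here only for illustration) so the result is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
-- Made-up data for illustration
INSERT INTO actor VALUES (1, 'Zed Lead'), (2, 'Amy Star'), (3, 'Bob Extra');
INSERT INTO casting VALUES
  (1, 1, 1), (2, 1, 1),             -- Zed: two leads
  (3, 2, 1), (4, 2, 1), (5, 2, 2),  -- Amy: two leads, one supporting role
  (1, 3, 2), (2, 3, 2), (6, 3, 1);  -- Bob: one lead, two supporting roles
""")

MIN_LEADS = 2  # stands in for the 30 in the exercise

rows = conn.execute("""
SELECT a.name,
       SUM(CASE c.ord WHEN 1 THEN 1 ELSE 0 END) AS starringroles
FROM actor a
JOIN casting c ON c.actorid = a.id
GROUP BY a.id, a.name
HAVING SUM(CASE c.ord WHEN 1 THEN 1 ELSE 0 END) >= ?
ORDER BY a.name
""", (MIN_LEADS,)).fetchall()

print(rows)
```

Bob is dropped because only one of his three casting rows has ord = 1; the supporting roles contribute 0 to the sum.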
You cannot use an aggregate function in the WHERE clause; you need HAVING:
select name, count(*)
from casting
join actor on actorid=actor.id
where ord=1
and exists ( select 1 from casting
having count(movieid)>=30)
group by actorid,name
having count(movieid)>=30
order by name
select MAX(name) AS name, count(*) AS roles
from casting
join actor on actorid=actor.id
where ord=1
group by actorid
HAVING COUNT(*)>=30
order by name;
This is question 13 on SQLZoo.
A little description of the database:
This database features two entities (movies and actors) in a many-to-many relation. Each entity has its own table. A third table, casting , is used to link them. The relationship is many-to-many because each film features many actors and each actor has appeared in many films.
Here is the answer I found working:
select A.name
from actor A
inner join casting C
on C.actorid = A.id
where C.ord =1 /*only the lead roles*/
group by A.id /*grouped by Actor ID*/
having count(C.movieid) >=15 /*at least 15 starring roles*/
order by A.name /* in alphabetical order*/
SELECT actor.name FROM actor
JOIN casting ON actor.id=casting.actorid
WHERE casting.actorid IN (SELECT actorid FROM casting WHERE ord =1
GROUP BY actorid
HAVING COUNT(ord) >=15)
GROUP BY name

SQL How to use info from created table

SELECT yr, COUNT(title) AS g FROM movie
JOIN casting ON id = movieid
JOIN actor ON actorid = actor.id
WHERE name = 'John Travolta'
AND g = 1
GROUP BY yr
--> I want g to work, but I don't know how to use info from same query.
You can't use g in the WHERE clause because WHERE filters rows before they are grouped, so the aggregate result does not exist yet at that point. To filter on an aggregate, you need to use the HAVING clause.
SELECT yr, COUNT(title) AS g FROM movie
JOIN casting ON id = movieid
JOIN actor ON actorid = actor.id
WHERE name = 'John Travolta'
GROUP BY yr
HAVING COUNT(title) = 1
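A small experiment shows the difference. This sketch uses Python's sqlite3 module with made-up rows; SQLite is only a stand-in, so the error wording on other engines will differ, but the aliased aggregate is rejected in WHERE while the HAVING version runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
INSERT INTO actor VALUES (1, 'John Travolta');
-- Made-up rows: one film in 1994, two in 1998
INSERT INTO movie VALUES (1, 'A', 1994), (2, 'B', 1998), (3, 'C', 1998);
INSERT INTO casting VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
""")

base = """
SELECT yr, COUNT(title) AS g
FROM movie
JOIN casting ON movie.id = movieid
JOIN actor ON actorid = actor.id
WHERE name = 'John Travolta'
"""

# Filtering on the aggregate alias in WHERE: rejected by the engine.
try:
    conn.execute(base + " AND g = 1 GROUP BY yr")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# Filtering in HAVING: applied after the groups have been formed.
single_film_years = conn.execute(
    base + " GROUP BY yr HAVING COUNT(title) = 1 ORDER BY yr"
).fetchall()

print("WHERE version failed with:", where_error)
print("HAVING version returned:", single_film_years)
```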

SQL Join Query help

I am doing some SQL exercises and have run into a problem. The query below gives me a 'half correct' result: I only want the row(s) with the most titles to be displayed, but this query displays all records. Can someone help? Thanks.
Question:
Which were the busiest years for 'John Travolta'. Show the number of movies he made for each year.
Tables:
movie (id, title, yr, score, votes, director)
actor (id, name)
casting (movieid, actorid, ord)
Query:
select yr, max(title)
from
(
select yr, count(title) title from movie
join casting
on (movie.id=casting.movieid)
join actor
on (casting.actorid=actor.id)
where actor.name="John Travolta"
group by yr Asc
) a
The question asks
Which were the busiest years?
... plural. So, what were his top 5 years?
select top 5
m.yr
,count(*)
from actor as a
join casting as c
join movie as m
on m.id = c.movieid
on c.actorid = a.id
where a.name = 'John Travolta'
group by
m.yr
order by
count(*) desc
However, the second part of the question specifies that you should
Show the number of movies he made for each year.
So far our query doesn't account for years in which John made no movies... so, this might be where your half correct comes into play. That said, you may want to create a table variable filled with year values from 1954 through the current year... and left join off of that.
declare @year table
(
[yr] int
)
declare @currentYear int = datepart(year,getdate())
while @currentYear >= 1954 begin -- Travolta was born in 1954!
insert @year values (@currentYear)
set @currentYear -= 1
end
select
y.yr
,count(m.id)
from @year y
left join movie as m
join casting as c
join actor as a
on a.id = c.actorid
and a.name = 'John Travolta'
on c.movieid = m.id
on m.yr = y.yr
group by
y.yr
order by
count(m.id) desc
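The same "calendar of years, then left join" idea can be sketched outside T-SQL. The version below uses Python's sqlite3 module; the years table, the fixed 1994 to 1998 range, and the movie rows are all made up for illustration, and the year list is filled from Python rather than with a WHILE loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
CREATE TABLE years (yr INTEGER);  -- hypothetical helper table
INSERT INTO actor VALUES (1, 'John Travolta'), (2, 'Someone Else');
INSERT INTO movie VALUES (1, 'A', 1994), (2, 'B', 1996), (3, 'C', 1996), (4, 'D', 1995);
INSERT INTO casting VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1);
""")

# Fill the helper table with every year in the range of interest.
conn.executemany("INSERT INTO years VALUES (?)", [(y,) for y in range(1994, 1999)])

rows = conn.execute("""
SELECT y.yr, COUNT(t.id) AS films
FROM years y
LEFT JOIN (
  -- Only this actor's films; the filter must sit inside the joined side,
  -- otherwise it would discard the years that have no films.
  SELECT m.id, m.yr
  FROM movie m
  JOIN casting c ON c.movieid = m.id
  JOIN actor a ON a.id = c.actorid AND a.name = 'John Travolta'
) t ON t.yr = y.yr
GROUP BY y.yr
ORDER BY films DESC, y.yr
""").fetchall()

for row in rows:
    print(row)
```

COUNT(t.id) counts only non-NULL values, so years with no matching film come out as 0 rather than 1.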
[Edit: based on comments] And a final query to return all years whose count matches the highest of any year.
;with TravoltaMovies as
(
select
m.yr
,count(*) as [Count]
from actor as a
join casting as c
join movie as m
on m.id = c.movieid
on c.actorid = a.id
where a.name = 'John Travolta'
group by m.yr
)
select
*
from TravoltaMovies as tm
where tm.[Count] = (select max([Count]) from TravoltaMovies)
The answer is:
SELECT y.yr,MAX(y.count)
FROM(
SELECT movie.yr,COUNT(movie.yr) AS count
FROM (movie JOIN casting ON (movie.id=movieid)) JOIN actor ON (actor.id=actorid)
WHERE name='John Travolta'
GROUP BY yr
ORDER BY COUNT(movie.yr) DESC) y
This is a much easier solution.
select yr, count(yr)
from movie
join casting on movie.id = movieid
join actor on actorid = actor.id
where name = 'John Travolta'
group by yr
having count(yr) > 2
Happy to help
select TOP 1 yr, title
from
(
select yr, count(title) title from movie
join casting
on (movie.id=casting.movieid)
join actor
on (casting.actorid=actor.id)
where actor.name='John Travolta'
group by yr
) a
ORDER BY title DESC
Just add a TOP selection and an ORDER BY.
The aggregation is unnecessary.
Thanks, all. This is the query:
SELECT yr,COUNT(title) FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
where name='John Travolta'
GROUP BY yr
HAVING COUNT(title)=(SELECT MAX(c) FROM
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
where name='John Travolta'
GROUP BY yr) AS t
)
This is a simple answer using both a join and a subquery:
select yr,count(title) from movie
inner join casting on
movie.id=casting.movieid
where actorid= (select id from actor where name ='John Travolta')
group by yr
having count(title)>2
Here is an easier solution, with an explanation:
First, we join all the tables.
Then we filter on the actor's name with name = 'John Travolta'.
Next we group by yr, so that each yr has a corresponding count(yr).
The solution should contain only years with a count greater than 2, so we apply that filter to the grouped data with a HAVING clause.
select yr, count(yr) from movie
join casting on movie.id=movieid
join actor on actor.id=actorid
where name= 'John Travolta'
group by yr
having count(yr)>2
Hope this helps. I don't see a need to write a procedure for this.
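To see what each of those steps does to the data, the following sketch runs them one at a time with Python's sqlite3 module. The rows are made up, and the variable names (after_join and so on) are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
INSERT INTO actor VALUES (1, 'John Travolta'), (2, 'Someone Else');
-- Made-up rows: three Travolta films in 1998, one in 1994
INSERT INTO movie VALUES (1, 'A', 1998), (2, 'B', 1998), (3, 'C', 1998), (4, 'D', 1994);
INSERT INTO casting VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1), (1, 2, 2);
""")

joined = ("FROM movie JOIN casting ON movie.id = movieid "
          "JOIN actor ON actor.id = actorid")
only_travolta = joined + " WHERE name = 'John Travolta'"

# Step 1: join all the tables (one row per casting entry).
after_join = conn.execute(f"SELECT COUNT(*) {joined}").fetchone()[0]
# Step 2: keep only the rows for the actor we care about.
after_where = conn.execute(f"SELECT COUNT(*) {only_travolta}").fetchone()[0]
# Step 3: group by year and count the rows in each group.
after_group = conn.execute(
    f"SELECT yr, COUNT(yr) {only_travolta} GROUP BY yr ORDER BY yr").fetchall()
# Step 4: keep only the groups with more than two films.
after_having = conn.execute(
    f"SELECT yr, COUNT(yr) {only_travolta} GROUP BY yr HAVING COUNT(yr) > 2").fetchall()

print(after_join, after_where, after_group, after_having)
```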

sql : join and group by

Given 3 tables:
movie(id, title, yr, score, votes, director)
actor(id, name)
casting(movieid, actorid, ord)
Q: Which were the busiest years for 'John Travolta'? Show the number of movies he made for each year.
A: My attempt is syntactically wrong. Why?
select yr, count(*)
from
(actor join casting
on (actor.id = casting.actorid)
join
on (movie.id = casting.movieid)
group by yr
having actor.name='John Travolta'
You are missing the second table name after join.
Use where, not having.
Try this:
select yr, count(*)
from actor
join casting on actor.id = casting.actorid
join movie on movie.id = casting.movieid -- you were missing table name "movie"
where actor.name='John Travolta' -- "where", not "having"
group by yr
Also note the consistent formatting I used. If you use a good format, it's easier to find syntax errors.
FYI, having is used for conditions on aggregate functions, e.g. having count(*) > 3
Remove the ( ) from around the table name and add movie to your second join.
select yr, count(*)
from actor join
casting on actor.id = casting.actorid join
movie on movie.id = casting.movieid
group by yr
having actor.name='John Travolta'
EDIT:
You need to switch your having to a where, because having is used for conditions on aggregate functions in conjunction with your group by.
select yr, count(*)
from actor join
casting on actor.id = casting.actorid join
movie on movie.id = casting.movieid
where actor.name = 'John Travolta'
group by yr
To join, you have to specify the table you are joining. It should be:
join movie
on movie.id = casting.movieid