differences between two sql statements - sql

I'm new to SQL and am doing some practice on sqlzoo (https://sqlzoo.net/wiki/More_JOIN_operations). For one of its questions, my SQL statement is judged as wrong, but I think it is equivalent to the reference statement.
There are three tables. Details can be found on sqlzoo:
This database features two entities (movies and actors) in a many-to-many relation. Each entity has its own table. A third table, casting , is used to link them. The relationship is many-to-many because each film features many actors and each actor has appeared in many films.
My statement
SELECT title, name
FROM casting JOIN movie ON movie.id=movieid
JOIN actor ON actor.id=actorid
WHERE movieid in
(SELECT id FROM movie WHERE yr=1962)
AND ord=1
The reference statement that produces the right result
SELECT title, name
FROM movie JOIN casting ON (id=movieid)
JOIN actor ON (actor.id = actorid)
WHERE ord=1 AND yr = 1962
I cannot tell the difference between the above two statements.

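They should both work and produce the same results: movie is already joined on movie.id = movieid, so filtering on yr = 1962 directly selects the same rows as the IN subquery. The reference statement is simpler, though, and avoids the redundant subquery, which usually makes it more efficient.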
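One way to check the equivalence is to run both statements against a small data set. The sketch below uses Python's built-in sqlite3 module; the table and column names follow sqlzoo, but the rows are made up for illustration, and the bare id in the reference statement is qualified as movie.id to avoid any ambiguity with actor.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie   (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
    -- Hypothetical sample rows, not the real sqlzoo data.
    INSERT INTO movie VALUES (1, 'Film A', 1962), (2, 'Film B', 1962), (3, 'Film C', 1970);
    INSERT INTO actor VALUES (10, 'Ann'), (11, 'Bob'), (12, 'Cy');
    INSERT INTO casting VALUES (1, 10, 1), (1, 11, 2), (2, 11, 1), (3, 12, 1);
""")

# The statement from the question: filter through an IN subquery.
with_subquery = """
    SELECT title, name
    FROM casting JOIN movie ON movie.id = movieid
                 JOIN actor ON actor.id = actorid
    WHERE movieid IN (SELECT id FROM movie WHERE yr = 1962)
      AND ord = 1
"""
# The reference statement: filter on the joined movie row directly.
reference = """
    SELECT title, name
    FROM movie JOIN casting ON (movie.id = movieid)
               JOIN actor   ON (actor.id = actorid)
    WHERE ord = 1 AND yr = 1962
"""

rows_subquery = sorted(conn.execute(with_subquery).fetchall())
rows_reference = sorted(conn.execute(reference).fetchall())
print(rows_subquery == rows_reference, rows_reference)
```

On this data both statements return the leading actor of each 1962 film, so the comparison prints True.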

Related

SQL Server - Question about Inheritance in Database Schemas

I'm having problems understanding the class table inheritance structure that you can implement using database tables. Info on class table inheritance. I have a use case where I have quite different types of persons that I need to model, but they have very minor differences. For example, all of these persons, like Student, Professor and so on, have a surname and a lastname. My first thought was to move these attributes into a separate base table, like you would do with a base class in object-oriented programming. Here to illustrate further:
Right now, a Professor can only have one person, for example, otherwise it wouldn't make sense in my use case. Also, I have a school table that has two foreign keys, one for the Professor and one for the Student. Let's assume that a school can also have only one professor and one student. This is not the real use case that I have. This example just represents the relation in my real use case, which would be too much to explain here.
What I don't understand is how you would collect data based on that. I'm trying to make a SQL Server View where I want to load the Person of the Professor and the Person of the Student from the view point of the School Table. For example:
SELECT
school.professor_id,
prof_person.surname,
prof_person.lastname
FROM dbo.School AS school
INNER JOIN dbo.Professor as prof
ON school.professor_id = prof.ID
INNER JOIN dbo.Person as prof_person
ON prof.person_id = prof_person.ID
I can output the surname and lastname of the professor, but now I am stuck since I can't figure out how to get the person of the student.
A subtype table typically shares a key with the supertype table, instead of having its own PK and a FK. EG Student.ID is both the PK and the FK.
Then just join Student>Person in addition to Professor>Person, eg
SELECT
School.Id,
prof_person.surname prof_surname,
student_person.surname student_surname
FROM dbo.School AS school
INNER JOIN dbo.Professor as prof
ON school.professor_id = prof.ID
INNER JOIN dbo.Person as prof_person
ON prof.ID = prof_person.ID
INNER JOIN dbo.Student as student
ON school.student_id = student.ID
INNER JOIN dbo.Person as student_person
ON student.ID = student_person.ID
INNER JOIN is associative, so no need for special ordering or parentheses.
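As a rough illustration of the shared-key layout and the double join to Person, here is a minimal sketch using Python's built-in sqlite3 module. It is SQLite rather than SQL Server, so the dbo schema prefix is dropped; the Person table is reduced to a surname column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person    (ID INTEGER PRIMARY KEY, surname TEXT);
    -- Subtype tables reuse the Person key: ID is both the PK and the FK.
    CREATE TABLE Professor (ID INTEGER PRIMARY KEY REFERENCES Person(ID));
    CREATE TABLE Student   (ID INTEGER PRIMARY KEY REFERENCES Person(ID));
    CREATE TABLE School    (ID INTEGER PRIMARY KEY,
                            professor_id INTEGER REFERENCES Professor(ID),
                            student_id   INTEGER REFERENCES Student(ID));
    -- Hypothetical sample rows.
    INSERT INTO Person VALUES (1, 'Lovelace'), (2, 'Turing');
    INSERT INTO Professor VALUES (1);
    INSERT INTO Student VALUES (2);
    INSERT INTO School VALUES (100, 1, 2);
""")

# Person is joined twice under two aliases: once via Professor, once via Student.
rows = conn.execute("""
    SELECT school.ID,
           prof_person.surname    AS prof_surname,
           student_person.surname AS student_surname
    FROM School AS school
    INNER JOIN Professor AS prof           ON school.professor_id = prof.ID
    INNER JOIN Person    AS prof_person    ON prof.ID = prof_person.ID
    INNER JOIN Student   AS student        ON school.student_id = student.ID
    INNER JOIN Person    AS student_person ON student.ID = student_person.ID
""").fetchall()
print(rows)
```

Each alias of Person behaves like an independent copy of the table, which is why the professor's and the student's surname can appear side by side in one row.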

SQL Query Involving Finding Most Frequent Tuple Value in Column

I have the following relations:
teaches(ID,course_id,sec_id,semester,year)
instructor(ID,name,dept_name,salary)
I am trying to express the following as an SQL query:
Find the ID and name of the instructor who has taught the most courses (i.e., has the most tuples in teaches).
My Query
select ID, name
from teaches
natural join instructor
group by ID
order by count(*) desc
I know this isn't correct, but I feel like I'm on the right track. In order to answer the question, you need to work with both relations, hence the natural join operation is required. Since the question asks for the instructor that has taught the most courses, that tells me that we are trying to count the number of times each instructor ID appears in the teaches relation. From what I understand, we are looking to count distinct instructor IDs, hence the group by command is needed.
Don't use natural joins: all they do is rely on column names to decide which columns relate across tables (they don't check for foreign key constraints or the like, as you might have thought). This is unreliable by nature.
You can use a regular inner join:
select i.ID, i.name
from teaches t
inner join instructor i on i.ID = t.ID
group by i.ID, i.name
order by count(*) desc
limit 1
Notes:
this assumes that column teaches.ID holds the instructor's ID and relates to instructor.ID (it is the only column the two relations share, and the one your natural join matched on)
I added a limit clause to the query since you stated that you want the top instructor - the syntax may vary across databases
always prefix the column names with the table they belong to, to make the query unambiguous and easier to understand
it is a good practice (and a requirement in many databases) that in an aggregate query all non-aggregated columns listed in the select clause should appear in the group by clause; I added the instructor name to your group by clause
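The grouping approach can be tried out on a few rows. This is a minimal sketch with Python's built-in sqlite3 module, assuming that teaches.ID holds the instructor's ID (the column the natural join in the question would match on); the instructors and courses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL);
    CREATE TABLE teaches (ID TEXT, course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER);
    -- Hypothetical sample rows: Kim teaches three sections, Lee teaches one.
    INSERT INTO instructor VALUES ('1', 'Kim', 'Physics', 90000), ('2', 'Lee', 'Music', 40000);
    INSERT INTO teaches VALUES
        ('1', 'PHY-101', '1', 'Fall',   2009),
        ('1', 'PHY-101', '1', 'Spring', 2010),
        ('1', 'PHY-201', '1', 'Spring', 2010),
        ('2', 'MU-199',  '1', 'Spring', 2010);
""")

# One group per instructor, sorted by group size, keeping only the largest.
top = conn.execute("""
    SELECT i.ID, i.name, COUNT(*) AS courses
    FROM teaches t
    INNER JOIN instructor i ON i.ID = t.ID
    GROUP BY i.ID, i.name
    ORDER BY courses DESC
    LIMIT 1
""").fetchall()
print(top)
```

LIMIT is the SQLite spelling; other databases use TOP or FETCH FIRST for the same purpose.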

Select name for authors that haven't written a book

We have to select authors that haven't written a book, but there are 3 different tables, which makes me confused about how to write the join expression.
We have tables:
authors: author_id
authorships: author_id, book_id
books: book_id.
Obviously I selected the names from authors and tried an inner join, but it won't work for me. Help would be appreciated!
Since this sounds like a school assignment I won't give the full answer.
Try using an outer join between authors and authorships. Make sure you retrieve the book ID from authorships.
Try to work out what an author who has not published looks like then. You can use this to formulate the query for the answer you are looking for with an appropriate where clause.
This is a good spot to use the LEFT JOIN anti-join pattern:
SELECT a.*
FROM authors a
LEFT JOIN authorships s ON s.author_id = a.author_id
WHERE s.author_id IS NULL
Rationale: when the LEFT JOIN comes up empty, it means that the author has no corresponding record in the authorships table. The WHERE clause keeps only the unmatched authors records (i.e. authors that have no books). This is called an anti-join because the purpose of a JOIN is usually to match records, whereas here we use it to detect unmatched records.
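To see the LEFT JOIN ... IS NULL filter at work, here is a minimal sketch using Python's built-in sqlite3 module. The name column and all rows are assumptions added for illustration; a NOT EXISTS query is included only as a comparison, since it expresses the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors     (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books       (book_id INTEGER PRIMARY KEY);
    CREATE TABLE authorships (author_id INTEGER, book_id INTEGER);
    -- Hypothetical sample rows: Bob has no row in authorships.
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (7), (8);
    INSERT INTO authorships VALUES (1, 7), (3, 8);
""")

# Unmatched authors get NULLs for every authorships column after the LEFT JOIN.
left_join_rows = conn.execute("""
    SELECT a.author_id, a.name
    FROM authors a
    LEFT JOIN authorships s ON s.author_id = a.author_id
    WHERE s.author_id IS NULL
""").fetchall()

# Same condition written as a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT a.author_id, a.name
    FROM authors a
    WHERE NOT EXISTS (SELECT 1 FROM authorships s WHERE s.author_id = a.author_id)
""").fetchall()
print(left_join_rows, not_exists_rows)
```

Note that the books table is not needed at all: the absence of a row in authorships is enough to identify an author without a book.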
It's really easy: just check which column holds a common value across these three tables. If a column is shared by at least two of the tables, put an inner join on those two and an outer join on the table that does not share the data.
Remember that your aliases always matter when you join different tables, and the ON and WHERE clauses should be stated properly.

SQLZOO #12 -- confused about multiple select & join statements

I am attempting to answer question #12 on sqlzoo.net
(http://sqlzoo.net/wiki/More_JOIN_operations). I couldn't figure out the answer on my own but I did manage to find the answer online.
12: Which were the busiest years for 'John Travolta', show the year and the number of movies he made each year for any year in which he made more than 2 movies.
Answer:
SELECT yr,COUNT(title) FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title)=(SELECT MAX(c) FROM
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
One of the parts that I do not fully understand is the multiple joins:
FROM movie
JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
Is Actor being joined only with Movie, or is actor being joined with Movie JOIN Casting?
I am trying to find a website that explains complex join statements as my attempted answer was far from correct (missing many sections). I think subselect statements with multiple complex join statements is a bit confusing at the moment. But, I could not find a good website that breaks the information up to help me form my own queries.
The other part I don't fully understand is this:
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
What is the above code trying to find?
Ok, glad you are not afraid to ask, and I'll do my best to help clarify what is going on... Please excuse my re-formatting of the query to my mindset of writing queries. It better shows the relationships of where things are coming from (my perspective), and may help you too.
A few other things about my rewrite. I also like to use alias references to the tables so every column is qualified with the table (or alias) it originates from. It prevents ambiguity, especially for someone who does not know your table structures and relationships between tables. (m = alias for movie, c = alias for casting, a = alias for actor tables). For the subquery, and to keep the aliases from being confused, I suffixed them with 2, such as m2, c2, a2.
SELECT
m.yr,
COUNT(m.title)
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
WHERE
a.name = 'John Travolta'
GROUP BY
m.yr
HAVING
COUNT(m.title) = ( SELECT MAX(t.movieCount)
FROM
( SELECT m2.yr,
COUNT(m2.title) AS movieCount
FROM
movie m2
JOIN casting c2
ON m2.id = c2.movieid
JOIN actor a2
ON c2.actorid = a2.id
WHERE
a2.name='John Travolta'
GROUP BY
m2.yr ) AS t
)
First, notice that the outermost query (aliases m, c, a) and the innermost query (aliases m2, c2, a2) are virtually identical.
The query has to run from the deepest query first... in this case the m2, c2, a2 query. Look at it and see what IT is going to deliver. If you ran that, you would get every year he had a movie and the number of movies... the result from their sample data goes from 1976 all the way to 2010. So far, nothing complex in itself (about 20 rows). Now, just as each table may have an alias, each subquery (such as this one) MUST have an alias, so that is why the "AS t" is there. So, there is no true table; it is wrapping the entire query's result set and assigning THAT the alias of "t".
So now, go one level up in the query also wrapped in parens...
SELECT MAX(t.movieCount)
FROM ( EntireSubquery ) AS t
Although abbreviated, this is what the engine is doing. Looking at the subquery result given an alias of "t" and finding the maximum "movieCount" value which is the count of movies that were done in a given year. In this case, the actual number is 3 and we are almost done.
Now, to the outermost query... again, this was virtually identical to the innermost query. The only difference is the HAVING clause. This is applied after all the grouping per year is performed. Then it compares its own count for each year to the value 3 returned by the SELECT MAX( t.movieCount )...
So, all the years that had only 1 or 2 movies are excluded from the result, and only the one year that had 3 movies is included.
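The three levels described above can be run one at a time. This is a minimal sketch using Python's built-in sqlite3 module; the filmography is hypothetical (one movie in 1978, two in 1996, three in 1998), not the sqlzoo data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie   (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO actor VALUES (1, 'John Travolta'), (2, 'Someone Else');
    INSERT INTO movie VALUES (1, 'A', 1978), (2, 'B', 1996), (3, 'C', 1996),
                             (4, 'D', 1998), (5, 'E', 1998), (6, 'F', 1998), (7, 'G', 1998);
    INSERT INTO casting VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1),
                               (5, 1, 1), (6, 1, 1), (7, 2, 1);
""")

# Level 1 (innermost): one row per year with the number of movies.
per_year = """
    SELECT m2.yr, COUNT(m2.title) AS movieCount
    FROM movie m2
    JOIN casting c2 ON m2.id = c2.movieid
    JOIN actor a2   ON c2.actorid = a2.id
    WHERE a2.name = 'John Travolta'
    GROUP BY m2.yr
"""
step1 = conn.execute(per_year + " ORDER BY m2.yr").fetchall()

# Level 2: the largest per-year count, read from the derived table "t".
step2 = conn.execute(f"SELECT MAX(t.movieCount) FROM ({per_year}) AS t").fetchone()[0]

# Level 3 (outermost): keep only the years whose count equals that maximum.
step3 = conn.execute(f"""
    SELECT m.yr, COUNT(m.title)
    FROM movie m
    JOIN casting c ON m.id = c.movieid
    JOIN actor a   ON c.actorid = a.id
    WHERE a.name = 'John Travolta'
    GROUP BY m.yr
    HAVING COUNT(m.title) = (SELECT MAX(t.movieCount) FROM ({per_year}) AS t)
""").fetchall()
print(step1, step2, step3)
```

Running the pieces separately like this is a handy way to read any nested query: each level only sees the result set of the level below it.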
Now, to clarify the JOINs. Each table should have a relationship with one or more other tables (some are linking tables, such as the casting table, which holds both a movie ID and an actor ID). So, think of the join as: how do I put the tables in order so that each one can touch a piece of the other until I have them all chained together? In this case:
Movie -> Casting linked by the movie ID, then Casting -> Actor by the actor ID, so that is how I lay it out visually and hierarchically... I am starting FROM the Movie table, JOINing to the Casting table based ON Movie ID = Casting Movie ID. Then the Casting table is joined to the Actor table based on the common Actor ID field.
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
Now, this is a simple relationship, but you COULD have one primary table with multiple child-level tables. You could join multiple tables based on the respective data. Very simple sample to clarify the point. You have a student table going to a school. A student has a degree major, an ethnicity, an address state (assuming an online school and students can be from any state). If you had lookup tables for degrees, ethnicity and states, you might come up with something like...
select
s.firstname,
s.lastname,
d.DegreeDescription,
e.ethnicityDescription,
st.stateName
from
students s
join degrees d
on s.degreemajor = d.degreeID
join ethnicity e
on s.ethnicityID = e.id
join states st
on s.homeState = st.stateID
Notice the hierarchical representation that each table is directly associated under that of the student. Not all tables need to be one deeper than the last.
So, there are many sites out there, such as the w3schools as offered by Mark, but learn to dissect small pieces at a time... what are the bare minimum tables to get from point-A to point-Z and draw the relationships. THEN, pare down based on the requirement criteria you are looking for.
The correct answer would be:
SELECT yr, COUNT(title)
FROM movie m
JOIN casting c ON m.id=c.movieid JOIN actor a ON c.actorid=a.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title) > 2;
The answer you found (which seems to be a mistake on the sqlzoo site) is looking for any year that has a count equal to the year with the highest count.
I used table aliases in the query above to clear up how the tables are joined. Movie is joined to casting and casting is joined to actor.
The subquery that confuses you is listing each year and a count of movies for that year that star John Travolta. It's not needed if you're answering the question as written.
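The difference between the two HAVING conditions shows up as soon as more than one year has over 2 movies. This is a minimal sketch with Python's built-in sqlite3 module on a hypothetical filmography (1 movie in 1978, 3 in 1996, 4 in 1998):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie   (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting (movieid INTEGER, actorid INTEGER, ord INTEGER);
    INSERT INTO actor VALUES (1, 'John Travolta');
""")
# Hypothetical sample rows: one movie per list entry, all starring actor 1.
years = [1978] + [1996] * 3 + [1998] * 4
for movie_id, yr in enumerate(years, start=1):
    conn.execute("INSERT INTO movie VALUES (?, ?, ?)", (movie_id, f"Movie {movie_id}", yr))
    conn.execute("INSERT INTO casting VALUES (?, 1, 1)", (movie_id,))

# Per-year movie counts; both variants below only differ in the HAVING clause.
base = """
    SELECT m.yr, COUNT(m.title) AS n
    FROM movie m
    JOIN casting c ON m.id = c.movieid
    JOIN actor a   ON c.actorid = a.id
    WHERE a.name = 'John Travolta'
    GROUP BY m.yr
"""
# The question as written: every year with more than 2 movies.
more_than_two = conn.execute(base + " HAVING COUNT(m.title) > 2 ORDER BY m.yr").fetchall()
# The answer found online: only the year(s) that reach the maximum count.
only_maximum = conn.execute(
    base + f" HAVING COUNT(m.title) = (SELECT MAX(t.n) FROM ({base}) AS t)"
).fetchall()
print(more_than_two, only_maximum)
```

With this data the first variant returns both 1996 and 1998, while the second returns 1998 only.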
As for learning resources, make sure you have the basics down. Understand everything at http://w3schools.com/sql. Try searching for "sql joining multiple tables" in your favorite search engine when you're ready for more.

Ms Access | Issues with INNER JOIN query

So, I'm doing a movie database and I got one table for actors and one table for movie titles.
In the Movies table I have three columns for actors: Actor_1, Actor_2, Actor_3. In these fields, I only write numbers which correspond to a row in the Actors table.
Each actor has these columns:
Actor_ID, Firstname, Surname
Now if I do this query:
SELECT Movie.title, Actors.firstname + ' ' + Actors.surname AS name
FROM Movie
INNER JOIN Actors ON Movie.actor_1 = Actors.actor_id
Then I get almost what I want. I get the movie titles but only one actor per movie. I don't know what I am supposed to do for the movies where I have two or three actors.
I had to translate the code so it would be more understandable. The code in its original form works, so don't mind if it's not 100% correct here. I would just appreciate some pointers on how I could do this.
You need to change your table design to include a junction table. So you will have:
Movies: ID, Etc
Actors: ID, Etc
MoviesActors: MovieID, ActorID, Etc
Your query might be:
SELECT m.MovieTitle, a.ActorName
FROM Actors a
INNER JOIN (Movies m
INNER JOIN MoviesActors ma
ON m.ID = ma.MovieID)
ON a.ID = ma.ActorID
Relational database design
Junction tables
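To show how the junction table yields any number of actors per movie, here is a minimal sketch using Python's built-in sqlite3 module. It runs on SQLite rather than Access, so the joins are written flat instead of with Access-style nested parentheses; the MovieTitle and ActorName columns follow the query above and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (ID INTEGER PRIMARY KEY, MovieTitle TEXT);
    CREATE TABLE Actors (ID INTEGER PRIMARY KEY, ActorName TEXT);
    -- Junction table: one row per (movie, actor) pair.
    CREATE TABLE MoviesActors (MovieID INTEGER REFERENCES Movies(ID),
                               ActorID INTEGER REFERENCES Actors(ID),
                               PRIMARY KEY (MovieID, ActorID));
    -- Hypothetical sample rows: two actors in the first film, three in the second.
    INSERT INTO Movies VALUES (1, 'First Film'), (2, 'Second Film');
    INSERT INTO Actors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO MoviesActors VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3);
""")

# Each junction row produces one (title, actor) line in the result.
rows = conn.execute("""
    SELECT m.MovieTitle, a.ActorName
    FROM Movies m
    INNER JOIN MoviesActors ma ON m.ID = ma.MovieID
    INNER JOIN Actors a        ON a.ID = ma.ActorID
    ORDER BY m.MovieTitle, a.ActorName
""").fetchall()
for title, actor in rows:
    print(title, actor)
```

Adding a fourth actor to a movie then means inserting one more row into MoviesActors, with no change to the Movies table.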