SQLite Subqueries and Inner Joins - sql

I was doing a practice question for SQL which asks to create a list of album titles and unit prices for the artist "Audioslave" and find out how many records are returned.
Here is the relational database picture given in the question:
Initially, I used an inner join to retrieve the list and actually got the correct answer (40 records returned). The code is shown below:
select a.Title, t.UnitPrice
from albums a
inner join tracks t on t.AlbumId = a.AlbumId
inner join artists ar on ar.ArtistId = a.ArtistId
where ar.Name = 'Audioslave';
Although I finished the question, I was curious to try to solve this problem using nested subqueries instead and tried to first retrieve the AlbumId and UnitPrice from tracks. I got the correct answer but not the correct list (the question asked for album title and not AlbumId). Here is the code:
select AlbumId, UnitPrice
from tracks
where AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
In order to solve the problem with the list, I tried combining the previous codes. However, I get a completely different amount of records being returned (10509).
select a.Title, t.UnitPrice
from albums a
inner join tracks t
where a.AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
I don't understand what I'm doing wrong with the last code...Any help would be appreciated! Also, sorry if I wrote too much, I just wanted to convey my thinking process clearly.

Some databases (SQLite, MySQL, MariaDB, maybe others) allow you to write an INNER JOIN without specifying ON, and in that case they just cross every record on the left with every record on the right. If there were 2 albums and 3 tracks, 6 rows would result. If the albums were A and B, and the tracks were 1, 2 and 3, the rows would be the combination of all: A1, A2, A3, B1, B2, B3.
Other databases (Postgres, SQL Server, Oracle, maybe others) refuse to do it unless you specify ON. To get "every row on the left combined with every row on the right" you have to write CROSS JOIN (or write an inner join with an ON that is always true).
It might help your mental model of what happens during a join to consider that the db takes all the rows on the left and connects them to all the rows on the right, then for each combination of rows assesses the truth of the ON clause and the WHERE clause before deciding to return the row.
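For example, with your WHERE clause appended, this will return the same 10509 rows:
SELECT * FROM albums INNER JOIN tracks ON 1=1
The ON clause is always true.
This will return the same rows, but only if the query is run on a Monday (strftime returns text, so compare against '1' rather than 1):
SELECT * FROM albums INNER JOIN tracks ON strftime('%w', 'now') = '1'
What goes in the ON or WHERE doesn't have to have anything to do with the data in the table; it just has to be something that resolves to a Boolean. In your last query, giving the join its condition back (ON t.AlbumId = a.AlbumId) should bring the count back to 40.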
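To see the difference on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The two albums and three tracks are made-up rows standing in for the real tables, so only the row counts matter:

```python
import sqlite3

# Tiny stand-in for the albums/tracks tables (made-up rows, not the real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, UnitPrice REAL);
    INSERT INTO albums VALUES (1, 'A'), (2, 'B');
    INSERT INTO tracks VALUES (1, 1, 0.99), (2, 1, 0.99), (3, 2, 0.99);
""")

# No ON clause: SQLite pairs every album with every track (2 * 3 = 6 rows).
no_on = conn.execute(
    "SELECT a.Title, t.TrackId FROM albums a INNER JOIN tracks t"
).fetchall()

# With ON: only the pairs whose AlbumId matches survive (one row per track).
with_on = conn.execute(
    "SELECT a.Title, t.TrackId FROM albums a "
    "INNER JOIN tracks t ON t.AlbumId = a.AlbumId"
).fetchall()

print(len(no_on), len(with_on))  # 6 3
```

The same multiplication explains a count like 10509: every album that passes the WHERE filter gets paired with every track in the table.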

Related

Improve SQL query by replacing inner query

I'm trying to simplify this SQL query (I replaced the real table names with metaphorical ones), primarily to get rid of the inner query, but my brain is frozen and I can't think of any way to do it.
My major concern (aside from aesthetics) is performance under heavy loads
The purpose of the query is to count all books grouping by genre found on any particular shelve where the book is kept (hence the inner query which is effectively telling which shelve to count books on).
SELECT g.name, count(s.book_id) occurances FROM genre g
LEFT JOIN shelve s ON g.shelve_id=s.id
WHERE s.id=(SELECT genre_id FROM book WHERE id=111)
GROUP BY s.genre_id, g.name
It seems like you want to know how many books on a shelf are in the same genre as book 111: if you liked book "X", we have this many similar books in stock.
One thing I noticed is the WHERE clause in the original required a value for the shelve table, effectively converting it to an INNER JOIN. And speaking of JOINs, you can JOIN instead of the nested select.
SELECT g.name, count(s.book_id) occurances
FROM genre g
INNER JOIN shelve s ON s.id = g.shelve_id
INNER JOIN book b on b.genre_id = s.id
WHERE b.id=111
GROUP BY g.id, g.name
Thinking about it more, I might also start with book rather than genre. In the end, the only reason you need the genre table at all is to find the name, and therefore matching to it by id may be more effective.
SELECT g.name, count(s.book_id) occurances
FROM book b
INNER JOIN shelve s ON s.id = b.genre_id
INNER JOIN genre g on g.shelve_id = s.id
WHERE b.id=111
GROUP BY g.id, g.name
Not sure whether they meet your idea of "simpler", but they are alternatives.
... unless matching shelve.id with book.genre_id is a typo in the question. It seems very odd the two tables would share the same id values, in which case these will both be wrong.

(SQL) Creating an uncorrelated query

I had to write an SQL-Query for a given Database (it's huge, I won't be able to post it here, but its about artists with albums and release dates, genres etc.).
The task was to find all artists involved in albums whose title contains the word "drop". I had to write a correlated and an uncorrelated query. I got the correlated one:
SELECT artist
FROM CDDB.ARTISTS ar
WHERE EXISTS
(SELECT album
FROM CDDB.ALBUMS al
INNER JOIN CDDB.ARTIST2ALBUM aa ON al.albumid = aa.albumid
WHERE ar.artistid = aa.artistid
AND album LIKE '%drop%');
Now I have to make that uncorrelated, but I don't know how. Is it possible that one can help me without the given tables etc.?
Uncorrelated subqueries are subqueries that can be run independently from the outer query.
Generally speaking, EXISTS is correlated, IN is uncorrelated.
If you change your query to something like:
SELECT DISTINCT artist
FROM CDDB.ARTISTS ar
INNER JOIN CDDB.ARTIST2ALBUM aa ON ar.artistid = aa.artistid
WHERE aa.albumid IN
(SELECT albumid
FROM CDDB.ALBUMS
WHERE album LIKE '%drop%');
It is now uncorrelated.
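As a rough check that the two styles agree, here is a small sketch with Python's sqlite3 module. The tables and rows are invented for illustration, and the CDDB schema prefix is dropped because the in-memory database has no such schema. The correlated subquery mentions the outer alias ar, while the IN subquery can be run entirely on its own:

```python
import sqlite3

# Hypothetical miniature of the artists/albums schema (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artistid INTEGER PRIMARY KEY, artist TEXT);
    CREATE TABLE albums (albumid INTEGER PRIMARY KEY, album TEXT);
    CREATE TABLE artist2album (artistid INTEGER, albumid INTEGER);
    INSERT INTO artists VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO albums VALUES (10, 'raindrops'), (11, 'drop it'), (12, 'sunshine');
    INSERT INTO artist2album VALUES (1, 10), (1, 11), (2, 12), (3, 11);
""")

# Correlated: the subquery refers to ar.artistid from the outer query.
correlated = conn.execute("""
    SELECT artist FROM artists ar
    WHERE EXISTS (SELECT 1 FROM albums al
                  JOIN artist2album aa ON al.albumid = aa.albumid
                  WHERE aa.artistid = ar.artistid
                    AND al.album LIKE '%drop%')
    ORDER BY artist
""").fetchall()

# Uncorrelated: the subquery has no reference to the outer query.
uncorrelated = conn.execute("""
    SELECT artist FROM artists
    WHERE artistid IN (SELECT aa.artistid FROM artist2album aa
                       JOIN albums al ON al.albumid = aa.albumid
                       WHERE al.album LIKE '%drop%')
    ORDER BY artist
""").fetchall()

print(correlated == uncorrelated, uncorrelated)
```

Filtering the artists table directly with IN, as in this sketch, also avoids duplicate artist rows when one artist has several matching albums.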

Doing a FULL OUTER JOIN in Sqlite3 to get the combination of two columns?

I'm currently working on a database project and one of the problems calls for the following:
The Genre table contains twenty-five entries. The MediaType table contains 5
entries. Write a single SQL query to generate a table with three columns and 125
rows. One column should contain the list of MediaType names; one column
should contain the list of Genre names; the third column should contain a count of
the number of tracks that have each combination of media type and genre. For
example, one row will be: “Rock MPEG Audio File xxx” where xxx is the
number of MPEG Rock tracks, even if the value is 0.
Recognizing this, I believe I'll need to use a FULL OUTER JOIN, which Sqlite3 doesn't support. The part that is confusing me is generating the column with the combination. Below, I've attached the two methods I've tried.
create view T as
select MediaTypeId, M.Name as MName, GenreId, G.Name as GName
from MediaType M, Genre G
SELECT DISTINCT GName, MName, COUNT(*) FROM (
SELECT *
FROM T
OUTER LEFT JOIN MediaType
ON MName = GName
UNION ALL
SELECT *
FROM Genre
OUTER LEFT JOIN T
) GROUP BY GName, MName;
However, that returned nearly 250 rows and the GROUP BY or JOIN(s) is totally wrong.
I've also tried:
SELECT Genre.Name as GenreName, MediaTypeName, COUNT(*)
FROM Genre LEFT OUTER JOIN (
SELECT MediaType.Name as MediaTypeName, Track.Name as TrackName
FROM MediaType LEFT OUTER JOIN Track) GROUP BY GenreName, MediaTypeName;
Which returned 125 rows but they all had the same count of 3503 which leads me to believe the GROUP BY is wrong.
Also, here is a schema of the database:
https://www.dropbox.com/s/onnbwqfrfc82r1t/IMG_2429.png?dl=0
You don't use full outer join to solve this problem.
Because it looks like a homework problem, I'll describe the solution.
First, you want to generate all combinations of genres and media types. Hint: This uses a cross join.
Second, you want to count all the combinations that you have. Hint: this uses an aggregation.
Third, you want to combine these together. Hint: left join.
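The three hints can be sketched roughly as follows, using Python's sqlite3 module on a made-up miniature of the tables (two genres, two media types, three tracks). The column names GenreId and MediaTypeId on Track are an assumption about the schema:

```python
import sqlite3

# Hypothetical miniature of Genre / MediaType / Track (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE MediaType (MediaTypeId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER, MediaTypeId INTEGER);
    INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO MediaType VALUES (1, 'MPEG'), (2, 'AAC');
    INSERT INTO Track VALUES (1, 1, 1), (2, 1, 1), (3, 2, 2);
""")

rows = conn.execute("""
    SELECT g.Name, m.Name, COALESCE(c.n, 0) AS n
    FROM Genre g
    CROSS JOIN MediaType m                       -- step 1: every combination
    LEFT JOIN (SELECT GenreId, MediaTypeId, COUNT(*) AS n
               FROM Track
               GROUP BY GenreId, MediaTypeId) c  -- step 2: counts that exist
           ON c.GenreId = g.GenreId
          AND c.MediaTypeId = m.MediaTypeId      -- step 3: combine, keep zeros
    ORDER BY g.Name, m.Name
""").fetchall()

for row in rows:
    print(row)
```

With 2 genres and 2 media types this gives 4 rows; combinations with no tracks show up with a count of 0 because the LEFT JOIN keeps them and COALESCE replaces the missing count.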

SQLZOO #12 -- confused about multiple select & join statements

I am attempting to answer question #12 on sqlzoo.net
(http://sqlzoo.net/wiki/More_JOIN_operations). I couldn't figure out the answer on my own but I did manage to find the answer online.
12: Which were the busiest years for 'John Travolta', show the year and the number of movies he made each year for any year in which he made more than 2 movies.
Answer:
SELECT yr,COUNT(title) FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title)=(SELECT MAX(c) FROM
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
One of parts that I do not fully understand is the multiple joins:
FROM movie
JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
Is Actor being joined only with Movie, or is actor being joined with Movie JOIN Casting?
I am trying to find a website that explains complex join statements as my attempted answer was far from correct (missing many sections). I think subselect statements with multiple complex join statements is a bit confusing at the moment. But, I could not find a good website that breaks the information up to help me form my own queries.
The other part I don't fully understand is this:
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
What is the above code trying to find?
Ok, glad you are not afraid to ask, and I'll do my best to help clarify what is going on... Please excuse my re-formatting of the query to my mindset of writing queries. It better shows the relationships of where things are coming from (my perspective), and may help you too.
A few other things about my rewrite. I also like to use alias references to the tables so every column is qualified with the table (or alias) it originates from. It prevents ambiguity, especially for someone who does not know your table structures and relationships between tables. (m = alias to movie, c = alias for casting, a = alias for actor tables). For the sub query, and to keep alias confusion clear, I suffixed them with 2, such as m2, c2, a2.
SELECT
m.yr,
COUNT(m.title)
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
WHERE
a.name = 'John Travolta'
GROUP BY
m.yr
HAVING
COUNT(m.title) = ( SELECT MAX(t.movieCount)
FROM
( SELECT m2.yr,
COUNT(m2.title) AS movieCount
FROM
movie m2
JOIN casting c2
ON m2.id = c2.movieid
JOIN actor a2
ON c2.actorid = a2.id
WHERE
a2.name='John Travolta'
GROUP BY
m2.yr ) AS t
)
First, notice that the outermost query (aliases m, c, a) and the innermost query (aliases m2, c2, a2) are virtually identical.
The query has to run from the deepest query first, in this case the m2, c2, a2 query. Look at it and see what IT is going to deliver. If you ran that, you would get every year he had a movie and the number of movies; the result from their sample data goes from 1976 all the way to 2010. So far, nothing complex in itself (about 20 rows). Now, just as each table may have an alias, each subquery in a FROM clause (such as this one) MUST have an alias, and that is why the "AS t" is there. There is no true table named t; the alias wraps the entire subquery's result set.
So now, go one level up in the query also wrapped in parens...
SELECT MAX(t.movieCount)
FROM (EntireSubquery as t)
Although abbreviated, this is what the engine is doing. Looking at the subquery result given an alias of "t" and finding the maximum "movieCount" value which is the count of movies that were done in a given year. In this case, the actual number is 3 and we are almost done.
Now, to the outermost query. Again, it is virtually identical to the innermost query; the only difference is the HAVING clause, which is applied after all the grouping per year is performed. It compares each year's count against the value 3 returned by SELECT MAX(t.movieCount).
So, all the years that had only 1 or 2 movies are excluded from the result, and only the one year that had 3 movies is included.
Now, to clarify the JOINs. Each table should have a relationship with one or more other tables (some are known as linking tables, such as the casting table, which holds both a movie ID and an actor ID). So, think of the join as "how do I put the tables in order so that each one touches the next until I have them all chained together?" In this case:
Movie -> Casting linked by the movie ID, then Casting -> Actor by the actor ID, so that is how I lay it out visually and hierarchically. I start FROM the Movie table, JOINing to the Casting table based ON Movie ID = Casting Movie ID. Then the Casting table is joined to the Actor table based on the common Actor ID field:
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
Now, this is a simple relationship, but you COULD have one primary table with multiple child-level tables. You could join multiple tables based on the respective data. Very simple sample to clarify the point. You have a student table going to a school. A student has a degree major, an ethnicity, an address state (assuming an online school and students can be from any state). If you had lookup tables for degrees, ethnicity and states, you might come up with something like...
select
s.firstname,
s.lastname,
d.DegreeDescription,
e.ethnicityDescription,
st.stateName
from
students s
join degrees d
on s.degreemajor = d.degreeID
join ethnicity e
on s.ethnicityID = e.id
join states st
on s.homeState = st.stateID
Notice the hierarchical representation that each table is directly associated under that of the student. Not all tables need to be one deeper than the last.
So, there are many sites out there, such as w3schools as offered by Mark, but learn to dissect small pieces at a time: what are the bare minimum tables needed to get from point A to point Z? Draw the relationships. THEN pare down based on the criteria you are looking for.
The correct answer would be:
SELECT yr, COUNT(title)
FROM movie m
JOIN casting c ON m.id=c.movieid JOIN actor a ON c.actorid=a.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title) > 2;
The answer you found (which seems to be a mistake on the sqlzoo site) is looking for any year that has a count equal to the year with the highest count.
I used table aliases in the query above to clear up how the tables are joined. Movie is joined to casting and casting is joined to actor.
The subquery that confuses you is listing each year and a count of movies for that year that star John Travolta. It's not needed if you're answering the question as written.
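The difference between the two HAVING clauses shows up on a small made-up data set (not the sqlzoo data): three years with 3, 4 and 1 movies. This is only a sketch using Python's sqlite3 module:

```python
import sqlite3

# Made-up data: 1990 has 3 movies, 1995 has 4, 2000 has 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting (movieid INTEGER, actorid INTEGER);
    INSERT INTO actor VALUES (1, 'John Travolta');
    INSERT INTO movie VALUES
        (1, 'a', 1990), (2, 'b', 1990), (3, 'c', 1990),
        (4, 'd', 1995), (5, 'e', 1995), (6, 'f', 1995), (7, 'g', 1995),
        (8, 'h', 2000);
    INSERT INTO casting SELECT id, 1 FROM movie;
""")

# Shared part of both queries: one row per year with its movie count.
per_year = """
    SELECT m.yr, COUNT(m.title) AS c
    FROM movie m
    JOIN casting ca ON m.id = ca.movieid
    JOIN actor a ON ca.actorid = a.id
    WHERE a.name = 'John Travolta'
    GROUP BY m.yr
"""

# "More than 2 movies": every year whose count exceeds 2.
more_than_two = conn.execute(
    per_year + " HAVING COUNT(m.title) > 2 ORDER BY m.yr"
).fetchall()

# "Equal to the maximum": only the busiest year(s).
busiest_only = conn.execute(
    per_year + f" HAVING COUNT(m.title) = (SELECT MAX(t.c) FROM ({per_year}) AS t)"
).fetchall()

print(more_than_two)  # [(1990, 3), (1995, 4)]
print(busiest_only)   # [(1995, 4)]
```

On data where the maximum happens to be 3, both queries return the same rows, which may be why the two readings are easy to confuse.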
As for learning resources, make sure you have the basics down. Understand everything at http://w3schools.com/sql. Try searching for "sql joining multiple tables" in your favorite search engine when you're ready for more.

What are some alternatives to a NOT IN query?

Let's say we have a database that records all the Movies a User has not rated yet. Each rating is recorded in a MovieRating table.
When we are looking for movies user #1234 hasn't seen yet:
SELECT *
FROM Movies
WHERE id NOT IN
(SELECT DISTINCT movie_id FROM MovieRating WHERE user_id = 1234);
Querying NOT IN can be very expensive as the size of MovieRating grows. Assume MovieRating can have 100,000+ rows.
My question is: what are some more efficient alternatives to the NOT IN query? I've heard of the LEFT OUTER JOIN and NOT EXISTS queries, but is there anything else? Is there any way I can design this database differently?
A correlated sub-query using WHERE NOT EXISTS() is potentially your most efficient option if you have to do this, but you should test performance against your data.
You may also want to consider limiting your results both in terms of the select list (don't use *) and only getting TOP n rows. That is, you may not need 100k+ movies if the user hasn't seen them. You may want to page the results.
SELECT *
FROM Movies m
WHERE NOT EXISTS (SELECT 1
FROM MovieRating r
WHERE r.user_id = 1234
AND r.movie_id = m.id)
This is a mock query, because I don't have a db to test this, but something along the lines of the following should work.
select m.* from Movies m
left join MovieRating mr on mr.movie_id = m.id and mr.user_id = 1234
where mr.movie_id is null
That joins the movies table to that user's rows in the movie rating table on the movie id. The where clause then keeps only the rows where no rating matched, which are the movies the user hasn't rated.
You can try this :
SELECT M.*
FROM Movies as M
LEFT OUTER JOIN
MovieRating as MR on M.id = MR.movie_id
and MR.user_id = 1234
WHERE MR.movie_id IS NULL
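For a side-by-side comparison, here is a small sketch with Python's sqlite3 module that runs the NOT IN, NOT EXISTS and LEFT JOIN ... IS NULL forms against the same made-up rows. The rating column and the sample data are assumptions for illustration; it checks only that the three forms agree, not which is fastest:

```python
import sqlite3

# Made-up data: user 1234 has rated movies 1 and 3, so only movie 2 is unrated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE MovieRating (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    INSERT INTO Movies VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO MovieRating VALUES (1234, 1, 5), (1234, 3, 2), (99, 2, 4);
""")

queries = {
    "not_in": """
        SELECT id FROM Movies
        WHERE id NOT IN (SELECT movie_id FROM MovieRating WHERE user_id = 1234)
        ORDER BY id""",
    "not_exists": """
        SELECT m.id FROM Movies m
        WHERE NOT EXISTS (SELECT 1 FROM MovieRating r
                          WHERE r.user_id = 1234 AND r.movie_id = m.id)
        ORDER BY m.id""",
    # The user filter lives in ON; the IS NULL test is on the joined table.
    "left_join": """
        SELECT m.id FROM Movies m
        LEFT JOIN MovieRating mr
               ON mr.movie_id = m.id AND mr.user_id = 1234
        WHERE mr.movie_id IS NULL
        ORDER BY m.id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)
```

Whichever form is used, an index on MovieRating covering user_id and movie_id is likely to matter more than the syntax, but that is worth measuring on real data.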