What are some alternatives to a NOT IN query?

Let's say we have a database of Movies and the ratings Users give them. Each rating is recorded in a MovieRating table.
When we are looking for movies user #1234 hasn't rated yet:
SELECT *
FROM Movies
WHERE id NOT IN
(SELECT DISTINCT movie_id FROM MovieRating WHERE user_id = 1234);
Querying NOT IN can be very expensive as the size of MovieRating grows. Assume MovieRating can have 100,000+ rows.
My question is: what are some more efficient alternatives to the NOT IN query? I've heard of the LEFT OUTER JOIN and NOT EXISTS queries, but is there anything else? Is there any way I can design this database differently?

A correlated subquery using WHERE NOT EXISTS() is potentially your most efficient option if you have to do this, but you should test its performance against your data.
You may also want to limit your results, both in the select list (don't use *) and in the number of rows (for example, TOP n in SQL Server). A user probably doesn't need to see 100k+ unrated movies at once, so you may want to page the results.
SELECT *
FROM Movies m
WHERE NOT EXISTS (SELECT 1
FROM MovieRating r
WHERE r.user_id = 1234
AND r.movie_id = m.id)
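To check that NOT IN and NOT EXISTS agree on your own data before comparing their speed, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout follows the question (Movies.id, MovieRating.user_id, MovieRating.movie_id); the title and rating columns and all rows are made up for illustration. It checks results only, not performance.

```python
import sqlite3

# In-memory toy database; only id, user_id and movie_id come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE MovieRating (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    INSERT INTO Movies VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca'), (4, 'Dune');
    INSERT INTO MovieRating VALUES (1234, 1, 5), (1234, 3, 4), (99, 2, 3);
""")

# The original NOT IN form.
not_in = """
    SELECT id FROM Movies
    WHERE id NOT IN (SELECT movie_id FROM MovieRating WHERE user_id = ?)
    ORDER BY id
"""

# The correlated NOT EXISTS form: one probe into MovieRating per movie.
not_exists = """
    SELECT m.id FROM Movies m
    WHERE NOT EXISTS (SELECT 1 FROM MovieRating r
                      WHERE r.user_id = ? AND r.movie_id = m.id)
    ORDER BY m.id
"""

unrated_in = [row[0] for row in conn.execute(not_in, (1234,))]
unrated_exists = [row[0] for row in conn.execute(not_exists, (1234,))]
print(unrated_in, unrated_exists)  # [2, 4] [2, 4]
```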

This is a mock query, because I don't have a db to test this, but something along the lines of the following should work.
select m.* from Movies m
left join MovieRating mr on mr.movie_id = m.id and mr.user_id = 1234
where mr.movie_id is null
That joins the movies table to the movie rating table on the movie id, considering only ratings by that user. The where clause then keeps the rows where no rating matched (the NULL entries), which are the movies the user hasn't rated.
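A minimal sketch of this LEFT JOIN anti-join, again with Python's sqlite3 and made-up rows (the title and rating columns are hypothetical). It also shows why the join needs the movie link in ON: joining on the user id alone pairs every movie with all of that user's ratings, so no NULL rows survive the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE MovieRating (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    INSERT INTO Movies VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca'), (4, 'Dune');
    INSERT INTO MovieRating VALUES (1234, 1, 5), (1234, 3, 4), (99, 2, 3);
""")

# Anti-join: keep only movies for which no rating row by this user matched.
# Both conditions belong in ON; moving the user filter to WHERE would
# discard exactly the NULL rows we want to keep.
anti_join = """
    SELECT m.id
    FROM Movies m
    LEFT JOIN MovieRating mr
           ON mr.movie_id = m.id
          AND mr.user_id = ?
    WHERE mr.movie_id IS NULL
    ORDER BY m.id
"""
unrated = [row[0] for row in conn.execute(anti_join, (1234,))]

# Without the movie link, each movie is paired with all of user 1234's
# ratings, so no row has NULLs and the result is empty.
broken = """
    SELECT m.id
    FROM Movies m
    LEFT JOIN MovieRating mr ON mr.user_id = ?
    WHERE mr.movie_id IS NULL
"""
broken_rows = conn.execute(broken, (1234,)).fetchall()
print(unrated, broken_rows)  # [2, 4] []
```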

You can try this:
SELECT M.*
FROM Movies as M
LEFT OUTER JOIN
MovieRating as MR on M.id = MR.movie_id
and MR.user_id = 1234
WHERE MR.movie_id IS NULL

Related

SQLite Subqueries and Inner Joins

I was doing a practice question for SQL which asks to create a list of album titles and unit prices for the artist "Audioslave" and find out how many records are returned.
(The question included a diagram of the relational schema, which is not reproduced here.)
Initially, I used an inner join to retrieve the list and actually got the correct answer (40 records returned). The code is shown below:
select a.Title, t.UnitPrice
from albums a
inner join tracks t on t.AlbumId = a.AlbumId
inner join artists ar on ar.ArtistId = a.ArtistId
where ar.Name = 'Audioslave';
Although I had finished the question, I was curious to try solving it with nested subqueries instead, starting by retrieving the AlbumId and UnitPrice from tracks. I got the correct number of records but not the correct columns (the question asked for the album title, not the AlbumId). Here is the code:
select AlbumId, UnitPrice
from tracks
where AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
To get the album titles, I tried combining the two previous queries. However, I get a completely different number of records (10509):
select a.Title, t.UnitPrice
from albums a
inner join tracks t
where a.AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
I don't understand what I'm doing wrong in the last query. Any help would be appreciated! Also, sorry if I wrote too much; I just wanted to convey my thinking process clearly.
Some databases (SQLite, MySQL, MariaDB, maybe others) allow you to write an INNER JOIN without specifying ON, and in that case they just cross every record on the left with every record on the right. If there were 2 albums and 3 tracks, 6 rows would result: with albums A and B and tracks 1, 2 and 3, the rows would be every combination: A1, A2, A3, B1, B2, B3.
Other databases (Postgres, SQL Server, Oracle, maybe others) refuse unless you specify ON. To get "every row on the left combined with every row on the right" there, you have to write CROSS JOIN (or an inner join with an ON that is always true).
It might help your mental model of a join to think of the db taking all the rows on the left, connecting them to all the rows on the right, and then, for each combination of rows, evaluating the ON clause and the WHERE clause before deciding whether to return that row.
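The cross-product behavior can be reproduced with a tiny made-up version of the three tables in Python's sqlite3 (the rows are invented; only the table and column names come from the question). Without ON, each Audioslave album is paired with every track; adding ON t.AlbumId = a.AlbumId keeps only the tracks that belong to each album.

```python
import sqlite3

# Tiny stand-in for the artists/albums/tracks tables (data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, UnitPrice REAL);
    INSERT INTO artists VALUES (1, 'Audioslave'), (2, 'Other');
    INSERT INTO albums  VALUES (10, 'A', 1), (11, 'B', 1), (20, 'C', 2);
    INSERT INTO tracks  VALUES (1, 10, 0.99), (2, 10, 0.99), (3, 11, 0.99),
                               (4, 20, 0.99), (5, 20, 0.99);
""")

# The WHERE clause from the question: keep only Audioslave's albums.
subquery_filter = """
    WHERE a.AlbumId IN (SELECT AlbumId FROM albums
                        WHERE ArtistId IN (SELECT ArtistId FROM artists
                                           WHERE Name = 'Audioslave'))
"""

# SQLite accepts INNER JOIN without ON: every album row meets every track row.
no_on = conn.execute(
    "SELECT a.Title, t.UnitPrice FROM albums a INNER JOIN tracks t" + subquery_filter
).fetchall()

# With the ON condition, only pairs where the track belongs to the album remain.
with_on = conn.execute(
    "SELECT a.Title, t.UnitPrice FROM albums a"
    " INNER JOIN tracks t ON t.AlbumId = a.AlbumId" + subquery_filter
).fetchall()

print(len(no_on), len(with_on))  # 10 3
```

Here 2 Audioslave albums times 5 tracks gives 10 rows, the same pattern that produced 10509 in the question.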
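For example, this returns every album paired with every track, i.e. (number of albums) × (number of tracks) rows, because the ON clause is always true:
SELECT * FROM albums INNER JOIN tracks ON 1=1
In your last query, the WHERE clause then keeps only the pairs whose album belongs to Audioslave, so each Audioslave album is paired with every track in the table. That is presumably where 10509 comes from: 3 albums × 3503 tracks, the counts in the standard Chinook sample database. Adding ON t.AlbumId = a.AlbumId to that query brings you back to the 40 rows.
This returns the full cross product, but only if the query is run on a Monday. strftime('%w', ...) returns the day of the week as text ('0' for Sunday through '6' for Saturday), so it has to be compared with the string '1':
SELECT * FROM albums INNER JOIN tracks ON strftime('%w', 'now') = '1'
What goes in the ON or WHERE clause doesn't have to have anything to do with the data in the tables; it just has to be something that resolves to a Boolean.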
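A small sketch of the last two points, using Python's sqlite3 with invented rows: an always-true ON yields the full cross product, and because strftime() returns text, comparing it with the integer 1 is never true in SQLite, while comparing with '1' works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY);
    INSERT INTO albums VALUES (1), (2);
    INSERT INTO tracks VALUES (1), (2), (3);
""")

# ON 1=1 is always true, so the join is the full cross product: 2 * 3 rows.
always = conn.execute("SELECT COUNT(*) FROM albums INNER JOIN tracks ON 1=1").fetchone()[0]

# In SQLite the TEXT value '1' never equals the INTEGER 1 when neither side
# has column affinity, which is why the day-of-week check compares to '1'.
text_vs_int = conn.execute("SELECT '1' = 1, '1' = '1'").fetchone()

# Condition unrelated to the table data: 6 rows on a Monday (UTC), 0 otherwise.
monday = conn.execute(
    "SELECT COUNT(*) FROM albums INNER JOIN tracks ON strftime('%w', 'now') = '1'"
).fetchone()[0]

print(always, text_vs_int, monday)
```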

SQL query for finding the movies that users haven't watched

Say these are the two tables (the table images from the original question are not reproduced here).
I've used the EXCEPT keyword to get the desired output.
Now, my situation is that there are two tables:
Table 1: all the user-related data (user_id, email, contact...). user_id is the column that matters here.
Table 2: user_id and the name of a movie that user watched (there can be multiple rows per user). A row is added whenever a user watches one of the available movies.
I don't have a list of the available movies, so let us assume that every movie has been watched by at least one user in table 2. Selecting DISTINCT movie names from table 2 then gives all the available movies.
I need a query whose output is each user id together with the movies that particular user hasn't watched. Is there a way to get that output without using PL/SQL, EXCEPT, an anti join, or EXISTS?
SELECT DISTINCT
"tabl1"."type",
"tabl2"."user_id"
FROM
"tabl2"
RIGHT JOIN
"tabl1" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE
"tabl1"."type" NOT IN (SELECT DISTINCT "type"
FROM "tabl1"
LEFT JOIN "tabl2" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE "tabl2"."user_id" IN (SELECT DISTINCT "user_id"
FROM "tabl2"))
I've tried using a join, but it doesn't give any results; I only end up with NULLs.
I'm stuck on how to get the required output.
Is there a way to get similar output without using the keywords listed above?
This looks like the opposite of a many-to-many relationship: one user may not have watched many movies, and one movie may not have been watched by many users.
Why not retrieve it as the movies not watched by a particular user?
select movie_name from Movie_table where movie_name not in (select movie_name from userMovieTable where user_id = :user_id)
You want to join user and movie on the condition that the pair is not in the watched table:
with movies as (select distinct movie from watched)
select *
from users u
join movies m on (u.userid, m.movie) not in (select userid, movie from watched)
order by u.userid, m.movie;
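A runnable sketch of this join-then-filter approach in Python's sqlite3, with hypothetical users and watched tables and invented rows. It relies on row-value IN, which SQLite supports from version 3.15; support in other databases may differ.

```python
import sqlite3

# Hypothetical stand-ins for the users table and the watch log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE watched (userid INTEGER, movie TEXT);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO watched VALUES (1, 'Alien'), (1, 'Brazil'),
                               (2, 'Alien'), (2, 'Alien'), (2, 'Dune');
""")

# Pair every user with every known movie, keeping only the pairs
# that do not appear in the watch log (row-value NOT IN).
query = """
    WITH movies AS (SELECT DISTINCT movie FROM watched)
    SELECT u.userid, m.movie
    FROM users u
    JOIN movies m ON (u.userid, m.movie) NOT IN (SELECT userid, movie FROM watched)
    ORDER BY u.userid, m.movie
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Dune'), (2, 'Brazil')]
```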

SQL SELECT * FROM 2 tables

I am building a small database app for friends where table 1 is contacts and table 2 is users. Both tables contain an email: in users it identifies the logged-in user, and in contacts (the username column) it identifies the owner of the contact.
SELECT *
FROM contacts
WHERE contacts.username = users.email
I'm trying to show all contact fields where username equals the email of an already logged-in user.
Thank you very much!
It sounds like you're trying to JOIN two tables together. Ideally, you don't want to use the email as the primary key on a table (the smaller the key, the faster your JOIN will be); a better option would be to add an auto-incrementing integer Id to both the Contacts and Users tables and make it the primary key (unique identifier). Joining on integers is generally faster: an INT takes 4 bytes per row, whereas a VARCHAR (in MySQL, with latin1 encoding) takes 1 byte per character plus 1 length byte.
Anyway, back to the original question. I believe the query you're looking for (MySQL syntax) is:
SELECT c.Id, c.Col1, u.Col2, ...
FROM contacts AS c
INNER JOIN users AS u ON u.email = c.username
Additionally, I would avoid SELECT *, since returning columns you don't need adds work for the database and the network. Instead, specify the exact columns you need.
Try the following. Also, I would suggest you learn about joins in SQL.
SELECT *
FROM contacts
INNER JOIN
users on contacts.username = users.email
Use Inner Join:
SELECT *
FROM contacts as c
INNER JOIN
users as u on u.email = c.username
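A minimal sketch of this inner join with Python's sqlite3; the id and name columns and the data are hypothetical. A contact whose username matches no user's email (Eve below) drops out of the result.

```python
import sqlite3

# Hypothetical contacts/users tables; only the join columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, username TEXT, name TEXT);
    INSERT INTO users VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    INSERT INTO contacts VALUES
        (1, 'ann@example.com', 'Carol'),
        (2, 'ann@example.com', 'Dave'),
        (3, 'zed@example.com', 'Eve');
""")

# Inner join: only contacts whose owner (username) is a known user's email.
query = """
    SELECT c.id, c.name
    FROM contacts AS c
    INNER JOIN users AS u ON u.email = c.username
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Carol'), (2, 'Dave')]
```

To show only the contacts of the current user, a WHERE u.email = ? filter with the logged-in email as the parameter could be added.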

AND conditions in many-to-many relation

Say I have three tables, a table of users, a table of around 500 different items, and the corresponding join table. What I would like to do is:
select * from users u join items_users iu on iu.user_id = u.id
where iu.item_id in (1,2,3,4,5)
and u.city_id = 1 limit 10;
Except that, instead of an IN condition, I would like to find users that have all of the listed items. If it helps, assume that at most 5 items will be searched for at a time. Also, I am using Postgres, and don't mind denormalizing if it would help, as it's a read-only system and speed is the highest priority.
It's another case of relational division. We have assembled quite an arsenal of queries to deal with this class of problems here.
In this case, with 5 or more items, I might try:
SELECT u.*
FROM users AS u
WHERE u.city_id = 1
AND EXISTS (
SELECT *
FROM items_users AS a
JOIN items_users AS b USING (user_id)
JOIN items_users AS c USING (user_id)
...
WHERE a.user_id = u.id
AND a.item_id = 1
AND b.item_id = 2
AND c.item_id = 3
...
)
LIMIT 10;
It was among the fastest in my tests, and it fits the requirement of multiple criteria on items_users while only returning columns from users.
Read about indexes at the linked answer; they are crucial for performance.
As your tables are read-only I would also CLUSTER both tables, to minimize the number of pages that have to be visited. If nothing else, CLUSTER items_users using a multi-column index on (user_id, item_id).
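To see the self-join form of relational division work end to end, here is a sketch in Python's sqlite3 rather than Postgres, with three items instead of five and invented data. It uses explicit ON conditions instead of USING, and the (user_id, item_id) primary key stands in for the multi-column index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, city_id INTEGER);
    CREATE TABLE items_users (user_id INTEGER, item_id INTEGER,
                              PRIMARY KEY (user_id, item_id));
    INSERT INTO users VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO items_users VALUES (1, 1), (1, 2), (1, 3),
                                   (2, 1), (2, 2),
                                   (3, 1), (3, 2), (3, 3);
""")

# One self-join per required item: the EXISTS succeeds only if
# the user has a row for item 1 AND item 2 AND item 3.
query = """
    SELECT u.id
    FROM users AS u
    WHERE u.city_id = 1
      AND EXISTS (
          SELECT 1
          FROM items_users AS a
          JOIN items_users AS b ON b.user_id = a.user_id
          JOIN items_users AS c ON c.user_id = a.user_id
          WHERE a.user_id = u.id
            AND a.item_id = 1
            AND b.item_id = 2
            AND c.item_id = 3
      )
    ORDER BY u.id
    LIMIT 10
"""
matching = [row[0] for row in conn.execute(query)]
print(matching)  # [1]
```

User 2 lacks item 3 and user 3 is in another city, so only user 1 qualifies.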

SQL: most efficient way to select a list of items with sublists?

Let's look at a very simple example with 3 tables:
dbo.Person(PersonId, Name, Surname)
dbo.Pet(PetId, Name, Breed)
dbo.PersonPet(PersonPetId, PersonId, PetId)
I need to select all persons with their pets, if they have any. For example, in the final application it should look something like this (the example image from the original question is not reproduced here):
What's the most efficient way?
Select all persons and then, in the data access layer, fill each person's pet list with a separate select?
Use a join at the SQL level and then, in the data access layer, filter out duplicate persons by adding only one of each to the result list and using the other rows just to fill its pet list?
Any other ideas?
The most efficient way is to select them all at once:
select p.*, pt.*
from Person p
left outer join PersonPet pp on p.PersonId = pp.PersonId
left outer join Pet pt on pp.PetId = pt.PetId
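A sketch of the single-query approach plus the data-access-layer step the question describes (collapsing duplicate person rows into one entry with a pet list), using Python's sqlite3 without the dbo schema and with invented rows.

```python
import sqlite3

# Stand-ins for dbo.Person, dbo.Pet and dbo.PersonPet (SQLite has no dbo schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
    CREATE TABLE Pet (PetId INTEGER PRIMARY KEY, Name TEXT, Breed TEXT);
    CREATE TABLE PersonPet (PersonPetId INTEGER PRIMARY KEY, PersonId INTEGER, PetId INTEGER);
    INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO Pet VALUES (1, 'Rex', 'Beagle'), (2, 'Tom', 'Tabby');
    INSERT INTO PersonPet VALUES (1, 1, 1), (2, 1, 2);
""")

# One round trip; a person with no pets still appears once, with NULL pet columns.
rows = conn.execute("""
    SELECT p.PersonId, p.Name, p.Surname, pt.Name
    FROM Person p
    LEFT OUTER JOIN PersonPet pp ON p.PersonId = pp.PersonId
    LEFT OUTER JOIN Pet pt ON pp.PetId = pt.PetId
    ORDER BY p.PersonId, pt.PetId
""").fetchall()

# Data-access-layer step: one entry per person, repeated rows only add pets.
people = {}
for person_id, name, surname, pet_name in rows:
    entry = people.setdefault(person_id, {"name": f"{name} {surname}", "pets": []})
    if pet_name is not None:
        entry["pets"].append(pet_name)

print(people)  # {1: {'name': 'Ann Lee', 'pets': ['Rex', 'Tom']}, 2: {'name': 'Bob Ray', 'pets': []}}
```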
Need to select all persons with theirs pets if person has any...
Use:
SELECT per.name,
per.surname,
pt.name
FROM dbo.PERSON per
LEFT JOIN dbo.PERSONPET perpet ON perpet.personid = per.personid
LEFT JOIN dbo.Pet pt ON pt.petid = perpet.petid
Personally, I would do it as a stored procedure on SQL Server. Whichever way you do it, though, for display purposes you're going to have to filter out the duplicate Name and Surname values.
Much of the time taken to retrieve a modest number of records goes into per-query overhead: the round trip, and setting up and tearing down each query. Compared with that, how much data or how many tables the query touches often matters less. A single query that gets all the data will usually be far more efficient than one query per person; if your data access layer fetches each person's pets separately, you'll get poor speed. Definitely use a join.
If your client and back end support multiple result sets, you could also do something like this (assuming it's MSSQL):
Create Proc usp_GetPeopleAndPets
AS
BEGIN
SELECT PersonId, Name, Surname
FROM
dbo.Person p;
SELECT
pp.PersonID,
p.PetId, p.Name, p.Breed
FROM
dbo.PersonPet pp
INNER JOIN dbo.Pet p
ON pp.PetId = p.PetId
Order BY pp.PersonId
END
The data retrieval time would be roughly equivalent, since it's still one trip to the DB. Some clients support relationships between result sets, which allow you to do something like
Person.GetPets() or Pet.GetPerson()
The main advantage to this approach is that if you're working with people you don't have to worry about the "filter[ing] all person duplicate[s]"