Correct way to use "NOT IN" Postgres - sql

I have two tables, People and Vehicles. Vehicles belong to people. I'm trying to find the people who do not have a vehicle. I attempted this by joining People and Vehicles and selecting each person's ID that is NOT IN Vehicles.person_id.
This returns nothing, which has me wondering whether I did something wrong or whether there is a more efficient way of doing this.
The query is below:
Select People.id
From People
INNER JOIN Vehicles
on People.id=Vehicles.person_id
where People.id NOT IN Vehicles.person_id;

Use a LEFT JOIN to find the people with no vehicles:
Select distinct People.id
From People
LEFT JOIN Vehicles on People.id=Vehicles.person_id
where Vehicles.person_id is NULL

NOT IN can have issues with NULL values, and should probably be avoided for performance reasons if the subquery is very large.
Try NOT EXISTS:
SELECT p.id
FROM People p
WHERE NOT EXISTS (
SELECT 1
FROM Vehicles v
WHERE v.person_id = p.id)
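The NULL pitfall mentioned above can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database instead of Postgres; the sample rows are made up for illustration. The three-valued logic behind NOT IN is standard SQL, so the same effect is expected in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id INTEGER PRIMARY KEY);
    CREATE TABLE Vehicles (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO People (id) VALUES (1), (2), (3);
    -- One vehicle belongs to person 1; the other has no owner (NULL).
    INSERT INTO Vehicles (id, person_id) VALUES (10, 1), (11, NULL);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# A NULL in the subquery makes "id NOT IN (...)" evaluate to unknown
# (never true) for every non-matching row, so nothing comes back.
not_in = ids("SELECT id FROM People WHERE id NOT IN (SELECT person_id FROM Vehicles)")

# Filtering the NULLs out of the subquery restores the expected result.
not_in_filtered = ids("""
    SELECT id FROM People
    WHERE id NOT IN (SELECT person_id FROM Vehicles WHERE person_id IS NOT NULL)
""")

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists = ids("""
    SELECT p.id FROM People p
    WHERE NOT EXISTS (SELECT 1 FROM Vehicles v WHERE v.person_id = p.id)
""")

print(not_in)           # []
print(not_in_filtered)  # [2, 3]
print(not_exists)       # [2, 3]
```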

Another solution, using set operations:
Select id From People
except
SELECT person_id FROM Vehicles

Use a subquery as below:
Select id
From People
WHERE id NOT IN (SELECT distinct person_id
FROM Vehicles
WHERE person_id IS NOT NULL)
This selects all people (Select id From People WHERE id NOT IN ...) who are not in the list of people who have a vehicle (SELECT distinct person_id FROM Vehicles). The WHERE person_id IS NOT NULL filter keeps NULLs out of that list.

Related

SQL: Find all rows in a table when the rows are a foreign key in another table

The caveat here is I must complete this with only the following tools:
1. The basic SQL construct: SELECT ... FROM ... AS ... WHERE ... Distinct is ok.
2. Set operators: UNION, INTERSECT, EXCEPT
3. Create temporary relations: CREATE VIEW... AS ...
4. Arithmetic operators like <, >, <=, == etc.
5. Subquery can be used only in the context of NOT IN or a subtraction operation, i.e. (select ... from ... where ... not in (select ...))
I can NOT use any join, limit, max, min, count, sum, having, group by, not exists, any exists, count, aggregate functions or anything else not listed in 1-5 above.
Schema:
People (id, name, age, address)
Courses (cid, name, department)
Grades (pid, cid, grade)
I satisfied the query but I used not exists (which I can't use). The sql below shows only people who took every class in the Courses table:
select People.name from People
where not exists
(select Courses.cid from Courses
where not exists
(select grades.cid from grades
where grades.cid = courses.cid and grades.pid = people.id))
Is there way to solve this by using not in or some other method that I am allowed to use? I've struggled with this for hours. If anyone can help with this goofy obstacle, I'll gladly upvote your answer and select your answer.
As Nick.McDermaid said you can use except to identify students that are missing classes and not in to exclude them.
1 Get the complete list with a cartesian product of people x courses. This is what grades would look like if every student has taken every course.
create view complete_view as
select people.id as pid, courses.cid as cid
from people, courses
2 Use except to identify students that are missing at least one class
create view missing_view as select distinct pid from (
select pid, cid from complete_view
except
select pid, cid from grades
) t
3 Use not in to select students that aren't missing any classes
select * from people where id not in (select pid from missing_view)
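The three steps above can be tried end to end. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sample rows are invented for illustration, the columns address, age and department are left out for brevity, and the course key is assumed to be Courses.cid as in the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Courses (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Grades (pid INTEGER, cid INTEGER, grade TEXT);
    INSERT INTO People VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Courses VALUES (100, 'Databases'), (200, 'Algebra');
    -- Ann took both courses; Bob took only one.
    INSERT INTO Grades VALUES (1, 100, 'A'), (1, 200, 'B'), (2, 100, 'C');

    -- Step 1: every (person, course) pair that could exist.
    CREATE VIEW complete_view AS
        SELECT People.id AS pid, Courses.cid AS cid FROM People, Courses;

    -- Step 2: pairs without a grade identify people missing a course.
    CREATE VIEW missing_view AS
        SELECT DISTINCT pid FROM (
            SELECT pid, cid FROM complete_view
            EXCEPT
            SELECT pid, cid FROM Grades
        ) t;
""")

# Step 3: keep only the people who are not missing anything.
took_everything = [row[0] for row in conn.execute(
    "SELECT name FROM People "
    "WHERE id NOT IN (SELECT pid FROM missing_view) ORDER BY name"
)]
print(took_everything)  # ['Ann']
```

Because People.id is a primary key, missing_view never contains a NULL pid, so NOT IN is safe here.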
As Nick suggests, you can use EXCEPT in this case. Here is the sample:
select People.name from People
EXCEPT
select p.name from People AS p
join Grades AS g on g.pid = p.id
join Courses as c on c.cid = g.cid
you can turn the first not exists into not in using a constant value.
select *
from People a
where 1 not in (
select 1
from courses b
...

Multiple joins, average on one table, count on another

I have four tables in a database: City, User, CityRating, CityGreeting. The CityRating table has the UserID and CityID as the PK, and those are FKs to the User and City tables. The CityGreeting table has no PK, but has the UserID and CityID as FKs (the idea is that a user can greet a city as many times as desired, but only rate a city once).
I am trying to write a query that will return the average rating of the city overall, as well as the times a specific user greeted the city:
select City.CityID, City.CityName, City.CityStateOrProvince,
ROUND(AVG(Cast(RateCity.Rating as float)), 2) as AverageRating,
(select COUNT(HelloCity.CityID) from HelloCity where HelloCity.UserID like '<guid>') as TimesVisited
from City
right join RateCity
on City.CityID = RateCity.CityID
right join HelloCity
on City.CityID = HelloCity.CityID
group by City.CityID, City.CityName,
City.CityStateOrProvince, City.CityCountry, City.CityImageUri
Even if I can get this to work as expected (which it currently does not), I feel like it is really messy. In terms of best practices, would it be better to write two queries? This operation would be performed in an API, and I'm not sure whether performance would be better with two separate queries or with one complex one like this. Any insight on this, or on how to get the query to work as expected?
EDIT: Average Rating is the average across all users who rated, and TimesVisited is the number of times one specific user has visited the city.
I believe you need to aggregate each table other than City separately for this to work correctly:
select c.*, rc.AverageRating, coalesce(hc.TimesVisited, 0) as TimesVisited
from City c join
(select CityId, ROUND(AVG(Cast(rc.Rating as float)), 2) as AverageRating
from RateCity rc
group by CityId
) rc
on c.CityID = rc.CityID left join
(select CityId, count(*) as TimesVisited
from HelloCity hc
where hc.UserID like '<guid>'
group by CityId
) hc
on c.CityId = hc.CityId;
Notes:
Table aliases make the query easier to write and to read.
I doubt you really mean right join. That would imply that there are CityIds in the other two tables that are not in City.
By doing the aggregation for each other table, you don't need an aggregation in the outer query.
I do think you want a left join for the HelloCity table, because not all cities might have visitors.
You might want a left join for the RateCity table as well, if not all cities have ratings.
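The effect of aggregating each table separately can be checked on a small data set. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database rather than SQL Server; the rows and the user id 'u1' are invented for illustration, and REAL stands in for float. The naive query joins both detail tables before counting, so each greeting is repeated once per rating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT);
    CREATE TABLE RateCity (UserID TEXT, CityID INTEGER, Rating INTEGER);
    CREATE TABLE HelloCity (UserID TEXT, CityID INTEGER);
    INSERT INTO City VALUES (1, 'Oslo'), (2, 'Lima');
    -- Oslo has two ratings, Lima has one.
    INSERT INTO RateCity VALUES ('u1', 1, 4), ('u2', 1, 5), ('u1', 2, 3);
    -- User u1 greeted Oslo three times and never greeted Lima.
    INSERT INTO HelloCity VALUES ('u1', 1), ('u1', 1), ('u1', 1), ('u2', 1);
""")

# Pre-aggregated: one row per city from each derived table, so no fan-out.
rows = conn.execute("""
    SELECT c.CityName, rc.AverageRating, COALESCE(hc.TimesVisited, 0)
    FROM City c
    JOIN (SELECT CityID, ROUND(AVG(CAST(Rating AS REAL)), 2) AS AverageRating
          FROM RateCity GROUP BY CityID) rc ON rc.CityID = c.CityID
    LEFT JOIN (SELECT CityID, COUNT(*) AS TimesVisited
               FROM HelloCity WHERE UserID = ? GROUP BY CityID) hc
           ON hc.CityID = c.CityID
    ORDER BY c.CityID
""", ("u1",)).fetchall()

# Naive: joining both detail tables first multiplies greetings by ratings.
naive = conn.execute("""
    SELECT c.CityName, COUNT(h.CityID)
    FROM City c
    JOIN RateCity r ON r.CityID = c.CityID
    JOIN HelloCity h ON h.CityID = c.CityID AND h.UserID = ?
    GROUP BY c.CityID, c.CityName
""", ("u1",)).fetchall()

print(rows)   # [('Oslo', 4.5, 3), ('Lima', 3.0, 0)]
print(naive)  # [('Oslo', 6)]
```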
Why not use CTEs and do each individual part in its own CTE? That helps break the problem down instead of mashing a bunch of joins together. For example:
DECLARE @userId VARCHAR(10) = 'userid1' ;
WITH
CITY_RATING_CTE (cityId, AverageRating) AS
( SELECT cityId,
AVG(Rating) AS rating
FROM RateCity
GROUP BY cityId),
TIMES_VISITED_CTE AS
( SELECT cityId,
count(*) AS TimesVisited
FROM HelloCity
WHERE UserId = @userId
GROUP BY cityId)
SELECT c.CityId,
c.CityName,
c.CityStateOrProvince,
c.CityImageUri,
cr.AverageRating,
tv.TimesVisited
FROM City c
JOIN CITY_RATING_CTE cr ON cr.cityId = c.CityId
JOIN TIMES_VISITED_CTE tv ON tv.cityId = c.CityId;

How to include "zero" / "0" results in COUNT aggregate?

I've just got myself a little bit stuck with some SQL. I don't think I can phrase the question brilliantly - so let me show you.
I have two tables, one called person, one called appointment. I'm trying to return the number of appointments a person has (including if they have zero). Appointment contains the person_id and there is a person_id per appointment. So COUNT(person_id) is a sensible approach.
The query:
SELECT person_id, COUNT(person_id) AS "number_of_appointments"
FROM appointment
GROUP BY person_id;
Will return correctly, the number of appointments a person_id has. However, a person who has 0 appointments isn't returned (obviously as they are not in that table).
Tweaking the statement to take person_id from the person table gives me something like:
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM appointment
JOIN person ON person.person_id = appointment.person_id
GROUP BY person.person_id;
This however, will still only return a person_id who has an appointment and not what I want which is a return with persons who have 0 appointments!
Any suggestions please?
You want an outer join for this (and you need to use person as the "driving" table)
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM person
LEFT JOIN appointment ON person.person_id = appointment.person_id
GROUP BY person.person_id;
The reason this works is that the outer (left) join returns NULL for those persons who do not have an appointment. The aggregate function count() does not count NULL values, so you get a zero for them.
If you want to learn more about outer joins, here is a nice tutorial: http://sqlzoo.net/wiki/Using_Null
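The difference between counting a column and counting rows after an outer join is easy to see on two people. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the appointment_id column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1), (2);
    -- Person 1 has two appointments; person 2 has none.
    INSERT INTO appointment VALUES (10, 1), (11, 1);
""")

rows = conn.execute("""
    SELECT person.person_id,
           COUNT(appointment.person_id) AS by_column,  -- skips the NULL from the outer join
           COUNT(*)                     AS by_row      -- counts the NULL-extended row too
    FROM person
    LEFT JOIN appointment ON person.person_id = appointment.person_id
    GROUP BY person.person_id
    ORDER BY person.person_id
""").fetchall()
print(rows)  # [(1, 2, 2), (2, 0, 1)]
```

Person 2 gets 0 from COUNT(appointment.person_id) but 1 from COUNT(*), which is why the column from the outer-joined table is the one to count.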
You must use LEFT JOIN instead of INNER JOIN
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM person
LEFT JOIN appointment ON person.person_id = appointment.person_id
GROUP BY person.person_id;
If you do the count in a sub-table and outer join to it, you can get 0 as expected thanks to the nvl function (nvl is Oracle-specific; COALESCE is the standard equivalent).
Ex:
select P.person_id, nvl(A.nb_apptmts, 0) from
(SELECT person.person_id
FROM person) P
LEFT JOIN
(select person_id, count(*) as nb_apptmts
from appointment
group by person_id) A
ON P.person_id = A.person_id
Use an outer join to get a 0 count in the result when using GROUP BY.
A plain JOIN is an inner join in MS SQL, so use a LEFT or RIGHT join instead.
If the table that contains the primary key is mentioned first in the query, use LEFT JOIN; otherwise use RIGHT JOIN.
EG:
select WARDNO,count(WARDCODE) from MAIPADH
right join MSWARDH on MSWARDH.WARDNO= MAIPADH.WARDCODE
group by WARDNO
Or, equivalently:
select WARDNO,count(WARDCODE) from MSWARDH
left join MAIPADH on MSWARDH.WARDNO= MAIPADH.WARDCODE group by WARDNO
Take group by from the table which has Primary key and count from the another table which has actual entries/details.
To change even less on your original query, you can turn your join into a RIGHT join
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM appointment
RIGHT JOIN person ON person.person_id = appointment.person_id
GROUP BY person.person_id;
This just builds on the selected answer, but because the outer join is in the RIGHT direction, only one word needs to be added to the original query. Keep in mind that RIGHT JOIN exists; it can sometimes make queries more readable and require less rewriting.
One thing to watch with a LEFT JOIN is that a person with no appointments still produces one row, with NULLs in the appointment columns. COUNT(*) counts that row as 1, so the person appears to have one appointment when they actually have none; COUNT(appointment.person_id) skips the NULL and returns 0. A correlated subquery avoids the issue altogether:
SELECT person.person_id,
(SELECT COUNT(*) FROM appointment WHERE person.person_id = appointment.person_id) AS 'Appointments'
FROM person;

SQL to gather data from one table while counting records in another

I have a users table and a songs table, I want to select all the users in the users table while counting how many songs they have in the songs table. I have this SQL but it doesn't work, can someone spot what i'm doing wrong?
SELECT jos_mfs_users.*, COUNT(jos_mfs_songs.id) as song_count
FROM jos_mfs_users
INNER JOIN jos_mfs_songs
ON jos_mfs_songs.artist=jos_mfs_users.id
Help is much appreciated. Thanks!
The inner join won't work, because it joins every matching row in the songs table with the users table.
SELECT jos_mfs_users.*,
(SELECT COUNT(jos_mfs_songs.id)
FROM jos_mfs_songs
WHERE jos_mfs_songs.artist=jos_mfs_users.id) as song_count
FROM jos_mfs_users
WHERE (SELECT COUNT(jos_mfs_songs.id)
FROM jos_mfs_songs
WHERE jos_mfs_songs.artist=jos_mfs_users.id) > 10
There's a GROUP BY clause missing, e.g.
SELECT jos_mfs_users.id, COUNT(jos_mfs_songs.id) as song_count
FROM jos_mfs_users
INNER JOIN jos_mfs_songs
ON jos_mfs_songs.artist=jos_mfs_users.id
GROUP BY jos_mfs_users.id
If you want to add more columns from jos_mfs_users to the select list, you should add them to the GROUP BY clause as well.
Changes:
Don't do SELECT *...specify your fields. I included ID and NAME, you can add more as needed but put them in the GROUP BY as well
Changed to a LEFT JOIN - INNER JOIN won't list any users that have no songs
Added the GROUP BY so it gives a valid count and is valid syntax
SELECT u.id, u.name, COUNT(s.id) as song_count
FROM jos_mfs_users AS u
LEFT JOIN jos_mfs_songs AS s
ON s.artist = u.id
GROUP BY u.id, u.name
Try
SELECT
*,
(SELECT COUNT(*) FROM jos_mfs_songs as songs WHERE songs.artist=users.id) as song_count
FROM
jos_mfs_users as users
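The correlated subquery and the LEFT JOIN with GROUP BY should return the same counts, including 0 for users without songs. This is a minimal sketch that checks this with Python's built-in sqlite3 module and an in-memory database; the name column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jos_mfs_users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE jos_mfs_songs (id INTEGER PRIMARY KEY, artist INTEGER);
    INSERT INTO jos_mfs_users VALUES (1, 'ann'), (2, 'bob');
    -- ann has two songs; bob has none.
    INSERT INTO jos_mfs_songs VALUES (10, 1), (11, 1);
""")

# Correlated subquery: one count per user row, no GROUP BY needed.
via_subquery = conn.execute("""
    SELECT users.id, users.name,
           (SELECT COUNT(*) FROM jos_mfs_songs AS songs
            WHERE songs.artist = users.id) AS song_count
    FROM jos_mfs_users AS users
    ORDER BY users.id
""").fetchall()

# LEFT JOIN plus GROUP BY: counting s.id keeps users without songs at 0.
via_join = conn.execute("""
    SELECT u.id, u.name, COUNT(s.id) AS song_count
    FROM jos_mfs_users AS u
    LEFT JOIN jos_mfs_songs AS s ON s.artist = u.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(via_subquery)              # [(1, 'ann', 2), (2, 'bob', 0)]
print(via_join == via_subquery)  # True
```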
This seems like a many to many relationship. By that I mean it looks like there can be several records in the users table for each user, one of each song they have.
I would have three tables.
Users, which has one record for each user
Songs, which has one record for each song
USER_SONGS, which has one record for each user/song combination
Now, you can do a count of the songs each user has by doing a query on the intermediate table. You can also find out how many users have a particular song.
This will tell you how many songs each user has
select id, count(*) from USER_SONGS
GROUP BY id;
This will tell you how many users each song has
select artist, count(*) from USER_SONGS
GROUP BY artist;
I'm sure you will need to tweak this for your needs, but it may give you the type of results you are looking for.
You can also join either of these queries to the other two tables to find the user name, and/or artist name.
HTH
Harv Sather
ps I am not sure if you are looking for song counts or artist counts.
You need a GROUP BY clause when you select non-aggregated columns alongside an aggregate function (like COUNT(), for example).
So, assuming that jos_mfs_users.id is a primary key, something like this will work:
SELECT jos_mfs_users.*, COUNT( jos_mfs_users.id ) as song_count
FROM jos_mfs_users
INNER JOIN jos_mfs_songs
ON jos_mfs_songs.artist = jos_mfs_users.id
GROUP BY jos_mfs_users.id
Notice that
since you are grouping by user id, you will get one result per distinct user id in the results
the thing you need to COUNT() is the number of rows that are being grouped (in this case the number of results per user)

SQL Select Count in Where Clause Performance issue

I have the following SQL query that performs horribly due to the select count(1) statement in the where clause. Can anyone suggest a way that would speed this up? The idea is that I only want rows returned where there is one invoice found.
SELECT people.name, people.address
FROM people
WHERE ((SELECT COUNT(1) FROM invoices WHERE invoices.pid = people.id)=1)
COUNT(1) is superstition
What you have is a count per row of people = a cursor/loop like action
So, try a JOIN like this
SELECT people.name, people.address
FROM
people
JOIN
invoices ON invoices.pid = people.id
GROUP BY
people.name, people.address
HAVING
COUNT(*) = 1
I'd also hope you have indexes, at least on invoices.pid and on people (id, name, address).
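The JOIN with HAVING can be tried on a few rows. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sample rows, the invoices.id column and the index name idx_invoices_pid are assumptions made for illustration. Adding people.id to the GROUP BY is also an assumption on my part: it keeps two different people who happen to share a name and address from being merged into one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, pid INTEGER);
    -- An index on the join column can spare a full scan of invoices per person.
    CREATE INDEX idx_invoices_pid ON invoices (pid);
    INSERT INTO people VALUES
        (1, 'Ann', '1 Main St'), (2, 'Bob', '2 Oak Ave'), (3, 'Cy', '3 Elm Rd');
    -- Ann has one invoice, Bob has two, Cy has none.
    INSERT INTO invoices VALUES (10, 1), (11, 2), (12, 2);
""")

one_invoice = conn.execute("""
    SELECT people.name, people.address
    FROM people
    JOIN invoices ON invoices.pid = people.id
    GROUP BY people.id, people.name, people.address
    HAVING COUNT(*) = 1
""").fetchall()
print(one_invoice)  # [('Ann', '1 Main St')]
```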
Use a JOIN:
SELECT people.name, people.address
FROM people
JOIN invoices ON invoices.pid = people.id
GROUP BY people.name, people.address
HAVING Count(*) = 1
Joining the tables is probably going to be much better in practice and in performance, I should think.
SELECT people.name, people.address
FROM people INNER JOIN invoices ON invoices.pid = people.id
Edit due to OP being edited: do you want only those people who have exactly one invoice? If so then disregard this and look at one of the other answers.