Why do these two queries produce different results? - sql

I am trying to find the number of persons that are not in the application table.
I have two tables (person and application) with person having a one-to-many relationship with application (person.id=application.person). However, a person may not have an application. There are roughly 35K records in the application table. I was able to reduce the query for the sake of this post and still produce the problem. I would expect the first query to produce the same number of results as the second, but it does not.
Why does this query produce zero results:
select count(*)
from person p where (p.id not in (
select person
from application
))
While this query produces expected results:
select count(*)
from person p where (p.id not in (
select person
from application
where person=p.id
))
From my understanding, the second query is correct because:
when a person has no app, the inner select returns null, in which case p.id not in null returns true
when a person has an app, the inner select returns that app's p.id, in which case p.id not in p.id returns false
However, I do not understand why the first query does not equal the second.
Can someone please explain (thanks much)?

You should not use not in with a subquery. It does not treat NULL values correctly (or at least intuitively). Instead, phrase the query as not exists:
select count(*)
from person p
where not exists (select 1
from application a
where a.person = p.id
);
With NOT IN, if any row in the subquery returns NULL, the comparison against that NULL is unknown, so the NOT IN condition can never evaluate to true and no rows are returned at all to the outer query.
Your version with the correlation clause limits the damage. However, my recommendation is to simply use NOT EXISTS.
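To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and it assumes (as the answer implies but the question does not state) that at least one application row has a NULL person. SQLite is used only because it is easy to run; other engines follow the same three-valued logic here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE application (id INTEGER PRIMARY KEY, person INTEGER);
    INSERT INTO person (id) VALUES (1), (2), (3);
    -- Assumed sample data: person 1 has an application,
    -- and one application row has a NULL person.
    INSERT INTO application (id, person) VALUES (10, 1), (11, NULL);
""")

# Uncorrelated NOT IN: the NULL in the subquery result makes the
# predicate unknown for every person who has no application.
not_in_count = conn.execute(
    "SELECT COUNT(*) FROM person p "
    "WHERE p.id NOT IN (SELECT person FROM application)"
).fetchone()[0]

# Correlated NOT IN (the asker's second query): the subquery only ever
# returns rows equal to p.id, so the NULL row never reaches the comparison.
correlated_count = conn.execute(
    "SELECT COUNT(*) FROM person p "
    "WHERE p.id NOT IN (SELECT person FROM application WHERE person = p.id)"
).fetchone()[0]

# NOT EXISTS: only checks whether a matching row exists.
not_exists_count = conn.execute(
    "SELECT COUNT(*) FROM person p "
    "WHERE NOT EXISTS (SELECT 1 FROM application a WHERE a.person = p.id)"
).fetchone()[0]

print(not_in_count, correlated_count, not_exists_count)
```

With this data the first count is 0, while the correlated NOT IN and the NOT EXISTS versions both count persons 2 and 3.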

Related

Query build to find records where all of a series of records have a value

Let me explain a little bit about what I am trying to do because I don't even know the vocab to use to ask. I have an Access 2016 database that records staff QA data. When a staff member misses a QA we assign a job aid that explains the process, and they can optionally send back a worksheet showing they learned about what was missed. If they do all of these in a 3 month period they get a credit on their QA score. So I have a series of records, all of which have a date we assigned the work (RA1) and MAY have a work returned date (RC1).
In the below image "lavalleer" has earned the credit because both of her sheets got returned. "maduncn" did not earn the credit because he didn't do one.
I want to create a query that returns to me only the people that are like "lavalleer". I tried hitting Google and searched here and access.programmers.co.uk, but I'm only coming up with instructions to use Not Null statements. That wouldn't work for me because if I did an IS NOT NULL on "maduncn" I would get the 4 records but it would exclude the null.
What I need to do is build a query where I can see staff that have dates in ALL of their RC1 fields. If any of their RC1 fields are blank I don't want them to return.
Consider:
SELECT * FROM tablename WHERE NOT UserLogin IN (SELECT UserLogin FROM tablename WHERE RC1 IS NULL);
You could use a not exists clause with a correlated subquery, e.g.
select t.* from YourTable t where not exists
(select 1 from YourTable u where t.userlogin = u.userlogin and u.rc1 is null)
Here, select 1 is used purely for optimisation - we don't care what the query returns, just that it has records (or doesn't have records).
Or, you could use a left join to exclude those users for which there is a null rc1 record, e.g.:
select t.* from YourTable t left join
(select u.userlogin from YourTable u where u.rc1 is null) v on t.userlogin = v.userlogin
where v.userlogin is null
In all of the above, change all occurrences of YourTable to the name of your table.
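As a quick check of the not exists approach, here is a small sketch run against SQLite from Python rather than Access. The table name qa_assignments and the dates are invented for illustration; only the column names userlogin, ra1 and rc1 and the two logins come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; dates are made-up sample values.
    CREATE TABLE qa_assignments (userlogin TEXT, ra1 TEXT, rc1 TEXT);
    INSERT INTO qa_assignments VALUES
        ('lavalleer', '2017-01-05', '2017-01-09'),
        ('lavalleer', '2017-02-10', '2017-02-14'),
        ('maduncn',   '2017-01-06', '2017-01-10'),
        ('maduncn',   '2017-02-11', NULL);
""")

# Keep only users for whom no row with a missing rc1 exists.
complete_users = [row[0] for row in conn.execute("""
    SELECT DISTINCT t.userlogin
    FROM qa_assignments t
    WHERE NOT EXISTS (
        SELECT 1
        FROM qa_assignments u
        WHERE u.userlogin = t.userlogin
          AND u.rc1 IS NULL)
    ORDER BY t.userlogin
""")]

print(complete_users)
```

Here maduncn is dropped entirely because one of his rows has no returned date, while lavalleer is kept.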

Avoid full scan for subquery

I have a query which was working before I added the exists condition. After adding the exists condition, it seems to run forever and never returns any results. I think the main reason is a full scan for every row. Can anyone please tell me how to avoid that? The query below is an example of what I am trying to achieve.
Basically, a car can have many parts, and if the updated-on date changes for any one of that car's parts, we want to pick up all of the car's parts. Each part has a detail table, and I want to look at updates to the detail table.
select c.id, p.id
from car c join part p on p.car_id=c.id
where exists (
select 1
from part p join pdetl pd on p.id=pd.part_id
where p.car_id=c.id and pd.updated_on > ?
)
EDITED: Modified the query to get all the parts associated to the car that had part(s) updated.
The inner query gets the parts that were updated. The outer query then pulls all the parts that are associated to the car:
select c.id, p.id
from car c join part p on p.car_id=c.id
where c.id in
(
select c.id
from car c join part p on p.car_id=c.id
where exists (
select 1
from pdetl pd
where p.id=pd.part_id
and pd.updated_on > ?
)
)
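To check that this shape of query returns every part of a car once any one part has a recent detail update, here is a small sketch against SQLite from Python. The sample rows and the cutoff date are made up, and the inner aliases are renamed to c2 and p2 only to avoid confusion with the outer ones. It says nothing about performance on the real tables; that depends on the indexes available (for example on pdetl.part_id and part.car_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (id INTEGER PRIMARY KEY);
    CREATE TABLE part (id INTEGER PRIMARY KEY, car_id INTEGER);
    CREATE TABLE pdetl (part_id INTEGER, updated_on TEXT);
    -- Made-up sample data: only part 11 (on car 1) was updated recently.
    INSERT INTO car VALUES (1), (2);
    INSERT INTO part VALUES (11, 1), (12, 1), (21, 2);
    INSERT INTO pdetl VALUES (11, '2020-06-01'), (12, '2020-01-01'),
                             (21, '2020-01-01');
""")

rows = conn.execute("""
    SELECT c.id, p.id
    FROM car c JOIN part p ON p.car_id = c.id
    WHERE c.id IN (
        SELECT c2.id
        FROM car c2 JOIN part p2 ON p2.car_id = c2.id
        WHERE EXISTS (
            SELECT 1
            FROM pdetl pd
            WHERE pd.part_id = p2.id
              AND pd.updated_on > ?))
    ORDER BY c.id, p.id
""", ("2020-03-01",)).fetchall()

print(rows)
```

Both parts of car 1 come back even though only part 11 changed, and car 2 is left out.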

Multiply total number of values in column by value in a different table

I am trying to count all the values in one column and then multiply this number by a value in a different table. So far I have:
SELECT CLUB_FEE * COUNT(MEMBER_ID) AS VALUE
FROM CLUB, SUBSCRIPTION
WHERE CLUB_ID = 'CLUB1';
This is not working however, can anyone please help?
I also need help doing this for multiple clubs. Is it possible to do it all in one statement for all clubs and then get the average?
Presumably, you intend something like this:
SELECT MAX(c.CLUB_FEE) * COUNT(MEMBER_ID) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
You can also write this as:
SELECT SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
WHERE c.CLUB_ID = 'CLUB1';
I thought the first version would be clearer, because the OP specifies COUNT() in the question.
If you want it for all clubs that have subscribers:
SELECT c.CLUB_ID, SUM(c.CLUB_FEE) AS VALUE
FROM CLUB c JOIN
SUBSCRIPTION s
ON c.CLUB_ID = s.CLUB_ID
GROUP BY c.CLUB_ID;
From inspecting the explain plans, it seems the following version may be a bit more efficient (since it avoids a join and uses only one aggregation). If you need this for ALL clubs at the same time, then probably all solutions will have the same "optimizer cost" (they will all do a join at some point).
select club_fee * (select count(member_id) from subscription where club_id = 'CLUB1')
from club
where club_id = 'CLUB1'
So now the only aggregate function is pushed into a subquery and the rest does not need either a join or another aggregate function.
Of course, this only matters if performance is important; it may very well not be.
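For anyone who wants to compare the variants side by side, here is a sketch using SQLite from Python with made-up fees and members. The last query, which averages the per-club totals, is not from the answers above; it is one possible reading of the "get the average" part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE club (club_id TEXT PRIMARY KEY, club_fee INTEGER);
    CREATE TABLE subscription (member_id INTEGER, club_id TEXT);
    -- Made-up sample data: CLUB1 has 3 members, CLUB2 has 2.
    INSERT INTO club VALUES ('CLUB1', 50), ('CLUB2', 20);
    INSERT INTO subscription VALUES (1, 'CLUB1'), (2, 'CLUB1'), (3, 'CLUB1'),
                                    (4, 'CLUB2'), (5, 'CLUB2');
""")

# Join version: fee times number of subscription rows for one club.
join_total = conn.execute("""
    SELECT MAX(c.club_fee) * COUNT(s.member_id)
    FROM club c JOIN subscription s ON c.club_id = s.club_id
    WHERE c.club_id = 'CLUB1'
""").fetchone()[0]

# Scalar-subquery version: the only aggregate lives in the subquery.
subquery_total = conn.execute("""
    SELECT club_fee * (SELECT COUNT(member_id)
                       FROM subscription
                       WHERE club_id = 'CLUB1')
    FROM club
    WHERE club_id = 'CLUB1'
""").fetchone()[0]

# One total per club that has subscribers.
per_club = conn.execute("""
    SELECT c.club_id, SUM(c.club_fee)
    FROM club c JOIN subscription s ON c.club_id = s.club_id
    GROUP BY c.club_id
    ORDER BY c.club_id
""").fetchall()

# Assumed interpretation of "the average": mean of the per-club totals.
average_total = conn.execute("""
    SELECT AVG(total) FROM (
        SELECT SUM(c.club_fee) AS total
        FROM club c JOIN subscription s ON c.club_id = s.club_id
        GROUP BY c.club_id)
""").fetchone()[0]

print(join_total, subquery_total, per_club, average_total)
```

Note that the join-based versions skip clubs with no subscribers, so such clubs would not contribute a zero to the average.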

SQL Server : join on array of ID's from previous join

I have 2 tables. One has been pruned to show only ID's which meet certain criteria. The second needs to be pruned to show only data that matches the previous "array" of ID's. There can be multiple results.
Consider the following:
Query_1_final: Returns the ID's of users who meet certain criteria:
select
t1.[user_id]
from
[SQLDB].[db].[meeting_parties] as t1
inner join
(select distinct
[user_id]
from
[SQLDB].[db].[meeting_parties]
group by
[user_id]
having
count([user_id]) = 1) as t2 on t1.user_id = t2.user_id
where
[type] = 'organiser'
This works great and returns:
user_id
--------------------
22
1255
9821
and so on...
It produces a single column with the ID's of everyone who is a "Meeting Organizer" and also in the active_meetings table. (note, there are multiple types/roles, this was the best way to grab them all)
Now, I need this data to filter another table, another join. Here is the start of my query
Query_2_PREP: returns 5 columns where the meeting has "started" already.
SELECT
[meeting_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
This works as well
meeting_id | meeting_style | meeting_day ...
---------------------------------------------
23 open M,F,SA
23 discussion TU,TH
23 lead W,F
and so on...
and returns ALL 10,982 meetings that started, but I need it to return only the meetings that are from the distinct 'organiser's ID's from Query_1_final (which should be more like 1200 records or so)
Ideally, I need something "like" this below (but of course it does not work)
Query 2: needs to return all meetings that are from organiser ID's only.
SELECT
[meeting_party_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
AND [meeting_party_id] = "ANY Query_1_final results, especially multiple"
I have tried nesting JOIN and INNER JOIN's but I think there is something fundamental I am missing here about SQL. In PHP I would use an array compare or just run another query... any help would be much appreciated.
Just use IN. Here is the structure of the logic:
with q1 as (
<first query here>
)
SELECT m.*
FROM [SQLDB].[db].[all_meetings] m
WHERE meeting_started = 'TRUE' AND
meeting_party_id IN (SELECT user_id FROM q1);
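Here is the same structure filled in and run against SQLite from Python, as a rough sketch. The sample rows are invented, the tables are reduced to the columns the filter needs, and the database/schema prefixes and SQL Server brackets are dropped; the first query is simplified slightly but keeps the "exactly one row per user, of type organiser" condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting_parties (user_id INTEGER, type TEXT);
    CREATE TABLE all_meetings (
        meeting_id INTEGER, meeting_party_id INTEGER,
        meeting_style TEXT, meeting_started TEXT);
    -- Made-up sample data: user 22 appears once, as organiser;
    -- user 30 appears twice, so the HAVING clause excludes them.
    INSERT INTO meeting_parties VALUES
        (22, 'organiser'), (30, 'organiser'), (30, 'attendee'),
        (40, 'attendee');
    INSERT INTO all_meetings VALUES
        (1, 22, 'open', 'TRUE'),
        (2, 30, 'discussion', 'TRUE'),
        (3, 22, 'lead', 'FALSE');
""")

meeting_ids = [row[0] for row in conn.execute("""
    WITH q1 AS (
        SELECT t1.user_id
        FROM meeting_parties AS t1
        JOIN (SELECT user_id
              FROM meeting_parties
              GROUP BY user_id
              HAVING COUNT(user_id) = 1) AS t2
          ON t1.user_id = t2.user_id
        WHERE t1.type = 'organiser'
    )
    SELECT m.meeting_id
    FROM all_meetings m
    WHERE m.meeting_started = 'TRUE'
      AND m.meeting_party_id IN (SELECT user_id FROM q1)
    ORDER BY m.meeting_id
""")]

print(meeting_ids)
```

Only meeting 1 survives: meeting 2 belongs to a user filtered out by the first query, and meeting 3 has not started.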

SQL query - Joining a many-to-many relationship, filtering/joining selectively

I find myself in a bit of an unworkable situation with a SQL query and I'm hoping that I'm missing something or might learn something new. The structure of the DB2 database I'm working with isn't exactly built for this sort of query, but I'm tasked with this...
Let's say we have Table People and Table Groups. Groups can contain multiple people, and one person can be part of multiple groups. Yeah, it's already messy. In any case, there are a couple of intermediary tables linking the two. The problem is that I need to start with a list of groups, get all of the people in those groups, and then get all of the groups with which the people are affiliated, which would be a superset of the initial group set. This would mean starting with groups, joining down to the people, and then going BACK and joining to the groups again. I need information from both tables in the result set, too, so that rules out a number of techniques.
I have to join this with a number of other tables for additional information and the query is getting enormous, cumbersome, and slow. I'm wondering if there's some way that I could start with People, join it to Groups, and then specify that if a person has one group that is in the supplied set of groups (which is done via a subquery), then ALL groups for that person should be returned. I don't know of a way to make this happen, but I'm thinking (hoping) that there's a relatively clean way to make this happen in SQL.
A quick and dirty example:
SELECT ...
FROM GROUPS g
JOIN LINKING_A a
ON g.GROUPID = a.GROUPID
AND GROUPID IN (subquery)
JOIN LINKING_B b
ON a.GROUPLIST = b.GROUPLIST
JOIN PEOPLE p
ON b.PERSONID = p.PERSONID
--This gets me all people affiliated with groups,
-- but now I need all groups affiliated with those people...
JOIN LINKING_B b2
ON p.PERSONID = b2.PERSONID
JOIN LINKING_A a2
ON b2.GROUPLIST = a2.GROUPLIST
JOIN GROUPS g2
ON a2.GROUPID = g2.GROUPID
And then I can return information from p and g2 in the result set. You can see where I'm having trouble. That's a lot of joining on some large tables, not to mention a number of other joins that are performed in this query as well. I need to be able to query by joining PEOPLE to GROUPS, then specify that if any person has an associated group that is in the subquery, it should return ALL groups affiliated with that entry in PEOPLE. I'm thinking that GROUP BY might be just the thing, but I haven't used that one enough to really know. So if Bill is part of group A, B, and C, and our subquery returns a set containing Group A, the result set should include Bill along with groups A, B, and C.
The following is a shorter way to get all the groups that people in the supplied group list are in. Does this help?
Select g.*
From Linking_B b
Join Linking_B b2
On b2.PersonId = b.PersonId
Join Groups g
On g.GroupId = b2.GroupId
Where b.Groupid in (SubQuery)
I'm not clear why you have both Linking_A and Linking_B. Generally all you should need to represent a many-to-many relationship between two master tables is a single association table with GroupID and PersonId.
I often recommend using "common table expressions" [CTE's] in order to help you break a problem up into chunks that can be easier to understand. CTE's are specified using a WITH clause, which can contain several CTE's before starting the main SELECT query.
I'm going to assume that the list of groups you want to start with is specified by your subquery, so that will be the 1st CTE. The next one selects people who belong to those groups. The final part of the query then selects groups those people belong to, and returns the columns from both master tables.
WITH g1 as
(subquery)
, p1 as
(SELECT p.*
from g1
join Linking a1 on g1.groupID=a1.groupID
join People p on p.personID=a1.personID )
SELECT p1.*, g2.*
from p1
join Linking a2 on p1.personID=a2.personID
join Groups g2 on g2.groupID=a2.groupID
I think I'd build the list of people you want to pull records for first, then use that to query out all the groups for those people. This will work across any number of link tables with the appropriate joins added:
with persons_wanted as
(
--figure out which people are in a group you want to include
select p.person_key
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where g.name in ('GROUP_I_WANT_PEOPLE_FROM', 'THIS_ONE_TOO')
group by p.person_key --we only want each person_key once
)
--now pull all the groups for the list of people in at least one group we want
select p.name as person_name, g.name as group_name, ...
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where p.person_key in (select person_key from persons_wanted);
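Here is a runnable sketch of that last approach using SQLite from Python, with the Bill example from the question. It assumes a single link table, as the answers suggest, rather than the two linking tables in the original schema. The groups table is named grp here purely to steer clear of keyword clashes, and the extra people (Ann, Cy) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified schema with one link table.
    CREATE TABLE person (person_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grp (group_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (person_key INTEGER, group_key INTEGER);
    INSERT INTO person VALUES (1, 'Bill'), (2, 'Ann'), (3, 'Cy');
    INSERT INTO grp VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Bill is in A, B and C; Ann only in B; Cy only in C.
    INSERT INTO link VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 3);
""")

rows = conn.execute("""
    WITH persons_wanted AS (
        -- people who belong to at least one of the starting groups
        SELECT l1.person_key
        FROM link l1
        JOIN grp g ON l1.group_key = g.group_key
        WHERE g.name IN ('A')
        GROUP BY l1.person_key
    )
    -- every group of each wanted person, not just the starting ones
    SELECT p.name AS person_name, g.name AS group_name
    FROM person p
    JOIN link l1 ON p.person_key = l1.person_key
    JOIN grp g ON l1.group_key = g.group_key
    WHERE p.person_key IN (SELECT person_key FROM persons_wanted)
    ORDER BY p.name, g.name
""").fetchall()

print(rows)
```

Starting from group A alone, Bill comes back with A, B and C, while Ann and Cy are excluded because neither belongs to A.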