Convert nested (two) NOT IN into nested (two) NOT EXISTS - sql

I am trying to understand NOT EXISTS better. Can we always replace NOT IN with NOT EXISTS, even when the NOT IN clauses are nested?
I found a similar question, but it has only one NOT IN, whereas I am trying to handle the nested case.
We have two tables, registered and preActivity.
registered has mId (string), aId (string), quarter (string), year (integer), and preActivity has aId (string), preAId (string), where
> mId is member id,
> aId is the activity Id,
> preAId is the prerequisite activity Id.
The following query uses nested NOT IN to find all members who have previously registered for every prerequisite activity (class) of the activity (class) swimming at the YMCA.
Can we convert it into two nested NOT EXISTS?
SELECT DISTINCT r.mid
FROM registered r
WHERE r.mid NOT IN (SELECT r.mid
FROM preActivity p
WHERE p.aid = "swimming" AND
p.preAId NOT IN (SELECT r2.mid
FROM registered r2
WHERE r2.mid = r.mid));
Using the hint from that post, I can convert the first NOT IN, but the second one has taken me hours. Can someone please help, with some explanation?
Here is what I have so far:
SELECT DISTINCT r.mid
FROM registered r
WHERE NOT EXISTS (SELECT r.mid
FROM preActivity p
WHERE p.aid = "swimming" AND
p.preAId NOT IN (SELECT r2.mid # How can we compare p.preAId with the rows selected from r2? Note that the registered table has no preAId field (following the idea from the post).
FROM registered r2
WHERE r2.mid = r.mid));
Or can't we apply the same idea here, since it is a doubly nested case?

First thing to remember: the SELECT list in a [NOT] EXISTS subquery doesn't matter, as we're only looking for the existence of rows. You could even write SELECT 1/0 and not get an error. So most people write [NOT] EXISTS (SELECT 1. (I like to put that all on one line and leave the rest of the subquery on new lines.)
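Secondly, a NOT IN query has a problem with nullable columns: if the subquery returns even one NULL, the NOT IN predicate is never true and the query returns no rows. So it's best to always write a NOT EXISTS instead.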
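To make the NULL issue concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables X and Y and their contents are made up for illustration; other engines that follow standard three-valued logic should behave the same way, but that is an assumption worth checking on your own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (colA INTEGER);
    CREATE TABLE Y (colA INTEGER);
    INSERT INTO X VALUES (1), (2), (3);
    INSERT INTO Y VALUES (1), (NULL);   -- a single NULL is enough
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL rather than TRUE,
# so no row of X passes the filter.
not_in_rows = conn.execute(
    "SELECT colA FROM X WHERE colA NOT IN (SELECT colA FROM Y) ORDER BY colA"
).fetchall()

# NOT EXISTS: the NULL row in Y never matches anything,
# so the unmatched values 2 and 3 are returned.
not_exists_rows = conn.execute("""
    SELECT colA FROM X
    WHERE NOT EXISTS (SELECT 1 FROM Y WHERE Y.colA = X.colA)
    ORDER BY colA
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

The two forms only agree when neither side of the comparison can be NULL.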
Now, if you analyze a [NOT] IN query, you will see that the semi-join compares the column just before the IN with the column in the subquery's SELECT. So a query:
X.colA [NOT] IN
(SELECT Y.colA FROM Y)
can be converted to (for NOT IN, the results are identical only if neither column can be NULL)
[NOT] EXISTS (SELECT 1
FROM Y
WHERE Y.colA = X.colA)
Another interesting syntax, most useful with multi-column joins or nullable columns, is:
[NOT] EXISTS (
SELECT X.colA
INTERSECT
SELECT Y.colA
FROM Y)
Don't forget to always use the correct table alias on the subquery columns. If you get this wrong, your query can return incorrect results without you noticing.
For example, what happens here?
[NOT] EXISTS (SELECT 1
FROM Y
WHERE X.colA = colA)
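One way this can go wrong is when Y turns out not to have a column named colA (for example after a rename). The sketch below assumes exactly that hypothetical situation and uses sqlite3; the unqualified colA then silently resolves to the outer table, while the qualified version fails loudly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (colA INTEGER);
    CREATE TABLE Y (colB INTEGER);      -- hypothetical: Y has no colA
    INSERT INTO X VALUES (1), (2), (3);
    INSERT INTO Y VALUES (1);
""")

# The unqualified colA is not found in Y, so it resolves to X.colA.
# The predicate becomes X.colA = X.colA, which is true for every row of Y,
# so NOT EXISTS is false for every row of X.
unqualified = conn.execute("""
    SELECT colA FROM X
    WHERE NOT EXISTS (SELECT 1 FROM Y WHERE X.colA = colA)
    ORDER BY colA
""").fetchall()

# With the alias spelled out, the mistake surfaces as an error instead.
try:
    conn.execute("""
        SELECT colA FROM X
        WHERE NOT EXISTS (SELECT 1 FROM Y WHERE X.colA = Y.colA)
    """)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)       # [] even though 2 and 3 have no match in Y
print(qualified_error)
```

When Y does have a colA, the unqualified name binds to Y.colA and the query works, which is part of why this mistake is easy to miss.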
In your case, your first NOT IN query is slightly weird.
You are putting r.mid on both sides of the join, so effectively this becomes an EXISTS anyway.
So your query can be rewritten as this:
SELECT DISTINCT r.mid
FROM registered r
WHERE NOT EXISTS (SELECT 1
    FROM preActivity p
    WHERE p.aid = "swimming" AND
        NOT EXISTS (SELECT 1
            FROM registered r2
            WHERE r2.mid = r.mid AND r2.mid = p.preAId
        )
);
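The double NOT EXISTS pattern ("there is no prerequisite that the member has not registered for") can be tried out with sqlite3. Note one assumption: the query above faithfully keeps the question's comparison of r2.mid with p.preAId, whereas this sketch compares r2.aId with p.preAId, which is what the stated goal seems to need. The sample rows and the reduced column list are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the question's tables.
    CREATE TABLE registered  (mId TEXT, aId TEXT);
    CREATE TABLE preActivity (aId TEXT, preAId TEXT);

    INSERT INTO preActivity VALUES ('swimming', 'floating'),
                                   ('swimming', 'safety');
    INSERT INTO registered VALUES ('m1', 'floating'), ('m1', 'safety'),
                                  ('m2', 'floating'),
                                  ('m3', 'yoga');
""")

# Keep a member only if no swimming prerequisite exists
# for which that member has no registration.
members = conn.execute("""
    SELECT DISTINCT r.mId
    FROM registered r
    WHERE NOT EXISTS (SELECT 1
        FROM preActivity p
        WHERE p.aId = 'swimming' AND
            NOT EXISTS (SELECT 1
                FROM registered r2
                WHERE r2.mId = r.mId AND r2.aId = p.preAId))
    ORDER BY r.mId
""").fetchall()

print(members)  # [('m1',)]
```

Only m1 has both prerequisites; m2 lacks 'safety' and m3 lacks both.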

Related

Using Select * in a SQL JOIN returns the wrong id value for the wrong table

I have two tables (PlayerDTO and ClubDTO) and am using a JOIN to fetch data as follows:
SELECT * FROM PlayerDTO AS pl
INNER JOIN ClubDTO AS cl
ON pl.currentClub = cl.id
WHERE cl.nation = 7
This returns the correct rows from PlayerDTO, but in every row the id column has been changed to the value of the currentClub column (eg instead of pl.id 3,456 | pl.currentClub 97, it has become pl.id 97 | pl.currentClub 97).
So I tried the query listing all the columns by name instead of Select *:
SELECT pl.id, pl.nationality, pl.currentClub, pl.status, pl.lastName FROM PlayerDTO AS pl
INNER JOIN ClubDTO AS cl
ON pl.currentClub = cl.id
WHERE cl.nation = 7
This works correctly and doesn’t change any values.
PlayerDTO has over 100 columns (I didn’t list them all above for brevity, but I included them all in the query) but obviously I don’t want to write every column name in every query.
So could somebody please explain why Select * changes the id value and what I need to do to make it work correctly? All my tables have a column called id, is that something to do with it?
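SELECT * is, according to the docs,
shorthand for “select all columns” (source: dev.mysql.com).
Both your tables have an id column, so the result set contains two columns named id, and nothing in the query says which one you mean. A client that reads rows by column name typically keeps only the last one, here cl.id, which is why it equals currentClub. So select what you want to select...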
SELECT pl.id, *otherfieldsyouwant* FROM PlayerDTO AS pl...
Or...
SELECT pl.* FROM PlayerDTO AS pl...
Typically, SELECT * is bad form. The odds that you are using every field are astronomically low. And the more data you pull, the slower it is.
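The duplicate-name effect can be reproduced with sqlite3 as a stand-in for MySQL. This is only a sketch: the reduced table definitions and the sample row are invented, and building a dict from the column names is an assumption about how a typical client library maps rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily reduced versions of the two tables.
    CREATE TABLE ClubDTO   (id INTEGER PRIMARY KEY, nation INTEGER);
    CREATE TABLE PlayerDTO (id INTEGER PRIMARY KEY, currentClub INTEGER, lastName TEXT);
    INSERT INTO ClubDTO   VALUES (97, 7);
    INSERT INTO PlayerDTO VALUES (3456, 97, 'Example');
""")

join_sql = """
    FROM PlayerDTO AS pl
    INNER JOIN ClubDTO AS cl ON pl.currentClub = cl.id
    WHERE cl.nation = 7
"""

# SELECT * returns both id columns under the same name.
cur = conn.execute("SELECT * " + join_sql)
names = [d[0] for d in cur.description]
as_dict = dict(zip(names, cur.fetchone()))   # the later 'id' overwrites the earlier one

# SELECT pl.* returns only the player's columns, so 'id' is unambiguous.
cur = conn.execute("SELECT pl.* " + join_sql)
pl_names = [d[0] for d in cur.description]
pl_dict = dict(zip(pl_names, cur.fetchone()))

print(names)          # 'id' appears twice
print(as_dict["id"])  # 97, the club's id
print(pl_dict["id"])  # 3456, the player's id
```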

Selecting ambiguous column from subquery with postgres join inside

I have the following query:
select x.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
Since id0 is in both sessions and clicked_products, I get the expected error:
column reference "id0" is ambiguous
However, to fix this problem in the past I simply needed to specify a table. In this situation, I tried:
select sessions.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
However, this results in the following error:
missing FROM-clause entry for table "sessions"
How do I return just the id0 column from the above query?
Note: I realize I can trivially solve the problem by getting rid of the subquery altogether:
select sessions.id0
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0;
However, I need to do further aggregations and so do need to keep the subquery syntax.
The only way you can do that is by using aliases for the columns returned from the subquery so that the names are no longer ambiguous.
Qualifying the column with the table name does not work, because sessions is not visible at that point (only x is).
True, this way you cannot use SELECT *, but you shouldn't do that anyway. For a reason why, your query is a wonderful example:
Imagine that you have a query like yours that works, and then somebody adds a new column with the same name as a column in the other table. Then your query suddenly and mysteriously breaks.
Avoid SELECT *. It is ok for ad-hoc queries, but not in code.
select x.id from
(select sessions.id0 as id, clicked_products.* from sessions
inner join
clicked_products on
sessions.id0 = clicked_products.session_id0 ) x;
However, you then have to list any other columns you need from the sessions table explicitly, since you can no longer use SELECT * for both tables.
I assume:
select x.id from (select sessions.id0 id
from sessions
inner join clicked_products
on sessions.id0 = clicked_products.session_id0 ) x;
should work.
Another option is to use a common table expression (CTE), which is more readable and easier to test.
But you still need aliases or unique column names.
In general, selecting everything with * is not a good idea: reading all columns is a waste of I/O.
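A sketch of the CTE variant with explicit aliases follows. It runs on sqlite3 rather than Postgres, so it shows only the aliasing technique, not the Postgres error message. The alias names session_id and clicked_product_id and the sample rows are invented, and the COUNT stands in for the "further aggregations" mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (id0 INTEGER PRIMARY KEY);
    CREATE TABLE clicked_products (id0 INTEGER PRIMARY KEY, session_id0 INTEGER);
    INSERT INTO sessions VALUES (1), (2);
    INSERT INTO clicked_products VALUES (10, 1), (11, 1), (12, 2);
""")

# Each id0 gets its own alias inside the CTE, so the outer query
# never sees two columns with the same name.
rows = conn.execute("""
    WITH x AS (
        SELECT s.id0  AS session_id,
               cp.id0 AS clicked_product_id
        FROM sessions AS s
        INNER JOIN clicked_products AS cp ON s.id0 = cp.session_id0
    )
    SELECT session_id, COUNT(*) AS clicks
    FROM x
    GROUP BY session_id
    ORDER BY session_id
""").fetchall()

print(rows)  # [(1, 2), (2, 1)]
```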

SQL query : How to change integer to boolean

I am using Firebird 2.5.8 and Delphi 10.2.3 and I want to fill a DBGrid with a query:
SELECT c.ID, l.ID,
(
SELECT COUNT(pl.ID)
FROM Tbl_ProtocolLicense AS pl
WHERE (pl.ReferenceId=l.ID)
) AS ReferenceCount
FROM Tbl_License AS l, tbl_client AS c
WHERE l.ClientId=c.Id;
How can I add a value (ReferenceCount > 0) as a boolean or (0/1) to that query?
Why even use a correlated subquery that would be recalculated again and again for every row?
Note: the first query below does not actually work; I was too hasty. See the update after it.
SELECT
c.ID,
l.ID,
IIF( r.CNT > 0, 1, 0 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
JOIN (
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
) as r ON r.ID = l.ID
Note: this assumes that Tbl_ProtocolLicense.ID column is never NULL.
UPD. I gave a bit of lecture about COUNT and other aggregates at http://stackoverflow.com/a/51159126/976391 - but here I missed it myself.
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
Do run the query and see the result. Notice anything fishy?
This query only returns ReferenceId values that actually occur in Tbl_ProtocolLicense, not those that are missing from it.
The intermediate grouping query will never contain a single row where count = 0!
And thus the whole inner-join-based query will not contain them either!
What we should do is use an outer join, which lets a row exist even when there is no matching row in the other table. Read: https://en.wikipedia.org/wiki/Join_(SQL)
SELECT
c.ID,
l.ID,
IIF( r.CNT is not NULL, 1, 0 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
) as r ON r.ID = l.ID
Compare the output with the first query and see the difference.
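The comparison can be sketched with sqlite3 standing in for Firebird. The sample rows are invented; CASE is used instead of IIF purely so the sketch also runs on older SQLite builds, and GROUP BY names the column rather than its position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_client (Id INTEGER PRIMARY KEY);
    CREATE TABLE Tbl_License (ID INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE Tbl_ProtocolLicense (ID INTEGER PRIMARY KEY, ReferenceId INTEGER);
    INSERT INTO tbl_client VALUES (1);
    INSERT INTO Tbl_License VALUES (10, 1), (11, 1);
    -- License 10 is referenced twice, license 11 never.
    INSERT INTO Tbl_ProtocolLicense VALUES (100, 10), (101, 10);
""")

def flags(join_kind, condition):
    """Run the flag query with the given join type and CASE condition."""
    sql = f"""
        SELECT c.Id, l.ID, CASE WHEN {condition} THEN 1 ELSE 0 END
        FROM Tbl_License AS l
        JOIN tbl_client AS c ON l.ClientId = c.Id
        {join_kind} JOIN (
            SELECT COUNT(*) AS CNT, ReferenceId AS ID
            FROM Tbl_ProtocolLicense
            GROUP BY ReferenceId
        ) AS r ON r.ID = l.ID
        ORDER BY l.ID
    """
    return conn.execute(sql).fetchall()

inner_rows = flags("INNER", "r.CNT > 0")
left_rows = flags("LEFT", "r.CNT IS NOT NULL")

print(inner_rows)  # [(1, 10, 1)]             license 11 has vanished
print(left_rows)   # [(1, 10, 1), (1, 11, 0)] license 11 is kept with flag 0
```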
UPD 2. However, even that was probably not good enough. The problem here is that "you say you want the things you do not actually want".
You demand that Firebird COUNT ALL the rows, when you really DO NOT care about the count. All you care about is whether there is at least one row or none at all. If there is one row, you do not care whether there are 10 or 100 or 1000 more. So actually counting objects when you do not want their count is extra work done for nothing.
That is especially wasteful in the Interbase/Firebird family, where counting over a table can trigger garbage collection and slow down the work. But it would be true even in pure Delphi: you do not want to loop through the WHOLE array if you would be satisfied with finding the first suitable element.
And then we can move back to the correlated sub-query.
SELECT
c.ID,
l.ID,
IIF( EXISTS (
SELECT * FROM Tbl_ProtocolLicense AS pl
WHERE pl.ReferenceId=l.ID
), 1, 0 )
FROM Tbl_License AS l, tbl_client AS c
WHERE l.ClientId=c.Id;
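A sketch of the same EXISTS-based flag on sqlite3, again as a stand-in for Firebird, with invented sample rows and CASE in place of IIF. The column alias HasReference is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_client (Id INTEGER PRIMARY KEY);
    CREATE TABLE Tbl_License (ID INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE Tbl_ProtocolLicense (ID INTEGER PRIMARY KEY, ReferenceId INTEGER);
    INSERT INTO tbl_client VALUES (1);
    INSERT INTO Tbl_License VALUES (10, 1), (11, 1);
    -- License 10 is referenced twice, license 11 never.
    INSERT INTO Tbl_ProtocolLicense VALUES (100, 10), (101, 10);
""")

# EXISTS only asks whether at least one matching row is present;
# it has no need to know how many there are.
rows = conn.execute("""
    SELECT c.Id, l.ID,
           CASE WHEN EXISTS (
               SELECT 1 FROM Tbl_ProtocolLicense AS pl
               WHERE pl.ReferenceId = l.ID
           ) THEN 1 ELSE 0 END AS HasReference
    FROM Tbl_License AS l
    JOIN tbl_client AS c ON l.ClientId = c.Id
    ORDER BY l.ID
""").fetchall()

print(rows)  # [(1, 10, 1), (1, 11, 0)]
```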
The bitter side of the correlated subquery is that it is run again and again for every result row.
The bitter side of calculating grouped total counts is that you do not actually need that data; you do not need the exact count.
Which is worse? Who knows. Depending on the real data and real tables/indexes, either approach can turn out to be faster. A human would not notice the difference on small data. The difference shows when "scaling up" to thousands and millions of real rows.
UPD 3. Can we have the best of both approaches? I hope we can. The trick is asking for exactly what we need and no more. Can we ask Firebird to list all the IDs we have in the table without actually counting them? Yes, we can.
SELECT DISTINCT ReferenceId FROM Tbl_ProtocolLicense
Run the query and see the result!
Notice, it still would NOT list the IDs that are not in the table. Obvious? Well, I missed it in my first approach, and then the two people who upvoted me missed it too. Stupid errors are the hardest to spot, as you cannot believe such stupidity.
So now we have to plug it in instead of the "counting" query of the 2nd attempt.
SELECT
c.ID,
l.ID,
IIF( r.ReferenceId is NULL, 0, 1 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT DISTINCT ReferenceId
FROM Tbl_ProtocolLicense
) as r ON r.ReferenceId = l.ID
UPD 4. One last trick. If I am correct, this query would have exactly the same result as above, without using IIF/CASE. Try it and compare. If the results are the same, then try to understand why and how it works and which extra assumptions about the data it requires.
SELECT
c.ID,
l.ID,
COUNT( r.ReferenceId )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT DISTINCT ReferenceId
FROM Tbl_ProtocolLicense
) as r ON r.ReferenceId = l.ID
GROUP BY c.ID, l.ID
This query is not better than UPD 3; it is just a puzzle to think about in order to understand SQL better.
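For anyone who wants to check the puzzle rather than take it on trust, here is a sketch on sqlite3 (standing in for Firebird, with invented sample rows). It runs the COUNT query with and without DISTINCT, which hints at one of the assumptions involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_client (Id INTEGER PRIMARY KEY);
    CREATE TABLE Tbl_License (ID INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE Tbl_ProtocolLicense (ID INTEGER PRIMARY KEY, ReferenceId INTEGER);
    INSERT INTO tbl_client VALUES (1);
    INSERT INTO Tbl_License VALUES (10, 1), (11, 1);
    -- License 10 is referenced twice, license 11 never.
    INSERT INTO Tbl_ProtocolLicense VALUES (100, 10), (101, 10);
""")

def count_refs(joined_source):
    """COUNT(r.ReferenceId) per license, left-joined to the given source."""
    sql = f"""
        SELECT c.Id, l.ID, COUNT(r.ReferenceId)
        FROM Tbl_License AS l
        JOIN tbl_client AS c ON l.ClientId = c.Id
        LEFT JOIN {joined_source} AS r ON r.ReferenceId = l.ID
        GROUP BY c.Id, l.ID
        ORDER BY l.ID
    """
    return conn.execute(sql).fetchall()

# With DISTINCT each license matches at most one row, and COUNT(column)
# skips the NULL produced by the outer join, so the count is 0 or 1.
with_distinct = count_refs("(SELECT DISTINCT ReferenceId FROM Tbl_ProtocolLicense)")

# Without DISTINCT the count is a real count again.
without_distinct = count_refs("Tbl_ProtocolLicense")

print(with_distinct)     # [(1, 10, 1), (1, 11, 0)]
print(without_distinct)  # [(1, 10, 2), (1, 11, 0)]
```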
Now do some work to actually check and compare the results, because blindly trusting an unknown person on the internet is not good. Even if that person is not malicious, they can make stupid mistakes too.
Whatever you pick up from Internet forums is only an "example" and an "idea demonstration", and it is always your responsibility to understand and check that example. And maybe to reject it.
To read and to understand:
Conditional functions: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-functions-scalarfuncs.html#fblangref25-functions-conditional
Grouping: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-select.html#fblangref25-dml-select-groupby
Joins: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-select.html#fblangref25-dml-select-joins
Additionally, it would really be useful for you to read a good book on general SQL, such as one of Martin Gruber's.

I would like a simple example of a sub-query using T-SQL 2008

Can anyone give me a good example of a subquery using TSQL 2008?
Maximilian Mayer believes that, due to referencing MS documentation, my assertion that there is a difference between a subquery and a subSelect is incorrect. Frankly, I'd consider MSDN's "Subquery Fundamentals" a better choice. Quote:
You are making distinctions between terms that actually mean the same.
O RLY?
A subQUERY...
IE:
WHERE id IN (SELECT n.id FROM TABLE n)
OR id = (SELECT MAX(m.id) FROM TABLE m)
OR EXISTS(SELECT 1/0 FROM TABLE) --won't return a math error for division by zero
...affects the WHERE or HAVING clauses -- the filtering of data -- for a SELECT, INSERT, UPDATE or DELETE statement. The value from a subquery is never directly visible in the SELECT clause.
A subSELECT...
IE:
SELECT t.column,
(SELECT x.col FROM TABLE x) AS col2
FROM TABLE t
...does not affect the filtering of data in the main query, and the value is exposed directly in the SELECT clause. But it's only one value - you can't return two or more columns into a single column in the outer query.
A subselect is a consistent means of performing a LEFT JOIN in ANSI-89 join syntax - if there is no supporting row, the column will be null. Additionally, a non-correlated subselect will return the same value for every row of the main query.
Correlation
If a subquery or subselect is correlated, that query conceptually runs once for every record of the main query returned -- which, unless the optimizer can rewrite it as a join, doesn't scale well as the number of rows in the result set increases.
Derived Table/Inline View
IE:
SELECT x.*,
y.max_date,
y.num
FROM TABLE x
JOIN (SELECT t.id,
t.num,
MAX(t.date) AS max_date
FROM TABLE t
GROUP BY t.id, t.num) y ON y.id = x.id
...is a JOIN to a derived table (AKA inline view).
"Inline view" is a better term, because that is all that happens when you reference a non-materialized view -- a view is just a prepared SQL statement. There's no performance or efficiency difference if you create a view with a query like the one in the example, and reference the view name in place of the SELECT statement within the brackets of the JOIN. The example has the same information as a correlated subquery, but the performance benefit of using a join and none of the subquery detriments. And you can return more than one column, because it is a view/derived table.
Conclusion
It should be obvious why I and others make distinctions. The concept of relying on the word "subquery" to categorize any SELECT statement that isn't the main clause is fatally flawed, because it's also a specific case under a categorization of the same word (IE: subquery-subselect, subquery-subquery, subquery-join...). Now think of helping someone who says "I've got a problem with a subquery..."
Maximilian Mayer's idea of "official" documentation was written by technical writers, who often have no experience in the subject and are only summarizing what they've been told to from knowledgeable people who have simplified things. Ultimately, it's just text on a page or screen -- like what you're reading now -- and the decision is up to you if the details I've laid out make sense to you.
For variety's sake, here's one in the where clause:
select
a.firstname,
a.lastname
from
employee a
where
a.companyid in (
select top 10
c.companyid
from
company c
where
c.num_employees > 1000
)
...returns all employees of ten companies that have over 1000 employees. Without an ORDER BY in the subquery, which ten companies TOP 10 picks is arbitrary.
SELECT
*,
(SELECT TOP 1 SomeColumn FROM dbo.SomeOtherTable)
FROM
dbo.MyTable
SELECT a.*, b.*
FROM TableA AS a
INNER JOIN
(
SELECT *
FROM TableB
) as b
ON a.id = b.id
That's a normal (non-correlated) subquery, running once for the whole result set.
On the other hand
SELECT a.*, (SELECT b.somecolumn FROM TableB AS b WHERE b.id = a.id)
FROM TableA AS a
is a correlated subquery, running once for every row in the result set.

Using subselect to accomplish LEFT JOIN

Is it possible to accomplish the equivalent of a LEFT JOIN with a subselect when multiple columns are required?
Here's what I mean.
SELECT m.*, (SELECT * FROM model WHERE id = m.id LIMIT 1) AS models FROM make m
As it stands now, doing this gives me an 'Operand should contain 1 column(s)' error.
Yes, I know this is possible with LEFT JOIN, but I was told it was possible with a subselect, so I'm curious as to how it's done.
There are many practical uses for what you suggest.
This hypothetical query would return the most recent release_date (contrived example) for any make with at least one release_date, and null for any make with no release_date:
SELECT m.make_name,
sub.max_release_date
FROM make m
LEFT JOIN
(SELECT id,
max(release_date) as max_release_date
FROM make
GROUP BY 1) sub
ON sub.id = m.id
A subselect can only have one column returned from it, so you would need one subselect for each column that you would want returned from the model table.
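As a sketch of the "one subselect per column" idea, the following uses sqlite3 with invented columns (make_name, model_name, release_year) and invented rows; only the table names make and model come from the question. It compares the result with the equivalent LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; model.id matches make.id as in the question.
    CREATE TABLE make  (id INTEGER PRIMARY KEY, make_name TEXT);
    CREATE TABLE model (id INTEGER, model_name TEXT, release_year INTEGER);
    INSERT INTO make  VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO model VALUES (1, 'A1', 2001);   -- Bolt has no model
""")

# One scalar subselect per wanted column; a make without a model gets NULLs.
subselect_rows = conn.execute("""
    SELECT m.make_name,
           (SELECT mo.model_name   FROM model mo WHERE mo.id = m.id LIMIT 1) AS model_name,
           (SELECT mo.release_year FROM model mo WHERE mo.id = m.id LIMIT 1) AS release_year
    FROM make m
    ORDER BY m.id
""").fetchall()

# The LEFT JOIN version, for comparison.
left_join_rows = conn.execute("""
    SELECT m.make_name, mo.model_name, mo.release_year
    FROM make m
    LEFT JOIN model mo ON mo.id = m.id
    ORDER BY m.id
""").fetchall()

print(subselect_rows)  # [('Acme', 'A1', 2001), ('Bolt', None, None)]
print(left_join_rows)  # same rows for this sample data
```

The two only agree here because each make has at most one model. With several matching rows, LIMIT 1 without an ORDER BY picks an arbitrary row, and nothing guarantees that the two subselects pick the same one.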