SQL nesting on high school test

We took this test at my school, and according to the test this is the right answer:
SELECT names
FROM COMPANY
WHERE NOT EXISTS (SELECT kgmilk
                  FROM COWS
                  WHERE kgmilk < 1000 AND COMPANY.nr = COWS.nr)
Now my question is: can you actually write COMPANY.nr = COWS.nr in the nested query, since that query only selects from one table?

You didn't specify which kind of SQL this is.
If it's MS SQL Server, COMPANY.nr = COWS.nr is possible.
See this example.

Yes, it's called a correlated subquery and it will in effect get evaluated for each row in the main table (ignoring optimization).

The nested query - commonly called a subquery - is a correlated subquery, which means that it is evaluated for each row in the outer query. The correlation occurs when, within the subquery, you reference a table in the outer query. Here's an example of a plain subquery:
select
NAME
from
COMPANY
where
NAME not in (
select
NAME
from
COMPANY_BLACKLIST
)
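
To see the correlated version from the question in action, here is a minimal sketch that runs it with Python's built-in sqlite3 module. The sample rows are invented for illustration, and SQLite stands in for whatever database the test assumed, so treat it as a demonstration of the semantics rather than of any particular product.

```python
import sqlite3

# In-memory database with invented sample data (assumption: nr links a cow to its company).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COMPANY (nr INTEGER PRIMARY KEY, names TEXT);
    CREATE TABLE COWS (nr INTEGER, kgmilk INTEGER);
    INSERT INTO COMPANY VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    -- Alpha: every cow gives at least 1000; Beta: one cow below 1000; Gamma: no cows
    INSERT INTO COWS VALUES (1, 1200), (1, 1500), (2, 900), (2, 1100);
""")

# The subquery references COMPANY.nr from the outer query, so it is
# conceptually re-evaluated for each COMPANY row.
rows = conn.execute("""
    SELECT names
    FROM COMPANY
    WHERE NOT EXISTS (SELECT kgmilk
                      FROM COWS
                      WHERE kgmilk < 1000 AND COMPANY.nr = COWS.nr)
    ORDER BY names
""").fetchall()

company_names = [r[0] for r in rows]
print(company_names)  # ['Alpha', 'Gamma']
```

Note that Gamma is returned even though it has no cows at all: NOT EXISTS is true whenever the subquery finds no row.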

Related

SQL: Is the newly selected temp table in FROM clause not passed to the sub-query in WHERE clause?

Here's a question I'm having trouble with. Basically, there are originally two tables: "a" and "b". I first "joined" them (without using a JOIN clause) with some conditions: a.id=b.id and b.class='xxx'. Then I named that temp table A, and I want to select the rows with the highest income among the people in A.
The error is "the relation A doesn't exist", and the error arrow points to the clause "select max(A.income) from A". Therefore, I suspect that the temp table A created in the FROM clause is not passed to the sub-query in the WHERE clause?
select * from
(select * from a,b where a.id=b.id and b.class='xxx') as A
where A.income = all
(select max(A.income) from A)
I've encountered this problem while using Postgres, but I think it may also happen in other systems like MySQL or MS SQL Server. Are there any possible solutions, preferably without using a WITH clause? Thanks. (The reason I say "sub-query" instead of "query" is that I've tried conditions like "where A.income>1000" and they all work.)
The problem is that your alias A hides the table with the same name (PostgreSQL folds unquoted identifiers to lower case, so A and a are the same name). Use a different alias name.
It is unclear whether you want to select from the original table a in the subquery or from the alias. If it is the former, then the above will solve your problem.
If you want to reference the alias in the subquery, you had better use a common table expression:
WITH alias_name AS (/* your FROM subquery */)
SELECT ... /* alias_name can be used in a subquery here */
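
As a rough illustration of the common table expression approach, the sketch below uses Python's sqlite3 module with invented rows. The CTE name joined and the sample data are assumptions, and the comparison uses a plain scalar = instead of = all because SQLite does not support ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, income INTEGER);
    CREATE TABLE b (id INTEGER, class TEXT);
    INSERT INTO a VALUES (1, 100), (2, 300), (3, 500);
    INSERT INTO b VALUES (1, 'xxx'), (2, 'xxx'), (3, 'yyy');
""")

# The CTE "joined" (hypothetical name) is visible both in the main FROM
# clause and inside the subquery in the WHERE clause.
top_rows = conn.execute("""
    WITH joined AS (
        SELECT a.id, a.income
        FROM a, b
        WHERE a.id = b.id AND b.class = 'xxx'
    )
    SELECT id, income
    FROM joined
    WHERE income = (SELECT MAX(income) FROM joined)
""").fetchall()

print(top_rows)  # [(2, 300)]
```

Row 3 has the highest income overall but is filtered out by the class condition, so the maximum is taken only over the joined rows.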
You can try the following:
select * from a join b on a.id=b.id where b.class='xxx'
and income = all (select max(income) from a join b on a.id=b.id where b.class='xxx')

SQL Correlated subquery

I am trying to execute this query but am getting ORA-00904:"QM"."MDL_MDL_ID":invalid identifier. What is more confusing to me is that the main query has two subqueries which differ only in the where clause. The first subquery runs fine, but I get the error for the second one. Below is the query.
select (
         select make_description
         from make_colours@dblink1
         where makc_id = (
           select makc_makc_id
           from model_colours@dblink1
           where to_char(mdc_id) = md.allocate_vehicle_colour_id
         )
       ) as colour,
       (
         select make_description
         from make_colours@dblink1
         where makc_id = (
           select makc_makc_id
           from model_colours@dblink1
           where mdl_mdl_id = qm.mdl_mdl_id
         )
       ) as vehicle_colour
from schema1.web_order wo,
     schema1.tot_order tot,
     suppliers@dblink1 sp,
     external_accounts@dblink1 ea,
     schema1.location_contact_detail lcd,
     quotation_models@dblink1 qm,
     schema1.manage_delivery md
where wo.reference_id = tot.reference_id
  and sp.ea_c_id = ea.c_id
  and sp.ea_account_type = ea.account_type
  and sp.ea_account_code = ea.account_code
  and lcd.delivery_det_id = tot.delivery_detail_id
  and sp.sup_id = tot.dealer_id
  and wo.qmd_id = qm.qmd_id
  and wo.reference_id = md.web_reference_id(+)
  and supplier_category = 'dealer'
  and wo.order_type = 'tot'
  and trunc(wo.confirmdeliverydate - 3) = trunc(sysdate)
Oracle usually doesn't recognise table aliases (or anything else) more than one level down in a nested subquery; from the documentation:
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement one level above the subquery. [...] A correlated subquery conceptually is evaluated once for each row processed by the parent statement.
Note the 'one level' part. So your qm alias isn't being recognised where it is, in the nested subquery, as it is two levels away from the definition of the qm alias. (The same thing would happen with the original table name if you hadn't aliased it - it isn't specifically to do with aliases).
When you modified your query to just have select qm.mdl_mdl_id as Vehicle_colour - or a valid version of that, maybe (select qm.mdl_mdl_id from dual) as Vehicle_colour - you removed the nesting, and the qm was now only one level down from its definition in the main body of the query, so it was recognised.
Your reference to md in the first nested subquery probably won't be recognised either, but the parser tends to sort of work backwards, so it's seeing the qm problem first; although it's possible a query rewrite would make it valid:
However, the optimizer may choose to rewrite the query as a join or use some other technique to formulate a query that is semantically equivalent.
You could also add hints to encourage that but it's better not to rely on that.
But you don't need nested subqueries, you can join inside each top level subquery:
select (
         select mc2.make_description
         from model_colours@dblink1 mc1,
              make_colours@dblink1 mc2
         where mc2.makc_id = mc1.makc_makc_id
           and to_char(mc1.mdc_id) = md.allocate_vehicle_colour_id
       ) as colour,
       (
         select mc2.make_description
         from model_colours@dblink1 mc1,
              make_colours@dblink1 mc2
         where mc2.makc_id = mc1.makc_makc_id
           and mc1.mdl_mdl_id = qm.mdl_mdl_id
       ) as vehicle_colour
from schema1.web_order wo,
...
I've stuck with old-style join syntax to match the main query, but you should really consider rewriting the whole thing with modern ANSI join syntax. (I've also removed the rogue comma @Serg mentioned, but you may just have left out other columns in your real select list when posting the question.)
You could probably avoid subqueries altogether by joining to the make and model colour tables in the main query, either twice to handle the separate filter conditions, or once with a bit of logic in the column expressions. One step at a time though...
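
The shape of that rewrite can be tried out on a cut-down schema. The sketch below uses Python's sqlite3 module with three simplified local tables and invented rows (no database links, and only the columns needed here). SQLite does not reproduce the ORA-00904 behaviour, so this only shows that joining inside a single-level subquery keeps the qm reference one level below its definition and returns the expected colour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the remote tables (assumed columns only).
    CREATE TABLE make_colours (makc_id INTEGER, make_description TEXT);
    CREATE TABLE model_colours (mdl_mdl_id INTEGER, makc_makc_id INTEGER);
    CREATE TABLE quotation_models (qmd_id INTEGER, mdl_mdl_id INTEGER);
    INSERT INTO make_colours VALUES (10, 'Red'), (20, 'Blue');
    INSERT INTO model_colours VALUES (100, 10), (200, 20);
    INSERT INTO quotation_models VALUES (1, 100), (2, 200);
""")

# One subquery level only: the join between mc1 and mc2 replaces the
# nested "where makc_id = (select ...)" construct.
vehicle_colours = conn.execute("""
    SELECT qm.qmd_id,
           (SELECT mc2.make_description
            FROM model_colours mc1,
                 make_colours mc2
            WHERE mc2.makc_id = mc1.makc_makc_id
              AND mc1.mdl_mdl_id = qm.mdl_mdl_id) AS vehicle_colour
    FROM quotation_models qm
    ORDER BY qm.qmd_id
""").fetchall()

print(vehicle_colours)  # [(1, 'Red'), (2, 'Blue')]
```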

SQL 'group by' equivalent query

I've seen the following example:
Let T be a table with 2 columns - [id,value] (both int)
Then:
SELECT * FROM T
WHERE id=(SELECT MAX(id) FROM T t2 where T.value=t2.value);
is equivalent to:
SELECT MAX(id) FROM T GROUP BY value
What is going on behind the scenes? How can we refer to T.value?
What is the meaning of T.value=t2.value?
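@JuanCarlosOropeza is correct, your premise is false. Those are not equivalent queries: the first returns whole rows (id and value), while the second returns only the maximum id for each value. But more to the point, the purpose of the WHERE clause in the subquery is to restrict the rows in the subquery to those with the same value as the current row of the outer query, so an outer row is kept only if its id is the largest one for its value.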
For what's going on behind the scenes, use the explain plan, which provides information about how the optimizer decides to get the data your query asks for.
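
To make the difference concrete, here is a small sketch using Python's sqlite3 module with invented rows. It runs both queries from the question and then asks SQLite for its plan. EXPLAIN QUERY PLAN is SQLite's own syntax and its output varies by version, so the plan is printed without an expected result; other systems have their own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (id INTEGER, value INTEGER);
    INSERT INTO T VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 30);
""")

# Correlated form: T.value refers to the current row of the outer query.
correlated = conn.execute("""
    SELECT * FROM T
    WHERE id = (SELECT MAX(id) FROM T t2 WHERE T.value = t2.value)
    ORDER BY id
""").fetchall()

# GROUP BY form: one row per value, but only the id column.
grouped = conn.execute("""
    SELECT MAX(id) FROM T GROUP BY value ORDER BY 1
""").fetchall()

print(correlated)  # [(2, 10), (4, 20), (5, 30)]
print(grouped)     # [(2,), (4,), (5,)]

# Ask the optimizer how it intends to run the correlated query.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM T
    WHERE id = (SELECT MAX(id) FROM T t2 WHERE T.value = t2.value)
""").fetchall()
for row in plan:
    print(row)
```

The two queries select the same ids on this data, but the result sets have different columns, which is why they are not interchangeable as written.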

MS Access subquery performance

I use MS Access a lot as an ad-hoc data processing tool. One thing I've noticed is that using sub-queries in certain ways tends to have very bad performance with large tables.
For example, this query performs poorly:
SELECT TOP 500 EmployeeID FROM Employee
WHERE EmployeeID NOT IN
(SELECT EmployeeID FROM LoadHistory
WHERE YearWeek = '2015-26');
This version of the same query performs well:
SELECT TOP 500 EmployeeID FROM Employee
WHERE NOT EXISTS
(SELECT 1 FROM LoadHistory
WHERE YearWeek = '2015-26' AND
EmployeeID = Employee.EmployeeID);
And this other form of the same query also performs well:
SELECT TOP 500 Employee.EmployeeID
FROM Employee
LEFT JOIN
(SELECT EmployeeID FROM LoadHistory
WHERE YearWeek = '2015-26') q
ON Employee.EmployeeID = q.EmployeeID
WHERE q.EmployeeID IS NULL;
For style reasons, I prefer the first form. I can't really understand why the optimizer doesn't generate the same plan for the first and second queries. Is there any logic to how the ACE optimizer is behaving here? Are there any other ways to slightly rewrite the first query so the optimizer can do a better job?
NOT IN and NOT EXISTS are very similar . . . but not quite the same.
The semantics of NOT IN specify that it never returns true if any of the values returned by the subquery is NULL. That means the engine has to check the subquery's results for NULL values.
My guess is that this accounts for the different optimization schemes. This is also why I prefer NOT EXISTS to NOT IN. NOT EXISTS is more intuitive in the treatment of NULL values in the subquery.
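
The NULL behaviour is easy to reproduce. The sketch below uses Python's sqlite3 module rather than Access, with invented rows, so it illustrates only the standard semantics and says nothing about how the ACE optimizer plans either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeID INTEGER);
    CREATE TABLE LoadHistory (EmployeeID INTEGER, YearWeek TEXT);
    INSERT INTO Employee VALUES (1), (2), (3);
    -- One matching row plus one row whose EmployeeID is NULL.
    INSERT INTO LoadHistory VALUES (1, '2015-26'), (NULL, '2015-26');
""")

# NOT IN: the NULL in the subquery result makes the test unknown for
# every non-matching employee, so no row passes the WHERE clause.
not_in_rows = conn.execute("""
    SELECT EmployeeID FROM Employee
    WHERE EmployeeID NOT IN
        (SELECT EmployeeID FROM LoadHistory WHERE YearWeek = '2015-26')
    ORDER BY EmployeeID
""").fetchall()

# NOT EXISTS: the NULL row simply never matches the correlation condition.
not_exists_rows = conn.execute("""
    SELECT EmployeeID FROM Employee
    WHERE NOT EXISTS
        (SELECT 1 FROM LoadHistory
         WHERE LoadHistory.YearWeek = '2015-26'
           AND LoadHistory.EmployeeID = Employee.EmployeeID)
    ORDER BY EmployeeID
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```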
Note: You should always qualify column names when you use correlated subqueries:
SELECT TOP 500 EmployeeID
FROM Employee
WHERE NOT EXISTS (SELECT 1
FROM LoadHistory
WHERE LoadHistory.YearWeek = '2015-26' AND
LoadHistory.EmployeeID = Employee.EmployeeID
);
The compiler might be smart enough to avoid this NULL check if you declare LoadHistory.EmployeeId as NOT NULL.
I should also mention that NOT EXISTS can take advantage of an index on LoadHistory(EmployeeId) or LoadHistory(EmployeeId, YearWeek). The NOT IN version can use LoadHistory(YearWeek) or LoadHistory(YearWeek, EmployeeId). Perhaps your indexes explain the difference in performance.
It is the difference between IN and EXISTS. EXISTS evaluates to true as soon as the sub-query finds a first matching row. IN, on the other hand, may have to scan the entire table.

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complain: "only the first 1000 rows might be retrieved". Would this second approach solve that problem?
3) What is the scope of the alias in the second subquery? Does the alias only live inside the parentheses?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) in the second "exists", will it cause any problem with the first "exists"? Or are they totally independent?
Is this Oracle-specific, or is it valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, the alias is only valid inside the parentheses.
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
1) Yes
2) Yes
3) Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
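
A quick way to check the scoping rules described above is the following sketch, using Python's sqlite3 module. The table is called maintable here because table is a reserved word, and the rows are invented. It runs two EXISTS sub-queries that both use the alias o, then tries to use o outside its parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maintable (nameid INTEGER);
    CREATE TABLE othertable (nameid INTEGER, otherdesc TEXT);
    INSERT INTO maintable VALUES (1), (2), (3);
    INSERT INTO othertable VALUES
        (1, 'SomeDesc'), (1, 'SomeOtherDesc'),
        (2, 'SomeDesc'),
        (3, 'SomeOtherDesc');
""")

# Each sub-query declares its own alias o; the two do not interfere.
both = conn.execute("""
    SELECT t.nameid FROM maintable t
    WHERE EXISTS (SELECT o.nameid FROM othertable o
                  WHERE t.nameid = o.nameid AND o.otherdesc = 'SomeDesc')
      AND EXISTS (SELECT o.nameid FROM othertable o
                  WHERE t.nameid = o.nameid AND o.otherdesc = 'SomeOtherDesc')
""").fetchall()
print(both)  # [(1,)]

# Using o outside the parentheses where it was declared fails.
alias_error = None
try:
    conn.execute("""
        SELECT o.nameid FROM maintable t
        WHERE EXISTS (SELECT 1 FROM othertable o WHERE o.nameid = t.nameid)
    """)
except sqlite3.OperationalError as exc:
    alias_error = str(exc)
print(alias_error)
```

Only nameid 1 has rows for both descriptions, which matches the remark that the combined query needs two rows in the other table for each returned row.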
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
Alias declared inside subquery lives inside subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
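
One caveat with the join form is worth checking before switching: if the other table can hold several matching rows for one nameid, the join repeats the outer row, while EXISTS returns it once. A minimal sketch with Python's sqlite3 module and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (nameid INTEGER);
    CREATE TABLE otherTable (nameid INTEGER, otherdesc TEXT);
    INSERT INTO yourTable VALUES (1), (2);
    -- nameid 1 matches twice, nameid 2 never matches
    INSERT INTO otherTable VALUES (1, 'SomeDesc'), (1, 'SomeDesc'), (2, 'Else');
""")

join_rows = conn.execute("""
    SELECT t.*
    FROM yourTable t
    INNER JOIN otherTable ot
        ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
""").fetchall()

exists_rows = conn.execute("""
    SELECT t.*
    FROM yourTable t
    WHERE EXISTS (SELECT 1 FROM otherTable ot
                  WHERE t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
""").fetchall()

print(join_rows)    # [(1,), (1,)]
print(exists_rows)  # [(1,)]
```

If the join column is unique in the other table, the two forms return the same rows.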
It is difficult to generalize that EXISTS is always better than IN. Logically, if that were the case, the SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not the same; the results may be different when you use the two...
With IN, it's usually a full table scan of the inner table, done once, without removing NULLs (so if you have NULLs in your inner table, IN will not remove them by default)... EXISTS, on the other hand, removes NULLs and, in the case of a correlated subquery, runs the inner query for every row from the outer query.
Assuming there are no NULLs and it's a simple query (with no correlation), EXISTS might perform better if the row you are looking for is not the last row. If it happens to be the last row, EXISTS may need to scan to the end like IN, so performance would be similar...
But IN and EXISTS are not interchangeable...