SQL Correlated subquery - sql

I am trying to execute this query but am getting ORA-00904:"QM"."MDL_MDL_ID":invalid identifier. What is more confusing to me is the main query has two sub queries which only differ in the where clause. However, the first query is running fine but getting error for the second one. Below is the query.
select (
select make_description
from make_colours#dblink1
where makc_id = (
select makc_makc_id
from model_colours#dblink1
where to_char(mdc_id) = md.allocate_vehicle_colour_id
)
) as colour,
(
select make_description
from make_colours#dblink1
where makc_id = (
select makc_makc_id
from model_colours#dblink1
where mdl_mdl_id = qm.mdl_mdl_id
)
) as vehicle_colour
from schema1.web_order wo,
schema1.tot_order tot,
suppliers#dblink1 sp,
external_accounts#dblink1 ea,
schema1.location_contact_detail lcd,
quotation_models#dblink1 qm,
schema1.manage_delivery md
where wo.reference_id = tot.reference_id
and sp.ea_c_id = ea.c_id
and sp.ea_account_type = ea.account_type
and sp.ea_account_code = ea.account_code
and lcd.delivery_det_id = tot.delivery_detail_id
and sp.sup_id = tot.dealer_id
and wo.qmd_id = qm.qmd_id
and wo.reference_id = md.web_reference_id(+)
and supplier_category = 'dealer'
and wo.order_type = 'tot'
and trunc(wo.confirmdeliverydate - 3) = trunc(sysdate)

Oracle usually doesn't recognise table aliases (or anything else) more than one level down in a nested subquery; from the documentation:
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement one level above the subquery. [...] A correlated subquery conceptually is evaluated once for each row processed by the parent statement.
Note the 'one level' part. So your qm alias isn't being recognised where it is, in the nested subquery, as it is two levels away from the definition of the qm alias. (The same thing would happen with the original table name if you hadn't aliased it - it isn't specifically to do with aliases).
When you modified your query to just have select qm.mdl_mdl_id as Vehicle_colour - or a valid version of that, maybe (select qm.mdl_mdl_id from dual) as Vehicle_colour - you removed the nesting, and the qm was now only one level down from it's definition in the main body of the query, so it was recognised.
Your reference to md in the first nested subquery probably won't be recognised either, but the parser tends to sort of work backwards, so it's seeing the qm problem first; although it's possible a query rewrite would make it valid:
However, the optimizer may choose to rewrite the query as a join or use some other technique to formulate a query that is semantically equivalent.
You could also add hints to encourage that but it's better not to rely on that.
But you don't need nested subqueries, you can join inside each top level subquery:
select (
select mc2.make_description
from model_colours#dblink1 mc1,
make_colours#dblink1 mc2
where mc2.makc_id = mc1.makc_makc_id
and to_char(mc1.mdc_id) = md.allocate_vehicle_colour_id
) as colour,
(
select mc2.make_description
from model_colours#dblink1 mc1,
make_colours#dblink1 mc2
where mc2.makc_id = mc1.makc_makc_id
and mc1.mdl_mdl_id = qm.mdl_mdl_id
) as vehicle_colour
from schema1.web_order wo,
...
I've stuck with old-style join syntax to match the main query, but you should really consider rewriting the whole thing with modern ANSI join syntax. (I've also removed the rogue comma #Serg mentioned, but you may just have left out other columns in your real select list when posting the question.)
You could probably avoid subqueries altogether by joining to the make and model colour tables in the main query, either twice to handle the separate filter conditions, or once with a bit of logic in the column expressions. Once step at a time though...

Related

jOOQ - group by uses alias instead of columns

I'm running jOOQ 3.13.6, on Oracle 11g springboot environment.
To use listagg function, I'm trying the solution provided here: https://stackoverflow.com/a/69482329/17505774
Example code:
#Autowired
private DSLContext dsl;
// setting tables
Table table1 = DSL.table("table_1").as("t1");
Table table2 = DSL.table("table_2").as("t2");
// creating fields
final List<Field<?>> fields = new ArrayList<Field<?>>();
fields.add(DSL.field(DSL.name("t1", "t1_id")).as("id"));
Field cf = DSL.field(DSL.name("t2", "t2_code")).as("code");
fields.add(listAgg(cf, ";", null)); // no order
// running query
dsl.settings().withRenderQuotedNames(RenderQuotedNames.EXPLICIT_DEFAULT_UNQUOTED);
dsl.select(fields)
.from(table1)
.join(table2).on("t1.t1_id = t2.t1_id")
.where("t1.t1_id = ?", id)
.groupBy(fields)
However, it executes the following query:
select t1.t1_id id,
listagg(t2.t2_code) within group (null) as code
from table_1 t1
join table_2 t2 on ( t1.t1_id = t2.t2_id )
where t1.t1_id = 1
group by id
The group by should use the original column names (t1.t1_id), not the alias.
Note, for this answer, and for brevity reasons, I'm assuming you had been using the code generator. The answer is the same without code generation usage.
Why the current behaviour?
Note, this behaviour is also documented here in the manual.
In jOOQ, an aliased column expression T1.T1_ID.as("id") can only generate 2 different versions of itself:
T1.T1_ID as ID, i.e. the alias declaration (when inside of SELECT, at the top level)
ID, i.e. the alias reference (when inside of any other clause / expression than the SELECT clause)
There isn't a third type of generated SQL that depends on the location of where you embed the alias expression, e.g. the unaliased column expression T1.T1_ID when you put the expression in WHERE or GROUP BY, etc. The rationale is simple. What would a user expect when they write:
groupBy(T1.T1_ID.as("id"))
Why would they expect the as() call to be a no-op? That would be more surprising than the status quo.
Consistency with other rendering modes
There are other types of QueryPart in jOOQ, which have similar aliasing capabilities:
Field
Table
WindowSpecification
Parameter
CTE
Let's look at the CTE example:
Table<?> cte = name("cte").as(select(...))
That cte reference has 2 modes of rendering itself to SQL:
The CTE declaration (if placed in the WITH clause)
The CTE reference (if placed in FROM, etc.)
I don't think you'd expect that cte reference to ever ignore the aliasing, and just render the SELECT itself?
Likewise with table aliasing:
T1 x = T1.as("x");
This can render itself as:
The alias declaration (if placed in the FROM clause)
The alias reference
Because the FROM clause is logically before any other clauses, you'd never expect your x reference to render only T1, instead of x or T1 as x, right?
So, for consistency reasons across the jOOQ API, the Field aliases must also behave like all the others.
What to do instead?
Don't re-use the alias expression outside of your SELECT clause. Write the jOOQ SQL exactly as you'd write the actual SQL:
ctx.select(T1.T1_ID.as("id"), ...)
.from(T1)
.groupBy(T1.T1_ID)
...

Difference visibility in subquery join and where

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I had was ORA-00904: "WO"."ID_WORKER": not valid identifier.
Then I decided to move the union of tables from join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfect.
Why is not possible to make it in the join? The table should be visible like it is in the where clause. Am I missing something?
In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. Names from the outer query are not visible to it.
The WHERE clause (and its cousin HAVING is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.

In an EXISTS can my JOIN ON use a value from the original select

I have an order system. Users with can be attached to different orders as a type of different user. They can download documents associated with an order. Documents are only given to certain types of users on the order. I'm having trouble writing the query to check a user's permission to view a document and select the info about the document.
I have the following tables and (applicable) fields:
Docs: DocNo, FileNo
DocAccess: DocNo, UserTypeWithAccess
FileUsers: FileNo, UserType, UserNo
I have the following query:
SELECT Docs.*
FROM Docs
WHERE DocNo = 1000
AND EXISTS (
SELECT * FROM DocAccess
LEFT JOIN FileUsers
ON FileUsers.UserType = DocAccess.UserTypeWithAccess
AND FileUsers.FileNo = Docs.FileNo /* Errors here */
WHERE DocAccess.UserNo = 2000 )
The trouble is that in the Exists Select, it does not recognize Docs (at Docs.FileNo) as a valid table. If I move the second on argument to the where clause it works, but I would rather limit the initial join rather than filter them out after the fact.
I can get around this a couple ways, but this seems like it would be best. Anything I'm missing here? Or is it simply not allowed?
I think this is a limitation of your database engine. In most databases, docs would be in scope for the entire subquery -- including both the where and in clauses.
However, you do not need to worry about where you put the particular clause. SQL is a descriptive language, not a procedural language. The purpose of SQL is to describe the output. The SQL engine, parser, and compiler should be choosing the most optimal execution path. Not always true. But, move the condition to the where clause and don't worry about it.
I am not clear why do you need to join with FileUsers at all in your subquery?
What is the purpose and idea of the query (in plain English)?
In any case, if you do need to join with FileUsers then I suggest to use the inner join and move second filter to the WHERE condition. I don't think you can use it in JOIN condition in subquery - at least I've never seen it used this way before. I believe you can only correlate through WHERE clause.
You have to use aliases to get this working:
SELECT
doc.*
FROM
Docs doc
WHERE
doc.DocNo = 1000
AND EXISTS (
SELECT
*
FROM
DocAccess acc
LEFT OUTER JOIN
FileUsers usr
ON
usr.UserType = acc.UserTypeWithAccess
AND usr.FileNo = doc.FileNo
WHERE
acc.UserNo = 2000
)
This also makes it more clear which table each field belongs to (think about using the same table twice or more in the same query with different aliases).
If you would only like to limit the output to one row you can use TOP 1:
SELECT TOP 1
doc.*
FROM
Docs doc
INNER JOIN
FileUsers usr
ON
usr.FileNo = doc.FileNo
INNER JOIN
DocAccess acc
ON
acc.UserTypeWithAccess = usr.UserType
WHERE
doc.DocNo = 1000
AND acc.UserNo = 2000
Of course the second query works a bit different than the first one (both JOINS are INNER). Depeding on your data model you might even leave the TOP 1 out of that query.

SQL nesting on highschool test

We made this test on my school, and according to the test this is the right answer:
SELECT
names
FROM
COMPANY
WHERE
NOT EXISTS (SELECT
kgmilk
FROM
COWS
WHERE kgmilk < 1000 AND COMPANY.nr = COWS.nr)
Now my question is, can you actually do COMPANY.nr = COWS.nr in the nested query, since you only select one database in that query.
You didn't specify which kind of SQL this is.
If it's MS SQL Server, COMPANY.nr = COWS.nr is possible.
See this example.
Yes, it's called a correlated subquery and it will in effect get evaluated for each row in the main table (ignoring optimization).
The nested query - commonly called a subquery - is a correlated subquery, which means that it is evaluated for each row in the outer query. The correlation occurs when, within the subquery, you reference a table in the outer query. Here's an example of a plain subquery:
select
NAME
from
COMPANY
where
NAME not in (
select
NAME
from
COMPANY_BLACKLIST
)

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complainin: "only the first 1000 rows might be retrieved", this second approach would solve that problem?
3) What is the scope of the alias in the second subquery?... does the alias only lives in the parenthesis?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
Yes
Yes
Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
Alias declared inside subquery lives inside subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. Logically if that is the case, then SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not same, the results may be different when you use the two...
With IN, usually its a Full Table Scan of the inner table once without removing NULLs (so if you have NULLs in your inner table, IN will not remove NULLS by default)... While EXISTS removes NULL and in case of correlated subquery, it runs inner query for every row from outer query.
Assuming there are no NULLS and its a simple query (with no correlation), EXIST might perform better if the row you are finding is not the last row. If it happens to be the last row, EXISTS may need to scan till the end like IN.. so similar performance...
But IN and EXISTS are not interchangeable...