I am analyzing one SQL statement and there is something with regard to aliases what isn't really clear to me, so I would like to ask if someone can try to explain it. So, this is how the statement looks like
SELECT
a.RecordID
, a.Account
, b.RecordID
, c.SomeField as AlternativeFieldName
FROM TableA a
LEFT JOIN TableB b
ON a.RecordID=b.RecordID
LEFT JOIN TableC c
ON b.RecordID=c.RecordID
WHERE a.DayFrom >= YYYYMMDD and a.DayFrom < YYYYMMDD
AND b.Field1 is null
AND Field2 = 'SOME_VALUE'
as you can see aliases are provided for all three tables in the statement and used always in data selection as well as joins, however in the where clause there is one field from one of the tables above for which an alias is not provided. I wonder it this is correct and if it is what does SQL take as a source table or does it throw an error if it is not?
On this page, I've tried something similar and it actually worked, although I've expected some error. I thought that SQL forces you to used aliases if joining multiple tables. Here is the statement
SELECT *
FROM Customers c
JOIN Orders o
ON c.CustomerID = o.CustomerID
WHERE OrderDate = '1996-07-04'
Thx in advance, cheers!
If SQL Server (or any database) finds an unqualified column name, then it looks to see which tables/subqueries in the FROM clause or outer queries might be providing it.
If the column is in exactly one table/subquery in the FROM clause, then the column is assumed to come from the table/subquery.
If the column is in multiple tables/subqueries in the FROM clause, then the query returns an error.
If the column does not exist in the FROM clause nor in any outer queries, the query returns an error.
If the column does not exist in the FROM clause, but does in an outer query, then that reference is used.
These rules go by the name "scoping". That is a common term in computer languages for figuring out the value of a variable.
No error will be raised for unqualified column names as long as the column name is not ambiguous. In the case of ambiguous names, the column name must be qualified with the table name or alias to avoid an error.
Note there is also the notion of scope that can become an issue when column names are not qualified. Consider this construct:
WHERE b.Field3 IN(
SELECT Field4
FROM TableD
)
If Field4 exists in TableD, the desired results will be returned. But if Field4 does not exist in TableD but exists in one of the outer tables, the predicate will be true for all outer rows when the Field4 value is not NULL and at least one row exists in TableD.
In short, the best practice is to qualify column names in multi-table queries.
Any RDBMS executing the sql queries with joins will try to find column names from all referenced tables. If it finds only one column with that name simply executes the query otherwise throughs ambiguity error. Alias names will be used to avoid ambiguity.
Related
here's a question I'm in trouble with. Basically, there are originally two tables: "a" and "b". I firstly "joined" (without using JOIN clause) them together with some conditions: "a.id=b.id", "b.class="xxx"". Then I name that temp table as A, and want to select the data with the highest income within the people in A.
The error returns "the relation A doesn't exist." And the error arrow turns to the clause "select max(A.income) from A". Therefore, I suspect that the temp table A created in FROM clause will not be passed to the sub-query in WHERE clause?
select * from
(select * from a,b where a.id=b.id and b.class='xxx') as A
where A.income = all
(select max(A.income) from A)
I've encountered this problem while using Postgres, but I think it may also happen in other languages like MYSQL or MSSQL. Are there any possible solutions to solve that? Without using WITH clause? Thanks. (The reason why I say "sub-query" instead of "query" is because I've tried terms like "where A.income>1000" and they all work)
The problem is that your alias a hides the table with the same name. Use a different alias name.
It is unclear whether you want to select from the original table a in the subquery or from the alias. If it is the former, then the above will solve your problem.
If you want to reference the alias in the subquery, you had better use a common table expression:
WITH alias_name AS (/* your FROM subquery */)
SELECT ... /* alias_name can be used in a subquery here */
You can try the below -
select * from a join b on a.id=b.id where b.class='xxx'
and income = all (select max(income) from a join b on a.id=b.id where b.class='xxx')
Can SQLite distinguish between a column from some aliased table, e.g. table1.column and a column that is aliased with the same name, i.e. column, in the SELECT statement?
This is relevant because I need to refer to the column that I construct in the SELECT statement later on in a HAVING clause, but must not confuse it with the column in aliased table. To my knowledge, I cannot alias the table to be constructed in my SELECT statement (without reverting to some nasty work-around like SELECT * FROM (SELECT ...) AS alias) to ensure both are distinguishable.
Here's a stripped down version of the code I am concerned with:
SELECT
a.entity,
b.DATE,
TOTAL(a.dollar_amount*b.ret_usd)/TOTAL(a.dollar_amount) AS ret_usd
FROM holdings a
LEFT JOIN returns b
ON a.stock = b.stock AND
a.DATE = b.DATE
GROUP BY
a.entity,
b.DATE
HAVING
ret_usd NOT NULL
Essentially, I want to get rid of groups for which I cannot find any returns and thus would show up with NULL values. I am not using an INNER JOIN because in my production code I merge multiple types of returns - for some of which I may have no data. I only want to drop those groups for which I have no returns for any of the return types.
To my understanding, the SQLite documentation does not address this issue.
LEFT JOIN all the return tables, then add a WHERE something like
COALESCE(b.ret_used, c.ret_used, d.ret_used....) is not NULL
You might need a similar strategy to determine which ret_used in the TOTAL. FYI, TOTAL never returns NULL.
I have two tables. Both have lot of columns. Now I have a common column called ID on which I would join.
Now since this variable ID is present in both the tables if I do simply this
select a.*,b.*
from table_a as a
left join table_b as b on a.id=b.id
This will give an error as id is duplicate (present in both the tables and getting included for both).
I don't want to write down separately each column of b in the select statement. I have lots of columns and that is a pain. Can I rename the ID column of b in the join statement itself similar to SAS data merge statements?
I am using Postgres.
Postgres would not give you an error for duplicate output column names, but some clients do. (Duplicate names are also not very useful.)
Either way, use the USING clause as join condition to fold the two join columns into one:
SELECT *
FROM tbl_a a
LEFT JOIN tbl_b b USING (id);
While you join the same table (self-join) there will be more duplicate column names. The query would make hardly any sense to begin with. This starts to make sense for different tables. Like you stated in your question to begin with: I have two tables ...
To avoid all duplicate column names, you have to list them in the SELECT clause explicitly - possibly dealing out column aliases to get both instances with different names.
Or you can use a NATURAL join - if that fits your unexplained use case:
SELECT *
FROM tbl_a a
NATURAL LEFT JOIN tbl_b b;
This joins on all columns that share the same name and folds those automatically - exactly the same as listing all common column names in a USING clause. You need to be aware of rules for possible NULL values ...
Details in the manual.
I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complainin: "only the first 1000 rows might be retrieved", this second approach would solve that problem?
3) What is the scope of the alias in the second subquery?... does the alias only lives in the parenthesis?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
Yes
Yes
Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
Alias declared inside subquery lives inside subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. Logically if that is the case, then SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not same, the results may be different when you use the two...
With IN, usually its a Full Table Scan of the inner table once without removing NULLs (so if you have NULLs in your inner table, IN will not remove NULLS by default)... While EXISTS removes NULL and in case of correlated subquery, it runs inner query for every row from outer query.
Assuming there are no NULLS and its a simple query (with no correlation), EXIST might perform better if the row you are finding is not the last row. If it happens to be the last row, EXISTS may need to scan till the end like IN.. so similar performance...
But IN and EXISTS are not interchangeable...
My boss found a bug in a query I created, and I don't understand the reasoning behind the bug, although the query results prove he's correct. Here's the query (simplified version) before the fix:
select PTNO,PTNM,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
and here it is after the fix:
select PTNO,PTNM,PARTS.CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
The bug was, that null values were being shown for column CATCD, i.e. the query results included results from table CATEGORIES instead of PARTS.
Here's what I don't understand: if there was ambiguity in the original query, why didn't Oracle throw an error? As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity.
Am I wrong, or just not thinking about this problem correctly?
Update:
Here's a revised example, where the ambiguity error is not thrown:
CREATE TABLE PARTS (PTNO NUMBER, CATCD NUMBER, SECCD NUMBER);
CREATE TABLE CATEGORIES(CATCD NUMBER);
CREATE TABLE SECTIONS(SECCD NUMBER, CATCD NUMBER);
select PTNO,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD)
left join SECTIONS on (SECTIONS.SECCD=PARTS.SECCD) ;
Anybody have a clue?
Here's the query (simplified version)
I think by simplifying the query you removed the real cause of the bug :-)
What oracle version are you using? Oracle 10g ( 10.2.0.1.0 ) gives:
create table parts (ptno number , ptnm number , catcd number);
create table CATEGORIES (catcd number);
select PTNO,PTNM,CATCD from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
I get ORA-00918: column ambiguously defined
Interesting in SQL server that throws an error (as it should)
select id
from sysobjects s
left join syscolumns c on s.id = c.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
select id
from sysobjects
left join syscolumns on sysobjects.id = syscolumns.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
From my experience if you create a query like this the data result will pull CATCD from the right side of the join not the left when there is a field overlap like this.
So since this join will have all records from PARTS with only some pull through from CATEGORIES you will have NULL in the CATCD field any time there is no data on the right side.
By explicitly defining the column as from PARTS (ie left side) you will get a non null value assuming that the field has data in PARTS.
Remember that with LEFT JOIN you are only guarantied data in fields from the left table, there may well be empty columns to the right.
This may be a bug in the Oracle optimizer. I can reproduce the same behavior on the query with 3 tables. Intuitively it does seem that it should produce an error. If I rewrite it in either of the following ways, it does generate an error:
(1) Using old-style outer join
select ptno, catcd
from parts, categories, sections
where categories.catcd (+) = parts.catcd
and sections.seccd (+) = parts.seccd
(2) Explicitly isolating the two joins
select ptno, catcd
from (
select ptno, seccd, catcd
from parts
left join categories on (categories.CATCD=parts.CATCD)
)
left join sections on (sections.SECCD=parts.SECCD)
I used DBMS_XPLAN to get details on the execution of the query, which did show something interesting. The plan is basically to outer join PARTS and CATEGORIES, project that result set, then outer join it to SECTIONS. The interesting part is that in the projection of the first outer join, it is only including PTNO and SECCD -- it is NOT including the CATCD from either of the first two tables. Therefore the final result is getting CATCD from the third table.
But I don't know whether this is a cause or an effect.
I'm afraid I can't tell you why you're not getting an exception, but I can postulate as to why it chose CATEGORIES' version of the column over PARTS' version.
As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity
It's not clear whether by "main" you mean simply the left table in a left join, or the "driving" table, as you see the query conceptually... But in either case, what you see as the "main" table in the query as you've written it will not necessarily be the "main" table in the actual execution of that query.
My guess is that Oracle is simply using the column from the first table it hits in executing the query. And since most individual operations in SQL do not require one table to be hit before the other, the DBMS will decide at parse time which is the most efficient one to scan first. Try getting an execution plan for the query. I suspect it may reveal that it's hitting CATEGORIES first and then PARTS.
I am using Oracle 9.2.0.8.0. and it does give the error "ORA-00918: column ambiguously defined".
This is a known bug with some Oracle versions when using ANSI-style joins. The correct behavior would be to get an ORA-00918 error.
It's always best to specify your table names anyway; that way your queries don't break when you happen to add a new column with a name that is also used in another table.
It is generally advised to be specific and fully qualify all column names anyway, as it saves the optimizer a little work. Certainly in SQL Server.
From what I can gleen from the Oracle docs, it seems it will only throw if you select the column name twice in the select list, or once in the select list and then again elsewhere like an order by clause.
Perhaps you have uncovered an 'undocumented feature' :)
Like HollyStyles, I cannot find anything in the Oracle docs which can explain what you are seeing.
PostgreSQL, DB2, MySQL and MSSQL all refuse to run the first query, as it's ambiguous.
#Pat: I get the same error here for your query. My query is just a little bit more complicated than what I originally posted. I'm working on a reproducible simple example now.
A bigger question you should be asking yourself is - why do I have a category code in the parts table that doesn't exist in the categories table?
This is a bug in Oracle 9i. If you join more than 2 tables using ANSI notation, it will not detect ambiguities in column names, and can return the wrong column if an alias isn't used.
As has been mentioned already, it is fixed in 10g, so if an alias isn't used, an error will be returned.