Why is selecting specified columns, and all, wrong in Oracle SQL? - sql

Say I have a select statement that goes..
select * from animals
That gives a query result with all the columns in the table.
Now, suppose the 42nd column of the table animals is is_parent, and I want to return it in my results just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement works fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, ani.* from animals ani ), but why does Oracle need a table alias to be able to work out the select?

Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
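
As a rough illustration of the qualified form, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The table layout and the sample row are made up for the demo. Note that SQLite, like Sybase, also accepts the unqualified select is_parent, * form, so this only shows what the qualified query returns, not the Oracle error.

```python
import sqlite3

# In-memory database; the animals columns below are assumptions for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals (id INTEGER, name TEXT, gender TEXT, is_parent INTEGER)"
)
conn.execute("INSERT INTO animals VALUES (1, 'Rex', 'M', 1)")

# Qualify the star with the table name, as suggested above.
cur = conn.execute("SELECT is_parent, animals.* FROM animals")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # is_parent comes first, then every column of the table
print(rows)
```

The is_parent column shows up twice: once where you asked for it and once in its normal position inside animals.*.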

There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allow us to manipulate queries and result sets without needing extensions to SQL.

Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;

Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, I don't think any of them answers the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "but why!!!" bucket, along with why can't you group by an alias?

The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
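
A small sketch of that use case, run through Python's sqlite3 module so it can be tried without an Oracle instance. The column lists and rows are invented for the example; only the shape of the query comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas; only parent_id and col are named in the answer.
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER, name TEXT);
    CREATE TABLE child (child_id INTEGER, parent_id INTEGER, col TEXT);
    INSERT INTO parent VALUES (1, 'p1'), (2, 'p2');
    INSERT INTO child VALUES (10, 1, 'c10'), (11, 1, 'c11'), (12, 2, 'c12');
""")

# All columns of parent, plus a single column of child.
cur = conn.execute("""
    SELECT parent.*, child.col
    FROM parent JOIN child ON parent.parent_id = child.parent_id
    ORDER BY child.child_id
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)
print(rows)
```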
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.

Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name. The bigger problem is inefficiency when not all columns are required in the result set (network traffic, CPU and memory load).
Of course, if you're adding columns from other tables (as is the case in this example), it can be dangerous, as these tables may over time gain columns with matching names. select *, x would in that case fail if a column x is added to the table that previously didn't have it.

why must Oracle need a table alias to be able to work out the select
Teradata requires the same. As both are quite old (maybe better to call it mature :-) DBMSes, this might be for historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.

Related

How can I set a limitation for one column on all SQL selects?

We have a database that holds data for numerous customers. We want to give customers access to the database, but only to the data that belongs to them. Parsing the select to then insert in the where clause "and Company.Name = 'Acme'" strikes me as weak because SQL selects can be very complex and handling 100% of all cases may be difficult.
Is there some way to do the equivalent of (I know this is not valid SQL):
select * from * where Company.Name = 'Acme' and (passed_in_select)
You can nest a full select in as an inner part of a large select. Is there some way to do the above? This way it's a very simple restriction on the select and that is likely to work 100% of the time.
Here is a system solution called "virtual private database" for Oracle database:
https://docs.oracle.com/cd/B28359_01/network.111/b28531/vpd.htm
For other databases look whether there is similar built-in solution.
But there is a very simple solution using the WITH clause:
WITH
tab_a__ AS (SELECT * FROM tab_a WHERE comp='xy'),
tab_b__ AS (SELECT * FROM tab_b WHERE comp='xy')
SELECT ... -- original select
You just have to find all used tables in the select, add __ behind and add the CTEs to the WITH clause.
Notes: Some databases do not support the WITH clause even though it is standard SQL. Some databases have an identifier length limit that you could exceed by adding the suffix.
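
To make the WITH approach a bit more concrete, here is a minimal sketch in Python with sqlite3. The helper restrict_to_company is hypothetical, as are the column names id, val and a_id; it assumes the original select has already been rewritten to use the suffixed names and that the table list comes from a trusted source. The company value is passed as a bound parameter rather than pasted into the SQL.

```python
import sqlite3

def restrict_to_company(original_select, tables):
    """Prefix a query with one filtering CTE per table (hypothetical helper).

    The original select is expected to reference the suffixed names
    (tab_a__ instead of tab_a). Table names are interpolated as-is, so
    they must come from a trusted list, never from user input.
    """
    ctes = ",\n".join(
        f"{name}__ AS (SELECT * FROM {name} WHERE comp = :comp)" for name in tables
    )
    return f"WITH\n{ctes}\n{original_select}"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_a (id INTEGER, comp TEXT, val TEXT);
    CREATE TABLE tab_b (id INTEGER, comp TEXT, a_id INTEGER);
    INSERT INTO tab_a VALUES (1, 'xy', 'x1'), (2, 'other', 'o1');
    INSERT INTO tab_b VALUES (10, 'xy', 1), (11, 'other', 2), (12, 'other', 1);
""")

query = restrict_to_company(
    "SELECT a.val FROM tab_a__ a JOIN tab_b__ b ON b.a_id = a.id ORDER BY a.val",
    ["tab_a", "tab_b"],
)
# The same :comp parameter feeds every CTE.
rows = conn.execute(query, {"comp": "xy"}).fetchall()
print(rows)
```

Only rows belonging to company xy survive in either table, so the join cannot reach another customer's data as long as the query never touches the unsuffixed tables.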
select * from
(
select * from table_a
) outer_table_a
where outer_table_a.col_a = 'test'
I do this sort of thing often especially when I want to perform some aggregation on the data in the inner query (sum, max, etc.) I do this with SQL Server, I do not know if it is valid with other DBMS but I would be surprised if it were not.
I don't know if I would rely on this approach to effectively grant permissions. Perhaps views would allow you to lock things down a bit tighter. It sounds like you're planning to tack something on dynamically to a query that you may not have written? In that case whoever writes that query could transform your column of interest, which would result in visibility over things you didn't intend, like:
select * from
(
select 'test' as col_a, launch_codes from table_a
) outer_table_a
where outer_table_a.col_a = 'test'
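
The loophole above can be reproduced with a few lines of Python and sqlite3. This is only a sketch with invented rows; the point is that the outer filter sees whatever the inner query chooses to call col_a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up data: one row the caller may see and one it should not.
conn.executescript("""
    CREATE TABLE table_a (col_a TEXT, launch_codes TEXT);
    INSERT INTO table_a VALUES ('test', 'public'), ('secret', '0000');
""")

wrapper = """
    SELECT * FROM ({inner}) outer_table_a
    WHERE outer_table_a.col_a = 'test'
    ORDER BY launch_codes
"""

# A well-behaved inner query: the outer filter does its job.
honest_rows = conn.execute(
    wrapper.format(inner="SELECT * FROM table_a")
).fetchall()

# A crafted inner query that relabels every row as 'test'.
crafted_rows = conn.execute(
    wrapper.format(inner="SELECT 'test' AS col_a, launch_codes FROM table_a")
).fetchall()

print(honest_rows)
print(crafted_rows)
```

The crafted version returns the launch codes of both rows, which is why filtering on the outside of a query you don't control is not a safe substitute for views or a built-in row-level security feature.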

How exactly is the value of count(*) determined in BigQuery?

I am joining a table of about 70000 rows with a slightly bigger second table through inner join each. Now count(a.business_column) and count(*) give different results. The former correctly reports back ~70000, while the latter gives ~200000. But this only happens when I select count(*) alone, when I select them together they give the same result (~70000). How is this possible?
select
count(*)
/*,count(a.business_column)*/
from table_a a
inner join each table_b b
on b.key_column = a.business_column
UPDATE: For a step by step explanation on how this works, see BigQuery flattens when using field with same name as repeated field instead.
To answer the title question: COUNT(*) in BigQuery is always accurate.
The caveat is that in SQL COUNT(*) and COUNT(column) have semantically different meanings - and the sample query can be interpreted in different ways.
See: http://www.xaprb.com/blog/2009/04/08/the-dangerous-subtleties-of-left-join-and-count-in-sql/
There they have this sample query:
select user.userid, count(email.subject)
from user
inner join email on user.userid = email.userid
group by user.userid;
That query turns out to be ambiguous, and the article author changes it for a more explicit one, adding this comment:
But what if that’s not what the author of the query meant? There’s no
way to really know. There are several possible intended meanings for
the query, and there are several different ways to write the query to
express those meanings more clearly. But the original query is
ambiguous, for a few reasons. And everyone who reads this query
afterwards will end up guessing what the original author meant. “I
think I can safely change this to…”
COUNT(*) counts the most repeated field in your query; if you want to count full records, use COUNT(0).
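
For the general SQL difference between COUNT(*) and COUNT(column) mentioned above, here is a minimal sketch using Python's sqlite3 module. It does not model BigQuery's repeated fields or flattening at all; it only shows the standard behaviour that COUNT(column) skips NULLs while COUNT(*) and COUNT(0) count rows. The sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (business_column INTEGER)")
# Three rows, one of which has a NULL in the counted column.
conn.executemany(
    "INSERT INTO table_a VALUES (?)", [(1,), (None,), (3,)]
)

count_star, count_column, count_zero = conn.execute(
    "SELECT COUNT(*), COUNT(business_column), COUNT(0) FROM table_a"
).fetchone()

print(count_star, count_column, count_zero)
```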

Create new table from average of multiple columns in multiple tables

I have the following query:
CREATE TABLE Professor_Average
SELECT Instructor, SUM( + instreffective_avg + howmuchlearned_avg + instrrespect_avg)/5
FROM instreffective_average, howmuchlearned_average, instrrespect_average
GROUP BY Instructor;
It is telling me that Instructor is ambiguous. How do I fix this?
Qualify instructor with the name of the table it came from.
For example: instreffective_average.Instructor
If you don't do this, SQL will guess which table of the query it came from, but if there are 2 or more possibilities it doesn't try to guess and tells you it needs help deciding.
Your query most likely fails in more than one way.
In addition to what @Patashu told you about table-qualifying column names, you need to JOIN your tables properly. Since Instructor is ambiguous in your query I am guessing (for lack of information) it could look like this:
SELECT ie.Instructor
,SUM(ie.instreffective_avg + h.howmuchlearned_avg + ir.instrrespect_avg)/5
FROM instreffective_average ie
JOIN howmuchlearned_average h USING (Instructor)
JOIN instrrespect_average ir USING (Instructor)
GROUP BY Instructor
I added table aliases to make it easier to read.
This assumes that the three tables each have a column Instructor by which they can be joined. Without JOIN conditions you get a CROSS JOIN, meaning that every row of every table will be combined with every row of every other table. Very expensive nonsense in most cases.
USING (Instructor) is short syntax for ON ie.Instructor = h.Instructor. It also collapses the joined (necessarily identical) columns into one. Therefore, you would get away without table-qualifying Instructor in the SELECT list in my example. Not every RDBMS supports this standard-SQL feature, but you failed to provide more information.
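
Here is a rough, runnable version of the points above, using Python's sqlite3 module (SQLite happens to support JOIN ... USING). The rows are invented, and the average is taken over the three columns that actually exist here, so the divisor is 3 rather than the 5 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instreffective_average (Instructor TEXT, instreffective_avg REAL);
    CREATE TABLE howmuchlearned_average (Instructor TEXT, howmuchlearned_avg REAL);
    CREATE TABLE instrrespect_average (Instructor TEXT, instrrespect_avg REAL);
    INSERT INTO instreffective_average VALUES ('Smith', 4.0), ('Jones', 3.0);
    INSERT INTO howmuchlearned_average VALUES ('Smith', 3.0), ('Jones', 3.0);
    INSERT INTO instrrespect_average VALUES ('Smith', 5.0), ('Jones', 3.0);
""")

tables = "instreffective_average, howmuchlearned_average, instrrespect_average"

# 1) Unqualified column across a comma join: the engine refuses to guess.
try:
    conn.execute(f"SELECT Instructor FROM {tables}")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# 2) Without join conditions every row meets every row (2 * 2 * 2).
cross_count = conn.execute(f"SELECT COUNT(*) FROM {tables}").fetchone()[0]

# 3) Proper joins with USING, one row per instructor.
rows = conn.execute("""
    SELECT ie.Instructor,
           (ie.instreffective_avg + h.howmuchlearned_avg + ir.instrrespect_avg) / 3.0
    FROM instreffective_average ie
    JOIN howmuchlearned_average h USING (Instructor)
    JOIN instrrespect_average ir USING (Instructor)
    ORDER BY ie.Instructor
""").fetchall()

print(error_message)
print(cross_count)
print(rows)
```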

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complaining: "only the first 1000 rows might be retrieved". Would this second approach solve that problem?
3) What is the scope of the alias in the second subquery? Does the alias only live inside the parentheses?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
1) Yes
2) Yes
3) Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
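
The scoping described for Q3 can be checked with a short sketch in Python and sqlite3. The outer table is called items here because table is a reserved word, and the rows are made up; the behaviour shown is SQLite's, though the scoping rule itself is standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (nameid INTEGER);
    CREATE TABLE othertable (nameid INTEGER, otherdesc TEXT);
    INSERT INTO items VALUES (1), (2), (3);
    INSERT INTO othertable VALUES
        (1, 'SomeDesc'), (1, 'SomeOtherDesc'),
        (2, 'SomeDesc'), (3, 'SomeOtherDesc');
""")

# Two sub-queries, each with its own independent alias o.
rows = conn.execute("""
    SELECT t.nameid FROM items t
    WHERE EXISTS (SELECT 1 FROM othertable o
                  WHERE o.nameid = t.nameid AND o.otherdesc = 'SomeDesc')
      AND EXISTS (SELECT 1 FROM othertable o
                  WHERE o.nameid = t.nameid AND o.otherdesc = 'SomeOtherDesc')
    ORDER BY t.nameid
""").fetchall()

# The alias o does not exist outside its parentheses.
try:
    conn.execute("""
        SELECT t.nameid, o.otherdesc FROM items t
        WHERE EXISTS (SELECT 1 FROM othertable o WHERE o.nameid = t.nameid)
    """)
    scope_error = None
except sqlite3.OperationalError as exc:
    scope_error = str(exc)

print(rows)
print(scope_error)
```

Only nameid 1 comes back, because it is the only value with both descriptions in othertable, which matches the "requires two rows" remark above.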
1) Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
2) Probably.
3) An alias declared inside a subquery lives inside that subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. Logically, if that were the case, the SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not the same; the results may differ depending on which of the two you use...
With IN, it's usually a full table scan of the inner table once, without removing NULLs (so if you have NULLs in your inner table, IN will not remove them by default)... EXISTS, on the other hand, removes NULLs and, in the case of a correlated subquery, runs the inner query for every row from the outer query.
Assuming there are no NULLs and it's a simple query (with no correlation), EXISTS might perform better if the row you are looking for is not the last row. If it happens to be the last row, EXISTS may need to scan to the end like IN, so the performance is similar...
But IN and EXISTS are not interchangeable...
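
One well-known place where the NULL handling really changes the result is the negated form. The sketch below, in Python with sqlite3 and invented rows, shows IN and EXISTS agreeing while NOT IN and NOT EXISTS disagree once the inner table contains a NULL. This is standard three-valued logic rather than anything Oracle-specific, and it says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (nameid INTEGER);
    CREATE TABLE othertable (nameid INTEGER);
    INSERT INTO items VALUES (1), (2);
    INSERT INTO othertable VALUES (1), (NULL);
""")

def run(sql):
    return conn.execute(sql).fetchall()

in_rows = run(
    "SELECT nameid FROM items WHERE nameid IN (SELECT nameid FROM othertable)"
)
exists_rows = run(
    "SELECT nameid FROM items t WHERE EXISTS "
    "(SELECT 1 FROM othertable o WHERE o.nameid = t.nameid)"
)
# 2 NOT IN (1, NULL) evaluates to unknown, so the row is filtered out.
not_in_rows = run(
    "SELECT nameid FROM items WHERE nameid NOT IN (SELECT nameid FROM othertable)"
)
# NOT EXISTS only asks whether a matching row was found.
not_exists_rows = run(
    "SELECT nameid FROM items t WHERE NOT EXISTS "
    "(SELECT 1 FROM othertable o WHERE o.nameid = t.nameid)"
)

print(in_rows, exists_rows, not_in_rows, not_exists_rows)
```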

Ambiguity in Left joins (oracle only?)

My boss found a bug in a query I created, and I don't understand the reasoning behind the bug, although the query results prove he's correct. Here's the query (simplified version) before the fix:
select PTNO,PTNM,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
and here it is after the fix:
select PTNO,PTNM,PARTS.CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
The bug was, that null values were being shown for column CATCD, i.e. the query results included results from table CATEGORIES instead of PARTS.
Here's what I don't understand: if there was ambiguity in the original query, why didn't Oracle throw an error? As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity.
Am I wrong, or just not thinking about this problem correctly?
Update:
Here's a revised example, where the ambiguity error is not thrown:
CREATE TABLE PARTS (PTNO NUMBER, CATCD NUMBER, SECCD NUMBER);
CREATE TABLE CATEGORIES(CATCD NUMBER);
CREATE TABLE SECTIONS(SECCD NUMBER, CATCD NUMBER);
select PTNO,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD)
left join SECTIONS on (SECTIONS.SECCD=PARTS.SECCD) ;
Anybody have a clue?
Here's the query (simplified version)
I think by simplifying the query you removed the real cause of the bug :-)
What oracle version are you using? Oracle 10g ( 10.2.0.1.0 ) gives:
create table parts (ptno number , ptnm number , catcd number);
create table CATEGORIES (catcd number);
select PTNO,PTNM,CATCD from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
I get ORA-00918: column ambiguously defined
Interestingly, SQL Server throws an error here (as it should):
select id
from sysobjects s
left join syscolumns c on s.id = c.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
select id
from sysobjects
left join syscolumns on sysobjects.id = syscolumns.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
From my experience if you create a query like this the data result will pull CATCD from the right side of the join not the left when there is a field overlap like this.
So since this join will have all records from PARTS with only some pull through from CATEGORIES you will have NULL in the CATCD field any time there is no data on the right side.
By explicitly defining the column as coming from PARTS (i.e. the left side) you will get a non-null value, assuming that the field has data in PARTS.
Remember that with LEFT JOIN you are only guaranteed data in fields from the left table; there may well be empty columns to the right.
This may be a bug in the Oracle optimizer. I can reproduce the same behavior on the query with 3 tables. Intuitively it does seem that it should produce an error. If I rewrite it in either of the following ways, it does generate an error:
(1) Using old-style outer join
select ptno, catcd
from parts, categories, sections
where categories.catcd (+) = parts.catcd
and sections.seccd (+) = parts.seccd
(2) Explicitly isolating the two joins
select ptno, catcd
from (
select ptno, seccd, catcd
from parts
left join categories on (categories.CATCD=parts.CATCD)
)
left join sections on (sections.SECCD=parts.SECCD)
I used DBMS_XPLAN to get details on the execution of the query, which did show something interesting. The plan is basically to outer join PARTS and CATEGORIES, project that result set, then outer join it to SECTIONS. The interesting part is that in the projection of the first outer join, it is only including PTNO and SECCD -- it is NOT including the CATCD from either of the first two tables. Therefore the final result is getting CATCD from the third table.
But I don't know whether this is a cause or an effect.
I'm afraid I can't tell you why you're not getting an exception, but I can postulate as to why it chose CATEGORIES' version of the column over PARTS' version.
As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity
It's not clear whether by "main" you mean simply the left table in a left join, or the "driving" table, as you see the query conceptually... But in either case, what you see as the "main" table in the query as you've written it will not necessarily be the "main" table in the actual execution of that query.
My guess is that Oracle is simply using the column from the first table it hits in executing the query. And since most individual operations in SQL do not require one table to be hit before the other, the DBMS will decide at parse time which is the most efficient one to scan first. Try getting an execution plan for the query. I suspect it may reveal that it's hitting CATEGORIES first and then PARTS.
I am using Oracle 9.2.0.8.0. and it does give the error "ORA-00918: column ambiguously defined".
This is a known bug with some Oracle versions when using ANSI-style joins. The correct behavior would be to get an ORA-00918 error.
It's always best to specify your table names anyway; that way your queries don't break when you happen to add a new column with a name that is also used in another table.
It is generally advised to be specific and fully qualify all column names anyway, as it saves the optimizer a little work. Certainly in SQL Server.
From what I can glean from the Oracle docs, it seems it will only throw if you select the column name twice in the select list, or once in the select list and then again elsewhere, like in an order by clause.
Perhaps you have uncovered an 'undocumented feature' :)
Like HollyStyles, I cannot find anything in the Oracle docs which can explain what you are seeing.
PostgreSQL, DB2, MySQL and MSSQL all refuse to run the first query, as it's ambiguous.
@Pat: I get the same error here for your query. My query is just a little bit more complicated than what I originally posted. I'm working on a reproducible simple example now.
A bigger question you should be asking yourself is - why do I have a category code in the parts table that doesn't exist in the categories table?
This is a bug in Oracle 9i. If you join more than 2 tables using ANSI notation, it will not detect ambiguities in column names, and can return the wrong column if an alias isn't used.
As has been mentioned already, it is fixed in 10g, so if an alias isn't used, an error will be returned.
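
For comparison, here is how the same kind of query behaves in an engine that does detect the ambiguity. This is a sketch using Python's sqlite3 module, not Oracle, with made-up rows: part 2 has a category code 99 that is missing from CATEGORIES, which is the situation where the two CATCD columns differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS (PTNO INTEGER, CATCD INTEGER);
    CREATE TABLE CATEGORIES (CATCD INTEGER);
    INSERT INTO PARTS VALUES (1, 10), (2, 99);
    INSERT INTO CATEGORIES VALUES (10);
""")

join = "FROM PARTS LEFT JOIN CATEGORIES ON (CATEGORIES.CATCD = PARTS.CATCD)"

# Unqualified CATCD: rejected as ambiguous instead of silently picking a side.
try:
    conn.execute(f"SELECT PTNO, CATCD {join}")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualified with the left table: always the value stored in PARTS.
parts_rows = conn.execute(
    f"SELECT PTNO, PARTS.CATCD {join} ORDER BY PTNO"
).fetchall()

# Qualified with the right table: NULL wherever the join found no match.
categories_rows = conn.execute(
    f"SELECT PTNO, CATEGORIES.CATCD {join} ORDER BY PTNO"
).fetchall()

print(error_message)
print(parts_rows)
print(categories_rows)
```

The third result shows the NULLs described in the question, which is what you would see if the engine quietly resolved CATCD to the right-hand table.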