Unknown SQL syntax - sql

I'm following an SQL course and I'm having trouble understanding an example given by our professor. There are usually a lot of mistakes in our sheets that we have to correct, but here I think the problem might just come from my obvious ignorance of the subject.
So the database contains three tables organized like this:
Student (StudentNumber, Name, Year)
Course (Code, Name, Hours)
Results (StudentNumber, Code, Grade)
We're asked to give the student numbers of the students who follow the course coded "M1105", using one query to the database server.
Here is the solution given:
Select S. *
FROM Student S, Results R
WHERE Code = 'M1105'
AND S.StudentNumber = R.StudentNumber;
I just don't get how this is supposed to work. First of all, S and R are not real attributes of the database given, and the SELECT S.* doesn't seem to mean anything there.

In your example S and R are simply defined as aliases of Student and Results tables. Using S.* is exactly the same as saying Student.*, or ALL Columns of the Student Table.

S and R are aliases. S is an alias for the Student table or view, and R is an alias for the Results table or view.
Aliases are used to avoid typing the full name of a table every time it is referred to in a query. It becomes more clear when we use the optional AS keyword, like this:
SELECT
S.StudentNumber,
S.Name,
S.[Year]
FROM Student AS S
INNER JOIN Results AS R ON
R.StudentNumber=S.StudentNumber
WHERE
R.Code='M1105'
The comma syntax is discouraged; use JOIN instead. If you use an alias for any table, then you should also qualify all references to columns with that alias. Failing to do so can lead to nasty runtime errors when schemas change (in your example, if a Code column were added to the Student table too, an ambiguous column error would suddenly occur). The square brackets are used to quote object names that are reserved words (parts of the syntax or built-in functions, like YEAR).
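To see that the comma form with aliases and the explicit JOIN form return the same rows, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than the course's database server, and the sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentNumber INTEGER, Name TEXT, Year INTEGER);
    CREATE TABLE Results (StudentNumber INTEGER, Code TEXT, Grade INTEGER);
    INSERT INTO Student VALUES (1, 'Ann', 1), (2, 'Bob', 2);
    INSERT INTO Results VALUES (1, 'M1105', 14), (2, 'M2201', 11);
""")

# Comma syntax with table aliases S and R, as in the professor's solution.
comma_rows = conn.execute("""
    SELECT S.*
    FROM Student S, Results R
    WHERE R.Code = 'M1105'
      AND S.StudentNumber = R.StudentNumber
""").fetchall()

# Explicit JOIN with the optional AS keyword and fully qualified columns.
join_rows = conn.execute("""
    SELECT S.StudentNumber, S.Name, S.Year
    FROM Student AS S
    INNER JOIN Results AS R ON R.StudentNumber = S.StudentNumber
    WHERE R.Code = 'M1105'
""").fetchall()

print(comma_rows)
print(join_rows)
```

Both queries list the one student enrolled in M1105; S.* simply expands to every column of Student.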

This is called an "alias".
When you have a statement of the form:
SELECT … FROM [tableName] 'anyCharacter';
the identifier at the end of the statement becomes an alias for the table, and you can use the alias instead of the table name. For example:
SELECT st.Name FROM Student st;
-- ^^ ^^
-- 'st' is an alias for the 'Student' table

Related

Error when using join and join produces error

I am trying to make groups and make joins with the below tables but I get an
ORA-00918: column ambiguously defined
error.
Any ideas how to fix?
SELECT staffn, job, COUNT(*)"staffcount", AVG(sal)"AverageSal"
FROM staff, shop
WHERE staff.shopno= shop.shopno
GROUP BY shopno, job;
You should use proper alias and your group by clause must include all the unaggregated columns as follows:
SELECT s.staffn, sh.job,
COUNT(*) "staffcount",
AVG(s.sal) "AverageSal"
FROM staff s JOIN shop sh
ON s.shopno = sh.shopno
GROUP BY s.staffn, sh.job
Did you mean to include staffn in your select? I would guess that it is unique to a row in staff, which would make selecting the average (or any other aggregation) of sal a bit useless (and if you did want to do that, you'd need to include it in the group by). I think you really meant to select the same columns as in your group by.
Your error is telling you that Oracle doesn’t know where a column should be taken from, multiple row sources in your query could provide it. The complete error message will also make it clear which column this is referring to, but we can already see that at least shopno is shared, we can arbitrarily take it from staff.
SELECT staff.shopno, job, COUNT(*)"staffcount", AVG(sal)"AverageSal"
FROM staff, shop
WHERE staff.shopno= shop.shopno
GROUP BY staff.shopno, job;
Both tables you used contain at least one field with the same name. You must specify which table each such field comes from.
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
In a query that references multiple tables, qualify all column references.
You haven't shown the layout of your tables, but you presumably want something like this:
SELECT st.staffn, st.job, COUNT(*) as staffcount,
AVG(st.sal) as AverageSal
FROM staff st JOIN
shop sh
ON st.shopno = sh.shopno
GROUP BY st.staffn, st.job;
This assumes that all the columns come from the staff table, which seems reasonable enough in the absence of other information.
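The ambiguity the answers above describe can be reproduced outside Oracle. This is only a sketch: it uses SQLite through Python's sqlite3 module, the table layouts and rows are guesses (the question never showed them), and SQLite's error text differs from ORA-00918.

```python
import sqlite3

# Assumed layouts: the question did not show the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (shopno INTEGER);
    CREATE TABLE staff (staffn INTEGER, job TEXT, sal REAL, shopno INTEGER);
    INSERT INTO shop VALUES (1), (2);
    INSERT INTO staff VALUES
        (10, 'clerk', 1000, 1),
        (11, 'clerk', 2000, 1),
        (12, 'manager', 3000, 2);
""")

# shopno exists in both tables, so the unqualified reference is rejected.
try:
    conn.execute("""
        SELECT shopno, job, COUNT(*)
        FROM staff JOIN shop ON staff.shopno = shop.shopno
        GROUP BY shopno, job
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Qualifying every column with a table alias removes the ambiguity.
rows = conn.execute("""
    SELECT st.shopno, st.job, COUNT(*) AS staffcount, AVG(st.sal) AS averagesal
    FROM staff st
    JOIN shop sh ON st.shopno = sh.shopno
    GROUP BY st.shopno, st.job
    ORDER BY st.shopno, st.job
""").fetchall()

print(error)
print(rows)
```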

Divide by 'Over' clause in MSSQL works but divide by 'Alias' does not

In MS SQL Server, I spent too much time trying to resolve this. I finally figured it out, except I don't know the reason. How come dividing by the cast expression in line 4 below works:
SELECT
cast(dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT AS decimal(18,8))
AS TOTAL_NET_AMOUNT_AMOUNT,
cast((SUM(dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT)
OVER (PARTITION BY dbo.DIMPROJECT.PROJECT_KEY)) AS decimal(18,8))
AS ActualAmountPaidOnProjectGroupedByInvoice,
((dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT)
/
(cast((SUM(dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT)
OVER (PARTITION BY dbo.DIMPROJECT.PROJECT_KEY)) AS decimal(18,8))))
AS 'Allocation_Amount',
But when I try to divide by the alias that I created, 'ActualAmountPaidOnMatterGroupedByInvoice', in line 3, I get an error message:
Msg 207, Level 16, State 1, Line 131 Invalid column name 'ActualAmountPaidOnMatterGroupedByInvoice'
Sample incorrect code:
SELECT
cast(dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT AS decimal(18,8))
AS TOTAL_NET_AMOUNT_AMOUNT,
cast((SUM(dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT)
OVER (PARTITION BY dbo.DIMPROJECT.PROJECT_KEY))
AS decimal(18,8))
AS ActualAmountPaidOnProjectGroupedByInvoice,
((dbo.FACTINVOICEHEADER.TOTAL_NET_AMOUNT_AMOUNT)
/
(ActualAmountPaidOnProjectGroupedByInvoice) AS decimal(18,8))))
AS 'Allocation_Amount'
How come? Thanks all!
The reason that you cannot use the alias in the query, is because the alias has not been recognized by the query engine yet. The engine evaluates queries in stages in the following order:
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
One of the last steps in the SELECT stage is to apply the aliases specified in the query to the resulting dataset. Since these are not applied until the end of the SELECT stage, they are not available in the evaluation of the data to be returned nor in the WHERE, GROUP BY, or HAVING stages.
Additionally, some query engines do allow aliases (or ordinal position) to be used in the ORDER BY stage. As pointed out by Julian in the comments, MSSQL does allow for ordinal position ordering syntax.
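As a small illustration of ordering by an alias or by ordinal position, here is a sketch run against SQLite through Python's sqlite3 module (not MSSQL). The table t and its rows are invented for the example.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, amount INTEGER);
    INSERT INTO t VALUES ('a', 3), ('b', 1), ('c', 2);
""")

# ORDER BY runs after SELECT, so the alias "doubled" is already known there.
by_alias = conn.execute(
    "SELECT name, amount * 2 AS doubled FROM t ORDER BY doubled"
).fetchall()

# Ordinal position: 2 refers to the second column of the select list.
by_position = conn.execute(
    "SELECT name, amount * 2 AS doubled FROM t ORDER BY 2"
).fetchall()

print(by_alias)
print(by_position)
```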
I think you might be misunderstanding where aliased columns are available/able to be referenced by the aliased name, particularly because you said (paraphrase) "an alias I created on line 3 of the sql wasn't available on line 4":
Wrong:
SELECT
1200 as games_won,
25 as years_played,
--can't use these aliases below in the same select block that they were declared in
games_won / years_played as games_won_per_year
...
Right:
SELECT
1200 as games_won,
25 as years_played,
--can use the values though
1200 / 25 as games_won_per_year
Right:
SELECT
games_won / years_played as games_won_per_year --alias from inner scope is available in this outer scope
FROM
(
SELECT
--these aliases only become available outside the brackets
1200 as games_won,
25 as years_played
) x
You can't alias a column and use the alias again in the same select block; you can only alias in an inner/subquery and use the alias in an outer query. SQL is not like a programming language that operates line by line:
int gameswon = 1200;
int yearsplayed = 25;
int winsperyear = gameswon / yearsplayed;
Here in this C# you can see we declare variables (aliases) on earlier lines and use them on later lines but that's because the programming language operates line by line. The results of an earlier line execution are available to later lines. SQL doesn't work like that; SQL works on entire sections of the query at a time. Your columns don't acquire those aliases you gave them until the entire select block is finished being processed so you cannot give a column or calculation an alias and then use that alias again in the same select block. The only way to get round this and create an alias that you will later use repeatedly is to create the alias in a subquery.
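The Wrong/Right examples above can be checked with a quick sketch. It runs on SQLite through Python's sqlite3 module, which is an assumption on my part (the question is about SQL Server); SQLite also refuses an alias reused inside the same select list, though its error text is different from Msg 207.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reusing aliases inside the select list that declares them fails.
try:
    conn.execute("""
        SELECT 1200 AS games_won,
               25 AS years_played,
               games_won / years_played AS games_won_per_year
    """)
    same_level_error = None
except sqlite3.OperationalError as exc:
    same_level_error = str(exc)

# Aliases declared in the inner query are visible to the outer query.
row = conn.execute("""
    SELECT x.games_won / x.years_played AS games_won_per_year
    FROM (
        SELECT 1200 AS games_won, 25 AS years_played
    ) x
""").fetchone()

print(same_level_error)
print(row)
```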
Here's another example:
SELECT
fih.tot_amt / fih.amt_per_proj AS allocation_amount
FROM
(
SELECT
CAST(f.total_net_amount_amount AS DECIMAL(18,8)) AS tot_amt,
CAST(SUM(f.total_net_amount_amount) OVER (PARTITION BY p.project_key) AS DECIMAL(18,8)) AS amt_per_proj
FROM
dbo.factinvoiceheader f
INNER JOIN
dbo.dimproject p
ON ...
) fih
Here you can see I pulled the columns I wanted and aliased them in an inner query and then used the aliases in the outer query. It works because the aliases declared inside the inner block are made available to the outer block.
Always remember that SQL is not line by line like a typical programming language, but block by block. Indeed, in most programming languages, things declared in inner code blocks are not available in outer code blocks (unless they're some globalised thing like javascript var), so SQL is a departure from what you're used to. Every time you create a block of instructions in SQL you have an opportunity to re-alias the columns of data.
Because SQL is block by block based, I indent my SQLs in blocks to make it easy to see what gets processed together. Keywords like SELECT, FROM, WHERE, GROUP BY and ORDER BY denote blocks, and aliases can be created for columns in a SELECT and for tables in a FROM. In taking your example above I've applied aliases not just to the calculations and columns but to the tables as well. It makes the query massively easier to read when it's indented and aliased throughout: give your table names an alias rather than writing dbo.factinvoiceheader. before every column name.
Here's a set of tips for making your SQLs neater and easier to read and debug:
don't put them all on one line or at the same indent level - indent according to how deep or shallow the block of instructions is
select, from, where, group by, order by etc denote the start of a block of operations - indent them all to the same level and indent their sub-instructions another level (if your select is indent level 2, the columns being selected should be indent level 3)
when you have an inner query indent that too unless it's really simple and reads nicely as a one liner
use lowercase for column and table names, upper case for reserved words, functions, datatypes (some people prefer camel case for functions)
decide whether to use camelCase or underscore_style to split your words and keep to it
always alias tables, and always select columns as tablealias.columnname - this prevents your query breaking in future if a table has a column added that is the same name as an original column you selected without qualifying what table the original column came from
aliasing tables allows another vital operation; repeatedly joining the same table into a query. If your Person table has a WorkAddress and a HomeAddress the only way you can join the address table in twice to get both addresses for a person, is to alias the table (person join address h on p.homeaddressid = h.id join address w on p.workaddressid = w.id)
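The last tip, joining the same table twice, can be sketched as follows. The person and address layouts, the city column and the rows are assumptions made up for the example, and it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical schema based on the person/address example in the tip above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        homeaddressid INTEGER,
        workaddressid INTEGER
    );
    INSERT INTO address VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO person VALUES (1, 'Ann', 1, 2);
""")

# The aliases h and w let the address table play two different roles.
rows = conn.execute("""
    SELECT p.name, h.city AS home_city, w.city AS work_city
    FROM person p
    JOIN address h ON p.homeaddressid = h.id
    JOIN address w ON p.workaddressid = w.id
""").fetchall()

print(rows)
```

Without the aliases there would be no way to say which copy of address a column like city refers to.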

Create new table from average of multiple columns in multiple tables

I have the following query:
CREATE TABLE Professor_Average
SELECT Instructor, SUM( + instreffective_avg + howmuchlearned_avg + instrrespect_avg)/5
FROM instreffective_average, howmuchlearned_average, instrrespect_average
GROUP BY Instructor;
It is telling me that Instructor is ambiguous. How do I fix this?
Qualify instructor with the name of the table it came from.
For example: instreffective_average.Instructor
If you don't do this, SQL will guess which table of the query it came from, but if there are 2 or more possibilities it doesn't try to guess and tells you it needs help deciding.
Your query most likely fails in more than one way.
In addition to what @Patashu told you about table-qualifying column names, you need to JOIN your tables properly. Since Instructor is ambiguous in your query I am guessing (for lack of information) it could look like this:
SELECT ie.Instructor
,SUM(ie.instreffective_avg + h.howmuchlearned_avg + ir.instrrespect_avg)/5
FROM instreffective_average ie
JOIN howmuchlearned_average h USING (Instructor)
JOIN instrrespect_average ir USING (Instructor)
GROUP BY Instructor
I added table aliases to make it easier to read.
This assumes that the three tables each have a column Instructor by which they can be joined. Without JOIN conditions you get a CROSS JOIN, meaning that every row of every table will be combined with every row of every other table. Very expensive nonsense in most cases.
USING (Instructor) is short syntax for ON ie.Instructor = h.Instructor. It also collapses the joined (necessarily identical) columns into one. Therefore, you would get away without table-qualifying Instructor in the SELECT list in my example. Not every RDBMS supports this standard-SQL feature, but you failed to provide more information.
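A rough sketch of both points, the row explosion without join conditions and USING collapsing the shared column, assuming SQLite through Python's sqlite3 module (one engine that supports USING). The rows are invented, and the query only adds the three scores per instructor rather than reproducing the original SUM(...)/5.

```python
import sqlite3

# Invented rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instreffective_average (Instructor TEXT, instreffective_avg REAL);
    CREATE TABLE howmuchlearned_average (Instructor TEXT, howmuchlearned_avg REAL);
    CREATE TABLE instrrespect_average (Instructor TEXT, instrrespect_avg REAL);
    INSERT INTO instreffective_average VALUES ('Smith', 4.0), ('Jones', 3.0);
    INSERT INTO howmuchlearned_average VALUES ('Smith', 5.0), ('Jones', 3.0);
    INSERT INTO instrrespect_average VALUES ('Smith', 3.0), ('Jones', 3.0);
""")

# No join conditions: every row is combined with every other row (2 * 2 * 2).
cross_count = conn.execute("""
    SELECT COUNT(*)
    FROM instreffective_average, howmuchlearned_average, instrrespect_average
""").fetchone()[0]

# USING merges the shared column, so Instructor needs no table qualifier here.
joined = conn.execute("""
    SELECT Instructor,
           ie.instreffective_avg + h.howmuchlearned_avg + ir.instrrespect_avg AS total
    FROM instreffective_average ie
    JOIN howmuchlearned_average h USING (Instructor)
    JOIN instrrespect_average ir USING (Instructor)
    ORDER BY Instructor
""").fetchall()

print(cross_count)
print(joined)
```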

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a query result of all the columns in the table.
Now, suppose the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work (select is_parent, ani.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
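For reference, here is the qualified-star form from this answer as a runnable sketch. It uses SQLite through Python's sqlite3 module, not Oracle, and a made-up three-column animals table. SQLite, like Sybase in the question, is lenient about a bare *, so this cannot reproduce ORA-00936; it only shows the qualified form that Oracle needs.

```python
import sqlite3

# Hypothetical three-column version of the animals table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (name TEXT, gender TEXT, is_parent INTEGER);
    INSERT INTO animals VALUES ('Rex', 'M', 1);
""")

# Star qualified with the table name.
by_name = conn.execute("SELECT is_parent, animals.* FROM animals").fetchall()

# Star qualified with a table alias.
by_alias = conn.execute("SELECT a.is_parent, a.* FROM animals a").fetchall()

print(by_name)
print(by_alias)
```

In both results is_parent appears twice: once up front and once in its normal position within the expanded star.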
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, I don't think any of them answer the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "but why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name. The bigger problem is inefficiency when not all columns are required in the resultset (network traffic, CPU and memory load).
Of course, if you're adding columns from other tables (as is the case in this example), it can be dangerous, as these tables may over time have columns with matching names; select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata requires the same. As both are quite old (maybe better call it mature :-) DBMSes, this might be for historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.

Ambiguity in Left joins (oracle only?)

My boss found a bug in a query I created, and I don't understand the reasoning behind the bug, although the query results prove he's correct. Here's the query (simplified version) before the fix:
select PTNO,PTNM,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
and here it is after the fix:
select PTNO,PTNM,PARTS.CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
The bug was, that null values were being shown for column CATCD, i.e. the query results included results from table CATEGORIES instead of PARTS.
Here's what I don't understand: if there was ambiguity in the original query, why didn't Oracle throw an error? As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity.
Am I wrong, or just not thinking about this problem correctly?
Update:
Here's a revised example, where the ambiguity error is not thrown:
CREATE TABLE PARTS (PTNO NUMBER, CATCD NUMBER, SECCD NUMBER);
CREATE TABLE CATEGORIES(CATCD NUMBER);
CREATE TABLE SECTIONS(SECCD NUMBER, CATCD NUMBER);
select PTNO,CATCD
from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD)
left join SECTIONS on (SECTIONS.SECCD=PARTS.SECCD) ;
Anybody have a clue?
Here's the query (simplified version)
I think by simplifying the query you removed the real cause of the bug :-)
What oracle version are you using? Oracle 10g ( 10.2.0.1.0 ) gives:
create table parts (ptno number , ptnm number , catcd number);
create table CATEGORIES (catcd number);
select PTNO,PTNM,CATCD from PARTS
left join CATEGORIES on (CATEGORIES.CATCD=PARTS.CATCD);
I get ORA-00918: column ambiguously defined
Interestingly, SQL Server throws an error (as it should):
select id
from sysobjects s
left join syscolumns c on s.id = c.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
select id
from sysobjects
left join syscolumns on sysobjects.id = syscolumns.id
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'id'.
From my experience if you create a query like this the data result will pull CATCD from the right side of the join not the left when there is a field overlap like this.
So since this join will have all records from PARTS with only some pull through from CATEGORIES you will have NULL in the CATCD field any time there is no data on the right side.
By explicitly defining the column as from PARTS (ie left side) you will get a non null value assuming that the field has data in PARTS.
Remember that with LEFT JOIN you are only guaranteed data in fields from the left table; there may well be empty columns to the right.
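To make the NULL behaviour concrete, here is a small sketch with both CATCD columns selected side by side. It runs on SQLite through Python's sqlite3 module rather than Oracle, and the rows (including the unmatched category code 99) are invented.

```python
import sqlite3

# Invented rows: part 2 has a category code with no match in CATEGORIES.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS (PTNO INTEGER, PTNM TEXT, CATCD INTEGER);
    CREATE TABLE CATEGORIES (CATCD INTEGER);
    INSERT INTO PARTS VALUES (1, 'bolt', 10), (2, 'nut', 99);
    INSERT INTO CATEGORIES VALUES (10);
""")

# Left-side column always has its value; right-side column is NULL when unmatched.
rows = conn.execute("""
    SELECT PARTS.PTNO,
           PARTS.CATCD AS parts_catcd,
           CATEGORIES.CATCD AS categories_catcd
    FROM PARTS
    LEFT JOIN CATEGORIES ON CATEGORIES.CATCD = PARTS.CATCD
    ORDER BY PARTS.PTNO
""").fetchall()

print(rows)
```

The second row keeps 99 from PARTS but shows None (NULL) for the CATEGORIES column, which is the difference the fix in the question relies on.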
This may be a bug in the Oracle optimizer. I can reproduce the same behavior on the query with 3 tables. Intuitively it does seem that it should produce an error. If I rewrite it in either of the following ways, it does generate an error:
(1) Using old-style outer join
select ptno, catcd
from parts, categories, sections
where categories.catcd (+) = parts.catcd
and sections.seccd (+) = parts.seccd
(2) Explicitly isolating the two joins
select ptno, catcd
from (
select ptno, seccd, catcd
from parts
left join categories on (categories.CATCD=parts.CATCD)
)
left join sections on (sections.SECCD=parts.SECCD)
I used DBMS_XPLAN to get details on the execution of the query, which did show something interesting. The plan is basically to outer join PARTS and CATEGORIES, project that result set, then outer join it to SECTIONS. The interesting part is that in the projection of the first outer join, it is only including PTNO and SECCD -- it is NOT including the CATCD from either of the first two tables. Therefore the final result is getting CATCD from the third table.
But I don't know whether this is a cause or an effect.
I'm afraid I can't tell you why you're not getting an exception, but I can postulate as to why it chose CATEGORIES' version of the column over PARTS' version.
As far as I understood, in the case of left joins, the "main" table in the query (PARTS) has precedence in ambiguity
It's not clear whether by "main" you mean simply the left table in a left join, or the "driving" table, as you see the query conceptually... But in either case, what you see as the "main" table in the query as you've written it will not necessarily be the "main" table in the actual execution of that query.
My guess is that Oracle is simply using the column from the first table it hits in executing the query. And since most individual operations in SQL do not require one table to be hit before the other, the DBMS will decide at parse time which is the most efficient one to scan first. Try getting an execution plan for the query. I suspect it may reveal that it's hitting CATEGORIES first and then PARTS.
I am using Oracle 9.2.0.8.0. and it does give the error "ORA-00918: column ambiguously defined".
This is a known bug with some Oracle versions when using ANSI-style joins. The correct behavior would be to get an ORA-00918 error.
It's always best to specify your table names anyway; that way your queries don't break when you happen to add a new column with a name that is also used in another table.
It is generally advised to be specific and fully qualify all column names anyway, as it saves the optimizer a little work. Certainly in SQL Server.
From what I can glean from the Oracle docs, it seems it will only throw if you select the column name twice in the select list, or once in the select list and then again elsewhere like an order by clause.
Perhaps you have uncovered an 'undocumented feature' :)
Like HollyStyles, I cannot find anything in the Oracle docs which can explain what you are seeing.
PostgreSQL, DB2, MySQL and MSSQL all refuse to run the first query, as it's ambiguous.
@Pat: I get the same error here for your query. My query is just a little bit more complicated than what I originally posted. I'm working on a reproducible simple example now.
A bigger question you should be asking yourself is - why do I have a category code in the parts table that doesn't exist in the categories table?
This is a bug in Oracle 9i. If you join more than 2 tables using ANSI notation, it will not detect ambiguities in column names, and can return the wrong column if an alias isn't used.
As has been mentioned already, it is fixed in 10g, so if an alias isn't used, an error will be returned.