SQL join syntax, when do you use "AS"? - sql

I am working on an SQL statement and just wanted to clarify my understanding of join syntax in MS SQL.
Say I have the SQL...
select t.year from HOUSE AS h
LEFT OUTER JOIN SUBJECT K
ON H.Name = K.Name
INNER JOIN TESTCASE AS T
ON k.year IS NOT NULL
Sorry for the confusing example but basically - why am I able to use LEFT OUTER JOIN SUBJECT K without the keyword AS whereas with an INNER JOIN, I need to use a keyword of AS for INNER JOIN TESTCASE AS T?

'AS' is not required in either of these cases, but I prefer it personally from the point of view of readability, as it conveys exactly what you were meaning to do.

As far as I know the AS is only for easy coding. You can create a smaller or more clear name for the table. So House as H and further in the query you can use H.Name instead of typing House.Name

I believe 'AS' used to be a required keyword, but is no longer needed, only for readability. I find 'As' is often used when creating a name for a data item in the SELECT, but not when giving a name to a table in a JOIN. For example:
SELECT t.ID as TestID FROM [House] h JOIN [TestCase] t ON t.ID=h.TestID
I think this is probably because a set of data items displayed with no 'As' could become confusing as to how many items there are, and how many are just names applied to them.

interesting though where the AS keyword is useful... lets say two tables have name and you get this back from a query in code then it is cool to change the column name to something unique.

Related

How to do self join

I want to do self-join I have written a code as well but its throwing an error and it seems that there is some issue with an alias.
Apart from this it would also be helpful for me if someone let me know any best site where I can learn query in MS Access. I searched a number of places everywhere, it is showing through UI Interface but I want to learn query in MS Access.
(SELECT distinct itemname,vendorname,price,count(*)
from vendor_Details1
group by itemname,vendorname,price
order by vendorname) A
inner join
(SELECT distinct itemname,vendorname,price,count(*)
from vendor_Details1
group by itemname,vendorname,price
order by vendorname) B
on A.vendorname=B.vendorname
It is entirely unclear what you are trying to do. However, "join" is an operator in the FROM clause that operates on two tables, views, or subqueries.
The structure of a self-join looks like:
select . . . -- list of columns here
from t as t1 inner join -- you need aliases for the table so you can distinguish the references
t as t2
on t1.? = t2.? -- the join condition goes here
Your query doesn't even have a select.

Using "From Multiple Tables" or "Join" performance difference [duplicate]

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?
The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system:
Company
Department
Employee
Each table contain numerous rows, linked together. You got multiple companies, and each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, by throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out what parts goes together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.
The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+)
Or
WHERE a.x *= b.x
Or
WHERE a.x = b.x or a.x not in (select x from b)
Basically, when your FROM clause lists tables like so:
SELECT * FROM
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.
The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.
The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.
Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.
I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.
The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.
When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.
I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.
To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.
Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.
If both are inner joins there is no difference in the semantics or the execution of the SQL or performance. Both are ANSI Standard SQL It is purely a matter of preference, of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

Unknown column on 'on clause' in sql joins?

Can you guys help me please in understanding the SQL specifications on join. I don't understand it. I kept getting errors called Unknown column list on on clause.
I got this error over my SQL syntax, I almost rubbed it in my face I just can't understand why it is not working, I have read some article regarding that it is because of precedence etc but I am really confused on what I have done wrong here.
select product.name , product.price from product inner join product_category on
(product_category.product_no = product.product_no ) where product_category.sub_category =
"COFFIN";
I know this question have been ask a hudred and million times here, but the ones I saw are complicated nowhere close in this very basic sql syntax.
THanks for helping me out.
EDIT:
I just had realize that I have product_category not a direct child to my product table so
I just have typed
select * from product
join specifications
join product_category on ( specifications.product_no = product_category.product_no);
But this still gave me an error, unknown column product_category.
I've read and followed some instruction similarly to this sites:
MYSQL unknown clause join column in next join
Unknown column {0} in on clause
MySQL "Unknown Column in On Clause"
I am really frustrated. I really can't get it to work.
For each new table you join in your query, each table must have at least one ON clause. It's hard to know exactly what you're trying to do without knowing the schema (table names, columns, etc), but here's an example
select *
from product p
join specifications s
on p.product_no = s.product_no
join product_category pc
on pc.spec_no = p.spec_no
Check out this link on table aliases as well. Gives a good example on joins + really useful info on how to increase the readability of your SQL
http://msdn.microsoft.com/en-us/library/ms187455(v=sql.90).aspx
I found this article useful as well as it visually displays the different types of joins
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
You are missing the part where you specify the joining conditions between product and specifcations.
select * from product
join specifications YOU NEED SOMETHING HERE
join product_category on etc
I modified the SQL syntax to look like this, I have overlook a key that connects product_category onto specification so I made necessary link and it worked!!!
SELECT *
FROM product
JOIN specifications ON ( product.product_no = specifications.product_no )
JOIN product_category ON ( specifications.spec_no = product_category.spec_no )
WHERE product_category.sub_category = "COFFIN"
LIMIT 0 , 30
Also thanks for the heads up on missing joining condition on specifications. Heck this carelessness cost so much time.
Thank you so much!
The default join type is an inner join. So if you write join, the database reads inner join, and insist that you include an on clause.
If you'd like to join without a condition, specify the cross join explicitly:
select *
from product p
cross join
specifications s
inner join
product_category pc
on pc.product_no = p.product_no
left join
some_other_table sot
on 1=1
The last join, with the on 1=1 condition, is another way to do a cross join. It's subtly different in that it will return rows from the left table even if the right table is empty.
Example at SQL Fiddle.

sql join syntax

I'm kind of new to writing sql and I have a question about joins. Here's an example select:
select bb.name from big_box bb, middle_box mb, little_box lb
where lb.color = 'green' and lb.parent_box = mb and mb.parent_box = bb;
So let's say that I'm looking for the names of all the big boxes that have nested somewhere inside them a little box that's green. If I understand correctly, the above syntax is another way of getting the same results that we could get by using the 'join' keyword.
Questions: is the above select statement efficient for the task it's doing? If not, what is a better way to do it? Is the statement syntactic sugar for a join or is it actually doing something else?
If you have links to any good material on the subject I'd gladly read it, but since I don't know exactly what this technique is called I'm having trouble googling it.
You are using implicit join syntax. This is equivalent to using the JOIN keyword but it is a good idea to avoid this syntax completely and instead use explicit joins:
SELECT bb.name
FROM big_box bb
JOIN middle_box mb ON mb.parent_box = bb.id
JOIN little_box lb ON lb.parent_box = mb.id
WHERE lb.color = 'green'
You were also missing the column name in the join condition. I have guessed that the column is called id.
This type of query should be efficient if the tables are indexed correctly. In particular there should be foreign key constraints on the join conditions and an index on little_box.color.
An issue with your query is that if there are multiple green boxes inside a single box you will get duplicate rows returned. These duplicates can be removed by addding DISTINCT after SELECT.

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a a query result of all the columns in the table.
Now, if the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, a.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, don't think any of them answer the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "buy why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name, the bigger problem is inefficiency when not all columns are required in the resultset (network traffic, cpu and memory load).
Of course if you're adding columns from other tables (as is the case in this example it can be dangerous as these tables may over time have columns with matching names, select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata is requiring the same. As both are quite old (maybe better call it mature :-) DBMSes this might be historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.