SQL and implicit join - sql

w3schools.com has the following SQL statement example:
SELECT
Product_Orders.OrderID, Persons.LastName, Persons.FirstName
FROM
Persons,
Product_Orders
WHERE
Persons.LastName='Hansen' AND Persons.FirstName='Ola'
What is not clear to me is how the product order table is joined with the persons table. Is this SQL proper, and what does it imply about the result set sent back?

It's technically proper, though I would avoid using that type of syntax for a couple reasons which I won't go into right here.
That syntax is colloquially called the "old style" join syntax and is equivalent to:
SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName
FROM Persons
CROSS JOIN Product_Orders
WHERE Persons.LastName='Hansen' AND Persons.FirstName='Ola'
The answer of how they "join" together is that they don't. The result of the query is the cross product of the entire Product_Orders table with the "Ola Hanson" row(s) in the Persons table. It doesn't make much sense in most of the use cases I can think of, to be quite honest with you.
The really strange thing about the results you'll get here is that the OrderID from the Product_Orders table will not necessarily align with the person on the order. It looks to be a mistake of omission on the page -- though to be fair the example intends to demonstrate aliasing, not joins.
W3Schools doesn't have a good reputation around here.

Firs of all you have to understand how Joins works. From you query:
A cross product is done between the two tables supplied in the SQL statement, you can see this from this FROM Persons, Product_Orders.
For every record in Persons table is joined to each record in the Product_Orders table. This will give you a huge temp table if both tables are big.
Then your filter i.e. the qualifying condition will be applied to this temp table to give you your result.
I hope this helps.

Related

Natural Join Explanation

I am a student trying to learn Microsoft SQL and I am frustrated that I cannot understand how natural join works. I have a problem with a solution but I cannot understand how this solution was created through natural join. My friend sent my his old solution but cant remember how he got the result. I really want to figure out how this natural join works. Can someone please explain how this answer was achieved?
The question taken from "Database Design, Application, and Administration 6th Ed.:
Show the result of a natural join that combines the Customer and OrderTbl tables.
Edit: The book doesn't give a query statement on how to get the results for the natural join in the chapters I have read so far. Which makes things much more confusing for me as I cannot simply add the table and send a query.
Edit2: The reason why some entries in the order table have null values is because the question is implying that some orders were processed through the internet. A friend of mine told me that was the reason why he got the question right opposed to some people who argued against it. :S
If I read your question correctly, you are trying to learn two concepts at the same time. The first is INNER JOIN (sometimes called equijoin). The second is natural join.
It's easier to explain natural join, assuming you already know how INNER JOIN works. A natural join is simply an equijoin where the column names indicate to you what the join condition ought to be. In your case, the fact that CustNo appears in both tables is the only clue you need in order to devise the correct join condition. You also include the join field only once in the result.
Column names are actually quite arbitrary, and could have been made very different in this case. for example, if the column Customer.CustNo had been named Customer.ID instead, you wouldn't be able to do a natural join.
for a correct solution in your case, see the answer provided by JamieC.
If you simply want the query which will result in the final table in your question here it is,
SELECT
o.OrdNo,
o.OrdDate,
o.EmpNo,
o.CustNo,
c.CustFirstName
c.CustLastName,
c.CustCity,
c.CustSatate,
c.CustZip,
c.CustBal
FROM OrderTbl o
INNER JOIN Customer c
ON o.CustNo = c.CustNo
ORDER BY c.CustNo
So, by way of explanation; this query selects all data from Customer and OrderTbl joining the two using CustNo which is the primary key (presumably) in Customer and a foreign key in OrderTbl. The ordering of the result is a little more tricky, and based almost purely on guesswork, I suspect the result is ordered by CustNo as well.
The Employee table does not feature at all in the result, however as the OrderTbl table has some blanks for EmpNo, you would almost certainly want a LEFT JOIN/RIGHT JOIN (as appropriate) if you wanted to retrieve any information about the employee from the orders table.
MS SQL Server doesn't support NATURAL JOIN. However, if you were using a platform that would support it, a simple:
SELECT * FROM Customer NATURAL JOIN OrderTbl;
should do the trick.
https://en.wikipedia.org/wiki/Join_(SQL)#Natural_join is quite good.

Do subselects do an implicit join?

I have a sql query that seems to work but I dont really understand why. Therefore I would very much appreciate if someone could help explain whats going on:
THE QUERY RETURNS: All organisations that dont have any comments that were not created by the consultant who created the organisation record.
SELECT \"organisations\".*
FROM \"organisations\"
WHERE \"organisations\".\"id\" NOT IN
(SELECT \"comments\".\"commentable_id\"
FROM \"comments\"
WHERE \"comments\".\"commentable_type\" = 'Organisation'
AND (comments.author_id != organisations.consultant_id)
ORDER BY \"comments\".\"created_at\" ASC
)
It seems to do so correctly.
The part I dont understand is why (comments.author_id != organisations.consultant_id) is working!? I dont understand how postgres even knows what "organisations" is inside that subselect? It is not defined in here.
If this was written as a join where I had joined comments to organisations then I would totally understand how you could do something like this but in this case its a subselect. How does it know how to map the comments and organisations table and exclude the ones where (comments.author_id != organisations.consultant_id)
That subselect happens in a row so it can see all columns of that row. You will probably get better performance with this
select organisations.*
from organisations
where not exists (
select 1
from comments
where
commentable_type = 'organisation' and
author_id != organisations.consultant_id
)
Notice that it is not necessary to qualify commentable_type since the one in comments has priority over any other outside the subselect. And if comments does not have a consultant_id column then it would be possible to take its qualifier out, although not recommended for better legibility.
The order by in your query buys you nothing, just added cost.
You are running a correlated subquery. http://technet.microsoft.com/en-us/library/ms187638(v=sql.105).aspx
This is commonly used in all databases. A subquery in the WHERE clause can refer to tables used in the parent query, and they often do.
That being said, your current query could likely be written better.
Here is one way, using an outer join with comments, where no matches are found based on your criteria -
select o.*
from organizations o
left join comments c
on c.commentable_type <> 'Organisation'
and c.author_id = o.consultant_id
where c.commentable_id is null

Joins vs Join like queries - Are they equivalent?

I expect this question has been asked multiple times, with different twists.. I want to try and get a generic and comprehensive understanding of this topic though. (does it belong in programming SO? ..)
Lets say I have a table for sports and a table for matches. matches, among other fields has a sport_id column, and this is a 1:many relationship.
Lets say I want to list out sports which have matches on day X. I could do this in 3 ways that I can think of..
Nested queries - easy to reason?
SELECT *
FROM sports
WHERE id IN (SELECT sport_id FROM matches WHERE <DATE CHECK>)
From/where - easy to write?
SELECT sports.*
FROM sports, matches
WHERE sports.id = matches.sport_id
AND <DATE CHECK>
Joins - I am not too familiar, so forgive any mistakes
SELECT *
FROM sports
JOIN matches ON sports.id = matches.sport_id
WHERE <DATE CHECK>
There might be other methods based on variations of Join which might be better suited here, inner join perhaps..
What I want to know is how I can compare these 3 on the basis of
Equivalent response (same rows returned?)
Performance on DB
Are all of them 1 query/network call or ?
Are any of these answers db engine dependent?
How might I choose among these?
Is #2 syntactic sugar for #3? is #1? Or are they optimized to #3 on some/all cases?
The second and third forms are completely equivalent (except you have an extra comma in the third version). FROM sports, matches is an implicit join, FROM sports JOIN matches is an explicit join. Implicit joins are the earlier form, explicit joins are more modern and generally preferred by database experts.
The version with WHERE IN is almost the same, but there are some differences. First, SELECT * will return columns from both tables in the join, but will only return columns from sports in the WHERE IN query. Second, if a row in sports matches multiple rows in matches, the joins will return a row for each pair of matches (it performs a cross product), while WHERE IN will just return the row from sports once regardless of how many matches there are.
Performance differences are implementation dependent. There shouldn't be any difference between the explicit and implicit joins, they're just syntactic sugar. However, databases don't always optimize the WHERE IN queries the same. For instance, when I've used EXPLAIN with MySQL, WHERE IN queries often perform a full scan over the outer table, matching the column against the index of the table in the subquery, even though the subquery might only return a small number of rows. I think some people have told me that recent MySQL versions are better at this.
They will all be just 1 network call. All queries are just a single call to the database server.
BTW, there's another form that you didn't list, using WHERE EXISTS with a correlated subquery.
SELECT *
FROM sports s
WHERE EXISTS (SELECT 1
FROM matches m
WHERE s.id = m.sport_id AND <DATE CHECK>)
Performance differences between this and JOIN will again be implementation-dependent.
Here is what I think about your questions
1.Equivalent response (same rows returned?)
for first QUERY where you usered IN Oprator my answer is NO (you get same number of rows but only columns from table sports )
and second and third is almost same
2.Performance on DB
First In oprator is Slower then join beacause
The IN is evaluated (and the select from b re-run) for each row in a, whereas the JOIN is optimized to use indices and other neat paging tricks...
ANSI JOIN Syntax
SELECT fname, lname, department
FROM names INNER JOIN departments ON names.employeeid = departments.employeeid
Former Microsoft JOIN Syntax
SELECT fname, lname, department
FROM names, departments
WHERE names.employeeid = departments.employeeid
If written correctly, either format will produce identical results. But that is a big if. The older Microsoft join syntax lends itself to mistakes because the syntax is a little less obvious. On the other hand, the ANSI syntax is very explicit and there is little chance you can make a mistake.
3.Are all of them 1 query/network call or ?
-Trial 1 result for IN
-Trial 2 result for Microsoft JOIN,
-Trial 3 result for ANSI JOIN
4.Are any of these answers db engine dependant?
(Sorry I didt got ans for this question)
5.How might I choose among these?
I sugges you shuold use ANSI JOIN
6.Is #2 syntactic sugar for #3? is #1? Or are they optimized to #3 on some/all cases?
-I think NO as I mentioned above #3 syntex is more batter
as per my past experience
I ran across a slow-performing query from an ERP program. After reviewing the code, which used the Microsoft JOIN syntax, I noticed that instead of creating a LEFT JOIN, the developer had accidentally created a CROSS JOIN instead. In this particular example, less than 10,000 rows should have resulted from the LEFT JOIN, but because a CROSS JOIN was used, over 11 million rows were returned instead. Then the developer used a SELECT DISTINCT to get rid of all the unnecessary rows created by the CROSS JOIN. As you can guess, this made for a very lengthy query. I notified the vendor’s support department about it, and they fixed their code.
The moral of this story is that you probably should be using the ANSI syntax, not the old Microsoft syntax. Besides reducing the odds of making silly mistakes, this code is more portable between database, and eventually, I imagine Microsoft will eventually stop supporting the old format, making the ANSI syntax the only option

INNER JOIN vs multiple table names in "FROM" [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a a query result of all the columns in the table.
Now, if the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, a.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, don't think any of them answer the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "buy why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name, the bigger problem is inefficiency when not all columns are required in the resultset (network traffic, cpu and memory load).
Of course if you're adding columns from other tables (as is the case in this example it can be dangerous as these tables may over time have columns with matching names, select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata is requiring the same. As both are quite old (maybe better call it mature :-) DBMSes this might be historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.