SQL INNER JOIN implemented as implicit JOIN - sql

Recently, I came across an SQL query which looked like this:
SELECT * FROM A, B WHERE A.NUM = B.NUM
To me, it seems as if this will return exactly the same as an INNER JOIN:
SELECT * FROM A INNER JOIN B ON A.NUM = B.NUM
Is there any sane reason why anyone would use a CROSS JOIN here? Edit: it seems as if most SQL applications will automatically use an INNER JOIN here.
The database is HSQLDB

The older syntax is a SQL antipattern. It should be replaced with an inner join anytime you see it. Part of why it is an antipattern is that it is impossible to tell whether a cross join was intended if the WHERE clause is omitted. This causes many accidental cross joins, especially in complex queries. Further, in some databases (especially SQL Server) the implicit outer joins do not work correctly, so people try to combine explicit and implicit joins and get bad results without even realizing it. All in all, it is a poor practice to even consider using an implicit join.

Yes, both of your statements will return the same result. Which one to use is a matter of taste. Every sane database system will use a join for both if possible; no sane optimizer will really compute a cross product in the first case.
But note that your first syntax is not a cross join. It is just an implicit notation for a join which does not specify which kind of join to use. Instead, the optimizer must check the WHERE clause to determine whether to use an inner join or a cross join: if an applicable join condition is found in the WHERE clause, this will result in an inner join. If no such condition is found, it will result in a cross join. Since your first example specifies an applicable join condition (WHERE A.NUM = B.NUM), this results in an INNER JOIN and is thus exactly equivalent to your second case.
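To make the equivalence concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of HSQLDB (an assumption made only so the example is self-contained), and the columns and rows of A and B are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the question's HSQLDB is NOT used here (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (NUM INTEGER, NAME TEXT);
    CREATE TABLE B (NUM INTEGER, LABEL TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# Implicit (comma) join with the join condition in WHERE.
implicit_rows = conn.execute(
    "SELECT * FROM A, B WHERE A.NUM = B.NUM ORDER BY A.NUM"
).fetchall()

# Explicit INNER JOIN with the same condition in ON.
explicit_rows = conn.execute(
    "SELECT * FROM A INNER JOIN B ON A.NUM = B.NUM ORDER BY A.NUM"
).fetchall()

# The same comma syntax without a WHERE clause: a full cross product.
cross_rows = conn.execute("SELECT * FROM A, B").fetchall()

print(implicit_rows == explicit_rows)       # True
print(len(implicit_rows), len(cross_rows))  # 2 9
```

The last query shows why the comma syntax is risky: forgetting the WHERE condition silently turns 2 rows into 9.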

Related

Where vs ON in outer join

I am wondering how to get better SQL performance when deciding whether to duplicate our criteria in the ON clause when they are already in the WHERE clause.
My friend claimed it is up to DB engines but I am not so sure.
Regardless of DB engines, normally, the condition in Where clause should be executed first before join, but I assume it means inner join but not outer join. Because some conditions can only be executed AFTER outer join.
For example:
Select a.*, b.*
From A a
Left outer join B on a.id = b.id
Where b.id is NULL;
The condition in Where cannot be executed before outer join.
So, I assume the whole ON clause must be executed first before where clause, and it seems the ON clause will control the size of table B (or table A if we use right outer join) before outer join. That seems not related to DB engines to me.
And that raised my question: when we use an outer join, should we always duplicate our criteria in the ON clause?
for example (I use a table to outer join with a shorter version of itself)
temp_series_installment & series_id > 18940000 vs temp_series_installment:
select sql_no_cache s.*, t.* from temp_series_installment s
left outer join temp_series_installment t on s.series_id = t.series_id and t.series_id > 18940000 and t.incomplete = 1
where t.incomplete = 1;
VS
select sql_no_cache s.*, t.* from temp_series_installment s
left outer join temp_series_installment t on s.series_id = t.series_id and t.series_id > 18940000
where t.incomplete = 1;
Edit: where t.incomplete = 1 performs the logic of: where t.series_id is not null
which is an inner join suggested by Gordon Linoff
But what I have been asking is: if it outer joins a smaller table, it should be faster, right?
I tried to see if there is any performance difference in MySQL.
But the result was not what I expected: why is the second one faster? I thought that by outer joining a smaller table, the query would be faster.
My idea is from:
https://www.ibm.com/support/knowledgecenter/en/SSZLC2_8.0.0/com.ibm.commerce.developer.doc/refs/rsdperformanceworkspaces.htm
Section:
Push predicates into the OUTER JOIN clause whenever possible
Duplicate constant condition for different tables whenever possible
Regardless of DB engines, normally, the condition in Where clause should be executed first before join, but I assume it means inner join but not outer join. Because some conditions can only be executed AFTER outer join.
This is simply not true. SQL is a descriptive language. It does not specify how the query gets executed. It only specifies what the result set looks like. The SQL compiler/optimizer determines the actual processing steps to meet the requirements described by the query.
In terms of semantics, the FROM clause is the first clause that is "evaluated". Hence, FROM is logically processed before the WHERE clause.
The rest of your question is similarly misguided. Comparison logic in the where clause, such as:
from s left join
t
on s.series_id = t.series_id and t.series_id > 18940000
where t.incomplete = 1
turns the outer join into an inner join. Hence, the logic is different from what you think is going on.
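The point about the WHERE clause can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module (an assumption; the question used MySQL), and the tiny s and t tables below are invented stand-ins for temp_series_installment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (series_id INTEGER);
    CREATE TABLE t (series_id INTEGER, incomplete INTEGER);
    INSERT INTO s VALUES (1), (2), (3);
    INSERT INTO t VALUES (1, 1), (2, 0);
""")

# Filter on t in WHERE: NULL-extended rows fail "t.incomplete = 1",
# so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT s.series_id, t.incomplete
    FROM s LEFT JOIN t ON s.series_id = t.series_id
    WHERE t.incomplete = 1
    ORDER BY s.series_id
""").fetchall()

# Same filter in ON: every row of s survives, unmatched ones get NULLs.
on_rows = conn.execute("""
    SELECT s.series_id, t.incomplete
    FROM s LEFT JOIN t ON s.series_id = t.series_id AND t.incomplete = 1
    ORDER BY s.series_id
""").fetchall()

# Plain INNER JOIN for comparison.
inner_rows = conn.execute("""
    SELECT s.series_id, t.incomplete
    FROM s INNER JOIN t ON s.series_id = t.series_id
    WHERE t.incomplete = 1
    ORDER BY s.series_id
""").fetchall()

print(where_rows)  # [(1, 1)]
print(on_rows)     # [(1, 1), (2, None), (3, None)]
print(where_rows == inner_rows)  # True
```

So the two placements of the predicate are different queries with different results, not two ways of tuning the same query.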
As Gordon Linoff pointed out, it's not true. Your friend is plain wrong.
I just want to add that developers like to think about SQL the way they think about their language of trade (C++, VB, Java), but those are procedural/imperative languages.
When you code SQL you are in another paradigm. You are just describing a function to be applied to a dataset.
Let's get your own example:
Select a.*, b.*
From A a
Left outer join B on a.id = b.id
Where b.id is NULL;
If a.Id and b.Id are NOT NULL columns, it's semantically equal to:
Select a.*, null, ..., null
From A a
where not exists (select * from B b where b.Id = a.Id)
Now try to run those two queries and profile them.
In most DBMSs I would expect both queries to run in exactly the same way.
It happens because the engine decides how to implement your "function" over the dataset.
Note the above example is the equivalent in set mathematics to:
Give me the set A minus the intersection between A and B.
Engines can decide how to implement your query because they have some tricks up their sleeves.
They have metrics about your tables, indexes, etc. and can use them to, for example, perform a join in a different order than you wrote it.
IMHO engines today are really good at finding the best way to implement the function you describe and rarely need query hints.
Of course, you can end up describing your function in a way that is too complicated, affecting how the engine decides to run it.
The art of better describing functions and sets and managing indexes is what we call query tuning.
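As a rough check of the claim that the LEFT JOIN ... IS NULL form and the NOT EXISTS form describe the same set, here is a sketch on SQLite via Python's sqlite3 module (an assumption; any engine would do). The table contents are made up, and both id columns are declared NOT NULL as the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER NOT NULL, name TEXT);
    CREATE TABLE B (id INTEGER NOT NULL);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (2), (2), (4);
""")

# Anti-join written as an outer join plus IS NULL filter.
left_join_rows = conn.execute("""
    SELECT a.id, a.name
    FROM A a
    LEFT OUTER JOIN B b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# The same set described with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT a.id, a.name
    FROM A a
    WHERE NOT EXISTS (SELECT * FROM B b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print(left_join_rows)                     # [(1, 'x'), (3, 'z')]
print(left_join_rows == not_exists_rows)  # True
```

Whether the two forms also get the same execution plan depends on the engine and has to be profiled there; this sketch only confirms that the results match.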

What if I don't use the JOIN keyword in a query?

I have a query where I am retrieving data from more than two tables. I am using the filter criteria in the WHERE clause but not using any JOIN keyword.
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d, h_claim_hdr hh, car_svc s
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
and
d.audit_nbr=hh.audit_nbr
and
s.car_svc_nbr=hh.aut_nbr
Is there a better way of writing this?
Although you are not using a JOIN keyword, your query does perform a JOIN.
A more "modern" way of writing your query (i.e. one following the ANSI SQL standard) would be as follows:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d
join
h_claim_hdr hh on d.audit_nbr=hh.audit_nbr
join
car_svc s on s.car_svc_nbr=hh.aut_nbr
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
Note that this is simply a modern syntax. It expresses the same query, and it will not impact the performance.
Note that in order for a row to appear in the output of this query, the corresponding rows must exist in all three tables (i.e. it's an inner join). If you would like to return rows of claim_detail for which no h_claim_hdr and / or car_svc existed, use left outer join instead.
A comma in the from clause is essentially the same as a cross join. You really don't want to use a cross join, unless you really know what you are doing.
Proper join syntax has several advantages. The most important of which is the ability to express other types of joins easily and compatibly across databases.
Most people would probably find this version easier to follow and maintain:
select d.proc_code, d.dos, s.svc_type
from claim_detail d join
h_claim_hdr hh
on d.bu_id = hh.bu_id and d.audit_nbr = hh.audit_nbr join
car_svc s
on d.bu_id = s.bu_id and s.car_svc_nbr = hh.aut_nbr
where d.bu_id = "$inp_bu_id" and
d.audit_nbr = "$inp_audit_nbr";
Using the WHERE clause instead of the JOIN keyword is essentially a different syntax for doing a join. I believe it is called Theta syntax, where using the JOIN clause is called ANSI syntax.
I believe ANSI syntax is almost universally recommended, and some databases require ANSI syntax for outer JOINs.
If you do not use JOIN, it will be an implicit inner join, as in your example with the join criteria in your WHERE clause. So you could be missing records. Let's say you want all records from the first table even if there is no corresponding record in the second. Your current code would only return the records from the first table that have a matching record in the second.
Joins

FULL OUTER JOIN vs. FULL JOIN

Just playing around with queries and examples to get a better understanding of joins. I'm noticing that in SQL Server 2008, the following two queries give the same results:
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
SELECT * FROM TableA
FULL JOIN TableB
ON TableA.name = TableB.name
Are these performing exactly the same action to produce the same results, or would I run into different results in a more complicated example? Is this just interchangeable terminology?
Actually they are the same. LEFT OUTER JOIN is the same as LEFT JOIN, and RIGHT OUTER JOIN is the same as RIGHT JOIN. The OUTER keyword just makes the contrast with INNER JOIN more explicit.
See this Wikipedia article for details.
Microsoft® SQL Server™ 2000 uses these SQL-92 keywords for outer joins
specified in a FROM clause:
LEFT OUTER JOIN or LEFT JOIN
RIGHT OUTER JOIN or RIGHT JOIN
FULL OUTER JOIN or FULL JOIN
From MSDN
The full outer join or full join returns all rows from both tables, matching up the rows wherever a match can be made and placing NULLs in the places where no matching row exists.
It's true that some databases recognize the OUTER keyword. Some do not.
Where it is recognized, it is usually an optional keyword.
Almost always, FULL JOIN and FULL OUTER JOIN do exactly the same thing. (I can't think of an example where they do not. Can anyone else think of one?)
This may leave you wondering, "Why would it even be a keyword if it has no meaning?" The answer boils down to programming style.
In the old days, programmers strived to make their code as compact as possible. Every character meant longer processing time. We used 1, 2, and 3 letter variables. We used 2 digit years. We eliminated all unnecessary white space. Some people still program that way. It's not about processing time anymore. It's more about fast coding.
Modern programmers are learning to use more descriptive variables and put more remarks and documentation into their code. Using extra words like OUTER make sure that other people who read the code will have an easier time understanding it. There will be less ambiguity. This style is much more readable and kinder to the people in the future who will have to maintain that code.
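The optional nature of the OUTER keyword is easy to verify. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server (an assumption), and it compares LEFT JOIN with LEFT OUTER JOIN instead of the FULL variants because older SQLite builds do not support FULL JOIN. The rows in TableA and TableB are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (name TEXT);
    CREATE TABLE TableB (name TEXT);
    INSERT INTO TableA VALUES ('Pirate'), ('Monkey'), ('Ninja');
    INSERT INTO TableB VALUES ('Ninja'), ('Robot');
""")

# Same query, with and without the optional OUTER keyword.
with_outer = conn.execute("""
    SELECT * FROM TableA
    LEFT OUTER JOIN TableB ON TableA.name = TableB.name
    ORDER BY TableA.name
""").fetchall()

without_outer = conn.execute("""
    SELECT * FROM TableA
    LEFT JOIN TableB ON TableA.name = TableB.name
    ORDER BY TableA.name
""").fetchall()

print(with_outer == without_outer)  # True
print(with_outer)  # [('Monkey', None), ('Ninja', 'Ninja'), ('Pirate', None)]
```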

Queries that implicit SQL joins can't do?

I've never learned how joins work but just using select and the where clause has been sufficient for all the queries I've done. Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN? If so, could someone please provide examples? Thanks.
Implicit joins are more than 20 years out-of-date. Why would you even consider writing code with them?
Yes, they can create problems that explicit joins don't have. Speaking about SQL Server, the left and right join implicit syntaxes are not guaranteed to return the correct results. Sometimes, they return a cross join instead of an outer join. This is a bad thing. This was true even back to SQL Server 2000 at least, and they are being phased out, so using them is an all around poor practice.
The other problem with implicit joins is that it is easy to accidentally do a cross join by forgetting one of the WHERE conditions, especially when you are joining many tables. By using explicit joins, you will get a syntax error if you forget to put in a join condition, and a cross join must be explicitly specified as such. Again, this results in queries that return incorrect values or are fixed by using DISTINCT to get rid of the cross join, which is inefficient at best.
Moreover, if you have a cross join, the maintenance developer who comes along in a year to make a change doesn't know if it was intended or not when you use implicit joins.
I believe some ORMs also now require explicit joins.
Further, if you are using implied joins because you don't understand how joins operate, chances are high that you are writing code that, in fact, does not return the correct result because you don't know how to evaluate what the correct result would be since you don't understand what a join is meant to do.
If you write SQL code of any flavor, there is no excuse for not thoroughly understanding joins.
Yes. When doing outer joins. You can read this simple article on joins. Joins are not hard to understand at all so you should start learning (and using them where appropriate) right away.
Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN?
Any time your query involves two or more tables, a join is being used. This link is great for showing the differences in joins with pictures as well as sample result sets.
If the join criteria is in the WHERE clause, then the ANSI-89 JOIN syntax is being used. The reason for the newer JOIN syntax in the ANSI-92 format, is that it made LEFT JOIN more consistent across various databases. For example, Oracle used (+) on the side that was optional while in SQL Server you had to use =*.
Implicit join syntax uses inner joins by default. It is sometimes possible to modify the implicit join syntax to specify outer joins, but it is vendor dependent in my experience (I know Oracle has the (+) notation, and I believe SQL Server uses *= ). So, I believe your question can be boiled down to understanding the differences between inner and outer joins.
We can look at a simple example of an inner vs. outer join using a simple query.
The implicit INNER join:
select a.*, b.*
from table a, table b
where a.id = b.id;
The above query will bring back ONLY rows where the 'a' row has a matching row in 'b' for its 'id' field.
The explicit OUTER JOIN:
select * from
table a LEFT OUTER JOIN table b
on a.id = b.id;
The above query will bring back EVERY row in a, whether or not it has a matching row in 'b'. If no match exists for 'b', the 'b' fields will be null.
In this case, if you wanted to bring back EVERY row in 'a' regardless of whether it had a corresponding 'b' row, you would need to use the outer join.
Like I said, depending on your database vendor, you may still be able to use the implicit join syntax and specify an outer join type. However, this ties you to that vendor. Also, any developers not familiar with that specialized syntax may have difficulty understanding your query.
Any time you want to combine the results of two tables you'll need to join them. Take for example:
Users table:
ID
FirstName
LastName
UserName
Password
and Addresses table:
ID
UserID
AddressType (residential, business, shipping, billing, etc)
Line1
Line2
City
State
Zip
where a single user could have his home AND his business address listed (or a shipping AND a billing address), or no address at all. Using a simple WHERE clause won't fetch a user with no addresses because the addresses are in a different table. In order to fetch a user's addresses now, you'll need to do a join as:
SELECT *
FROM Users
LEFT OUTER JOIN Addresses
ON Users.ID = Addresses.UserID
WHERE Users.UserName = "foo"
See http://www.w3schools.com/Sql/sql_join.asp for a little more in depth definition of the different joins and how they work.
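A minimal sketch of the Users/Addresses case, run on SQLite through Python's sqlite3 module (an assumption), with a cut-down set of columns and invented rows: user 'foo' has no address, user 'bar' has one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Addresses (
        ID INTEGER PRIMARY KEY, UserID INTEGER, AddressType TEXT, City TEXT
    );
    INSERT INTO Users VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO Addresses VALUES (10, 2, 'shipping', 'Springfield');
""")

# Implicit inner join: 'foo' has no matching address row, so nothing comes back.
implicit_rows = conn.execute("""
    SELECT Users.UserName, Addresses.City
    FROM Users, Addresses
    WHERE Users.ID = Addresses.UserID AND Users.UserName = ?
""", ("foo",)).fetchall()

# LEFT OUTER JOIN: the user is returned with NULLs for the address columns.
outer_rows = conn.execute("""
    SELECT Users.UserName, Addresses.City
    FROM Users
    LEFT OUTER JOIN Addresses ON Users.ID = Addresses.UserID
    WHERE Users.UserName = ?
""", ("foo",)).fetchall()

print(implicit_rows)  # []
print(outer_rows)     # [('foo', None)]
```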
Using Joins :
SELECT a.MainID, b.SubValue AS SubValue1, b.SubDesc AS SubDesc1, c.SubValue AS SubValue2, c.SubDesc AS SubDesc2
FROM MainTable AS a
LEFT JOIN SubValues AS b ON a.MainID = b.MainID AND b.SubTypeID = 1
LEFT JOIN SubValues AS c ON a.MainID = c.MainID AND c.SubTypeID = 2
Off-hand, I can't see a way of getting the same results as that by using a simple WHERE clause to join the tables.
Also, the syntax commonly used in WHERE clauses to do left and right joins (*= and =*) is being phased out.
Oracle supports LEFT JOIN and RIGHT JOIN using their special join operator (+) (and SQL Server used to support *= and =* on join predicates, but no longer does). But a simple FULL JOIN can't be done with implicit joins alone:
SELECT f.title, a.first_name, a.last_name
FROM film f
FULL JOIN film_actor fa ON f.film_id = fa.film_id
FULL JOIN actor a ON fa.actor_id = a.actor_id
This produces all films and their actors including all the films without actor, as well as the actors without films. To emulate this with implicit joins only, you'd need unions.
-- Inner join part
SELECT f.title, a.first_name, a.last_name
FROM film f, film_actor fa, actor a
WHERE f.film_id = fa.film_id
AND fa.actor_id = a.actor_id
-- Left join part
UNION ALL
SELECT f.title, null, null
FROM film f
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.film_id = f.film_id
)
-- Right join part
UNION ALL
SELECT null, a.first_name, a.last_name
FROM actor a
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.actor_id = a.actor_id
)
This will quickly become very inefficient both syntactically as well as from a performance perspective.

Mixing Left and right Joins? Why?

I'm doing some refactoring of legacy code I've found in a project. This is for MSSQL. The thing is, I can't understand why we're using mixed left and right joins and collating some of the joining conditions together.
My question is this: doesn't this create implicit inner joins in some places and implicit full joins in others?
I'm of the school that just about anything can be written using just left (and inner/full) or just right (and inner/full), but that's because I like to keep things simple where possible.
As an aside, we convert all this stuff to work on Oracle databases as well, so maybe there are some optimization rules that work differently with Oracle?
For instance, here's the FROM part of one of the queries:
FROM Table1
RIGHT OUTER JOIN Table2
ON Table1.T2FK = Table2.T2PK
LEFT OUTER JOIN Table3
RIGHT OUTER JOIN Table4
LEFT OUTER JOIN Table5
ON Table4.T3FK = Table5.T3FK
AND Table4.T2FK = Table5.T2FK
LEFT OUTER JOIN Table6
RIGHT OUTER JOIN Table7
ON Table6.T6PK = Table7.T6FK
LEFT OUTER JOIN Table8
RIGHT OUTER JOIN Table9
ON Table8.T8PK= Table9.T8FK
ON Table7.T9FK= Table9.T9PK
ON Table4.T7FK= Table7.T7PK
ON Table3.T3PK= Table4.T3PK
RIGHT OUTER JOIN ( SELECT *
FROM TableA
WHERE ( TableA.PK = #PK )
AND ( TableA.Date BETWEEN #StartDate
AND #EndDate )
) Table10
ON Table4.T4PK= Table10.T4FK
ON Table2.T2PK = Table4.T2PK
One thing I would do is make sure you know what results you are expecting before messing with this. Wouldn't want to "fix" it and have different results returned. Although honestly, with a query that poorly designed, I'm not sure that you are actually getting correct results right now.
To me this looks like something that someone did over time maybe even originally starting with inner joins, realizing they wouldn't work and changing to outer joins but not wanting to bother changing the order the tables were referenced in the query.
Of particular concern to me for maintenance purposes is to put the ON clauses next to the tables you are joining as well as converting all the joins to left joins rather than mixing right and left joins. Having the ON clause for table 4 and table 3 down next to table 9 makes no sense at all to me and should contribute to confusion as to what the query should actually return. You may also need to change the order of the joins in order to convert to all left joins. Personally I prefer to start with the main table that the others will join to (which appears to be table2) and then work down the food chain from there.
It could probably be converted to use all LEFT joins: I'd look at moving the right-hand table in each RIGHT join above all the existing LEFTs; then you might be able to turn every RIGHT join into a LEFT join. I'm not sure you'll get any FULL joins behind the scenes. If the query looks like it does, it might be a quirk of this specific query rather than a SQL Server "rule": the query you've provided does seem to be mixing it up in a rather confusing way.
As for Oracle optimisation, that's certainly possible. I have no experience of Oracle myself, but according to a friend who's knowledgeable in this area, Oracle (no idea what version) is/was fussy about the order of predicates. For example, with SQL Server you can write your WHERE clause so that columns are in any order and indexes will get used, but with Oracle you end up having to specify the columns in the order they appear in the index in order to get the best performance with the index. As stated, I have no idea if this is the case with newer Oracle versions, but it was the case with older ones (apparently).
Whether this explains this particular construction, I can't say. It could simply be less-than-optimal code if it's changed over the years, and a clean-up is what it's begging for.
LEFT and RIGHT join are pure syntax sugar.
Any LEFT JOIN can be transformed into a RIGHT JOIN merely by switching the sets.
Pre-9i Oracle used this construct:
WHERE table1.col(+) = table2.col
, (+) here denoting the nullable column, and LEFT and RIGHT joins could be emulated by mere switching:
WHERE table1.col = table2.col(+)
In MySQL, there is no FULL OUTER JOIN and it needs to be emulated.
Usually it is done this way:
SELECT *
FROM table1
LEFT JOIN
table2
ON table1.col = table2.col
UNION ALL
SELECT *
FROM table1
RIGHT JOIN
table2
ON table1.col = table2.col
WHERE table1.col IS NULL
, and it's more convenient to copy the JOIN and replace LEFT with RIGHT, than to swap the tables.
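The following sketch illustrates both points above: a RIGHT JOIN is a LEFT JOIN with the tables switched, and a FULL OUTER JOIN can be emulated with UNION ALL. It runs on SQLite via Python's sqlite3 module (an assumption; the answer talks about MySQL), and because older SQLite builds lack RIGHT JOIN, the second branch is written as the switched LEFT JOIN. The v1/v2 columns and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col INTEGER, v1 TEXT);
    CREATE TABLE table2 (col INTEGER, v2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y');
""")

full_rows = conn.execute("""
    -- All rows of table1, matched to table2 where possible.
    SELECT table1.col, table1.v1, table2.col, table2.v2
    FROM table1
    LEFT JOIN table2 ON table1.col = table2.col
    UNION ALL
    -- "table1 RIGHT JOIN table2" written as a LEFT JOIN with the tables
    -- switched, keeping only the table2 rows that had no match.
    SELECT table1.col, table1.v1, table2.col, table2.v2
    FROM table2
    LEFT JOIN table1 ON table1.col = table2.col
    WHERE table1.col IS NULL
""").fetchall()

# Row order of a UNION ALL without ORDER BY is not guaranteed,
# so the result is treated as an unordered collection here.
for row in full_rows:
    print(row)
```

With this data the result contains three rows: the matched pair for col = 2, the table1-only row for col = 1, and the table2-only row for col = 3.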
Note that in SQL Server plans, Hash Left Semi Join and Hash Right Semi Join are different operators.
For the query like this:
SELECT *
FROM table1
WHERE table1.col IN
(
SELECT col
FROM table2
)
, Hash Match (Left Semi Join) hashes table1 and removes the matched elements from the hash table in runtime (so that they cannot match more than one time).
Hash Match (Right Semi Join) hashes table2 and removes the duplicate elements from the hash table while building it.
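What makes this a semi join, logically, is that each table1 row is returned at most once no matter how many matches exist in table2. The sketch below shows only that result-level behaviour on SQLite via Python's sqlite3 module (an assumption); it says nothing about SQL Server's hash operators, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col INTEGER);
    CREATE TABLE table2 (col INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (2), (3), (4);
""")

# Semi join: a table1 row qualifies once, even though 2 appears twice in table2.
semi_rows = conn.execute("""
    SELECT col
    FROM table1
    WHERE table1.col IN (SELECT col FROM table2)
    ORDER BY col
""").fetchall()

# Ordinary inner join: the duplicate in table2 duplicates the output row.
join_rows = conn.execute("""
    SELECT table1.col
    FROM table1
    JOIN table2 ON table1.col = table2.col
    ORDER BY table1.col
""").fetchall()

print(semi_rows)  # [(2,), (3,)]
print(join_rows)  # [(2,), (2,), (3,)]
```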
I may be missing something here, but the only difference between LEFT and RIGHT joins is the order in which the source tables are written, so having multiple LEFT joins or multiple RIGHT joins is no different from having a mix. The equivalent of FULL OUTER joins could be achieved just as easily with all LEFT or all RIGHT joins as with a mix, n'est-ce pas?
We have some LEFT OUTER JOINs and RIGHT OUTER JOINs in the same query. Typically such queries are large, have been around a long time, probably badly written in the first place and have received infrequent maintenance. I assume the RIGHT OUTER JOINs were introduced as a means of maintaining the query without taking on the inevitable risk when refactoring a query significantly.
I think most SQL coders are most comfortable with using all LEFT OUTER JOINs, probably because a FROM clause is read left-to-right in the English way.
The only time I use a RIGHT OUTER JOIN myself is when writing a new query based on an existing query (no need to reinvent the wheel) and I need to change an INNER JOIN to an OUTER JOIN. Rather than change the order of the JOINs in the FROM clause just to be able to use a LEFT OUTER JOIN, I would instead use a RIGHT OUTER JOIN, and this would not bother me. This is quite rare though. If the original query had LEFT OUTER JOINs then I'd end up with a mix of LEFT and RIGHT OUTER JOINs, which again wouldn't bother me. Hasn't happened to me yet, though.
Note that for SQL products such as the Access database engine that do not support FULL OUTER JOIN, one workaround is to UNION a LEFT OUTER JOIN and a RIGHT OUTER JOIN in the same query.
The bottom line is that this is a very poorly formatted SQL statement and should be re-written. Many of the ON clauses are located far from their JOIN statements, which I am not sure is even valid SQL.
For clarity's sake, I would rewrite the query using all LEFT JOINS (rather than RIGHT), and locate the using statements underneath their corresponding JOIN clauses. Otherwise, this is a bit of a train wreck and is obfuscating the purpose of the query, making errors during future modifications more likely to occur.
doesn't this create implicit inner joins in some places and implicit full joins in others?
Perhaps you are assuming that because you don't see the ON clause for some joins, e.g., RIGHT OUTER JOIN Table4, but it is located down below, ON Table4.T7FK= Table7.T7PK. I don't see any implicit inner joins, which could occur if there was a WHERE clause like WHERE Table3.T3PK is not null.
The fact that you are asking questions like this is a testament to the opaqueness of the query.
To answer another portion of this question that hasn't been answered yet, the reason this query is formatted so oddly is that it was likely built using the Query Designer inside SQL Server Management Studio. The giveaway is the combined ON clauses that appear many lines after the table is mentioned. Essentially, tables get added in the query builder window and their order is kept, even if the way things are connected would favor moving a table up, so to speak, and keeping all the joins in a certain direction.