"On" left join order - sql

I've read through 20+ posts with a similar title, but failed to find an answer, so apologies in advance if one is available.
I have always believed that
select * FROM A LEFT JOIN B ON A.ID = B.ID
was equivalent to
select * FROM A LEFT JOIN B ON B.ID = A.ID
but was told today that "since you have a left join, you must have it as A = B, because flipped it will act as an inner join."
Any truth to this?

Whoever told you that does not understand how JOINs and join conditions work. He/She is completely wrong.
The order of the tables matters for a left join. a left join b is different than b left join a, but the order of the join condition is meaningless.

A.ID = B.ID is the condition on which the tables are joined and returns TRUE or FALSE.
Since equality(=) is commutative, the order of the operands does not affect the result.
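The claim above is easy to check empirically. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in engine and two hypothetical tables a and b that mirror the question's A and B. It runs the join with the condition written both ways, then swaps the tables to show what does change the result.

```python
import sqlite3

# Hypothetical tables a and b, standing in for the question's A and B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id) VALUES (1);
""")

# Same tables, same join type; only the operand order in ON differs.
a_equals_b = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()
b_equals_a = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON b.id = a.id ORDER BY a.id"
).fetchall()

# Swapping the tables (b LEFT JOIN a) is what actually changes the result.
swapped_tables = conn.execute(
    "SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id"
).fetchall()

print(a_equals_b)      # [(1, 1), (2, None)]
print(b_equals_a)      # [(1, 1), (2, None)]
print(swapped_tables)  # [(1, 1)]
```

Both spellings of the condition keep the unmatched row from a; only reversing the table order drops it.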

They are completely incorrect and it is trivial to prove.
DECLARE @A TABLE (ID INT)
DECLARE @B TABLE (ID INT)
INSERT INTO @A(ID) SELECT 1
INSERT INTO @A(ID) SELECT 2
INSERT INTO @B(ID) SELECT 1
-- condition written as a.ID = b.ID
SELECT *
FROM @A a
LEFT JOIN @B b ON a.ID=b.ID
-- condition written as b.ID = a.ID: same result
SELECT *
FROM @A a
LEFT JOIN @B b ON b.ID=a.ID
The order of the tables matters (A LEFT JOIN B versus B LEFT JOIN A), and the grouping of conditions matters when an OR is involved (A=B OR A IS NULL AND A IS NOT NULL; always use parentheses with OR), but within a single condition (a.ID=b.ID, for example) the order of the operands doesn't matter.

Related

RIGHT JOIN in place of subselect - a genuine use case?

I have avoided RIGHT OUTER JOIN, since the same can be achieved using LEFT OUTER JOIN if you reorder the tables.
However, recently I have been working with the need to have large numbers of joins, and I often encounter a pattern where a series of INNER JOINs are LEFT JOINed to a sub select which itself contains many INNER JOINs:
SELECT *
FROM Tab_1 INNER JOIN Tab_2 INNER JOIN Tab_3...
LEFT JOIN (SELECT *
FROM Tab_4 INNER JOIN Tab_5 INNER JOIN Tab_6....
)...
The script is hard to read. I often encounter sub sub selects. Some are correlated sub-selects and performance across the board is not good (probably not only because of the way the scripts are written).
I could tidy it up in several ways, such as using common table expressions, views, staging tables etc., but a single RIGHT JOIN could remove the need for the sub-selects. In many cases, doing so would improve performance.
In the example below, is there a way to replicate the result given by the first two SELECT statements, but using only INNER and LEFT joins?
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
-- Although we want to see all the rows in A, we only want to see rows in C that have a match in B, which must itself match A
SELECT A.Id, T.Id
FROM
    @A AS A
    LEFT JOIN ( SELECT *
                FROM @B AS B
                INNER JOIN @C AS C ON B.Id_C = C.Id) AS T ON A.Id = T.Id_A;
-- NB Right join as although B and C MUST match, we only want to see them if they also have a row in A - otherwise null.
SELECT A.Id, C.Id
FROM
    @B AS B
    INNER JOIN @C AS C ON B.Id_C = C.Id
    RIGHT JOIN @A AS A ON B.Id_A = A.Id;
Would you rather see the long-winded sub-selects, or a RIGHT JOIN, assuming decent comments in each case?
All the articles I have ever read have said pretty much what I think about RIGHT JOINs: that they are unnecessary and confusing. Is this case strong enough to break the cultural aversion?
As @jarlh wrote, most people find reading LEFT to RIGHT much more intuitive, so it's very confusing to see RIGHT joins in the code.
In cases like this, I have sometimes found that SQL Server creates better query plans when I use OUTER APPLY in combination with a WHERE EXISTS clause, rather than a LEFT JOIN to a sub-select containing INNER JOINs.
The result is not much different from what you have in your first example:
SELECT A.Id, T.Id
FROM
    @A AS A
    OUTER APPLY (
        SELECT C.Id FROM @C AS C
        WHERE EXISTS (SELECT 1 FROM @B AS B WHERE A.Id = B.Id_A AND B.Id_C = C.Id) ) T;
I have found an answer to this question in the old scripts that I was going through - I came across this syntax which performs the same function as the RIGHT JOIN example, using LEFT JOINs (or at least I think it does - it certainly gives the correct results in the example):
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
SELECT
    A.Id, C.Id
FROM
    @A AS A
    LEFT JOIN @B AS B
        INNER JOIN @C AS C
        ON C.Id = B.Id_C  -- inner ON belongs to the B/C join
    ON B.Id_A = A.Id      -- outer ON belongs to the LEFT JOIN
I don't know if there is a name for this pattern, which I have not seen before in other places of work, but it seems to work like a "nested" join, allowing the LEFT JOIN to preserve rows from the later INNER JOIN.
EDIT: I have done some more research and apparently this is an ANSI SQL syntax for nesting joins, but... it does not seem to be very popular!
Descriptive Article
Relevant Stack Exchange Question and Answer

How to use oracle outer join with a filter where clause

If I write a SQL query:
select *
from a,b
where a.id=b.id(+)
and b.val="test"
and I want all records from a where the corresponding record in b either does not exist or exists with val="test", is this the correct query?
You're much better off using the ANSI syntax
SELECT *
FROM a
LEFT OUTER JOIN b ON( a.id = b.id and
b.val = 'test' )
You can do the same thing using Oracle's syntax as well but it gets a bit hinkey
SELECT *
FROM a,
b
WHERE a.id = b.id(+)
AND b.val(+) = 'test'
Note that in both cases, I'm ignoring the c table since you don't specify a join condition. And I'm assuming that you don't really want to join A to B and then generate a Cartesian product with C.
Move the condition into the JOIN clause and use the ANSI standard join pattern.
SELECT NameYourFields,...
FROM A
LEFT OUTER JOIN B
ON A.ID = B.ID
AND B.VAL = 'test'
INNER JOIN C
ON ...
A LEFT OUTER JOIN is one of the JOIN operations that allow you to specify a join clause. It preserves the unmatched rows from the first (left) table, joining them with a NULL row in the shape of the second (right) table.
So you can do as follows :
SELECT *
FROM a LEFT OUTER JOIN b
ON a.id = b.id
-- Note that you used double quotes ("test"), which are not used for varchar literals in SQL; use single quotes ('test') instead
AND b.val = 'test';
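Where the filter on b sits makes a visible difference, and a small experiment shows it. This is a sketch only, assuming SQLite via Python's sqlite3 module and hypothetical sample rows; the table and column names follow the question. The third query is one possible reading of the question's literal wording (no row in b, or a row with val = 'test'), not something the answers above propose.

```python
import sqlite3

# Hypothetical data: id 1 matches with 'test', id 2 matches with another
# value, id 3 has no row in b at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (id, val) VALUES (1, 'test'), (2, 'other');
""")

# Filter in the ON clause: every row of a survives.
filter_in_on = conn.execute("""
    SELECT a.id, b.val
    FROM a LEFT OUTER JOIN b ON a.id = b.id AND b.val = 'test'
    ORDER BY a.id
""").fetchall()

# Filter in the WHERE clause: NULL-extended rows fail the test,
# so the outer join behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT a.id, b.val
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.val = 'test'
    ORDER BY a.id
""").fetchall()

# Literal reading of the question: no b row, or a b row with 'test'.
missing_or_test = conn.execute("""
    SELECT a.id, b.val
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.id IS NULL OR b.val = 'test'
    ORDER BY a.id
""").fetchall()

print(filter_in_on)     # [(1, 'test'), (2, None), (3, None)]
print(filter_in_where)  # [(1, 'test')]
print(missing_or_test)  # [(1, 'test'), (3, None)]
```

Note that the ON-clause version still returns id 2, with b's columns set to NULL; whether that row is wanted depends on how the requirement is read.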
SELECT * FROM abc a, xyz b
WHERE a.id = b.id
AND b.val = 'test'

Do I have to do a LEFT JOIN after a RIGHT JOIN?

Say I have three tables in SQL Server 2008 R2:
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
LEFT JOIN Table_C c ON b.id = c.id
or
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
JOIN Table_C c ON b.id = c.id
Also, does it matter whether I use b.id or a.id when joining c?
I.e., instead of JOIN Table_C c ON b.id = c.id, use JOIN Table_C c ON a.id = c.id
Thank you!
If it doesn't change the semantics of the query, the database server can reorder the joins to run in whichever way it thinks is more efficient.
Usually, if you want to force a certain order, you can use inline view subqueries, as in
SELECT a.*, x.*
FROM
Table_A a
RIGHT JOIN
(
SELECT *, b.id as id2 FROM Table_B b
LEFT JOIN Table_C c ON b.id = c.id
) x
ON a.id = x.id2
According to the definitions:
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
The first option would include all rows from the first join on tables a and b even if there are no matching ones in table c, while the second statement would show only rows which match ones in table c.
Regarding the second question, I guess it would make a difference: the first join includes all ids from table b, even those with no matching row in table a, so once you change your join criterion to a.id you will get a different set of ids than with b.id.
Yes, you do need a LEFT JOIN after a RIGHT JOIN
See
http://sqlfiddle.com/#!3/2c079/5/0
http://sqlfiddle.com/#!3/2c079/6/0
If you don't, the (inner) JOIN at the end will cancel out the effect of your RIGHT JOIN.
There would be no point in using a RIGHT JOIN if you didn't care about the unmatched rows. And if you do care, you have to follow it with a LEFT JOIN.
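The three variants discussed in this thread can be compared side by side. The sketch below is an illustration under assumptions, not the original poster's schema: it uses SQLite via Python's sqlite3 module, hypothetical single-column tables, and rewrites Table_A RIGHT JOIN Table_B as the equivalent Table_B LEFT JOIN Table_A, because older SQLite versions do not support RIGHT JOIN.

```python
import sqlite3

# Hypothetical data: b has ids 1-3, a only has id 1, c has ids 1 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);
    CREATE TABLE table_c (id INTEGER);
    INSERT INTO table_a (id) VALUES (1);
    INSERT INTO table_b (id) VALUES (1), (2), (3);
    INSERT INTO table_c (id) VALUES (1), (2);
""")

# "a RIGHT JOIN b" expressed as "b LEFT JOIN a"; {tail} is the join to c.
BASE = """
    SELECT a.id, b.id, c.id
    FROM table_b b
    LEFT JOIN table_a a ON a.id = b.id
    {tail}
    ORDER BY b.id
"""

left_join_c = conn.execute(
    BASE.format(tail="LEFT JOIN table_c c ON b.id = c.id")).fetchall()
inner_join_c_on_b = conn.execute(
    BASE.format(tail="JOIN table_c c ON b.id = c.id")).fetchall()
inner_join_c_on_a = conn.execute(
    BASE.format(tail="JOIN table_c c ON a.id = c.id")).fetchall()

print(left_join_c)        # [(1, 1, 1), (None, 2, 2), (None, 3, None)]
print(inner_join_c_on_b)  # [(1, 1, 1), (None, 2, 2)]
print(inner_join_c_on_a)  # [(1, 1, 1)]
```

With this data, the inner join on b.id only drops the b row that has no match in c, while the inner join on a.id discards every row whose a side is NULL, which undoes the outer join entirely.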

Using Join based on condition

Can anyone please explain how to apply a join conditionally?
Let's say I am filtering data on the basis of a condition. If the value of a particular BIT parameter is 1, the query should include one more join; otherwise it should return the same data as before.
Here are three tables: A, B, C.
Now I want to write a proc which has a @bool bit parameter.
if @bool=0
then
select A.* from A
inner join B on B.id=A.id
and if @bool=1
then
select A.* from A
INNER JOIN B on B.id=A.id
inner join C on C.id=A.id
Thanks In Advance.
What you have will work (certainly in a SPROC in MS SQL Server anyway) with minor mods.
IF @bool = 0
    select A.* from A
    inner join B on B.id=A.id
ELSE IF @bool = 1 -- or just ELSE if @bool is limited to [0,1]
    select A.* from A
    INNER JOIN B on B.id=A.id
    inner join C on C.id=A.id
However, the caveat is that SQL parameter sniffing will cache a plan for the first path it goes down, which won't necessarily be optimal for other paths through your code.
Also, if you do take this 'multiple alternative query' approach to your procs, it is generally a good idea to ensure that the column names and types returned are identical in all cases (your query is fine because it is A.*).
Edit
Assuming that you are using SQL Server, an alternative is to use dynamic sql:
DECLARE @sql NVARCHAR(MAX)
SET @sql = N'select A.* from A
inner join B on B.id=A.id'
-- append the extra join only when the flag is set
IF @bool = 1
    SET @sql = @sql + N' inner join C on C.id=A.id'
EXEC sp_executesql @sql
If you need to add filters etc, have a look at this post: Add WHERE clauses to SQL dynamically / programmatically
select A.* from A
inner join B on B.id = A.id
left outer join C on C.id = A.id and @bool = 1
where (@bool = 1 and C.id is not null) or @bool = 0
The @bool = 1 "activates" the left outer join, so to speak, and the WHERE clause then turns it, in effect, into an inner join by requiring a match in C. If @bool = 0, the left outer join returns nothing from C and the WHERE restriction is lifted.
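The single-query approach can be tried out with a bound parameter. This is a rough sketch, assuming SQLite through Python's sqlite3 module; the named parameter :flag plays the role of @bool, and the helper ids_for and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data: ids 1 and 2 are in both a and b; only id 1 is in c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (id) VALUES (1), (2);
    INSERT INTO c (id) VALUES (1);
""")

# :flag stands in for the @bool procedure parameter.
QUERY = """
    SELECT a.id
    FROM a
    INNER JOIN b ON b.id = a.id
    LEFT OUTER JOIN c ON c.id = a.id AND :flag = 1
    WHERE (:flag = 1 AND c.id IS NOT NULL) OR :flag = 0
    ORDER BY a.id
"""


def ids_for(flag):
    """Return the a.id values selected for the given flag (0 or 1)."""
    return [row[0] for row in conn.execute(QUERY, {"flag": flag})]


print(ids_for(0))  # [1, 2]  -> behaves like A JOIN B
print(ids_for(1))  # [1]     -> behaves like A JOIN B JOIN C
```

One query text serves both cases, at the cost of a plan that has to cover both; whether that beats separate branches is something to measure on the real data.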
Try the following query
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id and @bool=1
You do it using a union:
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
WHERE bool = 0
UNION ALL
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id
WHERE bool = 1
I'm assuming that bool is stored in table A or B.

How do I find records that are not joined?

I have two tables that are joined together.
A has many B
Normally you would do:
select * from a,b where b.a_id = a.id
to get all of the records from a that have a record in b.
How do I get just the records in a that do not have anything in b?
select * from a where id not in (select a_id from b)
Or, as some other people on this thread say:
select a.* from a
left outer join b on a.id = b.a_id
where b.a_id is null
select * from a
left outer join b on a.id = b.a_id
where b.a_id is null
Another approach:
select * from a where not exists (select * from b where b.a_id = a.id)
The "exists" approach is useful if there is some other "where" clause you need to attach to the inner query.
SELECT id FROM a
EXCEPT
SELECT a_id FROM b;
You will probably get a lot better performance (than using 'not in') if you use an outer join:
select * from a left outer join b on a.id = b.a_id where b.a_id is null;
SELECT <columns>
FROM a WHERE id NOT IN (SELECT a_id FROM b)
With a single join it is pretty fast, but when we are removing records from a database which has about 50 million records and four or more joins due to foreign keys, it takes a few minutes.
It is much faster to use a WHERE NOT IN condition like this:
select a.* from a
where a.id NOT IN(SELECT DISTINCT a_id FROM b where a_id IS NOT NULL)
-- and for more joins
AND a.id NOT IN(SELECT DISTINCT a_id FROM c where a_id IS NOT NULL)
I can also recommend this approach for deleting when cascade delete is not configured.
This query takes only a few seconds.
The first approach is
select a.* from a where a.id not in (select b.ida from b)
the second approach is
select a.*
from a left outer join b on a.id = b.ida
where b.ida is null
The first approach is very expensive. The second approach is better.
With PostgreSQL 9.4, I ran EXPLAIN on both queries, and the first query has a cost of cost=0.00..1982043603.32.
The join query, by contrast, has a cost of cost=45946.77..45946.78
For example, I search for all products that are not compatible with any vehicle. I have 100k products and more than 1m compatibilities.
select count(*) from product a left outer join compatible c on a.id=c.idprod where c.idprod is null
The join query took about 5 seconds, whereas the subquery version had still not finished after 3 minutes.
Another way of writing it
select a.*
from a
left outer join b
on a.id = b.id
where b.id is null
Ouch, beaten by Nathan :)
This will protect you from nulls in the IN clause, which can cause unexpected behavior.
select * from a where id not in (select [a id] from b where [a id] is not null)
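The NULL problem mentioned above is easy to reproduce. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module and hypothetical sample rows in which one b.a_id is NULL; it compares the unguarded NOT IN with the guarded NOT IN, NOT EXISTS and LEFT JOIN ... IS NULL forms from this thread.

```python
import sqlite3

# Hypothetical data: a has ids 1-3; b references id 1 and has one NULL a_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (a_id) VALUES (1), (NULL);
""")


def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# A NULL in the subquery makes "id NOT IN (...)" evaluate to NULL (not TRUE)
# for every non-matching id, so no row passes the filter.
not_in = ids("SELECT id FROM a WHERE id NOT IN (SELECT a_id FROM b)")

# Filtering the NULLs out of the subquery restores the expected result.
not_in_guarded = ids(
    "SELECT id FROM a "
    "WHERE id NOT IN (SELECT a_id FROM b WHERE a_id IS NOT NULL)")

# NOT EXISTS and the outer-join form are unaffected by the NULL.
not_exists = ids(
    "SELECT id FROM a "
    "WHERE NOT EXISTS (SELECT * FROM b WHERE b.a_id = a.id)")
left_join_is_null = ids(
    "SELECT a.id FROM a LEFT OUTER JOIN b ON a.id = b.a_id "
    "WHERE b.a_id IS NULL")

print(not_in)             # []
print(not_in_guarded)     # [2, 3]
print(not_exists)         # [2, 3]
print(left_join_is_null)  # [2, 3]
```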