Weird join on on behavior in tsql [duplicate] - sql

I recently found old code that uses JOIN JOIN ON ON instead of the more familiar JOIN ON JOIN ON syntax.
DECLARE @a TABLE (
val INT
)
DECLARE @b TABLE (
val INT
)
DECLARE @c TABLE (
val INT
)
INSERT INTO @a VALUES (1),(2),(4)
INSERT INTO @b VALUES (1),(2),(4)
INSERT INTO @c VALUES (1),(2),(4)
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val on a.val = b.val
What I find odd is that, according to the query plan, a and c are joined first, even though there is no join condition a.val = c.val.
Can anybody explain how this query is evaluated?

I would say it is a query optimizer thing. First, your query:
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val;
is the same as:
SELECT *
FROM @a AS A
JOIN ( @b AS B
JOIN @c AS C ON B.Val = C.Val
) ON A.Val = B.Val;
Second, you can use the FORCE ORDER query hint:
FORCE ORDER
When you add this query hint to your query, it tells SQL Server not to change the order of the joins when it executes the statement. It will join the tables in the exact order specified in the query.
Normally the SQL Server optimizer will rearrange your joins into the order that it thinks is optimal for your query.
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val
OPTION (FORCE ORDER);
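You will get a plan that joins the tables in the order written (the execution plan image is not reproduced here).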

Since you are joining:
@a and @b, and
@b and @c
on the same b.val column, it is logically equivalent (and can perform better) to join @a and @c first (on a.val = c.val) and then bring in the matching rows from @b for the final result set.
The join condition between @a and @c is not explicit; it is implied by the other two conditions.
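To make the implied predicate concrete, here is a minimal sketch that reuses the question's table variables and writes the join between @a and @c explicitly. The second query is an illustrative rewrite, not something the optimizer requires you to write; for inner equi-joins like these, both queries should return the same three rows.

```sql
DECLARE @a TABLE (val INT);
DECLARE @b TABLE (val INT);
DECLARE @c TABLE (val INT);
INSERT INTO @a VALUES (1),(2),(4);
INSERT INTO @b VALUES (1),(2),(4);
INSERT INTO @c VALUES (1),(2),(4);

-- Original shape: b and c are joined first, then a is joined to that result.
SELECT a.val AS a_val, b.val AS b_val, c.val AS c_val
FROM @a AS a
JOIN @b AS b
    JOIN @c AS c
    ON b.val = c.val
ON a.val = b.val;

-- Rewritten shape: a and c are joined on the implied predicate, then b is added.
SELECT a.val AS a_val, b.val AS b_val, c.val AS c_val
FROM @a AS a
JOIN @c AS c ON a.val = c.val
JOIN @b AS b ON b.val = a.val;
```

Because a.val = b.val and b.val = c.val together imply a.val = c.val, the optimizer is free to pick either shape for inner joins.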
Additional miscellaneous info:
Also, because you are joining table variables, the row estimate for each of the iterators in your execution plan (the table scans of @a, @b and @c) is most likely going to be 1.
Given that, SQL Server will most likely see no reason to join one-row tables in any particular order. So on some executions you could get @a and @b joined in the bottom branch of the execution plan, and on others you could get @a and @c.
This is all speculation, though. What is certain is that the join condition is implicit rather than explicit, which is why you can get @a and @c joined first.

Related

RIGHT JOIN in place of subselect - a genuine use case?

I have avoided RIGHT OUTER JOIN, since the same can be achieved using LEFT OUTER JOIN if you reorder the tables.
However, recently I have been working with the need to have large numbers of joins, and I often encounter a pattern where a series of INNER JOINs are LEFT JOINed to a sub select which itself contains many INNER JOINs:
SELECT *
FROM Tab_1 INNER JOIN Tab_2 INNER JOIN Tab_3...
LEFT JOIN (SELECT *
FROM Tab_4 INNER JOIN Tab_5 INNER JOIN Tab_6....
)...
The script is hard to read. I often encounter sub sub selects. Some are correlated sub-selects and performance across the board is not good (probably not only because of the way the scripts are written).
I could tidy it up in several ways, such as using common table expressions, views, staging tables, etc., but a single RIGHT JOIN could remove the need for the sub-selects. In many cases, doing so would improve performance.
In the example below, is there a way to replicate the result given by the first two SELECT statements, but using only INNER and LEFT joins?
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
-- Although we want to see all the rows in A, we only want to see rows in C that have a match in B, which must itself match A
SELECT A.Id, T.Id
FROM
@A AS A
LEFT JOIN ( SELECT *
FROM @B AS B
INNER JOIN @C AS C ON B.Id_C = C.Id) AS T ON A.Id = T.Id_A;
-- NB Right join: although B and C MUST match, we only want to see them if they also have a row in A - otherwise null.
SELECT A.Id, C.Id
FROM
@B AS B
INNER JOIN @C AS C ON B.Id_C = C.Id
RIGHT JOIN @A AS A ON B.Id_A = A.Id;
Would you rather see the long-winded sub-selects, or a RIGHT JOIN, assuming decent comments in each case?
All the articles I have ever read have said pretty much what I think about RIGHT JOINs: that they are unnecessary and confusing. Is this case strong enough to break the cultural aversion?
As @jarlh wrote, most people find reading LEFT to RIGHT much more intuitive, so it's very confusing to see RIGHT joins in the code.
In cases like this, I have sometimes found that SQL Server creates better query plans when I use OUTER APPLY in combination with a WHERE EXISTS clause, rather than a LEFT JOIN to a sub-select containing INNER JOINs.
The result is not much different from what you have in your first example:
SELECT A.Id, T.Id
FROM
@A AS A
OUTER APPLY (
SELECT C.Id FROM @C AS C
WHERE EXISTS (SELECT 1 FROM @B AS B WHERE A.Id = B.Id_A AND B.Id_C = C.Id) ) T;
I have found an answer to this question in the old scripts that I was going through - I came across this syntax which performs the same function as the RIGHT JOIN example, using LEFT JOINs (or at least I think it does - it certainly gives the correct results in the example):
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
SELECT
A.Id, C.Id
FROM
@A AS A
LEFT JOIN @B AS B
INNER JOIN @C AS C
ON C.Id = B.Id_C
ON B.Id_A = A.Id
I don't know if there is a name for this pattern, which I have not seen before in other places of work, but it seems to work like a "nested" join, allowing the LEFT JOIN to preserve rows from the later INNER JOIN.
EDIT: I have done some more research and apparently this is an ANSI SQL syntax for nesting joins, but... it does not seem to be very popular!
Descriptive Article
Relevant Stack Exchange Question and Answer
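As a sketch of what the nesting means, the ON ... ON form can be written with explicit parentheses and contrasted with the flat form, where the INNER JOIN is applied after the LEFT JOIN. The data is the same as in the example above; the parenthesized and flat queries are added here purely for illustration.

```sql
DECLARE @A TABLE (Id INT);
DECLARE @B TABLE (Id_A INT, Id_C INT);
DECLARE @C TABLE (Id INT);
INSERT @A VALUES (1),(2);
INSERT @B VALUES (1,10),(2,20),(1,20);
INSERT @C VALUES (10),(30);

-- Nested join with explicit parentheses: B and C are inner joined first,
-- then A is left joined to that result, so every row of A is preserved.
SELECT A.Id AS A_Id, C.Id AS C_Id
FROM @A AS A
LEFT JOIN (@B AS B
           INNER JOIN @C AS C ON C.Id = B.Id_C)
    ON B.Id_A = A.Id;
-- Rows: (1, 10) and (2, NULL)

-- Flat form: the INNER JOIN is applied to the result of the LEFT JOIN,
-- so rows of A without a matching C row are filtered out.
SELECT A.Id AS A_Id, C.Id AS C_Id
FROM @A AS A
LEFT JOIN @B AS B ON B.Id_A = A.Id
INNER JOIN @C AS C ON C.Id = B.Id_C;
-- Rows: (1, 10) only
```

The parentheses are optional for the parser, since the position of the ON clauses already determines the nesting, but they make the intent much easier to read.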

"On" left join order

I've read through 20+ posts with a similar title, but failed to find an answer, so apologies in advance if one is available.
I have always believed that
select * FROM A LEFT JOIN B ON A.ID = B.ID
was equivalent to
select * FROM A LEFT JOIN B ON B.ID = A.ID
but was told today that "since you have a left join, you must have it as A = B, because flipped it will act as an inner join."
Any truth to this?
Whoever told you that does not understand how JOINs and join conditions work. He/She is completely wrong.
The order of the tables matters for a left join. a left join b is different than b left join a, but the order of the join condition is meaningless.
A.ID = B.ID is the condition on which the tables are joined and returns TRUE or FALSE.
Since equality (=) is commutative, the order of the operands does not affect the result.
They are completely incorrect and it is trivial to prove.
DECLARE @A TABLE (ID INT)
DECLARE @B TABLE (ID INT)
INSERT INTO @A(ID) SELECT 1
INSERT INTO @A(ID) SELECT 2
INSERT INTO @B(ID) SELECT 1
SELECT *
FROM @A a
LEFT JOIN @B b ON a.ID=b.ID
SELECT *
FROM @A a
LEFT JOIN @B b ON b.ID=a.ID
The order of the tables matters (A LEFT JOIN B versus B LEFT JOIN A). The grouping of the join conditions matters if an OR is used, because AND binds more tightly than OR (for example A=B OR A IS NULL AND A IS NOT NULL), so always use parentheses with OR. But within a single condition (a.ID=b.ID, for example) the order of the operands doesn't matter.
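The precedence point can be checked with a small sketch. AND binds more tightly than OR in T-SQL, so an unparenthesized mix of the two is grouped around the AND first. The literal comparisons below are placeholders chosen only for illustration.

```sql
-- Parsed as: 1 = 1 OR (1 = 0 AND 1 = 0), which is true.
SELECT CASE WHEN 1 = 1 OR 1 = 0 AND 1 = 0
            THEN 'true' ELSE 'false' END AS without_parentheses;

-- Explicit grouping changes the result: (1 = 1 OR 1 = 0) AND 1 = 0 is false.
SELECT CASE WHEN (1 = 1 OR 1 = 0) AND 1 = 0
            THEN 'true' ELSE 'false' END AS with_parentheses;
```

The same grouping rules apply inside an ON clause, which is why parentheses around OR conditions are worth the extra characters.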

Where vs AND in LEFT JOIN [duplicate]

I normally don't use an AND on the same line as ON when performing a LEFT JOIN, as I have run into issues in the past. I prefer to stay out of this pickle and put any additional condition in a WHERE clause, which is reliable. But today, out of curiosity, I would like to put this question forward and be clear once and for all.
Question: What really goes on in a LEFT JOIN when I use the "extra" conditions? Why doesn't it behave in the same manner as WHERE?
SAMPLE QUERIES
create table #a
(
id int,
name varchar(3)
)
create table #b
(
id int,
name varchar(3)
)
insert into #a
select 1, 'abc'
union
select 2, 'def'
union
select 3, 'ghi'
insert into #b
select 1, 'abc'
union
select 2, 'def'
select * from #a a left join #b b on a.id = b.id
where a.id = 3
select * from #a a left join #b b on a.id = b.id
and a.id = 3
This version filters on a.id:
select *
from #a a left join
#b b
on a.id = b.id
where a.id = 3
This version does not filter on a.id:
select *
from #a a left join
#b b
on a.id = b.id and a.id = 3;
Why not? Go to the definition of the left join. It takes all rows from the first table, regardless of whether the on clause evaluates to true, false, or NULL. So, filters on the first table have no impact in a left join.
Filters on the first table should be in the where clause. Filters on the second table should be in the on clause.
You can put conditions on the right table's columns in the ON clause.
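A minimal sketch of the second-table case, using table variables with the same data as the question's temp tables (the b.name = 'abc' filter is an assumption added for illustration). The filter keeps all rows of the first table when it sits in ON, but removes the unmatched rows when it sits in WHERE.

```sql
DECLARE @a TABLE (id INT, name VARCHAR(3));
DECLARE @b TABLE (id INT, name VARCHAR(3));
INSERT INTO @a VALUES (1, 'abc'), (2, 'def'), (3, 'ghi');
INSERT INTO @b VALUES (1, 'abc'), (2, 'def');

-- Filter on the second table in ON: all three rows of @a are kept,
-- and only id 1 gets matching b columns; ids 2 and 3 get NULLs.
SELECT a.id, a.name, b.id AS b_id, b.name AS b_name
FROM @a AS a
LEFT JOIN @b AS b
    ON a.id = b.id AND b.name = 'abc';

-- Same filter in WHERE: rows where b.name is NULL fail the test,
-- so only id 1 is returned and the join behaves like an inner join.
SELECT a.id, a.name, b.id AS b_id, b.name AS b_name
FROM @a AS a
LEFT JOIN @b AS b
    ON a.id = b.id
WHERE b.name = 'abc';
```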

How do I change this statement to dynamic SQL?

I am trying to make this dynamic, but I am stuck on the set of data.
For example, the statement
SELECT * FROM A
WHERE id IN (1,2)
where 1,2 come from
SELECT id FROM B
WHERE type LIKE '%xxx%'
The statement above can return many ids.
I tried to declare @id, but I have no idea how to proceed.
Any ideas?
Thank you for any suggestions :)
SELECT * FROM A
WHERE id IN (
SELECT id FROM B
WHERE type LIKE '%xxx%')
This is called a subquery.
EDIT: When using a subquery is not an option and you want to use a variable, you can declare a table variable and join table A with it.
DECLARE @C table (id int)
INSERT @C (id)
SELECT id FROM B
WHERE type LIKE '%xxx%'
SELECT A.*
FROM A INNER JOIN @C c ON A.id = c.id
SELECT A.* FROM A
INNER JOIN B
ON A.ID = B.ID
WHERE B.type LIKE '%xxx%'
Perhaps you can just use an inner join; it should return the result set you are looking for.
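One caveat worth sketching: the join and the IN subquery only return the same rows when B.id is unique among the matching rows. The table contents below are hypothetical, and the sketch assumes a wildcard match with LIKE was intended.

```sql
DECLARE @A TABLE (id INT);
DECLARE @B TABLE (id INT, type VARCHAR(20));
INSERT INTO @A VALUES (1), (2), (3);
INSERT INTO @B VALUES (1, 'axxxb'), (1, 'xxx'), (2, 'other');

-- IN returns each matching row of @A once: id 1.
SELECT A.id
FROM @A AS A
WHERE A.id IN (SELECT B.id FROM @B AS B WHERE B.type LIKE '%xxx%');

-- The join returns one row per matching row in @B: id 1 twice.
SELECT A.id
FROM @A AS A
INNER JOIN @B AS B ON A.id = B.id
WHERE B.type LIKE '%xxx%';
```

If duplicates in B are possible and unwanted, the IN (or an EXISTS) form is the safer choice.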

Does a SQL join only execute the minimum number of conditions?

In C# if I run the following.
if(obj.a() && obj.b()){
// do something
}
Function b will only execute if a returns true. Does the same thing happen below?
select
*
from
tablea a
inner join tableb b
on isnumeric(b.col1) = 1
and cast(b.col1 as int) = a.id
Will the cast only be executed when b.col1 is numeric?
You can simulate short-circuit evaluation using a CASE expression.
ON CASE WHEN ISNUMERIC(b.col1) = 1
THEN CAST(b.col1 AS int)
ELSE NULL
END = a.id
This article covers short-circuit evaluation in SQL Server in depth:
http://www.sqlservercentral.com/articles/T-SQL/71950/
In short: the evaluation order depends on the query optimizer.
Edit: As Martin commented, this does not guarantee the order, since it could also be optimized. From the above link (I should have read it completely):
When run against a SQL Server 2000, no error is thrown, but SQL Server
2005 and 2008 implement an optimization to push non-SARGable
predicates into the index scan from the subquery which causes the
statement to fail.
To avoid this issue, the query can be rewritten incorporating a CASE
expression, maybe a bit obscure, but guaranteed not to fail.
So this should guarantee that ISNUMERIC will be evaluated first:
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN #TableB bData
ON aData.id = CASE ISNUMERIC(bData.col1) WHEN 1 THEN CAST(bData.col1 AS INT) END
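On SQL Server 2012 and later, TRY_CAST is another way to avoid the conversion error. This is an alternative added here for illustration, not something taken from the linked article. It also sidesteps a known weakness of the ISNUMERIC guard: ISNUMERIC returns 1 for some strings that cannot be cast to int, such as '$' or '1e5'. The sketch uses table variables and adds a '$' row to the sample data as an assumption.

```sql
DECLARE @TableA TABLE (id INT);
DECLARE @TableB TABLE (col1 VARCHAR(10));
INSERT INTO @TableA VALUES (1), (2), (3), (4);
INSERT INTO @TableB VALUES ('1'), ('2'), (NULL), ('4abc'), ('$');

-- TRY_CAST returns NULL instead of raising an error when the conversion fails,
-- and a comparison with NULL is never true, so those rows simply do not join.
SELECT aData.id, bData.col1
FROM @TableA AS aData
INNER JOIN @TableB AS bData
    ON aData.id = TRY_CAST(bData.col1 AS INT);
-- Rows: (1, '1') and (2, '2')
```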
Ignore my first approach (which might not work every time):
You should modify your join to ensure that it gets evaluated correctly:
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN
(
SELECT col1
FROM #TableB b
WHERE ISNUMERIC(b.col1) = 1
) AS bData
ON aData.id = CAST(bData.Col1 AS int)
Sample data:
create table #TableA(id int)
create table #TableB(col1 varchar(10))
insert into #TableA values(1);
insert into #TableA values(2);
insert into #TableA values(3);
insert into #TableA values(4);
insert into #TableB values('1');
insert into #TableB values('2');
insert into #TableB values(null);
insert into #TableB values('4abc');
SELECT aData.*,bData.*
FROM #TableA aData INNER JOIN
(
SELECT col1
FROM #TableB b
WHERE ISNUMERIC(b.col1) = 1
) AS bData
ON aData.id = CAST(bData.Col1 AS int)
drop table #TableA;
drop table #TableB;
Result:
id col1
1 1
2 2
From HERE:
No. Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions is generally performed from left to right. However, it is implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or if the results of the expressions can be determined without completely evaluating all parts of the expression.