RIGHT JOIN in place of subselect - a genuine use case? - sql

I have avoided RIGHT OUTER JOIN, since the same can be achieved using LEFT OUTER JOIN if you reorder the tables.
However, recently I have been working with the need to have large numbers of joins, and I often encounter a pattern where a series of INNER JOINs are LEFT JOINed to a sub select which itself contains many INNER JOINs:
SELECT *
FROM Tab_1 INNER JOIN Tab_2 INNER JOIN Tab_3...
LEFT JOIN (SELECT *
FROM Tab_4 INNER JOIN Tab_5 INNER JOIN Tab_6....
)...
The script is hard to read. I often encounter sub sub selects. Some are correlated sub-selects and performance across the board is not good (probably not only because of the way the scripts are written).
I could tidy it up in several ways, such as using common table expressions, views, staging tables etc., but a single RIGHT JOIN could remove the need for the sub selects. In many cases, doing so would improve performance.
In the example below, is there a way to replicate the result given by the first two SELECT statements, but using only INNER and LEFT joins?
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
-- Although we want to see all the rows in A, we only want to see rows in C that have a match in B, which must itself match A
SELECT A.Id, T.Id
FROM
@A AS A
LEFT JOIN ( SELECT *
FROM @B AS B
INNER JOIN @C AS C ON B.Id_C = C.Id) AS T ON A.Id = T.Id_A;
-- NB Right join as although B and C MUST match, we only want to see them if they also have a row in A - otherwise null.
SELECT A.Id, C.Id
FROM
@B AS B
INNER JOIN @C AS C ON B.Id_C = C.Id
RIGHT JOIN @A AS A ON B.Id_A = A.Id;
Would you rather see the long-winded sub-selects, or a RIGHT JOIN, assuming decent comments in each case?
All the articles I have ever read have said pretty much what I think about RIGHT JOINs: that they are unnecessary and confusing. Is this case strong enough to break the cultural aversion?

As @jarlh wrote, most people find reading from left to right much more intuitive, so it is very confusing to see RIGHT joins in the code.
In cases like this I have sometimes found that SQL Server creates better query plans when I use OUTER APPLY in combination with a WHERE EXISTS clause, rather than a LEFT JOIN onto a sub-select that contains an INNER JOIN.
The result is not much different from what you have in your first example:
SELECT A.Id, T.Id
FROM
@A AS A
OUTER APPLY (
SELECT C.Id FROM @C AS C
WHERE EXISTS (SELECT 1 FROM @B AS B WHERE A.Id = B.Id_A AND B.Id_C = C.Id) ) T;

I have found an answer to this question in the old scripts that I was going through - I came across this syntax which performs the same function as the RIGHT JOIN example, using LEFT JOINs (or at least I think it does - it certainly gives the correct results in the example):
DECLARE @A TABLE (Id INT)
DECLARE @B TABLE (Id_A INT, Id_C INT)
DECLARE @C TABLE (Id INT)
INSERT @A VALUES (1),(2)
INSERT @B VALUES (1,10),(2,20),(1,20)
INSERT @C VALUES (10),(30)
SELECT
A.Id, C.Id
FROM
@A AS A
LEFT JOIN @B AS B
INNER JOIN @C AS C
ON C.Id = B.Id_C
ON B.Id_A = A.Id
I don't know if there is a name for this pattern, which I have not seen before in other places of work, but it seems to work like a "nested" join, allowing the LEFT JOIN to preserve rows from the later INNER JOIN.
EDIT: I have done some more research and apparently this is an ANSI SQL syntax for nesting joins, but... it does not seem to be very popular!
Descriptive Article
Relevant Stack Exchange Question and Answer
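For anyone who finds the consecutive ON clauses hard to parse, the nesting can also be written with parentheses. This is only a sketch using the same table variables as above; as far as I can tell it is the same query with the grouping made visible, not a different technique.

```sql
DECLARE @A TABLE (Id INT);
DECLARE @B TABLE (Id_A INT, Id_C INT);
DECLARE @C TABLE (Id INT);
INSERT @A VALUES (1),(2);
INSERT @B VALUES (1,10),(2,20),(1,20);
INSERT @C VALUES (10),(30);

-- B and C are inner joined first; that combined result is then LEFT JOINed to A,
-- so every row of A is kept even when no B/C pair matches it.
SELECT A.Id, C.Id
FROM @A AS A
LEFT JOIN (
    @B AS B
    INNER JOIN @C AS C ON C.Id = B.Id_C
) ON B.Id_A = A.Id;
-- With the sample data this returns (1, 10) and (2, NULL).
```

The parentheses are optional for the parser, but they make it obvious which ON clause belongs to which join.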

Related

SQL Intersect VS Inner Join [duplicate]

I understand that INNER JOIN is made for referenced keys and INTERSECT is not. But as far as I know, in some cases both of them can do the same thing. So, is there a difference (in performance or anything else) between the following two expressions? And if there is, which one is better?
Expression 1:
SELECT id FROM customers
INNER JOIN orders ON customers.id = orders.customerID;
Expression 2:
SELECT id FROM customers
INTERSECT
SELECT customerID FROM orders
They are very different, even in your case.
The INNER JOIN will return duplicates if id is duplicated in either table. INTERSECT removes duplicates. The INNER JOIN will never return NULL, because NULL = NULL does not evaluate to true, but INTERSECT treats NULLs as equal and will return NULL if it appears on both sides.
The two are very different; INNER JOIN is an operator that generally matches on a limited set of columns and can return zero or more rows from either table. INTERSECT is a set-based operator that compares complete rows between two sets and can never return more rows than there are in the smaller table.
Try the following, for example:
CREATE TABLE #a (id INT)
CREATE TABLE #b (id INT)
INSERT INTO #a VALUES (1), (NULL), (2)
INSERT INTO #b VALUES (1), (NULL), (3), (1)
SELECT a.id FROM #a a
INNER JOIN #b b ON a.id = b.id
SELECT id FROM #a
INTERSECT
SELECT id FROM #b
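To make the two differences concrete, here is a rough sketch of what it takes to get the INNER JOIN to return the same rows as the INTERSECT for this sample data. It assumes a fresh session (it creates #a and #b itself), and it is meant as an illustration of the semantics, not as a recommended replacement for INTERSECT.

```sql
CREATE TABLE #a (id INT);
CREATE TABLE #b (id INT);
INSERT INTO #a VALUES (1), (NULL), (2);
INSERT INTO #b VALUES (1), (NULL), (3), (1);

-- Plain INNER JOIN: 1 comes back twice (it matches two rows in #b)
-- and NULL never matches, because NULL = NULL is not true.
SELECT a.id
FROM #a AS a
INNER JOIN #b AS b ON a.id = b.id;

-- DISTINCT removes the duplicates, and the IS NULL branch treats NULLs as equal.
-- With this data it returns 1 and NULL, the same rows as the INTERSECT.
SELECT DISTINCT a.id
FROM #a AS a
INNER JOIN #b AS b
    ON a.id = b.id
    OR (a.id IS NULL AND b.id IS NULL);

DROP TABLE #a;
DROP TABLE #b;
```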

"On" left join order

I've read through 20+ posts with a similar title, but failed to find an answer, so apologies in advance if one is available.
I have always believed that
select * FROM A LEFT JOIN B ON A.ID = B.ID
was equivalent to
select * FROM A LEFT JOIN B ON B.ID = A.ID
but was told today that "since you have a left join, you must have it as A = B, because flipped it will act as an inner join."
Any truth to this?
Whoever told you that does not understand how JOINs and join conditions work. He/She is completely wrong.
The order of the tables matters for a left join. a left join b is different than b left join a, but the order of the join condition is meaningless.
A.ID = B.ID is the condition on which the tables are joined and returns TRUE or FALSE.
Since equality(=) is commutative, the order of the operands does not affect the result.
They are completely incorrect and it is trivial to prove.
DECLARE @A TABLE (ID INT)
DECLARE @B TABLE (ID INT)
INSERT INTO @A(ID) SELECT 1
INSERT INTO @A(ID) SELECT 2
INSERT INTO @B(ID) SELECT 1
SELECT *
FROM @A a
LEFT JOIN @B b ON a.ID=b.ID
SELECT *
FROM @A a
LEFT JOIN @B b ON b.ID=a.ID
The order of the tables matters (A LEFT JOIN B versus B LEFT JOIN A), and the grouping of join conditions matters if an OR is used, because AND binds more tightly than OR (A=B OR A IS NULL AND A IS NOT NULL - always use parentheses with OR), but within a single condition (a.ID=b.ID for example) the order of the operands doesn't matter.
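A small sketch of the OR point, since it is the one case where how you write the ON clause really does change the result. The Flag column and the sample rows are made up purely for illustration.

```sql
DECLARE @A TABLE (ID INT, Flag INT);  -- Flag is a hypothetical column for this demo
DECLARE @B TABLE (ID INT);
INSERT INTO @A (ID, Flag) VALUES (1, 1), (2, 0);
INSERT INTO @B (ID) VALUES (1), (2);

-- AND binds more tightly than OR, so this is read as:
--   a.ID = b.ID OR (a.Flag = 1 AND b.ID = 2)
SELECT a.ID AS A_ID, b.ID AS B_ID
FROM @A AS a
LEFT JOIN @B AS b ON a.ID = b.ID OR a.Flag = 1 AND b.ID = 2;
-- Returns 3 rows (in no guaranteed order): (1,1), (1,2), (2,2)

-- Explicit parentheses change the grouping and therefore the result.
SELECT a.ID AS A_ID, b.ID AS B_ID
FROM @A AS a
LEFT JOIN @B AS b ON (a.ID = b.ID OR a.Flag = 1) AND b.ID = 2;
-- Returns 2 rows (in no guaranteed order): (1,2), (2,2)
```

Swapping the two sides of any single comparison in either query (b.ID = a.ID instead of a.ID = b.ID) leaves both results unchanged.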

Weird join on on behavior in tsql [duplicate]

I recently found old code that uses JOIN JOIN ON ON instead of the more familiar JOIN ON JOIN ON syntax.
DECLARE @a TABLE (
val INT
)
DECLARE @b TABLE (
val INT
)
DECLARE @c TABLE (
val INT
)
INSERT INTO @a VALUES (1),(2),(4)
INSERT INTO @b VALUES (1),(2),(4)
INSERT INTO @c VALUES (1),(2),(4)
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val on a.val = b.val
What I find weird now is that if you consult the query plan, a and c are joined first, even though there is no join condition a.val = c.val.
Can anybody explain the implicit evaluation of this case?
I would say it is a query optimizer thing. First, your query:
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val;
Is the same as:
SELECT *
FROM @a AS A
JOIN ( @b AS B
JOIN @c AS C ON B.Val = C.Val
) ON A.Val = B.Val;
Second, if you use the hint:
FORCE ORDER
When you put this query hint on your query, it tells SQL Server not to change the order of the joins when it executes the statement. It will join the tables in the exact order that is specified in the query.
Normally the SQL Server optimizer will rearrange your joins into the order that it thinks will be optimal for your query.
SELECT *
FROM @a as a
join @b as b
join @c as c
on b.val = c.val
on a.val = b.val
OPTION (FORCE ORDER);
You will get: (execution plan image not preserved)
Since you are joining:
@a and @b and
@b and @c
on the same b.val column, it's equivalent (and, as far as the optimizer is concerned, cheaper) to join @a and @c together directly (on a.val = c.val) and then bring in everything from @b for the final result set.
Your join condition between @a and @c is not explicit, but implicit.
Additional miscellaneous info:
Also, because you are joining table variables, the row estimates for each of the iterators in your execution plan (the table scans of @a, @b and @c) are most likely going to be 1.
Given that, SQL Server will most likely see no reason to join 1-row tables in any particular order. So on some executions you could get @a and @b joined in the bottom branch of the execution plan, and on others you could get @a and @c.
But this is all speculation. What is certain is that the join condition between @a and @c is implied rather than written out, which is why you can get @a and @c joined first.
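To see the implied condition for yourself, you can write the a-to-c join out by hand. This is a hand-written rewrite for illustration only, under the assumption that all three joins are inner joins on equality; it is not a claim about the exact transformation the optimizer performs internally.

```sql
DECLARE @a TABLE (val INT);
DECLARE @b TABLE (val INT);
DECLARE @c TABLE (val INT);
INSERT INTO @a VALUES (1),(2),(4);
INSERT INTO @b VALUES (1),(2),(4);
INSERT INTO @c VALUES (1),(2),(4);

-- a.val = b.val and b.val = c.val together imply a.val = c.val,
-- so for inner joins it is logically valid to join a to c first
-- and bring b in afterwards.
SELECT a.val AS a_val, b.val AS b_val, c.val AS c_val
FROM @a AS a
JOIN @c AS c ON a.val = c.val
JOIN @b AS b ON b.val = a.val;
-- Returns the same three rows as the original query: (1,1,1), (2,2,2), (4,4,4)
```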

Using Join based on condition

Can anyone please explain how to apply a join conditionally?
Let's say I am filtering data based on a condition. If a particular BIT parameter's value is 1, the data set should include one more join; otherwise it should return the same data as before.
There are three tables: A, B and C.
Now I want to write a proc that has a @bool BIT parameter.
If @bool = 0
then
select A.* from A
inner join B on B.id=A.id
and if @bool = 1
then
select A.* from A
INNER JOIN B on B.id=A.id
inner join C on C.id=A.id
Thanks In Advance.
What you have will work (certainly in a SPROC in MS SQL Server anyway) with minor mods.
IF @bool = 0
select A.* from A
inner join B on B.id=A.id
ELSE IF @bool = 1 -- or just ELSE, if @bool is limited to [0,1]
select A.* from A
INNER JOIN B on B.id=A.id
inner join C on C.id=A.id
However, the caveat is that SQL parameter sniffing will cache a plan for the first path it goes down, which won't necessarily be optimal for other paths through your code.
Also, if you do take this 'multiple alternative query' approach to your procs, it is generally a good idea to ensure that the column names and types returned are identical in all cases (your query is fine because it is A.*).
Edit
Assuming that you are using SQL Server, an alternative is to use dynamic sql:
DECLARE @sql NVARCHAR(MAX)
SET @sql = N'select A.* from A
inner join B on B.id=A.id'
IF @bool = 1
SET @sql = @sql + N' inner join C on C.id=A.id'
EXEC sp_executesql @sql
If you need to add filters etc, have a look at this post: Add WHERE clauses to SQL dynamically / programmatically
select A.* from A
inner join B on B.id = A.id
left outer join C on C.id = A.id and @bool = 1
where (@bool = 1 and C.id is not null) or @bool = 0
The @bool = 1 "activates" the left outer join, so to speak, and turns it, in effect, into an inner join by applying it in the WHERE clause, too. If @bool = 0 then the left outer join returns nothing from C and removes the WHERE restriction.
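If nothing from C is needed in the output (the question only selects A.*), a similar effect can be sketched with EXISTS instead of a join. Note this is a different technique from the LEFT JOIN above, and it is not identical to an INNER JOIN: if C contains duplicate ids, the join would repeat rows of A, while EXISTS would not. The table variables and sample rows are placeholders standing in for the real A, B and C.

```sql
DECLARE @A TABLE (id INT);
DECLARE @B TABLE (id INT);
DECLARE @C TABLE (id INT);
DECLARE @bool BIT = 1;  -- stands in for the procedure parameter

INSERT INTO @A VALUES (1), (2);
INSERT INTO @B VALUES (1), (2);
INSERT INTO @C VALUES (1);

-- When @bool = 0 the EXISTS check is irrelevant and every A/B match is returned.
-- When @bool = 1 only rows of A that also have a match in C are returned.
SELECT A.*
FROM @A AS A
INNER JOIN @B AS B ON B.id = A.id
WHERE @bool = 0
   OR EXISTS (SELECT 1 FROM @C AS C WHERE C.id = A.id);
-- With @bool = 1 this returns id 1; with @bool = 0 it returns ids 1 and 2.
```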
Try the following query
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id and @bool=1
-- Note: because C is INNER JOINed, this returns no rows at all when @bool = 0
You do it using a union:
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
WHERE bool = 0
UNION ALL
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id
WHERE bool = 1
I'm assuming that bool is stored in table A or B.