How do JOINs distribute over OUTER JOINs? - sql

Are there any cases where these two are not equivalent?
A OUTER JOIN (B JOIN C)
A OUTER JOIN C OUTER JOIN B

Yes:
In your first example (A OUTER JOIN (B JOIN C)), if either B or C does not have a matching record, both B and C are omitted.
In your second example (A OUTER JOIN C OUTER JOIN B), C can be returned even if B does not have a matching record.
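The difference described above can be checked with a small sketch. It assumes a hypothetical schema in which c refers to a through `a_id` and b refers to c through `c_id` (the question does not give join columns), uses LEFT JOIN as the outer join, and runs on SQLite through Python's `sqlite3` module.

```python
import sqlite3

# Hypothetical schema: c references a, and b references c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE c (id INTEGER, a_id INTEGER);
    CREATE TABLE b (id INTEGER, c_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO c VALUES (10, 1), (20, 2);
    INSERT INTO b VALUES (100, 10);   -- c.id = 20 has no matching b row
""")

# A OUTER JOIN (C JOIN B): c is only visible when it also has a b row.
nested = conn.execute("""
    SELECT a.id, c.id, b.id
    FROM a
    LEFT JOIN (c JOIN b ON b.c_id = c.id) ON c.a_id = a.id
    ORDER BY a.id
""").fetchall()

# A OUTER JOIN C OUTER JOIN B: c is returned even without a b row.
chained = conn.execute("""
    SELECT a.id, c.id, b.id
    FROM a
    LEFT JOIN c ON c.a_id = a.id
    LEFT JOIN b ON b.c_id = c.id
    ORDER BY a.id
""").fetchall()

print(nested)   # [(1, 10, 100), (2, None, None)]
print(chained)  # [(1, 10, 100), (2, 20, None)]
```

For a.id = 2 the nested form returns NULL for c because c.id = 20 was removed by the inner join before the outer join ran, while the chained form still returns c.id = 20.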

Still same as my comment. If B join C produces an empty result set, then A outer join "empty result set" is the same as just A.
A outer join B outer join C is something different (at least if one of B and C is not empty).

Since A seems to connect to C, it would be clearer to write the first option as:
A OUTER JOIN (C JOIN B)
The key difference between your two options is whether data from C is returned when there is a match between A and C. Just looking at this aspect the two options could be viewed as the sets:
intersect(A,intersect(C,B))
intersect(A,C)
Clearly, the two are different since the first form can eliminate rows from C before it is intersected with A.

What fields you join on can make a difference, as tables are not guaranteed to have only one possible field to join to another table on. And in this case, does table b have a field to join to either table c or table a? That makes a difference. What fields you want returned makes a difference to the result set and to whether two queries will return the same results. The state of the data makes a difference, as some queries will appear to be equivalent until the data changes. So understanding that these are not equivalent queries helps you avoid these mistakes. Whether you use a full, left or right outer join makes a difference as well. And finally, what WHERE clauses you add can make a difference in whether they appear to be equivalent.
Check out these examples using temp tables (SQL Server syntax):
create table #a (aid int, sometext varchar(50))
create table #b (bid int, sometext2 varchar(50), cid int, aid int)
create table #c (cid int, sometext3 varchar(50), aid int)
insert into #a
values(1, 'test') , (2, 'test2'), (3, 'test3')
insert into #b
values(1, 'test', 1, 2) , (2, 'test2', 2, 1), (3, 'test3', 2, 2)
insert into #c
values(1, 'test', 1) , (2, 'test2', 2), (3, 'test3', 1)
select *
from #a a
left outer join #c c on a.aid = c.aid
left outer join #b b on a.aid = b.aid
select *
from #a a
left outer join #c c on a.aid = c.aid
left outer join #b b on c.cid = b.cid
select *
from #a a
left outer join #b b
join #c c on b.cid = c.cid
on a.aid = b.aid
select *
from #a a
right outer join #c c on a.aid = c.aid
right outer join #b b on a.aid = b.aid
select *
from #a a
right outer join #c c on a.aid = c.aid
right outer join #b b on c.cid = b.cid
select *
from #a a
right outer join #b b
join #c c on b.cid = c.cid
on a.aid = b.aid
select *
from #a a
full outer join #c c on a.aid = c.aid
full outer join #b b on a.aid = b.aid
select *
from #a a
full outer join #c c on a.aid = c.aid
full outer join #b b on c.cid = b.cid
select *
from #a a
full outer join #b b
join #c c on b.cid = c.cid
on a.aid = b.aid

Related

Multiple Joins and Subquery with Not Exists

I want to delete records from the BETA table that don't exist in the ALPHA table, and also exclude the records obtained by the inner join of the CHARLIE and DELTA tables.
This is the query for A & B
SELECT B.* from ALPHA A right join
BETA B ON A.ID= B.ID
where A.ID is NULL
This gives me the records of BETA Table that don't exist in Alpha Table
Now my second query is
SELECT C.* FROM CHARLIE C INNER JOIN
DELTA D ON
C.ID=D.ID
This gives me the records from the inner join of CHARLIE AND DELTA
I have tried using the query below, but it doesn't work and doesn't delete anything
DELETE B
From ALPHA A right join
BETA B ON A.ID = B.ID
where A.ID is NULL AND NOT EXISTS
( SELECT C.* FROM CHARLIE C INNER JOIN DELTA D ON
C.ID = D.ID WHERE B.ID = C.ID )
I would really appreciate any help.
I have solutions for you using "IN" and "Exists". Please check and let me know.
DECLARE @ALPHA TABLE( ID INT,VAL1 DECIMAL(18,2),VAL2 DECIMAL(18,2));
DECLARE @BETA TABLE( ID INT,VAL1 DECIMAL(18,2),VAL2 DECIMAL(18,2));
DECLARE @CHARLIE TABLE( ID INT,VAL1 DECIMAL(18,2),VAL2 DECIMAL(18,2));
DECLARE @DELTA TABLE( ID INT,VAL1 DECIMAL(18,2),VAL2 DECIMAL(18,2));
------------------------USING IN-------------------------------
DELETE FROM @BETA
WHERE ID IN (SELECT B.ID from @ALPHA A right join @BETA B ON A.ID= B.ID
WHERE A.ID is NULL )
OR ID IN (SELECT C.ID FROM @CHARLIE C INNER JOIN @DELTA D ON C.ID=D.ID)
-------------------Using Exists--------------------
-- The subqueries are correlated to the row being deleted (alias T);
-- an uncorrelated EXISTS is true for every row as soon as the subquery returns anything.
DELETE T FROM @BETA T
WHERE NOT Exists (SELECT 1 FROM @ALPHA A WHERE A.ID = T.ID)
OR Exists (SELECT 1 FROM @CHARLIE C INNER JOIN @DELTA D ON C.ID=D.ID WHERE C.ID = T.ID)
Note: I have used SQL table variables instead of normal tables.
Just use not exists/exists for both. I am not sure what "as well as excludes the records obtained by the inner join of CHARLIE and DELTA tables" means.
If you want to delete records where either condition is met, then:
DELETE B FROM BETA B
WHERE NOT EXISTS (SELECT 1 FROM ALPHA A WHERE A.id = B.ID
) OR
EXISTS (SELECT 1
FROM CHARLIE C INNER JOIN
DELTA D
ON C.ID = D.ID
WHERE B.ID = C.ID
) ;
If you don't want the CHARLIE/DELTA records to be excluded, then use AND NOT EXISTS rather than OR EXISTS.
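To see how the two variants behave, here is a small sketch on SQLite through Python's `sqlite3` module. The sample IDs are invented for illustration, and the statement is written as `DELETE FROM beta` with `beta.id` in the subqueries because the `DELETE B FROM ... B` form is SQL Server syntax.

```python
import sqlite3

# Invented sample data: beta ids 3 and 5 are missing from alpha,
# and ids 2 and 5 appear in both charlie and delta.
SETUP = """
    CREATE TABLE alpha (id INTEGER);
    CREATE TABLE beta (id INTEGER);
    CREATE TABLE charlie (id INTEGER);
    CREATE TABLE delta (id INTEGER);
    INSERT INTO alpha VALUES (1), (2);
    INSERT INTO beta VALUES (1), (2), (3), (5);
    INSERT INTO charlie VALUES (2), (5);
    INSERT INTO delta VALUES (2), (5);
"""

DELETE_OR_EXISTS = """
    DELETE FROM beta
    WHERE NOT EXISTS (SELECT 1 FROM alpha a WHERE a.id = beta.id)
       OR EXISTS (SELECT 1 FROM charlie c JOIN delta d ON c.id = d.id
                  WHERE c.id = beta.id)
"""

DELETE_AND_NOT_EXISTS = """
    DELETE FROM beta
    WHERE NOT EXISTS (SELECT 1 FROM alpha a WHERE a.id = beta.id)
      AND NOT EXISTS (SELECT 1 FROM charlie c JOIN delta d ON c.id = d.id
                      WHERE c.id = beta.id)
"""

def remaining_after(delete_sql):
    """Run one DELETE against a fresh copy of the data; return the surviving beta ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(delete_sql)
    return [row[0] for row in conn.execute("SELECT id FROM beta ORDER BY id")]

remaining_or = remaining_after(DELETE_OR_EXISTS)
remaining_and_not = remaining_after(DELETE_AND_NOT_EXISTS)
print(remaining_or)       # [1]
print(remaining_and_not)  # [1, 2, 5]
```

With OR EXISTS, a row is deleted when it is missing from alpha or present in the charlie/delta join. With AND NOT EXISTS, a row is deleted only when it is missing from alpha and also absent from the charlie/delta join, so id 5 survives.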

Left join inside left join

I have problem getting values from tables.
I need something like this
A.Id a1
B.Id b1
C.Id c1
B.Id b2
C.Id c2
C.Id c3
C.Id c4
Table A and B are joined together and also table B and C.
Table A can have one/zero or more values from table B. Same situation is for values from table C.
I need to perform left join on table A over table B and inside that left join on table B over table C.
I tried with left join from table A and B, but don't know how to perform left join inside that left join.
Is that possible? What would syntax for that look like?
edit:
Data would look like this
ZZN1 P1 NULL
ZZN1 P2 NAB1
ZZN2 P3 NAB2
ZZN2 P3 NAB3
No need to nest the left joins, you can simply flatten them and let your RDBMS handle the logic.
Sample schema:
a(id)
b(id, aid) -- aid is a foreign key to a(id)
c(id, bid) -- bid is a foreign key to b(id)
Query:
select a.id, b.id, c.id
from a
left join b on b.aid = a.id
left join c on c.bid = b.id
If the first left join finds no match, then the second one cannot match either, since the joining column b.id will be null. On the other hand, if the first left join finds a match, then the second one may or may not find one, depending on whether the relevant bid is present in c.
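As a quick check of the flattened query, the sketch below loads data shaped like the rows in the question into the sample schema and runs it on SQLite through Python's `sqlite3` module. The mapping of the question's values to tables, and the extra row `ZZN3` with no children, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id TEXT);
    CREATE TABLE b (id TEXT, aid TEXT);   -- aid refers to a(id)
    CREATE TABLE c (id TEXT, bid TEXT);   -- bid refers to b(id)
    INSERT INTO a VALUES ('ZZN1'), ('ZZN2'), ('ZZN3');
    INSERT INTO b VALUES ('P1', 'ZZN1'), ('P2', 'ZZN1'), ('P3', 'ZZN2');
    INSERT INTO c VALUES ('NAB1', 'P2'), ('NAB2', 'P3'), ('NAB3', 'P3');
""")

rows = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    LEFT JOIN b ON b.aid = a.id
    LEFT JOIN c ON c.bid = b.id
    ORDER BY a.id, b.id, c.id
""").fetchall()

for row in rows:
    print(row)
# ('ZZN1', 'P1', None)
# ('ZZN1', 'P2', 'NAB1')
# ('ZZN2', 'P3', 'NAB2')
# ('ZZN2', 'P3', 'NAB3')
# ('ZZN3', None, None)
```

`ZZN3` has no row in b, so both b.id and c.id come back as NULL; `P1` has no row in c, so only c.id is NULL.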
SELECT A.Name, B.Name , C.Name
FROM A
LEFT JOIN B ON A.id = B.id
LEFT JOIN C ON B.id = C.id

SQL Server : joins new syntax(ANSI vs. non-ANSI SQL JOIN syntax)

I have tried to convert old MS sql join syntax to new join syntax but number of rows in the results not matching.
Original SQL:
select
b.Amount
from
TableA a, TableB b,TableC c, TableD d
where
a.inv_no *= b.inv_no and
a.inv_item *= b.inv_item and
c.currency *= b.cash_ccy and
d.tx_code *= b.cash_receipt
Converted SQL:
SELECT
b.AMOUNT
FROM
(TableA AS a
LEFT OUTER JOIN
TableB AS b ON a.INV_NO = b.INV_NO
AND a.inv_item = b.inv_item
LEFT OUTER JOIN
TableC AS c ON c.currency = b.cash_ccy)
LEFT OUTER JOIN
TableD as d ON d.tx_code = b.cash_receipt
Findings
Results are the same for both the original SQL and the modified SQL up to the joining of 3 tables, but when the fourth table (TableD) is joined in the modified SQL, the number of rows returned is different.
The order of fields within predicates is important when using SQL Server's (deprecated) proprietary ANSI 89 join syntax *= or =*
So while
SELECT *
FROM TableA AS A
LEFT JOIN TableB AS B
ON A.ColA = B.ColB;
Is exactly the same as
SELECT *
FROM TableA AS A
LEFT JOIN TableB AS B
ON B.ColB = A.ColA; -- NOTE ORDER HERE
The equivalent
SELECT *
FROM TableA AS A, TableB AS b
WHERE A.ColA *= B.ColB;
Is not the same as
SELECT *
FROM TableA AS A, TableB AS b
WHERE B.ColB *= A.ColA;
This last query's ANSI 92 equivalent would be
SELECT *
FROM TableA AS A
RIGHT JOIN TableB AS B
ON A.ColA = B.ColB;
Or if you dislike RIGHT JOIN as much as I do you would probably write:
SELECT *
FROM TableB AS B
LEFT OUTER JOIN TableA AS A
ON B.ColB = A.ColA;
So actually the equivalent query in ANSI 92 join syntax would involve starting with TableA, TableC and TableD (since these are the leading fields in the original WHERE Clause). Then since there is no direct link between the three, you end up with a cross join
SELECT b.Amount
FROM TableA AS a
CROSS JOIN TableD AS d
CROSS JOIN TableC AS c
LEFT JOIN TableB AS B
ON c.currency = b.cash_ccy
AND d.tx_code = b.cash_receipt
AND a.INV_NO = b.INV_NO
AND a.inv_item = b.inv_item;
This is the equivalent rewrite, and explains the difference in the number of rows.
WORKING EXAMPLE
Needs to be run on SQL Server 2008 or earlier with compatibility level 80 or less
-- SAMPLE DATA --
CREATE TABLE #TableA (Inv_No INT, Inv_item INT);
CREATE TABLE #TableB (Inv_No INT, Inv_item INT, cash_ccy INT, cash_receipt INT, Amount INT);
CREATE TABLE #TableC (currency INT);
CREATE TABLE #TableD (tx_code INT);
INSERT #TableA (inv_no, inv_item) VALUES (1, 1), (2, 2);
INSERT #TableB (inv_no, inv_item, cash_ccy, cash_receipt, Amount) VALUES (1, 1, 1, 1, 1), (2, 2, 2, 2, 2);
INSERT #TableC (currency) VALUES (1), (2), (3), (4);
INSERT #TableD (tx_code) VALUES (1), (2), (3), (4);
-- ORIGINAL QUERY(32 ROWS)
SELECT
b.Amount
FROM
#TableA a, #TableB b,#TableC c, #TableD d
WHERE
a.inv_no *= b.inv_no and
a.inv_item *= b.inv_item and
c.currency *= b.cash_ccy and
d.tx_code *= b.cash_receipt
-- INCORRECT ANSI 92 REWRITE (2 ROWS)
SELECT b.AMOUNT
FROM #TableA AS a
LEFT OUTER JOIN #TableB AS b
ON a.INV_NO = b.INV_NO
and a.inv_item = b.inv_item
LEFT OUTER JOIN #TableC AS c
ON c.currency = b.cash_ccy
LEFT OUTER JOIN #TableD as d
ON d.tx_code = b.cash_receipt;
-- CORRECT ANSI 92 REWRITE (32 ROWS)
SELECT b.Amount
FROM #TableA AS a
CROSS JOIN #TableD AS d
CROSS JOIN #TableC AS c
LEFT JOIN #TableB AS B
ON c.currency = b.cash_ccy
AND d.tx_code = b.cash_receipt
AND a.INV_NO = b.INV_NO
AND a.inv_item = b.inv_item;
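The `*=` query needs an old SQL Server compatibility level, but the two ANSI 92 rewrites can be compared on any engine. A minimal sketch with the same sample data, run on SQLite through Python's `sqlite3` module (plain tables stand in for the temp tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (inv_no INT, inv_item INT);
    CREATE TABLE TableB (inv_no INT, inv_item INT, cash_ccy INT,
                         cash_receipt INT, amount INT);
    CREATE TABLE TableC (currency INT);
    CREATE TABLE TableD (tx_code INT);
    INSERT INTO TableA VALUES (1, 1), (2, 2);
    INSERT INTO TableB VALUES (1, 1, 1, 1, 1), (2, 2, 2, 2, 2);
    INSERT INTO TableC VALUES (1), (2), (3), (4);
    INSERT INTO TableD VALUES (1), (2), (3), (4);
""")

# Chained LEFT JOINs: TableB sits in the middle, so C and D only extend B's rows.
chained_rows = conn.execute("""
    SELECT b.amount
    FROM TableA AS a
    LEFT JOIN TableB AS b ON a.inv_no = b.inv_no AND a.inv_item = b.inv_item
    LEFT JOIN TableC AS c ON c.currency = b.cash_ccy
    LEFT JOIN TableD AS d ON d.tx_code = b.cash_receipt
""").fetchall()

# A, C and D are all preserved tables: cross join them, then LEFT JOIN TableB.
cross_rows = conn.execute("""
    SELECT b.amount
    FROM TableA AS a
    CROSS JOIN TableD AS d
    CROSS JOIN TableC AS c
    LEFT JOIN TableB AS b
        ON c.currency = b.cash_ccy
       AND d.tx_code = b.cash_receipt
       AND a.inv_no = b.inv_no
       AND a.inv_item = b.inv_item
""").fetchall()

matched = [amount for (amount,) in cross_rows if amount is not None]
print(len(chained_rows), len(cross_rows), len(matched))  # 2 32 2
```

The cross join produces 2 x 4 x 4 = 32 combinations, and only two of them find a matching TableB row; the other 30 rows carry a NULL amount.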

Differences between forms of LEFT JOIN

What is the difference between this query
SELECT
A.AnyField
FROM
A
LEFT JOIN B ON B.AId = A.Id
LEFT JOIN C ON B.CId = C.Id
and this other query
SELECT
A.AnyField
FROM
A
LEFT JOIN
(
B
JOIN C ON B.CId = C.Id
) ON B.AId = A.Id
Original answer:
They are not the same.
For example a left join b left join c will return a rows, plus b rows even if there are no c rows.
a left join (b join c) will never return b rows if there are no c rows.
Added later:
SQL>create table a (id int);
SQL>create table b (id int);
SQL>create table c (id int);
SQL>insert into a values (1);
SQL>insert into a values (2);
SQL>insert into b values (1);
SQL>insert into b values (1);
SQL>insert into c values (2);
SQL>insert into c values (2);
SQL>insert into c values (2);
SQL>insert into c values (2);
SQL>select a.id from a left join b on a.id = b.id left join c on b.id = c.id;
id
===========
1
1
2
3 rows found
SQL>select a.id from a left join (b join c on b.id = c.id) on a.id = b.id;
id
===========
1
2
2 rows found
The first query is going to take ALL records from table a and then only records from table b where a.id is equal to b.id. Then it's going to take all records from table c where the resulting records in table b have a cid that matches c.id.
The second query is going to first JOIN b and c on the id. That is, records will only make it to the resultset from that join where the b.CId and the c.ID are the same, because it's an INNER JOIN.
Then the result of the b INNER JOIN c will be LEFT JOINed to table a. That is, the DB will take all records from a and only the records from the results of b INNER JOIN c where a.id is equal to b.id
The difference is that you may end up with more data from b in your first query since the DB isn't dropping records from your result set just because b.cid <> c.id.
SELECT
A.AnyField
FROM
A
LEFT JOIN B ON B.AId = A.Id
LEFT JOIN C ON B.CId = C.Id
In this query you are LEFT JOINing C with B which will give you all records possible with B whether or not there is a match to any records in C.
SELECT
A.AnyField
FROM
A
LEFT JOIN
(
B
JOIN C ON B.CId = C.Id
) ON B.AId = A.Id
In this query you are INNER JOINing C with B which will result in matching B and C records.
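Even though you are only pulling fields from A, the two queries can still return different result sets, because the joins determine how many times each A record appears. In the first query an A record is repeated for every matching B record, whether or not that B record has a match in C; in the second it is repeated only for B records that also match a C record.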

Order of Full Outer Joins Yields Different Number of Result Rows..Why?

I'm working with two tables A and B. Table A identifies equity securities and Table B has a number of details on the security.
For example, when B.Item = 5301, the row specifies price for a given security. When B.Item = 9999, the row specifies dividends for a given security. I am trying to get both price and dividends in the same row. In order to achieve this, I FULL JOINed table B twice to table A.
SELECT *
FROM a a
FULL JOIN (SELECT *
FROM b) b
ON b.code = a.code
AND b.item = 3501
FULL JOIN (SELECT *
FROM b) b2
ON b2.code = a.code
AND b.item = 9999
AND b2.year_ = b.year_
AND b.freq = b2.freq
AND b2.seq = b.seq
WHERE a.code IN ( 122514 )
The remaining fields in the join clause like Year_, Freq, and Seq just make sure the dates of the price and dividends match. A.Code simply identifies a single security.
My issue is that when I flip the order of the full joins I get a different number of results. So if b.Item = 9999 comes before b.Item 2501, I get one result. The other way around I get 2 results. I realized the table B has zero entries for security 122514 for dividend, but has two entries for price.
When price is specified first, I get both prices and dividend fields are null. However, when dividend is specified first, I get NULLs for the dividend fields and also nulls for the prices fields.
Why aren't the two price entries showing up? I would expect them to do so in a FULL JOIN
It's because your second FULL OUTER JOIN refers to your first FULL OUTER JOIN. This means changing the order of them is making a fundamental change to the query.
Here is some pseudo-SQL that demonstrates how this works:
DECLARE @a TABLE (Id INT, Name VARCHAR(50));
INSERT INTO @a VALUES (1, 'Dog Trades');
INSERT INTO @a VALUES (2, 'Cat Trades');
DECLARE @b TABLE (Id INT, ItemCode VARCHAR(1), PriceDate DATE, Price INT, DividendDate DATE, Dividend INT);
INSERT INTO @b VALUES (1, 'p', '20141001', 100, '20140101', 1000);
INSERT INTO @b VALUES (1, 'p', '20141002', 50, NULL, NULL);
INSERT INTO @b VALUES (2, 'c', '20141001', 10, '20141001', 500);
INSERT INTO @b VALUES (2, 'c', NULL, NULL, '20141002', 300);
--Same results
SELECT a.*, b1.*, b2.* FROM @a a FULL OUTER JOIN @b b1 ON b1.Id = a.Id AND b1.ItemCode = 'p' FULL OUTER JOIN @b b2 ON b2.Id = a.Id AND b2.ItemCode = 'c';
SELECT a.*, b2.*, b1.* FROM @a a FULL OUTER JOIN @b b1 ON b1.Id = a.Id AND b1.ItemCode = 'c' FULL OUTER JOIN @b b2 ON b2.Id = a.Id AND b2.ItemCode = 'p';
--Different results
SELECT a.*, b1.*, b2.* FROM @a a FULL OUTER JOIN @b b1 ON b1.Id = a.Id AND b1.ItemCode = 'p' FULL OUTER JOIN @b b2 ON b2.Id = a.Id AND b2.ItemCode = 'c' AND b2.DividendDate = b1.PriceDate;
SELECT a.*, b2.*, b1.* FROM @a a FULL OUTER JOIN @b b1 ON b1.Id = a.Id AND b1.ItemCode = 'c' FULL OUTER JOIN @b b2 ON b2.Id = a.Id AND b2.ItemCode = 'p' AND b2.DividendDate = b1.PriceDate;
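The row counts from the question (two rows with price first, one row with dividend first) can be reproduced with a reduced sketch. It makes several assumptions: a simplified schema `a(code)` and `b(code, item, year_, value)`, two invented price rows and no dividend rows, and LEFT JOIN in place of FULL JOIN so that it runs on older SQLite builds. Because the question's WHERE clause keeps only rows with a non-NULL a.code, the extra rows a FULL JOIN would add are filtered out anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (code INTEGER);
    CREATE TABLE b (code INTEGER, item INTEGER, year_ INTEGER, value REAL);
    INSERT INTO a VALUES (122514);
    -- Two price rows (item 3501) and no dividend rows (item 9999).
    INSERT INTO b VALUES (122514, 3501, 2013, 10.0), (122514, 3501, 2014, 11.0);
""")

# Price first: the dividend join depends on p.year_, which is populated.
price_first = conn.execute("""
    SELECT a.code, p.value, d.value
    FROM a
    LEFT JOIN b AS p ON p.code = a.code AND p.item = 3501
    LEFT JOIN b AS d ON d.code = a.code AND d.item = 9999 AND d.year_ = p.year_
    WHERE a.code = 122514
    ORDER BY p.year_
""").fetchall()

# Dividend first: no dividend row exists, so d.year_ is NULL and the
# price join condition p.year_ = d.year_ can never be true.
dividend_first = conn.execute("""
    SELECT a.code, p.value, d.value
    FROM a
    LEFT JOIN b AS d ON d.code = a.code AND d.item = 9999
    LEFT JOIN b AS p ON p.code = a.code AND p.item = 3501 AND p.year_ = d.year_
    WHERE a.code = 122514
""").fetchall()

print(price_first)     # [(122514, 10.0, None), (122514, 11.0, None)]
print(dividend_first)  # [(122514, None, None)]
```

Whichever join comes second has a condition that compares against a column of the first; when the first join finds nothing, that column is NULL and the second join cannot match.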