Does the join order matter in SQL?

Disregarding performance, will I get the same result from query A and B below? How about C and D?
----- Scenario 1:
-- A (left join)
select *
from a left join b
on <blahblah>
left join c
on <blahblan>
-- B (left join)
select *
from a left join c
on <blahblah>
left join b
on <blahblan>
----- Scenario 2:
-- C (inner join)
select *
from a join b
on <blahblah>
join c
on <blahblan>
-- D (inner join)
select *
from a join c
on <blahblah>
join b
on <blahblan>

For INNER joins, no, the order doesn't matter. The queries will return the same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.* (with SELECT *, the columns come back in join order, so only the column order would differ).
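A minimal sketch of the inner-join case, run against an in-memory SQLite database through Python's sqlite3 module. The table contents and the column names (id, a_val, b_val, c_val) are assumptions made up for the demo; only the shape of queries C and D comes from the question.

```python
import sqlite3

# Hypothetical tables a, b, c with made-up columns, just for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_val TEXT);
    CREATE TABLE b (id INTEGER, b_val TEXT);
    CREATE TABLE c (id INTEGER, c_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO c VALUES (2, 'c2'), (3, 'c3');
""")

COLUMNS = "a.id, a.a_val, b.b_val, c.c_val"


def run(sql):
    return conn.execute(sql).fetchall()


# Query C: join b first, then c.
rows_c = run(f"SELECT {COLUMNS} FROM a JOIN b ON b.id = a.id "
             "JOIN c ON c.id = a.id ORDER BY a.id")
# Query D: join c first, then b.
rows_d = run(f"SELECT {COLUMNS} FROM a JOIN c ON c.id = a.id "
             "JOIN b ON b.id = a.id ORDER BY a.id")

# With SELECT * the same values come back, but in a different column order.
star_c = run("SELECT * FROM a JOIN b ON b.id = a.id JOIN c ON c.id = a.id")
star_d = run("SELECT * FROM a JOIN c ON c.id = a.id JOIN b ON b.id = a.id")

print(rows_c == rows_d)  # True
print(star_c)            # [(2, 'a2', 2, 'b2', 2, 'c2')]
print(star_d)            # [(2, 'a2', 2, 'c2', 2, 'b2')]
```

Only id 2 exists in all three tables, so both orderings return that single row once the select list is written out explicitly.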
For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters, and (updated) things are much more complicated.
First, LEFT and RIGHT outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a.
Outer joins are not associative either, so in your examples, which involve both properties (commutativity and associativity):
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
is equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
but:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id
is not equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
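The two pairs of queries above can be checked with a small sketch, again using SQLite through Python's sqlite3 module. The row values are assumptions chosen so that b and c both match a but disagree on bc_id; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER, ac_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (ac_id INTEGER, bc_id INTEGER);
    INSERT INTO a VALUES (1, 1);
    INSERT INTO b VALUES (1, 10);
    INSERT INTO c VALUES (1, 20);   -- bc_id deliberately differs from b's
""")

SELECT_A = "SELECT a.ab_id, b.bc_id AS b_bc_id, c.bc_id AS c_bc_id FROM a "


def run(joins):
    return conn.execute(SELECT_A + joins).fetchall()


# Both ON clauses refer only to a: swapping the joins changes nothing.
b_then_c = run("LEFT JOIN b ON b.ab_id = a.ab_id "
               "LEFT JOIN c ON c.ac_id = a.ac_id")
c_then_b = run("LEFT JOIN c ON c.ac_id = a.ac_id "
               "LEFT JOIN b ON b.ab_id = a.ab_id")

# The second ON clause also refers to the table joined first: order matters.
b_then_c_linked = run("LEFT JOIN b ON b.ab_id = a.ab_id "
                      "LEFT JOIN c ON c.ac_id = a.ac_id AND c.bc_id = b.bc_id")
c_then_b_linked = run("LEFT JOIN c ON c.ac_id = a.ac_id "
                      "LEFT JOIN b ON b.ab_id = a.ab_id AND b.bc_id = c.bc_id")

print(b_then_c, c_then_b)   # [(1, 10, 20)] [(1, 10, 20)]
print(b_then_c_linked)      # [(1, 10, None)]
print(c_then_b_linked)      # [(1, None, 20)]
```

In the linked variant, whichever table is joined second is the one that gets NULL-extended, which is why the two orderings disagree.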
Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:
a LEFT JOIN b
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
This is equivalent to a LEFT JOIN (b LEFT JOIN c):
a LEFT JOIN
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition
only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.
You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.
If, however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.
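A hedged sketch of the associativity point, using SQLite through Python's sqlite3 module. Two things are assumptions of the demo rather than part of the answer: the nested form a LEFT JOIN (b LEFT JOIN c) is written as a derived table named bc, and the null-aware condition is c.bc_id = b.bc_id OR b.bc_id IS NULL. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (bc_id INTEGER);
    INSERT INTO a VALUES (1), (2);      -- a row 2 has no match in b
    INSERT INTO b VALUES (1, 10);
    INSERT INTO c VALUES (10), (20);
""")


def left_deep(bc_condition):
    # (a LEFT JOIN b) LEFT JOIN c
    return conn.execute(f"""
        SELECT a.ab_id, b.bc_id, c.bc_id
        FROM a
        LEFT JOIN b ON b.ab_id = a.ab_id
        LEFT JOIN c ON {bc_condition}
        ORDER BY 1, 2, 3
    """).fetchall()


def right_deep(bc_condition):
    # a LEFT JOIN (b LEFT JOIN c), with the inner join as a derived table
    return conn.execute(f"""
        SELECT a.ab_id, bc.b_bc_id, bc.c_bc_id
        FROM a
        LEFT JOIN (
            SELECT b.ab_id AS ab_id, b.bc_id AS b_bc_id, c.bc_id AS c_bc_id
            FROM b
            LEFT JOIN c ON {bc_condition}
        ) AS bc ON bc.ab_id = a.ab_id
        ORDER BY 1, 2, 3
    """).fetchall()


NICE = "c.bc_id = b.bc_id"
NULL_AWARE = "c.bc_id = b.bc_id OR b.bc_id IS NULL"

print(left_deep(NICE) == right_deep(NICE))   # True
print(left_deep(NULL_AWARE))   # [(1, 10, 10), (2, None, 10), (2, None, 20)]
print(right_deep(NULL_AWARE))  # [(1, 10, 10), (2, None, None)]
```

With the null-aware condition, the left-deep form sees the NULL-extended b row for a = 2 and matches every c row against it, while the nested form never produces that NULL-extended row before c is joined.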

If you try joining C on a field from B before joining B, i.e.:
SELECT A.x,
A.y,
A.z
FROM A
INNER JOIN C
on B.x = C.x
INNER JOIN B
on A.x = B.x
your query will fail in most databases, because B is not yet in scope when the first ON clause is resolved; so in this case the order matters.

For regular joins, it doesn't. TableA JOIN TableB will produce the same execution plan as TableB JOIN TableA (so your C and D examples would be the same).
For left and right joins, it does. TableA LEFT JOIN TableB is different from TableB LEFT JOIN TableA, but it is the same as TableB RIGHT JOIN TableA.
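A small sketch of that last point, using SQLite through Python's sqlite3 module with made-up rows. One caveat that is specific to this demo: SQLite only supports RIGHT JOIN from version 3.39, so that part is guarded on the library version and skipped on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER);
    CREATE TABLE TableB (id INTEGER);
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableB VALUES (2), (3);
""")


def run(from_clause):
    sql = (f"SELECT TableA.id, TableB.id FROM {from_clause} "
           "ON TableA.id = TableB.id ORDER BY 1, 2")
    return conn.execute(sql).fetchall()


a_left_b = run("TableA LEFT JOIN TableB")   # keeps every row of TableA
b_left_a = run("TableB LEFT JOIN TableA")   # keeps every row of TableB

# RIGHT JOIN needs SQLite 3.39 or newer, so guard on the library version.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    b_right_a = run("TableB RIGHT JOIN TableA")
else:
    b_right_a = None

print(a_left_b)   # [(1, None), (2, 2)]
print(b_left_a)   # [(None, 3), (2, 2)]
print(b_right_a)  # same rows as a_left_b where RIGHT JOIN is supported
```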

The Oracle optimizer chooses the join order of tables for inner joins.
The optimizer chooses the join order of tables only in simple FROM clauses.
You can check the Oracle documentation on their website.
As for left and right outer joins, the most voted answer is right.
The optimizer chooses the optimal join order as well as the optimal index for each table. The join order can affect which index is the best choice. The optimizer can choose an index as the access path for a table if it is the inner table, but not if it is the outer table (and there are no further qualifications).
The optimizer chooses the join order of tables only in simple FROM clauses. Most joins using the JOIN keyword are flattened into simple joins, so the optimizer chooses their join order.
The optimizer does not choose the join order for outer joins; it uses the order specified in the statement.
When selecting a join order, the optimizer takes into account:
The size of each table
The indexes available on each table
Whether an index on a table is useful in a particular join order
The number of rows and pages to be scanned for each table in each join order

Related

Difference between JOIN expressions in Presto SQL

I would like to ask what the difference is between the following join expressions, and under what conditions Method 2 is preferable to Method 1.
You can imagine tables a, b and c to be CTEs, i.e. WITH a AS (xxxxx), b AS (xxxxx), and c AS (xxxxx).
Method 1:
Select
a.customerid,
b.customerage,
b.customermobile,
a.itemid,
c.itemname
from a
LEFT JOIN b on
a.customerid = b.customerid
LEFT JOIN c on
a.itemid = c.itemid
Method 2:
Select
a.customerid,
b.customerage,
b.customermobile,
a.itemid,
c.itemname
from ((a
LEFT JOIN b on
(a.customerid = b.customerid))
LEFT JOIN c on
(a.itemid = c.itemid))
There is no difference. This structure:
from a left join
b left join
c
is exactly defined as
from (a left join
b
) left join
c
(I'm leaving out the on clauses to simplify the explanation.)
Note: The order of evaluation is important for outer joins. But even for inner joins, the above is subtly different from:
from a left join
(b left join
c
)
For instance, this won't even parse if the on clause between b and c references a as well.
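The claim that Method 1 and Method 2 are the same can be checked with a quick sketch. It runs on SQLite through Python's sqlite3 module rather than on Presto, on the assumption that both group chained joins left to right as described above; the sample rows are made up, and Method 2 is written with balanced parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customerid INTEGER, itemid INTEGER);
    CREATE TABLE b (customerid INTEGER, customerage INTEGER,
                    customermobile TEXT);
    CREATE TABLE c (itemid INTEGER, itemname TEXT);
    INSERT INTO a VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO b VALUES (1, 30, '555-0101'), (3, 41, '555-0103');
    INSERT INTO c VALUES (100, 'widget'), (200, 'gadget');
""")

COLUMNS = "a.customerid, b.customerage, b.customermobile, a.itemid, c.itemname"

# Method 1: no parentheses.
method_1 = conn.execute(f"""
    SELECT {COLUMNS}
    FROM a
    LEFT JOIN b ON a.customerid = b.customerid
    LEFT JOIN c ON a.itemid = c.itemid
    ORDER BY a.customerid
""").fetchall()

# Method 2: the same left-to-right grouping, spelled out with parentheses.
method_2 = conn.execute(f"""
    SELECT {COLUMNS}
    FROM ((a
    LEFT JOIN b ON (a.customerid = b.customerid))
    LEFT JOIN c ON (a.itemid = c.itemid))
    ORDER BY a.customerid
""").fetchall()

print(method_1 == method_2)  # True
for row in method_1:
    print(row)
```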

Getting a field from a table that is not in the left outer join in the same query

I have 4 tables, 3 of which are joined, but I need to get a field from the fourth table (Table_D). For the sake of simplicity, let's say they are Tables A, B, C, D.
Select Distinct A.Field_1, B.Field_2, C.Field_3
From Table_A A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join C on A.Field_z= C.Field_z
where A.Field_z in (1111);
This seems to work but I need a field in Table_D that is only connected to Table_A through Table_C.
How can I add it to the join? or can I?
Thanks!
WB
The on clause has essentially all the features of the where clause. The on clause for any join can refer to any attribute of any entity already introduced in the from clause. I suspect your confusion comes from thinking that you can only join subsequent entities to the first entity in the from clause. If Table_D is related to Table_A through Table_C, you might see a from/join/on construct like:
select a.thing, b.things, c.thang, d.stuff
from Table_A a
left join Table_B b
on a.id = b.a_id
left join Table_C c
on a.id = c.a_id
left join Table_D d
on d.id = c.d_id
where
a.this = b.that
Note that the word "outer" is implied in a left join, and is not necessary for syntactic completion.
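A minimal sketch of reaching Table_D through Table_C, using SQLite through Python's sqlite3 module. The column names (id, a_id, d_id, thing, thang, stuff) follow the answer's placeholders, the rows are made up, and Table_B and the where clause are left out to keep the demo short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER, thing TEXT);
    CREATE TABLE Table_C (a_id INTEGER, d_id INTEGER, thang TEXT);
    CREATE TABLE Table_D (id INTEGER, stuff TEXT);
    INSERT INTO Table_A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO Table_C VALUES (1, 100, 'c1');   -- only a1 has a C row
    INSERT INTO Table_D VALUES (100, 'd100');
""")

rows = conn.execute("""
    SELECT a.thing, c.thang, d.stuff
    FROM Table_A a
    LEFT JOIN Table_C c ON c.a_id = a.id
    LEFT JOIN Table_D d ON d.id = c.d_id   -- D is reached through C, not A
    ORDER BY a.id
""").fetchall()

print(rows)  # [('a1', 'c1', 'd100'), ('a2', None, None)]
```

The row for a2 survives with NULLs for both c and d, because every join in the chain is a left join.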
Of course you can add it. You don't specify the logic, but something like this:
Select Distinct A.Field_1, B.Field_2, C.Field_3, D.??
From Table_A a Left outer join
B
on A.Field_z= B.Field_z Left Outer Join
C
on A.Field_z= C.Field_z Left Outer Join
D
on . . .
where A.Field_z in (1111);
Just fill in the conditions that you want. You want a left join, because the condition between c and d would otherwise turn the outer join to c into an inner join.
If I understand you correctly, I believe this would work:
Left Outer Join D on C.Field_z= D.Field_z
You can try it this way; I think it will help:
Select Distinct A.Field_1, B.Field_2, C.Field_3, c.Field_4
From Table_A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join ( select C.field_3, d.Field_4, c.Field_z from c inner join d on c.field_z = d.field_z ) c on A.Field_z= C.Field_z
where A.Field_z in (1111);

PL/SQL Using multiple left join

SELECT * FROM Table A LEFT JOIN TABLE B LEFT JOIN TABLE C
From the snippet above, TABLE C will left join into (TABLE B) or (data from TABLE A LEFT JOIN TABLE B) or (TABLE A)?
TABLE C will left join into 1. (TABLE B) or 2. (data from TABLE A LEFT JOIN
TABLE B) or 3. (TABLE A)?
The second. But the join conditions will help you understand more.
You can write:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.ID = C.ID)
But you are able to:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
So, you can join on every field from the previous tables, and you join on "the result" of the previous joins (though the engine may choose its own way to get that result).
Think of LEFT JOIN as a non-commutative operation (A LEFT JOIN B is not the same as B LEFT JOIN A). So the order is important, and C will be left joined to the result of the previously joined tables.
The Oracle documentation is quite specific about how the joins are processed:
To execute a join of three or more tables, Oracle first joins two of
the tables based on the join conditions comparing their columns and
then joins the result to another table based on join conditions
containing columns of the joined tables and the new table. Oracle
continues this process until all tables are joined into the result.
This is the logical approach to handling the joins and is consistent with the ANSI standard (in other words, all database engines process the joins in order).
However, when the query is actually executed, the optimizer may choose to run the joins in a different order. The result needs to be logically the same as processing the joins in the order given in the query.
Also, the join conditions may cause some unexpected conditions to arise. So if you have:
from A left outer join
B
on A.id = B.id left outer join
C
on B.id = C.id
Then, you might have the condition where A and C each have a row with a particular id, but B does not. With this formulation, you will not see the row in C because it is joining to NULL. So, be careful with join conditions on left outer join, particularly when joining to a table other than the first table in the chain.
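The situation described above can be reproduced with a short sketch, using SQLite through Python's sqlite3 module. The rows are made up: id 1 exists in A and C but not in B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    CREATE TABLE C (id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (2);        -- no row for id 1
    INSERT INTO C VALUES (1), (2);
""")

# C joined through B: for id 1, B.id is NULL, so nothing in C can match.
via_b = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    LEFT OUTER JOIN B ON A.id = B.id
    LEFT OUTER JOIN C ON B.id = C.id
    ORDER BY A.id
""").fetchall()

# C joined directly to A: the C row for id 1 is found.
via_a = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    LEFT OUTER JOIN B ON A.id = B.id
    LEFT OUTER JOIN C ON A.id = C.id
    ORDER BY A.id
""").fetchall()

print(via_b)  # [(1, None, None), (2, 2, 2)]
print(via_a)  # [(1, None, 1), (2, 2, 2)]
```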
You need to specify the column names properly in order to run the query. Let's say you are using:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Then you may get the following error:
ORA-00933: SQL command not properly ended
So to avoid it you can try:
SELECT A.id as "Id_from_A", B.code as "Code_from_B"
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Thanks

INNER and LEFT OUTER join help

Say I have 3 tables. TableA, TableB, TableC
I need to operate on the record set that is available after an INNER JOIN.
Set 1 -> TableA INNER JOIN TableB
Set 2 -> TableC INNER JOIN TableB
I need Set 1 irrespective of whether Set 2 is empty or not (LEFT OUTER JOIN comes to mind).
So essentially, I am trying to write a query and have come this far
SELECT *
FROM TableA
INNER JOIN TableB ON ...
LEFT OUTER JOIN (TableC INNER JOIN TableB)
How would I write in SQL Server?
EDIT: In reality, what I am trying to do is to join multiple tables. How would your response change if I need to join multiple tables ex: OUTER JOIN OF (INNER JOIN of TableA and TableB) and (INNER JOIN OF TableC and TableD) NOTE: There is a new TableD in the equation
SELECT * FROM TableA
INNER JOIN TableB ON TableB.id = TableA.id
LEFT JOIN TABLEC ON TABLEC.id = TABLEB.id
I don't know what columns you are trying to use, but it is just that easy.
Edit:
Looking at your edit, it seems that you are confused about what joins actually do. In the example I have written above, you will receive the following results.
Columns -> You will get all of the columns for TableA,TableB and TableC
Rows-> You will start off with all of the rows from tableA. Next you will remove all rows from TableA that do not have a matching "id" in Table B.(You will have duplicates if it is not a 1:1 relationship between TableA and TableB).
Now if you take the results from above you will match any records from TableC that match the TableB.id column. Any rows from above that do not have a matching TableC record will get a null value for all of the columns from TableC in the results.
ADVICE- I am betting that only part of this made sense to you but my advice is that you start writing some queries, predict the results and then see if your predictions are correct to see if you understand what it is doing.
What you want isn't a JOIN but a UNION.
SELECT * FROM TableA INNER JOIN TableB ON ...
UNION
SELECT * FROM TableC INNER JOIN TableD ON ...
You can actually add an ordering to your joins just like in a math equation where you might do this: (5 + 4) * (3 + 1).
Given the second part of your question, give this a try:
SELECT
<your columns>
FROM
(TableA INNER JOIN TableB ON <join criteria for A to B>)
LEFT OUTER JOIN
(TableC INNER JOIN TableD ON <join criteria for C to D>) ON
<join criteria for AxB to CxD>
Select * from ((((TableA a inner join TableB b on a.id = b.id)
left outer join TableC c on b.id = c.id)
full outer join TableD d on c.id = d.id)
right outer join TableE e on e.id = d.id)
/* etc, etc... */
You can lose the brackets if you want.
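Try this:
SELECT *
FROM TableA a
INNER JOIN TableB b ON a.id=b.id
-- list the derived table's columns explicitly: SELECT * would expose two columns named id
LEFT OUTER JOIN (SELECT c.id, <other columns from c and d>
FROM TableC c
INNER JOIN TableD d on c.id=d.id
) dt on b.id=dt.id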
You didn't give your join conditions or explain how the tables are intended to be related, so it's not obvious how this might be simplified.
SELECT a.a_id, b1.b_id b1_id, b2_id, bc.c_id
FROM TableA a JOIN TableB b1 on a.b_id = b1.b_id
LEFT JOIN (SELECT c.c_id, b2.b_id b2_id
FROM TableC c JOIN TableB b2 ON c.b_id = b2.b_id
) bc ON bc.c_id = a.c_id;
Looking at your latest edit, you can do something along the lines of:
SELECT <columns>
FROM (SELECT <columns> FROM TableA JOIN TableB ON <A-B join conditions>) ab
LEFT JOIN
(SELECT <columns> FROM TableC JOIN TableD ON <C-D join conditions>) cd
ON <AB-CD join conditions>
Although you don't actually need the inner projections, and can do:
SELECT <columns>
FROM (TableA a JOIN TableB b ON <A-B join conditions>)
LEFT JOIN
(TableC c JOIN TableD d ON <C-D join conditions>)
ON <AB-CD join conditions>
Where the AB-CD join conditions are written in terms of columns of a, b, c, d etc directly.
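A hedged sketch of the "inner join of A and B, left joined to the inner join of C and D" shape, run on SQLite through Python's sqlite3 module rather than SQL Server. The join columns (id on every table), the value columns, the rows, and the derived-table alias cd are all assumptions, since the question never gives its join conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, a_val TEXT);
    CREATE TABLE TableB (id INTEGER, b_val TEXT);
    CREATE TABLE TableC (id INTEGER, c_val TEXT);
    CREATE TABLE TableD (id INTEGER, d_val TEXT);
    INSERT INTO TableA VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO TableB VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO TableC VALUES (1, 'c1'), (2, 'c2');
    INSERT INTO TableD VALUES (1, 'd1');   -- id 2 has a C row but no D row
""")

rows = conn.execute("""
    SELECT a.id, a.a_val, b.b_val, cd.c_val, cd.d_val
    FROM TableA a
    JOIN TableB b ON b.id = a.id               -- Set 1: A inner join B
    LEFT JOIN (
        SELECT c.id AS id, c.c_val AS c_val, d.d_val AS d_val
        FROM TableC c
        JOIN TableD d ON d.id = c.id           -- Set 2: C inner join D
    ) AS cd ON cd.id = b.id                    -- Set 2 is optional
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'a1', 'b1', 'c1', 'd1')
# (2, 'a2', 'b2', None, None)
```

Id 3 drops out because it fails the inner join to TableB, while id 2 is kept with NULLs because the C-D pair as a whole is missing for it.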
Since you're using Sql Server, why not create views that help you? Stuffing everything in a gigantic Sql statement can become hard to read. An example view might look like:
create view AandB
as
select *
from A
inner join B on B.aid = A.aid
And the same for CandD. Then you can retrieve the optional join with simple SQL:
select *
from AandB
left outer join CandD on AandB.cid = CandD.cid
If you're interested in rows from both sets, you can do a full join:
select *
from AandB
full outer join CandD on AandB.cid = CandD.cid
Assuming I understand your question, I think this is what you're asking for:
SELECT *
FROM TableA INNER JOIN TableB on TableA.JoinColumn = TableB.JoinColumn
LEFT OUTER JOIN TableC on TableB.JoinColumn = TableC.JoinColumn
INNER JOIN TableD on TableC.JoinColumn = TableD.JoinColumn
Note that the JoinColumn used to join A & B doesn't necessarily have to be the same column as the one used to join B & C, and so on for C & D.
SELECT *
FROM TableA A
INNER JOIN TableB B ON B.?? = A.?? AND ...
LEFT JOIN TableC C ON C.?? = B.?? AND ...
LEFT JOIN TableB B2 ON B2.?? = C.?? AND ...
LEFT JOIN TableD D ON D.?? = C.?? AND ...
So here's the thing: logically, joins aren't actually between specific tables; they are between a table and the rest of the "set" (of joins and tables). So while you know that there is a 1-to-1 relationship between C and B2 or between C and D, you can't INNER JOIN to C, because C's columns could be NULL from its LEFT JOIN to B; the inner join would eliminate those rows, effectively undoing your LEFT JOIN.
So basically, any joins to a table that's LEFT outer joined must also be LEFT outer joined. Does this make sense?
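The effect can be seen in a small sketch, using SQLite through Python's sqlite3 module. The single id column per table and the rows are assumptions; the only thing that varies between the two runs is the join type used for TableD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (id INTEGER);
    CREATE TABLE TableC (id INTEGER);
    CREATE TABLE TableD (id INTEGER);
    INSERT INTO TableB VALUES (1), (2);
    INSERT INTO TableC VALUES (1);     -- nothing in C for id 2
    INSERT INTO TableD VALUES (1);
""")


def run(join_to_d):
    # join_to_d is the join keyword used for TableD, nothing else changes.
    return conn.execute(f"""
        SELECT b.id, c.id, d.id
        FROM TableB b
        LEFT JOIN TableC c ON c.id = b.id
        {join_to_d} TableD d ON d.id = c.id
        ORDER BY b.id
    """).fetchall()


left_then_inner = run("INNER JOIN")
left_then_left = run("LEFT JOIN")

print(left_then_inner)  # [(1, 1, 1)]
print(left_then_left)   # [(1, 1, 1), (2, None, None)]
```

With the inner join, the NULL-extended row for id 2 cannot satisfy d.id = c.id and is dropped, so the result is the same as if C had been inner joined in the first place.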