Difference between JOIN expressions in Presto SQL - sql

I would like to ask what the difference is between the following join expressions, and under what conditions Method 2 is preferred over Method 1.
You can imagine tables a, b and c to be CTEs, e.g. WITH a AS (xxxxx), b AS (xxxxx), c AS (xxxxx).
Method 1:
Select
a.customerid,
b.customerage,
b.customermobile,
a.itemid,
c.itemname
from a
LEFT JOIN b on
a.customerid = b.customerid
LEFT JOIN c on
a.itemid = c.itemid
Method 2:
Select
a.customerid,
b.customerage,
b.customermobile,
a.itemid,
c.itemname
from ((a
LEFT JOIN b on
(a.customerid = b.customerid))
LEFT JOIN c on
(a.itemid = c.itemid))

There is no difference. This structure:
from a left join
b left join
c
is exactly defined as
from (a left join
b
) left join
c
(I'm leaving out the on clauses to simplify the explanation.)
Note: The order of evaluation is important for outer joins. But even for inner joins, the above is subtly different from:
from a left join
(b left join
c
)
For instance, this won't even parse if the on clause between b and c references a as well.
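As a quick check of the claim that the two methods are the same, here is a minimal sketch that runs both queries against an in-memory SQLite database and compares the results. SQLite is only a stand-in here (an assumption, since the question is about Presto), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with made-up sample data; SQLite stands in for Presto.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customerid INTEGER, itemid INTEGER);
    CREATE TABLE b (customerid INTEGER, customerage INTEGER, customermobile TEXT);
    CREATE TABLE c (itemid INTEGER, itemname TEXT);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO b VALUES (1, 34, '555-0101'), (3, 52, '555-0103');
    INSERT INTO c VALUES (10, 'kettle'), (20, 'toaster');
""")

COLUMNS = "a.customerid, b.customerage, b.customermobile, a.itemid, c.itemname"

# Method 1: no parentheses.
method_1 = f"""
    SELECT {COLUMNS}
    FROM a
    LEFT JOIN b ON a.customerid = b.customerid
    LEFT JOIN c ON a.itemid = c.itemid
    ORDER BY a.customerid
"""

# Method 2: the same joins with explicit (balanced) parentheses.
method_2 = f"""
    SELECT {COLUMNS}
    FROM ((a
    LEFT JOIN b ON (a.customerid = b.customerid))
    LEFT JOIN c ON (a.itemid = c.itemid))
    ORDER BY a.customerid
"""

rows_1 = conn.execute(method_1).fetchall()
rows_2 = conn.execute(method_2).fetchall()
print(rows_1 == rows_2)
```

Both queries keep all three rows of a, with NULLs where b or c has no match.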

Related

Does the order of columns in a SQL join statement matter? [duplicate]

Disregarding performance, will I get the same result from query A and B below? How about C and D?
----- Scenario 1:
-- A (left join)
select *
from a left join b
on <blahblah>
left join c
on <blahblan>
-- B (left join)
select *
from a left join c
on <blahblah>
left join b
on <blahblan>
----- Scenario 2:
-- C (inner join)
select *
from a join b
on <blahblah>
join c
on <blahblan>
-- D (inner join)
select *
from a join c
on <blahblah>
join b
on <blahblan>
For INNER joins, no, the order doesn't matter. The queries will return the same results, as long as you change your selects from SELECT * to SELECT a.*, b.*, c.*.
For (LEFT, RIGHT or FULL) OUTER joins, yes, the order matters - and (updated) things are much more complicated.
First, outer joins are not commutative, so a LEFT JOIN b is not the same as b LEFT JOIN a
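To make the non-commutativity concrete, here is a small sketch using SQLite through Python (the tables and values are made up for illustration). Swapping the two sides of a LEFT JOIN changes which table's unmatched rows survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# a on the left: every row of a is kept, unmatched b columns are NULL.
a_left_b = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON b.id = a.id ORDER BY a.id"
).fetchall()

# b on the left: every row of b is kept, unmatched a columns are NULL.
b_left_a = conn.execute(
    "SELECT a.id, b.id FROM b LEFT JOIN a ON b.id = a.id ORDER BY b.id"
).fetchall()

print(a_left_b)  # [(1, None), (2, 2)]
print(b_left_a)  # [(2, 2), (None, 3)]
```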
Outer joins are not associative either, so in your examples which involve both (commutativity and associativity) properties:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
is equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
but:
a LEFT JOIN b
ON b.ab_id = a.ab_id
LEFT JOIN c
ON c.ac_id = a.ac_id
AND c.bc_id = b.bc_id
is not equivalent to:
a LEFT JOIN c
ON c.ac_id = a.ac_id
LEFT JOIN b
ON b.ab_id = a.ab_id
AND b.bc_id = c.bc_id
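The two pairs of queries above can be tried out with a sketch like the following. It uses SQLite through Python, and the single row in each table is made up so that b and c disagree on bc_id; none of this data comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# b and c both match the row in a, but their bc_id values differ on purpose.
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER, ac_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (ac_id INTEGER, bc_id INTEGER);
    INSERT INTO a VALUES (1, 1);
    INSERT INTO b VALUES (1, 10);
    INSERT INTO c VALUES (1, 20);
""")

def run(joins):
    """Run the joins and return (a.ab_id, b.bc_id, c.bc_id) rows."""
    sql = f"SELECT a.ab_id, b.bc_id, c.bc_id FROM a {joins}"
    return conn.execute(sql).fetchall()

JOIN_B = "LEFT JOIN b ON b.ab_id = a.ab_id"
JOIN_C = "LEFT JOIN c ON c.ac_id = a.ac_id"

# Both ON clauses refer only to a: swapping the joins changes nothing.
simple_bc = run(f"{JOIN_B} {JOIN_C}")
simple_cb = run(f"{JOIN_C} {JOIN_B}")

# The second ON clause also links b and c: now the order matters.
linked_bc = run(f"{JOIN_B} {JOIN_C} AND c.bc_id = b.bc_id")
linked_cb = run(f"{JOIN_C} {JOIN_B} AND b.bc_id = c.bc_id")

print(simple_bc, simple_cb)  # [(1, 10, 20)] [(1, 10, 20)]
print(linked_bc)             # [(1, 10, None)]
print(linked_cb)             # [(1, None, 20)]
```

In the linked case, whichever table is joined last is the one that ends up NULL-extended, because its extra condition fails against the table joined before it.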
Another (hopefully simpler) associativity example. Think of this as (a LEFT JOIN b) LEFT JOIN c:
a LEFT JOIN b
ON b.ab_id = a.ab_id -- AB condition
LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
This is equivalent to a LEFT JOIN (b LEFT JOIN c):
a LEFT JOIN
b LEFT JOIN c
ON c.bc_id = b.bc_id -- BC condition
ON b.ab_id = a.ab_id -- AB condition
only because we have "nice" ON conditions. Both ON b.ab_id = a.ab_id and c.bc_id = b.bc_id are equality checks and do not involve NULL comparisons.
You can even have conditions with other operators or more complex ones like: ON a.x <= b.x or ON a.x = 7 or ON a.x LIKE b.x or ON (a.x, a.y) = (b.x, b.y) and the two queries would still be equivalent.
If however, any of these involved IS NULL or a function that is related to nulls like COALESCE(), for example if the condition was b.ab_id IS NULL, then the two queries would not be equivalent.
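Here is a rough sketch of that difference, using SQLite through Python. The right-nested form is written as a derived table (aliased bc, a name chosen here) to stay portable, and the null-accepting condition `c.bc_id = b.bc_id OR b.bc_id IS NULL` is just one illustrative choice; the data is made up, with b left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# b is empty, so a's row is always NULL-extended on the b side.
conn.executescript("""
    CREATE TABLE a (ab_id INTEGER);
    CREATE TABLE b (ab_id INTEGER, bc_id INTEGER);
    CREATE TABLE c (bc_id INTEGER);
    INSERT INTO a VALUES (1);
    INSERT INTO c VALUES (5);
""")

def left_assoc(bc_condition):
    """(a LEFT JOIN b) LEFT JOIN c"""
    return conn.execute(f"""
        SELECT a.ab_id, b.ab_id, c.bc_id
        FROM a
        LEFT JOIN b ON b.ab_id = a.ab_id
        LEFT JOIN c ON {bc_condition}
    """).fetchall()

def right_assoc(bc_condition):
    """a LEFT JOIN (b LEFT JOIN c), with the inner join as a derived table"""
    return conn.execute(f"""
        SELECT a.ab_id, bc.b_ab_id, bc.c_bc_id
        FROM a
        LEFT JOIN (
            SELECT b.ab_id AS b_ab_id, c.bc_id AS c_bc_id
            FROM b LEFT JOIN c ON {bc_condition}
        ) AS bc ON bc.b_ab_id = a.ab_id
    """).fetchall()

nice = "c.bc_id = b.bc_id"
null_accepting = "c.bc_id = b.bc_id OR b.bc_id IS NULL"

print(left_assoc(nice), right_assoc(nice))
# [(1, None, None)] [(1, None, None)]
print(left_assoc(null_accepting), right_assoc(null_accepting))
# [(1, None, 5)] [(1, None, None)]
```

With the null-accepting condition, the left-nested form sees the NULL-extended b columns and matches c, while the right-nested form evaluates the condition before any NULL extension happens.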
If you try joining C on a field from B before joining B, i.e.:
SELECT A.x,
A.y,
A.z
FROM A
INNER JOIN C
on B.x = C.x
INNER JOIN B
on A.x = B.x
your query will fail, so in this case the order matters.
For regular joins, it doesn't. TableA join TableB will produce the same execution plan as TableB join TableA (so your C and D examples would be the same).
For left and right joins it does. TableA left join TableB is different from TableB left join TableA, BUT it's the same as TableB right join TableA.
The Oracle optimizer chooses the join order of tables for inner joins.
The optimizer chooses the join order of tables only in simple FROM clauses.
You can check the Oracle documentation on their website.
And for left and right outer joins, the most voted answer is right.
The optimizer chooses the optimal join order as well as the optimal index for each table. The join order can affect which index is the best choice. The optimizer can choose an index as the access path for a table if it is the inner table, but not if it is the outer table (and there are no further qualifications).
The optimizer chooses the join order of tables only in simple FROM clauses. Most joins using the JOIN keyword are flattened into simple joins, so the optimizer chooses their join order.
The optimizer does not choose the join order for outer joins; it uses the order specified in the statement.
When selecting a join order, the optimizer takes into account:
The size of each table
The indexes available on each table
Whether an index on a table is useful in a particular join order
The number of rows and pages to be scanned for each table in each join order

Getting a field from a table that is not in the left outer join in the same query

I have 4 tables, 3 of which are joined, but I need to get a field from the fourth table (Table_D). For the sake of simplicity let's say they are Tables A, B, C, D.
Select Distinct A.Field_1, B.Field_2, C.Field_3
From Table_A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join C on A.Field_z= C.Field_z
where A.Field_z in (1111);
This seems to work but I need a field in Table_D that is only connected to Table_A through Table_C.
How can I add it to the join? or can I?
Thanks!
WB
The on clause has essentially all the features of the where clause. The on clause for any part of the join can refer to any attribute of any entity already introduced in the from clause. I suspect your confusion comes from thinking that you can only join subsequent entities to the first entity in the from clause. If Table_D is related to Table_A through Table_C, you might see a from/join/on construct like:
select a.thing, b.things, c.thang, d.stuff
from Table_A a
left join Table_B b
on a.id = b.a_id
left join Table_C c
on a.id = c.a_id
left join Table_D d
on d.id = c.d_id
where
a.this = b.that
Note that the word "outer" is implied in a left join, and is not necessary for syntactic completion.
Of course you can add it. You don't specify the logic, but something like this:
Select Distinct A.Field_1, B.Field_2, C.Field_3, D.??
From Table_A a Left outer join
B
on A.Field_z= B.Field_z Left Outer Join
C
on A.Field_z= C.Field_z Left Outer Join
D
on . . .
where A.Field_z in (1111);
Just fill in the conditions that you want. You want a left join, because the condition between c and d would otherwise turn the outer join to c into an inner join.
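To see why the join to D should be a left join, here is a minimal sketch in SQLite through Python. The join condition C.Field_z = D.Field_z and the column D.Field_4 are assumptions for illustration (the question does not say how C and D are related), and D is left empty on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# D has no rows, so nothing in C can find a match in D.
conn.executescript("""
    CREATE TABLE Table_A (Field_z INTEGER, Field_1 TEXT);
    CREATE TABLE B (Field_z INTEGER, Field_2 TEXT);
    CREATE TABLE C (Field_z INTEGER, Field_3 TEXT);
    CREATE TABLE D (Field_z INTEGER, Field_4 TEXT);
    INSERT INTO Table_A VALUES (1111, 'a1');
    INSERT INTO B VALUES (1111, 'b1');
    INSERT INTO C VALUES (1111, 'c1');
""")

def query(join_kind):
    """Join D to C with the given join kind ('LEFT OUTER' or 'INNER')."""
    return conn.execute(f"""
        SELECT DISTINCT A.Field_1, B.Field_2, C.Field_3, D.Field_4
        FROM Table_A A
        LEFT OUTER JOIN B ON A.Field_z = B.Field_z
        LEFT OUTER JOIN C ON A.Field_z = C.Field_z
        {join_kind} JOIN D ON C.Field_z = D.Field_z
        WHERE A.Field_z IN (1111)
    """).fetchall()

with_left_join = query("LEFT OUTER")
with_inner_join = query("INNER")

print(with_left_join)   # [('a1', 'b1', 'c1', None)]
print(with_inner_join)  # []
```

The inner join to D discards the row entirely, which undoes the effect of the outer joins before it.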
If I understand you correctly, I believe this would work:
Left Outer Join D on C.Field_z= D.Field_z
You can try it this way; I think it will help:
Select Distinct A.Field_1, B.Field_2, C.Field_3, c.Field_4
From Table_A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join ( select C.field_3, d.Field_4, c.Field_z from c inner join d on c.field_z = d.field_z ) c on A.Field_z= C.Field_z
where A.Field_z in (1111);

How to get rows from one or another joined table and then further to more joined tables depending on which first two tables were joined

I have three tables (a, b, c), and two of them (b and c) need to be joined to get detail data for the first table a. But the problem is that I need to do this in one query.
If I join both tables in the same query, then no records are found, as the detail data is either in b or in c, but never in both.
To further complicate things, I need to join other tables (b2, c2) depending on whether the record found is from b or c.
I am using MS SQL.
The query I have now is:
select a.*, b.name1, c.name1, b2.url, c2.url
from a
left join b on a.aID = b.aID
left join c on a.aID = c.aID
inner join b2 on b.bID = b2.bID
inner join c2 on c.cID = c2.cID
where a.date > '9/1/2016'
I searched for a few days, but no one seems to need to go after the fourth and fifth tables in the query, so I couldn't find any similar answer.
Is there any way to do this? Performance is not an issue as the number of records will be less than 1,000 after executing a where clause that will always limit the records from table a.
A series of left joins with inner joins as subqueries should do the trick. The subqueries pull the b/b2 and c/c2 data together for reference in the left joins with a:
select a.*, b.name1, c.name1, b2join.url, c2join.url
FROM a
LEFT JOIN b
on a.aID = b.aID
LEFT JOIN c
on a.aID = c.aID
LEFT JOIN
(SELECT b.aID, b.bID, b2.url
FROM b
INNER JOIN b2
on b.bID = b2.bID) as b2join
on b2join.aID = a.aID
LEFT JOIN
(SELECT c.aID, c.cID, c2.url
FROM c
INNER JOIN c2
on c.cID = c2.cID) as c2join
on c2join.aID = a.aID
I think a Left Join will do it, at least for the first part:
select a.*, b.name1, c.name1
from a
left join b on a.aID = b.aID
left join c on a.aID = c.aID
where a.date > '9/1/2016'
You will get c.name1 null when data is in b table, and b.name1 will be null when data is in c table.
For the other two joins I'm not sure, but a left join could also work.
I think this is a simpler and shorter way to get the same result:
select a.*, b.name1, c.name1, b2join.url, c2join.url
FROM a
LEFT JOIN b on a.aID = b.aID
LEFT JOIN c on a.aID = c.aID
LEFT JOIN b2 as b2join on b2join.bID = b.bID
LEFT JOIN c2 as c2join on c2join.cID = c.cID
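The effect of inner versus left joins for b2 and c2 can be checked with a sketch like this one. It uses SQLite through Python rather than MS SQL, so the date literal is written in ISO form; the rows and URLs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row of a has its detail in either b or c, never both (made-up data).
conn.executescript("""
    CREATE TABLE a (aID INTEGER, date TEXT);
    CREATE TABLE b (bID INTEGER, aID INTEGER, name1 TEXT);
    CREATE TABLE c (cID INTEGER, aID INTEGER, name1 TEXT);
    CREATE TABLE b2 (bID INTEGER, url TEXT);
    CREATE TABLE c2 (cID INTEGER, url TEXT);
    INSERT INTO a VALUES (1, '2016-09-15'), (2, '2016-10-01');
    INSERT INTO b VALUES (100, 1, 'from b');
    INSERT INTO c VALUES (200, 2, 'from c');
    INSERT INTO b2 VALUES (100, 'http://example.com/b');
    INSERT INTO c2 VALUES (200, 'http://example.com/c');
""")

def query(join_kind):
    """Join b2 and c2 with the given join kind ('INNER' or 'LEFT')."""
    return conn.execute(f"""
        SELECT a.aID, b.name1, c.name1, b2.url, c2.url
        FROM a
        LEFT JOIN b ON a.aID = b.aID
        LEFT JOIN c ON a.aID = c.aID
        {join_kind} JOIN b2 ON b.bID = b2.bID
        {join_kind} JOIN c2 ON c.cID = c2.cID
        WHERE a.date > '2016-09-01'
        ORDER BY a.aID
    """).fetchall()

inner_rows = query("INNER")
left_rows = query("LEFT")

print(inner_rows)  # []
for row in left_rows:
    print(row)
```

With inner joins every row is lost, because each row is missing either its b side or its c side. With left joins both rows of a come back, each with one populated name1/url pair and NULLs for the other.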

Outer join in oracle with ANSI standard

select ...
from A left outer join B on (B.x=A.x)
left outer join C on (C.y=A.y)
I want to add one additional join of table D with table C, with the condition D.z=C.z:
select ...
from A left outer join B on (B.x=A.x)
left outer join C on (C.y=A.y), D inner join C on (D.z=C.z)
However, the query does not work after adding the part ", D inner join C on (D.z=C.z)".
Any suggestions?
You should just add left outer join D on (D.z=C.z). If you use INNER JOIN, you remove the rows from A and B that are not connected with C and D:
select ...
from A left outer join B on (B.x=A.x)
left outer join C on (C.y=A.y)
left outer join D on (D.z=C.z)
My understanding is that it is not just table C but the result of an inner join between C and D that you want to outer-join to table A.
If that is so, then @valex's suggestion is an alternative but equivalent way to represent that logic.
In some SQL products the syntax would allow you to write out the logic exactly as intended:
…
FROM
A
LEFT JOIN B ON (B.x=A.x)
LEFT JOIN
C
INNER JOIN D ON (D.z=C.z)
ON (C.y=A.y)
Oracle doesn't support such syntax. But you could rewrite the query like this in order to make the syntax more closely reflect the intended logic:
…
FROM
C
INNER JOIN D ON (D.z=C.z)
RIGHT JOIN A ON (C.y=A.y)
LEFT JOIN B ON (B.x=A.x)
Now it is clear that C and D are supposed to be inner-joined and their result should be outer-joined to A (A being on the outer side of the join, which is the right side this time, hence RIGHT JOIN), followed by an outer join of B to A.
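Still, as I said, @valex's suggestion should produce the same results whenever every C row that matches A also has a match in D. The two can differ when a matching C row has no match in D: the chained left joins still return that C row (with NULLs for D), while the nested form drops it.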
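A small sketch comparing the chained left joins with the "inner-join C and D first" form, using SQLite through Python instead of Oracle. Table B is left out for brevity, the nested form is written as a derived table (aliased cd, a name chosen here), and the rows are made up: one C row has a match in D and one does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# C row (y=1, z=7) has a match in D; C row (y=2, z=8) does not.
conn.executescript("""
    CREATE TABLE A (y INTEGER);
    CREATE TABLE C (y INTEGER, z INTEGER);
    CREATE TABLE D (z INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO C VALUES (1, 7), (2, 8);
    INSERT INTO D VALUES (7);
""")

# Chained form: C is outer-joined to A, then D is outer-joined to C.
chained = conn.execute("""
    SELECT A.y, C.z, D.z
    FROM A
    LEFT OUTER JOIN C ON (C.y = A.y)
    LEFT OUTER JOIN D ON (D.z = C.z)
    ORDER BY A.y
""").fetchall()

# Nested form: C and D are inner-joined first, the result is outer-joined to A.
nested = conn.execute("""
    SELECT A.y, cd.c_z, cd.d_z
    FROM A
    LEFT OUTER JOIN (
        SELECT C.y AS c_y, C.z AS c_z, D.z AS d_z
        FROM C INNER JOIN D ON (D.z = C.z)
    ) cd ON (cd.c_y = A.y)
    ORDER BY A.y
""").fetchall()

print(chained)  # [(1, 7, 7), (2, 8, None)]
print(nested)   # [(1, 7, 7), (2, None, None)]
```

Both forms keep every row of A; they agree for the row whose C match also has a D match, and differ only in the C column of the other row.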
