When doing an outer join, is it the order of the tables that matters or the order of the operands in the ON clause?
For example is
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
the same as
FROM TABLEA A
LEFT JOIN TABLEB B ON B.id = A.id
What about if you have multiple tables? Is it a LEFT JOIN if the first table out of many is the one you want all rows from regardless of the ON clause?
For example,
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
LEFT JOIN TABLEC C ON C.ID = A.ID
Does it take all the rows from TABLEA because it is to the left in the table list or the rows from C because it is on the left in the ON clause?
LEFT JOIN means take the table on the left (the first one specified) and join to it the rows from the table on the right (the second one specified), matching them up wherever the ON condition is true. Since the ON condition just needs to be true, the way it is written doesn't matter at all; it's just an expression that is evaluated.
LEFT JOIN ensures that every row from the table on the left is retained, and joined with NULLs if there is no row to join up to it from the table on the right. So the order of the tables is certainly significant.
If the table ordering was reversed, and there were only two tables, a RIGHT JOIN would have the same effect (i.e. keep the rows from the second table specified).
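To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample ids are made up for illustration (id 1 exists only in TABLEA, id 3 only in TABLEB), and other engines may name or order the output differently.

```python
import sqlite3

# Hypothetical sample data: id 1 exists only in TABLEA, id 3 only in TABLEB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (id INTEGER);
    CREATE TABLE TABLEB (id INTEGER);
    INSERT INTO TABLEA VALUES (1), (2);
    INSERT INTO TABLEB VALUES (2), (3);
""")

def rows(sql):
    # Sort by repr so rows containing NULL (None) order deterministically.
    return sorted(conn.execute(sql).fetchall(), key=repr)

# Same table order, operands of ON swapped.
a_left_b = rows(
    "SELECT A.id, B.id FROM TABLEA A LEFT JOIN TABLEB B ON A.id = B.id")
a_left_b_flipped = rows(
    "SELECT A.id, B.id FROM TABLEA A LEFT JOIN TABLEB B ON B.id = A.id")

# Table order swapped: now TABLEB is the preserved (left) table.
b_left_a = rows(
    "SELECT A.id, B.id FROM TABLEB B LEFT JOIN TABLEA A ON A.id = B.id")

print(a_left_b)          # every TABLEA row is kept
print(a_left_b_flipped)  # identical: operand order in ON is irrelevant
print(b_left_a)          # every TABLEB row is kept instead
```

The first two queries return the same rows, while the third keeps TABLEB's unmatched row and loses TABLEA's.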
FROM TABLEA A LEFT JOIN TABLEB B ON A.id=B.id LEFT JOIN TABLEC C ON C.ID=A.ID
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON A.ID=C.ID
Both of these return the same results.
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON B.ID=C.ID
This might or might not return the same results, depending on what data is actually in table B and table C, because table C is related to table B and not directly to table A. Personally, I would always treat this as different by definition from the first set of joins; if it returns the same results, that is accidental at this point in time.
When writing multiple joins, especially when you have outer joins, I personally find it helpful to start with the parent table (usually the one you want on the left side of the join) and add any other inner join tables before doing the left joins. If I have child and grandchild tables (rather than multiple child tables), I try to list them in descending order of parent, child, grandchild to make clear what is related to what.
I'd like to know if anyone would know an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join?
For now my method consists in full joining table A with table B, store the result as a cte, then full joining the cte to table C, store the result as a cte2, full joining cte2 to table D... you got it.
Creating a new cte every time I want to add another table to the join is not very practical, but every other solution I have found so far has the same issue: there's always some kind of endless nesting, either of ctes or of selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way that i don't know that would help me perform this multiple full join without falling in an infinite recursive loop of ctes?
Thanks
EDIT: Sorry it seems it wasn't clear enough
When i perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not tableA, my result will contain two lines where there should be one, because I joined b to a and c to a but not b to c. That's why I need to full join the result of the full join of a and b to c.
So if I have, let's say, five tables instead of three, I need to full join the result of the full join of the result of the full join of the result of the full join... x)
This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accommodate the fact that the id may come from table B and not A. The easiest way is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id
Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
With USING, the joined id column takes the non-NULL value from whichever side has one (in effect a COALESCE), so the next join matches against that merged id even when tableA has no row.
Though my sql knowledge is pretty good, I cannot get my head around the difference in a left vs inner join specifically when doing an update.
employee_table
column1:id
column2:socialsecurity
private_info_table
column1:id
column2:socialsecurity
I need to update employee_table.socialsecurity to be private_info_table.socialsecurity
should I do a left or inner join: ???
update e
set e.socialsecurity=p.socialsecurity
from employee_table e
join private_info_table p --should this be left or inner join?
on p.id=e.id
Left Join
Select *
from A
left join B on a.id = b.pid
With the condition a.id = b.pid, every row of A is returned. If the value from A (i.e. a.id) doesn't match any value from B (i.e. b.pid), all fields of B will be null in that row. On the other hand, for those rows where the condition is true, all values of B are shown. Notice that A's rows have to be returned because A is on the left of the 'left' keyword.
Inner Join
Select *
from A
inner join B on a.id = b.pid
Rows are returned only where a.id = b.pid is true; a row of A with no match in B (or a row of B with no match in A) is left out entirely.
In your case, don't use a left join, because all records on the left of the 'left' keyword will be updated, with either null or non-null values. This means you will unintentionally update non-matched records with null values, as described for left joins above.
It should be an inner join if you just want to update those records that are in both tables. If you do a left join it will update the employee table to be null wherever the id isn't found in the private info table.
The reason this matters is that you may already have some social security numbers in the employee table that aren't found in the private info table. You wouldn't want to erase them with a null value.
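The difference can be checked on a toy data set. The sketch below uses Python's sqlite3 module, and because the UPDATE ... FROM ... JOIN form in the question may not be available in every SQLite build, it swaps in correlated subqueries that are assumed to mirror the two variants: an unconditional correlated SET behaves like the left join, and adding WHERE EXISTS behaves like the inner join. The sample values are invented.

```python
import sqlite3

def run_update(update_sql):
    # Fresh in-memory database per call; the sample values are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee_table (id INTEGER, socialsecurity TEXT);
        CREATE TABLE private_info_table (id INTEGER, socialsecurity TEXT);
        INSERT INTO employee_table VALUES (1, 'old-1'), (2, 'old-2');
        INSERT INTO private_info_table VALUES (1, 'new-1');  -- no row for id 2
    """)
    conn.execute(update_sql)
    return conn.execute(
        "SELECT id, socialsecurity FROM employee_table ORDER BY id"
    ).fetchall()

# Assumed equivalent of the LEFT JOIN update: every employee row is touched,
# and the subquery yields NULL when there is no matching private_info row.
LEFT_STYLE = """
    UPDATE employee_table
    SET socialsecurity = (SELECT p.socialsecurity
                          FROM private_info_table p
                          WHERE p.id = employee_table.id)
"""

# Assumed equivalent of the INNER JOIN update: only matched rows are touched.
INNER_STYLE = LEFT_STYLE + """
    WHERE EXISTS (SELECT 1
                  FROM private_info_table p
                  WHERE p.id = employee_table.id)
"""

left_result = run_update(LEFT_STYLE)
inner_result = run_update(INNER_STYLE)
print(left_result)   # employee 2 loses its existing value
print(inner_result)  # employee 2 keeps its existing value
```

With the left-join style, the employee without a private_info_table row ends up with NULL; with the inner-join style, that row is left alone.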
SELECT * FROM Table A LEFT JOIN TABLE B LEFT JOIN TABLE C
From the snippet above, TABLE C will left join into (TABLE B) or (data from TABLE A LEFT JOIN TABLE B) or (TABLE A)?
TABLE C will left join into 1. (TABLE B) or 2. (data from TABLE A LEFT JOIN TABLE B) or 3. (TABLE A)?
The second. The join condition will help you understand this better.
You can write:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.ID = C.ID)
But you can also write:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
So you can join on any field from the previous tables, and you join on "the result" of the previous joins (though the engine may choose its own way to get that result).
Think of a left join as a non-commutative operation (A left join B is not the same as B left join A). So the order is important, and C will be left joined to the result of the previously joined tables.
The Oracle documentation is quite specific about how the joins are processed:
To execute a join of three or more tables, Oracle first joins two of
the tables based on the join conditions comparing their columns and
then joins the result to another table based on join conditions
containing columns of the joined tables and the new table. Oracle
continues this process until all tables are joined into the result.
This is the logical approach to handling the joins and is consistent with the ANSI standard (in other words, all database engines logically process the joins in order).
However, when the query is actually executed, the optimizer may choose to run the joins in a different order. The result needs to be logically the same as processing the joins in the order given in the query.
Also, the join conditions may cause some unexpected conditions to arise. So if you have:
from A left outer join
B
on A.id = B.id left outer join
C
on B.id = C.id
Then, you might have the condition where A and C each have a row with a particular id, but B does not. With this formulation, you will not see the row in C because it is joining to NULL. So, be careful with join conditions on left outer join, particularly when joining to a table other than the first table in the chain.
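A small sketch of that situation, using Python's sqlite3 module with invented data: id 2 exists in A and C but not in B. The table names follow the fragment above; everything else is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER);
    CREATE TABLE C (id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1);        -- id 2 is missing from B
    INSERT INTO C VALUES (1), (2);
""")

def rows(sql):
    # Sort by repr so rows containing NULL (None) order deterministically.
    return sorted(conn.execute(sql).fetchall(), key=repr)

# C joined through B: for id 2, B.id is NULL, so nothing in C can match.
via_b = rows("""
    SELECT A.id, B.id, C.id
    FROM A LEFT OUTER JOIN B ON A.id = B.id
           LEFT OUTER JOIN C ON B.id = C.id
""")

# C joined to A directly: C's row for id 2 is found.
via_a = rows("""
    SELECT A.id, B.id, C.id
    FROM A LEFT OUTER JOIN B ON A.id = B.id
           LEFT OUTER JOIN C ON A.id = C.id
""")

print(via_b)
print(via_a)
```

Both queries keep both rows of A, but only the second one picks up C's row for id 2.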
You need to mention the column names properly in order to run the query. Let's say you are using:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Then you may get the following error:
ORA-00933: SQL command not properly ended
So to avoid it you can try:
SELECT A.id as "Id_from_A", B.code as "Code_from_B"
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Thanks
Say I have three tables in SQL Server 2008 R2
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
LEFT JOIN Table_C c ON b.id = c.id
or
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
JOIN Table_C c ON b.id = c.id
also, does it matter if I use b.id or a.id on joining c?
i.e. instead of JOIN Table_C c ON b.id = c.id, use JOIN Table_C c ON a.id = c.id
Thank you!
If it doesn't change the semantics of the query, the database server can reorder the joins to run in whichever way it thinks is more efficient.
Usually, if you want to force a certain order, you can use inline view subqueries, as in
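SELECT a.*, x.*
FROM
Table_A a
RIGHT JOIN
(
-- name each column once: a derived table cannot expose two columns called id
-- (add any further columns you need in the same way)
SELECT b.id AS b_id, c.id AS c_id
FROM Table_B b
LEFT JOIN Table_C c ON b.id = c.id
) x
ON a.id = x.b_id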
According to the definitions:
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
The first option would include all rows from the first join on tables a and b even if there are no matching ones in table c, while the second statement would show only rows which match ones in table c.
Regarding the second question, I guess it would make a difference: since the first join includes all ids from table b, even those with no matching ones in table a, once you change your join criterion to a.id you will get a different set of ids than with b.id.
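These cases can be compared side by side. The sketch below uses Python's sqlite3 module with invented ids. Since older SQLite builds lack RIGHT JOIN, it writes Table_A RIGHT JOIN Table_B as the equivalent Table_B LEFT JOIN Table_A; that rewrite is an assumption of this example, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER);
    CREATE TABLE Table_B (id INTEGER);
    CREATE TABLE Table_C (id INTEGER);
    INSERT INTO Table_A VALUES (1);
    INSERT INTO Table_B VALUES (1), (2), (3);
    INSERT INTO Table_C VALUES (1), (2);
""")

def rows(last_join):
    # Table_B LEFT JOIN Table_A stands in for Table_A RIGHT JOIN Table_B.
    sql = f"""
        SELECT a.id, b.id, c.id
        FROM Table_B b
        LEFT JOIN Table_A a ON a.id = b.id
        {last_join}
    """
    # Sort by repr so rows containing NULL (None) order deterministically.
    return sorted(conn.execute(sql).fetchall(), key=repr)

left_on_b = rows("LEFT JOIN Table_C c ON b.id = c.id")
inner_on_b = rows("JOIN Table_C c ON b.id = c.id")
inner_on_a = rows("JOIN Table_C c ON a.id = c.id")

print(left_on_b)   # all three Table_B rows survive
print(inner_on_b)  # only Table_B rows that have a Table_C match
print(inner_on_a)  # only rows where Table_A matched as well
```

On this data the three variants return three, two, and one row respectively, so both the join type and the choice between a.id and b.id change the result.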
Yes, you do need a LEFT JOIN after a RIGHT JOIN
See
http://sqlfiddle.com/#!3/2c079/5/0
http://sqlfiddle.com/#!3/2c079/6/0
If you don't, the (inner) JOIN at the end will cancel out the effect of your RIGHT JOIN: it drops every row without a match in Table_C, including rows the RIGHT JOIN had kept.
That wouldn't make any sense to have a RIGHT JOIN if you don't care. And if you care, you will have to add a LEFT JOIN after it.
I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.
Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.
Table A - Primary key named id1
Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
Table C - Contains two foreign keys, TableA.id1 and Foo.id2, and additional interesting columns.
Constraints:
1. The SQL gets a set of id1s passed in as an argument.
2. It must return a list of Table C rows.
3. It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
4. A row MUST be returned for every id1 passed in, even if no Table C row exists.
At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.
Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.
So, here's what I came up with.
SELECT
a.id1,
c.*
FROM
TableB b
INNER JOIN
TableC c USING (id1,id2)
RIGHT OUTER JOIN
TableA a USING (id1)
WHERE
a.id1 in (x,y,z)
I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.
So, three questions in one.
Is this correct?
Did I use the Right Outer Join correctly?
Is there a cleaner way to achieve the same thing?
EDIT: DB is MySQL
You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:
SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...
to this:
SELECT ...
FROM a
LEFT OUTER JOIN (
b INNER JOIN c ON ...
) ON ...
You can use an EXISTS clause, which sometimes works better
SELECT
a.id1,
c.*
FROM TableA a
LEFT JOIN TableC c
ON c.id1 = a.id1 AND EXISTS (
select *
from TableB b
where b.id1=c.id1 and b.id2=c.id2)
WHERE
a.id1 in (x,y,z)
As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could join all records in B and C (B x C) before right joining to A.
The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).
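As a rough check of the EXISTS version against the question's constraints, here is a sketch using Python's sqlite3 module (the question's database is MySQL, so this only illustrates the logic). The data and the info column are invented, and 1, 2, 3 stand in for the x, y, z placeholders. The two-LEFT-JOIN variant the question mentions trying is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id1 INTEGER PRIMARY KEY);
    CREATE TABLE TableB (id1 INTEGER, id2 INTEGER);
    CREATE TABLE TableC (id1 INTEGER, id2 INTEGER, info TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO TableC VALUES (1, 10, 'has B match'), (2, 99, 'no B match');
""")

def rows(sql):
    # Sort by repr so rows containing NULL (None) order deterministically.
    return sorted(conn.execute(sql).fetchall(), key=repr)

# LEFT JOIN to C, keeping a C row only if a matching B row exists.
exists_rows = rows("""
    SELECT a.id1, c.id2, c.info
    FROM TableA a
    LEFT JOIN TableC c
      ON c.id1 = a.id1 AND EXISTS (
           SELECT * FROM TableB b
           WHERE b.id1 = c.id1 AND b.id2 = c.id2)
    WHERE a.id1 IN (1, 2, 3)
""")

# Two LEFT JOINs: B rows without a C row produce extra all-NULL C rows.
two_left_joins = rows("""
    SELECT a.id1, c.id2, c.info
    FROM TableA a
    LEFT JOIN TableB b ON a.id1 = b.id1
    LEFT JOIN TableC c ON b.id1 = c.id1 AND b.id2 = c.id2
    WHERE a.id1 IN (1, 2, 3)
""")

print(exists_rows)
print(two_left_joins)
```

On this data the EXISTS query returns exactly one row per id1, while the two-LEFT-JOIN query returns an additional NULL row for id1 = 1 because TableB row (1, 11) has no TableC counterpart.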
Q's
1. Yes, your query is correct.
2. Yes.
3. EXISTS should work better.
Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.
Select a.id1,c.*
From
TableA a
Left Join TableB b on a.id1=b.id1
Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)
You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.