Need help with a sql query that has an inner and outer join - sql

I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.
Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.
Table A - Primary key named id1
Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
Table C - Contains two foreign keys, TableA.id1 and Foo.id2, additional interesting
columns.
Constraints:
The SQL gets a set of id1s passed in as an argument.
It must return a list of Table C rows.
It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
A row MUST be returned for every id1 passed in, even if no Table C row exists.
At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.
Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.
So, here's what I came up with.
SELECT
a.id1,
c.*
FROM
TableB b
INNER JOIN
TableC c USING (id1,id2)
RIGHT OUTER JOIN
TableA a USING (id1)
WHERE
a.id1 in (x,y,z)
I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.
So, three questions in one.
Is this correct?
Did I use the Right Outer Join correctly?
Is there a cleaner way to achieve the same thing?
EDIT: DB is MySQL

You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:
SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...
to this:
SELECT ...
FROM a
LEFT OUTER JOIN (
b INNER JOIN c ON ...
) ON ...

You can use an EXISTS clause, which sometimes works better
SELECT
a.id1,
c.*
FROM TableA a
LEFT JOIN TableC c
ON c.id1 = a.id1 AND EXISTS (
select *
from TableB b
where b.id1=c.id1 and b.id2=c.id2)
WHERE
a.id1 in (x,y,z)
As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could perform all records in B and C (B x C) before right joining to A.
The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).
Q's
Yes your query is correct
Yes
EXISTS should work better

Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.
Select a.id1,c.*
From
TableA a
Left Join TableB b on a.id1=b.id1
Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)
You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.

Related

What is a good way to make multiple full outer join?

I'd like to know if anyone would know an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join?
For now my method consists in full joining table A with table B, store the result as a cte, then full joining the cte to table C, store the result as a cte2, full joining cte2 to table D... you got it.
Creating a new cte every time i want to add another table to the join is not very practical, but every other solutions i found so far have the same issue, there's always some kind of infinite looping either on ctes or in selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way that i don't know that would help me perform this multiple full join without falling in an infinite recursive loop of ctes?
Thanks
EDIT: Sorry it seems it wasn't clear enough
When i perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not tableA, my result will create two lines where there should be one, because i joined b to a and c to a but not b to c. That's why i need to full join the result of the full join of a and b to c.
So if i have let's say five table instead of three, i need to full join the result of the full join of the result of the full join of the result of the full join... x)
This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accomodate the fact that maybe the data comes from table B and not A. The easiest is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id
Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
The semantics of USING mean that only non-NULL values are used, if such a value is available.

SQL Server Outer joins

When doing an outer join, is it the order of the tables that matter or the order of the ON clause?
For example is
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
the same as
FROM TABLEA A
LEFT JOIN TABLEB B ON B.id = A.id
What about if you have multiple tables? Is it a LEFT JOIN if the first table out of many is the one you want all rows from regardless of the ON clause?
For example,
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
LEFT JOIN TABLEC C ON C.ID = A.ID
Does it take all the rows from TABLEA because it is to the left in the table list or the rows from C because it is on the left in the ON clause?
LEFT JOIN means take the table on the left (the first one specified), and join the rows from the table on the right (the second one specified). It will join them up based on the ON condition being true. Since the ON condition just needs to be true, the way it is written doesn't matter at all, it's just an expression that is evaluated.
LEFT JOIN ensures that every row from the table on the left is retained, and joined with NULLs if there is no row to join up to it from the table on the right. So that means that order of the tables is certainly significant.
If the table ordering was reversed, and there were only two tables, a RIGHT JOIN would have the same effect (i.e. keep the rows from the second table specified).
FROM TABLEA A LEFT JOIN TABLEB B ON A.id=B.id LEFT JOIN TABLEC C ON C.ID=A.ID
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON A.ID=C.ID
These both would return the same results.
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON B.ID=C.ID
This might or might not return the same results depending on what data is actually in table b and table C because table C is related to table b not directly related to table A. Personally I would always treat this as being different by definition than the first set of joins and that if is is the same that is accidental at this point in time.
When writing multiple joins especially when you have Outer joins, I personally find it helpful to start with the parent table (that usually being the one you want on the left side of the join) first and add any other other inner join tables before doing the left joins. If I have child and grandchild tables (vice multiple child tables), then I try to do in descending order of parent, child, grandchild to be clear what is related to what.

SQL trying to do a JOIN to include results from multiple Tables

I'm a complete novice teaching myself SQL by writing and modifying a few queries and reports at work.
I've got something of a handle on the various types of JOINs and I've used INNER JOIN a few times with decent success.
What I'm stuck on should be a simple task, but my Google-Fu must be weak. Here's what I'm trying to do.
Say I have 3 tables, Table_A, Table_B, and Table_C, and each table has a column called [Serial_Number].
What I'm wanting to select is 3 of the other columns if A.Serial_Number = B.Serial_Number OR C.Serial_Number.
I've tried doing:
SELECT
*
FROM
Table_A AS A
INNER JOIN Table_B AS B ON A.Serial_Number = B.Serial_Number
INNER JOIN Table_C AS C ON A.Serial_Number = C.Serial_Number
But this always yields 0 results as the nature of the data dictates that if A matches B, it will never match C and vice versa. I also tried a LEFT OUTER JOIN as the second clause, but this just includes NULLs from Table_C that have already matched on Table_B.
All the searches I have done relating to JOINs on multiple tables seem to be about using JOINS to further exclude records, where I'm actually wanting to INCLUDE more records.
Like I said, I'm sure this is really simple, just needing a nudge in right direction.
Thanks!
The use of two inner joins here is akin to saying
If A.Serial_Number = B.Serial_Number AND
A.Serial_Number = C.Serial_Number
Using left outer join on the second clause - by which i presume you mean second join - would perform a left join on a result set already filtered by A.Serial_Number = B.Serial_Number by the first inner join. Given that B.Serial_Number doesn't relate to C.Serial_Number you wouldn't expect the an equijoin to return any result from tablec.
What you want is a left outer join like you tried but for both tableb and tablec.
Select *
From tablea
Left join tableb on tableb.Serial_Number = tablea.Serial_Number
Left join tablec on tablec.Serial_Number = tablea.Serial_Number
This way regardless of whether tablea.Serial_Number is in tableb it will still be returned and thus available to be joined to tablec
Agreed. Your output for your inner joins is producing NULLs which is why it is resulting in 0. I would suggest modifying your INNER JOIN.

Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?

Whats the difference between using a LEFT OUTER JOIN, rather than a sub-query that starts with a WHERE NOT EXISTS (...)?
No they are not the same thing, as they will not return the same rowset in the most simplistic use case.
The LEFT OUTER JOIN will return all rows from the left table, both where rows exist in the related table and where they does not. The WHERE NOT EXISTS() subquery will only return rows where the relationship is not met.
However, if you did a LEFT OUTER JOIN and looked for IS NULL on the foreign key column in the WHERE clause, you can make equivalent behavior to the WHERE NOT EXISTS.
For example this:
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* IS NULL in the WHERE clause */
WHERE t_related.id IS NULL
Is equivalent to this:
SELECT
t_main.*
FROM t_main
WHERE
NOT EXISTS (
SELECT t_related.id
FROM t_related
WHERE t_main.id = t_related.id
)
But this one is not equivalent:
It will return rows from t_main both having and not having related rows in t_related.
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* WHERE clause does not exclude NULL foreign keys */
Note This does not speak to how the queries are compiled and executed, which differs as well -- this only addresses a comparison of the rowsets they return.
As Michael already answered your question here is a quick sample to illustrate the difference:
Table A
Key Data
1 somedata1
2 somedata2
Table B
Key Data
1 data1
Left outer join:
SELECT *
FROM A
LEFT OUTER JOIN B
ON A.Key = B.Key
Result:
Key Data Key Data
1 somedata1 1
2 somedata2 null null
EXISTS use:
SELECT *
FROM A WHERE EXISTS ( SELECT B.Key FROM B WHERE A.Key = B.Key )
Not Exists In:
SELECT *
FROM A WHERE NOT EXISTS ( SELECT B.Key FROM B WHERE A.Key = B.Key )
Result:
Key Data
2 somedata2
Left outer join is more flexible than where not exists. You must use a left outer join if you want to return any of the columns from the child table. You can also use the left outer join to return records that match the parent table as well as all records in the parent table that have no match. Where not exists only lets you return the records with no match.
However in the case where they do return the equivalent rows and you do not need any of the columns in the right table, then where exists is likely to be the more performant choice (at least in SQL server, I don't know about other dbs).
I suspect the answer ultimately is, both are used (among other constructs) to perform the relational operation antijoin in SQL.
I suspect the OP wanted to know which construct is better when they are functionally the same (ie I want to see only rows where there is no match in the secondary table).
As such, WHERE NOT EXISTS will always be as quick or quicker, so is a good habit to get into.

How to make a one to one left outer join?

I was wondering, is there a way to make a kind of one to one left outer join:
I need a join that matches say table A with table B, for each record on table A it must search for its pair on table B, but there exists only 1 record that matches that condition, so when it has found its pair on B, it must stop and continue with the next row at table A.
What I have is a simple LEFT OUTER JOIN.
select * from A left outer join B on A.ID = B.ID order by (NAME) asc
Thanks in advance!
SQL doesn't work this way. In the first place it does not look at things row-by-row. In the second place what defines the record you want to match on?
Assuming you don't really care which row is selcted, something like this might work:
SELECT *
From tableA
left outer join
(select b.* from tableb b1
join (Select min(Id) from tableb group by id) b2 on b1.id - b2.id) b
on a.id = b.id
BUt it still is pretty iffy that you wil get the records you want when there are multiple records with the id in table b.
The syntax you present in your question is correct. There is no difference in the query for joining on a one-to-one relationship than on a one-to-many.