What is a good way to make multiple full outer join? - sql

I'd like to know if anyone would know an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join?
For now my method consists in full joining table A with table B, store the result as a cte, then full joining the cte to table C, store the result as a cte2, full joining cte2 to table D... you got it.
Creating a new cte every time i want to add another table to the join is not very practical, but every other solutions i found so far have the same issue, there's always some kind of infinite looping either on ctes or in selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way that i don't know that would help me perform this multiple full join without falling in an infinite recursive loop of ctes?
Thanks
EDIT: Sorry it seems it wasn't clear enough
When i perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not tableA, my result will create two lines where there should be one, because i joined b to a and c to a but not b to c. That's why i need to full join the result of the full join of a and b to c.
So if i have let's say five table instead of three, i need to full join the result of the full join of the result of the full join of the result of the full join... x)

This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accomodate the fact that maybe the data comes from table B and not A. The easiest is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id

Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
The semantics of USING mean that only non-NULL values are used, if such a value is available.

Related

Best way to eliminate duplicates rows after multiple joins

I'll consider three simple tables. A, B are my entity tables and C is an intermediate table that creates a many-to-many relationship between A & B.
Schemas:
A: (id INTEGER PRIMARY KEY)
B: (id INTEGER PRIMARY KEY)
C: (
A_id INTEGER,
B_id INTEGER,
FOREIGN KEY(A_id) REFERENCES A(id),
FOREIGN KEY(B_id) REFERENCES B(id)
)
Now, consider the below query
SELECT
A.id
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...;
This query would result in duplicate values of A.id, which is expected because C might have multiple rows associated with each row of A. My question is what's the best way to eliminate these duplicates and get the A records. I only need the A records.
I am aware of two ways,
-- Using DISTINCT
SELECT
DISTINCT(A.id), ...
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...
ORDER BY A.id;
And
-- Or using A.id IN (above query)/ A.id = Any(above query)
SELECT
...
FROM A
WHERE A.id IN (
SELECT
A.id
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...
);
I'm using PostgreSQL. I need to include all the tables for filtering, so not joining a table cannot be considered as an improvement. I've analyzed both the queries but I still feel there might be a better way to do this(in terms of performance).
Any help is really appreciated!
I would suggest exists:
SELECT A.id
FROM A
WHERE EXISTS (SELECT 1
FROM C JOIN
B
ON C.B_id = B.id
WHERE A.id = C.A_id AND . . .
)
You can also try following query:
SELECT
a.* -- or whatever columns you need of a
FROM a
WHERE EXISTS(
SELECT 1
FROM c
WHERE c.a_id = a.id
)
Note, that there is no need to join table b as the existence of the row in c always guarantees for the row in b and you do not need any information contained in this row/table.
Perhaps even more clean might be:
SELECT DISTINCT
a.* -- or whatever columns you need of a
FROM a
LEFT JOIN c
You can have a look at the query plans and execution times using EXPLAIN ANALYZE <query>. Perhaps this gives you a hint on what to use best.
But be aware of caching, repeat both queries multiple times this way to see comparable results.

SQL Server Outer joins

When doing an outer join, is it the order of the tables that matter or the order of the ON clause?
For example is
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
the same as
FROM TABLEA A
LEFT JOIN TABLEB B ON B.id = A.id
What about if you have multiple tables? Is it a LEFT JOIN if the first table out of many is the one you want all rows from regardless of the ON clause?
For example,
FROM TABLEA A
LEFT JOIN TABLEB B ON A.id = B.id
LEFT JOIN TABLEC C ON C.ID = A.ID
Does it take all the rows from TABLEA because it is to the left in the table list or the rows from C because it is on the left in the ON clause?
LEFT JOIN means take the table on the left (the first one specified), and join the rows from the table on the right (the second one specified). It will join them up based on the ON condition being true. Since the ON condition just needs to be true, the way it is written doesn't matter at all, it's just an expression that is evaluated.
LEFT JOIN ensures that every row from the table on the left is retained, and joined with NULLs if there is no row to join up to it from the table on the right. So that means that order of the tables is certainly significant.
If the table ordering was reversed, and there were only two tables, a RIGHT JOIN would have the same effect (i.e. keep the rows from the second table specified).
FROM TABLEA A LEFT JOIN TABLEB B ON A.id=B.id LEFT JOIN TABLEC C ON C.ID=A.ID
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON A.ID=C.ID
These both would return the same results.
FROM TABLEA A LEFT JOIN TABLEB B ON B.id=A.id LEFT JOIN TABLEC C ON B.ID=C.ID
This might or might not return the same results depending on what data is actually in table b and table C because table C is related to table b not directly related to table A. Personally I would always treat this as being different by definition than the first set of joins and that if is is the same that is accidental at this point in time.
When writing multiple joins especially when you have Outer joins, I personally find it helpful to start with the parent table (that usually being the one you want on the left side of the join) first and add any other other inner join tables before doing the left joins. If I have child and grandchild tables (vice multiple child tables), then I try to do in descending order of parent, child, grandchild to be clear what is related to what.

Do I have to do a LEFT JOIN after a RIGHT JOIN?

Say I have three tables in SQL server 2008 R2
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
LEFT JOIN Table_C c ON b.id = c.id
or
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
JOIN Table_C c ON b.id = c.id
also, does it matter if I use b.id or a.id on joining c?
i.e. instead of JOIN Table_C c ON b.id = c.id, use JOIN Table_C c ON a.id = c.id
Thank you!
If it doesn't change the semantics of the query, the database server can reorder the joins to run in whichever way it thinks is more efficient.
Usually, if you want to force a certain order, you can use inline view subqueries, as in
SELECT a.*, x.*
FROM
Table_A a
RIGHT JOIN
(
SELECT *, b.id as id2 FROM Table_B b
LEFT JOIN Table_C c ON b.id = c.id
) x
ON a.id = x.id2
According to the definitions:
JOIN
: Return rows when there is at least one match in both tables
LEFT JOIN Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN Return all rows from the right table, even if there are no matches in the left table
The first option would include all raws from the 1st Join on Tables a and b even if there are no matching ones in table c, while the second statement would show only raws which match ones in table c.
regarding the second question i guess it would make a difference, since the 1st join includes all ids from table b, even though there are no matching ones in table a, so once you change your Join creterium to a.id you will get a different set of ids than b.id.
Yes, you do need a LEFT JOIN after a RIGHT JOIN
See
http://sqlfiddle.com/#!3/2c079/5/0
http://sqlfiddle.com/#!3/2c079/6/0
If you don't, the (inner) JOIN at the end will cancel out the effect of your RIGHT JOIN.
That wouldn't make any sense to have a RIGHT JOIN if you don't care. And if you care, you will have to add a LEFT JOIN after it.

Need help with a sql query that has an inner and outer join

I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.
Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.
Table A - Primary key named id1
Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
Table C - Contains two foreign keys, TableA.id1 and Foo.id2, additional interesting
columns.
Constraints:
The SQL gets a set of id1s passed in as an argument.
It must return a list of Table C rows.
It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
A row MUST be returned for every id1 passed in, even if no Table C row exists.
At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.
Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.
So, here's what I came up with.
SELECT
a.id1,
c.*
FROM
TableB b
INNER JOIN
TableC c USING (id1,id2)
RIGHT OUTER JOIN
TableA a USING (id1)
WHERE
a.id1 in (x,y,z)
I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.
So, three questions in one.
Is this correct?
Did I use the Right Outer Join correctly?
Is there a cleaner way to achieve the same thing?
EDIT: DB is MySQL
You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:
SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...
to this:
SELECT ...
FROM a
LEFT OUTER JOIN (
b INNER JOIN c ON ...
) ON ...
You can use an EXISTS clause, which sometimes works better
SELECT
a.id1,
c.*
FROM TableA a
LEFT JOIN TableC c
ON c.id1 = a.id1 AND EXISTS (
select *
from TableB b
where b.id1=c.id1 and b.id2=c.id2)
WHERE
a.id1 in (x,y,z)
As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could perform all records in B and C (B x C) before right joining to A.
The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).
Q's
Yes your query is correct
Yes
EXISTS should work better
Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.
Select a.id1,c.*
From
TableA a
Left Join TableB b on a.id1=b.id1
Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)
You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.

How can I implement SQL INTERSECT and MINUS operations in MS Access

I have researched and haven't found a way to run INTERSECT and MINUS operations in MS Access. Does any way exist
INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.
INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT WILL not. You can get equivalent results by:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. Just need to use a GROUP BY or DISTINCT if you have don't have a pure one-to-one relationship going on. Otherwise, as others had mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.
Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.
No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL