How to get values from tables A and C, joined by table B with default values from C when C has no key from A - sql

Here is my situation:
I have 3 tables:
A: (A_id, Name)
B: (B_id, A_id, Name)
C: (C_id, B_id, State)
What I want is the following result set:
A.A_id, A.Name, C.State
The complication is that I need State to have a default value when there is no B data to link.
In that case, I want
A.A_id, A.Name, 'Default_Value'
I don't know much advanced SQL, so any pointers are greatly appreciated.

select
a.A_id, a.Name, coalesce(c.State, 'default value') as State
from
a
left join b on a.A_id = b.A_id
left join c on b.B_id = c.B_id
The best visual explanation of joins I've ever seen: A Visual Explanation of SQL Joins
COALESCE() returns the first of its parameters that isn't NULL.
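As a minimal sketch of the query above, the following runs it against an in-memory SQLite database through Python's sqlite3 module. The table layout follows the question; the sample rows and the 'active' state are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE B (B_id INTEGER PRIMARY KEY, A_id INTEGER, Name TEXT);
    CREATE TABLE C (C_id INTEGER PRIMARY KEY, B_id INTEGER, State TEXT);
    -- Hypothetical sample data: A 1 reaches C, A 2 has only a B row, A 3 has neither.
    INSERT INTO A VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO B VALUES (10, 1, 'b-for-1'), (20, 2, 'b-for-2');
    INSERT INTO C VALUES (100, 10, 'active');
""")

# Both LEFT JOINs keep every row of A; COALESCE replaces the NULL State.
default_rows = conn.execute("""
    SELECT A.A_id, A.Name, COALESCE(C.State, 'Default_Value') AS State
    FROM A
    LEFT JOIN B ON A.A_id = B.A_id
    LEFT JOIN C ON B.B_id = C.B_id
    ORDER BY A.A_id
""").fetchall()

for row in default_rows:
    print(row)
```

Every row of A appears once here, and only the row that reaches C through B carries a real State.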

You could use ISNULL in the select (this is SQL Server syntax; COALESCE is the portable equivalent)
SELECT A.A_id, A.Name, ISNULL(C.State, 'Default_Value')
from A
left join b...
left join c...

SELECT A.A_id, A.Name, COALESCE(C.State, 'Default_Value')
FROM
A LEFT JOIN
(B INNER JOIN C ON C.B_id = B.B_id)
ON B.A_id = A.A_id
Some information on joins: What is the difference between "INNER JOIN" and "OUTER JOIN"?
What's happening here is that we are joining table B and C with an INNER JOIN where the respective B_id column is equal. The INNER specifies that results will be returned only when records exist in both tables that match the C.B_id = B.B_id condition.
The LEFT JOIN then joins those combined rows to table A where the matching condition holds, while still returning the rows from table A when no match exists. That is, if nothing satisfies the condition B.A_id = A.A_id, NULL values are returned for the columns from the right side of the join (the B and C join). We apply COALESCE so that a NULL in the queried column is replaced by the specified default value.
COALESCE has some added benefits when performing this function: http://msdn.microsoft.com/en-us/library/ms190349.aspx
One last thing, table B in your example is commonly known as a junction table (or join table, or bridge table)... http://en.wikipedia.org/wiki/Junction_table
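The difference between the nested form and a plain chain of LEFT JOINs shows up when a row in A has several B rows and only some of them have a C row. The sketch below assumes hypothetical sample data and, for portability, writes the parenthesized (B INNER JOIN C) as a derived table with the made-up alias bc.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE B (B_id INTEGER PRIMARY KEY, A_id INTEGER, Name TEXT);
    CREATE TABLE C (C_id INTEGER PRIMARY KEY, B_id INTEGER, State TEXT);
    INSERT INTO A VALUES (1, 'first'), (2, 'second');
    -- A 1 has two B rows, only one of which has a C row; A 2 has a B row but no C row.
    INSERT INTO B VALUES (10, 1, 'b10'), (11, 1, 'b11'), (20, 2, 'b20');
    INSERT INTO C VALUES (100, 10, 'active');
""")

# Chained LEFT JOINs: every B row survives, so B 11 adds a defaulted row for A 1.
chained = conn.execute("""
    SELECT A.A_id, COALESCE(C.State, 'Default_Value')
    FROM A
    LEFT JOIN B ON B.A_id = A.A_id
    LEFT JOIN C ON C.B_id = B.B_id
    ORDER BY A.A_id, B.B_id
""").fetchall()

# Nested form: B rows without a C row are dropped before the LEFT JOIN to A.
nested = conn.execute("""
    SELECT A.A_id, COALESCE(bc.State, 'Default_Value')
    FROM A
    LEFT JOIN (
        SELECT B.A_id, C.State
        FROM B INNER JOIN C ON C.B_id = B.B_id
    ) AS bc ON bc.A_id = A.A_id
    ORDER BY A.A_id
""").fetchall()

print(chained)
print(nested)
```

With this data the chained version returns an extra 'Default_Value' row for A 1, while the nested version returns one row per A.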

Related

Best way to eliminate duplicates rows after multiple joins

I'll consider three simple tables. A, B are my entity tables and C is an intermediate table that creates a many-to-many relationship between A & B.
Schemas:
A: (id INTEGER PRIMARY KEY)
B: (id INTEGER PRIMARY KEY)
C: (
A_id INTEGER,
B_id INTEGER,
FOREIGN KEY(A_id) REFERENCES A(id),
FOREIGN KEY(B_id) REFERENCES B(id)
)
Now, consider the below query
SELECT
A.id
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...;
This query would result in duplicate values of A.id, which is expected because C might have multiple rows associated with each row of A. My question is what's the best way to eliminate these duplicates and get the A records. I only need the A records.
I am aware of two ways,
-- Using DISTINCT
SELECT
DISTINCT(A.id), ...
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...
ORDER BY A.id;
And
-- Or using A.id IN (above query)/ A.id = Any(above query)
SELECT
...
FROM A
WHERE A.id IN (
SELECT
A.id
FROM A
LEFT OUTER JOIN C
ON (A.id = C.A_id)
LEFT OUTER JOIN B
ON (C.B_id = B.id)
WHERE ...
);
I'm using PostgreSQL. I need to include all the tables for filtering, so leaving a table out of the join cannot be considered an improvement. I've analyzed both queries, but I still feel there might be a better way to do this (in terms of performance).
Any help is really appreciated!
I would suggest exists:
SELECT A.id
FROM A
WHERE EXISTS (SELECT 1
FROM C JOIN
B
ON C.B_id = B.id
WHERE A.id = C.A_id AND . . .
)
You can also try following query:
SELECT
a.* -- or whatever columns you need of a
FROM a
WHERE EXISTS(
SELECT 1
FROM c
WHERE c.a_id = a.id
)
Note that there is no need to join table b: because of the foreign key, the existence of a row in c always guarantees a matching row in b, and you do not need any information contained in that table.
Perhaps even cleaner might be:
SELECT DISTINCT
a.* -- or whatever columns you need of a
FROM a
LEFT JOIN c
You can look at the query plans and execution times using EXPLAIN ANALYZE <query>. Perhaps this gives you a hint about which one to use.
But be aware of caching: run each query several times this way so that the timings are comparable.
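To make the three variants concrete, here is a small sketch on an in-memory SQLite database. The schema follows the question, except that the kind column on B and the filter B.kind = 'x' are hypothetical stand-ins for the elided WHERE clause; timings on SQLite say nothing about PostgreSQL, so this only illustrates the result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY, kind TEXT);  -- kind is a made-up filter column
    CREATE TABLE C (
        A_id INTEGER,
        B_id INTEGER,
        FOREIGN KEY(A_id) REFERENCES A(id),
        FOREIGN KEY(B_id) REFERENCES B(id)
    );
    INSERT INTO A VALUES (1), (2), (3), (4);
    INSERT INTO B VALUES (1, 'x'), (2, 'x'), (3, 'y');
    INSERT INTO C VALUES (1, 1), (1, 2), (2, 3), (3, 1);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

joins = """
    FROM A
    LEFT OUTER JOIN C ON (A.id = C.A_id)
    LEFT OUTER JOIN B ON (C.B_id = B.id)
    WHERE B.kind = 'x'
"""

# The plain join repeats A.id once per matching C/B pair.
with_duplicates = ids("SELECT A.id" + joins + "ORDER BY A.id")

# DISTINCT collapses the repeats after the join has been computed.
with_distinct = ids("SELECT DISTINCT A.id" + joins + "ORDER BY A.id")

# EXISTS never produces the repeats in the first place.
with_exists = ids("""
    SELECT A.id
    FROM A
    WHERE EXISTS (SELECT 1
                  FROM C JOIN B ON C.B_id = B.id
                  WHERE A.id = C.A_id AND B.kind = 'x')
    ORDER BY A.id
""")

print(with_duplicates, with_distinct, with_exists)
```

Note that the EXISTS form matches the join form only because the filter references B, which discards the unmatched rows of the LEFT JOIN anyway; without such a filter, the LEFT JOIN version would also return A rows that have no C row.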

What is a good way to make multiple full outer join?

I'd like to know if anyone knows an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join.
For now my method consists of full joining table A with table B, storing the result as a cte, then full joining the cte to table C, storing the result as cte2, full joining cte2 to table D... you get it.
Creating a new cte every time I want to add another table to the join is not very practical, but every other solution I have found so far has the same issue: there is always some kind of endless nesting, either of ctes or of selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way I don't know of that would let me perform this multiple full join without falling into an endless chain of nested ctes?
Thanks
EDIT: Sorry, it seems it wasn't clear enough.
When I perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not in tableA, my result will contain two rows where there should be one, because I joined b to a and c to a, but not b to c. That's why I need to full join the result of the full join of a and b to c.
So if I have, let's say, five tables instead of three, I need to full join the result of the full join of the result of the full join of the result of the full join... x)
This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accommodate the fact that the id may come from table B and not from A. The easiest way is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id
Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
The semantics of USING mean that the merged id column takes a non-NULL value whenever one is available from either side of the join.
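For comparison, here is a sketch of a different technique from the two answers above: collect every id with UNION, then LEFT JOIN each table to that key list. It is shown this way because SQLite builds before 3.39 lack FULL JOIN, so it runs on any Python installation. The column names a_val, b_val, c_val and the sample rows are assumptions, and ids are assumed to be non-NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, b_val TEXT);
    CREATE TABLE tableC (id INTEGER PRIMARY KEY, c_val TEXT);
    -- id 2 exists in tableB and tableC but not in tableA (the problem case).
    INSERT INTO tableA VALUES (1, 'a1');
    INSERT INTO tableB VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO tableC VALUES (2, 'c2'), (3, 'c3');
""")

# Adding a table means one more UNION branch and one more LEFT JOIN line.
merged = conn.execute("""
    SELECT k.id, a.a_val, b.b_val, c.c_val
    FROM (
        SELECT id FROM tableA
        UNION
        SELECT id FROM tableB
        UNION
        SELECT id FROM tableC
    ) AS k
    LEFT JOIN tableA a ON a.id = k.id
    LEFT JOIN tableB b ON b.id = k.id
    LEFT JOIN tableC c ON c.id = k.id
    ORDER BY k.id
""").fetchall()

for row in merged:
    print(row)
```

The row for id 2 comes out as a single line carrying both the tableB and the tableC values, which is the result the COALESCE and USING variants aim for.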

Are left outer joins associative?

It's easy to understand why left outer joins are not commutative, but I'm having some trouble understanding whether they are associative. Several online sources suggest that they are not, but I haven't managed to convince myself that this is the case.
Suppose we have three tables: A, B, and C.
Let A contain two columns, ID and B_ID, where ID is the primary key of table A and B_ID is a foreign key corresponding to the primary key of table B.
Let B contain two columns, ID and C_ID, where ID is the primary key of table B and C_ID is a foreign key corresponding to the primary key of table C.
Let C contain two columns, ID and VALUE, where ID is the primary key of table C and VALUE just contains some arbitrary values.
Then shouldn't (A left outer join B) left outer join C be equal to A left outer join (B left outer join C)?
In this thread, it is said, that they are not associative: Is LEFT OUTER JOIN associative?
However, I've found some book online where it is stated, that OUTER JOINs are associative, when the tables on the far left side and far right side have no attributes in common (here).
Here is a graphical presentation (MSPaint ftw):
Another way to look at it:
Since you said that table A joins with B, and B joins with C, then:
When you first join A and B, you are left with all records from A. Some of them have values from B. Now, for some of those rows for which you got value from B, you get values from C.
When you first join B and C, you end up with the whole table B, where some of the records have values from C. Now, you take all records from A and join some of them with all rows from B joined with C. Here, again, you get all rows from A, but some of them have values from B, some of which have values from C.
I don't see any possibility where, under the conditions you describe, there would be a data loss depending on the sequence of LEFT joins.
Based on the data provided by Tilak in his answer (which is now deleted), I've built a simple test case:
CREATE TABLE atab (id NUMBER, val VARCHAR2(10));
CREATE TABLE btab (id NUMBER, val VARCHAR2(10));
CREATE TABLE ctab (id NUMBER, val VARCHAR2(10));
INSERT INTO atab VALUES (1, 'A1');
INSERT INTO atab VALUES (2, 'A2');
INSERT INTO atab VALUES (3, 'A3');
INSERT INTO btab VALUES (1, 'B1');
INSERT INTO btab VALUES (2, 'B2');
INSERT INTO btab VALUES (4, 'B4');
INSERT INTO ctab VALUES (1, 'C1');
INSERT INTO ctab VALUES (3, 'C3');
INSERT INTO ctab VALUES (5, 'C5');
SELECT ab.aid, ab.aval, ab.bval, c.val AS cval
FROM (
SELECT a.id AS aid, a.val AS aval, b.id AS bid, b.val AS bval
FROM atab a LEFT OUTER JOIN btab b ON (a.id = b.id)
) ab
LEFT OUTER JOIN ctab c ON (ab.bid = c.id)
ORDER BY ab.aid
;
       AID AVAL       BVAL       CVAL
---------- ---------- ---------- ----------
         1 A1         B1         C1
         2 A2         B2
         3 A3
SELECT a.id, a.val AS aval, bc.bval, bc.cval
FROM
atab a
LEFT OUTER JOIN (
SELECT b.id AS bid, b.val AS bval, c.id AS cid, c.val AS cval
FROM btab b LEFT OUTER JOIN ctab c ON (b.id = c.id)
) bc
ON (a.id = bc.bid)
ORDER BY a.id
;
        ID AVAL       BVAL       CVAL
---------- ---------- ---------- ----------
         1 A1         B1         C1
         2 A2         B2
         3 A3
It seems that, in this particular example, both solutions give the same result. I can't think of any other dataset that would make those queries return different results.
Check at SQLFiddle: MySQL, Oracle, PostgreSQL, SQL Server
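The same test case can be replayed locally. This sketch ports it to an in-memory SQLite database (NUMBER and VARCHAR2 become INTEGER and TEXT, which is an adaptation, not part of the original answer) and compares the two nestings directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atab (id INTEGER, val TEXT);
    CREATE TABLE btab (id INTEGER, val TEXT);
    CREATE TABLE ctab (id INTEGER, val TEXT);
    INSERT INTO atab VALUES (1, 'A1'), (2, 'A2'), (3, 'A3');
    INSERT INTO btab VALUES (1, 'B1'), (2, 'B2'), (4, 'B4');
    INSERT INTO ctab VALUES (1, 'C1'), (3, 'C3'), (5, 'C5');
""")

# (A left join B) left join C
ab_then_c = conn.execute("""
    SELECT ab.aid, ab.aval, ab.bval, c.val AS cval
    FROM (
        SELECT a.id AS aid, a.val AS aval, b.id AS bid, b.val AS bval
        FROM atab a LEFT OUTER JOIN btab b ON (a.id = b.id)
    ) ab
    LEFT OUTER JOIN ctab c ON (ab.bid = c.id)
    ORDER BY ab.aid
""").fetchall()

# A left join (B left join C)
a_then_bc = conn.execute("""
    SELECT a.id, a.val AS aval, bc.bval, bc.cval
    FROM atab a
    LEFT OUTER JOIN (
        SELECT b.id AS bid, b.val AS bval, c.id AS cid, c.val AS cval
        FROM btab b LEFT OUTER JOIN ctab c ON (b.id = c.id)
    ) bc ON (a.id = bc.bid)
    ORDER BY a.id
""").fetchall()

print(ab_then_c)
print(a_then_bc)
```

Both orderings return the same three rows for this data, matching the output shown above.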
If you're assuming that you're JOINing on a foreign key, as your question seems to imply, then yes, I think OUTER JOIN is guaranteed to be associative, as covered by Przemyslaw Kruglej's answer.
However, given that you haven't actually specified the JOIN condition, the pedantically correct answer is that no, they're not guaranteed to be associative. There are two easy ways to violate associativity with perverse ON clauses.
1. One of the JOIN conditions involves columns from all 3 tables
This is a pretty cheap way to violate associativity, but strictly speaking nothing in your question forbade it. Using the column names suggested in your question, consider the following two queries:
-- This is legal
SELECT * FROM (A JOIN B ON A.b_id = B.id)
JOIN C ON (A.id = B.id) AND (B.id = C.id)
-- This is not legal
SELECT * FROM A
JOIN (B JOIN C ON (A.id = B.id) AND (B.id = C.id))
ON A.b_id = B.id
The bottom query isn't even a valid query, but the top one is. Clearly this violates associativity.
2. One of the JOIN conditions can be satisfied despite all fields from one table being NULL
This way, we can even have different numbers of rows in our result set depending upon the order of the JOINs. For example, let the condition for JOINing A on B be A.b_id = B.id, but the condition for JOINing B on C be B.id IS NULL.
Thus we get these two queries, with very different output:
SELECT * FROM (A LEFT OUTER JOIN B ON A.b_id = B.id)
LEFT OUTER JOIN C ON B.id IS NULL;
SELECT * FROM A
LEFT OUTER JOIN (B LEFT OUTER JOIN C ON B.id IS NULL)
ON A.b_id = B.id;
You can see this in action here: http://sqlfiddle.com/#!9/d59139/1
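A local version of this second counterexample might look as follows. The column names follow the question; the sample rows are made up, and the parenthesized (B LEFT OUTER JOIN C) is written as a derived table with the hypothetical alias bc so that it runs on any SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, c_id INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, value TEXT);
    -- A 2 points at a B row that does not exist.
    INSERT INTO A VALUES (1, 10), (2, 99);
    INSERT INTO B VALUES (10, NULL);
    INSERT INTO C VALUES (100, 'x');
""")

# (A left join B) left join C: B.id is NULL for A 2, so every C row matches it.
left_first = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    LEFT OUTER JOIN B ON A.b_id = B.id
    LEFT OUTER JOIN C ON B.id IS NULL
    ORDER BY A.id
""").fetchall()

# A left join (B left join C): inside the nested join B.id is never NULL,
# so no C row is ever attached.
right_first = conn.execute("""
    SELECT A.id, bc.bid, bc.cid
    FROM A
    LEFT OUTER JOIN (
        SELECT B.id AS bid, C.id AS cid
        FROM B LEFT OUTER JOIN C ON B.id IS NULL
    ) AS bc ON A.b_id = bc.bid
    ORDER BY A.id
""").fetchall()

print(left_first)
print(right_first)
```

The two results differ in the row for A 2, which picks up a C row only when the join to B is evaluated first.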
In addition to the previous answers: The topic is nicely discussed in Michael M. David, Advanced ANSI SQL Data Modeling and Structure Processing, Artech House, 1999, pages 19--21. Pages available online.
I find it particularly noteworthy that he treats the table clauses (LEFT JOIN ...) and the join conditions (ON ...) separately, so associativity can refer to either: re-arranging the table clauses or re-arranging the join conditions. So the notion of associativity is not the same as for, e.g., addition of numbers; it has two dimensions.

PL/SQL Using multiple left join

SELECT * FROM Table A LEFT JOIN TABLE B LEFT JOIN TABLE C
From the snippet above, TABLE C will left join into (TABLE B) or (data from TABLE A LEFT JOIN TABLE B) or (TABLE A)?
TABLE C will left join into 1. (TABLE B) or 2. (data from TABLE A LEFT JOIN
TABLE B) or 3. (TABLE A)?
The second. But the join condition will help you understand more.
You can write:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.ID = C.ID)
But you can also write:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
So you can join on any column from the previously joined tables, and you join against "the result" of the previous joins (though the engine may choose its own way to compute that result).
Think of LEFT JOIN as a non-commutative operation (A left join B is not the same as B left join A). The order is therefore important, and C is left joined to the result of the previously joined tables.
The Oracle documentation is quite specific about how the joins are processed:
To execute a join of three or more tables, Oracle first joins two of
the tables based on the join conditions comparing their columns and
then joins the result to another table based on join conditions
containing columns of the joined tables and the new table. Oracle
continues this process until all tables are joined into the result.
This is the logical approach to handling the joins and is consistent with the ANSI standard (in other words, all database engines logically process the joins in the order written).
However, when the query is actually executed, the optimizer may choose to run the joins in a different order. The result needs to be logically the same as processing the joins in the order given in the query.
Also, the join conditions may produce some unexpected results. So if you have:
from A left outer join
B
on A.id = B.id left outer join
C
on B.id = C.id
Then, you might have the condition where A and C each have a row with a particular id, but B does not. With this formulation, you will not see the row in C because it is joining to NULL. So, be careful with join conditions on left outer join, particularly when joining to a table other than the first table in the chain.
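The pitfall described above can be reproduced with a few rows. This is a minimal sketch on an in-memory SQLite database; the single-column tables and their contents are made up so that A and C both contain id 2 while B does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    CREATE TABLE C (id INTEGER PRIMARY KEY);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1);        -- no row with id 2
    INSERT INTO C VALUES (1), (2);
""")

# C joined through B: for A 2 the B side is NULL, so the C row is never found.
via_b = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    LEFT OUTER JOIN B ON A.id = B.id
    LEFT OUTER JOIN C ON B.id = C.id
    ORDER BY A.id
""").fetchall()

# C joined through A: the C row for id 2 is found even though B has none.
via_a = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    LEFT OUTER JOIN B ON A.id = B.id
    LEFT OUTER JOIN C ON A.id = C.id
    ORDER BY A.id
""").fetchall()

print(via_b)
print(via_a)
```

Only the second query shows C's row for id 2, because its condition refers to the first table in the chain rather than to the possibly NULL columns of B.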
You need to name the columns properly in order to run the query. Let's say you are using:
SELECT *
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Then you may get the following error:
ORA-00933: SQL command not properly ended
To avoid it you can try:
SELECT A.id as "Id_from_A", B.code as "Code_from_B"
FROM Table A
LEFT JOIN TABLE B ON (A.id = B.id)
LEFT JOIN TABLE C ON (A.id = C.id and B.code = C.code)
Thanks

Do I have to do a LEFT JOIN after a RIGHT JOIN?

Say I have three tables in SQL Server 2008 R2
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
LEFT JOIN Table_C c ON b.id = c.id
or
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
JOIN Table_C c ON b.id = c.id
Also, does it matter if I use b.id or a.id when joining c?
I.e., instead of JOIN Table_C c ON b.id = c.id, use JOIN Table_C c ON a.id = c.id
Thank you!
If it doesn't change the semantics of the query, the database server can reorder the joins to run in whichever way it thinks is more efficient.
Usually, if you want to force a certain order, you can use inline view subqueries, as in
SELECT a.*, x.*
FROM
Table_A a
RIGHT JOIN
(
SELECT *, b.id as id2 FROM Table_B b
LEFT JOIN Table_C c ON b.id = c.id
) x
ON a.id = x.id2
According to the definitions:
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
The first option would include all rows from the first join on tables a and b even if there are no matching ones in table c, while the second statement would show only rows that have a match in table c.
Regarding the second question, I guess it would make a difference: the first join includes all ids from table b, even where there are no matching ones in table a, so once you change your join criterion to a.id you will get a different set of ids than with b.id.
Yes, you do need a LEFT JOIN after a RIGHT JOIN.
See
http://sqlfiddle.com/#!3/2c079/5/0
http://sqlfiddle.com/#!3/2c079/6/0
If you don't, the (inner) JOIN at the end will cancel out the effect of your RIGHT JOIN.
It wouldn't make sense to use a RIGHT JOIN if you don't care about the unmatched rows. And if you do care, you will have to add a LEFT JOIN after it.
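The effect of the final join can be checked with a small sketch. Because SQLite builds before 3.39 lack RIGHT JOIN, Table_A a RIGHT JOIN Table_B b is rewritten here as the equivalent Table_B b LEFT JOIN Table_A a; that rewrite and the sample ids are assumptions made for illustration, not SQL Server behavior being measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER PRIMARY KEY);
    CREATE TABLE Table_B (id INTEGER PRIMARY KEY);
    CREATE TABLE Table_C (id INTEGER PRIMARY KEY);
    INSERT INTO Table_A VALUES (1);
    INSERT INTO Table_B VALUES (1), (2), (3);
    INSERT INTO Table_C VALUES (1), (2);
""")

def run(last_join):
    """Run the b-preserving join followed by the given join to Table_C."""
    return conn.execute(f"""
        SELECT a.id, b.id, c.id
        FROM Table_B b
        LEFT JOIN Table_A a ON a.id = b.id
        {last_join}
        ORDER BY b.id
    """).fetchall()

# LEFT JOIN to c: every row of Table_B is kept.
left_on_b = run("LEFT JOIN Table_C c ON b.id = c.id")

# Inner JOIN on b.id: rows of Table_B without a match in Table_C disappear.
inner_on_b = run("JOIN Table_C c ON b.id = c.id")

# Inner JOIN on a.id: a.id is NULL for unmatched b rows, so only full matches remain.
inner_on_a = run("JOIN Table_C c ON a.id = c.id")

print(left_on_b)
print(inner_on_b)
print(inner_on_a)
```

With this data the inner join on b.id still keeps the b row that has no a row, whereas the inner join on a.id removes every row the outer join had preserved, so which column is used in the last condition does matter.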