Explanation of code for right excluding join? - sql

I just found a great page with Venn diagrams of different joins and the code for executing them:
http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
I used the "Right Excluding Join" in my query. Its Venn diagram shades only the part of the right-hand circle that lies outside the left-hand one. Here is the code:
SELECT subjects.subject
FROM sold_subjects
RIGHT JOIN subjects
ON sold_subjects.subject = subjects.subject
WHERE sold_subjects.subject IS NULL
I am asking for an explanation of what this code actually does, particularly what happens in the last row. I understand that we are joining the two relations where they have the same subject, but what happens when we set subjects for one of the relations to NULL in the last row?

First, what do JOIN and RIGHT JOIN do?
The JOIN gets information from two tables and joins them according to rules you specify in the ON or WHERE clauses.
The JOIN modifiers, such as LEFT, RIGHT, INNER and FULL, control how your JOIN behaves for unmatched records, that is, when no record in A matches a record in B according to the specified rules, and vice versa.
To understand this part, take table A as being the left table and table B as being the right one. When you have multiple joins, the right table in each join is the one whose name is immediately right of the JOIN command.
e.g. FROM a1 LEFT JOIN ... LEFT JOIN b
The b table is the right one and whatever comes before is the left one.
This is a summary of the modifiers' behavior:
LEFT: preserves unmatched records in the left table, discards those in the right table;
RIGHT: preserves unmatched records in the right table, discards those in the left table;
INNER: preserves only the records that are matched, discards unmatched from both tables;
FULL (also written FULL OUTER): preserves all records from both tables, regardless of matches.
What is visually happening?
Imagine you have two simple tables with the same names of the ones you put in there.
sold_subjects    subjects
subject          subject
1                1
2                4
3                5
4                6
When you RIGHT JOIN two tables, you create a third one that looks like this:
joined_table
sold_subjects.subject    subjects.subject
1                        1
4                        4
NULL                     5
NULL                     6
Please note that subjects 2 and 3 are already gone from this result: they exist only in sold_subjects, the left table, so the RIGHT JOIN discards them.
When you add a WHERE clause with sold_subjects.subject IS NULL, you keep only the last two lines, the subjects that had no match in sold_subjects.
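The walkthrough above can be reproduced with a small sketch using Python's built-in sqlite3 module and the same sample data. One assumption to note: the query is written as subjects LEFT JOIN sold_subjects rather than sold_subjects RIGHT JOIN subjects, because older SQLite versions do not support RIGHT JOIN; the two forms keep the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sold_subjects (subject INTEGER);
    CREATE TABLE subjects (subject INTEGER);
    INSERT INTO sold_subjects VALUES (1), (2), (3), (4);
    INSERT INTO subjects VALUES (1), (4), (5), (6);
""")

# Step 1: the outer join alone. Every row of subjects is kept;
# sold_subjects.subject is NULL (None) where no match was found.
joined = conn.execute("""
    SELECT sold_subjects.subject, subjects.subject
    FROM subjects
    LEFT JOIN sold_subjects ON sold_subjects.subject = subjects.subject
    ORDER BY subjects.subject
""").fetchall()

# Step 2: the WHERE clause keeps only the unmatched rows.
unsold = conn.execute("""
    SELECT subjects.subject
    FROM subjects
    LEFT JOIN sold_subjects ON sold_subjects.subject = subjects.subject
    WHERE sold_subjects.subject IS NULL
    ORDER BY subjects.subject
""").fetchall()

print(joined)  # [(1, 1), (4, 4), (None, 5), (None, 6)]
print(unsold)  # [(5,), (6,)]
```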

The right join makes sure that you will keep all the records of the right table. If there is no match with the left table, then all the variables in the result originating from the left table will be null (because there is no match).
The WHERE clause checks whether the value of lefttable.subject is null. If it's not null, the join found a match. If it is null, the join found no match for that row, leaving this value blank. So this WHERE clause will, by definition, return all the records of the right table that have no match in the left table, which is exactly what the Venn diagram says!
This is a very common practice in SQL, and there are many use cases. For example: the left table is sales, the right table is customers, and you want to know all the customers without sales.

RIGHT JOIN is shorthand for RIGHT OUTER JOIN.
Consider the excellent explanation in the fine manual:
LEFT OUTER JOIN returns all rows in the qualified Cartesian product
(i.e., all combined rows that pass its join condition), plus one copy
of each row in the left-hand table for which there was no right-hand
row that passed the join condition. This left-hand row is extended to
the full width of the joined table by inserting null values for the
right-hand columns. Note that only the JOIN clause's own condition is
considered while deciding which rows have matches. Outer conditions
are applied afterwards.
Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row
for each unmatched right-hand row (extended with nulls on the left).
This is just a notational convenience, since you could convert it to a
LEFT OUTER JOIN by switching the left and right tables.
Bold emphasis mine. Your query is just one way to select rows that are not present in another table, with a shiny buzzword attached ("Right Excluding JOIN"). There are others:
Select rows which are not present in other table
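As an illustration of such alternatives, here is a hedged sketch of two other common formulations, NOT EXISTS and EXCEPT, run through Python's sqlite3 module. The sample data is hypothetical, and the sketch assumes the subject columns contain no NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sold_subjects (subject INTEGER);
    CREATE TABLE subjects (subject INTEGER);
    INSERT INTO sold_subjects VALUES (1), (2), (3), (4);
    INSERT INTO subjects VALUES (1), (4), (5), (6);
""")

# Anti-join written with a correlated NOT EXISTS subquery.
not_exists_rows = conn.execute("""
    SELECT s.subject
    FROM subjects s
    WHERE NOT EXISTS (
        SELECT 1 FROM sold_subjects ss WHERE ss.subject = s.subject
    )
    ORDER BY s.subject
""").fetchall()

# Set difference; note that EXCEPT also removes duplicate rows.
except_rows = conn.execute("""
    SELECT subject FROM subjects
    EXCEPT
    SELECT subject FROM sold_subjects
    ORDER BY subject
""").fetchall()

print(not_exists_rows)  # [(5,), (6,)]
print(except_rows)      # [(5,), (6,)]
```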
Now, for the tricky part - or where you deviate from the original:
But what happens when we set subjects for one of the relations to NULL in the last row?
Your query has:
WHERE sold_subjects.subject IS NULL
Where the original says:
WHERE A.Key IS NULL
Key is supposed to imply NOT NULL. The query simply does not work if either of the underlying table columns sold_subjects.subject or subjects.subject can be NULL. There would be no way to disambiguate how the row qualified:
subjects.subject IS NULL and no row with NULL in sold_subjects.subject
subjects.subject IS NULL and some row with NULL in sold_subjects.subject
subjects.subject IS NOT NULL but no matching row in sold_subjects
If one of the linking columns can be NULL, and you want to treat NULL values like they were actual values (which they are not), i.e. match NULL to NULL, you could substitute a NOT EXISTS anti-join using the NULL-safe operator IS NOT DISTINCT FROM:
SELECT s.subject
FROM subjects s
WHERE NOT EXISTS (
   SELECT 1 FROM sold_subjects ss
   WHERE ss.subject IS NOT DISTINCT FROM s.subject
);
A LEFT JOIN with WHERE ss.subject IS NULL cannot be used with this operator: a row matched NULL to NULL also has ss.subject IS NULL and would wrongly pass the filter. IS NOT DISTINCT FROM is often slower than a simple =, so only use it where you need it. Typically, you join tables on key columns that are defined NOT NULL - implicitly (a PK column is NOT NULL automatically) or explicitly.
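The effect of NULL in the linking columns can be checked with a small sketch. It rests on two assumptions: the data is hypothetical, and SQLite's IS operator stands in for IS NOT DISTINCT FROM (both treat NULL as equal to NULL). The sketch compares a plain = anti-join, a NULL-safe join followed by an IS NULL filter, and a NULL-safe NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subjects (subject INTEGER);
    CREATE TABLE sold_subjects (subject INTEGER);
    INSERT INTO subjects VALUES (1), (2), (NULL);
    INSERT INTO sold_subjects VALUES (1), (NULL);
""")

def subjects_returned(sql):
    """Run the query and collect the returned subject values into a set."""
    return {row[0] for row in conn.execute(sql)}

# Plain "=": NULL never equals NULL, so the NULL subject counts as unmatched.
plain_equals = subjects_returned("""
    SELECT s.subject FROM subjects s
    LEFT JOIN sold_subjects ss ON ss.subject = s.subject
    WHERE ss.subject IS NULL
""")

# NULL-safe join condition, but the IS NULL filter cannot tell a row
# matched on NULL from an unmatched row, so the NULL subject still appears.
null_safe_join = subjects_returned("""
    SELECT s.subject FROM subjects s
    LEFT JOIN sold_subjects ss ON ss.subject IS s.subject
    WHERE ss.subject IS NULL
""")

# NULL-safe NOT EXISTS: only the subject with no counterpart remains.
null_safe_not_exists = subjects_returned("""
    SELECT s.subject FROM subjects s
    WHERE NOT EXISTS (
        SELECT 1 FROM sold_subjects ss WHERE ss.subject IS s.subject
    )
""")

print(2 in plain_equals, None in plain_equals)      # True True
print(2 in null_safe_join, None in null_safe_join)  # True True
print(null_safe_not_exists == {2})                  # True
```

When the filtered column itself can hold NULL, testing a column that is guaranteed NOT NULL in the joined table (such as a primary key) or using NOT EXISTS avoids the ambiguity.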

Related

Is there any way or formula to calculate the number of columns generated from LEFT, RIGHT, INNER, OUTER, CROSS joins?

I attended a few interviews, and this question was common. Given two tables, A and B:
Primary and foreign keys: A_ID, B_ID
A_ID    B_ID
1       2
2       1
2       1
3       3
5       3
4       3
3       4
5       5
The questions asked are like: how many columns will be generated after applying each of the following joins?
LEFT, RIGHT, OUTER, INNER, CROSS
It's evident that we can work out the number of columns generated by doing the join by hand, but is there a formula for answering faster?
JOINs of any sort do not generate columns. They generate rows.
The number of columns is whatever is selected.
Either you have misinterpreted the question or it is a trick question.
That's pretty simple. With SELECT * (that is, unless you define a projection that states otherwise), the number of columns of the result is the number of columns from one table plus the number of columns from the other table, for all joins.
I think you've misremembered your interview. The number of columns arising from the joins shown is "two" in all cases: you've shown one column from each of two tables, so joining them generates a result set in which two columns are available to select. The number of columns produced by the query depends on which of these are selected and how many times. The same column can be selected multiple times, and other columns that are not part of the result set from the join can be added too.
I think your question was intended to assess your understanding of JOINs as a process, and I think they wanted to know the number of ROWS produced by joining those tables.
INNER JOIN
For each value, multiply the number of times it appears on one side by the number of times it appears on the other side, then sum the results:
1: once on the left, twice on the right: 1x2=2
2: 2x1=2
3: 2x3=6
4: 1x1=1
5: 2x1=2
2+2+6+1+2 = 13
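The counting rule above can be sketched in a few lines of Python and cross-checked against a real INNER JOIN in an in-memory SQLite database. The function name inner_join_rows and the table names a and b are illustrative choices, and the sketch assumes the columns contain no NULL values.

```python
import sqlite3
from collections import Counter

a_ids = [1, 2, 2, 3, 5, 4, 3, 5]
b_ids = [2, 1, 1, 3, 3, 3, 4, 5]

def inner_join_rows(left, right):
    """Sum over each value of (occurrences on the left) x (occurrences on the right)."""
    left_counts, right_counts = Counter(left), Counter(right)
    # Counter returns 0 for a value that never appears on the right.
    return sum(n * right_counts[value] for value, n in left_counts.items())

# Cross-check against the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a_id INTEGER)")
conn.execute("CREATE TABLE b (b_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(v,) for v in a_ids])
conn.executemany("INSERT INTO b VALUES (?)", [(v,) for v in b_ids])
sql_rows = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON a.a_id = b.b_id"
).fetchone()[0]

formula_rows = inner_join_rows(a_ids, b_ids)
print(formula_rows, sql_rows)  # 13 13
```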
OUTER JOIN
As per inner join but any values that appear 0 times on the opposite side shall be considered as appearing 1 time on that opposite side.
Whether it is declared as LEFT or RIGHT join, and which way round the tables are written, determines which is the "opposite" side of the join
A LEFT JOIN B - A is on the left, it is a left join, A is the current side, B is the opposite side
A RIGHT JOIN B - A is on the left, it is a right join, B is the current side, A is the opposite side
B LEFT JOIN A - B is on the left, it is a left join, B is the current side, A is the opposite side
B RIGHT JOIN A - B is on the left, it is a right join, A is the current side, B is the opposite side
For example if A_ID had 6, 6, 6, 6 and B_ID never had any 6, and it was A LEFT JOIN B meaning that A is the current side and there are no 6 on the opposite, they shall be considered as appearing 1 time on the opposite.
Thus the formula uses 4x1 instead of 4x0 because there are four occurrences of 6 on the current and 0-but-consider-as-1 on the opposite
If it was A RIGHT JOIN B then B is the "current" side and A is the "opposite" side. The value 6 appears 0 times on the current side, so the formula uses 0x4. We only ever go up from 0 to 1 for the opposite side, not the "current" side. If there are 0 occurrences on the current side we keep 0x4 and the rows disappear from the output.
You don't have any rows that meet this criterion (some on one side but not the other), so the left and right joins give the same result as an inner join here.
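The "0 counts as 1 on the opposite side" rule can be sketched the same way. The function name outer_join_rows is hypothetical, the extra 6, 6, 6, 6 values are the ones from the example above, and the sketch again assumes no NULL values in the columns.

```python
from collections import Counter

def outer_join_rows(current, opposite):
    """Row count of `current LEFT JOIN opposite` on equality of the values.

    A value that appears 0 times on the opposite side is counted as
    appearing once there; a value missing on the current side contributes 0.
    """
    current_counts, opposite_counts = Counter(current), Counter(opposite)
    return sum(n * max(opposite_counts[value], 1)
               for value, n in current_counts.items())

a_ids = [1, 2, 2, 3, 5, 4, 3, 5]
b_ids = [2, 1, 1, 3, 3, 3, 4, 5]
a_with_sixes = a_ids + [6, 6, 6, 6]

# With the original data every value appears on both sides: same as inner join.
left_original = outer_join_rows(a_ids, b_ids)

# A LEFT JOIN B with the extra sixes: 13 + 4x1.
left_with_sixes = outer_join_rows(a_with_sixes, b_ids)

# A RIGHT JOIN B makes B the current side: the sixes contribute 0x4.
right_with_sixes = outer_join_rows(b_ids, a_with_sixes)

print(left_original, left_with_sixes, right_with_sixes)  # 13 17 13
```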
Avoid using RIGHT JOIN: rewrite such joins (and turn the tables round) so they become LEFT joins. That makes it a lot easier to reason about which rows are output, because the joins in a FROM clause are read left to right. Try to do all your inner joins first, then all your left joins. Any table that links to a left joined table also needs to be left joined if you don't want to lose rows.
CROSS
The number of rows on the left multiplied by the number of rows on the right
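For the sample data this gives 8 x 8 rows, which a short sketch with Python's sqlite3 module can confirm (the table names a and b are illustrative):

```python
import sqlite3

a_ids = [1, 2, 2, 3, 5, 4, 3, 5]
b_ids = [2, 1, 1, 3, 3, 3, 4, 5]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a_id INTEGER)")
conn.execute("CREATE TABLE b (b_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(v,) for v in a_ids])
conn.executemany("INSERT INTO b VALUES (?)", [(v,) for v in b_ids])

# CROSS JOIN pairs every row of a with every row of b; no condition applies.
cross_rows = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]
print(cross_rows, len(a_ids) * len(b_ids))  # 64 64
```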
I think they asked you about the number of records affected or displayed by each of these operations. If not, the number of columns, i.e. (A_ID, B_ID), depends on which ones your interviewer wishes to display.

Why does FULL JOIN order make a difference in these queries?

I'm using PostgreSQL. Everything I read here suggests that in a query using nothing but full joins on a single column, the order of tables joined basically doesn't matter.
My intuition says this should also go for multiple columns, so long as every common column is listed in the query where possible (that is, wherever both joined tables have the column in common). But this is not the case, and I'm trying to figure out why.
Simplified to three tables a, b, and c.
Columns in table a: id, name_a
Columns in table b: id, id_x
Columns in table c: id, id_x
This query:
SELECT *
FROM a
FULL JOIN b USING(id)
FULL JOIN c USING(id, id_x);
returns a different number of rows than this one:
SELECT *
FROM a
FULL JOIN c USING(id)
FULL JOIN b USING(id, id_x);
What I want/expect is hard to articulate, but basically, I'd like a "complete" full merger. I want no null fields anywhere unless that is unavoidable.
For example, whenever there is a not-null id, I want the corresponding name column to always have the name_a and not be null. Instead, one of those example queries returns semi-redundant results, with one row having a name_a but no id, and another having an id but no name_a, rather than a single merged row.
When the joins are listed in the other order, I do get that desired result (but I'm not sure what other problems might occur, because future data is unknown).
Your queries are different.
In the first, you are doing a full join to b using a single column, id.
In the second, you are doing a full join to b using two columns.
Although the two queries could return the same results under some circumstances, there is no reason to expect the results to be the same in general.
Argument order matters in OUTER JOINs, except that FULL NATURAL JOIN is symmetric. They return what an INNER JOIN (ON, USING or NATURAL) does but also the unmatched rows from the left (LEFT JOIN), right (RIGHT JOIN) or both (FULL JOIN) tables extended by NULLs.
USING returns the single shared value for each specified column in INNER JOIN rows; in NULL-extended rows another common column can have NULL in one table's version and a value in the other's.
Join order matters too. Even FULL NATURAL JOIN is not associative, since with multiple tables each pair of tables (either operand being an original or join result) can have a unique set of common columns, i.e. in general (A ⟗ B) ⟗ C ≠ A ⟗ (B ⟗ C).
There are a lot of special cases where certain additional identities hold. E.g. FULL JOIN USING all common column names and OUTER JOIN ON equality of same-named columns are symmetric. Some cases involve CKs (candidate keys), FKs (foreign keys) and other constraints on arguments.
Your question doesn't make clear exactly what input conditions you are assuming or what output conditions you are seeking.
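To make the order dependence concrete, here is a hedged sketch that emulates FULL JOIN ... USING in plain Python rather than running PostgreSQL. The helper full_join_using and the sample rows are hypothetical. It assumes non-empty inputs and that every column the two inputs share is listed in the keys, as in the two queries from the question; None stands for NULL.

```python
def full_join_using(left, right, keys):
    """Emulate `left FULL JOIN right USING (keys)` on lists of dicts."""
    columns = list(dict.fromkeys([*left[0], *right[0]]))
    result, matched_right = [], set()
    for l_row in left:
        matched = False
        for i, r_row in enumerate(right):
            # SQL equality: a NULL on either side never matches.
            if all(l_row[k] is not None and l_row[k] == r_row[k] for k in keys):
                result.append({**r_row, **l_row})
                matched_right.add(i)
                matched = True
        if not matched:
            # Unmatched left row, extended with NULLs for the right columns.
            result.append({c: l_row.get(c) for c in columns})
    for i, r_row in enumerate(right):
        if i not in matched_right:
            # Unmatched right row, extended with NULLs for the left columns.
            result.append({c: r_row.get(c) for c in columns})
    return result

a = [{"id": 1, "name_a": "n"}]
b = [{"id": 2, "id_x": 20}]
c = [{"id": 1, "id_x": 10}]

# FROM a FULL JOIN b USING(id) FULL JOIN c USING(id, id_x)
query_1 = full_join_using(full_join_using(a, b, ["id"]), c, ["id", "id_x"])

# FROM a FULL JOIN c USING(id) FULL JOIN b USING(id, id_x)
query_2 = full_join_using(full_join_using(a, c, ["id"]), b, ["id", "id_x"])

print(len(query_1), len(query_2))  # 3 2
```

In query_1 the row for id 1 comes out of the first join with a NULL id_x, so it cannot match c on (id, id_x); the result holds one row with name_a but no id_x and another with id_x but no name_a. In query_2 the same data merges into a single row for id 1.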

Join table with almost same data

I have 2 tables, each with a single column (datatype char). The first table has 3 rows, all containing a single 'A'; the second table has 5 rows, all containing a single 'A'. What will be the result of an inner join, left join, right join and full outer join?
I know the result, but I want to understand how it works in detail.
When there are matching rows in both tables, there is no difference between any of those types of joins. They will give identical results: one result row for each combination of matching left-hand rows and right-hand rows.
The different types of joins are relevant when you do have a particular value in one table and do not have it in the other.
If you want to include rows from the left-hand table even if there are no matching right-hand rows, you'd use LEFT JOIN. Likewise you'd use RIGHT JOIN when you want to match the other way around: include all right-hand rows, even if there are no matching left-hand rows.
And when you only want to include rows for which there is a match, you'd use INNER JOIN.
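For the tables in the question, every one of the 3 left-hand rows matches every one of the 5 right-hand rows, so each join type yields 3 x 5 rows. A minimal sketch with Python's sqlite3 module, using hypothetical table names t1 and t2 and writing the right join as a swapped LEFT JOIN because older SQLite versions lack RIGHT JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c TEXT)")
conn.execute("CREATE TABLE t2 (c TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("A",)] * 3)
conn.executemany("INSERT INTO t2 VALUES (?)", [("A",)] * 5)

def count(sql):
    """Return the single COUNT(*) value produced by the query."""
    return conn.execute(sql).fetchone()[0]

inner_rows = count("SELECT COUNT(*) FROM t1 JOIN t2 ON t1.c = t2.c")
left_rows = count("SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t1.c = t2.c")
# t1 RIGHT JOIN t2 expressed as t2 LEFT JOIN t1.
right_rows = count("SELECT COUNT(*) FROM t2 LEFT JOIN t1 ON t1.c = t2.c")

print(inner_rows, left_rows, right_rows)  # 15 15 15
```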

When joining 2 tables one table comes up null

I am joining 2 tables. From the first table I get all the relevant data, but from the second table I only get nulls. There are no nulls in either table. Can anyone tell me why this is happening?
select * from apmast
left join apitem
on apmast.fvendno + apmast.fccompany = apitem.fcinvkey
There is a problem with your ON that's resulting in you not getting matching records. A LEFT JOIN means that you should get all data from the left table and only the matching records from the right table, or else NULL where there are no matching records. The key to the join, however, is the ON statement. Make sure that apmast.fvendno + apmast.fccompany is actually equal to apitem.fcinvkey.
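One way to see how such a mismatch can arise is sketched below. Everything about the data is an assumption made for illustration: the values are invented, and trailing padding is only one possible cause. The sketch also uses SQLite, where string concatenation is written || rather than +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apmast (fvendno TEXT, fccompany TEXT);
    CREATE TABLE apitem (fcinvkey TEXT);
    -- Hypothetical data: the vendor number is stored with a trailing space.
    INSERT INTO apmast VALUES ('V001 ', 'ACME');
    INSERT INTO apitem VALUES ('V001ACME');
""")

# The concatenated key is 'V001 ACME', which does not equal 'V001ACME',
# so the LEFT JOIN keeps the apmast row and fills apitem with NULL.
raw_keys = conn.execute("""
    SELECT apitem.fcinvkey
    FROM apmast
    LEFT JOIN apitem ON apmast.fvendno || apmast.fccompany = apitem.fcinvkey
""").fetchall()

# Trimming both parts before concatenating makes the keys line up.
trimmed_keys = conn.execute("""
    SELECT apitem.fcinvkey
    FROM apmast
    LEFT JOIN apitem
      ON TRIM(apmast.fvendno) || TRIM(apmast.fccompany) = apitem.fcinvkey
""").fetchall()

print(raw_keys)      # [(None,)]
print(trimmed_keys)  # [('V001ACME',)]
```

Selecting the concatenated expression next to apitem.fcinvkey for a few rows is a quick way to compare what each side of the ON condition actually contains.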
Here is an explanation of the types of joins, just in case you get stuck in the future.
INNER JOIN: gets only the rows that match in both the table in the FROM clause and the joined table.
LEFT OUTER JOIN: gets all the rows from the table specified in the FROM clause and only the rows that match in the joined table.
RIGHT OUTER JOIN: gets all the rows from the joined table and only the rows that match in the table in the FROM clause.
FULL OUTER JOIN: gets all the rows from both tables.
SELF JOIN: used when you need to join a table back to itself to return data.

What are the uses of the different join operations?

What are the uses of the different join operations in SQL? In particular, why do we need the different inner and outer joins?
The only type of join you really need is LEFT OUTER JOIN. Every other type of join can be rewritten in terms of one or more left outer joins, and possibly some filtering. So why do we need all the others? Is it just to confuse people? Wouldn't it be simpler if there were only one type of join?
You could also ask: Why have both a <= b and b >= a? Don't these just do the same thing? Can't we just get rid of one of them? It would simplify things!
Sometimes it's easier to swap <= to >= instead of swapping the arguments round. Similarly, a left join and a right join are the same thing just with the operands swapped. But again it's practical to have both options instead of requiring people to write their queries in a specific order.
Another thing you could ask is: In logic why do we have AND, OR, NOT, XOR, NAND, NOR, etc? All these can be rewritten in terms of NANDs! Why not just have NAND? Well it's awkward to write an OR in terms of NANDs, and it's not as obvious what the intention is - if you write OR, people know immediately what you mean. If you write a bunch of NANDs, it is not obvious what you are trying to achieve.
Similarly, if you want to do a FULL OUTER JOIN b you could run a left join and a right join, remove the rows the two have in common, and then UNION ALL what remains. But that's a pain and so there's a shorthand for it.
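A hedged sketch of that rewrite, using Python's sqlite3 module and hypothetical tables a and b: the second branch is the right join written as a swapped LEFT JOIN, and its WHERE clause keeps only the rows the first branch has not already produced, which takes care of the duplicate removal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Emulated a FULL OUTER JOIN b ON a.id = b.id
rows = conn.execute("""
    SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id
    UNION ALL
    SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
""").fetchall()

# Three rows, in no guaranteed order: (1, None), (2, 2), (None, 3)
print(len(rows))  # 3
```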
When do you use each one? Here's a simplified rule:
If you always want a result row for each row in the LEFT table, use a LEFT OUTER JOIN.
If you always want a result row for each row in the RIGHT table, use a RIGHT OUTER JOIN.
If you always want a result row for each row in either table, use a FULL OUTER JOIN.
If you only want a result row when there's a row in both tables, use an INNER JOIN.
If you want all possible pairs of rows, one row from each table, use a CROSS JOIN.
inner join - returns only the rows from both sets that match on the specified criteria.
outer join - selects all of one set, along with matching or empty (if not matched) elements from the other set. Outer joins can be left or right, to specify which set is returned in its entirety.
To make the other answers clearer: you get different results depending on the join you choose, for example when the columns you're joining on contain null values.
So for each real-life scenario there is a join that suits it (in the null values example, depending on whether or not you want the rows without matching data).
My answer assumes 2 tables joined on a single key:
INNER JOIN - get the results that are in both join tables (according to the join rule)
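FULL OUTER JOIN - get all results from both tables, matched where possible and filled with NULLs otherwise (this is not a Cartesian product; that is what CROSS JOIN returns)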
LEFT OUTER JOIN - get all the results from left table and the matching results from the right
You can add WHERE clauses in order to further constrain the results.
Use these in order to only get what you want to get.