Join clause joining 3 tables in same criteria - sql

I've seen a join like this:
Select <blablabla>
from
TableA TA
Inner join TableB TB on Ta.Id = Tb.Id
Inner join TableC TC on Tc.Id = Tb.Id and Ta.OtheriD = Tc.OtherColumn
But what's the point (end effect) of that second join clause?
What are the implications when an outer join is used?
And, more importantly, what is the best way to rewrite it so that it is easy
to understand what it's trying to join, or to get rid of the construction
altogether while maintaining the correctness of the query?
I don't specify the RDBMS, because it's a more generic question, but for those
curious (since people always ask): it's SQL Server 2005.
EDIT: It's just a made-up example (I would have to dig up the original source, which I no longer have access to). I found the original join clause in a SELECT command with 10 joins.

It simply means you have an extra restriction on the intersection between TableA and TableC.
Because we know Ta.Id = Tb.Id, Tc.Id = Tb.Id is the same as Tc.Id = Ta.Id. Inner joins are associative, so it makes more sense written like this, where each join is between two tables only:
Select <blablabla>
from
TableB TB
Inner join
TableA TA on Tb.Id = Ta.Id --a and b intersection
Inner join
TableC TC on Ta.Id = Tc.Id and Ta.OtheriD = Tc.OtherColumn --a and c intersection
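To check that the rewrite returns the same rows as the original, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the row values are made up for illustration, and SQLite merely stands in for SQL Server 2005.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER, OtherId INTEGER);
    CREATE TABLE TableB (Id INTEGER);
    CREATE TABLE TableC (Id INTEGER, OtherColumn INTEGER);
    -- Made-up rows: Id 1 matches everywhere, Id 2 fails the OtherId test,
    -- Id 3 is missing from TableB.
    INSERT INTO TableA VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO TableB VALUES (1), (2);
    INSERT INTO TableC VALUES (1, 10), (2, 99), (3, 30);
""")

# The join as written in the question.
original = conn.execute("""
    SELECT TA.Id, TC.OtherColumn
    FROM TableA TA
    INNER JOIN TableB TB ON TA.Id = TB.Id
    INNER JOIN TableC TC ON TC.Id = TB.Id AND TA.OtherId = TC.OtherColumn
    ORDER BY TA.Id
""").fetchall()

# The rewrite: every ON clause compares exactly two tables.
rewritten = conn.execute("""
    SELECT TA.Id, TC.OtherColumn
    FROM TableB TB
    INNER JOIN TableA TA ON TB.Id = TA.Id
    INNER JOIN TableC TC ON TA.Id = TC.Id AND TA.OtherId = TC.OtherColumn
    ORDER BY TA.Id
""").fetchall()

print(original == rewritten)  # True for this sample data
```

Only Id 1 survives both joins here, which is the "extra restriction" at work.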

Your Q: But what's the point (end effect) of that second join clause?
It effectively filters rows. You could move the second half of the ON condition into the WHERE clause if you really want to; with inner joins that only affects readability. gbn's answer looks good for this 3-table example, but to expand on it: sometimes a rewrite like this isn't possible. I have seen an occasion where 2 different systems (one Oracle 8i and one SQL Server 2000) had their databases joined together. A 3-part key was identified as being required to make the records unique in both systems, but each component of the 3-part key was held in a different table, so the final result had a few joins like that.
Functionally, I'm not sure there's really a difference. Unless I'm completely off, readability seems to be the biggest difference.
Your second Q: What are the implications when an outer join is used?
You'll potentially get a bunch of NULLs (depending on how you set up the outer join) where the inner join would have dropped those rows. Be careful though: inner joins are associative, but as gbn put it, an OUTER JOIN is different and order does matter.

The user may want to further filter the set of rows which are included in the join set...

The point of the second join is to further limit your result set based on the contents of TableC. The first join gives you ONLY records that exist in TA and TB. The second join gives you ONLY results from the first join that also exist in TC.

Related

Where vs ON in outer join

I am wondering how to get better SQL performance when deciding whether to duplicate a criterion in the ON clause when it is already in the WHERE clause.
My friend claimed it is up to DB engines but I am not so sure.
Regardless of DB engines, normally, the condition in Where clause should be executed first before join, but I assume it means inner join but not outer join. Because some conditions can only be executed AFTER outer join.
For example:
Select a.*, b.*
From A a
Left outer join B on a.id = b.id
Where b.id is NULL;
The condition in Where cannot be executed before outer join.
So, I assume the whole ON clause must be executed before the WHERE clause, and it seems the ON clause controls the size of table B (or table A if we use a right outer join) before the outer join. That does not seem related to DB engines to me.
And that raised my question: when we use an outer join, should we always duplicate our criteria in the ON clause?
For example (I outer join a table with a shorter version of itself),
temp_series_installment & series_id > 18940000 vs temp_series_installment:
select sql_no_cache s.*, t.* from temp_series_installment s
left outer join temp_series_installment t on s.series_id = t.series_id and t.series_id > 18940000 and t.incomplete = 1
where t.incomplete = 1;
VS
select sql_no_cache s.*, t.* from temp_series_installment s
left outer join temp_series_installment t on s.series_id = t.series_id and t.series_id > 18940000
where t.incomplete = 1;
Edit: where t.incomplete = 1 has the same effect as where t.series_id is not null,
which makes it an inner join, as suggested by Gordon Linoff.
But what I have been asking is: if it outer joins a smaller table, it should be faster, right?
I tried to see if there is any performance difference in MySQL:
But the result was not what I expected. Why is the second one faster? I thought that outer joining a smaller table would make the query faster.
My idea is from:
https://www.ibm.com/support/knowledgecenter/en/SSZLC2_8.0.0/com.ibm.commerce.developer.doc/refs/rsdperformanceworkspaces.htm
Section:
Push predicates into the OUTER JOIN clause whenever possible
Duplicate constant condition for different tables whenever possible
"Regardless of DB engines, normally, the condition in Where clause should be executed first before join, but I assume it means inner join but not outer join. Because some conditions can only be executed AFTER outer join."
This is simply not true. SQL is a declarative (descriptive) language. It does not specify how the query gets executed. It only specifies what the result set looks like. The SQL compiler/optimizer determines the actual processing steps to meet the requirements described by the query.
In terms of semantics, the FROM clause is the first clause that is "evaluated". Hence, FROM is logically processed before the WHERE clause.
The rest of your question is similarly misguided. Comparison logic in the where clause, such as:
from s left join
t
on s.series_id = t.series_id and t.series_id > 18940000
where t.incomplete = 1
turns the outer join into an inner join. Hence, the logic is different from what you think is going on.
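A minimal sketch of that claim, using Python's built-in sqlite3 module rather than MySQL. The table name comes from the question, but the three rows and the threshold of 1 (instead of 18940000) are made up to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_series_installment (series_id INTEGER, incomplete INTEGER);
    -- Made-up sample rows.
    INSERT INTO temp_series_installment VALUES (1, 1), (2, 0), (3, 1);
""")

QUERY = """
    SELECT s.series_id, t.series_id
    FROM temp_series_installment s
    {join} temp_series_installment t
        ON s.series_id = t.series_id AND t.series_id > 1
    {where}
    ORDER BY s.series_id
"""

# Plain outer join: every row of s survives, unmatched rows get NULL for t.
left_only = conn.execute(
    QUERY.format(join="LEFT JOIN", where="")).fetchall()

# Outer join plus a WHERE test on t: the NULL-extended rows fail the test.
left_with_where = conn.execute(
    QUERY.format(join="LEFT JOIN", where="WHERE t.incomplete = 1")).fetchall()

# The same query written as an inner join.
inner_with_where = conn.execute(
    QUERY.format(join="INNER JOIN", where="WHERE t.incomplete = 1")).fetchall()

print(left_only)                            # [(1, None), (2, 2), (3, 3)]
print(left_with_where == inner_with_where)  # True
```

The comparison t.incomplete = 1 is never true for a NULL, so the rows the outer join preserved are exactly the ones the WHERE clause removes.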
As Gordon Linoff pointed out, it's not true. Your friend is plain wrong.
I just want to add that developers like to think about SQL the way they think about their language of trade (C++, VB, Java), but those are procedural/imperative languages.
When you code SQL you are in another paradigm. You are just describing a function to be applied to a dataset.
Let's get your own example:
Select a.*, b.*
From A a
Left outer join B on a.id = b.id
Where b.id is NULL;
If a.Id and b.Id are NOT NULL columns, it's semantically equal to
Select a.*, null, ..., null
From A a
where not exists (select * from B b where b.Id = a.Id)
Now try to run those two queries and profile them.
In most DBMSs I would expect both queries to run in exactly the same way.
That happens because the engine decides how to implement your "function" over the dataset.
Note the above example is the equivalent in set mathematics to:
Give me the set A minus the intersection between A and B.
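As a rough illustration of that equivalence, the sketch below runs both forms against SQLite through Python's sqlite3 module and compares the results. The sample ids are made up, and the id columns are declared NOT NULL as the answer assumes. It only compares results, not execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER NOT NULL);
    CREATE TABLE B (id INTEGER NOT NULL);
    -- Made-up data: only id 1 is in A but not in B.
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2), (3), (4);
""")

# Anti-join written as an outer join followed by an IS NULL filter.
via_left_join = conn.execute("""
    SELECT a.id
    FROM A a
    LEFT OUTER JOIN B b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# The same set written with NOT EXISTS.
via_not_exists = conn.execute("""
    SELECT a.id
    FROM A a
    WHERE NOT EXISTS (SELECT * FROM B b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print(via_left_join, via_not_exists)  # [(1,)] [(1,)]
```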
Engines can decide how to implement your query because they have some tricks up their sleeves.
They keep metrics about your tables, indexes, etc. and can use them to, for example, perform the joins in a different order from the one you wrote.
IMHO engines today are really good at finding the best way to implement the function you describe and rarely need query hints.
Of course, you can end up describing your function in a way that is too complicated, which affects how the engine decides to run it.
The art of describing functions and sets well and managing indexes is what we call query tuning.

Use of more than one join and left join

Suppose I have more than one join, and the second join is a left join.
Will this clause take all the data from the first two tables, or only from the second table?
Thanks
A join is just a way to connect different tables. Theoretically (not computationally), there's no limit on the number of joins you can use in a query.
Keep in mind that the order of joins is important once you start to use something other than inner joins. For example, a LEFT JOIN b is not equivalent to b LEFT JOIN a.
That being said, when you have more than one join, the result should be interpreted carefully.
Consider
SELECT a.id,b.name,c.department
FROM
a INNER JOIN b on a.id = b.id
LEFT JOIN c on a.id = c.id
The resulting table would consist of every id that is present in both a and b, with NULL for department where c has no row for that id.
So to answer your question, joins consider all your data in the query. However, the output table depends on the joins you used. If there is still confusion, you can refer to this question, which addresses a similar thing.
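The query above can be tried out with a small sketch in Python's sqlite3 module. The table and column names follow the answer; the rows are made up so that one id is missing from b and another has no department in c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, name TEXT);
    CREATE TABLE c (id INTEGER, department TEXT);
    -- Made-up rows: id 3 is not in b, id 2 has no row in c.
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO c VALUES (1, 'sales');
""")

rows = conn.execute("""
    SELECT a.id, b.name, c.department
    FROM a
    INNER JOIN b ON a.id = b.id
    LEFT JOIN c ON a.id = c.id
    ORDER BY a.id
""").fetchall()

# id 3 is dropped by the INNER JOIN; id 2 is kept with a NULL department.
print(rows)  # [(1, 'ann', 'sales'), (2, 'bob', None)]
```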

Why are the two queries different (left join on ... and ... as opposed to using where clause)

I'm wondering why the following two queries produce different results (the first query has more rows than second).
SELECT * FROM A
JOIN ...
JOIN ...
JOIN C ON ...
LEFT JOIN B ON B.id = A.id AND B.otherId = C.otherId
As opposed to:
SELECT * FROM A
JOIN ...
JOIN ...
JOIN C ON ...
LEFT JOIN B ON B.id = A.id
WHERE B.otherId = C.otherId
Please help me understand. In the second query, the left join has only 1 condition so shouldn't it include all the results from the first query and more (where the extra rows have unmatched otherId). Then the WHERE clause should ensure that the otherId matches, like in the first query. Why are they different?
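Logically, the WHERE clause is applied to the result of the joins, not before them. In your second query the LEFT JOIN first keeps every row of A (with NULLs in B's columns where there is no match), and the WHERE then throws away every row for which B.otherId = C.otherId is not true, including all the NULL-extended rows.
The query engine may still evaluate a filter before a join when that is cheaper (why do the expensive JOIN if some rows will be filtered out later?), but only when doing so cannot change the result.
Query engines are pretty good at optimizing the query you write.
You will see this difference only with OUTER JOINs. In inner joins, WHERE and ON conditions behave the same.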
The second query returns fewer rows because your WHERE clause filters records out, which essentially changes the query from a left outer join into an inner join. So you need to be careful where you place your filters, although it does not matter for an inner join.
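A minimal sketch of the two queries using Python's sqlite3 module. The question elides most of its joins, so this assumes, purely for illustration, that C is joined on C.id = A.id; the rows are made up as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE C (id INTEGER, otherId INTEGER);
    CREATE TABLE B (id INTEGER, otherId INTEGER);
    -- Made-up rows: id 1 matches fully, id 2 has a B row with the wrong
    -- otherId, id 3 has no B row at all.
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO C VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO B VALUES (1, 100), (2, 999);
""")

# Both conditions in ON: rows of A without a full match are kept with NULLs.
both_in_on = conn.execute("""
    SELECT A.id, C.otherId, B.otherId
    FROM A
    JOIN C ON C.id = A.id          -- assumed join condition
    LEFT JOIN B ON B.id = A.id AND B.otherId = C.otherId
    ORDER BY A.id
""").fetchall()

# Second condition in WHERE: it runs after the join and removes those rows.
one_in_where = conn.execute("""
    SELECT A.id, C.otherId, B.otherId
    FROM A
    JOIN C ON C.id = A.id          -- assumed join condition
    LEFT JOIN B ON B.id = A.id
    WHERE B.otherId = C.otherId
    ORDER BY A.id
""").fetchall()

print(both_in_on)    # [(1, 100, 100), (2, 200, None), (3, 300, None)]
print(one_in_where)  # [(1, 100, 100)]
```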
You've received correct answers, but allow me to delve a little deeper into the difference between join criteria and filtering criteria. Take a simple query with a left join:
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key;
This lists out all of NonKey1 values from table a and any NonKey2 fields from table b with matching key values or NULL where there is no match. A common variant is to look at only those rows in a that do not have a match in b:
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key
where b.Key is null;
Careful! If you accidentally write where b.Key is not null you've just changed your outer join into a regular inner join. Do that sometime and see if QA can catch it. On second thought, don't. (Also, having b.NonKey2 in the selection list is meaningless as it can only ever be NULL, but let's leave it there for the moment.) The join is based on the key fields of both tables matching. After the joining is complete, all rows with a successful join are discarded and only the results without a match remain. That means b.Key in the join criteria cannot be NULL and in the filtering criteria must be NULL for a row to be added to the result set. Fine, that's what we wanted. But consider what would happen if we moved the check to become part of the join criteria.
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key and b.Key is null;
The result is everything from a with nothing at all from b. Probably not what we wanted. If you think about it, you will see we could just as well have written on 0 = 1 and gotten the same result. What we've done is move a value from one context where NULL means one thing (success) to a context where NULL means something entirely different (failure).
So, in computer languages as in human languages, be careful of context. It can completely change the meaning of what you're trying to say.
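The three variants discussed in this answer can be compared side by side with a small sketch in Python's sqlite3 module. The key column is named k here instead of Key to stay clear of SQL keywords, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, NonKey1 TEXT);
    CREATE TABLE b (k INTEGER, NonKey2 TEXT);
    -- Made-up rows: key 1 has a match in b, key 2 does not.
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');
""")

def run(join_condition, where=""):
    """Run the left join with the given ON condition and optional WHERE."""
    sql = f"""
        SELECT a.k, a.NonKey1, b.NonKey2
        FROM a
        LEFT JOIN b ON {join_condition}
        {where}
        ORDER BY a.k
    """
    return conn.execute(sql).fetchall()

# NULL test as a filter: only rows of a without a match in b remain.
filtered = run("b.k = a.k", "WHERE b.k IS NULL")

# NULL test moved into the join criteria: nothing in b can ever match.
null_in_on = run("b.k = a.k AND b.k IS NULL")

# An always-false join condition gives the same result.
never_true = run("0 = 1")

print(filtered)                  # [(2, 'a2', None)]
print(null_in_on == never_true)  # True
```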

Left join or select from multiple table using comma (,) [duplicate]

I'm curious as to why we need to use LEFT JOIN since we can use commas to select multiple tables.
What are the differences between LEFT JOIN and using commas to select multiple tables.
Which one is faster?
Here is my code:
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT no as nonvs,
owner,
owner_no,
vocab_no,
correct
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
...and:
SELECT *
FROM vocab_stats vs,
mst_words mw
WHERE mw.no = vs.vocab_no
AND vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
First of all, to be completely equivalent, the first query should have been written
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT *
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
So that mw.* and nvs.* together produce the same set as the 2nd query's singular *. The query as you have written can use an INNER JOIN, since it includes a filter on nvs.correct.
The general form
TABLEA LEFT JOIN TABLEB ON <CONDITION>
attempts to find TableB records based on the condition. If that fails, the results from TABLEA are kept, with all the columns from TableB set to NULL. In contrast
TABLEA INNER JOIN TABLEB ON <CONDITION>
also attempts to find TableB records based on the condition. However, when that fails, the particular record from TableA is removed from the output result set.
The ANSI standard for CROSS JOIN produces a Cartesian product between the two tables.
TABLEA CROSS JOIN TABLEB
-- # or in older syntax, simply using commas
TABLEA, TABLEB
The intention of the syntax is that EACH row in TABLEA is joined to EACH row in TABLEB. So 4 rows in A and 3 rows in B produces 12 rows of output. When paired with conditions in the WHERE clause, it sometimes produces the same behaviour of the INNER JOIN, since they express the same thing (condition between A and B => keep or not). However, it is a lot clearer when reading as to the intention when you use INNER JOIN instead of commas.
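The row arithmetic and the comma/INNER JOIN equivalence can be checked with a short sketch in Python's sqlite3 module. TABLEA and TABLEB get a single made-up id column with 4 and 3 rows respectively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (id INTEGER);
    CREATE TABLE TABLEB (id INTEGER);
    -- 4 rows in A, 3 rows in B (made-up values).
    INSERT INTO TABLEA VALUES (1), (2), (3), (4);
    INSERT INTO TABLEB VALUES (2), (3), (5);
""")

# Cartesian product: every row of A paired with every row of B.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM TABLEA CROSS JOIN TABLEB").fetchone()[0]
comma_count = conn.execute(
    "SELECT COUNT(*) FROM TABLEA, TABLEB").fetchone()[0]

# Comma notation with the condition in WHERE ...
comma_where = conn.execute("""
    SELECT a.id, b.id FROM TABLEA a, TABLEB b
    WHERE a.id = b.id
    ORDER BY a.id
""").fetchall()

# ... versus the same condition in an explicit INNER JOIN.
inner_join = conn.execute("""
    SELECT a.id, b.id FROM TABLEA a
    INNER JOIN TABLEB b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(cross_count, comma_count)   # 12 12
print(comma_where == inner_join)  # True
```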
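Performance-wise, do not expect a LEFT JOIN to beat an INNER JOIN: an inner join gives the optimizer more freedom to reorder tables, so it is usually at least as fast. Most modern optimizers treat the comma notation with WHERE conditions the same as an explicit INNER JOIN, but with commas a forgotten condition silently becomes a Cartesian product - so another plus for SQL92 notation.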
Why do we need LEFT JOIN? If the explanation of LEFT JOIN above is still not enough (keep records in A without matches in B), then consider that to achieve the same, you would need a complex UNION between two sets using the old comma-notation to achieve the same effect. But as previously stated, this doesn't apply to your example, which is really an INNER JOIN hiding behind a LEFT JOIN.
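One way such a UNION could look is sketched below with Python's sqlite3 module: the matched rows come from the comma notation, and the unmatched rows of A are added with a NULL placeholder. The column name val and the rows are made up, and this is only one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (id INTEGER);
    CREATE TABLE TABLEB (id INTEGER, val TEXT);
    -- Made-up rows: id 1 has no match in TABLEB.
    INSERT INTO TABLEA VALUES (1), (2), (3);
    INSERT INTO TABLEB VALUES (2, 'x'), (3, 'y');
""")

left_join = conn.execute("""
    SELECT a.id, b.val
    FROM TABLEA a
    LEFT JOIN TABLEB b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Comma notation: matched rows, plus unmatched rows of A padded with NULL.
union_emulation = conn.execute("""
    SELECT a.id, b.val
    FROM TABLEA a, TABLEB b
    WHERE a.id = b.id
    UNION ALL
    SELECT a.id, NULL
    FROM TABLEA a
    WHERE NOT EXISTS (SELECT 1 FROM TABLEB b WHERE b.id = a.id)
    ORDER BY 1
""").fetchall()

print(left_join)                     # [(1, None), (2, 'x'), (3, 'y')]
print(left_join == union_emulation)  # True
```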
Notes:
The RIGHT JOIN is the same as LEFT, except that it keeps all rows from TABLEB (the right side) instead of TABLEA.
RIGHT and LEFT JOINS are both OUTER joins. The word OUTER is optional, i.e. it can be written as LEFT OUTER JOIN.
The third type of OUTER join is FULL OUTER join, but that is not discussed here.
Separating the JOIN from the WHERE makes it easy to read, as the join logic cannot be confused with the WHERE conditions. It will also generally be faster as the server will not need to conduct two separate queries and combine the results.
The two examples you've given are not really equivalent, as you have included a sub-query in the first example. This is a better example:
SELECT vs.*, mw.*
FROM mst_words mw
LEFT JOIN vocab_stats vs ON mw.no = vs.vocab_no
WHERE vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111

SELECT subset from two tables and LEFT JOIN results

I'm trying to write a bit of SQL for SQLITE that will take a subset from two tables (TableA and TableB) and then perform a LEFT JOIN.
This is what I've tried, but this produces the wrong result:
Select * from TableA
Left Join TableB using(key)
where TableA.key2 = "xxxx"
AND TableB.key3 = "yyyy"
This ignores cases where key2="xxxx" but key3 != "yyyy".
I want all the rows from TableA that match my criteria whether or not their corresponding value in TableB matches, but only those rows from TableB that match both conditions.
I did manage to solve this by using a VIEW, but I'm sure there must be a better way of doing this. It's just beginning to drive me insane trying to solve it now.
(Thanks for any help, hope I've explained this well enough).
You have made the classic left join mistake. In most databases, if you want a condition on the table on the right side of a left join, you must put that condition in the join itself and NOT in the where clause. In SQL Server, putting it in the where clause turns the left join into an inner join. I've not used SQLite, so I don't know for certain that it behaves the same way, but in general every returned row must satisfy the where clause.
Select *
from TableA
Left Join TableB on TableA.key = TableB.key
and TableB.key3 = 'yyyy'
where TableA.key2 = 'xxxx'
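Since the answer was not tested on SQLite, here is a small sketch that runs both versions through Python's sqlite3 module. The table and column names follow the question; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (key INTEGER, key2 TEXT);
    CREATE TABLE TableB (key INTEGER, key3 TEXT);
    -- Made-up rows: key 2 passes the TableA test but fails the TableB test.
    INSERT INTO TableA VALUES (1, 'xxxx'), (2, 'xxxx'), (3, 'zzzz');
    INSERT INTO TableB VALUES (1, 'yyyy'), (2, 'nope'), (3, 'yyyy');
""")

# The question's version: the TableB condition sits in WHERE.
condition_in_where = conn.execute("""
    SELECT TableA.key, TableB.key3
    FROM TableA
    LEFT JOIN TableB USING (key)
    WHERE TableA.key2 = 'xxxx' AND TableB.key3 = 'yyyy'
    ORDER BY TableA.key
""").fetchall()

# The answer's version: the TableB condition is part of the join.
condition_in_on = conn.execute("""
    SELECT TableA.key, TableB.key3
    FROM TableA
    LEFT JOIN TableB ON TableA.key = TableB.key AND TableB.key3 = 'yyyy'
    WHERE TableA.key2 = 'xxxx'
    ORDER BY TableA.key
""").fetchall()

print(condition_in_where)  # [(1, 'yyyy')]
print(condition_in_on)     # [(1, 'yyyy'), (2, None)]
```

With this sample data SQLite behaves as described for SQL Server: the WHERE version loses key 2, while the ON version keeps it with a NULL from TableB.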