This question already has answers here:
SQL left join vs multiple tables on FROM line?
(12 answers)
Closed 8 years ago.
I'm curious as to why we need to use LEFT JOIN since we can use commas to select multiple tables.
What are the differences between LEFT JOIN and using commas to select multiple tables.
Which one is faster?
Here is my code:
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT no as nonvs,
owner,
owner_no,
vocab_no,
correct
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
...and:
SELECT *
FROM vocab_stats vs,
mst_words mw
WHERE mw.no = vs.vocab_no
AND vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
First of all, to be completely equivalent, the first query should have been written
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT *
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
So that mw.* and nvs.* together produce the same set as the 2nd query's singular *. The query as you have written can use an INNER JOIN, since it includes a filter on nvs.correct.
The general form
TABLEA LEFT JOIN TABLEB ON <CONDITION>
attempts to find TableB records based on the condition. If the fails, the results from TABLEA are kept, with all the columns from TableB set to NULL. In contrast
TABLEA INNER JOIN TABLEB ON <CONDITION>
also attempts to find TableB records based on the condition. However, when fails, the particular record from TableA is removed from the output result set.
The ANSI standard for CROSS JOIN produces a Cartesian product between the two tables.
TABLEA CROSS JOIN TABLEB
-- # or in older syntax, simply using commas
TABLEA, TABLEB
The intention of the syntax is that EACH row in TABLEA is joined to EACH row in TABLEB. So 4 rows in A and 3 rows in B produces 12 rows of output. When paired with conditions in the WHERE clause, it sometimes produces the same behaviour of the INNER JOIN, since they express the same thing (condition between A and B => keep or not). However, it is a lot clearer when reading as to the intention when you use INNER JOIN instead of commas.
Performance-wise, most DBMS will process a LEFT join faster than an INNER JOIN. The comma notation can cause database systems to misinterpret the intention and produce a bad query plan - so another plus for SQL92 notation.
Why do we need LEFT JOIN? If the explanation of LEFT JOIN above is still not enough (keep records in A without matches in B), then consider that to achieve the same, you would need a complex UNION between two sets using the old comma-notation to achieve the same effect. But as previously stated, this doesn't apply to your example, which is really an INNER JOIN hiding behind a LEFT JOIN.
Notes:
The RIGHT JOIN is the same as LEFT, except that it starts with TABLEB (right side) instead of A.
RIGHT and LEFT JOINS are both OUTER joins. The word OUTER is optional, i.e. it can be written as LEFT OUTER JOIN.
The third type of OUTER join is FULL OUTER join, but that is not discussed here.
Separating the JOIN from the WHERE makes it easy to read, as the join logic cannot be confused with the WHERE conditions. It will also generally be faster as the server will not need to conduct two separate queries and combine the results.
The two examples you've given are not really equivalent, as you have included a sub-query in the first example. This is a better example:
SELECT vs.*, mw.*
FROM vocab_stats vs, mst_words mw
LEFT JOIN vocab_stats vs ON mw.no = vs.vocab_no
WHERE vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
Related
Below is a join based on where clause:
SELECT a.* FROM TEST_TABLE1 a,TEST_TABLE2 b,TEST_TABLE3 c
WHERE a.COL11 = b.COL11
AND b.COL12 = c.COL12
AND a.COL3 = c.COL13;
I have been learning SQL from online resources and trying to convert it with joins
Two issues:
The original query is confusing. The outer joins (with the (+) suffix) are made irrelevant by the last where condition. Because of that condition, the query should only return records where there is an actual matching c record. So the original query is the same as if there were no such (+) suffixes.
Your query joins TEST_TABLE3 twice, while the first query only joins it once, and there are two conditions that determine how it is joined there. You should not split those conditions over two separate joins.
BTW, it is surprising that the SQL Fiddle site does not show an error, as it makes no sense to use the same alias twice. See for example how MySQL returns the error with the same query on dbfiddle (on all available versions of MySQL):
Not unique table/alias: 'C'
So to get the same result using the standard join notation, all joins should be inner joins:
SELECT *
FROM TEST_TABLE1 A
INNER JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
INNER JOIN TEST_TABLE3 C
ON A.COL11 = B.COL11
AND B.COL12 = C.COL12;
#tricot correctly pointed out that it's strange to have 2 aliases with the same name and not getting an error. Also, to answer your question :
In the first query, we are firstly performing cross join between all the 3 tables by specifying all the table names. After that, we are filtering the rows using the condition specified in the WHERE clause on output that we got after performing cross join.
In second query, you need to join test_table3 only once. Since now you have all the required aliases A,B,C as in the first query so you can specify 2 conditions after the last join as below:
SELECT A.* FROM TEST_TABLE1 A
LEFT JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
left join TEST_TABLE3 C
on B.COL12 =C.COL12 AND A.COL3 = C.COL13;
This question already has answers here:
What is the difference between using a cross join and putting a comma between the two tables?
(8 answers)
Closed last year.
What is the difference?
SELECT a.name, b.name
FROM a, b;
SELECT a.name, b.name
FROM a
CROSS JOIN b;
If there is no difference then why do both exist?
The first with the comma is an old style from the previous century.
The second with the CROSS JOIN is in newer ANSI JOIN syntax.
And those 2 queries will indeed give the same results.
They both link every record of table "a" against every record of table "b".
So if table "a" has 10 rows, and table "b" has 100 rows.
Then the result would be 10 * 100 = 1000 records.
But why does that first outdated style still exists in some DBMS?
Mostly for backward compatibility reasons, so that some older SQL's don't suddenly break.
Most SQL specialists these days would frown upon someone who still uses that outdated old comma syntax. (although it's often forgiven for an intentional cartesian product)
A CROSS JOIN is a cartesian product JOIN that's lacking the ON clause that defines the relationship between the 2 tables.
In the ANSI JOIN syntax there are also the OUTER joins: LEFT JOIN, RIGHT JOIN, FULL JOIN
And the normal JOIN, aka the INNER JOIN.
But those normally require the ON clause, while a CROSS JOIN doesn't.
And example of a query using different JOIN types.
SELECT *
FROM jars
JOIN apples ON apples.jar_id = jars.id
LEFT JOIN peaches ON peaches.jar_id = jars.id
CROSS JOIN bananas AS bnns
RIGHT JOIN crates ON crates.id = jars.crate_id
FULL JOIN nuts ON nuts.jar_id = jars.id
WHERE jars.name = 'FruityMix'
The nice thing about the JOIN syntax is that the link criteria and the search criteria are separated.
While in the old comma style that difference would be harder to notice. Hence it's easier to forget a link criteria.
SELECT *
FROM crates, jars, apples, peaches, bananas, nuts
WHERE apples.jar_id = jars.id
AND jars.name = 'NuttyFruitBomb'
AND peaches.jar_id = jars.id(+)
AND crates.id(+) = jar.crate_id;
Did you notice that the first query has 1 cartesian product join, but the second has 2? That's why the 2nd is rather nutty.
Both expressions perform a Cartesian product of the two given tables. They are hence equivalent.
Please note that from SQL style point of view, using JOIN has been the preferred syntax for a long time now.
Consider below SQL.
SELECT DISTINCT bvc_Order.ID,
bvc_OrderItem.ProductID,
bvc_OrderItem_BundleItem.ProductID
FROM dbo.bvc_OrderItem WITH (nolock)
RIGHT OUTER JOIN dbo.bvc_Order WITH (nolock)
LEFT OUTER JOIN dbo.bvc_User WITH (nolock) ON dbo.bvc_Order.UserID = dbo.bvc_User.ID
LEFT OUTER JOIN dbo.Amazon_Merchants WITH (nolock) ON dbo.bvc_Order.CompanyID = dbo.Amazon_Merchants.ID ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem WITH (nolock) ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemID
LEFT OUTER JOIN dbo.bvc_Product WITH (nolock) ON dbo.bvc_OrderItem.ProductID = dbo.bvc_Product.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1
AND bvc_Order.StatusCode <> 999)
AND ( bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00'))
AND bvc_Order.OrderSource = 56;
The query when I execute against my database, it returns 85 rows. Well, that is not correct.
If I just remove the part "AND bvc_Order.OrderSource = 56" it returns back 5 rows which is really correct.
Strange.....
Another thing, if I remove the part
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
it will also return the 5 rows as expected even with bvc_Order.OrderSource filter.
I am not sure why it is adding more rows while I am trying to reduce rows by using filters.
the table bvc_OrderItem_BundleItem doesn't contain any rows for the result order ids or OrderItemIDs
[edit]
Thanks guys, I tried to remove the LEFT/RIGHT Join Mix but Query manager doesn't allows only LEFT, it does add at least one RIGHT join. I updated the SQL to remove extra tables and now we have only three. But same result
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
RIGHT OUTER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
[edit]So far, there is no solution for this. I previously pasted a link in my comment with example data outout for both valid/invalid results with queries. here it is again.
http://sameers.me/SQLIssue.xlsx
One thing to remember here is that ALL left join is not possible. Let me explain further
bvc_Order contains main order record
bvc_ORderItem contains Order Items/Products
bvc_ORderItem_BundleItem contains child products of the product which are available in bvC_OrderItem table.
Now NOT Every product has child products, so bvc_OrderItem_BundleItem may not have any record (and in current scenario, there is really no valid row for the orders in bvC_OrderItem_BundleItem).
In short, in current scenario, there is NO matching row available in bvc_OrderItem_BundleItem table. If I remove that join for now, it is all okay, but in real world, I can't remove that BundleItem table join ofcourse.
thank you
When you say
WHERE bvc_Order.OrderSource = 56
that evaluates to false when bvc_Order.OrderSource is NULL. If the LEFT/RIGHT join failed then it will be NULL. This effectively turns the LEFT/RIGHT join into an inner join.
You probably should write the predicate into the ON clause. An alternative approach, which might not deliver the same results, is:
WHERE (bvc_Order.OrderSource IS NULL OR bvc_Order.OrderSource = 56)
The other predicates have the same problem:
Another thing, if I remove the part OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00') it will also return the 5 rows as expected
When the join fails bvc_OrderItem_BundleItem.ProductID is NULL.
I also would recommend writing queries manually. If I understand you right this query comes from a designer. It's structure is quite confusing. I'm pulling up the most important comment:
Mixing left and right outer joins in a query is just confusing. You should start by rewriting the from clause to only use one type (and I strongly recommend left outer join). – Gordon Linoff
When you have eliminated the impossible, whatever remains, however
improbable, must be the truth? S.H.
It is impossible that an extra AND condition appended to a WHERE clause can ever result in extra rows. That would imply a database engine defect, which I hope I can assume is "impossible". (If not, then I guess it's back to square one).
That fact makes it easier to concentrate on possible reasons:
When you comment out
AND bvc_Order.OrderSource = 56;
then you also comment out the semicolon terminator. Is it possible
that there is text following this query that is affecting it? Try
putting a semicolon at the end of the previous line to make sure.
Depending on the tool you are using to run queries, sometimes when a
query fails to execute, the tool mistakenly shows an old result set.
Make sure your query is executing correctly by adding a dummy column
to the SELECT statement to absolutely prove you are seeing live
results. Which tool are you using?
when you use LEFT outer join it will give all the rows from left table (dbo.bvc_OrderItem) once the your and, or conditions satisfies,
the same thing happens with Right outer join too,
Those conditions (Left join, right join ) may not restrict the rows since rows from one table can be all, another table with some rows only.
check with your join condition
Then check you condition :
(bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
if any rows satisfying this condition
next check with another condition
[bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')]
Then bvc_Order.OrderSource = 56
compare the result of three queries and check the data in with the conditions and then write your complete query, so that you will understand where the mistake you have done.
Few points to remember
1.And is applied during Virtual join phases
2.Where clause is applied after the final result
3.Left join followed by right join is effectively an inner join in some cases
Lets break your query step by step..
dbo.bvc_OrderItem a1
LEFT OUTER JOIN
dbo.bvc_OrderItem_BundleItem b1
Above output will be a single virtual table (logically) which contains all rows from b1 with matching rows from a1
now below predicates from your and clause will be applied
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
which effectively eliminates all rows from bvc_OrderItem_BundleItem even if they have matches and gives result some thing like below if bvc_OrderItem_BundleItem.ProductID IN ('28046_00') is true
bvc_OrderItem bvc_OrderItem_BundleItem
28046 28046
null 1
null 2
null 3
if this condition(bvc_OrderItem.ProductID IN ('28046_00')) is true,then you are asking sql to ignore all rows in bvc_OrderItem ,which effectively means the same result set as above
bvc_OrderItem bvc_OrderItem_BundleItem other columns
28046 28046
null 1
null 2
null 3
next you are doing right outer join with dbo.bvc_Order which may qualifies for the join point I mentioned above
Assume ,you got below result set as output which preserves all of bvc_order table(rough output only for understanding due to lack of actual data)
bvc_OrderItem bvc_OrderItem_BundleItem statuscode ordersource
28046 28046 999 56
null 1 1 57
null 2 100 58
null 3 11 59
Next below AND predicates will be applied
status code <>1 and statuscode<> 999
which means ignore rows which match with bvc_order and has status of 1 ,999 even if they found matching rows
Next you are asking bvc_Order.OrderSource = 56; which means I don't care about other rows,preserve matching rows only for 56 and keep the rest as null
Hope this clarifies on what is happening step by step.A more better way can be provide some test data and show the expected output.
you also can control physical order of joins,you can try below to see if this is what you are trying to do..
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
dbo.bvc_OrderItem
LEFT OUTER JOIN
(
dbo.bvc_OrderItem_BundleItem
RIGHT OUTER JOIN
dbo.bvc_Order
ON dbo.bvc_OrderItem.OrderID = dbo.bvc_OrderItem_BundleItem.OrderItemId
)c
on
dbo.bvc_OrderItem.ID = c.bvc_OrderItem_BundleItem.OrderItemId
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
It looks like you are using the Query Designer. I would avoid using this as this can make your queries extremely confusing. Your queries will be much more concise if you are designing them by hand. If you don't completely understand how inner/outer joins work, a great textbook that I used to teach myself SQL is Murach's SQL Server for Developers.
https://www.murach.com/shop/murach-s-sql-server-2012-for-developers-detail
Now, onto the answer.
I've been thinking about how to resolve your problem, and if you are trying to reduce the result set to 5 rows, why are you using multiple outer joins in the first place? I would consider switching the joins to inner joins instead of outer joins if you are looking for a very specific result set. I can't really provide you with a really comprehensive answer without looking at exactly what results you are trying to achieve, but here's a general idea based on what you've provided to all of us:
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS 'bvc_OrderItem_BundleItem_ProductID'
FROM dbo.bvc_OrderItem
INNER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
INNER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
Start here and then based upon what you are searching for, add where clauses to filter criteria.
Also, your where clause must be rewritten if you use an inner join instead of an outer join:
WHERE 1=1 --not really sure why this is here. This will always be true. Omit this statement to avoid a bad result set.
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999) --this is saying, if the StatusCode is not equal to 1 and not equal to 999, don't include it.
--Revised: Look for Status codes with 1 or 999
--bvc_Order.StatusCode = 1 OR bvc_Order.StatusCode = 999
AND (bvc_OrderItem.ProductID IN ('28046_00') --I would eliminate this unless you are looking to see if this exists in Product ID. You could also accomplish this if you are trying to see if this value is in both tables, change this to:
bvc_OrderItem.ProductID = '28046_00' AND bvc_OrderItem_BundleItem.ProductID = '28046_00')
--if you are trying to see if the order source is 56, use this.
AND bvc_Order.OrderSource = 56;
If you are trying to find out rows that are not included in this result set, then I would use OUTER JOIN as necessary (LEFT preferred). Without more information about what you're looking for in your database, that's the best all of us can do.
bLike #usr writ, the reason of this unexpected (for you) result is, you build query with outer joins, and filter rows after join. If you need filter rows of outer joined tables, you should do this before join.
but probably you try build this:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
WHERE 1=1
AND o.OrderSource = 56;
AND o.StatusCode not in (1, 999)
AND '28046_00' in (oi.ProductID, isnull(bi.ProductID,'_') )
Is this query give results what you need?
if not, try change last condition, for example:
and (bi.ProductID = '28046_00' or bi.ProductID is null and oi.ProductID = '28046_00')
you can also put additional condition in to join conditions, for example:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
and bi.ProductID in ('28046_00') --this join BundleItem only if ...
WHERE 1=1
AND o.OrderSource = 56;
AND o.StatusCode not in (1, 999)
AND (oi.ProductID in ('28046_00') or bi.ProductID is not null)
ah, and if you always need join bvc_Order with bvc_OrderItem then use inner join
This question already has answers here:
What is the difference between "INNER JOIN" and "OUTER JOIN"?
(28 answers)
Closed 8 years ago.
I have about 6 months novice experience in SQL, TSQL, SSIS, ETL. As I find myself using JOIN statements more and more in my intern project I have been experimenting with the different JOIN statements. I wanted to confirm my findings. Are the following statements accurate pertaining to the conclusion of JOIN statements in SQL Server?:
1)I did a LEFT OUTER JOIN query and did the same query using JOIN which yielded the same results; are all JOIN statements LEFT OUTER associated in SQL Server?
2)I did a LEFT OUTER JOIN WHERE 2nd table PK (joined to) IS NOT NULL and did the same query using an INNER JOIN which yielded the same results; is it safe to say the the INNER JOIN statement will yield only matched records? and is the same as LEFT OUTER JOIN where joined records IS NOT NULL ?
The reason I'm asking is because I have been only using LEFT OUTER JOINS because that is what I was comfortable with. However, I want to eliminate as much code as possible when writing queries to be more efficient. I just wanted to make sure my observations are correct.
Also, are there any tips that you could provide on easily figuring out which JOIN statement is appropriate for specific queries? For instance, what JOIN would you use if you wanted to yield non-matching records?
Thanks.
A join or inner join (same thing) between table A and table B on, for instance, field1, would narrow in on all rows of table A and B sharing the same field1 value.
A left outer join between A and B, on field1, would show all rows of table A, and only those rows of table B that have a field1 existing in table A.
Where the rows of field1 on table A have a field1 value that doesn't exist in table B, the table B value would show null for field1, but the row of table A would be retained because it is an outer join. These are rows that wouldn't show up in a join which is an implied inner join.
If you get the same results doing a join between table A and table B as you do a left outer join between table A and B, then whatever fields you're joining on have values that exist in both tables. No value for any of the joined fields in A or B exist exclusively in A or B, they all exist in both A and B.
It is also possible you're putting criteria into the where clause that belongs in the on clause of the outer join, which may be causing your confusion. In my example above of tables A and B, where A is being left outer joined with B, you would put any criteria related to table B in the on clause, not the where clause, otherwise you would essentially be turning the outer join into an inner join. For example if you had b.field4 = 12 in the WHERE clause, and table B didn't have a match with A, it would be null and that criteria would fail, and it'd no longer come back even though you used a left outer join. That may be what you are referring to.
JOIN's are mapped to 'INNER JOIN' by default
I'm trying to write a bit of SQL for SQLITE that will take a subset from two tables (TableA and TableB) and then perform a LEFT JOIN.
This is what I've tried, but this produces the wrong result:
Select * from TableA
Left Join TableB using(key)
where TableA.key2 = "xxxx"
AND TableB.key3 = "yyyy"
This ignore cases where key2="xxxx" but key3 != "yyyy".
I want all the rows from TableA that match my criteria whether or not their corresponding value in TableB matches, but only those rows from TableB that match both conditions.
I did manage to solve this by using a VIEW, but I'm sure there must be a better way of doing this. It's just beginning to drive me insane tryng to solve it now.
(Thanks for any help, hope I've explained this well enough).
YOu have made the classic left join mistake. In most databases if you want a condition on the table on the right side of the left join you must put this condition in the join itself and NOT the where clause. IN SQL Server this would turn the left join into an inner join. I've not used SQl lite so I don't know if it does the same but all records must meet the where clause generally.
Select *
from TableA
Left Join TableB on TableA.key = TableB.key
and TableB.key3 = "yyyy"
where TableA.key2 = "xxxx"