AND Statements in JOIN vs WHERE - sql

I am trying to understand why the following happens (example code below).
-- Returns 1,000
SELECT COUNT(*)
FROM TABLE_ONE t1
WHERE t1.FIELD_ONE = 'Hello';
-- Returns 1,000
SELECT COUNT(*)
FROM TABLE_ONE t1
LEFT OUTER JOIN TABLE_TWO t2 ON t2.TABLE_ONE_ID = t1.ID
WHERE t1.FIELD_ONE = 'Hello'
AND t2.FIELD_TWO = 'Goodbye';
-- Returns 83,500
SELECT COUNT(*)
FROM TABLE_ONE t1
LEFT OUTER JOIN TABLE_TWO t2 ON (t2.TABLE_ONE_ID = t1.ID AND t2.FIELD_TWO = 'Goodbye')
WHERE t1.FIELD_ONE = 'Hello';
I understand that left outer joins will always include the left value even if a right value is not found, etc. However, I thought that adding additional conditions to JOIN clauses would restrict what is done in the JOIN.
So for example if I had entry A from TABLE_ONE, then it would look for a TABLE_TWO value that met the 2 conditions. If one wasn't found, it would just be blank. But that entry A would only appear once in the result set. So I am confused why I am getting so many more results then what is actually in the original TABLE_ONE query.
UPDATE
To explain more what I am trying to understand. The resulting query (not using the COUNT) results in something like this.
Table_One_Name Table_Two_Value
+++++++++++++++++++++++++++++++++++
Entry A Goodbye
Entry A
Entry A
Entry A
Entry B Goodbye
Entry B
I don't understand why the JOIN is adding rows when the 2 conditions are not met.

Where clause
filters the records, first case has an additional condition AND t2.FIELD_TWO = 'Goodbye' which filters more records(about 73,500 records)
-- Returns 1,000
SELECT COUNT(*)
FROM TABLE_ONE t1
LEFT OUTER JOIN TABLE_TWO t2 ON t2.TABLE_ONE_ID = t1.ID
WHERE t1.FIELD_ONE = 'Hello'
AND t2.FIELD_TWO = 'Goodbye';
Left outer join brings all the records even it does not have a match
In your second case AND t2.FIELD_TWO = 'Goodbye this is part of the left outer join
so it brings matching and non matching records

The WHERE condition is applied after the JOIN condition, and hence removes all the records containing NULL in t2.FIELD_TWO, thus effectively denying the purpose of the "OUTER" part of the JOIN.

Related

Joins with WHERE - splitting WHERE clauses

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?
Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember though that a join is almost always faster on standard JOINs compared to sub-queries, for properly indexed tables. You'll find that queries using JOIN will be orders of magnitude faster than the ones using sub-queries, except for rare cases.
You can using subqueries
SELECT ch.name,sh.name
FROM (
SELECT ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
but i think it don't have sense. subqueries will make temp table.
First query will be optimized by database server, and likely select only rows from character table that need
JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.
A SELECT can only have 1 WHERE clause.
And it comes after the JOIN's.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criteria that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example the queries below would return the same results
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar'
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar'
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo'

SQL Different between Left join on... and Left Join on..where

I have two sql to join two table together:
select top 100 a.XXX
,a.YYY
,a.ZZZ
,b.GGG
,b.JJJ
from table_01 a
left join table_02 b
on a.XXX = b.GGG
and b.JJJ = "abc"
and a.YYY between '01/08/2009 13:18:00' and '12/08/2009 13:18:00'
select top 100 a.XXX
,a.YYY
,a.ZZZ
,b.GGG
,b.JJJ
from table_01 a
left join table_02 b
on a.XXX = b.GGG
where b.JJJ = "abc"
and a.YYY between '01/08/2009 13:18:00' and '12/08/2009 13:18:00'
The outcome of them is different but I don't understand the reason why.
I would be grateful if I can get some help here.
Whenever you are using LEFT JOIN, all the conditions about the content of the right table should be in the ON clause, otherwise you are effectively converting your LEFT JOIN to an INNER JOIN.
The reason for that is that when a LEFT JOIN is used, all the rows from the left table will be returned. If they are matched by the right table, the values of the matching row(s) will be returned as well, but if they are not matched with any row on the right table, then the right table will return a row of null values.
Since you can't compare anything to NULL (not even another NULL) (Read this answer to find out why), you are basically telling your database to return all rows that are matched in both tables.
However, when the condition is in the ON clause, Your database knows to treat it as a part of the join condition.

Is there a logical difference between putting a condition in the ON clause of an inner join versus the where clause of the main query?

Consider these two similar SQLs
(condition in ON clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id and t1.boolfield = 1
(condition in WHERE clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id
where t1.boolfield = 1
I have tested this out a bit and I can see the difference between putting a condition in the two different places for an outer join.
But in the case of an inner join can the result sets ever be different?
For INNER JOIN, there is no effective difference, although I think the second option is cleaner.
For LEFT JOIN, there is a huge difference. The ON clause specifies which records will be selected from the tables for comparison and the WHERE clause filters the results.
Example 1: returns all the rows from tbl 1 and matches them up with appropriate rows from tbl2 that have boolfield=1
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id and tbl2.boolfield=1
Example 2: will only include rows from tbl1 that have a matching row in tbl2 with boolfield=1. It joins the tables, and then filters out the rows that don't meet the condition.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id
WHERE tbl2.boolfield=1
In your specific case, the t1.boolfield specifies an additional selection condition, not a condition for matching records between the two tables, so the second example is more correct.
If you're speaking about the cases when a condition for matching records is put in the ON clause vs. in the WHERE clause, see this question.
Both versions return the same data.
Although this is true for an inner join, it is not true for outer joins.
Stylistically, there is a third possibility. In addition to your two, there is also:
select t1.field1, t2.field1
from (select t1.*
from table1 t1
where t1.boolfield = 1
) t1 inner join
table2 t2
on t1.id = t2.id
Which is preferable all depends on what you want to highlight, so you (or someone else) can later understand and modify the query. I often prefer the third version, because it emphasizes that the query is only using certain rows from the table -- the boolean condition is very close to where the table is specified.
In the other two cases, if you have a long query, it can be problematic to figure out what "t1" really means. I think this is why some people prefer to put the condition in the ON clause. Others prefer the WHERE clause.

Left Outer join and an additional where clause

I have a join on two tables defined as a left outer join so that all records are returned from the left hand table even if they don't have a record in the right hand table. However I also need to include a where clause on a field from the right-hand table, but.... I still want a row from the left-hand table to be returned for each record in the left-hand table even if the condition in the where clause isn't met. Is there a way of doing this?
Yes, put the condition (called a predicate) in the join conditions
Select [stuff]
From TableA a
Left Join TableB b
On b.Pk = a.Pk
-- [Put your condition here, like this]
And b.Column = somevalue
The reason this works is because the query processor applies conditions in a where clause after all joins are completed, and the final result set has been constructed. So, at that point, a column from the a table on the outer side of a join that has null in a a column you have established a predicate on will be excluded.
Predicates in a join clause are applied before the two result sets are "joined". At this point all the rows on both sides of the join are still there, so the predicate is effective.
You just need to put the predicate into the JOIN condition. Putting it into the WHERE clause would effectively convert your query to an inner join.
For Example:
...
From a
Left Join b on a.id = b.id and b.condition = 'x'
You can use
WHERE (right_table.column=value OR right_table.column IS NULL)
This will return all rows from table 1 and table 2, but only where table 1 does not have a corresponding row in table 2 or the corresponding row in table 2 matches your criteria.
SELECT x.fieldA, y.fieldB
FROM x
LEFT OUTER JOIN (select fieldb, fieldc from Y where condition = some_condition)
ON x.fieldc = y.fieldc
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where t1.some_field = nvl(t2.some_field, t1.some_field)
UPD: errr... no. this way:
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where some_required_value = nvl(t2.some_field, some_required_value)
nvl is an Oracle syntax which replaces first argument with second in case it is null (which is common for outer joins). You can use ifnull or coalesce for other databases.
Thus, you compare t2.some_field with your search criteria if it has met join predicate, but if it has not, then you just return row from table1, because some_required_value compared to itself will always be true (unless it is null, however - null = null yields null, neither true not false.

Outer Join with Where returning Nulls

Hi I have 2 tables. I want to list
all records in table1 which are present in
table2
all records in table2 which are not present in table1 with a where condition
Null rows will be returned by table1 in second condition but I am unable to get the query working correctly. It is only returning null rows
SELECT
A.CLMSRNO,A.CLMPLANO,A.GENCURRCODE,A.CLMNETLOSSAMT,
A.CLMLOSSAMT,A.CLMCLAIMPRCLLOSSSHARE
FROM
PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN (
SELECT
B.CLMSRNO,B.UWADVICETYPE,B.UWADVICENO,B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS,B.UWADVICENET,B.UWADVICEKIND,B.UWADVYEAR,
B.UWADVQTR,B.ISMANUAL,B.UWCLMNOREFNO
FROM
PAKRE.UWADVICE B
WHERE
B.ISMANUAL=1
) r
ON a.CLMSRNO=r.CLMSRNO
ORDER BY
A.CLMSRNO DESC;
Which OS are you using ?
Table aliases are case sensistive on some platforms, which is why your join condition ON a.CLMSRNO=r.CLMSRNO fails.
Try with A.CLMSRNO=r.CLMSRNO and see if that works
I'm not understanding your first attempt, but here's basically what you need, I think:
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON joincondition
UNION ALL
SELECT *
FROM TABLE2
LEFT JOIN TABLE1
ON joincondition
AND TABLE1.wherecondition
WHERE TABLE1.somejoincolumn IS NULL
I think you may want to remove the subquery and put its columns into the main query e.g.
SELECT A.CLMSRNO, A.CLMPLANO, A.GENCURRCODE, A.CLMNETLOSSAMT,
A.CLMLOSSAMT, A.CLMCLAIMPRCLLOSSSHARE,
B.CLMSRNO, B.UWADVICETYPE, B.UWADVICENO, B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS, B.UWADVICENET, B.UWADVICEKIND, B.UWADVYEAR,
B.UWADVQTR, B.ISMANUAL, B.UWCLMNOREFNO
FROM PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN PAKRE.UWADVICE B
ON A.CLMSRNO = B.CLMSRNO
WHERE B.ISMANUAL = 1
ORDER
BY A.CLMSRNO DESC;