SQL left join explanation

I found this code example online regarding SQL left joins and I want to make sure I understand it correctly (since I am no expert).
SELECT table1.column1, table2.column2...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ... AND table2.common_field IS NULL
My question is about the AND table2.common_field IS NULL part and how it affects the ON above.
It seems to me that the join result will contain only those rows that exist in table1 but not in table2, based on the common_field.
Is that correct? Can it be written more simply, since the above seems confusing to me?

The first step in any SQL development is to check the data that is actually stored in the tables you intend to use in your query.
How the data is stored will affect the results of the query, particularly when filtering for NULLs or checking for the existence of a row.
Using EXISTS or NOT EXISTS to check for existence/non-existence of one or more rows is very effective, providing the WHERE clause within the EXISTS sub-query doesn't have conflicting logic (e.g. NOT EXISTS and <> are used together), which can be confusing and produce results that are difficult to test.
Does table2.common_field contain any NULLs? If it does, it would be wise to filter on those in a nested query, CTE or view first, then use the results of that in the main query.
If table2.common_field doesn't contain NULLs or has a NOT NULL constraint, then perhaps you are using table2.common_field IS NULL to filter on the results of the LEFT JOIN, where there is no match on the join criteria for table2. If this is the case and you want to stick with using LEFT JOIN, I recommend nesting your query and filtering on the NULL in the outer query.
Here are a couple of options:
Option 1: Use LEFT JOIN, filter on NULL in the outer query.
Note the careful use of an alias for table2.common_field which is important.
SELECT
result.*
FROM
(
SELECT table1.column1, table2.column2, table2.common_field as table2_common_field...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ...
) result
WHERE result.table2_common_field IS NULL;
Option 2 (recommended): Use NOT EXISTS.
SELECT table1.column1...
FROM table1
WHERE NOT EXISTS (
select 1
from table2
where table2.common_field = table1.common_field
AND table2.common_field_2 = table1.common_field_2
)
AND table1.column3 = ...
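To check that the two options select the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the column3 filter is left out because its value is elided in the question. Note that row 'c' matches table2 on common_field but not on common_field_2, so it still counts as "not in table2": the match is decided by the whole ON condition, not by common_field alone.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, common_field INTEGER, common_field_2 INTEGER);
    CREATE TABLE table2 (column2 TEXT, common_field INTEGER, common_field_2 INTEGER);
    INSERT INTO table1 VALUES ('a', 1, 10), ('b', 2, 20), ('c', 3, 30);
    INSERT INTO table2 VALUES ('x', 1, 10), ('y', 3, 99);
""")

# Anti-join written as LEFT JOIN ... IS NULL.
left_join_rows = conn.execute("""
    SELECT table1.column1
    FROM table1
    LEFT JOIN table2
      ON table1.common_field = table2.common_field
     AND table1.common_field_2 = table2.common_field_2
    WHERE table2.common_field IS NULL
    ORDER BY table1.column1
""").fetchall()

# The same anti-join written with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT table1.column1
    FROM table1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table2
        WHERE table2.common_field = table1.common_field
          AND table2.common_field_2 = table1.common_field_2
    )
    ORDER BY table1.column1
""").fetchall()

print(left_join_rows)   # [('b',), ('c',)]
print(not_exists_rows)  # [('b',), ('c',)]
```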

Related

Joins with WHERE - splitting WHERE clauses

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?
Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember though that, for properly indexed tables, a query written with standard JOINs is almost always at least as fast as the same query written with sub-queries. On some engines the JOIN version can be much faster, except in rare cases.
You can use subqueries:
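The "cross product filtered by ON" mental model can be checked with a small sketch. It uses Python's built-in sqlite3 module and a simplified, hypothetical two-table schema (characters and shows, linked directly instead of through the mapping table from the question). It only compares results; it says nothing about how an engine actually executes the join.

```python
import sqlite3

# Hypothetical, simplified schema: each show row points directly at a character.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characters (id INTEGER, name TEXT);
    CREATE TABLE shows (id INTEGER, character_id INTEGER, name TEXT);
    INSERT INTO characters VALUES (1, 'Willow Rosenberg'), (2, 'Barney Stinson');
    INSERT INTO shows VALUES (10, 1, 'Buffy the Vampire Slayer'),
                             (20, 2, 'How I Met Your Mother');
""")

# Ordinary inner join with the condition in ON.
joined = conn.execute("""
    SELECT c.name, s.name
    FROM characters c
    INNER JOIN shows s ON s.character_id = c.id
    ORDER BY c.id
""").fetchall()

# Cross product of both tables, filtered afterwards with the same condition.
crossed = conn.execute("""
    SELECT c.name, s.name
    FROM characters c
    CROSS JOIN shows s
    WHERE s.character_id = c.id
    ORDER BY c.id
""").fetchall()

print(joined == crossed)  # True
```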
SELECT ch.name,sh.name
FROM (
SELECT ch.id, ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
but I don't think it makes sense: the subquery may be materialized as a temporary table.
The first query will be optimized by the database server, which will likely read only the rows it needs from the character table.
JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.
A SELECT can only have one WHERE clause.
And it comes after the JOINs.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criterion that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example, the queries below would return the same results:
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar';
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar';
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo';
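A quick way to convince yourself that the three forms agree is to run them against the same data. The sketch below does that with Python's built-in sqlite3 module; the sample rows are invented, and only the ID columns are selected to keep the output short.

```python
import sqlite3

# Hypothetical sample rows for Table1 and Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, table2ID INTEGER, Col1 TEXT);
    CREATE TABLE Table2 (ID INTEGER, Col1 TEXT);
    INSERT INTO Table1 VALUES (1, 10, 'foo'), (2, 20, 'foo'), (3, 10, 'baz');
    INSERT INTO Table2 VALUES (10, 'bar'), (20, 'qux');
""")

queries = {
    # Both conditions in the WHERE clause.
    "where": """
        SELECT t1.ID, t2.ID
        FROM Table1 AS t1
        JOIN Table2 AS t2 ON t2.ID = t1.table2ID
        WHERE t1.Col1 = 'foo' AND t2.Col1 = 'bar'
    """,
    # Table1 filtered in a sub-query before the join.
    "subquery": """
        SELECT t1.ID, t2.ID
        FROM (SELECT * FROM Table1 WHERE Col1 = 'foo') AS t1
        JOIN Table2 AS t2 ON t2.ID = t1.table2ID
        WHERE t2.Col1 = 'bar'
    """,
    # Table2 condition moved into the ON clause.
    "on": """
        SELECT t1.ID, t2.ID
        FROM Table1 AS t1
        JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
        WHERE t1.Col1 = 'foo'
    """,
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each variant prints [(1, 10)]
```

This equivalence holds for inner joins; with an outer join, moving a condition between ON and WHERE can change the result.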

SQL help, left join with filter

Please have a look at the SQL in the SQLFiddle link below.
http://sqlfiddle.com/#!9/22e094/4
My goal is to get all the records from Table1, and if SecId exists in Table2, join only if the status is 'Y'.
Result should be that it pulls from table 1: ID 1 and 2. And for ID 1, it successfully left joins Table2 and pulls 'Y'
As you can see in the fiddle, I tried 3 different ways but can't seem to get it.
It's got me stumped... Help would be awesome! :)
The left join, with just the join condition in the ON clause, is fine.
But then, you said you want to keep the rows where status is 'Y' only when the secid exists in table2. So this means you also want to keep the rows where the secid from table2 is NULL.
select * won't fly, because you have columns by the same name, secid, in both tables. You must distinguish them (by giving them aliases - or at least one of them; I gave aliases to both of them) if you need to reference them in the where clause, or anywhere else in the query, to break the ambiguity. And, since column aliases can only be given in the SELECT clause, which is evaluated after the WHERE clause, you need to do the left join in a subquery if you want to filter on those aliases. (Referring to the columns by their qualified names, a.secid and b.secid, directly in the WHERE clause would also work.)
select id, secid_a, secid_b, status
from (
select a.id, a.secid as secid_a, b.secid as secid_b, b.status
from table1 a left join table2 b
on a.secid = b.secid
) t
where status = 'Y' or secid_b is null;
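Here is a minimal sketch of that query run through Python's built-in sqlite3 module. The rows are invented (the fiddle's data is not reproduced here): ID 1 has a 'Y' row in table2, ID 2 has no row in table2, and ID 3 has only an 'N' row. The derived table is given the alias t, which some engines (MySQL, for example) require.

```python
import sqlite3

# Hypothetical data standing in for the fiddle's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, secid INTEGER);
    CREATE TABLE table2 (secid INTEGER, status TEXT);
    INSERT INTO table1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO table2 VALUES (100, 'Y'), (300, 'N');
""")

# Left join in a sub-query, then keep matched 'Y' rows and unmatched rows.
rows = conn.execute("""
    SELECT id, secid_a, secid_b, status
    FROM (
        SELECT a.id, a.secid AS secid_a, b.secid AS secid_b, b.status
        FROM table1 a LEFT JOIN table2 b ON a.secid = b.secid
    ) t
    WHERE status = 'Y' OR secid_b IS NULL
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 100, 100, 'Y'), (2, 200, None, None)]
```

ID 3 is dropped because it does have a row in table2, just not one with status 'Y'.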

Is there a logical difference between putting a condition in the ON clause of an inner join versus the where clause of the main query?

Consider these two similar SQLs
(condition in ON clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id and t1.boolfield = 1
(condition in WHERE clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id
where t1.boolfield = 1
I have tested this out a bit and I can see the difference between putting a condition in the two different places for an outer join.
But in the case of an inner join can the result sets ever be different?
For INNER JOIN, there is no effective difference, although I think the second option is cleaner.
For LEFT JOIN, there is a huge difference. The ON clause specifies which records will be selected from the tables for comparison and the WHERE clause filters the results.
Example 1: returns all the rows from tbl1 and matches them up with the appropriate rows from tbl2 that have boolfield=1.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id and tbl2.boolfield=1
Example 2: will only include rows from tbl1 that have a matching row in tbl2 with boolfield=1. It joins the tables, and then filters out the rows that don't meet the condition.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id
WHERE tbl2.boolfield=1
In your specific case, the t1.boolfield specifies an additional selection condition, not a condition for matching records between the two tables, so the second example is more correct.
If you're speaking about the cases when a condition for matching records is put in the ON clause vs. in the WHERE clause, see this question.
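The difference between Example 1 and Example 2 above is easy to see on a tiny data set. This is a sketch using Python's built-in sqlite3 module with made-up rows: id 1 has a tbl2 row with boolfield=1, id 2 has a tbl2 row with boolfield=0, and id 3 has no tbl2 row at all.

```python
import sqlite3

# Hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER);
    CREATE TABLE tbl2 (id INTEGER, boolfield INTEGER);
    INSERT INTO tbl1 VALUES (1), (2), (3);
    INSERT INTO tbl2 VALUES (1, 1), (2, 0);
""")

# Example 1: condition in ON. Every tbl1 row survives; non-matching rows get NULLs.
on_rows = conn.execute("""
    SELECT tbl1.id, tbl2.id, tbl2.boolfield
    FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id AND tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

# Example 2: condition in WHERE. Rows whose tbl2 side is NULL or 0 are filtered out.
where_rows = conn.execute("""
    SELECT tbl1.id, tbl2.id, tbl2.boolfield
    FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id
    WHERE tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

print(on_rows)     # [(1, 1, 1), (2, None, None), (3, None, None)]
print(where_rows)  # [(1, 1, 1)]
```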
Both versions return the same data.
Although this is true for an inner join, it is not true for outer joins.
Stylistically, there is a third possibility. In addition to your two, there is also:
select t1.field1, t2.field1
from (select t1.*
from table1 t1
where t1.boolfield = 1
) t1 inner join
table2 t2
on t1.id = t2.id
Which is preferable all depends on what you want to highlight, so you (or someone else) can later understand and modify the query. I often prefer the third version, because it emphasizes that the query is only using certain rows from the table -- the boolean condition is very close to where the table is specified.
In the other two cases, if you have a long query, it can be problematic to figure out what "t1" really means. I think this is why some people prefer to put the condition in the ON clause. Others prefer the WHERE clause.

Left Outer join and an additional where clause

I have a join on two tables defined as a left outer join so that all records are returned from the left hand table even if they don't have a record in the right hand table. However I also need to include a where clause on a field from the right-hand table, but.... I still want a row from the left-hand table to be returned for each record in the left-hand table even if the condition in the where clause isn't met. Is there a way of doing this?
Yes, put the condition (called a predicate) in the join conditions:
Select [stuff]
From TableA a
Left Join TableB b
On b.Pk = a.Pk
-- [Put your condition here, like this]
And b.Column = somevalue
The reason this works is that the query processor applies conditions in a WHERE clause after all joins are completed and the joined result set has been constructed. So, at that point, a row from table a whose outer-joined columns from b are NULL (because it had no match) fails any predicate you have placed on such a column and is excluded.
Predicates in a join clause are applied before the two result sets are "joined". At this point all the rows on both sides of the join are still there, so the predicate is effective.
You just need to put the predicate into the JOIN condition. Putting it into the WHERE clause would effectively convert your query to an inner join.
For Example:
...
From a
Left Join b on a.id = b.id and b.condition = 'x'
You can use
WHERE (right_table.column=value OR right_table.column IS NULL)
This will return all rows from table 1 and table 2, but only where table 1 does not have a corresponding row in table 2 or the corresponding row in table 2 matches your criteria.
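The two suggestions above (predicate in the ON clause versus an OR ... IS NULL test in the WHERE clause) do not behave identically. The sketch below, using Python's built-in sqlite3 module, shows where they differ. The column name Col1 and the sample rows are assumptions for illustration: row 2 has a match in TableB that fails the predicate.

```python
import sqlite3

# Hypothetical sample rows; Col1 stands in for the filtered right-hand column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Pk INTEGER);
    CREATE TABLE TableB (Pk INTEGER, Col1 TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 'x'), (2, 'z');
""")

# Predicate in the join condition: every TableA row is returned.
on_rows = conn.execute("""
    SELECT a.Pk, b.Col1
    FROM TableA a
    LEFT JOIN TableB b ON b.Pk = a.Pk AND b.Col1 = 'x'
    ORDER BY a.Pk
""").fetchall()

# Predicate in WHERE with an IS NULL escape: row 2 is dropped, because it
# matched a TableB row whose Col1 is neither 'x' nor NULL.
or_null_rows = conn.execute("""
    SELECT a.Pk, b.Col1
    FROM TableA a
    LEFT JOIN TableB b ON b.Pk = a.Pk
    WHERE (b.Col1 = 'x' OR b.Col1 IS NULL)
    ORDER BY a.Pk
""").fetchall()

print(on_rows)       # [(1, 'x'), (2, None), (3, None)]
print(or_null_rows)  # [(1, 'x'), (3, None)]
```

If every left-hand row must come back, as the question asks, only the ON-clause version does that.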
SELECT x.fieldA, y.fieldB
FROM x
LEFT OUTER JOIN (select fieldb, fieldc from Y where condition = some_condition) y
ON x.fieldc = y.fieldc
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where t1.some_field = nvl(t2.some_field, t1.some_field)
UPD: errr... no. this way:
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where some_required_value = nvl(t2.some_field, some_required_value)
nvl is an Oracle function that returns its second argument when the first is null (which is common for outer joins). You can use ifnull or coalesce in other databases.
Thus, you compare t2.some_field with your search criterion if the row has met the join predicate; if it has not, you just return the row from table1, because some_required_value compared to itself will always be true (unless it is null, since null = null yields null, which is neither true nor false).
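For databases without nvl, the same idea can be sketched with coalesce. The example below uses Python's built-in sqlite3 module; the sample rows and the search value 'x' are made up, and :required is a bound parameter standing in for some_required_value.

```python
import sqlite3

# Hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, some_field TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'z');
""")

# coalesce(t2.some_field, :required) falls back to the search value itself
# when the outer join produced no table2 row, so the comparison is true.
rows = conn.execute("""
    SELECT t1.id, t2.some_field
    FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
    WHERE :required = coalesce(t2.some_field, :required)
    ORDER BY t1.id
""", {"required": "x"}).fetchall()

print(rows)  # [(1, 'x'), (3, None)]
```

Row 2 is filtered out here because it has a matching table2 row with a different value; rows with no match at all are kept.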

Getting distinct rows from a left outer join

I am building an application which dynamically generates sql to search for rows of a particular Table (this is the main domain class, like an Employee).
There are three tables Table1, Table2 and Table1Table2Map.
Table1 has a many to many relationship with Table2, and is mapped through Table1Table2Map table. But since Table1 is my main table the relationship is virtually like a one to many.
My app generates SQL which basically gives a result set containing rows from all these tables. The select clause and joins don't change, whereas the where clause is generated based on user interaction. In any case I don't want duplicate rows of Table1 in my result set, as it is the main table for result display. Right now the query that is getting generated is like this:
select distinct Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
For simplicity I have excluded the where clause. The problem is when there are multiple rows in Table2 for Table1 even though I have said distinct of Table1.Id the result set has duplicate rows of Table1 as it has to select all the matching rows in Table2.
To elaborate more, consider that for a row in Table1 with Id = 1 there are two rows in Table1Table2Map (1, 1) and (1, 2) mapping Table1 to two rows in Table2 with ids 1, 2. The above mentioned query returns duplicate rows for this case. Now I want the query to return Table1 row with Id 1 only once. This is because there is only one row in Table2 that is like an active value for the corresponding entry in Table1 (this information is in Mapping table).
Is there a way I can avoid getting duplicate rows of Table1?
I think there is some basic problem in the way I am trying to solve the problem, but I am not able to find out what it is. Thanks in advance.
Try:
left outer join (select distinct YOUR_COLUMNS_HERE ...) SUBQUERY_ALIAS on ...
In other words, don't join directly against the table, join against a sub-query that limits the rows you join against.
You can use GROUP BY on Table1.Id, and that will get rid of the extra rows. You wouldn't need to worry about any mechanics on the join side.
I came up with this solution in a huge query, and it didn't affect the query time much.
NOTE: I'm answering this question 3 years after it was asked, but I believe this may help someone.
You can re-write your left joins to be outer applies, so that you can use a top 1 and an order by as follows:
select Table1.Id as Id, Table1.Name, t2.Description
from Table1
outer apply (
select top 1 *
from Table1Table2Map
where (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
order by somethingCol
) t1t2
outer apply (
select top 1 *
from Table2
where (Table2.Id = t1t2.Table2Id)
) t2;
Note that an outer apply without a "top" or an "order by" is exactly equivalent to a left outer join, it just gives you a little more control. (cross apply is equivalent to an inner join).
You can also do something similar using the row_number() function:
select * from (
select distinct Table1.Id as Id, Table1.Name, Table2.Description,
rowNum = row_number() over ( partition by table1.id order by something )
from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
) x
where rowNum = 1;
Most of this doesn't apply if the IsActive flag can narrow down your other tables to one row, but they might come in useful for you.
To elaborate on one point: you said that there is only one "active" row in Table2 per row in Table1. Is that row not marked as active such that you could put it in the where clause? Or is there some magic in the dynamic conditions supplied by the user that determines what's active and what isn't.
If you don't need to select anything from Table2, the solution is relatively simple in that you can use EXISTS, but since you've put Table2.Description in the select clause I'll assume that's not the case.
Basically what separates the relevant rows in Table2 from the irrelevant ones? Is it an active flag or a dynamic condition? The first row? That's really how you should be removing duplicates.
DISTINCT clauses tend to be overused. That may not be the case here but it sounds like it's possible that you're trying to hack out the results you want with DISTINCT rather than solving the real problem, which is a fairly common problem.
You have to include the activity condition in your join (and there is no need for distinct):
select Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
If you want to display multiple rows from Table2, you will have duplicate data from Table1 displayed. If you wanted to, you could use an aggregate function (e.g. MAX, MIN) on Table2; this would eliminate the duplicate rows from Table1, but would also hide some of the data from Table2.
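To make the trade-offs concrete, here is a sketch using Python's built-in sqlite3 module. The data is invented: Alice maps to two Table2 rows, only one of which is active, and Bob has no mapping at all. It compares the plain left joins, the IsActive condition in the join, and an aggregate with GROUP BY.

```python
import sqlite3

# Hypothetical sample rows; IsActive lives on the mapping table as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER, Description TEXT);
    CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
    INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
    INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
""")

# Plain left joins: Alice appears once per mapped Table2 row.
plain = conn.execute("""
    SELECT Table1.Id, Table1.Name, Table2.Description
    FROM Table1
    LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
    LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
    ORDER BY Table1.Id, Table2.Id
""").fetchall()

# IsActive in the join condition: one row per Table1 row, with the active description.
active = conn.execute("""
    SELECT Table1.Id, Table1.Name, Table2.Description
    FROM Table1
    LEFT OUTER JOIN Table1Table2Map
      ON Table1Table2Map.Table1Id = Table1.Id AND Table1Table2Map.IsActive = 1
    LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
    ORDER BY Table1.Id
""").fetchall()

# Aggregate with GROUP BY: also one row each, but MAX picks 'old' (it sorts
# after 'current'), which is not the active description.
aggregated = conn.execute("""
    SELECT Table1.Id, Table1.Name, MAX(Table2.Description)
    FROM Table1
    LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
    LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
    GROUP BY Table1.Id, Table1.Name
    ORDER BY Table1.Id
""").fetchall()

print(plain)       # [(1, 'Alice', 'old'), (1, 'Alice', 'current'), (2, 'Bob', None)]
print(active)      # [(1, 'Alice', 'current'), (2, 'Bob', None)]
print(aggregated)  # [(1, 'Alice', 'old'), (2, 'Bob', None)]
```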
See also my answer on question #70161 for additional explanation