SQL filter LEFT TABLE before left join

I have read a number of posts from SO and I understand the differences between filtering in the where clause and on clause. But most of those examples are filtering on the RIGHT table (when using left join). If I have a query such as below:
select * from tableA A left join tableB B on A.ID = B.ID and A.ID = 20
The return values are not what I expected. I would have thought it first filters the left table, fetching only rows with ID = 20, and then does a left join with tableB.
Of course, this should be technically the same as doing:
select * from tableA A left join tableB B on A.ID = B.ID where A.ID = 20
But I thought the performance would be better if you could filter the table before doing a join. Can someone enlighten me on how this SQL is processed and help me understand this thoroughly?

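A left join follows a simple rule. It keeps all the rows in the first table. The values of the second table's columns depend on the on clause. If the on condition is not true for a row of the first table, that row is still returned, with the second table's columns set to NULL. This holds whether the condition refers to the first or the second table.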
So, for this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID and A.ID = 20;
All the rows in A are in the result set, regardless of whether or not there is a match. When the id is not 20, then the rows and columns are still taken from A. However, the condition is false so the columns in B are NULL. This is a simple rule. It does not depend on whether the conditions are on the first table or the second table.
For this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID
where A.ID = 20;
The from clause keeps all the rows in A. But then the where clause has its effect: it filters the rows so that only rows with ID 20 are in the result set.
When using a left join:
Filter conditions on the first table go in the where clause.
Filter conditions on subsequent tables go in the on clause.
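The difference between the two queries above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database; the sample rows and the name/detail columns are made up for illustration and are not from the question.

```python
import sqlite3

# Hypothetical sample data: tableA has IDs 10, 20, 30; tableB has IDs 10 and 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER, name TEXT);
    CREATE TABLE tableB (ID INTEGER, detail TEXT);
    INSERT INTO tableA VALUES (10, 'a10'), (20, 'a20'), (30, 'a30');
    INSERT INTO tableB VALUES (10, 'b10'), (20, 'b20');
""")

# Left-table condition in the ON clause: every row of A is kept,
# but B only matches where the whole ON condition is true.
on_rows = conn.execute("""
    SELECT A.ID, B.detail
    FROM tableA A LEFT JOIN tableB B
      ON A.ID = B.ID AND A.ID = 20
    ORDER BY A.ID
""").fetchall()

# Same condition in the WHERE clause: rows of A are filtered after the join.
where_rows = conn.execute("""
    SELECT A.ID, B.detail
    FROM tableA A LEFT JOIN tableB B
      ON A.ID = B.ID
    WHERE A.ID = 20
    ORDER BY A.ID
""").fetchall()

print(on_rows)     # [(10, None), (20, 'b20'), (30, None)]
print(where_rows)  # [(20, 'b20')]
```

Note that ID 10 has a matching row in tableB, yet its B columns come back as NULL in the first query because the ON condition as a whole is false for that row.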

Where you have from tablea, you could put a subquery like from (select x.* from tablea X where x.value=20) TA
Then refer to TA like you did tablea previously.
Likely the query optimizer would do this for you.
Oracle has a way to show the query plan: put EXPLAIN PLAN FOR before the SQL statement, then display the result (for example with SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)). Look at the plan both ways and see what it does.
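As a rough sketch of the subquery idea, the snippet below compares the derived-table form with the plain WHERE form. It runs on SQLite through Python's sqlite3 module rather than Oracle, so EXPLAIN QUERY PLAN here is SQLite's statement, not Oracle's; the table contents and the note column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, value INTEGER);
    CREATE TABLE tableb (id INTEGER, note TEXT);
    INSERT INTO tablea VALUES (1, 20), (2, 30), (3, 20);
    INSERT INTO tableb VALUES (1, 'x'), (2, 'y');
""")

# Filter the left table in a derived table (subquery) before joining.
subquery_sql = """
    SELECT TA.id, B.note
    FROM (SELECT X.* FROM tablea X WHERE X.value = 20) TA
    LEFT JOIN tableb B ON TA.id = B.id
    ORDER BY TA.id
"""

# Filter the left table in the WHERE clause after the join.
where_sql = """
    SELECT A.id, B.note
    FROM tablea A
    LEFT JOIN tableb B ON A.id = B.id
    WHERE A.value = 20
    ORDER BY A.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
where_rows = conn.execute(where_sql).fetchall()
print(subquery_rows == where_rows)  # True

# SQLite's way of showing a plan; the exact text depends on the SQLite version.
plan = conn.execute("EXPLAIN QUERY PLAN " + where_sql).fetchall()
for step in plan:
    print(step[-1])
```

Both forms return the same rows; whether the optimizer treats them identically is something to confirm from the plan on the database actually in use.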

In your first SQL statement, A.ID = 20 is technically not joining anything. Joins connect two separate tables, with the ON clause matching rows by associating columns as keys.
A WHERE clause filters data, reducing the result to only those rows where the given value is found in that column.

Related

Can the order of Inner Joins Change the results of a query

I have the following scenario on a SQL Server 2008 R2:
The following queries return:
select * from TableA where ID = '123'; -- 1 row
select * from TableB where ID = '123'; -- 5 rows
select * from TableC where ID = '123'; -- 0 rows
When joining these tables the following way, it returns 1 row
SELECT A.ID
FROM TableA A
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
INNER JOIN TableC C
ON A.ID = C.ID
ORDER BY A.ID
But when the order of the inner joins is switched, it does not return any rows
SELECT A.ID
FROM TableA A
INNER JOIN TableC C
ON A.ID = C.ID
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
ORDER BY A.ID
Can this be possible?
For inner joins, the order of the join operations does not affect the query (it can affect the ordering of the rows and columns, but the same data is returned).
In this case, the result set is a subset of the Cartesian product of all the tables. The ordering doesn't matter.
The order can and does matter for outer joins.
In your case, one of the tables is empty. So, the Cartesian product is empty and the result set is empty. It is that simple.
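A minimal sketch of this point, assuming the row counts given in the question (1 row in TableA, 5 in TableB, 0 in TableC) and using SQLite through Python's sqlite3 module instead of SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID TEXT);
    CREATE TABLE TableB (ID TEXT);
    CREATE TABLE TableC (ID TEXT);  -- intentionally left empty
    INSERT INTO TableA VALUES ('123');
    INSERT INTO TableB VALUES ('123'), ('123'), ('123'), ('123'), ('123');
""")

# Join TableB (deduplicated) first, then TableC.
b_then_c = conn.execute("""
    SELECT A.ID
    FROM TableA A
    INNER JOIN (SELECT DISTINCT ID FROM TableB) AS D ON D.ID = A.ID
    INNER JOIN TableC C ON A.ID = C.ID
    ORDER BY A.ID
""").fetchall()

# Join TableC first, then TableB (deduplicated).
c_then_b = conn.execute("""
    SELECT A.ID
    FROM TableA A
    INNER JOIN TableC C ON A.ID = C.ID
    INNER JOIN (SELECT DISTINCT ID FROM TableB) AS D ON D.ID = A.ID
    ORDER BY A.ID
""").fetchall()

print(b_then_c, c_then_b)  # [] []
```

With TableC empty, both join orders return no rows.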
As Gordon mentioned, for inner joins the order of joins doesn't matter, whereas it does matter when there's at least one outer join involved; however, in your case, none of this is pertinent as you are inner joining 3 tables, one of which will return zero rows - hence all combinations will result in zero rows.
You cannot reproduce the erratic behavior with the queries as they are shown in this question since they will always return zero records. You can try it again on your end to see what you come up with, and if you do find a difference, please share it with us then.
For the future, whenever you run into something like this, provide some dummy data, either as insert statements or on rextester or the like. That makes it much easier for someone to help you.
Best of luck.

sql left join returns

I am trying to run a left join on 2 tables. I do not have a group by, and the only where condition I have is on the second table. But fewer rows are returned than there are in the first table. Isn't the left join supposed to bring all the data from the first table?
Here is my SQL:
select *
from tbl_a A left join tbl_b B
ON
A.Cnumber=B.Cnumber
and A.CDNUmber=B.CDNumber
and abs(A.duration - B.Duration)<2
and substr(A.text,1,3)||substr(A.text,5,8)||substr(A.text,9,2)=substr(B.text,1,8)
where B.fixed = 'b580'
There are 140,000 records in table A but the result returned is less than 100,000 records. What is the problem and how can I solve it?
As soon as you put a condition in the WHERE clause that references the right table and doesn't accommodate the NULLs that will be produced when the join is unsuccessful, you've transformed it (effectively) back into an INNER JOIN.
Try:
where B.fixed = 'b580' OR B.fixed IS NULL
Or add this condition to the ON clause for the JOIN.
You should move the where condition into the join's on clause:
select *
from tbl_a A left join tbl_b B
ON
A.Cnumber=B.Cnumber
and A.CDNUmber=B.CDNumber
and abs(A.duration - B.Duration)<2
and substr(A.text,1,3)||substr(A.text,5,8)||substr(A.text,9,2)=substr(B.text,1,8)
and B.fixed = 'b580'
If you use a where clause, all records of A that have no matching record in B will not be returned.
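The three variants discussed above can be compared side by side. This is a sketch with SQLite via Python's sqlite3 module; the join is simplified to a single Cnumber column and the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (Cnumber INTEGER);
    CREATE TABLE tbl_b (Cnumber INTEGER, fixed TEXT);
    INSERT INTO tbl_a VALUES (1), (2), (3);
    -- Cnumber 1 matches with 'b580', Cnumber 2 matches with another value,
    -- Cnumber 3 has no match at all.
    INSERT INTO tbl_b VALUES (1, 'b580'), (2, 'zzzz');
""")

base = "SELECT A.Cnumber, B.fixed FROM tbl_a A LEFT JOIN tbl_b B ON A.Cnumber = B.Cnumber "

# Condition on the right table in WHERE: behaves like an inner join.
where_rows = conn.execute(
    base + "WHERE B.fixed = 'b580' ORDER BY A.Cnumber").fetchall()

# WHERE with an explicit NULL allowance: unmatched rows come back,
# but rows matched to a different 'fixed' value are still dropped.
where_or_null_rows = conn.execute(
    base + "WHERE B.fixed = 'b580' OR B.fixed IS NULL ORDER BY A.Cnumber").fetchall()

# Condition moved into ON: every row of tbl_a is kept.
on_rows = conn.execute(
    base + "AND B.fixed = 'b580' ORDER BY A.Cnumber").fetchall()

print(where_rows)          # [(1, 'b580')]
print(where_or_null_rows)  # [(1, 'b580'), (3, None)]
print(on_rows)             # [(1, 'b580'), (2, None), (3, None)]
```

Only the ON version returns all rows of the first table, so the OR ... IS NULL form and the ON form are not interchangeable when a row of A matches rows of B that fail the condition.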

Having problems with SQL Joins

Table A
Table B
I tried to use LEFT OUTER JOIN but it doesn't seem to work.
I want the query to extract all data from Table A, with 0 as the average score if there is no data yet for the specified parameter. That is, in Figure 3, it should have shown ID 2 with 0 for s. Can anyone help me figure out the solution?
You have the table names switched in the join. To keep all of Table A, it needs to be the table listed on the left side of the left join. Also, any condition that should affect only the rows taken from Table B, rather than filter the entire result, should be moved into the left join's on clause. It should be:
SELECT a.id,
Avg(Isnull(b.score, 0)) AS s
FROM a
LEFT OUTER JOIN b
ON a.id = b.id
AND b.kind = 'X'
GROUP BY a.id
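A quick way to try this out is sketched below with SQLite through Python's sqlite3 module. Isnull is SQL Server specific, so the sketch substitutes the standard COALESCE; the sample scores and the kind values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, kind TEXT, score INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    -- id 1 has two 'X' scores, id 2 only has a 'Y' score, id 3 has none.
    INSERT INTO b VALUES (1, 'X', 80), (1, 'X', 90), (2, 'Y', 70);
""")

# Condition on b in the ON clause: every id from a is kept, missing scores count as 0.
on_rows = conn.execute("""
    SELECT a.id, AVG(COALESCE(b.score, 0)) AS s
    FROM a
    LEFT OUTER JOIN b
      ON a.id = b.id AND b.kind = 'X'
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Same condition in WHERE: ids without an 'X' row disappear from the result.
where_rows = conn.execute("""
    SELECT a.id, AVG(COALESCE(b.score, 0)) AS s
    FROM a
    LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.kind = 'X'
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(on_rows)     # [(1, 85.0), (2, 0.0), (3, 0.0)]
print(where_rows)  # [(1, 85.0)]
```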

What does "table A left outer join table B ON TRUE" mean?

I know conditions are used when joining tables. But I ran into a specific situation where the SQL code is written like "Table A join table B ON TRUE"
What will happen based on the "ON TRUE" condition? Is that just a total cross join without any condition selection?
Actually, the original expression is like:
Table A LEFT outer join table B on TRUE
Let's say A has m rows and B has n rows. Is there any conflict between "left outer join" and "on true"? Because it seems "on true" results a cross join.
From what I guess, the result will be m*n rows. So, it has no need to write "left outer join", just a "join" will give the same output, right?
Yes. That's the same thing as a CROSS JOIN.
In MySQL, we can omit the [optional] CROSS keyword. We can also omit the ON clause.
The condition in the ON clause is evaluated as a boolean, so we could also have written something like ON 1=1.
UPDATE:
(The question was edited, to add another question about a LEFT [OUTER] JOIN b which is different than the original construct: a JOIN b)
The "LEFT [OUTER] JOIN" is slightly different, in that rows from the table on the left side will be returned even when there are no matching rows found in the table on the right side.
As noted, a CROSS JOIN between tables a (containing m rows) and table b containing n rows, absent any other predicates, will produce a resultset of m x n rows.
The LEFT [OUTER] JOIN will produce a different resultset in the special case where table b contains 0 rows.
CREATE TABLE a (i INT);
CREATE TABLE b (i INT);
INSERT INTO a VALUES (1),(2),(3);
SELECT a.i, b.i FROM a LEFT JOIN b ON TRUE ;
Note that the LEFT JOIN will return rows from table a (a total of m rows) even when table b contains 0 rows.
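The special case can be reproduced with a short sketch using SQLite through Python's sqlite3 module. It writes the always-true condition as ON 1 = 1, which the answer notes is equivalent, because older SQLite versions do not recognize the TRUE keyword; the values inserted into b in the second half are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (i INT);
    CREATE TABLE b (i INT);
    INSERT INTO a VALUES (1), (2), (3);
""")

left_sql = "SELECT a.i, b.i FROM a LEFT JOIN b ON 1 = 1 ORDER BY a.i, b.i"
cross_sql = "SELECT a.i, b.i FROM a CROSS JOIN b ORDER BY a.i, b.i"

# b is empty: the left join still returns every row of a, the cross join returns nothing.
left_empty = conn.execute(left_sql).fetchall()
cross_empty = conn.execute(cross_sql).fetchall()
print(left_empty)   # [(1, None), (2, None), (3, None)]
print(cross_empty)  # []

# Once b has rows, both forms return the same m x n combinations.
conn.execute("INSERT INTO b VALUES (10), (20)")
left_full = conn.execute(left_sql).fetchall()
cross_full = conn.execute(cross_sql).fetchall()
print(left_full == cross_full, len(left_full))  # True 6
```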
A cross join produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.
A cross join does not match rows on any condition. If each table has 100 rows with a one-to-one match, a cross join returns 10,000 rows, whereas an inner join on the matching columns returns only 100.
These 2 examples will return the same result:
Cross join
select * from table1 cross join table2 where table1.id = table2.fk_id
Inner join
select * from table1 join table2 on table1.id = table2.fk_id
Prefer the second form (the inner join).
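The row counts mentioned above can be verified with a sketch like the following, run on SQLite through Python's sqlite3 module. The tables are filled with ids 1 to 100 purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (fk_id INTEGER);
""")
# 100 rows in each table with a one-to-one match between id and fk_id.
ids = [(i,) for i in range(1, 101)]
conn.executemany("INSERT INTO table1 VALUES (?)", ids)
conn.executemany("INSERT INTO table2 VALUES (?)", ids)

# Unrestricted cross join: every row paired with every row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table1 CROSS JOIN table2").fetchone()[0]

# Cross join restricted in WHERE versus inner join with ON.
cross_where = conn.execute("""
    SELECT table1.id, table2.fk_id
    FROM table1 CROSS JOIN table2
    WHERE table1.id = table2.fk_id
    ORDER BY table1.id
""").fetchall()
inner = conn.execute("""
    SELECT table1.id, table2.fk_id
    FROM table1 JOIN table2 ON table1.id = table2.fk_id
    ORDER BY table1.id
""").fetchall()

print(cross_count)           # 10000
print(len(inner))            # 100
print(cross_where == inner)  # True
```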
The join syntax's general form:
SELECT *
FROM table_a
JOIN table_b ON condition
The condition is used to tell the database how to match rows from table_a to table_b, and would usually look like table_a.some_id = table_b.some_id.
If you just specify true, you will match every row from table_a with every row of table_b, so if table_a contains n rows and table_b contains m rows the result would have m*n rows.
Most(?) modern databases have a cleaner syntax for this, though:
SELECT *
FROM table_a
CROSS JOIN table_b
The difference between the pure cross join and the left join (where the condition is forced to be always true, as when using ON TRUE) shows up only when the right table is empty: the left join then still returns each row of the left table once, with NULLs where the right table's columns would have been, while the cross join returns no rows. When the right table has at least one row, both produce the same result.

SQL Understanding Outer Joins with Criteria and Why the Order of Criteria can alter Results

So I know how some things work in SQL but I don't know why and I'm unable to find a nice layman description of this online. For reference I'm using Oracle 11g and TOAD.
Question 1 - Outer Joins with Criteria
I know that if you place criteria on an outer joined table, you turn the query into an inner join regardless of your Syntax. So this query acts as an inner join:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
WHERE b.field2 = 'someCriteria'
The way to get around this is to include an "OR IS NULL" in the second table's criteria. I know this to be true but I've never been able to wrap my head around why this is. Can someone explain why criteria on an outer table turns an outer join into an inner join?
Question 2 - Adding Criteria to Different Clauses Changes Results
So the above being true, I've been struggling with how the order of my criteria can alter the results of the following two queries. I have two tables - tableA and tableB - and need to do a left join compare of a subset of tableA to a subset of tableB.
SQL1
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
WHERE a.field1 = 'othercriteria'
Results: SQL1 gives me the correct left joined results.
SQL2
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
WHERE a.field1 = 'othercriteria'
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
Results: SQL2 pulls back only an inner joined result of the two subsets (excluding those rows in tableA where tableB does not have a match).
Understanding The Results
The reason for this has something to do with the order in which the join and the where run. I could understand this changing performance, but I'm not following why it would change the results, as the syntax is almost identical. To wrap my head around it, I ran the execution plans for both queries and got the following results (from TOAD):
Execution Plans:
SQL 1:
5) Select Statement (Rows were returned by the Select statement)
  4) Sort Unique (The rows from step 3 were sorted to eliminate duplicate rows)
    3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
      1) Table Access Full TABLE tableB (Every row in the table tableB is read)
      2) Table Access Full TABLE tableA (Every row in the table tableA is read)
SQL 2:
11) Select Statement (Rows were returned by the Select statement)
  10) Sort Unique (The rows from step 9 were sorted to eliminate duplicate rows)
    9) All distinct rows from steps 4, 8 were returned
      4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
        3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash))
          1) Table Access Full TABLE tableB (Every row in the table tableB is read)
          2) Table Access Full TABLE tableA (Every row in the table tableA is read)
      8) Filter (For the rows returned by step 7, filter out rows depending on filter criteria)
        7) Hash Join Right Outer (The result sets from steps 5, 6 were joined (hash))
          5) Table Access Full TABLE tableB (Every row in the table tableB is read)
          6) Table Access Full TABLE tableA (Every row in the table tableA is read)
So I have no doubt that the above execution plans explain perfectly why SQL 2 gives me different results than SQL 1, but I'm having a hard time reading these plans. Can someone help me translate these execution plans and explain why SQL2 is treated as an Inner Join because the tableB criteria are listed in the WHERE clause instead of the JOIN?
Thanks in advance!!
For question 1, consider table a containing two rows:
key  field1
1    a
2    b
And table b containing 3 rows:
key  field1  field2
1    c       someCriteria
1    d       notSomeCriteria
1    e       NULL
Your FROM clause (with its JOINs) effectively generates a result set that looks like this:
(a)key  (a)field1  (b)key  (b)field1  (b)field2
1       a          1       c          someCriteria
1       a          1       d          notSomeCriteria
1       a          1       e          NULL
2       b          NULL    NULL       NULL
By the time the WHERE clause is considered, it no longer really "knows" whether a particular JOIN was successful or not - it doesn't selectively apply criteria based on whether or not the join succeeded. So if you've specified the b.field2 should equal someCriteria, you're saying that it should only return the first row (1,a,1,c,someCriteria).
If you want to make particular assertions about NULLable columns, you do really want the WHERE clause to act this way and force you to explicitly consider NULLs (whether those be generated from a NULL column or by a JOIN failing)
The cure I'd usually adopt is the one shown in HLGEM's answer, rather than adding OR b.field2 IS NULL since you usually want to exclude the (1,a,1,e,NULL) row.
I would write the first as:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
AND b.field2 = 'someCriteria'
This would return all records from table a, and b.field1 would only have data where b.field2 = 'someCriteria'.
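Using the two-row table a and three-row table b from the example above, the three approaches can be compared in a small sketch. It runs on SQLite through Python's sqlite3 module rather than Oracle, and the key column is renamed to k here only to sidestep keyword concerns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, field1 TEXT);
    CREATE TABLE b (k INTEGER, field1 TEXT, field2 TEXT);
    INSERT INTO a VALUES (1, 'a'), (2, 'b');
    INSERT INTO b VALUES (1, 'c', 'someCriteria'),
                         (1, 'd', 'notSomeCriteria'),
                         (1, 'e', NULL);
""")

# Criteria on b in WHERE: only the single fully matching row survives.
where_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM a LEFT JOIN b ON a.k = b.k
    WHERE b.field2 = 'someCriteria'
    ORDER BY a.k, b.field1
""").fetchall()

# WHERE with OR ... IS NULL: keeps the unmatched row of a, but also the
# row of b whose field2 is genuinely NULL.
where_or_null_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM a LEFT JOIN b ON a.k = b.k
    WHERE b.field2 = 'someCriteria' OR b.field2 IS NULL
    ORDER BY a.k, b.field1
""").fetchall()

# Criteria moved into ON: all rows of a, with b data only where it qualifies.
on_rows = conn.execute("""
    SELECT a.field1, b.field1
    FROM a LEFT JOIN b
      ON a.k = b.k AND b.field2 = 'someCriteria'
    ORDER BY a.k, b.field1
""").fetchall()

print(where_rows)          # [('a', 'c')]
print(where_or_null_rows)  # [('a', 'c'), ('a', 'e'), ('b', None)]
print(on_rows)             # [('a', 'c'), ('b', None)]
```

The middle result includes the (1,a,1,e,NULL) row, which is the row the answer above says you usually want to exclude.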