T-SQL: Additional predicates on JOINs vs. the WHERE clause

Is there any difference between putting additional predicates on a JOIN statement vs. adding them as additional clauses in the WHERE statement?
Example 1: Predicate on the WHERE clause
select emp.*
from Employee emp
left join Order o on emp.Id = o.EmployeeId
where o.Cancelled = 0
Example 2: Predicate on the JOIN statement
select emp.*
from Employee emp
left join Order o on emp.Id = o.EmployeeId and o.Cancelled = 0

With the first statement, the outer join is effectively turned into an inner join by the WHERE condition: it filters out every row from the employee table for which no order was found, because o.Cancelled is NULL for those rows.
So the two statements don't do the same thing.

I already got the answers from some of my colleagues, but in case they don't post it here, I'll add an answer myself.
Both of these examples assume that the predicate is comparing a column on the "right" table with a scalar value.
Performance
It seems that if the predicate is on the JOIN, the "right" table is filtered before the join is formed. If the predicate is part of the WHERE clause, it is logically applied to the joined rows before the resultset is returned. This is the logical order only; the optimizer is free to choose a different physical plan.
Data Returned
If the predicate is part of the WHERE clause and there is no joining row, so that the "right" value is NULL, the entire row is dropped from the final resultset: comparing a value with NULL yields UNKNOWN rather than TRUE, and WHERE keeps only rows for which the predicate is TRUE.
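The difference in returned data can be checked with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server; the sample rows (Ann, Bob, Cid) are made up for illustration, and only the Employee/Order shape comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, EmployeeId INTEGER, Cancelled INTEGER);
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann has one live order, Bob has only a cancelled order, Cid has none.
    INSERT INTO "Order" VALUES (10, 1, 0), (11, 2, 1);
""")

# Example 1: predicate in WHERE. Rows with no order (o.Cancelled IS NULL) are dropped.
where_rows = conn.execute("""
    SELECT emp.Name
    FROM Employee emp
    LEFT JOIN "Order" o ON emp.Id = o.EmployeeId
    WHERE o.Cancelled = 0
    ORDER BY emp.Id
""").fetchall()

# Example 2: predicate in ON. Every employee is kept; unmatched ones get NULLs.
on_rows = conn.execute("""
    SELECT emp.Name, o.Id
    FROM Employee emp
    LEFT JOIN "Order" o ON emp.Id = o.EmployeeId AND o.Cancelled = 0
    ORDER BY emp.Id
""").fetchall()

print(where_rows)  # [('Ann',)]
print(on_rows)     # [('Ann', 10), ('Bob', None), ('Cid', None)]
```

The WHERE version behaves like an inner join, while the ON version still returns one row per employee.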

Just to address the case that the additional predicate is on a column from the left hand table this can still make a difference as shown below.
WITH T1(N) AS
(
SELECT 1 UNION ALL
SELECT 2
), T2(N) AS
(
SELECT 1 UNION ALL
SELECT 2
)
SELECT T1.N, T2.N, 'ON' AS Clause
FROM T1
LEFT JOIN T2 ON T1.N = T2.N AND T1.N=1
UNION ALL
SELECT T1.N, T2.N, 'WHERE' AS Clause
FROM T1
LEFT JOIN T2 ON T1.N = T2.N
WHERE T1.N=1
Returns
N N Clause
----------- ----------- ------
1 1 ON
2 NULL ON
1 1 WHERE

Here is another example ( four cases )
create table #tmp (id int, name varchar(10))
insert into #tmp values (1, 'A')
insert into #tmp values (2, 'B')
select 'first Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
and a.id = 1
union all
select 'second Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
where a.id = 1
union all
select 'Third Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
and b.id = 1
union all
select 'Fourth Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
where b.id = 1
Results:
first Query 1 A 1 A
first Query 2 B NULL NULL
second Query 1 A 1 A
Third Query 1 A 1 A
Third Query 2 B NULL NULL
Fourth Query 1 A 1 A

Related

Joining two tables where id does not equal

I'm struggling getting this query to produce the results I want.
I have:
table1, columns=empid, alt_id
table2, columns=empid, alt_id
I want to get the empid, and alt_id from table 1 where the alt_id does not match the alt_id in table2. They will both have alt_id numbers I just want to get the ones that do not match.
Any ideas?
SELECT * FROM table1
INNER JOIN table2 ON table2.empid = table1.empid AND table2.alt_id <> table1.alt_id
What does that really mean though? Normally when this is asked, it is of the form "I want all rows from A that have no row matching in B and all in B that have no match in A"
Which looks like this:
SELECT * FROM
A
FULL OUTER JOIN
B
ON
a.id = b.id
You'll see a null for any row data where there isn't a matching row on the other side:
A.id
1
2
B.id
1
3
Result of full outer join:
A.id B.id
1 1
2 null
null 3
You, however have asked for A-B join where the IDs aren't equal, which would be the more useless query of:
SELECT * FROM
A
INNER JOIN
B
ON
a.id != b.id
And it would look like:
A.id B.id
1 3
2 1
2 3
You seem to want not exists:
select t1.*
from table1 t1
where not exists (select 1 from table2 t2 where t2.alt_id = t1.alt_id);
It is unclear whether or not you also want to join on empid, so you might really want:
select t1.*
from table1 t1
where not exists (select 1 from table2 t2 where t2.alt_id = t1.alt_id and t2.empid = t1.empid);
A left join returns every record in table1, with NULLs in the table2 columns where no match exists. A WHERE filter on those NULLs then keeps only the records in table1 that do not have a matching ID in table2.
select a.*
from table1 a
left join table2 b
on a.alt_id = b.alt_id
where b.alt_id is null;
-- NOT IN variant; an inner join to Employee here would remove exactly the rows being sought
select L.*
from [Login] L
where L.EmployeeID not in (select E.EmployeeID from Employee E)
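To compare the NOT EXISTS and LEFT JOIN ... IS NULL forms side by side, here is a small sketch using Python's sqlite3 module. The table and column names follow the question; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (empid INTEGER, alt_id INTEGER);
    CREATE TABLE table2 (empid INTEGER, alt_id INTEGER);
    INSERT INTO table1 VALUES (1, 100), (2, 200), (3, 300);
    -- Only alt_id 100 also appears in table2.
    INSERT INTO table2 VALUES (1, 100), (2, 999);
""")

# Anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT t1.empid, t1.alt_id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.alt_id = t1.alt_id)
    ORDER BY t1.empid
""").fetchall()

# The same anti-join written as LEFT JOIN plus an IS NULL filter.
left_join_null = conn.execute("""
    SELECT a.empid, a.alt_id
    FROM table1 a
    LEFT JOIN table2 b ON a.alt_id = b.alt_id
    WHERE b.alt_id IS NULL
    ORDER BY a.empid
""").fetchall()

print(not_exists)      # [(2, 200), (3, 300)]
print(left_join_null)  # [(2, 200), (3, 300)]
```

Both forms return the table1 rows whose alt_id has no counterpart in table2.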

SQL select 1 to many within the same row

I have a table with 1 record, which then ties back to a secondary table which can contain either no match, 1 match, or 2 matches.
I need to fetch the corresponding records and display them within the same row which would be easy using left join if I just had 1 or no matches to tie back, however, because I can get 2 matches, it produces 2 records.
Example with 1 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner1
----------------------
1 John Frank
Example with 2 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner
----------------------
1 John Frank
1 John Peter
Is there a way I can formulate my select so that my output would reflect the following When I have 2 matches:
ID Person1 Owner1 Owner2
-------------------------------
1 John Frank Peter
I explored Oracle Pivots a bit, however couldn't find a way to make this work. Also explored the possibility of using left join on the same table twice using MIN() and MAX() when fetching the matches, however I can only see myself resorting this as a "no other option" scenario.
Any suggestions?
** EDIT **
@ughai - Using a CTE does address the issue to some extent. However, when I attempt to retrieve all of the records, the details derived from this common table expression don't show any records on the LEFT JOIN unless I specify the "MatchID" (CASE_MBR_KEY) value. In other words, once I remove the "where" clause, my outer joins produce no records, even though the CASE_MBR_KEY values are there in the CTE data.
WITH CTE AS
(
SELECT TEMP.BEAS_KEY,
TEMP.CASE_MBR_KEY,
TEMP.FULLNAME,
TEMP.BIRTHDT,
TEMP.LINE1,
TEMP.LINE2,
TEMP.LINE3,
TEMP.CITY,
TEMP.STATE,
TEMP.POSTCD,
ROW_NUMBER()
OVER(ORDER BY TEMP.BEAS_KEY) R
FROM TMP_BEN_ASSIGNEES TEMP
--WHERE TEMP.CASE_MBR_KEY = 4117398
)
The reason is that the ROW_NUMBER value, given the number of records, won't necessarily be 1 or 2. So I attempted the following, but I get ORA-01799: a column may not be outer-joined to a subquery
--// BEN ASSIGNEE 1
LEFT JOIN CTE BASS1
ON BASS1.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS1.R IN (SELECT min(R) FROM CTE A WHERE A.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA1
--// BEN ASSIGNEE 2
LEFT JOIN CTE BASS2
ON BASS2.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS2.R IN (SELECT MAX(R) FROM CTE B WHERE B.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA2
** EDIT 2 **
Fixed the above issue by moving the Row number clause to the "Where" portion of the query instead of within the JOIN clause. Seems to work now.
You can use CTE with ROW_NUMBER() with 2 LEFT JOIN OR with PIVOT like this.
Query with Multiple Left Joins
WITH CTE as
(
SELECT MatchID,Owner,ROW_NUMBER()OVER(PARTITION BY MatchID ORDER BY Owner) r FROM t2
)
select T1.ID, T1.Person, t2.Owner as Owner1, t3.Owner as Owner2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID AND T2.r = 1
LEFT JOIN CTE T3
ON T1.id = T3.MatchID AND T3.r = 2;
Query with PIVOT
WITH CTE as
(
SELECT MatchID,Owner,ROW_NUMBER()OVER(PARTITION BY MatchID ORDER BY Owner) R FROM t2
)
SELECT ID, Person,O1,O2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID
PIVOT(MAX(Owner) FOR R IN (1 as O1,2 as O2));
Output
ID PERSON OWNER1 OWNER2
1 John Maxwell Peter
If you know there are at most two matches, you can also use aggregation:
Select T1.ID, T1.Person1,
MIN(T2.Owner) as Owner1,
(CASE WHEN MIN(t2.Owner) <> MAX(t2.Owner) THEN MAX(t2.Owner) END) as Owner2
From T1 Left Join
T2
on T1.ID = T2.MatchID
Group By t1.ID, t1.Person1;
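The aggregation approach can be tried out with the sketch below, run through Python's sqlite3 module. The rows for Mary and Zoe are invented to cover the one-match and no-match cases; T1 and T2 follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Person1 TEXT);
    CREATE TABLE T2 (MatchID INTEGER, Owner TEXT);
    INSERT INTO T1 VALUES (1, 'John'), (2, 'Mary'), (3, 'Zoe');
    -- John has two owners, Mary has one, Zoe has none.
    INSERT INTO T2 VALUES (1, 'Frank'), (1, 'Peter'), (2, 'Alice');
""")

# MIN gives the first owner; MAX gives the second only when it differs from MIN.
rows = conn.execute("""
    SELECT T1.ID, T1.Person1,
           MIN(T2.Owner) AS Owner1,
           CASE WHEN MIN(T2.Owner) <> MAX(T2.Owner) THEN MAX(T2.Owner) END AS Owner2
    FROM T1
    LEFT JOIN T2 ON T1.ID = T2.MatchID
    GROUP BY T1.ID, T1.Person1
    ORDER BY T1.ID
""").fetchall()

for row in rows:
    print(row)
# (1, 'John', 'Frank', 'Peter')
# (2, 'Mary', 'Alice', None)
# (3, 'Zoe', None, None)
```

One caveat of this form: if both matches carry the same Owner value, MIN and MAX are equal and Owner2 comes back as NULL.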

Which performs first WHERE clause or JOIN clause

Which clause performs first in a SELECT statement?
I have a question about SELECT queries on this basis.
Consider the example below:
SELECT *
FROM #temp A
INNER JOIN #temp B ON A.id = B.id
INNER JOIN #temp C ON B.id = C.id
WHERE A.Name = 'Acb' AND B.Name = C.Name
Does it first check the WHERE clause and then perform the INNER JOIN, or does it JOIN first and then check the condition?
If it performs the JOIN first and then the WHERE condition, how can it apply further WHERE conditions that span different JOINs?
The conceptual order of query processing is:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
But this is just a conceptual order. In fact the engine may decide to rearrange clauses. Here is proof. Let's make 2 tables with 1000000 rows each:
CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))
INSERT INTO test1(name) SELECT 'a' FROM cte;
INSERT INTO test2(name) SELECT name FROM test1;
Now run 2 queries:
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1
Notice that the first query filters most rows out in the join condition, while the second query filters in the WHERE clause. Look at the produced plans:
1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)
2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)
This means that for the first query the engine applied the join predicate t2.id = 100 while scanning test2. In the second query the WHERE predicate on t1.id was applied first: because the join requires t2.id = t1.id, the optimizer also applied id = 1 to the scan of test2.
Logical order of query processing phases is:
FROM - Including JOINs
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can have as many conditions as you like on your JOINs or in your WHERE clause. For example:
Select * from #temp A
INNER JOIN #temp B ON A.id = B.id AND .... AND ...
INNER JOIN #temp C ON B.id = C.id AND .... AND ...
Where A.Name = 'Acb'
AND B.Name = C.Name
AND ....
You can refer to this description of join optimization:
SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
INNER JOIN T3 ON P2(T2,T3)
WHERE P(T1,T2,T3)
The nested-loop join algorithm would execute this query in the following manner:
FOR each row t1 in T1 {
FOR each row t2 in T2 such that P1(t1,t2) {
FOR each row t3 in T3 such that P2(t2,t3) {
IF P(t1,t2,t3) {
t:=t1||t2||t3; OUTPUT t;
}
}
}
}
You can refer to MSDN:
The rows selected by a query are filtered first by the FROM clause
join conditions, then the WHERE clause search conditions, and then the
HAVING clause search conditions. Inner joins can be specified in
either the FROM or WHERE clause without affecting the final result.
You can also use the SET SHOWPLAN_ALL ON before executing your query to show the execution plan of your query so that you can measure the performance difference in the two.
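The statement that inner-join predicates can sit in either the ON or the WHERE clause without changing the result can be checked with a short script. This is a sketch using Python's sqlite3 module with 200 rows per table instead of the 1000000 above; it compares results only and says nothing about SQL Server's execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT);
""")
conn.executemany("INSERT INTO test1 VALUES (?, ?)", [(i, "a") for i in range(1, 201)])
conn.executemany("INSERT INTO test2 VALUES (?, ?)", [(i, "b") for i in range(1, 201)])

# Filter written as part of the join condition.
in_on = conn.execute("""
    SELECT t1.id, t2.id
    FROM test1 t1
    JOIN test2 t2 ON t2.id = t1.id AND t2.id = 100
    ORDER BY t1.id
""").fetchall()

# The same filter written in the WHERE clause.
in_where = conn.execute("""
    SELECT t1.id, t2.id
    FROM test1 t1
    JOIN test2 t2 ON t2.id = t1.id
    WHERE t2.id = 100
    ORDER BY t1.id
""").fetchall()

print(in_on)     # [(100, 100)]
print(in_where)  # [(100, 100)]
```

For inner joins the two placements are interchangeable; the difference only appears with outer joins.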
If you come to this site for the question about logical query processing, you really need to read this article on ITProToday by Itzik Ben-Gan.
Figure 3: Logical query processing order of query clauses
1 FROM
2 WHERE
3 GROUP BY
4 HAVING
5 SELECT
5.1 SELECT list
5.2 DISTINCT
6 ORDER BY
7 TOP / OFFSET-FETCH

Left outer join on multiple tables

I have the following sql statement:
select
a.desc
,sum(bdd.amount)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination def a on (a.id =bds.sId)
where c.repId=1000000134
group by a.desc;
When I run it I get the following result:
desc amount
NW 12.00
SE 10.00
When I try to add another left outer join to get another set of values:
select
a.desc
,sum(bdd.amount)
,sum(i.amt)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination def a on (a.id =bdd.sId)
left outer join t_ind i on (i.id=c.id)
where c.repId=1000000134
group by a.desc;
It basically doubles the amount field like:
desc amount amt
NW 24.00 234.00
SE 20.00 234.00
While result should be:
desc amount amt
NW 12.00 234.00
SE 10.00 NULL
How do I fix this?
If you really need to receive the data as you described, you can use sub-queries to perform the needed calculations. In this case your code may look like the following:
select x.[desc], x.amount, y.amt
from
(
select
c.[desc]
, sum (bdd.amount) as amount
, c.id
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination_def bdd on (bdd.id = bds.sId)
where c.repId=1000000134
group by c.id, c.[desc]
) x
left join
(
select t.id, sum (t.amt) as amt
from t_ind t
inner join t_main c
on t.id = c.id
where c.repID = 1000000134
group by t.id
) y
on x.id = y.id
In the first sub-select you will receive the aggregated data for the two first columns: desc and amount, grouped as you need.
The second select will return the needed amt value for each id of the first set.
A left join between those results gives the needed result. The t_main table was added to the second select for performance reasons.
Another solution can be the following:
select
c.[desc]
, sum (bdd.amount) as amount
, amt = (select sum (amt) from t_ind where id = c.id)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination_def bdd on (bdd.id = bds.sId)
where c.repId = 1000000134
group by c.id, c.[desc]
The result will be the same. Basically, instead of using nested selects, the amt sum is calculated inline for each row of the joined result. For large tables, the second solution will perform worse than the first one.
Your new left outer join is most likely forcing some rows to be returned several times in the result set because of multiple relations. Remove your SUM, review the returned rows, and work out exactly which ones you require (maybe restrict it to a certain type of t_ind record, if that is applicable), then adjust your query accordingly.
Left Outer Join - Driving Table Row Count
A left outer join may return more rows than there are in the driving table if there are multiple matches on the join clause.
Using MS SQL-Server:
DECLARE @t1 TABLE ( id INT )
INSERT INTO @t1 VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 );
DECLARE @t2 TABLE ( id INT )
INSERT INTO @t2 VALUES ( 2 ),( 2 ),( 3 ),( 10 ),( 11 ),( 12 );
SELECT * FROM @t1 t1
LEFT OUTER JOIN @t2 t2 ON t2.id = t1.id
This gives:
1 NULL
2 2
2 2
3 3
4 NULL
5 NULL
There are 5 rows in the driving table (t1), but 6 rows are returned because there are multiple matches for id 2.
So if an aggregate function is used, eg SUM() etc, grouped by the driving table column(s), this will give the wrong results.
To fix this, use derived tables or sub-queries to calculate the aggregate values, as already stated.
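The doubling effect and the derived-table fix can be reproduced with the sketch below, using Python's sqlite3 module. The tables main_t, direct_t and ind_t and their rows are hypothetical stand-ins, not the asker's schema; they are chosen so that the numbers mirror the 12.00 / 24.00 / 234.00 pattern in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one driving row, one direct amount, two ind amounts.
    CREATE TABLE main_t (id INTEGER PRIMARY KEY);
    CREATE TABLE direct_t (main_id INTEGER, amount REAL);
    CREATE TABLE ind_t (main_id INTEGER, amt REAL);
    INSERT INTO main_t VALUES (1);
    INSERT INTO direct_t VALUES (1, 12.0);
    INSERT INTO ind_t VALUES (1, 100.0), (1, 134.0);
""")

# Naive version: the second join yields two rows, so amount is summed twice.
naive = conn.execute("""
    SELECT m.id, SUM(d.amount), SUM(i.amt)
    FROM main_t m
    LEFT JOIN direct_t d ON d.main_id = m.id
    LEFT JOIN ind_t i ON i.main_id = m.id
    GROUP BY m.id
""").fetchall()

# Fixed version: aggregate each child table in its own derived table first.
fixed = conn.execute("""
    SELECT m.id, d.amount, i.amt
    FROM main_t m
    LEFT JOIN (SELECT main_id, SUM(amount) AS amount
               FROM direct_t GROUP BY main_id) d ON d.main_id = m.id
    LEFT JOIN (SELECT main_id, SUM(amt) AS amt
               FROM ind_t GROUP BY main_id) i ON i.main_id = m.id
""").fetchall()

print(naive)  # [(1, 24.0, 234.0)]
print(fixed)  # [(1, 12.0, 234.0)]
```

Because each derived table returns at most one row per main_id, the joins can no longer multiply rows before the sums are taken.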
Left Outer Join - Multiple Tables
Where there are left outer joins over multiple tables, or any join for that matter, the query generates a series of derived tables in the order of joins.
SELECT * FROM t1
LEFT OUTER JOIN t2 ON t2.col2 = <...>
LEFT OUTER JOIN t3 ON t3.col3 = <...>
This is equivalent to:
SELECT * FROM
(
SELECT * FROM t1
LEFT OUTER JOIN t2 ON t2.col2 = <...>
) dt1
LEFT OUTER JOIN t3 ON t3.col3 = <...>
Here, for both queries, the results of the 1st left outer join are put into a derived table (dt1) which is then left outer joined to the 3rd table (t3).
For left outer joins over multiple tables, the order of the tables in the join clauses is critical.

LEFT JOIN - How to join tables and include extra row even if you have right match

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A) A
LEFT JOIN
(SELECT * FROM B) B
ON A.ID = B.ProductID
This is easy, I will get all rows from A multiplied by rows matched in B, and NULL fields if there is no match.
But here comes the tricky question, how can I get all rows from A with NULL fields for table B, even if there is a match, so I get an extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
Since every join is going to be successful, we can switch to a full/inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID
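The outer-level UNION ALL variant above can be checked with a short sketch run through Python's sqlite3 module. The product rows are made up for illustration; the A and B columns follow the question. The ORDER BY relies on SQLite sorting NULLs first in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, ProductName TEXT);
    CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
    INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
    -- Only the shirt has sizes; the hat has no rows in B.
    INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

# Matches come from the inner join; the second branch adds one NULL row per product.
rows = conn.execute("""
    SELECT A.ID, A.ProductName, B.ID, B.Size
    FROM A
    INNER JOIN B ON B.ProductID = A.ID
    UNION ALL
    SELECT A.ID, A.ProductName, NULL, NULL
    FROM A
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
# (1, 'Shirt', None, None)
# (1, 'Shirt', 10, 'S')
# (1, 'Shirt', 11, 'M')
# (2, 'Hat', None, None)
```

Every product gets exactly one extra NULL row, whether or not it has matches in B.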