Which is performed first, the WHERE clause or the JOIN clause? - sql

Which clause is evaluated first in a SELECT statement?
I have a question about how a SELECT query is processed.
Consider the example below:
SELECT *
FROM #temp A
INNER JOIN #temp B ON A.id = B.id
INNER JOIN #temp C ON B.id = C.id
WHERE A.Name = 'Acb' AND B.Name = C.Name
Does it check the WHERE clause first and then perform the INNER JOINs, or does it JOIN first and then check the conditions?
If it performs the JOINs first and then the WHERE condition, how can it apply WHERE conditions that involve different JOINs?

The conceptual order of query processing is:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
But this is just a conceptual order. In practice the engine may rearrange the evaluation as long as the result stays the same. Here is a demonstration. Let's make 2 tables with 1,000,000 rows each:
CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))
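INSERT INTO test1(name) SELECT 'a' FROM cte
-- test2 has to be filled as well; the CTE is out of scope after the first INSERT, so copy from test1
INSERT INTO test2(name) SELECT name FROM test1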
Now run 2 queries:
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1
Notice that the first query puts its most selective filter (t2.id = 100) in the join condition, while the second query filters in the WHERE clause. Look at the plans produced:
1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)
2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)
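In both plans the filter is applied while test2 is being scanned, before the join. In the first query the optimizer evaluated the extra join condition first to filter out rows. In the second query it applied the WHERE filter first. Since that WHERE clause only mentions t1.id, the predicate on t2.id shows that the optimizer derived t2.id = 1 from t1.id = 1 and the join condition t2.id = t1.id.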

Logical order of query processing phases is:
FROM - Including JOINs
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can have as many conditions as you like in both your JOINs and your WHERE clause. For example:
Select * from #temp A
INNER JOIN #temp B ON A.id = B.id AND .... AND ...
INNER JOIN #temp C ON B.id = C.id AND .... AND ...
Where A.Name = 'Acb'
AND B.Name = C.Name
AND ....

You can refer to this description of join optimization. Take the query:
SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
INNER JOIN T3 ON P2(T2,T3)
WHERE P(T1,T2,T3)
The nested-loop join algorithm would execute this query in the following manner:
FOR each row t1 in T1 {
  FOR each row t2 in T2 such that P1(t1,t2) {
    FOR each row t3 in T3 such that P2(t2,t3) {
      IF P(t1,t2,t3) {
        t:=t1||t2||t3; OUTPUT t;
      }
    }
  }
}

You can refer to MSDN:
The rows selected by a query are filtered first by the FROM clause
join conditions, then the WHERE clause search conditions, and then the
HAVING clause search conditions. Inner joins can be specified in
either the FROM or WHERE clause without affecting the final result.
You can also run SET SHOWPLAN_ALL ON before your query to display its execution plan (the query is compiled but not executed), so that you can compare the estimated cost of the two forms.
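As a rough sketch of that comparison, the batch below asks SQL Server for the estimated plans of two inner-join queries that differ only in where the filter is written. It assumes that two joinable tables named dbo.test1 and dbo.test2 (each with an id column) exist, and that it is run from a client such as SSMS or sqlcmd, where GO is the batch separator rather than a T-SQL statement.

```sql
-- SET SHOWPLAN_ALL has to be the only statement in its batch.
SET SHOWPLAN_ALL ON;
GO

-- Filter written in the ON clause
SELECT *
FROM dbo.test1 t1
INNER JOIN dbo.test2 t2 ON t2.id = t1.id AND t1.id = 1;

-- Same filter written in the WHERE clause
SELECT *
FROM dbo.test1 t1
INNER JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1;
GO

-- Switch plan display off again so later statements execute normally.
SET SHOWPLAN_ALL OFF;
GO
```

For an inner join both statements return the same rows, so any difference you see would be in the plan, not in the result.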

If you came to this page with a question about logical query processing, you really should read this article on ITProToday by Itzik Ben-Gan.
Figure 3: Logical query processing order of query clauses
1 FROM
2 WHERE
3 GROUP BY
4 HAVING
5 SELECT
5.1 SELECT list
5.2 DISTINCT
6 ORDER BY
7 TOP / OFFSET-FETCH
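One visible consequence of this order, sketched here for SQL Server with made-up inline data: a column alias defined in the SELECT list (phase 5) does not exist yet when WHERE (phase 2) is evaluated, but it does exist by the time ORDER BY (phase 6) runs. The table alias t and the column alias uname are only illustrative names.

```sql
-- Inline sample data (hypothetical), so the statement runs on its own.
WITH t(name) AS (
    SELECT 'Acb' UNION ALL
    SELECT 'Xyz'
)
SELECT UPPER(name) AS uname      -- phase 5: the alias uname is created here
FROM t                           -- phase 1
WHERE UPPER(name) = 'ACB'        -- phase 2: the expression has to be repeated;
                                 --   WHERE uname = 'ACB' is rejected by SQL Server
ORDER BY uname;                  -- phase 6: the alias is visible by now
```

Some other engines relax the rule about aliases in WHERE, so treat this as an illustration of the logical order rather than as portable behaviour.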

Related

Count records only from left side of a LEFT JOIN

I'm building an Access query with a LEFT JOIN that, among other things, counts the number of unique sampleIDs present in the left table of the JOIN, and counts the aggregate number of specimens (bugs) present in the right table of the JOIN, both for a given group of samples (TripID). Here's the pertinent chunk of SQL code:
SELECT DISTINCT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
The trouble I'm having is that COUNT(t1.SampleID) is not giving me my desired result. My desired result is the number of unique SampleIDs present in t1 for a given TripID (let's say 7). Instead, what I get seems to be the number of rows in t2 for which the SampleID is contained within the given TripID group (let's say 77). How can I change this SQL query to get the desired number (7, not 77)?
Just take the aggregate sum per sample on t2 first, then join t1 to that result, like this:
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) as Bugs
FROM tbl_Sample AS t1
LEFT Join (
SELECT t2.SampleID, SUM(t2.C1 + t2.C2) as Bugs
FROM tbl_Bugs as t2
GROUP BY SampleID) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
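To see why aggregating first helps, here is a small self-contained sketch with made-up data: one trip, two samples, and three bug rows that all belong to the first sample. It is written for an engine that supports common table expressions (for example SQL Server), so it will not run in Access as is; the CTE name per_sample and the method column are only illustrative.

```sql
WITH tbl_Sample(TripID, SampleID) AS (
    SELECT 1, 10 UNION ALL
    SELECT 1, 11
), tbl_Bugs(SampleID, C1, C2) AS (
    SELECT 10, 1, 2 UNION ALL
    SELECT 10, 3, 4 UNION ALL
    SELECT 10, 5, 6
), per_sample AS (
    -- one row per sample, so the join below cannot multiply sample rows
    SELECT SampleID, SUM(C1 + C2) AS Bugs
    FROM tbl_Bugs
    GROUP BY SampleID
)
SELECT 'direct join' AS method, t1.TripID,
       COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
UNION ALL
SELECT 'pre-aggregated', t1.TripID,
       COUNT(t1.SampleID), SUM(t3.Bugs)
FROM tbl_Sample AS t1
LEFT JOIN per_sample AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID;
```

The direct join produces four joined rows (sample 10 three times, sample 11 once), so it reports 4 samples; the pre-aggregated form reports 2. Both report 21 bugs.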
This is a tricky query, because you have different hierarchies. Here is one method:
select s.tripid, count(*) as numsamples,
(select sum(b.c1 + b.c2)
from tbl_bugs b join
tbl_sample s2
on s2.sampleid = b.sampleid
where s2.tripid = s.tripid
) as numbugs
from tbl_sample s
group by s.tripid
You included a DISTINCT with a Group By. This is removing duplicates twice, which is unnecessarily complex. You can get rid of the DISTINCT.
I would compute the count separately from what is going on in the GROUP BY, reading it straight from tbl_Sample (the table that holds TripID):
SELECT dT.TripID
,(SELECT COUNT(DISTINCT S.SampleID)
FROM tbl_Sample S
WHERE S.TripID = dT.TripID
) AS [Samples]
,dT.Bugs
FROM (
SELECT t1.TripID
,SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
) AS dT

SQL select 1 to many within the same row

I have a table with 1 record, which ties back to a secondary table that can contain no match, 1 match, or 2 matches.
I need to fetch the corresponding records and display them within the same row. That would be easy with a left join if I had only 1 or no matches to tie back; however, because I can get 2 matches, it produces 2 records.
Example with 1 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner
----------------------
1 John Frank
Example with 2 matches:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner
----------------------
1 John Frank
1 John Peter
Is there a way I can formulate my select so that my output reflects the following when I have 2 matches:
ID Person1 Owner1 Owner2
-------------------------------
1 John Frank Peter
I explored Oracle pivots a bit, but couldn't find a way to make this work. I also explored the possibility of left joining the same table twice, using MIN() and MAX() when fetching the matches; however, I can only see myself resorting to this as a "no other option" scenario.
Any suggestions?
** EDIT **
@ughai - Using a CTE does address the issue to some extent. However, when I attempt to retrieve all of the records, the details derived from the common table expression don't show up on the LEFT JOIN unless I specify the "MatchID" (CASE_MBR_KEY) value. In other words, once I remove the WHERE clause, my outer joins produce no records, even though the CASE_MBR_KEY values are there in the CTE data.
WITH CTE AS
(
SELECT TEMP.BEAS_KEY,
TEMP.CASE_MBR_KEY,
TEMP.FULLNAME,
TEMP.BIRTHDT,
TEMP.LINE1,
TEMP.LINE2,
TEMP.LINE3,
TEMP.CITY,
TEMP.STATE,
TEMP.POSTCD,
ROW_NUMBER()
OVER(ORDER BY TEMP.BEAS_KEY) R
FROM TMP_BEN_ASSIGNEES TEMP
--WHERE TEMP.CASE_MBR_KEY = 4117398
)
The reason is that the ROW_NUMBER value, given the number of records, won't necessarily be 1 or 2. So I attempted the following, but I get ORA-01799: a column may not be outer-joined to a subquery
--// BEN ASSIGNEE 1
LEFT JOIN CTE BASS1
ON BASS1.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS1.R IN (SELECT min(R) FROM CTE A WHERE A.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA1
--// BEN ASSIGNEE 2
LEFT JOIN CTE BASS2
ON BASS2.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS2.R IN (SELECT MAX(R) FROM CTE B WHERE B.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA2
** EDIT 2 **
Fixed the above issue by moving the Row number clause to the "Where" portion of the query instead of within the JOIN clause. Seems to work now.
You can use a CTE with ROW_NUMBER() and either two LEFT JOINs or a PIVOT, like this.
Query with Multiple Left Joins
WITH CTE as
(
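-- number the owners within each MatchID, so r restarts at 1 for every parent row
SELECT MatchID,Owner,ROW_NUMBER()OVER(PARTITION BY MatchID ORDER BY Owner) r FROM t2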
)
select T1.ID, T1.Person, t2.Owner as Owner1, t3.Owner as Owner2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID AND T2.r = 1
LEFT JOIN CTE T3
ON T1.id = T3.MatchID AND T3.r = 2;
Query with PIVOT
WITH CTE as
(
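-- number the owners within each MatchID, so R restarts at 1 for every parent row
SELECT MatchID,Owner,ROW_NUMBER()OVER(PARTITION BY MatchID ORDER BY Owner) R FROM t2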
)
SELECT ID, Person,O1,O2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID
PIVOT(MAX(Owner) FOR R IN (1 as O1,2 as O2));
Output
ID PERSON OWNER1 OWNER2
1 John Maxwell Peter
If you know there are at most two matches, you can also use aggregation:
Select T1.ID, T1.Person1,
MIN(T2.Owner) as Owner1,
(CASE WHEN MIN(t2.Owner) <> MAX(t2.Owner) THEN MAX(t2.Owner) END) as Owner2
From T1 Left Join
T2
on T1.ID = T2.MatchID
Group By t1.ID, t1.Person1;

SQL alternative to sub-query in SELECT Item list

I have RDBMS tables and queries that work perfectly. I have offloaded the data from the RDBMS to Hive tables. To run the existing queries on Hive, we first need to make them compatible with Hive.
Take the examples below, which use sub-queries in the select item list. They are syntactically valid and work fine on the RDBMS, but they will not work on Hive: as per the Hive manual, Hive supports subqueries only in the FROM and WHERE clauses.
Example 1 :
SELECT t1.one
,(SELECT t2.two
FROM TEST2 t2
WHERE t1.one=t2.two) t21
,(SELECT t3.three
FROM TEST3 t3
WHERE t1.one=t3.three) t31
FROM TEST1 t1 ;
Example 2:
SELECT a.*
, CASE
WHEN EXISTS
(SELECT 1
FROM tblOrder O
INNER JOIN tblProduct P
ON O.Product_id = P.Product_id
WHERE O.customer_id = C.customer_id
AND P.Product_Type IN (2, 5, 6, 9)
)
THEN 1
ELSE 0
END AS My_Custom_Indicator
FROM tblCustomer C
INNER JOIN tblOtherStuff S
ON C.CustomerID = S.CustomerID ;
Example 3 :
Select component_location_id, component_type_code,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'AXLE'
and component_location_id = cl.component_location_id ) as AXLE,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'SIDE'
and component_location_id = cl.component_location_id ) as SIDE
from stg_dev.component_locations cl ;
I want to know the possible alternatives to sub-queries in the select item list that would make these queries compatible with Hive, so that I can transform the existing queries into a form Hive accepts.
Any help and guidance is highly appreciated !
The query you provided could be transformed to a simple query with LEFT JOINs.
SELECT
t1.one, t2.two AS t21, t3.three AS t31
FROM
TEST1 t1
LEFT JOIN TEST2 t2
ON t1.one = t2.two
LEFT JOIN TEST3 t3
ON t1.one = t3.three
Since the subqueries carry no additional restrictions, the joins will return the same data, provided each subquery returns one row or none for each row in TEST1.
Please note that your original query could not handle 1..n relationships. In most DBMSs, a subquery in the SELECT list must return a result set with exactly one column and at most one row.
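The same idea can be sketched for Examples 3 and 2 of the question. This is an untested sketch, not a verified Hive script: it moves each subquery into the FROM clause as a derived table, which is the placement the Hive manual quoted in the question allows. The aliases axle_v, side_v and x are made up here. It assumes each component_location_id has at most one 'AXLE' and one 'SIDE' row (otherwise the join multiplies rows), and it selects C.* because the question's a.* has no matching alias.

```sql
-- Example 3: one pre-filtered derived table per former scalar subquery.
SELECT cl.component_location_id,
       cl.component_type_code,
       axle_v.LOCATION_VALUE AS AXLE,
       side_v.LOCATION_VALUE AS SIDE
FROM stg_dev.component_locations cl
LEFT OUTER JOIN (
    SELECT component_location_id, LOCATION_VALUE
    FROM stg_dev.component_location_values
    WHERE identifier_code = 'AXLE'
) axle_v ON axle_v.component_location_id = cl.component_location_id
LEFT OUTER JOIN (
    SELECT component_location_id, LOCATION_VALUE
    FROM stg_dev.component_location_values
    WHERE identifier_code = 'SIDE'
) side_v ON side_v.component_location_id = cl.component_location_id;

-- Example 2: the EXISTS test becomes "did the outer join find a row?".
-- DISTINCT keeps one row per customer so the join cannot duplicate customers.
SELECT C.*,
       CASE WHEN x.customer_id IS NOT NULL THEN 1 ELSE 0 END AS My_Custom_Indicator
FROM tblCustomer C
INNER JOIN tblOtherStuff S ON C.CustomerID = S.CustomerID
LEFT OUTER JOIN (
    SELECT DISTINCT O.customer_id
    FROM tblOrder O
    INNER JOIN tblProduct P ON O.Product_id = P.Product_id
    WHERE P.Product_Type IN (2, 5, 6, 9)
) x ON x.customer_id = C.customer_id;
```

The column names are copied from the question, including its mix of customer_id and CustomerID on tblCustomer, so adjust them to the real schema.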
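Based on the Hive manual, the following forms are accepted. Note that, unlike the LEFT JOIN version, they drop TEST1 rows that have no match in TEST2 or TEST3: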
SELECT t1.one, t2.two, t3.three
FROM TEST1 t1,TEST2 t2, TEST3 t3
WHERE t1.one=t2.two AND t1.one=t3.three;
SELECT t1.one,t2.two,t3.three FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one=t2.two
INNER JOIN TEST3 t3 ON t1.one=t3.three
WHERE t1.one=t2.two AND t1.one=t3.three;
SELECT t1.one,t2.two as t21,t3.three as t31 FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one=t2.two
INNER JOIN TEST3 t3 ON t1.one=t3.three

T-SQL: Additional predicates on JOINs vs. the WHERE clause

Is there any difference between putting additional predicates on a JOIN statement vs. adding them as additional clauses in the WHERE statement?
Example 1: Predicate on the WHERE clause
select emp.*
from Employee emp
left join [Order] o on emp.Id = o.EmployeeId
where o.Cancelled = 0
Example 2: Predicate on the JOIN statement
select emp.*
from Employee emp
left join [Order] o on emp.Id = o.EmployeeId and o.Cancelled = 0
With the first statement, the outer join is effectively turned into an inner join by the WHERE condition: it filters out all rows from the Employee table for which no order was found (because o.Cancelled is NULL for those rows).
So the two statements don't do the same thing.
I already got the answers from some of my colleagues, but in case they don't post it here, I'll add an answer myself.
Both of these examples assume that the predicate is comparing a column on the "right" table with a scalar value.
Performance
It seems that if the predicate is on the JOIN, the "right" table is filtered in advance. If the predicate is part of the WHERE clause, all joined rows come back and are filtered once at the end, before the result set is returned. This describes the logical behaviour; the optimizer may still choose a different physical order.
Data Returned
If the predicate is part of the WHERE clause, then in the situation where the "right" value is NULL (i.e. there is no joining row), the entire row is left out of the final result set, because comparing the value with NULL yields UNKNOWN rather than true.
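A small self-contained sketch of that difference, using made-up inline data and hypothetical names (the Orders table name and the Name column are not from the question): Ann has an active order, Bob has only a cancelled order, and Cy has no orders at all.

```sql
WITH Employee(Id, Name) AS (
    SELECT 1, 'Ann' UNION ALL
    SELECT 2, 'Bob' UNION ALL
    SELECT 3, 'Cy'
), Orders(EmployeeId, Cancelled) AS (
    SELECT 1, 0 UNION ALL
    SELECT 2, 1
)
-- Predicate in WHERE: rows whose o.Cancelled is NULL or 1 are discarded after the join
SELECT emp.Name, o.Cancelled, 'WHERE' AS Clause
FROM Employee emp
LEFT JOIN Orders o ON emp.Id = o.EmployeeId
WHERE o.Cancelled = 0
UNION ALL
-- Predicate in ON: it only decides which orders match; every employee is kept
SELECT emp.Name, o.Cancelled, 'ON'
FROM Employee emp
LEFT JOIN Orders o ON emp.Id = o.EmployeeId AND o.Cancelled = 0;
```

The WHERE form returns only Ann. The ON form returns all three employees, with NULL in the Cancelled column for Bob and Cy.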
Just to address the case that the additional predicate is on a column from the left hand table this can still make a difference as shown below.
WITH T1(N) AS
(
SELECT 1 UNION ALL
SELECT 2
), T2(N) AS
(
SELECT 1 UNION ALL
SELECT 2
)
SELECT T1.N, T2.N, 'ON' AS Clause
FROM T1
LEFT JOIN T2 ON T1.N = T2.N AND T1.N=1
UNION ALL
SELECT T1.N, T2.N, 'WHERE' AS Clause
FROM T1
LEFT JOIN T2 ON T1.N = T2.N
WHERE T1.N=1
Returns
N N Clause
----------- ----------- ------
1 1 ON
2 NULL ON
1 1 WHERE
Here is another example (four cases):
create table #tmp (id int, name varchar(10))
insert into #tmp values (1, 'A')
insert into #tmp values (2, 'B')
select 'first Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
and a.id = 1
union all
select 'second Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
where a.id = 1
union all
select 'Third Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
and b.id = 1
union all
select 'Fourth Query', a.*, b.* from #tmp a LEFT JOIN #tmp b
on a.id = b.id
where b.id = 1
Results:
first Query   1  A  1     A
first Query   2  B  NULL  NULL
second Query  1  A  1     A
Third Query   1  A  1     A
Third Query   2  B  NULL  NULL
Fourth Query  1  A  1     A

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have 2 tables that are related in a 1 to many relationship. I need to select all records in the "1" table that have fewer than three records in the "many" table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
This doesn't work at all, I know, but I am at a loss as to how to do this.
In the end I need to select all the records from the "a" table based on this criterion. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
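One caveat with the INNER JOIN form is that parents with no children at all never reach the GROUP BY, although zero is also fewer than three. A minimal sketch of a LEFT JOIN variant follows, using made-up inline data so that it runs on its own; the alias childCount is only illustrative.

```sql
WITH table1(id) AS (
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3
), table2(parentId) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1   -- parent 1: three children
    UNION ALL SELECT 2                               -- parent 2: one child
)                                                    -- parent 3: no children
SELECT t1.id, COUNT(t2.parentId) AS childCount
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.parentId        -- keeps childless parents
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3;                       -- COUNT(column) skips the NULLs
```

This returns parent 2 with a count of 1 and parent 3 with a count of 0; with an INNER JOIN only parent 2 would be returned.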
You didn't mention which version of SQL Server you're using. If you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
SELECT A.Id, COUNT(B.Id) AS BCount
FROM
dbo.TableA A
INNER JOIN
dbo.TableB B ON B.TableAId = A.Id
GROUP BY A.Id
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
WHERE R.BCount < 3
The CTE lists the Id values from TableA together with the count of their child rows (using the INNER JOIN to TableB, grouped by A.Id). The outer SELECT builds on that result set, shows all fields from TableA plus the count from TableB, and keeps only the rows with fewer than three children.
If you want to return all fields of your "1" table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
In some particular cases, depending on your indexes and database structure, this query may run up to 4 times faster than the GROUP BY method.
You posted this question under SQL Server; my answer was written for Oracle (I don't know whether it will run on SQL Server as well). It is as follows:
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
from b
where b.foreignkey in (select a.id from a) ) x
where c_1 < 3 ;
I hope it works on SQL Server as well. If not, please let me know and I will update the answer.