A question suddenly came to my mind while I was tuning one stored procedure. Let me ask it -
I have two tables, table1 and table2. table1 contains a huge amount of data and table2 contains much less. Is there any performance difference between these two queries (I am only changing the order of the tables)?
Query1:
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col2
Query2:
SELECT t1.col1, t2.col2
FROM table2 t2
INNER JOIN table1 t1
ON t1.col1=t2.col2
We are using Microsoft SQL Server 2005.
Aliases and the order of the tables in the join (assuming it's an INNER JOIN) don't affect the final outcome, and thus don't affect performance, since the optimizer reorders the tables (if needed) when the query is executed.
You can read some more basic concepts about relational algebra here:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators
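As a rough illustration of the point about results, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server 2005 purely so the example is self-contained, and the row counts and values are made up. It only shows that the two orderings return the same rows; it says nothing about the plans SQL Server would pick.

```python
import sqlite3


def join_order_results():
    """Run the same inner join with the tables listed in both orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (col1 INTEGER);
        CREATE TABLE table2 (col2 INTEGER);
    """)
    # table1 plays the "huge" table and table2 the small one (illustrative sizes).
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1000)])
    conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in (3, 5, 8, 2000)])

    query1 = """
        SELECT t1.col1, t2.col2
        FROM table1 t1
        INNER JOIN table2 t2 ON t1.col1 = t2.col2
    """
    query2 = """
        SELECT t1.col1, t2.col2
        FROM table2 t2
        INNER JOIN table1 t1 ON t1.col1 = t2.col2
    """
    # Sort before comparing: without ORDER BY, row order is not guaranteed.
    rows1 = sorted(conn.execute(query1).fetchall())
    rows2 = sorted(conn.execute(query2).fetchall())
    conn.close()
    return rows1, rows2


rows1, rows2 = join_order_results()
print(rows1 == rows2)
```

To compare what SQL Server actually does, the usual approach is to look at the actual execution plans of both queries side by side.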
Related
I noticed today that this query
select * from table1 table2 where column_from_table1 = ?;
works. It works the same as (same columns return)
select * from table1 where column_from_table1 = ?;
Shouldn't the former be a syntax error? What is it interpreting table2 as?
It appears to be interpreting it as renaming the table. Even though table2 exists, it happily allows the rename. This also works:
select * from table1 asdf where asdf.column_from_table1 = ?;
select * from table1 table2 where column_from_table1 = ?;
table2 is working as a table alias for table1. It's not being used as the name of an object in the database at all. The fact that a table named table2 exists is wholly irrelevant to this query. Usually you'd see something like this:
select t.id, t.name from table1 t where t.column_from_table1 = ?;
Some RDBMSs require the as keyword, so you'll also see this:
SELECT t.id, t.name FROM table1 AS t WHERE t.column_from_table1 = ?;
Table aliases are useful for making queries with multiple tables easier to write, especially if they have shared column names which need to be qualified. They're also essential for self-joins where a table is joined to itself.
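The aliasing behaviour can be checked with a small sketch, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in engine; the column names other than column_from_table1 are invented for the demo. A real table2 exists with different columns, yet the query returns table1's columns, because table2 is only an alias here.

```python
import sqlite3


def alias_demo():
    """Show that `FROM table1 table2` aliases table1 even if table2 exists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, column_from_table1 TEXT);
        CREATE TABLE table2 (other_id INTEGER, other_col TEXT);
        INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
        INSERT INTO table2 VALUES (10, 'unrelated');
    """)
    # Inside this query, "table2" names table1; the real table2 is never read.
    cur = conn.execute(
        "SELECT * FROM table1 table2 WHERE table2.column_from_table1 = ?",
        ("x",),
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    conn.close()
    return columns, rows


columns, rows = alias_demo()
print(columns, rows)
```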
Example of a join using aliases:
SELECT t1.Id,
       t1.Name AS t1_Name,
       t2.Name AS t2_Name
FROM table1 t1
JOIN table2 t2
  ON t1.id = t2.id
WHERE t1.column_from_table1 = ?;
Or, for a self-join to look for duplicate Name values, for example:
SELECT t1.Name,
       t1.Id,
       t2.Id AS Dupe_Id
FROM table1 t1
JOIN table1 t2
  ON t1.Name = t2.Name
WHERE t1.Id < t2.Id;
Notice that this query is referring to table1 twice and uses the aliases of t1 and t2 to differentiate which it's referring to.
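A minimal runnable sketch of the duplicate-finding self-join, assuming SQLite through Python's sqlite3 and made-up sample rows. The helper name find_duplicate_names is hypothetical, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def find_duplicate_names(rows):
    """Return (Name, Id, Dupe_Id) for each pair of rows sharing a Name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO table1 (Id, Name) VALUES (?, ?)", rows)
    dupes = conn.execute("""
        SELECT t1.Name, t1.Id, t2.Id AS Dupe_Id
        FROM table1 t1
        JOIN table1 t2 ON t1.Name = t2.Name
        WHERE t1.Id < t2.Id          -- report each pair once, never a row with itself
        ORDER BY t1.Id, t2.Id
    """).fetchall()
    conn.close()
    return dupes


sample = [(1, "Ann"), (2, "Bob"), (3, "Ann"), (4, "Cy"), (5, "Ann")]
print(find_duplicate_names(sample))
# [('Ann', 1, 3), ('Ann', 1, 5), ('Ann', 3, 5)]
```

The t1.Id < t2.Id condition is what keeps a row from matching itself and keeps each pair from appearing twice.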
Note that a comma join, such as FROM table1, table2 WHERE table1.id = table2.id, is very old syntax that should be avoided when writing queries. The older syntax is difficult to read and maintain and doesn't support outer joins except through vendor-specific extensions. The newer syntax with the JOIN keyword was introduced in standard SQL in 1992. There's no reason to still be using comma joins.
SELECT .... FROM TABLE1 T1, TABLE2 T2, TABLE3 T3
WHERE T1.NAME = 'ABC' AND T1.ID = T2.COL_ID AND T2.COL1 = T3.COL2
vs
SELECT .... FROM TABLE1 T1
WHERE T1.NAME = 'ABC'
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
Two questions
In terms of performance, which will perform better and why?
If Option 2 has the better performance, when should be using Option 1? (vice versa question if Option 1 has better performance)
The second query is not correct. It should be:
SELECT .... FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
WHERE T1.NAME = 'ABC'
This is the right way to write your join condition. The first one is accepted, but technically creates a Cartesian product. All modern databases handle both the first and second queries well and interpret them the same way, so performance should be the same. Still, you should use the second one because it is more readable and gives you a single way to write a join, whether it is an inner, left, or full outer join.
The answer is easy: Don't use comma-separated joins (first query). We used these in the 1980s for lack of something better, but then in 1992 the new syntax (second query) was introduced [1], because the old syntax was error-prone (it was easy to forget to apply join criteria), harder to maintain (were missing join criteria intended or not?), and had no standard syntax for outer joins.
[1] Oracle was a little late in featuring the new syntax, though. They introduced the new ANSI joins in Oracle 9i in 2001.
In terms of performance: There should be no difference in speed, because DBMS optimizers see that this is essentially the same query.
Your second query is syntactically incorrect by the way. The query's WHERE clause belongs after the complete FROM clause, i.e. after all the joins:
SELECT ....
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.col_id
INNER JOIN table3 t3 ON t2.col1 = t3.col2
WHERE t1.name = 'ABC';
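The two points above (same result either way, but comma joins make it easy to lose a join condition) can be sketched as follows. This assumes SQLite via Python's sqlite3 as a stand-in engine; the label column and all sample rows are invented for the demo.

```python
import sqlite3


def compare_join_syntaxes():
    """Compare comma-join and explicit-JOIN results, plus a forgotten condition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, name TEXT);
        CREATE TABLE table2 (col_id INTEGER, col1 INTEGER);
        CREATE TABLE table3 (col2 INTEGER, label TEXT);
        INSERT INTO table1 VALUES (1, 'ABC'), (2, 'XYZ'), (3, 'ABC');
        INSERT INTO table2 VALUES (1, 100), (2, 200), (3, 300);
        INSERT INTO table3 VALUES (100, 'first'), (200, 'second'), (300, 'third');
    """)
    comma = """
        SELECT t1.id, t3.label
        FROM table1 t1, table2 t2, table3 t3
        WHERE t1.name = 'ABC' AND t1.id = t2.col_id AND t2.col1 = t3.col2
    """
    explicit = """
        SELECT t1.id, t3.label
        FROM table1 t1
        INNER JOIN table2 t2 ON t1.id = t2.col_id
        INNER JOIN table3 t3 ON t2.col1 = t3.col2
        WHERE t1.name = 'ABC'
    """
    # Same comma join with the t2/t3 condition forgotten: still valid SQL,
    # but every matching row is now paired with every row of table3.
    forgotten = """
        SELECT t1.id, t3.label
        FROM table1 t1, table2 t2, table3 t3
        WHERE t1.name = 'ABC' AND t1.id = t2.col_id
    """
    comma_rows = sorted(conn.execute(comma).fetchall())
    explicit_rows = sorted(conn.execute(explicit).fetchall())
    forgotten_count = len(conn.execute(forgotten).fetchall())
    conn.close()
    return comma_rows, explicit_rows, forgotten_count


comma_rows, explicit_rows, forgotten_count = compare_join_syntaxes()
print(comma_rows == explicit_rows, forgotten_count)
```

With the explicit syntax, leaving out an ON clause is a syntax error in most engines, which is one reason the mistake is harder to make.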
Is there any performance problem if the operations are written in a different order?
For example:
1. All inner joins first, then all WHERE conditions afterward.
select * from
t1
inner join t2 on t1.t2Id = t2.Id
inner join t3 on t1.t3Id = t3.Id
inner join t4 on t2.t4Id = t4.Id
where
t1.Id in (1,2,3,4,5)
and t2.Id in (1,2,3,4,5,6,7)
and t3.Name like '%a'
2. Each table filtered by its own WHERE condition first, then the inner joins.
select * from
(select * from t1 where t1.Id in (1,2,3,4,5)) a
inner join (select * from t2 where t2.Id in (1,2,3,4,5,6,7)) a1 on a.t2Id = a1.Id
inner join (select * from t3 where t3.Name like '%a') a2 on a.t3Id = a2.Id
inner join t4 on a1.t4Id = t4.Id
Could this affect query performance?
And what about the order of the WHERE conditions?
For example:
select * from t1
inner join t2 on t1.t2Id = t2.Id
where t1.t2Id in (1,2,3,4,5,6)
and t2.t3Id in (1,2,3,4,5)
A SQL query goes through three phases when it is run:
The query is parsed (and the various references are looked up).
The execution plan is created, with an optimization phase based on what the query needs to accomplish.
The query plan is executed.
As a result of the optimization, the way you write the query often has less effect on the performance than you might think. Lots of people have worked very hard on figuring out the best way to optimize queries -- and there are probably lots of things that you are not even aware of (such as different join algorithms, join ordering, pushing down expression evaluations, and so on).
For your examples, the SQL Server optimizer should produce the same execution plans. The engine is smart enough to realize that these are really doing the same thing.
Note: This is not true of all query engines. Some have pretty poor optimizers, and there would be differences in performance.
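As a sanity check that the two ways of writing the query are logically the same, here is a minimal sketch using SQLite through Python's sqlite3. The table layouts and sample rows are assumptions made up to fit the column names in the question. It confirms only that both forms return the same rows; the execution plans on SQL Server have to be inspected there.

```python
import sqlite3


def compare_filter_placement():
    """Run 'join then filter' and 'filter then join' and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (Id INTEGER, t2Id INTEGER, t3Id INTEGER);
        CREATE TABLE t2 (Id INTEGER, t4Id INTEGER);
        CREATE TABLE t3 (Id INTEGER, Name TEXT);
        CREATE TABLE t4 (Id INTEGER);
        INSERT INTO t4 VALUES (1), (2);
        INSERT INTO t2 VALUES (1, 1), (2, 2), (8, 1);
        INSERT INTO t3 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gam');
        INSERT INTO t1 VALUES (1, 1, 1), (2, 2, 3), (3, 8, 2), (6, 1, 1), (4, 2, 2);
    """)
    joins_then_where = """
        SELECT t1.Id
        FROM t1
        INNER JOIN t2 ON t1.t2Id = t2.Id
        INNER JOIN t3 ON t1.t3Id = t3.Id
        INNER JOIN t4 ON t2.t4Id = t4.Id
        WHERE t1.Id IN (1,2,3,4,5)
          AND t2.Id IN (1,2,3,4,5,6,7)
          AND t3.Name LIKE '%a'
    """
    filtered_subqueries = """
        SELECT a.Id
        FROM (SELECT * FROM t1 WHERE t1.Id IN (1,2,3,4,5)) a
        INNER JOIN (SELECT * FROM t2 WHERE t2.Id IN (1,2,3,4,5,6,7)) a1
                ON a.t2Id = a1.Id
        INNER JOIN (SELECT * FROM t3 WHERE t3.Name LIKE '%a') a2
                ON a.t3Id = a2.Id
        INNER JOIN t4 ON a1.t4Id = t4.Id
    """
    # Sort before comparing: without ORDER BY, row order is not guaranteed.
    first = sorted(conn.execute(joins_then_where).fetchall())
    second = sorted(conn.execute(filtered_subqueries).fetchall())
    conn.close()
    return first, second


first, second = compare_filter_placement()
print(first == second)
```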
Sorry for the bad heading.
My question is which of the below queries would be faster?
Query 1
SELECT t1_col1, t1_col2, t2_col2
FROM t1 JOIN t2
ON t1.t1_col1 = t2.t2_col1
Query 2
SELECT t1_col1, t1_col2, t2_col2
FROM
(SELECT t1_col1, t1_col2
FROM t1) t1 JOIN
(SELECT t2_col1, t2_col2
FROM t2) t2
ON t1.t1_col1 = t2.t2_col1
Assume both the tables t1 and t2 have 1 M+ records and more than 15 columns. Also let's just say there are no indexes on any columns.
I lean toward approach 2, as it seems less data would be loaded into memory. But doesn't SQL Server manage that internally?
I am on PDW 2012.
I have two tables, for example table1 and table2, as below:
Table1(id, desc )
Table2(id, col1, col2.. col10.....)
col1 to col10 in table2 can each be linked to the id field in table1.
I wrote a query that has 10 instances of table1 (one for each of col1 to col10 of table2):
select t2.id, t1_1.desc, t1_2.desc,.....t1_10.desc from table2 t2
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
where t2.id ='111'
This query is inside a stored procedure, and when I execute the stored procedure in SSMS, it works without any problems.
However, when my web application runs it, the query works for some WHERE clause values and hangs for others.
I have checked the cost of the query and created one nonclustered index on these 10 columns in table2. The cost of the joins dropped to 0. However, I am still seeing the query hang.
table1 has 500 rows and table2 has 700 rows.
Can anyone help?
First of all, why are you rejoining to the table 10 times rather than using one join with 10 predicates?
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
vs.
left outer join table1 t1 on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
I just wanted to bring that up because it's very unusual to rejoin to the same table like that 10 times.
As far as your query plan goes, SQL Server sniffs the first parameter value used in the query and caches the resulting query plan for use in future executions. That plan can be good for certain WHERE clause values and bad for others, which is why the query sometimes performs well and sometimes does not. If the data in your columns is skewed (some WHERE clause values match a much larger number of rows than others), you could consider using OPTION (RECOMPILE) in your query to force a new execution plan each time it is called. This has pros and cons; see the discussion in "OPTION (RECOMPILE) is Always Faster; Why?".