My question is: which of the queries below would be faster?
Query 1
SELECT t1_col1, t1_col2, t2_col2
FROM t1 JOIN t2
ON t1.t1_col1 = t2.t2_col1
Query 2
SELECT t1_col1, t1_col2, t2_col2
FROM
(SELECT t1_col1, t1_col2
FROM t1) t1 JOIN
(SELECT t2_col1, t2_col2
FROM t2) t2
ON t1.t1_col1 = t2.t2_col1
Assume both tables t1 and t2 have 1M+ records and more than 15 columns. Also, let's say there are no indexes on any columns.
I would go for approach 2, as it seems less data would be loaded into memory. But doesn't SQL Server manage that internally?
I am on PDW 2012.
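The derived tables in Query 2 only project columns, so both queries must return the same rows. Below is a minimal sketch of that check. It uses Python's built-in sqlite3 module, with SQLite standing in for PDW, so any plan it prints shows only how SQLite treats the derived tables, not SQL Server. The extra t1_col3/t2_col3 columns and all rows are made up.

```python
import sqlite3

# SQLite stands in for PDW here; table/column names follow the question,
# while the extra *_col3 columns and all rows are made up.
QUERY_1 = """
SELECT t1_col1, t1_col2, t2_col2
FROM t1 JOIN t2
  ON t1.t1_col1 = t2.t2_col1
"""

QUERY_2 = """
SELECT t1_col1, t1_col2, t2_col2
FROM (SELECT t1_col1, t1_col2 FROM t1) t1
JOIN (SELECT t2_col1, t2_col2 FROM t2) t2
  ON t1.t1_col1 = t2.t2_col1
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    # No indexes are created, matching the assumption in the question.
    conn.execute("CREATE TABLE t1 (t1_col1 INTEGER, t1_col2 TEXT, t1_col3 TEXT)")
    conn.execute("CREATE TABLE t2 (t2_col1 INTEGER, t2_col2 TEXT, t2_col3 TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                     [(i, f"a{i}", "unused") for i in range(100)])
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                     [(i, f"b{i}", "unused") for i in range(0, 100, 2)])
    return conn


def run(conn, sql):
    # Sort so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())


def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = build_db()
print("same rows:", run(conn, QUERY_1) == run(conn, QUERY_2))
for label, sql in (("Q1", QUERY_1), ("Q2", QUERY_2)):
    for detail in plan(conn, sql):
        print(label, detail)
```

SQL Server's optimizer generally expands derived tables inline rather than materializing them, so the two forms commonly get the same plan. Comparing the actual execution plans on PDW is the reliable check.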
Related
How can I improve the performance of joining two big tables and sorting by the first table's unique index?
I need only the first table's data, sorted. Without the ORDER BY, the query is very fast.
Here are the example queries:
select a.* from T1 a, T2 b where a.c1 = b.c2;
select a.* from T1 a, T2 b where a.c1 = b.c2 order by a.id;
Just FYI, T1 and T2 have the proper indexes.
T1 has 54483938 rows and T2 has 54483820 rows.
I am mainly interested in the T1 rows that have a matching record in T2, sorted by T1's id.
I tried a query using the IN operator; it took about 300 seconds.
You can try the three forms of the query:
join (which you have)
in (which you claim to have run)
exists
The exists version is:
select a.*
from T1 a
where exists (select 1 from T2 b where a.c1 = b.c2)
order by a.id;
For this query, I would recommend indexes on T1(id, c1) and T2(c2).
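The three forms are not interchangeable when T2.c2 contains duplicates. The join repeats a T1 row once per matching T2 row, while IN and EXISTS return each matching T1 row once. The sketch below uses Python's sqlite3 to show the difference. SQLite is only a stand-in, the rows and index names are hypothetical, and the indexes mirror the ones recommended above.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, c1 INTEGER)")
    conn.execute("CREATE TABLE T2 (c2 INTEGER)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    # c2 = 10 appears twice, so T1 row 1 has two matches in T2.
    conn.executemany("INSERT INTO T2 VALUES (?)", [(10,), (10,), (30,)])
    # Hypothetical index names for the indexes suggested in the answer.
    conn.execute("CREATE INDEX ix_t1_id_c1 ON T1 (id, c1)")
    conn.execute("CREATE INDEX ix_t2_c2 ON T2 (c2)")
    return conn


JOIN_SQL = "select a.* from T1 a, T2 b where a.c1 = b.c2 order by a.id"
IN_SQL = "select a.* from T1 a where a.c1 in (select b.c2 from T2 b) order by a.id"
EXISTS_SQL = """
select a.*
from T1 a
where exists (select 1 from T2 b where a.c1 = b.c2)
order by a.id
"""

conn = build_db()
for name, sql in (("join", JOIN_SQL), ("in", IN_SQL), ("exists", EXISTS_SQL)):
    print(name, conn.execute(sql).fetchall())
```

If T1 rows should appear only once, EXISTS (or IN) matches that intent directly, and the duplicate rows from the join do not have to be sorted.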
I have two tables, for example table1 and table2, as below:
Table1(id, desc )
Table2(id, col1, col2.. col10.....)
col1 to col10 in table2 can each be linked to the id field in table1.
I wrote a query with 10 instances of table1 (one to link each of col1 to col10 of table2):
select t2.id, t1_1.desc, t1_2.desc,.....t1_10.desc from table2 t2
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
where t2.id ='111'
This query is inside a stored procedure, and when I execute the SP in SSMS, it works without any problems.
However, when my web application runs it, the query works for some WHERE clause values and hangs for others.
I checked the cost of the query and created one nonclustered index on these 10 columns in table2. The cost of the joins dropped to 0. However, the query still hangs.
table1 has 500 rows and table2 has 700 rows.
Can anyone help?
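The next sketch is a scaled-down version of the lookup pattern in the question. It uses Python's sqlite3, and SQLite says nothing about SQL Server plan caching. table2 is cut down to col1..col3, and the rows are hypothetical. The sketch shows that each colN is resolved against its own alias of table1, and that a NULL or unmatched column yields NULL rather than dropping the row. Note that desc is a reserved word, so it is quoted as an identifier.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    # "desc" is a reserved word, so it must be quoted as a column name.
    conn.execute('CREATE TABLE table1 (id INTEGER PRIMARY KEY, "desc" TEXT)')
    conn.execute("CREATE TABLE table2 (id TEXT, col1 INTEGER, col2 INTEGER, col3 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                     [(1, "red"), (2, "green"), (3, "blue")])
    # col3 is NULL, so its LEFT JOIN finds no match and returns NULL.
    conn.execute("INSERT INTO table2 VALUES ('111', 1, 3, NULL)")
    return conn


LOOKUP_SQL = """
select t2.id, t1_1."desc", t1_2."desc", t1_3."desc"
from table2 t2
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
where t2.id = ?
"""

conn = build_db()
# The filter value is bound as a parameter, as a stored procedure would receive it.
print(conn.execute(LOOKUP_SQL, ("111",)).fetchall())
```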
First of all, why are you rejoining to the table 10 times rather than one join with 10 predicates?
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
vs.
left outer join table1 t1 on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
I just wanted to bring that up because it's very unusual to rejoin to the same table 10 times like that.
As far as your query plan goes, SQL Server sniffs the parameter values used when the procedure is first compiled and caches that plan for future executions. That plan can be good for some WHERE clause values and bad for others, which is why the query sometimes performs well and sometimes does not. If your table columns are skewed (some WHERE clause values occur far more often than others), you could consider adding OPTION (RECOMPILE) to the query to force a new execution plan on every call. This has pros and cons; see the question "OPTION (RECOMPILE) is Always Faster; Why?" for a discussion.
I have searched but have not found a definitive answer. Which of these is better for performance in SQL Server:
SELECT T.*
FROM dbo.Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL
OR T3.Table1ID IS NOT NULL
or...
SELECT T.*
FROM dbo.Table1 T
JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.*
FROM dbo.Table1 T
JOIN Table3 T3 ON T.ID = T3.Table1ID
I have tried running both but it's hard to tell for sure. I'd appreciate an explanation of why one is faster than the other, or if it depends on the situation.
Your two queries do not do the same thing. In particular, the first will return duplicate rows if Table1Id values are duplicated in either Table2 or Table3.
If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:
select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);
Also, create indexes on Table1Id in both Table2 and Table3.
Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (UNION versus UNION ALL). On the other hand, the first might end up creating many duplicate rows.
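The next sketch shows the duplicate behaviour described here. It uses Python's sqlite3, with SQLite rather than SQL Server. The Name column, the rows, and the index names are hypothetical. All three queries run on tiny tables where Table1 row 1 matches two Table2 rows.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE Table2 (Table1ID INTEGER)")
    conn.execute("CREATE TABLE Table3 (Table1ID INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    # ID 1 matches two Table2 rows and one Table3 row; ID 2 matches only Table3.
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(1,), (1,)])
    conn.executemany("INSERT INTO Table3 VALUES (?)", [(1,), (2,)])
    # Indexes on Table1ID, as the answer recommends (names are hypothetical).
    conn.execute("CREATE INDEX ix_t2 ON Table2 (Table1ID)")
    conn.execute("CREATE INDEX ix_t3 ON Table3 (Table1ID)")
    return conn


LEFT_JOIN_OR = """
SELECT T.* FROM dbo_placeholder
""".replace("dbo_placeholder", "Table1 T") + """
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL OR T3.Table1ID IS NOT NULL
"""
UNION_SQL = """
SELECT T.* FROM Table1 T JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.* FROM Table1 T JOIN Table3 T3 ON T.ID = T3.Table1ID
"""
EXISTS_SQL = """
SELECT t1.* FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.Table1ID = t1.ID)
   OR EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.Table1ID = t1.ID)
"""

conn = build_db()
for name, sql in (("left join/or", LEFT_JOIN_OR), ("union", UNION_SQL), ("exists", EXISTS_SQL)):
    # Sorted because none of these queries has an ORDER BY.
    print(name, sorted(conn.execute(sql).fetchall()))
```

The LEFT JOIN/OR form returns row 1 twice. UNION and EXISTS return each qualifying Table1 row once: UNION does so by removing duplicates after the joins, while EXISTS never produces them.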
A question came to mind while I was tuning a stored procedure:
I have two tables, table1 and table2. table1 contains a lot of data and table2 contains much less. Is there any performance difference between these two queries (I am changing the order of the tables)?
Query1:
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col2
Query2:
SELECT t1.col1, t2.col2
FROM table2 t2
INNER JOIN table1 t1
ON t1.col1=t2.col2
We are using Microsoft SQL Server 2005.
Aliases and the order of the tables in the join (assuming it's an INNER JOIN) don't affect the result. And because the optimizer is free to reorder the tables (if needed) when it builds the execution plan, the written order doesn't affect performance either.
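The result half of this claim is easy to check. The sketch below uses Python's sqlite3, with SQLite standing in for SQL Server 2005, and the row counts are made up. It runs both orderings and compares the rows as multisets, since without ORDER BY the row order is not guaranteed.

```python
import sqlite3
from collections import Counter


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER)")
    conn.execute("CREATE TABLE table2 (col2 INTEGER)")
    # table1 plays the "huge" table (each value 0..49 appears 20 times),
    # table2 the small one (10 values).
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i % 50,) for i in range(1000)])
    conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in range(0, 50, 5)])
    return conn


QUERY_1 = """SELECT t1.col1, t2.col2
FROM table1 t1 INNER JOIN table2 t2 ON t1.col1 = t2.col2"""
QUERY_2 = """SELECT t1.col1, t2.col2
FROM table2 t2 INNER JOIN table1 t1 ON t1.col1 = t2.col2"""

conn = build_db()
# Compare as multisets: duplicates count, row order does not.
rows_1 = Counter(conn.execute(QUERY_1).fetchall())
rows_2 = Counter(conn.execute(QUERY_2).fetchall())
print(rows_1 == rows_2, sum(rows_1.values()))
```

This only shows that the results match. Whether the plans match has to be confirmed from the actual execution plans in SQL Server.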
You can read more about the basic concepts of relational algebra here:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators
example1:
select T1.*, T2.*
from TABLE1 T1, TABLE2 T2
where T1.id = T2.id
and T1.name = 'foo'
and T2.name = 'bar';
Will that first join T1 and T2 by id and then select the records that satisfy the name conditions?
Or will it first select the records in T1 and T2 that satisfy the name conditions and then join those together?
And is there a difference in performance between example1 and example2 (DB2)?
example2:
select *
from
(
select * from TABLE1 T1 where T1.name = 'foo'
) A,
(
select * from TABLE2 T2 where T2.name = 'bar'
) B
where A.id = B.id;
How the query will be executed depends on what the query planner does with it. Depending on the available indexes and how much data is in the tables the query plan may look different. The planner tries to do the work in the order that it thinks is most efficient.
If the planner does a good job, the plan for both queries should be the same; otherwise the first query is likely to be faster, because the second would create two intermediate results that don't have any indexes.
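Whatever plan the optimizer picks, the two examples return the same rows. The sketch below uses Python's sqlite3, with SQLite rather than DB2, and the rows are hypothetical. It checks that the results match and prints SQLite's plan for each query so the two can be compared.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE TABLE2 (id INTEGER, name TEXT)")
    # Only id 1 satisfies both name filters.
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?)",
                     [(1, "foo"), (2, "foo"), (3, "baz")])
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?)",
                     [(1, "bar"), (2, "qux"), (3, "bar")])
    return conn


EXAMPLE_1 = """
select T1.*, T2.*
from TABLE1 T1, TABLE2 T2
where T1.id = T2.id
  and T1.name = 'foo'
  and T2.name = 'bar'
"""

EXAMPLE_2 = """
select *
from (select * from TABLE1 T1 where T1.name = 'foo') A,
     (select * from TABLE2 T2 where T2.name = 'bar') B
where A.id = B.id
"""

conn = build_db()
for label, sql in (("example1", EXAMPLE_1), ("example2", EXAMPLE_2)):
    print(label, sorted(conn.execute(sql).fetchall()))
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[-1])
```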
Example 1 is more efficient because it has no embedded queries. As for how the result set is built, I have no idea; I don't know DB2.