SQL Server query with multiple left outer joins hangs

I have two tables, table1 and table2, as below:
Table1(id, desc )
Table2(id, col1, col2.. col10.....)
Each of col1 to col10 in table2 can be linked to the id field in table1.
I wrote a query that has 10 instances of table1 (one each to link col1 to col10 of table2):
select t2.id, t1_1.desc, t1_2.desc,.....t1_10.desc from table2 t2
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
where t2.id ='111'
This query is inside a stored procedure, and when I execute the procedure in SSMS, it works without any problems.
However, when my web application runs it, the query works for some WHERE clause values and hangs for others.
I checked the cost of the query and created one nonclustered index on these 10 columns in table2. The cost of the joins dropped to 0, but the query still hangs.
Table1 has 500 rows and table2 has 700 rows.
Can anyone help?

First of all, why are you rejoining to the table 10 times rather than one join with 10 predicates?
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
vs.
left outer join table1 t1 on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
I just wanted to bring that up because it's very unusual to rejoin to the same table 10 times like that.
As far as your query plan goes, SQL Server sniffs the parameter values used the first time the query is compiled and caches that query plan for use in future executions. This plan can be good for certain WHERE clause values and bad for others, which is why it sometimes performs well and other times does not. If the data in your table columns is skewed (some WHERE clause values occur far more often than others), you could consider adding OPTION (RECOMPILE) to your query to force a new execution plan each time it is called. This has pros and cons; see the discussion under "OPTION (RECOMPILE) is Always Faster; Why?".

Related

Query with multiple left join taking so much time to run

In Snowflake I have a table, Table_A, which is populated from 4 other tables (Table_1, Table_2, Table_3, Table_4) through various left join conditions. Each source table has around 20 million rows, and the query is expected to insert at least 10 million rows into Table_A.
I am using the query below, with multiple LEFT JOINs and OR conditions:
Insert into Table_A (x,y,z)
select "column names"
FROM "Table_1" T1
LEFT JOIN "Table_2" T2 ON T1.ID = T2.ID
LEFT JOIN "Table_3" T3 ON T1.ID = T3.ID or T2.ID = T3.ID
LEFT JOIN "Table_4" T4 ON T1.ID = T4.ID or T2.ID = T4.ID or T3.ID = T4.ID
The query with the above conditions takes a very long time. I tried LIMIT 5 and it took 5 minutes to insert just 5 rows (with warehouse size Large). I let it run without a limit and had to abort after 12 hours because it was still running. Is there any way to optimize this query or its join logic to reduce the run time? TIA
OR kills the optimization of JOINs. You could use USING to avoid this problem with outer joins. However, that is not necessary (and can be tricky if the join columns do not have the same name).
The joins in the chain are all LEFT JOINs, so every row from the first table is kept, and any matching T2.ID or T3.ID is equal to T1.ID anyway. So, just use the id from that table for all the joins:
Insert into Table_A (x,y,z)
SELECT "column names"
FROM "Table_1" T1 LEFT JOIN
"Table_2" T2
ON T1.ID = T2.ID LEFT JOIN
"Table_3" T3
ON T1.ID = T3.ID LEFT JOIN
"Table_4" T4
ON T1.ID = T4.ID;
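As a sanity check of the rewrite above, here is a minimal sketch that runs both versions of the join chain and compares the results. It uses Python's built-in sqlite3 module rather than Snowflake, and the sample IDs and the payload column `v` are made up for illustration; only the join shape comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data: tiny stand-ins for the 20-million-row tables.
sample_ids = {
    "Table_1": [1, 2, 3, 4],
    "Table_2": [1, 2],
    "Table_3": [2, 3, 5],
    "Table_4": [1, 3, 4, 6],
}
for name, ids in sample_ids.items():
    cur.execute(f'CREATE TABLE "{name}" (ID INTEGER, v TEXT)')
    cur.executemany(
        f'INSERT INTO "{name}" VALUES (?, ?)',
        [(i, f"{name}-{i}") for i in ids],
    )

# Join chain as written in the question, with OR in the ON clauses.
with_or = """
SELECT T1.ID, T2.v, T3.v, T4.v
FROM "Table_1" T1
LEFT JOIN "Table_2" T2 ON T1.ID = T2.ID
LEFT JOIN "Table_3" T3 ON T1.ID = T3.ID OR T2.ID = T3.ID
LEFT JOIN "Table_4" T4 ON T1.ID = T4.ID OR T2.ID = T4.ID OR T3.ID = T4.ID
ORDER BY T1.ID
"""

# Simplified chain: every join keys on T1.ID only.
without_or = """
SELECT T1.ID, T2.v, T3.v, T4.v
FROM "Table_1" T1
LEFT JOIN "Table_2" T2 ON T1.ID = T2.ID
LEFT JOIN "Table_3" T3 ON T1.ID = T3.ID
LEFT JOIN "Table_4" T4 ON T1.ID = T4.ID
ORDER BY T1.ID
"""

rows_with_or = cur.execute(with_or).fetchall()
rows_without_or = cur.execute(without_or).fetchall()
print(rows_with_or == rows_without_or)
```

The two result sets match on this data because a non-NULL T2.ID or T3.ID can only come from a row that already matched T1.ID. This says nothing about timing on Snowflake; it only illustrates that the simpler conditions return the same rows.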

SQL join between 2 tables with OR condition

I am just trying to understand the concept behind joining of 2 tables with an OR condition.
My requirement is: I need to join 2 tables Table1 [colA, colB] and Table2 [colX, colY] on columns Table1.colA = Table2.colB but if colA is NULL the condition should be Table1.colB = Table2.colY.
Do I need to join them separately and then union the results? Or is there a way I can do it in one join? Note that I have millions of records in both tables, it's a left join, and the tables reside in Hive. I don't have a reproducible example; I'm just trying to understand the concept.
While I'm not familiar with HiveQL, in SQL Server this could be accomplished as follows:
SELECT *
FROM table1 t1
JOIN table2 t2
ON COALESCE(t1.cola, t1.colb) = CASE
WHEN t1.cola IS NULL THEN t2.coly
ELSE t2.colx
END
The logic should be fairly readable.
Translate your conditions directly:
SELECT *
FROM table1 t1 JOIN
table2 t2
ON (t1.cola = t2.colb) or
(t1.cola is null and t1.colb = t2.coly)
Usually, OR is a performance killer in joins. This would often be expressed using two separate left joins:
SELECT . . . , COALESCE(t2a.col, t2b.col) as col
FROM table1 t1 LEFT JOIN
table2 t2a
ON (t1.cola = t2a.colb) LEFT JOIN
table2 t2b
ON t1.cola is null and t1.colb = t2b.coly;
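To see that the single OR join and the two-left-join form return the same rows, here is a small sketch using Python's built-in sqlite3 module instead of Hive. It assumes Table2 has the columns colx and coly from the question's table definition (so the question's "colB" is read as colx), plus a hypothetical payload column `col`; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: colx/coly follow the question; `col` is a made-up payload.
cur.execute("CREATE TABLE table1 (cola INTEGER, colb INTEGER)")
cur.execute("CREATE TABLE table2 (colx INTEGER, coly INTEGER, col TEXT)")
cur.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 10), (None, 20), (3, 30), (None, 50)],
)
cur.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, 99, "a"), (7, 20, "b"), (3, 30, "c"), (8, 10, "d")],
)

# One left join with an OR condition.
or_join = """
SELECT t1.cola, t1.colb, t2.col
FROM table1 t1
LEFT JOIN table2 t2
  ON (t1.cola = t2.colx)
  OR (t1.cola IS NULL AND t1.colb = t2.coly)
ORDER BY t1.colb
"""

# Two left joins, one per branch of the OR, merged with COALESCE.
two_joins = """
SELECT t1.cola, t1.colb, COALESCE(t2a.col, t2b.col) AS col
FROM table1 t1
LEFT JOIN table2 t2a ON t1.cola = t2a.colx
LEFT JOIN table2 t2b ON t1.cola IS NULL AND t1.colb = t2b.coly
ORDER BY t1.colb
"""

rows_or = cur.execute(or_join).fetchall()
rows_two = cur.execute(two_joins).fetchall()
print(rows_or == rows_two)
```

The two branches are mutually exclusive (cola is either NULL or not), so at most one of t2a and t2b can match for a given table1 row, which is why COALESCE recovers the same value as the OR join.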

Exactly when to use inner join or an alternative query

SELECT .... FROM TABLE1 T1, TABLE2 T2, TABLE3 T3
WHERE T1.NAME = 'ABC' AND T1.ID = T2.COL_ID AND T2.COL1 = T3.COL2
vs
SELECT .... FROM TABLE1 T1
WHERE T1.NAME = 'ABC'
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
Two questions
In terms of performance, which will perform better and why?
If Option 2 has the better performance, when should we use Option 1? (And vice versa if Option 1 has better performance.)
The second query is not correct. It should be:
SELECT .... FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
WHERE T1.NAME = 'ABC'
This is the right way to write your join condition. The first one is accepted, but it technically describes a Cartesian product that the WHERE clause then filters. All modern databases handle both the first and second queries well and interpret them the same way, so performance should be the same. Still, you should use the second one because it is more readable and gives you a single way to write joins, whether inner, left, or full outer.
The answer is easy: Don't use comma-separated joins (first query). We used these in the 1980s for the lack of something better, but then in 1992 the new syntax (second query) was introduced [1], because the old syntax was error-prone (it was easier to forget to apply join criteria) and harder to maintain (was missing join criteria intended or not in a query?) and there was no standard syntax for outer joins.
[1] Oracle was a little late though featuring the new syntax. They introduced the new ANSI joins in Oracle 9i in 2001.
In terms of performance: There should be no difference in speed, because DBMS optimizers see that this is essentially the same query.
Your second query is syntactically incorrect by the way. The query's WHERE clause belongs after the complete FROM clause, i.e. after all the joins:
SELECT ....
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.col_id
INNER JOIN table3 t3 ON t2.col1 = t3.col2
WHERE t1.name = 'ABC';
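The points above can be illustrated with a short sketch using Python's built-in sqlite3 module. The table contents and the extra column `val` are invented for the example; the queries follow the ones discussed. It shows that both syntaxes return the same rows, and what happens when a join criterion is forgotten in the comma-separated form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data.
cur.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE table2 (col_id INTEGER, col1 INTEGER)")
cur.execute("CREATE TABLE table3 (col2 INTEGER, val TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "ABC"), (2, "XYZ")])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 100), (1, 200), (2, 300)])
cur.executemany("INSERT INTO table3 VALUES (?, ?)", [(100, "p"), (200, "q"), (300, "r")])

# Old comma-separated syntax: join criteria live in the WHERE clause.
comma_join = """
SELECT t1.id, t2.col1, t3.val
FROM table1 t1, table2 t2, table3 t3
WHERE t1.name = 'ABC' AND t1.id = t2.col_id AND t2.col1 = t3.col2
ORDER BY t2.col1
"""

# Explicit join syntax: join criteria live in the ON clauses.
explicit_join = """
SELECT t1.id, t2.col1, t3.val
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.col_id
INNER JOIN table3 t3 ON t2.col1 = t3.col2
WHERE t1.name = 'ABC'
ORDER BY t2.col1
"""

# Comma syntax with the table3 criterion forgotten: still valid SQL,
# but table3 is now cross-joined to every other result row.
forgotten_criterion = """
SELECT COUNT(*)
FROM table1 t1, table2 t2, table3 t3
WHERE t1.name = 'ABC' AND t1.id = t2.col_id
"""

rows_comma = cur.execute(comma_join).fetchall()
rows_explicit = cur.execute(explicit_join).fetchall()
cross_count = cur.execute(forgotten_criterion).fetchone()[0]
print(rows_comma == rows_explicit, cross_count)
```

With explicit joins, leaving out an ON clause is usually a syntax error in most databases, whereas the comma form silently returns the inflated row count.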

Does INNER JOIN performance depend on the order of tables?

A question suddenly came to my mind while I was tuning one stored procedure. Let me ask it -
I have two tables, table1 and table2. table1 contains a huge amount of data and table2 contains much less. Is there any performance difference between these two queries (I am changing the order of the tables)?
Query1:
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col2
Query2:
SELECT t1.col1, t2.col2
FROM table2 t2
INNER JOIN table1 t1
ON t1.col1=t2.col2
We are using Microsoft SQL Server 2005.
Aliases and the order of the tables in the join (assuming it's an INNER JOIN) don't affect the final outcome, and thus don't affect performance, since the optimizer reorders the tables (if needed) when the query is executed.
You can read some more basic concepts about relational algebra here:
http://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators

Explanation about nested loop

Nested Loop Join
In this kind of join operation, each row from the outer input is processed by looping through all rows of the inner input to search for matching rows based on the join column.
Nested loop joins perform a search on the inner table for each row of the outer table, typically using an index.
example:
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36
Can you please explain which is the outer input and which is the inner input? Here we have two conditions, T1.Col1 = T2.Col1 and T1.Col1 between 1 AND 36. By which condition is the table filtered first?
I would rather write the query in this way:
SELECT T1.Col2
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.Col1 = T2.Col1
WHERE T1.Col1 BETWEEN 1 AND 36
The second condition is not a join condition, but a where condition (Table2 is not involved in solving that condition).
The optimizer of your database should be able to decide whether filtering Table1 first is faster than joining Table2 and then filtering. I imagine that the latter can be true if Table2 is quite small. Indexes can also change the query plan.
Anyway, if you want to be sure about how your database is executing your query, just check the query plan.
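To make the outer/inner terminology from the question concrete, here is a minimal sketch of a naive nested loop join in plain Python. It is not how any particular database implements it (real engines typically probe an index on the inner input instead of rescanning it), and choosing Table1 as the outer input and filtering it first is an assumption made for the example; the optimizer is free to choose differently. The sample rows are invented.

```python
def nested_loop_join(outer_rows, inner_rows, outer_key, inner_key):
    """Naive nested loop join: scan the whole inner input once per outer row."""
    result = []
    for outer in outer_rows:        # outer input: each row is read once
        for inner in inner_rows:    # inner input: rescanned for every outer row
            if outer[outer_key] == inner[inner_key]:
                result.append((outer, inner))
    return result


# Hypothetical sample rows for Table1 and Table2.
table1 = [{"Col1": c, "Col2": f"t1-{c}"} for c in (1, 20, 36, 50)]
table2 = [{"Col1": c} for c in (1, 36, 50, 50)]

# Apply the WHERE filter (Col1 BETWEEN 1 AND 36, inclusive) to Table1 first,
# then use the filtered rows as the outer input.
filtered = [row for row in table1 if 1 <= row["Col1"] <= 36]
joined = nested_loop_join(filtered, table2, "Col1", "Col1")

selected = [outer["Col2"] for outer, _ in joined]
print(selected)
```

Filtering first shrinks the outer input from four rows to three, so the inner input is scanned three times instead of four; that kind of saving is what the optimizer weighs when it picks a plan.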
SELECT T1.Col2
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.Col1 = T2.Col1
WHERE T1.Col1 >= 1 AND T1.Col1 <= 36
You'll find a better explanation of joins at this link:
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html