How to improve performance of SQL query with parameters? - sql

I'm using SQL Server 2005.
I have a problem executing SQL statements like this one:
DECLARE @Param1 BIT
SET @Param1 = 1

SELECT
    t1.Col1,
    t1.Col2
FROM
    Table1 t1
WHERE
    @Param1 = 0 OR
    (t1.Col2 IN
        (SELECT t2.Col4
         FROM Table2 t2
         WHERE
             t2.Col1 = t1.Col1 AND
             t2.Col2 = 'AAA' AND
             t2.t3 <> 0)
    )
This query takes a very long time to execute.
But if I replace @Param1 with the literal 1, the query runs in about 2 seconds.
Any information on how to resolve this would be greatly appreciated.

Well, the explanation seems simple enough. With your current values, @Param1 = 0 is false (you set the parameter to 1 beforehand), so the engine has to evaluate your second condition, which contains a subquery and can take a long time. If you change your first filter to @Param1 = 1, you are saying it is always true and the second filter never needs to be evaluated, hence the faster query.
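(Editor's aside, not part of the original answer: a common fix for this kind of catch-all predicate is OPTION (RECOMPILE), which compiles the plan with the runtime value of the variable so the @Param1 = 0 branch can be optimized away. A minimal sketch against the question's tables; note the behaviour is weaker on SQL Server 2005 than on later versions:)
DECLARE @Param1 BIT
SET @Param1 = 1

SELECT t1.Col1, t1.Col2
FROM Table1 t1
WHERE @Param1 = 0
   OR t1.Col2 IN (SELECT t2.Col4
                  FROM Table2 t2
                  WHERE t2.Col1 = t1.Col1
                    AND t2.Col2 = 'AAA'
                    AND t2.t3 <> 0)
OPTION (RECOMPILE) -- plan is built knowing @Param1's current value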

Your OR clause seems to be confusing the optimiser. If you remove it, you should find it generates two different execution plans for the SELECT statement - one with a filter, and the other without:
DECLARE @Param1 BIT
SET @Param1 = 1

IF @Param1 = 0
BEGIN
    SELECT
        t1.Col1,
        t1.Col2
    FROM
        Table1 t1
END
ELSE
BEGIN
    SELECT
        t1.Col1,
        t1.Col2
    FROM
        Table1 t1
    WHERE
        t1.Col2 IN
            (SELECT t2.Col4
             FROM Table2 t2
             WHERE
                 t2.Col1 = t1.Col1 AND
                 t2.Col2 = 'AAA' AND
                 t2.t3 <> 0)
END

This is commonly referred to as the N+1 problem: you do a select on table 1, and for each record found you go look for something in table 2.
By setting @Param1 to a value that makes the first condition true, the SQL engine can skip the subquery entirely.
To avoid this behavior you can use a JOIN to combine both tables and then filter the results with a WHERE clause. The join is a bit slower than a single subquery because you are matching two tables against each other, but because the join only has to be done once (vs. N times) you gain a serious performance boost.
Example code:
DECLARE @Param1 BIT
SET @Param1 = 1

SELECT t1.Col1, t1.Col2
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.Col1 = t2.Col1
WHERE @Param1 = 0
   OR (t2.Col2 = 'AAA' AND t2.t3 <> 0)
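(Editor's note: the INNER JOIN version returns a Table1 row once per matching Table2 row, whereas the original IN (...) returned each row at most once. A hedged sketch that keeps the original semantics uses EXISTS instead:)
SELECT t1.Col1, t1.Col2
FROM Table1 t1
WHERE @Param1 = 0
   OR EXISTS (SELECT 1
              FROM Table2 t2
              WHERE t2.Col1 = t1.Col1
                AND t2.Col2 = 'AAA'
                AND t2.t3 <> 0)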

Related

T-SQL query running indefinitely | Not if it's batched

I hit quite a strange issue here working with a SQL Server table.
With the following query, I'm checking whether an entry exists that was created between 2023-02-01T04:10:18 and 2023-02-05T04:55:44 (4 days).
This query runs forever:
SELECT
TOP 1 1
FROM
tablexyz t1 (nolock)
WHERE
t1.col1 = 1
AND t1.col2 <= '2023-01-31'
AND t1.knowledge_begin_date >= '2023-02-01T04:10:18'
AND t1.knowledge_begin_date <= '2023-02-05T04:10:18'
OPTION(RECOMPILE)
However, if I check 2-day periods instead, both queries execute in under 200 ms:
-- Executes in 200ms
SELECT
TOP 1 1
FROM
tablexyz t1 (nolock)
WHERE
t1.col1 = 1
AND t1.col2 <= '2023-01-31'
AND t1.knowledge_begin_date >= '2023-02-01T04:10:18'
AND t1.knowledge_begin_date <= '2023-02-03T04:10:18'
OPTION(RECOMPILE)
and
-- Executes in 200ms
SELECT
TOP 1 1
FROM
tablexyz t1 (nolock)
WHERE
t1.col1 = 1
AND t1.col2 <= '2023-01-31'
AND t1.knowledge_begin_date >= '2023-02-03T04:10:18'
AND t1.knowledge_begin_date <= '2023-02-05T04:10:18'
OPTION(RECOMPILE)
Any idea what could be the reason here? Note that this view (over 3 tables) has over 3 billion rows.
Indexes on the tables
Non-clustered index col1_col2_IX on (col1, col2)
Non-clustered index kdb_IX on (knowledge_begin_date)
Execution plans:
I'm not able to get the actual execution plan of the long-running query because it never completes. Is there any way to access it?
Looking at the query plans of the batched queries, each is doing an index lookup on kdb_IX for all 3 tables the view is built over.
It seems reasonable to believe the query optimizer should take care of this, but strangely that is not the case.
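(Editor's aside on the in-flight plan question: on SQL Server 2016 SP1+ with lightweight query profiling enabled, something like the following can return the live plan of a running statement; the session id 123 is a placeholder you would look up in sys.dm_exec_requests first.)
SELECT r.session_id, r.status, p.query_plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_statistics_xml(r.session_id) AS p
WHERE r.session_id = 123 -- hypothetical spid of the long-running query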
You can try to combine the 2 indexes in one query using CROSS APPLY, and see if it helps... something like
SELECT TOP 1 1
FROM tablexyz t1 (nolock)
CROSS APPLY (
    SELECT TOP 1 1 AS x
    FROM tablexyz t2 (nolock)
    WHERE
        t2.col1 = 1
        AND t2.col2 <= '2023-01-31'
        AND t1.ID = t2.ID
) ca
WHERE
    t1.knowledge_begin_date >= '2023-02-01T04:10:18'
    AND t1.knowledge_begin_date <= '2023-02-05T04:10:18'
OPTION(RECOMPILE)
This is just a stretch; it depends a lot on the data distribution and on how the optimizer builds the execution plan. The idea is that the CROSS APPLY should be applied for each row (as a set), and in this case it can use the second index.
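(Editor's aside, building on the question's own observation that the two 2-day batches each finish in ~200 ms: a pragmatic workaround is to union the fast ranges, as in this sketch:)
SELECT TOP 1 1
FROM (
    SELECT TOP 1 1 AS x
    FROM tablexyz t1 (nolock)
    WHERE t1.col1 = 1
      AND t1.col2 <= '2023-01-31'
      AND t1.knowledge_begin_date >= '2023-02-01T04:10:18'
      AND t1.knowledge_begin_date < '2023-02-03T04:10:18'
    UNION ALL
    SELECT TOP 1 1 AS x
    FROM tablexyz t1 (nolock)
    WHERE t1.col1 = 1
      AND t1.col2 <= '2023-01-31'
      AND t1.knowledge_begin_date >= '2023-02-03T04:10:18'
      AND t1.knowledge_begin_date <= '2023-02-05T04:10:18'
) u
OPTION(RECOMPILE)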

SQL Query Performance Join with condition

Calling all SQL experts. I have the following select statement:
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t1.field = xyz
I'm a little bit worried about the performance here. Is the where clause evaluated before or after the join? If it's evaluated after, is there a way to evaluate the where clause first?
The whole table could easily contain more than a million entries, but after the where clause there may be only 1-10 entries left, so in my opinion it makes a big performance difference when the where clause is evaluated.
Thanks in advance.
Dimi
You could rewrite your query like this:
SELECT 1
FROM (SELECT * FROM table1 WHERE field = xyz) t1
JOIN table2 t2 ON t1.id = t2.id
But depending on the database product the optimiser might still decide that the best way to do this is to JOIN table1 to table2 and then apply the constraint.
For this query:
SELECT 1
FROM table1 t1 JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.field = xyz;
The optimal indexes are table1(field, id), table2(id).
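(For reference, a sketch of how those indexes might be declared; the index names are illustrative:)
CREATE INDEX ix_table1_field_id ON table1 (field, id);
CREATE INDEX ix_table2_id ON table2 (id);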
How the query is executed depends on the optimizer. It is tasked with choosing the best execution plan, given the table statistics and environment.
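(Editor's note: the surest way to answer "which is evaluated first" is to inspect the plan the optimizer actually chose; in SQL Server, for example, a quick sketch with 'xyz' as a placeholder value:)
SET SHOWPLAN_TEXT ON;
GO
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t1.field = 'xyz';
GO
SET SHOWPLAN_TEXT OFF;
GO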
Each DBMS has its own query optimizer. So, by the logic of things, in a case like yours the WHERE will be evaluated first and then the JOIN part of the query.
As mentioned in the comments and other answers, with performance the answer is always "it depends". Depending on your DBMS and the indexing of the base tables, the query may be fine as is and the optimizer may evaluate the where first; or the join may be efficient anyway if the indexes cover the join requirements.
Alternatively you can force the behavior you require by reducing the dataset of t1 before you do the join, using a nested select as Richard suggested, or by adding t1.field = xyz to the join, for example:
ON t1.field = xyz AND t1.id = t2.id
Personally, if I needed to reduce the dataset before the join, I would use a CTE:
WITH T1 AS
(
    SELECT * FROM table1
    WHERE field = 'xyz'
)
SELECT 1
FROM T1
JOIN Table2 T2
ON T1.Id = T2.Id

performance comparison on coalesce vs 'is null'

select * from zzz t1
INNER JOIN yyy T2
on (T1.col1 = T2.col1 )
and (T1.col2 = T2.col2 or (T1.col2 is null and T2.col2 is null ) )
VS
select * from zzz t1
INNER JOIN yyy T2
on (T1.col1 = T2.col1 )
and (coalesce(T1.col2, '\0') = coalesce(T2.col2, '\0'))
Or if there's a third, better, way to do this I'd appreciate that too. This is something I find myself constantly doing, because half of our databases allow nulls and half of them don't, so comparisons suck.
The only thing is I need to avoid things that aren't standard, I use too many different dbs and I'm trying hard to get away from anything that isn't supported by all of them unless the performance gain is utterly magical. (db2, oracle, sqlserver are the main ones I'm using)
If a database has NULL-safe comparisons, then the best approach is:
select *
from zzz t1 join
yyy T2
on (T1.col1 = T2.col1 ) and
(T1.col2 is not distinct from T2.col2);
(This is supported by DB2 but not Oracle or SQL Server.)
Otherwise, I think your two versions are going to be equivalent in most databases, at least. The use of functions and OR limits the ability to use indexes on col2.
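(Editor's aside, not from the original answer: on SQL Server a third, NULL-safe and still index-friendly pattern uses EXISTS with INTERSECT, which compares values with set semantics where NULL matches NULL. It is not fully portable, though: Oracle and DB2 would need FROM DUAL / SYSIBM.SYSDUMMY1 in the single-row selects.)
select *
from zzz t1
inner join yyy T2
    on T1.col1 = T2.col1
    and exists (select T1.col2 intersect select T2.col2)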

How to pass a parameter into a long query in Teradata

I have a somewhat large query (~8 tables being joined, 100 lines of SQL or so) that I use frequently to join many different data sources together. I would like to be able to pass a parameter into the query so that I and other members of my team can work off the same code base and just change the time period we are looking at.
Example code (obviously my actual code is more involved than this, but these are some of the things that I need to do):
SELECT t1.* , x.col1, x.SUM_col3
FROM table1 t1
LEFT JOIN
(
SELECT t2.col1, t3.col2, SUM(t3.col3) as SUM_col3
FROM table2 t2
INNER JOIN table3 t3
ON t2.PI = t3.SI
WHERE t3.col2 NOT LIKE :parameter1
GROUP BY 1,2
QUALIFY ROW_NUMBER() OVER(PARTITION BY t2.col1 ORDER BY t3.col1) = 1
) x
ON t1.col1 = x.col1
WHERE t1.START_DATE >= :parameter2
Solutions I have considered:
Using the '?' prefix in order to enter the parameters at run time. I find this method quite inefficient, as the entered parameter is retained in the code after running it.
Using Dynamic SQL to create a procedure which can then be called. I'm struggling to get this method to work, as its intended purpose seems to be for relatively short queries.
How can I best structure my query and code in order to be able to run something like CALL MY_QUERY(:parameter1,:parameter2) in such a way that it either creates the resulting table in memory (less preferred) or returns a result set that I can then store or use myself (more preferred).
What you want is a Macro in Teradata. A Macro is pretty much just a parameterized view, which is exactly what you are describing here.
CREATE MACRO myMacro (parameter1 VARCHAR(20), parameter2 DATE)
AS
(
    SELECT t1.*, x.col1, x.SUM_col3
    FROM table1 t1
    LEFT JOIN
    (
        SELECT t2.col1, t3.col2, SUM(t3.col3) AS SUM_col3
        FROM table2 t2
        INNER JOIN table3 t3
            ON t2.PI = t3.SI
        WHERE t3.col2 NOT LIKE :parameter1
        GROUP BY 1,2
        QUALIFY ROW_NUMBER() OVER(PARTITION BY t2.col1 ORDER BY t3.col1) = 1
    ) x
    ON t1.col1 = x.col1
    WHERE t1.START_DATE >= :parameter2;
);
To call it:
Execute myMacro('Harold', DATE '2015-01-01');

Oracle SQL - Query Hanging When Minimizing Select Statement

I have a SQL query which hangs when changing the select statement from * to only one column. Where could it possibly hang? Isn't it supposed to be faster, since I request only 1 column instead of 50?
select *
from table1 t1, table2 t2
where t1.id1 = t2.id2 and t2.columnX = :x
select t1.column1
from table1 t1, table2 t2
where t1.id1 = t2.id2 and t2.columnX = :x
p.s. the columns have indexes.
Regards
On the surface, it appears there should be no difference between the results. Start by comparing the EXPLAIN PLAN output for each query. If the cost is the same, then there's something else beyond the queries themselves at issue here. As @tbone states in the comment, it could be something as simple as caching.
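(A minimal sketch of that comparison in Oracle, using DBMS_XPLAN; run it once per query variant:)
EXPLAIN PLAN FOR
select t1.column1
from table1 t1, table2 t2
where t1.id1 = t2.id2 and t2.columnX = :x;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);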