SQL server in operator performance - sql

I have a query that joins some tables and when I use = operator instead of in operator in a where clause like this I get a significant performance improvement.
= operator takes less than a second.
in operator takes about a minute.
where P.GID in ( SELECT GID from [dbo].[fn_SomeFunction] (15268) )
the sub query returns 1 result in most of the cases and just this change will improve most of the cases but will cause errors for some other cases.
any ideas why this behavior?

Try something like that it is not tested and may contain some syntactic errors.
The main idea is to get the desired ids in a temp table variable and use that in a join.
Hope that helps.
DECLARE #gids TABLE(
GID UNIQUEIDENTIFIER NOT NULL
)
INSERT INTO #gids (GID)
SELECT
GID
FROM [dbo].[fn_SomeFunction](15268)
SELECT * FROM SomeTable st INNER JOIN st.GID = #gids.GID

If that truly represents a functions that is what is killing your execution time. You can verify this by looking at the execution plan.
What happens is that in your subquery that function has to be calculated over and over again. If you can remove the function and replace with a query you should notice an improvement.
Additionally, if the functions return type doesn't match the column's type there will need to be an implicit conversion done on each return item. Changing the functions return type(or the columns type) to match would also help performance.
DateTime <> smalldatetime for example and requires an implicit conversion.

Related

Cast to INT Fails

I executed a query very similar to this:
SELECT
CAST(A.ValueCol AS INT) * CAST(B.ValueCol AS INT) MultipliedValue
FROM Table1 A
JOIN Table1 B
ON A.LinkCol = B.LinkCol
WHERE
A.TypeCol = <Value that limits ValueCol to only integers>
AND B.TypeCol = <Value that limits ValueCol to only integers>
In this case, ValueCol is type NVARCHAR and contains both integer and non-integer values. Despite adequate filtering in the WHERE clause, I'm getting CAST errors for values that aren't even scoped (e.g. if WHERE filters all non-integer values, SQL is throwing an error trying to cast 'ABC', which does exist in a table row but should not have been pulled into this query). I verified that only integer values were being pulled by removing the CASTs and selecting the two ValueCols independently.
Is there a precedence/order of operations problem here? Is CAST applied to all rows' ValueCols prior to filtering with WHERE?
I know I can use TRY_CAST, just curious about this behavior in SQL. Thank you!
Sometimes, SQL Server will deem it more efficient to run an operation like CAST() for every row in a page (or index), before applying filters from the WHERE clause.
There is no good way to avoid this.
This is one reason why you should not store meaningful data in columns with an ambiguous type, with anti-patterns such as EAV.

SQL Server subquery behaviour

I have a case where I want to check to see if an integer value is found in a column of a table that is varchar, but is a mix of values that can be integers some are just strings. My first thought was to use a subquery to just select the rows with numeric-esque values. The setup looks like:
CREATE TABLE #tmp (
EmployeeID varchar(50) NOT NULL
)
INSERT INTO #tmp VALUES ('aa1234')
INSERT INTO #tmp VALUES ('1234')
INSERT INTO #tmp VALUES ('5678')
DECLARE #eid int
SET #eid = 5678
SELECT *
FROM (
SELECT EmployeeID
FROM #tmp
WHERE IsNumeric(EmployeeID) = 1) AS UED
WHERE UED.EmployeeID = #eid
DROP TABLE #tmp
However, this fails, with: "Conversion failed when converting the varchar value 'aa1234' to data type int.".
I don't understand why it is still trying to compare #eid to 'aa1234' when I've selected only the rows '1234' and '5678' in the subquery.
(I realize I can just cast #eid to varchar but I'm curious about SQL Server's behaviour in this case)
You can't easily control the order things will happen when SQL Server looks at the query you wrote and then determines the optimal execution plan. It won't always produce a plan that follows the same logic you typed, in the same order.
In this case, in order to find the rows you're looking for, SQL Server has to perform two filters:
identify only the rows that match your variable
identify only the rows that are numeric
It can do this in either order, so this is also valid:
identify only the rows that are numeric
identify only the rows that match your variable
If you look at the properties of this execution plan, you see that the predicate for the match to your variable is listed first (which still doesn't guarantee order of operation), but in any case, due to data type precedence, it has to try to convert the column data to the type of the variable:
Subqueries, CTEs, or writing the query a different way - especially in simple cases like this - are unlikely to change the order SQL Server uses to perform those operations.
You can force evaluation order in most scenarios by using a CASE expression (you also don't need the subquery):
SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = CASE IsNumeric(EmployeeID) WHEN 1 THEN #eid END;
In modern versions of SQL Server (you forgot to tell us which version you use), you can also use TRY_CONVERT() instead:
SELECT EmployeeID
FROM #tmp
WHERE TRY_CONVERT(int, EmployeeID) = #eid;
This is essentially shorthand for the CASE expression, but with the added bonus that it allows you to specify an explicit type, which is one of the downsides of ISNUMERIC(). All ISNUMERIC() tells you is if the value can be converted to any numeric type. The string '1e2' passes the ISNUMERIC() check, because it can be converted to float, but try converting that to an int...
For completeness, the best solution - if there is an index on EmployeeID - is to just use a variable that matches the column data type, as you suggested.
But even better would be to use a data type that prevents junk data like 'aa1234' from getting into the table in the first place.

T-Sql - Select query in another select query takes long time

I have a procedure with arguments but its calling takes a very long time. I decided to check what is wrong with my query and came to the conclusion that the problem is Column In (SELECT [...]).
Both queries return 1500 rows.
First query: time 45 second
Second query: time 0 second
1.
declare #FILTER_OPTION int
declare #ID_DISTRIBUTOR type_int_value
declare #ID_DATA_TYPE type_bigint_value
declare #ID_AGGREGATION_TYPE type_int_value
set #FILTER_OPTION = 8
insert into #ID_DISTRIBUTOR values (19)
insert into #ID_DATA_TYPE values (30025)
insert into #ID_AGGREGATION_TYPE values (10)
SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from #ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from #ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from #ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why this is happening?
How should I create a stored procedure that takes an array of arguments to use it quickly?
Edit:
Maybe it's a problem with indexes? indexes are created on these three columns.
For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do. First, look at the execution plans so you can see the different between the two versions. Then, test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database
Here is an example of the values I'm passing in:
#orgList = "1123, 223%, 54%"
To use the wildcard I think I have to do LIKE but the query runs a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly)
Can I make it better?
This is what I do now:
declare #tempTable Table (SearchOrg nvarchar(max) )
insert into #tempTable
select * from dbo.udf_split(#orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join #tempTable tt on gt.org like tt.SearchOrg
where
AYEAR= ISNULL(#year, ayear)
and (AYEAR >= ISNULL(#yearR1, ayear) and ayear <= ISNULL(#yearr2, ayear))
and adate = ISNULL(#Date, adate)
and (adate >= ISNULL(#dateR1, adate) and adate <= ISNULL(#DateR2 , adate))
The final result would be all rows where the maintable.org is 1123, or starts with 223 or starts with 554
The reason for my date craziness is because sometimes the stored procedure only checks for a year, sometimes for a year range, sometimes for a specific date and sometimes for a date range... everything that's not used in passed in as null.
Maybe the problem is there?
Try something like this:
Declare #tempTable Table
(
-- Since the column is a varchar(10), you don't want to use nvarchar here.
SearchOrg varchar(20)
);
INSERT INTO #tempTable
SELECT * FROM dbo.udf_split(#orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM #tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query with optional filters and LIKE driven by a table (!) are very hard to optimize because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitute:
Play with OPTION (RECOMPILE). If the compile times are acceptable this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the code. Build a query with all LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE #like0 OR a LIKE #like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate).
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a like statement of the form 'abc%' -- that is, where the wildcard is at the end. (See here, for example.) So, you can start with an index on maintable.org. Fortunately, your examples meet this criteria. However, if you have '%abc' -- the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server (see here).

How to structure a query with a large, complex where clause?

I have an SQL query that takes these parameters:
#SearchFor nvarchar(200) = null
,#SearchInLat Decimal(18,15) = null
,#SearchInLng Decimal(18,15) = null
,#SearchActivity int = null
,#SearchOffers bit = null
,#StartRow int
,#EndRow int
The variables #SearchFor, #SearchActivity, #SearchOffers can be either null or not null. #SearchInLat and #SearchInLng must both null, or both have values.
I'm not going to post the whole query as its boring and hard to read, but the WHERE clause is shaped like this:
( -- filter by activity --
(#SearchActivity IS NULL)
OR (#SearchActivity = Activities.ActivityID)
)
AND ( -- filter by Location --
(#SearchInLat is NULL AND #SearchInLng is NULL)
OR ( ... )
)
AND ( -- filter by activity --
#SearchActivity is NULL
OR ( ... )
)
AND ( -- filter by has offers --
#SearchOffers is NULL
OR ( ... )
)
AND (
... -- more stuff
)
I have read that this is a bad way to structure a query - that SqlServer has trouble working out an efficient execution plan with lots of clauses like this, so I'm looking for other ways to do it.
I see two ways of doing this:
Construct the query as a string in my client application, so that the WHERE clause only contains filters for the relevant parameters. The problem with this is it means not accessing the database through stored procedures, as everything else is at the moment.
Change the stored procedure so that it examines which arguments are null, and executes child procedures depending on which arguments it is passed. The problem here is that it would mean repeating myself a lot in the definition of the procs, and thus be harder to maintain.
What should I do? Or should I just keep on as I am currently doing? I have OPTION (RECOMPILE) set for the procedures, but I've heard that this doesn't work right in Server 2005. Also, I plan to add more parameters to this proc, so I want to make sure whatever solution I have is fairly scaleable.
The answer is to use DynamicSQL (be it in the client, or in an SP using sp_executesql), but the reason why is long, so here's a link...
Dynamic Search Conditions in T-SQL
A very short version is that one-size does not fit all. And as the optimiser creates one plan for one query, it's slow. So the solution is to continue using parameterised queries (for execution plan caching), but to have many queries, for the different types of search that can happen.
Perhaps an alternative might be to perform several separate select statements?
e.g.
( -- filter by activity --
if #SearchActivity is not null
insert into tmpTable (<columns>)
select *
from myTable
where (#SearchActivity = Activities.ActivityID)
)
( -- filter by Location --
if #SearchInLat is not null and #SearchInLng is not null
insert into tmpTable (<columns>)
select *
from myTable
where (latCol = #SearchInLat AND lngCol = #SearchInLng)
etc...
then select the temp table to return the final result set.
I'm not sure how this would work with respect to the optimiser and the query plans, but each individual select would be very straightforward and could utilise the indexes that you would have created on each column which should make them very quick.
Depending on your requirements it also may make sense to create a primary key on the temp table to allow you to join to it on each select (to avoid duplicates).
Look at the performance first, like others have said.
If possible, you can use IF clauses to simplify the queries based on what parameters are provided.
You could also use functions or views to encapsulate some of the code if you find you are repeating it often.