In SQL, is there any performance difference between using LIKE vs equals?

I am using T-SQL (SQL Server).
This is the starting code:
DECLARE @code as char(8)
SET @code = '123' -- or '123%'
DECLARE @lsUseLikeForCodeWhereCondition char(1)
IF CHARINDEX('%', @code) = 0
    SET @lsUseLikeForCodeWhereCondition = 'N'
ELSE
    SET @lsUseLikeForCodeWhereCondition = 'Y'
Is there any performance difference between these two statements:
select * from mytable where idn_orn_i LIKE
    CASE WHEN @lsUseLikeForCodeWhereCondition = 'N' THEN
        @code
    ELSE
        @code + '%'
    END
vs
IF @lsUseLikeForCodeWhereCondition = 'N'
BEGIN
    select * from mytable where idn_orn_i = @code
END
ELSE
BEGIN
    select * from mytable where idn_orn_i LIKE @code + '%'
END
Both appear to return the same results. But where it says mytable, it is actually a join across 10 different tables, so it is not small. Mostly I am wondering whether the optimizer would recognize a LIKE without a percent sign in the string and execute it as an equals.
If it matters, idn_orn_i is char(8).

The two versions are very different, and this has little to do with any differences between LIKE and =. In the first, the pattern is an expression-based pattern, which precludes the use of indexes for the query.
In the second, you have two queries, and in each the comparisons use constants. So the = query will definitely use an appropriate index, and I think the LIKE query will take advantage of one too (the compiler should turn the constant expression into a constant so an index can be used). Note that LIKE requires that the pattern not start with a wildcard, and it probably has some conditions on collations as well.
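As a sketch of that distinction, using the question's names and assuming an index on idn_orn_i:
-- Pattern fixed when the plan is built: the seek range on idn_orn_i is known up front.
select * from mytable where idn_orn_i LIKE '123%'
-- Pattern produced by an expression: the optimizer cannot pin down the seek range
-- at compile time, so it cannot plan a plain index seek the same way.
select * from mytable where idn_orn_i LIKE
    CASE WHEN @lsUseLikeForCodeWhereCondition = 'N' THEN @code ELSE @code + '%' END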

Short answer: if you can use =, use =.
If the same number of rows is returned, the cost should be about the same.
The actual execution plans should be more or less equal. That being said, if you compare where value = 'asdf' against where value like 'asdf%' and the LIKE will return a lot more rows than the =, then the = would be faster. It also varies with the number of rows that need to be scanned or looked up, which depends on your data distribution statistics, i.e. cardinality.
You should look at the execution plans to know for sure.
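For example, one quick way to compare the two shapes (a sketch using the question's names; in SSMS you can also enable "Include Actual Execution Plan"):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
select * from mytable where idn_orn_i = @code;
select * from mytable where idn_orn_i LIKE @code + '%';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;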

Exact matches with = tend to perform faster because the string comparison can potentially be rejected by checking just the first and/or last characters.
A LIKE cannot discount the match from the first character when a % is placed before the pattern of interest; it has to search the string sequentially, potentially to the end, until the match is found or ruled out. That is more time consuming, but useful and necessary for certain tasks.
A LIKE without any wildcards would operate in the same manner as =, but the procedure for = is intrinsically more efficient.

Related

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now if I directly query server like below can they somehow make SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of the length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something similar). When your SQL code executes, it doesn't realize that there is more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
Now suppose the form submits '; DROP TABLE employees;' as the value. You end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
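The standard defense is parameterization: the user's text travels as a value, never as statement text. A minimal T-SQL sketch (the names are illustrative):
DECLARE @userInput varchar(50) = '; DROP TABLE employees;'; -- hostile input, now inert
DECLARE @sql nvarchar(200) = N'select * from thisTable where something = @p';
EXEC sp_executesql @sql, N'@p varchar(50)', @p = @userInput;
-- @p is bound as data, so the "DROP TABLE" is just a harmless string to search for.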
Two characters in this case are sufficient to break the query and possibly reveal some information about it. For example, try the string ')480 and watch how your application behaves.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare @SearchString as VarChar(10) = 'Voot';
select Buffer, case
    when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
    when PatIndex( '%' + Buffer + '%', @SearchString ) != 0 then 1
    else 0 end as Match
    from ( values
        ( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
        ( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of @SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from @CharacterPairs ).
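A rough sketch of that table-parameter approach (the type and procedure names here are invented):
CREATE TYPE dbo.CharacterPairList AS TABLE ( CharacterPair char(2) NOT NULL );
GO
CREATE PROCEDURE dbo.SearchByPairs @CharacterPairs dbo.CharacterPairList READONLY
AS
    SELECT * FROM myTable
    WHERE myVariable IN ( SELECT CharacterPair FROM @CharacterPairs );
GO
DECLARE @pairs dbo.CharacterPairList;
INSERT @pairs VALUES ('rx'), ('x4'), ('48'), ('80');
EXEC dbo.SearchByPairs @CharacterPairs = @pairs;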
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment], will my query still be vulnerable?" questions.
First of all, you should ask yourself why you are looking to buy yourself such an indulgence. What is the reason? Why do you want to add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is safe, I won't process it properly; that data is unsafe, I'll have to do something about it"?
The only reason such a question could even appear is your application architecture. Or, rather, the lack of one. Because only in spaghetti code, where user input is added directly to the query, can such a question ever occur. Otherwise, your database layer should be able to process any kind of data, remaining totally ignorant of its nature, origin, or alleged "safety".

SQL why use isnull(field, '') <> ''?

I'm reviewing some code I've inherited, and I found a line like this:
And isnull(IH.CustomerItemNumber, '') <> ''
Which one of my predecessors seems to have used in a ton of WHERE and JOIN clauses. It looks to me like an unnecessary function call, and therefore a performance hog, because NULL will never equal the empty string '', right?
Specifically I took this out of a join clause in a particular query and performance increased dramatically (from 46-49 secs to between 1-3).
Replaced it with AND IH.CustomerItemNumber <> ''
Is my assessment correct here? This is redundant and slow and can be removed? In what situation might this code be beneficial?
EDIT: So, can NULL ever equal the empty string?
This is semantically the same as:
And IH.CustomerItemNumber <> '' And IH.CustomerItemNumber Is Not Null
So it is checking that the column is both not null and not an empty string. Could be important.
UPDATE
In this case, because we're looking for non-equality of a string literal (empty string), you have at least three semantically correct options:
And IH.CustomerItemNumber <> ''
And IH.CustomerItemNumber <> '' And IH.CustomerItemNumber Is Not Null
And isnull(IH.CustomerItemNumber, '') <> ''
The first is going to return the same result set as the other two because <> '' will not match a null, regardless of the ansi_nulls setting.
In a quick test on a dev system, both the first and the second utilized an index seek. The first is very slightly outperforming the second in one of a few very simplified tests.
The third, since it adds a function call, may not utilize indexing like the others, so this is probably the worst choice. That said, in a quick test, isnull was able to use an index scan. Further adding Is Not Null to the third choice actually sped it up and moved it to an index seek. Go figure (GO! GO! Query optimizer!).
As with @Gordon, I would also choose the second option most times since it would better state my intent to other developers (or myself) and would be a better practice to follow if we were checking equality against another column which could be null (think of potential ansi_nulls off).
For completeness' sake:
And nullif(IH.CustomerItemNumber, '') is not null
And case when IH.CustomerItemNumber = '' then null else IH.CustomerItemNumber end is not null
And case IH.CustomerItemNumber when '' then null else IH.CustomerItemNumber end is not null
Are all interpreted exactly the same way (as far as I can tell) in SQL Server and perform the same as the third option above.
The reason that the code is there may be because of the history of the application. Perhaps at some point in time, NULLs were allowed in the field. Then, these were replaced with empty strings.
The reason the code is inefficient is the optimization of the join. ISNULL() and its ANSI standard equivalent COALESCE() generally add negligible overhead to the processing of a query. (It does seem that in some versions of SQL Server, COALESCE() evaluates the first argument twice, which is a problem if it is a subquery.)
My guess is that the field has an index on it. SQL Server knows to use the index for the join when the field is used alone. It is not smart enough to use the index when the field is wrapped in a function call. It is the join optimization that is slowing down the query, not the overhead of the function call.
Personally, I would prefer the form with the explicit NULL check, if the performance is the same:
IH.CustomerItemNumber <> '' and IH.CustomerItemNumber is not null
Being explicit about NULL processing can only help you maintain the code in the future.
You can use this for NULL checking:
And (IH.CustomerItemNumber IS NOT NULL) AND (IH.CustomerItemNumber <> '')
BTW,
ISNULL(check_expression, replacement_value) replaces NULL with the specified replacement value.
In your case, if the value of IH.CustomerItemNumber is null, it is replaced by an empty string, which is then compared with the empty string.
because NULL will never equal the empty string '', right?
NULL is also never not equal to the empty string... it's both at once and neither all at the same time. It communicates a state where you just don't know for sure.
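A quick demonstration of that three-valued logic (with the default SET ANSI_NULLS ON):
SELECT CASE WHEN NULL = ''  THEN 'equal'     ELSE 'unknown' END AS EqTest,
       CASE WHEN NULL <> '' THEN 'not equal' ELSE 'unknown' END AS NeqTest;
-- Both columns come back 'unknown': each comparison evaluates to UNKNOWN,
-- so neither THEN branch fires. That is exactly why <> '' filters out NULLs.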

how can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50)) it will attempt to compare a value like 'daffodil' to a number like 3245, even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I compare a string to a string but blows up if I compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it still fails. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain non-numeric characters, SQL Server appears to order the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali comes along with its TRY_CONVERT function the way around this is to wrap the usage of the column in a case expression that repeats the same logic as the filter.
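Applied to the question's join, the guarded form would look something like this (try_convert shown for SQL Server 2012 and later):
on subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and case when subMeas.DataPointValue NOT LIKE '%[^0-9]%'
         then cast(subMeas.DataPointValue as int)
    end = vcert.IDNumber -- the CASE shields the CAST from non-numeric rows
-- or, once TRY_CONVERT is available:
-- and try_convert(int, subMeas.DataPointValue) = vcert.IDNumber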
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
doesn't work well. Example:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT CASE WHEN @Int LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check whether it is a real int value, you should use the ISNUMERIC function. Let's check this:
DECLARE @Int nvarchar(20) = ' 454 54'
SELECT ISNUMERIC(@Int) AS Is_Int
Result: 0
The result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try changing it to
ISNUMERIC(subMeas.DataPointValue) = 1
UPDATE
How to check whether a value is an integer?
First:
WHERE ISNUMERIC(str) = 1 AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
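Those extra LIKE guards are needed because ISNUMERIC accepts far more than plain integers:
SELECT ISNUMERIC('$')   AS currency_sym, -- 1: convertible to money
       ISNUMERIC('1e4') AS exponent,     -- 1: float notation
       ISNUMERIC('1.5') AS dec_point,    -- 1: decimal
       ISNUMERIC('454') AS plain_int;    -- 1: the only usable integer here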
Second:
CREATE Function dbo.IsInteger(@Value VarChar(18))
Returns Bit
As
Begin
    Return IsNull(
        (Select Case When CharIndex('.', @Value) > 0
                     Then Case When Convert(int, ParseName(@Value, 1)) <> 0
                               Then 0
                               Else 1
                          End
                     Else 1
                End
         Where IsNumeric(@Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE

Regex: does a SQL statement include a WHERE clause?

I need a regex that will determine if a given SQL statement has a WHERE clause. My problem is that the passed SQL statements will most likely be complex, so I can not rely on just the existence of the word WHERE in the statement.
For example this should match
SELECT Contacts.ID
, CASE WHEN (Contacts.Firstname IS NULL) THEN ''
ELSE CAST(Contacts.Firstname AS varchar)
END AS Firstname
, CASE WHEN (Contacts.Lastname IS NULL) THEN ''
ELSE CAST(Contacts.Lastname AS varchar)
END AS Lastname
, CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
tbl_ContactExtras.Prequalified IS NULL) THEN ''
WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID
WHERE (Contacts.Firstname LIKE 'Bob%')
and this should not match:
SELECT Contacts.ID
, CASE WHEN (Contacts.Firstname IS NULL) THEN ''
ELSE CAST(Contacts.Firstname AS varchar)
END AS Firstname
, CASE WHEN (Contacts.Lastname IS NULL) THEN ''
ELSE CAST(Contacts.Lastname AS varchar)
END AS Lastname
, CASE WHEN (tbl_ContactExtras.Prequalified=-1 OR
tbl_ContactExtras.Prequalified IS NULL) THEN ''
WHEN tbl_ContactExtras.Prequalified=0 THEN 'No'
WHEN tbl_ContactExtras.Prequalified=1 THEN 'Yes - Other'
WHEN tbl_ContactExtras.Prequalified=2 THEN 'Yes'
ELSE CAST(tbl_ContactExtras.Prequalified AS varchar)
END AS Prequalified
FROM contacts
LEFT JOIN tbl_ContactExtras
ON tbl_ContactExtras.ContactID = Contacts.ID
Those are examples of some of the simpler statements: a statement could have up to 30 CASE statements in it, or it could have none at all.
I need to programmatically add WHERE parameters, but doing this correctly requires knowing whether a WHERE clause is already present.
Any idea on a regex that would work for this? If not, any other ideas on how to tell the two apart?
Thanks,
This is not possible, since a WHERE clause may be arbitrarily nested inside the FROM clause.
This may not catch all cases, but you may find you can catch most of them just by finding the last FROM and the last WHERE in the statement.
If the WHERE comes after the FROM, then the statement has a WHERE clause. If the WHERE comes before the FROM (or there is no WHERE at all), then no WHERE clause exists.
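A naive T-SQL sketch of that heuristic (it assumes the keywords never appear inside comments, string literals, or subqueries - exactly the cases it will miss):
DECLARE @stmt nvarchar(max) = N'select * from t where x = 1';
DECLARE @rev  nvarchar(max) = REVERSE(LOWER(@stmt));
-- CHARINDEX against the reversed text measures distance from the end,
-- so a smaller value means the keyword occurs later in the statement.
DECLARE @lastWhere int = CHARINDEX(REVERSE(N'where'), @rev);
DECLARE @lastFrom  int = CHARINDEX(REVERSE(N'from'),  @rev);
SELECT CASE WHEN @lastWhere > 0 AND @lastWhere < @lastFrom
            THEN 'has a WHERE clause'
            ELSE 'no WHERE clause' END AS verdict;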
Sometimes, it's okay to leave restrictions or limitations in your code, as long as they're properly documented.
For example, I've worked on a project before that parsed SQL and we discovered it didn't handle things like between:
where recdate between '2010-01-01' and '2010-12-31'
Rather than spend a bucket-load of money fixing the problem (and probably introducing bugs on the way), we simply published it as a restriction and told everyone they had to change it to:
where recdate >= '2010-01-01'
and recdate <= '2010-12-31'
Problem solved. While it's good to keep customers happy, you don't have to cater to every whim :-)
Other than that, you need an SQL parser, and SQL is not a pretty language to parse, trust me on that one.
Are all of the joins the same? If so you could find the index of all or part of the FROM statement (perhaps using a regex to be tolerant of slight differences in syntax and whitespace) and then look for the occurrence of the word WHERE after that index.
In general you would be better off using a parser. But if this is just a one off thing and the statements are all fairly similar then the above approach should be okay.
Regex is not designed to do this. Parsing SQL properly requires matching balanced parentheses (and other matching pairs, such as quotes), something regex is not designed to do (and pure regex isn't even equipped to; PCRE can but it's not pretty).
Instead, just write a basic state machine or something to parse it.
What's the problem you're trying to solve? Are you trying to determine if it's safe to add constraints to these existing queries?
For example, if you've got this query
...
where foo = 'bar'
then you know it's safe to add
and bat = 'quux'
but if you don't have a WHERE clause already, then you have to do it as
where bat = 'quux'
Is that the problem you're trying to solve? If so, can you make every SQL query you're working with have a WHERE clause by adding a "WHERE 0=0" to those queries that don't have one? Then you know in your post-process phase that every query already has one.
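The normalization trick looks like this (the appended predicates are just examples): every generated query carries a no-op predicate, so later code can always append conditions with a plain AND:
SELECT Contacts.ID
FROM contacts
WHERE 0 = 0                             -- always true; exists only as an anchor
  AND Contacts.Firstname LIKE 'Bob%'    -- appended later as "AND ..."
  AND Contacts.Lastname  LIKE 'Smith%'; -- and again, with no special-casing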
This is just a guess, of course. Your question sounded like that might be the larger issue.

SQL Server 2008 - Conditional Query

SQL is not one of my strong suits. I have a SQL Server 2008 database. This database has a stored procedure that takes in eight int parameters. For the sake of keeping this question focused, I will use one of these parameters for reference:
@isActive int
Each of these int parameters will be -1, 0, or 1. -1 means "Unknown" or "Don't Care". Basically, I need to query a table such that if the int parameter is NOT -1, I need to consider it in my WHERE clause. Because there are eight int parameters, an IF-ELSE statement does not seem like a good idea. At the same time, I do not know how else to do this?
Is there an elegant way in SQL to add a WHERE conditional if a parameter does NOT equal a value?
Thank you!
best source for dynamic search conditions:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
There are a lot of subtle implications in how you do this as to whether an index can be used or not. If you are on the proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query, and the local variables' run-time values are used for the optimization.
Consider this: OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
    (@Search1 IS NULL or Column1 = @Search1)
AND (@Search2 IS NULL or Column2 = @Search2)
AND (@Search3 IS NULL or Column3 = @Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
    Column2 = @Search2
and an index can be used (if you have one defined on Column2)
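Spelled out against a hypothetical table, the whole thing reads:
SELECT *
FROM dbo.MyTable                            -- hypothetical; Column1..3 indexed
WHERE (@Search1 IS NULL OR Column1 = @Search1)
  AND (@Search2 IS NULL OR Column2 = @Search2)
  AND (@Search3 IS NULL OR Column3 = @Search3)
OPTION (RECOMPILE);                         -- plan rebuilt per execution with actual values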
WHERE coalesce(active, 1) = (CASE
    WHEN @isActive = -1 THEN coalesce(active, 1)
    ELSE @isActive
END)
Rather than using -1 to signify that you don't know or don't care, how about just using Null for that? Pretty much what it was made for. Then you could switch to a Bit rather than an Int.
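A sketch of that shape (the procedure and table names are invented; see the NOTE below about NULL-able columns):
CREATE PROCEDURE dbo.FindThings
    @isActive bit = NULL  -- NULL now means "don't know / don't care"
AS
    SELECT *
    FROM dbo.Things
    WHERE IsActive = CASE WHEN @isActive IS NULL THEN IsActive ELSE @isActive END;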
Also, I'm sure TomTom will disagree, but I think using a CASE statement is the way to go for this stuff.
Your mileage may vary, but it seems that the query engine handles it a lot better than wrapping things in IsNull or having multiple OR statements, which can get rather messy as you start adding other conditions.
No matter which way you go, the execution plan is going to suffer a little bit depending on what you're passing in, but it shouldn't be TOO horrible.
The extra benefit of going with CASE statements is that you can add a bit of complexity without much extra code (versus going with a bunch of OR statements). Also, the first condition to match your criteria can prevent the extra evaluations, which isn't always the case when dealing with OR's...
So, for 8 optional parameters with -1 as the value used to ignore the search, what you end up with is something along the lines of:
WHERE
    @Search1 = CASE WHEN @Search1 = -1 THEN @Search1 ELSE Column1 END
AND @Search2 = CASE WHEN @Search2 = -1 THEN @Search2 ELSE Column2 END
AND @Search3 = CASE WHEN @Search3 = -1 THEN @Search3 ELSE Column3 END
AND @Search4 = CASE WHEN @Search4 = -1 THEN @Search4 ELSE Column4 END
AND @Search5 = CASE WHEN @Search5 = -1 THEN @Search5 ELSE Column5 END
AND @Search6 = CASE WHEN @Search6 = -1 THEN @Search6 ELSE Column6 END
AND @Search7 = CASE WHEN @Search7 = -1 THEN @Search7 ELSE Column7 END
AND @Search8 = CASE WHEN @Search8 = -1 THEN @Search8 ELSE Column8 END
NOTE: As KM pointed out, the NULL method falls short if the columns you're working with can potentially have NULL values, since NULL = NULL won't evaluate properly. So, for fun, I changed my answer back to what the original poster requested, which is to use their own identifier for skipping the search.
The pattern (column = @param OR @param IS NULL) will give you optional parameters. You can use NULLIF to neutralize your -1. Even better would be to allow null parameters, instead of using a magic number.
WHERE
    (Customer.IsActive = NULLIF(@isActive, -1) OR NULLIF(@isActive, -1) IS NULL)
.... WHERE field = CASE @isActive WHEN -1 THEN field ELSE @isActive END ....
There is NOT an elegant way - all ways suck.
WHERE @isActive = -1 OR isActive = @isActive
is basically the only way - but even then, please make sure that the query plan is reevaluated every time; otherwise most queries will use the wrong query plan.
This is a classic case where stored procedures are bad. Querying should, IMHO, not be done using stored procedures at all in modern times - which started about 15 years ago, when someone was smart enough to write the first ORM.