Why Does One SQL Query Work and the Other Does Not? - sql

Please disregard the obvious problems with the manipulation of data in the where clause. I know! I'm working on it. While working on it, though, I discovered that this query runs:
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE'PEX%'
AND ISNUMERIC(CheckNumber) = 1
AND CONVERT(BIGINT,CheckNumber) <> TransactionId
And this one does not:
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE'PEX%'
AND CONVERT(BIGINT,CheckNumber) <> TransactionId
AND ISNUMERIC(CheckNumber) = 1
The only difference between the two queries is the order of the conditions in the WHERE clause. I was under the impression that the SQL Server query optimizer would spare me from having to worry about that.
The error returned is: Error converting data type varchar to bigint.

You are right, the order of the conditions shouldn't matter.
If AND ISNUMERIC(CheckNumber) = 1 is checked first and non-matching rows thus dismissed, then AND CONVERT(BIGINT,CheckNumber) <> TransactionId will work (for exceptions see scsimon's answer).
If AND CONVERT(BIGINT,CheckNumber) <> TransactionId is processed before AND ISNUMERIC(CheckNumber) = 1 then you may get an error.
That your first query worked and the second not was a matter of luck. It could just as well have been vice versa.
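You can try to have one condition evaluated before the other by using a derived table, although the optimizer is still free to merge the two predicates, so this is not guaranteed: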
SELECT *
FROM
(
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE 'PEX%'
AND ISNUMERIC(CheckNumber) = 1
) num_only
WHERE CONVERT(BIGINT,CheckNumber) <> TransactionId;

You just got lucky that the first one worked; you are correct that the order in which you list conditions in the WHERE clause does not matter. SQL is a declarative language, meaning that you tell the engine what should happen, not how. I suspect your two queries were not executed with the same query plan. Granted, you can affect what the optimizer does to a certain extent. You'll also notice this type of issue when using a CTE. For example:
declare @table table(columnName varchar(64))
insert into @table
values
('1')
,('1e4')
;with cte as(
select columnName
from @table
where isnumeric(columnName) = 1)
select
cast(columnName as decimal(32,16))
from cte
In the above snippet, you would assume that the outer SELECT runs only on the subset of rows returned by the CTE. However, you can't ensure this will happen, and you could still get a conversion error in the outer SELECT.
More importantly, you should know that ISNUMERIC() is widely misused. People often think that if it returns 1 then the value can be converted to a decimal or int, but this isn't the case. It only checks that the value can be converted to at least one numeric type, which includes money and float. For example:
select
isnumeric('1e4')
,isnumeric('$')
,isnumeric('1,123,456')
As you can see, these all return 1, but they would fail the conversion you put in your post.
Side note: your indexes are likely the reason why the first query didn't actually error out.

Related

SQL injection mid query

I would like to improve my knowledge about the possible SQL injection attacks that exist. I know that parameterization completely avoids SQL injection risk and should therefore be applied everywhere. However, when someone asks me how it can be exploited, I like to have an answer.
I know how a basic SQL injection attack works. For example a website has a page website.com/users/{id} where id is the primary key of the user. If we trust the input completely and simply pass the id parameter to the query being executed, this can have dire consequences. In the case of website.com/users/1 the query becomes SELECT * FROM [User] WHERE [Id] = 1. However, in the case of website.com/users/1;DROP TABLE User the query becomes SELECT * FROM [User] WHERE [Id] = 1;DROP TABLE User, leading to the nasty result.
But, pretty much all SQL injection attacks I read about count on the WHERE clause being present right before the injection. Almost always, the injection works in some form of ;Injected statement--.
My question is, if it is also possible to perform a SQL injection attack given a query like the one below? Or in a broader sense: does the entire statement have to compile for a SQL injection attack to be possible, or will any error in the statement cause the attack to fail? If the answer is different per DBMS, please specify the DBMS.
In the query below, the injection is supposed to happen in the CHARINDEX('input', [Name]) > 0 where input is copied from a user's input.
SELECT
*
FROM (
SELECT TOP 10
*
FROM
[User]
WHERE
CHARINDEX('input', [Name]) > 0
) AS [User]
LEFT JOIN
[Setting] ON [Setting].[UserId] = [User].[Id]
The furthest I got myself was with the query below, but the error it returns, Missing end comment mark '*/', seems to be completely blocking any attack.
SELECT
*
FROM (
SELECT TOP 10
*
FROM
[User]
WHERE
CHARINDEX('input', '') > 0) AS [User];DROP TABLE [NonExistentTable]/*, [Name]) > 0
) AS [User]
LEFT JOIN
[Setting] ON [Setting].[UserId] = [User].[Id]
The resulting SQL has to be accepted by the particular DBMS for injection to occur, which generally means it needs to be valid SQL, but there are usually ways of crafting the input to make it valid regardless of the SQL in question.
If a line comment isn't enough, an extra statement can be added; if multiple statements aren't allowed, a UNION can be used; and so on.
The exact details vary, but with enough knowledge of the query (e.g. through error details leaking to the user) or lucky guesses, something can usually be crafted that is to the attacker's advantage.
In your example, consider this input, which simply repeats parts of the existing query:
nonsense', [Name]) > 0
) AS [User];
Drop Table [User];
SELECT
*
FROM (
SELECT TOP 10
*
FROM
[User]
WHERE
CHARINDEX('nonsense
Which results in the following SQL:
SELECT
*
FROM (
SELECT TOP 10
*
FROM
[User]
WHERE
CHARINDEX('nonsense', [Name]) > 0
) AS [User];
Drop Table [User];
SELECT
*
FROM (
SELECT TOP 10
*
FROM
[User]
WHERE
CHARINDEX('nonsense', [Name]) > 0
) AS [User]
LEFT JOIN
[Setting] ON [Setting].[UserId] = [User].[Id]
SQL injection normally happens where some kind of string concatenation/insertion operation is involved. It does not have to be the WHERE clause. Also, generally speaking, the attacker is not interested in dropping the tables, he wants information. What if input is replaced by this:
', '') > 0 UNION ALL SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = 'password' --
Assuming that the results of the SELECT are displayed somehow and error messages are also shown, it'll take the attacker a few minutes to determine the number and position of the , NULL placeholders he should add before the query actually returns the names of the table and column he wants to probe in the next stage.
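To make the UNION idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real DBMS. The users and secrets tables and their contents are hypothetical, and SQLite's instr(string, search) takes its arguments in the opposite order from CHARINDEX, so the payload is shaped slightly differently from the one above. It only illustrates that splicing input into the SQL text lets the attacker extend the statement, while binding the same input as a parameter does not.

```python
import sqlite3

# Hypothetical schema for illustration; SQLite stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT);
    CREATE TABLE secrets (secret TEXT);
    INSERT INTO users VALUES ('alice'), ('bob');
    INSERT INTO secrets VALUES ('hunter2');
""")

# Attacker-controlled input: closes the string literal and the function call,
# appends a UNION, and comments out whatever follows in the original query.
user_input = "zzz') > 0 UNION ALL SELECT secret FROM secrets --"

# Vulnerable: the input is spliced directly into the SQL text.
unsafe_sql = "SELECT name FROM users WHERE instr(name, '" + user_input + "') > 0"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a parameter and is only ever treated as a value.
safe_sql = "SELECT name FROM users WHERE instr(name, ?) > 0"
safe_rows = conn.execute(safe_sql, (user_input,)).fetchall()

print(unsafe_rows)  # rows leaked from the secrets table
print(safe_rows)    # no user name contains the payload text
```

The concatenated statement stays syntactically valid because the payload supplies the closing quote and parenthesis itself and comments out the leftover fragment, which is the same trick the answers above rely on.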

SQL NOT IN failed

I am working on a query that checks whether the temp table contains a record that does not exist in the main table. My query looks like this:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (SELECT [StartDateTime] FROM [Telemarketing].[dbo].PDCampaignBatch GROUP BY [StartDateTime])
but the problem is it does not display this row
even if that data does not exist in my main table. What seems to be the problem?
NOT IN has strange semantics. If any values in the subquery are NULL, then the query returns no rows at all. For this reason, I strongly recommend using NOT EXISTS instead:
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
FROM [Telemarketing].[dbo].PDCampaignBatch cb
WHERE t.StartDateTime = cb.StartDateTime
);
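The NULL behaviour is easy to reproduce. The following is a small sketch using Python's built-in sqlite3 module rather than SQL Server; the batch_temp and batch tables are hypothetical stand-ins for the two tables in the question. The three-valued logic involved is standard SQL, so the same effect is expected on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch_temp (StartDateTime TEXT);
    CREATE TABLE batch (StartDateTime TEXT);
    INSERT INTO batch_temp VALUES ('2020-01-01'), ('2020-01-02');
    -- One matching value plus a NULL in the main table.
    INSERT INTO batch VALUES ('2020-01-01'), (NULL);
""")

# NOT IN: 'x NOT IN (..., NULL)' is never TRUE, so no row qualifies.
not_in_rows = conn.execute("""
    SELECT StartDateTime FROM batch_temp
    WHERE StartDateTime NOT IN (SELECT StartDateTime FROM batch)
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so the missing date is found.
not_exists_rows = conn.execute("""
    SELECT t.StartDateTime FROM batch_temp t
    WHERE NOT EXISTS (SELECT 1 FROM batch cb
                      WHERE cb.StartDateTime = t.StartDateTime)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, the NOT IN query comes back empty while NOT EXISTS still reports the date that is missing from the main table.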
If the set evaluated by the NOT IN condition contains any NULL values, the outer query returns an empty set, even if there are many [StartDateTime] values in PDCampaignBatch_temp that have no match in the PDCampaignBatch table.
To avoid this issue, filter out the NULLs:
SELECT *
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (
SELECT DISTINCT [StartDateTime]
FROM [Telemarketing].[dbo].PDCampaignBatch
WHERE [StartDateTime] IS NOT NULL
);
Let's say PDCampaignBatch_temp and PDCampaignBatch happen to have the same structure (same columns in the same order) and you're tasked with getting the set of all rows in PDCampaignBatch_temp that aren't in PDCampaignBatch. The most effective way to do that is to make use of the EXCEPT operator, which will deal with NULL in the expected way as well:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
EXCEPT
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch]
In production code that is not a one-off, don't use SELECT *, write out the column names instead.
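As a rough illustration of how EXCEPT treats NULL, here is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The two tables and their columns (id, label) are hypothetical. Set operators compare NULLs as equal to each other, so a row containing NULL that exists in both tables is removed rather than reported as a difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch_temp (id INTEGER, label TEXT);
    CREATE TABLE batch (id INTEGER, label TEXT);
    INSERT INTO batch_temp VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO batch VALUES (1, 'a'), (2, NULL);
""")

# Column names are written out instead of SELECT *, as recommended above.
except_rows = conn.execute("""
    SELECT id, label FROM batch_temp
    EXCEPT
    SELECT id, label FROM batch
""").fetchall()

print(except_rows)
```

Only the row that is genuinely absent from the second table remains; the (2, NULL) row is treated as present in both.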
Most likely your issue is with the datetime. You may be displaying only a certain degree of precision, such as year/month/day, while the data is stored as year/month/day/hour/minute/second/millisecond. If so, you have to match down to the most granular unit of the data. If one column is a date and the other is a datetime, they will also likely never match up. Thus you never get a match.

Conversion failed. SELECT * from Person.Address WHERE ISNUMERIC(PostalCode) =1 AND PostalCode<7000

It is Microsoft SQL Server.
In this column PostalCode from the AdventureWorks 2012 Person.Address table, there are numeric and string values.
I want to get table with rows WHERE PostalCode < 7000
This does not work as expected:
USE [AdventureWorks2012]
SELECT *
FROM Person.Address
WHERE ISNUMERIC(PostalCode) = 1
AND PostalCode < 7000
because I get this error:
Conversion failed when converting the nvarchar value 'K4B 1T7' to data type int.
I can make it work by creating a temporary table like this:
/* creating of temp table */
USE AdventureWorks2012
SELECT *
INTO temp2
FROM Person.Address
WHERE ISNUMERIC(PostalCode) = 1
/* get data from temp table */
SELECT *
FROM temp2
WHERE PostalCode < 7000
But this is a bad approach because of its poor performance and the needless temp table.
What is the better way to get table with rows WHERE PostalCode < 7000 but data has not only numeric values?
If you're on SQL Server 2012 or newer, you should use try_convert instead of isnumeric. Isnumeric has the quirk that it returns 1 even for some strings that can't be converted to the number type you want. So something like this should work:
SELECT *
FROM Person.Address
WHERE try_convert(int, PostalCode) < 7000
If the string can't be converted, try_convert returns null.
MSDN: https://msdn.microsoft.com/en-us/library/hh230993.aspx
The error is being returned because the conditions being evaluated are not short-circuiting - the condition PostalCode<7000 is being evaluated even where the postal code is non-numeric.
Instead, try:
SELECT *
from Person.Address
WHERE CASE WHEN PostalCode NOT LIKE '%[^0-9]%'
THEN CAST(PostalCode AS NUMERIC)
ELSE CAST(NULL AS NUMERIC)
END <7000
(Updated following comments)
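The filtering logic of the guarded CASE can be tried out with Python's built-in sqlite3 module. This is only a sketch under several assumptions: the address table and its sample values are made up, SQLite's GLOB pattern '*[^0-9]*' is used in place of the T-SQL LIKE '%[^0-9]%', and SQLite's CAST never raises a conversion error, so the example shows which rows the predicate keeps rather than reproducing the failure itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (PostalCode TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?)",
    [("6500",), ("98011",), ("K4B 1T7",), ("0410",)],
)

# The CASE yields NULL for any value containing a non-digit, so the cast is
# only attempted on digit-only strings; NULL < 7000 is not true and drops the row.
rows = conn.execute("""
    SELECT PostalCode FROM address
    WHERE CASE WHEN PostalCode NOT GLOB '*[^0-9]*'
               THEN CAST(PostalCode AS INTEGER)
          END < 7000
    ORDER BY PostalCode
""").fetchall()

print(rows)
```

Keeping the check and the cast inside one CASE expression is what ties them together, instead of relying on the order of two separate predicates.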
The text is from 70-461 Training kit
(Exam 70-461: Querying Microsoft SQL Server 2012):
Recall from Chapter 1 that all expressions that appear in the same
logical query processing phase—for example, the WHERE phase—are
conceptually evaluated at the same point in time. For example,
consider the following filter predicate.
WHERE propertytype = 'INT' AND CAST(propertyval AS INT) > 10
Suppose that the table being queried
holds different property values. The propertytype column represents
the type of the property (an INT, a DATE, and so on), and the
propertyval column holds the value in a character string. When
propertytype is 'INT', the value in propertyval is convertible to INT;
otherwise, not necessarily.
Some assume that unless precedence rules
dictate otherwise, predicates will be evaluated from left to right,
and that short circuiting will take place when possible. In other
words, if the first predicate propertytype = 'INT' evaluates to false,
SQL Server won’t evaluate the second predicate CAST(propertyval AS
INT) > 10 because the result is already known. Based on this
assumption, the expectation is that the query should never fail trying
to convert something that isn’t convertible.
The reality, though, is
different. SQL Server does internally support a short-circuit concept;
however, due to the all-at-once concept in the language, it is not
necessarily going to evaluate the expressions in left-to-right order.
It could decide, based on cost-related reasons, to start with the
second expression, and then if the second expression evaluates to
true, to evaluate the first expression as well. This means that if
there are rows in the table where propertytype is different than
'INT', and in those rows propertyval isn’t convertible to INT, the
query can fail due to a conversion error.
The only safe way of doing this is to first get the Ids of the rows you are interested in and then join to them in a separate statement; otherwise the query planner could decide to do the numeric comparison first. I started to get a lot of these problems when we upgraded to SQL Server 2008; they hadn't happened before.
You can however do a conversion:
USE [AdventureWorks2012]
SELECT *
from Person.Address
WHERE ISNUMERIC(PostalCode) =1 AND CAST(CAST(PostalCode AS INT) AS VARCHAR)<'7000'
I added the casts to try to normalize values that are numeric but have zero padding on the left, which could otherwise screw up the ordering.
Beware that the performance of this won't be the best, and indexes on PostalCode aren't going to be used.
Here in Denmark, postal codes always have the same length, so I would use this script to avoid strange issues* with isnumeric and conversion.
It checks that PostalCode has 4 digits and compares the string value.
SELECT *
FROM temp2
WHERE
PostalCode < '7000' and
PostalCode like '[0-9][0-9][0-9][0-9]'
*An example of strange issues with isnumeric
SELECT isnumeric('£1.1')
SELECT isnumeric('-')
Both return 1.
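You can use a subquery to do that, although the optimizer may still merge the two filters, so it is not guaranteed to avoid the error: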
select * from (
SELECT * from Person.Address WHERE ISNUMERIC(PostalCode) =1 ) t
where PostalCode<7000

Can SELECT expressions sometimes be evaluated for rows not matching WHERE clause?

I would like to know if it's possible for expressions that are part of the SELECT statement list to be evaluated for rows not matching the WHERE clause?
From the execution order documented here, it seems that the SELECT gets evaluated long after the WHERE, however I ran into a very weird problem with a real-life query similar to the query below.
To give you some context: in the example, SomeOtherTable has an a_varchar column which always contains numeric values for code 105, but may contain non-numeric values for other codes.
The query statement works:
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
The following query complains about being unable to cast a_varchar to int:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
And finally, the following query works:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT
an_id,
CASE code WHEN 105 THEN CAST(a_varchar AS int) ELSE NULL END
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
Therefore, the only explanation I could find was that with the JOIN, the query gets optimized differently in a way that CAST(a_varchar AS int) gets executed even if code <> 105.
The queries are run against SQL SERVER 2008.
Absolutely.
The documentation that you reference has a section called Logical Processing Order of the SELECT statement. This is not the physical processing order. It explains how the query itself is interpreted. For instance, an alias defined in the select clause cannot be referenced in the where clause, because the where clause is logically processed first.
In fact, SQL Server has the ability to optimize queries by doing various data transformation operations when it reads the data. This is a nice performance benefit, because the data is in memory, locally, and the operations can simply be done in place. However, the following can fail with a run-time error:
select cast(a_varchar as int)
from table t
where a_varchar not like '%[^0-9]%';
The filter is applied after the attempt at conversion, in the real process flow. I happen to consider this a bug; presumably, the folks at Microsoft do not think so, because they have not bothered to fix this.
Two workarounds are available. The first is try_convert(), which does conversions and returns NULL for a failure instead of a run-time error. The second is the case statement:
select (case when a_varchar not like '%[^0-9]%' then cast(a_varchar as int) end)
from table t
where a_varchar not like '%[^0-9]%';

sql server rewrites my query incorrectly?

There is dirty data in the input.
We are trying to clean up the dataset and then make some calculations on the cleaned data.
declare @t table (str varchar(10))
insert into @t select '12345' union all select 'ABCDE' union all select '111aa'
;with prep as
(
select *, cast(substring(str, 1, 3) as int) as str_int
from @t
where isnumeric(substring(str, 1, 3)) = 1
)
select *
from prep
where 1=1
and case when str_int > 0 then 'Y' else 'N' end = 'Y'
--and str_int > 0
The last two lines do the same thing. The first one works, but if you uncomment the second one the query crashes with: Conversion failed when converting the varchar value 'ABC' to data type int.
Obviously, SQL Server is rewriting the query and mixing all the conditions together.
My guess is that it considers CASE a heavy operation and performs it as the last step. That's why the workaround with CASE works.
Is this behavior documented in any way? or is it a bug?
This is a known issue with SQL Server, and Microsoft does not consider it a bug although users do. The difference between the two queries is the execution path. One is doing the conversion before the filtering, the other after.
SQL Server reserves the right to re-order the processing. The documentation does specify the logical processing of clauses as:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
With (presumably but not explicitly documented here) CTEs being logically processed first. What does logically processed mean? Well, it doesn't mean that run-time errors are caught. It really determines the scope of identifiers during the compile phase.
When SQL Server reads from a data source, it can add new variables in. This is a convenient time to do this, because everything is in memory. However, this might occur before the filtering, which is what is causing the error when it occurs.
The fix to this problem is to use a case statement. So, the following CTE will usually work:
with prep as (
select *, (case when isnumeric(substring(str, 1, 3)) = 1 and str not like '%.%'
then cast(substring(str, 1, 3) as int)
end) as str_int
from @t
where isnumeric(substring(str, 1, 3)) = 1
)
Looks weird. And I think Redmond thinks so too. SQL Server 2012 introduced try_convert() (see here) which returns NULL if the conversion fails.
It would also help if you could instruct SQL Server to materialize CTEs. That would also solve the problem in this case. You can vote on adding such an option to SQL Server here.