There is dirty data in the input.
We are trying to clean up the dataset and then run some calculations on the cleaned data.
create table #t (str varchar(10))
insert into #t select '12345' union all select 'ABCDE' union all select '111aa'
;with prep as
(
select *, cast(substring(str, 1, 3) as int) as str_int
from #t
where isnumeric(substring(str, 1, 3)) = 1
)
select *
from prep
where 1=1
and case when str_int > 0 then 'Y' else 'N' end = 'Y'
--and str_int > 0
The last two lines do the same thing. The first one works, but if you uncomment the second one the query crashes with: Conversion failed when converting the varchar value 'ABC' to data type int.
Apparently SQL Server rewrites the query and mixes all the conditions together.
My guess is that it considers 'case' a heavy operation and performs it as the last step, which is why the workaround with case works.
Is this behavior documented anywhere, or is it a bug?
This is a known issue with SQL Server, and Microsoft does not consider it a bug although users do. The difference between the two queries is the execution path. One is doing the conversion before the filtering, the other after.
SQL Server reserves the right to re-order the processing. The documentation does specify the logical processing of clauses as:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
With (presumably but not explicitly documented here) CTEs being logically processed first. What does logically processed mean? Well, it doesn't mean that run-time errors are caught. It really determines the scope of identifiers during the compile phase.
When SQL Server reads from a data source, it can compute new expressions, such as the cast here, as it reads each row. This is a convenient time to do it, because everything is in memory. However, it might happen before the filtering, and that is what causes the error.
The fix to this problem is to use a case expression. So, the following CTE will usually work:
with prep as (
select *, (case when isnumeric(substring(str, 1, 3)) = 1 and str not like '%.%'
then cast(substring(str, 1, 3) as int)
end) as str_int
from #t
where isnumeric(substring(str, 1, 3)) = 1
)
Looks weird. And I think Redmond thinks so too. SQL Server 2012 introduced try_convert(), which returns NULL if the conversion fails.
It would also help if you could instruct SQL Server to materialize CTEs. That would also solve the problem in this case. There is a request to add such an option to SQL Server that you can vote on.
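As a rough sketch of the try_convert() route, assuming SQL Server 2012 or later and the same #t table from the question, the cast in the CTE can be swapped for try_convert() so that a non-numeric prefix yields NULL instead of an error:

```sql
-- Sketch only: assumes SQL Server 2012+ and the #t table from the question.
;with prep as
(
    select *, try_convert(int, substring(str, 1, 3)) as str_int
    from #t
)
select *
from prep
where str_int > 0;  -- a failed conversion is NULL and never satisfies > 0
```

Because a failed conversion becomes NULL, the isnumeric() filter is no longer needed to protect the cast, and it should not matter in which order the engine evaluates the predicates.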
Is there a shorter way to write the following?
-- select empty_result
select t.col
from (select 1 as col) t
where 1 = 0 -- never match
The "original" question follows. This was modified many times, explicitly in hopes of stopping Y responses as a result of showing a specific use-case, and was [rightly] claimed to be a confusing mess.
The use-case is a TSQL query that returns an empty result set in some cases and a 'real' result set in others. In both cases the structure is expected to be the same.
if @foo = 'bar'
-- select real_result
else
-- select empty_result
The question here is then, specifically about creating an empty result set / derived table simply.
One way to do this is as follows. Is there a syntactically 'simpler' method?
-- select empty_result
select t.col
from (select 1 as col) t
where 1 = 0 -- never match
An alternative for this specific MINIMAL IF..ELSE case is the following SQL. It is (XY) outside the question scope, even though it would work here because the resulting schema is the same in the example above. While it may be a good option elsewhere, it requires a different TSQL flow-control structure. It will also not return the correct result sets if the two cases have different schemas, which makes it too specific with respect to the title scope.
-- XY alternative specific to MINIMAL CASE shown above
-- select real_result or empty_result with single query
select t.col
from real_data t
where @foo = 'bar'
Yes, there is a 'simpler' / shorter / more terse syntax for returning an empty result set in SQL Server that does not require first creating a derived table.
-- select empty_result
select top 0
1 as col
This is SQL Server specific syntax. There might be other similar forms found in other database implementations.
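If the empty result has to line up with a 'real' result that has several typed columns, the same top 0 form can carry explicit types. A minimal sketch, with hypothetical column names and types:

```sql
-- Sketch: column names and types are made up for illustration.
select top 0
    cast(null as int)         as id,
    cast(null as varchar(50)) as name,
    cast(null as datetime)    as created_at;
```

Casting NULL gives each column a declared type, so a client that inspects the result metadata sees the intended schema even though no rows come back.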
While not specifically about a shorter empty result set syntax, Ken White provided an approach which avoids duplicating schema if it's identical in both cases. The IF flow-control structure is preserved, as per the original question.
if @foo = 'bar'
-- select real_result
select t.col
from real_table t
else
-- select empty_result
select t.col
from real_table t
where 1 = 0
Please disregard the obvious problems with the manipulation of data in the where clause. I know! I'm working on it. While working on it, though, I discovered that this query runs:
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE'PEX%'
AND ISNUMERIC(CheckNumber) = 1
AND CONVERT(BIGINT,CheckNumber) <> TransactionId
And this one does not:
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE'PEX%'
AND CONVERT(BIGINT,CheckNumber) <> TransactionId
AND ISNUMERIC(CheckNumber) = 1
The only difference between the two queries is the order of the items in the WHERE clause. I was under the impression that the SQL Server query optimizer would save me from having to worry about that.
The error returned is: Error converting data type varchar to bigint.
You are right, the order of the conditions shouldn't matter.
If AND ISNUMERIC(CheckNumber) = 1 is checked first and non-matching rows thus dismissed, then AND CONVERT(BIGINT,CheckNumber) <> TransactionId will work (for exceptions see scsimon's answer).
If AND CONVERT(BIGINT,CheckNumber) <> TransactionId is processed before AND ISNUMERIC(CheckNumber) = 1 then you may get an error.
That your first query worked and the second not was a matter of luck. It could just as well have been vice versa.
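You can try to force one condition to be evaluated before the other with a derived table, although the optimizer is free to merge the derived table back into the outer query, so this is not guaranteed: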
SELECT *
FROM
(
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE 'PEX%'
AND ISNUMERIC(CheckNumber) = 1
) num_only
WHERE CONVERT(BIGINT,CheckNumber) <> TransactionId;
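Since the optimizer may still merge a derived table into the outer query, a guard that travels with the conversion itself is usually safer. Here is a sketch, assuming CheckNumber should only be compared when it consists purely of digits. The digits-only test and the length cap are assumptions, not part of the original query:

```sql
-- Sketch: the conversion is only reached for rows that pass the guard
-- inside the same CASE expression.
SELECT *
FROM PatientDistribution
WHERE InvoiceNumber LIKE 'PEX%'
  AND CASE
          WHEN CheckNumber NOT LIKE '%[^0-9]%'    -- digits only (assumption)
           AND LEN(CheckNumber) BETWEEN 1 AND 18  -- non-empty, fits in BIGINT
          THEN CONVERT(BIGINT, CheckNumber)
      END <> TransactionId;
```

Rows that fail the guard produce NULL, and NULL <> TransactionId is not true, so they are filtered out without the conversion ever being attempted. On SQL Server 2012 or later, TRY_CONVERT(BIGINT, CheckNumber) is a shorter alternative.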
You just got lucky that the first one worked; you are correct that the order in which you list conditions in the where clause does not matter. SQL is a declarative language, meaning that you tell the engine what should happen, not how. So I would suspect your two queries weren't executed with the same query plan. Granted, you can affect what the optimizer does to a certain extent. You'll also notice this type of issue when using a CTE. For example:
create table #table (columnName varchar(64))
insert into #table
values
('1')
,('1e4')
;with cte as(
select columnName
from #table
where isnumeric(columnName) = 1)
select
cast(columnName as decimal(32,16))
from cte
Looking at the above snippet, you would assume that the second statement runs on the results / subset from the CTE statement. However, you can't ensure this will happen, and you could still get a type/conversion error on the second statement.
More importantly, you should know that ISNUMERIC() is largely misused. People often think that if it returns 1 then the value can be converted to a decimal or int. But this isn't the case. It only checks that the value is valid for at least one of the numeric types, which include money and float. For example:
select
isnumeric('1e4')
,isnumeric('$')
,isnumeric('1,123,456')
As you can see, these all evaluate to 1, but would fail the conversion you put in your post.
Side note: your indexes are likely the reason why the first query didn't actually error out.
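To see the gap between ISNUMERIC() and an actual conversion, the same literals can be run through TRY_CONVERT() side by side. This is a sketch that assumes SQL Server 2012 or later:

```sql
-- Each literal passes ISNUMERIC() but cannot be converted to int,
-- so TRY_CONVERT() yields NULL instead of raising an error.
select v.val,
       isnumeric(v.val)        as is_numeric,
       try_convert(int, v.val) as as_int
from (values ('1e4'), ('$'), ('1,123,456')) as v(val);
```

Filtering on the result of TRY_CONVERT() being NOT NULL tests exactly the conversion you intend to perform, which ISNUMERIC() does not.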
I have this SQL query that is failing on this nvarchar, even though it seems to me that it is properly guarded for a cast to work. I am trying to get the average of the values for correctly formatted numerical values on a database that I only have read access to.
select TagName, count(TagName) as Freq,
sum(isnumeric(value)) as GoodNums,
avg(case isnumeric(value) when 1 then cast(value as numeric) else 0 end) as "Avg",
Min(Timestamp) as StartTime,
Max(Timestamp) as EndTime
from HH_Data_9 group by TagName
However, I never really grokked the SQL syntax for combinations of CASE, aggregate functions, and GROUP BY, so maybe I am just writing it wrong (I tried quite a few things before posting this). Note that the "GoodNums" column works and gives reasonable answers, but when I add the "Avg" column to the query the whole thing errors out with:
Msg 8114, Level 16, State 5, Line 1
Error converting data type nvarchar to numeric.
This is Microsoft SQL Server 2014 by the way. Any ideas?
Just want to caution that this approach is very dangerous. Consider the following table structure:
CREATE TABLE TestTable
(
FieldType NVARCHAR(50) ,
FieldValue NVARCHAR(50)
)
GO
INSERT INTO dbo.TestTable
VALUES ( 'INT', '5' ),
( 'INT', '15' ),
( 'MONEY', '5.5' )
SELECT *
FROM dbo.TestTable
WHERE FieldType = 'INT'
AND CAST(FieldValue AS INT) > 10
On my machine this query works, but the thing here is that many assume this predicate will be evaluated from left to right. That is incorrect. The SQL Server engine may decide to evaluate this expression from right to left, and in that case you will get a Conversion failed error. It's called the ALL-AT-ONCE principle. This does not apply to your example, but I just wanted to mention it.
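One way to make the TestTable query independent of evaluation order is to put the type check and the cast in the same CASE expression. A minimal sketch against the table above:

```sql
-- The cast is only evaluated for rows whose FieldType is 'INT';
-- other rows produce NULL, which fails the > 10 comparison.
SELECT *
FROM dbo.TestTable
WHERE CASE WHEN FieldType = 'INT' THEN CAST(FieldValue AS INT) END > 10;
```

This still relies on FieldValue actually holding an integer literal whenever FieldType is 'INT'. If that cannot be trusted, the cast itself needs to be a safe conversion.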
I would like to know if it is possible to use the "with as" clause with a variable and/or in a begin/end block.
My code is
WITH EDGE_TMP
AS
(select edge.node_beg_id,edge.node_end_id,prg_massif.longueur,prg_massif.lgvideoupartage,prg_massif.lgsanscable from prg_massif
INNER JOIN edge on prg_massif.asset_id=edge.asset_id
where prg_massif.lgvideoupartage LIKE '1' OR prg_massif.lgsanscable LIKE '1')
,
journey (TO_TOWN, STEPS,DISTANCE,WAY)
AS
(SELECT DISTINCT node_beg_id, 0, 0, CAST(&&node_begin AS VARCHAR2(2000))
FROM EDGE_TMP
WHERE node_beg_id = &&node_begin
UNION ALL
SELECT node_end_id, journey.STEPS + 1
, journey.DISTANCE + EDGE_TMP.longueur,
CONCAT(CONCAT(journey.WAY,';'), EDGE_TMP.node_end_id
)
It creates a string as output, separated by ';', but I need to get it back as a variable or a table. Do you know how? I used concat to collect the data in one big string. Can I use a table to insert the data instead?
I need to use the result for further processing.
Thank you,
mat
No, WITH is part of an SQL statement only. But if you describe why you need it in PL/SQL, we can advise you.
Edit: if you have an SQL statement which produces the result you need, you can assign its value to a PL/SQL variable. There are several methods to do this; the simplest is to use a SELECT INTO statement (add an INTO variable clause to your select).
You can use a WITH clause as part of a SELECT INTO statement (at least in not-too-old Oracle versions).
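A minimal sketch of SELECT INTO combined with a WITH clause, written in Oracle PL/SQL since that is what the question uses. The variable name v_total and the choice of summing longueur are hypothetical; only the table and column names come from the question:

```sql
-- Sketch: v_total and the aggregate are illustrative assumptions.
DECLARE
  v_total NUMBER;
BEGIN
  WITH edge_tmp AS (
    SELECT prg_massif.longueur
    FROM prg_massif
    INNER JOIN edge ON prg_massif.asset_id = edge.asset_id
    WHERE prg_massif.lgvideoupartage LIKE '1'
       OR prg_massif.lgsanscable LIKE '1'
  )
  SELECT SUM(longueur)
  INTO v_total
  FROM edge_tmp;

  DBMS_OUTPUT.PUT_LINE('Total length: ' || v_total);
END;
/
```

SELECT INTO expects exactly one row. If the query returns several rows, BULK COLLECT INTO a collection or a cursor FOR loop would be the usual alternatives.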
I would like to know if it's possible for expressions that are part of the SELECT statement list to be evaluated for rows not matching the WHERE clause?
From the execution order documented here, it seems that the SELECT gets evaluated long after the WHERE, however I ran into a very weird problem with a real-life query similar to the query below.
To give you some context: in the example, SomeOtherTable has an a_varchar column which always contains numerical values for the code 105, but may contain non-numerical values for other codes.
The query statement works:
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
The following query complains about being unable to cast a_varchar to int:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
And finally, the following query works:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT
an_id,
CASE code WHEN 105 THEN CAST(a_varchar AS int) ELSE NULL END
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
Therefore, the only explanation I could find was that with the JOIN, the query gets optimized differently in a way that CAST(a_varchar AS int) gets executed even if code <> 105.
The queries are run against SQL SERVER 2008.
Absolutely.
The documentation that you reference has a section called Logical Processing Order of the SELECT statement. This is not the physical processing order. It explains how the query itself is interpreted. For instance, an alias defined in the select clause cannot be referenced in the where clause, because the where clause is logically processed first.
In fact, SQL Server has the ability to optimize queries by doing various data transformation operations when it reads the data. This is a nice performance benefit, because the data is in memory, locally, and the operations can simply be done in place. However, the following can fail with a run-time error:
select cast(a_varchar as int)
from table t
where a_varchar not like '%[^0-9]%';
The filter is applied after the attempt at conversion, in the real process flow. I happen to consider this a bug; presumably, the folks at Microsoft do not think so, because they have not bothered to fix this.
Two workarounds are available. The first is try_convert(), which does conversions and returns NULL for a failure instead of a run-time error. The second is the case statement:
select (case when a_varchar not like '%[^0-9]%' then cast(a_varchar as int) end)
from table t
where a_varchar not like '%[^0-9]%';
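Applied to the UNION ALL query from the question, the try_convert() workaround might look like the sketch below. It assumes SQL Server 2012 or later, so it would not run on the SQL Server 2008 instance mentioned in the question, where the case form is the option that remains:

```sql
-- Sketch: requires SQL Server 2012+ for TRY_CONVERT.
SELECT 1
FROM (
    SELECT an_id, an_integer FROM SomeTable
    UNION ALL
    -- Non-numeric a_varchar values become NULL instead of raising an error,
    -- even if the engine converts rows before applying the code filter.
    SELECT an_id, TRY_CONVERT(int, a_varchar)
    FROM SomeOtherTable
    WHERE code = 105
) i
INNER JOIN AnotherOne a
    ON a.an_id = i.an_id;
```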