Take the following table as an example:
CREATE TABLE TBL_Names(Name VARCHAR(32))
INSERT INTO TBL_Names
VALUES ('Ken'),('1965'),('Karen'),('2541')
sqlfiddle
Executing the following query throws an exception:
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM dbo.TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = 2541
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting
the varchar value 'Ken' to data type int.
While the following query executes without error:
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE ISNUMERIC([name]) = 1
AND [name] = 2541
I know that this is because of a decision made by the SQL Server query optimizer, but I am wondering if there is any way to make SQL Server evaluate clauses in a certain order. That way, in the first query, the first clause would filter out the names that are not numeric, so the second clause would not fail when converting to a number.
Update: As you may have noticed, the above query is just an example to illustrate the problem. I know the risks of that implicit conversion and appreciate those who tried to warn me about it. However, my main question is how to make the optimizer evaluate clauses in a certain order.
There is no "direct" way of telling the engine to perform operations in a particular order. SQL isn't an imperative language where you have complete control over how things are done; you simply state what you need and the server decides how to do it.
For this particular case, as long as you have [name] = 2541, you are risking a potential conversion failure since you are comparing a VARCHAR column against an INT. Even if you use a subquery/CTE there is still room for the optimizer to evaluate this expression first and try to convert all varchar values to int (thus failing).
You can evade this with workarounds:
Correctly comparing matching data types:
[name] = '2541'
Casting [name] to INT beforehand, only where that is possible, and doing the comparison in a separate statement:
DECLARE @tblNamesInt TABLE (nameInt INT)
INSERT INTO @tblNamesInt (
    nameInt)
SELECT
    [nameInt] = CONVERT(INT, [name])
FROM
    dbo.tblNames
WHERE
    TRY_CAST([name] AS INT) IS NOT NULL -- TRY_CAST better than ISNUMERIC for INT
SELECT
    *
FROM
    @tblNamesInt AS T
WHERE
    T.nameInt = 2541 -- data types match
Even an index hint won't force the optimizer to use an index (that's why it's called a hint), so we have little control on how it gets stuff done.
There are a few mechanics that we know are evaluated in order and that we can use to our advantage: HAVING expressions are always computed after grouping, and grouping always happens after the WHERE conditions. So we can "safely" do the following grouping:
DECLARE @Table TABLE (IntsAsVarchar VARCHAR(100))
INSERT INTO @Table (IntsAsVarchar)
VALUES
    ('1'),
    ('2'),
    ('20'),
    ('25'),
    ('30'),
    ('A') -- Not an INT!
SELECT
    CASE WHEN T.IntsAsVarchar < 15 THEN 15 ELSE 30 END,
    COUNT(*)
FROM
    @Table AS T
WHERE
    TRY_CAST(T.IntsAsVarchar AS INT) IS NOT NULL -- Will filter out non-INT values first
GROUP BY
    CASE WHEN T.IntsAsVarchar < 15 THEN 15 ELSE 30 END
But you should always avoid writing code that implies implicit conversions (like T.IntsAsVarchar < 15).
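One way to avoid the implicit conversion in the grouping above is to cast explicitly once and reuse the result. The following is only a sketch: it assumes SQL Server 2012 or later (for TRY_CAST), and the names V, IntValue, Bucket, and RowCnt are introduced here for illustration and are not part of the original example.

```sql
DECLARE @Table TABLE (IntsAsVarchar VARCHAR(100));

INSERT INTO @Table (IntsAsVarchar)
VALUES ('1'), ('2'), ('20'), ('25'), ('30'), ('A'); -- 'A' is not an INT

SELECT
    Bucket = CASE WHEN V.IntValue < 15 THEN 15 ELSE 30 END,
    RowCnt = COUNT(*)
FROM
    @Table AS T
    -- Convert once; TRY_CAST yields NULL instead of raising an error.
    CROSS APPLY (SELECT IntValue = TRY_CAST(T.IntsAsVarchar AS INT)) AS V
WHERE
    V.IntValue IS NOT NULL -- drops the row holding 'A'
GROUP BY
    CASE WHEN V.IntValue < 15 THEN 15 ELSE 30 END;
```

Because every comparison is made on an INT value that came from TRY_CAST, no expression in the query can raise a conversion error, whichever order the optimizer picks.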
Try it like this:
SELECT [name]
FROM #TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM #TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = '2541'
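2) Or write the last condition as:
AND [name] = convert(varchar, 2541)
Since you are storing name as varchar(32), compare it with a varchar value. If you compare it with an integer, data type precedence makes SQL Server convert the varchar column to int, which fails for non-numeric names.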
What about:
SELECT *
FROM dbo.tblNames AS tn
WHERE [name] = convert(varchar, 2541)
Why do you need ISNUMERIC([name]) = 1, since you only care about the value '2541'?
You can try this
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM dbo.TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = '2541'
You just need to change [name] = 2541 to [name] = '2541'. The single quotes around the value are missing in your WHERE condition.
You can find the live demo Here.
Honestly, I wouldn't apply the implicit cast to your column [name]; it'll make the query non-SARGable. Instead, convert the value of your input (or pass it as a string):
SELECT [name]
FROM dbo.TBL_Names tn
WHERE [name] = CONVERT(varchar(32),2541);
If you "must" wrap [name], however (and suffer the performance degradation), then use TRY_CONVERT:
SELECT [name]
FROM dbo.TBL_Names tn
WHERE TRY_CONVERT(int,[name]) = 2541;
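TRY_CONVERT was introduced in SQL Server 2012. On older versions, a CASE expression can guard the conversion, because CASE evaluates its WHEN conditions sequentially and only computes the THEN branch that matches (the documented exceptions involve aggregate expressions, which are not used here). This is a sketch rather than a definitive pattern; the digits-only test and the 9-character limit are assumptions chosen so that every value that passes them fits in an int.

```sql
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE CASE
          WHEN [name] NOT LIKE '%[^0-9]%'      -- contains nothing but digits
           AND LEN([name]) BETWEEN 1 AND 9     -- non-empty and small enough for an int
          THEN CONVERT(int, [name])
          -- no ELSE: every other row yields NULL, and NULL = 2541 is not true
      END = 2541;
```

Like the TRY_CONVERT version, this wraps the column and is therefore not SARGable.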
Today I ran into a really weird problem in SQL Server (both 2008R2 and 2012). I'm trying to build up a string using concatenation in combination with a select statement.
I have found workarounds, but I would really like to understand what's going on here and why it doesn't give me my expected result. Can someone explain it to me?
http://sqlfiddle.com/#!6/7438a/1
On request, also the code here:
-- base table
create table bla (
[id] int identity(1,1) primary key,
[priority] int,
[msg] nvarchar(max),
[autofix] bit
)
-- table without primary key on id column
create table bla2 (
[id] int identity(1,1),
[priority] int,
[msg] nvarchar(max),
[autofix] bit
)
-- table with nvarchar(1000) instead of max
create table bla3 (
[id] int identity(1,1) primary key,
[priority] int,
[msg] nvarchar(1000),
[autofix] bit
)
-- fill the three tables with the same values
insert into bla ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
insert into bla2 ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
insert into bla3 ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
;
declare @a nvarchar(max) = ''
declare @b nvarchar(max) = ''
declare @c nvarchar(max) = ''
declare @d nvarchar(max) = ''
declare @e nvarchar(max) = ''
declare @f nvarchar(max) = ''
-- I expect this to work and generate 'AB', but it doesn't
select @a = @a + [msg]
from bla
where autofix = 0
order by [priority] asc
-- this DOES work: convert nvarchar(4000)
select @b = @b + convert(nvarchar(4000),[msg])
from bla
where autofix = 0
order by [priority] asc
-- this DOES work: without WHERE clause
select @c = @c + [msg]
from bla
--where autofix = 0
order by [priority] asc
-- this DOES work: without the order by
select @d = @d + [msg]
from bla
where autofix = 0
--order by [priority] asc
-- this DOES work: from bla2, so without the primary key on id
select @e = @e + [msg]
from bla2
where autofix = 0
order by [priority] asc
-- this DOES work: from bla3, so with msg nvarchar(1000) instead of nvarchar(max)
select @f = @f + [msg]
from bla3
where autofix = 0
order by [priority] asc
select @a as a, @b as b, @c as c, @d as d, @e as e, @f as f
TL;DR: This is not a documented/supported approach for concatenating strings across rows. It sometimes works but also sometimes fails, as it depends on which execution plan you get.
Instead, use one of the following guaranteed approaches:
SQL Server 2017+
SELECT @a = STRING_AGG([msg], '') WITHIN GROUP (ORDER BY [priority] ASC)
FROM bla
where autofix = 0
SQL Server 2005+
SELECT @a = (SELECT [msg] + ''
FROM bla
WHERE autofix = 0
ORDER BY [priority] ASC
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
Background
The KB article already linked by VanDerNorth does include the line
The correct behavior for an aggregate concatenation query is
undefined.
but then goes on to muddy the waters a bit by providing a workaround that does seem to indicate deterministic behavior is possible.
In order to achieve the expected results from an aggregate
concatenation query, apply any Transact-SQL function or expression to
the columns in the SELECT list rather than in the ORDER BY clause.
Your problematic query does not apply any expressions to columns in the ORDER BY clause.
The 2005 article Ordering guarantees in SQL Server... does state
For backwards compatibility reasons, SQL Server provides support for
assignments of type SELECT @p = @p + 1 ... ORDER BY at the top-most
scope.
In the plans where the concatenation works as you expected, the compute scalar with the expression [Expr1003] = Scalar Operator([@x]+[Expr1004]) appears above the sort.
In the plan where it fails to work, the compute scalar appears below the sort. As explained in this Connect item from 2006, when the expression @x = @x + [msg] appears below the sort it is evaluated for each row, but all the evaluations end up using the pre-assignment value of @x. In another similar Connect item from 2006, the response from Microsoft spoke of "fixing" the issue.
The Microsoft response on all the later Connect items on this issue (and there are many) states that this is simply not guaranteed.
Example 1
we do not make any guarantees on the correctness of concatenation
queries (like using variable assignments with data retrieval in a
specific order). The query output can change in SQL Server 2008
depending on the plan choice, data in the tables etc. You shouldn't
rely on this working consistently even though the syntax allows you to
write a SELECT statement that mixes ordered rows retrieval with
variable assignment.
Example 2
The behavior you are seeing is by design. Using assignment operations
(concatenation in this example) in queries with ORDER BY clause has
undefined behavior. This can change from release to release or even
within a particular server version due to changes in the query plan.
You cannot rely on this behavior even if there are workarounds. See
the below KB article for more details:
http://support.microsoft.com/kb/287515 The ONLY guaranteed
mechanism are the following:
Use cursor to loop through the rows in specific order and concatenate the values
Use for xml query with ORDER BY to generate the concatenated values
Use CLR aggregate (this will not work with ORDER BY clause)
Example 3
The behavior you are seeing is actually by design. This has to do with
SQL being a set-manipulation language. All expressions in the SELECT
list (and this includes assignments too) are not guaranteed to be
executed exactly once for each output row. In fact, SQL query
optimizer tries hard to execute them as few times as possible. This
will give expected results when you are computing the value of the
variable based on some data in the tables, but when the value that you
are assigning depends on the previous value of the same variable, the
results may be quite unexpected. If the query optimizer moves the
expression to a different place in the query tree, it may get
evaluated less times (or just once, as in one of your examples). This
is why we don't recommend using the "iteration" type assignments to
compute aggregate values. We find that XML-based workarounds ... usually work well for the
customers
Example 4
Even without ORDER BY, we do not guarantee that @var = @var +
will produce the concatenated value for any statement
that affects multiple rows. The right-hand side of the expression can
be evaluated either once or multiple times during query execution and
the behavior as I said is plan dependent.
Example 5
The variable assignment with SELECT statement is a proprietary syntax
(T-SQL only) where the behavior is undefined or plan dependent if
multiple rows are produced. If you need to do the string concatenation
then use a SQLCLR aggregate or FOR XML query based concatenation or
other relational methods.
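The Microsoft responses quoted above list a cursor as one of the guaranteed mechanisms, but no example of it is given. A minimal sketch against the bla table from the question could look like this; the names @result, @msg, and msg_cursor are chosen here for illustration, and the inline DECLARE initializer assumes SQL Server 2008 or later.

```sql
DECLARE @result nvarchar(max) = N'';
DECLARE @msg nvarchar(max);

-- The cursor's SELECT carries the ORDER BY, so rows arrive in priority order.
DECLARE msg_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT [msg]
    FROM bla
    WHERE autofix = 0
    ORDER BY [priority] ASC;

OPEN msg_cursor;
FETCH NEXT FROM msg_cursor INTO @msg;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One plain assignment per row: nothing here depends on the query plan.
    SET @result = @result + ISNULL(@msg, N'');
    FETCH NEXT FROM msg_cursor INTO @msg;
END;

CLOSE msg_cursor;
DEALLOCATE msg_cursor;

SELECT @result AS result;
```

With the two sample rows from the question this should produce 'AB'. It is slower than STRING_AGG or FOR XML PATH on large sets, so it is mainly useful where neither of those is an option.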
Seems a bit like this post: VARCHAR(MAX) acting weird when concatenating string
The conclusion there:
This approach to string concatenation does usually work but it isn't guaranteed.
The official line in the KB article for a similar issue is that "The correct behavior for an aggregate concatenation query is undefined."
I have a UDF that selects the top 6 objects from a table (with a union; code below) and inserts them into another table. (By the way, this is SQL Server 2005.)
The UDF is pasted below. What the code does is:
select objects for a specific city and add a level to those (from table Europe)
union that selection with a selection from the same table for objects that are from the same country, and add a level to those
From the union, a selection is made to get the top 6 objects, ordered by level, so the objects from the same city come first, and if there aren't any available, objects from the same country are returned instead.
My problem is that I want to make the selection random, to get random objects from table Europe, but because I insert the result of my selection into a table, I can't use order by newid() or the rand() function: they are time-dependent, so I get the following errors:
Invalid use of side-effecting or time-dependent operator in 'newid' within a function.
Invalid use of side-effecting or time-dependent operator in 'rand' within a function.
UDF:
ALTER FUNCTION [dbo].[Objects] (@id uniqueidentifier)
RETURNS @objects TABLE
(
    ObjectId uniqueidentifier NOT NULL,
    InternalId uniqueidentifier NOT NULL
)
AS
BEGIN
    declare @city varchar(50)
    declare @country int
    select @city = city,
           @country = country
    from Europe
    where internalId = @id
    insert @objects
    select @id, internalId from
    (
        select distinct top 6 [level], internalId from
        (
            select top 6 1 as [level], internalId
            from Europe N4
            where N4.city = @city
            and N4.internalId != @id
            union select top 6 2 as [level], internalId
            from Europe N5
            where N5.countryId = @country
            and N5.internalId != @id
        ) as selection_1
        order by [level]
    ) as selection_2
    return
END
If you have fresh ideas, please share them with me.
(Just please, don't suggest ordering by newid() or adding a column rand() seeded with a DateTime (by milliseconds or something), because that won't work.)
Perhaps you could take advantage of the GUIDs by adding a position parameter to your inputs, passing in a randomly generated value, and then ordering by Substring(internalID, @Random, 1).
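As a rough sketch of that suggestion (not tested against the original schema): the random position is generated by the caller, where NEWID() is allowed, and handed to the function as an extra parameter. The variable name @Random follows the suggestion above; the explicit conversion to char(36) is an assumption made here so that SUBSTRING receives a character value.

```sql
-- Caller side, outside the function.
DECLARE @Random int;
SET @Random = ABS(CHECKSUM(NEWID()) % 36) + 1; -- a position from 1 to 36

-- Inside the function, @Random would be a parameter rather than a local variable.
SELECT TOP 6 internalId
FROM Europe
ORDER BY SUBSTRING(CONVERT(char(36), internalId), @Random, 1);
```

Positions 9, 14, 19, and 24 of a GUID's string form are hyphens, so those values give every row the same sort key and no shuffling at all. Even at other positions there are only 16 possible sort keys, so this mixes the rows rather than producing a uniform random order.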
I found a good solution myself and I thought it might be handy to share it :)
DECLARE @seed1 int
DECLARE @seed2 int
SET @seed1 = DATEPART(SECOND, GETDATE()) + 1 -- + 1 keeps the divisor non-zero when the second is 0
SET @seed2 = DATEPART(MILLISECOND, GETDATE())
SELECT TOP 10 [Column1], [Column2]
FROM [TABLE]
ORDER BY ROW_NUMBER() OVER (ORDER BY [KeyColumn]) * @seed2 % @seed1
I think it's simple enough and it's quite handy.
What version of the database server are you using?
In SQL Server 2005 you can use rand with getdate as the seed, and the function becomes nondeterministic.
In earlier versions you can't have nondeterministic functions, and you would have to use a stored procedure instead.