Today I ran into a really weird problem in SQL Server (both 2008 R2 and 2012). I'm trying to build up a string by concatenating values inside a SELECT statement.
I have found workarounds, but I would really like to understand what's going on here and why it doesn't give me my expected result. Can someone explain it to me?
http://sqlfiddle.com/#!6/7438a/1
On request, also the code here:
-- base table
create table bla (
[id] int identity(1,1) primary key,
[priority] int,
[msg] nvarchar(max),
[autofix] bit
)
-- table without primary key on id column
create table bla2 (
[id] int identity(1,1),
[priority] int,
[msg] nvarchar(max),
[autofix] bit
)
-- table with nvarchar(1000) instead of max
create table bla3 (
[id] int identity(1,1) primary key,
[priority] int,
[msg] nvarchar(1000),
[autofix] bit
)
-- fill the three tables with the same values
insert into bla ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
insert into bla2 ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
insert into bla3 ([priority], [msg], [autofix])
values (1, 'A', 0),
(2, 'B', 0)
;
declare @a nvarchar(max) = ''
declare @b nvarchar(max) = ''
declare @c nvarchar(max) = ''
declare @d nvarchar(max) = ''
declare @e nvarchar(max) = ''
declare @f nvarchar(max) = ''
-- I expect this to work and generate 'AB', but it doesn't
select @a = @a + [msg]
from bla
where autofix = 0
order by [priority] asc
-- this DOES work: convert nvarchar(4000)
select @b = @b + convert(nvarchar(4000),[msg])
from bla
where autofix = 0
order by [priority] asc
-- this DOES work: without WHERE clause
select @c = @c + [msg]
from bla
--where autofix = 0
order by [priority] asc
-- this DOES work: without the order by
select @d = @d + [msg]
from bla
where autofix = 0
--order by [priority] asc
-- this DOES work: from bla2, so without the primary key on id
select @e = @e + [msg]
from bla2
where autofix = 0
order by [priority] asc
-- this DOES work: from bla3, so with msg nvarchar(1000) instead of nvarchar(max)
select @f = @f + [msg]
from bla3
where autofix = 0
order by [priority] asc
select @a as a, @b as b, @c as c, @d as d, @e as e, @f as f
TL;DR: This is not a documented or supported way to concatenate strings across rows. It sometimes works and sometimes fails, depending on which execution plan you get.
Instead, use one of the following guaranteed approaches:
SQL Server 2017+
SELECT @a = STRING_AGG([msg], '') WITHIN GROUP (ORDER BY [priority] ASC)
FROM bla
where autofix = 0
SQL Server 2005+
SELECT @a = (SELECT [msg] + ''
FROM bla
WHERE autofix = 0
ORDER BY [priority] ASC
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
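A self-contained sketch of the FOR XML PATH approach, assuming a hypothetical table variable @msgs standing in for bla and a comma separator that is not part of the question. The TYPE directive plus .value() is what stops characters such as & and < from coming back as &amp; and &lt;; STUFF removes the leading separator.

```sql
-- Hypothetical stand-in for the bla table from the question
DECLARE @msgs TABLE ([priority] int, [msg] nvarchar(max), [autofix] bit);
INSERT INTO @msgs ([priority], [msg], [autofix])
VALUES (2, N'B&C', 0),
       (1, N'A<', 0),
       (3, N'skipped', 1);

DECLARE @result nvarchar(max);

-- Each row contributes ',' + msg; FOR XML PATH('') joins the rows in ORDER BY order,
-- TYPE returns xml so .value() un-entitizes & and <,
-- and STUFF deletes the first (leading) comma.
SET @result = STUFF(
    (SELECT N',' + [msg]
     FROM @msgs
     WHERE [autofix] = 0
     ORDER BY [priority] ASC
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
    1, 1, N'');

SELECT @result AS concatenated; -- A<,B&C
```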
Background
The KB article already linked by VanDerNorth does include the line
The correct behavior for an aggregate concatenation query is
undefined.
but then goes on to muddy the waters a bit by providing a workaround that does seem to indicate deterministic behavior is possible.
In order to achieve the expected results from an aggregate
concatenation query, apply any Transact-SQL function or expression to
the columns in the SELECT list rather than in the ORDER BY clause.
Your problematic query does not apply any expressions to columns in the ORDER BY clause.
The 2005 article Ordering guarantees in SQL Server... does state
For backwards compatibility reasons, SQL Server provides support for
assignments of type SELECT @p = @p + 1 ... ORDER BY at the top-most
scope.
In the plans where the concatenation works as you expected, the compute scalar with the expression [Expr1003] = Scalar Operator([@x]+[Expr1004]) appears above the sort.
In the plan where it fails, the compute scalar appears below the sort. As explained in this Connect item from 2006, when the expression @x = @x + [msg] appears below the sort it is evaluated for each row, but every evaluation uses the pre-assignment value of @x. In another similar Connect item from 2006, the response from Microsoft spoke of "fixing" the issue.
The Microsoft responses on all the later Connect items on this issue (and there are many) state that this is simply not guaranteed:
Example 1
we do not make any guarantees on the correctness of concatenation
queries (like using variable assignments with data retrieval in a
specific order). The query output can change in SQL Server 2008
depending on the plan choice, data in the tables etc. You shouldn't
rely on this working consistently even though the syntax allows you to
write a SELECT statement that mixes ordered rows retrieval with
variable assignment.
Example 2
The behavior you are seeing is by design. Using assignment operations
(concatenation in this example) in queries with ORDER BY clause has
undefined behavior. This can change from release to release or even
within a particular server version due to changes in the query plan.
You cannot rely on this behavior even if there are workarounds. See
the below KB article for more details:
http://support.microsoft.com/kb/287515 The ONLY guaranteed
mechanism are the following:
Use cursor to loop through the rows in specific order and concatenate the values
Use for xml query with ORDER BY to generate the concatenated values
Use CLR aggregate (this will not work with ORDER BY clause)
Example 3
The behavior you are seeing is actually by design. This has to do with
SQL being a set-manipulation language. All expressions in the SELECT
list (and this includes assignments too) are not guaranteed to be
executed exactly once for each output row. In fact, SQL query
optimizer tries hard to execute them as few times as possible. This
will give expected results when you are computing the value of the
variable based on some data in the tables, but when the value that you
are assigning depends on the previous value of the same variable, the
results may be quite unexpected. If the query optimizer moves the
expression to a different place in the query tree, it may get
evaluated less times (or just once, as in one of your examples). This
is why we don't recommend using the "iteration" type assignments to
compute aggregate values. We find that XML-based workarounds ... usually work well for the
customers
Example 4
Even without ORDER BY, we do not guarantee that @var = @var +
will produce the concatenated value for any statement
that affects multiple rows. The right-hand side of the expression can
be evaluated either once or multiple times during query execution and
the behavior as I said is plan dependent.
Example 5
The variable assignment with SELECT statement is a proprietary syntax
(T-SQL only) where the behavior is undefined or plan dependent if
multiple rows are produced. If you need to do the string concatenation
then use a SQLCLR aggregate or FOR XML query based concatenation or
other relational methods.
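Example 2 lists a cursor as one of the guaranteed mechanisms. A minimal sketch of that approach, assuming the same kind of hypothetical table variable @msgs in place of bla: the FETCH loop handles one row at a time in the cursor's ORDER BY order, so each assignment sees the value produced by the previous one.

```sql
-- Hypothetical stand-in for the bla table from the question
DECLARE @msgs TABLE ([priority] int, [msg] nvarchar(max), [autofix] bit);
INSERT INTO @msgs ([priority], [msg], [autofix])
VALUES (2, N'B', 0), (1, N'A', 0), (3, N'X', 1);

DECLARE @result nvarchar(max) = N'';
DECLARE @msg nvarchar(max);

-- LOCAL FAST_FORWARD: read-only, forward-only cursor scoped to this batch
DECLARE msg_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT [msg]
    FROM @msgs
    WHERE [autofix] = 0
    ORDER BY [priority] ASC;

OPEN msg_cursor;
FETCH NEXT FROM msg_cursor INTO @msg;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Runs exactly once per fetched row, in cursor order
    SET @result = @result + @msg;
    FETCH NEXT FROM msg_cursor INTO @msg;
END;
CLOSE msg_cursor;
DEALLOCATE msg_cursor;

SELECT @result AS concatenated; -- AB
```

This is row-by-row and therefore slower than STRING_AGG or FOR XML on large inputs; its advantage is that the ordering guarantee comes from the cursor itself.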
Seems a bit like this post: VARCHAR(MAX) acting weird when concatenating string
The conclusion there:
This approach to string concatenation does usually work but it isn't guaranteed.
The official line in the KB article for a similar issue is that "The correct behavior for an aggregate concatenation query is undefined."
Related
Take the following table as an example:
CREATE TABLE TBL_Names(Name VARCHAR(32))
INSERT INTO TBL_Names
VALUES ('Ken'),('1965'),('Karen'),('2541')
Executing the following query throws an exception:
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM dbo.TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = 2541
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting
the varchar value 'Ken' to data type int.
While the following query executes without error:
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE ISNUMERIC([name]) = 1
AND [name] = 2541
I know this comes down to the SQL Server query optimizer's decisions, but I am wondering whether there is any way to make SQL Server evaluate clauses in a certain order. That way, in the first query, the first clause would filter out the names that are not numeric, so the second clause would not fail when converting to a number.
Update: As you may have noticed, the query above is just an example to illustrate the problem. I know the risks of that implicit conversion and appreciate those who warned me about it. However, my main question is how to make the optimizer evaluate clauses in a certain order.
There is no "direct" way of telling the engine to perform operations in order. SQL isn't an imperative language where you have complete control over how things are done; you simply state what you need and the server decides how to do it.
For this particular case, as long as you have [name] = 2541, you risk a conversion failure, because you are comparing a VARCHAR column against an INT. Even if you use a subquery or CTE, the optimizer is still free to evaluate this expression first and try to convert every varchar value to int (and fail).
You can avoid this with workarounds:
Compare matching data types:
[name] = '2541'
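A self-contained check of this first workaround, using a hypothetical table variable @names as a stand-in for dbo.TBL_Names with the question's sample rows. Because both sides of the comparison are varchar, no row has to be converted to int, so the order in which the predicates are evaluated no longer matters.

```sql
-- Hypothetical stand-in for dbo.TBL_Names with the question's sample data
DECLARE @names TABLE ([name] varchar(32));
INSERT INTO @names ([name])
VALUES ('Ken'), ('1965'), ('Karen'), ('2541');

DECLARE @matches int;

-- varchar = varchar: no implicit conversion to int, so 'Ken' cannot raise error 245
SELECT @matches = COUNT(*)
FROM @names AS tn
WHERE [name] IN (SELECT [name] FROM @names WHERE ISNUMERIC([name]) = 1)
  AND [name] = '2541';

SELECT @matches AS matching_rows; -- 1
```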
Cast [name] to INT beforehand, only where that is possible, and do the comparison in a separate statement:
DECLARE @tblNamesInt TABLE (nameInt INT)
INSERT INTO @tblNamesInt (
nameInt)
SELECT
[nameInt] = TRY_CONVERT(INT, [name]) -- TRY_ variant, so even an early evaluation cannot raise an error
FROM
dbo.TBL_Names
WHERE
TRY_CAST([name] AS INT) IS NOT NULL -- TRY_CAST is better than ISNUMERIC for INT
SELECT
*
FROM
@tblNamesInt AS T
WHERE
T.nameInt = 2541 -- data types match
Even hints only constrain the specific choices they name (an index hint, for example, dictates which index is used); none of them lets you dictate the order in which predicates are evaluated, so we have little control over how the optimizer gets things done.
There are a few mechanics that we know are evaluated in order and that we can use to our advantage: HAVING expressions are always computed after the values are grouped, and grouping always happens after the WHERE conditions. So we can "safely" do the following grouping:
DECLARE @Table TABLE (IntsAsVarchar VARCHAR(100))
INSERT INTO @Table (IntsAsVarchar)
VALUES
('1'),
('2'),
('20'),
('25'),
('30'),
('A') -- Not an INT!
SELECT
CASE WHEN T.IntsAsVarchar < 15 THEN 15 ELSE 30 END,
COUNT(*)
FROM
@Table AS T
WHERE
TRY_CAST(T.IntsAsVarchar AS INT) IS NOT NULL -- Will filter out non-INT values first
GROUP BY
CASE WHEN T.IntsAsVarchar < 15 THEN 15 ELSE 30 END
But you should always avoid writing code that relies on implicit conversions (like T.IntsAsVarchar < 15).
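A sketch of the same grouping with the implicit conversion removed, assuming SQL Server 2012 or later for TRY_CAST. Converting explicitly inside the CASE means a non-INT value yields NULL instead of an error, wherever the optimizer decides to evaluate the expression. The @Buckets table variable is hypothetical and only exists so the grouped output can be checked.

```sql
DECLARE @Table TABLE (IntsAsVarchar VARCHAR(100));
INSERT INTO @Table (IntsAsVarchar)
VALUES ('1'), ('2'), ('20'), ('25'), ('30'), ('A'); -- 'A' is not an INT

-- Hypothetical holder for the grouped result
DECLARE @Buckets TABLE (Bucket int, Cnt int);

-- Every comparison is int < int; TRY_CAST returns NULL rather than failing on 'A'
INSERT INTO @Buckets (Bucket, Cnt)
SELECT
    CASE WHEN TRY_CAST(T.IntsAsVarchar AS INT) < 15 THEN 15 ELSE 30 END,
    COUNT(*)
FROM @Table AS T
WHERE TRY_CAST(T.IntsAsVarchar AS INT) IS NOT NULL -- 'A' is filtered out
GROUP BY
    CASE WHEN TRY_CAST(T.IntsAsVarchar AS INT) < 15 THEN 15 ELSE 30 END;

SELECT Bucket, Cnt FROM @Buckets ORDER BY Bucket; -- (15, 2), (30, 3)
```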
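Try this:
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM dbo.TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = '2541'
Or, as a second option, change the last line to:
AND [name] = convert(varchar(32), 2541)
Since name is stored as varchar(32), compare it with a varchar value. When a varchar is compared with an int, data type precedence makes SQL Server convert the varchar side to int, and that conversion is what fails on values such as 'Ken'.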
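A small sketch of SQL Server's data type precedence rule as it applies to this question, using a hypothetical variable @name that holds one of the sample values. Because int has higher precedence than varchar, comparing with an int literal converts 'Ken' to int and raises error 245, while a varchar literal compares without any conversion.

```sql
DECLARE @name varchar(32) = 'Ken';
DECLARE @errorNumber int = 0;
DECLARE @stringMatch bit;

-- varchar vs varchar: plain string comparison, no conversion
SET @stringMatch = CASE WHEN @name = '2541' THEN 1 ELSE 0 END;

-- varchar vs int: @name is converted to int (higher precedence), which fails for 'Ken'
BEGIN TRY
    IF @name = 2541
        PRINT 'matched';
END TRY
BEGIN CATCH
    SET @errorNumber = ERROR_NUMBER(); -- 245: conversion failed
END CATCH;

SELECT @stringMatch AS string_match, @errorNumber AS int_comparison_error; -- 0, 245
```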
What about:
SELECT *
FROM dbo.TBL_Names AS tn
WHERE [name] = convert(varchar(32), 2541)
Why do you need ISNUMERIC([name]) = 1 when you only care about the value '2541'?
You can try this
SELECT [name]
FROM dbo.TBL_Names AS tn
WHERE [name] IN ( SELECT [name]
FROM dbo.TBL_Names
WHERE ISNUMERIC([name]) = 1 )
AND [name] = '2541'
You just need to change [name] = 2541 to [name] = '2541': the value in the WHERE condition is missing its single quotes.
You can find the live demo Here.
Honestly, I wouldn't apply an implicit cast to your column [name]; it'll make the query non-SARGable. Instead, convert your input value (or pass it as a string):
SELECT [name]
FROM dbo.TBL_Names tn
WHERE [name] = CONVERT(varchar(32),2541);
If you "must", however, wrap [name] (and suffer the performance degradation), then use TRY_CONVERT:
SELECT [name]
FROM dbo.TBL_Names tn
WHERE TRY_CONVERT(int,[name]) = 2541;
I have a table with this data in SQL Server:
Id
=====
1
12e
5
and I want to order this data like this:
id
====
1
5
12e
My id column is of type nvarchar(50) and I can't convert it to int.
Is it possible to sort the data this way?
As a general rule, if you ever find yourself manipulating parts of columns, you're almost certainly doing it wrong.
If your ID is made up of a numeric and alpha component and you need to fiddle with just the numeric bit, make it two columns and save yourself some angst. In that case, you have an integral id_numeric and a varchar id_alpha and your query is simply:
select cast(id_numeric as varchar(11)) + id_alpha as id
from mytable
order by id_numeric asc
Or, if you really must store that as a single column, create extra columns to hold the individual parts and use those for sorting and selection. But, to mitigate the problems of having duplicate data in a row, use triggers to ensure the data remains consistent:
select id
from mytable
order by id_numeric asc
You usually don't want to do this splitting on every select, since that never scales well. By doing it in an insert/update trigger, you only do the splitting when needed (i.e., when the data changes), and the cost is amortised across all the selects. That's a good idea because, in the vast majority of cases, databases are read far more often than they're written.
And it's perfectly normal practice to revert to lesser levels of normalisation for performance reasons, provided that you understand and mitigate the consequences.
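The trigger-maintained split described above might look like the following minimal sketch. The table dbo.mytable, its id_numeric column and the trigger name are hypothetical; the sketch assumes the numeric part is the leading run of digits in id (as in 12e) and SQL Server 2012 or later for TRY_CAST.

```sql
-- Hypothetical table: id is the stored value, id_numeric is the derived sort key
CREATE TABLE dbo.mytable (
    id nvarchar(50) NOT NULL PRIMARY KEY,
    id_numeric int NULL
);
GO

CREATE TRIGGER dbo.trg_mytable_split_id
ON dbo.mytable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute the leading digits only for the rows touched by this statement.
    -- Appending 'x' guarantees PATINDEX finds a non-digit, so an all-digit id works too.
    -- With the default RECURSIVE_TRIGGERS OFF, this UPDATE does not re-fire the trigger.
    UPDATE t
    SET id_numeric = TRY_CAST(LEFT(i.id, PATINDEX(N'%[^0-9]%', i.id + N'x') - 1) AS int)
    FROM dbo.mytable AS t
    JOIN inserted AS i ON i.id = t.id;
END;
GO

INSERT INTO dbo.mytable (id) VALUES (N'1'), (N'12e'), (N'5');

SELECT id FROM dbo.mytable ORDER BY id_numeric ASC; -- 1, 5, 12e
```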
I'd actually use something along the lines of this function, though be warned that it's not going to be super-speedy. I've modified that function to return only the numbers:
CREATE FUNCTION dbo.UDF_ParseNumericChars
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
WITH SCHEMABINDING
AS
BEGIN
-- position of the first character that is not a digit (0 if there is none)
DECLARE @IncorrectCharLoc SMALLINT
SET @IncorrectCharLoc = PATINDEX('%[^0-9]%', @string)
-- strip non-digit characters one at a time until only digits remain
WHILE @IncorrectCharLoc > 0
BEGIN
SET @string = STUFF(@string, @IncorrectCharLoc, 1, '')
SET @IncorrectCharLoc = PATINDEX('%[^0-9]%', @string)
END
RETURN @string
END
GO
Once you create that function, then you can do your sort like this:
SELECT YourMixedColumn
FROM YourTable
ORDER BY CONVERT(INT, dbo.UDF_ParseNumericChars(YourMixedColumn))
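If the numeric part is always a leading run of digits (an assumption that holds for the sample values 1, 5 and 12e), an inline expression can replace the scalar UDF. This is a different technique from the function above: it ignores any digits that appear after the first non-digit. The sketch assumes SQL Server 2012 or later for TRY_CAST and uses hypothetical table variables @YourTable and @sorted.

```sql
DECLARE @YourTable TABLE (YourMixedColumn nvarchar(50));
INSERT INTO @YourTable (YourMixedColumn) VALUES (N'1'), (N'12e'), (N'5');

-- IDENTITY values in INSERT ... SELECT ... ORDER BY follow the ORDER BY, so pos records the sort
DECLARE @sorted TABLE (pos int IDENTITY(1,1), val nvarchar(50));

-- Leading digits = everything before the first non-digit; appending 'x' ensures one exists.
-- The column itself breaks ties between values with the same numeric prefix (e.g. 12a, 12e).
INSERT INTO @sorted (val)
SELECT YourMixedColumn
FROM @YourTable
ORDER BY
    TRY_CAST(LEFT(YourMixedColumn, PATINDEX(N'%[^0-9]%', YourMixedColumn + N'x') - 1) AS int),
    YourMixedColumn;

SELECT val FROM @sorted ORDER BY pos; -- 1, 5, 12e
```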
It can be sorted with the LEN function (using the value itself as a tiebreaker for equal lengths):
create table #temp (id nvarchar(50) null)
select * from #temp order by LEN(id), id