Use ampersand in CAST in SQL - sql

The following code snippet on SQL Server 2005 fails on the ampersand '&':
select cast('<name>Spolsky & Atwood</name>' as xml)
Does anyone know a workaround?
Longer explanation: I need to update some data in an XML column, and I'm using a search-and-replace hack: casting the XML value to a varchar, doing the replace, and updating the XML column with the result of this cast.

select cast('<name>Spolsky & Atwood</name>' as xml)
A literal ampersand inside an XML element is not allowed by the XML standard, and any XML parser will fail to parse such a document.
An XmlSerializer will output the ampersand entity-encoded.
The following code:
using System.Xml.Serialization;
namespace xml
{
public class MyData
{
public string name = "Spolsky & Atwood";
}
class Program
{
static void Main(string[] args)
{
new XmlSerializer(typeof(MyData)).Serialize(System.Console.Out, new MyData());
}
}
}
will output the following:
<?xml version="1.0" encoding="utf-8"?>
<MyData
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<name>Spolsky &amp; Atwood</name>
</MyData>
, with an &amp; instead of &.

It's not valid XML. Use &amp;:
select cast('<name>Spolsky &amp; Atwood</name>' as xml)

You'd need to XML escape the text, too.
So let's backtrack and assume you're building that string as:
SELECT '<name>' + MyColumn + '</name>' FROM MyTable
you'd want to do something more like:
SELECT '<name>' + REPLACE( MyColumn, '&', '&amp;' ) + '</name>' FROM MyTable
Of course, you probably should cater for the other entities thus:
SELECT '<name>' + REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( MyColumn, '&', '&amp;' ), '''', '&apos;' ), '"', '&quot;' ), '<', '&lt;' ), '>', '&gt;' ) + '</name>' FROM MyTable
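The escaping rule is not specific to SQL Server, so it can be checked with any XML library. As a minimal sketch outside the database, the following Python (standard library only) applies the same five replacements and confirms that the result parses back to the original text. The helper name `wrap_name` is hypothetical and not part of the answers above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_name(raw: str) -> str:
    """Escape raw text and wrap it in a <name> element (hypothetical helper)."""
    # escape() replaces &, < and > by default; the two quote characters
    # are supplied as extra replacements.
    safe = escape(raw, {'"': "&quot;", "'": "&apos;"})
    return "<name>" + safe + "</name>"


doc = wrap_name("Spolsky & Atwood")
print(doc)                      # the ampersand is written as an entity
print(ET.fromstring(doc).text)  # parsing gives back the original text
```

The ampersand has to be replaced first, as in the nested REPLACE above; otherwise the ampersands introduced by the other entities would be escaped a second time.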

When working with XML in SQL you're a lot safer using built-in functions instead of converting it manually.
The following code will build a proper SQL XML variable that looks like your desired output based on a raw string:
DECLARE @ExampleString nvarchar(40)
, @ExampleXml xml
SELECT @ExampleString = N'Spolsky & Atwood'
SELECT @ExampleXml =
(
SELECT 'Spolsky & Atwood' AS 'name'
FOR XML PATH (''), TYPE
)
SELECT @ExampleString , @ExampleXml

As John and Quassnoi state, & on its own is not valid. This is because the ampersand character is the start of a character entity, used to specify characters that cannot be represented literally. There are two forms of entity: one specifies the character by name (e.g., &amp; or &quot;), and one specifies the character by its code (I believe it's the code position within the Unicode character set, but I'm not sure; e.g., &#34; should represent a double quote).
Thus, to include a literal & in an HTML document, you must specify its entity: &amp;. Other common ones you may encounter are &lt; for <, &gt; for >, and &quot; for ".
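To see that the named and numeric forms describe the same characters, here is a small sketch using Python's standard XML parser (an illustration outside SQL Server, not something the answers above used). The numeric codes are the decimal Unicode code points of the four characters.

```python
import xml.etree.ElementTree as ET

# The same four characters, written once with named entities
# and once with decimal character references.
named = ET.fromstring("<q>&quot;&amp;&lt;&gt;</q>").text
numeric = ET.fromstring("<q>&#34;&#38;&#60;&#62;</q>").text

print(named == numeric)             # both forms parse to the same text
print([ord(ch) for ch in numeric])  # the code points behind the references
```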

Related

Escape XML special characters upon convert

I have a working CSV splitter for my needs.
You can just grab and run it as is:
declare @t table(data varchar(max))
insert into @t select 'a,b,c,d'
insert into @t select 'e,,,h'
;with cte(xm) as
(
select convert(xml,'<f><e>' + replace(data,',', '</e><e>') + '</e></f>') as xm
from @t
)
select
xm.value('/f[1]/e[1]','varchar(32)'),
xm.value('/f[1]/e[2]','varchar(32)'),
xm.value('/f[1]/e[3]','varchar(32)'),
xm.value('/f[1]/e[4]','varchar(32)')
from cte
The only issue is that if I introduce an XML-sensitive character in the data, like &:
insert into @t select 'i,j,&,k'
It fails with error: character 24, illegal character
One solution is to replace the & character with &amp; on the fly, like this:
select convert(xml,'<f><e>' + replace(replace(data,'&','&amp;'),',', '</e><e>') + '</e></f>') as xm
but there are several dozen special XML characters which I need to escape upon convert, and I can't really nest dozens of replace(replace(replace(... functions in there. That's what I did, and it is messy.
How can the above code be modified to escape XML-sensitive characters and produce the same result?
Thanks!
You have got your answer from Martin Smith already, but I think it is worth placing an answer here for followers. I want to provide some explanation, and furthermore the rextester link might not be reachable in the future...
If you think of a string in a table like this ...
DECLARE @mockup TABLE(SomeXMLstring VARCHAR(100));
INSERT INTO @mockup VALUES('This is a string with forbidden characters like "<", ">" or "&"');
-- ... you can easily add XML-tags:
SELECT '<root>' + SomeXMLstring + '</root>'
FROM @mockup ;
--The result would look like XML
<root>This is a string with forbidden characters like "<", ">" or "&"</root>
--But it is not! You can test this, the CAST( AS XML) will fail:
SELECT CAST('<root>This is a string with forbidden characters like "<", ">" or "&"</root>' AS XML);
--Sometimes people try to do their own replaces and start to replace <, > and & with the corresponding entities &lt;, &gt; and &amp;. But this will need a lot of replacements in order to be safe.
--But XML is doing all this for us implicitly
SELECT SomeXMLstring
FROM @mockup
FOR XML PATH('')
--This is the result
<SomeXMLstring>This is a string with forbidden characters like "&lt;", "&gt;" or "&amp;"</SomeXMLstring>
--And the funny thing is: We can easily create a nameless element with AS [*]:
SELECT SomeXMLstring AS [*]
FROM @mockup
FOR XML PATH('')
--The result is the same, but without the tags:
This is a string with forbidden characters like "&lt;", "&gt;" or "&amp;"
--Although this is looking like XML in SSMS, this will be implicitly cast to NVARCHAR(MAX) when used as a string.
--You can use this for implicit escaping of a string wherever you feel the need to build an XML with string concatenation:
SELECT CAST('<root>' + (SELECT SomeXMLstring AS [*] FOR XML PATH('')) + '</root>' AS XML)
FROM @mockup ;
To finally answer your question
This line must use the trick:
select convert(xml,'<f><e>' + replace((SELECT data AS [*] FOR XML PATH('')),',', '</e><e>') + '</e></f>') as xm
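The order of operations in that line matters: the data is escaped first, and only then are the commas turned into element boundaries. A rough sketch of the same idea in Python (standard library only; `split_via_xml` is a hypothetical name, and this only mirrors the T-SQL logic rather than reproducing SQL Server's behaviour exactly):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def split_via_xml(data: str) -> list:
    """Split a comma-separated string by building and parsing XML."""
    # Escape &, < and > first, then turn commas into element boundaries.
    xml_text = "<f><e>" + escape(data).replace(",", "</e><e>") + "</e></f>"
    # An empty element has text None; report it as an empty string.
    return [e.text or "" for e in ET.fromstring(xml_text).findall("e")]


print(split_via_xml("i,j,&,k"))
print(split_via_xml("e,,,h"))
```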

SQL Server - contains an invalid XML identifier as required by FOR XML;

I'm running this query and getting below mentioned error. Can anyone help?
Column name 'Middle Name' contains an invalid XML identifier as required by
FOR XML; ' '(0x0020) is the first character at fault.
SELECT
Username as [LastName],
'' AS [Middle Name],
'' AS Birthdate,
'' AS [SSN],
0 AS [Wage Amount]
FROM
Employee
FOR XML PATH
You can't have spaces in XML element or attribute names. Use
SELECT Username AS [LastName],
'' AS [MiddleName],
'' AS Birthdate,
'' AS [SSN],
0 AS [WageAmount]
FROM Employee
FOR XML PATH
For the simplest case, Smith's solution works all right.
Since I am constrained to keep characters such as space, #, ', /, etc. in my XML, I finally solved this by encoding the identifier using Base64. (Be careful: the length of the name cannot exceed 128 characters.) Then, outside SQL Server, where the XML is read as input data, a small piece of code translates the Base64 back to the original string.
CREATE FUNCTION [dbo].[ufn_string_To_BASE64]
(
@inputString VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN (
SELECT
CAST(N'' AS XML).value(
'xs:base64Binary(xs:hexBinary(sql:column("bin")))'
, 'VARCHAR(MAX)'
) Base64Encoding
FROM (
SELECT CAST(@inputString AS VARBINARY(MAX)) AS bin
) AS bin_sql_server_temp
)
END
GO
It's important to use VARCHAR, which gives a shorter Base64 code.
You could add char(10), char(13) in the identifier as well.
Dynamic SQL can help build a temporary table to store intermediate data.
In my case, C# decodes the Base64 to a string:
if (value.StartsWith("_"))
{
var base64Encoded = value.Substring(1).Replace('_','=');
try
{
var data = System.Convert.FromBase64String(base64Encoded);
value = Encoding.GetEncoding(1252).GetString(data);
}
catch (Exception e)
{
log.LogInformation(e.Message);
}
}
Be careful of the following:
An XML identifier cannot start with a digit, so prefix it with _ and remove the prefix in C#.
= is not accepted in an XML identifier. It should be replaced by something else, such as _.
When decoding to a string, use the right encoding codepage for the original string, such as 1252 for French characters.
In reality this would be more complex than what is described here.
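The naming scheme described above (a leading _, and = replaced by _) can be sketched as a round trip. This is an illustration in Python under stated assumptions, not the author's actual code: the function names are hypothetical, and code page 1252 is taken from the C# snippet.

```python
import base64


def encode_identifier(name: str, codepage: str = "cp1252") -> str:
    """Turn an arbitrary column name into a Base64-based identifier."""
    b64 = base64.b64encode(name.encode(codepage)).decode("ascii")
    # Prefix with _ so the name never starts with a digit,
    # and swap the = padding for _ because = is not allowed in names.
    return "_" + b64.replace("=", "_")


def decode_identifier(value: str, codepage: str = "cp1252") -> str:
    """Reverse encode_identifier, mirroring the C# decoding step."""
    b64 = value[1:].replace("_", "=")
    return base64.b64decode(b64).decode(codepage)


encoded = encode_identifier("Middle Name")
print(encoded)
print(decode_identifier(encoded))
```

Standard Base64 output can also contain + and /, which are not valid in XML names either; the sketch, like the snippets above, does not handle that case.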

SQL Server : xml string to rows

I'm trying to convert a string to rows using T-SQL. I've found some people using XML for this but I'm running into troubles.
The original record:
A newline-separated string of data
New In Progress Left Message On Hold Researching Researching (2nd Level) Researching (3rd Level) Resolved Positive False Positive Security Respond
Using the following statement converts this string into XML:
select
cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from
field
where
column_name = 'state' and table_name = 'sv_inquiry'
XML string:
<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>
Now I would like to convert every 'i' node into a separate row. I've constructed the query below, but I can't get it working in the way that it returns all the rows...
select x.i.value('i[1]', 'varchar(30)')
from (
select cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from field
where column_name='state' and table_name='sv_inquiry'
) x(i)
This will return
Unassigned
To be clear, when I change 'i[1]' into 'i[2]' it will return 'Assigned'. I've tried '.', but this returns the whole string in a single record...
How about using the nodes method on an XML datatype.
declare @xml xml
set @xml = '<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>'
select
t.c.value('.', 'nvarchar(100)') as [Word]
from
@xml.nodes('/i') as t(c)
You can split a string into rows without XML, see for example the fnSplitString function at SQL Server Central.
Here's an example using the nodes() function of the xml type. I'm using a space as the delimiter because SQL Fiddle doesn't play well with line feeds:
select node_column.value('.', 'varchar(max)')
from (
select cast('<i>' + replace(list_items, ' ', '</i><i>') +
'</i>' as xml) xml_value
from field
) f
cross apply
xml_value.nodes('/i') node_table(node_column);
Live example at SQL Fiddle.
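The idea behind nodes() is that each matching element becomes its own row, and the value is then read relative to that element. The same shape can be sketched outside SQL Server with Python's standard XML parser. This is only an analogy; the wrapper element `r` is an assumption needed because ElementTree requires a single root.

```python
import xml.etree.ElementTree as ET

fragment = "<i>Unassigned</i><i>Assigned</i><i>Transferred</i>"

# ElementTree needs a single root element, so the fragment is wrapped first.
root = ET.fromstring("<r>" + fragment + "</r>")

# One list entry per <i> element, like one row per node in the T-SQL query.
rows = [item.text for item in root.iter("i")]
for row in rows:
    print(row)
```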

XML Path expression to include Special Characters

I am trying this SQL to get the first name and last name from SQL Server 2008 tables using an XML PATH expression. The data contains special characters. When I run the SQL, I get the following error:
FOR XML could not serialize the data for node 'LastName' because it contains a character (0x001B) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive
How can I rewrite the SQL to include these characters in the xml ( maybe as CDATA?)
SELECT (
SELECT A1.FirstName
, A1.LastName
FROM dbo.kc_consumer AS A1
FOR XML PATH('Consumer') , TYPE)
AS ConsumerData
FOR XML PATH('Element'), ROOT('Elements')
I tested this with ASCII characters 0-255 and found out that you get this error for characters: 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008, 0x000B, 0x000C, 0x000E, 0x000F, 0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017, 0x0018, 0x0019, 0x001A, 0x001B, 0x001C, 0x001D, 0x001E, 0x001F.
One workaround is to remove , TYPE from your XML statement.
Another way is to remove those characters in the select statement:
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
< YOUR EXPRESSION TO BE CLEANED >
,char(0x0000),'') ,char(0x0001),'') ,char(0x0002),'') ,char(0x0003),'') ,char(0x0004),'')
,char(0x0005),'') ,char(0x0006),'') ,char(0x0007),'') ,char(0x0008),'') ,char(0x000B),'')
,char(0x000C),'') ,char(0x000E),'') ,char(0x000F),'') ,char(0x0010),'') ,char(0x0011),'')
,char(0x0012),'') ,char(0x0013),'') ,char(0x0014),'') ,char(0x0015),'') ,char(0x0016),'')
,char(0x0017),'') ,char(0x0018),'') ,char(0x0019),'') ,char(0x001A),'') ,char(0x001B),'')
,char(0x001C),'') ,char(0x001D),'') ,char(0x001E),'') ,char(0x001F),'')
You could also create a function with these replace statements.
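The characters listed above are every code point below 0x20 except tab (0x09), line feed (0x0A) and carriage return (0x0D). A compact way to express the same cleanup, sketched here in Python rather than T-SQL (the names `FORBIDDEN` and `strip_forbidden` are hypothetical):

```python
# Code points 0x00-0x1F except tab (0x09), line feed (0x0A)
# and carriage return (0x0D).
FORBIDDEN = [cp for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)]

# Mapping a code point to None makes str.translate() delete it.
STRIP_TABLE = {cp: None for cp in FORBIDDEN}


def strip_forbidden(text: str) -> str:
    """Remove the control characters that FOR XML refuses to serialize."""
    return text.translate(STRIP_TABLE)


print(len(FORBIDDEN))                      # same count as the REPLACE calls
print(repr(strip_forbidden("Smith\x1b")))  # the escape character is removed
```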
Pull the TYPE directive into the outer query. Using it bypasses the character escaping that SQL Server does in a normal FOR XML statement, but once your results are escaped (using FOR XML without TYPE), your results can be included in an XML TYPE directive statement. Edit: The original fiddle has died somehow. It's unstable. Instead, here's a block of code that works.
DECLARE @kc_consumer table (FirstName VARCHAR(20), LastName VARCHAR(20))
INSERT INTO @kc_consumer VALUES
('John','Smith' + NCHAR(27))
, ('Jane','123ú♂
2⌂¶2<PZdûá╚' + NCHAR(27))
SELECT
(
SELECT
(SELECT A1.FirstName + '' FOR XML PATH('')) FirstName
, (SELECT A1.LastName + '' FOR XML PATH('')) LastName
FROM @kc_consumer AS A1
FOR XML PATH('Consumer'), TYPE
)
FOR XML PATH('Element'), ROOT('Elements'), TYPE;

Why does SQL display an & as &amp;? [duplicate]

Possible Duplicate:
help with FOR XML PATH('') escaping “special” characters
I need some assistance, my query is below:
STUFF(
(
SELECT ',' + CountDesc
FROM Count INNER JOIN ProjectCount ON Count.Id = ProjectCount.CountId
WHERE ProjectCount.ProjectId = Project.Id ORDER BY Count.CountDesc
FOR XML PATH('')
), 1, 1, '') as [Country]
What happens is that when I run this query and the Count table has an & in one of its fields, it displays the & as &amp;.
Is there any way to prevent this?
Thanks in advance.
It is happening because the strings being combined in the XML statement contain XML-specific characters. In addition to &, this also affects < and >, and probably other characters.
I usually fix this by doing a replace after the call:
select @str = replace(@str, '&amp;', '&')
And nesting the replaces for additional characters.
Per Section 2.4 of the XML spec, & must be escaped except for in a few special cases (e.g. within a comment or CDATA section). If the & wasn't displayed as &amp;, the XML would be invalid.