XML Path expression to include Special Characters

I am trying to get the first name and last name from SQL Server 2008 tables using a FOR XML PATH expression. The data contains special characters. When I run the query, I get the following error:
FOR XML could not serialize the data for node 'LastName' because it contains a character (0x001B) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive
How can I rewrite the SQL to include these characters in the XML (maybe as CDATA)?
SELECT (
SELECT A1.FirstName
, A1.LastName
FROM dbo.kc_consumer AS A1
FOR XML PATH('Consumer') , TYPE)
AS ConsumerData
FOR XML PATH('Element'), ROOT('Elements')

I tested this with ASCII characters 0-255 and found out that you get this error for characters: 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008, 0x000B, 0x000C, 0x000E, 0x000F, 0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017, 0x0018, 0x0019, 0x001A, 0x001B, 0x001C, 0x001D, 0x001E, 0x001F.
One workaround is to remove , TYPE from your XML statement.
Another way is to remove those characters in the select statement:
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
< YOUR EXPRESSION TO BE CLEANED >
,char(0x0000),'') ,char(0x0001),'') ,char(0x0002),'') ,char(0x0003),'') ,char(0x0004),'')
,char(0x0005),'') ,char(0x0006),'') ,char(0x0007),'') ,char(0x0008),'') ,char(0x000B),'')
,char(0x000C),'') ,char(0x000E),'') ,char(0x000F),'') ,char(0x0010),'') ,char(0x0011),'')
,char(0x0012),'') ,char(0x0013),'') ,char(0x0014),'') ,char(0x0015),'') ,char(0x0016),'')
,char(0x0017),'') ,char(0x0018),'') ,char(0x0019),'') ,char(0x001A),'') ,char(0x001B),'')
,char(0x001C),'') ,char(0x001D),'') ,char(0x001E),'') ,char(0x001F),'')
You could also create a function with these replace statements.
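As a cross-check outside SQL Server, the list above matches the control characters that XML 1.0 rejects: every code point below 0x20 except tab, line feed and carriage return. The following Python sketch rebuilds that list and strips those characters from a string; the helper name `strip_xml_illegal_controls` is hypothetical and is not part of the answer.

```python
# Code points below 0x20 that XML 1.0 does not allow: everything except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
XML_ILLEGAL_CONTROLS = [cp for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)]

# Translation table that deletes those characters.
_STRIP_TABLE = {cp: None for cp in XML_ILLEGAL_CONTROLS}


def strip_xml_illegal_controls(text: str) -> str:
    """Hypothetical helper: drop the control characters XML 1.0 rejects."""
    return text.translate(_STRIP_TABLE)


if __name__ == "__main__":
    print([hex(cp) for cp in XML_ILLEGAL_CONTROLS])
    print(repr(strip_xml_illegal_controls("Smith\x1b")))
```

This does the same job as the nested REPLACE calls: the value is cleaned before it reaches the XML serializer, at the cost of silently losing the removed characters.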

Pull the TYPE directive into the outer query. TYPE bypasses the character escaping that SQL Server does in a normal FOR XML statement, but once your results are escaped (using FOR XML without TYPE), they can be included in a statement that uses the TYPE directive. Here's a block of code that works.
-- Table variable standing in for dbo.kc_consumer
DECLARE @kc_consumer table (FirstName VARCHAR(20), LastName VARCHAR(20))
INSERT INTO @kc_consumer VALUES
('John','Smith' + NCHAR(27))
, ('Jane','123ú♂
2⌂¶2<PZdûá╚' + NCHAR(27))
SELECT
(
SELECT
-- Inner FOR XML PATH('') without TYPE escapes each value as text
(SELECT A1.FirstName + '' FOR XML PATH('')) FirstName
, (SELECT A1.LastName + '' FOR XML PATH('')) LastName
FROM @kc_consumer AS A1
FOR XML PATH('Consumer'), TYPE
)
FOR XML PATH('Element'), ROOT('Elements'), TYPE;

Related

Function to replace all non alpha-numeric and multiple whitespace characters with a single space

I am trying to write an efficient function to use in a calculated field which has the following characteristics
Replace all non alpha numeric characters with space
Replace multiple white spaces with a space
Trim and lower the results
Example input
A B##%$$C &^%D
Example output
a b c d
A normal regex pattern would match like so
[\W_]+
The following works; however, I am not sure if there is a more efficient approach than using two loops (at least O(n^2) complexity) with PATINDEX and STUFF, then CHARINDEX and REPLACE.
Create Function [dbo].[Clean](@Temp nvarchar(1000))
Returns nvarchar(1000)
AS
Begin
Declare @Pattern as varchar(50) = '%[^a-z0-9 ]%'
-- Replace each non-alphanumeric character with a space
While PatIndex(@Pattern, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@Pattern, @Temp), 1, ' ')
-- Collapse runs of spaces: keep replacing two spaces with one
while charindex('  ', @Temp) > 0
set @Temp = replace(@Temp, '  ', ' ')
Return LOWER(TRIM(@Temp))
End
Usage
Select dbo.Clean(' A B##%$$C &^%D ')
Result
a b c d
Is there potentially a single pass approach I can use, or a sneaky method I am not aware of?
I'm not able to test the performance, but the following approach (without loops and based on some string manipulations) is an additional option.
Note, that you'll need at least SQL Server 2017 (for the TRANSLATE() call).
-- Input text and patterns
DECLARE @text varchar(1000) = ' A B##%$$C &^%D'
DECLARE @alphanumericpattern varchar(36) = 'abcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @notalphanumericpattern varchar(1000)
-- Trim and lower the input text
SELECT @text = RTRIM(LTRIM(LOWER(@text)))
-- Get not alpha-numeric characters
SELECT @notalphanumericpattern =
REPLACE(
TRANSLATE(@text, @alphanumericpattern, REPLICATE('a', LEN(@alphanumericpattern))),
'a',
''
)
-- Replace all not alpha-numeric characters with a space
SELECT @text =
REPLACE(
TRANSLATE(@text, @notalphanumericpattern, REPLICATE('$', LEN(@notalphanumericpattern))),
'$',
' '
)
-- Replace multiple spaces with a single space
SELECT @text =
REPLACE(
REPLACE(
REPLACE(
@text,
' ',
'<>'
),
'><',
''
),
'<>',
' '
)
Result:
a b c d
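The space-collapsing step in this answer is easy to check outside SQL Server. The Python sketch below mirrors the three nested REPLACE calls (`' '` to `'<>'`, then `'><'` to nothing, then `'<>'` back to `' '`) and, for comparison, applies the `[\W_]+` regex from the question. The function names are hypothetical, and the sketch assumes the text no longer contains `<` or `>` when the spaces are collapsed.

```python
import re


def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces with three plain replacements, no loop.

    Assumes the text contains no '<' or '>' at this point.
    """
    return text.replace(" ", "<>").replace("><", "").replace("<>", " ")


def clean(text: str) -> str:
    """Regex version of the desired result: non-alphanumerics to one space."""
    # re.ASCII keeps \W limited to characters outside [a-zA-Z0-9_]
    return re.sub(r"[\W_]+", " ", text, flags=re.ASCII).strip().lower()


if __name__ == "__main__":
    print(repr(collapse_spaces("a   b  c d")))
    print(repr(clean(" A B##%$$C &^%D ")))
```

Each run of n spaces becomes n copies of `<>`; removing every `><` leaves a single `<>`, which turns back into one space.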

Replace function in SQL Server

I have a string of data
'["Dog",,,1,"Person","2020-03-17",,4,"Todd]'
I am trying to use the replace function to replace double commas with NULL values
Solution
'["Dog",NULL,NULL,1,"Person","2020-03-17",NULL,4,"Todd]'
But I keep ending up with
'"Dog",NULL,,1,"Person","2020-03-17",NULL,4,"Todd'
(The ,,, needs to become ,NULL,NULL, but only becomes ,NULL,,)
Here is my sample code I'm using
REPLACE(FileData, ',,' , ',NULL,')
WHERE FileData LIKE '%,,%'
If you do the same replacement twice, any number of sequential commas will get handled.
REPLACE(REPLACE(FileData, ',,' , ',NULL,'), ',,' , ',NULL,')
The first REPLACE deals with all the odd positions...
',,,,,,,,' => ',NULL,,NULL,,NULL,,NULL,'
Doing it again will deal with all of the remaining positions.
=> ',NULL,NULL,NULL,NULL,NULL,NULL,NULL,'
Note, by specifically handling a special case of three consecutive commas (as in another answer here) you won't handle four or five or six, etc. The above solution generalises to any length of consecutive commas.
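The two-pass behaviour can be reproduced outside SQL Server, because Python's `str.replace` also scans left to right without re-reading text it has already replaced. This is only a sketch with a hypothetical function name, not part of the answer:

```python
def fill_empty_fields(data: str) -> str:
    """Apply the same ',,' -> ',NULL,' replacement twice."""
    first_pass = data.replace(",,", ",NULL,")   # handles every other gap
    return first_pass.replace(",,", ",NULL,")   # handles the gaps left over


if __name__ == "__main__":
    print(",,,,,,,,".replace(",,", ",NULL,"))
    print(fill_empty_fields(",,,,,,,,"))
    print(fill_empty_fields('["Dog",,,1,"Person","2020-03-17",,4,"Todd]'))
```

After the first pass, no run of more than two commas can remain, which is why a second pass is always enough.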
To be fully robust, you may also need to consider when there is a missing NULL at the first or last place in the string.
[,ThatOneToMyLeft,and,ThatOneToMyRight,]
A laborious but robust approach could be to replace [, and ,] with [,, and ,,] respectively, then do the double-replacement, then undo the first steps...
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
FileData,
'[,',
'[,,'
),
',]',
',,]'
),
',,',
',NULL,'
),
',,',
',NULL,'
),
',]',
']'
),
'[,',
'['
)
There are ways to make even that less verbose, but I have to run right now :)
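The bracket handling can be traced step by step with the same chain of replacements in Python. This is a sketch of the approach described above, with a hypothetical function name; it assumes the string starts with `[` and ends with `]`.

```python
def fill_empty_fields_with_edges(data: str) -> str:
    """Fill empty fields with NULL, including first and last positions."""
    # Pad the edges so an empty first or last field looks like ',,'
    padded = data.replace("[,", "[,,").replace(",]", ",,]")
    # Same replacement twice covers any run of consecutive commas
    filled = padded.replace(",,", ",NULL,").replace(",,", ",NULL,")
    # Undo the padding: drop the extra comma next to each bracket
    return filled.replace(",]", "]").replace("[,", "[")


if __name__ == "__main__":
    print(fill_empty_fields_with_edges("[,ThatOneToMyLeft,and,ThatOneToMyRight,]"))
```

Strings without an empty first or last field pass through the padding and unpadding steps unchanged.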
You can try the following:
REPLACE(REPLACE(FileData, ',,,' , ',NULL,,'), ',,' , ',NULL,')
Where FileData LIKE '%,,%'
You can also wrap the replacement logic in a user-defined function built on STRING_SPLIT and CONCAT_WS (SQL Server 2017 or later).
Check this:
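-- Usage
update table1
set column1 = dbo.ReplaceEx(column1, ',', 'NULL')
where column1 like '%,,%'
-- Function definition (create it first, in its own batch)
-- STRING_SPLIT expects a single-character separator, hence varchar(1)
create function dbo.ReplaceEx(@string varchar(2000), @separator varchar(1), @nullValue varchar(10))
returns varchar(4000)
with execute as caller
as
begin
-- @result starts as NULL; CONCAT_WS skips NULL arguments,
-- so no leading separator is added for the first value
declare @result varchar(4000);
-- Note: STRING_SPLIT does not guarantee the order of its output rows
select @result = concat_ws(@separator, @result,
case when rtrim(value) = '' then @nullValue
else case when ltrim(rtrim(value)) = '[' then '[' + @nullValue
else case when ltrim(rtrim(value)) = ']' then @nullValue + ']'
else value end end end
)
from string_split(@string, @separator);
return (@result);
end;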

Separate XML node by comma in SQL

I have a Column content like this:
CustomTags
<CustomTagsSerialiser>
<custom-tags>
<tag>Visas and travel</tag>
<tag>Explore Options</tag>
<tag>Consider – feasibility</tag>
</custom-tags>
</CustomTagsSerialiser>
I can query g.[CustomTags].value('(/CustomTagsSerialiser//custom-tags)[1]', 'nvarchar(500)') as Custom_Tag to get result like
Visas and travelExplore OptionsConsider – feasibility
But I want the result to have a tag separated by comma (in the same column), like the following:
Visas and travel,Explore Options,Consider – feasibility
Ideally, I would like this to be implemented by using XML functionality/node
instead of breaking it into + ',' + or coalesce
You may refer to the answer to "How Stuff and 'For Xml Path' work in Sql Server".
Try the query below:
SELECT
STUFF((SELECT
',' + CTS.tag.value('(.)[1]', 'nvarchar(500)')
FROM
Temp12345
CROSS APPLY
col1.nodes('/CustomTagsSerialiser/custom-tags/tag') AS CTS(tag)
FOR XML PATH('')
), 1, 1, '')
Without CROSS APPLY, it should look like this:
STUFF((SELECT ',' + x.t.value('.', 'varchar(50)') FROM
[g].CustomTags.nodes('//tag') x(t) FOR XML PATH('')), 1, 1, '') AS 'Custom Tags'
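To confirm the expected output independently of SQL Server, the same selection (every `tag` element, joined with commas) can be sketched with Python's standard `xml.etree.ElementTree`. The sample document is copied from the question; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CustomTagsSerialiser>
  <custom-tags>
    <tag>Visas and travel</tag>
    <tag>Explore Options</tag>
    <tag>Consider – feasibility</tag>
  </custom-tags>
</CustomTagsSerialiser>"""


def join_tags(xml_text: str, separator: str = ",") -> str:
    """Join the text of every <tag> element, like the //tag path above."""
    root = ET.fromstring(xml_text)
    return separator.join(tag.text or "" for tag in root.iter("tag"))


if __name__ == "__main__":
    print(join_tags(SAMPLE))
```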

SQL Server : xml string to rows

I'm trying to convert a string to rows using T-SQL. I've found some people using XML for this but I'm running into troubles.
The original record:
A newline-separated string of data
New In Progress Left Message On Hold Researching Researching (2nd Level) Researching (3rd Level) Resolved Positive False Positive Security Respond
Using the following statement converts this string into XML:
select
cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from
field
where
column_name = 'state' and table_name = 'sv_inquiry'
XML string:
<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>
Now I would like to convert every 'i' node into a separate row. I've constructed the query below, but I can't get it working in the way that it returns all the rows...
select x.i.value('i[1]', 'varchar(30)')
from (
select cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from field
where column_name='state' and table_name='sv_inquiry'
) x(i)
This will return
Unassigned
To be clear, when I change 'i[1]' into 'i[2]' it will return 'Assigned'. I've tried '.', but this returns the whole string in a single record...
How about using the nodes method on an XML datatype.
declare @xml xml
set @xml = '<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>'
-- nodes('/i') yields one row per <i> element; value('.') reads its text
select
t.c.value('.', 'nvarchar(100)') as [Word]
from
@xml.nodes('/i') as t(c)
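The idea behind `nodes('/i')`, one row per matching element rather than one value per document, can be illustrated outside SQL Server with Python's `xml.etree.ElementTree`. This is a sketch only: the wrapper element `root` and the function name are assumptions, needed because a fragment with several top-level elements is not a well-formed document on its own.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    "<i>Unassigned</i><i>Assigned</i><i>Transferred</i>"
    "<i>Accepted</i><i>Closed</i><i>Reactivated</i>"
)


def fragment_to_rows(fragment: str) -> list:
    """Return the text of each top-level <i> element as a separate item."""
    # Wrap the fragment so the parser sees a single root element
    root = ET.fromstring("<root>" + fragment + "</root>")
    return [node.text or "" for node in root.findall("i")]


if __name__ == "__main__":
    for row in fragment_to_rows(FRAGMENT):
        print(row)
```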
You can split a string into rows without XML, see for example the fnSplitString function at SQL Server Central.
Here's an example using the nodes() function of the xml type. I'm using a space as the delimiter because SQL Fiddle doesn't play well with line feeds:
select node_column.value('.', 'varchar(max)')
from (
select cast('<i>' + replace(list_items, ' ', '</i><i>') +
'</i>' as xml) xml_value
from field
) f
cross apply
xml_value.nodes('/i') node_table(node_column);
Live example at SQL Fiddle.

Use ampersand in CAST in SQL

The following code snippet on SQL server 2005 fails on the ampersand '&':
select cast('<name>Spolsky & Atwood</name>' as xml)
Does anyone know a workaround?
Longer explanation, I need to update some data in an XML column, and I'm using a search & replace type hack by casting the XML value to a varchar, doing the replace and updating the XML column with this cast.
select cast('<name>Spolsky & Atwood</name>' as xml)
A literal ampersand inside an XML tag is not allowed by the XML standard, and such a document will fail to parse by any XML parser.
An XMLSerializer() will output the ampersand HTML-encoded.
The following code:
using System.Xml.Serialization;
namespace xml
{
public class MyData
{
public string name = "Spolsky & Atwood";
}
class Program
{
static void Main(string[] args)
{
new XmlSerializer(typeof(MyData)).Serialize(System.Console.Out, new MyData());
}
}
}
will output the following:
<?xml version="1.0" encoding="utf-8"?>
<MyData
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<name>Spolsky &amp; Atwood</name>
</MyData>
, with an &amp; instead of &.
It's not valid XML. Use &amp;:
select cast('<name>Spolsky &amp; Atwood</name>' as xml)
You'd need to XML escape the text, too.
So let's backtrack and assume you're building that string as:
SELECT '<name>' + MyColumn + '</name>' FROM MyTable
you'd want to do something more like:
SELECT '<name>' + REPLACE( MyColumn, '&', '&amp;' ) + '</name>' FROM MyTable
Of course, you probably should cater for the other entities thus:
SELECT '<name>' + REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( MyColumn, '&', '&amp;' ), '''', '&apos;' ), '"', '&quot;' ), '<', '&lt;' ), '>', '&gt;' ) + '</name>' FROM MyTable
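The effect of escaping can be verified outside SQL Server. The Python sketch below uses the standard library's `xml.sax.saxutils.escape`, which replaces `&` first and then `<` and `>`, with the two quote entities supplied explicitly. The helper names are hypothetical; the point is only that the unescaped string fails to parse while the escaped one round-trips.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > itself; the quote entities are passed in
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def wrap_in_name(value: str) -> str:
    """Build a <name> element from raw text, escaping the five entities."""
    return "<name>" + escape(value, QUOTE_ENTITIES) + "</name>"


def parses(xml_text: str) -> bool:
    """Report whether the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses("<name>Spolsky & Atwood</name>"))
    print(wrap_in_name("Spolsky & Atwood"))
    print(ET.fromstring(wrap_in_name("Spolsky & Atwood")).text)
```

As in the nested REPLACE above, the ampersand has to be replaced before the other characters; otherwise the `&` introduced by entities such as `&lt;` would be escaped a second time.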
When working with XML in SQL you're a lot safer using built-in functions instead of converting it manually.
The following code will build a proper SQL XML variable that looks like your desired output based on a raw string:
DECLARE @ExampleString nvarchar(40)
, @ExampleXml xml
SELECT @ExampleString = N'Spolsky & Atwood'
SELECT @ExampleXml =
(
SELECT 'Spolsky & Atwood' AS 'name'
FOR XML PATH (''), TYPE
)
SELECT @ExampleString , @ExampleXml
As John and Quassnoi state, & on its own is not valid. This is because the ampersand character is the start of a character entity - used to specify characters that cannot be represented literally. There are two forms of entity - one specifies the character by name (e.g., &amp;, or &quot;), and one specifies the character by its code (I believe it's the code position within the Unicode character set, but not sure. e.g., &#34; should represent a double quote).
Thus, to include a literal & in an HTML document, you must specify its entity: &amp;. Other common ones you may encounter are &lt; for <, &gt; for >, and &quot; for ".