SQL Server - contains an invalid XML identifier as required by FOR XML

I'm running this query and getting the error below. Can anyone help?
Column name 'Middle Name' contains an invalid XML identifier as required by
FOR XML; ' '(0x0020) is the first character at fault.
SELECT
Username as [LastName],
'' AS [Middle Name],
'' AS Birthdate,
'' AS [SSN],
0 AS [Wage Amount]
FROM
Employee
FOR XML PATH

You can't have spaces in XML element or attribute names. Use
SELECT Username AS [LastName],
'' AS [MiddleName],
'' AS Birthdate,
'' AS [SSN],
0 AS [WageAmount]
FROM Employee
FOR XML PATH

For the simplest case, Smith's solution works fine.
I had to keep characters such as space, #, ', and / in my XML names, so I finally solved this by encoding each identifier as Base64. (Be careful: the encoded name cannot exceed 128 characters.) Outside SQL Server, where the XML is read as input data, a small piece of code translates the Base64 back to the original string.
CREATE FUNCTION [dbo].[ufn_string_To_BASE64]
(
    @inputString VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    RETURN (
        SELECT
            CAST(N'' AS XML).value(
                'xs:base64Binary(xs:hexBinary(sql:column("bin")))'
                , 'VARCHAR(MAX)'
            ) Base64Encoding
        FROM (
            SELECT CAST(@inputString AS VARBINARY(MAX)) AS bin
        ) AS bin_sql_server_temp
    )
END
GO
It's important to use VARCHAR rather than NVARCHAR (which stores two bytes per character), because that gives a shorter Base64 string.
You can include char(10) and char(13) in the identifier as well.
Dynamic SQL can help to build a temporary table that stores the intermediate data.
In my case, C# decodes the Base64 back to a string:
if (value.StartsWith("_"))
{
    var base64Encoded = value.Substring(1).Replace('_','=');
    try
    {
        var data = System.Convert.FromBase64String(base64Encoded);
        value = Encoding.GetEncoding(1252).GetString(data);
    }
    catch (Exception e)
    {
        log.LogInformation(e.Message);
    }
}
Be careful about the following:
An XML identifier cannot start with a digit, so prefix it with _ and remove the prefix in C#.
= is not accepted in an XML identifier, so replace it with something else, such as _.
When decoding back to a string, use the code page of the original string, such as 1252 for French characters.
In practice this is more complex than what is described here.
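The encode and decode round trip described above can be sketched outside SQL Server. The following Python sketch is only an illustration, not the answer's actual code: the function names are hypothetical, and it deliberately differs from the answer by using the URL-safe Base64 alphabet (because + and / are not valid in XML names either) and by replacing the = padding with a dot instead of an underscore, since the URL-safe alphabet already uses the underscore.

```python
import base64
import re

# Hypothetical helpers; cp1252 mirrors the code page used in the C# snippet.
def encode_xml_name(original: str, codepage: str = "cp1252") -> str:
    """Turn an arbitrary column name into a string usable as an XML name."""
    raw = original.encode(codepage)
    # URL-safe Base64 uses only A-Z, a-z, 0-9, '-' and '_' plus '=' padding.
    b64 = base64.urlsafe_b64encode(raw).decode("ascii")
    # '=' is not allowed in XML names; '.' is, so use it for the padding.
    # The leading '_' guarantees the name does not start with a digit.
    return "_" + b64.replace("=", ".")

def decode_xml_name(name: str, codepage: str = "cp1252") -> str:
    """Reverse encode_xml_name; names without the '_' prefix pass through."""
    if not name.startswith("_"):
        return name
    b64 = name[1:].replace(".", "=")
    return base64.urlsafe_b64decode(b64).decode(codepage)

# Simplified ASCII-only check of what an XML name may look like.
XML_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")
```

Like the C# code, the decoder treats every name starting with an underscore as encoded, so plain names that legitimately begin with an underscore would need a different marker.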

Related

String or binary data would be truncated, but everything is varchar(max)?

I have a query to pivot some tabular data into a JSON-esque string array using a query similar to this:
SELECT cr.TableId AS Id,
cast(quotename(
stuff(
(SELECT cast(',' AS VARCHAR(max)) + quotename(cast(li.ImageUrl AS VARCHAR(max)), '"')
FROM [DSP].[CreativeLibraryImage] cli WITH (nolock)
INNER JOIN [DSP].[LibraryImage] li WITH (nolock)
ON li.LibraryImageId = cli.[CreativeLibraryImageId]
WHERE cli.TableId = cr.TableId
FOR xml path('')
), 1, 1, ''
) --stuff
) --quotename
AS VARCHAR(max)
) --cast
AS ImagePaths
FROM MyTable cr WITH (nolock);
I am getting the "String or binary data would be truncated" error halfway through my query, but I think I'm casting all the varchar data to varchar(max), so I shouldn't have any string length issues. Clearly there's a problem shoehorning this tabular data into a wide column, but I can't see where it could be truncating.
Can someone advise me on what conditions could cause this so that I can compensate accordingly?
Directly from BOL for quotename:
'character_string' Is a string of Unicode character data.
character_string is sysname and is limited to 128 characters. Inputs
greater than 128 characters return NULL.

Using Upper to Capitalize the first letter of City name

I am doing some data clean-up and need to capitalize the first letter of city names. How do I capitalize the second word in a city like Terra Bella?
SELECT UPPER(LEFT([MAIL CITY],1))+
LOWER(SUBSTRING([MAIL CITY],2,LEN([MAILCITY])))
FROM masterfeelisting
My result is 'Terra bella' and I need 'Terra Bella'. Thanks in advance.
Ok, I know I answered this before, but it bugged me that we couldn't write something efficient to handle an unknown amount of 'text segments'.
So after rethinking and researching it, I discovered a way to change the [MAILCITY] field into XML nodes where each 'text segment' is assigned its own node within the xml field. Then those xml fields can be processed node by node, concatenated together, and then changed back to a SQL varchar. It's convoluted, but it works. :)
Here's the code:
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(max) not null
);
INSERT INTO #masterfeelisting VALUES
('terra bellA')
,(' terrA novA ')
,('chicagO ')
,('bostoN')
,('porT dE sanTo')
,(' porT dE sanTo pallo ');
SELECT
RTRIM
(
(SELECT
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, '')) + ' '
FROM [xmlNodeRecordSet].[nodeField].nodes('/N') as [xmlField]([xmlNode]) FOR
xml path(''), type
).value('.', 'varchar(max)')
) as [MAILCITY]
FROM
(SELECT
CAST('<N>' + REPLACE([MAILCITY],' ','</N><N>')+'</N>' as xml) as [nodeField]
FROM #masterfeelisting
) as [xmlNodeRecordSet];
Drop table #masterfeelisting;
First I create a table and fill it with dummy values.
Now here is the beauty of the code:
For each record in #masterfeelisting, we are going to create an xml field with a node for each 'text segment'.
ie. '<N></N><N>terrA</N><N>novA</N><N></N>'
(This is built from the varchar ' terrA novA ')
1) The way this is done is by using the REPLACE function.
The string starts with a '<N>' to designate the beginning of the node. Then:
REPLACE([MAILCITY],' ','</N><N>')
This effectively goes through the whole [MAILCITY] string and replaces each
' ' with '</N><N>'
and then the string ends with a '</N>'. Where '</N>' designates the end of each node.
So now we have a beautiful XML string with a couple of empty nodes and the 'text segments' nicely nestled in their own node. All the 'spaces' have been removed.
2) Then we have to CAST the string into xml. And we will name that field [nodeField]. Now we can use xml functions on our newly created record set. (Conveniently named [xmlNodeRecordSet].)
3) Now we can read the [xmlNodeRecordSet] into the main sub-Select by stating:
FROM [xmlNodeRecordSet].[nodeField].nodes('/N')
This tells us we are reading the [nodeField] as nodes with a '/N' delimiter.
This table of node fields is then parsed by stating:
as [xmlField]([xmlNode]) FOR xml path(''), type
This means each [xmlField] will be parsed for each [xmlNode] in the xml string.
4) So in the main sub-select:
Each blank node '<N></N>' is discarded. (Or not processed.)
Each node with a 'text segment' in it will be parsed. ie <N>terrA</N>
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
This code will grab each node out of the field, take its contents '.', and keep only the first character because of 'char(1)'. Then it upper-cases that character. The plus sign at the end concatenates this letter with the next bit of code:
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, ''))
Now here is the beauty... STUFF is a function that will take a string, from a position, for a length, and substitute another string.
STUFF(string, start position, length, replacement string)
So our string is:
[xmlField].[xmlNode].value('.', 'varchar(max)')
Which grabs the whole string inside the current node since it is 'varchar(max)'.
The start position is 1. The length is 1. And the replacement string is ''. This effectively strips off the first character by replacing it with nothing. So the remaining string is all the other characters that we want to have lower case. So that's what we do... we use LOWER to make them all lower case. And this result is concatenated to our first letter that we already upper cased.
But wait... we are not done yet... we still have to append a + ' '. Which adds a blank space after our nicely capitalized 'text segment'. Just in case there is another 'text segment' after this node is done.
This main sub-Select will now parse each node in our [xmlField] and concatenate them all nicely together.
5) But now that we have one big happy concatenation, we still have to change it back from an xml field to a SQL varchar field. So after the main sub-select we need:
.value('.', 'varchar(max)')
This changes our [MAILCITY] back to a SQL varchar.
6) But hold on... we still are not done. Remember we put an extra space at the end of each 'text segment'? Well, the last 'text segment' still has that extra space after it. So we need to right-trim that space off by using RTRIM.
7) And don't forget to rename the final field back to [MAILCITY].
8) And that's it. This code will take an unknown number of 'text segments' and format each one of them. All using the fun of XML and its node parsers.
Hope that helps :)
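If the clean-up can happen outside SQL Server, the same split, capitalize, and rejoin steps are short in a general-purpose language. This is a minimal Python sketch of the idea rather than a translation of the query; the function name is hypothetical.

```python
def title_case_city(city: str) -> str:
    """Capitalize the first letter of every space-separated segment."""
    # Split on single spaces and drop the empty pieces, which play the same
    # role as the blank <N></N> nodes in the XML version.
    segments = [part for part in city.split(" ") if part]
    # Upper-case the first character, lower-case the rest, rejoin with spaces.
    return " ".join(part[:1].upper() + part[1:].lower() for part in segments)
```

Joining with a single space also removes the leading, trailing, and repeated spaces, so no separate trim step is needed.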
Here's one way to handle this using APPLY. Note that this solution supports up to 3 substrings (e.g. "Phoenix", "New York", "New York City") but can easily be updated to handle more.
DECLARE @string varchar(100) = 'nEW yoRk ciTY';
WITH DELIMCOUNT(String, DC) AS
(
    SELECT @string, LEN(RTRIM(LTRIM(@string)))-LEN(REPLACE(RTRIM(LTRIM(@string)),' ',''))
),
CIPOS AS
(
    SELECT *
    FROM DELIMCOUNT
    CROSS APPLY (SELECT CHARINDEX(char(32), string, 1)) CI1(CI1)
    CROSS APPLY (SELECT CHARINDEX(char(32), string, CI1.CI1+1)) CI2(CI2)
)
SELECT
    OldString = @string,
    NewString =
        CASE DC
            WHEN 0 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,8000))
            WHEN 1 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
                        UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,100))
            WHEN 2 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
                        UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,CI2-(CI1+1))) +
                        UPPER(SUBSTRING(string,CI2+1,1))+LOWER(SUBSTRING(string,CI2+2,100))
        END
FROM CIPOS;
Results:
OldString NewString
--------------- --------------
nEW yoRk ciTY New York City
This will only capitalize the first letter of the second word. It is a shorter but less flexible approach. Replace @str with [Mail City].
DECLARE @str AS VARCHAR(50) = 'Los angelas'
SELECT STUFF(@str, CHARINDEX(' ', @str) + 1, 1, UPPER(SUBSTRING(@str, CHARINDEX(' ', @str) + 1, 1)));
This is a way to use embedded Selects for three city name parts.
It uses CHARINDEX to find the location of your separator character (i.e. a space).
I put an 'if' structure around the Select to test whether you have any records with more than 3 parts to the city name. If you ever get the warning message, you could add another sub-Select to handle another city part.
Although... just to be clear... SQL is not the best language to do complicated formatting. It was written as a data retrieval engine with the idea that another program will take that data and massage it into a friendlier look and feel. It may be easier to handle the formatting in another program. But if you insist on using SQL and you need to account for city names with 5 or more parts... you may want to consider using Cursors so you can loop through the variable possibilities. (But Cursors are not a good habit to get into. So don't do that unless you've exhausted all other options.)
Anyway, the following code creates and populates a table so you can test the code and see how it works. Enjoy!
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(30) not null
);
Insert into #masterfeelisting select 'terra bella';
Insert into #masterfeelisting select ' terrA novA ';
Insert into #masterfeelisting select 'chicagO ';
Insert into #masterfeelisting select 'bostoN';
Insert into #masterfeelisting select 'porT dE sanTo';
--Insert into #masterfeelisting select ' porT dE sanTo pallo ';
Declare @intSpaceCount as integer;
SELECT @intSpaceCount = max (len(RTRIM(LTRIM([MAILCITY]))) - len(replace([MAILCITY],' ',''))) FROM #masterfeelisting;
if @intSpaceCount > 2
SELECT 'You need to account for more than 3 city name parts ' as Warning, @intSpaceCount as SpacesFound;
else
SELECT
cThird.[MAILCITY1] + cThird.[MAILCITY2] + cThird.[MAILCITY3] as [MAILCITY]
FROM
(SELECT
bSecond.[MAILCITY1] as [MAILCITY1]
,SUBSTRING(bSecond.[MAILCITY2],1,bSecond.[intCol2]) as [MAILCITY2]
,UPPER(SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 1, 1)) +
SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 2,LEN(bSecond.[MAILCITY2]) - bSecond.[intCol2]) as [MAILCITY3]
FROM
(SELECT
SUBSTRING(aFirst.[MAILCITY],1,aFirst.[intCol1]) as [MAILCITY1]
,UPPER(SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, 1)) +
SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 2,LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) as [MAILCITY2]
,CHARINDEX ( ' ', SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) ) as intCol2
FROM
(SELECT
UPPER (LEFT(RTRIM(LTRIM(mstr.[MAILCITY])),1)) +
LOWER(SUBSTRING(RTRIM(LTRIM(mstr.[MAILCITY])),2,LEN(RTRIM(LTRIM(mstr.[MAILCITY])))-1)) as [MAILCITY]
,CHARINDEX ( ' ', RTRIM(LTRIM(mstr.[MAILCITY]))) as intCol1
FROM
#masterfeelisting as mstr -- Initial Master Table
) as aFirst -- First Select Shell
) as bSecond -- Second Select Shell
) as cThird; -- Third Select Shell
Drop table #masterfeelisting;

SQL XML parsing split the string on letters 'TH'

I want to split a string into multiple values based on a special symbol. e.g., Here is the string
JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN
I want it to be:
Jdwnrh THIMPHU OTH 10/1991 02/02/2011 BHUTAN
I am using the following SQL:
DECLARE @delimiter VARCHAR(50)
SET @delimiter='þ'
;WITH CTE AS
(
    SELECT
        CAST('<M>' + REPLACE(REPLACE(CAST(DATA as nvarchar(MAX)), @delimiter , '</M><M>'), '&', '&amp;') + '</M>' AS XML)
        AS BDWCREGPREVADDR_XML
    FROM [JACS_RAVEN_DATA_OLD].dbo.BDWCREGPREVADDR
)
SELECT
    BDWCREGPREVADDR_XML.value('/M[1]', 'varchar(50)') As streetNo,
    BDWCREGPREVADDR_XML.value('/M[2]', 'varchar(50)') As suburb,
    BDWCREGPREVADDR_XML.value('/M[3]', 'varchar(3)') As stateCode,
    BDWCREGPREVADDR_XML.value('/M[4]', 'varchar(10)') As postalCode,
    BDWCREGPREVADDR_XML.value('/M[7]', 'varchar(50)') As country,
    BDWCREGPREVADDR_XML.value('/M[5]', 'varchar(50)') As dateFrom,
    BDWCREGPREVADDR_XML.value('/M[6]', 'varchar(50)') As dateTo
FROM CTE
GO
The query works well on all the strings other than the one provided as an example. For the string above, the query returns the following:
'Jdwnrh' ' ' 'IMPHU' 'O' ' ' '10/1991' '02/02/2011' 'BHUTAN'
It seems the code treats the letters 'TH' as a delimiter and splits the string on them. Does anyone know how to resolve this issue?
This seems to be related to your collation. In Latin1_General_CS_AS, the þ character is considered equivalent to th (because it's an Old English letter that sounds like "th" when pronounced).
print replace('thornþ' collate Latin1_General_CS_AS,'þ','1')
-- output: 1orn1
This is not the case for all collations; for example, in Latin1_General_BIN they are separate:
print replace('thornþ' collate Latin1_General_BIN,'þ','1')
-- output: thorn1
So perhaps you could look at changing the collation of the column which contains the þ characters.
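For comparison, a language that compares strings by code point behaves like the binary collation. The following Python sketch is not part of the original answer; it only shows the split the question expects, using the sample string from the question.

```python
# Python's str.split compares code points, so 'þ' never matches "TH",
# much like REPLACE under the Latin1_General_BIN collation.
sample = "JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN"
fields = sample.split("þ")

# The doubled delimiter yields one empty field (the missing postal code),
# which lines up with the seven /M[n] positions read by the query.
for position, field in enumerate(fields, start=1):
    print(position, repr(field))
```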
The delimiter þ is the problem; if you switch to a different delimiter, the query works. Using þ and then z as the delimiter gives two different results (the original screenshots are not available).
I think þ may have some special meaning. Hope this helps.

Extract e-mail address from string

I have table called Entities with the column CustomData.
I need to extract the email address from each row.
Also, if the value is null I need to show it as null.
Sample rows from CustomData:
Id CustomData Name
273 [{"Name":"Customer","Value":"test customer"},{"Name":"Address","Value":null},{"Name":"Email","Value":null},{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}] 2323123213
274 [{"Name":"Customer","Value":"Cash Sale"},{"Name":"Address","Value":null},{"Name":"Email","Value":"test@outlook.com"},{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}] 2222222222
This is the string I will be using to update my system.
I have previously managed to select the phone number from this same data, but it was a fixed length. I can't seem to pull out the e-mail address.
I will post a couple of the different methods I have tried so far once I'm back at my PC.
Well, with such denormalized data your only option is to parse it and try to get the email. The most elegant way is to use a JSON parser, but SQL Server had no built-in one before SQL Server 2016 (which added OPENJSON and JSON_VALUE), so on older versions you have to parse it manually.
Assuming each record for email starts with {"Name":"Email","Value":, you can do it in a few steps:
Find the position of {"Name":"Email","Value": in your string.
Find the first occurrence of } in the remainder of the string to the right.
Get the substring in between.
Check whether that substring equals 'null'; if so return null, otherwise return the string itself.
So it can be done like in this snippet:
declare @data nvarchar(max), @pattern nvarchar(max)
select @data = '[{"Name":"Customer","Value":"test customer"},
{"Name":"Address","Value":null},
{"Name":"Email","Value":null},
{"Name":"Company Name","Value":null},
{"Name":"Other Phone","Value":null}]'
select @pattern = '{"Name":"Email","Value":'
select nullif(substring(@data,
    charindex(@pattern, @data, 0) + len(@pattern),
    charindex('}', @data, charindex(@pattern, @data, 0))
    - charindex(@pattern, @data, 0) - len(@pattern)
    ), 'null')
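Where the data can be processed outside SQL Server, a real JSON parser avoids the string arithmetic altogether. This is a minimal Python sketch under that assumption; the function name is hypothetical and the two inputs are the sample rows from the question.

```python
import json
from typing import Optional

def extract_email(custom_data: str) -> Optional[str]:
    """Return the Value of the entry whose Name is "Email", or None."""
    for entry in json.loads(custom_data):
        if entry.get("Name") == "Email":
            # JSON null becomes Python None, matching the "show as null" rule.
            return entry.get("Value")
    return None  # no Email entry at all

row_273 = ('[{"Name":"Customer","Value":"test customer"},'
           '{"Name":"Address","Value":null},{"Name":"Email","Value":null},'
           '{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}]')
row_274 = ('[{"Name":"Customer","Value":"Cash Sale"},'
           '{"Name":"Address","Value":null},{"Name":"Email","Value":"test@outlook.com"},'
           '{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}]')

print(extract_email(row_273))
print(extract_email(row_274))
```

Unlike the substring approach, the parser returns the address without its surrounding double quotes and does not depend on the exact spacing or key order inside each object.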

Use ampersand in CAST in SQL

The following code snippet on SQL server 2005 fails on the ampersand '&':
select cast('<name>Spolsky & Atwood</name>' as xml)
Does anyone know a workaround?
Longer explanation, I need to update some data in an XML column, and I'm using a search & replace type hack by casting the XML value to a varchar, doing the replace and updating the XML column with this cast.
select cast('<name>Spolsky &amp; Atwood</name>' as xml)
A literal ampersand inside an XML tag is not allowed by the XML standard, and such a document will fail to parse by any XML parser.
An XMLSerializer() will output the ampersand HTML-encoded.
The following code:
using System.Xml.Serialization;
namespace xml
{
public class MyData
{
public string name = "Spolsky & Atwood";
}
class Program
{
static void Main(string[] args)
{
new XmlSerializer(typeof(MyData)).Serialize(System.Console.Out, new MyData());
}
}
}
will output the following:
<?xml version="1.0" encoding="utf-8"?>
<MyData
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <name>Spolsky &amp; Atwood</name>
</MyData>
, with an &amp; instead of &.
It's not valid XML. Use &amp;:
select cast('<name>Spolsky &amp; Atwood</name>' as xml)
You'd need to XML escape the text, too.
So let's backtrack and assume you're building that string as:
SELECT '<name>' + MyColumn + '</name>' FROM MyTable
you'd want to do something more like:
SELECT '<name>' + REPLACE( MyColumn, '&', '&amp;' ) + '</name>' FROM MyTable
Of course, you probably should cater for the other entities thus:
SELECT '<name>' + REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( MyColumn, '&', '&amp;' ), '''', '&apos;' ), '"', '&quot;' ), '<', '&lt;' ), '>', '&gt;' ) + '</name>' FROM MyTable
When working with XML in SQL you're a lot safer using built-in functions instead of converting it manually.
The following code will build a proper SQL XML variable that looks like your desired output based on a raw string:
DECLARE @ExampleString nvarchar(40)
    , @ExampleXml xml
SELECT @ExampleString = N'Spolsky & Atwood'
SELECT @ExampleXml =
(
    SELECT 'Spolsky & Atwood' AS 'name'
    FOR XML PATH (''), TYPE
)
SELECT @ExampleString , @ExampleXml
As John and Quassnoi state, & on its own is not valid. This is because the ampersand character is the start of a character entity, which is used to specify characters that cannot be represented literally. There are two forms of entity: one specifies the character by name (e.g., &amp; or &quot;), and one specifies the character by its code (I believe it's the code position within the Unicode character set, but I'm not sure; e.g., &#34; should represent a double quote).
Thus, to include a literal & in an HTML document, you must specify its entity: &amp;. Other common ones you may encounter are &lt; for <, &gt; for >, and &quot; for ".
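The entity rules above are the same in any XML toolchain. As an illustration outside SQL Server, this minimal Python sketch uses only the standard library to show that the raw ampersand is rejected, that escaping fixes it, and that a serializer escapes it automatically. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Spolsky & Atwood"

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# escape() handles &, < and > by default; quotes are added via the mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Building the element through the API lets the serializer do the escaping,
# which is the same idea as using FOR XML instead of string concatenation.
element = ET.Element("name")
element.text = raw
serialized = ET.tostring(element, encoding="unicode")

print(is_well_formed("<name>" + raw + "</name>"))      # raw ampersand
print(is_well_formed("<name>" + escaped + "</name>"))  # escaped ampersand
print(serialized)
```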