Parsing XML with special characters in element names - xml-validation

I want to read the XML below using XmlReader.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Inforamtion>
<Name;Property>Name contain
</Name;Property>
<123>89</123>
<question?>
</question?>
</Inforamtion>
But it throws an error because the element names contain special characters,
and because the first character of an element name can't be a digit.
I may receive such XML in bulk and need to process and correct it.
Please guide me on how to read, process, or correct such XML.
Thank you.

Your XML is not well-formed.
The question "What are the rules for a valid XML element name?" will help you correct it.
A summary:
A Name is a token beginning with a letter or one of a few punctuation
characters ( [_] and [:]) , and continuing with letters, digits, hyphens, underscores,
colons, or full stops, together known as name characters.
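To find the offending names in bulk, the rule summarized above can be turned into a quick check. This is only a sketch in Python: the pattern is an ASCII-only simplification of the real XML Name rule (which also allows many non-ASCII letters), and the helper name is_valid_xml_name is made up for illustration.

```python
import re

# ASCII-only simplification of the XML 1.0 Name rule (assumption: the real
# rule also admits many non-ASCII letters, which this pattern rejects).
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_xml_name(name: str) -> bool:
    """Return True if the whole name matches the simplified rule above."""
    return _NAME_RE.fullmatch(name) is not None


# Element names taken from the question.
for candidate in ["Inforamtion", "Name;Property", "123", "question?"]:
    print(candidate, is_valid_xml_name(candidate))
```

Only the first name passes; the other three are the ones the parser complains about.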

Related

WiX: how to include equal signs and ampersands in the string table to avoid the LGHT0104 error

I have a launch condition error string in String_en-US.wxl:
<WixLocalization Culture="en-us" Codepage="1252" xmlns="http://schemas.microsoft.com/wix/2006/localization">
<String Id="ERR_REQUIRED_APP_ABSENT">This product requires XXX to be on the system. Please download it from "https://knowledge.xxx.com/knowledge/llisapi.dll?func=ll&objId=59284919&objAction=browse&sort=name&viewType=1", install it and try again.</String>
</WixLocalization>
It seems that the ampersands (&) and the equal signs (=) cause the light error:
Strings_en-US.wxl(0,0): error LGHT0104: Not a valid localization file; detail: '=' is an unexpected token. The expected token is ';'. Line 36, position 172.
I even tried to escape the equal signs with a numeric character reference (such as &#61;), but it still complained about the ampersand. How can I avoid the error?
CDATA: A CDATA section is "...a section of element content that is marked for the parser to interpret as only character data, not markup."
In this case, something like this:
<String Id="TEST1"><![CDATA[https://www.hi.com/one&two&three&v=1]]></String>
XML Escape Characters: XML escape sequences are normally used to encode special characters in XML documents. The escape sequence for & is &amp;. CDATA is an alternative approach.
Links:
What characters do I need to escape in XML documents?
https://en.wikipedia.org/wiki/CDATA
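Both approaches can be checked outside WiX with any XML parser. The sketch below uses Python's standard library purely as an illustration (it is not part of the WiX toolchain); the URL is the sample from the CDATA example above. Note that only the ampersand needs escaping; the equal sign is harmless in element content.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

url = "https://www.hi.com/one&two&three&v=1"

# Option 1: escape the ampersands (escape() leaves '=' untouched).
escaped_doc = f'<String Id="TEST1">{escape(url)}</String>'

# Option 2: wrap the raw text in a CDATA section.
cdata_doc = f'<String Id="TEST1"><![CDATA[{url}]]></String>'

# A parser hands back the same text in both cases.
text_from_escaped = ET.fromstring(escaped_doc).text
text_from_cdata = ET.fromstring(cdata_doc).text

print(escaped_doc)
print(text_from_escaped == text_from_cdata == url)
```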

Why does Replace '&amp;' with '&' not work for XML data?

I need to download an XML file whose data is retrieved from a stored procedure.
My problem is that if the data contains any '&' symbol, it shows up in the XML file as
'&amp;'
I have used the REPLACE function in my procedure as shown below, but...
SELECT @V_NAME = REPLACE(@V_NAME, ' &amp; ', ' & ');
UPDATE #TMP_RS_XML
SET OBJECT_ID=@V_ID,
FNAME=@V_FILE,
DOCUMENT=(SELECT @V_NAME as 'Description',
...
Now, the output is:
&amp;
This is not the way this is supposed to work...
XML is not just text with fancy extras; it has very strict rules. Like any text-based container, it needs either magic words or special characters to tell the consumer what is content and what is markup.
The most important markup characters in XML are, of course, < and >. If you want these characters to be part of your content, you have to replace them. That is done with XML entities.
Within the content, every XML entity starts with an ampersand (&lt; comes out as <), so the ampersand is the third most important special character. If you want an ampersand within the content, you must use an entity (&amp;) as a code for "in this place we want an ampersand".
You must distinguish between the text you see when you look at the XML and the actual content taken out of the XML.
Try this:
DECLARE @SomeStringWithSpecialCharacters NVARCHAR(200)=N'This & that -> let''s see, why how some foreign characters behave: அரிச். And what about a line break?' + CHAR(13) + CHAR(10) + 'Here is the second line. And an unprintable?' + CHAR(2);
--Here we use FOR XML, all the escaping is done implicitly
SELECT @SomeStringWithSpecialCharacters AS TestIt FOR XML PATH('test');
The result
<test>
<TestIt>This &amp; that -&gt; let's see, why how some foreign characters behave: அரிச். And what about a line break?&#x0D;
Here is the second line. And an unprintable?&#x02;</TestIt>
</test>
Now I take the XML as it came out of the first part and place it into an XML-typed variable.
Attention: I had to remove the &#x02; entity, check it out...
DECLARE @SomeXML XML=
N'<test>
<TestIt>This &amp; that -&gt; let''s see, why how some foreign characters behave: அரிச். And what about a line break?&#x0D;
Here is the second line. And an unprintable?</TestIt>
</test>';
--Now we do the magic using .value() against a native XML:
SELECT @SomeXML.value('(/test/TestIt/text())[1]','nvarchar(max)');
The result comes out with all entities un-escaped:
This & that -> let's see, why how some foreign characters behave: அரிச். And what about a line break?
Here is the second line. And an unprintable?
The general hint is: Never do the replacements yourself. Pushing content into the XML will need escaping and reading content out of XML will need the opposite. All this is done for you implicitly, when you use the proper tools.
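The same principle holds outside SQL Server. As a rough illustration in Python (standard library only, not the T-SQL from this answer), the sketch below lets the XML library do the escaping on the way in and the un-escaping on the way out, and then shows what happens when the ampersand is replaced by hand first. The element name Description is borrowed from the question; the sample text is made up.

```python
import xml.etree.ElementTree as ET

original = "Tom & Jerry"

# Let the library escape on write ...
elem = ET.Element("Description")
elem.text = original
serialized = ET.tostring(elem, encoding="unicode")

# ... and un-escape on read.
read_back = ET.fromstring(serialized).text

# Escaping by hand first leads to double escaping on write.
pre_escaped = ET.Element("Description")
pre_escaped.text = original.replace("&", "&amp;")
double_escaped = ET.tostring(pre_escaped, encoding="unicode")

print(serialized)      # the markup contains &amp;
print(read_back)       # the content contains a plain &
print(double_escaped)  # the markup contains &amp;amp;
```

The text seen in the markup and the content read back through the parser differ, which is exactly the distinction described above.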
'&' is a special character; the '&' you see is rendered from ' &amp ; '.
The usual practice is to decode the XML when reading it; see the reference below:
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?redirectedfrom=MSDN&view=netframework-4.8#overloads

selecting alpha numeric node in XQuery

I have this XQuery:
declare @XML xml
set @XML =
'
<root>
<row1>
<value>1</value>
</row1>
<1row2>
<value>2</value>
</1row2>
</root>
'
select @XML.query('/root/1row2')
I keep getting an error while trying to select 1row2:
XQuery [query()]: Syntax error near '1', expected a step expression.
It seems that I get this error whenever an XML node name starts with a number. Is there a way to select that node?
From XML Naming Rules, XML elements must follow these naming rules:
Element names are case-sensitive
Element names must start with a letter or underscore
Element names cannot start with the letters xml (or XML, or Xml, etc)
Element names can contain letters, digits, hyphens, underscores, and
periods
Element names cannot contain spaces
Any name can be used, no words are reserved (except xml).
So, element names must start with a letter or underscore. On SQL Server 2016 SP1, your XML is not even valid, and the statement cannot be executed.
You need to either repair your string so that it is valid XML, or query the data using some other technique (for example, a SQL CLR function that adds regular expression support, or splitting the nodes).
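One possible way to repair the string before it reaches the XML parser is sketched below in Python (not SQL CLR). It assumes that a '<' followed by a digit only ever occurs at the start of a tag, which holds for the sample but not for documents with CDATA sections or comments containing such text. The function name prefix_numeric_tags and the underscore prefix are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def prefix_numeric_tags(raw: str, prefix: str = "_") -> str:
    """Prefix every tag name that starts with a digit (opening and closing tags).

    Assumption: '<' followed by a digit appears only at the start of a tag.
    """
    return re.sub(
        r"<(/?)(\d)",
        lambda m: f"<{m.group(1)}{prefix}{m.group(2)}",
        raw,
    )


raw = (
    "<root><row1><value>1</value></row1>"
    "<1row2><value>2</value></1row2></root>"
)
repaired = prefix_numeric_tags(raw)

# The repaired string parses, and the renamed node can be selected.
root = ET.fromstring(repaired)
print(root.findtext("_1row2/value"))
```

After the same kind of repair, the XQuery path in SQL Server would have to use the new name, for example /root/_1row2.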

Trimming spaces out of a string based on a pattern in SQL Server

I have a varchar(max) field with XML data in it. I need to clean it by removing the spaces between the tags. For example:
</tns:time_changed> <tns:changed_properties>
should be cleaned as
</tns:time_changed><tns:changed_properties>
I need to do this in a single query and I cannot use replace all white spaces as there are other relevant spaces in the content.
Try like this:
UPDATE table
SET xmlColumnName = REPLACE ( xmlColumnName , '> <' , '><' );
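The REPLACE above only handles exactly one space between tags. If the data is cleaned outside the database, a regular expression can cover any run of whitespace between a closing '>' and the next '<' while leaving spaces inside text content alone. A minimal sketch in Python, with a made-up function name:

```python
import re


def strip_inter_tag_whitespace(xml_text: str) -> str:
    """Remove whitespace that sits directly between '>' and '<'."""
    return re.sub(r">\s+<", "><", xml_text)


sample = "</tns:time_changed> \n  <tns:changed_properties>"
print(strip_inter_tag_whitespace(sample))

# Spaces inside element content are not touched.
print(strip_inter_tag_whitespace("<a> <b>keep these spaces</b>\n</a>"))
```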
I converted the column to the XML type, and that automatically took care of the spaces.
I also replaced the
<?xml version="1.0"?>
declaration in the field with an empty string, which got rid of the error I was getting: "text/xmldecl not at the beginning of input".

Insert special character XML to SQL

I'm trying to update a column of type XML.
Text to be inserted in the XML fields: "& Decision ↨‼ Agreement"
Text converted to XML: <?xml version="1.0" encoding="utf-16"?><Informations xmlns="http://monschema"><Text lGic="fdf475bc-9fed-4f61-b321-f81949cb51ca" id="71e231e6-ecbd-4848-ba6f-004bdddefb79">&amp; Décision &#x12;&#x13; Accord</Text></Informations>
Error: Msg 9420, Level 16, State 1, Line 7
XML parsing: line 1, character 263 character non-compliant XML
I do not understand why the character reference "&#x12;" is a problem.
If I replace &#x12; with &#x20;, it works!
Can you help me?
Thank you in advance
The character references &#x12; and &#x13; denote control characters that are disallowed in XML 1.0. The real problem here is that they do not denote the characters you have in the text. The characters “↨‼” are U+21A8 UP DOWN ARROW WITH BASE and U+203C DOUBLE EXCLAMATION MARK, so they should be written as &#x21A8;&#x203C;.
The reason you get the odd character references is probably that in the CP437 encoding, “↨‼” are placed in code positions 12 and 13 (hex.). So this is an encoding confusion: some step has applied the wrong conversion. In XML, the numbers in character references always mean Unicode code points.
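The point that character references carry Unicode code points can be checked with a few lines of Python (used here only as an illustration; it is not SQL Server, and the helper names are made up). The sketch builds the references from the actual characters and shows that a standard XML 1.0 parser accepts them but rejects a reference to the control character U+0012.

```python
import xml.etree.ElementTree as ET


def to_char_refs(text: str) -> str:
    """Write each character as a hexadecimal XML character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


def parses(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


refs = to_char_refs("↨‼")
decoded = ET.fromstring(f"<Text>{refs}</Text>").text

print(refs)
print(decoded)
print(parses("<Text>&#x12;</Text>"))  # control character reference
```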
These control characters are not supported in XML version 1.0 documents.
You should be able to change your version to 1.1 in the version attribute of the document, in which case the document should validate.
I solved my problem.
The character comes from a SQL view on an Oracle database.
The character -> in Oracle is interpreted as ↨ by SQL Server.
I'll do a replace in my view.