Selecting an alphanumeric node in XQuery - sql

I have this XQuery:
declare @XML xml
set @XML =
'
<root>
<row1>
<value>1</value>
</row1>
<1row2>
<value>2</value>
</1row2>
</root>
'
select @XML.query('/root/1row2')
I keep getting an error while trying to select 1row2:
XQuery [query()]: Syntax error near '1', expected a step expression.
It seems I get this error whenever an XML node name starts with a number. Is there a way to select that node?

From XML Naming Rules, XML elements must follow these naming rules:
Element names are case-sensitive
Element names must start with a letter or underscore
Element names cannot start with the letters xml (or XML, or Xml, etc)
Element names can contain letters, digits, hyphens, underscores, and periods
Element names cannot contain spaces
Any name can be used, no words are reserved (except xml).
So element names must start with a letter or underscore. On SQL Server 2016 SP1 your XML is not even valid and cannot be parsed.
You need to either repair your string so that it is valid XML, or query the data using some other technique (for example, a SQL CLR function that implements regular expression support, or splitting the nodes).
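As a rough illustration of the "repair your string" option, here is a minimal sketch in Python rather than T-SQL. The helper name repair_tag_names is made up for this example, and it assumes the only problem is tag names that start with a digit, which it prefixes with an underscore before parsing:

```python
import re
import xml.etree.ElementTree as ET

# Matches the start of an opening or closing tag whose name begins with a digit,
# e.g. "<1row2" or "</1row2". Assumption: no other naming problems are present.
_DIGIT_TAG = re.compile(r"<(/?)(\d[\w.-]*)")


def repair_tag_names(text: str) -> str:
    """Prefix digit-leading tag names with an underscore (hypothetical helper)."""
    return _DIGIT_TAG.sub(r"<\1_\2", text)


raw = "<root><row1><value>1</value></row1><1row2><value>2</value></1row2></root>"
fixed = repair_tag_names(raw)

# The repaired string is well-formed, so a normal parser and path query work.
root = ET.fromstring(fixed)
value = root.find("_1row2/value").text
print(fixed)
print(value)
```

The same idea could be applied before the string is assigned to the xml variable, after which the node would be addressed by its repaired name.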

Related

How to efficiently replace special characters in an XML in Oracle SQL?

I'm parsing an xml in oracle sql.
XMLType(replace(column1,'&','<![CDATA[&]]>')) //column1 is a column name that has xml data
While parsing, I'm temporarily wrapping '&' in CDATA to prevent any xml exception. After getting rid of the exception caused by '&', I'm getting "invalid character 32 (' ') found in a Name or Nmtoken". This is because of '<' character.
E.g: <child> 40 < 50 </child> // This causes the above exception.
So I tried the below and it works.
XMLType(replace(replace(column1,'&','<![CDATA[&]]>'),'< ','<![CDATA[< ]]>'))
In the above, I'm wrapping '< ' (a less-than symbol followed by a space) in CDATA. But the nested replace calls are a bit time-consuming, so I'm trying to use a regex to reduce the time taken.
Does anyone know how to implement the above using a regex in Oracle SQL?
Input : <child> 40 & < 50 </child>
Expected Output : <child> 40 <![CDATA[&]]> <![CDATA[< ]]> 50 </child>
Note: Replacing '& ' with ampersand semicolon sometimes is leading to 'entity reference not well formed' exception. Hence I have opted to wrap in CDATA.
You can do that with a regexp like this:
select regexp_replace(sr.column1,'(&|< )','<![CDATA[\1]]>') from dual;
However, regexp_replace (and all the regexp_* functions) are often slower than using plain replace, because they do more complicated logic. So I'm not sure if it'll be faster or not.
You might already be aware, but your underlying problem here is that you're starting out with invalid XML that you're trying to fix, which is a hard problem! The ideal solution is to not have invalid XML in the first place - if possible, you should escape special characters when originally generating your XML. There are built-in functions which can do that quickly, like DBMS_XMLGEN.CONVERT or HTF.ESCAPE_SC.
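To make the difference between the two approaches concrete, here is a small sketch in Python (chosen only for illustration; the variable names are made up). It escapes the content at generation time with the standard library, and separately applies the same CDATA-wrapping pattern as the regexp_replace above to already-broken XML. Both documents parse to the same text:

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

content = "40 & < 50"

# Option 1: escape special characters when the XML is generated.
escaped_doc = "<child> " + escape(content) + " </child>"

# Option 2: patch already-broken XML with the CDATA-wrapping pattern.
# Note that the pattern "< " consumes the space that follows the "<".
broken_doc = "<child> 40 & < 50 </child>"
wrapped_doc = re.sub(r"(&|< )", r"<![CDATA[\1]]>", broken_doc)

escaped_text = ET.fromstring(escaped_doc).text
wrapped_text = ET.fromstring(wrapped_doc).text
print(escaped_doc)
print(wrapped_doc)
```

Option 2 only works as long as every problematic "<" is followed by a space, which is why escaping at the source is the more robust choice.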

Why does Replace '&' with '&' not work for XML data?

I need to download an XML file whose data is retrieved from a stored procedure.
My problem is that if the data contains any '&' symbol, it shows up in the XML file as '&amp;'.
I have used the REPLACE function in my procedure as shown below, but...
SELECT @V_NAME = REPLACE(@V_NAME, ' & ', ' & ');
UPDATE #TMP_RS_XML
SET OBJECT_ID=@V_ID,
FNAME=@V_FILE,
DOCUMENT=(SELECT @V_NAME as 'Description',
...
Now, the output is:
&amp;
This is not the way this is supposed to work...
XML is not just text with some fancy extras; it has very strict rules. As with any text-based container, you need either magic words or special characters to tell the consumer what is content and what is markup.
The most important markup characters in XML are < and >, of course. If you want these characters to be part of your content, you'll have to replace them. That is done with XML entities.
Within the content, any XML entity starts with an ampersand (< comes out as &lt;), so the ampersand is the third most important special character. If you want an ampersand within the content, you must use an entity (&amp;) as a code for "in this place we want an ampersand".
You must distinguish between the text you see when you look at the XML and the actual content taken out of the XML.
Try this:
DECLARE @SomeStringWithSpecialCharacters NVARCHAR(200)=N'This & that -> let''s see, why how some foreign characters behave: அரிச். And what about a line break?' + CHAR(13) + CHAR(10) + 'Here is the second line. And an unprintable?' + CHAR(2);
--Here we use FOR XML, all the escaping is done implicitly
SELECT @SomeStringWithSpecialCharacters AS TestIt FOR XML PATH('test');
The result
<test>
<TestIt>This &amp; that -&gt; let's see, why how some foreign characters behave: அரிச். And what about a line break?&#x0D;
Here is the second line. And an unprintable?&#x02;</TestIt>
</test>
Now I take the XML as it came out of the first part and place it into an XML-typed variable.
Attention: I had to remove the &#x02; entity, check it out...
DECLARE @SomeXML XML=
N'<test>
<TestIt>This &amp; that -&gt; let''s see, why how some foreign characters behave: அரிச். And what about a line break?&#x0D;
Here is the second line. And an unprintable?</TestIt>
</test>';
--Now we do the magic using .value() against a native XML:
SELECT @SomeXML.value('(/test/TestIt/text())[1]','nvarchar(max)');
The result comes out with all entities decoded back into plain characters:
This & that -> let's see, why how some foreign characters behave: அரிச். And what about a line break?
Here is the second line. And an unprintable?
The general hint is: Never do the replacements yourself. Pushing content into the XML will need escaping and reading content out of XML will need the opposite. All this is done for you implicitly, when you use the proper tools.
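The same round trip can be sketched outside SQL Server. This is a minimal Python example using the standard library's ElementTree (an illustration chosen here, not part of the original answer): the library escapes on the way in and unescapes on the way out, so no manual replacement is needed.

```python
import xml.etree.ElementTree as ET

original = "This & that -> let's see"

# Writing: the library escapes special characters while serializing.
elem = ET.Element("TestIt")
elem.text = original
serialized = ET.tostring(elem, encoding="unicode")

# Reading: the parser turns the entities back into plain characters.
round_tripped = ET.fromstring(serialized).text

print(serialized)      # the text as it looks inside the XML
print(round_tripped)   # the actual content taken out of the XML
```

The serialized form contains the entity, while the value read back is identical to the original string.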
'&' is a special character that is rendered from '&amp;'.
The best practice here would be to decode the XML. See the reference below:
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?redirectedfrom=MSDN&view=netframework-4.8#overloads

Getting Unescaped JSON from SQL

I've created a stored procedure to pull data as a JSON object from my SQL Server database. All my data is relational and I'm trying to get it out as a JSON string.
Currently, I am able to get a JSON string out of SQL Server just fine, but this object ALWAYS includes escape characters (e.g. "{\"field\":\"value\"}"). I'd like to pull the same JSON without the escaped characters. To test this I'm using some simple queries and getting them into .NET with a SqlDataAdapter using my stored procedure.
The thing that puzzles me is that when I run my query within SSMS, I never see any escape characters, but as soon as it's pulled into a .NET application, the escape characters appear. I'd like to prevent this from happening and have my applications get only the unescaped JSON string.
I've tried several suggestions I've found during my research but nothing has produced my desired results. The changes I've seen (documented in MSDN and in other SO posts) have dealt with getting unescaped results, but only within SSMS and not within other applications.
What I've tried:
Simple Json query set to param and then using JSON_QUERY to select the param:
DECLARE @JSON varchar(max)
SET @JSON = (SELECT '{"Field":"Value"}' AS myJson FOR JSON PATH)
SELECT JSON_QUERY(@JSON) AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":[{"myJson":"{\"Field\":\"Value\"}"}]}]
Simple Json query without param using JSON_QUERY:
SELECT JSON_QUERY('{"Field":"Value"}') AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS
[{"JsonResponse":{"Field":"Value"}}]
Simple Json query with temp tables using JSON_QUERY:
CREATE TABLE #temp(
jsoncol varchar(255)
)
INSERT INTO #temp VALUES ('{"Field":"Value"}')
SELECT JSON_QUERY(jsoncol) AS 'JsonResponse' FROM #temp FOR JSON PATH
DROP TABLE #temp
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":{"Field":"Value"}}]
I'm led to believe that there is no way to get a JSON string out of SQL Server without the escaped characters. In case the examples above weren't enough, I've included my stored procedure here. Hopefully someone can point me in the right direction.
This depends where you look at the string...
In SSMS a string is marked with single quotes. The double quote can exist within a string without problems:
DECLARE @SomeString VARCHAR(100) = 'This can include "double quotes" but you have to double ''single quote''';
In a C# application the double quote is the string marker. So the above example would look like this:
string SomeString = "This must escape \"double quotes\" but you can use 'single quote' without problems";
Within your IDE (is it VS?) you can look at the string either as is or in the form you would need to write it in code. Your example shows a " at the beginning and at the end of the string. That is a clear hint that you are seeing the in-code form: you could take this string and paste it into your source. The real string that is used and processed will not contain escape characters.
Hint: Escape characters are only needed in human-readable formats, where there are characters with special meaning (a ; in a CSV, a < in HTML and so on)...
UPDATE Some more explanation
Escape characters are needed to place a string within a string. Somehow you have to mark the beginning and the end of the string, but there is nothing else you can use other than some magic characters.
In order to use these characters within the embedded string you have to go one of the following ways:
escaping (e.g. XML will replace & with &amp; and JSON will replace a " with \" as JSON uses the " to mark its labels) or
Magic borders (e.g. a CDATA-section in XML, which allows to place unescaped characters as is: <![CDATA[forbidden characters &<> allowed here]]>)
Whatever you do, you must distinguish between the visible string in an editor or in a text-based container like XML or JSON and the value the application will pick out of this.
An example:
<root><a>this &amp; that</a></root>
visible string: "this &amp; that"
real value: "this & that"
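The distinction between the visible string and the real value can be demonstrated with a short sketch. It is written in Python rather than C#, purely as an illustration, and the variable names are made up; json.dumps stands in here for "how a debugger or source code would display the string":

```python
import json

# The real value: the JSON text as SQL Server returns it, with no backslashes.
real_value = '[{"JsonResponse":{"Field":"Value"}}]'

# The in-code form: the same string rendered as a quoted literal.
# This is where the surrounding quotes and the \" sequences come from.
as_written_in_code = json.dumps(real_value)

# Parsing the real value works directly; nothing needs to be "unescaped".
parsed = json.loads(real_value)

print(real_value)
print(as_written_in_code)
print(parsed[0]["JsonResponse"]["Field"])
```

The backslashes exist only in the quoted representation; the value the application actually processes never contains them.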

Xpath: whitespace encoding

I need to create an XPath query to select a JCR node whose name contains a whitespace character.
For instance: /jcr:root/foo bar/
But that results in an invalid query.
How should whitespaces be encoded in an XPath query?
Try using something like this XPath query:
/jcr:root/foo_x0020_bar/
The JSR-170 (JCR 1.0) specification defines how XPath can be used to query a JCR repository, and even though JSR-283 (or JCR 2.0) deprecated XPath as a query language, many of the implementations still support XPath along with the other query languages (including the more powerful JCR-SQL2).
Now, regarding the rules for escaping characters in XPath, JSR-170 states the following in Section 6.6.4.9:
The names of elements and attributes (corresponding to nodes and properties, respectively) within an XPath statement must correspond to the form in which they (notionally) appear in the document view. This means that spaces (and any other non-XML characters) within names must be encoded according to the rules described in 6.4.3 Escaping of Names.
Section 6.4.3 defines how such characters are escaped in names:
The escape character is the underscore (“_”). Any invalid character is escaped as _xHHHH_, where HHHH is the four-digit hexadecimal UTF-16 code for the character. When producing escape sequences the implementation should use lowercase letters for the hex digits a-f. When unescaping, however, both upper and lowercase alphabetic hexadecimal characters must be recognized.
Although you didn't ask about it, you can easily do the same query in JCR-SQL2:
SELECT * FROM [nt:base] WHERE ISSAMENODE('/foo_x0020_bar')
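The _xHHHH_ rule quoted above can be sketched in a few lines of Python. This is a simplified illustration, not a JCR implementation: the function names are hypothetical, the set of characters treated as valid is an assumption (ASCII letters, digits, underscore, hyphen, period), characters outside the Basic Multilingual Plane are not handled, and escaping of literal underscores that already look like an escape sequence is left out.

```python
import re

# Assumption for this sketch: only these characters are treated as valid in a name.
_VALID_CHAR = re.compile(r"[A-Za-z0-9_.\-]")
_ESCAPE_SEQ = re.compile(r"_x([0-9A-Fa-f]{4})_")


def encode_name(name: str) -> str:
    """Escape each invalid character as _xHHHH_ with lowercase hex digits."""
    return "".join(
        ch if _VALID_CHAR.fullmatch(ch) else "_x%04x_" % ord(ch)
        for ch in name
    )


def decode_name(encoded: str) -> str:
    """Reverse the escaping, accepting upper and lowercase hex digits."""
    return _ESCAPE_SEQ.sub(lambda m: chr(int(m.group(1), 16)), encoded)


encoded = encode_name("foo bar")
print("/jcr:root/" + encoded)
print(decode_name(encoded))
```

Encoding each path segment this way before building the XPath string avoids the invalid-query error for names that contain spaces.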

XML parse for special characters containing in Element

I want to read the below XML using XMLREADER.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Inforamtion>
<Name;Property>Name contain
</Name;Property>
<123>89</123>
<question?>
</question?>
</Inforamtion>
But it throws an error because the element names contain special characters, and because an element name's first character can't be a number.
I may have many such XML files to process and correct in bulk.
Please guide me on how to process, correct, or read such XML.
Thank you
Your XML is not valid.
This document, "What are the rules for a valid XML element name?", will help you correct this XML.
A summary:
A Name is a token beginning with a letter or one of a few punctuation
characters ( [_] and [:]) , and continuing with letters, digits, hyphens, underscores,
colons, or full stops, together known as name characters.
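For processing such files in bulk, a first step could be to list the offending names before attempting any correction. The following Python sketch is an assumption-laden illustration of the summary above: it only checks ASCII names (the real rule also allows many non-ASCII letters), it finds tag names with a simple regex instead of a parser, since a parser would reject the input, and the function name is made up.

```python
import re

# Simplified, ASCII-only version of the quoted rule: start with a letter,
# "_" or ":", then continue with letters, digits, "-", "_", ":" or ".".
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# Grabs the name of every opening tag; skips closing tags, "<?...?>" and "<!...>".
_OPEN_TAG = re.compile(r"<([^\s<>/?!][^\s<>/]*)")


def invalid_element_names(text: str) -> list:
    """Return opening-tag names that break the simplified naming rule."""
    return [n for n in _OPEN_TAG.findall(text) if not _NAME.fullmatch(n)]


sample = """<?xml version="1.0" encoding="ISO-8859-1" ?>
<Inforamtion>
<Name;Property>Name contain
</Name;Property>
<123>89</123>
<question?>
</question?>
</Inforamtion>"""

bad_names = invalid_element_names(sample)
print(bad_names)
```

Once the invalid names are known, each one can be mapped to a valid replacement and substituted in both the opening and closing tags before the file is handed to the reader.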