How to avoid escaping special characters when updating XML in Oracle SQL

I have a problem updating XMLType values in Oracle.
I need to modify XML that looks similar to the following:
<a>
<b>Something to change here</b>
<c>Here is some narrative containing weirdly escaped <tags>\</tags> </c>
</a>
What I want is to modify <b/> without modifying <c/>.
Unfortunately, the following updateXML call:
select
updatexml(XML_TO_MODIFY, '/a/b/text()', 'NewValue')
from dual;
returns this:
<a>
<b>NewValue</b>
<c>Here is some narrative containing weirdly escaped <tags></tags> </c>
</a>
As you can see, the '>' has been escaped (to &gt;).
The same happens with XMLQuery (the non-deprecated replacement for updateXML):
select /*+ no_xml_query_rewrite */
xmlquery(
'copy $d := .
modify (
for $i in $d/a
return replace value of node $i/b with ''nana''
)
return $d'
passing t.xml_data
returning content
) as updated_doc
from (select xmlType('<a>
<b>Something to change here</b>
<c>Here is some narrative containing weirdly escaped \<tags>\</tags> </c>
</a>') as xml_data from dual) t
;
I get the same result with XMLTransform.
I tried using disable-output-escaping="yes", but it did the opposite: it unescaped the <:
select XMLTransform(
xmlType('<a>
<b>Something to change here</b>
<c>Here is some narrative containing weirdly escaped \<tags>\</tags> </c>
</a>'),
XMLType(
'<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/a/b">
<b>
<xsl:value-of select="text()"/>
</b>
</xsl:template>
<xsl:template match="/a/c">
<c>
<xsl:value-of select="text()" disable-output-escaping="yes"/>
</c>
</xsl:template>
</xsl:stylesheet>'))
from dual;
returned:
<a>
<b>NewValue</b>
<c>Here is some narrative containing weirdly escaped <tags></tags> </c>
</a>
Any suggestions?
Two things you need to know:
I cannot modify the initial format; it comes to me this way and I need to preserve it.
The original message is so big that converting it to a string and back (to use regexps as a workaround) will not do the trick.

The root of your issue seems to be that your original XML value for node C is not valid XML if it contains a raw < within the value instead of &lt;, outside a CDATA section (see also "What does <![CDATA[]]> in XML mean?").
The string value of:
Here is some narrative containing weirdly escaped <tags>\</tags>
in XML format should really be
<c>Here is some narrative containing weirdly escaped &lt;tags>\&lt;/tags></c>
OR
<c><![CDATA[Here is some narrative containing weirdly escaped <tags>\</tags>]]></c>
I would either request that the XML be corrected at the source, or implement some method to sanitize the inputs yourself, such as wrapping the <c> node values in <![CDATA[]]>. If you need to keep the exact original value and the messages are large, the best I can think of is to store duplicate copies: the original value as a string, and the "sanitized" value as an XMLType.
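You can check the claim that the two spellings are equivalent by parsing both with a conforming XML parser. The sketch below uses Python's standard-library ElementTree purely as an illustration. It is not Oracle, and Oracle's serializer may choose different escapes. It also shows that re-serializing does not keep whichever escaping you started with, which is the behaviour the question runs into.

```python
import xml.etree.ElementTree as ET

# The two spellings suggested above for the <c> node.
escaped = r"<c>Here is some narrative containing weirdly escaped &lt;tags>\&lt;/tags></c>"
cdata = r"<c><![CDATA[Here is some narrative containing weirdly escaped <tags>\</tags>]]></c>"

# After parsing, both carry exactly the same character data.
text_escaped = ET.fromstring(escaped).text
text_cdata = ET.fromstring(cdata).text

# Serializing again applies the serializer's own escaping ('<' -> '&lt;', '>' -> '&gt;'):
# neither the CDATA wrapper nor the unescaped '>' from the input survives.
reserialized = ET.tostring(ET.fromstring(cdata), encoding="unicode")
```

In other words, once the value has been parsed, the original escaping is gone. That is why keeping the untouched original as a string is suggested above.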

In the end we managed to do this with the help of Java, by:
reading the XML as a CLOB,
modifying it in Java,
storing it back in the database using java.sql.Connection.
(For some reason, when we used JdbcTemplate it complained about casting to Long, which turned out to indicate that the string was over 4000 bytes; talk about clear errors, all hail Oracle. Using the CLOB type didn't really help either, but I guess that's a different story.)
When storing the data, Oracle does not perform any magic; only updates tend to modify escaped characters.
Possibly not an answer for everyone, but a nice workaround if you stumble upon the same problem as we did.
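The Java code itself isn't shown. As a rough illustration of the "modify it outside the database" step only, here is a Python sketch. It uses a hypothetical helper, replace_first_element_text, and is not the posters' code. The helper splices a new value into the first <b> element as plain text, so the rest of the document, including the escaping inside <c>, stays character-for-character identical. It assumes <b> has no attributes, is not nested, and does not appear inside comments or CDATA.

```python
import re
from xml.sax.saxutils import escape

def replace_first_element_text(doc: str, tag: str, new_text: str) -> str:
    """Hypothetical helper: swap the character data of the first <tag>...</tag>.

    Works on the raw string, so every other character (including the odd
    escaping inside <c>) is left exactly as it was. Assumes <tag> has no
    attributes, is not nested, and does not occur inside comments or CDATA.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"(<{name}>)(.*?)(</{name}>)", re.DOTALL)
    # A function replacement avoids any backslash processing of new_text;
    # the new value itself is XML-escaped so the result stays well-formed.
    return pattern.sub(lambda m: m.group(1) + escape(new_text) + m.group(3), doc, count=1)
```

Because only the matched span is rewritten, the document never goes through a parse/serialize cycle, which is what changes the escaping in the SQL-side attempts.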

Related

How to efficiently replace special characters in an XML in Oracle SQL?

I'm parsing XML in Oracle SQL.
XMLType(replace(column1,'&','<![CDATA[&]]>')) -- column1 is a column that holds XML data
While parsing, I temporarily wrap '&' in CDATA to prevent XML exceptions. After getting rid of the exception caused by '&', I get "invalid character 32 (' ') found in a Name or Nmtoken". This is caused by the '<' character.
E.g.: <child> 40 < 50 </child> (this causes the above exception).
So I tried the below and it works.
XMLType(replace(replace(column1,'&','<![CDATA[&]]>'),'< ','<![CDATA[< ]]>'))
In the above, I'm wrapping '< ' (a less-than sign followed by a space) in CDATA. But this is a bit slow, so I'm trying to use a regex to reduce the time taken.
Does anyone know how to implement the above using a regex in Oracle SQL?
Input : <child> 40 & < 50 </child>
Expected Output : <child> 40 <![CDATA[&]]> <![CDATA[< ]]> 50 </child>
Note: Replacing '& ' with ampersand-semicolon sometimes leads to an 'entity reference not well formed' exception, so I have opted to wrap it in CDATA instead.
You can do that with a regexp like this:
select regexp_replace(sr.column1, '(&|< )', '<![CDATA[\1]]>') from your_table sr;
However, regexp_replace (and all the regexp_* functions) are often slower than using plain replace, because they do more complicated logic. So I'm not sure if it'll be faster or not.
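For comparison, the same pattern can be tried outside the database. This Python sketch mirrors the regexp_replace call above using re.sub. That is Python's regex engine, not Oracle's, so only the pattern logic carries over. The sketch also checks that the result parses.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the regexp_replace call: each '&' or '< ' is wrapped in CDATA,
# and \1 refers back to whichever alternative matched.
SPECIAL = re.compile(r"(&|< )")

def wrap_in_cdata(xml_text: str) -> str:
    return SPECIAL.sub(r"<![CDATA[\1]]>", xml_text)

fixed = wrap_in_cdata("<child> 40 & < 50 </child>")
# The parser now accepts the document and reports the original characters as text.
parsed_text = ET.fromstring(fixed).text
```

Note that '< ' keeps its trailing space inside the CDATA section, so no extra space appears before 50.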
You might already be aware, but your underlying problem here is that you're starting out with invalid XML that you're trying to fix, which is a hard problem! The ideal solution is to not have invalid XML in the first place - if possible, you should escape special characters when originally generating your XML. There are built-in functions which can do that quickly, like DBMS_XMLGEN.CONVERT or HTF.ESCAPE_SC.
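The preferred fix, escaping when the XML is generated, can be illustrated with Python's xml.sax.saxutils.escape. Here it stands in for DBMS_XMLGEN.CONVERT / HTF.ESCAPE_SC. It is a different function, and their exact output may differ, for example in how quotes are handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_value = " 40 & < 50 "

# Escape the text *before* building the markup: '&' -> '&amp;', '<' -> '&lt;', '>' -> '&gt;'.
element = "<child>" + escape(raw_value) + "</child>"

# The result is well-formed, and parsing gives back exactly the original value.
round_trip = ET.fromstring(element).text
```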

How to prevent XML reformatting in SQL

I updated an XML document to remove an unnecessary field using
deletexml(xmltype(xxx)).getClobVal()
but the XML returns as one long string instead of a properly formatted XML file with indents and spaces. Any idea what I'm doing wrong here? Thanks
getClobVal and getStringVal are deprecated since Oracle 11.2; use XMLSERIALIZE instead of these functions.
Example:
select xmlserialize(document xmltype('<a><b><c>xxx</c></b></a>') indent size=2) from dual;
You will end up with a CLOB containing pretty-printed XML.
"properly formatted XML file with indents and spaces"
That might surprise you, but that is properly formatted (well-formed) XML. The XML standard says nothing about whitespace between structural elements, except that it is allowed. It's called "insignificant white-space" for a reason.
If you want to format your XML for human readability, you must do that yourself. But XML isn't for humans, it's for machines, so there is no reason to have your SQL do such formatting. Use any tool you like that auto-formats XML if you want to inspect the XML as a human.
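As an example of doing that formatting outside SQL, here is a Python sketch. It uses the standard library's ElementTree (ET.indent needs Python 3.9+) and is not XMLSERIALIZE. The sketch also demonstrates the "insignificant whitespace" point: removing the added indentation gives back the original compact document. pretty and strip_insignificant_ws are illustrative helper names.

```python
import xml.etree.ElementTree as ET

def pretty(xml_text: str, space: str = "  ") -> str:
    """Return a re-indented copy of xml_text for human eyes (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # adds whitespace-only text/tails between elements
    return ET.tostring(root, encoding="unicode")

def strip_insignificant_ws(elem: ET.Element) -> ET.Element:
    """Remove whitespace-only text and tails, i.e. the 'insignificant' whitespace."""
    for e in elem.iter():
        if e.text is not None and not e.text.strip():
            e.text = None
        if e.tail is not None and not e.tail.strip():
            e.tail = None
    return elem

compact = "<a><b><c>xxx</c></b></a>"
indented = pretty(compact)
# Dropping the added whitespace gives back the original compact document.
restored = ET.tostring(strip_insignificant_ws(ET.fromstring(indented)), encoding="unicode")
```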
I use this procedure to make a pretty XML:
PROCEDURE MakePrettyXml(xmlString IN OUT NOCOPY CLOB) IS
  xmlDocFragment DBMS_XMLDOM.DOMDOCUMENTFRAGMENT;
  xslProc        DBMS_XSLPROCESSOR.PROCESSOR;
  xsl            DBMS_XSLPROCESSOR.STYLESHEET;
  xmlStringOut   CLOB;
BEGIN
  DBMS_LOB.CREATETEMPORARY(xmlStringOut, TRUE);
  xslProc := DBMS_XSLPROCESSOR.NEWPROCESSOR;
  -- Identity transform: copies every attribute and node unchanged,
  -- while indent="yes" asks the processor to pretty-print the result.
  xsl := DBMS_XSLPROCESSOR.NEWSTYLESHEET(
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'||
    '<xsl:output method="xml" indent="yes"/>'||
    '<xsl:template match="@*|node()">'||
    '<xsl:copy>'||
    '<xsl:apply-templates select="@*|node()"/>'||
    '</xsl:copy>'||
    '</xsl:template>'||
    '</xsl:stylesheet>', NULL);
  xmlDocFragment := DBMS_XSLPROCESSOR.PROCESSXSL(p => xslProc, ss => xsl, cl => xmlString);
  DBMS_XMLDOM.WRITETOCLOB(DBMS_XMLDOM.MAKENODE(xmlDocFragment), xmlStringOut);
  -- Free the stylesheet and processor, copy the result back, then free the temporary CLOB.
  DBMS_XSLPROCESSOR.FREESTYLESHEET(xsl);
  DBMS_XSLPROCESSOR.FREEPROCESSOR(xslProc);
  xmlString := xmlStringOut;
  DBMS_LOB.FREETEMPORARY(xmlStringOut);
END MakePrettyXml;
But note the output is a CLOB rather than an XMLTYPE, so you may need some additional conversions.

Selecting an alphanumeric node in XQuery

I have this XQuery:
declare @XML xml
set @XML =
'
<root>
<row1>
<value>1</value>
</row1>
<1row2>
<value>2</value>
</1row2>
</root>
'
select @XML.query('/root/1row2')
I keep getting an error while trying to select 1row2:
XQuery [query()]: Syntax error near '1', expected a step expression.
It seems I get this error whenever an XML node name starts with a number. Is there a way to select such a node?
From XML Naming Rules, XML elements must follow these naming rules:
Element names are case-sensitive
Element names must start with a letter or underscore
Element names cannot start with the letters xml (or XML, or Xml, etc)
Element names can contain letters, digits, hyphens, underscores, and periods
Element names cannot contain spaces
Any name can be used, no words are reserved (except xml).
So element names must start with a letter or underscore. On SQL Server 2016 SP1 your XML is not even valid, and the statement cannot be executed.
You need to either repair your string so that it is valid XML, or query the data using some other technique (for example, a SQL CLR function that adds regular expression support, or splitting the nodes).
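To see the naming rule enforced by a different parser, and what a naive "repair your string" step might look like, here is a Python sketch. It uses the standard library, not SQL Server. prefix_numeric_tags is a hypothetical helper. It does a plain text rewrite that prefixes an underscore to any element name starting with a digit, so it would also touch a '<' followed by a digit inside a comment or CDATA section. Any query must then use the renamed element.

```python
import re
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Matches '<' or '</' immediately followed by a digit, i.e. a digit-initial tag name.
_DIGIT_TAG = re.compile(r"<(/?)(\d)")

def prefix_numeric_tags(xml_text: str) -> str:
    """Hypothetical repair: turn <1row2> / </1row2> into <_1row2> / </_1row2>."""
    return _DIGIT_TAG.sub(r"<\1_\2", xml_text)
```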

XSLT: variables and "empty" labels

I have an XML data file containing, among other things, a string of arbitrarily many comma-separated values. I want those values to be displayed in a web browser as a list with one value per line. So I wrote an XSLT template that takes this string, outputs the first value followed by a line-break tag (<br/>), properly namespaced, and recurses with the remainder of the string. In effect, the commas are replaced by HTML <br/> tags.
Now, when I store the result of calling that template in an xsl:variable and display it through xsl:value-of, the HTML tags disappear: what is shown is the string minus the commas.
When I display the result directly by having the xsl:call-template in place of the xsl:value-of, all is fine, and the values appear in a list.
So, what's going on?
Is this behavior an implementation artifact, or is it standard XSLT?
Use xsl:copy-of instead of xsl:value-of if you want to output nodes (like your br elements); xsl:value-of creates a simple text node with the string value(s) of the selection.
Here is an example that shows the difference between xsl:value-of and xsl:copy-of. Note that it is not the use of a variable holding newly created br elements that makes the difference; it is simply the use of xsl:value-of, which creates a text() node with the string conversion of the selection:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html" indent="yes" version="5" doctype-system="about:legacy-doctype"/>
<xsl:variable name="var">Phrase 1.<br/>Phrase 2.<br/>Phrase 3.</xsl:variable>
<xsl:template match="/">
<html>
<head>
<title>.NET XSLT Fiddle Example</title>
</head>
<body>
<section>
<h1>Example 1: value-of</h1>
<xsl:value-of select="$var"/>
</section>
<section>
<h1>Example 2: copy-of</h1>
<xsl:copy-of select="$var"/>
</section>
<xsl:apply-templates select="//p"/>
<xsl:apply-templates select="//p" mode="copy-of"/>
</body>
</html>
</xsl:template>
<xsl:template match="p">
<section>
<h1>Example 1: value-of</h1>
<xsl:value-of select="."/>
</section>
</xsl:template>
<xsl:template match="p" mode="copy-of">
<section>
<h1>Example 1: copy-of</h1>
<xsl:copy-of select="."/>
</section>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/gWmuiJy/1
Output is
Example 1: value-of
Phrase 1.Phrase 2.Phrase 3.
Example 2: copy-of
Phrase 1.
Phrase 2.
Phrase 3.
Example 1: value-of
Line 1.Line 2.Line 3.
Example 1: copy-of
Line 1.
Line 2.
Line 3.
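Python's standard library cannot run XSLT, so the following is only an analogy using ElementTree, not an XSLT processor. The string value of an element, which is what xsl:value-of outputs, is the concatenation of all its descendant text nodes, so the br elements vanish. xsl:copy-of corresponds to copying the nodes themselves. value_of and copy_of are illustrative names.

```python
import copy
import xml.etree.ElementTree as ET

def value_of(elem: ET.Element) -> str:
    """Analogue of xsl:value-of: the string value (all descendant text, in order)."""
    return "".join(elem.itertext())

def copy_of(elem: ET.Element) -> ET.Element:
    """Analogue of xsl:copy-of: a deep copy that keeps the child elements."""
    return copy.deepcopy(elem)

# Same content as the $var variable in the stylesheet above.
var = ET.fromstring("<var>Phrase 1.<br/>Phrase 2.<br/>Phrase 3.</var>")
```

value_of(var) gives the same flattened text as "Example 1" above, while copy_of(var) still contains both br elements.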
It seems that you hit the boundaries of the RTF ("Result tree fragment"):
When you use an XML fragment to initialize a variable or a parameter, then the variable or parameter is of the "result tree fragment" datatype. This is an XSLT 1.0 specific datatype [just like node-set, but slightly different].
A result tree fragment is equivalent to a node-set that contains just the root node.
You cannot apply operators like "/", "//" or predicate on a result tree fragments. They are only applicable for node-set datatypes.
[...]
a) In XSLT 1.0
The resolution of this is to convert the result tree fragment into a node-set. I am not aware of any Oracle-specific XPath extension functions that can do this trick for you.
You could use EXSLT to achieve this.
b) Use XSLT 2.0
You can code your transformations in XSLT 2.0. XSLT 2.0 deprecates ResultTreeFragments i.e. if you are modeling an XSLT 2.0 transformation, and you create a variable or a parameter that holds a tree fragment, it is implicitly a node sequence.
So with plain XSLT 1.0 (without EXSLT) you're out of luck; better to use XSLT 2.0 or 3.0 to solve this problem.
Is this behavior an implementation artifact, or is it standard XSLT?
It is standard for XSLT-1.0, but not for XSLT-2.0+.

In XSLT how to open document, with %20 in its name itself?

I have to process, from XSLT, an XML file that has %20 in its name.
Filename: Sample%20Documnet.XML
In XSLT I have written this:
<xsl:variable name="readDoc" select="document('Sample%20Documnet.XML')"/>
But i am getting this error: Could not find file 'C:\xslt\Sample Documnet.XML'.
I think %20 is getting converted to a space internally, which I don't want. Is there any way to stop this behavior?
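In XSLT 1.0 the argument to document() is treated as a URI reference. In a URI, %20 is a percent-encoded space, which is decoded when the URI is resolved. The Python sketch below only models that decoding step with urllib.parse. It suggests that writing the % itself as %25, that is document('Sample%2520Documnet.XML'), should reach a file literally named Sample%20Documnet.XML. This is an untested assumption about how your processor resolves file URIs, so verify it there.

```python
from urllib.parse import quote, unquote

file_name = "Sample%20Documnet.XML"  # the literal name of the file on disk

# Resolving a URI reference percent-decodes it once, so "%20" becomes a space:
# this reproduces the path in the reported error message.
naive = unquote(file_name)

# Percent-encoding the name first turns '%' into '%25'; one decoding step
# then yields the literal file name again.
uri_ref = quote(file_name)
resolved = unquote(uri_ref)
```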