How to get SQL query to not escape HTML data returned in query - sql

I have the following SQL query....
select AanID as '#name', '<![CDATA[' + Answer + ']]>' from AuditAnswers for XML PATH('str'), ROOT('root')
which works wonderfully but the column 'Answer' can sometimes have HTML markup in it. The query automatically escapes this HTML from the 'Answer' column in the generated XML. I don't want that. I will be wrapping this resulting column in CDATA so the escaping is not necessary.
I want the result to be this...
<str name="2"><![CDATA[<DIV><DIV Style="width:55%;float:left;">Indsfgsdfg]]></str>
instead of this...
<str name="2">&lt;![CDATA[&lt;DIV&gt;&lt;DIV Style="width:55%;float:left;"&gt;In</str>
Is there a function or other mechanism to do this?

Selecting anything with FOR XML escapes any pre-existing markup so that it cannot break the well-formedness of the resulting document. The first example line you gave is not well-formed XML, so an XmlDocument object and most other parsers will refuse to load it. I would consider restructuring what you're trying to do so that you end up with a more robust solution.

You can use for xml explicit and the cdata directive:
select
1 as tag,
null as parent,
AanID as [str!1!name],
Answer as [str!1!!cdata]
from AuditAnswers
for xml explicit

You can specify that the output be treated as CDATA when using EXPLICIT mode XML queries. See "Using EXPLICIT Mode" and "Example: Specifying the CDATA Directive" in the SQL Server documentation.

What would be the benefit of having <![CDATA[ <div></div> ]]> over having &lt;div&gt;&lt;/div&gt; in your database output? To me, it looks like you would have a properly escaped HTML fragment in your XML output in both cases, and reading it back with a decent XML parser should give you the unescaped original version in both cases.
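The equivalence described above can be checked outside the database. The following is a minimal sketch in Python, not part of the original thread; the HTML fragment and variable names are made up for illustration. It builds the same element once with entity escaping and once with a CDATA section, then parses both with the standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML fragment standing in for the Answer column.
html = '<DIV Style="width:55%;float:left;">Fish & chips</DIV>'

# Variant 1: entity-escaped text, similar to what FOR XML PATH emits.
escaped_doc = '<str name="2">' + escape(html) + '</str>'

# Variant 2: the same text wrapped in a CDATA section.
cdata_doc = '<str name="2"><![CDATA[' + html + ']]></str>'

# A conforming parser hands back the same character data either way.
escaped_text = ET.fromstring(escaped_doc).text
cdata_text = ET.fromstring(cdata_doc).text

print(escaped_text == cdata_text == html)
```

If the consumer of the XML uses a real parser, the choice between the two forms is therefore mostly cosmetic.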

Related

SQL UPDATE statement with large HTML data

I have some automated workflow, which includes updating a column via SQL with HTML tags in it.
The basic SQL statement goes like this:
UPDATE content SET bodytext = '<div class="one two three">Here comes a whole lot of HTML with all special chars and double quotes " and single quotes ' and empty lines and all possible kind of stuff...</div>' WHERE pid = 10;
Is there a way to make MariaDB or MySQL to escape things automatically in SQL (without PHP)?
I'd suggest using prepared statements. This way you separate the statement from its parameters and don't need to care about the additional escaping that plain SQL requires.
Using functionality provided in PHP's MySQLi driver would simplify the process:
https://www.w3schools.com/php/php_mysql_prepared_statements.asp
Prepared statements are also possible in plain SQL, but I'm not sure whether doing it manually would be worth the hassle:
https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-statements.html
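To illustrate the idea of separating the statement from its parameters, here is a minimal sketch in Python. It is an assumption-laden stand-in: it uses the standard-library sqlite3 module with an in-memory database instead of MariaDB/MySQL, and the sample HTML is invented. The table and column names are taken from the question. MySQL drivers follow the same pattern, although the placeholder syntax may differ per driver.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (pid INTEGER PRIMARY KEY, bodytext TEXT)")
conn.execute("INSERT INTO content (pid, bodytext) VALUES (10, '')")

# Sample HTML containing double quotes, a single quote and an empty line.
html = '<div class="one two three">double " and single \' quotes\n\nand more</div>'

# The value is passed as a bound parameter, so no manual escaping is needed.
conn.execute("UPDATE content SET bodytext = ? WHERE pid = ?", (html, 10))

stored = conn.execute(
    "SELECT bodytext FROM content WHERE pid = ?", (10,)
).fetchone()[0]
print(stored == html)
```

The HTML never becomes part of the SQL text, which is why quotes and line breaks inside it cannot break the statement.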
Thank you for your input, but I think I found a solution that works for me. It seems that you can actually tell the SQL server to accept a raw string with this kind of syntax:
SELECT q'[The 'end' of the day]'
(Source: https://www.databasestar.com/sql-escape-single-quote/)
So I did the following:
SELECT @html := '[<div class="one two three">Here comes a whole lot of HTML with all special chars and double quotes " and single quotes '' and empty lines and all possible kind of stuff...</div>]';
UPDATE content SET bodytext = @html WHERE pid = 10;
And it works that way without any escaping problems.

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!
Depending on how the string holding the markup is available to you, you could try something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
This converts the string to XML and then uses the built-in value() method of the xml data type to extract the text content.
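The same parse-then-read-the-text approach can be sketched outside SQL Server. The example below is an illustration in Python using the standard library, not the answerer's code; it assumes, as the answer does, that the stored string is well-formed XML.

```python
import xml.etree.ElementTree as ET

# The sample string from the question, split over lines for readability.
note = (
    '<Paragraph><Replacement Id="40B">'
    '<Le><Run Foreground="#FFFF0000">Treatment by </Run></Le>'
    '<Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op>'
    '<Tr><Run Foreground="#FFFF0000">. </Run></Tr>'
    '</Replacement></Paragraph>'
)

# Parse once, then concatenate every text node in document order.
stripped = "".join(ET.fromstring(note).itertext())

print(repr(stripped))
```

Letting a parser walk the tree avoids the chain of nested REPLACE calls entirely, at the cost of failing on input that is not well-formed.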

TSQL - remove illegal character from malformed XML string

Inspired by convert-string-to-xml-illegal-characters
I wonder if there is way in pure T-SQL to convert malformed XML string to well-formed version.
I have NVARCHAR like:
DECLARE @string NVARCHAR(MAX) =
N'<root>
<stuff attrib="Ooop,bad character<">
<test>Here I get &, and "<" or ">>>>" </test>
<test2>And even more <<<>><><<<><> </test2>
</stuff>
</root>';
SELECT CONVERT(XML, @string);
Of course this will fail because & should be replaced by &amp;, which is easy.
But how to replace < and > when they are in element text or attribute without knowing structure in advance?
There is no magic method for changing a string into valid XML. You have to build your XML string in a way that ensures it is syntactically correct. Even your simple method of replacing every & with &amp; does not work in all cases. Consider this XML string, which is already correctly escaped:
<root>
<stuff>
<test>Here I get &amp;</test>
</stuff>
</root>
The simple replacement will result in:
<root>
<stuff>
<test>Here I get &amp;amp;</test>
</stuff>
</root>
Unless you want to write a lot of code to parse strings into XML, you should do one of the following:
Use the XML methods to build your XML.
Use other standard methods such as the FOR XML clause in SELECT statements.
Make sure, as you build the string, that every variable part (tags, attributes, or data) conforms to the XML rules for what that part represents. For example: wrap data variables in <![CDATA[ ]]>, or replace invalid characters in variable tags and attributes.
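The first recommendation, building the XML through an API rather than by concatenating strings, can be sketched as follows. This is an illustration in Python with the standard library, not T-SQL and not the answerer's code; the element names and values are borrowed from the question.

```python
import xml.etree.ElementTree as ET

# Build the tree through the API; raw values are passed in unescaped.
root = ET.Element("root")
stuff = ET.SubElement(root, "stuff", attrib={"attrib": "Ooop,bad character<"})
test = ET.SubElement(stuff, "test")
test.text = 'Here I get &, and "<" or ">>>>" '

# Serialization escapes special characters in text and attributes.
xml_string = ET.tostring(root, encoding="unicode")

# The result is well-formed and round-trips to the original values.
parsed = ET.fromstring(xml_string)
print(xml_string)
print(parsed.find("stuff/test").text == test.text)
```

Because escaping happens exactly once, at serialization time, the double-escaping problem shown above cannot occur.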

CDATA removed in target xml

I'm trying to copy an XML document as-is using XSLT. My input XML contains a CDATA section. The output strips the CDATA markers and keeps only their contents. I want to make an exact copy of the input XML, including the CDATA tags. Kindly help.
As long as the CDATA doesn't contain any characters that have a special meaning, stripping the CDATA while keeping the content intact does not change anything as far as XML semantics is concerned. So from the view of an XML processor, you are creating an exact copy. If you want to keep the bytes intact, don't use an XML parser.
You could try using the cdata-section-elements attribute on the <xsl:output> element. That attribute takes a whitespace-separated list of element names (QNames) whose text node children should be output using CDATA sections.
For more info see http://www.w3.org/TR/xslt#output
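Whether CDATA markers survive a copy depends on whether the processing model keeps them as a separate node type. The sketch below is not XSLT; it is a Python illustration with the standard-library minidom module, whose DOM does keep CDATA sections as distinct nodes. The sample document is invented.

```python
from xml.dom import minidom

# Hypothetical input with a CDATA section.
source = "<doc><note><![CDATA[a < b & c]]></note></doc>"

dom = minidom.parseString(source)
note = dom.documentElement.firstChild
cdata_node = note.firstChild

# minidom records the section as a CDATA node, not as plain text.
is_cdata = cdata_node.nodeType == cdata_node.CDATA_SECTION_NODE

# Serializing the element writes the CDATA markers back out.
copied = dom.documentElement.toxml()

print(is_cdata)
print(copied)
```

The XSLT data model has no such node type, which is why an identity transform loses the markers unless cdata-section-elements tells the serializer where to put them back.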

Does NSXMLNode's nodesForXPath:error: guarantee a specific ordering?

I extract nodes from an XML document by calling -nodesForXPath:error:. Now I wonder whether it guarantees that the nodes are returned in the same order as they appear from top to bottom in the document (this is crucial in my case).
My XML looks something like this, and I retrieve the b tags with the XPath query:
<a>
<b>
...
</b>
<b>
...
</b>
</a>
Unfortunately the b tags do not have an explicit counter.
While the documentation for NSXMLNode doesn't state explicitly whether order is preserved, I believe it will be, because XML documents are inherently ordered. Also, a method that does not return a deterministic result set will usually have that fact documented, and that hasn't been done for NSXMLNode.
With that said, the only way to find out for sure is to run some tests on your data.