OPENXML, Convert Base64 to Binary

For an import/export process, we put binary data into XML as Base64 encoded strings. The issue came up when getting the values back out...
We're using OPENXML because performance on 2005/2008 is horrid with nodes(): it doesn't scale well at all. Microsoft fixed the performance issue in SQL Server 2012, but since we need legacy support (2005+), that's not a realistic option, and MS doesn't appear to want to backport fixes (assuming it's even possible).
Ideally, I'm looking for a single SQL statement using OPENXML that shreds an XML document containing Base64-encoded binary data and returns a result set where that data is correctly rendered as binary. I have one solution that doesn't use nodes(), but I'm hoping someone has something better.

You can declare the column as xml in the WITH clause of OPENXML and then use .value() to get the data as varbinary.
Something like this:
declare @XML xml
set @XML =
'<root>
<item>
<ID>1</ID>
<Col>Um93MQ==</Col>
</item>
<item>
<ID>2</ID>
<Col>Um93Mg==</Col>
</item>
</root>'
declare @idoc int
exec sp_xml_preparedocument @idoc out, @XML
select T.ID,
       T.Col.value('.', 'varbinary(max)') as Col
from openxml(@idoc, '/root/item', 2)
     with (ID int,
           Col xml) as T
exec sp_xml_removedocument @idoc
Or you could use CLR (if that is an option for you), something like this:
using System;

public class CLRTest
{
    // Decodes a Base64 string to bytes; a NULL input returns NULL.
    public static byte[] ConvertBase64ToBinary(string str)
    {
        if (str == null)
        {
            return null;
        }
        return Convert.FromBase64String(str);
    }
}
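To sanity-check what the two approaches above should return, here is a minimal client-side sketch in Python of the same shredding and decoding. It is not SQL Server code. `shred` and `base64_to_binary` are hypothetical helper names. The sample is the XML from the first answer. The sketch assumes element-centric mapping (the `2` flag passed to OPENXML) and NULL-in/NULL-out behaviour like the CLR function.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Sample document from the answer above.
SAMPLE = """<root>
<item><ID>1</ID><Col>Um93MQ==</Col></item>
<item><ID>2</ID><Col>Um93Mg==</Col></item>
</root>"""


def base64_to_binary(s: Optional[str]) -> Optional[bytes]:
    # Like the CLR function: None (NULL) in, None out; otherwise decode Base64.
    if s is None:
        return None
    return base64.b64decode(s, validate=True)


def shred(xml_text: str) -> List[Tuple[int, Optional[bytes]]]:
    # Rough equivalent of OPENXML(..., '/root/item', 2) WITH (ID int, Col ...):
    # one row per <item>, reading child elements by name.
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        rows.append((int(item.findtext("ID")), base64_to_binary(item.findtext("Col"))))
    return rows


if __name__ == "__main__":
    for row in shred(SAMPLE):
        print(row)
```

The two sample values decode to the ASCII bytes `Row1` and `Row2`, which gives an easy check for the varbinary column returned by either SQL approach.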

Related

SQL SERVER xml with CDATA

I have a table in my database with a column containing XML. The column type is nvarchar(max). The XML is structured like this:
<root>
<child>....</child>
.
.
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
...
</special>
</root>
I didn't create the database and I cannot change the way information is stored in it, but I can retrieve it with a SELECT. For the extraction I use:
select cast(replace(xml, 'utf-8', 'utf-16') as xml)
from table
It works well except for the CDATA sections, whose content in the query output is just: text -> text
Is there a way to retrieve also the CDATA tags?
Well, as far as I know, this is not possible in the normal ways...
A CDATA section exists for one sole reason: to include characters within XML that would otherwise need escaping (for lazy people...).
CDATA is not seen as needed at all and is therefore not really supported by the normal XML methods. In other words, it is supported only in the sense that the content is properly escaped. There is actually no difference between correctly escaped content and unescaped content within CDATA! (Okay, there are some minor differences, such as ]]> not being allowed inside a CDATA section, and a few more tiny specialties...)
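The claim that CDATA and escaped text are equivalent after parsing can be checked with any conforming XML parser. Here is a minimal sketch using Python's ElementTree, as an illustration outside SQL Server. It assumes SQL Server's xml type behaves the same way on re-serialization, which matches the output described in the question.

```python
import xml.etree.ElementTree as ET

# The same character data, once inside CDATA and once escaped with &gt;.
WITH_CDATA = "<event><![CDATA[text->text]]></event>"
ESCAPED = "<event>text-&gt;text</event>"

text_from_cdata = ET.fromstring(WITH_CDATA).text
text_from_escaped = ET.fromstring(ESCAPED).text

# Re-serializing the parsed CDATA version yields the escaped form:
# the CDATA wrapper is not part of the parsed data model.
reserialized = ET.tostring(ET.fromstring(WITH_CDATA), encoding="unicode")

if __name__ == "__main__":
    print(text_from_cdata == text_from_escaped)  # True
    print(reserialized)
```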
The big question is: Why?
What are you trying to do with this afterwards?
Try this; the included text is returned as is:
DECLARE @xml XML =
'<root>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>'
SELECT t.c.query('text()')
FROM @xml.nodes('/root/special/event') t(c);
So please explain in more detail: what do you really want?
If you really need nothing more than the wrapping CDATA, you might use this:
SELECT '<![CDATA[' + t.c.value('.','varchar(max)') + ']]>'
FROM @xml.nodes('/root/special/event') t(c);
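The string concatenation above rebuilds the CDATA wrapper, but it produces invalid XML if the text itself contains `]]>`. A common workaround, which is not part of the original answer, splits each `]]>` across two adjacent CDATA sections. Here is a minimal Python sketch of that idea. `wrap_cdata` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    # "]]>" may not appear inside a CDATA section, so each occurrence is split
    # across two adjacent sections: "]]" ends the first, ">" starts the next.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    wrapped = wrap_cdata("a]]>b")
    print(wrapped)
    # Parsing the result gives back the original text.
    print(ET.fromstring("<event>" + wrapped + "</event>").text)
```

In T-SQL, the same idea would be a REPLACE on the value before concatenating.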
Update: Same with outdated FROM OPENXML
I just tried how the outdated FROM OPENXML approach handles this and found that there is absolutely no indication in the result set that the given text was originally within a CDATA section. "Some value here" is returned in exactly the same way as the text within CDATA:
DECLARE @doc XML =
'<root>
<child>Some value here </child>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>';
DECLARE @hnd INT;
EXEC sp_xml_preparedocument @hnd OUTPUT, @doc;
SELECT * FROM OPENXML (@hnd, '/root', 0);
EXEC sp_xml_removedocument @hnd;
This is how to include CDATA on child nodes in XML using pure SQL. But it's not ideal.
SELECT 1 AS tag,
null AS parent,
'10001' AS 'Customer!1!Customer_ID!Element',
'AirBallon Captain' AS 'Customer!1!Title!cdata',
'Customer!1' = (
SELECT
2 AS tag,
NULL AS parent,
'Wrapped in cdata, using explicit' AS 'Location!2!Title!cdata'
FOR XML EXPLICIT)
FOR XML EXPLICIT, ROOT('Customers')
CDATA is included, but the child element is encoded using &gt; instead of >.
Which is weird from a sensible point of view. I'm sure there is a technical explanation (without the TYPE directive, the inner FOR XML query returns a string, which the outer query escapes like any other text), but it makes no sense to me, because there is no difference in the FOR XML specification.
You could add the TYPE directive to the inner query, but then you lose the CDATA too...
But why, oh why, would you (Microsoft) remove the CDATA when I just added it?
<Customers>
<Customer>
<Customer_ID>10001</Customer_ID>
<Title><![CDATA[AirBallon Captain]]></Title>
<Location>
<Title><![CDATA[wrapped in cdata, using explicit]]></Title>
</Location>
</Customer>
</Customers>
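The escaping complained about above comes from the inner result being inserted as a string rather than as a node. Here is a rough analogy in Python's ElementTree, not SQL Server. Assigning markup as text escapes it, much like a nested FOR XML without TYPE. Appending it as a parsed element keeps it as markup, much like FOR XML ... TYPE. The element names are taken from the example, simplified.

```python
import xml.etree.ElementTree as ET

INNER = "<Location><Title>wrapped</Title></Location>"

# Inserted as a string: the markup is treated as character data and escaped.
as_text = ET.Element("Customer")
as_text.text = INNER
text_version = ET.tostring(as_text, encoding="unicode")

# Inserted as a parsed node: the markup stays markup.
as_node = ET.Element("Customer")
as_node.append(ET.fromstring(INNER))
node_version = ET.tostring(as_node, encoding="unicode")

if __name__ == "__main__":
    print(text_version)
    print(node_version)
```

Neither path preserves a CDATA wrapper once the data has been parsed, which is consistent with TYPE dropping it.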

SQL Server Script - select xml child nodes of root as (n)varchar

I need to create a SQL Server script, and part of the script selects the names of the immediate child nodes of the root node and converts them to (n)varchar. I don't need the attributes or content of the nodes.
This is an example of the xml:
declare @XML xml
set @XML =
'
<config>
<module1 />
<module2 />
</config>
'
I want the result like this:
module1
module2
Note that the xml is not hardcoded and can have many different child nodes.
I've already looked at this (MSDN) link, but at first sight it doesn't seem possible with those XML methods.
Many thanks,
Kjell
If you want the XML of the child nodes you mentioned, you can use the query() method, for example:
select
cast(@XML.query('//GuiConfiguration/Activities') as nvarchar(max)),
cast(@XML.query('//GuiConfiguration/Reservations') as nvarchar(max))
EDIT: Answer to refined question
To get the names of the immediate child nodes of the root, you can use this (the path /*/* selects only the child elements of the root element):
select
    cast(t.c.query('local-name(.)') as nvarchar(max))
from
    @XML.nodes('/*/*') as t(c)
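As a cross-check of the expected result, here is a minimal Python sketch that lists the local names of the root's direct child elements. It uses ElementTree, not SQL Server. `child_names` is a hypothetical helper, and stripping the `{namespace}` prefix is meant to imitate `local-name()`.

```python
import xml.etree.ElementTree as ET
from typing import List


def child_names(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text)
    # Iterating an element yields only its direct child elements
    # (text and attributes are skipped), similar to the XPath /*/*.
    # ElementTree writes namespaced tags as "{uri}name", so keep only the local part.
    return [child.tag.rsplit("}", 1)[-1] for child in root]


if __name__ == "__main__":
    print(child_names("<config><module1 /><module2 /></config>"))
```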

Configuring namespace for sp_xml_preparedocument

I have an RSS xml with this format:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title></title>
<link></link>
<description></description>
<language></language>
<lastBuildDate></lastBuildDate>
<generator></generator>
<docs></docs>
<managingEditor></managingEditor>
<webMaster></webMaster>
<ttl></ttl>
<item>
<title></title>
<link></link>
<description></description>
<guid isPermaLink="false"></guid>
<pubDate></pubDate>
<author></author>
<dc:date></dc:date>
<dc:publisher></dc:publisher>
<dc:language></dc:language>
</item>
<item>
<title></title>
<link></link>
<description></description>
<guid isPermaLink="false"></guid>
<pubDate></pubDate>
<author></author>
<dc:date></dc:date>
<dc:publisher></dc:publisher>
<dc:language></dc:language>
</item>
</channel>
</rss>
And I want to parse it with sp_xml_preparedocument in SQLServer.
My problem is the namespace. Each item has three tags with a namespace prefix, and I don't know how to specify them.
I have tried this:
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlContent, '<item xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
but it only parses the first item and ignores the rest!
Any idea?
The fact that you are only getting one row has nothing to do with the namespace. You have some error in your OPENXML query against @hDoc.
There might be reasons for you to still use OPENXML, but until you show the query that is not working for you, I suggest you use the XML data type instead:
;with xmlnamespaces('http://purl.org/dc/elements/1.1/' as dc)
select C.N.value('(title/text())[1]', 'nvarchar(100)') as channel_title,
       I.N.value('(title/text())[1]', 'nvarchar(100)') as item_title,
       I.N.value('(dc:publisher/text())[1]', 'nvarchar(100)') as publisher
from @XML.nodes('/rss/channel') as C(N)
  cross apply C.N.nodes('item') as I(N);
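The prefix-to-URI mapping that WITH XMLNAMESPACES declares has a direct counterpart in most XML libraries. Here is a minimal Python sketch with ElementTree's `namespaces` argument that returns one row per item. The feed values (`Feed`, `First`, `Pub A`, ...) are hypothetical fillers, since the question's RSS has empty elements. `shred_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Prefix -> URI map, playing the role of WITH XMLNAMESPACES(... as dc).
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical sample values; the structure follows the question's RSS.
RSS = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel><title>Feed</title>
<item><title>First</title><dc:publisher>Pub A</dc:publisher></item>
<item><title>Second</title><dc:publisher>Pub B</dc:publisher></item>
</channel></rss>"""


def shred_items(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    # Outer loop ~ nodes('/rss/channel'), inner loop ~ CROSS APPLY nodes('item').
    for channel in root.findall("channel"):
        for item in channel.findall("item"):
            rows.append((
                channel.findtext("title"),
                item.findtext("title"),
                item.findtext("dc:publisher", namespaces=NS),
            ))
    return rows


if __name__ == "__main__":
    for row in shred_items(RSS):
        print(row)
```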
The namespace needs to be defined as a character type:
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlContent, '<item xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
[ xpath_namespaces ]
Specifies the namespace declarations that are used in row and column XPath expressions in OPENXML. xpath_namespaces is a text parameter: char, nchar, varchar, nvarchar, text, ntext or xml.
The default value is <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop">. xpath_namespaces provides the namespace URIs for the prefixes used in the XPath expressions in OPENXML by means of a well-formed XML document. xpath_namespaces declares the prefix that must be used to refer to the namespace urn:schemas-microsoft-com:xml-metaprop; this provides metadata about the parsed XML elements. Although you can redefine the namespace prefix for the metaproperty namespace by using this technique, this namespace is not lost. The prefix mp is still valid for urn:schemas-microsoft-com:xml-metaprop even if xpath_namespaces contains no such declaration.
http://msdn.microsoft.com/en-us/library/ms187367.aspx

Validating empty xml by xsd in SQL Server

I have following simple xsd schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="MyUrl" type="xsd:anyURI"/>
</xsd:schema>
I have SQL Server 2008 R2 and I want to validate my variable against this schema. It works, but variable @x is accepted even when it is empty or whitespace, although empty XML isn't valid against this schema.
Why do I get these results?
TSQL code:
CREATE XML SCHEMA COLLECTION dbo.xsdTest AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="MyUrl" type="xsd:anyURI"/>
</xsd:schema>'
GO
DECLARE @x XML(dbo.xsdTest)
SET @x = ' ' --no error
By default, the XML data type accepts XML fragments as valid XML (CONTENT).
The XML data can contain zero or more elements at the top
level.
You can require the value to be a complete XML document (exactly one top-level element) like this:
declare @x xml(document dbo.xsdTest)
set @x = '' -- error here
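The CONTENT/DOCUMENT distinction is about how many top-level elements are allowed. Here is a minimal Python sketch of the DOCUMENT-style check. ElementTree's parser, like `xml(document ...)`, rejects empty input, whitespace, and multiple roots. It only checks well-formedness, not the XSD, and `is_document` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_document(text: str) -> bool:
    # ElementTree only accepts a well-formed document with exactly one root
    # element, so empty input, whitespace and multiple roots all fail.
    # Note: this checks well-formedness only; no schema validation happens here.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for sample in ["<MyUrl>http://example.com</MyUrl>", " ", "<MyUrl/><MyUrl/>"]:
        print(repr(sample), is_document(sample))
```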

SQL and escaped XML data

I have a table with a mix of escaped and non-escaped XML. Of course, the data I need is escaped. For example, I have:
<Root>
<InternalData>
<Node>
&lt;ArrayOfComment&gt;
&lt;Comment&gt
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt
&lt;/ArrayOfComment&gt;
</Node>
</InternalData>
</Root>
As you can see, the data in the Node tag is all escaped. I can use a query to obtain the Node data, but how can I convert it to XML in SQL so that it can be parsed and broken up? I'm pretty new to using XML in SQL, and I can't seem to find any examples of this.
Thanks
You have not given enough information about your end goal, but this will get you very close. FYI, you had two missing semicolons, both after Comment&gt.
declare @xml xml
set @xml = '
<Root>
<InternalData>
<Node>
&lt;ArrayOfComment&gt;
&lt;Comment&gt;
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt;
&lt;/ArrayOfComment&gt;
</Node>
</InternalData>
</Root>
'
select convert(xml, n.c.value('.', 'varchar(max)'))
from @xml.nodes('Root/InternalData/Node/text()') n(c)
Output
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
The result is an XML column that you can put into a variable or CROSS APPLY directly to get data from the XML fragment.
Your best bet might be to look into an HTML-decoding UDF. I did a quick search and found this one:
http://www.andreabertolotto.net/Articles/HTMLDecodeUDF.aspx
You may want to modify it so that it only decodes &gt; and &lt;. The one above seems to go above and beyond your needs.
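The first answer parses the outer document once, which turns the escaped text into markup, and then parses that text again as XML. Here is a minimal Python sketch of the same two-step approach with ElementTree, as a client-side illustration rather than SQL Server code. `inner_xml` is a hypothetical helper name, and the sample mirrors the question with the missing semicolons fixed.

```python
import xml.etree.ElementTree as ET

# Stored form: the payload inside <Node> is escaped text, not markup.
STORED = """<Root><InternalData><Node>
&lt;ArrayOfComment&gt;&lt;Comment&gt;
&lt;SequenceNo&gt;1&lt;/SequenceNo&gt;
&lt;IsDeleted&gt;false&lt;/IsDeleted&gt;
&lt;TakenByCode&gt;397&lt;/TakenByCode&gt;
&lt;/Comment&gt;&lt;/ArrayOfComment&gt;
</Node></InternalData></Root>"""


def inner_xml(xml_text):
    outer = ET.fromstring(xml_text)
    # Step 1: parsing the outer document already unescapes &lt; and &gt;,
    # just as .value('.', 'varchar(max)') does in the SQL answer.
    payload = outer.findtext("InternalData/Node")
    # Step 2: the unescaped string is itself XML, so parse it again
    # (the counterpart of convert(xml, ...)).
    return ET.fromstring(payload.strip())


if __name__ == "__main__":
    comment = inner_xml(STORED).find("Comment")
    print(comment.findtext("SequenceNo"), comment.findtext("TakenByCode"))
```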
UPDATE
@Cyberkiwi's solution seems to be a bit cleaner. I will leave this up in case the version of SQL Server you are running doesn't support his solution.