Extract XML nodes using OPENXML - sql

I tried to extract XML data using OPENXML in SQL, but the XML file contains prefixed names such as <pidx:CustomerID>01234; see sample_xml.
If I leave out the "pidx:" prefix, no data is read; if I include it, the query fails with:
Msg 6603, Level 16, State 2, Line 15 XML parsing error: Reference to undeclared namespace prefix: 'pidx'.
How do I do it?

Besides the fact that you should never post your code or XML as a picture, here are some general hints:
Your prefixes bind an element to a namespace.
This namespace must be declared!
If your XML (as posted in the picture) is complete, it is invalid! If it is just a portion of a bigger XML, you'll find the namespace declaration somewhere above, in most cases in the root element.
OPENXML is outdated and should not be used any more. Better to use the XML data type's native methods such as .value() or .nodes().
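For the OPENXML route the question asks about, any prefix used in the row or column patterns has to be bound through the third (xpath_namespaces) argument of sp_xml_preparedocument. Below is a minimal sketch, assuming the file declares xmlns:pidx on its root. The namespace URI urn:example:pidx and the Invoice element are hypothetical placeholders; copy the real URI from the xmlns:pidx declaration in your file. The same read with .nodes()/.value() follows for comparison.

```sql
-- Hypothetical namespace URI and element names; use the ones from your file.
DECLARE @xml nvarchar(max) = N'<pidx:Invoice xmlns:pidx="urn:example:pidx">
  <pidx:CustomerID>01234</pidx:CustomerID>
</pidx:Invoice>';
DECLARE @hdoc int;

-- Third argument: namespace declarations that OPENXML's XPath patterns may use.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @xml, N'<root xmlns:pidx="urn:example:pidx"/>';

SELECT CustomerID
FROM OPENXML(@hdoc, '/pidx:Invoice', 2)
     WITH (CustomerID varchar(20) 'pidx:CustomerID');

-- Release the parsed document handle.
EXEC sp_xml_removedocument @hdoc;

-- Same read with the XML type's native methods.
DECLARE @doc XML = @xml;

WITH XMLNAMESPACES ('urn:example:pidx' AS pidx)
SELECT inv.n.value('(pidx:CustomerID/text())[1]', 'varchar(20)') AS CustomerID
FROM @doc.nodes('/pidx:Invoice') AS inv(n);
```

Reading CustomerID as varchar keeps the leading zero of 01234.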
Update
As requested in a comment, here is some explanation of why the namespace must be declared. This XML is not valid:
<abc:test>1</abc:test>
Try this:
DECLARE @xml XML=N'<abc:test>1</abc:test>';
You'll get this:
Msg 9459, Level 16, State 1, Line 1 XML parsing: line 1, character 10, undeclared prefix
Now declare the namespace and it works:
DECLARE @xml XML=N'<abc:test xmlns:abc="blah">1</abc:test>';
A namespace declaration is in scope for the declaring element and all elements nested within it.
In most cases, namespaces are declared on the root node.
Try this:
DECLARE @xml XML=
N'
<root xmlns:abc="blah">
<abc:test>1</abc:test>
</root>
';
SELECT @xml;
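Querying that document works the same way: the query must bind a prefix to the same namespace URI. A minimal sketch using WITH XMLNAMESPACES; the prefix x is an arbitrary, hypothetical choice that deliberately differs from abc, because matching is done on the URI ("blah"), not on the prefix text.

```sql
DECLARE @xml XML =
N'<root xmlns:abc="blah">
<abc:test>1</abc:test>
</root>';

-- Bind any prefix (x here) to the URI the document uses ("blah").
WITH XMLNAMESPACES ('blah' AS x)
SELECT t.n.value('(text())[1]', 'int') AS test
FROM @xml.nodes('/root/x:test') AS t(n);
```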

Related

XML parsing with namespace SQL Server

We are cleaning up data in our database and a column has XML details inside of it which we want to be able to convert into plain text.
Below is the sample XML in the table column.
<FlowDocument PagePadding="5,5,5,5" Name="RTDocument" AllowDrop="True" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
<Paragraph>FILE DESTROYED - MAY 21st, 2008</Paragraph>
<Paragraph>todo</Paragraph>
</FlowDocument>
I am using this query, but it does not return the desired output because of the namespace (if I remove the namespace from the XML, I get the output successfully).
SELECT
CAST(CAST(Comments AS XML).query('data(/FlowDocument/Paragraph)') AS VARCHAR(7000)) AS activity
FROM
dbo.Activities
WHERE
ActivityID = 1
Kindly help in this matter.
Thanks
You can also declare your namespace like this:
;WITH xmlnamespaces(DEFAULT 'http://schemas.microsoft.com/winfx/2006/xaml/presentation')
SELECT
CAST(CAST(Comments AS XML).query('data(/FlowDocument/Paragraph)') AS VARCHAR(7000)) AS activity
FROM [dbo].Activities where ActivityID=1
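If the goal is one plain-text row per paragraph rather than one space-separated string, the same DEFAULT declaration also works with .nodes(). A sketch against a variable holding the sample; @comments is a hypothetical stand-in for the Comments column, so in the real query you would cross apply CAST(Comments AS XML) instead.

```sql
-- @comments stands in for the Comments column (hypothetical variable name).
DECLARE @comments nvarchar(max) = N'<FlowDocument PagePadding="5,5,5,5" Name="RTDocument" AllowDrop="True" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
<Paragraph>FILE DESTROYED - MAY 21st, 2008</Paragraph>
<Paragraph>todo</Paragraph>
</FlowDocument>';

-- DEFAULT puts the unprefixed names in the XQuery paths into the XAML namespace.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/winfx/2006/xaml/presentation')
SELECT p.n.value('(text())[1]', 'varchar(7000)') AS paragraph
FROM (SELECT CAST(@comments AS XML) AS doc) AS d
CROSS APPLY d.doc.nodes('/FlowDocument/Paragraph') AS p(n);
```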
Other options are given here: parsing xml using sql server
You need to use a namespace declaration in your query, as per: https://msdn.microsoft.com/en-us/library/ms191474.aspx
So the query portion would look something like this:
query('
declare namespace NS="http://schemas.microsoft.com/winfx/2006/xaml/presentation";
data(/NS:FlowDocument/NS:Paragraph)
')
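A self-contained version of this prolog approach, run against a variable holding the question's sample instead of dbo.Activities; @doc is a hypothetical stand-in for CAST(Comments AS XML). It should return the text of both paragraphs in a single string.

```sql
-- @doc stands in for CAST(Comments AS XML) (hypothetical variable name).
DECLARE @doc XML = N'<FlowDocument PagePadding="5,5,5,5" Name="RTDocument" AllowDrop="True" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
<Paragraph>FILE DESTROYED - MAY 21st, 2008</Paragraph>
<Paragraph>todo</Paragraph>
</FlowDocument>';

-- The prolog binds NS to the document's default namespace; each step is then NS:-qualified.
DECLARE @text varchar(7000) = CAST(@doc.query('
    declare namespace NS="http://schemas.microsoft.com/winfx/2006/xaml/presentation";
    data(/NS:FlowDocument/NS:Paragraph)
') AS varchar(7000));

SELECT @text AS activity;
```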

SQL Server XQuery error: A node or set of nodes is required for number()

I have the following query in SQL Server 2012:
DECLARE @xml XML
SET @xml =
'<Root>
<Value>false</Value>
</Root>'
SELECT
node.value('concat(substring("T", 1, number((./Value/text())[1] = "true")), substring("F", 1, number(not((./Value/text())[1] = "true"))))', 'NVARCHAR(MAX)') AS [ValueTF]
FROM @xml.nodes('/Root') AS input(node)
This is using the method of deriving a ternary-style operation as described here: How to create an if-then-else expression (aka ternary operator) in an XPath 1.0 expression?
I would expect this query to return F for ValueTF, but instead it gives the following error message:
Msg 2374, Level 16, State 1, Line 10
XQuery [value()]: A node or set of nodes is required for number()
Even the simplified XPath number((./Value/text())[1] = "true") returns the same error. A Google search for "A node or set of nodes is required for number()" returns no results. The query successfully executes and returns F as expected when executed elsewhere, such as in this online XPath tester.
The following query does return false for Value as expected, so I know that at least that part of the query is working correctly:
DECLARE @xml XML
SET @xml =
'<Root>
<Value>false</Value>
</Root>'
SELECT
node.value('(./Value/text())[1]', 'NVARCHAR(MAX)') AS [Value]
FROM @xml.nodes('/Root') AS input(node)
The W3C spec for the number function seems to indicate that xs:anyAtomicType can be passed as an argument to number, which includes xs:boolean. So, is there an error in my code, or is this a difference in SQL Server's XQuery implementation?
Like you, I don't know why SQL Server's implementation of number() won't take a boolean argument... nor even a string, apparently! That's pretty strange.
Nevertheless, if they support XQuery, they should by definition support XPath 2.0, which means that you don't need to use this ugly concat() workaround to simulate the ternary conditional operator: XPath 2.0 has the real thing!
So instead of
'concat(substring("T", 1, number((./Value/text())[1] = "true")),
substring("F", 1, number(not((./Value/text())[1] = "true"))))'
you should be able to say
'if ((./Value/text())[1] = "true") then "T" else "F"'
I haven't tested that in SQL Server 2012, but it is part of XQuery, and it's documented here, so it's worth a try.
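To try the suggestion, here is a self-contained version of the question's query with the conditional expression in place of the concat()/number() workaround. SQL Server's XQuery reference documents conditional expressions (the else branch is mandatory); treat this as a sketch to verify on your own instance. For the sample it should return F.

```sql
DECLARE @xml XML =
'<Root>
<Value>false</Value>
</Root>';

-- XQuery if/then/else replaces the number()-based ternary emulation.
SELECT node.value('if ((./Value/text())[1] = "true") then "T" else "F"', 'NVARCHAR(1)') AS [ValueTF]
FROM @xml.nodes('/Root') AS input(node);
```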
After a bit more research I came across the following page describing the implementation details of number() in SQL Server 2012: number Function (XQuery). From that page under the Implementation Limitations section:
The number() function only accepts nodes. It does not accept atomic values.
So, it seems that the issue is that the SQL Server implementation of number() will not take a boolean argument, though it's not clear to me why it was limited in that way.

Parsing XML with namespaces in SQL Server

I am having a hard time trying to parse an XML that has some namespaces defined:
<TravelItineraryReadRS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="2.2.0">
<TravelItinerary xmlns="http://webservices.sabre.com/sabreXML/2011/10">
<CustomerInfo>
<PersonName WithInfant="false" NameNumber="01.01" RPH="1">
<GivenName>JEFF S</GivenName>
The XML is stored in an XML-typed column named response, and I want to get the GivenName value, for which I use the query below:
;WITH XMLNAMESPACES (DEFAULT 'http://webservices.sabre.com/sabreXML/2011/10')
select
response.value('(/TravelItineraryReadRS/TravelItinerary/CustomerInfo/PersonName[1]/GivenName)[1]', 'nvarchar(50)') AS Name
from dbo.RezMonitorXMLdataTest where locator = 'GUBXRV'
but instead of getting JEFF S as a result I get NULL. I think this might be related to the namespaces used. Does anyone know how I could get the GivenName value?
Thanks in advance,
Guzmán
Since your top-level node <TravelItineraryReadRS> is not part of that XML namespace, you cannot use the DEFAULT qualifier. Instead, you have to define an XML namespace prefix and include it in your XQuery:
;WITH XMLNAMESPACES ('http://webservices.sabre.com/sabreXML/2011/10' AS ns)
SELECT
response.value('(/TravelItineraryReadRS/ns:TravelItinerary/ns:CustomerInfo/ns:PersonName[1]/ns:GivenName)[1]', 'nvarchar(50)') AS Name
FROM
dbo.RezMonitorXMLdataTest
WHERE
locator = 'GUBXRV'
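To check this without the table, here is a sketch that loads a trimmed copy of the question's XML (with closing tags added so it parses) into a variable, @response, standing in for the column. The second query uses the namespace wildcard *:, which SQL Server's XQuery accepts and which matches an element's local name in any namespace; it is convenient but less strict than binding the URI.

```sql
DECLARE @response XML = N'<TravelItineraryReadRS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="2.2.0">
<TravelItinerary xmlns="http://webservices.sabre.com/sabreXML/2011/10">
<CustomerInfo>
<PersonName WithInfant="false" NameNumber="01.01" RPH="1">
<GivenName>JEFF S</GivenName>
</PersonName>
</CustomerInfo>
</TravelItinerary>
</TravelItineraryReadRS>';

-- Root element has no namespace; everything from TravelItinerary down is ns:-qualified.
WITH XMLNAMESPACES ('http://webservices.sabre.com/sabreXML/2011/10' AS ns)
SELECT @response.value('(/TravelItineraryReadRS/ns:TravelItinerary/ns:CustomerInfo/ns:PersonName[1]/ns:GivenName)[1]', 'nvarchar(50)') AS Name;

-- Wildcard alternative: *: matches the local name regardless of namespace.
SELECT @response.value('(/TravelItineraryReadRS/*:TravelItinerary/*:CustomerInfo/*:PersonName[1]/*:GivenName)[1]', 'nvarchar(50)') AS NameAnyNs;
```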

Parsing XML Element value entirely in SQL from arbitrary string

We have logs/audits compiled over some time that we would like to run some brief reports on.
One of the columns in the logs is JSON, but contains XML. We want to be able to parse out the value of a certain XML tag for each of the rows. So given an arbitrary string such as the following:
{ "XmlData" :"<tag1><tag2><TagToParse>234</TagToParse></tag2><tag1>".....}
I would like to run a SQL query that returns 234 when I give it the tag name TagToParse.
What is the easiest way to do this ENTIRELY in SQL?
Given that your container will always be tag1, something like this should do it:
DECLARE @MyXML XML
SET @MyXML = '<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>'
SELECT
a.b.value('(/tag1//TagToParse/node())[1]', 'nvarchar(max)') AS Tag
FROM @MyXML.nodes('tag1') a(b)
Good luck.
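The answer above assumes the XML has already been pulled out of the JSON. On SQL Server 2016 or later (an assumption, since JSON_VALUE does not exist in earlier versions), both steps can stay in SQL: JSON_VALUE extracts XmlData, CAST parses it as XML, and sql:variable() lets the tag name be passed in as a parameter. @log and @tagName are hypothetical stand-ins for the log column and the requested tag. Note that the question's sample ends with <tag1> instead of </tag1>; CAST to XML rejects such a string as not well-formed, so this sketch assumes the stored XML is well-formed. Also, JSON_VALUE returns at most 4,000 characters (NULL in the default lax mode for longer values).

```sql
-- Assumes SQL Server 2016+ (JSON_VALUE) and well-formed XML inside XmlData.
-- @log and @tagName are hypothetical stand-ins for the log column and the requested tag.
DECLARE @log nvarchar(max) = N'{ "XmlData" : "<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>" }';
DECLARE @tagName nvarchar(128) = N'TagToParse';

-- Step 1: pull the XML string out of the JSON and parse it.
DECLARE @doc XML = CAST(JSON_VALUE(@log, '$.XmlData') AS XML);

-- Step 2: text of the first element anywhere whose local name equals @tagName.
DECLARE @tagValue nvarchar(max) =
    @doc.value('(//*[local-name() = sql:variable("@tagName")]/text())[1]', 'nvarchar(max)');

SELECT @tagValue AS TagValue;
```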

How to preserve an ampersand (&) while using FOR XML PATH on SQL 2005

Are there any tricks for preventing SQL Server from entitizing chars like &, <, and >? I'm trying to output a URL in my XML file, but SQL wants to replace any '&' with '&amp;'.
Take the following query:
SELECT 'http://foosite.com/' + RTRIM(li.imageStore)
+ '/ImageStore.dll?id=' + RTRIM(li.imageID)
+ '&raw=1&rev=' + RTRIM(li.imageVersion) AS imageUrl
FROM ListingImages li
FOR XML PATH ('image'), ROOT ('images'), TYPE
The output I get is like this (&s are entitized):
<images>
<image>
<imageUrl>http://foosite.com/pics4/ImageStore.dll?id=7E92BA08829F6847&amp;raw=1&amp;rev=0</imageUrl>
</image>
</images>
What I'd like is this (&s are not entitized):
<images>
<image>
<imageUrl>http://foosite.com/pics4/ImageStore.dll?id=7E92BA08829F6847&raw=1&rev=0</imageUrl>
</image>
</images>
How does one prevent SQL Server from entitizing the '&'s into '&amp;'?
There are situations where a person may not want well-formed XML. The one I (and perhaps the original poster) encountered was using the FOR XML PATH technique to return a single-field list of 'child' items via a recursive query. More information on this technique is here (specifically in the 'The blackbox XML methods' section):
Concatenating Row Values in Transact-SQL
In my situation, seeing 'H&E' (a pathology stain) transformed into 'well-formed XML' was a real disappointment. Fortunately, I found a solution: the following page helped me solve this issue relatively easily, without having to re-architect my recursive query or add additional parsing at the presentation level (for this as well as for other/future situations where my child-row data fields contain reserved XML characters): Handling Special Characters with FOR XML PATH
EDIT: the code below is from the referenced blog post.
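-- for xml ... type returns the list as XML (with < and > entitized);
-- .value() reads it back as plain text, decoding the entities.
select
stuff(
(select ', <' + name + '>'
from sys.databases
where database_id > 4   -- skip the four system databases
order by name
for xml path(''), root('MyString'), type
).value('/MyString[1]','varchar(max)')
, 1, 2, '') as namelist;  -- stuff() removes the leading ', '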
What SQL Server generates is correct. What you expect to see is not well-formed XML. The reason is that the & character signifies the start of an entity reference, such as &amp;. See the XML specification for more information.
When your XML parser parses this string out of the XML, it will resolve the &amp; entity references and return the text in the form you want. So the internal format in the XML file should not cause you a problem unless you're using a buggy XML parser or trying to parse it manually (in which case your current parsing code is effectively buggy with respect to the XML specification).
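This can be checked directly: build the question's URL with FOR XML ... TYPE, then compare the serialized text with what .value() returns. A minimal sketch that uses the sample URL from the question as a literal in place of the ListingImages table.

```sql
-- Sample URL from the question, used as a literal instead of ListingImages.
DECLARE @x XML = (
    SELECT 'http://foosite.com/pics4/ImageStore.dll?id=7E92BA08829F6847&raw=1&rev=0' AS imageUrl
    FOR XML PATH('image'), ROOT('images'), TYPE
);

-- Serialized text: & is written as &amp;, as well-formed XML requires.
DECLARE @serialized nvarchar(max) = CAST(@x AS nvarchar(max));

-- Parsed value: .value() resolves the entity, so the URL comes back with a plain &.
DECLARE @url nvarchar(4000) = @x.value('(/images/image/imageUrl/text())[1]', 'nvarchar(4000)');

SELECT @serialized AS serialized, @url AS imageUrl;
```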