SQL: Parse an XML string

I have a bunch of XML documents that I need to parse with SQL.
The XML can take multiple forms:
<Grandparent>
<parent>
<child1>something</child1>
<child2>something</child2>
</parent>
</Grandparent>
or
<Grandparent>
<child1>something</child1>
<child2>something</child2>
</Grandparent>
Additionally, the number of "child" nodes is variable, and there is no way of knowing beforehand how many children there are.
What I have done so far is:
@xml.nodes('/Grandparent')
which returns either the <parent> node and children or simply the child nodes depending on the format of the xml.
The version of SQL Server, and the fact that I'm writing this as a SQL function, seem to mean that using .value() as shown in this answer does not work.
Therefore, I decided to parse the string myself. Essentially, I look for < and take the substring from there up to > as the node name. Then I take everything between > and </ as the value. I do this in a WHILE loop until I reach the end of the XML string. It works perfectly unless the XML has that parent node.
I don't know how to determine whether that parent node is there and how to ignore it if it is. This is where I am stuck.
What I want to get in either case is:
Node | Value
child1 | something
child2 | something
and so on, for as many child nodes as there are.

You can use the descendant axis (//) to get child nodes at any depth within a parent node.
Another useful XPath function for this task is local-name(), which returns the name of the current context node or attribute without its namespace prefix:
select c.value('local-name(.)', 'varchar(max)') as 'node'
, c.value('.', 'varchar(max)') as 'value'
from @xml.nodes('/Grandparent//*[not(*)]') as T(c)
The XPath fragment //*[not(*)] selects descendant elements that have no child elements; in other words, it selects the innermost (leaf) descendants.
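To sanity-check what /Grandparent//*[not(*)] selects without a SQL Server instance, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. It illustrates the XPath idea only and is not SQL Server's XQuery engine: ElementTree implements just a small XPath subset, so the not(*) test is written as a loop. The helper names leaf_nodes and local_name are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def leaf_nodes(xml_text, root_name="Grandparent"):
    """Return (node, value) pairs for every leaf element below the root.

    Mirrors /Grandparent//*[not(*)]: visit every descendant of the root
    element, at any depth, and keep only those with no child elements.
    """
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != root_name:
        return []  # the /Grandparent step matches nothing
    pairs = []
    for elem in root.iter():  # root first, then descendants in document order
        if elem is root:
            continue  # //* starts below the root, not at the root itself
        if len(elem) == 0:  # not(*): the element has no child elements
            pairs.append((local_name(elem.tag), "".join(elem.itertext())))
    return pairs


nested = """<Grandparent>
<parent>
<child1>something</child1>
<child2>something</child2>
</parent>
</Grandparent>"""

flat = """<Grandparent>
<child1>something</child1>
<child2>something</child2>
</Grandparent>"""

for node, value in leaf_nodes(nested):
    print(node, "|", value)
```

Both layouts yield the same two pairs, which is why the descendant axis removes the need to detect whether the optional <parent> wrapper is present.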

Going out on a limb here with two assumptions, since your question isn't clear about the following:
I'm assuming that your child nodes have the same name (e.g., child, not child1 and child2), and
You want a SQL statement that returns 1 child per row.
If either of those assumptions is incorrect, this answer won't help :)
DECLARE @xml XML = '<Grandparent>
<parent>
<child>something</child>
<child>something</child>
</parent>
</Grandparent>'
SELECT x.value('.[1]', 'varchar(100)')
FROM @xml.nodes('/Grandparent//child') t(x)
SET @xml= '<Grandparent>
<child>something</child>
<child>something</child>
</Grandparent>'
SELECT x.value('.[1]', 'varchar(100)')
FROM @xml.nodes('/Grandparent//child') t(x)
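The same "child at any depth" idea can be tried in Python, assuming the standard-library ElementTree module, whose ".//child" path selects every <child> below the current element. This is a sketch for illustration; child_values is a hypothetical helper, not part of any SQL Server API.

```python
import xml.etree.ElementTree as ET


def child_values(xml_text, tag="child"):
    """Return the text of every <tag> element at any depth, in document order.

    ".//child" in ElementTree's XPath subset plays the role of
    /Grandparent//child in the SQL answer.
    """
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall(".//" + tag)]


with_parent = "<Grandparent><parent><child>a</child><child>b</child></parent></Grandparent>"
without_parent = "<Grandparent><child>a</child><child>b</child></Grandparent>"

print(child_values(with_parent), child_values(without_parent))
```

As in the SQL version, this relies on every child element sharing one name.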

Try:
DECLARE @xml xml = N'
<Grandparent>
<parent>
<child1>something</child1>
<child2>something</child2>
</parent>
</Grandparent>';
SELECT
child.value('fn:local-name(.)', 'varchar(100)') AS Node
,child.value('.', 'varchar(100)') AS value
FROM @xml.nodes('//*[self::child1 or self::child2]') AS ansestor(child);
SET @xml = N'
<Grandparent>
<child1>something</child1>
<child2>something</child2>
</Grandparent>';
SELECT
child.value('fn:local-name(.)', 'varchar(100)') AS Node
,child.value('.', 'varchar(100)') AS value
FROM @xml.nodes('//*[self::child1 or self::child2]') AS ansestor(child);
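The //*[self::child1 or self::child2] filter keeps only elements whose name is in a fixed list. A rough Python equivalent, assuming the standard-library ElementTree module (WANTED and named_nodes are hypothetical names), walks every element and checks its tag against a set:

```python
import xml.etree.ElementTree as ET

# The element names listed in self::child1 or self::child2.
WANTED = {"child1", "child2"}


def named_nodes(xml_text, wanted=WANTED):
    """Return (name, text) for every element whose tag is in `wanted`."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root and all its descendants in document
    # order, like //* in the SQL answer; keep only the listed names.
    return [(e.tag, e.text) for e in root.iter() if e.tag in wanted]


nested = "<Grandparent><parent><child1>something</child1><child2>something</child2></parent></Grandparent>"
flat = "<Grandparent><child1>something</child1><child2>something</child2></Grandparent>"

print(named_nodes(nested))
```

Because the names are listed explicitly, this variant suits the case where the set of child names is known in advance.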

Related

Count non-empty nodes in XML in SQL Server

I need to count all b nodes which are not empty (so result should be 2).
<a>
<b>1</b>
<b/>
<b>g</b>
</a>
I'm using the code below, but it returns the count of all nodes (empty ones included).
select top 1 rc.XmlContent.value('count(//a/b)', 'int') from Table rc
If you use //a/b/text() rather than just //a/b, you get a count of 2:
DECLARE @x XML= '<a><b>1</b><b/><b>g</b></a>';
SELECT @x.value('count(//a/b/text())', 'int');
Sorry, this is not the answer! I misread the question completely and thought you were looking for the empty nodes. GarethD has already given an appropriate answer (the same idea, just the other way round).
I won't delete this, because it might help others...
The empty element <b/> (same as <b></b>) is existing but has no text().
DECLARE @xml XML =
N'<a>
<b>1</b>
<b/>
<b></b>
<b>g</b>
</a>';
select @xml.value('count(/a/b[empty(text())])', 'int')
This returns 2, because there is <b/> and <b></b>.
For completeness, you can negate the predicate, which actually gives the result you need:
select @xml.value('count(/a/b[not(empty(text()))])', 'int')
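The distinction between "element exists" and "element has a text node" can be reproduced in Python, assuming the standard-library ElementTree module, where an element with no character data reports .text as None. This is an analogy to the empty(text()) predicate, not SQL Server's implementation.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b>1</b><b/><b></b><b>g</b></a>")
b_elements = doc.findall("b")  # the /a/b step, relative to <a>

# Both <b/> and <b></b> come back with .text == None, the
# counterpart of a <b> that has no text() node.
empty = sum(1 for b in b_elements if b.text is None)
non_empty = len(b_elements) - empty

print(empty, non_empty)  # 2 2
```

Note that in this sketch a whitespace-only element such as <b> </b> would count as non-empty, since its .text is " ".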
Use this XPath expression:
count(/a/b[normalize-space(text())=''])
Incorporated in your code it would look like this:
select top 1 rc.XmlContent.value('count(/a/b[normalize-space(text())=""])', 'int') from Table rc

Extract Value from XML having same tag name in SQL Server

I have an XML variable, defined below along with its value.
I want to fetch the text between the <TextNodeChild> tags in a single query.
Kindly help.
DECLARE @XMLVariable XML =
'<?xml version="1.0"?>
<root>
<TextNodeParent>
<TextNodeChild>12345</TextNodeChild>
<TextNodeChild>67890</TextNodeChild>
<TextNodeChild>12389</TextNodeChild>
</TextNodeParent>
</root>'
I need output like this:
12345
67890
12389
You could use the XQuery (i.e. XML query) .nodes() method:
SELECT
TextNodeParent = n.value('.[1]', 'NVARCHAR(max)')
FROM
@XMLVariable.nodes('root/TextNodeParent/*') as p(n)
EDIT: If you want to select only the TextNodeChild node data, change the XML path slightly, as follows:
@XMLVariable.nodes('root/TextNodeParent/TextNodeChild') as p(n)
Result
TextNodeParent
12345
67890
12389
@YogeshSharma's solution works here because you have nothing but <TextNodeChild> elements under your <TextNodeParent> node.
However, if you had various nodes and wanted to extract only the <TextNodeChild> ones and get their values (ignoring all others), you'd have to use something like this instead:
SELECT
TextNodeParent = XC.value('.', 'INT')
FROM
@XMLVariable.nodes('root/TextNodeParent/TextNodeChild') as XT(XC)
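The effect of naming TextNodeChild in the path, rather than using *, can be seen in this Python sketch built on the standard-library ElementTree module. It is an illustration, not SQL Server behavior; the extra <OtherNode> element is hypothetical, added only to show that a named path skips siblings with other names.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<root>
<TextNodeParent>
<TextNodeChild>12345</TextNodeChild>
<TextNodeChild>67890</TextNodeChild>
<OtherNode>ignored</OtherNode>
<TextNodeChild>12389</TextNodeChild>
</TextNodeParent>
</root>"""

root = ET.fromstring(xml_text)  # root is the <root> element itself
# Same path as nodes('root/TextNodeParent/TextNodeChild'), minus the
# leading "root" step because we already stand on <root>.
values = [int(e.text) for e in root.findall("TextNodeParent/TextNodeChild")]
print(values)  # [12345, 67890, 12389]
```

With "TextNodeParent/*" instead, <OtherNode> would be included and int("ignored") would fail, which mirrors the caveat about the wildcard path.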

Selecting columns as XML with namespace

I need to select some columns from a table as XML, with namespaces included, along with other columns as they are. For example, I have the following table layout:
ID C1 X1C1 X1C2 X2C3
1 A 1 2 3
What the query should return is:
ID C1 XmlData
1 A <xmldata1>
2 A <xmldata2>
Where <xmldata1> would be:
<Root xmlns:xsd="w3.org/2001/XMLSchema" xmlns:xsi="w3.org/2001/XMLSchema-instance" xmlns:mst="microsoft.com/wsdl/types/">
<Child attrib="C1">
<ChildNode xsi:type="xsd:integer">1</ChildNode>
</Child>
<Child attrib="C2">
<ChildNode xsi:type="xsd:integer">2</ChildNode>
</Child>
</Root>
and <xmldata2> would be:
<Root xmlns:xsd="w3.org/2001/XMLSchema" xmlns:xsi="w3.org/2001/XMLSchema-instance" xmlns:mst="microsoft.com/wsdl/types/">
<Child attrib="C3">
<ChildNode xsi:type="xsd:integer">3</ChildNode>
</Child>
</Root>
I have a good reference on how to build the XML from this SO question, but I'm not able to put in the namespaces. If this is possible, how do I do it?
Edit:
I've used the following query in an attempt to get the required result:
select 1 ID, 'A' C1, 1 X1C1, 2 X1C2, 3 X2C3
into #t
;with xmlnamespaces('w3.org/2001/XMLSchema' as xsd, 'w3.org/2001/XMLSchema-instance' as xsi, 'microsoft.com/wsdl/types/' as mst)
select ID, C1,
(select
(SELECT 'C1' "@attrib", 'xsd:integer' "ChildValue/@xsi:type", t.X1C1 as 'ChildValue' FOR XML PATH('Child'), type),
(SELECT 'C2' "@name", 'xsd:integer' "ChildValue/@xsi:type", t.X1C2 as 'ChildValue' FOR XML PATH('Child'), type)
FOR XML PATH('Root'), type) as property_data
FROM #t t
drop table #t
Here is the output of its xml part:
<Root xmlns:mst="microsoft.com/wsdl/types/" xmlns:xsi="w3.org/2001/XMLSchema-instance" xmlns:xsd="w3.org/2001/XMLSchema">
<Child xmlns:mst="microsoft.com/wsdl/types/" xmlns:xsi="w3.org/2001/XMLSchema-instance" xmlns:xsd="w3.org/2001/XMLSchema" attrib="C1">
<ChildValue xsi:type="xsd:integer">1</ChildValue>
</Child>
<Child xmlns:mst="microsoft.com/wsdl/types/" xmlns:xsi="w3.org/2001/XMLSchema-instance" xmlns:xsd="w3.org/2001/XMLSchema" name="C2">
<ChildValue xsi:type="xsd:integer">2</ChildValue>
</Child>
</Root>
I can't get rid of the namespaces in the Child node.
I used this solution: TSQL for xml add schema attribute to root node
Basically, I did not put the namespaces in at the beginning. Instead, after generating the required XML structure, I cast the XML to nvarchar(max) and replaced the root node with one carrying the desired namespaces.
I also needed a namespace prefix on an attribute. For that, I used a pseudo attribute name, which I then replaced with the proper XML namespace prefix.
Both operations were done with the T-SQL REPLACE function. Hacky, but I couldn't find a proper alternative.
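The serialize-then-replace workaround can be sketched in Python with the standard-library ElementTree module and str.replace, standing in for the CAST to nvarchar(max) and T-SQL REPLACE. This is an analogue under assumptions, not the poster's code: the placeholder attribute name xsitype is hypothetical, and the namespace URIs used here are the standard W3C ones (the question shows them in shortened form).

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# 1. Build the structure with no namespaces at all; "xsitype" is a
#    hypothetical placeholder attribute standing in for xsi:type.
root = ET.Element("Root")
for attrib_value, number in (("C1", 1), ("C2", 2)):
    child = ET.SubElement(root, "Child", {"attrib": attrib_value})
    value = ET.SubElement(child, "ChildValue", {"xsitype": "xsd:integer"})
    value.text = str(number)

text = ET.tostring(root, encoding="unicode")

# 2. Plain string replacement: declare the namespaces on the root only,
#    then swap the placeholder for the real prefixed attribute name.
text = text.replace(
    "<Root>", f'<Root xmlns:xsd="{XSD}" xmlns:xsi="{XSI}">', 1
).replace(' xsitype="', ' xsi:type="')

print(text)
```

Because the declarations are added only to the root string, the child elements never repeat them, which is the problem the WITH XMLNAMESPACES attempt ran into. The trade-off is that string replacement is fragile if the placeholder text could occur in real data.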
You need to include WITH XMLNAMESPACES, for example:
;with xmlnamespaces('w3.org/2001/XMLSchema' as xsd, 'w3.org/2001/XMLSchema-instance' as xsi, 'microsoft.com/wsdl/types/' as mst)
select ID, C1,
(select
(SELECT 'C1' "@name", t.C1 as 'value' FOR XML PATH('Property'), type),
(SELECT 'C2' "@name", t.C2 as 'value' FOR XML PATH('Property'), type)
FOR XML PATH('data'), type) as property_data
FROM TableName t
Have you tried something like this?
select XML_COL_NAME.value('(/rootNode//childNode/node())[1]', 'nvarchar(64)') from tableName

Do nodes() or OPENXML return rows in the same order as they appear in the XML?

I have an XML document that I need to parse using OPENXML or nodes(). The XML contains a few child tags that repeat with different values, as below.
<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>
For my code, it is very important that I get all these rows returned in the same order as in the XML. I googled and googled, but nothing tells me whether @mp:id is always returned in the same order as in the XML, or whether nodes() returns values in the same order as it encounters them.
All I want to know is whether I can trust either of those two methods to return the rows in the proper order.
P.S. Excuse any errors in the text above; I don't enjoy typing code in an Android window either.
You can use row_number() on the shredded XML, like this:
declare @XML xml=
'<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>'
select value
from
(
select T.N.value('.', 'int') as value,
row_number() over(order by T.N) as rn
from @xml.nodes('/root/value') as T(N)
) as T
order by T.rn
See also: Uniquely Identifying XML Nodes with DENSE_RANK
Update:
You can also use a numbers table, like this:
declare @XML xml=
'<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>';
with N(Number) as
(
select Number
from master..spt_values
where type = 'P'
)
select @XML.value('(/root/value[sql:column("N.Number")])[1]', 'int')
from N
where N.Number between 1 and @XML.value('count(/root/value)', 'int')
order by N.Number
XPath allows you to select nodes explicitly by ordinal: '/root[1]/value[1]' is the first element, '/root[1]/value[2]' is the second, and so on. You could also use '(/root/value)[1]', '(/root/value)[2]', etc. This way you can select exactly the element you want, and selecting element 1, then element 2, then element 3, and so on gives you a controlled order. Slow, but controlled.
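Both ideas in this thread, numbering the shredded rows and picking an element by ordinal, can be illustrated in Python with the standard-library ElementTree module, whose findall() is documented to return matches in document order and whose path syntax accepts a [position] predicate. This sketch says nothing about what SQL Server's nodes() or OPENXML guarantee; it only shows the two techniques in a setting where the order is documented.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><value>10</value><value>12</value><value>11</value>"
    "<value>1</value><value>15</value></root>"
)

# findall() returns matches in document order; enumerate() then adds an
# explicit 1-based position, much like the row_number() answer.
numbered = [(pos, int(e.text)) for pos, e in enumerate(root.findall("value"), start=1)]

# A positional predicate picks a single element by ordinal, like value[2].
second = int(root.find("value[2]").text)

print(numbered)  # [(1, 10), (2, 12), (3, 11), (4, 1), (5, 15)]
print(second)    # 12
```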
Updated P.S.: Wouldn't it be nice if this were true?
declare @x xml = '<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>';
select x.value(N'position()', N'int') as position,
x.value(N'.', 'int') as value
from @x.nodes(N'//root/value') t(x)
Unfortunately, it is not...
Msg 2371, Level 16, State 1, Line 9
XQuery [value()]: 'position()' can only be used within a predicate or XPath selector
And the existence of this error makes me worry that order may be broken sometimes...

How do I select a top-level attribute of an XML column in SQL Server?

I have an XML column in SQL Server that is the equivalent of:
<Test foo="bar">
<Otherstuff baz="belch" />
</Test>
I want to get the value of the foo attribute of Test (the root element) as a varchar. My goal would be something along the lines of:
SELECT CAST('<Test foo="bar"><Otherstuff baz="belch" /></Test>' AS xml).value('@foo', 'varchar(20)') AS Foo
But when I run the above query, I get the following error:
Msg 2390, Level 16, State 1, Line 1
XQuery [value()]: Top-level attribute nodes are not supported
John Saunders has it almost right :-)
declare @Data XML
set @Data = '<Test foo="bar"><Otherstuff baz="belch" /></Test>'
select @Data.value('(/Test/@foo)[1]','varchar(20)') as Foo
This works for me (SQL Server 2005 and 2008)
Marc
If you don't know the root element:
select @Data.value('(/*/@foo)[1]','varchar(20)') as Foo
Why does .value('@foo', 'varchar(20)') generate the error “Top-level attribute nodes are not supported”?
When you query the xml data type, the context is the document node, which is an implicit node that contains the root element(s) of your XML document. The document node has no name and no attributes.
How can I get the value of an attribute on the root element?
In your XQuery expression, include the path to the first root element:
DECLARE @Data xml = '<Customer ID="123"><Order ID="ABC" /></Customer>'
SELECT @Data.value('Customer[1]/@ID', 'varchar(20)')
-- Result: 123
If you don’t know (or don’t want to specify) the name of the root element, then just use * to match any element:
SELECT @Data.value('*[1]/@ID', 'varchar(20)')
-- Result: 123
Because the query context is the document node, you don’t need to prefix the XQuery expression with a forward slash (as the other answers unnecessarily do).
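For contrast, Python's standard-library ElementTree has no document-node context: ET.fromstring() returns the root element itself, so its attributes are read directly. Wrapping the XML in a dummy parent (the name doc is hypothetical) recreates a context above the root, where "*" matches the root whatever its name, much like *[1] above. This is an analogy only, not SQL Server behavior.

```python
import xml.etree.ElementTree as ET

# fromstring() hands back the root *element*, not a document node.
root = ET.fromstring('<Customer ID="123"><Order ID="ABC" /></Customer>')
print(root.tag, root.get("ID"))  # Customer 123

# Wrapping in a dummy parent mimics the document-node context; "*" then
# matches the root element whatever its name, like *[1]/@ID.
doc = ET.fromstring('<doc><Customer ID="123"><Order ID="ABC" /></Customer></doc>')
print(doc.find("*").get("ID"))  # 123
```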
Why do I have to include [1]?
The XQuery expression you pass to value() must be guaranteed to return a singleton. The expression Customer/@ID doesn’t satisfy this requirement because it matches both ID="123" and ID="456" in the following example:
DECLARE @Data xml = '<Customer ID="123" /><Customer ID="456" />'
Remember that the xml data type represents an XML document fragment, not an XML document, so it can contain multiple root elements.
What’s the difference between Customer[1]/@ID and (Customer/@ID)[1]?
The expression Customer[1]/@ID retrieves the ID attribute of the first <Customer> element.
The expression (Customer/@ID)[1] retrieves the ID attribute of all <Customer> elements, and from this list of attributes, picks the first.
The following example demonstrates the difference:
DECLARE @Data xml = '<Customer /><Customer ID="123" /><Customer ID="456" />'
SELECT @Data.value('Customer[1]/@ID', 'varchar(20)')
-- Result: NULL (because the first Customer element doesn't have an ID attribute)
SELECT @Data.value('(Customer/@ID)[1]', 'varchar(20)')
-- Result: 123
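The same two readings can be spelled out step by step in Python with the standard-library ElementTree module. ElementTree parses only a single root, so the multi-root fragment is first wrapped in a hypothetical <doc> element; the rest is a sketch of the two evaluation orders, not SQL Server's engine.

```python
import xml.etree.ElementTree as ET

fragment = '<Customer /><Customer ID="123" /><Customer ID="456" />'
# ElementTree needs a single root, so wrap the fragment first.
doc = ET.fromstring("<doc>" + fragment + "</doc>")

# Customer[1]/@ID: take the first <Customer>, then read its ID (if any).
first_customer_id = doc.find("Customer[1]").get("ID")

# (Customer/@ID)[1]: collect the ID of every <Customer> that has one,
# then take the first item of that list.
ids = [c.get("ID") for c in doc.findall("Customer") if c.get("ID") is not None]
first_id = ids[0] if ids else None

print(first_customer_id, first_id)  # None 123
```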