Build New XML From Stored XML Value - sql

We store rather large XML blobs (in a column of XML type), and I'm pursuing a skunkworks project to build up a subset of the XML on the fly when needed.
Let's say I have this XML blob stored in our database table in a given column:
<root>
<header>
<id>1</id>
<name id="foo">Name</name>
</header>
<body>
<items>
<addItem>
<val>1</val>
</addItem>
<observeItem>
<val>2</val>
</observeItem>
</items>
</body>
</root>
What I want is to recreate the above document structure but include only one of the items children, for example:
<root>
<header>
<id>1</id>
<name id="foo">Name</name>
</header>
<body>
<items>
<observeItem>
<val>2</val>
</observeItem>
</items>
</body>
</root>
That is the result I'd want if I were interested in just the observeItem record (the items element can have any number of children, but I'll only ever be interested in a single one of them).
I know I can do something like SELECT @XML.query('//items/child::*[2]') to get just a given child item, but how would I build up the full original document in a query with just one of those children?

I've come up with a solution, but I'm not entirely pleased with it:
DECLARE @XML XML = '
<root>
<header>
<id>1</id>
<name id="foo">Name</name>
</header>
<body>
<items>
<addItem>
<val>1</val>
</addItem>
<observeItem>
<val>2</val>
</observeItem>
</items>
</body>
</root>'
DECLARE @NthChild INT = 2
SELECT
@XML.query('//header'),
@XML.query('//items/child::*[sql:variable("@NthChild")]') AS 'items'
FOR XML PATH('root')
I don't like having to specify the root explicitly nor the items, but I think this approach could get me by.
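One possible alternative, offered as a sketch rather than a tested drop-in: instead of rebuilding the document piece by piece, copy it into a variable and delete the unwanted children with modify(). That way neither root nor items has to be named in the output. The variable name @Subset and the explicit path /root/body/items are assumptions made for this illustration; when the XML lives in a table column, it would first have to be selected into a variable, because modify() can only be used in SET or UPDATE.

```sql
DECLARE @XML XML = N'
<root>
  <header>
    <id>1</id>
    <name id="foo">Name</name>
  </header>
  <body>
    <items>
      <addItem><val>1</val></addItem>
      <observeItem><val>2</val></observeItem>
    </items>
  </body>
</root>';
DECLARE @NthChild INT = 2;

-- Work on a copy (hypothetical name) so the original value is left untouched.
DECLARE @Subset XML = @XML;

-- Delete every child of <items> except the one at position @NthChild.
SET @Subset.modify('delete /root/body/items/*[position() != sql:variable("@NthChild")]');

SELECT @Subset AS SubsetXml;
```

With the sample above this should keep header and body as they are and leave observeItem as the only child of items.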

Related

Retrieving All instances of a 3rd level XML field from an XML column

I have an XML data field in one of my tables that essentially looks like this:
<App xmlns='http://Namespace1'>
<Package xmlns='http://Namespace2'>
<Item>
<ItemDetails xmlns='http://Namespace3'>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
other_item_stuff
</Item>
<Item>
<ItemDetails>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
</Item>
...
</Package>
</App>
I need to get all of the ItemNameValues from the XML.
I have tried to adapt many examples found on the web to my purpose, but have failed miserably. The best I seem to be able to do is get one ItemName per Package.
I think that CROSS APPLY is where I need to go, but the syntax to retrieve all the itemdetail.itemname eludes me.
This is my latest failure (returns nothing):
WITH XMLNAMESPACES(
'http://Namespace1' AS xsd,
'http://www.w3.org/2001/XMLSchema-instance' AS xsi,
'http://Namespace2' AS ns1,
'http://Namespace3' AS ns2)
SELECT Items.d.value('(ns2:ItemDetails/ItemName/text())[1]','varchar(200)') AS ItemName
FROM MyTable
CROSS APPLY XMLDataColumn.nodes('/xsd:App/ns1:Package/ns1:Item') Items(d)
I hope to get several records from each XML field, but can only ever get the first element.
The biggest problem in this issue is the XML itself:
<App xmlns="http://Namespace1">
<Package xmlns="http://Namespace2">
<Item>
<ItemDetails xmlns="http://Namespace3">
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
other_item_stuff
</Item>
<Item>
<ItemDetails>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
</Item>
...
</Package>
</App>
Two major Problems:
The namespaces are all declared as default namespaces (they do not include a prefix). All nodes within a node share the same default namespace, if there is nothing else stated explicitly.
The first <ItemDetails> is living within namespace http://Namespace3, while the second <ItemDetails> is living within namespace http://Namespace2 (inherited from <Package>)
That means: If you can - by any chance - change the construction of the XML, try to do this first.
If you have to deal with this, you can try this clean, but clumsy approach.
WITH XMLNAMESPACES(
'http://Namespace1' AS ns1,
'http://www.w3.org/2001/XMLSchema-instance' AS xsi,
'http://Namespace2' AS ns2,
'http://Namespace3' AS ns3)
SELECT COALESCE(Items.d.value('(ns2:ItemDetails/ns2:ItemName/text())[1]','varchar(200)')
,Items.d.value('(ns3:ItemDetails/ns3:ItemName/text())[1]','varchar(200)')) AS ItemName
FROM @xml.nodes('/ns1:App/ns2:Package/ns2:Item') Items(d);
Another approach is to use a namespace wildcard, but be aware of ambiguous names...
SELECT Items.d.value('(*:ItemDetails/*:ItemName/text())[1]','varchar(200)') AS ItemName
FROM @xml.nodes('/*:App/*:Package/*:Item') Items(d)
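Since the question reads from a table column rather than a variable, here is a minimal sketch of the wildcard approach combined with CROSS APPLY. The table variable @MyTable and the two item names are stand-ins invented for the example; only the column name XMLDataColumn comes from the question.

```sql
-- Hypothetical stand-in for the real table.
DECLARE @MyTable TABLE (Id INT IDENTITY PRIMARY KEY, XMLDataColumn XML NOT NULL);

INSERT INTO @MyTable (XMLDataColumn) VALUES (N'
<App xmlns="http://Namespace1">
  <Package xmlns="http://Namespace2">
    <Item>
      <ItemDetails xmlns="http://Namespace3">
        <ItemName>First</ItemName>
      </ItemDetails>
    </Item>
    <Item>
      <ItemDetails>
        <ItemName>Second</ItemName>
      </ItemDetails>
    </Item>
  </Package>
</App>');

-- nodes() yields one row per <Item>; the *: wildcard ignores which
-- default namespace each element happens to live in.
SELECT t.Id,
       Items.d.value('(*:ItemDetails/*:ItemName/text())[1]', 'varchar(200)') AS ItemName
FROM @MyTable AS t
CROSS APPLY t.XMLDataColumn.nodes('/*:App/*:Package/*:Item') AS Items(d);
```

This should return one row per Item element for every row of the table, which is the "several records from each XML field" the question asks for.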

U-SQL with XmlExtractor - elements inside elements

In U-SQL I am trying to get a list of elements inside elements, using the XmlExtractor. But I cannot get the nested collection.
It is a list of items, which has locations. With the XmlExtractor I can get a collection of elements, but I don't see how I can get a collection that contains a collection. An XML sample is shown below.
Any ideas?
<root>
<Item>
<Header>
<id>111</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
<Item>
<Header>
<id>222</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
</root>
Solved by making an extractor that takes the XML as one string and then calls a method using XPath, returning an SQL.Array, where the string has comma-separated values of the result. The result looks like this:
111;k4,2017-08-30T02:04:18.2506945+02:00
111;k5,2017-08-30T02:04:18.2506945+02:00
222;k4,2017-08-30T02:12:36.1218601+02:00
222;k5,2017-08-30T02:12:36.1218601+02:00
The standard XmlExtractor cannot do this, and I also decided that it is better to postpone the parsing of the XML until after it has been extracted, because there can be multiple steps on the same XML.
Azure SQL Database has powerful abilities to shred XML. Maybe if this is already in your estate/architecture it might make a simple alternative to custom code? A simple example:
DECLARE @xml XML = '<root>
<Item>
<Header>
<id>111</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
<Item>
<Header>
<id>222</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
</root>'
/*
111;k4,2017-08-30T02:04:18.2506945+02:00
111;k5,2017-08-30T02:04:18.2506945+02:00
222;k4,2017-08-30T02:12:36.1218601+02:00
222;k5,2017-08-30T02:12:36.1218601+02:00
*/
SELECT
r.c.value('(Header/id/text())[1]', 'int' ) id,
b.c.value('(Station/text())[1]', 'varchar(10)' ) station,
b.c.value('(Timestamp/text())[1]', 'varchar(40)' ) [timestamp],
b.c.value('(Timestamp/text())[1]', 'datetimeoffset' ) [timestamp2]
FROM @xml.nodes('root/Item') r(c)
CROSS APPLY r.c.nodes('Body/Locations/Location') b(c)
You can do something similar if the XML is stored in a table also.
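A rough sketch of the table-based variant, under the assumption that each row holds one document. The table variable @docs and its columns docId and payload are hypothetical names, and the XML is a shortened copy of the sample.

```sql
-- Hypothetical table holding one XML document per row.
DECLARE @docs TABLE (docId INT IDENTITY PRIMARY KEY, payload XML NOT NULL);

INSERT INTO @docs (payload) VALUES (N'
<root>
  <Item>
    <Header><id>111</id></Header>
    <Body>
      <Locations>
        <Location>
          <Station>k4</Station>
          <Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
        </Location>
        <Location>
          <Station>k5</Station>
          <Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
        </Location>
      </Locations>
    </Body>
  </Item>
</root>');

-- First CROSS APPLY: one row per <Item> in each stored document.
-- Second CROSS APPLY: one row per <Location> inside that item.
SELECT d.docId,
       i.c.value('(Header/id/text())[1]', 'int')            AS id,
       l.c.value('(Station/text())[1]', 'varchar(10)')      AS station,
       l.c.value('(Timestamp/text())[1]', 'datetimeoffset') AS [timestamp]
FROM @docs AS d
CROSS APPLY d.payload.nodes('root/Item') AS i(c)
CROSS APPLY i.c.nodes('Body/Locations/Location') AS l(c);
```

The only real difference from the variable version is that the first nodes() call hangs off the column through CROSS APPLY instead of appearing directly in the FROM clause.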
Here is a script that achieves the desired results using the extractors provided.
USE master;
REFERENCE SYSTEM ASSEMBLY [System.Xml];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats.Xml];
@e = EXTRACT a string, b string
FROM "CollectTest.xml"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath:"Item",
columnPaths:new SQL.MAP<string, string> { {"Header", "a"}, {"Body", "b"} });
@f = SELECT @e.a, t.c, t.d
FROM @e
CROSS APPLY new Microsoft.Analytics.Samples.Formats.Xml.XmlApplier("b","Location", new SQL.MAP<string,string> { {"Station", "c"}, {"Timestamp", "d"} }) AS t(c string, d string);
OUTPUT @f TO "foo.txt" USING Outputters.Tsv(outputHeader:true);
OUTPUT @e TO "foo2.txt" USING Outputters.Tsv(outputHeader:true);
The first rowset @e uses the XmlDomExtractor to create a row set containing "ID" in col a and the child XML code in col b.
The second rowset @f then uses XmlApplier to extract the values from the nested XML code and cross apply it to the correct rows. The sample XML was copied from the post above and saved in the USQLDataRoot folder as "CollectTest.xml."
Note: Got lazy and the output for Header contains some unwanted node syntax, but adding an intermediate XPath or XmlApplier step between @e and @f should solve this.

reading xml file with namespace

We need to read an XML file in SQL Server, but we are having problems because the XML has a namespace. I have tried several solutions, but I can't resolve the problem.
The XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Status:orders xmlns:Status="http://www.test.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.test.com Status.xsd">
<order>
<Header>
<Name>500039</Name>
<Letter>A</Letter>
</Header>
</order>
</Status:orders>
Can you help with how to retrieve the values of the Name and Letter tags?
Thanks in advance.
Your friend is called WITH XMLNAMESPACES...
Try it like this
DECLARE @xml XML=
'<?xml version="1.0" encoding="UTF-8"?>
<Status:orders xmlns:Status="http://www.test.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.test.com Status.xsd">
<order>
<Header>
<Name>500039</Name>
<Letter>A</Letter>
</Header>
</order>
</Status:orders>';
WITH XMLNAMESPACES('http://www.test.com' AS Status)
SELECT @xml.value('(/Status:orders/order/Header/Name)[1]','int')
,@xml.value('(/Status:orders/order/Header/Letter)[1]','varchar(max)');
An alternative is to use the asterisk as a namespace wildcard:
SELECT @xml.value('(/*:orders/order/Header/Name)[1]','int')
,@xml.value('(/*:orders/order/Header/Letter)[1]','varchar(max)');
Another alternative is this:
SELECT @xml.value('(//Name)[1]','int')
,@xml.value('(//Letter)[1]','varchar(max)');
But in general it is good advice, to be as specific as possible...

sql + xquery to delete multiple parent nodes

In the XML below I have 5 /Item elements, 4 of which have a Blob child element. I want to delete the elements that have a child Blob element, but only where Item/@Name has the text "Blob" in it.
<Items>
<Item Name="Blob123">
<Blob/>
</Item>
<Item Name="Blob124">
<Blob/>
</Item>
<Item Name="Blob125">
<Blob/>
</Item>
<Item Name="Blob126">
</Item>
<Item Name="Xyz126">
<Blob/>
</Item>
</Items>
This query returns the 3 /Item elements named 'Blob%' that have a child /Blob element just fine.
select xmlVal.query('(/Items/Item[contains(@Name, "Blob")]/Blob/..)')
However, when I attempt to delete those elements using this XQuery:
select xmlVal.modify('delete (/Items/Item[contains(@Name, "Blob")]/Blob/..)')
I get: Incorrect use of the XML data type method 'modify'. A non-mutator method is expected in this context.
What am I doing wrong?
In case it helps others: to fix this I needed to use update/set, and I also needed to change the XPath slightly:
update table1
set xmlVal.modify('delete (/Items/Item[contains(@Name, "Blob")][Blob])')
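For anyone who wants to try the fix in isolation, here is a small self-contained sketch. The table variable @table1 is a hypothetical stand-in for the real table; the XML and the delete expression are the ones from the question.

```sql
-- Hypothetical stand-in for table1.
DECLARE @table1 TABLE (id INT IDENTITY PRIMARY KEY, xmlVal XML NOT NULL);

INSERT INTO @table1 (xmlVal) VALUES (N'
<Items>
  <Item Name="Blob123"><Blob/></Item>
  <Item Name="Blob124"><Blob/></Item>
  <Item Name="Blob125"><Blob/></Item>
  <Item Name="Blob126"></Item>
  <Item Name="Xyz126"><Blob/></Item>
</Items>');

-- modify() is a mutator, so it has to appear in an UPDATE ... SET
-- (or in SET for a variable), never in a SELECT list.
UPDATE @table1
SET xmlVal.modify('delete /Items/Item[contains(@Name, "Blob")][Blob]');

-- Inspect what is left.
SELECT xmlVal FROM @table1;
```

The two predicates are applied one after the other: the first keeps items whose Name contains "Blob", the second keeps only those that also have a Blob child. With the sample data, the items named Blob126 and Xyz126 should be the ones that remain.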

Not able to get desired output while generating xml files from SQL query from SQL Server

I am executing this query
select category "ROOT/category",
question "Category/question",
option1 "Category/option1"
from testDB2 for XML PATH ('ROOT') , ELEMENTS
Presently the database has three entries and the xml file i get is this
<ROOT>
<ROOT>
<category>maths</category>
</ROOT>
<Category>
<question>2+2?</question>
<option1>1</option1>
</Category>
</ROOT>
<ROOT>
<ROOT>
<category>maths</category>
</ROOT>
<Category>
<question>100*0</question>
<option1>0</option1>
</Category>
</ROOT>
<ROOT>
<ROOT>
<category>chemistry</category>
</ROOT>
<Category>
<question>H2O?</question>
<option1>water </option1>
</Category>
</ROOT>
I do not want this. I want a file with just one main parent node and the rest as its children. Each child can be a parent for other child nodes, but there should be just one single main parent node. In this case each row is a separate parent and there is no single main parent.
I hope I have explained my question properly. Thanks
Try something like this:
select category,
question,
option1
from testdb2
for xml raw('Category'), elements, root('Categories')
for xml raw: this will make a node for each row in your table, with every column as an attribute of that node
for xml raw('Category'): this is the same as for xml raw, but you specify the name of the nodes
for xml raw('Category'), elements: you switch from an attribute view to a node view; every column will be a node in your row node
root('Categories'): you can use this to name your parent root
Hope this helps
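To see the difference between the attribute and element forms side by side, something like the following sketch can be run as is. The table variable @testDB2 and its column sizes are assumptions; the rows mirror the three entries shown in the question.

```sql
-- Hypothetical stand-in for testDB2 with the three rows from the question.
DECLARE @testDB2 TABLE (
    category VARCHAR(50),
    question VARCHAR(100),
    option1  VARCHAR(50)
);

INSERT INTO @testDB2 (category, question, option1) VALUES
    ('maths',     '2+2?',  '1'),
    ('maths',     '100*0', '0'),
    ('chemistry', 'H2O?',  'water');

-- Attribute-centric: one <Category .../> per row, columns become attributes.
SELECT category, question, option1
FROM @testDB2
FOR XML RAW('Category'), ROOT('Categories');

-- Element-centric: one <Category> per row, columns become child elements.
SELECT category, question, option1
FROM @testDB2
FOR XML RAW('Category'), ELEMENTS, ROOT('Categories');
```

In both cases ROOT('Categories') wraps all rows in a single top-level element, which is the one main parent node the question asks for.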
I think you want to use the FOR XML AUTO mode to shape your output.
Not 100% sure what it is you really want, but how about this:
SELECT
category '@Name',
question "Category/question",
option1 "Category/option1"
FROM
dbo.testDB2
FOR XML PATH('Category'), ROOT('ROOT')
Does that get closer to what you want? If I'm not mistaken (can't test right now), this should give you something like:
<ROOT>
<Category Name="maths">
<question>100*0</question>
<option1>0</option1>
</Category>
<Category Name="chemistry">
<question>H2O?</question>
<option1>water </option1>
</Category>
</ROOT>
If not, could you post a few sample rows of data and what you expect to get from your SELECT in the end?
Marc