Select from list of xml elements with SQL Server - sql

Let's say I have this xml -
<book>
<author>Gambardella, Matthew</author>
<title>XML Developers Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book>
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
I need a T-SQL query to select one book at a time from this list, because I need to do some kind of processing with that before I insert it into a table.
Let's say I want to select the second element in this list and want to display it as a table.
How can I do that?

Personally, I would shred the entire XML document into a table first, and if I really couldn't do set-based operations on the rows, iterate over the table using something like a row number and a loop, or a cursor.
If you just explicitly want to access the nth member of an XML document, you can do so in the XPath expression. How you'd know how many elements you need to iterate over, though, I'm not sure. Here's an example of grabbing the 2nd book node:
-- Assumes you've assigned the XML document you provided to an XML variable called @xml
SELECT t.c.value('author[1]', 'varchar(50)')
FROM @xml.nodes('/book[2]') AS t(c)
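For comparison outside SQL Server, here is a minimal Python sketch of the same two ideas: picking the nth book with a positional XPath predicate, and shredding every book into rows. It is only an illustration, not a T-SQL replacement. The <books> wrapper element and the book_to_row helper are assumptions added for the example, because Python's ElementTree needs a single root element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data. The <books> wrapper is an assumption
# added here because ElementTree requires a single root element.
XML = """
<books>
  <book><author>Gambardella, Matthew</author><title>XML Developers Guide</title><price>44.95</price></book>
  <book><author>Ralls, Kim</author><title>Midnight Rain</title><price>5.95</price></book>
  <book><author>Corets, Eva</author><title>Maeve Ascendant</title><price>5.95</price></book>
</books>
"""

root = ET.fromstring(XML)


def book_to_row(book):
    """Flatten one <book> element into an (author, title, price) tuple."""
    return (
        book.findtext("author"),
        book.findtext("title"),
        float(book.findtext("price")),
    )


# XPath positions are 1-based, so book[2] is the second book.
second = book_to_row(root.find("book[2]"))

# "Shred" the whole document into rows, one per book.
rows = [book_to_row(b) for b in root.findall("book")]
```

Once the rows exist as plain tuples, any per-book processing can happen before the insert, which is the same reason for shredding into a table first on the SQL side.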

Related

SQL SERVER xml with CDATA

I have a table in my database with a column containing xml. The column type is nvarchar(max). The xml is formed in this way
<root>
<child>....</child>
.
.
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
...
</special>
</root>
I have not created the db, I cannot change the way information is stored in it but I can retrieve it with a select. For the extraction I use
SELECT CAST(REPLACE(xml, 'utf-8', 'utf-16') AS xml)
FROM table
It works well except for CDATA, whose content in the query output is: text-&gt;text
Is there a way to retrieve the CDATA tags as well?
Well, this is, as far as I know, not possible in any normal way...
The CDATA section has one sole purpose: including otherwise invalid characters within XML for lazy people...
CDATA is not seen as necessary at all and is therefore not really supported by the normal XML methods. In other words, it is supported in the sense that the content is properly escaped. There is actually no difference between correctly escaped content and unescaped content within CDATA! (Okay, there are some minor differences, like including ]]> within a CDATA section, and a few other small peculiarities...)
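The equivalence is not specific to SQL Server. As a small sketch in Python (used here only because it is easy to run, and with variable names made up for the example), the standard library parser returns the same character data for both forms and drops the CDATA wrapper when serializing again:

```python
import xml.etree.ElementTree as ET

# The same event text, once wrapped in CDATA and once escaped by hand.
cdata = ET.fromstring("<event><![CDATA[text->text]]></event>")
escaped = ET.fromstring("<event>text-&gt;text</event>")

# Both forms parse to the same character data.
same_text = cdata.text == escaped.text

# Serializing again yields the escaped form; the CDATA wrapper is gone.
round_tripped = ET.tostring(cdata, encoding="unicode")
```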
The big question is: Why?
What are you trying to do with this afterwards?
Try this. The included text is returned as is:
DECLARE @xml XML =
'<root>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>'
SELECT t.c.query('text()')
FROM @xml.nodes('/root/special/event') t(c);
So, please explain in some more detail: what do you really want?
If you really need nothing more than the wrapping CDATA, you might use this:
SELECT '<![CDATA[' + t.c.value('.','varchar(max)') + ']]>'
FROM @xml.nodes('/root/special/event') t(c);
Update: Same with outdated FROM OPENXML
I just tried how the outdated approach with FROM OPENXML handles this and found that there is absolutely no indication in the result set that the given text was originally within a CDATA section. The "Some value here" is returned in exactly the same way as the text within CDATA:
DECLARE @doc XML =
'<root>
<child>Some value here </child>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>';
DECLARE @hnd INT;
EXEC sp_xml_preparedocument @hnd OUTPUT, @doc;
SELECT * FROM OPENXML (@hnd, '/root',0);
EXEC sp_xml_removedocument @hnd;
This is how to include CDATA on child nodes in XML using pure SQL. It's not ideal, though.
SELECT 1 AS tag,
null AS parent,
'10001' AS 'Customer!1!Customer_ID!Element',
'AirBallon Captain' AS 'Customer!1!Title!cdata',
'Customer!1' = (
SELECT
2 AS tag,
NULL AS parent,
'Wrapped in cdata, using explicit' AS 'Location!2!Title!cdata'
FOR XML EXPLICIT)
FOR XML EXPLICIT, ROOT('Customers')
CDATA is included, but the child element is encoded using &gt; instead of >.
That is so weird from a sensible point of view. I'm sure there are technical explanations, but they are stupid, because the FOR XML specification makes no such distinction.
You could add the TYPE option to the inner child node, but then you lose the CDATA too.
But why, oh why, would you (Microsoft) remove CDATA when I just added it?
<Customers>
<Customer>
<Customer_ID>10001</Customer_ID>
<Title><![CDATA[AirBallon Captain]]></Title>
<Location>
<Title><![CDATA[wrapped in cdata, using explicit]]></Title>
</Location>
</Customer>
</Customers>

Storing XML into Postgres

I have an XML document that needs to get stored in an SQL db (Postgres).
I've already seen how that's done, but I have a question: do I just create a single table with a xml field and place the whole document there? This is a document about movies and so (movies, actors...) that has information to be later retrieved.
I've never worked with XML in databases, so I'm a little confused.
Here's an example of my XML:
<?xml version="1.0" encoding="UTF-8"?>
<cinema xmlns="movies"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="movies file:/C:/Users/Fabio/git/LAPD/movies.xsd">
<persons>
<person id="P1">
<name>Samuel L. Jackson</name>
<birth>1948-12-21</birth>
</person>
<person id="P2">
<name>Leonardo Di Caprio</name>
<birth>1974-11-11</birth>
</person>
<person id="P3">
<name>Quentin Tarantino</name>
<birth>1963-03-27</birth>
</person>
</persons>
<movies>
<movie id="M1">
<title>Pulp Fiction</title>
<length>154</length>
<year>1994</year>
<description>The lives of two mob hit men,
a boxer, a gangster's wife, and a pair
of diner bandits intertwine in four tales of violence and redemption</description>
<crew>
<director ref="P3"/>
<writer ref="P3"/>
</crew>
<cast>
<actor ref="P1"/>
</cast>
<rate>
<imdb>8.9</imdb>
<rottentomatoes>9</rottentomatoes>
<moviedb>7.8</moviedb>
<average>8.57</average>
</rate>
<numOscars>1</numOscars>
</movie>
<movie id="M2">
<title>Django Unchained</title>
<length>165</length>
<year>2012</year>
<description>With the help of a German bounty hunter,
a freed slave sets out to rescue his wife
from a brutal Mississippi plantation owner.</description>
<crew>
<director ref="P3"/>
<writer ref="P3"/>
</crew>
<cast>
<actor ref="P1"/>
<actor ref="P2"/>
</cast>
<rate>
<imdb>8.5</imdb>
<rottentomatoes>8</rottentomatoes>
<moviedb>7.4</moviedb>
<average>7.97</average>
</rate>
<numOscars>2</numOscars>
</movie>
</movies>
You can store a whole XML document as value in a single xml column or you can extract data and store it in a more or less normalized form.
Which is better, depends on all the details of your application that are unknown to us.
Here is a related answer discussing pros and cons of storing document types vs. db normalization:
Does JSONB make PostgreSQL arrays useless?
Save the XML in a text column of the DB so that you can also apply the equality operator easily. You may get errors on insertion because of " or ' characters, so try replacing them with other characters such as ~ or `.
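An alternative to replacing quote characters is to pass the document as a bound parameter, so the driver handles the quoting. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a Postgres connection, with a hypothetical documents table. With a Postgres driver such as psycopg the placeholder would be %s rather than ?, but the idea is the same.

```python
import sqlite3

# Sample document containing both kinds of quote characters.
doc = """<movie id="M1"><title>Pulp Fiction</title><description>a gangster's wife</description></movie>"""

# In-memory SQLite is only a stand-in here; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

# The placeholder binds the value, so quotes in the document need no escaping.
conn.execute("INSERT INTO documents (body) VALUES (?)", (doc,))

# Equality comparison on the text column also works with a bound parameter.
stored = conn.execute(
    "SELECT body FROM documents WHERE body = ?", (doc,)
).fetchone()[0]
```

The stored text is identical to the input, so nothing has to be converted back when the document is read.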

How to Extract Text Node Value From Payload Using XPath in MEL

I want to extract some text values from a message's XML payload so that I can use them in a JDBC query.
Given the test XML file below, I want to obtain the string value of the first book's author text node.
Something like:
INSERT INTO books VALUES (#[xpath('/catalog/book[0]/author/text()')])
To test the expression I am just using a logger but can't seem to get it to extract correctly.
<logger message="#[xpath('/catalog/book[0]/author/text()')]" level="DEBUG" doc:name="Logger"/>
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
</book>
</catalog>
Here is the correct MEL expression:
#[xpath('/catalog/book[1]/author/text()').text]
Note that in XPath, the first node is 1, not 0.
David's answer worked fine except for one thing. For me, .text didn't work. I had to use .wholeText. This is probably because I'm using a Xerces implementation somewhere in my work project.

Import Xml nodes as Xml column with SSIS

I'm trying to use the XML Source to shred an XML source file; however, I do not want the entire document shredded into tables. Rather, I want to import the XML nodes as rows of XML.
A simplified example would be to import the document below into a table called "people" with a column called "person" of type "xml". Looking at the XML Source, it seems to be suited to shredding the source XML into multiple records, which is not quite what I'm looking for.
Any suggestions?
<people>
<person>
<name>
<first>Fred</first>
<last>Flintstone</last>
</name>
<address>
<line1>123 Bedrock Way</line1>
<city>Drumheller</city>
</address>
</person>
<person>
<!-- more of the same -->
</person>
</people>
I didn't think that SSIS 2005 supported the XML datatype at all. I suppose it "supports" it as DT_NTEXT.
In any case, you can't use the XML Source for this purpose. You would have to write your own source component. That's not actually as hard as it sounds. Base it on the examples in Books Online. The processing would consist of moving to the first child node, then calling XmlReader.ReadSubtree to return a new XmlReader over just the next <person/> element. Then use your favorite XML API to read the entire <person/>, convert the resulting XML to a string, and pass it along down the pipeline. Repeat for all <person/> nodes.
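The per-node loop described above can be sketched outside SSIS. The following Python is only an illustration of the idea (one serialized <person> string per output row), not the .NET component itself. The person_rows helper and the second sample person are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample input; the second person is invented data for the example.
SOURCE = """<people>
  <person><name><first>Fred</first><last>Flintstone</last></name>
    <address><line1>123 Bedrock Way</line1><city>Drumheller</city></address></person>
  <person><name><first>Barney</first><last>Rubble</last></name></person>
</people>"""


def person_rows(xml_text):
    """Yield each <person> subtree serialized back to an XML string."""
    for person in ET.fromstring(xml_text).findall("person"):
        # strip() drops the whitespace that followed the element in the source.
        yield ET.tostring(person, encoding="unicode").strip()


# Each string would become one row in the "person" column.
rows = list(person_rows(SOURCE))
```

Each element of rows is a well-formed fragment that could be inserted into an xml column as is.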
Could you perhaps change your xml output so that the content of person is seen as a string? Use escape chars for the <>.
You could use a script task to parse it as well, I'd imagine.

xml parsing in iPhone and getting other tags with same names

Let me try to explain as clearly as possible what I mean with this question.
The XML I have to parse does not look like this:
<Books>
<Book id="1">
<title>Circumference</title>
<author>Nicholas Nicastro</author>
<summary>Eratosthenes and the Ancient Quest to Measure the Globe.</summary>
</Book>
<Book id="2">
<title>Copernicus Secret</title>
<author>Jack Repcheck</author>
<summary>How the scientific revolution began</summary>
</Book>
</Books>
It will look like this
<Books>
<Book id="1">
<title>Circumference</title>
<author>Nicholas Nicastro</author>
<summary id ='1'>Eratosthenes and the Ancient Quest to Measure the Globe.</summary>
<summary id ='2'>Eratosthenes more info in another tag.</summary>
<summary id ='3'>Eratosthenes and again another tag.</summary>
<summary id ='4'>Eratosthenes and the final tag another one here</summary>
</Book>
<Book id="2">
<title>Copernicus Secret</title>
<author>Jack Repcheck</author>
<summary id ='1'>How the scientific revolution began</summary>
<summary id ='2'>Eratosthenes more info in another tag.</summary>
<summary id ='3'>Eratosthenes and again another tag.</summary>
<summary id ='4'>Eratosthenes and the final tag another one here</summary>
</Book>
</Books>
Now, if I follow the instructions on the site listed above, they don't explain how to handle summaries 2, 3 and 4 (the XML I need to parse looks like that) or how I can show their output. All I get is the last line. Does anyone have an idea how I can get the other ones as well? It seems to show only the last one, since that's probably the last value in currentElementValue.
I'm a bit confused: would I have to address the attribute here as well, or should I create a new search tag in my parser?
I think this is what you need to look at: you could grab the value of the id attribute from the element's attributes and assign it to a variable that you can use later.
So I might have something like this in my didStartElement (where attributes is a variable declared in the header):
if ([elementName isEqualToString:@"summary"]) {
    // Remember the attributes of the <summary> element being parsed
    attributes = attributeDict;
}
Then something like this in my foundCharacters:
if ([[attributes valueForKey:@"id"] intValue] == 1) {
    // doSomething with the first summary
} else if ([[attributes valueForKey:@"id"] intValue] == 2) {
    // doSomethingElse with the second summary
} // ...
and so on until you've got all your data out.
N.B. This is 100% untested code but I'm confident it might work...
You will need to keep track of the summary elements you have parsed so far by keeping all the values in some container like an array: NSMutableArray. Thus, instead of an NSString to keep the summary, you'd have an NSMutableArray to hold the list of summaries you have parsed.
Whenever you encounter a summary in your parser, you don't set the NSString summary to the string you just read (which replaces the old value and explains why you only get the last summary tag). Instead, you store it as a new string and add that string to your NSMutableArray.
The problem with the design in the blog post you linked to is that it uses key-value coding to set the properties of the Book object, and that doesn't lend itself well to adding items to an array property. Hence, you will need to include some special handling for the summary element in the parser and add methods to the Book class that allow you to add items to the summary array. See this post on how to also do that with KVC.
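To make the "append instead of overwrite" idea concrete, here is a minimal sketch using Python's event-based xml.sax parser, which works along the same lines as NSXMLParser (start-element, characters and end-element callbacks). It is not Objective-C, and the BookHandler class and its fields are hypothetical names for this example. The point is only that each summary is appended to a list when its closing tag is seen.

```python
import xml.sax


class BookHandler(xml.sax.ContentHandler):
    """Collects every <summary> of each <Book> into a list instead of overwriting."""

    def __init__(self):
        super().__init__()
        self.books = []       # one dict per <Book>
        self._buffer = None   # text chunks of the <summary> being read, else None

    def startElement(self, name, attrs):
        if name == "Book":
            self.books.append({"id": attrs.get("id"), "summaries": []})
        elif name == "summary":
            self._buffer = []

    def characters(self, content):
        # May be called several times per element, so accumulate the chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "summary":
            # Append rather than assign, so earlier summaries are kept.
            self.books[-1]["summaries"].append("".join(self._buffer))
            self._buffer = None


XML = b"""<Books>
  <Book id="1">
    <summary id="1">Eratosthenes and the Ancient Quest to Measure the Globe.</summary>
    <summary id="2">Eratosthenes more info in another tag.</summary>
  </Book>
  <Book id="2">
    <summary id="1">How the scientific revolution began</summary>
  </Book>
</Books>"""

handler = BookHandler()
xml.sax.parseString(XML, handler)
```

After parsing, handler.books holds one entry per book, each with all of its summaries in document order.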