Extracting XML (as xml type) using xpath query in SQL - sql

I'm trying to extract a chunk of XML (ie the whole xml of a node, not just content) using an xpath query in SQL. I can extract a single content field but am not sure how to do the above.
Say the xml is as follows
<head>
<instance>
<tag1>
<tag2>data</tag2>
<tag3>data</tag3>
</tag1>
</instance>
</head>
I would like to extract all of the xml inside tag1, and was hoping something like this would work in a SQL query:
Table.value('(/head/instance/tag1)[1]', 'varchar(max)') as "col"
Any help would be great.
Thanks

this should work:
Select Cast(Table.xmlcolumnname.query('/head/instance/tag1') as varchar(max)) 'col';
(its not checked! may contain typo..)

Related

SQL get value from XML in tag, by tag value

I have the following XML:
<Main>
<ResultOutput>
<Name>TEST1</Name>
<Value>D028</Value>
</ResultOutput>
<ResultOutput>
<Name>TEST2</Name>
<Value>Accept</Value>
</ResultOutput>
<ResultOutput>
<Name>TEST3</Name>
<Value />
</ResultOutput>
</Main>
What I want is to get the value of the <value> tag in SQL.
Basically want to say get <value> where <Name> has the value of TEST1, as an example
This is what I have at the moment, but this depends on the position of the XML tag:
XMLResponse.value(Main/ResultOutput/Value)[5]', nvarchar(max)')
The best way to do this is not to put extra where .value clauses, but to do it directly in XQuery.
Use [nodename] to filter by a child node, you can even nest such predicates. text() gets you the inner text of the node:
XMLResponse.value('(/Main/ResultOutput[Name[text()="TEST1"]]/Value/text())[1]', 'nvarchar(max)')
Below is an example using the sample XML in your question. You'll need to extend this to add namespace declarations and the proper xpath expressions that may be present in your actual XML as your query attempt suggests.
SELECT ResultOutput.value('Value[1]', 'nvarchar(100)')
FROM #xml.nodes('Main/ResultOutput') AS Main(ResultOutput)
WHERE ResultOutput.value('Name[1]', 'nvarchar(100)') = N'TEST1';

SQL SERVER xml with CDATA

I have a table in my database with a column containing xml. The column type is nvarchar(max). The xml is formed in this way
<root>
<child>....</child>
.
.
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
...
</special>
</root>
I have not created the db, I cannot change the way information is stored in it but I can retrieve it with a select. For the extraction I use
select cast(replace(xml,'utf-8','utf-16')as xml)
from table
It works well except for cdata, whose content in the query output is: text -> text
Is there a way to retrieve also the CDATA tags?
Well, this is - as far as I know - not possible on normal ways...
The CDATA section has one sole reason: include invalid characters within XML for lazy people...
CDATA is not seen as needed at all and therefore is not really supported by normal XML methods. Or in other words: It is supported in the way, that the content is properly escaped. There is no difference between correctly escaped content and not-escaped content within CDATA actually! (Okay, there are some minor differences like including ]]> within a CDATA-section and some more tiny specialties...)
The big question is: Why?
What are you trying to do with this afterwards?
Try this. the included text is given as is:
DECLARE #xml XML =
'<root>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>'
SELECT t.c.query('text()')
FROM #xml.nodes('/root/special/event') t(c);
So: Please explain some more details: What do you really want?
If your really need nothing more than the wrapping CDATA you might use this:
SELECT '<![CDATA[' + t.c.value('.','varchar(max)') + ']]>'
FROM #xml.nodes('/root/special/event') t(c);
Update: Same with outdated FROM OPENXML
I just tried how the outdated approach with FROM OPENXML handles this and found, that there is absolutely no indication in the resultset, that the given text was within a CDATA section originally. The "Some value here" is exactly returned in the same way as the text within CDATA:
DECLARE #doc XML =
'<root>
<child>Some value here </child>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>';
DECLARE #hnd INT;
EXEC sp_xml_preparedocument #hnd OUTPUT, #doc;
SELECT * FROM OPENXML (#hnd, '/root',0);
EXEC sp_xml_removedocument #hnd;
This is how to include cdata on child nodes in XML, using pure SQL. But; it's not ideal.
SELECT 1 AS tag,
null AS parent,
'10001' AS 'Customer!1!Customer_ID!Element',
'AirBallon Captain' AS 'Customer!1!Title!cdata',
'Customer!1' = (
SELECT
2 AS tag,
NULL AS parent,
'Wrapped in cdata, using explicit' AS 'Location!2!Title!cdata'
FOR XML EXPLICIT)
FOR XML EXPLICIT, ROOT('Customers')
CDATA is included, but Child element is encoded using
>
instead of >
Which is so weird from a sensable point of view. I'm sure there are technical explanations, but they are stupid, because there is no difference in the FOR XML specification.
You could include the option type on the inner child node and then loose cdata too..
BUT WHY OH WHY?!?!?!?! would you (Microsoft) remove cdata, when I just added it?
<Customers>
<Customer>
<Customer_ID>10001</Customer_ID>
<Title><![CDATA[AirBallon Captain]]></Title>
<Location>
<Title><![CDATA[wrapped in cdata, using explicit]]></Title>
</Location>
</Customer>
</Customers>

FOR XML PATH in SQL server and [text()]

I found some article on internet
this url
Then I code query like that and I get same result
But when I change AS [text()] to [name]
the result contain XML tag
like this
So My question is What is [text()] in this code
Thank you.
The other current answers don't explain much about where this is coming from, or just offer links to poorly formatted sites and don't really answer the question.
In many answers around the web for grouping strings there are the copy paste answers without a lot of explanation of what's going on. I wanted to better answer this question because I was wondering the same thing, and also give insight into what is actually happening overall.
tldr;
In short, this is syntax to help transform the XML output when using FOR XML PATH which uses column names (or aliases) to structure the output. If you name your column text() the data will be represented as text within the root tag.
<row>
My record's data
<row>
In the examples you see online for how to group strings and concat with , it may not be obvious (except for the fact that your query has that little for xml part) that you are actually building an XML file with a specific structure (or rather, lack of structure) by using FOR XML PATH (''). The ('') is removing the root xml tags, and just spitting out the data.
The deal with AS [text()]
As usual, AS is acting to name or rename the column alias. In this example, you are aliasing this column as [text()]. The []s are simply SQL Server's standard column delimiters, often unneeded, except today since our column name has ()s. That leaves us with text() for our column name.
Controlling the XML Structure with Column Names
When you are using FOR XML PATH you are outputting an XML file and can control the structure with your column names. A detailed list of options can be found here: https://msdn.microsoft.com/en-us/library/ms189885.aspx
An example includes starting your column name with an # sign, such as:
SELECT color as '#color', name
FROM #favorite_colors
FOR XML PATH
This would move this column's data to an attribute of the current xml row, as opposed to an item within it. You end up with
<row color="red">
<name>tim</name>
</row>
<row color="blue">
<name>that guy</name>
</row>
So then, back to [text()]. This is actually specifying an XPath Node Test. In the context of MS Sql Server, you can learn about this designation here. Basically it helps determine the type of element we are adding this data to, such as a normal node (default), an xml comment, or in this example, some text within the tag.
An example using a few moves to structure the output
SELECT
color as [#color]
,'Some info about ' + name AS [text()]
,name + ' likes ' + color AS [comment()]
,name
,name + ' has some ' + color + ' things' AS [info/text()]
FROM #favorite_colors
FOR XML PATH
Notice we are using a few designations in our column names:
#color: a tag attribute
text(): some text for this root tag
comment(): an xml comment
info/text(): some text in a specific xml tag, <info>
The output looks like this:
<row color="red">
Some info about tim
<!--tim likes red-->
<name>tim</name>
<info>tim has some red things</info>
</row>
<row color="blue">
Some info about that guy
<!--that guy likes blue-->
<name>that guy</name>
<info>that guy has some blue things</info>
</row>
Wrapping it up, how can these tools group and concat strings?
So, with the solutions we see for grouping strings together using FOR XML PATH, there are two key components.
AS [text()]: Writes the data as text, instead of wrapping it in a tag
FOR XML PATH (''): Renames the root tag to '', or rather, removes it entirely
This gives us "XML" (air quotes) output that is essentially just a string.
SELECT name + ', ' AS [text()] -- no 'name' tags
FROM #favorite_colors
FOR XML PATH ('') -- no root tag
returns
tim, that guy,
From there, it's just a matter of joining that data back to the larger dataset from which it came.
While righting the sql query remove alias name then you got the text.
select name+',' aa from employee for xml path('')
then the result comes in xml with aa tag.
select (select name+',' from employee for xml path('')) aa

SQL and escaped XML data

I have a table with a mix of escaped and non-escaped XML. Of course, the data I need is escaped. For example, I have:
<Root>
<InternalData>
<Node>
<ArrayOfComment>
<Comment&gt
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment&gt
</ArrayOfComment>
</Node>
</InternalData>
</Root>
As you can see, the data in the Node tag is all escaped. I can use a query to obtain the Node data, but how can I convert it to XML in SQL so that it can be parsed and broken up? I'm pretty new to using XML in SQL, and I can't seem to find any examples of this.
Thanks
You have not given enough information about your end goal, but this will get you very close. FYI - You had two missing ; both after comment&gt
declare #xml xml
set #xml = '
<Root>
<InternalData>
<Node>
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
</Node>
</InternalData>
</Root>
'
select convert(xml, n.c.value('.', 'varchar(max)'))
from #xml.nodes('Root/InternalData/Node/text()') n(c)
Output
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
The result is an XML column that you can put into a variable or cross-apply into directly to get data from the XML fragment.
Your best bet might be to look into a HTML Decoding UDF. I did a quick search and found this one:
http://www.andreabertolotto.net/Articles/HTMLDecodeUDF.aspx
You may want to modify it so it only decodes > and <. The one above seems to go above and beyond your needs.
UPDATE
#Cyberkiwi's solution seems to be a bit cleaner. I will leave this up in case the version of SQL Server you are running doesn't support his solution.

How to get required XML element from not well formed XML data in SQL server

In my SQL 2008 database table, I have one column name AUTHOR that contains XML data. The XML is not well formed and has data like below
<Author>
<ID>172-32-1176</ID>
<LastName>White</LastName>
<FirstName>Johnson</FirstName>
<Address>
<Street>10932 Bigge Rd.</Street>
<City>Menlo Park</City>
<State>CA</State>
</Address>
</Author>
Some XML have all of above data and some have just one tag.
<ID>172-32-1176</ID>
I want to write query that returns me a column as identiry.
I tried using AUTHOR.query('data(/Author/ID)') as identity but it fails when XML does not have Author node.
Thanks,
Vijay
Have you tried something like /Author/ID|/ID ? i.e. try for the first scenario, and with no match, the second ?
(note that the | operator is a set union operator, as described here)
In case that nothing "certain" can be maintained about the XML, except that a unique ID element contains the required identity value, then the following XPath expression selects the ID element:
//ID