Extracting Text Values from XML in SQL

I'm working with SQL data hosted by a third party and am trying to pull some specific information for reporting. However, some of what I need to parse out is in XML format, and I'm getting stuck. I'm looking for the syntax to pull only the text= attribute values from the XML.
I believe the code should look something like this, but most of the examples I can find online involve simpler XML hierarchies than the one I'm dealing with:
[columnnameXML].value('(/RelatedValueListBO/Items/RelatedValueListBOItem/text())[1]','varchar(max)')
This fails to pull any results. I've tried declaring the namespaces as well, but again, I only ever end up with NULL values.
Example XML I'm dealing with:
<RelatedValueListBO xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://tempuri.org/RelatedValueListBOSchema.xsd">
<Items>
<RelatedValueListBOItem groupKey="Response1" text="Response1" selected="true" />
<RelatedValueListBOItem groupKey="Response2" text="Response2" selected="true" />
<RelatedValueListBOItem groupKey="Response3" text="Response3" selected="true" />
</Items>
</RelatedValueListBO>
Ideally I'd like to pull Response1; Response2; Response3 into a single column, allowing for the fact that multiple responses may exist. I believe I'm getting stuck with the basic code I've been trying because of the namespaces associated with RelatedValueListBO, and because what I want is held in the groupKey, text and selected attributes instead of being a value directly under the Items node.

You have the namespaces defined in your XML, so you need to define them in the XQuery too.
The quick and dirty method is to replace every namespace prefix with the wildcard "*":
SELECT @x.value('(/*:RelatedValueListBO/*:Items/*:RelatedValueListBOItem/@text)[1]','varchar(max)')
To get all responses in a single column (one row per item), you can use:
SELECT
Item.Col.value('./@text','varchar(max)') X
FROM @x.nodes('/*:RelatedValueListBO/*:Items/*:RelatedValueListBOItem') AS Item(Col)
If you need better performance, you may need to declare the namespaces properly (for example with WITH XMLNAMESPACES) instead of using wildcards.
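To check the namespace behaviour outside SQL Server, here is a minimal sketch using Python's standard xml.etree.ElementTree (not T-SQL). It mirrors the explicit-namespace route: the unqualified path finds nothing, much like the NULLs in the question, while mapping a prefix (r, an arbitrary choice for this sketch) to the default namespace finds every item. Joining the text attributes gives the single "Response1; Response2; Response3" value the question asked for; in SQL Server 2017 and later, STRING_AGG over the rows returned by nodes() would play that role. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document from the question; note the default namespace on the root element.
SAMPLE = """<RelatedValueListBO xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="http://tempuri.org/RelatedValueListBOSchema.xsd">
  <Items>
    <RelatedValueListBOItem groupKey="Response1" text="Response1" selected="true" />
    <RelatedValueListBOItem groupKey="Response2" text="Response2" selected="true" />
    <RelatedValueListBOItem groupKey="Response3" text="Response3" selected="true" />
  </Items>
</RelatedValueListBO>"""

# Map a prefix of our choosing to the document's default namespace URI.
NS = {"r": "http://tempuri.org/RelatedValueListBOSchema.xsd"}


def text_attributes(xml_text: str) -> list[str]:
    """Return the text attribute of every RelatedValueListBOItem, in document order."""
    root = ET.fromstring(xml_text)
    # Without the namespace mapping this path matches nothing (compare the NULLs in SQL).
    items = root.findall("r:Items/r:RelatedValueListBOItem", NS)
    return [item.get("text") for item in items]


def joined_responses(xml_text: str, sep: str = "; ") -> str:
    """Concatenate all responses into one string, e.g. for a single report column."""
    return sep.join(text_attributes(xml_text))


if __name__ == "__main__":
    print(joined_responses(SAMPLE))  # Response1; Response2; Response3
```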

In Oracle, you can use something like this to extract the value of the "text" attribute in the first RelatedValueListBOItem node (note that the sample XML below has the default namespace removed):
SELECT extractvalue(value(rs), '//RelatedValueListBOItem[1]/@text')
FROM TABLE (xmlsequence(extract(sys.xmltype('<RelatedValueListBO>
<Items>
<RelatedValueListBOItem groupKey="Response1" text="Response1"
selected="true" />
<RelatedValueListBOItem groupKey="Response2" text="Response2"
selected="true" />
<RelatedValueListBOItem groupKey="Response3" text="Response3"
selected="true" />
</Items>
</RelatedValueListBO>'),'/RelatedValueListBO/Items'))) rs;

Related

Informix XML CLOB extract returning NULL when specifying any @attribute

Informix IDS 12.25 returns NULL whenever an @attribute is specified. In the image below, the same document is queried by two statements. The difference between them is that one specifies an @attribute and the other doesn't. As the image shows, the attribute does exist, because it is returned by one of the columns.
I've searched a lot and read documentation after documentation, and everything says the syntax is correct. I don't know what to do anymore. Many thanks.
[Edit]
Here is a sample of the XML file I'm working with:
<Frame>
<Shape sizeX="5400" sizeY="4400" distance="1800">
<ShapePoint>
<Point direction="0" radius="266" />
<Point direction="144" radius="280" />
<Point direction="243" radius="289" />
<Point direction="279" radius="291" />
</ShapePoint>
</Shape>
</Frame>
Alternative approaches to this problem, especially ones that mainly use the database engine, would also be very welcome.
It's definitely a valid XPath; the difference is that the first one selects a node, while the one that isn't working selects a string, which makes me think extractclob() has a problem with this type of result.
Here's my test in Python demonstrating that this is the correct XPath for the given XML:
In [16]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]')
Out[16]: [<Element Point at 0x102d68bc0>]
In [17]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]/@radius')
Out[17]: ['266']
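For comparison, here is the same check with Python's standard-library ElementTree (the session above appears to use lxml, given tree.xpath). ElementTree implements only a subset of XPath and has no @radius step, so this sketch selects the Point node first and then reads its attribute, the same node-versus-string split discussed above. The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Sample document from the question.
FRAME = """<Frame>
  <Shape sizeX="5400" sizeY="4400" distance="1800">
    <ShapePoint>
      <Point direction="0" radius="266" />
      <Point direction="144" radius="280" />
      <Point direction="243" radius="289" />
      <Point direction="279" radius="291" />
    </ShapePoint>
  </Shape>
</Frame>"""


def first_point_radius(xml_text: str) -> Optional[str]:
    """Equivalent of /Frame/Shape/ShapePoint/Point[1]/@radius."""
    root = ET.fromstring(xml_text)  # root is the <Frame> element
    # Paths are relative to root; [1] selects the first Point (1-based, as in XPath).
    point = root.find("Shape/ShapePoint/Point[1]")
    # No @attr step in ElementTree: read the attribute from the selected element.
    return None if point is None else point.get("radius")


def all_radii(xml_text: str) -> list[int]:
    """Every radius attribute, converted to int."""
    root = ET.fromstring(xml_text)
    return [int(p.get("radius")) for p in root.iterfind("Shape/ShapePoint/Point")]


if __name__ == "__main__":
    print(first_point_radius(FRAME))  # 266
    print(all_radii(FRAME))  # [266, 280, 289, 291]
```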
What happens if you use extractvalueclob() instead?
https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.xml.doc/ids_xpextractvalue.htm

Retrieving all instances of a third-level XML field from an XML column

I have an XML data field in one of my tables that essentially looks like this:
<App xmlns='http://Namespace1'>
<Package xmlns='http://Namespace2'>
<Item>
<ItemDetails xmlns='http://Namespace3'>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
other_item_stuff
</Item>
<Item>
<ItemDetails>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
</Item>
...
</Package>
</App>
I need to get all of the ItemNameValues from the XML.
I have tried to adapt many examples found on the web to my purpose, but have failed miserably. The best I seem to be able to do is get one ItemName per Package.
I think that CROSS APPLY is where I need to go, but the syntax to retrieve all the itemdetail.itemname eludes me.
This is my latest failure (returns nothing):
WITH XMLNAMESPACES(
'http://Namespace1' AS xsd,
'http://www.w3.org/2001/XMLSchema-instance' AS xsi,
'http://Namespace2' AS ns1,
'http://Namespace3' AS ns2)
SELECT Items.d.value('(ns2:ItemDetails/ItemName/text())[1]','varchar(200)') AS ItemName
FROM MyTable
CROSS APPLY XMLDataColumn.nodes('/xsd:App/ns1:Package/ns1:Item') Items(d)
I hope to get several records from each XML field, but can only ever get the first element.
The biggest problem in this issue is the XML itself:
<App xmlns="http://Namespace1">
<Package xmlns="http://Namespace2">
<Item>
<ItemDetails xmlns="http://Namespace3">
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
other_item_stuff
</Item>
<Item>
<ItemDetails>
<ItemName>ItemNameValue</ItemName>
</ItemDetails>
</Item>
...
</Package>
</App>
Two major problems:
The namespaces are all declared as default namespaces (they do not include a prefix). Every node inherits the default namespace of its parent unless another one is declared explicitly.
The first <ItemDetails> lives in namespace http://Namespace3, while the second <ItemDetails> lives in namespace http://Namespace2 (inherited from <Package>).
That means: if you can, by any chance, change how the XML is constructed, try to do that first.
If you have to deal with this XML as it is, you can try this clean but clumsy approach:
WITH XMLNAMESPACES(
'http://Namespace1' AS ns1,
'http://www.w3.org/2001/XMLSchema-instance' AS xsi,
'http://Namespace2' AS ns2,
'http://Namespace3' AS ns3)
SELECT COALESCE(Items.d.value('(ns2:ItemDetails/ns2:ItemName/text())[1]','varchar(200)')
,Items.d.value('(ns3:ItemDetails/ns3:ItemName/text())[1]','varchar(200)')) AS ItemName
FROM @xml.nodes('/ns1:App/ns2:Package/ns2:Item') Items(d);
Another approach is to use a namespace wildcard, but be aware of ambiguous names:
SELECT Items.d.value('(*:ItemDetails/*:ItemName/text())[1]','varchar(200)') AS ItemName
FROM @xml.nodes('/*:App/*:Package/*:Item') Items(d);
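The same two strategies can be tried outside SQL Server with Python's standard xml.etree.ElementTree (a sketch, not T-SQL). Explicit prefixes with a fallback mirror the COALESCE query, and the {*} wildcard (Python 3.8+) mirrors *:. The helper names are illustrative, and the sample is the question's XML trimmed ("other_item_stuff" and "..." dropped), with the two item names changed to the made-up ItemNameValue1 and ItemNameValue2 so the results can be told apart. The last helper returns the fully qualified tag of each <ItemDetails>, which makes the namespace inheritance described above visible.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; ItemNameValue1/ItemNameValue2 are made-up values for illustration.
APP = """<App xmlns="http://Namespace1">
  <Package xmlns="http://Namespace2">
    <Item>
      <ItemDetails xmlns="http://Namespace3">
        <ItemName>ItemNameValue1</ItemName>
      </ItemDetails>
    </Item>
    <Item>
      <ItemDetails>
        <ItemName>ItemNameValue2</ItemName>
      </ItemDetails>
    </Item>
  </Package>
</App>"""

NS = {"ns1": "http://Namespace1", "ns2": "http://Namespace2", "ns3": "http://Namespace3"}


def item_names_explicit(xml_text: str) -> list:
    """Explicit namespaces with a fallback, like the COALESCE query."""
    root = ET.fromstring(xml_text)  # root is ns1:App
    names = []
    for item in root.findall("ns2:Package/ns2:Item", NS):
        # The second <ItemDetails> inherits Namespace2 from <Package> ...
        name = item.findtext("ns2:ItemDetails/ns2:ItemName", namespaces=NS)
        if name is None:
            # ... while the first one redeclares its default namespace as Namespace3.
            name = item.findtext("ns3:ItemDetails/ns3:ItemName", namespaces=NS)
        names.append(name)
    return names


def item_names_wildcard(xml_text: str) -> list:
    """Namespace wildcard, like *: in the second query."""
    root = ET.fromstring(xml_text)
    return [item.findtext("{*}ItemDetails/{*}ItemName")
            for item in root.findall("{*}Package/{*}Item")]


def item_details_tags(xml_text: str) -> list:
    """The {namespace}local tag each <ItemDetails> actually ended up with."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iterfind(".//{*}ItemDetails")]


if __name__ == "__main__":
    print(item_names_explicit(APP))  # ['ItemNameValue1', 'ItemNameValue2']
    print(item_names_wildcard(APP))  # ['ItemNameValue1', 'ItemNameValue2']
    print(item_details_tags(APP))
```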

Is there any existing service in HotDocs tools to receive data from an external source to prepare a document?

HotDocs is a tool for generating documents, and it basically works with two things: a template and an answer file. The template carries variables, and the data for those variables is supplied through the answer file.
Generally, the answer file comes from a page that asks for the data, and HotDocs then generates the document.
Our requirement now is that, instead of passing the variables' values through an answer file, I need to send them through an API built in PHP that provides data in JSON format.
Is there any existing service in HotDocs to serve this kind of request? I can convert the data from JSON to XML if required.
At the moment there is no off-the-shelf converter from JSON to HotDocs answer XML; however, at HotDocs we do this all the time. Whether you produce JSON or XML from your application, the data will need to be transformed into the HotDocs answer XML format, e.g.:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AnswerSet title="Demo Answers" version="1.1">
<Answer name="Employee Name">
<TextValue>Graham Penman</TextValue>
</Answer>
<Answer name="Job Duty">
<RptValue>
<TextValue>make tea</TextValue>
<TextValue>make coffee</TextValue>
<TextValue>make some cake</TextValue>
</RptValue>
</Answer>
<Answer name="Annual Salary">
<NumValue>12.0000000</NumValue>
</Answer>
<Answer name="Contract Date">
<DateValue>10/10/2016</DateValue>
</Answer>
<Answer name="Paid Seminar Days">
<TFValue>false</TFValue>
</Answer>
</AnswerSet>
There are three key things you need to know to create the answer XML: the data type of your data, the data type in HotDocs, and whether the data you are passing is a list or a single item.
Building the answer XML is therefore relatively easy.
The answer XML is essentially key-value pairs contained between the opening and closing tags:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AnswerSet title="Demo Answers" version="1.1">
...Answers go here
</AnswerSet>
We then add answers by adding the following, specifying the template variable the answer corresponds to, the actual value (from your data) you want to set the answer to, and the type of the data in the template. The example below uses text; the types in HotDocs are TextValue (string), NumValue (decimal), TFValue (boolean), DateValue (DateTime) and MCValue (see later in this answer).
<Answer name="[Variable name in template]">
<TextValue>[Value from your data]</TextValue>
</Answer>
For multiple-choice variables specifically, you can select one or more answers, so the answer XML format is slightly different:
<Answer name="[Variable name in template]">
<MCValue>
<SelValue>[First selected value]</SelValue>
<SelValue>[Second selected value]</SelValue>
</MCValue>
</Answer>
If you have repeated data you want to put into the document, you can use the list (repeat) format:
<Answer name="[Variable name in template]">
<RptValue>
<[Variable Type]>[First value]</[Variable Type]>
<[Variable Type]>[Second value]</[Variable Type]>
</RptValue>
</Answer>
Once you have built this XML structure, you can pass it as a string, together with the template, to the assemble document method of the REST services to assemble the corresponding documents.
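A minimal sketch of the JSON-to-answer-XML transformation in Python (standard library only), using just the element names shown above. It is not a HotDocs API: json_to_answer_xml, date_fields and multiple_choice are hypothetical names. Because JSON has no date or multiple-choice type, the caller has to say which variables are dates or multiple choice, in line with the point above that you need to know the HotDocs type of each variable.

```python
import json
import xml.etree.ElementTree as ET
from datetime import date

# Assumptions (not from HotDocs documentation): JSON dates arrive as ISO "YYYY-MM-DD"
# strings, and DateValue uses the month/day/year layout of the sample (10/10/2016).
DATE_OUT_FORMAT = "%m/%d/%Y"
DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'


def _typed_value(value, is_date=False):
    """Wrap one scalar in TextValue, NumValue, TFValue or DateValue."""
    if is_date:
        el = ET.Element("DateValue")
        el.text = date.fromisoformat(value).strftime(DATE_OUT_FORMAT)
    elif isinstance(value, bool):  # test bool first: bool is a subclass of int
        el = ET.Element("TFValue")
        el.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        el = ET.Element("NumValue")
        el.text = str(value)
    else:
        el = ET.Element("TextValue")
        el.text = str(value)
    return el


def json_to_answer_xml(json_text, title="Demo Answers", date_fields=(), multiple_choice=()):
    """Convert a flat JSON object {variable name: value} into answer XML.

    JSON arrays become RptValue lists, except for names in multiple_choice,
    which become MCValue/SelValue.
    """
    data = json.loads(json_text)
    root = ET.Element("AnswerSet", {"title": title, "version": "1.1"})
    for name, value in data.items():
        answer = ET.SubElement(root, "Answer", {"name": name})
        is_date = name in date_fields
        if name in multiple_choice:
            mc = ET.SubElement(answer, "MCValue")
            for choice in value if isinstance(value, list) else [value]:
                ET.SubElement(mc, "SelValue").text = str(choice)
        elif isinstance(value, list):
            rpt = ET.SubElement(answer, "RptValue")
            for item in value:
                rpt.append(_typed_value(item, is_date))
        else:
            answer.append(_typed_value(value, is_date))
    return DECLARATION + ET.tostring(root, encoding="unicode")
```

Numbers keep Python's str() formatting (12 rather than the 12.0000000 in the sample), and both date formats are assumptions to check against your templates.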

Performing calculations with values in an XML CLOB that has identical tags, using SQL

I've got a table (event_archive), and one of its columns (event_xml) holds CLOB data in XML format, as shown below. Is there a way to use SQL to sum the values of the "xx" tag? Please help, as I'm completely baffled. Even simply extracting the values is a problem, as there are two "xx" tags within the same root. Thanks in advance.
<?xml version="1.0" encoding="UTF-8"?>
<event type="CALCULATION">
<source_id>INTERNAL</source_id>
<source_participant/>
<source_role/>
<source_start_pos>1</source_start_pos>
<destination_participant/>
<destination_role/>
<event_id>123456</event_id>
<payload>
<cash_point reference="abc12345">
<adv_start>20120907</adv_start>
<adv_end>20120909</adv_end>
<conf>1234</conf>
<profile>3</profile>
<group>A</group>
<patterns>
<pattern id="00112">
<xx>143554.1</xx>
<yyy>96281.6</yyy>
<adv>875</adv>
</pattern>
<pattern id="00120">
<xx>227606.1</xx>
<yyy>97539.8</yyy>
<adv>18181</adv>
</pattern>
</patterns>
</cash_point>
</payload>
</event>
Different databases handle XML differently. There's no portable way of dealing with XML payloads via raw, standard SQL that works across all of them, so you'll need to look at your actual DB implementation to find out what XML support it has.
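Whatever engine you use, the logic is the same: visit every xx element, not just the first, and add up the values. A minimal sketch of that logic in Python (standard library only), which could also serve as an application-side fallback if your database's XML support falls short; sum_tag is an illustrative name, and the sample is the question's event_xml with the source_* and destination_* elements omitted.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the question's event_xml.
EVENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<event type="CALCULATION">
  <event_id>123456</event_id>
  <payload>
    <cash_point reference="abc12345">
      <patterns>
        <pattern id="00112"><xx>143554.1</xx><yyy>96281.6</yyy><adv>875</adv></pattern>
        <pattern id="00120"><xx>227606.1</xx><yyy>97539.8</yyy><adv>18181</adv></pattern>
      </patterns>
    </cash_point>
  </payload>
</event>"""


def sum_tag(event_xml: str, tag: str = "xx") -> Decimal:
    """Sum every <tag> element anywhere in the document, however many there are."""
    # Parse from bytes so the encoding declaration is handled by the parser.
    root = ET.fromstring(event_xml.encode("utf-8"))
    # Decimal avoids binary floating-point rounding on values like 143554.1.
    return sum((Decimal(el.text) for el in root.iter(tag)), Decimal("0"))


if __name__ == "__main__":
    print(sum_tag(EVENT_XML))  # 371160.2
```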

Quickest method for matching nested XML data against database table structure

I have an application that creates datarequests, which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be:
<datarequest>
<datatask view="vw_ContractData" db="reporting" index="1">
<datefilter modifier="w0">
<filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
</datefilter>
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
</alternation>
<alternation index="2">
<filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
</alternation>
</filters>
<series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
<filters />
</series>
<series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
</alternation>
</filters>
</series>
<series column="Requested 2" aggregate="avg" split="0" splitfield="" index="3">
<filters />
</series>
<series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
<filters />
</series>
</datatask>
</datarequest>
This encodes a datarequest comprising a date range, main filters, series and series filters. Basically, any element that has the index attribute can occur multiple times within its parent element; the exception is the filter within datefilter.
But the structure of this is kind of academic; the problem is more fundamental:
When a request comes through, XML like this is sent to SQL Server as a parameter to a stored procedure. This XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter and tblSeries. This is fine.
The problem occurs when I want to match a given XML definition with one already held in the DB. I currently do this by:
Shredding the XML into a de-normalised table
Using a CTE to pull all the existing data in the database into that same de-normalised form
Matching using a huge WHERE condition (34 lines long)
This returns any DataRequestID that exactly matches the given XML. I fear that this method will end up being painfully slow, partly because I don't believe the CTE will do any clever filtering; it will pull all the data every single time before applying the huge WHERE.
I have thought there must be better solutions to this, e.g.:
When storing a datarequest, also store a hash of it and simply match on that, falling back to the current method in the case of a collision. However, I wanted to do this using set logic. I'm also concerned about irrelevant small differences in the XML (spurious spaces etc.) changing the hash.
Somehow perform the matching iteratively from the bottom up, e.g. produce a list of filters that match at the lowest level, use this as part of an IN to match series, use that as part of an IN to match data tasks, and so on. The trouble is, I start to black out when I think about this for too long.
Basically, has anyone encountered this kind of problem before (they must have)? And what would be the recommended route for tackling it? Example (pseudo)code would be great :)
To get rid of the possibility of minor variances, I'd run the request through an XML transform (XSLT).
Alternatively, since you've already got the code to parse this into a denormalized staging table, that's fine too: I would then simply use FOR XML to create a new XML document.
Your goal here is to create a standardized XML document that respects ordering where appropriate and removes inconsistencies where it is not.
Once that is done, store this in a new table. Now you can run a direct comparison of the "standardized" request XML against existing data.
To do the actual comparison, you can use a hash, store the XML as a string and do a direct string comparison, or do a full XML comparison like this: http://beyondrelational.com/modules/2/blogs/28/posts/10317/xquery-lab-36-writing-a-tsql-function-to-compare-two-xml-values-part-2.aspx
My preference, as long as the XML is never over 8,000 bytes, would be to store it as a unique string (either VARCHAR(8000), or NVARCHAR(4000) if you need Unicode support) and create a unique index on the column.
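The normalize-then-hash idea can be sketched in Python. This sketch swaps in W3C Canonical XML 2.0 (via the standard library's xml.etree.ElementTree.canonicalize, Python 3.8+) for the XSLT or FOR XML normalization step described above, then hashes the result with SHA-256; request_fingerprint is a hypothetical name. Canonicalization removes attribute-order and whitespace noise but keeps sibling elements in document order, so if, say, the order of alternation elements is not meaningful, a sorting step would still be needed.

```python
import hashlib
import xml.etree.ElementTree as ET


def request_fingerprint(xml_text: str) -> str:
    """SHA-256 of the datarequest XML after C14N 2.0 canonicalization."""
    # C14N writes attributes in sorted order and empty elements as start/end pairs;
    # strip_text=True trims surrounding whitespace from text, dropping whitespace-only runs.
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = '<datatask view="vw_ContractData" index="1"><filters /></datatask>'
    b = '<datatask index="1"  view="vw_ContractData">\n  <filters></filters>\n</datatask>'
    print(request_fingerprint(a) == request_fingerprint(b))  # True
```

The 64-character hex digest could go in an indexed column, with the full comparison kept for the rare collision, as the question suggests.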