Check if XML nodes are empty in SQL - sql

Hi I am new to XML manipulation, my question would be if there is a possibility of detecting if the XML node is an empty node like this: <gen:nodeName />
I am able to manipulate single nodes however I would be interested if there is an approach like a loop or recursive function that could save some time doing manual labor looking trough every single node. I have no idea how to approach this problem though.
Thanks for help.

You did not specify the dialect of SQL ([sql] is not enough, please specify always the RDBMS incl. version).
This is for SQL-Server, but the semantics should be the same.
DECLARE #xml XML=
N'<root>
<SelfClosing />
<NoContent></NoContent>
<BlankContent> </BlankContent>
<HasContent>blah</HasContent>
<HasContent>other</HasContent>
</root>';
SELECT #xml.query(N'/root/*') AS AnyBelowRoor --All elements
,#xml.query(N'/root/*[text()]') AS AnyWithTextNode --blah and other
,#xml.query(N'/root/*[not(text())]') AS NoText --no text
,#xml.query(N'/root/*[text()="blah"]') AS AnyWithTextNode--blah only
The <SelfClosing /> is semantically the same as the <NoContent><NoContent>. There is no difference.
It might be a surprise, but a blank as content is taken as empty too.
So the check for empty or not empty is the check for the existance of a text() node. one can negate this with not() to find all without a text().
Interesting: The result for NoText comes back as this (SQL-Server)
<SelfClosing />
<NoContent />
<BlankContent />
The three elements are implicitly returned in the shortest format.

Related

Informix XML CLOB extract returning NULL when specifying any #attribute

Informix IDS 12.25 is returning NULL whenever an #attribute is specified. In the image below we have the same document being queried by two statements. The difference between the statements is that one of them specifies an #attribute. While the other doesn't. And, as is possible to see in the image, the attribute indeed exists, because it's returned by one of the columns.
I've been searching a lot, seeing documentations and documentations, all places are saying that the syntax is correct. I don't know what to do anymore. Really thanks.
[Edit]
Here goes a sample of the xml File I'm working with:
<Frame>
<Shape sizeX="5400" sizeY="4400" distance="1800">
<ShapePoint>
<Point direction="0" radius="266" />
<Point direction="144" radius="280" />
<Point direction="243" radius="289" />
<Point direction="279" radius="291" />
</ShapePoint>
</Shape>
</Frame>
Alternative approaches for this problem, if mainly using the database engine, also would be extremely welcomed.
It's definitely a valid Xpath, except the first one selects a node, and the one that isn't working selects a string, which makes me think extractclob() is having a problem with this type of result.
Here's my test in Python to demonstrate this is the correct xpath for the given xml.
In [16]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]')
Out[16]: [<Element Point at 0x102d68bc0>]
In [17]: tree.xpath('/Frame/Shape/ShapePoint/Point[1]/#radius')
Out[17]: ['266']
What happens if you use extractvalueclob() instead?
https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.xml.doc/ids_xpextractvalue.htm

Export SQL XML field to grid [duplicate]

I have something like the following XML in a column of a table:
<?xml version="1.0" encoding="utf-8"?>
<container>
<param name="paramA" value="valueA" />
<param name="paramB" value="valueB" />
...
</container>
I am trying to get the valueB part out of the XML via TSQL
So far I am getting the right node, but now I can not figure out how to get the attribute.
select xmlCol.query('/container/param[#name="paramB"]') from LogTable
I figure I could just add /#value to the end, but then SQL tells me attributes have to be part of a node. I can find a lot of examples for selecting the child nodes attributes, but nothing on the sibling atributes (if that is the right term).
Any help would be appreciated.
Try using the .value function instead of .query:
SELECT
xmlCol.value('(/container/param[#name="paramB"]/#value)[1]', 'varchar(50)')
FROM
LogTable
The XPath expression could potentially return a list of nodes, therefore you need to add a [1] to that potential list to tell SQL Server to use the first of those entries (and yes - that list is 1-based - not 0-based). As second parameter, you need to specify what type the value should be converted to - just guessing here.
Marc
Depending on the the actual structure of your xml, it may be useful to put a view over it to make it easier to consume using 'regular' sql eg
CREATE VIEW vwLogTable
AS
SELECT
c.p.value('#name', 'varchar(10)') name,
c.p.value('#value', 'varchar(10)') value
FROM
LogTable
CROSS APPLY x.nodes('/container/param') c(p)
GO
-- now you can get all values for paramB as...
SELECT value FROM vwLogTable WHERE name = 'paramB'

Field value query with special character and unfiltered search returning unexpected results?

Field value query is giving unexpected results when any special character(#,=,#,$,%,^,*) is passed.
please find the 4 sample docs I have inserted in to ML.
<root>
<journalTitle>Dinesh</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Gayari</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Dixit</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
<root>
<journalTitle>Singla</journalTitle>
<sourceType>JA</sourceType>
<title>title1</title>
<volume>volume0</volume>
</root>
CTS Query :
cts:search(
fn:doc(),
cts:field-value-query("Sample","######*()", ("unwildcarded")),
"unfiltered"
)
On running this query I am getting all the documents.
As per my understanding, it should return an empty sequence.
please find below the field I have created.
Field (in XML format) :
<field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://marklogic.com/xdmp/database">
<field-name>Sample</field-name>
<field-path>
<path>/root/journalTitle</path>
<weight>1.0</weight>
</field-path>
<word-lexicons/>
<included-elements/>
<excluded-elements/>
<tokenizer-overrides/>
</field>
Index setting:
If I will add any alphabet(s) in the search string it will give me the correct results.
Like:
##$%F
=====S
df===$d
Please help me to resolve this issue?
Try passing "exact" as an option to cts:field-value-query:
cts:search(
fn:doc(),
cts:field-value-query("Sample","######*()", ("exact")),
"unfiltered"
)
MarkLogic has an index for exact values to help in cases like this. Note it's only on when you have both case sensitive and diacritic sensitive indexes enabled (which you do). I know this works for cts:element-value-query so I expect it will for cts:field-value-query as well.
Use instead the 'exact' option in the field-value-query.
This requires the fast diacritic- and case-sensitive options, but you already have those enabled.
You can also try xdmp:plan before and after using 'exact' to see the effect on the query plan.
In the 'tokenizer overrides' option for your field, add these special character(#,=,#,$,%,^,*) as words (select 'word').
These special characters are not considered for matching by default. You need to override the default tokenizer to include them as words.
May I know what output are you expecting on passing this cts:element-word-query(xs:QName("journalTitle"),"=====S") for the above given for xmls.
Changing the one character searches to true in database config, resolves the issue in element-word-query.

Quickest method for matching nested XML data against database table structure

I have an application which creates datarequests which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be...
<datarequest>
<datatask view="vw_ContractData" db="reporting" index="1">
<datefilter modifier="w0">
<filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
</datefilter>
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
</alternation>
<alternation index="2">
<filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
</alternation>
</filters>
<series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
<filters />
</series>
<series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
</alternation>
</filters>
</series>
<series column="Requested 2" aggregate="avg" split="0" splitfield="" index="3">
<filters />
</series>
<series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
<filters />
</series>
</datatask>
</datarequest>
This encodes a datarequest comprising a daterange, main filters, series and series filters. Basically any element which has the index attribute can occur multiple times within its parent element - the exception to this being the filter within datefilter.
But the structure of this is kind of academic, the problem is more fundamental:
When a request comes through, XML like this is sent to SQLServer as a parameter to a stored proc. This XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter, tblSeries. This is fine.
The problem occurs when I want to match a given XML defintion with one already held in the DB. I currently do this by...
Shredding the XML into a de-normalised table
Using a CTE to pull all the existing data in the database into that same de-normalised form
Matching using a huge WHERE condition (34 lines long)
..This will return me any DataRequestID which exactly matches the XML given. I fear that this method will end up being painfully slow - partly because I don't believe the CTE will do any clever filtering, it will pull all the data every single time before applying the huge WHERE.
I have thought there must be better solutions to this eg
When storing a datarequest, also store a hash of the datarequest somehow and simply match on that. In the case of collision, use the current method. I wanted however to do this using set-logic. And also, I'm concerned about irrelevant small differences in the XML changing the hash - spurious spaces etc.
Somehow perform the matching iteratively from the bottom up. Eg produce a list of filters which match on the lowest level. Use this as part of an IN to match Series. Use this as part of an IN to match DataTasks etc etc. The trouble is, I start to black-out when I think about this for too long.
Basically - Has anyone ever encountered this kind of problem before (they must have). And what would be the recommended route for tackling it? example (pseudo)code would be great :)
To get rid of the possibility of minor variances, I'd run the request through an XML transform (XSLT).
Alternatively, since you've already got the code to parse this out into a denormalized staging table that's fine too. I would then simply using FOR XML to create a new XML doc.
Your goal here is to create a standardized XML document that respects ordering where appropriate and removes inconsistencies where it is not.
Once that is done, store this in a new table. Now you can run a direct comparison of the "standardized" request XML against existing data.
To do the actual comparison, you can use a hash, store the XML as a string and do a direct string comparison, or do a full XML comparison like this: http://beyondrelational.com/modules/2/blogs/28/posts/10317/xquery-lab-36-writing-a-tsql-function-to-compare-two-xml-values-part-2.aspx
My preference, as long as the XML is never over 8000bytes, would be to create a unique string (either VARCHAR(8000) or NVARCHAR(4000) if you have special character support) and create a unique index on the column.

SQL and escaped XML data

I have a table with a mix of escaped and non-escaped XML. Of course, the data I need is escaped. For example, I have:
<Root>
<InternalData>
<Node>
<ArrayOfComment>
<Comment&gt
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment&gt
</ArrayOfComment>
</Node>
</InternalData>
</Root>
As you can see, the data in the Node tag is all escaped. I can use a query to obtain the Node data, but how can I convert it to XML in SQL so that it can be parsed and broken up? I'm pretty new to using XML in SQL, and I can't seem to find any examples of this.
Thanks
You have not given enough information about your end goal, but this will get you very close. FYI - You had two missing ; both after comment&gt
declare #xml xml
set #xml = '
<Root>
<InternalData>
<Node>
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
</Node>
</InternalData>
</Root>
'
select convert(xml, n.c.value('.', 'varchar(max)'))
from #xml.nodes('Root/InternalData/Node/text()') n(c)
Output
<ArrayOfComment>
<Comment>
<SequenceNo>1</SequenceNo>
<IsDeleted>false</IsDeleted>
<TakenByCode>397</TakenByCode>
</Comment>
</ArrayOfComment>
The result is an XML column that you can put into a variable or cross-apply into directly to get data from the XML fragment.
Your best bet might be to look into a HTML Decoding UDF. I did a quick search and found this one:
http://www.andreabertolotto.net/Articles/HTMLDecodeUDF.aspx
You may want to modify it so it only decodes > and <. The one above seems to go above and beyond your needs.
UPDATE
#Cyberkiwi's solution seems to be a bit cleaner. I will leave this up in case the version of SQL Server you are running doesn't support his solution.