Parsing XML element value entirely in SQL from arbitrary string - sql

We have logs/audits compiled over some time that we would like to run some brief reports on.
One of the columns in the logs is JSON, but it contains XML. We want to parse out the value of a certain XML tag for each row. So given an arbitrary string such as the following:
{ "XmlData" :"<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>".....}
I would like to run a SQL query that returns 234 when I give it the tag name TagToParse.
What is the easiest way to do this ENTIRELY in SQL?

Given that your container will always be tag1, something like this should do it:
DECLARE @MyXML XML
SET @MyXML = '<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>'
SELECT
a.b.value('(/tag1//TagToParse/node())[1]', 'nvarchar(max)') AS Tag
FROM @MyXML.nodes('tag1') a(b)
Good luck.
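If the tag name has to be supplied at run time, one possible variation is to match on local-name() and pass the name in through sql:variable(). This is only a sketch: @Tag is a hypothetical parameter that does not appear in the answer above, and it assumes the XML fragment has already been isolated from the surrounding JSON string and is well-formed.

```sql
-- Sketch: look up an element by a tag name supplied at run time.
-- @Tag is a hypothetical parameter, not part of the original answer.
DECLARE @MyXML xml = N'<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>';
DECLARE @Tag nvarchar(128) = N'TagToParse';

SELECT @MyXML.value(
    '(//*[local-name() = sql:variable("@Tag")]/text())[1]',
    'nvarchar(max)') AS TagValue;  -- 234 for the sample above
```

Because the path starts with //, the element is found at any depth, so the container does not have to be tag1.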

Related

Xquery transformation on text data in response

I want to draft an XQuery transformation for the data below. As part of a service response fault I receive the payload below, and I need to capture the value of the ErrorCode element.
The challenge is that this element is not part of the XML structure; it arrives inside a CDATA section.
Could you please suggest how I can get this value into a variable?
Say I receive this structure in $Fault and need to assign ErrorCode to a new variable $FaultCode:
$FaultCode = $Fault/con:details/con1:ErrorResponseDetail/con1:detail/ ********
I am not sure how to reach any further into this detail element.
<con:fault xmlns:con="http://www.bea.com/wli/sb/context">
<con:errorCode>382502</con:errorCode>
<con:reason>Received an error response</con:reason>
<con:details>
<con1:ErrorResponseDetail xmlns:con1="http://www.bea.com/wli/sb/stages/transform/config">
<con1:detail>
<![CDATA[<Error xmlns="http://servic.abcd.net/V1">
<ErrorCode>DATA_AVAILABILITY</ErrorCode>
<ErrorDescription>{"description":"No Cdata for )"}</ErrorDescription>
</Error>]]></con1:detail>
<con1:http-response-code>404</con1:http-response-code>
</con1:ErrorResponseDetail>
</con:details>
<con:location>
<con:node>TestPPNode</con:node>
<con:pipeline>TestPPNode_request</con:pipeline>
<con:stage>Test Stage</con:stage>
<con:path>request-pipeline</con:path>
</con:location>
</con:fault>
Assuming XQuery 3.1 with the parse-xml function (https://www.w3.org/TR/xpath-functions/#func-parse-xml) you can use
declare namespace con="http://www.bea.com/wli/sb/context";
declare namespace con1="http://www.bea.com/wli/sb/stages/transform/config";
declare namespace V1 = "http://servic.abcd.net/V1";
/con:fault/con:details/con1:ErrorResponseDetail/con1:detail!parse-xml(.)/V1:Error/V1:ErrorCode/data()
to get the string value DATA_AVAILABILITY, see https://xqueryfiddle.liberty-development.net/6qM2e27 for demo.
XQuery 1.0 has no XML parsing function, which is what a proper solution needs, but you can of course try string functions to extract the data, e.g.
/con:fault/con:details/con1:ErrorResponseDetail/con1:detail/substring-before(substring-after(., '<ErrorCode>'), '</ErrorCode>')

T-SQL XML node value

I'm trying to extract the values from the following xml document
<response>
<entry>
<title>the tales</title>
<subject-area code="1" abbrev="XX1">Test1</subject-area>
<subject-area code="2" abbrev="XX2">Test2</subject-area>
</entry>
</response>
but I'm having a problem getting the subject-area text values, i.e. "Test1".
I'm using the T-SQL below to extract the rest of the values. I use CROSS APPLY on the node because I need to loop over all the values; I can't use [1] etc. to extract them, as I don't know how many subject areas there will be.
Any ideas?
SELECT
a.APIXMLResponse.value('(response[1]/entry[1]/title[1])','VARCHAR(250)') AS Title
,sa.value('(./@code)','varchar(10)') AS SubjectAreaCode
,sa.value('(./@abbrev)','varchar(10)') AS SubjectAreaAbbrev
FROM [dbo].[APIXML] a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)
Although there is already a working solution in a comment, I'd like to point out a few things:
Using just '.' as the path can lead to very annoying effects if there are nested elements.
For performance, it is recommended to use text()[1] to read the needed value at its actual place (Here are some details with examples).
As the internal values are NVARCHAR(x), it is slightly faster to use NVARCHAR as the target type (unless you have a reason to do otherwise).
That's my query:
SELECT
a.APIXMLResponse.value('(response/entry/title)[1]','NVARCHAR(250)') AS Title
,sa.value('@code','nvarchar(10)') AS SubjectAreaCode
,sa.value('@abbrev','nvarchar(10)') AS SubjectAreaAbbrev
,sa.value('text()[1]','nvarchar(10)') AS SubjectAreaContent
FROM @mockup a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)
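The remark about '.' versus text()[1] is easiest to see with a nested element. The following is a minimal sketch; the <note> child is invented purely for illustration and does not occur in the question's data.

```sql
-- Sketch: '.' concatenates all descendant text, text()[1] reads only
-- the element's own first text node. <note> is a made-up nested element.
DECLARE @x xml =
    N'<subject-area code="1">Test1<note>internal</note></subject-area>';

SELECT
    sa.value('.',         'nvarchar(100)') AS DotValue,   -- Test1internal
    sa.value('text()[1]', 'nvarchar(100)') AS TextValue   -- Test1
FROM @x.nodes('/subject-area') AS t(sa);
```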

SQL Server: Find records where XML is missing tag

I have table named: XMLIndex that contains a column named: XMLRec that holds the structure of an XML file and values.
Some of these records are missing a tag named: <ISO></ISO>
My question is: what type of query do I need to run in order to find all the records in the table XMLIndex, that are missing the <ISO> tag?
This is an example XMLRecord XML that contains the ISO tag:
<XMLRecord>
<pn>0042761</pn>
<SRI>4.40</SRI>
<img>/images/images/0042761.gif</img>
<ISO>ZW</ISO>
<ListPrice>$5.50</ListPrice>
</XMLRecord>
and one with multiple ISOs (note the small difference in the tag name):
<XMLRecord>
<pn>0042762</pn>
<SRI>4.40</SRI>
<img>/images/images/0042762.gif</img>
<ISOs>ZW+NZ+AU+BR</ISOs>
<ListPrice>$5.50</ListPrice>
</XMLRecord>
A record counts as missing the ISO tag when its XML structure does not contain that tag at all.
Any examples are much appreciated.
Thank you.
You can use the XQuery exist method.
Check anywhere in the xml document:
select *
from XMLIndex
where XMLRec.exist('//ISO') = 0
Check a specific location:
where XMLRec.exist('/XMLRecord/ISO') = 0
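A self-contained sketch of the exist() check follows. The table variable and its three rows are assumptions made up for the demonstration; the third row adds an empty <ISO></ISO> element to show how to treat an empty tag as missing as well, should that be wanted.

```sql
-- Sketch: find rows whose XML lacks an <ISO> element.
-- @XMLIndex and its rows are hypothetical test data.
DECLARE @XMLIndex TABLE (Id int PRIMARY KEY, XMLRec xml);

INSERT INTO @XMLIndex (Id, XMLRec) VALUES
    (1, N'<XMLRecord><pn>0042761</pn><ISO>ZW</ISO></XMLRecord>'),
    (2, N'<XMLRecord><pn>0042762</pn><ISOs>ZW+NZ+AU+BR</ISOs></XMLRecord>'),
    (3, N'<XMLRecord><pn>0042763</pn><ISO></ISO></XMLRecord>');

-- Element absent altogether: returns Id 2 (<ISOs> is a different name).
SELECT Id FROM @XMLIndex
WHERE XMLRec.exist('/XMLRecord/ISO') = 0;

-- Element absent or empty: returns Id 2 and Id 3.
SELECT Id FROM @XMLIndex
WHERE XMLRec.exist('/XMLRecord/ISO/text()') = 0;
```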

XML parsing with namespace SQL Server

We are cleaning up data in our database and a column has XML details inside of it which we want to be able to convert into plain text.
Below is the sample XML in the table column.
<FlowDocument PagePadding="5,5,5,5" Name="RTDocument" AllowDrop="True" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
<Paragraph>FILE DESTROYED - MAY 21st, 2008</Paragraph>
<Paragraph>todo</Paragraph>
</FlowDocument>
I am using this query, but it is not rendering the desired output due to the presence of Namespace (if I remove the namespace from the XML, I am able to render the output successfully).
SELECT
CAST(CAST(Comments AS XML).query('data(/FlowDocument/Paragraph)') AS VARCHAR(7000)) AS activity
FROM
dbo.Activities
WHERE
ActivityID = 1
Kindly help in this matter.
Thanks
You can also declare your namespace like this:
;WITH xmlnamespaces(DEFAULT 'http://schemas.microsoft.com/winfx/2006/xaml/presentation')
SELECT
CAST(CAST(Comments AS XML).query('data(/FlowDocument/Paragraph)') AS VARCHAR(7000)) AS activity
FROM [dbo].Activities where ActivityID=1
Other options are given here: parsing xml using sql server
You need to declare the namespace in your query, as per: https://msdn.microsoft.com/en-us/library/ms191474.aspx
so your query portion would look something like:
query('
declare namespace NS="http://schemas.microsoft.com/winfx/2006/xaml/presentation";
data(/NS:FlowDocument/NS:Paragraph)
')
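If one row per paragraph is more convenient than a single concatenated string, the same default-namespace declaration can be combined with nodes(). This is a sketch under the assumption that the column content casts cleanly to XML; the variable @Doc stands in for CAST(Comments AS XML) and is not part of the answers above.

```sql
-- Sketch: one row per <Paragraph>, using a default namespace.
-- @Doc is a stand-in for CAST(Comments AS XML).
DECLARE @Doc xml = N'<FlowDocument PagePadding="5,5,5,5" Name="RTDocument" AllowDrop="True" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
<Paragraph>FILE DESTROYED - MAY 21st, 2008</Paragraph>
<Paragraph>todo</Paragraph>
</FlowDocument>';

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/winfx/2006/xaml/presentation')
SELECT p.value('text()[1]', 'nvarchar(4000)') AS ParagraphText
FROM @Doc.nodes('/FlowDocument/Paragraph') AS t(p);
```

Note that the statement preceding WITH XMLNAMESPACES must be terminated with a semicolon, which is why the first answer writes ;WITH.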

How to make LIKE in SQL look for specific string instead of just a wildcard

My SQL Query:
SELECT
[content_id] AS [LinkID]
, dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name'))) AS [Physician Name]
FROM
[DB].[dbo].[table1]
WHERE
[id] = '188'
AND
(content LIKE '%Urology%')
AND
(contentS = 'A')
ORDER BY
--[content_title]
dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name')))
The issue I am having is that the row appears in the result whether the content is Neurology or Urology, since LIKE '%Urology%' also matches the tail of Neurology.
Is there any way to make it so that Urology returns only Urology results and Neurology returns only Neurology results?
The value can be Urology, Neurology, Internal Medicine, etc., so the two above are just the ones causing the issue.
The content column is an ntext column with XML inside, for example:
<root><Location><location>Office</location>
<office>Office</office>
<Address><image><img src="Rd.jpg?n=7513" /></image>
<Address1>1 Road</Address1>
<Address2></Address2>
<City>Qns</City>
<State>NY</State>
<zip>14404</zip>
<phone>324-324-2342</phone>
<fax></fax>
<general></general>
<from_north></from_north>
<from_south></from_south>
<from_west></from_west>
<from_east></from_east>
<from_connecticut></from_connecticut>
<public_trans></public_trans>
</Address>
</Location>
</root>
With the update this content column has the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Physicians>
<name>Doctor #1</name>
<picture>
<img src="phys_lab coat_gradation2.jpg?n=7529" />
</picture>
<gender>M</gender>
<langF1>
English
</langF1>
<specialty>
<a title="Neurology" href="neu.aspx">Neurology</a>
</specialty>
</Physicians>
</root>
If I search for Lab, this row appears in the result because the text "lab" occurs somewhere in the column.
This is what I would do if you're not into making a CLR proc to use Regexes (SQL Server doesn't have regex capabilities natively)
SELECT
[...]
WHERE
(content LIKE @strService OR
content LIKE '%[^a-z]' + @strService + '[^a-z]%' OR
content LIKE @strService + '[^a-z]%' OR
content LIKE '%[^a-z]' + @strService)
This way you check whether content is equal to @strService, OR whether the word occurs somewhere within content with non-letters on both sides, OR whether it sits at the very beginning or very end of content with a non-letter following or preceding it, respectively.
[^...] means "a character that is none of these". If there are other characters you don't want to accept before or after the search term, add them to each of the four bracket expressions (after the ^!). For instance [^a-zA-Z_].
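The four patterns can be tried out on a few sample rows. This is a sketch that assumes a case-insensitive collation (the common default), under which the range a-z covers both upper- and lower-case letters; the table variable and its rows are made up for illustration.

```sql
-- Sketch: whole-word matching with LIKE and [^a-z] guards.
-- @t and its rows are hypothetical; assumes a case-insensitive collation.
DECLARE @strService nvarchar(50) = N'Urology';
DECLARE @t TABLE (content nvarchar(200));

INSERT INTO @t (content) VALUES
    (N'<a title="Neurology" href="neu.aspx">Neurology</a>'),
    (N'<a title="Urology" href="uro.aspx">Urology</a>'),
    (N'Urology'),
    (N'Urology clinic');

SELECT content
FROM @t
WHERE content LIKE @strService                                -- exact match
   OR content LIKE N'%[^a-z]' + @strService + N'[^a-z]%'      -- in the middle
   OR content LIKE @strService + N'[^a-z]%'                   -- at the start
   OR content LIKE N'%[^a-z]' + @strService;                  -- at the end
```

Under that collation assumption the Neurology row is filtered out, because the only occurrences of "urology" in it are preceded by the letter e.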
As I see it, your options are to either:
Create a function that processes a string and finds a whole match inside it
Create a CLR extension that allows you to call .NET code and leverage the REGEX capabilities of .NET
Aaron's suggestion is a good one IF you can know up front all the terms that could be used for searching. The problem I could see is if someone searches for a specific word combination.
Databases are notoriously bad at semantics (i.e. they don't understand the concept of neurology or urology - everything is just a string of characters).
The best solution would be to create a table which defines the terms (two columns, PK and the name of the term).
The query is then a join:
JOIN terms ON table1.term_id = terms.term_id AND terms.term = 'Urology'
That way, you can avoid the LIKE and search for specific results.
If you can't do this, then SQL is probably the wrong tool. Use LIKE to get a set of candidate results and then, in an imperative programming language, filter out the unwanted ones.
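The lookup-table idea described above might look roughly like this. All table and column names are hypothetical, and table variables are used only to keep the sketch self-contained.

```sql
-- Sketch: a terms lookup table turns the search into an exact match.
-- Table and column names are hypothetical.
DECLARE @terms  TABLE (term_id int PRIMARY KEY, term nvarchar(100) NOT NULL);
DECLARE @table1 TABLE (content_id int PRIMARY KEY, term_id int NOT NULL);

INSERT INTO @terms  (term_id, term)       VALUES (1, N'Urology'), (2, N'Neurology');
INSERT INTO @table1 (content_id, term_id) VALUES (10, 1), (11, 2);

SELECT t1.content_id
FROM @table1 AS t1
JOIN @terms AS t ON t.term_id = t1.term_id
WHERE t.term = N'Urology';  -- returns content_id 10 only
```

Because the comparison is an equality on a short column, it can also use an index on the term, which a LIKE with a leading wildcard cannot.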
Judging from your content, can you not leverage the fact that there are quotes in the string you're searching for?
SELECT
[...]
WHERE
(content LIKE '%"Urology"%')