T-SQL XML node value - sql

I'm trying to extract the values from the following XML document:
<response>
<entry>
<title>the tales</title>
<subject-area code="1" abbrev="XX1">Test1</subject-area>
<subject-area code="2" abbrev="XX2">Test2</subject-area>
</entry>
</response>
but I'm having a problem getting the subject-area text values, i.e. "Test1".
I'm using the T-SQL below to extract the rest of the values. I'm using a CROSS APPLY on the node because I need it to loop over all values; I can't use [1] etc. to extract them that way, as I don't know how many subject areas there will be.
Any ideas?
SELECT
a.APIXMLResponse.value('(response[1]/entry[1]/title[1])','VARCHAR(250)') AS Title
,sa.value('(./@code)','varchar(10)') AS SubjectAreaCode
,sa.value('(./@abbrev)','varchar(10)') AS SubjectAreaAbbrev
FROM [dbo].[APIXML] a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)

Although there is already a working solution in a comment, I'd like to point out a few things:
Using just '.' as the path can have very annoying effects if there are nested elements, because it returns the text of all descendant nodes concatenated together.
For performance it is recommended to use text()[1] to read the needed value at its actual place.
As the internal values are NVARCHAR(x), it is slightly faster to use NVARCHAR as the target type (unless you have a reason to do otherwise).
This is my query:
SELECT
a.APIXMLResponse.value('(response/entry/title)[1]','NVARCHAR(250)') AS Title
,sa.value('@code','nvarchar(10)') AS SubjectAreaCode
,sa.value('@abbrev','nvarchar(10)') AS SubjectAreaAbbrev
,sa.value('text()[1]','nvarchar(10)') AS SubjectAreaContent
FROM #mockup a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)
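To illustrate the point about nested elements, here is a minimal sketch of how '.' and text()[1] differ once an element has a child. The <note> child element and the @x variable are made up for this demonstration and are not part of the original document.

```sql
-- Hypothetical sample: <note> is an invented nested element
DECLARE @x xml = N'<response><entry>
  <subject-area code="1" abbrev="XX1">Test1<note>internal</note></subject-area>
</entry></response>';

SELECT
     sa.value('.', 'nvarchar(50)')         AS DotPath        -- text of the element plus all of its descendants
    ,sa.value('text()[1]', 'nvarchar(50)') AS FirstTextNode  -- only the element's own first text node
FROM @x.nodes('response/entry/subject-area') AS SubjectArea(sa);
```

With this sample, DotPath should come back as Test1internal, while FirstTextNode stays Test1.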

Related

Find XML tag which is present several times

I am working with an Oracle database 19c.
I have a table with the BLOB field "MSG_BODY". This field contains XML like this:
<Body xmlns = "http://www.finnova.ch/ZV/EHF/021">
<Auftrag>
<Auftragsinformation>
<Auftragsidentifikation>
<AUF_LNR>987987987987</AUF_LNR>
<APPL_ID>9999</APPL_ID>
</Auftragsidentifikation>
<Auftragsreferenz>
<EXT_REF>TEST-2020082109574181</EXT_REF>
<EXT_AUF_REF>BA18081508D86B28</EXT_AUF_REF>
<KD_LNR_ERF>901</KD_LNR_ERF>
</Auftragsreferenz>
</Auftragsinformation>
<Zahlungsliste>
<Zahlung>
<Identifikation>
<ZV_ZLG_SYS_LNR>987987987987</ZV_ZLG_SYS_LNR>
<ZV_ZLG_LNR>1</ZV_ZLG_LNR>
</Identifikation>
<Referenz>
<EXT_REF>ABCD654654654</EXT_REF>
<EXT_REF_AUF>XX-XXX 230/99999/1</EXT_REF_AUF>
<EXT_REF_AUF_IB>BA9999988888</EXT_REF_AUF_IB>
<ZLG_INSTR_ID>BA999988886666</ZLG_INSTR_ID>
<MeldungsRef>
<MSG_TX_ID>123123123123</MSG_TX_ID>
<CS_ZLG_TRACK_ID>d8047b9f-a8c7-4d74-b5c7-470510240b60</CS_ZLG_TRACK_ID>
<CS_SWIFTGPI_SVC_ID>001</CS_SWIFTGPI_SVC_ID>
</MeldungsRef>
<MeldungsRef>
<MSG_TX_ID_DECK>xxxxxxxxxx</MSG_TX_ID_DECK>
</MeldungsRef>
</Referenz>
<Mitteilung>
<MIT_BEG>xxx</MIT_BEG>
<MIT_BEG_XML>
<Ustrd>xxx</Ustrd>
</MIT_BEG_XML>
<PURP_CD>SALA</PURP_CD>
</Mitteilung>
</Zahlung>
</Zahlungsliste>
</Auftrag>
</Body>
The tag "Zahlung" can exist multiple times, and that's OK, but inside the tag "Zahlung" is the tag "MeldungsRef". This tag should exist zero or one time for every "Zahlung" tag; the XML above shows the fault, with two of them. I now need a query that selects all rows in the table whose XML contains the tag "MeldungsRef" more than once. How can I do that?
Thanks for helping me!
Regards,
mablaser
You're looking for a second appearance of the MeldungsRef node within a Zahlung node, so you can look directly for that. This query shows you the first and second instances of the node, using xmlquery() and specifying the appearance to find with [1] or [2]:
select id,
xmlquery(
'declare default element namespace "http://www.finnova.ch/ZV/EHF/021"; (: :)
/Body/Auftrag/Zahlungsliste/Zahlung/Referenz/MeldungsRef[1]'
passing xmltype(msg_body)
returning content
) as first,
xmlquery(
'declare default element namespace "http://www.finnova.ch/ZV/EHF/021"; (: :)
/Body/Auftrag/Zahlungsliste/Zahlung/Referenz/MeldungsRef[2]'
passing xmltype(msg_body)
returning content
) as second
from your_table;
You could look for the second being not-null, but it's easier to use the same XPath with xmlexists() to test whether a second child node exists:
select id
from your_table
where xmlexists(
'declare default element namespace "http://www.finnova.ch/ZV/EHF/021"; (: :)
/Body/Auftrag/Zahlungsliste/Zahlung/Referenz/MeldungsRef[2]'
passing xmltype(msg_body)
);
db<>fiddle with one good (single node) and one bad (multiple node) row.
I receive the following error: ORA-32512: type 'xquery external variable'
As your base column is a BLOB, you need to tell it which character set it is in, e.g.:
passing xmltype(msg_body, nls_charset_id('UTF8'))
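Putting the two pieces together, the filter query with the character set supplied would look roughly like this. It assumes the BLOB content is UTF-8 encoded and reuses the placeholder names your_table and id from the queries above.

```sql
-- Sketch: rows whose XML has a second MeldungsRef under any Zahlung/Referenz.
-- Assumes msg_body holds UTF-8 text; adjust the character set name if it does not.
select id
from your_table
where xmlexists(
  'declare default element namespace "http://www.finnova.ch/ZV/EHF/021"; (: :)
   /Body/Auftrag/Zahlungsliste/Zahlung/Referenz/MeldungsRef[2]'
  passing xmltype(msg_body, nls_charset_id('UTF8'))
);
```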

Select values from XML with multiple namespaces

I need to read a value of an attribute from an XML column. The data is an XML with multiple namespaces declared:
<sd:objectData xmlns:sd="http://sd-uri">
<sd:object sourceKey="FC5A0A51-7FB6-4C64-A13E-D4B00649E80E">
<do:properties xmlns:do="http://do-uri">
<do:property name="DECISION">
<do:propertyValues clearExistingValues="true">
<do:propertyValue action="add" valueInteger="1000142" tag="Approve" />
</do:propertyValues>
</do:property>
</do:properties>
</sd:object>
</sd:objectData>
I want to read the value of valueInteger, namely in this example 1000142. I tried with WITH XMLNAMESPACES() but I am not able to get it together to define both aliases.
Does this work for you?
DECLARE @XML xml = '
<sd:objectData xmlns:sd="http://sd-uri">
<sd:object sourceKey="FC5A0A51-7FB6-4C64-A13E-D4B00649E80E">
<do:properties xmlns:do="http://do-uri">
<do:property name="DECISION">
<do:propertyValues clearExistingValues="true">
<do:propertyValue action="add" valueInteger="1000142" tag="Approve" />
</do:propertyValues>
</do:property>
</do:properties>
</sd:object>
</sd:objectData>';
WITH XMLNAMESPACES ('http://sd-uri' AS sd,
'http://do-uri' AS do)
SELECT @XML.value('(/sd:objectData/sd:object/do:properties/do:property/do:propertyValues/do:propertyValue/@valueInteger)[1]','int') AS valueInteger;
In addition to Larnu's answer (which is the best and correct answer), here are some alternative shortcuts if you just want to get one value.
This query fetches the needed value in four different ways:
SELECT @XML.value(N'(//*/@valueInteger)[1]',N'int') AS Super_easy_with_double_wildcard
,@XML.value(N'(//*:propertyValue/@valueInteger)[1]',N'int') AS Easy_with_namespace_wildcard
,@XML.value(N'declare namespace do="http://do-uri";
(//do:propertyValue/@valueInteger)[1]',N'int') AS with_local_declaration
,@XML.value(N'declare namespace do="http://do-uri";
declare namespace sd="http://sd-uri";
(/sd:objectData/sd:object/do:properties/do:property/do:propertyValues/do:propertyValue/@valueInteger)[1]',N'int') AS with_full_local_declaration;
The general advice is: be as specific as possible to avoid hassles. If that does not matter to you and you just need a readable, quick catch, you can take one of the alternatives.
UPDATE: Add a predicate
With a predicate you can place a filter:
SELECT @XML.value(N'(//*:property[@name="DECISION"]//*:propertyValue/@valueInteger)[1]',N'int') AS Example_with_predicate

How to make LIKE in SQL look for specific string instead of just a wildcard

My SQL Query:
SELECT
[content_id] AS [LinkID]
, dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name'))) AS [Physician Name]
FROM
[DB].[dbo].[table1]
WHERE
[id] = '188'
AND
(content LIKE '%Urology%')
AND
(contentS = 'A')
ORDER BY
--[content_title]
dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name')))
The issue I am having is that a search for Urology also returns rows whose content is Neurology, because the wildcard match finds "urology" inside "Neurology".
Is there any way to make it so that a search for Urology gives only Urology results and a search for Neurology gives only Neurology results?
The value can be Urology, Neurology, Internal Medicine, etc. The two mentioned above are the ones causing the issue.
The content is an ntext column with XML tags inside, for example:
<root><Location><location>Office</location>
<office>Office</office>
<Address><image><img src="Rd.jpg?n=7513" /></image>
<Address1>1 Road</Address1>
<Address2></Address2>
<City>Qns</City>
<State>NY</State>
<zip>14404</zip>
<phone>324-324-2342</phone>
<fax></fax>
<general></general>
<from_north></from_north>
<from_south></from_south>
<from_west></from_west>
<from_east></from_east>
<from_connecticut></from_connecticut>
<public_trans></public_trans>
</Address>
</Location>
</root>
With the update this content column has the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Physicians>
<name>Doctor #1</name>
<picture>
<img src="phys_lab coat_gradation2.jpg?n=7529" />
</picture>
<gender>M</gender>
<langF1>
English
</langF1>
<specialty>
<a title="Neurology" href="neu.aspx">Neurology</a>
</specialty>
</Physicians>
</root>
If I search for Lab the result appears because there is the text lab in the column.
This is what I would do if you're not into making a CLR proc to use Regexes (SQL Server doesn't have regex capabilities natively)
SELECT
[...]
WHERE
(content LIKE @strService OR
content LIKE '%[^a-z]' + @strService + '[^a-z]%' OR
content LIKE @strService + '[^a-z]%' OR
content LIKE '%[^a-z]' + @strService)
This way you check whether content is equal to @strService, OR the word exists somewhere within content with non-letters around it, OR it sits at the very beginning or very end of content with a non-letter following or preceding it, respectively.
[^...] means "a character that is none of these". If there are other characters you don't want to accept before or after the search term, add them to each of the four bracket expressions (after the ^!). For instance [^a-zA-Z_].
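As a quick way to try the four patterns, the sketch below runs them against a few inline sample strings. The sample values and the IsWholeWordMatch alias are invented for illustration, and a default case-insensitive collation is assumed.

```sql
-- Sketch: test the whole-word LIKE patterns against made-up sample rows
DECLARE @strService nvarchar(50) = N'Urology';

SELECT v.content,
       CASE WHEN v.content LIKE @strService
              OR v.content LIKE '%[^a-z]' + @strService + '[^a-z]%'
              OR v.content LIKE @strService + '[^a-z]%'
              OR v.content LIKE '%[^a-z]' + @strService
            THEN 1 ELSE 0 END AS IsWholeWordMatch
FROM (VALUES (N'<a title="Urology" href="uro.aspx">Urology</a>'),
             (N'<a title="Neurology" href="neu.aspx">Neurology</a>'),
             (N'Urology'),
             (N'Pediatric Urology')) AS v(content);
```

Under that collation assumption, the Neurology row should be the only one flagged 0, because its "urology" is preceded by the letter "e".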
As I see it, your options are to either:
Create a function that processes a string and finds a whole match inside it
Create a CLR extension that allows you to call .NET code and leverage the REGEX capabilities of .NET
Aaron's suggestion is a good one IF you can know up front all the terms that could be used for searching. The problem I could see is if someone searches for a specific word combination.
Databases are notoriously bad at semantics (i.e. they don't understand the concept of neurology or urology - everything is just a string of characters).
The best solution would be to create a table which defines the terms (two columns, PK and the name of the term).
The query is then a join:
JOIN terms ON table1.term_id = terms.term_id AND terms.term = 'Urology'
That way, you can avoid the LIKE and search for specific results.
If you can't do this, then SQL is probably the wrong tool. Use LIKE to get a set of results which match and then, in an imperative programming language, clean those results from unwanted ones.
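A minimal sketch of the terms-table idea follows. All table and column names here (#terms, #physicians, term_id, content_id) are hypothetical, and temp tables are used only so the example can be run on its own.

```sql
-- Hypothetical schema: names are invented for illustration
CREATE TABLE #terms (
    term_id int           NOT NULL PRIMARY KEY,
    term    nvarchar(100) NOT NULL UNIQUE
);
CREATE TABLE #physicians (
    content_id int NOT NULL PRIMARY KEY,
    term_id    int NOT NULL   -- would be a foreign key to the terms table on permanent tables
);

INSERT INTO #terms VALUES (1, N'Urology'), (2, N'Neurology');
INSERT INTO #physicians VALUES (100, 1), (101, 2), (102, 1);

-- Exact match on the term, so Neurology rows can never leak into a Urology search
SELECT p.content_id
FROM #physicians AS p
JOIN #terms AS t ON t.term_id = p.term_id
WHERE t.term = N'Urology';

DROP TABLE #physicians, #terms;
```

The trade-off is that the specialty has to be stored as structured data when the content is saved, instead of being searched for inside the XML afterwards.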
Judging from your content, can you not leverage the fact that there are quotes in the string you're searching for?
SELECT
[...]
WHERE
(content LIKE '%"Urology"%')

SQL Server : remove duplicated text within a string

I have a SQL Server 2008 table with a column containing lengthy HTML text. Near the top there is a link to an associated MP3 file, which is unique to each record. The links are all formatted as follows:
<div class="MediaSaveAs"><a href="...">Download Audio </a></div>
Unfortunately many records contain two or three sequential and identical instances of this link where there should be only one. Is there a relatively simple script I can run to find and eliminate the redundant links?
I'm not entirely sure - because your explanation wasn't very clear - but this appears to do what you want, although whether or not you consider this to be a "simple script", I don't know.
-- The href value is elided in the source; '...' stands in for the real URL
declare @Link nvarchar(200) = N'<div class="MediaSaveAs"><a href="...">Download Audio </a></div>'
declare @BadData nvarchar(max) = N'cbjahcgfhjasgfzhjaucv' + replicate(@Link, 3) + N'cabhjcsghagj',
@StartPattern nvarchar(34) = N'<div class="MediaSaveAs"><a href="',
@EndPattern nvarchar(27) = N'">Download Audio </a></div>'
-- Original string with the link repeated three times
select @BadData
-- Replace the whole run (first start pattern through last end pattern) with just the first link
select replace (
@BadData,
substring(@BadData, charindex(@StartPattern, @BadData), len(@BadData)-charindex(reverse(@EndPattern), reverse(@BadData))-charindex(@StartPattern, @BadData) + 2),
substring(@BadData, charindex(@StartPattern, @BadData), charindex(@EndPattern, @BadData) + len(@EndPattern) - charindex(@StartPattern, @BadData))
)
Personally I would not like to have to maintain this code; I would far rather use a script in another language that can actually parse HTML. You said this is "just a repeated text issue", but that doesn't mean it's an easy problem and especially not in a language like TSQL that has such limited support for string operations.
For future reference, please put all relevant information into the question - you can edit it if you need to - instead of leaving them in the comments where they are difficult to read and may be overlooked. And please post sample data and results instead of describing things in words.
First we need to identify the file names, which we can do with PATINDEX:
select
substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
from files
Second, identify and delete the duplicates:
delete
from files
where id not in (
select max(id)
from files
group by substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
)
http://www.sqlfiddle.com/#!3/887a3/5

Parsing XML Element value entirely in SQL from arbitrary string

We have log/audits we have compiled over some time that we would like to run some brief reports on.
One of the columns in the logs is JSON, but contains XML. We want to be able to parse out the value of a certain XML tag for each of the rows. So given an arbitrary string such as the following:
{ "XmlData" :"<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>".....}
I would like to run a sql query that return 234 when I give it the tag name TagToParse
What is the easiest way to do this ENTIRELY in SQL?
Given that your container will always be tag1, something like this should do it:
DECLARE @MyXML XML
SET @MyXML = '<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>'
SELECT
a.b.value('(/tag1//TagToParse/node())[1]', 'nvarchar(max)') AS Tag
FROM @MyXML.nodes('tag1') a(b)
Good luck.
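The query above starts from a value that is already valid XML. If the column holds the raw JSON-wrapped string instead, one alternative is plain string slicing with CHARINDEX and SUBSTRING. This is a different technique from the XML method, shown only as a rough sketch: the variable names are invented, and it assumes the tag appears once, has no attributes, and contains no nested markup.

```sql
-- Sketch: pull the text between <Tag> and </Tag> out of an arbitrary string
DECLARE @Tag nvarchar(100) = N'TagToParse';
DECLARE @Raw nvarchar(max) =
    N'{ "XmlData" :"<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>" }';

DECLARE @Open  nvarchar(110) = N'<'  + @Tag + N'>';
DECLARE @Close nvarchar(110) = N'</' + @Tag + N'>';

DECLARE @Start      int = CHARINDEX(@Open, @Raw);                -- 0 when the tag is missing
DECLARE @ValueStart int = @Start + LEN(@Open);                   -- first character after the opening tag
DECLARE @End        int = CHARINDEX(@Close, @Raw, @ValueStart);  -- position of the closing tag

SELECT CASE WHEN @Start > 0 AND @End > 0
            THEN SUBSTRING(@Raw, @ValueStart, @End - @ValueStart)
       END AS TagValue;   -- NULL when either tag is not found
```

For the sample string this returns 234. Against a table, the same expressions can be applied to the column in place of @Raw.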