XML - Extract after a String and before a certain character - sql

I have the following XML code:
_RCFM*=.ยท<form><text id="NomeTransporteSAP" label="JOB: *" mandatory="true" multiline="true" readonly="false" visible="true">AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text>
I'm trying to create a statement that allows me to get the ID: AA123EDC
For that I'm using:
SUBSTRING(col1, LEN(SUBSTRING(col1, 0, LEN(col1) - CHARINDEX ('DSI Request Number', col1))) + 1,
LEN(col1) - LEN(SUBSTRING(col1, 0, LEN(col1) - CHARINDEX ('DSI Request Number', col1)))
- LEN(SUBSTRING(col1, CHARINDEX ('</text><text id=', col1), LEN(col1))))
But it gives me the wrong string...
Can anybody help me?
Thanks!

Your first line is quite unclear (but you've tagged it with tsql...). It seems that you want to read a value from within an XML. Furthermore, this value is not atomic, so you have to parse it out.
If my assumptions are correct, you should try it this way:
DECLARE @YourXML XML=
N'<form>
<text id="NomeTransporteSAP" label="JOB: *" mandatory="true" multiline="true" readonly="false" visible="true">AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text>
</form>';
WITH ReadFromXML AS
(
SELECT @YourXML.value(N'(/form/text/text())[1]',N'nvarchar(max)') AS TheValue --AA123EDC/NB: Cheque holding v05 TESTE PT 223427
)
SELECT LEFT(TheValue,CHARINDEX('/',TheValue)-1)
FROM ReadFromXML;
This will use a CTE to retrieve the inner text in a derived table and cut away everything starting with the / using LEFT.
The CTE approach is not necessary, but it is much easier to read.
If your XML is living within a table you can use the same approach, but in this case I'd use CROSS APPLY instead of the CTE.
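A minimal sketch of that CROSS APPLY variant could look like the following. The table variable @tbl and its column YourXML are made-up names for illustration, not taken from the question. Appending '/' inside CHARINDEX is an extra guard (an assumption on my side) so that a value without any slash does not make LEFT fail.

```sql
--Hypothetical table and column names, for illustration only
DECLARE @tbl TABLE(ID INT IDENTITY, YourXML XML);
INSERT INTO @tbl VALUES
(N'<form><text id="NomeTransporteSAP" label="JOB: *">AA123EDC/NB: Cheque holding v05 TESTE PT 223427</text></form>');

--CROSS APPLY computes the inner text once per row and exposes it as A.TheValue
SELECT t.ID
      ,LEFT(A.TheValue, CHARINDEX('/', A.TheValue + '/') - 1) AS TheID
FROM @tbl t
CROSS APPLY (SELECT t.YourXML.value(N'(/form/text/text())[1]', N'nvarchar(max)')) A(TheValue);
```

For the sample row this should return AA123EDC, the same as the CTE version.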

Related

Extracting specific value from large string SQL

I've used a combination of CHARINDEX and SUBSTRING but can't get it working.
I get passed a variable in SQL that contains a lot of text but has an email in it. I need to extract the email value.
I have to use SQL 2008.
I'm trying to extract the value between "EmailAddress":" and ",
An example string is here:
{ "Type":test,
"Admin":test,
"User":{
"UserID":"16959191",
"FirstName":"Test",
"Surname":"Testa",
"EmailAddress":"Test.Test@test.com",
"Address":"Test"
}
}
Assuming you can't upgrade to 2016 or higher, you can use a combination of substring and charindex.
I've used a common table expression to make it less cumbersome, but you don't have to.
DECLARE @json varchar(4000) = '{ "Type":test,
"Admin":test,
"User":{
"UserID":"16959191",
"FirstName":"Test",
"Surname":"Testa",
"EmailAddress":"Test.Test@test.com",
"Address":"Test"
}
}';
WITH CTE AS
(
SELECT @json as Json,
CHARINDEX('"EmailAddress":', @json) + LEN('"EmailAddress":') As StartIndex
)
SELECT SUBSTRING(Json, StartIndex, CHARINDEX(',', Json, StartIndex) - StartIndex)
FROM CTE
Result: "Test.Test@test.com"
The first hint is: Move to v2016 if possible to use JSON support natively. v2008 is absolutely outdated...
The second hint is: Any string action (and all my approaches below will need some string actions too) will suffer from forbidden characters, unexpected blanks or any other surprise you might find within your data.
Try it like this:
First I create a mockup scenario to simulate your issue
DECLARE @tbl TABLE(ID INT IDENTITY,YourJson NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'{ "Type":"test1",
"Admin":"test1",
"User":{
"UserID":"16959191",
"FirstName":"Test1",
"Surname":"Test1a",
"EmailAddress":"Test1.Test1@test.com",
"Address":"Test1"
}
}')
,(N'{ "Type":"test2",
"Admin":"test2",
"User":{
"UserID":"16959191",
"FirstName":"Test2",
"Surname":"Test2a",
"EmailAddress":"Test2.Test2@test.com",
"Address":"Test2"
}
}');
--Starting with v2016 there is JSON support
SELECT JSON_VALUE(t.YourJson, '$.User.EmailAddress')
FROM @tbl t;
--String-methods
--use CHARINDEX AND SUBSTRING
--the first border ends with the opening quote, so that quote is not part of the result
DECLARE @FirstBorder NVARCHAR(MAX)='"EmailAddress":"';
DECLARE @SecondBorder NVARCHAR(MAX)='",';
SELECT t.*
,A.Pos1
,B.Pos2
,SUBSTRING(t.YourJson,A.Pos1,B.Pos2 - A.Pos1) AS ExtractedEMail
FROM @tbl t
OUTER APPLY(SELECT CHARINDEX(@FirstBorder,t.YourJson)+LEN(@FirstBorder)) A(Pos1)
OUTER APPLY(SELECT CHARINDEX(@SecondBorder,t.YourJson,A.Pos1)) B(Pos2);
--use a XML trick
SELECT CAST('<x>' + REPLACE(REPLACE((SELECT t.YourJson AS [*] FOR XML PATH('')),'"EmailAddress":','<mailAddress value='),',',' />') + '</x>' AS XML)
.value('(/x/mailAddress/@value)[1]','nvarchar(max)')
FROM @tbl t;
Some explanations:
JSON-support will parse the value directly from a JSON path.
For CHARINDEX AND SUBSTRING I use APPLY. The advantage is that you can use the computed positions like a variable. No need to repeat the CHARINDEX statements over and over.
The XML approach will transform your JSON to a rather strange and ugly XML. The only meaningful element is <mailAddress> with an attribute value. We can use the native XML method .value() to retrieve the value you are asking for.
An intermediate XML looks like this:
<x>{ "Type":"test1" />
"Admin":"test1" />
"User":{
"UserID":"16959191" />
"FirstName":"Test1" />
"Surname":"Test1a" />
<mailAddress value="Test1.Test1@test.com" />
"Address":"Test1"
}
}</x>
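Regarding the second hint above (string actions suffer from surprises in the data): one such surprise is a row where the key is missing at all. CHARINDEX returns 0 in that case and the computed positions become meaningless. The following is only a sketch of one possible guard, using NULLIF so that such rows return NULL instead of a wrong substring. The shortened JSON rows are made-up test data.

```sql
DECLARE @tbl TABLE(ID INT IDENTITY, YourJson NVARCHAR(MAX));
INSERT INTO @tbl VALUES
 (N'{ "User":{ "EmailAddress":"Test1.Test1@test.com", "Address":"Test1" } }')
,(N'{ "User":{ "Address":"Test2" } }'); --no EmailAddress key at all

--The border includes the opening quote of the value
DECLARE @FirstBorder NVARCHAR(MAX) = N'"EmailAddress":"';

SELECT t.ID
      ,SUBSTRING(t.YourJson, A.Pos1, B.Pos2 - A.Pos1) AS ExtractedEMail
FROM @tbl t
--NULLIF turns the "not found" result 0 into NULL, which propagates through SUBSTRING
OUTER APPLY(SELECT NULLIF(CHARINDEX(@FirstBorder, t.YourJson), 0) + LEN(@FirstBorder)) A(Pos1)
--The value ends at the next double quote after Pos1
OUTER APPLY(SELECT NULLIF(CHARINDEX(N'"', t.YourJson, A.Pos1), 0)) B(Pos2);
```

The first row should return the mail address without quotes, the second row NULL. This still breaks if the value itself contains an escaped quote, so native JSON support remains the safer choice where available.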

Extract path from source(src) attribute in an HTML image tag(img)

I'm trying to extract image paths from a database field that contains HTML. The data in the database looks like this:
<p><strong><strong>DISCUSSION POINT</strong>: Nearly how many years did it take Sir Francis Drake to complete the first circumnavigation of the globe in 1580? <br /><p><img id="lk45459gjh4" src="../mediaForExam/dlfkeiut8484034djjd222.png" alt="dlfkeiut8484034djjd222.png" width="697" height="352" /></p>
From this string, I only need this part:
/mediaForExam/dlfkeiut8484034djjd222.png
I tried this query:
SELECT RIGHT(questionText, (LEN(questionText)-PATINDEX ( '%SRC="%' , questionText )-5)) AS MediaPath FROM exams.history
but it's returning a string like this:
./mediaForExam/dlfkeiut8484034djjd222.png"
alt="dlfkeiut8484034djjd222.png" width="697" height="352" />
Is there a way to only return the first slash, folder name, then file name( ie: /folderName/fileName.ext)
Thanks!
I would do:
select substring(hh.questionText, 1, charindex('"', hh.questionText) - 1) as MediaPath
from exams.history h cross apply
( values (stuff(questionText, 1, charindex('src=', questionText)+6, ''))
) hh (questionText);
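To see what the offsets do, here is the same idea as a self-contained sketch against a variable (the variable name @html and the alias Tail are mine, the string is shortened from the question). STUFF cuts away everything up to and including src=".. so the remainder starts at the first slash. Then everything before the next double quote is taken. Note that the +6 relies on the path starting with exactly two dots, and that a row without src=" would return garbage.

```sql
DECLARE @html NVARCHAR(MAX) =
N'<p><img id="lk45459gjh4" src="../mediaForExam/dlfkeiut8484034djjd222.png" alt="dlfkeiut8484034djjd222.png" width="697" height="352" /></p>';

--CHARINDEX points at the "s" of src; +6 also removes  rc=".. (six more characters)
SELECT SUBSTRING(hh.Tail, 1, CHARINDEX('"', hh.Tail) - 1) AS MediaPath
FROM (VALUES (STUFF(@html, 1, CHARINDEX('src="', @html) + 6, ''))) hh(Tail);
```

For this sample the result should be /mediaForExam/dlfkeiut8484034djjd222.png.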

Oracle 10.2.0.4.0 query on partial xpath

I need to change the below query to be able to query any kind of tender item.
/Basket/CardTenderItem/Description
/Basket/CashTenderItem/Description
So
/Basket/WildcardTenderItem/Description
I have looked at various examples but cannot get them to bring back any results when running (happily admit to user error if I can get it working!)
SELECT
RETURN_ID
,SALE_ID
,extractValue(xmltype(RETURNxml),'/Basket/CashTenderItem/NetValue')
,extractValue(xmltype(RETURNxml),'/Basket/CashTenderItem/Description')
FROM SPR361
WHERE return_id = '9999.0303|20170327224954|2063'
If you only want to match anything that ends with TenderItem, but doesn't have anything after that, you could be specific with substring checks:
SELECT
RETURN_ID
,SALE_ID
,extractValue(xmltype(RETURNxml),
'/Basket/*[substring(name(), string-length(name()) - 9) = "TenderItem"]/NetValue')
,extractValue(xmltype(RETURNxml),
'/Basket/*[substring(name(), string-length(name()) - 9) = "TenderItem"]/Description')
FROM SPR361
WHERE return_id = '9999.0303|20170327224954|2063'
If you never have any nodes with anything after that fixed string then @Shnugo's contains approach is easier, and in Oracle would be very similar:
...
,extractValue(xmltype(RETURNxml),
'/Basket/*[contains(name(), "TenderItem")]/NetValue')
,extractValue(xmltype(RETURNxml),
'/Basket/*[contains(name(), "TenderItem")]/Description')
I'm not sure there's any real difference between name() and local-name() here.
If a basket can have multiple child nodes (card and cash, or more than one of each) you could also switch to XMLTable syntax:
SELECT
s.RETURN_ID
,s.SALE_ID
,x.netvalue
,x.description
FROM SPR361 s
CROSS JOIN XMLTable(
'/Basket/*[contains(name(), "TenderItem")]'
PASSING XMLType(s.RETURNxml)
COLUMNS netvalue NUMBER PATH './NetValue'
, description VARCHAR(80) PATH './Description'
) x
WHERE s.return_id = '9999.0303|20170327224954|2063'
And it's overkill here maybe, but for more complicated tests you can use other XQuery syntax, like:
CROSS JOIN XMLTable(
'for $i in /Basket/*
where contains($i/name(), "TenderItem") return $i'
PASSING XMLType(s.RETURNxml)
...
This is SQL-Server syntax and I cannot test if this works with Oracle too, but I think it will. You can use the XQuery function contains():
DECLARE @xml XML=
N'<root>
<abcTenderItem>test1</abcTenderItem>
<SomeOther>should not show up</SomeOther>
<xyzTenderItem>test2</xyzTenderItem>
</root>';
SELECT @xml.query(N'/root/*[contains(local-name(),"TenderItem")]')
Only the elements with "TenderItem" in their names show up:
<abcTenderItem>test1</abcTenderItem>
<xyzTenderItem>test2</xyzTenderItem>

Reading dynamic XML nodes in SQL Server

I have the following XML structure:
set @MailXML =
'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Age> 24 </Age>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
<DestinationCountry> USA </DestinationCountry>
<OriginCountry> Australia </OriginCountry>
<OriginAddress> 120 St Kilda Road </OriginAddress>
</Letter>
</Mailman>
</MailingCompany>'
My SQL currently looks like this:
-- Mail Insertion
INSERT INTO mailDB.dbo.Mailman
SELECT
m.value('Name[1]','varchar(50)') as Name,
m.value('Age[1]','varchar(50)') as Age
FROM
@MailXML.nodes('/MailingCompany/Mailman') as A(m)
SET @MailPersonFK = SCOPE_IDENTITY();
-- Letter Insertion
INSERT INTO mailDB.dbo.Letter
SELECT
l.value('DestinationAddress[1]', 'varchar(50)') as DestinationAddress,
l.value('DestinationCountry[1]', 'varchar(50)') as DestinationCountry,
l.value('OriginCountry[1]', 'varchar(50)') as OriginCountry,
l.value('OriginAddress[1]', 'varchar(50)') as OriginAddress,
@MailPersonFK as MailID
FROM
@MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
I am trying to extract the Mailman and Letter data into their own respective tables. I have got that working, however my issue is that the MailingCompany node is dynamic. Sometimes it may be MailVehicle, for example, and I still need to read the corresponding Mailman and Letter node data and insert them into their own respective tables.
So both
FROM @MailXML.nodes('/MailingCompany/Mailman') as A(t)
and
FROM @MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
will need to be changed to allow MailingCompany to be dynamic.
I have tried to extract the parent node and concatenate it into a string to put into the .nodes function like the following:
set @DynXML = '/' + @parentNodeVar + '/Mailman'
FROM @MailXML.nodes(@DynXML) as A(t)
However I get the following error:
The argument 1 of the XML data type method "nodes" must be a string literal.
How can I overcome this dynamic XML issue?
Thank you very much in advance
Look at this reduced example:
DECLARE @xml1 XML=
N'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
</Letter>
</Mailman>
</MailingCompany>';
DECLARE @xml2 XML=
N'<OtherName>
<Mailman>
<Name>Jodie</Name>
<Letter>
<DestinationAddress> This is the other address </DestinationAddress>
</Letter>
</Mailman>
</OtherName>';
SELECT @xml1.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
You can replace a node's name with *.
Another trick is the deep search with // (same result as before):
SELECT @xml1.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
The general rule: Be as specific as possible.
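Applied to the .nodes() calls from the question, the wildcard might be used like this. This is only a sketch: the XML is shortened, the INSERT parts are left out, and MailVehicle stands in for any changing root name.

```sql
DECLARE @MailXML XML =
N'<MailVehicle>
<Mailman>
<Name>Jamie</Name>
<Age> 24 </Age>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
<DestinationCountry> USA </DestinationCountry>
</Letter>
</Mailman>
</MailVehicle>';

--The root element's name is replaced by *, so any root name matches
SELECT m.value('(Name/text())[1]', 'varchar(50)') AS Name
      ,m.value('(Age/text())[1]', 'varchar(50)') AS Age
FROM @MailXML.nodes('/*/Mailman') AS A(m);

--Same wildcard, one level deeper
SELECT l.value('(DestinationAddress/text())[1]', 'varchar(50)') AS DestinationAddress
      ,l.value('(DestinationCountry/text())[1]', 'varchar(50)') AS DestinationCountry
FROM @MailXML.nodes('/*/Mailman/Letter') AS B(l);
```

The path stays a string literal, so the error about argument 1 of "nodes" does not occur, and everything below the root is still addressed specifically.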

Combine 2 XML Output Statements into one XML file

I've been working on outputting from SQL to XML and have finally come up with 2 queries that work fabulously for what I need with one problem. I did these independently as I am a novice with XML and now I need to combine them into one file for output. What I have is pretty complicated using dynamic SQL so I'm just going to post the dynamic part and exclude all of the variable and un-pivoting stuff I'm doing at the beginning. If I could just get result set 1 to be at the top and result set 2 just below it would suffice. I would like a wrapper tag (not sure if that's the term) as the very first and very last lines in this file. I'm not sure if anyone can tell me what I'm doing wrong just by looking at the dynamic code but I thought it would be worth a try. As I said, both of these work great but produce 2 separate files. Thanks in advance!
EXEC ('
SELECT
(SELECT GETDATE() AS STARTDATE,
GETDATE() AS ENDDATE
FOR XML PATH(''RECDATE''),ELEMENTS, TYPE),
(SELECT COUNT(*) AS SURVEY_COUNT
FROM PM_TEMP.dbo.tmpKansasCancerCenterExtract FOR XML PATH(''''), ELEMENTS, TYPE),
(SELECT ''PEPM'' as SERVICE,
DataCol as VARNAME,
Question as QUESTION_TEXT,
AnswerValue as ANSWER_TEXT
from PM_TEMP.dbo.tmpKansasCancerCenterCodeSheetExtract FOR XML PATH(''QUESTION''), ROOT(''QUESTION_MAP''), ELEMENTS, TYPE)
FOR XML PATH(''HEADER'')
SELECT PATID,CLIENTID,SERVICE,PATVISITDT,DATE,
(
SELECT VarName,Value
FROM PM_TEMP.dbo.tmpKansasCancerCenterExtract_AnalysisPivot P
INNER JOIN PM_TEMP.dbo.tmpKansasCancerCenterExtract E ON E.PatVisitID = P.PatVisitID
FOR XML PATH(''Response''), TYPE, ROOT(''Analysis'')
),
(
SELECT VarName,Value
FROM PM_TEMP.dbo.tmpKansasCancerCenterExtract_DemoPivot P
INNER JOIN PM_TEMP.dbo.tmpKansasCancerCenterExtract E ON E.PatVisitID = P.PatVisitID
FOR XML PATH(''Response''), TYPE, ROOT(''Demographics'')
)
FROM PM_TEMP.dbo.tmpKansasCancerCenterExtract
FOR XML PATH(''PatientLevelData''),TYPE, ROOT(''PatientLevelData'')
')