Extract values from xmlType - sql

I know this has been asked before, but I'm not able to monkey-see-monkey-do with my formatting
WITH TEST_XML_EXTRACT AS
(
SELECT XMLTYPE (
'<tns:Envelope xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/">
<testLeaf> ValueIWant </testLeaf>
</tns:Envelope>') testField
FROM dual
)
SELECT EXTRACTVALUE(testField,'/testLeaf'), -- doesn't work
EXTRACTVALUE(testField,'/tns'), -- doesn't work
EXTRACTVALUE(testField,'/Envelope'), -- doesn't work
EXTRACTVALUE(testField,'/BIPIBITY') -- doesn't work
FROM TEST_XML_EXTRACT;
It just returns blank.
And I can't find any exactly similar examples from the Oracle docs.
Any thoughts?

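testLeaf is the only node in the sample that is not namespace-qualified and holds a text value, so it is the only one extractValue() can return without a namespace mapping. (The path '/testLeaf' returns nothing because an absolute path starts at the root element, which is tns:Envelope.) One way is to extract the testLeaf fragments first and then read values relative to them: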
with test_xml_extract( testField ) as
(
select
XMLType(
'<tns:Envelope xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/">
<testLeaf> ValueIWant </testLeaf>
</tns:Envelope>'
)
from dual
)
select extractValue(value(x), 'testLeaf') as testLeaf,
extractValue(value(x), 'tns') as tns,
extractValue(value(x), 'Envelope') as Envelope,
extractValue(value(x), 'BIPIBITY') as BIPIBITY
from test_xml_extract t,
table(XMLSequence(t.testField.extract('//testLeaf'))) x;
TESTLEAF TNS ENVELOPE BIPIBITY
---------- ------- ---------- ----------
ValueIWant

You might do better passing the namespace mapping to the extraction call, so that the tns prefix in the path can be resolved:
WITH TEST_XML_EXTRACT AS
( SELECT XMLTYPE (
'<tns:Envelope xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/">
<testLeaf> ValueIWant </testLeaf>
</tns:Envelope>') testField
FROM dual)
select t.testField.extract('/tns:Envelope/testLeaf',
'xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/"').getstringval() val,
EXTRACTVALUE(t.testField,'/tns:Envelope/testLeaf', 'xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/"') extval,
EXTRACTVALUE(t.testField,'/*/testLeaf', 'xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/"') extval_wc
from TEST_XML_EXTRACT t;
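Since Oracle documents EXTRACTVALUE as deprecated and recommends XMLTABLE (or XMLCAST with XMLQUERY) instead, the same namespace-aware lookup can also be written with XMLTable and an XMLNAMESPACES clause that binds the tns prefix. A minimal sketch reusing the question's sample document; the column name test_leaf and its VARCHAR2(50) size are illustrative choices, not from the question:

```sql
WITH test_xml_extract AS (
  SELECT XMLTYPE(
    '<tns:Envelope xmlns:tns="http://schemas.xmlsoap.org/soap/envelope/">
       <testLeaf> ValueIWant </testLeaf>
     </tns:Envelope>') AS testField
  FROM dual
)
SELECT TRIM(x.test_leaf) AS test_leaf   -- TRIM drops the spaces around the value
FROM test_xml_extract t,
     XMLTABLE(
       -- bind the tns prefix so the row path can name the root element
       XMLNAMESPACES('http://schemas.xmlsoap.org/soap/envelope/' AS "tns"),
       '/tns:Envelope'
       PASSING t.testField
       -- testLeaf itself has no namespace, so the column path needs no prefix
       COLUMNS test_leaf VARCHAR2(50) PATH 'testLeaf'
     ) x;
```

Because the column path is relative to each row node, adding more leaf elements later only means adding more entries to the COLUMNS clause.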

XMLQuery to get data

I have the following SQL:
with q1 (Tdata) as (
SELECT XMLtype(transportdata, nls_charset_id('AL32UTF8'))
from bph_owner.paymentinterchange pint
where --PINT.INCOMING = 'T' and
PINT.TRANSPORTTIME >= to_date('2022-08-10', 'yyyy-mm-dd')
and pint.fileformat = 'pain.001.001.03'
)
--select XMLQuery('//*:GrpHdr/*:InitgPty/Nm/text()'
select tdata, XMLQuery('//*:GrpHdr/*:CtrlSum/text()'
passing Tdata
returning content).getstringval()
,
XMLQuery('//*:GrpHdr/*:MsgId/text()'
passing Tdata
returning content).getstringval()
from q1;
This works, but for InitgPty/Nm it doesn't. Does anybody know how I can extract this information?
Please be gentle as I don't work with XML much.
Thanks
Based on sample data from a previous question and others relating to this XML schema, it appears you just need to wildcard the namespace for the Nm node, as well as its parents:
'//*:GrpHdr/*:InitgPty/*:Nm/text()'
so the query could become:
with q1 (Tdata) as (
SELECT XMLtype(transportdata, nls_charset_id('AL32UTF8'))
from paymentinterchange pint
where --PINT.INCOMING = 'T' and
PINT.TRANSPORTTIME >= to_date('2022-08-10', 'yyyy-mm-dd')
and pint.fileformat = 'pain.001.001.03'
)
select
XMLQuery('//*:GrpHdr/*:InitgPty/*:Nm/text()'
passing Tdata
returning content).getstringval() as nm
,
XMLQuery('//*:GrpHdr/*:CtrlSum/text()'
passing Tdata
returning content).getstringval() as ctrlsum
,
XMLQuery('//*:GrpHdr/*:MsgId/text()'
passing Tdata
returning content).getstringval() as msgid
from q1;
But if you're pulling multiple values out of the same document then it's simpler to use XMLTable rather than multiple XMLQuery calls:
select x.nm, x.ctrlsum, x.msgid
from paymentinterchange pint
cross apply XMLTable(
'//*:GrpHdr'
passing XMLtype(pint.transportdata, nls_charset_id('AL32UTF8'))
columns nm varchar2(50) path '*:InitgPty/*:Nm',
ctrlsum number path '*:CtrlSum',
msgid varchar2(50) path '*:MsgId'
) x
where pint.transporttime >= date '2022-08-10'
and pint.fileformat = 'pain.001.001.03';
db<>fiddle with data adapted from a previous question.
CROSS APPLY was introduced in Oracle 12c; before that you can use cross join instead. I don't think it would make a difference in later versions either; cross apply just seems to be my default now...
select x.nm, x.ctrlsum, x.msgid
from paymentinterchange pint
cross join XMLTable(
'//*:GrpHdr'
passing XMLtype(pint.transportdata, nls_charset_id('AL32UTF8'))
columns nm varchar2(50) path '//*:InitgPty/*:Nm',
ctrlsum number path '//*:CtrlSum',
msgid varchar2(50) path '//*:MsgId'
) x
where pint.transporttime >= date '2022-08-10'
and pint.fileformat = 'pain.001.001.03';
db<>fiddle showing it in 11.2.0.2, where it seems to need the // at the start of all the paths, including in the columns clause - those aren't needed in later versions, but here it returns nulls without them.
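If every document uses the standard pain.001.001.03 namespace as its default namespace, an alternative to wildcarding each step is to declare that namespace once with XMLNAMESPACES(DEFAULT ...). This is a sketch under assumptions not confirmed by the question: the URI below is the one defined by the ISO 20022 pain.001.001.03 schema, and the Document/CstmrCdtTrfInitn/GrpHdr path follows that schema's layout, so check both against a real transportdata value first. On 11.2.0.2 the column paths may also need the // workaround described above.

```sql
-- Assumption: each transportdata document declares
-- urn:iso:std:iso:20022:tech:xsd:pain.001.001.03 as its default namespace.
select x.nm, x.ctrlsum, x.msgid
from paymentinterchange pint
cross join XMLTable(
  XMLNamespaces(default 'urn:iso:std:iso:20022:tech:xsd:pain.001.001.03'),
  '/Document/CstmrCdtTrfInitn/GrpHdr'
  passing XMLtype(pint.transportdata, nls_charset_id('AL32UTF8'))
  -- unprefixed names now resolve to the default namespace declared above
  columns nm      varchar2(50) path 'InitgPty/Nm',
          ctrlsum number       path 'CtrlSum',
          msgid   varchar2(50) path 'MsgId'
) x
where pint.transporttime >= date '2022-08-10'
and pint.fileformat = 'pain.001.001.03';
```

The wildcard form is more forgiving if some files use a different namespace version; the explicit form fails fast (returns no rows) when the namespace does not match.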

Oracle Parse NCLOB data to Output or New Table

I have an Oracle 11.2.0.4.0 table named LOOKUPTABLE with 3 fields
LOOKUPTABLEID NUMBER(12)
LOOKUPTABLENM NVARCHAR2(255)
LOOKUPTABLECONTENT NCLOB
The data in the NCLOB field is highly validated on insert, so I'm certain it is always comma-separated text with a CRLF at the end of each line; it reads exactly like a simple CSV file. Example ([CRLF] represents an actual CRLF, not literal text):
WITH lookuptable AS (
SELECT
1 AS "LOOKUPTABLEID",
'CODES.TBL' AS "LOOKUPTABLENM",
TO_NCLOB('851,ALL HOURS WORKED GLASS,G,0,,,,,,'||chr(13)||chr(10)||
'935,ALL OT AND HW HRS,G,0,,,,,,'||chr(13)||chr(10)||
'934,ALL PAID TIME,G,0,,,,,,'||chr(13)||chr(10)) AS "LOOKUPTABLECONTENT"
FROM dual
)
SELECT lookuptablecontent FROM lookuptable WHERE lookuptablenm='CODES.TBL';
"851,ALL HOURS WORKED GLASS,G,0,,,,,,[CRLF]935,ALL OT AND HW HRS,G,0,,,,,,[CRLF]934,ALL PAID TIME,G,0,,,,,,[CRLF]"
I essentially want to have a query that can output 1 row for each line in the CLOB. I'm using an application that will read this SQL and write it to a text file for me, but it cannot handle CLOB data types, and I don't have the option to write directly to a file from SQL itself. I have to have a query that can produce this result and allow my app to write the file. I do have the ability to create/write my own tables, so a procedure that would read the CLOB into a new table, which I would then select from in my application, would be acceptable if that's better; it's just over my head right now. Desired output below, thanks in advance for any help :)
1. 851,ALL HOURS WORKED GLASS,G,0,,,,,,
2. 935,ALL OT AND HW HRS,G,0,,,,,,
3. 934,ALL PAID TIME,G,0,,,,,,
This is a specific case of the general question "how to split a string", and I link this question a lot for more details on that. In this case, instead of a comma, the delimiter you want to split on is CRLF, or chr(13)||chr(10).
Here's a simple solution with regexp_substr. It's not the fastest solution, but it works fine in simple scenarios. If you need better performance, see the version in the link above with a recursive CTE and no regexp.
WITH lookuptable AS (
SELECT
1 AS LOOKUPTABLEID,
'CODES.TBL' AS LOOKUPTABLENM,
TO_NCLOB('851,ALL HOURS WORKED GLASS,G,0,,,,,,'||chr(13)||chr(10)||
'935,ALL OT AND HW HRS,G,0,,,,,,'||chr(13)||chr(10)||
'934,ALL PAID TIME,G,0,,,,,,'||chr(13)||chr(10)) AS LOOKUPTABLECONTENT
FROM dual
)
-- [^CR LF]+ matches a run of characters that are neither CR nor LF, i.e. one line
SELECT lookuptableid as id, to_char(regexp_substr(lookuptablecontent,'[^'||chr(13)||chr(10)||']+', 1, level)) as r
FROM lookuptable
WHERE lookuptablenm='CODES.TBL'
connect by level <= regexp_count(lookuptablecontent, '[^'||chr(13)||chr(10)||']+')
and PRIOR lookuptableid = lookuptableid and PRIOR SYS_GUID() is not null -- needed if more than 1 source row
order by lookuptableid, level
;
Output:
id r
1 851,ALL HOURS WORKED GLASS,G,0,,,,,,
1 935,ALL OT AND HW HRS,G,0,,,,,,
1 934,ALL PAID TIME,G,0,,,,,,
Here is my example data and format, using the recursive CTE without regexp from the link provided by @kfinity:
WITH lookuptable (lookuptableid, lookuptablenm, lookuptablecontent) AS (
SELECT
1,
'CODES.TBL',
TO_NCLOB('ID,NAME,TYPE,ISMONEYSW,EARNTYPE,EARNCODE,RATESW,NEGATIVESW,OVERRIDEID,DAILYSW'||chr(13)||chr(10)||
'851,ALL HOURS WORKED GLASS,G,0,,,,,,'||chr(13)||chr(10)||
'935,ALL OT AND HW HRS,G,0,,,,,,'||chr(13)||chr(10)||
'934,ALL PAID TIME,G,0,,,,,,'
)
FROM dual
), CTE (lookuptableid, lookuptablenm, lookuptablecontent, startposition, endposition) AS (
SELECT
lookuptableid,
lookuptablenm,
lookuptablecontent,
1,
INSTR(lookuptablecontent, chr(13)||chr(10))
FROM lookuptable
WHERE lookuptablenm = 'CODES.TBL'
UNION ALL
SELECT
lookuptableid,
lookuptablenm,
lookuptablecontent,
endposition + 2, -- skip both characters of the CRLF delimiter
INSTR(lookuptablecontent, chr(13)||chr(10), endposition + 2)
FROM CTE
WHERE endposition > 0
)
SELECT
lookuptableid,
lookuptablenm,
SUBSTR(lookuptablecontent, startposition, DECODE(endposition, 0, LENGTH(lookuptablecontent) + 1, endposition) - startposition) AS lookuptablecontent
FROM CTE
ORDER BY lookuptableid, startposition;
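Since the question mentions being able to create your own objects: rather than a procedure that copies the CLOB into a new table (which would go stale whenever LOOKUPTABLE changes), a view over the regexp_substr query gives the application a plain VARCHAR2 result it can select from. A minimal sketch; the view name LOOKUPTABLE_LINES and its column names are hypothetical:

```sql
-- Hypothetical view name and column names; adjust to taste.
CREATE OR REPLACE VIEW lookuptable_lines AS
SELECT lookuptableid,
       lookuptablenm,
       level AS line_no,
       -- TO_CHAR turns each NCLOB fragment into a VARCHAR2 the application can read
       TO_CHAR(REGEXP_SUBSTR(lookuptablecontent, '[^' || CHR(13) || CHR(10) || ']+', 1, level)) AS line_text
FROM lookuptable
CONNECT BY level <= REGEXP_COUNT(lookuptablecontent, '[^' || CHR(13) || CHR(10) || ']+')
       AND PRIOR lookuptableid = lookuptableid
       AND PRIOR SYS_GUID() IS NOT NULL;
```

The application would then run something like `SELECT line_text FROM lookuptable_lines WHERE lookuptablenm = 'CODES.TBL' ORDER BY line_no`. Note that TO_CHAR limits each line to the VARCHAR2 maximum, which is fine for short CSV rows like these.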

Regex extract in BigQuery issue

I'm trying to simplify a column in BigQuery by using REGEXP_EXTRACT on it, but I am having a bit of an issue.
Here are two examples of the data I'm extracting from:
dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html
dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html
I want to extract the portion between ;u1= and ;u2.
Running the following legacy SQL query:
SELECT
Date(Event_Time),
Activity_ID,
REGEXP_EXTRACT(Other_Data, r'(?<=u1=)(.*\n?)(?=;u2)')
FROM
[sprt-data-transfer:dtftv2_sprt.p_activity_166401]
WHERE
Activity_ID in ('8179851')
AND Site_ID_DCM NOT IN ('2134603','2136502','2539719','2136304','2134604','2134602','2136701','2378406')
AND Event_Time BETWEEN 1563746400000000 AND 1563832799000000
I get the error...
Failed to parse regular expression "(?<=u1=)(.*\n?)(?=;u2)": invalid perl operator: (?<
And this is where my talent runs out: is the error caused by my using legacy SQL, or is this regex syntax unsupported?
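I just tried this with "Standard SQL" enabled, and it worked. BigQuery's regular expression functions use the RE2 library, which does not support lookaround assertions such as (?<=...) or (?=...), so this pattern uses a capture group instead: when the pattern contains a capturing group, REGEXP_EXTRACT returns only the part matched by the parentheses.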
select
other_data,
regexp_extract(other_data, ';u1=(.+?);u2') as some_part
from
unnest([
'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html',
'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
]) as other_data
Not using regex but it still works...
with test as (
select 1 as id, 'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html' as my_str UNION ALL
select 2 as id, 'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
),
temp as (
select
id,
split(my_str,';') as items
from test
),
flattened as (
select
id,
split(i,'=')[SAFE_OFFSET(0)] as left_side,
split(i,'=')[SAFE_OFFSET(1)] as right_side
from temp
left join unnest(items) i
)
select * from flattened
where left_side = 'u1'
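Applied to the question's own table, the capture-group approach might look like the sketch below in Standard SQL. The assumptions are not confirmed by the question: Event_Time is taken to be an INT64 of microseconds since the epoch (as the BETWEEN bounds suggest), and Activity_ID and Site_ID_DCM are taken to be STRING columns, since they are compared with string literals. The legacy [project:dataset.table] reference becomes a backtick-quoted project.dataset.table name.

```sql
#standardSQL
-- Assumptions: Event_Time is INT64 microseconds since the epoch;
-- Activity_ID and Site_ID_DCM are STRING columns.
SELECT
  DATE(TIMESTAMP_MICROS(Event_Time)) AS event_date,
  Activity_ID,
  -- [^;]* stops at the next ';', so no lookahead is needed
  REGEXP_EXTRACT(Other_Data, r';u1=([^;]*)') AS u1
FROM
  `sprt-data-transfer.dtftv2_sprt.p_activity_166401`
WHERE
  Activity_ID IN ('8179851')
  AND Site_ID_DCM NOT IN ('2134603', '2136502', '2539719', '2136304', '2134604', '2134602', '2136701', '2378406')
  AND Event_Time BETWEEN 1563746400000000 AND 1563832799000000;
```

Using [^;]* rather than .+? ;u2 also keeps working if the parameter order ever changes, because it only relies on the ';' separator after u1.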

How to extract this XML

I want to extract a value from an XML file (XML_DATA):
The XML:
-<XP6>
+<INFO_1>
+<INFO_2>
+<INFO_3>
-<Prdct>
-<Prdct_row>
.....
<LILBFLO>Samsung,corp. </LILBFLO> <--value
I tried this, but it's not working:
EXTRACTVALUE(XML_DATA,'/Prdct/Prdct_row/LILBFLO/text()')
How to use extractvalue correctly?
Assuming the +/- symbols indicate collapsed nodes and your XML actually looks something like the sample in this CTE, you just need to include the root node in the path:
with your_table (xml_data) as (
select xmltype('<XP6>
<INFO_1/>
<INFO_2/>
<INFO_3/>
<Prdct>
<Prdct_row>
<LILBFLO>Samsung,corp. </LILBFLO>
</Prdct_row>
</Prdct>
</XP6>') from dual
)
select EXTRACTVALUE(XML_DATA,'/XP6/Prdct/Prdct_row/LILBFLO/text()')
from your_table;
EXTRACTVALUE(XML_DATA,'/XP6/PRDCT/PRDCT_ROW/LILBFLO/TEXT()')
------------------------------------------------------------
Samsung,corp.
But the extractvalue() function is deprecated, so you should use an XMLQuery instead:
select XMLQuery('/XP6/Prdct/Prdct_row/LILBFLO/text()' passing XML_DATA returning content)
from your_table;
XMLQUERY('/XP6/PRDCT/PRDCT_ROW/LILBFLO/TEXT()'PASSINGXML_DATARETURNINGCONTENT)
--------------------------------------------------------------------------------
Samsung,corp.
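XMLQuery returns an XMLType instance; if the application needs a plain string, one option Oracle recommends alongside XMLTABLE is to wrap the call in XMLCast. A minimal sketch against a trimmed copy of the sample document; the VARCHAR2(100) size and the alias lilbflo are arbitrary choices:

```sql
with your_table (xml_data) as (
  select xmltype('<XP6>
  <Prdct>
    <Prdct_row>
      <LILBFLO>Samsung,corp. </LILBFLO>
    </Prdct_row>
  </Prdct>
</XP6>') from dual
)
-- XMLCast atomizes the selected element and returns its text as VARCHAR2
select xmlcast(
         xmlquery('/XP6/Prdct/Prdct_row/LILBFLO' passing xml_data returning content)
         as varchar2(100)) as lilbflo
from your_table;
```

XMLCast expects a single item, so if a document can contain several Prdct_row elements it is expected to raise an error; in that case XMLTable with one row per Prdct_row is the better fit.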

Find Substring - SQL

I need to find a substring in a text field that is actually partially XML. I tried converting it to XML and then using the .value method, but to no avail.
The element(substring) I am looking for is a method name that looks like this:
AssemblyQualifiedName="IPMGlobal.CRM2011.IPM.CustomWorkflowActivities.ProcessChildRecords,
where the method at the end, "ProcessChildRecords", could be another name such as "SendEmail". I know I can use "CustomWorkflowActivities." and the comma to find the substring (the method name), but I'm not sure how to accomplish it. In addition, there may be more than one instance of "CustomWorkflowActivities.<method>" listed.
Some Clarifications:
Below is my original query. It returns the first occurrence in each row but no additional ones. For example, I might have '...IPM.CustomWorkflowActivities.ProcessChildRecords...' and '...IPM.CustomWorkflowActivities.GetworkflowContext...' in the same string, but the current query only returns one row: Approve Time Process, ipm_mytimesheetbatch, ProcessChildRecords.
SELECT WF.name WFName,
(
SELECT TOP 1 Name
FROM entity E
WHERE WF.primaryentity = E.ObjectTypeCode
) Entity,
Convert(xml, xaml) Xaml,
SUBSTRING(xaml, Charindex('CustomWorkflowActivities.', xaml) + Len('CustomWorkflowActivities.'), Charindex(', IPMGlobal.CRM2011.IPM.CustomWorkflowActivities, Version=1.0.0.0', xaml) - Charindex('CustomWorkflowActivities.', xaml) - Len('CustomWorkflowActivities.'))
FROM FilteredWorkflow WF
WHERE 1 = 1
AND xaml LIKE '%customworkflowactivities%'
AND statecodename = 'Activated'
AND typename = 'Definition'
ORDER BY NAME
If you are using Oracle, you could use a REGEXP function:
WITH cte(t) as (
SELECT 'AssemblyQualifiedName="IPMGlobal.CRM2011.IPM.CustomWorkflowActivities.ProcessChildRecords,' FROM dual
)
SELECT t,
regexp_replace(t, '.*CustomWorkflowActivities\.([^,]+),.*', '\1') AS r
FROM cte;
SQL Server:
WITH cte(t) as (
SELECT 'AssemblyQualifiedName="IPMGlobal.CRM2011.IPM.CustomWorkflowActivities.ProcessChildRecords,asfdsa'
)
SELECT t,SUBSTRING(t, s, CHARINDEX(',', t, s)-s)
FROM (SELECT t, PATINDEX( '%CustomWorkflowActivities.%', t) + LEN('CustomWorkflowActivities.') AS s
FROM cte
) sub;
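Both answers return only the first match per row, while the question mentions several CustomWorkflowActivities.<method> entries in one value. In SQL Server, one way to get every occurrence is a recursive CTE that walks the string using CHARINDEX's start-position argument. A minimal sketch; the sample string, the @marker variable and the assumption that every method name is followed by a comma are illustrative, not from the question. For the real data, replace src with a SELECT of name and xaml from FilteredWorkflow.

```sql
DECLARE @marker nvarchar(50) = N'CustomWorkflowActivities.';

WITH src (id, t) AS (
    -- Hypothetical sample value containing two method references
    SELECT 1, CAST(N'a="IPMGlobal.CRM2011.IPM.CustomWorkflowActivities.ProcessChildRecords, x" '
                 + N'b="IPMGlobal.CRM2011.IPM.CustomWorkflowActivities.GetWorkflowContext, x"' AS nvarchar(max))
),
hits (id, t, pos) AS (
    -- Anchor: position of the first marker in each row
    SELECT id, t, CHARINDEX(@marker, t)
    FROM src
    WHERE CHARINDEX(@marker, t) > 0
    UNION ALL
    -- Recursive step: look for the next marker after the previous one
    SELECT id, t, CHARINDEX(@marker, t, pos + LEN(@marker))
    FROM hits
    WHERE CHARINDEX(@marker, t, pos + LEN(@marker)) > 0
)
SELECT id,
       -- The method name runs from the end of the marker up to the next comma
       SUBSTRING(t, pos + LEN(@marker),
                 CHARINDEX(N',', t, pos + LEN(@marker)) - (pos + LEN(@marker))) AS method
FROM hits
ORDER BY id, pos;
```

The marker includes the trailing dot, so the assembly name "CustomWorkflowActivities, Version=..." that follows each method is not counted as a match. If a match is not followed by a comma, the SUBSTRING length goes negative and SQL Server raises an error, so guard that case if the data is less regular.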