Extracting multiple values from BLOB as XML

I have an XML like this in a BLOB column:
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="urn:xyzns">
<history>
<Event>
<year>1983</year>
<Country>
<Location>Lisbon</Location>
<Type>Political</Type>
</Country>
</Event>
<Event>
<Country>
<Location>USA</Location>
<Type>Entertainment</Type>
<year>2016</year>
</Country>
</Event>
</history>
</document>
As you can see, the year can appear either at the Event level or at the Country level. There can be multiple events, and multiple countries per event. The whole XML is stored in a BLOB column in Oracle. I need to extract the value of the year or, better, check whether the year is 2000 and, if so, return the primary key of the row.
I used EXISTSNODE to check whether the year tag is present.
select pk from table where XMLType(blobdata, nls_charset_id('UTF8')).EXISTSNODE('/Document/history/Event[*]/year',
'xmlns="urn:xyzns"') = 1 and EXTRACTVALUE(XMLTYPE(blobdata, nls_charset_id('UTF8')), '/Document/history/Event[*]/year/text()',
'xmlns="urn:xyzns"') = '2000';
However, this fails because the EXTRACTVALUE path returns multiple nodes, so I changed the parameter to '/Document/history/Event[1]/year/text()' to check, and that works. That isn't enough, though, as it only checks the first Event tag.
I looked at other questions here and one of the options was to use XMLTABLE, since EXTRACTVALUE is deprecated. I am having trouble understanding the parameters given inside XMLTABLE. Could someone explain how to use XMLTABLE in this scenario? I should point out that the original datatype is BLOB and not CLOB. Thank you.

Use XMLTABLE to get values for both locations and then use COALESCE to show whichever is not NULL:
SELECT COALESCE( year, country_year ) AS year,
       location,
       type
FROM   table_name t
       CROSS APPLY XMLTABLE(
         XMLNAMESPACES( DEFAULT 'urn:xyzns' ),
         '/document/history/Event'
         PASSING XMLTYPE(t.blobdata, nls_charset_id('UTF8'))
         COLUMNS
           year         NUMBER(4,0)   PATH './year',
           country_year NUMBER(4,0)   PATH './Country/year',
           location     VARCHAR2(200) PATH './Country/Location',
           type         VARCHAR2(200) PATH './Country/Type'
       ) x
Which, for the sample data:
CREATE TABLE table_name ( blobdata BLOB );
INSERT INTO table_name
VALUES (
UTL_RAW.CAST_TO_RAW(
'<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="urn:xyzns">
<history>
<Event>
<year>1983</year>
<Country>
<Location>Lisbon</Location>
<Type>Political</Type>
</Country>
</Event>
<Event>
<Country>
<Location>USA</Location>
<Type>Entertainment</Type>
<year>2016</year>
</Country>
</Event>
</history>
</document>'
)
);
Outputs:
YEAR | LOCATION | TYPE
---: | :------- | :------------
1983 | Lisbon | Political
2016 | USA | Entertainment
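The question also asks for the primary key of rows in which any year equals 2000. One way to sketch that is XMLEXISTS, which needs no row expansion and, because the XQuery = comparison is existential, tolerates several Country elements per Event. This assumes a key column named pk, as in the question's query (the sample table above has no such column). The namespace declaration is kept on the same line as the path because some client tools, such as SQL*Plus, can treat a semicolon at the end of a line as a statement terminator:

```sql
-- Sketch only: pk is the key column assumed from the question's query;
-- the sample table_name created above would need that column added.
SELECT t.pk
FROM   table_name t
WHERE  XMLEXISTS(
         'declare default element namespace "urn:xyzns"; /document/history/Event[year = 2000 or Country/year = 2000]'
         PASSING XMLTYPE(t.blobdata, NLS_CHARSET_ID('UTF8'))
       );
```

The predicate checks both places where a year can occur, mirroring the COALESCE in the XMLTABLE query.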

Related

XML to SQL table (epcis:EPCISDocument)

Could you help me here, please? I need to query an XML document, but it is unusual in that the root element carries the epcis: prefix.
If I run the query with the epcis: and dts: prefixes, the result is:
Msg 2229, Level 16, State 1, Line 38.
XQuery [nodes()]: The name "epcis" does not denote a namespace.
And if I run the query without epcis: and dts:, the result is blank.
BEGIN
DECLARE @archivo XML
SET @archivo = (
'<?xml version="1.0" encoding="utf-8"?>
<epcis:EPCISDocument xmlns:dts="urn:dts:extension:xsd" schemaVersion="1.2" creationDate="2021-06-30T07:29:32.6511940Z"
xmlns:epcis="urn:epcglobal:epcis:xsd:1">
<EPCISBody>
<EventList>
<ObjectEvent>
<eventTime>2021-06-30T07:29:32</eventTime>
<eventTimeZoneOffset>+02:00</eventTimeZoneOffset>
<action>OBSERVE</action>
<bizStep>code</bizStep>
<dts:epcItemList>
<item>
<epc>123456789</epc>
<code>123456789zz</code>
</item>
<item>
<epc>9687654321</epc>
<code>9687654321zz</code>
</item>
<item>
<epc>147258369</epc>
<code>147258369zz</code>
</item>
</dts:epcItemList>
</ObjectEvent>
</EventList>
</EPCISBody>
</epcis:EPCISDocument>'
)
SELECT
    patch.r.value('(epc)[1]', 'varchar(100)') as [epc],
    patch.r.value('(code)[1]', 'varchar(100)') as [code]
FROM
    @archivo.nodes('EPCISDocument/EPCISBody/EventList/ObjectEvent/epcItemList/item') AS patch(r)
END
GO
I need to export the information as a SQL table per item, like this:
| epc | code |
| -------- | ------------ |
| 123456789 | 123456789zz |
| 9687654321 | 9687654321zz |
Thank you so much
Juan
You need to declare your namespaces. The namespace aliases present in the XML are not relevant; you can use any alias you like (that's the bit after AS in the declaration).
Also, text() is more performant when used in .value().
WITH XMLNAMESPACES (
    'urn:epcglobal:epcis:xsd:1' AS epcis,
    'urn:dts:extension:xsd' AS dts
)
SELECT
    patch.r.value('(epc/text())[1]', 'varchar(100)') as [epc],
    patch.r.value('(code/text())[1]', 'varchar(100)') as [code]
FROM @archivo.nodes('epcis:EPCISDocument/EPCISBody/EventList/ObjectEvent/dts:epcItemList/item') as patch(r);
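To illustrate that the aliases are arbitrary, here is a self-contained sketch that binds the same two namespace URIs to the made-up aliases a and b. The XML is a trimmed-down version of the document in the question:

```sql
-- Trimmed-down copy of the question's document (two items only).
DECLARE @archivo XML = '
<epcis:EPCISDocument xmlns:dts="urn:dts:extension:xsd"
                     xmlns:epcis="urn:epcglobal:epcis:xsd:1">
  <EPCISBody><EventList><ObjectEvent>
    <dts:epcItemList>
      <item><epc>123456789</epc><code>123456789zz</code></item>
      <item><epc>9687654321</epc><code>9687654321zz</code></item>
    </dts:epcItemList>
  </ObjectEvent></EventList></EPCISBody>
</epcis:EPCISDocument>';

-- a and b are arbitrary aliases; only the namespace URIs must match the document.
WITH XMLNAMESPACES (
    'urn:epcglobal:epcis:xsd:1' AS a,
    'urn:dts:extension:xsd'     AS b
)
SELECT
    patch.r.value('(epc/text())[1]',  'varchar(100)') AS [epc],
    patch.r.value('(code/text())[1]', 'varchar(100)') AS [code]
FROM @archivo.nodes('a:EPCISDocument/EPCISBody/EventList/ObjectEvent/b:epcItemList/item') AS patch(r);
```

Only the two prefixed elements need an alias in the path; the other elements are in no namespace and are addressed by their plain names.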

Delete Empty tag from xmltype oracle

I want to delete the empty tags from an XMLType. I generated the XML below from an Oracle object type. A few elements in the collection have no values, so they were generated as empty tags.
Can anyone please help me out?
Actual output:
<MESSAGE>
<LOCATIONS>
<LOCATION_ID>9999</LOCATION_ID>
<LOC_TYPE>S</LOC_TYPE>
<NAME>Test Location</NAME>
<PHONE_NUM>08 </PHONE_NUM>
<LAST_MODIFIED_BY/>
<LAST_MODIFIED_DATE/>
<POS_CODE/>
</LOCATIONS>
</MESSAGE>
Expected output:
<MESSAGE>
<LOCATIONS>
<LOCATION_ID>9999</LOCATION_ID>
<LOC_TYPE>S</LOC_TYPE>
<NAME>Test Location</NAME>
<PHONE_NUM>08 </PHONE_NUM>
</LOCATIONS>
</MESSAGE>
Use DELETEXML and look for the XPath //*[not(text())][not(*)] to find elements that contain no text and no children:
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( xml ) AS
SELECT XMLTYPE( '<MESSAGE>
<LOCATIONS>
<LOCATION_ID>9999</LOCATION_ID>
<LOC_TYPE>S</LOC_TYPE>
<NAME>Test Location</NAME>
<PHONE_NUM>08 </PHONE_NUM>
<LAST_MODIFIED_BY/>
<LAST_MODIFIED_DATE/>
<POS_CODE/>
</LOCATIONS>
</MESSAGE>' ) FROM DUAL;
Query 1:
SELECT DELETEXML(
xml,
'//*[not(text())][not(*)]'
).getStringVal()
FROM table_name
Results:
| DELETEXML(XML,'//*[NOT(TEXT())][NOT(*)]').GETSTRINGVAL() |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| <MESSAGE><LOCATIONS><LOCATION_ID>9999</LOCATION_ID><LOC_TYPE>S</LOC_TYPE><NAME>Test Location</NAME><PHONE_NUM>08 </PHONE_NUM></LOCATIONS></MESSAGE> |
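DELETEXML is deprecated from Oracle Database 12c onward. On such a version the same XPath can be used with XQuery Update inside XMLQUERY. A minimal sketch, assuming 12c or later (it is not expected to run on the 11g R2 instance used above) and the same table_name table; the alias cleaned_xml is made up for illustration:

```sql
-- Assumes Oracle 12c+ (XQuery Update support) and the table_name table created above.
SELECT XMLSERIALIZE(
         CONTENT XMLQUERY(
           'copy $d := . modify delete nodes $d//*[not(text())][not(*)] return $d'
           PASSING xml RETURNING CONTENT
         )
         AS VARCHAR2(4000)
       ) AS cleaned_xml
FROM   table_name;
```

The copy/modify/return expression works on a copy of the document, so the stored value is untouched unless the result is written back with an UPDATE.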
SELECT
deletexml(xml_data, '//*[not(text())][not(*)]').getstringval()
FROM
(
SELECT
xmltype('<MESSAGE>
<LOCATIONS>
<LOCATION_ID>9999</LOCATION_ID>
<LOC_TYPE>S</LOC_TYPE>
<NAME>Test Location</NAME>
<PHONE_NUM>08 </PHONE_NUM>
<LAST_MODIFIED_BY/>
<LAST_MODIFIED_DATE/>
<POS_CODE/>
</LOCATIONS>
</MESSAGE>'
) xml_data
FROM
dual
)
this is working fine thanks

How to pull XML key "value" from SQL CLOB

I am attempting to extract information from XML stored in a CLOB column. I've searched the forums and thus far have been unable to get the data to pull as needed. I have a basic understanding of SQL but this is beyond me.
The XML is similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Header>
<OrderNum value="12354321"/>
<ExtractDate value="11-30-2012"/>
<RType value="Status"/>
<Company value="Company"/>
</Header>
<Body>
<Status>
<Order>
<ActivityType value="ValidateRequest"/>
<EndUser>
<Name value="Schmo, Joe"/>
<Address>
<SANO value="12345"/>
<SASN value="Mickey Mouse"/>
<SATH value="Lane"/>
<SASS value="N"/>
<City value="Orlando"/>
<State value="FL"/>
<Zip value="34786"/>
<Number value="5550000"/>
</Address>
</EndUser>
<COS value="1"/>
<TOS value="3"/>
<MainNumber value="5550000"/>
</Order>
<ErrorCode value="400"/>
<ErrorMessage value="RECEIVED"/>
</Status>
</Body>
</Response>
I want to get the values under "Address".
I've tried the following but it returns "NULL".
SELECT EXTRACTVALUE(XMLTYPE(RESPONSE_CLOB),'/Response/Body/Status/Order/EndUser/Address/SANO') AS SANO
FROM RESPONSE_TABLE
WHERE ROWNUM < 2
I am trying to get it so I can pull the "12345" assigned as "value" in "SANO" (ultimately getting the value for other fields, but want to at least get the one working first).
You're currently retrieving the text value of the node, but 12345 is the value attribute of the element rather than its text content. So you would need to use the @attribute syntax, i.e.:
SELECT EXTRACTVALUE(XMLTYPE(RESPONSE_CLOB),'/Response/Body/Status/Order/EndUser/Address/SANO/@value') AS SANO
FROM RESPONSE_TABLE
WHERE ROWNUM < 2;
SANO
--------------------
12345
But extractvalue is deprecated; assuming you're on a recent version of Oracle it would be better to use an XMLQuery:
SELECT XMLQUERY(
         '/Response/Body/Status/Order/EndUser/Address/SANO/@value'
         PASSING XMLTYPE(RESPONSE_CLOB)
         RETURNING CONTENT
       ) AS SANO
FROM RESPONSE_TABLE
WHERE ROWNUM < 2;
You may find it even easier to use an XMLTable - necessary if an XML document has multiple Address nodes, but even with just one pulling the values out as columns is less repetitive, and it makes it easier to retrieve suitable data types:
select x.*
from response_table rt
cross join xmltable(
  '/Response/Body/Status/Order/EndUser/Address'
  passing xmltype(rt.response_clob)
  columns sano number path 'SANO/@value',
          sasn varchar2(30) path 'SASN/@value',
          sath varchar2(10) path 'SATH/@value'
          -- etc.
) x
where rownum < 2;
SANO SASN SATH
-------------------- ------------------------------ ----------
12345 Mickey Mouse Lane
Read more about using these functions to query XML data.

How to insert NULL into SQL Server DATE field *from XML*

I've got some XML that I'm trying to insert into a Microsoft SQL Server database using their XML datatype functions.
One of the table fields is a nullable DATE column. If the node is missing, then it's inserted as NULL which is great. However, if the node is present but empty <LastDay/> when running the XPath query, it interprets the value from the empty node as an empty string '' instead of NULL. So when looking at the table results, it casts the date to 1900-01-01 by default.
I would like for empty nodes to also be inserted as NULL instead of the default empty string '' or 1900-01-01. How can I get it to insert NULL instead?
CREATE TABLE myxml
(
"id" INT,
"name" NVARCHAR(100),
"company" NVARCHAR(100),
"lastday" DATE
);
DECLARE @xml XML =
'<?xml version="1.0" encoding="UTF-8"?>
<Data xmlns="http://example.com" xmlns:dmd="http://example.com/data-metadata">
<Company dmd:name="Adventure Works Ltd.">
<Employee id="1">
<Name>John Doe</Name>
<LastDay>2016-08-01</LastDay>
</Employee>
<Employee id="2">
<Name>Jane Doe</Name>
</Employee>
</Company>
<Company dmd:name="StackUnderflow">
<Employee id="3">
<Name>Jeff Puckett</Name>
<LastDay/>
</Employee>
<Employee id="4">
<Name>Ill Gates</Name>
</Employee>
</Company>
</Data>';
WITH XMLNAMESPACES (DEFAULT 'http://example.com', 'http://example.com/data-metadata' as dmd)
INSERT INTO myxml (id,name,company,lastday)
SELECT
    t.c.value('@id', 'INT' ),
    t.c.value('Name[1]', 'VARCHAR(100)' ),
    t.c.value('../@dmd:name','VARCHAR(100)' ),
    t.c.value('LastDay[1]', 'DATE' )
FROM @xml.nodes('/Data/Company/Employee') t(c)
This produces:
id name company lastday
------------------------------------------------
1 John Doe Adventure Works Ltd. 2016-08-01
2 Jane Doe Adventure Works Ltd. NULL
3 Jeff Puckett StackUnderflow 1900-01-01
4 Ill Gates StackUnderflow NULL
I am trying to achieve:
id name company lastday
------------------------------------------------
1 John Doe Adventure Works Ltd. 2016-08-01
2 Jane Doe Adventure Works Ltd. NULL
3 Jeff Puckett StackUnderflow NULL
4 Ill Gates StackUnderflow NULL
You can use the NULLIF function to stop the default value from coming out of the XML selection. From its documentation:
Returns a null value if the two specified expressions are equal.
Your query would change as below:
SELECT
    t.c.value('@id', 'INT' ),
    t.c.value('Name[1]','VARCHAR(100)' ),
    t.c.value('../@dmd:name', 'VARCHAR(100)' ),
    NULLIF(t.c.value('LastDay[1]', 'DATE' ),'')
FROM @xml.nodes('/Data/Company/Employee') t(c)
For more information on NULLIF, please check this MSDN page.
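One caveat with NULLIF here: the empty string is converted to DATE for the comparison, which gives 1900-01-01, so a genuine 1900-01-01 in the XML would also come back as NULL. An alternative worth considering is to select the text() node, which does not exist for an empty element, so .value() returns NULL without any post-processing. A minimal sketch on a cut-down version of the question's XML (the Company level is omitted for brevity):

```sql
-- Cut-down sample: id 1 has a date, id 2 has no LastDay node, id 3 has an empty one.
DECLARE @xml XML = '
<Data xmlns="http://example.com">
  <Employee id="1"><LastDay>2016-08-01</LastDay></Employee>
  <Employee id="2"/>
  <Employee id="3"><LastDay/></Employee>
</Data>';

WITH XMLNAMESPACES (DEFAULT 'http://example.com')
SELECT
    t.c.value('@id', 'INT')                  AS id,
    -- text() selects the text node only; an empty element has none, so the result is NULL
    t.c.value('(LastDay/text())[1]', 'DATE') AS lastday
FROM @xml.nodes('/Data/Employee') AS t(c);
```

Here ids 2 (missing node) and 3 (empty node) should both produce NULL for lastday, while id 1 keeps its date.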
Besides techspider's very good answer I'd like to show another approach:
Doing .nodes() on Company and then CROSS APPLY .nodes() on Employee allows cleaner XPath navigation and avoids the backward navigation you are using with ../@dmd:name. In your case this is probably just for information, but it is worth considering: a company without any Employee is skipped entirely by your query. (My code would skip it as well because of the CROSS APPLY, but you could use OUTER APPLY instead.)
And to your actual question: using the internal cast as xs:date does the logic within the XQuery and should be faster...
WITH XMLNAMESPACES (DEFAULT 'http://example.com', 'http://example.com/data-metadata' as dmd)
INSERT INTO myxml (id,name,company,lastday)
SELECT
    e.value('@id', 'INT' ),
    e.value('Name[1]', 'VARCHAR(100)' ),
    c.value('@dmd:name', 'VARCHAR(100)' ),
    e.value('let $x:=LastDay[1] return $x cast as xs:date?','DATE' )
FROM @xml.nodes('/Data/Company') AS A(c)
CROSS APPLY c.nodes('Employee') AS B(e)

Extracting a node where xmlns is set to blank

I'm having difficulty extracting the value from certain nodes in an XML structure using XMLTABLE. The query below works perfectly when you remove the xmlns="" attribute from the SubListItem nodes. As you can see, the XML already has a default namespace. I honestly have no clue how to deal with this "blanking out" of the namespace on certain nodes.
For further clarification, the creation of this XML is not within my control and is provided by a third-party. I've also changed the names of the nodes and the content from the delivered files while preserving the structure of the XML.
SELECT f.airline, f.flightnumber, fl.gate
FROM xmltable(
xmlnamespaces(
default 'http://some/name.space',
'http://www.w3.org/2001/XMLSchema' as "xsd",
'http://www.w3.org/2001/XMLSchema-instance' as "xsi"
),
'Body/Flight'
passing xmltype(
'<?xml version="1.0" encoding="utf-16"?>
<Body xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://some/name.space">
<Sender>
<System>ConnectionManagement</System>
</Sender>
<Flight>
<Airline>ABC</Airline>
<Number>1234</Number>
<SubList>
<SubListItem xmlns="">
<Gate>X</Gate>
</SubListItem>
<SubListItem xmlns="">
<Gate>Y</Gate>
</SubListItem>
<SubListItem xmlns="">
<Gate>Z</Gate>
</SubListItem>
</SubList>
</Flight>
</Body>'
)
columns airline varchar2(100) path 'Airline'
, flightNumber VARCHAR2(5) path 'Number'
, subList XMLTYPE path 'SubList'
) f
, xmltable (
xmlnamespaces( default 'http://some/name.space'),
'/SubList/SubListItem'
passing f.subList
columns gate varchar2(5) path 'Gate'
) fl
;
How can I target the Gate node when the XML looks like this?
Leave the default namespace alone in the second XMLTable, and specify a named namespace for the path you do have:
...
, xmltable (
xmlnamespaces( 'http://some/name.space' as "ns"),
'/ns:SubList/SubListItem'
passing f.subList
columns gate varchar2(5) path 'Gate'
) fl
;
AIRLINE FLIGH GATE
---------- ----- -----
ABC 1234 X
ABC 1234 Y
ABC 1234 Z
SubList is still in the original namespace, so it has to be matched with the ns: prefix, but its child nodes are not, which is why declaring that namespace as the default (the way you had it) is incorrect for them. If you remove the xmlns="" as you mentioned in the question, SubListItem inherits the namespace from its parent, so your default works. With that override to no namespace you can't use a default.