Create Smaller XML based on value of element - lxml

On Python 3.7, I am looking to create a subset of an XML file. For example, the larger XML is:
and I am looking for a new XML like the one below. The condition for creating the smaller subset depends on a list of ids, in this case 101 and 102. All other student blocks will be deleted.
i.e. the output XML will depend on a list of ids, in this case ['101', '102'].
This is what I tried:
import lxml.etree
#Original Large XML
tree = etree.parse(open('students.xml'))
root = tree.getroot()
results = root.findall('student')
textnumbers = [r.find('details/id').text for r in results]
required_ids = ['101','102']
wanted = tree.xpath("//student/details/[not(@id in required_ids)]")
for node in unwanted:
#New Smaller XML
tree.write(open('student_output.xml', 'wb'))
But I am getting an "Invalid expression" error for
wanted = tree.xpath("//student/details/[not(@id in required_ids)]")
I know it's a long read, but I am fairly new to Python. Thanks in advance for your help.

I think you can do it like this:
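from lxml import etree as ET
required_ids = ['101', '102']
for event, element in ET.iterparse('students.xml'):
    # Remove every student whose id is not in the required list
    if element.tag == 'student' and element.xpath('.//id/text()')[0] not in required_ids:
        element.getparent().remove(element)
    # The end event of the 'data' element fires after all its students were processed
    if element.tag == 'data':
        ET.dump(element)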
Instead of the dump you would of course want to write to a file, that is, use
    if element.tag == 'data':
        tree = ET.ElementTree(element)
        tree.write('student_output.xml')
Your attempt fails because you can't simply use a Python list variable inside an XPath string, and "in" is not an XPath 1.0 operator.
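As a minimal sketch of the same idea without lxml, the filtering can also be done in plain Python instead of inside the XPath expression. This uses the standard library's xml.etree.ElementTree rather than the lxml calls above. The sample document, the name element and the helper keep_students are assumptions for illustration only, since the original students.xml is not shown. The data root and the student/details/id layout are taken from the code in the question and answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for students.xml (the real file is not shown in the question)
SAMPLE = """<data>
  <student><details><id>101</id><name>A</name></details></student>
  <student><details><id>102</id><name>B</name></details></student>
  <student><details><id>103</id><name>C</name></details></student>
</data>"""


def keep_students(xml_text, required_ids):
    """Return the root element with only the students whose id is in required_ids."""
    root = ET.fromstring(xml_text)
    wanted = set(required_ids)
    # findall() returns a list, so removing children while looping over it is safe
    for student in root.findall('student'):
        if student.findtext('details/id') not in wanted:
            root.remove(student)
    return root


subset = keep_students(SAMPLE, ['101', '102'])
kept_ids = [s.findtext('details/id') for s in subset.findall('student')]
print(kept_ids)
```

The membership test happens in Python, so the list never has to be spliced into an XPath string.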


Extracting XML data using SQL

I would like to extract specific data from an XML type using Oracle, in my example the data for the customer named "Arshad Ali".
This is the XML data that was inserted:
<Customer CustomerName="Arshad Ali" CustomerID="C001">
<Order OrderDate="2012-07-04T00:00:00" OrderID="10248">
<OrderDetail Quantity="5" ProductID="10" />
<OrderDetail Quantity="12" ProductID="11" />
<OrderDetail Quantity="10" ProductID="42" />
<Address> Address line 1, 2, 3</Address>
<Customer CustomerName="Paul Henriot" CustomerID="C002">
<Order OrderDate="2011-07-04T00:00:00" OrderID="10245">
<OrderDetail Quantity="12" ProductID="11" />
<OrderDetail Quantity="10" ProductID="42" />
<Address> Address line 5, 6, 7</Address>
<Customer CustomerName="Carlos Gonzlez" CustomerID="C003">
<Order OrderDate="2012-08-16T00:00:00" OrderID="10283">
<OrderDetail Quantity="3" ProductID="72" />
<Address> Address line 1, 4, 5</Address>
Using get clob I was able to extract all of the customers.
I was wondering if anyone could help me extract data for a specific customer. I tried the following but was unsuccessful:
SELECT extract(OBJECT_VALUE, '/root/Customers') "customer"
FROM mytable2
WHERE existsNode(OBJECT_VALUE, '/customers[CustomerName="Arshad Ali" CustomerID="C001"]')
= 1;
The case and exact names of the XML nodes matter:
SELECT extract(OBJECT_VALUE,
  '/ROOT/Customers/Customer[@CustomerName="Arshad Ali"][@CustomerID="C001"]') "customer"
FROM mytable2
WHERE existsNode(OBJECT_VALUE,
  '/ROOT/Customers/Customer[@CustomerName="Arshad Ali"][@CustomerID="C001"]') = 1
If you only want to search by name then only use that attribute:
SELECT extract(OBJECT_VALUE,
  '/ROOT/Customers/Customer[@CustomerName="Arshad Ali"]') "customer"
FROM mytable2
WHERE existsNode(OBJECT_VALUE,
  '/ROOT/Customers/Customer[@CustomerName="Arshad Ali"]') = 1
But extract() and existsnode() are deprecated; use xmlquery() and xmlexists() instead:
SELECT xmlquery('/ROOT/Customers/Customer[@CustomerName="Arshad Ali"][@CustomerID="C001"]'
         passing object_value
         returning content) "customer"
FROM mytable2
WHERE xmlexists('/ROOT/Customers/Customer[@CustomerName="Arshad Ali"][@CustomerID="C001"]'
         passing object_value)
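To see why the case and the exact node names matter, here is a small sketch outside of Oracle, using Python's standard xml.etree.ElementTree. The sample document is a reduced, assumed reconstruction of the customer data (the ROOT/Customers wrapper is inferred from the paths in this answer). The attribute predicates are the same idea as in the queries above.

```python
import xml.etree.ElementTree as ET

# Reduced, assumed version of the customer document
SAMPLE = """<ROOT><Customers>
  <Customer CustomerName="Arshad Ali" CustomerID="C001">
    <Address> Address line 1, 2, 3</Address>
  </Customer>
  <Customer CustomerName="Paul Henriot" CustomerID="C002">
    <Address> Address line 5, 6, 7</Address>
  </Customer>
</Customers></ROOT>"""

root = ET.fromstring(SAMPLE)  # root is the <ROOT> element

# Attributes are addressed with @, one predicate per attribute
matches = root.findall(
    "Customers/Customer[@CustomerName='Arshad Ali'][@CustomerID='C001']")
matched_ids = [c.get('CustomerID') for c in matches]

# Same query with lower-case element names: XML names are case-sensitive
wrong_case = root.findall("customers/customer[@CustomerName='Arshad Ali']")

print(matched_ids, wrong_case)
```

The lower-case path finds nothing because element names are compared exactly.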

Karate: Match repeating element in xml

I'm trying to match a repeating element in an XML message against a Karate schema.
XML message
* def xmlResponse =
I want to match each element against the following Karate schema:
Given def serviceGroupItem =
This is how I tried
* xml serviceGroupListItems = get xmlResponse //serviceGroupList
* match each serviceGroupListItems == serviceGroupItem
But it doesn't work. Any idea how I can make it work?
You have to match each serviceGroup.
* xml serviceGroupListItems = get xmlResponse //serviceGroupList
* match each serviceGroupListItems.serviceGroupList.serviceGroup == serviceGroupItem.serviceGroup

XPath doesn't provide proper tag

I'm trying to get tag "" from xml below.
If I execute a query like this:
WITH x(col) AS (select'<document xmlns="" xmlns:ns2="" xmlns:xsi="" xsi:schemaLocation="">
<reqTransfer id="154638">
SELECT xpath('/document/pay/reqTransfer/source/card/bsc/text()', col) AS bsc
I get {}, but if I replace the document start tag
<document xmlns="" xmlns:ns2="" xmlns:xsi="" xsi:schemaLocation="">
with <document> or even <document xmlns="">, I get { VISA } - that is right.
What should I do to replace <document xmlns="..."> with <document> or get { VISA } without replacement?
If you are working with XML namespaces, you have to reference them in your XPath queries too, i.e. use
SELECT xpath('/d:document/d:pay/d:reqTransfer/d:source/d:card/d:bsc/text()', col,
  ARRAY[ARRAY['d', '']]) AS bsc
See also:
how to ignore namespaces with XPath
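The effect of a default namespace can be reproduced outside PostgreSQL. The following is only a sketch with Python's standard xml.etree.ElementTree. The namespace URI urn:example:payments is a made-up placeholder, because the real xmlns value is not shown in the question. The element nesting follows the XPath used above.

```python
import xml.etree.ElementTree as ET

NS_URI = "urn:example:payments"  # hypothetical URI, not the one from the question

SAMPLE = f"""<document xmlns="{NS_URI}">
  <pay>
    <reqTransfer id="154638">
      <source><card><bsc>VISA</bsc></card></source>
    </reqTransfer>
  </pay>
</document>"""

root = ET.fromstring(SAMPLE)

# Unprefixed names look for elements in no namespace, so nothing matches
without_ns = root.findall("pay/reqTransfer/source/card/bsc")

# Map a prefix of your choice to the document's default namespace URI
ns = {"d": NS_URI}
with_ns = root.findall("d:pay/d:reqTransfer/d:source/d:card/d:bsc", ns)
values = [e.text for e in with_ns]

print(without_ns, values)
```

The prefix d is arbitrary. Only the URI it is mapped to has to match the xmlns declaration of the document.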

Find element or attribute value anywhere in XML

I am trying to find the value of an element / attribute regardless of where it exists in the XML.
<?xml version="1.0" encoding="UTF-8"?>
<cXML payloadID="12345677-12345567" timestamp="2017-07-26T09:11:05">
<Credential domain="1212">
<Identity>01235 </Identity>
<Credential domain="1212">
<Credential domain="8989">
<Request deploymentMode="Prod">
<ConfirmationHeader noticeDate="2017-07-26T09:11:05" operation="update" type="detail">
<Comments>WO# generated</Comments>
<OrderReference orderDate="2017-07-25T15:22:11" orderID="123456780000">
<DocumentReference payloadID="5678-4567"/>
<ConfirmationItem quantity="1" lineNumber="1">
<ConfirmationStatus quantity="1" type="detail">
<ItemIn quantity="1">
<Money currency="USD">0.00</Money>
<Description>Test Descritpion 1</Description>
<ConfirmationItem quantity="1" lineNumber="2">
<ConfirmationStatus quantity="1" type="detail">
<ItemIn quantity="1">
<Money currency="USD">0.00</Money>
<Description>Test Descritpion 2</Description>
I want to get the value of the payloadID on the DocumentReference element. This is what I have tried so far:
Declare @Xml xml
Set @Xml = cast('..The XML From Above..' as xml)
--no value comes back
Select c.value('(/*/DocumentReference/@payloadID)[0]','nvarchar(max)') from @Xml.nodes('//cXML') x(c)
--no value comes back
Select c.value('@payloadID','nvarchar(max)') from @Xml.nodes('/cXML/*/DocumentReference') x(c)
--check if element exists and it does
Select @Xml.exist('//DocumentReference');
I tried this in an XPath editor: //DocumentReference/@payloadID
This does work, but I am not sure what the equivalent syntax is in SQL
Calling .nodes() (as suggested in a comment) is unnecessary overhead...
Better try it like this:
SELECT @XML.value('(//DocumentReference/@payloadID)[1]','nvarchar(max)')
And be aware that XPath starts counting at 1. Your example with [0] cannot work...
--no value comes back
Select c.value('(/*/DocumentReference/@payloadID)[0]','nvarchar(max)') from...
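For comparison, the same "search anywhere" lookup can be sketched in Python with the standard xml.etree.ElementTree. The sample is a reduced version of the cXML document, and the nesting of the elements is an assumption because the closing tags are not visible in the question. It also shows the difference between 1-based XPath positions and 0-based Python lists.

```python
import xml.etree.ElementTree as ET

# Reduced cXML sample; the exact nesting is assumed
SAMPLE = """<cXML payloadID="12345677-12345567" timestamp="2017-07-26T09:11:05">
  <Request deploymentMode="Prod">
    <OrderReference orderDate="2017-07-25T15:22:11" orderID="123456780000">
      <DocumentReference payloadID="5678-4567"/>
    </OrderReference>
  </Request>
</cXML>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants, regardless of depth
refs = root.findall(".//DocumentReference")
payload_id = refs[0].get("payloadID")  # Python lists are 0-based

# Inside the path expression positions are 1-based:
# [1] selects the first DocumentReference child of its parent
first = root.find(".//DocumentReference[1]")
payload_id_xpath = first.get("payloadID")

print(payload_id, payload_id_xpath)
```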

Read XML data in SQL

I want to query data from XML. I have managed to retrieve data from another set of XML data, but this one is a bit problematic.
Below you see the data and the query that does not retrieve any data.
SET @xml=N'<DocumentXML>
<LoadApplicationResult xmlns:i="" xmlns="">
<root xmlns="">
<Guaranteer ChangeTime="2012-04-28T08:50:07.5706054+02:00" ChangedBy="sven" OldValue="">
<PercentGuarantee ChangeTime="2012-04-28T08:50:07.5706054+02:00" ChangedBy="sven" OldValue="">
<PluginData i:nil="true" />
<root xmlns="">
<root TableId="192">
<CustomData i:nil="true" />
<PluginData i:nil="true" />
<root xmlns="">
<root TableId="013">
<EmbProd.MonthFee Operator="DBLMUL" Target="CUSTOM.EPTermFee.ADD" Source="XPATH://PaySeries[1]/TermLength" DFValue="200">200</EmbProd.MonthFee>
<root TableId="759" GroupText="210" GroupText0="210">
<root xmlns="" />
<PluginData i:nil="true" />
<root xmlns="">
<root TableId="102">
<EmbProd.MonthFee Operator="DBLMUL" Target="CUSTOM.EPTermFee.ADD" Source="XPATH://PaySeries[1]/TermLength" DFValue="300">300</EmbProd.MonthFee>
<EP.GenericCost Target="COST">114</EP.GenericCost>
<root TableId="102" GroupText="11" GroupText0="7">
<EP.TermCount Target="DBLMUL">13</EP.TermCount>
<root TableId="102" GroupText="210" GroupText0="210">
SELECT tab.col.value('(Flag)[1]', 'nvarchar(max)') AS Flag
,tab.col.value('(Data/root/EmbProd.MonthFee)[1]', 'nvarchar(max)') AS Value
,tab.col.value('(ID)[1]', 'nvarchar(max)') AS Product
FROM @xml.nodes('/DocumentXML//LoadApplicationResult/Application/EmbeddedProductList/EmbeddedProduct') AS Tab(col)
The expected output should look like this:
| Flag | Value | Product |
| false | | 12 |
| false | 200 | 30 |
| true | 300 | 16 |
You need to specify the namespace:
;WITH XMLNAMESPACES ('' AS x) -- put the xmlns URI declared on LoadApplicationResult between the quotes
SELECT tab.col.value('(x:Flag)[1]', 'nvarchar(max)') AS Flag
,tab.col.value('(x:Data/root/root/EmbProd.MonthFee)[1]', 'nvarchar(max)') AS Value
,tab.col.value('(x:ID)[1]', 'nvarchar(max)') AS Product
FROM @xml.nodes('DocumentXML/x:LoadApplicationResult/x:Application/x:EmbeddedProductList/x:EmbeddedProduct') AS Tab(col);
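Note that the inner root elements carry xmlns="", which switches the default namespace off again, so they are addressed without the x: prefix. Here is a small sketch of that behaviour in Python with the standard xml.etree.ElementTree. The URI urn:example:application is a placeholder, since the real one is not shown. The document is cut down to a single EmbeddedProduct, with an assumed layout based on the paths in the query.

```python
import xml.etree.ElementTree as ET

APP_NS = "urn:example:application"  # hypothetical URI, the real one is not shown

SAMPLE = f"""<DocumentXML>
  <LoadApplicationResult xmlns="{APP_NS}">
    <Application><EmbeddedProductList>
      <EmbeddedProduct>
        <Flag>false</Flag>
        <ID>30</ID>
        <Data>
          <root xmlns="">
            <root TableId="013">
              <EmbProd.MonthFee DFValue="200">200</EmbProd.MonthFee>
            </root>
          </root>
        </Data>
      </EmbeddedProduct>
    </EmbeddedProductList></Application>
  </LoadApplicationResult>
</DocumentXML>"""

ns = {"x": APP_NS}
doc = ET.fromstring(SAMPLE)  # <DocumentXML> itself is in no namespace

products = doc.findall(
    "x:LoadApplicationResult/x:Application/x:EmbeddedProductList/x:EmbeddedProduct", ns)

rows = [
    (
        p.findtext("x:Flag", namespaces=ns),
        # <root xmlns=""> and its children are in no namespace: no prefix here
        p.findtext("x:Data/root/root/EmbProd.MonthFee", namespaces=ns),
        p.findtext("x:ID", namespaces=ns),
    )
    for p in products
]

# Prefixing the inner elements as well does not match anything
prefixed_inner = products[0].findtext(
    "x:Data/x:root/x:root/x:EmbProd.MonthFee", namespaces=ns)

print(rows, prefixed_inner)
```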