Querying an XML feed with LINQ to XML in VB.NET

I have the following XML which I load via XDocument.Load(uri) or XElement.Load(uri). I am having trouble getting a collection of <asset> elements via LINQ.
Here is a snippet of the XML I'm trying to query:
<assetCollection xmlns="tag:aisle7.net,2009:/api/1.0">
<title>All Assets</title>
<description>Collection containing all assets in the system</description>
<resourcePath>/us/assets/~all</resourcePath>
<link rel="self" href="http://web.aisle7.net/api/1.0/us/assets/~all?apikey=1234567890&amp;Format=XML" />
<link rel="first" href="http://web.aisle7.net/api/1.0/us/assets/~all?apikey=1234567890&amp;Format=XML" />
<link rel="next" href="http://web.aisle7.net/api/1.0/us/assets/~all?apikey=1234567890&amp;Format=XML&amp;page=2" />
<link rel="last" href="http://web.aisle7.net/api/1.0/us/assets/~all?apikey=1234567890&amp;Format=XML&amp;page=66" />
<updated>2011-03-01T19:01:49.667Z</updated>
<assets>
<asset>
<title>Homeopathy</title>
<resourcePath>/us/assets/toc/homeopathy</resourcePath>
<link rel="alternate" href="http://web.aisle7.net/api/1.0/us/assets/toc/homeopathy?apikey=1234567890&amp;Format=XML" />
<updated>2011-03-01T19:01:49.667Z</updated>
</asset>
<asset>
<title>What Is Homeopathy?</title>
<resourcePath>/us/assets/generic/what-is-homeopathy_13615_1</resourcePath>
<link rel="alternate" href="http://web.aisle7.net/api/1.0/us/assets/generic/what-is-homeopathy_13615_1?apikey=1234567890&amp;Format=XML" />
<updated>2011-03-01T19:00:17.680Z</updated>
</asset>
...
And here is the code I'm trying to use:
Dim uri As String = HttpUtility.UrlDecode(ConfigurationManager.AppSettings("Aisle7_Index_Url"))
' VB cannot continue a line on a leading ".", so the member calls stay on one line
Dim assets = From a In XElement.Load(uri).Element("assets").Elements("asset")
             Select a
For Each asset In assets
    Console.WriteLine(asset)
Next

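Try this (the feed declares a default namespace, so the element names have to include it):
Dim ns As XNamespace = "tag:aisle7.net,2009:/api/1.0"
Dim assets = From a In XElement.Load(uri).Descendants(ns + "asset") Select a
or
Dim assets = From a In XDocument.Load(uri).Root.Element(ns + "assets").Elements(ns + "asset") Select a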

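Here's the version using XML literal (axis property) syntax. Because of the default namespace, the code file also needs a matching Imports statement:
' At the top of the code file:
Imports <xmlns="tag:aisle7.net,2009:/api/1.0">

' In the method:
Dim xml = XElement.Load(uri)
Dim q = From a In xml.<assets>...<asset>
        Select a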
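A likely reason the original lookups fail: the root element declares a default namespace (xmlns="tag:aisle7.net,2009:/api/1.0"), so every unprefixed element, including <assets> and <asset>, lives in that namespace, and a lookup by bare local name matches nothing. In LINQ to XML that means combining an XNamespace with the local name (ns + "asset"). The minimal sketch below shows the same effect with Python's standard-library xml.etree.ElementTree on a trimmed copy of the sample feed (links and dates omitted for brevity); the prefix "a" is an arbitrary alias.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed; the default namespace is the one from the question.
FEED = """<assetCollection xmlns="tag:aisle7.net,2009:/api/1.0">
  <title>All Assets</title>
  <assets>
    <asset><title>Homeopathy</title></asset>
    <asset><title>What Is Homeopathy?</title></asset>
  </assets>
</assetCollection>"""

NS = "tag:aisle7.net,2009:/api/1.0"

root = ET.fromstring(FEED)

# Bare local names do not match elements that live in the default namespace.
unqualified = root.findall("assets/asset")

# Qualifying each step as {namespace-uri}localname finds the elements.
qualified = root.findall(f"{{{NS}}}assets/{{{NS}}}asset")

# Equivalent form using a prefix map; "a" is just a local alias for NS.
prefixes = {"a": NS}
titles = [asset.findtext("a:title", namespaces=prefixes)
          for asset in root.findall("a:assets/a:asset", namespaces=prefixes)]

print(len(unqualified))  # 0
print(len(qualified))    # 2
print(titles)
```

The alias does not have to match anything in the document; only the URI it is bound to matters, which is also why the VB code can use any XNamespace variable name.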

Related

Extracting a JSON tag value from within XML

From the XML log below, I need to extract only the phoneNumber value from <json:string name="phoneNumber">9480562628</json:string>. Can someone help with this?
<Input>
<Header>
<User-Agent>android;11;4.27.0;samsung_SM-A205F</User-Agent>
<Date>Sun, 09 Oct 2022 21:59:08 GMT</Date>
<Username />
<UserInfo />
<Location>1wIRSNscfkI0qragmfshMiG189qgAf/PumlP3DTbgN4=</Location>
<AAAUN>SFASJNF3U6375H7D1Y4XWJDZ</AAAUN>
<Authorization>WS androidDove:ri6/G20ZNX+bsNyX8GUEB4vSMS4=</Authorization>
<h>test</h>
<Accept-Language>en</Accept-Language>
<Test>false</Test>
<Content-Type>application/json; charset=UTF-8</Content-Type>
<Content-Length>75</Content-Length>
<Host>com.in</Host>
<Accept-Encoding>gzip</Accept-Encoding>
<X-Forwarded-For>0.0.0.0, 0.0.0.0</X-Forwarded-For>
<X-APIRP-ID>0.0.0.0</X-APIRP-ID>
<Via>1.1 69WC0-</Via>
<X-Client-IP>0.0.0.0.</X-Client-IP>
<X-Global-Transaction-ID>RU44D1I3S40ZBVLMQHHUZSB1QOH1HEWO700LZQLB5WR8IGZYU4</X-Global-Transaction-ID>
</Header>
<X />
<URI>/esb/crs2/public/Login</URI>
<ServiceName>PUBLICPOSTOTPLOGIN</ServiceName>
<PrimaryKey />
<Parameters>
<Parameter1 />
<Parameter2 />
<Parameter3 />
<Parameter4 />
</Parameters>
<Body>
<json:object xmlns:json="http://www.ibm.com/" xmlns:xsi="http://www.w3.org/" xsi:schemaLocation="http://www.datapower.com">
<json:string name="phoneNumber">9480562628</json:string>
</json:object>
</Body>
<standardRule>Y</standardRule>
<TRANSACTION_ID>SFASJNF3U6375H7D1Y4XWJDZ</TRANSACTION_ID>
<TRANSACTION_NAME>Login</TRANSACTION_NAME>
<JSON_Body>{ "phoneNumber":"9480562628" }</JSON_Body>
</Input>
Expected Result:
9480562628
This statement returned the expected result. :)
extractvalue(xmltype(x.xml_request), '(/Input//phoneNumber)[1]/text()')
|| ' '|| extractvalue(xmltype(x.xml_request), '(/Input/Body//*[@name="phoneNumber"])[1]/text()') AS "phoneNumber_"
Use XMLTABLE and specify the XMLNAMESPACES:
SELECT x.*
FROM table_name t
CROSS APPLY XMLTABLE(
XMLNAMESPACES('http://www.ibm.com/' AS "json", 'http://www.w3.org/' AS "xsi"),
'/Input'
PASSING XMLTYPE(t.xml)
COLUMNS
phonenumber VARCHAR2(20) PATH './Body/json:object/json:string[@name="phoneNumber"]'
) x
Which, for your sample data:
CREATE TABLE table_name (xml CLOB);
INSERT INTO table_name (xml) VALUES ('<Input>
<Header>
<User-Agent>android;11;4.27.0;samsung_SM-A205F</User-Agent>
<Date>Sun, 09 Oct 2022 21:59:08 GMT</Date>
<Username />
<UserInfo />
<Location>1wIRSNscfkI0qragmfshMiG189qgAf/PumlP3DTbgN4=</Location>
<AAAUN>SFASJNF3U6375H7D1Y4XWJDZ</AAAUN>
<Authorization>WS androidDove:ri6/G20ZNX+bsNyX8GUEB4vSMS4=</Authorization>
<h>test</h>
<Accept-Language>en</Accept-Language>
<Test>false</Test>
<Content-Type>application/json; charset=UTF-8</Content-Type>
<Content-Length>75</Content-Length>
<Host>com.in</Host>
<Accept-Encoding>gzip</Accept-Encoding>
<X-Forwarded-For>0.0.0.0, 0.0.0.0</X-Forwarded-For>
<X-APIRP-ID>0.0.0.0</X-APIRP-ID>
<Via>1.1 69WC0-</Via>
<X-Client-IP>0.0.0.0.</X-Client-IP>
<X-Global-Transaction-ID>RU44D1I3S40ZBVLMQHHUZSB1QOH1HEWO700LZQLB5WR8IGZYU4</X-Global-Transaction-ID>
</Header>
<X />
<URI>/esb/crs2/public/Login</URI>
<ServiceName>PUBLICPOSTOTPLOGIN</ServiceName>
<PrimaryKey />
<Parameters>
<Parameter1 />
<Parameter2 />
<Parameter3 />
<Parameter4 />
</Parameters>
<Body>
<json:object xmlns:json="http://www.ibm.com/" xmlns:xsi="http://www.w3.org/" xsi:schemaLocation="http://www.datapower.com">
<json:string name="phoneNumber">9480562628</json:string>
</json:object>
</Body>
<standardRule>Y</standardRule>
<TRANSACTION_ID>SFASJNF3U6375H7D1Y4XWJDZ</TRANSACTION_ID>
<TRANSACTION_NAME>Login</TRANSACTION_NAME>
<JSON_Body>{ "phoneNumber":"9480562628" }</JSON_Body>
</Input>'
);
Outputs:
PHONENUMBER
9480562628
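As a quick cross-check outside Oracle, the same namespace-aware lookup can be sketched with Python's standard-library xml.etree.ElementTree. This is only an illustration, assuming the log is available as a string (trimmed here to the <Body> part). The first query binds the json prefix to http://www.ibm.com/, which is what XMLNAMESPACES does in the XMLTABLE answer; the second uses the {*} wildcard (Python 3.8+) to ignore the namespace, similar in spirit to the //*[...] form in the question's own expression.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample log: only the <Body> part is kept.
LOG = """<Input>
  <Body>
    <json:object xmlns:json="http://www.ibm.com/" xmlns:xsi="http://www.w3.org/"
                 xsi:schemaLocation="http://www.datapower.com">
      <json:string name="phoneNumber">9480562628</json:string>
    </json:object>
  </Body>
</Input>"""

root = ET.fromstring(LOG)

# Bind the "json" prefix to the URI the document declares, as XMLNAMESPACES does.
ns = {"json": "http://www.ibm.com/"}
phone = root.findtext("Body/json:object/json:string[@name='phoneNumber']",
                      namespaces=ns)

# "{*}" matches an element in any namespace, so no prefix map is needed.
phone_any = root.findtext(".//{*}string[@name='phoneNumber']")

print(phone)      # 9480562628
print(phone_any)  # 9480562628
```

The prefix-mapped form is stricter (it only matches the IBM JSON namespace); the wildcard form is shorter but would also match a string element from some other namespace.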

Get the body value of a text field formatted as XML

I have a table with a column stored in XML format, so that it can be displayed as formatted text in other projects.
But I need to convert it to a single line without tags.
I have tried the value() and nodes() methods, but didn't quite manage to make them work...
This is an example of the content of the column I want to format:
<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <style type="text/css">p {font-family: sans-serif;font-size: 8.25pt;margin: 0px;}</style> </head> <body> <p>VALUE I WANT TO GET </p> </body> </html>
SELECT Id, Description, Value FROM MyTable
where Value is the column with the stored XML.
Is there a way to get the body content without any tags in a single line?
Note: the column is not of XML type but of VARCHAR(MAX) type.
If your HTML value always has the same format, try running the following scripts together.
First, convert the HTML into a format that can be queried with value() and nodes():
Declare @x nvarchar (4000) = '<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <style type="text/css">p {font-family: sans-serif;font-size: 8.25pt;margin: 0px;}</style> </head> <body> <p>VALUE I WANT TO GET </p> </body> </html>'
select @x = replace (@x,
'<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> ',
''
);
SELECT @x = REPLACE(@x, 'xmlns=', 'xmlns:i=');
--Print @x;
Then run the actual SELECT query on the converted XML:
select xmldata,
Cast (x2.c.query('data(body/p)') as nvarchar (100)) as HtmlBody
From
(select convert (xml, @x ) as xmldata) as x
cross apply xmldata.nodes('html') as x2(c)
Additional:
To query the table directly, this should work if you replace the table variable with your own table and adjust the column names accordingly:
Declare @Temp table (ID bigint, xmlvalue nvarchar(4000) );
Declare @x nvarchar (4000) = '<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <style type="text/css">p {font-family: sans-serif;font-size: 8.25pt;margin: 0px;}</style> </head> <body> <p>VALUE I WANT TO GET </p> </body> </html>';
Insert into @Temp
VALUES (101, @x);
select x.*,
Cast (x2.c.query('data(body/p)') as nvarchar (100)) as HtmlBody
From (
select ID, CAST(REPLACE(
(replace (xmlvalue,
'<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> ',
'') ),
'xmlns=',
'xmlns:i='
) AS xml) as xmldata
from @Temp
) as x
CROSS APPLY xmldata.nodes('html') as x2(c)
go
If you think this will be difficult to manage in the future, create a scalar function with the same logic, but keep the performance recommendations for custom scalar functions in mind.
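You can do something similar to this:
Language is VB.NET.
Dim XMLDoc As New XmlDocument
XMLDoc.XmlResolver = Nothing ' don't try to download the XHTML DTD named in the DOCTYPE
Dim dt As DataTable = GetData() ' GetData() is where you load the data using your query
XMLDoc.LoadXml(dt.Rows(0).Item("Value").ToString()) ' LoadXml parses a string; Load expects a path or URL
Dim nsmgr As New XmlNamespaceManager(XMLDoc.NameTable)
nsmgr.AddNamespace("x", "http://www.w3.org/1999/xhtml") ' the document's default namespace needs a prefix in XPath
Dim TextThatIWant As String = XMLDoc.SelectSingleNode("/x:html/x:body/x:p", nsmgr).InnerText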
Try it like this:
DECLARE @YourValue VARCHAR(MAX)=
'<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <style type="text/css">p {font-family: sans-serif;font-size: 8.25pt;margin: 0px;}</style> </head> <body> <p>VALUE I WANT TO GET </p> </body> </html>';
WITH XMLNAMESPACES(DEFAULT 'http://www.w3.org/1999/xhtml')
SELECT CONVERT(xml,@YourValue,2).value('(/html/body/p/text())[1]','varchar(1000)');
The idea in short:
We cannot CAST() the string to XML because of the <!DOCTYPE>, but we can use CONVERT() with style 2, which returns your string as XML.
Against that XML we can use .value().
Because your XML declares a default namespace, we declare the same namespace using WITH XMLNAMESPACES.
Attention: HTML is not nearly as strict as XML. If you cannot be sure that your HTML is actually XHTML (that is, that it follows the much stricter rules of XML), it can be dangerous to rely on XML methods. Luckily, the namespace points to XHTML...
If not, this might work in all your tests but break in production at any time...
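The same idea (honour the default XHTML namespace, then read the text under <body>) can be sketched outside SQL Server, for example with Python's standard-library xml.etree.ElementTree. This is a minimal sketch rather than either answer's method; the function name body_text is hypothetical, and, like the T-SQL version, it only works if the stored HTML really is well-formed XHTML. ElementTree does not fetch the external DTD named in the DOCTYPE, so no network access is involved.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Column value copied from the question (XML declaration, DOCTYPE and all).
value = (
    '<?xml version="1.0" encoding="utf-8"?> '
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> '
    '<html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> '
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> '
    '<style type="text/css">p {font-family: sans-serif;font-size: 8.25pt;margin: 0px;}</style> '
    '</head> <body> <p>VALUE I WANT TO GET </p> </body> </html>'
)


def body_text(xhtml: str) -> str:
    """Return all text under <body> as one line with collapsed whitespace."""
    root = ET.fromstring(xhtml)
    # <body> is in the default XHTML namespace, so it needs a bound prefix.
    body = root.find("x:body", namespaces={"x": XHTML_NS})
    if body is None:
        return ""
    # itertext() yields every text node below <body>, which drops the tags.
    return " ".join("".join(body.itertext()).split())


print(body_text(value))  # VALUE I WANT TO GET
```

Collecting all text under <body> (instead of only the first <p>) matches the question's wish for "the body content without any tags in a single line".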

Get attributes from XML with more than one namespace

I have this XML:
<?xml version="1.0" encoding="UTF-8" ?>
<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="3.3" Serie="NG" Folio="55" Fecha="2017-08-02T20:08:58" FormaPago="99" SubTotal="5861.73" Descuento="778.38" Moneda="MXN" Total="5083.35" TipoDeComprobante="N" MetodoPago="PUE" LugarExpedicion="08400" Sello="yAfYBR0/bvvFgq/hNL+DSTTPt+kNtsE3DzxagXvG0M/alfaUrjp73IySCEBHIeo4nNF4uoscqnRoIowwfQPnNwC4LL6iD77rfykF2hq+i6VzqAlvbx0aawZUAcJpjNVWyS3zjfa3rYeU1TpNBrhSuU8r+4BoBz3jr1pB7yyCIm4JbwzNqy0TvTjD1XXnpy6v74+eqIoZqWZoi3CyiCqMS2B3FEfDMiXTfVlZ/3/evLYc5WvFPpBsm61I+SBID/rHhJvLQLjJDUX7Myt177N41xptITkIKQCJAIV6XN7mRGJTHKA3h3F7tSZaei8+LeONO9ZtlUZGw4dzLhrpNk/tuA==" xsi:schemaLocation="http://www.sat.gob.mx/cfd/3 http://www.sat.gob.mx/sitio_internet/cfd/3/cfdv33.xsd">
<cfdi:Emisor Rfc="BBBBBB" Nombre="S.A. de C.V." RegimenFiscal="601" />
<cfdi:Receptor Rfc="AAAAAAAA" Nombre="AZUCENA SAN " UsoCFDI="P01" />
<cfdi:Conceptos>
<cfdi:Concepto ClaveProdServ="84111505" Cantidad="1" ClaveUnidad="ACT" Descripcion="Pago de nómina" ValorUnitario="5861.73" Importe="5861.73" Descuento="778.38" />
</cfdi:Conceptos>
<cfdi:Complemento>
<nomina12:Nomina xmlns:nomina12="http://www.sat.gob.mx/nomina12" Version="1.2" TipoNomina="O" FechaPago="2017-07-16" FechaInicialPago="2017-07-01" FechaFinalPago="2017-08-15" NumDiasPagados="15.000" TotalPercepciones="5861.73" TotalDeducciones="778.38" xsi:schemaLocation="http://www.sat.gob.mx/nomina12 http://www.sat.gob.mx/sitio_internet/cfd/nomina/nomina12.xsd">
<nomina12:Emisor RegistroPatronal="Y6844621109" />
<nomina12:Receptor Curp="SASA850203MDFNNZ06" NumSeguridadSocial="39058519115" FechaInicioRelLaboral="2012-09-14" Antigüedad="P254W" TipoContrato="01" Sindicalizado="No" TipoJornada="03" TipoRegimen="02" NumEmpleado="073" Departamento="VENTAS" Puesto="GERENTE TIENDAS" RiesgoPuesto="1" PeriodicidadPago="04" SalarioBaseCotApor="183.33" SalarioDiarioIntegrado="356.57" ClaveEntFed="DIF" />
<nomina12:Percepciones TotalSueldos="5861.73" TotalGravado="5089.21" TotalExento="772.52">
<nomina12:Percepcion TipoPercepcion="001" Clave="0001" Concepto="Sueldos" ImporteGravado="1026.69" ImporteExento="0.00" />
<nomina12:Percepcion TipoPercepcion="020" Clave="0008" Concepto="Prima dominical" ImporteGravado="0" ImporteExento="78.36" />
<nomina12:Percepcion TipoPercepcion="001" Clave="0020" Concepto="Vacaciones" ImporteGravado="1466.69" ImporteExento="0" />
<nomina12:Percepcion TipoPercepcion="021" Clave="0009" Concepto="Prima Vacacional" ImporteGravado="0" ImporteExento="458.33" />
<nomina12:Percepcion TipoPercepcion="038" Clave="0022" Concepto="Bonos" ImporteGravado="1000" ImporteExento="0" />
<nomina12:Percepcion TipoPercepcion="028" Clave="0010" Concepto="Comisiones" ImporteGravado="1360" ImporteExento="0" />
<nomina12:Percepcion TipoPercepcion="019" Clave="0007" Concepto="Horas extras" ImporteGravado="235.83" ImporteExento="235.83">
<nomina12:HorasExtra Dias="2" TipoHoras="01" HorasExtra="2" ImportePagado="235.83" />
<nomina12:HorasExtra Dias="1" TipoHoras="02" HorasExtra="1" ImportePagado="235.83" />
</nomina12:Percepcion>
</nomina12:Percepciones>
<nomina12:Deducciones TotalOtrasDeducciones="122.73" TotalImpuestosRetenidos="655.65">
<nomina12:Deduccion TipoDeduccion="002" Clave="0013" Concepto="ISR" Importe="655.65" />
<nomina12:Deduccion TipoDeduccion="001" Clave="0012" Concepto="IMSS" Importe="122.73" />
</nomina12:Deducciones>
</nomina12:Nomina>
<tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" xsi:schemaLocation="http://www.sat.gob.mx/TimbreFiscalDigital http://www.sat.gob.mx/sitio_internet/cfd/timbrefiscaldigital/TimbreFiscalDigitalv11.xsd" Version="1.1" UUID="2A73927D-7E57-40C7-9A8B-2E062560AC9E" FechaTimbrado="2017-08-02T19:52:07" SelloCFD="yAfYBR0/bvvFgq/hNL+DSTTPt+kNtsE3DzxagXvG0M/alfaUrjp73IySCEBHIeo4nNF4uoscqnRoIowwfQPnNwC4LL6iD77rfykF2hq+i6VzqAlvbx0aawZUAcJpjNVWyS3zjfa3rYeU1TpNBrhSuU8r+4BoBz3jr1pB7yyCIm4JbwzNqy0TvTjD1XXnpy6v74+eqIoZqWZoi3CyiCqMS2B3FEfDMiXTfVlZ/3/evLYc5WvFPpBsm61I+SBID/rHhJvLQLjJDUX7Myt177N41xptITkIKQCJAIV6XN7mRGJTHKA3h3F7tSZaei8+LeONO9ZtlUZGw4dzLhrpNk/tuA==" NoCertificadoSAT="20001000000300022323" SelloSAT="FXfgRBNhWla+53sM4eGnMFbbmQFb6EIaWMt3GMaqXJn9XPxQ5DVNuz7oTJ+yUZV0ObM5myqzzsI4Zvx3g==" RfcProvCertif="EME000602QR9" />
</cfdi:Complemento>
</cfdi:Comprobante>
I need to get the values of some attributes. I can get attributes from the cfdi:Comprobante node, but when I try to read an attribute from tfd:TimbreFiscalDigital I get "Object reference not set to an instance of an object."
Here is the code I'm using:
Dim doc As XmlDocument = New XmlDocument()
doc.Load("C:\Users\s_osr\Desktop\EjemplosCFDINomina12\HorasExtras\SIGN_XML_COMPROBANTE_3_0.xml")
Dim managerdoc As XmlNamespaceManager = New XmlNamespaceManager(doc.NameTable)
managerdoc.AddNamespace("cfdi", doc.DocumentElement.NamespaceURI)
managerdoc.AddNamespace("tfd", doc.DocumentElement.NamespaceURI)
Dim Sello As String = doc.SelectSingleNode("/cfdi:Comprobante/@Sello", managerdoc).InnerText
Dim SelloSat As String = doc.SelectSingleNode("/cfdi:Comprobante/cfdi:Complemento/tfd:TimbreFiscalDigital/@SelloSAT", managerdoc).InnerText
Sello works, but I can't get SelloSAT.
UPDATE:
Thanks to @Kaiido for the solution: the second namespace needs its own URI string:
managerdoc.AddNamespace("tfd", "http://www.sat.gob.mx/TimbreFiscalDigital")
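Why the update fixes it: an XPath prefix is only a local alias, and each alias must be bound to the URI the document declares for that element. Binding tfd to doc.DocumentElement.NamespaceURI points it at the cfdi URI, so SelectSingleNode returns Nothing and reading .InnerText throws. A minimal sketch of the same rule with Python's xml.etree.ElementTree, on a trimmed copy of the sample (only a few attributes kept):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CFDI sample; attribute values are taken from the question.
CFDI = """<?xml version="1.0" encoding="UTF-8" ?>
<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/3" Version="3.3" Serie="NG" Folio="55">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital"
        Version="1.1" UUID="2A73927D-7E57-40C7-9A8B-2E062560AC9E"
        NoCertificadoSAT="20001000000300022323" />
  </cfdi:Complemento>
</cfdi:Comprobante>"""

root = ET.fromstring(CFDI)

# Each prefix maps to the URI declared for it in the document.
ns = {
    "cfdi": "http://www.sat.gob.mx/cfd/3",
    "tfd": "http://www.sat.gob.mx/TimbreFiscalDigital",
}

folio = root.get("Folio")  # attribute on the root element itself
timbre = root.find("cfdi:Complemento/tfd:TimbreFiscalDigital", namespaces=ns)
uuid = timbre.get("UUID")  # unprefixed attributes need no namespace

# Reproduce the original bug: "tfd" bound to the cfdi URI finds nothing.
wrong = root.find("cfdi:Complemento/tfd:TimbreFiscalDigital",
                  namespaces={"cfdi": ns["cfdi"], "tfd": ns["cfdi"]})

print(folio)  # 55
print(uuid)   # 2A73927D-7E57-40C7-9A8B-2E062560AC9E
print(wrong)  # None
```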

Inverse of query_to_xml() in PostgreSQL?

I am using a servlet (servlet1) that sends a GET request through HttpURLConnection to get the XML generated by another servlet (servlet2), which is connected to a PostgreSQL database.
servlet2 generates the XML with this statement:
select query_to_xml('select * from test', true, false, '');
I open an HttpURLConnection from servlet1 to servlet2 and store the output in the StringBuffer "response1" like this:
String inputLine;
StringBuffer response1 = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response1.append(inputLine);
}
in.close();
//The output in console
System.out.println(response1.toString());
The XML is stored in the string "response1" as:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<link type="text/css" rel="stylesheet" href="/pro/inc/form.css;jsessionid=D965FA3338EC4E88F6F7AA0B64308446" />
</head>
<body>
<p><alltests_00_with_defects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<n5>a</n5>
<n4>b</n4>
<n3>c</n3>
<n2>d</n2>
<n1>e</n1>
</row>
<row>
<n5>a</n5>
<n4>b</n4>
<n3>c</n3>
<n2>d</n2>
<n1>e</n1>
</row>
</html>
So far, so good!
Now I am looking for a way to convert the generated XML back into data values.
Any ideas?
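One possible direction, sketched on the client side in Python rather than in the servlet: in the output above, each result row is a <row> element whose children are named after the columns, so reading it back amounts to turning each <row> into a record. This assumes the table XML has already been extracted from the surrounding HTML page and is well-formed (the printed sample is missing some closing tags); the helper name rows_from_xml is hypothetical. With nulls = true, query_to_xml writes NULL values as xsi:nil="true", which the sketch maps back to None.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Well-formed version of the table part of the sample output
# (assumption: the HTML wrapper has already been stripped off).
TABLE_XML = """<alltests_00_with_defects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row><n5>a</n5><n4>b</n4><n3>c</n3><n2>d</n2><n1>e</n1></row>
  <row><n5>a</n5><n4>b</n4><n3>c</n3><n2>d</n2><n1>e</n1></row>
</alltests_00_with_defects>"""


def rows_from_xml(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Turn each <row> element into a {column_name: value} dict."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("row"):
        record: Dict[str, Optional[str]] = {}
        for col in row:
            # NULLs are written as <col xsi:nil="true"/>; map them back to None.
            if col.get(XSI_NIL) == "true":
                record[col.tag] = None
            else:
                record[col.tag] = col.text or ""
        rows.append(record)
    return rows


rows = rows_from_xml(TABLE_XML)
for r in rows:
    print(r)
```

Every value comes back as a string; converting to the original column types would need the table's schema, for example from the schema that query_to_xml_and_xmlschema() can emit alongside the data.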

Tips for finding prefixed tags in Python lxml?

I am trying to use lxml's etree to find a specific tag in my XML document.
The tag looks as follows:
<text:ageInformation>
<text:statedAge>12</text:statedAge>
</text:ageInformation>
I was hoping to use etree.find('text:statedAge'), but that method does not accept the 'text' prefix.
The error message says I should add 'text' to the prefix map, but I am not sure how to do that. Any tips?
Edit:
I want to be able to write to the hr4e-prefixed tags.
Here are the important parts of the document:
<?xml version="1.0" encoding="utf-8"?>
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd">
<header>
<documentID root="18c41e51-5f4d-4d15-993e-2a932fed720a" />
<title>Health Records for Everyone Continuity of Care Document</title>
<version>
<number>1</number>
</version>
<confidentiality codeSystem="2.16.840.1.113883.5.25" code="N" />
<documentTimestamp value="201105300211+0800" />
<personalInformation>
<patientInformation>
<personID root="2.16.840.1.113883.3.881.PI13023911" />
<personAddress>
<streetAddressLine nullFlavor="NI" />
<city>Santa Cruz</city>
<state nullFlavor="NI" />
<postalCode nullFlavor="NI" />
</personAddress>
<personPhone nullFlavor="NI" />
<personInformation>
<personName>
<given>Benjamin</given>
<family>Keidan</family>
</personName>
<gender codeSystem="2.16.840.1.113883.5.1" code="M" />
<personDateOfBirth value="NI" />
<hr4e:ageInformation>
<hr4e:statedAge>9424</hr4e:statedAge>
<hr4e:estimatedAge>0912</hr4e:estimatedAge>
<hr4e:yearInSchool>1</hr4e:yearInSchool>
<hr4e:statusInSchool>attending</hr4e:statusInSchool>
</hr4e:ageInformation>
</personInformation>
<hr4e:livingSituation>
<hr4e:homeVillage>Putney</hr4e:homeVillage>
<hr4e:tribe>Oromo</hr4e:tribe>
</hr4e:livingSituation>
</patientInformation>
</personalInformation>
The namespace prefix must be declared (mapped to a URI) in the XML document. Then you can use the {URI}localname notation to find text:statedAge and other elements, something like this:
from lxml import etree
XML = """
<root xmlns:text="http://example.com">
<text:ageInformation>
<text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""
root = etree.fromstring(XML)
ageinfo = root.find("{http://example.com}ageInformation")
age = ageinfo.find("{http://example.com}statedAge")
print(age.text)
This will print "12".
Another way of doing it:
ageinfo = root.find("text:ageInformation",
namespaces={"text": "http://example.com"})
age = ageinfo.find("text:statedAge",
namespaces={"text": "http://example.com"})
print(age.text)
You can also use XPath:
age = root.xpath("//text:statedAge",
namespaces={"text": "http://example.com"})[0]
print(age.text)
I ended up having to qualify each step of the path with its own namespace URI:
from lxml import etree
XML = """
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd">
<personInformation>
<hr4e:ageInformation>
<hr4e:statedAge>12</hr4e:statedAge>
</hr4e:ageInformation>
</personInformation>
</greenCCD>"""
root = etree.fromstring(XML)
#root = etree.parse("hr4e_patient.xml")
ageinfo = root.find("{AlschulerAssociates::GreenCDA}personInformation/{hr4e::patientdata}ageInformation")
age = ageinfo.find("{hr4e::patientdata}statedAge")
print(age.text)
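For the edit's goal of writing to the hr4e-prefixed tags, the same namespace-qualified lookup can locate the element, and assigning its .text changes the value. A minimal sketch with the standard-library xml.etree.ElementTree (lxml's etree accepts the same find(..., namespaces=...) call), using a trimmed copy of the sample; the alias "g" and the new value "13" are arbitrary choices for illustration. register_namespace is called so the serialized output keeps the hr4e prefix and the default namespace instead of auto-generated ns0/ns1 prefixes.

```python
import xml.etree.ElementTree as ET

GREEN = "AlschulerAssociates::GreenCDA"
HR4E = "hr4e::patientdata"

# Keep the original prefixes when the tree is serialized again.
ET.register_namespace("", GREEN)
ET.register_namespace("hr4e", HR4E)

XML = """<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:hr4e="hr4e::patientdata">
  <personInformation>
    <hr4e:ageInformation>
      <hr4e:statedAge>12</hr4e:statedAge>
    </hr4e:ageInformation>
  </personInformation>
</greenCCD>"""

root = ET.fromstring(XML)

# The default-namespace element needs a prefix in the path too; "g" is an alias.
ns = {"g": GREEN, "hr4e": HR4E}
stated = root.find("g:personInformation/hr4e:ageInformation/hr4e:statedAge",
                   namespaces=ns)

stated.text = "13"  # write the new value into the hr4e element

output = ET.tostring(root, encoding="unicode")
print(output)
```

To write back to a file instead of a string, ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True) serializes the same tree.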