How to get Xpath Value in Hive

I want to get the values Hindi and English as an array from the XML below, using XPath in Hive.
<employees>
<employee>
<name>Ranjith</name>
<language emp:langCode="HI">Hindi</language>
<city emp:country="india">Delhi</city>
<employee>
<employee>
<name>John</name>
<language emp:langCode="EN">English</language>
<city emp:country="america">Sunnyvale</city>
<employee>
</employees>
Can anyone help?
I have tried a few options, but none of them work.

The generic XPath would be:
/employees/employee/language
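But your XML has some errors in it: the <employee> elements are never closed, and the emp: prefix on the attributes is not bound to a namespace. Here is the updated XML that works: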
<employees>
<employee>
<name>Ranjith</name>
<language langCode="HI">Hindi</language>
<city country="india">Delhi</city>
</employee>
<employee>
<name>John</name>
<language langCode="EN">English</language>
<city country="america">Sunnyvale</city>
</employee>
</employees>
And here is the result of applying that XPath:
Element='<language langCode="HI">Hindi</language>'
Element='<language langCode="EN">English</language>'
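To get the values themselves rather than the elements, the path can end in /text(). Hive's built-in xpath UDF takes the XML string and an XPath expression and returns an array of strings, so /employees/employee/language/text() should yield Hindi and English there. The path can be sanity-checked outside Hive; the following is a minimal sketch in Python using only the standard library, where the function name language_values is purely illustrative:

```python
import xml.etree.ElementTree as ET

# The corrected XML from the answer above.
XML = """<employees>
  <employee>
    <name>Ranjith</name>
    <language langCode="HI">Hindi</language>
    <city country="india">Delhi</city>
  </employee>
  <employee>
    <name>John</name>
    <language langCode="EN">English</language>
    <city country="america">Sunnyvale</city>
  </employee>
</employees>"""


def language_values(xml_text):
    """Return the text of every /employees/employee/language element."""
    root = ET.fromstring(xml_text)  # root is the <employees> element
    return [el.text for el in root.findall("./employee/language")]


languages = language_values(XML)
print(languages)  # ['Hindi', 'English']
```

ElementTree only supports a subset of XPath, so the path is written relative to the root element and the text is read from each matched element instead of using /text().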

Related

Differentiate between similar records

I have some input XML that is auto-generated (so I am unable to rename the fields):
<?xml version="1.0" encoding="UTF-8"?>
<csv-xml>
<record>
<csv-field-1>1</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3>7654321</csv-field-3>
<csv-field-4>1</csv-field-4>
<csv-field-5>08/08/19</csv-field-5>
<csv-field-6>08/08/19</csv-field-6>
</record>
<record>
<csv-field-1>2</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3>12345678</csv-field-3>
<csv-field-4>3</csv-field-4>
</record>
<record>
<csv-field-1>2</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3>22345679</csv-field-3>
<csv-field-4>7</csv-field-4>
</record>
<record>
<csv-field-1>2</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3>32345680</csv-field-3>
<csv-field-4>6</csv-field-4>
</record>
<record>
<csv-field-1>2</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3>42345681</csv-field-3>
<csv-field-4>2</csv-field-4>
</record>
<record>
<csv-field-1>3</csv-field-1>
<csv-field-2>12345</csv-field-2>
<csv-field-3></csv-field-3>
</record>
</csv-xml>
I am trying to figure out how to use an XSLT transformation to extract the data I need when several records share the same path/name.
I have tried using:
<xsl:copy-of select="/csv-xml/record/csv-field-2/node()"/>
But the output is:
1234512345123451234512345
Code used:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="text"
media-type="text/plain"/>
<xsl:template match="/">
<xsl:copy-of select="/csv-xml/record/csv-field-2/node()"/>
</xsl:template>
</xsl:stylesheet>
My expected result, taken from the first 'csv-field-2', would be something like:
<name>12345</name>
My final goal is to be able to extract all the data I need from these XML files, which may have more or fewer records, using the same script. But that's a future problem.
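The concatenated output arises because the path /csv-xml/record/csv-field-2 matches that field in every record. In XPath 1.0 a positional predicate such as /csv-xml/record[1]/csv-field-2 restricts the match to the first record. The sketch below illustrates the same selection in Python with the standard library; the shortened input and the function name first_field_as_name are assumptions made for the example, not part of the original question:

```python
import xml.etree.ElementTree as ET

# Shortened version of the input: two records with the same field names.
CSV_XML = """<csv-xml>
  <record>
    <csv-field-1>1</csv-field-1>
    <csv-field-2>12345</csv-field-2>
  </record>
  <record>
    <csv-field-1>2</csv-field-1>
    <csv-field-2>12345</csv-field-2>
  </record>
</csv-xml>"""


def first_field_as_name(xml_text, field="csv-field-2"):
    """Wrap the given field of the first <record> in a <name> element."""
    root = ET.fromstring(xml_text)
    first = root.find(f"record[1]/{field}")  # [1] selects the first record only
    name = ET.Element("name")
    name.text = first.text if first is not None else ""
    return ET.tostring(name, encoding="unicode")


result = first_field_as_name(CSV_XML)
print(result)  # <name>12345</name>
```

Changing the index in the predicate selects a different record, and looping over root.findall("record") handles inputs with more or fewer records.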

Concatenate XML in BPEL 2.0

I need help with a requirement in BPEL 2.0. I have a collection in the format below:
<FilesCollection>
<Files>
<transactionid>...</transactionid>
<status>...</status>
<filename>...</filename>
</Files>
</FilesCollection>
I will receive several such collections while traversing a ForEach loop.
Once I have exited the loop, I need to concatenate all the collections so that I finally get something like this:
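<FilesCollection>
<Files>
<transactionid>...</transactionid>
<status>...</status>
<filename>...</filename>
</Files>
<Files>
<transactionid>...</transactionid>
<status>...</status>
<filename>...</filename>
</Files>
<Files>
<transactionid>...</transactionid>
<status>...</status>
<filename>...</filename>
</Files>
</FilesCollection>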
Please note that the number of FilesCollection elements, and the number of Files elements within each of them, is dynamic.
Please help me with this.
Thanks
Arijit

As I understand it, you have multiple FilesCollection elements in the XML document and want to wrap their contents in a single one. In that case you need something like this:
Note: this assumes the root element of the source XML is named root.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs" version="1.0">
<xsl:template match="root">
<root>
<FilesCollection>
<xsl:copy-of select="FilesCollection/node()"/>
</FilesCollection>
</root>
</xsl:template>
</xsl:stylesheet>
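The same merge can be illustrated outside XSLT. The following is a rough sketch in Python using only the standard library; it assumes each collection is available as a separate XML string, and the sample values (transaction IDs, statuses, file names) and the function name merge_collections are invented for the example:

```python
import xml.etree.ElementTree as ET


def merge_collections(collections):
    """Move every <Files> child of each collection under one <FilesCollection>."""
    merged = ET.Element("FilesCollection")
    for xml_text in collections:
        for files in ET.fromstring(xml_text).findall("Files"):
            merged.append(files)
    return merged


# Hypothetical sample data: one collection per loop iteration.
parts = [
    "<FilesCollection><Files><transactionid>1</transactionid>"
    "<status>OK</status><filename>a.txt</filename></Files></FilesCollection>",
    "<FilesCollection><Files><transactionid>2</transactionid>"
    "<status>OK</status><filename>b.txt</filename></Files>"
    "<Files><transactionid>3</transactionid>"
    "<status>FAILED</status><filename>c.txt</filename></Files></FilesCollection>",
]

merged = merge_collections(parts)
filenames = [f.findtext("filename") for f in merged.findall("Files")]
print(filenames)  # ['a.txt', 'b.txt', 'c.txt']
```

Because the function iterates over whatever it is given, the number of collections and the number of Files elements in each can vary freely.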

how to select preceding nodes within a specific parent

Given the following xml
<Root>
<Employee>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
</service>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</service>
</Employee>
<Employee>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
</service>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</service>
</Employee>
</Root>
Using XSLT 1.0, when transforming the XML, each <Record> value ('xxx', 'yyy', 'zzz') should occur only once per <Employee> in the result:
<Root>
<Employee>
<Service>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</Service>
</Employee>
<Employee>
<Service>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</Service>
</Employee>
</Root>
In a for-each loop over Employee I tried using <xsl:if test='not(preceding::./service/Record=$record)'>. The test works fine for the first <Employee>, emitting the <Record>s for 'xxx', 'yyy', 'zzz' only once. When the iteration moves to the next <Employee>, however, the test also looks at the <Record> values in the first <Employee>, finds preceding nodes that already have the values 'xxx', 'yyy', 'zzz', and so I do not get any records for the second <Employee>.
How can I get the <Record>s for the second <Employee>? Any help is much appreciated.
Thanks

This transformation uses the Muenchian method for grouping:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kEmpRecordByVal" match="Employee/service/Record"
use="concat(generate-id(../..), '+', .)"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Employee">
<Employee>
<xsl:apply-templates select=
"service/Record
[generate-id()
=
generate-id(key('kEmpRecordByVal',
concat(generate-id(../..), '+', .)
)[1]
)
]
"/>
</Employee>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document (the provided one with different values for the second Employee to aid readability):
<Root>
<Employee>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
</service>
<service>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</service>
</Employee>
<Employee>
<service>
<Record>aaa</Record>
<Record>bbb</Record>
</service>
<service>
<Record>aaa</Record>
<Record>bbb</Record>
<Record>ccc</Record>
</service>
</Employee>
</Root>
the wanted, correct result is produced:
<Root>
<Employee>
<Record>xxx</Record>
<Record>yyy</Record>
<Record>zzz</Record>
</Employee>
<Employee>
<Record>aaa</Record>
<Record>bbb</Record>
<Record>ccc</Record>
</Employee>
</Root>
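The key in the stylesheet combines the identity of the enclosing Employee with the Record value, so duplicates are detected per Employee rather than across the whole document. That idea can be sketched procedurally in Python with the standard library, using one "seen" set per Employee. The function name distinct_records_per_employee is illustrative only, and this is a plain loop, not the Muenchian method itself:

```python
import xml.etree.ElementTree as ET

# The sample document used in the answer above.
SOURCE = """<Root>
  <Employee>
    <service><Record>xxx</Record><Record>yyy</Record></service>
    <service><Record>xxx</Record><Record>yyy</Record><Record>zzz</Record></service>
  </Employee>
  <Employee>
    <service><Record>aaa</Record><Record>bbb</Record></service>
    <service><Record>aaa</Record><Record>bbb</Record><Record>ccc</Record></service>
  </Employee>
</Root>"""


def distinct_records_per_employee(xml_text):
    """Build a new <Root> in which each Employee lists each Record value once."""
    root = ET.fromstring(xml_text)
    out = ET.Element("Root")
    for emp in root.findall("Employee"):
        seen = set()  # reset for every Employee, so grouping is per parent
        out_emp = ET.SubElement(out, "Employee")
        for rec in emp.findall("service/Record"):
            if rec.text not in seen:
                seen.add(rec.text)
                ET.SubElement(out_emp, "Record").text = rec.text
    return out


grouped = distinct_records_per_employee(SOURCE)
values = [[r.text for r in e.findall("Record")] for e in grouped.findall("Employee")]
print(values)  # [['xxx', 'yyy', 'zzz'], ['aaa', 'bbb', 'ccc']]
```

Resetting the set inside the outer loop plays the role that generate-id(../..) plays in the key: a value already seen under a different Employee does not suppress it here.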

Piwik statistics about all websites

Is it possible to use the Piwik API with all websites, not just a single one?
What I want to do is get a mean value of the browsers used. I can do this for a single website like this:
?module=API&method=UserSettings.getBrowser&idSite=1&period=day&date=last10&format=rss
If I just remove idSite=1, I get an error.

You can specify all sites using idSite=all; you can also specify multiple sites by separating the IDs with commas, e.g. idSite=1,2,4,5.
The resulting output is given per idSite wrapped in an extra <results> tag, so whereas before you had
<result>
<row>
<label>Chrome 14.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
<row>
<label>Chrome 13.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
...
</result>
You now get
<results>
<result idSite="2">
<row>
<label>Chrome 14.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
<row>
<label>Chrome 13.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
...
</result>
<result idSite="3">
<row>
<label>Chrome 14.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
<row>
<label>Chrome 13.0</label>
<nb_uniq_visitors>13</nb_uniq_visitors>
...
</row>
...
</result>
...
</results>
This does mean that any aggregation needed for your mean value has to be done after you receive the data, but that should be relatively trivial.
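As a rough illustration of that aggregation step, the sketch below parses a response shaped like the <results> document above and averages nb_uniq_visitors per browser label across sites. It is written in Python with the standard library. The visitor numbers are made up, the function name mean_visitors_by_browser is hypothetical, and "mean" is assumed to be the total divided by the number of sites (a browser missing from a site counts as zero):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical response for two sites, following the structure shown above.
RESPONSE = """<results>
  <result idSite="2">
    <row><label>Chrome 14.0</label><nb_uniq_visitors>13</nb_uniq_visitors></row>
    <row><label>Chrome 13.0</label><nb_uniq_visitors>13</nb_uniq_visitors></row>
  </result>
  <result idSite="3">
    <row><label>Chrome 14.0</label><nb_uniq_visitors>7</nb_uniq_visitors></row>
    <row><label>Chrome 13.0</label><nb_uniq_visitors>3</nb_uniq_visitors></row>
  </result>
</results>"""


def mean_visitors_by_browser(xml_text):
    """Average nb_uniq_visitors per browser label over all <result> elements."""
    root = ET.fromstring(xml_text)
    sites = root.findall("result")
    totals = defaultdict(int)
    for result in sites:
        for row in result.findall("row"):
            label = row.findtext("label")
            totals[label] += int(row.findtext("nb_uniq_visitors"))
    # With no sites, totals is empty and no division takes place.
    return {label: total / len(sites) for label, total in totals.items()}


means = mean_visitors_by_browser(RESPONSE)
print(means)  # {'Chrome 14.0': 10.0, 'Chrome 13.0': 8.0}
```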

Oracle SQL*Loader getting CDATA values

Does anybody know how to do this? I know there are better ways of loading XML data into Oracle than SQL*Loader, but I'm curious how it is done with it. I already have a control file that loads XML data into the DB; however, it won't run if the XML file has values wrapped in CDATA.
Below is the control file, which works if the values are not CDATA:
LOAD DATA
INFILE FRATS.xml "str '</ROW>'"
APPEND
INTO TABLE "FRATERNITIES"
(
DUMMY FILLER TERMINATED BY "<ROW>",
THE_CODE SEQUENCE (MAX, 1),
DUMMY2 FILLER TERMINATED BY "</COLUMN>",
STORE_NN_KJ ENCLOSED BY '<COLUMN NAME="THE_NAME">' AND '</COLUMN>',
STAFF_COUNT ENCLOSED BY '<COLUMN NAME="THE_COUNT">' AND '</COLUMN>'
)
Here's the XML file:
<?xml version='1.0' encoding='MS932' ?>
<RESULTS>
<ROW>
<COLUMN NAME="THE_CODE">777</COLUMN>
<COLUMN NAME="THE_NAME">CharlieOscarDelta</COLUMN>
<COLUMN NAME="THE_COUNT">24</COLUMN>
</ROW>
</RESULTS>
Here's the XML file with CDATA values, which my control file fails on:
<?xml version='1.0' encoding='MS932' ?>
<RESULTS>
<ROW>
<COLUMN NAME="THE_CODE"><![CDATA[777]]></COLUMN>
<COLUMN NAME="THE_NAME"><![CDATA[CharlieOscarDelta]]></COLUMN>
<COLUMN NAME="THE_COUNT"><![CDATA[24]]></COLUMN>
</ROW>
</RESULTS>

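Have you tried the following? The SQL expression goes after the delimiter specification, and the length passed to the outer substr stops one character before the closing ]]>:
STORE_NN_KJ ENCLOSED BY '<COLUMN NAME="THE_NAME">' AND '</COLUMN>' "substr(substr(:STORE_NN_KJ,instr(:STORE_NN_KJ,'<![CDATA[')+9),1,instr(substr(:STORE_NN_KJ,instr(:STORE_NN_KJ,'<![CDATA[')+9),']]>')-1)"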
EDIT
Looks like I forgot a ).. Try this..
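The nested substr/instr expression is easier to follow when written out step by step: find the opening <![CDATA[ marker, skip its 9 characters, then take everything up to the closing ]]>. The sketch below shows that logic in Python purely as an illustration of the string handling; it is not SQL*Loader syntax, and the function name strip_cdata is made up for the example:

```python
CDATA_OPEN = "<![CDATA["   # 9 characters, matching the +9 in the expression
CDATA_CLOSE = "]]>"


def strip_cdata(value):
    """Return the text inside a CDATA section, or the value unchanged if none."""
    start = value.find(CDATA_OPEN)
    if start == -1:
        return value  # plain value, nothing to strip
    start += len(CDATA_OPEN)
    end = value.find(CDATA_CLOSE, start)
    # If the closing marker is missing, keep everything after the opener.
    return value[start:end] if end != -1 else value[start:]


name = strip_cdata("<![CDATA[CharlieOscarDelta]]>")
print(name)  # CharlieOscarDelta
```

Values without a CDATA wrapper pass through unchanged, so the same routine covers both versions of the XML file.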
