XPath using Get data from XML - Pentaho

I am calling Xero's API and then using the Get data from XML step. How can I extract Depreciation Expense - 218.8? I've tried /Rows/Row/Cells/Cell/Attributes/. and Rows/Row/Cells/Cell/Value, among other options, but they didn't work. Another question: if I have multiple accounts and I need to extract exactly 'Depreciation Expense', I've tried using [] to extract the Nth element, but somehow it didn't work. Is this Pentaho-specific?
<RowType>Section</RowType>
<Title>Less Operating Expenses</Title>
<Rows>
<Row>
<RowType>Row</RowType>
<Cells>
<Cell>
<Value>Depreciation Expense</Value>
<Attributes>
<Attribute>
<Value>f14d778f842543feafca2fdcf0437cf7</Value>
<Id>account</Id>
</Attribute>
<Attribute>
<Value>f14d778f842543feafca2fdcf0437cf7</Value>
<Id>groupID</Id>
</Attribute>
</Attributes>
</Cell>
<Cell>
<Value>218.16</Value>
<Attributes>
<Attribute>
<Value>f14d778f842543feafca2fdcf0437cf7</Value>
<Id>account</Id>
</Attribute>
<Attribute>
<Value>f14d778f842543feafca2fdcf0437cf7</Value>
<Id>groupID</Id>
</Attribute>
</Attributes>
</Cell>
</Cells>
</Row>

With complex XML structures like this one, it's often best to use nested Get Data from XML steps in Pentaho.
In your sample (which, by the way, is missing a root element and the closing </Rows> tag), the XML looks like an Excel-like "rows with cells" structure. The cells likely belong to a column depending on their order. For this answer, I'll assume that this order is fixed in the XML and that no cells are missing. Verify that!
The first XML step should extract each XML "row" into a Pentaho row and give back the XML node, not just a value. For that, you can use the Loop XPath setting /YourRoot/Rows/Row and get a field with XPath "Cells" and Result type "Single node". Including a rownum field might be useful; select that option if you need it.
The second XML step can then use the output field from the first, extracting from Loop XPath /Cells/Cell and getting all the fields you need using the Get Fields button.
Once you have the fields, use a Select Values step to remove the original XML fields, then use a Row Flattener (only works for fixed Cell order).
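To make the nested-step idea concrete outside Pentaho, here is a minimal Python sketch using the standard library's ElementTree. It is only an analogy, not Pentaho code: the outer loop plays the first step (Loop XPath Rows/Row), the inner loop plays the second (Cells/Cell), and keeping values by position stands in for the Row Flattener. The <Report> root is a hypothetical stand-in for /YourRoot. The second function addresses the "exactly 'Depreciation Expense'" question with a predicate instead of a positional index; in full XPath 1.0 the equivalent would be something like /YourRoot/Rows/Row[Cells/Cell[1]/Value='Depreciation Expense']/Cells/Cell[2]/Value, which may be worth trying in the Pentaho step (not verified here).

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <Report> root and
# closed with </Rows>; the <Attributes> blocks are omitted for brevity.
SAMPLE = """<Report>
  <RowType>Section</RowType>
  <Title>Less Operating Expenses</Title>
  <Rows>
    <Row>
      <RowType>Row</RowType>
      <Cells>
        <Cell><Value>Depreciation Expense</Value></Cell>
        <Cell><Value>218.16</Value></Cell>
      </Cells>
    </Row>
  </Rows>
</Report>"""


def flatten_rows(xml_text):
    """Return one list of cell values per Row, in document order.

    Outer loop ~ first step (Loop XPath Rows/Row), inner loop ~ second
    step (Loop XPath Cells/Cell). Keeping values by position is what the
    Row Flattener does, so this assumes a fixed cell order with no gaps.
    """
    root = ET.fromstring(xml_text)
    return [
        [cell.findtext("Value") for cell in row.findall("Cells/Cell")]
        for row in root.findall("Rows/Row")
    ]


def amount_by_predicate(xml_text, label):
    """Return the second cell's Value in the row whose label cell matches."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath: find the cell holding the label,
    # step up to <Cells>, then take the second <Cell>'s <Value>.
    path = f"Rows/Row/Cells/Cell[Value='{label}']/../Cell[2]/Value"
    return root.findtext(path)  # None if no row has that label


print(flatten_rows(SAMPLE))  # [['Depreciation Expense', '218.16']]
print(amount_by_predicate(SAMPLE, "Depreciation Expense"))  # 218.16
```

Selecting by label rather than by position keeps working when other accounts are added before or after Depreciation Expense.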


Extracting Text Values from XML in SQL

I'm working with SQL data hosted by a 3rd party, and am trying to pull some specific information for reporting. However, some of what I need to parse out is in XML format, and I'm getting stuck. I'm looking for the syntax to pull the text= values only from the XML code.
I believe the code should be something like this, but most of the examples I can find online involve simpler XML hierarchies than the one I'm dealing with.
[columnnameXML].value('(/RelatedValueListBO/Items/RelatedValueListBOItem/text())[1]','varchar(max)')
This fails to pull any results. I've tried declaring the namespaces as well, but again, I only ever end up with NULL values.
Example XML I'm dealing with:
<RelatedValueListBO xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://tempuri.org/RelatedValueListBOSchema.xsd">
<Items>
<RelatedValueListBOItem groupKey="Response1" text="Response1" selected="true" />
<RelatedValueListBOItem groupKey="Response2" text="Response2" selected="true" />
<RelatedValueListBOItem groupKey="Response3" text="Response3" selected="true" />
</Items>
</RelatedValueListBO>
Ideally I'd like to pull response1; response2; response3 into a single column, allowing for the fact that multiple responses may exist. I believe I'm getting stuck with the basic code I've been trying because of the namespaces associated with RelatedValueListBO, and because what I want is held in the groupKey, text, and selected attributes instead of being a value directly under the Items node.
You have the namespaces defined in your XML, so you need to define them in the XQuery too.
A fast and dirty method is to replace every namespace with the wildcard "*":
SELECT @x.value('(/*:RelatedValueListBO/*:Items/*:RelatedValueListBOItem/@text)[1]','varchar(max)')
To get all responses in a single column (one row per response), you can use:
SELECT
Item.Col.value('./@text','varchar(max)') X
FROM @x.nodes('/*:RelatedValueListBO/*:Items/*:RelatedValueListBOItem') AS Item(Col)
If you need better performance, you may need to declare the namespaces properly (for example with WITH XMLNAMESPACES) instead of using wildcards.
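The root cause is the default xmlns on RelatedValueListBO: every element in the document lives in that namespace, so unqualified names match nothing. The same behaviour can be reproduced with Python's ElementTree, shown here purely as an illustration (it is not SQL Server syntax): a plain path returns no nodes, while a declared prefix (the analogue of WITH XMLNAMESPACES) or a namespace wildcard (the analogue of *:, which needs Python 3.8+) finds all three items. The "; " join at the end mirrors the single-column result the question asks for.

```python
import xml.etree.ElementTree as ET

XML = """<RelatedValueListBO xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://tempuri.org/RelatedValueListBOSchema.xsd">
  <Items>
    <RelatedValueListBOItem groupKey="Response1" text="Response1" selected="true" />
    <RelatedValueListBOItem groupKey="Response2" text="Response2" selected="true" />
    <RelatedValueListBOItem groupKey="Response3" text="Response3" selected="true" />
  </Items>
</RelatedValueListBO>"""

root = ET.fromstring(XML)

# Unqualified names match nothing: the default xmlns puts every element
# in the tempuri.org namespace (the same reason the SQL query gave NULL).
unqualified = root.findall("Items/RelatedValueListBOItem")

# Declaring the namespace under a prefix (cf. WITH XMLNAMESPACES).
NS = {"r": "http://tempuri.org/RelatedValueListBOSchema.xsd"}
items = root.findall("r:Items/r:RelatedValueListBOItem", NS)

# Namespace wildcard (cf. *: in XQuery); requires Python 3.8+.
wild = root.findall("{*}Items/{*}RelatedValueListBOItem")

# The values sit in the "text" attribute, not in element text.
joined = "; ".join(item.get("text") for item in items)

print(len(unqualified))  # 0
print(joined)  # Response1; Response2; Response3
```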
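On Oracle (this answer uses Oracle's XMLType functions rather than SQL Server syntax), you can use something like this to extract the value of "text" from the first RelatedValueListBOItem node. Note that the sample below drops the namespace declarations:
SELECT extractvalue(value(rs), '//RelatedValueListBOItem[1]/@text')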
FROM TABLE (xmlsequence(extract(sys.xmltype('<RelatedValueListBO>
<Items>
<RelatedValueListBOItem groupKey="Response1" text="Response1"
selected="true" />
<RelatedValueListBOItem groupKey="Response2" text="Response2"
selected="true" />
<RelatedValueListBOItem groupKey="Response3" text="Response3"
selected="true" />
</Items>
</RelatedValueListBO>'),'/RelatedValueListBO/Items'))) rs;

Document id reference doesn't work for impex

I have a problem with an impex that contains a document ID reference.
From docs:
"Especially for importing partOf item values it is necessary to reference these items by means other than the usual unique column technique because partOf items often do not provide a unique key but only hold their enclosing parent as foreign key."
Items from *items.xml (only the most important parts)
<itemtype code="A" autocreate="true" generate="true" abstract="true"/>
<itemtype code="B" autocreate="true" generate="true" extends="A">
<deployment table="btable" typecode="20115" />
<attributes>
<attribute qualifier="code" type="java.lang.Integer" autocreate="true" generate="true">
<persistence type="property"/>
<modifiers optional="false"/>
</attribute>
</attributes>
</itemtype>
<itemtype code="C" autocreate="true" generate="true">
<deployment table="ctable" typecode="20117" />
<attributes>
<attribute qualifier="code" type="java.lang.String" autocreate="true" generate="true">
<persistence type="property"/>
<modifiers optional="false" unique="true"/>
</attribute>
<attribute qualifier="test" type="A" autocreate="true" generate="true">
<persistence type="property"/>
<modifiers optional="false" partof="true"/>
</attribute>
</attributes>
</itemtype>
Impex code:
INSERT B;code;&docIdRef
;1;docId
INSERT_UPDATE C;code[unique=true];test(&docIdRef)
;uniqueCode;docId
Error message:
cannot create C with values ItemAttributeMap[ registry: null, type: <null>, (...) due to [de.hybris.platform.servicelayer.interceptor.impl.MandatoryAttributesValidator@3b777877]:missing values for [test] in model C
When I removed 'partof' modifier from 'test' attribute (C class) everything worked fine.
I wonder what the impex should look like if I want to keep the 'partof' modifier.
When you use partOf, you must reference the partOf item through its owner.
So it becomes:
INSERT B;owner(C.code);&docIdRef
;uniqueCode;docId
INSERT_UPDATE C;code[unique=true];test(&docIdRef)
;uniqueCode;docId
You don't need to assign B an identifier, you just need to reference the owner.
If you know for sure that your data is correct you can use [forceWrite=true] modifier or legacy mode to skip service layer validation.
You should also make sure that this configuration is what you really need. Setting either optional to true or partOf to false or providing default value should fix the issue as well.
Since you have set partof="true", you cannot assign a reference to an existing item of type A; you can only create a new one.
Check the OOTB AbstractOrder2AbstractOrderEntry relation: AbstractOrderEntry is declared with partof="true", which means you can't attach an existing AbstractOrderEntry to an Order. You can always create a new entry.
Have a look at the HMC as well.
There is no "+ Add Entry" button there for an order's entries. The reverse may be possible.

FetchXML next page results

I want to populate a grid with data from Dynamics CRM. I use FetchXML to get 10 records per page, and I want to move to the next page to retrieve the next 10 records. But this isn't happening: I'm using XRMToolbox to simulate the fetch query, and it returns the same results regardless of the page attribute value.
The fetchXML query is:
<fetch version="1.0" output-format="xml-platform" mapping="logical" count="10" page="1" aggregate="true" distinct="false" >
<entity name="webpage" >
<attribute name="url" groupby="true" alias="url" />
<attribute name="webpageid" aggregate="count" alias="top" />
<order descending="true" alias="top" />
</entity>
</fetch>
If I change the page attribute value, say to 10, the response doesn't change.
Can anyone help me with this?
UPDATE
After many tests with XRMToolbox, I've come to the conclusion that this query ignores the page attribute, whatever page I provide. This is because of the aggregate attribute. If I remove it (and, of course, the count aggregate), changing the page attribute does fetch the next page of results.
In summary, the page attribute doesn't work together with the aggregate attribute. Maybe this can work with paging cookies; I haven't tested that yet, but I will and will update this post.
To implement paging, you need to use not only the page number and records-per-page attributes but also the paging cookie. This MSDN article provides all the code you need to implement paging.
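Mechanically, each page of results comes back with a paging cookie (the PagingCookie property of the EntityCollection in the SDK), and the next request carries it in a paging-cookie attribute on the fetch element, alongside page and count. A minimal Python sketch of that request-building step, assuming the non-aggregate version of the question's query; with_paging is a hypothetical helper and the cookie string is a made-up placeholder. It does not show whether cookies help with the aggregate query.

```python
import xml.etree.ElementTree as ET

# The question's query without the aggregate, which the update above
# reports is needed for page to take effect. A stable order keeps pages
# from overlapping.
FETCH = """<fetch version="1.0" output-format="xml-platform" mapping="logical" count="10" distinct="false">
  <entity name="webpage">
    <attribute name="url" />
    <order attribute="url" descending="false" />
  </entity>
</fetch>"""


def with_paging(fetch_xml, page, paging_cookie=None, count=10):
    """Return fetch_xml with page/count set and, if given, paging-cookie.

    The cookie is the string returned with the previous page. It is XML
    itself; ElementTree escapes its quotes and angle brackets when it is
    written into the attribute.
    """
    root = ET.fromstring(fetch_xml)
    root.set("page", str(page))
    root.set("count", str(count))
    if paging_cookie is not None:
        root.set("paging-cookie", paging_cookie)
    return ET.tostring(root, encoding="unicode")


# Hypothetical cookie, shaped like the ones the server returns.
cookie = '<cookie page="1"><url last="last-url-on-page-1" first="first-url" /></cookie>'

page1 = with_paging(FETCH, 1)          # first request: no cookie yet
page2 = with_paging(FETCH, 2, cookie)  # next request: cookie from page 1
print(page2)
```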

In VersionOne REST API how does one use multiple “with” statements with multiple “Where” clauses?

With the following query:
Base-URL/rest-1.v1/Data/Epic?sel=Category.Name,Custom_RoadmapInOut&where=Number=$numbers&with=$numbers=E-05322%2CE-05280%2CE-05616%2CE-04942%2CE-04921
I am getting the following response:
<Assets total="5" pageSize="2147483647" pageStart="0">
<Asset href="End-of-Base-URL/rest-1.v1/Data/Epic/138904" id="Epic:138904">
<Attribute name="Category.Name">Business Objective</Attribute>
<Attribute name="Custom_RoadmapInOut">2</Attribute>
</Asset>
<Asset href="End-of-Base-URL/rest-1.v1/Data/Epic/139078" id="Epic:139078">
<Attribute name="Category.Name">Initiative</Attribute>
<Attribute name="Custom_RoadmapInOut">1</Attribute>
</Asset>
<Asset href="End-of-Base-URL/rest-1.v1/Data/Epic/147147" id="Epic:147147">
<Attribute name="Category.Name">Parent Story</Attribute>
<Attribute name="Custom_RoadmapInOut"/>
</Asset>
<Asset href="End-of-Base-URL/rest-1.v1/Data/Epic/148702" id="Epic:148702">
<Attribute name="Category.Name">Parent Story</Attribute>
<Attribute name="Custom_RoadmapInOut"/>
</Asset>
<Asset href="End-of-Base-URL/rest-1.v1/Data/Epic/156961" id="Epic:156961">
<Attribute name="Category.Name">Milestone</Attribute>
<Attribute name="Custom_RoadmapInOut"/>
</Asset>
</Assets>
I want to limit the results to only return those assets that have a "Category.Name" of either "Business Objective" or "Initiative" and from those types to only return the ones that have a "Custom_RoadmapInOut" set to between 1 and 99.
What do I need to add to the query to have VersionOne do the heavy lifting and return only the desired items?
I am thinking I should be able to also add:
Category.Names=$names&with=$names=Business+Objective%2CInitiative
to the query and another where part to check the Custom_RoadmapInOut but I am not sure how to do this.
Currently I am making multiple queries and then using my own code to go through the results and keep only the ones that I desire to see.
Thanks for any help that can be provided.
Doug
Multiple with values can be separated by | (pipe). If you need more details than are shown in the documentation, you can try reading the grammar.
Multiple where filter tokens must be joined by logical operators. Logical and is ; (semicolon). Logical or is | (pipe). Again, the documentation can be a little sparse so you might try reading the grammar.
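Putting those two rules together for the question's case, here is a hedged Python sketch that assembles the query string: where tokens joined with ";" (AND) and the two with variables separated by "|", as described above, with a comma-separated list in each variable as in the original $numbers. The range check on Custom_RoadmapInOut is an ASSUMPTION: it presumes the where grammar accepts >= and <= with quoted literals on that attribute, which should be confirmed against the grammar, especially if the field is a list type rather than a number. build_epic_query and the example.com base URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_epic_query(base_url, numbers, names, low=1, high=99):
    """Build a rest-1.v1 Epic query with several where tokens and with variables."""
    # where tokens joined with ';' (logical AND).
    where = ";".join([
        "Number=$numbers",
        "Category.Name=$names",
        # ASSUMPTION: >= / <= comparisons with quoted literals on this attribute.
        f"Custom_RoadmapInOut>='{low}'",
        f"Custom_RoadmapInOut<='{high}'",
    ])
    # with variables separated by '|'; a comma-separated value list
    # matches any of its values, as in the original $numbers variable.
    with_vars = "|".join([
        "$numbers=" + ",".join(numbers),
        "$names=" + ",".join(names),
    ])
    query = urlencode({
        "sel": "Category.Name,Custom_RoadmapInOut",
        "where": where,
        "with": with_vars,
    })
    return f"{base_url}/rest-1.v1/Data/Epic?{query}"


# Hypothetical server URL; epic numbers and category names come from the question.
url = build_epic_query("https://example.com/VersionOne",
                       ["E-05322", "E-05280"],
                       ["Business Objective", "Initiative"])
params = parse_qs(urlsplit(url).query)
print(params["where"][0])
print(params["with"][0])
```

urlencode percent-encodes the spaces, commas, and operators, so names such as "Business Objective" can be passed as-is.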
If you still find yourself needing to run multiple queries to get what you need, you may find it advantageous to switch to the query.v1 endpoint.

Quickest method for matching nested XML data against database table structure

I have an application which creates datarequests which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be...
<datarequest>
<datatask view="vw_ContractData" db="reporting" index="1">
<datefilter modifier="w0">
<filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
</datefilter>
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
</alternation>
<alternation index="2">
<filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
</alternation>
</filters>
<series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
<filters />
</series>
<series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
</alternation>
</filters>
</series>
<series column="Requested 2" aggregate="avg" split="0" splitfield="" index="3">
<filters />
</series>
<series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
<filters />
</series>
</datatask>
</datarequest>
This encodes a datarequest comprising a daterange, main filters, series and series filters. Basically any element which has the index attribute can occur multiple times within its parent element - the exception to this being the filter within datefilter.
But the structure is somewhat academic; the problem is more fundamental:
When a request comes through, XML like this is sent to SQLServer as a parameter to a stored proc. This XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter, tblSeries. This is fine.
The problem occurs when I want to match a given XML definition with one already held in the DB. I currently do this by...
Shredding the XML into a de-normalised table
Using a CTE to pull all the existing data in the database into that same de-normalised form
Matching using a huge WHERE condition (34 lines long)
This returns any DataRequestID which exactly matches the given XML. I fear that this method will end up being painfully slow, partly because I don't believe the CTE will do any clever filtering: it will pull all the data every single time before applying the huge WHERE.
I think there must be better solutions, e.g.:
When storing a datarequest, also store a hash of the datarequest somehow and simply match on that. In the case of a collision, use the current method. However, I wanted to do this using set logic. I'm also concerned about irrelevant small differences in the XML (spurious spaces, etc.) changing the hash.
Somehow perform the matching iteratively from the bottom up, e.g. produce a list of filters which match at the lowest level, use this as part of an IN to match Series, use that as part of an IN to match DataTasks, and so on. The trouble is, I start to black out when I think about this for too long.
Basically: has anyone encountered this kind of problem before (surely someone has)? What would be the recommended route for tackling it? Example (pseudo)code would be great :)
To get rid of the possibility of minor variances, I'd run the request through an XML transform (XSLT).
Alternatively, since you've already got the code to parse this out into a denormalized staging table, that's fine too. I would then simply use FOR XML to create a new XML document.
Your goal here is to create a standardized XML document that respects ordering where appropriate and removes inconsistencies where it is not.
Once that is done, store this in a new table. Now you can run a direct comparison of the "standardized" request XML against existing data.
To do the actual comparison, you can use a hash, store the XML as a string and do a direct string comparison, or do a full XML comparison like this: http://beyondrelational.com/modules/2/blogs/28/posts/10317/xquery-lab-36-writing-a-tsql-function-to-compare-two-xml-values-part-2.aspx
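My preference, as long as the XML is never over 8,000 bytes, would be to store it as a unique string (either VARCHAR(8000), or NVARCHAR(4000) if you need special character support) and create a unique index on the column. Note that SQL Server limits index keys to 900 bytes (1,700 bytes for nonclustered indexes from SQL Server 2016), so for longer strings a unique index on a hash of the string is the safer option.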
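The "standardize, then hash" idea can be sketched quickly in Python, shown here as a stand-in for the XSLT or FOR XML step rather than as the answer's own code. ET.canonicalize (C14N 2.0, Python 3.8+) writes attributes in a fixed order, and strip_text=True drops whitespace-only text between elements, so two requests that differ only cosmetically produce the same digest while a real change does not. Element order is preserved, so any ordering rules (for example by the index attribute) would still need to be applied first. request_fingerprint is a hypothetical name, and the samples are trimmed from the question's XML.

```python
import hashlib
import xml.etree.ElementTree as ET


def request_fingerprint(xml_text):
    """Return a SHA-256 hex digest of a canonicalised data request.

    Canonical XML sorts attributes and, with strip_text=True, removes
    whitespace-only text, so attribute order and spurious spaces between
    elements do not affect the digest. Element order is kept as-is.
    """
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = ('<datarequest><datatask view="vw_ContractData" db="reporting" index="1">'
     '<series column="Turnaround" aggregate="avg" index="1"/></datatask></datarequest>')
# Same request: different attribute order, extra whitespace, spaced empty tag.
b = """<datarequest>
  <datatask index="1" db="reporting" view="vw_ContractData">
    <series index="1" aggregate="avg" column="Turnaround" />
  </datatask>
</datarequest>"""
# Genuinely different request.
c = a.replace('aggregate="avg"', 'aggregate="max"')

print(request_fingerprint(a) == request_fingerprint(b))  # True
print(request_fingerprint(a) == request_fingerprint(c))  # False
```

A 64-character hex digest fits comfortably within the index key limit, and comparing the full standardized strings on a digest match covers the collision case raised in the question.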