I have a transformation with a complex JavaScript step in the middle of it. Within this JavaScript I need to send an ID to a lookup step elsewhere in the transformation. I.e.:
Is there a way to send an ID from JavaScript in the "Modified Java Script Value" step to the XMLLookup step, and obtain a result from that step within the same flow of the transformation?
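For example, roughly what I have in mind (field names are made up):

    // Modified Java Script Value step: derive the ID I need to send.
    // "order_ref" is a made-up incoming field; "lookup_id" is declared
    // in the step's Fields grid so it travels on the output row.
    var lookup_id = "ID-" + order_ref;

I would then want the XMLLookup step to match on lookup_id and return its result back into this flow.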
I am trying to develop a reusable component in Pentaho which will take an Excel file and convert it to a CSV with an encoding option.
In short, I need to develop a transformation that has an Excel input and a CSV output.
I don't know the columns in advance; they have to be dynamically injected into the Excel input step.
That's a perfect candidate for Pentaho Metadata Injection.
You should have a template transformation which contains the basic workflow (read from the Excel file, write to the text file), but without specifying the input and/or output formats. Then you should store your metadata (the list of columns and their properties) somewhere. In Pentaho's example an Excel spreadsheet is used, but you're not limited to that. For example, I've used a couple of database tables to store the metadata: one for the input format and another one for the output format.
You also need a transformation with the Metadata Injection step to "inject" the metadata into the template transformation. What it basically does is create a new transformation at runtime, using the template and the fields you set to be populated, and then run it.
Pentaho's example is pretty clear if you follow it step by step, and from there you can create a more elaborate solution.
You'll need at least two steps in the template transformation:
Input step: Microsoft Excel input
Output step: Text file output
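A sketch of the overall layout (file and step names are just examples):

    template.ktr:  Microsoft Excel input -> Text file output
                   (field grids left empty; they are injected at runtime)

    injector.ktr:  read the metadata (spreadsheet, database tables, ...)
                   -> ETL metadata injection step, pointing at template.ktr
                      and mapping the metadata fields onto the two steps above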
Here is an alternative solution. In your Excel input step, in the Fields section, declare the maximum number of fields that can appear in any Excel file. Then route the input rows to the text output based on the number of fields actually present; a Switch/Case step does the routing, as sketched below.
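As a sketch of the routing logic (column names col1..col4 are made up), a Modified Java Script Value step can count the populated columns and the Switch/Case step can then route on that count:

    // Count how many of the declared columns actually hold data on this
    // row, so a Switch/Case step can route it downstream.
    var numFields = 0;
    if (col1 != null && col1 != "") numFields++;
    if (col2 != null && col2 != "") numFields++;
    if (col3 != null && col3 != "") numFields++;
    if (col4 != null && col4 != "") numFields++;
    // Declare numFields (type Integer) in the step's Fields grid so it
    // becomes part of the output row.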
I have an Excel file with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that, for each row of a table, sets a variable that I use afterwards in my transformation.
I tried defining the variable as the value I have in one row, and the transformation works. Now I need a loop that gets value after value, redefines the variable, and then executes the transformation.
I tried to define a Job that has the following:
Start -> Transformation(ExcelFileCopyRowsToResult) -> SetVariables -> Transformation(The transf that executes using whatever the variable name is at the moment).
The problem is that the variable I defined never changes and the transformation result is always the same because of that.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it correct, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, configure the second transformation to Execute for each row in the Edit window.
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
Start the transformation with a Get rows from result step. This gets you one row each time. The fields will be in the stream directly and can be used as such.
Pass the fields as parameters, so they can be used like variables (see the sketch after the steps below). I use this one more often, but it takes a bit more setup.
Inside the second transformation, go to the transformation properties and enter the variable names you want in the Parameters tab.
Save the transformation.
In the job, open the transformation edit window and go to Parameters.
Click Get Parameters.
Type the field name from the first transformation under Stream Column Name for each parameter.
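Once defined, the parameters behave like variables inside the second transformation. For instance, in a Modified Java Script Value step (the parameter name FIELD_NAME is just an example):

    // Read the parameter that was filled from the result row; the
    // second argument is the default used when the parameter is not set.
    var fieldName = getVariable("FIELD_NAME", "");

In a Table input step you would reference it as ${FIELD_NAME} instead, with Replace variables in script checked.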
I am researching the standard sample from the Pentaho DI package: GetXMLData - Read parent children rows. It reads parent rows and child rows separately from the same XML input. I need to do the same, and update two different sheets of the same MS Excel document.
My understanding is that the normal way to achieve this is to put the first sequence in one transformation file with an Excel Output or Writer step, the second in another one, and at the end create a job chaining them: start, then the 1st transformation, then the 2nd.
My problems are:
When I try to chain the above sequences I lose the content of the first updated Excel sheet in the final document;
I need to end up with just one file, either a job or a transformation, without dependencies (in the scenario proposed above I would have 1 KJB job + 2 KTR transformation files).
Questions are:
Is it possible to join the 2 sequences from the above sample with some wait node before starting to update the 2nd Excel sheet?
If the above doesn't work: is it possible to embed the transformations into the job instead of referencing them from external files?
And an extra question: which is better to use, Excel Output or Excel Writer?
=================
UPDATE:
Based on #AlainD's proposal I have tried to put a Block node in between. Here is the result:
It looks like the Block step can be an option, but somehow it does not work as expected with the Excel Output / Writer steps (or I am doing something wrong). What I have observed is that Pentaho starts executing the steps after the Block before the Excel file is properly closed by the previous step. That leads to one of the following: I either get an Excel file with one empty sheet, or the generated result file is malformed.
My input XML file (from Pentaho distribution) & test playground transformation are: HERE
NOTE: While playing do not forget to remove generated MS Excel files between runs.
Screenshot:
Any suggestions how to fix my transformation?
The pattern goes as follows:
read the data: 1 row per child, with the parent data in one or more columns
group the data: 1 row per parent; drop the children, keep the parent data. Transform and save as needed.
back in the original data, look up each row (child) and fetch its parent in the grouped data flow.
the result is one row per child, plus the needed columns of the transformed parent. Transform and save as needed.
It is a pattern; you may want to change the flow and/or sort to speed it up. But it will not lock, nor fill up the memory: the Group by and lookup steps are pretty reliable.
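As a sketch, with generic step names, the flow could look like this:

    Get data from XML   (1 row per child, parent columns included)
      |--> Group by (on parent_id) --> transform --> save parents
      |--> Stream lookup (match on parent_id, lookup rows taken from
           the Group by flow) --> transform --> save children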
Question 1: Yes, the step you are looking for is named Block until this (other) step finishes, or Blocking Step (until all rows are processed).
Question 2: Yes, you can pass the rows from one transformation to another via the job. But it would be wiser to first produce the parent sheet and, when finished, read it again in the second transformation. You can also pass the rows to a sub-transformation, or use other architectural strategies...
Question 3: (Short answer) The Excel Writer appends data (a new sheet or new rows) to an existing Excel file, while the Excel Output creates and fills a one-sheet Excel file.
How can I dynamically get each element from the Get data from XML step separately, as input to another transformation which parses the message (the value node XML)? My main idea is to run a Kettle transformation for each row of XML data, dynamically.
*Dynamically means that the number of elements is unknown.
Question in the Pentaho community forum: http://forums.pentaho.com/showthread.php?204226-run-transformation-kettle-for-each-row-xml-data-dynamically
It's a bit dated, but it sounds like this is what you're looking for:
Run Kettle Job for each Row
Essentially, you get the data from your XML file with a transformation (Get data from XML) and flow it into a Copy rows to result step. Then, in your job, add a Transformation entry and in its options, on the Advanced tab, check the "Copy previous results to parameters" and "Execute for every input row" check boxes.
You will have to set up parameters on that Transformation entry to match the metadata of your XML data row.
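So the job layout is roughly (names are placeholders):

    Start -> Transformation A (Get data from XML -> Copy rows to result)
          -> Transformation B (one parameter per field of the XML row,
             executed once for every input row)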
Note that this will be pretty slow if you have a large number of message IDs and relatively little child data for each message. If that's the case, you might want to try a lookup from the XML data in the first Transformation instead.
I am passing a value to the sub-transformation, and the sub-transformation receives the value fine, as I have used a JavaScript step to Alert() it.
But I have a Table input step in the sub-transformation, where I need to use the parent transformation's value as a parameter to the Table input step to run a query against it. It is not working, as the Table input step does not understand the field. How can I achieve this behavior?
I am stuck at this point and can't go further.
The only option left is to use Pentaho jobs, but is it possible using Mapping inside a transformation?
I tried the setVariable function from JavaScript in the sub-transformation, but nothing works.
I expect that your sub-transformation is similar to the one in the figure below. Are you sure you are passing the parameters correctly? The important things are to:
have the same number of fields in the Mapping input specification as parameters used in the Table input step
have Replace variables in script checked
have Insert data from step filled in
use the ? placeholder in the SQL query
If you need to pass more parameters to the Table input step, the number of fields coming from the previous step (the Mapping input specification in my example) needs to match the number of parameters you use in the Table input. You then use ? more times in your query. E.g. for 3 parameters you could have:
WHERE name = ? AND surname = ? AND age = ?
Also, you need to respect the order of the fields coming from the previous step: the first field binds to the first ?, the second field to the second ?, and so on. With the query above, the previous step must send name, surname and age in exactly that order.