Pentaho Row Denormaliser Step Not Working

I have some sorted data that I'm trying to denormalize but the step in Pentaho isn't working correctly.
Here is a snapshot of the sorted data:
And here is a snapshot of the Row Denormaliser Step as I've configured it:
What I get is:
There are no steps between the sorted data preview and the Row Denormaliser Step. I've also made sure that the field type of 'Number' is consistent with the field type of the output field of the previous step.
What am I missing/getting wrong? Any ideas as to why it's not working?
EDIT
I took a Data Grid step and entered data identical to the output of the Table Input step, and it worked fine! But with the Table Input step, it breaks. Here are the screenshots:
1) With the Table Input:
Transformation:
Table Input Step's Data:
Final Output:
2) With the Data Grid Step:
Transformation:
Data Grid Step's Data:
Output:
I've hit a roadblock and don't understand how the table input step could be breaking the transformation. If anyone has any insight, please share!
Edit 2: Further Testing
For the original issue, the database connection is to an MS SQL Server 2008 R2 SP2 Express instance. I have now tested the following:
A similar architecture against a PostgreSQL server (2 groupings on the denormaliser step): SUCCESS
A single grouping on the MS SQL Server with the original field types (without a Select Values step) left as 'String': FAILURE
It seems that this issue is localized to the use of an MS SQL Server connection. I'm creating a blocker JIRA ticket on Pentaho now - hopefully someone on the team will be able to reproduce the bug(?).

The issue was caused by extra spaces padded onto the values, which the Row Denormaliser couldn't match against correctly. After trimming the values with a String Operations step, the transformation now works correctly.
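A minimal sketch of the same fix applied at the source instead, assuming the padding comes from CHAR columns on the SQL Server side (table and column names below are hypothetical); trimming in the Table Input query avoids the extra String Operations step:

-- Hypothetical table/column names: RTRIM strips the trailing spaces
-- that CHAR columns pad onto their values, so the Row Denormaliser
-- receives clean key/group values.
SELECT
    RTRIM(category) AS category,
    RTRIM(code)     AS code,
    amount
FROM my_source_table;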

Maybe the data types of the columns in the Table Input step are different from those specified in the Data Grid step, which might lead to conversion errors in the Row Denormaliser. Make sure your Select Values step specifies the types of all used fields; hopefully that will ensure exactly the same data goes into the Sort Rows step whether it comes from the Data Grid or the Table Input step.
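As a complement to the Select Values approach, the types could also be forced in the Table Input query itself, so the step's output metadata matches the Data Grid exactly. A minimal sketch, with hypothetical column names:

-- Hypothetical column names: cast at the source so the Table Input
-- emits the same field types and lengths as the Data Grid step.
SELECT
    CAST(category AS VARCHAR(50))    AS category,
    CAST(amount   AS DECIMAL(18, 2)) AS amount
FROM my_source_table;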

Related

Pentaho step - Use SQL functions to add a column to data before dumping it into the DB

I am fairly new to Pentaho, and while working with it I have stumbled across a problem. Below is my flow:
Read input from a file. Let's say this has 5 columns.
Make some modifications to the existing columns (filtering, modifications, and so on).
Add a new column, which will be equal to an SQL function of the current row's data. For example, it could be sum(id, id+1).
Dump to the database.
Steps 1, 2, and 4 are already in place and working fine. It's step 3 where I am stuck. I've tried executing SQL, but that only handles DDL/modification statements and doesn't return data. Table Input needs the data to be in a table already, which isn't the case here.
I have a workaround: I can insert all the rows into the DB and then fire an update query (sketched below), but I was hoping there is a better way to do this.
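A minimal sketch of that update-query workaround, using the sum(id, id+1) example from above (the table and column names are hypothetical):

-- Hypothetical table/column names: fired once after all rows have been inserted.
UPDATE my_target_table
SET new_col = id + (id + 1);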
You can add a Formula step and, in the formula column, specify what you want to achieve: for example, your other column + 1, and save it into a new field, or replace the existing field's value.

Add new column to existing table Pentaho

I have a Table Input step and I need to add a calculation to it, i.e. add a new column. I have tried:
doing the calculation and then feeding it back. Obviously, that appended the new data to the old data.
doing the calculation and then feeding it back, but truncating the table. As the process got stuck at some point, I assume what happened is that I was truncating the table while data was still being extracted from it.
using a Stream Lookup and then feeding it back. Of course, it also piled the data on top of the existing data.
using a Stream Lookup where I pull the data from the Table Input, do the calculation, and at the same time pull the data from the same table and do a lookup based on the unique combination of date and id, and then use the Update step.
As this has been running for a while, I am positive it is not the right option, but I have exhausted my options.
It seems that you need to update the table your data came from with this new field. Use the Update step with fields A and B as keys.
Actually, once you connect the hop, the result of the first step is automatically carried forward to the next step. So let's say you have a Table Input step and then add a Calculator step where you create the third column. After writing the logic, right-click on the Calculator step and click Preview; you will get the result with all three columns.
I'd say your issue is not ONLY in the Pentaho implementation; there are some things you can do before reaching data staging in Pentaho.
'Workin Hard' is correct when he says you shouldn't use the same table: leave the input untouched and just upload/insert the new values into a new table. It doesn't have to be a new table EVERY TIME, but instead of truncating the original, you truncate the staging (output) table.
How many 'new columns' will you need? Will every iteration of this run create a new column in the output, or will you always have a 'C' column that is always A+B or some other calculation? I'm sorry, but this isn't clear. If it's the latter, you don't need Pentaho for the transformation: updating the 'C' column with a calculation based on A+B can be done directly in most relational DBMSs with a simple UPDATE statement. Yes, it can be done in Pentaho, but you're adding a lot of overhead and processing time.
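A minimal sketch of that direct-UPDATE approach for the "C is always A + B" case (table and column names are hypothetical):

-- Hypothetical table/column names: recompute C for every row in one statement.
UPDATE staging_table
SET C = A + B;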

How to avoid seeing the previous steps' fields in the Table Output step?

I have a transformation in Kettle (Pentaho) called Test.
This ETL process should load three different tables of a single database, where each one has its source in a different table of another database.
To do this I use three Table Input steps. Each one connects to a Value Mapper, that to a Select Values step, then a Data Validator, an Add Sequence step, and finally a Table Output.
Summarising, I have a total of six steps per table load.
When editing the final steps I found something I would like to solve: I am dragging in fields from the previous table loads.
For example, the table A load has the field bank_id; it does not exist in the second table, but in the Table Output step of the second load process I can still select it, even though I do not want it.
Is there any option to not see the previous fields? This way I avoid easy errors, especially when the tables have a field with the same name.
Thank you
EDIT
The screenshot clarifies the situation immensely, so now the answer is simple:
Delete the diagonal hops (arrows) between the rows.
Transformations in PDI don't have a single starting or ending point, so you don't need to connect all the steps in a single line. Having three separate streams is just fine.
All steps in a transformation start in parallel, then wait and process rows as they come in (or in the case of input steps, start reading data and generating rows into their output hop). That means your three streams will execute in parallel following their own hops from input to output.
Add a Select Values step; I often add such filter steps to "clean" the flow.

How can I use an additional condition when getting data from an xls file input in Pentaho Spoon?

I have just started learning Pentaho Spoon steps and have run into a problem. I need to transform the data from an xls file and load it into a database. The problem is that my input file looks like this: table-description
And I cannot figure out how to solve two problems:
For my next step I need to save not only the table itself (range A8:D11), but also the date (cell A5). When I try to do this in Pentaho with the Microsoft Excel Input step, it works only when I select cell A8 as the start row, but then the date is not saved.
In the Microsoft Excel Input step I must always select a start row in order to generate a table and use it in the next steps, and I must do it manually, i.e. state that my table starts at cell A8. In my case I cannot always say for sure that the table starts at cell A8. I only know that the start cell is the cell in column A whose value is "Date". The Microsoft Excel Input step will be the first step in my kettle because I must get the data and change it; that is why I think I cannot use a JavaScript step before it.
I have not found a solution to these two problems and I do not know if it is even possible. I will be grateful for any help.
I am not sure what you mean by converting an Excel file to a database, but if you can convert the xls into a csv and read that file, then you know from which row you need to filter the data. Basically you can use a simple Filter step to filter the data when it matches the column name. I hope this will help.
Use two Microsoft Excel Input steps. One step reads the table (A8:D11). The other step reads the date (A5). Then merge the two streams, for example using a Join Rows (cartesian product) step.
Read everything, then use a JavaScript step with two script tabs. For one of the tabs, right-click and choose Set start script; its code is simply: var start = 0; Keep the other tab as a transformation script, with something like: if (FieldA == "Date") { start = 1; } Now you will have an additional field in the stream called start. If start equals 0, you know that your tabular data hasn't started yet, and you can filter out that row.

Add more data to a step "copy rows to result"

I'm doing a transformation in Kettle and I need to send data to another transformation; for this I use a "Copy rows to result" step. However, at that step I have only done half of the processing and I need to add more data before the end of the transformation. How might I do this?
Greetings and thanks
EDIT 24-06-2014
This image is an example of my transformation:
The only thing you need to make sure of when merging the results from multiple streams in Kettle is that the columns from both hops into Copia Filas (Copy rows to result in the English version) are completely identical, meaning:
The number of columns sent to the final step is the same;
The column types are completely identical;
The column names are completely identical.
Kettle should alert you if this is not the case.
If this is not the issue, you should check that the second stream in the data distribution actually generates the data you want to merge into the final step of the transformation, depending on what you'd like to do there.
Use the preview and debug options, as well as the right click (on a step) ==> Show output fields in Kettle to make sure all of the above are happening.
I hope this helps a bit.