Pentaho Kettle JSON Output

I have to design a KTR where I am passing fields from a Select values step to a JSON Output step. The output from it is as shown in the figure below:
How can I remove the fields Chart, Title, XAxis, YAxis, and series, so that the only result I get from this step is outputvalue?
My KTR is:

In the Select values step (4.2.2), select only those fields which are required; in your case, the output field only.
To do that, press Get fields and delete the fields you mention.
I am not sure about the JSON part, but it should work, since the same approach applies when generating or getting results from a database.

Related

Pentaho PDI: execute transformation for each line from CSV?

Here's a distilled version of what we're trying to do. The transformation step is a "Table Input":
SELECT DISTINCT ${SRCFIELD} FROM ${SRCTABLE}
We want to run that SQL with variables/parameters set from each line in our CSV:
SRCFIELD,SRCTABLE
carols_key,carols_table
mikes_ix,mikes_rec
their_field,their_table
In this case we'd want it to run the transformation three times, one for each data line in the CSV, to pull unique values from those fields in those tables. I'm hoping there's a simple way to do this.
I think the only difficulty is that we haven't stumbled across the right step/entry and the right settings.
Poking around in a "parent" transformation, the highest hopes we had were:
We tried chaining CSV file input to Set Variables (hoping to feed it to Transformation Executor one line at a time) but that gripes when we have more than one line from the CSV.
We tried piping CSV file input directly to Transformation Executor but that only sends TE's "static input value" to the sub-transformation.
We also explored using a job with a Transformation entry; we had high hopes of figuring out what "Execute every input row" applied to, but we haven't figured out how to pipe data to it one row at a time.
Suggestions?
Aha!
To do this, we must create a JOB with TWO TRANSFORMATIONS. The first reads "parameters" from the CSV and the second does its duty once for each row of CSV data from the first.
In the JOB, the first transformation is set up like this:
Options/Logging/Arguments/Parameters tabs are all left as default
In the transformation itself (right click, open referenced object->transformation):
Step1: CSV file input
Step2: Copy rows to result <== that's the magic part
Back in the JOB, the second transformation is set up like so:
Options: "Execute every input row" is checked
Logging/Arguments tabs are left as default
Parameters:
Copy results to parameters is checked
Pass parameter values to sub transformation is checked
Parameter: SRCFIELD; Parameter to use: SRCFIELD
Parameter: SRCTABLE; Parameter to use: SRCTABLE
In the transformation itself (right click, open referenced object->transformation):
Table input "SELECT DISTINCT ${SRCFIELD} code FROM ${SRCTABLE}"
Note: "Replace variables in script" must be checked
So the first transformation gathers the "config" data from the CSV and, one-record-at-a-time, passes those values to the second transformation (since "Execute every input row" is checked).
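For instance, with the first CSV data row from the question (carols_key,carols_table), the Table Input in the second transformation effectively runs:

SELECT DISTINCT carols_key code FROM carols_table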
So now with a CSV like this:
SRCTABLE,SRCFIELD
person_rec,country
person_rec,sex
application_rec,major1
application_rec,conc1
status_rec,cur_stat
We can pull distinct values for all those specific fields, and lots more. And it's easy to maintain which tables and which fields are examined.
Expanding this idea to a data flow where the second transformation updates code fields in a datamart isn't much of a stretch:
SRCTABLE,SRCFIELD,TARGETTABLE,TARGETFIELD
person_rec,country,dim_country,country_code
person_rec,sex,dim_sex,sex_code
application_rec,major1,dim_major,major_code
application_rec,conc1,dim_concentration,concentration_code
status_rec,cur_stat,dim_current_status,cur_stat_code
We'd need to pull unique ${TARGETTABLE}.${TARGETFIELD} values as well, use a Merge rows (diff) step, use a Filter rows step to find only the 'new' ones, and then an Execute SQL script step to update the targets.
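As a rough sketch of that last piece, assuming the Execute SQL script step has "Variable substitution" enabled and is set to execute for each incoming 'new' row, with the new code value supplied as a ? parameter (the parameter wiring is an assumption, not spelled out above):

INSERT INTO ${TARGETTABLE} (${TARGETFIELD}) VALUES (?)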
Exciting!

Generate consecutive rows in Pentaho

How do I generate consecutive rows in Pentaho Spoon?
I have a text file and I am using the "Sample Rows" step to select every third line from the text file. But the problem with "Sample Rows" is that I have to manually type "3,6,9,12....".
Is there a better way to do this? I tried adding the field name from the "Add Sequence" step, but it doesn't get read.
Attached Image
You can add a counter using the Add Sequence step and setting the Maximum value to 3.
This will create a new field, integer, with values 1,2,3,1,2,3,...
Then, a Filter Rows step can be used on the condition that the field must equal 3, and only every 3rd row will pass to the output of the filter rows step.
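If you prefer to do the counting in script instead of with Add Sequence, a Modified Javascript Value step with a start script (the same two-tab pattern used in the Excel answer further down) can produce the same flag. This is only a sketch of an alternative, not part of the original answer; the field name every_third is made up:

Set start script tab (runs once, before the first row):
var counter = 0;

Transformation script tab (runs for every row):
counter = counter + 1;
// 1 for every third row, 0 otherwise; add every_third as an Integer field in the step's Fields grid
var every_third = (counter % 3 == 0) ? 1 : 0;

A Filter rows step on every_third = 1 then passes only every third row.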
If I understood the issue correctly,
You can use a separate table or file which holds the input configuration for your transformations and job.
That way you don't need to enter 3, 6, 9, etc. manually; the transformation will read the input data from that table or file.

How can I use an additional condition by getting data from xls-file input in Pentaho spoon?

I have just started learning Pentaho Spoon steps and have a problem I cannot solve. I need to transform the data from an xls file and load it into a database. The problem is that my input file looks like this: table-description
And I cannot find how to solve two problems:
For my next step I need to keep not only the table itself (range A8:D11), but also the date (cell A5). When I try to do this in Pentaho with the Microsoft Excel Input step, it only works when I select cell A8 as the start row, but then the date is not kept.
In the Microsoft Excel Input step I must always select a start row in order to generate a table and use it in the next steps, and I must do it manually; I mean to say that my table starts from cell A8. In my case I cannot always say for sure that the table starts from cell A8. I only know that the start cell is the cell in column A whose value is "Date". The Microsoft Excel Input step will be the first step in my kettle because I must get the data before changing it; that is why I think I cannot use a JavaScript step before it.
I have not found a solution to these two problems and I do not know if it is possible at all. I will be grateful for any help.
I am not sure what you mean by converting an Excel file to a database, but if you can convert the xls into CSV and read that file, then you know from which row you need to filter the data. Basically you can use a simple Filter step to filter the data when it matches the column name. I hope this will help.
Use two Microsoft Excel Input steps. One step reads the table (A8:D11). The other step reads the date (A5). Then merge the two streams, for example using a Join Rows (cartesian product) step.
Read everything, then use a Javascript step with two script tabs. For one of the tabs, right-click and choose Set start script; its code is: var start = 0; The other tab should be kept as a transformation script, with pseudocode along the lines of: if (FieldA equals "Date") { start = 1; }. Now you will have an additional field in the stream called start. If start equals 0, then you know that your tabular data hasn't started yet, and you can filter out the row.
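A concrete version of that pseudocode, assuming the column-A value arrives in the stream as a field named FieldA and that start is added as an Integer field in the step's Fields grid, might look like this:

Set start script tab (runs once, before the first row):
var start = 0;

Transformation script tab (runs for every row):
// flip the flag once the marker row is reached
if (FieldA != null && FieldA.toString().equals("Date")) {
  start = 1;
}
// rows where start is still 0 precede the table and can be dropped by a Filter rows step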

Pentaho Row Denormaliser Step Not Working

I have some sorted data that I'm trying to denormalize but the step in Pentaho isn't working correctly.
Here is a snapshot of the sorted data:
And here is a snapshot of the Row Denormaliser Step as I've configured it:
What I get is:
There are no steps between the sorted data preview and the Row Denormaliser Step. I've also made sure that the field type of 'Number' is consistent with the field type of the output field of the previous step.
What am I missing/getting wrong? Any ideas as to why it's not working?
EDIT
I took a Data Grid step and input the data exactly the same as the output of the Table Input step - and it worked fine! But with the Table Input step, it breaks. Here are the screenshots:
1) With the Table Input:
Transformation:
Table Input Step's Data:
Final Output:
2) With the Data Grid Step:
Transformation:
Data Grid Step's Data:
Output:
I've hit a roadblock and don't understand how the table input step could be breaking the transformation. If anyone has any insight, please share!
Edit 2: Further Testing
The database connection for the original issue is an MS SQL Server 2008 R2 SP2 Express instance. I have now tested the following:
Similar architecture for a PostgreSQL Server (2 groupings on the denormaliser step): SUCCESS
Single grouping on the MS SQL Server with the original field types (without the Select Values step) as 'String': FAILURE
It seems that this issue is localized to the use of an MS SQL Server connection. Creating a blocker JIRA ticket now on Pentaho; hopefully someone on the team will be able to reproduce the bug(?).
The issue was caused by extra spaces being padded onto the cells, which the Row Denormaliser couldn't match correctly. After trimming the cells using the String Operations step, the transformation now works correctly.
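Equivalently, the padding could have been stripped at the source in the Table Input query itself; a hypothetical sketch for SQL Server (the column and table names here are made up):

SELECT RTRIM(group_field) AS group_field,
       RTRIM(key_field) AS key_field,
       value_field
FROM source_table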
Maybe the data types of the columns in the Table Input step are different from those specified in the Data Grid step, which might lead to conversion errors in the Row Denormaliser. Make sure in your Select Values step that you are specifying the types of all used fields; hopefully that will ensure exactly the same data goes into the Sort Rows step whether it comes from the Data Grid or the Table Input step.

Date format and data extract from Pentaho Kettle Spoon

I am using Kettle Spoon for transformation.
How do I give a fixed input date from 'Get System Info'? I see options for selecting yesterday, a month ago, etc., but I want to select a fixed date manually, such as '2012-12-14'.
I get a CSV via 'Text file output' from the transformation. The outputs are, say, A, B, C, D, E. I want to filter and get only A, B, D, E.
How do I filter the 'Text file output' and select only the desired columns to get my data into the final table?
Thank you in advance.
1) Use a Select values step right after the "Get System Info" step. In the Meta-data tab choose the field, use type Date, and choose the desired format mask (yyyy-MM-dd).
2) If you need to filter columns, i.e. drop some columns from the output, again use the Select values step; if you need to filter rows based on the values contained in a field/column, then use the Filter rows step.
I'm guessing you want to add hard coded dates, rather than reformat existing dates. If that's the case, just use an Add Constants step. Set the column type to Date. If you need to do it as a source step you can use a Data Grid or Generate Rows step.
If you want to remove columns from a text file output, you can use a Select Values step as #andtorg said, but you can also simply remove the columns from the Fields tab of the Text File Output step.
Use a Modified Javascript Value step.
Add a Modified Javascript Value step in PDI and choose whatever format you want.
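For the fixed-date part of the question, for example, the step's built-in str2date/date2str functions can be used. A small sketch; the field name fixed_date is only an example:

// parse a hard-coded string into a real Date value (add fixed_date as a Date field in the Fields grid)
var fixed_date = str2date("2012-12-14", "yyyy-MM-dd");
// or format an existing date field as text (some_date_field is a placeholder)
// var date_text = date2str(some_date_field, "yyyy-MM-dd");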