Date format and data extraction from Pentaho Kettle Spoon

I am using Kettle Spoon for transformation.
How do I give a fixed input date from 'Get System Info'? I see options for selecting yesterday, a month ago, etc., but I want to enter a fixed date manually, such as '2012-12-14'.
I get a CSV from a 'Text file output' step in the transformation. The output columns are, say, A, B, C, D, E. I want to filter and keep only A, B, D, E.
How do I filter the 'Text file output' and select only the desired columns to get my data into the final table?
Thank you in advance.

1) Use a Select values step right after the "Get System Info" step. In the Meta-data tab, choose the field, set the type to Date, and choose the desired format mask (yyyy-MM-dd).
2) If you need to filter columns, i.e. drop some columns from the output, again use a Select values step; if you need to filter rows based on the values contained in a field/column, use a Filter rows step.

I'm guessing you want to add hard coded dates, rather than reformat existing dates. If that's the case, just use an Add Constants step. Set the column type to Date. If you need to do it as a source step you can use a Data Grid or Generate Rows step.
If you want to remove columns from a text file output, you can use a Select Values step as #andtorg said, but you can also simply remove the columns from the Fields tab of the Text File Output step.

Use a Modified Javascript Value step.
Add a Modified Javascript Value step in PDI and choose whatever format you want.
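For example, a minimal sketch of the script inside that step, assuming you just want a hard-coded date and using the step's built-in str2date() helper (the field name fixedDate is only illustrative):

// Sketch: parse a hard-coded string into a Date value.
// str2date() is one of the helper functions available in the Modified Javascript Value step.
var fixedDate = str2date("2012-12-14", "yyyy-MM-dd");

Then add fixedDate to the step's Fields grid with type Date so it flows downstream as a new field.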

Related

How to extract partial column name and pass the value to variable in SSIS

I am working on an Excel file to load the data into a SQL table using SSIS in VS2013.
How do I extract the column names into a temp table?
In the attached image, there is a "2017 Marketing Sales - Actual" column in the first row. I want to extract the YYYY from the column name, pass that value to a variable, and update the table field with the YYYY info.
Can anyone help me with how to do this?
In your Excel Connection Manager, uncheck the "First row has column names" check box. This should allow you to access that first row. You'll need to set up a Data Flow Task using this Excel Connection Manager, followed by a Derived Column to extract the left 4 characters from that first row. Unless you somehow limit the rest of the Excel data source, you'll probably also get a lot of extraneous rows importing into your destination, so you might need to do some cleanup to get it down to just that year.

How to map input to output fields from Excel to CSV in Pentaho?

How do I transform this in Pentaho? Where do I map the input values to the output columns, given that the positions and names differ between input and output?
You can rename the fields right in your MS-Excel-Input step, and you can reorder the fields in Text-File-Output. Also, a Select-Values step allows you to rename and reorder fields in one sweep on the Select & Alter tab.
The Select Values step allows you to change the column names and position (as well as type).
Note that the column names in the Excel input step are arbitrary and do not need to match the actual names in the Excel file, so you can rename them at will. You can even copy/paste the names into the field list.
Note also that the order of the columns in the CSV output file is defined in the Fields tab of the Text file output step. You can change it with the Ctrl+Arrow keys.
If you need to industrialize the process and have the new column names and order stored in, for example, a set of files or a database table, then you need Metadata Injection. Have a look at Diethard's or Jens' examples.

Generate consecutive rows in Pentaho

How do I generate consecutive rows in Pentaho Spoon?
I have a text file and I am using the "Sample Rows" step to select every third line from it. The problem with "Sample Rows" is that I have to manually type "3,6,9,12,...".
Is there a better way to do this? I tried adding the field name from the "Add Sequence" step, but it doesn't read it.
You can add a counter using the Add Sequence step, setting the Maximum value to 3.
This will create a new integer field with the values 1,2,3,1,2,3,...
Then, a Filter Rows step can be used on the condition that the field must equal 3, and only every 3rd row will pass to the output of the filter rows step.
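As an alternative sketch, if you prefer scripting it: keep a plain Add Sequence counter (no maximum) and compute a flag in a Modified Javascript Value step, then filter on that flag. The field names seq and keep_row below are only examples:

// Sketch: seq is an unmodified counter 1,2,3,4,... coming from Add Sequence.
// keep_row is true on every 3rd row; a Filter Rows step on keep_row = true
// then passes only rows 3, 6, 9, ...
var keep_row = (seq % 3 == 0);

Add keep_row as a Boolean output field of the script step and use it as the Filter Rows condition.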
If I understood the issue correctly,
you can use a separate table or file which holds the input configuration for your transformations and jobs.
That way you don't need to enter 3, 6, 9, etc. manually; the transformation will read the input values from that table or file.

OpenRefine columnwise scripting

I spent some time Googling, but couldn't find anything useful.
How do I select all the values of a single column in OpenRefine in a script?
It seems that all the operations are row-wise.
In particular, I want to find the highest and lowest values in a column.
By default, OpenRefine's functionality for this kind of computation is limited. The Stats Extension provides basic stats per column (min, max, average, median, ...).
Facets will give you a list of all the values in a column, so the simplest way of getting the lowest/highest values in the column is to make a facet on the column and read the highest/lowest values off the facet.
However, I'm not sure if this meets your criteria for selecting the values 'in a script'. By this I assume you mean you want to be able to access the lowest/highest values in a GREL expression?
You can do this, but you have to force OpenRefine to treat all the rows in the project as part of a single record. The easiest way to do this is usually to add a column at the start of the project which is empty except for the first cell, which contains a value.
Once you've done this you can access all the values in a column by using syntax like:
row.record.cells["Column name"].value
See also my answer to "OpenRefine - Fill between cells but not at the end of the list", which uses the same technique.
Further explanation:
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'record' in your project.
At this point, syntax like row.record.cells["Column 1"].value gives you an array of all the values in "Column 1". You can then use GREL expressions to manipulate this array, including sorting or comparing values.
A Text Facet has a nice undocumented option that gives you aggregated results for a column that you can just copy and paste.
Click on the "X choices" in the upper left corner of the Text Facet box.
This will bring up a separate dialog that contains the values along with the count of each value in that column.
(If you're looking to just get ALL the values of a single column, use Export -> Custom Tabular Exporter, select and order the columns to export by clicking the checkboxes, then click the Download tab to choose your export format and click the Download button.)

Pentaho Kettle JSON Output

I have to design a KTR where I am passing fields from a Select values step to a JSON Output step. The output from it is shown in the figure below:
How can I remove the fields Chart, Title, XAxis, YAxis, series, so that I get only the result from this step as outputvalue?
My KTR is:
At the Select values step (4.2.2), select only those fields which are required, in your case the output field only.
To do that, press Get fields and then delete the fields you mention.
I am not sure about the JSON part, but it should work, since the same approach applies when generating or fetching results from a database.