Pentaho Kettle transformation - skip subsequent steps when the CSV file input is empty

I have a Kettle transformation that has a CSV file input step. I would like the transformation to skip all subsequent steps if the CSV file is empty (has no data rows). Is there a way to achieve this?

Try the "Detect empty stream" step and check for a NULL condition on any one of the columns from the CSV.
Here is a link to the PDI wiki:
http://wiki.pentaho.com/display/EAI/Detect+empty+stream

Alternatively, you could use the "Get File Names" step. It returns many fields, one of which is "size". After it you can add a "Filter rows" step that sends the flow to a "Dummy" step if the size is equal to 0.
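For comparison, here is a minimal standalone Python sketch of the same skip-if-empty logic (the file path and the assumption that the first line is a header are hypothetical; PDI itself is configured in Spoon, not in code):

```python
import csv
import os

CSV_PATH = "input.csv"  # hypothetical path; substitute your actual file


def has_data_rows(path):
    """Return True if the CSV contains at least one row after the header."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False  # missing or zero-byte file, like Filter rows on size = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)                       # skip the header line
        return next(reader, None) is not None    # any data row left?


if has_data_rows(CSV_PATH):
    print("file has data: run the rest of the transformation")
else:
    print("file is empty: skip subsequent steps")
```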

Related

PDI: I would like to add a line to the CSV I'm creating

To keep it simple: I have a table in a database that I'm using as the input in PDI, and I want to output it as a CSV, but I would like to add a line before the data.
For example, this is what it would look like when you open the CSV:
i'm idiot
id;xxxxx;xx;xx;ddd;d
1;ddd;ddd;ddd;ddd;d
2;ds;dd;ss;dss;s
I'm using Pentaho.
I think the easiest way to achieve this would be to use two transformations: the first one creates the file with the extra line, and the second one uses the Text file output step to add the data to the file created by the first. The Text file output step lets you check the Append option, so the new information is added to the existing file.
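Outside of PDI, the same create-then-append idea looks roughly like this in Python (the output path, the extra line, and the stand-in rows are made up for illustration):

```python
import csv

OUT_PATH = "output.csv"   # hypothetical output file
EXTRA_LINE = "i'm idiot"  # the line the asker wants before the header
rows = [                  # stand-in for the rows coming from the database table
    {"id": 1, "col": "ddd"},
    {"id": 2, "col": "ds"},
]

# Step 1 (like the first transformation): create the file with the extra line.
with open(OUT_PATH, "w", newline="") as f:
    f.write(EXTRA_LINE + "\n")

# Step 2 (like Text file output with Append checked): append header and data.
with open(OUT_PATH, "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "col"], delimiter=";")
    writer.writeheader()
    writer.writerows(rows)
```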

Cannot get data using the Get Rows from Result step in the second transformation in Kettle

I have two transformations inside one job in Kettle. The first transformation reads data from a CSV file and sorts it; at the end of the first transformation I use a Copy Rows to Result step. The second transformation begins with a Get Rows from Result step followed by a Text file output step.
The job runs well, as do trans1 and trans2, except that in trans2 no data is read and no data is written.
Following the answers to similar questions, I checked the boxes "Copy previous results to parameters" and "Execute for every input row" under the Advanced tab. Then I went to the Parameters tab and clicked the "Get Parameters" button. No parameters were returned. Instead I got an error in the log saying "cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp".
I've tried all the advice given for similar questions but am still confused about why it doesn't work. I don't think this is a version issue of Pentaho Spoon. Any advice is welcome. Thanks in advance!
One suggestion I received is that I need to edit the Get Variables step manually.

Set Table Input data to Polling Folder - Pentaho Data Integration

I have a requirement where we get a list of file names from SQL and need to pass these file names as variables to a step that can poll a folder for these file names as text files. Please advise how to set the SQL output of file names as an array variable and pass it to the polling folder step.
Don't use variables. Variables are only suitable if your input has a single row.
Instead, use two transformations inside a parent job. The first transformation gets the list of filenames and passes them to a Copy Rows to Result step.
The second transformation can do one of two things:
Process all files at once: just use a Get Rows from Result step as your entry point to the transformation;
Process one file at a time: create a parameter for the filename on the transformation; then open the parent job, go to Advanced in the properties of the transformation entry, tick the box "Execute for every input row", and on the Parameters tab map the child transformation's parameter name to the stream column coming from the first transformation (see the sketch below).
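As a rough illustration of the "execute for every input row" semantics, here is a minimal Python sketch: the first function stands in for the first transformation, and the loop stands in for the parent job re-running the second transformation once per row (the folder and function names are hypothetical):

```python
import glob


def get_file_list():
    """First transformation: produce the rows for Copy Rows to Result."""
    return glob.glob("incoming/*.txt")  # stands in for the SQL query


def process_one_file(filename):
    """Second transformation, parameterised by the filename."""
    print("processing", filename)


# "Execute for every input row": the parent job re-runs the second
# transformation once per result row, passing the column as the parameter.
for name in get_file_list():
    process_one_file(name)
```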

Pentaho Data Integration: How to use the output of a SQL query as the filename for Microsoft Excel Input

I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in some folder. My requirement is to develop a transformation where the Microsoft Excel Input step in PDI (Pentaho Data Integration) picks a file based only on the output of a SQL query: if the query returns abc.xlsx, Microsoft Excel Input should pick up abc.xlsx for further processing. How do I achieve this? I would really appreciate your help. Thanks.
Transformations in Kettle run asynchronously (all steps run in parallel), so you're probably looking at needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the populated variable as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
The Excel input step has an "Accept filenames from previous step" option. You can have a Table input step build the full path of the file you want to read (or build it later from the base directory and the short filename), pass the filename to the Excel input, tick that box, and specify the step and the field to use for the filename.
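Both answers boil down to the same pattern: one stage produces the filename, and the next stage reads only that workbook. Here is a minimal Python sketch of that pattern, using sqlite3 as a stand-in for the real database and openpyxl (assumed installed) for the Excel read; the table, column, and directory names are all hypothetical:

```python
import os
import sqlite3
from openpyxl import load_workbook  # assumes openpyxl is installed

BASE_DIR = "/data/excel"  # hypothetical folder holding abc.xlsx etc.

# First stage: run the SQL query that yields the filename
# (sqlite3 is just a stand-in for whatever database you use;
# a real flow should handle the case where no row comes back).
conn = sqlite3.connect("config.db")
(short_name,) = conn.execute(
    "SELECT filename FROM files_to_load LIMIT 1"
).fetchone()
conn.close()

# Second stage: build the full path and read that workbook only.
path = os.path.join(BASE_DIR, short_name)
wb = load_workbook(path, read_only=True)
for row in wb.active.iter_rows(values_only=True):
    print(row)
```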

Kettle - Read multiple files from a folder

I'm trying to read multiple XML files from a folder, compile all the data they contain (all of them have the same XML structure), and then save that data in a CSV file.
I already have a 'read-files' transformation with the steps Get File Names and Copy Rows to Result to get all the XML files. (It works: I print a file with all the file names.)
Then I enter a 'for-each-file' job, which has a transformation with the Get Rows from Result step, and then another job to process those files.
I think I'm losing information between the 'read-files' transformation and the transformation in the 'for-each-file' job that gets all the rows. (I print another file with all the file names, but it is empty.)
Can you tell me if I'm thinking about this the right way? Do I have to set some variables, or is there some option I have disabled? Thanks.
Here is an example of how to process a Kettle transformation once per filename:
http://www.timbert.net/doku.php?id=techie:kettle:jobs:processtransonceperfile
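For reference, the whole pipeline the question describes (list the XML files, parse each one, merge into a single CSV) can be sketched in a few lines of standalone Python; the field names and the repeating element are hypothetical placeholders for your actual XML structure:

```python
import csv
import glob
import xml.etree.ElementTree as ET

FIELDS = ["id", "name"]  # hypothetical: the elements shared by all your XML files

with open("merged.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    # "Get File Names" plus the per-file loop in one go
    for path in glob.glob("xml_folder/*.xml"):
        root = ET.parse(path).getroot()
        for record in root.findall("record"):  # hypothetical repeating element
            writer.writerow({f: record.findtext(f, default="") for f in FIELDS})
```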