I have a requirement where we can get list of file names from SQL and need to pass these file names as variable to Step which can poll folder for these file names as text file. Please advise how to set SQL output of file names as array variable and pass to polling folder step ?
Don't use variables. Variables are only suitable if your input has 1 single row.
Instead, use two transformations inside a parent job. The first transformation gets a list of filenames and passes those to a step Copy Rows to Result;
The second transformation can do one of two things:
Process all files at once: just use a Get Rows from Result step as your entry point to the transformation;
Process one file at a time: create a parameter for the filename on the transformation; open the parent job, and on the properties of the transformation go to Advanced and tick the box "Execute for every input row" and on Parameters put the child trans parameter name and the stream column name coming from the 1st transformation.
Related
Now I have a pentaho job using shell script to process some data.
But I found if I want to use the result in the script, I had to write that into a file and read the file to asign variables.
Is there an esaier way to use the result of a script step in the following steps?
This is the Script content.
Here is the whole process.
In pentaho you cannot create a variable and used in the same place.
Basically you just need to create one ktr and one job:
the first one is in charge of perform some task and save the variable with set-variable step (root job level option)
variables created in the first ktr are also available at job level
If you want to use the variable in another ktr
the second ktr, at the beginning should use the get-variable step to retrieve the variable created in previous transformation
transformations should be executed sequentially using a job
In your case, you should run the shell in the first ktr, transform the result into a variable and save it using set-variable. Your job which invokes the ktr, are able to use the variable create in the previous ktr
I need, at runtime, to change which connection is used by a table input step.
I have 3 connections defined: STG, DWH, DM.
I want to choose at runtime between them.
I can't create a new connection with parameters for server name, database name, etc. I must use the existing connections.
I wish I can write down a variable ${my_connection} in the box below, but the field cannot be edited.
Any suggestion?
Instead of using the variable in the connection selector of the Step, use the Host and Database name in the connection configuration.
EDIT:
You can pass a variable for the KTR to capture and test it using a Switch/Case step that calls a Transformation Executor, in this KTR you'll have your Table input and a copy rows to result step, results which will be captured after the Transformation Executor. You'll need 3 different KTR's, each with the Table input step that's going to execute the row passed by the Switch / Case step.
If i'm not clear or you need further explanation i can perhaps produce an example.
There is a case in my ETL where i am trying to take "table output" name from command line. The table name does not correspond to any streaming field's name. Is there any way to get it done in pentaho kettle?
Pentaho DI is a metadata based tool. I assume you will be trying to pass the output table name from the command line like below:
.../pan.sh -file:"/home/user/sample.ktr" -param:table_output=SOMETABLE
Assuming the command above is what you are trying.
So firstly, change the transformation settings of sample.ktr (just an example) and add the parameter name : "table_output" to the Parameters section.
Next, in the Table Output Step, use this parameter name in the format : ${table_output} in place of table name. This should solve your query.
Incase you are passing the parameters to a job. As mentioned above, the first section of the adding the parameters remains the same.
You can next take a separate transformation (.ktr) file inside a job, double click on the ktr (from the job file) and you will find PARAMETERS Section like the image below. Add the parameters
Thirdly inside the .ktr file, repeat the step from above (first section) and use a SET VARIABLE or TABLE OUTPUT. SET Variable step will ensure that you have the parameter available across the entire job. Mostly depends on your requirement.
Hope it helps :)
This should give you an idea how to do it. Since transformations are just xml you can read the metadata from them. Basically you find the table output step and set it as a variable in this case "TABLE"
I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in some folder. My requirement is to develop a transformation where the Microsoft Excel Input in PDI (Pentaho Data Integration) should only pick the file based on the output of a sql query. If the output query is abc.xlsx. Microsoft Excel Input should pick of abc.xlsx for further processing. How do I achieve this? Would really appreciate your help. Thanks.
Transformations in Kettle run asynchronously, so you're probably looking into needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the variable populated as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
The Excel input step has a "accept filenames from previous step" option. You can have a table input build the full path of the file you want to read (or you somehow build it later knowing the base dir and the short filename), pass the filename to the excel input, tick that box and specify the step and the field you want to use for the filename.
I've few excel source files in one folder in SSIS. I want to pull data from these excel files and load in to SQL tables.
My problem is I want to save all the files names one by one and want to create SQL table with exactly same name as filename
and then want to load each excel file in corresponding table.
Please help me how to create a package for this.
Jayvee has presented the high level view which is good enough! Let me add in bit detail.
I am assuming that you have dynamic Excel file connection.
Declare a variable and named it as FileName. And assign it the first file name which is available in the folder.
Place Foreach Loop Container and double click on it. Specify the Folder: and Files: as shown in image below.
In the same Foreach Loop Editor, go to Variable Mappings. Select Variable from drop down list. This is the same variable which we defined in first step. Set its Index to 0. Click OK.
Remaining task is same as Jayvee explained.
See this link for further help. And this for Result Set Property Not Set Correctly. I think setting ResultSet property to SingleRow will do the job.
your control flow should look like this: