I am looking for a solution to extract results (variables or parameters or else) from a job kettle.
Exactly I want to get logs about insert/update steps (if fields in database are inserted or not).
Make an error step on the Insert/Update step, if errors occur, row will be send to the error handling step, save them in an excel file and email it to your self. when a row is passed, error# and error MSG is passed as well to the error handling step.
You can pass data/rows to next transformation in job, double click the transformation and under advanced tab check "copy previous result to parameters" then in the transformation under JOB header, select get rows from results. rest is pretty much self explanatory :)
Related
I'm working on a package to import data from a raw text file to a table in SQL Server. My package contains:
1) An Execute Process Task that runs a batch file to compile .txt files
2) An Execute SQL Task that Truncates the table I want to import
3) A Data Flow Task that takes the data from the raw text file and puts it in the table in SQL Server
I was able to run each step individually and they worked as expected - however, when I run the solution from inside SSIS itself, it gives me the "success" message but nothing actually happens. Even worse, the components of the data flow task are now missing.
Has anyone experienced this who found a work around?
Sorry for the lack of specifics! I actually figured it out. Let me clarify my second paragraph:
The batch portion and the Execute SQL Task work perfectly when I disable the Data Flow Task! However, upon enabling the Data Flow Task, the package would "run" but not complete the Data Flow Task and would delete the Data Flow Task's components completely. Within the data flow task I had:
1) Flat File Source
2) Conditional split that ignored rows in the first column if the value was "".
3) OLE DB destination table
What I found is that changing the Conditional Split from specifically ignoring rows with "" to making the criteria based on value length, rather than looking for that symbol, worked out and didn't completely delete out components in the data flow task.
TL;DR: For whatever reason, the solution I built didn't like the conditional split criteria being based on the "" character. When I removed that, the solution worked perfectly.
Introduction
To keep it simple, let's imagine a simple transformation.
This transformation gets an input of 4 rows, from a Data Grid step.
The stream passes through a Job Executor, referencing to a simple job, with a Write Log component.
Expectations
I would like the simple job executes 4 times, that means 4 log messages.
Results
It turns out that the Job Executor step launches the simple job only once, instead of 4 times : I only have one log message.
Hints
The documentation of the Job Executor component specifies the following :
By default the specified job will be executed once for each input row.
This is parametrized in the "Row grouping" tab, with the following field :
The number of rows to send to the job: after every X rows the job will be executed and these X rows will be passed to the job.
Answer
The step actually works well : an input of X rows will execute a "Job Executor" step X times. The fact is I wasn't able to see it with the logs.
To verify it, I have added a simple transformation inside the "Job Executor" step, which writes into a text file. After I have checked this file, it appeared that the "Job Executor" was perfectly executed X times.
Research
Trying to understand why I didn't have X log messages after the X times execution of "Job Executor", I have added a "Wait for" component inside the initial simple job. Finally, adding two seconds allowed me to see X log messages appearing during the execution.
Hope this helps because it's pretty tricky. Please feel free to provide further details.
A little late to the party, as a side note:
Pentaho is a set of programs (Spoon, Kettle, Chef, Pan, Kitchen), The engine is Kettle, and everything inside transformations is started in parallel. This makes log retrieving a challenging task for Spoon (the UI), you don't actually need a Wait for entry, try outputting the logs into a file (specifying a log file on the Job executor entry properties) and you'll see everything in place.
Sometimes we need to give Spoon a little bit of time to get everything in place, personally that's why I recommend not relying on Spoon Execution Results logging tab, it is better to output the logs to a DB or files.
I have two transformations inside one job in kettle. the first transformation read data from a csv file and sorted the data. At the end of the 1st trans I use copy rows to result step. In the 2nd trans, I begin with get rows from result step followed by a text file output step.
The job looks like this:
Trans1 and trans2 are as follows:
The job runs well, so do trans1 and trans2, except that in Trans2 there is no data read and not data written.
According the answers to similar questions I checked the box "copy previous results to parameters" and "execute for every input row" under Advanced tab. Then I go to the parameters tab and click button"Get Parameters" button. No parameters were returned. Instead I got error in log saying "cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp"
I've tried all the advise given to similar question but still so confused why it doesn't work. I don't think this is a version issue of Pentaho Spoon. Any advise will be welcomed. Thanks in advance!
One suggestion I received ia that I need to edit the get variable step manually.enter image description here
I have a series of task that are very similar:
SELECT a,b FROM c
Lookup in another table and change value in column b.
Save new value back to c and if not match, send the result on to an error table.
That part is pretty straight forward and illustrated here:
Source ==> Lookup =match=> SQL Update command
=No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: When I insert the Script Command it asks me if it is a Source, Destination or Transscript script. And by only setting the variable it does not produce any rows for output and cannot connect to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup and Error update as well - therefore it is not more simple just to change the SQL script in the initial Source! But being able to do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable as an expression in your Dataflow. You can parametrize the query used in the lookup or any other parameters of your dataflow.
If your dataflows really have always the same structure, you could even generate a list of queries and call your dataflow task in a loop, preventing the duplication of the same tasks.
I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in some folder. My requirement is to develop a transformation where the Microsoft Excel Input in PDI (Pentaho Data Integration) should only pick the file based on the output of a sql query. If the output query is abc.xlsx. Microsoft Excel Input should pick of abc.xlsx for further processing. How do I achieve this? Would really appreciate your help. Thanks.
Transformations in Kettle run asynchronously, so you're probably looking into needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the variable populated as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
The Excel input step has a "accept filenames from previous step" option. You can have a table input build the full path of the file you want to read (or you somehow build it later knowing the base dir and the short filename), pass the filename to the excel input, tick that box and specify the step and the field you want to use for the filename.