Pentaho Spoon Simultaneous execution of same Job - pentaho

We have created a single job in Pentaho. We want to run this same job from its .kjb file multiple times, passing different parameters from the command line. But as the job is a single source file, we are not able to execute it in parallel. What is the solution for running a single Pentaho job in parallel?

1) You can use a wrapper Pentaho job that calls the same .kjb multiple times with different parameters. In this scenario you'd use the "Run Next Entries in Parallel" setting, as described here: http://wiki.pentaho.com/display/EAI/Launching+job+entries+in+parallel
2) If you're using Kitchen to run the .kjb from the command line, you can take care of parallel execution in the shell script itself.
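For option 2, the launcher does not have to be a shell script. Here is a minimal sketch in Python that starts one Kitchen process per parameter set. It assumes Kitchen is called with -file= and -param:NAME=value options; the function names are made up for illustration, and the paths and the REGION parameter in the note below are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_kitchen_command(kitchen, job_file, params):
    """Build one Kitchen command line for a job file and its named parameters."""
    command = [kitchen, f"-file={job_file}"]
    for name, value in params.items():
        command.append(f"-param:{name}={value}")
    return command


def run_in_parallel(commands, max_workers=4):
    """Run each command as its own process; return the exit codes in input order."""
    def run_one(command):
        # Blocks until this one process finishes; other threads keep running.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Calling run_in_parallel with build_kitchen_command("/opt/pentaho/kitchen.sh", "/etl/load.kjb", {"REGION": r}) for each r in ("north", "south") would start two runs of the same .kjb at once and return their two exit codes. max_workers caps how many runs are active at the same time.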

Related

Splitting Jenkins Job to run concurrently

Does anyone know of a way to split a single Jenkins job into parts and run them concurrently/parallel?
For example, if I have a job that runs tests which take 30 minutes, is there a way I can break this job into three 10-minute runs that run at the same time but in three different instances?
Thanks in advance.
Create new jobs and call them, for example, Test. Select the job type based on the type of the root job.
If you have a Maven job type, you can set the workspace directory under build -> advanced. The Freestyle job type has this option directly under project -> advanced.
Set the same working directory for all jobs. The root job will compile, and all the other jobs use the same working directory so that they can use the compiled output.
For the test jobs, add the test execution as a build step and vary which tests each job executes.
Edit your root job and remove the execution of the long-running tests from it. You can now call the three jobs from there, but you need the Parameterized Trigger Plugin for that.
The downside of this approach is that you need enough Jenkins executors to handle all the test jobs.
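Deciding which tests go into which of the three jobs is the fiddly part. Here is a rough sketch in Python of one way to do it, assuming you know roughly how long each test takes. It is a plain greedy split, not something Jenkins provides, and the test names and durations in the note below are made up.

```python
def split_by_duration(durations, shard_count):
    """Greedy split: give each test (longest first) to the currently lightest shard."""
    shards = [{"tests": [], "minutes": 0.0} for _ in range(shard_count)]
    ordered = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, minutes in ordered:
        lightest = min(shards, key=lambda shard: shard["minutes"])
        lightest["tests"].append(name)
        lightest["minutes"] += minutes
    return shards
```

With durations of 10, 8, 6, 4 and 2 minutes and three shards, this gives three groups of 10 minutes each. Each group's test list could then be handed to one of the test jobs as a parameter.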
If you're using Jenkins 1.x, I would suggest trying the multijob plugin - I've successfully used it to split a single job into a parent job plus multiple child jobs:
https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
If you're using Jenkins 2.x, then try out the pipeline feature :) It makes running parallel tasks very easy:
https://github.com/jenkinsci/pipeline-plugin/blob/master/TUTORIAL.md#creating-multiple-threads
If you want, I believe you can also use pipelines in Jenkins 1.x by means of a plugin. I haven't looked into that, though.

How to schedule Pentaho Kettle transformations?

I've set up four transformations in Kettle. Now, I would like to schedule them so that they will run daily at a certain time and one after another. For example,
transformation1 -> transformation2 -> transformation3 -> transformation4
should run daily at 8:00 am. How can I do that?
There are basically two ways of scheduling jobs in PDI.
1. You can use the command line (as correctly written by Anders):
for transformation scheduling:
<pentaho-installation directory>/pan.sh -file:"your-transformation.ktr"
for job scheduling:
<pentaho-installation directory>/kitchen.sh -file:"your-job.kjb"
2. You can also use the inbuilt scheduler in Pentaho Spoon.
If you are using the EE version of PDI, you will have an inbuilt scheduler in Spoon itself. It's a UI which you can use to easily schedule jobs. You can also read this section of the doc for more.
You can execute transformation from the command line using the tool Pan:
Pan.bat /file:transform.ktr /param:name=value
The syntax might be different depending on your system; check out the link above for more information. Once you have a batch file executing your transformation, you can schedule it to run using any scheduling tool on whatever system you are running.
Also, you could put all the transformation in a job and execute that from the command line with Kitchen.
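If you would rather keep the four transformations separate and chain them from one scheduled script, a minimal Python sketch could look like this. It assumes Pan exits with code 0 on success and stops the chain at the first failure; the function names are illustrative, and the paths in the note below are hypothetical.

```python
import subprocess


def pan_commands(pan, transformations):
    """Build one Pan command line per .ktr file, in the order given."""
    return [[pan, f"-file={ktr}"] for ktr in transformations]


def run_in_sequence(commands):
    """Run commands one after another and stop at the first failure.

    Returns the index of the command that failed, or None if all succeeded.
    """
    for index, command in enumerate(commands):
        if subprocess.run(command).returncode != 0:
            return index
    return None
```

A script that calls run_in_sequence(pan_commands("/opt/pentaho/pan.sh", [...four .ktr paths...])) can then be scheduled with whatever the system offers. On Linux, a crontab entry such as "0 8 * * * python3 /etl/run_daily.py" would start it every day at 8:00 am.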
I'd like to add another answer that many first-time Spoon users miss. Let's say you have a transformation exampleTrafo.ktr that you want to run at a certain interval. Then what you could do is create a job exampleJob.kjb which merely runs the transformation. If you do so, you will have to create something that looks like this:
The START node here is the important thing: right-click on it and choose Edit... and you'll be presented with a job scheduling window where you can specify your desired job schedule. Then save and run this job (either locally or eventually remotely on a slave using PDI's Carte server). Basically what you will end up with is an indefinitely running job called exampleJob that will execute your exampleTrafo at the desired intervals.

Counting how many script in SSIS have been completed

In an SSIS package I have multiple scripts running within a job. At any given time I want to read how many scripts have been executed, e.g. 5/10 (50%) have been completed. Please tell me how I can achieve that.
Currently there is no such functionality provided by SSIS to track progress of package execution.
It seems you need to write your own custom utility/application to implement this, or use a third-party one.
There are a few ways to do this:
1. Use the /Reporting (or /Rep) switch of DTEXEC at the command line. For example:
DTEXEC /F ssisexample.dtsx /Rep P > progress.txt
2. Implement package logging or customize it.
3. Implement an event handler on the required executable. You can also use the OnPipelineRowsSent log of the Data Flow Task.
If you want to write your own application, the thread below provides a nice starting point.
How do you code the Package Execution Progress window in C#
In my experience, we use another piece of software to monitor the jobs that are running: http://en.wikipedia.org/wiki/CA_Workload_Automation_AE
You can also try to create your own application that runs in the background and checks the status of your jobs by checking the logs.
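The log-checking idea can be sketched in a few lines of Python. This assumes each script writes a line containing a marker once it finishes. The SCRIPT_DONE marker and both function names are hypothetical, not an SSIS format, so adapt the matching to whatever your logging actually writes.

```python
def count_completed(log_lines, marker="SCRIPT_DONE"):
    """Count the log lines that contain the (hypothetical) completion marker."""
    return sum(1 for line in log_lines if marker in line)


def format_progress(completed, total):
    """Format progress as 'completed/total (percent%)'."""
    percent = round(100 * completed / total) if total else 0
    return f"{completed}/{total} ({percent}%)"
```

For a log with five marker lines and ten scripts in total, format_progress(count_completed(lines), 10) gives "5/10 (50%)".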

Execute Multiple PowerShell Files using a SSIS Package

I have multiple PowerShell script files that I need to execute sequentially (one after the other). Can someone please help me schedule multiple PowerShell files to be executed using an SSIS package? I also need to build a fault-tolerant model where I can re-execute a PowerShell script in case of failure.
Running PowerShell
There isn't a built-in Execute PowerShell task (pity) so you'll need to use an Execute Process Task with the path to powershell.exe
Something that you will need to take into consideration is that the default execution policy for PowerShell is Restricted, which does not allow scripts to run. Further complicating matters, the account that runs the SSIS package will also need to have its execution policy modified to be able to fire off those scripts. It's a simple matter of Set-ExecutionPolicy RemoteSigned, or whatever level you feel is appropriate, but you'll need to do this from within that account.
Fault Tolerance
The simple approach is to ignore the return code in the Execute Process Task. Alternatively, if the desire is to keep running the PS1 until it doesn't fail, then you'd wrap a For Loop Container around the Execute Process Task and only set the terminal condition once the task returns a success value. Things might still go sideways depending on what the failure is.
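The retry logic is the same whatever hosts it. Here it is as a Python sketch outside SSIS, purely to illustrate the loop. Unlike the unbounded "keep running until it doesn't fail" idea, it gives up after a fixed number of attempts. It assumes the scripts return a non-zero exit code on failure, and the function names are made up.

```python
import subprocess
import time


def powershell_command(script_path):
    """Command line that runs one .ps1 file through powershell.exe."""
    return ["powershell.exe", "-NoProfile", "-ExecutionPolicy", "RemoteSigned",
            "-File", script_path]


def run_with_retries(command, max_attempts=3, delay_seconds=0.0):
    """Run command until it exits with 0 or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(command).returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None


def run_scripts_in_order(scripts, max_attempts=3):
    """Run scripts sequentially; return the first one that never succeeded, else None."""
    for script in scripts:
        if run_with_retries(powershell_command(script), max_attempts) is None:
            return script
    return None
```

Capping the attempts matters because, as noted, some failures will never clear on their own, and an uncapped loop would then spin forever.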

ms-access: doing repetitive processes with vba/sql

I have an Access database back end that contains three tables. I have distributed the front end to several users. This is a very simple database with minimal functionality. I need to import certain rows from a file every hour into one of the tables in the database. I would like to know the best way to automate this process so that it runs hourly. I need it to run as a sort of service in the background. Can you tell me how you would do this?
You could have, for example:
1. An MS Access file with all the code necessary to run the import procedure.
2. A BAT file containing the command line(s) that will run this MS Access file with all required parameters. Check the MS Access command-line parameters to see the available options.
3. A task scheduler to launch the BAT file. Depending on the task scheduler and the command line to be sent, you could even avoid the BAT file step.
If all you want to do is run some queries, I would not do this by automating all of Access, but instead by writing a VBScript that uses DAO to execute the SQL directly. That's a much more efficient way to do it, and will run without a console logon (which may or may not be required for full Access to be run by the task scheduler).