I've set up four transformations in Kettle. Now I would like to schedule them so that they run daily at a certain time, one after another. For example,
transformation1 -> transformation2 -> transformation3 -> transformation4
should run daily at 8:00 am. How can I do that?
There are basically two ways of scheduling jobs in PDI.
1. You can use the command line (as correctly written by Anders):
for transformation scheduling:
<pentaho-installation directory>/pan.sh -file:"your-transformation.ktr"
for job scheduling:
<pentaho-installation directory>/kitchen.sh -file:"your-job.kjb"
You can then let your operating system's scheduler invoke these commands; see the cron sketch after this list.
2. You can also use the built-in scheduler in Pentaho Spoon.
If you are using the EE version of PDI, you will have a built-in scheduler in Spoon itself. It is a UI that you can use to easily schedule jobs. You can also read this section of the documentation for more details.
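For the original question (running the four transformations one after another, daily at 8:00 am), a common approach is to chain them in a single wrapper job and let cron invoke Kitchen. A minimal sketch, assuming the wrapper job is saved as /opt/pentaho/jobs/daily_load.kjb and PDI lives under /opt/pentaho/data-integration (both paths are hypothetical):
# crontab -e entry: run the wrapper job every day at 08:00
0 8 * * * cd /opt/pentaho/data-integration && ./kitchen.sh -file=/opt/pentaho/jobs/daily_load.kjb -level=Basic >> /var/log/pdi/daily_load.log 2>&1
The wrapper job itself just holds the four transformation entries linked in sequence, so they run one after another.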
You can execute transformation from the command line using the tool Pan:
Pan.bat /file:transform.ktr /param:name=value
The syntax might be different depending on your system - check out the link above for more information. When you have a batch file executing your transformation, you can schedule it to run using any scheduling tool on whatever system you are running.
Also, you could put all the transformation in a job and execute that from the command line with Kitchen.
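On Windows, for example, you could wrap the Kitchen call in a small batch file and register it with the built-in Task Scheduler. A rough sketch with hypothetical paths and task name - first the batch file, then the one-time registration:
rem C:\pentaho\jobs\run_daily_job.bat - runs the wrapper job that chains the four transformations
"C:\pentaho\data-integration\Kitchen.bat" /file:"C:\pentaho\jobs\daily_load.kjb" /level:Basic
rem one-time registration from a command prompt: run the batch file every day at 08:00
schtasks /Create /SC DAILY /ST 08:00 /TN "PDI daily load" /TR "C:\pentaho\jobs\run_daily_job.bat"
On Linux, the equivalent is a crontab entry calling kitchen.sh, as sketched in the earlier answer.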
I'd like to add another answer that many first-time Spoon users miss. Let's say you have a transformation exampleTrafo.ktr that you want to run at a certain interval. What you could do is create a job exampleJob.kjb which merely runs the transformation. If you do so, you will end up with something that looks like this:
The START node here is the important thing: right-click on it and choose Edit..., and you'll be presented with a job scheduling window where you can specify your desired job schedule. Then save and run this job (either locally or eventually remotely on a slave using PDI's Carte server). Basically, what you end up with is an indefinitely running job called exampleJob that executes your exampleTrafo at the desired intervals.
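If you start such a scheduling job from the command line instead of from Spoon, you will usually want it detached so it keeps running. A minimal sketch, with hypothetical file names, run from the PDI installation directory:
# start the scheduling job in the background; it stays alive and fires exampleTrafo.ktr at the interval configured on the START entry
nohup ./kitchen.sh -file=exampleJob.kjb -level=Basic > exampleJob.log 2>&1 &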
We have created a single job in Pentaho. We want to run this same job from the .kjb file multiple times, passing different parameters from the command line. But as it is a single source file, we are not able to execute it in parallel. What is the solution for running a single Pentaho job in parallel?
1) You can use a wrapper Pentaho job that calls the same .kjb multiple times with different parameters. In this scenario you'd use the "Run Next Entries in Parallel" setting as it's described here: http://wiki.pentaho.com/display/EAI/Launching+job+entries+in+parallel
2) If you're using kitchen to run .kjb from the command line, you can take care about parallel execution in the shell script itself.
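For option 2), a minimal shell sketch (the job path and the REGION parameter name are hypothetical) that starts the same .kjb several times with different parameters and waits for all runs to finish:
# launch the same job three times in parallel, one Kitchen process per parameter value
for region in EMEA APAC AMER; do
  ./kitchen.sh -file=/opt/pentaho/jobs/load.kjb "-param:REGION=${region}" > "load_${region}.log" 2>&1 &
done
wait   # return only after all three Kitchen processes have finished
Writing each run to its own log file keeps the parallel output from interleaving.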
Does anyone know of a way to split a single Jenkins job into parts and run them concurrently/in parallel?
For example, if I have a job that runs tests which take 30 minutes, is there a way I can break this job into three 10-minute runs that run at the same time, but in three different instances?
Thanks in advance.
Create new jobs, e.g. call them Test. You should select the job type based on the type of the root job.
If you have a Maven job type, you can set the workspace directory under Build -> Advanced. The Freestyle job type has this option directly under Project -> Advanced.
Set the same working directory for all jobs. The root job will compile, and all other jobs use the same working directory to pick up the compiled output.
For the test jobs, add the test execution as a build step, and vary which tests should be executed in each one.
Edit your root job and remove the execution of the long-running tests from it. You can call the three jobs from there now, but you need the Parameterized Trigger Plugin.
The downside of this approach is that you need enough Jenkins executors to handle all the test jobs.
If you're using Jenkins 1.x, I would suggest trying the multijob plugin - I've successfully used it to split a single job into a parent job plus multiple child jobs:
https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
If you're using Jenkins 2.x, then try out the pipeline feature :) It makes running parallel tasks very easy:
https://github.com/jenkinsci/pipeline-plugin/blob/master/TUTORIAL.md#creating-multiple-threads
If you want, I believe you can also use pipelines in Jenkins 1.x by means of a plugin. I haven't looked into that, though.
Suppose I have a smart folder X having 5 jobs with multiple dependencies. For example, let us assume the job hierarchy is like this:
So, from the Planning tab, I order this smart folder for execution. Since I don't want to wait for Job 202 to execute, as it is a tape backup job which is not needed in the environment I am working in, I mark Job 202 as "OK" in the Monitoring tab. For Job 302, it is a prerequisite that Job 202 ends "OK".
In a similar setup, I have hundreds of jobs with similar dependencies. I have to order the folder from time to time, and have to manually mark all the jobs that are not required to run as "OK". I cannot simply remove the jobs that I need to mark OK, as they have dependencies on the other jobs I want to execute.
My question is: how can I do this once - that is, mark all unnecessary jobs as OK - and save this for all future instances when I am going to run the workload?
If the job you mentioned as Job 202 is not really required for Job 302 to start, then it should be independent. Remove it from the flow and make it independent. Make these changes in Control-M Desktop and write them to the database. You will not have to make the changes daily.
For all jobs not required in the "environment" you are testing, you can tick the "Run as dummy" check box to convert those jobs to dummy jobs while maintaining the structure, relationships, and dependencies in your folder. A dummy job will not execute the command, script, etc.; instead, it only gives Control-M the instructions for the post-processing steps of the job, or in your case the adding of conditions to continue processing the job flow after the dummy job.
(I realize this is an old question; I provided a response should it be helpful to anyone that finds this thread after me)
In an SSIS package I have multiple scripts running within a job. At any given time I want to see how many scripts have been executed, e.g. 5/10 (50%) completed. Please tell me how I can achieve that.
Currently there is no such functionality provided by SSIS to track the progress of package execution.
It seems you need to write your own custom utility/application to implement this, or use a third-party one.
There are a few ways to do this:
1. Use the /Reporting (or /Rep) switch of DTEXEC at the command line. For example:
DTEXEC /F ssisexample.dtsx /Rep P > progress.txt
2. Implement package logging or customize it.
3. Implement an event handler on the required executable. You can also use the OnPipelineRowsSent log of the Data Flow Task.
If you want to write your own application, then the thread below will provide a nice starting point.
How do you code the Package Execution Progress window in C#
In my experience, we use another piece of software to monitor the jobs that are running. http://en.wikipedia.org/wiki/CA_Workload_Automation_AE
You can also try to create your own application that runs in the background and checks the status of your jobs by reading the logs.
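As a very crude variant of that idea, if the package was started with the /Rep P switch from the earlier answer and its output redirected to a file (progress.txt in that example), you can watch the progress events from a console. A sketch, assuming PowerShell is available:
rem follow the redirected DTEXEC output and show only the progress lines
powershell -Command "Get-Content progress.txt -Wait | Select-String 'Progress'"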
I have scheduled a DTS package to run from a scheduled job. The DTS package has several steps in it. Now, whenever the job is running and I take a look at the Jobs section in Enterprise Manager, it always displays 'Executing Job Step 1...' in the status, although it is running all steps properly. How do I know which step the DTS package is currently executing?
Can I maybe get the status from Query Analyzer?
You can add something which can show you where the DTS is currently running. I think the best way is to raise an alert using a script. There is no other direct way to trace a DTS task!
The display you get is a snapshot; you need to keep refreshing it.
There is only one step in the job, the command to run the DTS package.
If you want to see the progress of the steps within the package, you need to add something to the DTS package that records each step in a logging table as it finishes.
Since the DTS mostly executes against database tables, on the SQL Server side you can find which sessions are currently active, the statements they are executing, etc., if you have administrative privileges. You can find this under Management as Activity Monitor.
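If you prefer a query over the Enterprise Manager UI, something like sp_who2 gives a similar snapshot of active sessions. A minimal sketch from the command line, assuming a trusted connection and that the osql client is available (the server name is hypothetical):
REM list currently active sessions and the commands they are running (requires admin rights)
osql -S YourServer -E -Q "EXEC sp_who2 'active'"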