In my kettle file, I have this variable:
my_variable = c:/Users/me/Desktop
In my Pentaho job, I have a start step connected to a transformation.
In the transformation step, I am trying to run my transformation. For the location of the file, I have:
${my_variable}/name_of_transformation.ktr
For some reason, it can't find my file. What am I doing wrong?
I figured it out.
When you set the my_variable value in the Kettle Properties file, CLOSE OUT of Pentaho (not the file, but the entire software) and reopen it.
Do this anytime you make changes to the Kettle Properties file if you want it to be recognized.
Now, using the exact same setup as in my question, it works fine. All I needed to do was close Pentaho completely and restart the software.
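For reference, the working setup boils down to one line in kettle.properties plus a variable reference in the job's transformation entry (paths taken from the question; the file location is the usual default):

```properties
# kettle.properties (typically found in C:\Users\<your user>\.kettle)
my_variable = c:/Users/me/Desktop

# In the job's transformation entry, the "Transformation filename" field then reads:
#   ${my_variable}/name_of_transformation.ktr
```

Remember that Spoon only reads this file at startup, which is why the full restart is required.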
I'm new to using Pentaho and I need your help investigating a problem.
I have scheduled a job in crontab, run via the kitchen command. I'm using Pentaho release 6.0.1.0.386.
Sometimes (it's not a deterministic problem) one of the transformations stops after "Loading transformation from repository" and before "Dispatching started for transformation". The log simply stops. No errors. Nothing. And the job doesn't continue.
Any ideas? Any checks I can do? Thanks
Is the amount of data in this transformation much larger than usual?
There are some cache files that can cause errors. You can find them in this path:
C:\Users\<your user>\.kettle
If you delete the cache files in that folder (the original answer marked them in a screenshot that is no longer available), they will be recreated automatically when you open Pentaho again.
I am in process of migrating Pentaho from Database repository to file repository.
I have exported the database repository into xml file and then created a file repository and imported the repository...
The first issue I saw after importing is that all my database connections are stored in the .ktr and .kjb files. This is going to be a big issue if I update a connection string, for example a password: I have more than a hundred sub-transformations and jobs. Do I have to update this in all those files?
Is there any way to ignore the password and other connection settings stored in the .ktr and .kjb files, and instead use the repository connection or specify it in the .kettle properties?
The other issue I face is that when I try to run the master job via kitchen in cmd, it does not recognize the sub-transformations and jobs. However, when I change the transformation root to ${Internal.Entry.Current.Directory}, the sub-transformations are recognized and processed. As I mentioned, I have more than 100 sub-transformations and jobs; is there any way to update this root for all jobs and transformations at once?
Kitchen.bat /file:"C:\pentaho-8-1\Dev_Repo\home\jobs\MainProcess\MasterJob.kjb" /level:Basic /logfile:"C:\pentaho-8-1\logs\my-job.txt"
This fails with the error ".ktr is not a file or the repository is not defined".
However when I change the root directory to ${Internal.Entry.Current.Directory} it works!
For the database connections, you can make .kdb files in the repository and enter variables for all the properties (host, port, schema, user, etc.) and define them in kettle.properties or another properties file.
This works like a more convenient version of JNDI files, with one properties file per environment. You can easily inspect current values by opening the kettle properties from within the Spoon client (don't edit them or it will mess up the layout!) and you can also put kettle "encrypted" passwords in the properties file.
PDI will still save copies of the connections into all the .kjb and .ktr files (and should in theory update them from .kdb or shared.xml when opening them), but since the contents are just generic variable names (${STAGING_DB_HOST}, etc.) you will almost never run into problems with this.
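A minimal sketch of what such a properties file might contain (the variable names are just examples; the encrypted value is a placeholder produced by the encr tool shipped with PDI):

```properties
# kettle.properties -- keep one copy per environment
STAGING_DB_HOST = staging-db.example.com
STAGING_DB_PORT = 5432
STAGING_DB_SCHEMA = staging
STAGING_DB_USER = etl_user
# "Encrypted" password generated with encr.bat / encr.sh (-kettle option)
STAGING_DB_PASSWORD = Encrypted 2be98afc86aa7f2e4bb18bd63c99dbdde
```

Each .kdb connection then references ${STAGING_DB_HOST}, ${STAGING_DB_PORT}, and so on in its fields.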
For the transformation filenames, a good text search and replace tool should fix most of your transformations in one go. Include some of the XML tag to prevent replacing too much.
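For example, assuming the exported .kjb files store the old root inside <filename> tags (the tag name and the path below are assumptions about how your export looks; check one file first, and back up the directory before running this), a GNU sed one-liner could rewrite them in bulk:

```shell
# Replace a hard-coded job root with the internal variable in every .kjb file.
# Anchoring on the <filename> tag prevents replacing the path elsewhere.
grep -rl --include='*.kjb' '<filename>C:/pentaho-8-1/Dev_Repo/home/jobs' . \
  | xargs -r sed -i 's|<filename>C:/pentaho-8-1/Dev_Repo/home/jobs|<filename>${Internal.Entry.Current.Directory}|g'
```

The same pattern works for .ktr files by changing the --include glob.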
I have some SSIS packages that connect to an Oracle database. The connection parameters are stored in a SQL database and retrieved by using the Package Configuration tool.
My problem is that the variable that gets populated automatically by SSIS with the configuration string does not get emptied after the package is run. As a result, the value of the variable gets saved in the source code when the package is saved. I DO NOT want this variable value saved in my source files.
Any idea on how to prevent this from happening?
Thanks!
You can try setting the variable in a package configuration. The way you do this is simple.
First, go to the top-most layer of your package, right-click on the empty space, and select Package Configurations. Choose to add one. Give a location and name for the file and then click Next.
Once you've done that, choose the variable you want and set the value.
Now you're not storing the actual value in the package. Just the information for how to find it.
EDIT: I may not have been clear on this. This process creates a completely separate file that the package looks to for that expression. This way you don't have to store the expression or the value in the package itself; it just knows at run time to go look in that config file for any additional data.
EDIT 2: The package configuration will only be overwritten when you execute the package in BIDS/Visual Studio. This happens because the package evaluates and then saves prior to run time. It does not happen when you use SQL Agent to run the package, so the value or the expression will not be stored in the source code. I hope that clarifies it for you.
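A package configuration file is plain XML, so you can inspect what gets stored there. A sketch of what the generated .dtsConfig might contain (the variable name User::OracleConnStr and the connection string are made-up examples):

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::OracleConnStr].Properties[Value]"
                 ValueType="String">
    <ConfiguredValue>Data Source=ORCL;User Id=app_user;Password=secret;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```

Since the value lives in this file rather than in the .dtsx, the file can be excluded from source control or swapped per environment.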
How can I set the dynamic file path or folder directory for kettle jobs?
Please check the attached screenshot.
Goal: read the path from a config file as a variable, so that we can change the path dynamically according to the other parameters.
Details: say we want to use the /web/test directory for the test environment, and fetch the file repository from the normal path when the parameter is not test. I assume there must be a way to keep a config/ini file from which we can read the path and use the variable inside the "File/Directory" section of Pentaho.
I have gone through the variable reference option, but that is mainly for database configuration parameters. Some people suggested that this is not a good option and that you can instead specify the database configuration in XML.
Please suggest any idea or solution.
Sounds like you want to set a parameter/variable in the .kettle file and reference it in the File or directory text box. Note the red dollar sign next to the box. That means this field accepts variables. Here's the wiki entry for variables:
PDI Variables
You can also read from a config file directly (from a transform) and set it dynamically with the Set variables step if you can only have one .kettle file. Also check out the Check if connected to repository (from the Repository branch) step as well and see if that will suit your needs.
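For example, a small properties file read at the start of the job could carry the path (the file and variable names here are illustrative):

```properties
# config.properties -- one copy per environment
# Test environment:
FILE_DIR = /web/test
# In production, point the same variable at the normal path instead.
```

A first transformation can read this file (e.g. with a Property Input or Text file input step) and feed FILE_DIR into the Set Variables step; later steps and jobs then reference ${FILE_DIR} in the File/Directory field.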
If none of these suit your needs, please add detail to your question describing exactly what you're trying to do and how you're trying to do it.
I am new to Pentaho Kettle and I am wondering what Internal.Job.Filename.Directory is.
Is it my Spoon.bat folder, or the job/transformation folder I created?
Is there a way I can change it to point to a particular folder?
I am running spoon.bat on Windows XP.
Internal.Job.Filename.Directory is only set when you don't use a repository, and it is set automatically. You cannot set it manually.
How do you not use a repository?
When you start Spoon, you get a dialog which asks for a repository. Just close this dialog with Cancel and you're fine!
It took me a while to find this: I was wondering why Internal.Job.Filename.Directory was always empty. The repository was the cause.
It's documented here: http://jira.pentaho.com/browse/PDI-7434
Internal.Job.Filename.Directory is an internal variable that is always available. It points to the directory in which the job lives.
You can find more information here.
This variable is deprecated in version 7 and newer. You should use Internal.Entry.Current.Directory instead; it works regardless of whether you use a repository, so you can build more portable code.
Internal.Job.Filename.Directory is a Kettle environment variable which points to the location of the job on disk.
To set a value for Internal.Job.Filename.Directory, you need to launch the job this way (using the PDI Java API):
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

String filename = "path_filename";              // path to your .kjb file
KettleEnvironment.init();                       // initialize the Kettle environment
JobMeta jobMeta = new JobMeta(filename, null);  // null = no repository
Job job = new Job(null, jobMeta);
job.start();
job.waitUntilFinished();
This is the variable for the folder where the job you are currently in resides.
If you don't use a repository, then you need to specify where the transformations are.
To make it more flexible, you can put the jobs and transformations in the same folder and then use Internal.Job.Filename.Directory.
So if your transformation is called my.ktr, then to call it from the job you can point to it with ${Internal.Job.Filename.Directory}/my.ktr