Pentaho data integration logging backup

I wanted to have a Kettle logging backup per day. By default Kettle writes its log to standard output. I am able to write the log to files and also set the logging level, but it writes to a single file only. I am using Pentaho Data Integration stable release 4.2.0.
Is there a way to back up the previous day's log (like with log4j in Java)?

There is the ability to store the job/transformation log in a specified database table.
Go to transformation/job properties -> Logging tab.
You can specify a file-based DB connection (Hypersonic, etc.).
This would be the simplest solution.

If you are executing in a *NIX environment, you can simply redirect standard output/standard error to a log file whose name contains a date. Example below.
"Job Execute Command" 2>> JobLog_`date +%Y-%m-%d`.log
*YYYY-MM-DD format date in shell script
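As a slightly fuller sketch along the same lines (the PDI path, job file, and log directory below are only placeholders for illustration), a small wrapper script can send both standard output and standard error of a Kitchen run into a per-day file, and cron can then call the wrapper:
#!/bin/sh
# Wrapper script: run a Kettle job and keep one log file per day.
# Adjust the PDI path, job file and log directory to your installation.
LOGDIR=/var/log/kettle
LOGFILE="$LOGDIR/JobLog_$(date +%Y-%m-%d).log"
/opt/pentaho/data-integration/kitchen.sh -file:"/opt/etl/jobs/daily_load.kjb" -level:Basic >> "$LOGFILE" 2>&1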

Related

Pentaho - Migrating from Database repository to File Repository

I am in the process of migrating Pentaho from a database repository to a file repository.
I have exported the database repository into an XML file, then created a file repository and imported the repository...
The first issue I saw after importing is that all my database connections are stored in the .ktr and .kjb files. This is going to be a big issue if I update a connection string, such as a password, because I have more than a hundred sub-transformations and jobs. Do I have to update this in all those files?
Is there any way to ignore the password and other connection settings stored in the .ktr and .kjb files and instead use the repository connection or specify it in the kettle properties?
The other issue I face is that when I try to run the master job via Kitchen on the command line, it does not recognize the sub-transformations and jobs. However, when I change the transformation root to ${Internal.Entry.Current.Directory}, the sub-transformations are recognized and processed. As I mentioned, I have more than 100 sub-transformations and jobs; is there any way to update this root for all jobs and transformations at once?
Kitchen.bat /file:"C:\pentaho-8-1\Dev_Repo\home\jobs\MainProcess\MasterJob.kjb" /level:Basic /logfile:"C:\pentaho-8-1\logs\my-job.txt"
This fails with the error (.ktr is not a file or the repository is not defined).
However, when I change the root directory to ${Internal.Entry.Current.Directory}, it works!
For the database connections, you can make .kdbs in the repository, enter variables for all the properties (Host, Port, Schema, User, etc.), and define them in kettle.properties or another properties file.
This works like a more convenient version of JNDI files, with one properties file per environment. You can easily inspect the current values by opening the kettle properties from within the Spoon client (don't edit them there or it will mess up the layout!), and you can also put Kettle "encrypted" passwords in the properties file.
PDI will still save copies of the connections into all the .kjb and .ktr files (and should, in theory, update them from the .kdb or shared.xml when opening them), but since the contents are just generic variable names (${STAGING_DB_HOST}, etc.) you will almost never run into problems with this.
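As an illustration only, the corresponding kettle.properties entries could look like the following; the STAGING_DB_* names are just example variable names like the one above, and the password value is whatever the Encr utility bundled with PDI prints out:
# Example variables referenced by a shared .kdb connection (names are illustrative).
STAGING_DB_HOST=staging-db.example.com
STAGING_DB_PORT=5432
STAGING_DB_SCHEMA=staging
STAGING_DB_USER=etl_user
# Obfuscated with PDI's Encr tool, e.g. encr.sh -kettle <your password>;
# paste the full "Encrypted ..." string it prints as the value.
STAGING_DB_PASSWORD=Encrypted <paste encr.sh output here>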
For the transformation filenames, a good text search-and-replace tool should fix most of your transformations in one go. Include some of the surrounding XML tag in the search string to avoid replacing too much.
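For example, with GNU sed in a *NIX shell (the <filename> tag and the /home/jobs/ prefix below are assumptions based on the paths in the question; open one of your .kjb files first to confirm exactly what needs replacing, and work on a backup copy of the repository):
# Rewrite the hard-coded root to the internal variable in every job file.
find /path/to/file-repo -name '*.kjb' -exec \
  sed -i 's|<filename>/home/jobs/|<filename>${Internal.Entry.Current.Directory}/|g' {} +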

flyway database script logging

I am currently evaluating Flyway as a deployment option for our company. We run our database deployments on an Oracle database and currently spool the output from a sqlplus session for logging purposes. We use this to verify feedback such as whether objects were created successfully, whether packages, functions, etc. compiled without errors, to verify the number of records entered, and so forth.
Is there similar logging functionality in Flyway? Currently the only logging we have found is in the server logs. We can tell from these logs that a script has completed successfully or has triggered an ORA error, but we are curious as to whether this is the extent of the database logging options or not.
Thank you,
We used the command-line method for running Flyway and turned on debug output (-X). Along with a lot of other output, it also logs more information about the SQL migrations that were run (e.g. the content of repeatable migrations) and the number of records affected. This is not perfect, but it helped us a lot in capturing more information about what was applied.
See https://flywaydb.org/documentation/commandline/; the option is not documented under each individual command because it applies to Flyway itself.
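For reference, the flag is simply added to the normal command-line invocation, and the output can be captured to a file in the same spirit as the old spool files (the connection settings below are placeholders; in practice they usually live in flyway.conf):
flyway -X -url=jdbc:oracle:thin:@//dbhost:1521/ORCL -user=deployer migrate > migrate_$(date +%Y-%m-%d).log 2>&1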

Logging SQL queries for specific files in Grails

I am trying to get SQL queries written to the log in my Grails app, and I found out how to do so here:
However, I want SQL logging for only specific files in my project. How can I do this?
Unfortunately, that is not possible. The Grails loggers are turned on and off by the class or package name of the code doing the logging. In this case, these are core Hibernate and/or Grails classes, so they either log all activity or no activity.
What you can do is add your own logging statements in your code before and after the operations you are interested in. Then you can use grep to find your marker statements in the log file. The SQL logging you are interested in will be between your markers, and you can ignore the rest of the very large log file.
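For example, if your code logs marker lines such as "IMPORT-SQL START" and "IMPORT-SQL END" (invented names here; use whatever text you actually log), you can pull out just that slice of the log from the shell:
# Print only the lines between your start and end markers;
# application.log stands in for whatever your app's log file is.
sed -n '/IMPORT-SQL START/,/IMPORT-SQL END/p' application.log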

How to schedule Pentaho Kettle transformations?

I've set up four transformations in Kettle. Now I would like to schedule them so that they run daily at a certain time, one after another. For example,
transformation1 -> transformation2 -> transformation3 -> transformation4
should run daily at 8:00 am. How can I do that?
There are basically two ways of scheduling jobs in PDI.
1. You can use the command line, as correctly written by Anders (a cron sketch follows this list):
for transformation scheduling:
<pentaho-installation directory>/pan.sh -file:"your-transformation.ktr"
for job scheduling:
<pentaho-installation directory>/kitchen.sh -file:"your-job.kjb"
2. You can also use the built-in scheduler in Pentaho Spoon.
If you are using the EE version of PDI, you will have an inbuilt scheduler in Spoon itself. It is a UI that you can use to easily schedule jobs. You can also read this section of the docs for more.
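A minimal cron sketch for option 1 (all paths are placeholders, the four transformations are assumed to be wrapped in a single job, and % characters must be escaped in a crontab):
# Run the job that chains the four transformations every day at 08:00.
0 8 * * * /opt/pentaho/data-integration/kitchen.sh -file:"/opt/etl/daily_chain.kjb" -level:Basic >> /var/log/kettle/kettle_$(date +\%Y-\%m-\%d).log 2>&1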
You can execute a transformation from the command line using the tool Pan:
Pan.bat /file:transform.ktr /param:name=value
The syntax might be different depending on your system - check out the link above for more information. Once you have a batch file executing your transformation, you can schedule it to run using any scheduling tool on whatever system you are running.
Also, you could put all the transformations in a job and execute that from the command line with Kitchen.
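On Windows, that looks roughly like this (paths, file names, and the task name are placeholders): put the Pan call in a batch file, then register the batch file with the built-in Task Scheduler.
rem --- run_transform.bat: run the transformation and append its output to a log ---
"C:\pentaho\data-integration\Pan.bat" /file:"C:\etl\transform.ktr" /level:Basic >> "C:\pentaho\logs\transform.log" 2>&1

rem --- run once from a command prompt to schedule the batch file daily at 08:00 ---
schtasks /Create /SC DAILY /ST 08:00 /TN "PDI daily transform" /TR "C:\etl\run_transform.bat"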
I'd like to add another answer that many first-time Spoon users miss. Let's say you have a transformation exampleTrafo.ktr that you want to run at a certain interval. What you can do is create a job exampleJob.kjb that merely runs the transformation: a START entry hooked to a Transformation entry that points to exampleTrafo.ktr.
The START node here is the important thing: right-click on it, choose Edit..., and you'll be presented with a job scheduling window where you can specify your desired job schedule. Then save and run this job (either locally or remotely on a slave using PDI's Carte server). Basically, what you end up with is an indefinitely running job called exampleJob that executes your exampleTrafo at the desired intervals.

ms-access: doing repetitive processes with vba/sql

I have an Access database back end that contains three tables. I have distributed the front end to several users. This is a very simple database with minimal functionality. I need to import certain rows from a file every hour into one of the tables in the database. I would like to know the best way to automate this process so that it runs hourly. I need it to run more or less as a service in the background. Can you tell me how you would do this?
You could have, for example:
an MS Access file with all the necessary code to run the import procedure
a BAT file containing the command line(s) that will run this MS Access file with all the requested parameters. Check the MS Access command-line parameters to see the available options.
task scheduler software to launch the BAT file: depending on the task scheduler and the command line to be sent, you could even skip the BAT file step
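A rough sketch of those pieces (the paths, file names, and macro name are invented; MSACCESS.EXE's /x switch runs the named macro on startup, and that macro should end with a Quit action so the instance closes itself):
rem --- import.bat: open the importer front end and run its import macro ---
"C:\Program Files\Microsoft Office\Office14\MSACCESS.EXE" "C:\etl\importer.mdb" /x RunHourlyImport

rem --- run once to schedule the BAT file every hour via Windows Task Scheduler ---
schtasks /Create /SC HOURLY /TN "Hourly Access import" /TR "C:\etl\import.bat"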
If all you want to do is run some queries, I would not do this by automating all of Access, but instead by writing a VBScript that uses DAO to execute the SQL directly. That's a much more efficient way to do it, and it will run without a console logon (which may or may not be required for full Access to be run by the task scheduler).