Define database connections for two different datasets using variables in Pentaho

I have a requirement to connect to two different datasets in a transformation that compares them. I'm using two different Table input steps in which the database connection names and hostnames are hard-coded.
Instead of hard-coding them, I want to use variables that define these connections, so that I can connect through the variables.

You can define variables in the kettle.properties file, located in the .kettle directory. Then you can use these variables in your database connection settings.
You can also define variables in your own .properties files and read them in using the Set Variables job entry.
Set the variables like this:
db_name.host=localhost
db_name.db=databasename
db_name.user=username
Then access those variables in your job/transformation by using the format ${db_name.host} etc.
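For example, in the database connection dialog the variable-enabled fields then accept the variables directly:
Host Name: ${db_name.host}
Database Name: ${db_name.db}
User Name: ${db_name.user}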

Use JNDI to set up all your connection parameters:
(1) Edit the file data-integration/simple-jndi/jdbc.properties and add your DB connection strings, for example:
db1/type=javax.sql.DataSource
db1/driver=com.mysql.jdbc.Driver
db1/url=jdbc:mysql://127.0.0.1:3305/mydb
db1/user=user1
db1/password=password1
db2/type=javax.sql.DataSource
db2/driver=com.mysql.jdbc.Driver
db2/url=jdbc:mysql://mydbserver:3306/mydb
db2/user=user2
db2/password=password2
Here we created two JNDI names, db1 and db2, which we can use in PDI jobs/transformations.
(2) In your PDI job/transformation, add a parameter, e.g. mydb, through the menu Edit -> Settings... -> Parameters tab. You can add more such DB parameters if more than one is used.
(3) In the Table input step, click the Edit... or New... button; in the dialog that appears, switch the Access: option to JNDI and enter ${mydb} as the JNDI name in the upper-right corner. You can also use the plain names db1 and db2 defined in (1) to identify the DB connection.
Using JNDI to manage DB connections, we were able to switch between staging and production DBs simply by changing the parameter. You can do the same in PRD (Pentaho Report Designer).
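You can then pick the target database when launching the job. For example, a Kitchen invocation (the job file name is illustrative) that points the ${mydb} parameter at db2:
kitchen.sh -file=my_job.kjb -param:mydb=db2
Pan accepts the same -param: syntax for running a single transformation.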

Related

Kettle: Change connection used at runtime

I need, at runtime, to change which connection is used by a table input step.
I have 3 connections defined: STG, DWH, DM.
I want to choose at runtime between them.
I can't create a new connection with parameters for server name, database name, etc. I must use the existing connections.
I wish I could enter a variable like ${my_connection} in the connection selector, but that field cannot be edited.
Any suggestion?
Instead of using a variable in the connection selector of the step, use variables for the Host and Database name fields in the connection configuration.
EDIT:
You can pass a variable for the KTR to capture and test it with a Switch/Case step that routes to a Transformation Executor. In that sub-KTR you'll have your Table input step and a Copy rows to result step, and the results are captured after the Transformation Executor. You'll need 3 different KTRs, each with a Table input step that runs against the connection chosen by the Switch/Case step, as sketched below.
If I'm not clear or you need further explanation, I can perhaps produce an example.
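A rough sketch of that layout, using the connection names from the question (transformation and step names are illustrative):
main.ktr:   Get Variables (${my_connection}) -> Switch/Case
              case STG -> Transformation Executor (runs read_stg.ktr)
              case DWH -> Transformation Executor (runs read_dwh.ktr)
              case DM  -> Transformation Executor (runs read_dm.ktr)
read_*.ktr: Table input (on its fixed connection) -> Copy rows to result
The rows from Copy rows to result are then available on the parent hop after each Transformation Executor.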

IBM IIB 9 multiple data sources

I have a message flow with a Compute node which calls a stored procedure in a database. I set the data source field on this node to db1; the same name exists in the odbc.ini file. Now I want to change the data source to db2 dynamically (without redeploying).
So far I've found these two solutions, but both of them are ugly:
Change the data source entry in odbc.ini and call mqsireload.
Declare user-defined properties for the DB name and schema, and call the stored procedure like this:
CALL SOME_PROC() IN DATABASE.{UDP_DBNAME}.{UDP_DBSCHEMA};
You can then change these properties at runtime using the broker API, and the flow will pick them up immediately.
Are there any other options?
As per your requirements, you can do something like this:
Define both DB1 and DB2 in your odbc.ini file
Create two Compute nodes that point to the same ESQL file, but one has the data source configured with DB1 and the other with DB2
Set up a new Compute node (before the previous two) that contains the logic to determine which one you want to use (see the ESQL sketch below): Out1 is connected to ComputeNodeDB1 and Out2 to ComputeNodeDB2.
With this solution, the DB can be determined dynamically at runtime!
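The routing node could be as simple as this ESQL sketch (the message field driving the decision is an assumption; 'out' and 'out1' are the Compute node's first two output terminals, wired to ComputeNodeDB1 and ComputeNodeDB2 respectively):
CREATE COMPUTE MODULE Route_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- pass the incoming message through unchanged
    SET OutputRoot = InputRoot;
    -- decide which DB-specific Compute node should handle it
    IF InputRoot.XMLNSC.Request.TargetDB = 'DB1' THEN
      PROPAGATE TO TERMINAL 'out';   -- ComputeNodeDB1
    ELSE
      PROPAGATE TO TERMINAL 'out1';  -- ComputeNodeDB2
    END IF;
    RETURN FALSE;
  END;
END MODULE;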
Another solution could be to use the PASSTHRU statement to run the database operations.
In that statement you can specify the data source name in a form that is subject to name substitution, which means it can take the value of a user-defined property, which in turn can be modified without stopping the flow; see the sketch after the link.
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ak05890_.htm
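A minimal ESQL sketch of the PASSTHRU approach (SOME_PROC comes from the question; dsName is an assumed UDP name whose value can be changed through the broker API):
CREATE COMPUTE MODULE DynamicDB_Compute
  -- user-defined property: its value can be overridden at runtime without redeploying
  DECLARE dsName EXTERNAL CHARACTER 'db1';
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- {dsName} is subject to name substitution, so the call goes to whichever
    -- data source the UDP currently names (db1 or db2)
    PASSTHRU 'CALL SOME_PROC()' TO Database.{dsName};
    RETURN TRUE;
  END;
END MODULE;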

Using variable names for a database connection in Pentaho Kettle

I am working with PDI Kettle. Can we define a variable and use it in a database connection definition, so that if I need to change the connections in multiple transformations in the future, I would just change the variable value in the kettle.properties file?
Just use variables in the Database Connection.
For instance ${DB_HostName}, and ${DB_Name} etc.
Then just put the values in your kettle.properties:
DB_HostName=localhost
DB_Name=mydatabase
You can see which fields support variables by the S in the blue diamond next to them.

Connecting to a DB from a properties file outside Kettle is not working

I am trying to remove the DB connection from the .ktr file and instead connect by using a properties file which contains the connection information. I used this link as a reference:
Pass DB Connection parameters to a Kettle a.k.a PDI table Input step dynamically from Excel.
I followed all the steps but I am not able to get the required output.
I want to connect to the database using the properties file, execute the SQL against the DB defined there, and transfer the output to an output step (Excel, CSV, output table, etc.).
Try something like this:
1- An index job that starts everything (that is my way). This job calls a transformation whose job is to load the connection data for the database.
2- The transformation that loads the connection data passes these variables along as parameters.
3- The middle job is only there to repeat the process if necessary; it just works like a bridge, passing the parameters through.
4- This transformation does all the DB work.
5- The datasource then references those variables in its connection configuration, as sketched below.
PS: sorry for my poor English :(
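A minimal sketch of such a connection properties file (the file name and key names are illustrative assumptions):
# connection.properties
DB_HOST=localhost
DB_PORT=3306
DB_NAME=mydb
DB_USER=etl_user
DB_PASS=secret
The first transformation can read it with, e.g., a Property Input step followed by Set Variables; the database connection dialog then uses ${DB_HOST}, ${DB_PORT}, ${DB_NAME} and so on in its fields.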

Dynamic OLEDB Connections in SSIS

I am designing an SSIS package which imports data from one database into another. In reality I need to import data from multiple data sources into one destination database. One way to do that, which I know of, is to use package configurations for all data sources (connection strings) and run multiple instances of the same package. But what I want is to provide as many connection strings as I need in my config file, and have the package connect to each database in turn, reading the data source connection strings from the configuration and importing into my destination table.
Is this possible in any way?
If your Data Flow Task is going to be the same for every data source (e.g. using the same table from each data source), you could do something like this:
Create an object variable, say ConnStrList. This will hold the list of connection strings.
In a Script Task, loop through your config file and add each connection string to ConnStrList.
Add a ForEach loop container and set its data source to ConnStrList. Create a string variable, say ConnStr; this will hold an individual connection string. Set ConnStr as the iteration variable of the foreach loop.
Add your Data Flow Task inside the ForEach loop container.
Create an OLEDB connection manager for your OLEDB source. Go to Properties -> Expressions and, for ConnectionString, assign the variable ConnStr (see the sketch below).
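As a sketch, the config file behind step 2 could simply hold one connection string per line (server, database and provider values are illustrative):
Data Source=ServerA;Initial Catalog=SourceDb1;Provider=SQLNCLI11.1;Integrated Security=SSPI;
Data Source=ServerB;Initial Catalog=SourceDb2;Provider=SQLNCLI11.1;Integrated Security=SSPI;
The Script Task reads each line into ConnStrList, and inside the loop the connection manager's ConnectionString expression is simply @[User::ConnStr].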
If the DFT is going to be different for each scenario, you might want to have separate data flows for each source.
Please let me know if this answers your question, or if I am getting the scenario wrong.