I have a message flow with a compute node that calls a stored procedure in a database. I set the data source field on this node to db1; the same name exists in the odbc.ini file. Now I want to change the data source to db2 dynamically (without redeploying).
So far I've found these two solutions, but both are ugly:
Change the data source entry in odbc.ini and call mqsireload.
Declare user-defined properties for the database name and schema, and call the stored procedure like this:
CALL SOME_PROC() IN DATABASE.{UDP_DBNAME}.{UDP_DBSCHEMA};
You can then change these properties at runtime using the broker API, and the flow picks up the new values immediately.
Are there any other options?
Given your requirements, you can do something like this:
Define both DB1 and DB2 in your odbc.ini file.
Create two compute nodes that point to the same ESQL file, one with its data source set to DB1 and the other with its data source set to DB2.
Add a new compute node (before the previous two) that contains the logic to decide which one to use. Out1 is connected to ComputeNodeDB1 and Out2 is connected to ComputeNodeDB2.
With this solution, the database can be chosen dynamically at runtime!
Another solution could be to use the PASSTHRU statement to run the database operations.
On that statement you can specify the data source name in a form that is subject to name substitution, which means it can take the value of a user-defined property, which in turn can be modified without stopping the flow.
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ak05890_.htm
I am trying to copy tables from one schema to another within the same Azure SQL database. So far, I have created a pipeline with a Lookup activity and passed the parameters to the ForEach loop and Copy activity. But my sink dataset is not using the parameter value I entered in the "table option" field; instead it uses the dummy table I chose when creating the sink dataset. Can someone tell me how to pass a dynamic table name to a sink dataset?
I have given concat('dest_schema.STG_',#{item().table_name})} in the table option field.
To make the schema and table names dynamic, add Parameters to the Dataset and reference them in the Dataset's schema and table name fields.
Most important - do NOT import a schema. If you already have one defined in the Dataset, clear it. For this Dataset to be dynamic, you don't want improper schemas interfering with the process.
In the Copy activity, provide the values at runtime. These can be hardcoded, variables, parameters, or expressions, so very flexible.
If it's the same database, you can even use the same Dataset for both, just provide different values for the Source and Sink.
WARNING: If you use the "Auto-create table" option, the schema for the new table will define any character field as varchar(8000), which can cause serious performance problems.
MY OPINION:
While you can do this, one of my personal rules is to not cross the database boundary. If the Source and Sink are on the same SQL database, I would try to solve this problem with a Stored Procedure rather than a data factory.
This is an SSIS question for advanced programmers. I have a SQL table that holds clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath.
This table holds a unique record for each of my clients. As my client list grows I add a new row in my sql table for that client.
My question is this: Can I use the values in my sql table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the SQL table is that we sometimes get requests to change the delivery location or file name of a file we send externally. We would like to be able to change those things on the fly in the SQL table, instead of having to export the package each time, change it manually, and then re-import it. Each client has its own SSIS package.
Let me know if this is feasible. I'd appreciate any insight.
Yes, it is possible. There are two ways to approach this, depending on how the job runs: either each job run handles a single client, or a single job run handles multiple clients.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client. This is fairly straightforward. In the Result Set, select the option for Single Row and map the single row's result to the package variables and go about your processing.
You are running for multiple clients. In the Result Set, select Full Result Set and assign the result to a single package variable that is of type Object - give it a meaningful name like ObjectRs. You will then add a ForEachLoop Enumerator:
Type: Foreach ADO Enumerator
ADO object source variable: Select the ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable mappings, map all of the columns in their sequential order to the package variables. This effectively transforms the package into a series of single transactions that are looped.
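The multi-client pattern above can be sketched outside SSIS to show the control flow. This is only an illustration in Python with an in-memory SQLite database, not SSIS code: the table name clients, the sample rows, and the Python variable names are assumptions; only the column names come from the question.

```python
import sqlite3

# Hypothetical stand-in for the client configuration table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (clientid INTEGER PRIMARY KEY, clientname TEXT, "
    "Filename TEXT, Ftplocationfolderpath TEXT, filelocationfolderpath TEXT)"
)
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme", "acme.csv", "/ftp/acme", "/files/acme"),
        (2, "Globex", "globex.csv", "/ftp/globex", "/files/globex"),
    ],
)

# "Full result set": fetch every row into one object (the ObjectRs analogue).
object_rs = conn.execute(
    "SELECT clientid, clientname, Filename, Ftplocationfolderpath, "
    "filelocationfolderpath FROM clients ORDER BY clientid"
).fetchall()

processed = []
# "Foreach ADO Enumerator": each iteration maps the columns, in their
# sequential order, to per-client variables.
for client_id, client_name, file_name, ftp_path, file_path in object_rs:
    processed.append(f"{client_name}: {file_path}/{file_name} -> {ftp_path}")

for line in processed:
    print(line)
```

Changing a row in the table changes what the next run does, with no change to the looping code, which is the property the question asks for.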
Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code, read all required values from the database into SSIS variables and then use these variables to define what you need. You should not hardcode client-specific information in the package.
I need to connect to two different datasets and compare them, using variables for the connections. I'm using two different Table Input steps where the database connection names and hostnames are hardcoded.
Instead of hardcoding them, I want to use variables that define these connections and be able to connect through them.
You can define variables in the kettle.properties file, located in the .kettle directory. Then you can use these variables in your database connection settings.
You can also define variables in your own .properties files and read them in using the Set Variables job entry.
Set the variables like this:
db_name.host=localhost
db_name.db=databasename
db_name.user=username
Then access those variables in your job/transformation by using the format ${db_name.host} etc.
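To make the substitution idea concrete, here is a minimal Python sketch of resolving ${...} placeholders from key=value properties. It is not how PDI implements variables; the function names and the JDBC URL template are hypothetical and only illustrate what the ${db_name.host} notation means.

```python
import re

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def substitute(template, props):
    """Replace each ${name} with its value; leave unknown names untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: props.get(m.group(1), m.group(0)),
        template,
    )

PROPERTIES = """\
db_name.host=localhost
db_name.db=databasename
db_name.user=username
"""

props = parse_properties(PROPERTIES)
# Hypothetical connection string template using the variables defined above.
url = substitute("jdbc:mysql://${db_name.host}/${db_name.db}", props)
print(url)
```

Pointing the same job at another database then only requires a different properties file, not a change to the job itself.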
Use JNDI to set up all your connection parameters:
(1) edit the file: data-integration/simple-jndi/jdbc.properties, and add your DB connection strings, for example:
db1/type=javax.sql.DataSource
db1/driver=com.mysql.jdbc.Driver
db1/url=jdbc:mysql://127.0.0.1:3305/mydb
db1/user=user1
db1/password=password1
db2/type=javax.sql.DataSource
db2/driver=com.mysql.jdbc.Driver
db2/url=jdbc:mysql://mydbserver:3306/mydb
db2/user=user2
db2/password=password2
Here we created two JNDI names, db1 and db2, which we can use in PDI jobs/transformations.
(2) In your PDI job/transformation, add a parameter, e.g. mydb, through the menu 'Edit' -> Settings... -> Parameters tab. You can add more such DB parameters if more than one connection is used.
(3) In the Table Input step, click the Edit... or New... button, and in the dialog that opens, switch the Access: box to JNDI and enter ${mydb} as the JNDI name in the upper-right corner. You can also enter the plain names db1 and db2 defined in (1) to identify the DB connection.
By using JNDI to manage DB connections, we were able to switch between staging and production DBs simply by changing the parameters. You can do the same for PRD.
I need, at runtime, to change which connection is used by a table input step.
I have 3 connections defined: STG, DWH, DM.
I want to choose at runtime between them.
I can't create a new connection with parameters for server name, database name, etc. I must use the existing connections.
I wish I could enter a variable such as ${my_connection} in the connection box, but the field cannot be edited.
Any suggestion?
Instead of using the variable in the connection selector of the Step, use the Host and Database name in the connection configuration.
EDIT:
You can pass a variable to the KTR and test it with a Switch/Case step that calls a Transformation Executor. The KTR that it executes contains your Table input and a Copy rows to result step, and those results are captured after the Transformation Executor. You'll need 3 different KTRs, one per connection, each with a Table input step that runs for the row passed on by the Switch/Case step.
If I'm not clear or you need further explanation, I can perhaps produce an example.
I have a series of tasks that are very similar:
SELECT a,b FROM c
Lookup in another table and change value in column b.
Save new value back to c and if not match, send the result on to an error table.
That part is pretty straightforward and illustrated here:
Source ==> Lookup =Match=> SQL Update command
                  =No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: when I insert the Script Component, it asks me whether it is a Source, Destination or Transformation script. And because it only sets the variable, it does not produce any rows for output and cannot be connected to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup and Error update as well - therefore it is not more simple just to change the SQL script in the initial Source! But being able to do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable as an expression in your data flow. You can parametrize the query used in the lookup, or any other parameter of your data flow, in the same way.
If your data flows really always have the same structure, you could even generate a list of queries and call your Data Flow task in a loop, which avoids duplicating the same tasks.
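The loop idea can be sketched in Python to show its shape. This is not SSIS code: the function run_data_flow, the sample rows, and the dictionary standing in for the lookup table are assumptions made for illustration; the queries mirror the ones in the question.

```python
import sqlite3

def run_data_flow(conn, source_sql, lookup):
    """One 'data flow': read rows, look up the second column, split match / no match."""
    matched, unmatched = [], []
    for key, value in conn.execute(source_sql):
        if value in lookup:
            matched.append((key, lookup[value]))   # would go to the SQL Update command
        else:
            unmatched.append((key, value))         # would go to the error table
    return matched, unmatched

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (a INTEGER, b TEXT);
    CREATE TABLE f (d INTEGER, e TEXT);
    INSERT INTO c VALUES (1, 'x'), (2, 'zzz');
    INSERT INTO f VALUES (10, 'y');
""")
lookup = {"x": "X", "y": "Y"}  # hypothetical lookup table contents

# The "control flow": set the query variable first, then run the same data flow.
queries = ["SELECT a, b FROM c ORDER BY a", "SELECT d, e FROM f ORDER BY d"]

results = {}
for sql in queries:
    results[sql] = run_data_flow(conn, sql, lookup)
    print(sql, "->", results[sql])
```

The point of the design is that the query is decided before the data flow starts, so the flow itself never needs a component whose only job is to set a variable.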