I am designing an SSIS package which imports data from one database to another. In reality I need to import data from multiple data sources into one destination database. One way to do this, that I know of, is to use package configurations for all the data sources (connection strings) and run multiple instances of the same package. But what I want is to provide as many connection strings as I need at any point in time in my config file, and have my package connect to each database, reading the data source connection strings from the configuration, and import the data into my destination table.
Is this possible in any way?
If your Data Flow Task is going to be the same for every data source (e.g. using the same table from each data source), you could do something like this:
Create an object variable, say ConnStrList. This will hold the list of connection strings.
In a Script Task, loop through your config file and add each connection string to ConnStrList (a sketch of this is shown after these steps).
Add a Foreach Loop container and set its data source (the Foreach From Variable Enumerator) to ConnStrList. Create a string variable, say ConnStr, to hold an individual connection string, and set ConnStr as the iteration variable of the Foreach loop.
Add your Data Flow Task inside the ForEach loop container.
Create an OLE DB connection manager for your OLE DB source. Go to Properties -> Expressions and assign the variable ConnStr to the ConnectionString property.
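Here is a minimal sketch of the Script Task from step 2, assuming the config is a plain-text file with one connection string per line; the file path and the variable name User::ConnStrList are placeholders to adapt:

```csharp
// Script Task sketch (C#); the using directives go at the top of ScriptMain.
using System.Collections;
using System.IO;

public void Main()
{
    ArrayList connStrings = new ArrayList();

    // Hypothetical config file path - one connection string per line.
    foreach (string line in File.ReadAllLines(@"C:\config\connections.txt"))
    {
        if (!string.IsNullOrWhiteSpace(line))
            connStrings.Add(line.Trim());
    }

    // User::ConnStrList must be listed as a ReadWrite variable on the task.
    Dts.Variables["User::ConnStrList"].Value = connStrings;
    Dts.TaskResult = (int)ScriptResults.Success;
}
```

The Foreach Loop's From Variable Enumerator iterates any enumerable object, so an ArrayList works; each pass places one string into ConnStr.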
If the DFT is going to be different for each scenario, you might want to have separate data flows for each source.
Please let me know if this answers your question, or if I am getting the scenario wrong.
I'm trying to create an SSIS package which will copy data from Table-A on Server-A into Table-B on Server-B. To avoid duplicates, I want to update the records which already exist in Table-B if there are any changes to the data. Please let me know what would be the best approach for this.
Thank You
You should use the SSIS Sort Transformation to remove duplicate records.
Drag a Sort Transformation onto the data flow and connect your Flat File Source to it. Double-click the Sort Transformation and choose the columns to sort by. Also check the checkbox Remove rows with duplicate sort values, then click OK.
The SSIS Sort Transformation is useful whenever you need data in a certain sort order.
Create a regular data flow with two components - an OLE DB Source and an OLE DB Destination (I assume you are using MS SQL Server; in general, use whatever components your company uses to connect to the DB).
With two DBs, create two connection managers, each pointing to its own DB. Point the OLE DB Source to the connection manager for the source of the data, and the OLE DB Destination to the connection manager for the destination DB.
Now point the OLE DB Source to the source table in the source DB and leave all the fields intact. Connect the source and destination components with the green arrow coming out of the source component. Then point the OLE DB Destination to the destination table in the target DB. Double-click the destination, go to Mappings and make sure they are correct (SSIS tries to map automatically using strict name matching); otherwise (in case the names are different) connect the source and destination fields manually. That's it - you simply don't provide mappings for the fields which cannot be accommodated by the destination table.
Alternatively, you can leave out the columns you don't need at the source component - double-click it, go to Columns and uncheck the columns you don't need.
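If you prefer a query over the table drop-down, you can instead set the OLE DB Source's data access mode to SQL command and select only the columns the destination needs; the column names below are hypothetical:

```sql
-- OLE DB Source in "SQL command" mode: pull only the columns
-- that Table-B can accommodate (column names are made up).
SELECT Id, Name, UpdatedAt
FROM dbo.[Table-A];
```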
This is an SSIS question for advanced programmers. I have a SQL table that holds clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath.
This table holds a unique record for each of my clients. As my client list grows I add a new row to the SQL table for that client.
My question is this: can I use the values in my SQL table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the SQL table is that sometimes we get requests to change the delivery location or file name of a file we send externally. We would like to be able to change those things dynamically on the fly within the SQL table instead of having to export the package each time, manually change it, and then re-import the package. Each client has its own SSIS package.
Let me know if this is feasible. I'd appreciate any insight.
Yes, it is possible. There are two ways to approach this, and it depends on whether the job runs for a single client or for multiple clients in a single run.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client: this is fairly straightforward. In the Result Set pane, select the Single row option, map the single row's result to your package variables, and go about your processing.
You are running for multiple clients: in the Result Set pane, select Full result set and assign the result to a single package variable of type Object - give it a meaningful name like ObjectRs. You will then add a Foreach Loop container with these enumerator settings:
Type: Foreach ADO Enumerator
ADO object source variable: Select ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable Mappings, map all of the columns in their sequential order to the package variables. This effectively turns the package into a series of single-client transactions executed in a loop.
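For illustration, assuming the configuration table is called dbo.ClientConfig (a hypothetical name; the columns come from the question), the Execute SQL Task's statement could be:

```sql
-- Full result set: one row per client, consumed by the Foreach ADO Enumerator.
SELECT clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath
FROM dbo.ClientConfig;

-- Single-client variant (Result Set = Single row); the ? placeholder is
-- mapped to a package variable holding the client id under Parameter Mapping.
-- SELECT clientname, Filename, Ftplocationfolderpath, filelocationfolderpath
-- FROM dbo.ClientConfig
-- WHERE clientid = ?;
```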
Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code, read all required values from the database into SSIS variables and then use these variables to define what you need. You should not hardcode client-specific information in the package.
I need to create an SSIS package to export data from an OLE DB source into separate Excel files for each of our 30+ providers, and to name the files dynamically.
I have successfully created the package using a task-level variable for the provider ID (used in the query), a package-level variable for the provider name (used in the file name), and a package-level variable for the year (used in both places).
Q1: Is there a way to use the provider name column in the expression for the destination file name?
Q2: Now I need to repeat the task for all our providers. Is there a better way than repeating the same data flow task for each provider, changing the provider ID in each task, and creating a separate provider-name variable plus a separate file name expression for each one?
Q3: Can I copy the data flow task and change the details? When I do that and then execute the task, I get an error asking me to run the package as an administrator. What is the best way to copy data flow tasks?
Q1: Yes, just drag the variable holding the Provider Name into the expression.
Q2: Use a loop (For Loop or Foreach Loop) and start each iteration with a Script Task that sets the ProviderID and ProviderName variables (no need to create multiple copies of the variables) and also changes the file name in the connection manager - see the sketch below.
Q3: No need to make multiple copies of the data flow task either. Just include it in the loop after the Script Task mentioned above.
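As an illustration of that Script Task, here is a minimal C# sketch. The variable names (User::ProviderName, User::Year), the output folder, and the connection manager name ("Excel Destination") are all assumptions to adapt to your package:

```csharp
// Inside the Script Task's Main() method: point the Excel connection
// manager at a per-provider file. All names here are hypothetical.
string provider = Dts.Variables["User::ProviderName"].Value.ToString();
string year = Dts.Variables["User::Year"].Value.ToString();
string path = System.IO.Path.Combine(@"C:\Exports", provider + "_" + year + ".xlsx");

// Rebuild the Excel connection string around the new file path.
Dts.Connections["Excel Destination"].ConnectionString =
    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
    ";Extended Properties=\"Excel 12.0 XML;HDR=YES\";";

Dts.TaskResult = (int)ScriptResults.Success;
```

You will usually also want DelayValidation = True on the Excel connection manager, so the package validates even when the next file does not exist yet.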
I am trying to remove the DB connection from a ktr file and instead connect using a properties file which contains the connection information. I used this link as a reference:
Pass DB Connection parameters to a Kettle a.k.a PDI table Input step dynamically from Excel.
I followed all the steps but I am not able to get the required output.
I want to connect to the database using the properties file, execute the SQL against the DB defined there, and transfer the output to a destination (Excel, CSV, output table, etc.).
Try something like this:
1- An index job that starts everything (this is how I do it).
This job calls a transformation whose only purpose is to load the connection data for the database.
2- The transformation that loads the connection data passes those values along as parameters.
3- The middle job exists only to repeat the process if necessary; it works like a bridge, passing the parameters through.
4- This transformation does all the DB work.
5- The data source is then configured with those parameters.
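For illustration, the properties file for step 1 might look like the sketch below; the key names are made up, and the point is that the database connection in the transformation that does the DB work references them as ${...} variables:

```properties
# Hypothetical connection entries, loaded by the first transformation
# and passed down the job chain as parameters/variables.
SRC_DB_HOST=localhost
SRC_DB_PORT=3306
SRC_DB_NAME=sales
SRC_DB_USER=etl_user
SRC_DB_PASSWORD=secret
```

In the connection dialog of the final transformation you would then enter ${SRC_DB_HOST}, ${SRC_DB_PORT}, ${SRC_DB_NAME}, and so on in the corresponding fields.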
I have an SSIS project which uses an XML configuration file (dtsConfig) where the connection string to the source database is given. The configuration file path is stored in an environment variable.
Data needs to be pulled from four different databases, i.e. I currently need to run the same set of packages four times using four different connection strings.
I can make four different configuration files, each with a different connection string, and update the environment variable after each run. This is how I'm doing it now and it works OK, but I'd rather not keep updating the environment variable all the time.
Or I can use the same configuration file and just update the connection string after each run. But I think that's an even worse idea than having four different files.
What I would like to do is dynamically change the connection string after each run.
I have a master package which runs the set of packages I want. So I was thinking of just adding this master package four times to the control flow; after each run I'd update the connection string, which would then be used in the next run. But how do I actually do this?
Or a Foreach Loop container that contains the master package, loops it four times, and changes the connection string after each iteration would be cool as well.
To run the packages sequentially, you could simply create a table or file with the connection strings (e.g. 4 rows for the 4 data sources). You would then have a Foreach loop which loops through the connections (from the table or file) and calls the child package, passing the connection string down to it as a variable. The child package would access the variable through a Package Configuration, and that variable would drive the ConnectionString property of the child package's connection.
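A minimal sketch of the table variant - the table and column names are made up; an Execute SQL Task selects from it into an Object variable that feeds the Foreach ADO Enumerator:

```sql
-- Hypothetical table: one row per source database.
CREATE TABLE dbo.SourceConnections (
    SourceName       varchar(50)  NOT NULL,
    ConnectionString varchar(500) NOT NULL
);

INSERT INTO dbo.SourceConnections (SourceName, ConnectionString)
VALUES
    ('DB1', 'Data Source=Server1;Initial Catalog=SourceDb1;Provider=SQLNCLI11.1;Integrated Security=SSPI;'),
    ('DB2', 'Data Source=Server2;Initial Catalog=SourceDb2;Provider=SQLNCLI11.1;Integrated Security=SSPI;'),
    ('DB3', 'Data Source=Server3;Initial Catalog=SourceDb3;Provider=SQLNCLI11.1;Integrated Security=SSPI;'),
    ('DB4', 'Data Source=Server4;Initial Catalog=SourceDb4;Provider=SQLNCLI11.1;Integrated Security=SSPI;');
```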