Variable values stored outside of SSIS - sql

This is merely a SSIS question for advanced programmers. I have a sql table that holds clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath
This table holds a unique record for each of my clients. As my client list grows I add a new row in my sql table for that client.
My question is this: Can I use the values in my sql table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the sql table is that sometimes we get request to change the delivery or file name of a file we send externally. We would like to be able to change those things dynamically on the fly within the sql table instead of having to export the package each time and manually change then re-import the package. Each client has it's own SSIS package
let me know if this is feasible..I'd appreciate any insight

Yes, it is possible. There are two ways to approach this and it depends on how the job runs. First is if you are running for a single client for a single job run or if you are running for multiple clients for a single job run.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client. This is fairly straightforward. In the Result Set, select the option for Single Row and map the single row's result to the package variables and go about your processing.
You are running for multiple clients. In the Result Set, select Full Result Set and assign the result to a single package variable that is of type Object - give it a meaningful name like ObjectRs. You will then add a ForEachLoop Enumerator:
Type: Foreach ADO Enumerator
ADO object source variable: Select the ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable mappings, map all of the columns in their sequential order to the package variables. This effectively transforms the package into a series of single transactions that are looped.

Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code read all required values from the database into SSIS varaibles and the use these variables to define what you need. You should not hardcode client specific information in the package.

Related

SSIS: Excel data source - if column not exists use other column

I am using select statement in excel source to select just specific columns data from excel for import.
But I am wondering, is it possible to select data such way when I select for example column with name: Column_1, but if this column is not exists in excel then it will try to select column with name Column_2? Currently if Column_1 is missing, then data flow task fails.
Use a Script task and write .net code to read the excel file and then perform the check for the Column_1 availability in the file. If the column does not present then use Column_2 as input. Script Task in SSIS can act as a source.
SSIS is metadata based and will not support dynamic metadata, however you can use Script Component as #nitin-raj suggested to handle all known source columns. There is a good post below on how it can be done.
Dynamic File Connections
If you have many such files that can have varying columns then it is better to create a custom component.However, you cannot have dynamic metadata even with custom component, the set of columns should be known upfront to SSIS.
If the list of columns keep changing and you cannot know in advance what are expected columns then you are better off handling the entire thing in C#/VB.Net using Script Task of control flow
As a best practice, because SSIS meta data is static, any data quality and formatting issues in source files should be corrected before ssis data flow task runs.
I have seen this situation before and there is a very simple fix. In the beginning of your ssis package, using a file task to create copy of the source excel file and then run a c# script or execute a powershell to rename the columns so that if column 1 does not exist, it is either added at the appropriate spot in excel file or in case the column name is wrong is it corrected.
As a result of this, you will not need to refresh your ssis meta data every time it fails. This is a standard data standardization practice.
The easiest way is to add two data flow tasks, one data flow for each Excel source select statement and use precedence constraints to execute the second data flow when the first one fails.
The disadvantage of this approach is that if the first data flow task fails for another reason, it will also try to execute the second one. You will need some advanced error handling to check if the error is thrown due to missing columns or not.
But if have a similar situation, I will use a Script Task to check if the column exists and build the SQL command dynamically. Note that this SQL command must always return the same metadata (you must use aliases).
Helpful links
Overview of SSIS Precedence Constraints
Working with Precedence Constraints in SQL Server Integration Services
Precedence Constraints

SSIS Script Component - only to change variables

I have a series of task that are very similar:
SELECT a,b FROM c
Lookup in another table and change value in column b.
Save new value back to c and if not match, send the result on to an error table.
That part is pretty straight forward and illustrated here:
Source ==> Lookup =match=> SQL Update command
=No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: When I insert the Script Command it asks me if it is a Source, Destination or Transscript script. And by only setting the variable it does not produce any rows for output and cannot connect to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup and Error update as well - therefore it is not more simple just to change the SQL script in the initial Source! But being able to do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable as an expression in your Dataflow. You can parametrize the query used in the lookup or any other parameters of your dataflow.
If your dataflows really have always the same structure, you could even generate a list of queries and call your dataflow task in a loop, preventing the duplication of the same tasks.

SSIS - Extract multiple databases based on lookup table

How do I create a package that extracts multiple databases(and all tables in each database) from another server based on a lookup table (also found on the other server) that contains a column where all the databases I need to extract is listed ?
I need to use the lookup table because new databases is created from time to time on the source, so I cannot just create a job that extracts a "static set" of databases to a destination. It needs to be a bit dynamic...
Furthermore I also need to extract the databases incremental where I can use a timestamp that exists in all databases/tables.
Im new to SSIS, so an "easy" guide would be appriciated.
Thanks
As a rough idea, you could work with SSIS Package Configurations and executing packages from within packages, and then use the Transfer SQL Server Objects Task:
Make a "Main package" that iterates over the column in your lookup table.
For each entry, it should UPDATE the Package Config entry of your second SSIS package accordingly. Use the "SQL Server" configuration for that second package.
The Main package should then execute the second package - there is a also a task for this.
The second package looks at its config to find out which server to get databases from and uses the Transfer SQL Server Objects Task to do so.
then the Main package continues with the next entry from your lookup table.
Ideally you would want to have your "second SSIS package" inside SQL Server's MSDB rather than the file system. Its easier to execute.

From an SSIS package how should I execute a sql script stored in a table as a varchar(max)?

I'm storing large (varchar(max)) SQL scripts in a table. I'd like to execute the scripts in an SSIS package.
Looking at other posts on this site it's easy enough to get the varchar(max) into an object variable. But then what to do? Is there a way for an Execute Sql Task (SQLSourceType of Variable) to specify an Object variable rather than a String variable?
Is there an approach that will work?
Here's how I might approach it:
Add a Data Flow task to your control flow
Add a Source (ADO.NET) that connects to your database
Create a Package level Object variable (for the next step)
Add a Recordset destination that populates your data into the Object variable created in the previous step
Back on the control flow:
Create a package level String variable for the "current" query (see next step)
Add a For Each ADO.NET enumerator
Connect the previous Data Flow task to the For Each task
Configure the For Each to use the Object variable as a source, and to store the column index with the SQL into the String variable
Add an Execute SQL task inside the For Each task
Configure it to execute a SQL Command from Variable, and pick the string variable containing the current query
Basically it will collect the queries from the table, then for each collected query, assign it to a variable, and then the Execute SQL command can pull the command text from that variable.

Table Variables in SSIS

In one SQL Task can I create a table variable
DELCARE #TableVar TABLE (...)
Then in another SQL Task or DataSource destination and select or insert into the table variable?
The other option I have considered is using a Temp Table.
CREATE TABLE #TempTable (...)
I would prefer to use Table Variable so that it remains in memory. But can use temp table if it is not possible to use table variable. Also I cannot use the record set destination as I need to preform straight SQL tasks on it later on.
The use case that this is trying to solve is essentially performing a transformation in the stead of BizTalk. There is a very large flat file to flat file transformation that BizTalk has to transform unfortunately the data volume would produce unacceptable load on the BizTalk server so the idea is to off load it to SSIS. However, it is not a simple row to row transformation, there are different types of rows which have relations to each other. The first task in SSIS is to load the row into appropriate (temp) tables, then in the second data task a select is preformed with the correct format for output.
You could use some of the techniques in this post: http://consultingblogs.emc.com/jamiethomson/archive/2006/11/19/SSIS_3A00_-Using-temporary-tables.aspx
especially the ones about using RetainSameConnection=TRUE on the connection manager.
I would be interested to see more information about what use case you have that requires you to write out data to a temp table or table variable before further SSIS processing. Couldn't you take care of all of the SQL required steps in your source query before you start processing the dataflow with SSIS?
Table variables are not kept solely in memory and can be written to disk under memory pressure. I tend to use table variables for very small lookups. If you Need to push a table into SQL Server due to necessary and complex transformations, then use a 'permanent' temp table that is truncated within the SSIS package prior to insert. Simple and will get what you need done.
The SSIS package would be run in a job. I assume it runs inside a SQL job. In that case, using a temp table won't harm. SQL Jobs are generally run after office hours so it does not matter.