I am using select statement in excel source to select just specific columns data from excel for import.
But I am wondering, is it possible to select data such way when I select for example column with name: Column_1, but if this column is not exists in excel then it will try to select column with name Column_2? Currently if Column_1 is missing, then data flow task fails.
Use a Script task and write .net code to read the excel file and then perform the check for the Column_1 availability in the file. If the column does not present then use Column_2 as input. Script Task in SSIS can act as a source.
SSIS is metadata based and will not support dynamic metadata, however you can use Script Component as #nitin-raj suggested to handle all known source columns. There is a good post below on how it can be done.
Dynamic File Connections
If you have many such files that can have varying columns then it is better to create a custom component.However, you cannot have dynamic metadata even with custom component, the set of columns should be known upfront to SSIS.
If the list of columns keep changing and you cannot know in advance what are expected columns then you are better off handling the entire thing in C#/VB.Net using Script Task of control flow
As a best practice, because SSIS meta data is static, any data quality and formatting issues in source files should be corrected before ssis data flow task runs.
I have seen this situation before and there is a very simple fix. In the beginning of your ssis package, using a file task to create copy of the source excel file and then run a c# script or execute a powershell to rename the columns so that if column 1 does not exist, it is either added at the appropriate spot in excel file or in case the column name is wrong is it corrected.
As a result of this, you will not need to refresh your ssis meta data every time it fails. This is a standard data standardization practice.
The easiest way is to add two data flow tasks, one data flow for each Excel source select statement and use precedence constraints to execute the second data flow when the first one fails.
The disadvantage of this approach is that if the first data flow task fails for another reason, it will also try to execute the second one. You will need some advanced error handling to check if the error is thrown due to missing columns or not.
But if have a similar situation, I will use a Script Task to check if the column exists and build the SQL command dynamically. Note that this SQL command must always return the same metadata (you must use aliases).
Helpful links
Overview of SSIS Precedence Constraints
Working with Precedence Constraints in SQL Server Integration Services
Precedence Constraints
Related
I have a ETL project that I need to load data from some 50K Access .MDB databases in a folder to sql server. Problem with those 50K databases files is that they have different schemas and I need the ETL process to be able to identify the differences and respond correctly.
For example, in some of the .MDB files there are table A, B and C. However in some other tables there are only table A and B (Same table A and B as compared to the other tables, just table C is missing). I need to put a check on each OLE DB source to see what tables are there to achieve logic like IF table A exists, load table A, otherwise, bypass the load.
I've done my googling and searched SO but all the error handling or check methods I could find are for the execute SQL task or data conversion task. So if anyone could shed some light on solution to my above case, I would be deeply appreciated.
Thanks.
In a nutshell - SSIS assumes that metadata does not change.
However, with some tricks, this restriction can be reduced; below is the list of suggested tricks:
Test for existence of specific table (see example here How to use SQL to query the metadata in Microsoft Office Access? Like SQL Server's sys.tables, sys.columns etc) and based on the result - do conditional execution of the following tasks.
All SQL requests to MS Access tables should have DelayValidation property set to True. Reason - postpone SQL command validation from package start to specific task execution. Some tasks (for missing tables) will not be executed; thus, it will not be validated and will not fire validation error.
Will come directly to the question.
Have 2 parameter like filename and table name. The requirement is to upload the data from the excel sheet to the database table enter in the other parameter. This should be in run time. No hardcoding of field names and that program should be flexible enough to suite any table. Please help.
I can think of two possible approaches:
Dynamic code generation -- write a program which writes a program
Use dynamic type tools
For 1. try googling
For 2. see https://wiki.scn.sap.com/wiki/display/Snippets/Example+-+create+a+dynamic+internal+table - this wiki shows a way (not sure if it is overkill as it creates the type from scratch whereas any table in your SAP system is already a defined type in the Data Dictionary).
You can do easily reference a parameterised table in Open SQL e.g. MODIFY (p_tab) ...
Perhaps you could do a generic SPLIT of a line read in from file by the delimiter into a table of fields - you can then use ASSIGN COMPONENT to match the fields you have read in to the fields in your internal type.
If you are doing this I think a white list of allowed tables would be wise - and auth checks. Otherwise someone could upload SAP standard tables with no authorisation.
I have a series of task that are very similar:
SELECT a,b FROM c
Lookup in another table and change value in column b.
Save new value back to c and if not match, send the result on to an error table.
That part is pretty straight forward and illustrated here:
Source ==> Lookup =match=> SQL Update command
=No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: When I insert the Script Command it asks me if it is a Source, Destination or Transscript script. And by only setting the variable it does not produce any rows for output and cannot connect to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup and Error update as well - therefore it is not more simple just to change the SQL script in the initial Source! But being able to do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable as an expression in your Dataflow. You can parametrize the query used in the lookup or any other parameters of your dataflow.
If your dataflows really have always the same structure, you could even generate a list of queries and call your dataflow task in a loop, preventing the duplication of the same tasks.
Team,
My objective is to data load from Excel to Sql Tables using SSIS. However the excels are quite dynamic i.e. their column count could vary OR the order of existing columns may change. But the destination table will be the same...
So I was contemplating on few options like:
1) Using SQL Command in "Excel Source" - But unfortunately I have to keep "first row as header" setting as false(To resolve the issue of Excel Connection Mngr sensing the datatype as numeric based on first few records). So the querying based on header doesnt work here.
2) The other oprtion in my mind is Script Task and write C# code to read excel based on the columns I know. So in this case the order and insertion/deletion of new columns won't matter.
Suggest me whether Script Task is the only option available for me? Any other simple way to achieve the same in SSIS? Also if possible suggest me a reference for the same.
Thanks,
Justin Samuel.
If you need to automate the process, then I'd definitely go with a script component / OleDbDataAdapter combo (you can't use a streamreader because Excel is a proprietary format). If not, go with the import wizard.
If you try to use a connection manager based solution, it's going to fail when the file layout changes. With the script component / OleDbDataAdapter combo, you can add logic in to interpret the fields and standardize the record layout before loading. You can also create an error buffer and gracefully push error values to it with Try / Catch.
Here's some links on how to use the script component as a source in the data flow task:
http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source-2.html
http://beyondrelational.com/modules/2/blogs/106/posts/11126/ssis-script-component-split-single-row-to-multiple-rows.aspx
This could be done easily using "Import and Export Data" tool available with SQL Server.
Step 1: Specify your Excel as source and your SQL Server DB as destination.
Step 2: Provide necessary mappings.
Step: 3 In the final screen, you can specify to "Save as SSIS Package" and to File System. A relevant dtsx SSIS package would be created for you.
After the SQL Server Import and Export Wizard has created the package and copied the data, you can use the SSIS Designer to open and change the saved package by adding tasks, transformations, and event-driven logic.
(Since it works based on Header, order should not matter. And if a particular column is missing, it should automatically take NULL for that)
Reference: http://msdn.microsoft.com/en-us/library/ms140052.aspx
I am currently using a database that is poorly designed and a slow pipeline so i decided to copy a small portion of the database (15 tables) and only bring over some of those tables for example i want to bring only the rows that have a certain id.
But this is not a one time move i need to add all the stuff that is added to the old database added to the new one on an hourly basis. My research has led me to SSIS and that it may have a way of accomplishing this but i have found no clear examples on how it is done if in fact it is possible. Thanks in advance.
Yes it is possible . You can schedule your ssis package through sql agent to run on hourly basis .
For a table ,you can drag a Data Flow Task on to the control flow .Inside DFT ,you need to place an oledb Source component ,Lookup ,Data conversion (if the types are different in source and target table) and Oledb destination .
oledb Source component : Create a variable of type string and in the expression write your sql query to fetch the data based on ID.Now use this variable in source component.
Lookup: You need to select your source table and combine the primary key column from the source and destination table.It acts similar to inner join query .After combining the primary key from the both the tables ,select the columns which you need from the source .
Oledb destination : Simply select your target table and map the columns from Lookup no matched output .If you need to update the values from the source then use Lookup matched output and connect it to an execute SQL task and write the update query .
Please go thru the link and SO
Scheduling of SSIS package