Team,
My objective is to load data from Excel into SQL tables using SSIS. However, the Excel files are quite dynamic, i.e. their column count could vary or the order of existing columns may change. The destination table will stay the same.
So I was contemplating a few options:
1) Using a SQL command in the "Excel Source". Unfortunately I have to keep the "first row as header" setting set to false (to work around the Excel Connection Manager inferring the data type as numeric from the first few records), so querying by header name doesn't work here.
2) The other option I have in mind is a Script Task with C# code that reads the Excel file based on the columns I know. In this case the order and the insertion/deletion of columns won't matter.
Is a Script Task the only option available to me? Is there any other simple way to achieve the same in SSIS? If possible, please also suggest a reference.
Thanks,
Justin Samuel.
If you need to automate the process, then I'd definitely go with a script component / OleDbDataAdapter combo (you can't use a streamreader because Excel is a proprietary format). If not, go with the import wizard.
If you try to use a connection manager based solution, it's going to fail when the file layout changes. With the script component / OleDbDataAdapter combo, you can add logic in to interpret the fields and standardize the record layout before loading. You can also create an error buffer and gracefully push error values to it with Try / Catch.
Here are some links on how to use the script component as a source in the data flow task:
http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source-2.html
http://beyondrelational.com/modules/2/blogs/106/posts/11126/ssis-script-component-split-single-row-to-multiple-rows.aspx
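To illustrate the "interpret the fields and standardize the record layout" idea from the answer above, here is a minimal, language-neutral sketch in Python. Inside SSIS the same logic would live in the C# script component; the destination column names and the `standardize_rows` helper are hypothetical and only serve the example. It assumes the sheet has already been read into a list of rows with the header in the first row.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical destination layout; replace with the real table's columns.
DESTINATION_COLUMNS = ["CustomerId", "Name", "Amount"]


def standardize_rows(
    rows: Sequence[Sequence[Any]],
    expected: Sequence[str] = DESTINATION_COLUMNS,
) -> List[Dict[str, Optional[Any]]]:
    """Map raw sheet rows (first row = header) onto a fixed column layout."""
    if not rows:
        return []
    header = ["" if cell is None else str(cell).strip().lower() for cell in rows[0]]
    # Position of each expected column in this particular file, or None if absent.
    positions = {
        name: header.index(name.lower()) if name.lower() in header else None
        for name in expected
    }
    records = []
    for raw in rows[1:]:
        record = {}
        for name, pos in positions.items():
            # Missing columns and short rows both become None (NULL on load).
            record[name] = raw[pos] if pos is not None and pos < len(raw) else None
        records.append(record)
    return records


# Columns reordered, one unknown column present, "Name" missing entirely.
sheet = [
    ["Amount", "Extra", "CustomerId"],
    ["10.5", "x", "42"],
]
print(standardize_rows(sheet))
```

Because columns are looked up by name rather than position, reordering or adding columns in the file does not change the output layout, and unknown columns are simply ignored.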
This could be done easily using "Import and Export Data" tool available with SQL Server.
Step 1: Specify your Excel as source and your SQL Server DB as destination.
Step 2: Provide necessary mappings.
Step 3: In the final screen, you can choose "Save as SSIS Package" and save it to the File System. A corresponding .dtsx SSIS package will be created for you.
After the SQL Server Import and Export Wizard has created the package and copied the data, you can use the SSIS Designer to open and change the saved package by adding tasks, transformations, and event-driven logic.
(Since it works based on Header, order should not matter. And if a particular column is missing, it should automatically take NULL for that)
Reference: http://msdn.microsoft.com/en-us/library/ms140052.aspx
I am using a SELECT statement in the Excel source to import only specific columns from an Excel file.
Is it possible to select data so that, for example, I select the column named Column_1, but if that column does not exist in the Excel file, the column named Column_2 is selected instead? Currently, if Column_1 is missing, the data flow task fails.
Use a Script Task and write .NET code that reads the Excel file and checks whether Column_1 is available in it. If the column is not present, use Column_2 as the input. A script in SSIS can act as a source.
SSIS is metadata based and does not support dynamic metadata. However, you can use a Script Component, as @nitin-raj suggested, to handle all known source columns. There is a good post below on how it can be done.
Dynamic File Connections
If you have many such files with varying columns, it is better to create a custom component. However, you cannot have dynamic metadata even with a custom component; the set of columns must be known to SSIS up front.
If the list of columns keeps changing and you cannot know in advance which columns to expect, you are better off handling the entire thing in C#/VB.NET using a Script Task in the control flow.
As a best practice, because SSIS metadata is static, any data quality and formatting issues in source files should be corrected before the SSIS data flow task runs.
I have seen this situation before and there is a very simple fix. At the beginning of your SSIS package, use a File System Task to create a copy of the source Excel file, then run a C# script or a PowerShell script to fix up the columns: if Column 1 does not exist, it is added at the appropriate spot in the Excel file, and if the column name is wrong, it is corrected.
As a result, you will not need to refresh your SSIS metadata every time the package fails. This is a standard data standardization practice.
The easiest way is to add two data flow tasks, one data flow for each Excel source select statement and use precedence constraints to execute the second data flow when the first one fails.
The disadvantage of this approach is that if the first data flow task fails for another reason, it will also try to execute the second one. You will need some advanced error handling to check if the error is thrown due to missing columns or not.
But if I had a similar situation, I would use a Script Task to check whether the column exists and build the SQL command dynamically. Note that this SQL command must always return the same metadata (you must use aliases).
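A minimal sketch of that "check the column, then build the command with an alias" step, written in Python for brevity (in the package itself this would be C# in a Script Task). The function name, the alias `SourceValue`, and the sheet name `Sheet1$` are assumptions for illustration; the list of available column names is assumed to have been read from the file beforehand.

```python
from typing import Iterable, Sequence


def build_select(
    available: Iterable[str],
    candidates: Sequence[str],
    alias: str = "SourceValue",   # hypothetical fixed output name
    sheet: str = "Sheet1$",       # hypothetical worksheet name
) -> str:
    """Return a SELECT using the first candidate column that exists in the sheet."""
    # Case-insensitive lookup that keeps the header's original spelling.
    by_lower = {name.strip().lower(): name.strip() for name in available}
    for candidate in candidates:
        actual = by_lower.get(candidate.lower())
        if actual is not None:
            # The alias keeps the result metadata identical whichever column is used.
            return f"SELECT [{actual}] AS [{alias}] FROM [{sheet}]"
    raise ValueError(f"None of {list(candidates)} found in the sheet header")


print(build_select(["Column_2", "Other"], ["Column_1", "Column_2"]))
```

Since the output column is always named by the alias, the downstream data flow sees the same metadata regardless of which source column was picked.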
Helpful links
Overview of SSIS Precedence Constraints
Working with Precedence Constraints in SQL Server Integration Services
Precedence Constraints
We want some of our customers to be able to export some data into a file and then we have a job that imports that into a blank copy of a database at our location. Note: a DBA would not be involved. This would be a function within our application.
We can ignore table schema differences - they will match. We have different tables to deal with.
So on the customer side the function would run something like:
insert into myspecialstoragetable select * from source_table
insert into myspecialstoragetable select * from source_table_2
insert into myspecialstoragetable select * from source_table_3
I then run a select * from myspecialstoragetable and get a .sql file that they can ship to me, which we can then import into our copy of the db with some job or SQL script.
I'm thinking we can use XML somehow, but I'm a little lost.
Thanks
Have you looked at the bulk copy utility bcp? You can wrap it with your own program to make it easier for less sophisticated users.
Since it is a function within your application, in what language is the application front end written? If it is .NET, you can use Data Transformation Services in SQL Server to do a sample export. In the last step, you can save the steps into a VB/.NET module. If necessary, modify this file to change table names etc., then integrate this DTS module into your application. While doing the sample export, export to a suitable format such as CSV or Excel, whichever format you will be able to import into a blank database.
Every time the user wants to do an export, he will click a button that invokes the DTS module integrated into your application, which dumps the data to the desired format. He can then mail the file to you.
If your application is not written in .NET, in whichever language it is written, it will have options to read data from SQL Server and dump them to a .CSV or text file with delimiters. If it is a primitive language, you may have to do it by concatenating the fields of every record, by looping through the records and writing to a file.
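As a rough sketch of the "loop through the records and write a delimited file" approach, here is one way it could look in Python using the standard `csv` module, which takes care of quoting fields that contain the delimiter. The `write_delimited` helper and the sample data are hypothetical; the records are assumed to have already been fetched from the database.

```python
import csv
import io
from typing import Any, Iterable, Sequence, TextIO


def write_delimited(
    out: TextIO,
    columns: Sequence[str],
    records: Iterable[Sequence[Any]],
    delimiter: str = ",",
) -> int:
    """Write a header row plus one line per record; return the record count."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    count = 0
    for record in records:
        # Represent NULLs as empty fields.
        writer.writerow(["" if value is None else value for value in record])
        count += 1
    return count


# Hypothetical sample rows; in practice these come from a database cursor.
buffer = io.StringIO()
written = write_delimited(
    buffer, ["id", "note"], [(1, "plain"), (2, "has, comma"), (3, None)]
)
print(buffer.getvalue())
```

Hand-concatenating fields works too, but then embedded delimiters and quotes have to be escaped manually, which is the part the `csv` writer handles here.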
XML would be overkill for this, though it's not impossible. At your end, you would need to be able to parse the XML file and import it into your database. Also, XML is not really suited to very large numbers of records.
You are probably thinking of a .sql file as in MySQL. In SQL Server, the .sql files generated by the 'Generate Scripts' function of SQL Server's interface are used for table structures/DDL rather than for insert statements containing each record's values.
I want to transfer SQL query results to a new CSV file. I have placed my SQL query inside a loop, which should export the query results to a CSV file on each iteration. I'm using MS SQL Server 2012. I don't want to use a GUI option.
SQL Server is not really designed to import and export files. You can use the bulk copy program, but I don't think it works from T-SQL code (looping). You can use OPENROWSET, but you need to set a special flag that increases your attack surface, which some do not want to do.
The answer is SSIS (or a tool like Talend). It comes with SQL Server and is designed by MS as the go-to tool for import and export. If you right-click on the database and choose Tasks and then Export, the wizard eventually creates and executes an SSIS package.
I recommend you reconsider a GUI option.
P.S. Another answer was to use "Save Results As". I have heard of problems with this method, including problems with delimiters and text-qualified fields.
There are multiple ways to achieve this. You can export the result set using BCP, using Import/Export, or using CTRL+SHIFT+S (which opens Save As for the result set). Hope this helps.
What's the easiest way to export data to Excel from SQL Server 2000?
I want to do this from commands I can type into query analyzer.
I want the column names to appear in row 1.
In Query Analyzer, go to the Tools -> Options menu. On the Results tab, choose to send your output to a CSV file and select the "Print column headers" option. The CSV will open in Excel and you can then save it as a .XLS/.XLSX
Manual copy and paste is the only way to do exactly what you're asking. Query Analyzer can include the column names when you copy the results, but I think you may have to enable that somewhere in the options first (it's been a while since I used it).
Other alternatives are:
Write your own script or program to convert a result set into a .CSV or .XLS file
Use a DTS package to export to Excel
Use bcp.exe (but it doesn't include column names, so you have to kludge it)
Use a linked server to a blank Excel sheet and INSERT the data
Generally speaking, you cannot export data from MSSQL to a flat file using pure TSQL, because TSQL cannot manipulate anything outside the database (using a linked server is sort of cheating). So you usually need to use some sort of client application anyway, whether it's bcp.exe, dtswiz.exe or your own program.
And as a final comment, MSSQL 2000 is no longer supported (unless your company has an extended maintenance agreement) so you may want to look into upgrading at some point.
Say the source data comes in Excel format. Below is how I import the data.
1) Convert to CSV format via MS Excel
2) Roughly find bad rows/columns by inspection
3) Back up the table that needs to be updated in SQL Query Analyzer
4) Truncate the table (may need to drop foreign key constraints as well)
5) Import data from the revised CSV file in SQL Server Enterprise Manager
6) If there's an error like duplicate columns, check the original CSV and remove them
I was wondering how to make every step of this procedure more efficient. I have some ideas, but they are not complete.
For steps 2 and 6, use scripts that check automatically and print out all error rows/columns, so it's easier to remove all errors at once.
For steps 3 and 5, is there any way to update the table automatically without manually going through the import steps?
Could the community advise, please? Thanks.
I believe in SQL 2000 you still have DTS (Data Transformation Services) part of Enterprise Manager. Using that you should be able to create a workflow that does all of these steps in sequence. I believe it can actually natively import Excel as well. You can run everything from SQL queries to VBScript so there's pretty much nothing you can't do.
I used to use it for these kinds of bucket brigade jobs all the time.
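For the automated check the question asks about (steps 2 and 6), a small script can report every problem in one pass instead of one import error at a time. The sketch below is in Python rather than VBScript and is only one possible approach; `find_csv_problems` is a hypothetical helper, and the only checks assumed are duplicate column names and rows whose field count differs from the header.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO


def find_csv_problems(source: TextIO) -> List[str]:
    """Report duplicate header names and rows with the wrong number of fields."""
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    # Compare header names case-insensitively, ignoring surrounding spaces.
    counts = Counter(name.strip().lower() for name in header)
    for name, n in counts.items():
        if n > 1:
            problems.append(f"duplicate column '{name}' appears {n} times")
    for row in reader:
        if len(row) != len(header):
            problems.append(
                f"line {reader.line_num}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


# Hypothetical sample file: duplicated "id" column and one short row.
sample = io.StringIO("id,name,id\n1,a,1\n2,b\n")
for problem in find_csv_problems(sample):
    print(problem)
```

Running a check like this on the exported CSV before the import step lets all bad rows and columns be fixed at once.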