I have a task to import multiple Excel files into a SQL Server table. During the iteration, some Excel files have a changed header name or a data type issue, which makes the DFT fail, stops any further iteration, and fails the whole package.
I want to log the names of the files that fail and keep loading the remaining Excel files with the DFT until the last one. At the end I would like to have all the failed file names in one place so that I can correct them and try again. Can anybody help me out in this case?
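One common pattern (a sketch, not the only way) is to stop errors from propagating out of the Data Flow Task and to record the current file name each time it fails. The variable names below (User::CurrentFileName, User::FailedFilesLog) are assumptions; the snippet is the Main body of a Script Task placed in the DFT's OnError event handler, with the handler's Propagate system variable set to False so the Foreach Loop keeps running:

// Script Task body inside the Data Flow Task's OnError event handler.
// Assumed variables: User::CurrentFileName is filled by the Foreach Loop,
// User::FailedFilesLog holds the path of a text file collecting failures.
using System.IO;

public void Main()
{
    string fileName = Dts.Variables["User::CurrentFileName"].Value.ToString();
    string logPath  = Dts.Variables["User::FailedFilesLog"].Value.ToString();

    // Append the failing file name so it can be corrected and re-run later.
    File.AppendAllText(logPath, fileName + System.Environment.NewLine);

    Dts.TaskResult = (int)ScriptResults.Success;
}

At the end of the package run, the log file contains exactly the list of files that need correction and a second pass.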
Related
I have an Excel workbook that contains a series of sheets for import into a database. For most of the sheets I'm using a query to get at the data along the lines of:
SELECT [F1], [F2] FROM [Sheet Name$A2:B10000]
In my development I had a workbook with some test data for each sheet. Now someone else is using the package to import a copy of the workbook, but one of the sheets is blank because they don't want any data loaded for that particular table. Unfortunately, when the package tries to run that portion of the import it errors out with a VS_NEEDSNEWMETADATA error.
This can't be a unique problem, but I haven't been able to find any solutions online. Any suggestions on how to get around it?
In a similar case, I would use a Script Task to check whether the Excel sheet contains data; if it does, the Data Flow Task is executed (using precedence constraints).
Also, make sure to set the Data Flow Task's DelayValidation property to True.
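A rough sketch of that Script Task is below. It counts the rows in the same range the question's source query uses, via the ACE OLE DB provider, and exposes the result through a boolean variable that the precedence constraint can test. The variable names User::ExcelFilePath and User::SheetHasData are assumptions:

// Script Task body: set User::SheetHasData to true only if the range has rows.
using System;
using System.Data.OleDb;

public void Main()
{
    string excelPath = Dts.Variables["User::ExcelFilePath"].Value.ToString();
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
                     ";Extended Properties=\"Excel 12.0 Xml;HDR=NO\"";

    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        // COUNT(F1) ignores completely blank rows in the range.
        var cmd = new OleDbCommand("SELECT COUNT(F1) FROM [Sheet Name$A2:B10000]", conn);
        int rows = Convert.ToInt32(cmd.ExecuteScalar());
        Dts.Variables["User::SheetHasData"].Value = rows > 0;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}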
Useful links:
Check if Excel file is empty in C#
How to check if Excel-workbook is empty?
Working with Precedence Constraints in SQL Server Integration Services
I am creating multiple Excel files from an SSIS package. I assign a dynamic file name to each workbook in a For Each Loop, using a File System Task that copies a blank template from the template directory to the output directory where a few hundred Excel files need to be written. Part of the file name comes from the values in the loop.
Each file has 2 spreadsheets, and a different SQL query is the source for each. When I output files with only 1 spreadsheet filled there is no problem. However, when I add the second source/destination pair or another data flow, I get an error that the connection string is in an invalid format, or similar. Should I use a second set of OLE DB Source / Excel Destination within the same data flow, or use another data flow for the second spreadsheet?
I am trying to import a CSV into MSSQL 2008 using the flat file import method, but I am getting an Overflow error. Any ideas on how to get around it?
I have used the tool before for files containing up to 10K-15K records, but this file has 75K records in it...
These are the error messages
Messages
Error 0xc020209c: Data Flow Task 1: The column data for column "ClientBrandID" overflowed the disk I/O buffer.
(SQL Server Import and Export Wizard)
Error 0xc0202091: Data Flow Task 1: An error occurred while skipping data rows.
(SQL Server Import and Export Wizard)
Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Source - Shows_csv" (1) returned error code 0xC0202091. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
(SQL Server Import and Export Wizard)
This could be a format problem with the CSV file, e.g. the delimiter. Check whether the delimiters are consistent within the file.
It could also be a problem of blank lines. I had a similar problem a while ago and solved it by removing all blank lines from the CSV file. Worth a try anyway.
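If the file is too big to eyeball, a small console check like the one below can flag both issues. It is only a sketch: it assumes a comma delimiter and does not handle quoted fields, and the path is a placeholder:

// Report blank lines and lines whose delimiter count differs from the header,
// the two usual causes of the buffer overflow error described above.
using System;
using System.IO;

class CsvCheck
{
    static void Main()
    {
        const char delimiter = ',';
        string[] lines = File.ReadAllLines(@"C:\data\Shows.csv"); // hypothetical path
        int expected = lines[0].Split(delimiter).Length;

        for (int i = 1; i < lines.Length; i++)
        {
            if (lines[i].Trim().Length == 0)
                Console.WriteLine("Blank line at row " + (i + 1));
            else if (lines[i].Split(delimiter).Length != expected)
                Console.WriteLine("Unexpected delimiter count at row " + (i + 1));
        }
    }
}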
You may have one or more bad data elements. Try loading a small subset of your data to determine whether it's a small number of bad records or a large one. This will also tell you whether your loading scheme is working and your data types match.
Sometimes you can quickly spot data issues if you open the CSV file in Excel.
Another possible reason for this error is that the input file has the wrong encoding, so when you check the data manually it looks fine. For example, in my case the correct files were 8-bit ANSI and the wrong files were UTF-16; you can tell the difference by looking at the file size, as the wrong files were twice as big as the correct ones.
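A quick way to confirm the encoding, without relying on file size, is to look at the byte order mark at the start of the file. This is a minimal sketch with an assumed path:

// FF FE (or FE FF) indicates UTF-16, EF BB BF indicates UTF-8 with a BOM;
// anything else is most likely plain ANSI or BOM-less UTF-8.
using System;
using System.IO;

class EncodingCheck
{
    static void Main()
    {
        byte[] bom = new byte[3];
        using (var fs = File.OpenRead(@"C:\data\input.csv")) // hypothetical path
            fs.Read(bom, 0, 3);

        if (bom[0] == 0xFF && bom[1] == 0xFE) Console.WriteLine("UTF-16 LE");
        else if (bom[0] == 0xFE && bom[1] == 0xFF) Console.WriteLine("UTF-16 BE");
        else if (bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF) Console.WriteLine("UTF-8 with BOM");
        else Console.WriteLine("No BOM - probably ANSI or UTF-8");
    }
}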
I'm looking for alternate data import solutions. Currently my process is as follows:
Open a large xlsx file in Excel
Replace all "|" (pipes) with a space or another unique character
Save the file as pipe-delimited CSV
Use the import wizard in SQL Server Management Studio 2008 R2 to import the CSV file
The process works; however, steps 1-3 take a long time since the files being loaded are extremely large (approx. 1 million records).
Based on some research, I've found a few potential solutions:
a) Bulk Import - this unfortunately does not eliminate steps 1-3 mentioned above, since the files still need to be converted to a flat (or CSV) format
b) OPENROWSET/OPENDATASOURCE - there are 2 issues with this approach. First, it takes a long time to load (about 2 hours for a million records). Second, when I try to load many files at once (about 20 files, each containing 1 million records), I receive an "out-of-memory" error
I haven't tried SSIS; I've heard it has issues with large xlsx files
So this leads to my question: are there any solutions or alternate options out there that will make importing large Excel files faster?
Really appreciate the help.
I love Excel as a data visualization tool, but it's pants as a data transport layer. My preference is either to query it with the JET/ACE driver or to use C# for non-tabular data.
I haven't cranked it up to the millions, but I'd have to believe the first approach would be faster than your current one, simply because you do not have to perform double reads and writes of your data.
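To make the first approach concrete, here is a minimal sketch of reading the worksheet through the ACE OLE DB provider and streaming it into SQL Server with SqlBulkCopy, skipping the manual CSV conversion entirely. The file path, sheet name, connection string and table name are all assumptions, and the sketch assumes the source columns line up with the destination table:

using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

class ExcelBulkLoad
{
    static void Main()
    {
        string excelConn = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                           "Data Source=C:\\data\\Large.xlsx;" +           // assumed path
                           "Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

        using (var src = new OleDbConnection(excelConn))
        using (var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", src))  // assumed sheet
        {
            src.Open();
            using (IDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy("Server=.;Database=Target;Integrated Security=SSPI"))
            {
                bulk.DestinationTableName = "dbo.ImportTarget";  // assumed table
                bulk.BulkCopyTimeout = 0;                        // no timeout for large loads
                bulk.WriteToServer(reader);                      // streams rows, no full in-memory copy
            }
        }
    }
}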
Excel Source as Lookup Transformation Connection
script task in SSIS to import excel spreadsheet
Something I have done before (and I bring it up because I see your file type is XLSX, not XLS) is open the file through WinZip, pull the XML data out, and then import it. Starting in 2007, an XLSX file is really a zip file with many folders/files in it. If the Excel file is simple (not a lot of macros, charts, formatting, etc.), you can just pull the data from the XML file that is in the background. I know you can see it through WinZip; I don't know about other compression apps.
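The same idea can be scripted instead of going through WinZip. A small sketch (paths are placeholders) using the built-in zip support in .NET 4.5+:

// Requires references to System.IO.Compression and System.IO.Compression.FileSystem.
using System.IO.Compression;

class XlsxUnzip
{
    static void Main()
    {
        using (ZipArchive archive = ZipFile.OpenRead(@"C:\data\Large.xlsx")) // assumed path
        {
            // Cell values live in xl/worksheets/sheet1.xml; text cells are
            // looked up by index in xl/sharedStrings.xml.
            archive.GetEntry("xl/worksheets/sheet1.xml")
                   .ExtractToFile(@"C:\data\sheet1.xml", true);

            ZipArchiveEntry strings = archive.GetEntry("xl/sharedStrings.xml");
            if (strings != null)
                strings.ExtractToFile(@"C:\data\sharedStrings.xml", true);
        }
    }
}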
I have an SSIS package which exports data from a table to an Excel file.
Control Flow:
Data Flow:
These are my steps:
Drop Excel Table
Create Excel table with the format of the Select query I use to retrieve data from the database
Insert Data from Database to Excel file
I used a query like: Select * From Table Where --Some Condition
I retrieve 3000 rows out of 10000 rows and put those 3000 rows in my Excel sheet.
But when I open my Excel sheet I see a scrollbar which goes all the way to the 10000th row before it ends, and hence my Excel file size also increases. How can I reduce my Excel file size? My Excel sheet contains only 3000 rows, so why are there blank cells that go until the 10000th row?
SQL Server 2008 &
Visual Studio 2008 with BIDS
I believe your issue is with the method you are using to create the file. You have two alternatives, and both should fix your issue:
Solution #1:
You can create an Excel file with those predefined columns, essentially your empty output file - this would act as your 'Template File'. Your flow would then be this:
File System Task - Copy template file to output or working directory (rename if necessary)
OLEDB Source Task - Query your source for the data (3000)
Data Conversion Task
Excel Destination Task - Put data into new Excel file
Note: You already have steps 2 through 3 complete; you just need to make sure you are connecting to the new Excel file. Also, to clarify, step 1 sits in the Control Flow, outside the Data Flow Task.
This way is helpful because you always have a blank and consistently formatted Excel file to copy and work with.
Solution #2:
The other option is to use a Script Task and create the Excel file there; you could also load the data into the file in the same task. This requires some basic understanding of VB.NET or C#. Basically, you would need an Excel library (like NPOI). This is more complicated, but gives you the most functionality.
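As a very small NPOI sketch (column names, sample rows and the output path are placeholders, not your actual query results): a workbook built this way contains only the rows you actually write, so there are no trailing blank rows inflating the file.

using System.IO;
using NPOI.XSSF.UserModel;

class CreateWorkbook
{
    static void Main()
    {
        var workbook = new XSSFWorkbook();            // .xlsx; use HSSFWorkbook for .xls
        var sheet = workbook.CreateSheet("Export");

        // Header row.
        var header = sheet.CreateRow(0);
        header.CreateCell(0).SetCellValue("Id");
        header.CreateCell(1).SetCellValue("Name");

        // Data rows would normally come from your query; two sample rows here.
        for (int i = 1; i <= 2; i++)
        {
            var row = sheet.CreateRow(i);
            row.CreateCell(0).SetCellValue(i);
            row.CreateCell(1).SetCellValue("Row " + i);
        }

        using (var fs = new FileStream(@"C:\output\Export.xlsx", FileMode.Create))
            workbook.Write(fs);
    }
}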
I recommend you try solution #1 and see how that works for you.
Drop Table SheetName doesn't delete the sheet; it just deletes the rows. If the first time you loaded 10K rows and then executed the package again restricting the number of rows to 3K, the Excel file will still contain those 10K empty rows, because it retains the sheet along with the empty cells.
You can use a Script Task to delete the sheet using COM objects, but for that you need to install the Excel PIA (Primary Interop Assembly) so the assembly is visible to the script environment, or else create a new Excel file every time the package runs.
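A hedged sketch of that COM approach is below. It assumes the Excel PIA (Microsoft.Office.Interop.Excel) is referenced, Excel is installed on the machine running the package, and the workbook path and sheet name are placeholders:

// Script Task body: remove the worksheet itself, not just its rows.
using Excel = Microsoft.Office.Interop.Excel;

public void Main()
{
    var app = new Excel.Application();
    Excel.Workbook wb = app.Workbooks.Open(@"C:\output\Export.xlsx"); // assumed path
    try
    {
        app.DisplayAlerts = false;                           // suppress the delete prompt
        Excel.Worksheet ws = (Excel.Worksheet)wb.Sheets["SheetName"]; // assumed sheet name
        ws.Delete();
        wb.Save();
    }
    finally
    {
        wb.Close(false);
        app.Quit();
        System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}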
Otherwise, as suggested by Nicarus, use a File System Task to delete the existing file and create a new Excel file on every execution.
Diagram:
File System Task:
Use the same components as before: the Create Table query in an Execute SQL Task, plus your DFT.