Looking for efficient methods of loading large Excel (xlsx) files into SQL Server

I'm looking for alternate data import solutions. Currently my process is as follows:
1. Open a large xlsx file in Excel
2. Replace all "|" (pipes) with a space or another unique character
3. Save the file as a pipe-delimited CSV
4. Use the Import Wizard in SQL Server Management Studio 2008 R2 to import the CSV file
The process works; however, steps 1-3 take a long time since the files being loaded are extremely large (approx. 1 million records).
Based on some research, I've found a few potential solutions:
a) Bulk Import - This unfortunately does not eliminate steps 1-3 above, since the files still need to be converted to a flat (or CSV) format first
b) OpenRowSet/OpenDataSource - There are two issues with this approach. First, it takes a long time to load (about 2 hours for a million records). Second, when I try to load many files at once (about 20 files, each containing 1 million records), I receive an "out-of-memory" error
I haven't tried SSIS; I've heard it has issues with large xlsx files.
So this leads to my question: are there any solutions or alternate options out there that will make importing large Excel files faster?
Really appreciate the help.

I love Excel as a data visualization tool but it's pants as a data transport layer. My preference is to either query it with the JET/ACE driver or use C# for non-tabular data.
I haven't cranked it up to the millions, but I'd have to believe the first approach would be faster than your current process, simply because you don't have to perform double reads and writes of your data. A rough sketch of that approach follows the links below.
Excel Source as Lookup Transformation Connection
script task in SSIS to import excel spreadsheet
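Here is a minimal sketch of that first approach, assuming the ACE 12.0 OLEDB provider is installed and the data sits on Sheet1 with a header row; the file path, connection strings, destination table name, and batch size are all placeholders. It reads the workbook as a forward-only stream and hands the reader straight to SqlBulkCopy, so nothing is written to an intermediate CSV.

using System.Data.OleDb;
using System.Data.SqlClient;

class ExcelToSqlServer
{
    static void Main()
    {
        // Placeholder paths/names - adjust for your environment
        string xlsxPath = @"C:\data\large_file.xlsx";
        string sqlConnStr = @"Data Source=.;Initial Catalog=Staging;Integrated Security=SSPI";

        // ACE OLEDB connection string for .xlsx; HDR=YES treats row 1 as column names
        string excelConnStr =
            "Provider=Microsoft.ACE.OLEDB.12.0;" +
            "Data Source=" + xlsxPath + ";" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\"";

        using (OleDbConnection excelConn = new OleDbConnection(excelConnStr))
        using (SqlConnection sqlConn = new SqlConnection(sqlConnStr))
        {
            excelConn.Open();
            sqlConn.Open();

            // Read the sheet as a forward-only stream instead of loading it all into memory
            OleDbCommand cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", excelConn);
            using (OleDbDataReader reader = cmd.ExecuteReader())
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConn))
            {
                bulkCopy.DestinationTableName = "dbo.ImportStaging";
                bulkCopy.BatchSize = 5000;        // commit in batches
                bulkCopy.BulkCopyTimeout = 0;     // no timeout for large loads
                bulkCopy.WriteToServer(reader);   // streams rows as they are read
            }
        }
    }
}

Because WriteToServer takes the data reader directly, the rows are never fully materialized in memory, which should help with the out-of-memory errors you hit when loading 20 files in a row.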

Something I have done before (and I bring it up because I see your file type is XLSX, not XLS) is to open the file through WinZip, pull the XML data out, then import it. Starting with Office 2007, an XLSX file is really a zip file with many folders/files in it. If the Excel file is simple (not a lot of macros, charts, formatting, etc.), you can just pull the data from the XML file that is in the background. I know you can see it through WinZip; I don't know about other compression apps.
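If you'd rather script that than click around in WinZip, here's a rough sketch using .NET's built-in ZipArchive (requires .NET 4.5, and assumes the first worksheet is stored at xl/worksheets/sheet1.xml, which is the usual layout). Keep in mind that text cells in the sheet XML are just indexes into xl/sharedStrings.xml, so that file is pulled out too.

using System;
using System.IO;
using System.IO.Compression;   // add a reference to System.IO.Compression.FileSystem

class ExtractSheetXml
{
    static void Main()
    {
        // Placeholder paths for illustration
        string xlsxPath = @"C:\data\large_file.xlsx";
        string outputDir = @"C:\data\extracted";

        Directory.CreateDirectory(outputDir);

        using (ZipArchive archive = ZipFile.OpenRead(xlsxPath))
        {
            // The first worksheet is normally stored at this path inside the archive
            ZipArchiveEntry sheet = archive.GetEntry("xl/worksheets/sheet1.xml");
            ZipArchiveEntry strings = archive.GetEntry("xl/sharedStrings.xml");

            if (sheet != null)
                sheet.ExtractToFile(Path.Combine(outputDir, "sheet1.xml"), true);

            // Shared strings hold the text values referenced by the sheet (may be absent)
            if (strings != null)
                strings.ExtractToFile(Path.Combine(outputDir, "sharedStrings.xml"), true);
        }

        Console.WriteLine("Worksheet XML extracted to " + outputDir);
    }
}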

Related

What is the best way to import data using insert statements into a table in MS SQL?

I have exported a table from another db into an .sql file as insert statements.
The exported file has around 350k lines in it.
When I try to simply run them, I get a "not enough memory" error before the execution even starts.
How can I import this file easily?
Thanks in advance,
Orkun
You have to manually split the sql file into smaller pieces. Use Notepad++ or some other editor capable of handling huge files.
Also, since you wrote that you have ONE table, you could try a utility or editor which can automatically split the file into pieces of a predefined size.
Use the SQLCMD utility (search the Microsoft documentation). With that you just need to give it some parameters; one of them is the file path. No need to go through the pain of splitting and other juggling.

Importing a massive text file into a SQL Server database

I am currently trying to import a text file with 180+ million records and 300+ columns into my SQL Server database. Needless to say, the file is roughly 70 GB. I have been at it for days, and every time I get close something happens and it craps out on me. I need the quickest and most efficient way to do this import. I have tried the wizard, which should have been the easiest, then I tried just saving it as an SSIS package. I haven't been able to figure out how to do a bulk import with the settings I think would work great. The error I keep getting is 'not enough virtual memory'. I changed my virtual memory to 36 gigs. My system has 24 gigs of physical memory. Please help me.
If you are using BCP (and you should be for files this large), use a batch size. Otherwise, BCP will attempt to load all records in one transaction.
By command line: bcp -b 1000
By C#:
using (System.Data.SqlClient.SqlBulkCopy bulkCopy =
    new System.Data.SqlClient.SqlBulkCopy(sqlConnection))
{
    bulkCopy.DestinationTableName = destinationTableName;
    bulkCopy.BatchSize = 1000;           // 1000 rows
    bulkCopy.WriteToServer(dataTable);   // May also pass in DataRow[]
}
Here are the highlights from this MSDN article:
Importing a large data file as a single batch can be problematic, so
bcp and BULK INSERT let you import data in a series of batches, each
of which is smaller than the data file. Each batch is imported and
logged in a separate transaction...
Try reducing the maximum server memory for SQL Server to as low as you can get away with. (Right click the SQL instance in Mgmt Studio -> properties -> memory).
This may free up enough memory for the OS & SSIS to process such a big text file.
I'm assuming the whole process is happening locally on the server.
I had a similar problem with SQL 2012 when trying to import (as a test) around 7 million records into a database. I too ran out of memory and had to cut the bulk import into smaller pieces. The one thing to note is that the import process (no matter what method you use) consumes a ton of memory and won't release it until the server is rebooted. I'm not sure if this is intended behavior by SQL Server, but it's something to note for your project.
Because I was using the SEQUENCE command with this process, I had to save the T-SQL as sql scripts and then run them through SQLCMD in small pieces to lessen the memory overhead.
You'll have to play around with what works for you, but I highly recommend not running the script all at once.
It's going to be a pain in the ass to break it down into smaller pieces and import them, but in the long run you'll be happier.

importing a text file using pgAdmin

I have just downloaded pgAdmin 1.14.3 in an effort to import, query, and manage large text files. These text files are either quote-comma-quote delimited or tab delimited (they come as quote-comma-quote and I edited many for use with other software). While version 1.16 adds an import function, it has not been released yet, and I am wondering how to import data into a newly created table using pgAdmin.
The text files range from 12MB to 2GB, so I'm looking for a comprehensive solution that would not involve importing row by row. I tried this with phpPgAdmin, but ran into file size limitations embedded in the php.ini file (separate post) and am trying this as a possible workaround. I'm a little new to SQL, so I'm not really sure of all the commands possible at my fingertips. Any help is appreciated - thanks!
You can issue a COPY statement, like this:
COPY table_name (column_name)
FROM 'd:\test.sql';
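-- For the quote-comma-quote files, COPY can parse them directly if you add: WITH (FORMAT csv, HEADER)
-- For the tab-delimited files, the default text format already uses tab as its delimiter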
Query returned successfully: 6 rows affected, 31 ms execution time.
See the documentation here:
http://www.postgresql.org/docs/9.1/static/sql-copy.html
Note that I did not test this in PgAdmin for large files, but using psql I have never seen a case where the file had been too big for COPY.

Bulk Exporting Non-Tabular Excel Data into a Database

We're facing a massive data migration that consists of about 1500 Excel Spreadsheets that were used to print out data. Because they were designed to be visual, the data is stored in several fields throughout the spreadsheet. I'm looking for a way to map out those fields, and then do a bulk import to bring all of it into a single large table (or series of tables, if needed).
If it were all in a table format this wouldn't be a problem - but I'm not sure of any way to import it and somehow map the fields to be imported.
Any thoughts or ideas?
If we're speaking in generalities, I'd look at performing this task via a .NET program and using the ACE OLEDB provider to query the Excel data. Then use the language of your choice to parse the data into something more manageable and write it to the database. This approach worked well for importing some very non-tabular data into SQL Server; there's a rough sketch of the idea after the links below.
Quick list of SO questions with C# examples of code to query Excel.
How to query excel file in C# using a detailed query
OleDB Read Excel decimal value
Importing excel files in c#
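As a rough illustration of the mapping idea (the file path, connection string, cell addresses, and the FieldMap helper are all made up for this sketch): the ACE provider lets you query an arbitrary rectangle of a sheet, e.g. [Sheet1$B3:B3], so each field in the printed layout can be described as a sheet-plus-range entry and read out individually.

using System;
using System.Data.OleDb;

class NonTabularExcelReader
{
    // Describes where one logical field lives in the spreadsheet layout
    class FieldMap
    {
        public string ColumnName;   // target column in the database
        public string SheetRange;   // e.g. "Sheet1$B3:B3"
    }

    static void Main()
    {
        string xlsxPath = @"C:\migration\workbook0001.xlsx";   // placeholder path
        string connStr =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + xlsxPath +
            ";Extended Properties=\"Excel 12.0 Xml;HDR=NO\"";

        // Hypothetical mapping for one spreadsheet template
        FieldMap[] map =
        {
            new FieldMap { ColumnName = "CustomerName", SheetRange = "Sheet1$B3:B3" },
            new FieldMap { ColumnName = "InvoiceTotal", SheetRange = "Sheet1$F20:F20" }
        };

        using (OleDbConnection conn = new OleDbConnection(connStr))
        {
            conn.Open();
            foreach (FieldMap field in map)
            {
                // Each single-cell range comes back as a one-row, one-column result
                using (OleDbCommand cmd = new OleDbCommand(
                    "SELECT * FROM [" + field.SheetRange + "]", conn))
                {
                    object value = cmd.ExecuteScalar();
                    Console.WriteLine(field.ColumnName + " = " + value);
                    // ...collect these into a DataRow per workbook, then bulk insert
                }
            }
        }
    }
}

The mapping array is the part you'd build once per spreadsheet template; looping it over 1500 files is then just a directory walk.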

Importing an .RPT (6 gigs) file into SQL Server 2005

I'm trying to import two separate .RPT files into SQL, one small, one large. Both have issues with determining where the columns are separated.
My solution for this was to import the file into access, define the columns and then save it as a txt file.
This worked perfectly.
The problem, however, is that the larger file is 6 gigs and MS Access won't allow me to open it. When I try simply changing the extension to .txt and importing it into SQL, everything comes in under one column (despite there being 10) and there is no way to accurately separate the data.
Please help!
As Tony stated, Access has a hard 2GB limit on database size.
You don't say what kind of file the .RPT file is. If it is a text file, then you could break it into smaller chunks by reading it line by line and appending it into temporary files. Then import/export these smaller files one at a time.
Keep in mind the 2GB limit is on the Access database, so your temporary text files will need to be somewhat smaller because the import will likely introduce some additional overhead. Also, you may need to compact/repair the database in between import/export cycles to reclaim space in the database; simply deleting the records is not enough.
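If you go the read-line-by-line route, here is a minimal sketch of that splitting step (the paths and the lines-per-chunk value are placeholders you'd tune so each piece stays well under the 2GB limit):

using System;
using System.IO;

class SplitLargeTextFile
{
    static void Main()
    {
        // Placeholder paths and chunk size
        string sourcePath = @"C:\data\large_report.txt";
        string outputDir  = @"C:\data\chunks";
        int linesPerChunk = 1000000;

        Directory.CreateDirectory(outputDir);

        int chunkNumber = 0;
        int linesInChunk = 0;
        StreamWriter writer = null;

        using (StreamReader reader = new StreamReader(sourcePath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Start a new chunk file whenever the current one is full
                if (writer == null || linesInChunk >= linesPerChunk)
                {
                    if (writer != null) writer.Dispose();
                    chunkNumber++;
                    writer = new StreamWriter(
                        Path.Combine(outputDir, "chunk_" + chunkNumber.ToString("D4") + ".txt"));
                    linesInChunk = 0;
                }

                writer.WriteLine(line);
                linesInChunk++;
            }
        }

        if (writer != null) writer.Dispose();
        Console.WriteLine("Wrote " + chunkNumber + " chunk files to " + outputDir);
    }
}

If the first line is a header you need in every piece, write it at the top of each new chunk file as well.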
If the file has column delimiters or fixed column widths you can try the following in SQL Management Studio:
Right click on a database, select "Tasks" and then "Import data...". This will take you through a wizard where you can define the source columns and map them to an existing or new table.