SQL Server 2008 INSERT Optimization

I have to INSERT a lot of rows (more than 1,000,000,000) into a SQL Server database. The table has an auto-increment Id, two varchar(80) columns and a smalldatetime column with GETDATE() as the default value. The last one is just for auditing, but it is necessary.
I'd like to know the best (fastest) way to INSERT the rows. I've been reading about BULK INSERT, but if possible I'd like to avoid it, because the app does not run on the same server where the database is hosted and I'd like to keep them as isolated as possible.
Thanks!
Diego

Another option would be bcp.
Alternatively, if you're using .NET you can use the SqlBulkCopy class to bulk insert data. I've blogged about its performance, comparing SqlBulkCopy with another way of bulk loading data into SQL Server from .NET (using SqlDataAdapter), which you may find interesting. As a basic example, loading 100,000 rows took 0.8229s using SqlBulkCopy vs. 25.0729s using the SqlDataAdapter approach.
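For reference, a minimal SqlBulkCopy sketch (the connection string, the destination table name dbo.AuditRows and the column names are placeholders I've made up, not details from the question):

using System.Data;
using System.Data.SqlClient;

// Build a DataTable matching the destination's two varchar(80) columns; the
// identity column and the defaulted smalldatetime are filled in by the server.
var table = new DataTable();
table.Columns.Add("Col1", typeof(string));
table.Columns.Add("Col2", typeof(string));
foreach (var item in rowsToLoad)                // rowsToLoad: your in-memory data
    table.Rows.Add(item.Value1, item.Value2);   // Value1/Value2: placeholder fields

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.AuditRows";
        bulk.ColumnMappings.Add("Col1", "Col1");
        bulk.ColumnMappings.Add("Col2", "Col2");
        bulk.BatchSize = 50000;                 // tune for your environment
        bulk.WriteToServer(table);
    }
}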

Create an SSIS package that copies the file to the SQL Server machine and then uses a Data Flow task to import the data from the file into the SQL Server database.

There is no faster or more efficient way than BULK INSERT, and when you're dealing with such a large amount of data, do not even think about anything from .NET: thanks to the GC, managing millions of objects in memory causes massive performance degradation.

Related

SQL Server 2008: What is the best way to insert a big chunk of data?

We need to extract 54M rows from one database into another. The columns of the two tables are similar but not exactly the same, so there is some conversion work to do. I've started with a cursor, but is there a better and more performance-friendly way to insert a big chunk of data?
Performance- and logging-wise, the best options for moving large amounts of data are SSIS or other bulk operations such as a BCP export/import.
As far as performance goes, I would suggest the following:
1) Create a stored procedure to do the task - you can call the stored procedure from SSIS.
2) Add a SQL Agent job if necessary.

Insert data from C# into SQL Server 2008 R2

I have nearly 7 billion rows of data in memory (List<T> and SortedList<T,T>) in C#. I want to insert this data into tables in SQL Server. To do this, I define a separate SqlConnection for each collection and turn connection pooling off.
First, I tried to insert the data in connected mode (ExecuteNonQuery). Even though I used Parallel.Invoke and called all the insert methods for the different collections concurrently, it is too slow and so far I haven't been able to finish it (I couldn't see any difference between sequential and concurrent inserts).
I also tried to create a DataTable object. To fill the tables I read all the data from the collections once and added it to the DataTable. In this case I set BatchSize = 10000 and BulkCopyTimeout = 0 on SqlBulkCopy. But this was also very slow.
How can I insert a huge amount of data into SQL Server quickly?
Look for 'BULK INSERT'. The technique is available in various RDBMSs. Basically, you create a (text) file with one line per record and tell the server to consume this text file. This is the fastest approach I can think of; I import 50 million rows in a couple of seconds that way.
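For SQL Server specifically, the statement being described looks roughly like this; a sketch only, issued from .NET for consistency with the rest of the thread, where the table name and file path are placeholders (and the path is resolved on the SQL Server machine, not the client):

using System.Data.SqlClient;

// BULK INSERT runs server-side, so 'C:\load\rows.txt' must be a path (or share)
// visible to the SQL Server machine. dbo.MyTable is a placeholder table name.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(
        @"BULK INSERT dbo.MyTable
          FROM 'C:\load\rows.txt'
          WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000);",
        conn);
    cmd.CommandTimeout = 0;   // a load this size can exceed the default timeout
    cmd.ExecuteNonQuery();
}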
You already discovered SqlBulkCopy, but you say it is slow. This can be for two reasons:
You are using batches that are too small. Try streaming the rows in using a custom IDataReader that you pass to WriteToServer (or just use bigger DataTables); see the sketch below.
Your table has nonclustered indexes. Disable them before the import and rebuild them afterwards.
You can't go faster than with bulk import, though.
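A minimal sketch combining both points, assuming made-up names (dbo.BigTable, IX_BigTable_Col1) and an example batch size: the rows are streamed through an IDataReader instead of being buffered in one giant DataTable, and a nonclustered index is disabled before the load and rebuilt afterwards.

using System.Data;
using System.Data.SqlClient;

void BulkLoad(string connectionString, IDataReader sourceRows)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Point 2: disable nonclustered indexes before the load.
        new SqlCommand("ALTER INDEX IX_BigTable_Col1 ON dbo.BigTable DISABLE;", conn)
            .ExecuteNonQuery();

        // Point 1: stream the rows and use a large batch size.
        using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
        {
            bulk.DestinationTableName = "dbo.BigTable";
            bulk.BatchSize = 100000;
            bulk.BulkCopyTimeout = 0;           // no timeout for a long-running load
            bulk.WriteToServer(sourceRows);     // IDataReader: rows are never all in memory
        }

        // Rebuild the disabled indexes once the data is in.
        new SqlCommand("ALTER INDEX IX_BigTable_Col1 ON dbo.BigTable REBUILD;", conn)
            .ExecuteNonQuery();
    }
}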

Impact of bulk insertion and bulk deletion on MS SQL Server 2008

Does anybody know what the impact on an MS SQL 2008 database is of executing INSERT and DELETE statements for around 100,000 records per run, repeated over a period of time?
I heard from my client that with MySQL, for its specific data types, after loading and clearing the database over a period of time the data becomes fragmented/corrupted. I wonder if this also happens with MS SQL? Or what would the possible impact on the database be?
Right now the statements we use to load and reload the data into all the tables in the database are simple INSERT and DELETE statements.
Please advise. Thank you in advance! :)
-Shen
The transaction log will likely grow due to all the inserts/deletes, and depending on the data being deleted/inserted and the table structure, there will likely be data fragmentation.
The data won't be 'corrupted' - if this is happening in MySQL, it sounds like a bug in that particular storage engine. Fragmentation shouldn't corrupt a database, but it does hamper performance.
You can combat this using a table rebuild, a table recreate, or a reorganise.
There's plenty of good info regarding fragmentation online. A good article is here:
http://www.simple-talk.com/sql/database-administration/defragmenting-indexes-in-sql-server-2005-and-2008/
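If you want a starting point in code, here is a rough sketch (database, table and index names are placeholders, and the 30% threshold is just the commonly quoted rule of thumb for choosing between reorganise and rebuild):

using System;
using System.Data.SqlClient;

// Check average fragmentation for a table's indexes, then reorganise or rebuild.
// Placeholders: MyDb, dbo.MyTable, IX_MyTable_Col1.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    var check = new SqlCommand(
        "SELECT MAX(avg_fragmentation_in_percent) " +
        "FROM sys.dm_db_index_physical_stats(DB_ID('MyDb'), " +
        "OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED');", conn);
    var fragmentation = Convert.ToDouble(check.ExecuteScalar());

    string action = fragmentation > 30
        ? "ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable REBUILD;"
        : "ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable REORGANIZE;";
    new SqlCommand(action, conn).ExecuteNonQuery();
}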

Select 100K+ records quickly

I need to select some 100k+ records from a SQL table, do some processing, and then do a bulk insert into another table. I am using SqlBulkCopy to do the bulk insert, which runs quickly. To fetch the 100k+ records, I am currently using a DataReader.
Problem: sometimes I get a timeout error in the DataReader. I have increased the timeout to a manageable number.
Is there anything like SqlBulkCopy for selecting records in bulk batches?
Thanks!
Bala
It sounds like you should do all your processing inside SQL Server, or split the data into chunks.
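For the "split into chunks" option, a rough sketch of reading the source table in keyed chunks and bulk copying each one (table, column and connection names are placeholders, and Process() stands in for whatever processing you do):

using System;
using System.Data;
using System.Data.SqlClient;

long lastId = 0;
const int chunkSize = 50000;

using (var conn = new SqlConnection(sourceConnectionString))
{
    conn.Open();
    while (true)
    {
        // Keyset paging on the identity column avoids one huge, long-running SELECT.
        var cmd = new SqlCommand(
            "SELECT TOP (@n) Id, Col1, Col2 FROM dbo.Source " +
            "WHERE Id > @lastId ORDER BY Id;", conn);
        cmd.Parameters.AddWithValue("@n", chunkSize);
        cmd.Parameters.AddWithValue("@lastId", lastId);
        cmd.CommandTimeout = 0;                          // avoid the reader timeout

        var chunk = new DataTable();
        using (var reader = cmd.ExecuteReader())
            chunk.Load(reader);
        if (chunk.Rows.Count == 0) break;

        lastId = Convert.ToInt64(chunk.Rows[chunk.Rows.Count - 1]["Id"]);
        Process(chunk);                                  // your processing step

        using (var bulk = new SqlBulkCopy(destinationConnectionString))
        {
            bulk.DestinationTableName = "dbo.Destination";
            bulk.WriteToServer(chunk);
        }
    }
}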
A quote from this MSDN page:
Note
No special optimization techniques exist for bulk-export operations. These operations simply select the data from the source table by using a SELECT statement.
However, on that same page, it mentions that the bcp utility can "bulk export" data from SQL Server to a file.
I suggest you try your query with bcp and see if it's significantly faster. If it's not, I'd give up and try fiddling with your batch sizes, or look harder at moving the processing into SQL Server.
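If you do try bcp from .NET, one way is simply to shell out to it in queryout mode; a sketch where the server, database, table, output file and even the use of a trusted connection (-T) are assumptions:

using System.Diagnostics;

// bcp's queryout mode exports the result of an arbitrary SELECT to a flat file.
// Placeholders: MyServer, MyDb, dbo.Source, C:\temp\out.dat.
var psi = new ProcessStartInfo
{
    FileName = "bcp",
    Arguments = "\"SELECT Id, Col1, Col2 FROM MyDb.dbo.Source\" queryout " +
                "\"C:\\temp\\out.dat\" -c -t, -S MyServer -T",
    UseShellExecute = false
};
using (var bcp = Process.Start(psi))
{
    bcp.WaitForExit();
}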

MS-SQL Bulk Insert with RODBC

Is it possible to perform a bulk insert into an MS-SQL Server (2000, 2005, 2008) using the RODBC package?
I know that I can do this using freebcp, but I'm curious if the RODBC package implements this portion of the Microsoft SQL API and if not, how difficult it would be to implement it.
Check out the new odbc and DBI packages. DBI::dbWriteTable writes around 20,000 records per second... much, much faster than the row inserts from RODBC::sqlSave().
You're probably looking for ?sqlSave, which uses a parameterized INSERT INTO query (taking place in one operation) when you set fast=TRUE.
Now you can use dbBulkCopy from the new rsqlserver package:
A typical scenario:
You create a matrix
You save it as a csv file
You call dbBulkCopy to read the file and insert it, internally using the bcp tool of MS SQL Server.
This assumes that your table is already created in the database:
nrow <- 10000; ncol <- 2                                  # example dimensions
dat <- matrix(round(rnorm(nrow * ncol), 2), nrow, ncol)   # dummy data
id.file <- "temp_file.csv"
write.csv(dat, file = id.file, row.names = FALSE)
dbBulkCopy(conn, 'NEW_BP_TABLE', value = id.file)
Using RODBC, the fastest insert we've been able to create (260 million row insert) looks like the following (in R pseudo code):
ourDataFrame <- sqlQuery(OurConnection, "SELECT myDataThing1, myDataThing2
FROM myData")
ourDF <- doStuff(ourDataFrame)
write.csv(ourDF,ourFile)
sqlQuery(OurConnection, "CREATE TABLE myTable ( la [La], laLa [LaLa]);
BULK INSERT myTable FROM 'ourFile'
WITH YOURPARAMS=yourParams;")
If you're running this between servers, you need a network drive that the R server can write to (e.g. one server with permissions for writing to the DB uses Rscript to productionize the code) and that the SQL Server can read from.
From everything I can find, there is NO solution for bulk insert to MySQL, and nothing that works with SSIS, which is why Microsoft is including in-database analytics in SQL Server 2016 after buying Revolution Analytics.
I tried to comment on the previous answer but don't have the reputation to do it.
The rsqlserver package needs to run with rClr, and neither of those packages is well-behaved, especially because rsqlserver's INSERT functions have poor data type handling. So if you use it, you'll have no idea what you're looking at in the SQL table, as much of the information in your data.frame will have been transformed.
Considering the RODBC package has been around for 15 years, I'm pretty disappointed that no one has created a bulk insert function...
Our n2khelper package can use bcp (bulk copy) when it is available. When it is not available, it falls back to multiple INSERT statements.
You can find the package on https://github.com/INBO-Natura2000/n2khelper
Install it with devtools::install_git("https://github.com/INBO-Natura2000/n2khelper") and look for the odbc_insert() function.