Why is Bulk Import faster than a bunch of INSERTs? - sql

I'm writing my graduate thesis on methods of importing data from a file into a SQL Server table. I have written my own program and am now comparing it with some standard methods such as bcp, BULK INSERT, INSERT ... SELECT * FROM OPENROWSET(BULK...), etc. My program reads lines from a source file, parses them, and imports them one by one using ordinary INSERTs. The file contains 1 million lines with 4 columns each. My program takes 160 seconds, while the standard methods take 5-10 seconds.
So the question is: why are BULK operations faster? Do they use some special mechanism? Can you please explain it, or point me to some useful links?

BULK INSERT can be a minimally logged operation (depending on various parameters such as indexes, constraints on the tables, the recovery model of the database, etc.). Minimally logged operations log only allocations and deallocations. In the case of BULK INSERT, only extent allocations are logged instead of the actual data being inserted. This can provide much better performance than INSERT.
Compare Bulk Insert vs Insert
The actual advantage is the reduced amount of data written to the transaction log.
Under the BULK_LOGGED or SIMPLE recovery model, the advantage is significant.
Optimizing BULK Import Performance
You should also consider reading this answer: Insert into table select * from table vs bulk insert
By the way, the following factors influence BULK INSERT performance:
Whether the table has constraints or triggers, or both.
The recovery model used by the database.
Whether the table into which data is copied is empty.
Whether the table has indexes.
Whether TABLOCK is being specified.
Whether the data is being copied from a single client or copied in parallel from multiple clients.
Whether the data is to be copied between two computers on which SQL Server is running.

I think you can find a lot of articles on this; just search for "why bulk insert is faster". For example, this seems to be a good analysis:
https://www.simple-talk.com/sql/performance/comparing-multiple-rows-insert-vs-single-row-insert-with-three-data-load-methods/
Generally, any database has a lot of work to do for a single insert: checking constraints, updating indexes, flushing to disk. When many rows arrive in one operation, the database can optimize this work instead of having the engine called row by row.

First of all, inserting row by row is not optimal. See this article on set logic and this article on the fastest way to load data into SQL Server.
Second, BULK import is optimized for large loads. This has everything to do with page flushing, writing to the log, indexes, and various other things in SQL Server. There's a TechNet article on how you can optimize BULK INSERTs, which sheds some light on why BULK is faster. But I can't link more than twice, so you'll have to google for "Optimizing Bulk Import Performance".

Related

Import in PL/SQL Developer using sqlldr is very slow

I have a large .sql file (with 1 million records) containing INSERT statements.
It is provided by an external system I have no control over.
I have to import this data into my database table. I thought it was a simple job, but alas, how wrong I was.
I am using PL/SQL Developer from Allround Automations. I went to Tools -- Import Tables -- SQL Inserts, pointed the executable to sqlldr.exe, and set the input to my .sql file with the INSERT statements.
But this process is very slow, inserting only around 100 records a minute; I was expecting the whole process to take no more than an hour.
Is there a better way to do this? It sounds simple to just import all the data, but it takes a hell of a lot of time.
P.S.: I am a developer, not a DBA, and not an Oracle expert, so any help is appreciated.
When running massive numbers of INSERTs, you should first drop all indexes on the table, then disable all constraints, then run your INSERT statements. You should also modify your script to include a COMMIT after every 1000 records or so. Afterwards, re-create your indexes, re-enable all constraints, and gather statistics on the table (DBMS_STATS.GATHER_TABLE_STATS).
Best of luck.
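A rough sketch of that drop/load/rebuild sequence, using Python's sqlite3 module as a stand-in for Oracle (an assumption; in Oracle the corresponding pieces are DROP INDEX / CREATE INDEX, ALTER TABLE ... DISABLE CONSTRAINT, and DBMS_STATS). The table `target` and index `ix_target_code` are hypothetical, and the constraint-disabling step is omitted for brevity.

```python
import sqlite3

BATCH_SIZE = 1000  # "a COMMIT after every 1000 records or so"

def load_with_index_rebuild(conn, rows):
    # 1. Drop the index so individual INSERTs do not have to maintain it.
    conn.execute("DROP INDEX IF EXISTS ix_target_code")
    # 2. Insert in batches, committing after each batch.
    for start in range(0, len(rows), BATCH_SIZE):
        with conn:
            conn.executemany(
                "INSERT INTO target (id, code) VALUES (?, ?)",
                rows[start:start + BATCH_SIZE],
            )
    # 3. Rebuild the index once, over the complete data set.
    conn.execute("CREATE INDEX ix_target_code ON target (code)")
    # 4. Refresh optimizer statistics; ANALYZE is SQLite's rough
    #    counterpart to Oracle's DBMS_STATS.GATHER_TABLE_STATS.
    conn.execute("ANALYZE target")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, code TEXT)")
    conn.execute("CREATE INDEX ix_target_code ON target (code)")
    load_with_index_rebuild(conn, [(i, f"c{i % 50}") for i in range(5000)])
    print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])
```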

Logging of different kinds of INSERT statements

There are four ways of inserting data into a table in SQL:
INSERT INTO TableName (ColumnList) VALUES (ValuesList)
INSERT INTO TableName (ColumnList) SELECT Columns FROM OtherTable
INSERT INTO TableName (ColumnList) EXEC SomeProc
SELECT Columns INTO TableName FROM OtherTable
Every INSERT statement is logged in the transaction log. My question is: which kind of INSERT has minimal logging?
In what order should they be used, based on performance?
The Data Loading Performance Guide has a good summary of minimally logged operations:
To support high-volume data loading scenarios, SQL Server implements
minimally logged operations. Unlike fully logged operations, which use
the transaction log to keep track of every row change, minimally
logged operations keep track of extent allocations and metadata
changes only. Because much less information is tracked in the
transaction log, a minimally logged operation is often faster than a
fully logged operation if logging is the bottleneck. Furthermore,
because fewer writes go to the transaction log, a much smaller log file
with a lighter I/O requirement becomes viable.
Out of the different types of insert statements you provided, two can be classified as bulk load operations, which have the opportunity to be minimally logged if other prerequisites have been met:
INSERT ... SELECT – The method for performing bulk load in process with SQL Server from local queries or any OLE DB source. This method is only available as a minimally logged operation in SQL Server 2008 and later.
SELECT INTO – The method for creating a new table containing the results of a query; utilizes bulk load optimizations.
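Both bulk-capable forms are set-based: a single statement moves every row. The sketch uses Python's sqlite3 module as a stand-in (an assumption); it shows only the shape of the statements, not SQL Server's logging behavior, which depends on the prerequisites this answer goes on to list. The tables `source`, `dest` and `dest2` are hypothetical.

```python
import sqlite3

def copy_set_based(conn):
    # INSERT ... SELECT: a single statement copies every qualifying row
    # into an existing table, with no per-row round trips from the client.
    conn.execute("CREATE TABLE dest (id INTEGER, name TEXT)")
    with conn:
        conn.execute(
            "INSERT INTO dest (id, name) SELECT id, name FROM source WHERE id % 2 = 0"
        )
    # SQL Server's SELECT ... INTO creates the target table from the query.
    # SQLite has no SELECT INTO; CREATE TABLE ... AS SELECT is the closest analogue.
    conn.execute("CREATE TABLE dest2 AS SELECT id, name FROM source WHERE id % 2 = 1")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
    with conn:
        conn.executemany(
            "INSERT INTO source VALUES (?, ?)", [(i, f"n{i}") for i in range(10)]
        )
    copy_set_based(conn)
    for table in ("dest", "dest2"):
        print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```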
However, note that there are prerequisites and conditions that need to be met for one of these bulk load operations to be minimally logged:
Running the database under the Bulk-Logged or Simple recovery models
Enabling trace flag 610 (for indexed tables) if you're running SQL Server 2008 or newer
Whether or not the table has a clustered index
Whether or not the table is empty
Even the execution plan chosen by the optimizer
If you meet these conditions, you may see better performance from a bulk-logged insert, as described in the article.
But again, the prerequisites are fairly complex, so I would recommend reading the article before creating or changing commands in the expectation that they will be minimally logged.
EDIT:
One clarification: it is the recovery model of the destination database that matters. For example, suppose you're inserting into a temporary table from tables in a database that uses the full recovery model. Since the temporary table resides in tempdb, which uses the simple recovery model, the insert into the temporary table is a good candidate to be written as a bulk-logged operation so that it is minimally logged.

Minimally Logged Insert Into

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged, as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with the simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS, drop all constraints, especially unique and primary key ones, load the data in, and add the constraints back. I load about 100GB in just over an hour like this, with fairly minimal overhead. I am using the BULK LOGGED recovery model, which just logs the existence of new extents during the load, and then you can remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and should therefore solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to keep individual transactions small. If you still have problems, look into deploying trace flag 610; see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
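The staging pattern in this answer can be sketched with Python's sqlite3 module, whose per-connection `temp` schema plays the role of tempdb here (an analogy, not an equivalent: SQLite has no recovery models or trace flags). The tables `staging` and `final_data`, the cleaning rule, and `BATCH_SIZE` are all hypothetical.

```python
import sqlite3

BATCH_SIZE = 10_000  # hypothetical; tune so each transaction stays small

def load_via_temp_staging(conn, raw_rows):
    # Stage raw data in the connection's "temp" schema (stand-in for tempdb).
    conn.execute("CREATE TEMP TABLE staging (id INTEGER, amount_text TEXT)")
    with conn:
        conn.executemany("INSERT INTO temp.staging VALUES (?, ?)", raw_rows)
    # Intermediate transformation runs entirely against the staging table.
    with conn:
        conn.execute(
            "DELETE FROM temp.staging WHERE amount_text IS NULL OR amount_text = ''"
        )
    # Copy only the final, cleaned rows into the destination, one id range
    # per transaction, so that each transaction stays small.
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM temp.staging").fetchone()
    if lo is not None:
        for start in range(lo, hi + 1, BATCH_SIZE):
            with conn:
                conn.execute(
                    "INSERT INTO main.final_data (id, amount) "
                    "SELECT id, CAST(amount_text AS REAL) FROM temp.staging "
                    "WHERE id >= ? AND id < ?",
                    (start, start + BATCH_SIZE),
                )
    conn.execute("DROP TABLE temp.staging")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE final_data (id INTEGER PRIMARY KEY, amount REAL)")
    load_via_temp_staging(conn, [(1, "10.5"), (2, ""), (3, "4")])
    print(conn.execute("SELECT * FROM final_data").fetchall())
```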

How many inserts can you have in a SQL transaction

I have a task that will require me to use a transaction to ensure that many inserts are either all completed or the entire update is rolled back.
I am concerned about the amount of data that needs to be inserted in this transaction and whether this will have a negative effect on the server.
We are looking at about 10,000 records into table1 and 60,0000 records into table2.
Is this safe to do in a single transaction?
Have you thought about using a bulk data loader like SSIS or the data import wizard that comes with SQL Server?
The data import wizard is pretty simple.
In Management Studio, right-click the database you want to import data into, then select Tasks and Import Data. Follow the wizard prompts. If a record fails, the whole transaction will fail.
I have loaded millions of records this way (and using SSIS).
It is safe; however, keep in mind that you might be blocking other users during that time. Also take a look at bcp or BULK INSERT to make the inserts faster.
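A minimal sketch of the all-or-nothing behavior, using Python's sqlite3 module as a stand-in for SQL Server (an assumption). The table names follow the question; the schemas and the duplicate-key failure are hypothetical. SQLite's locking is database-wide, so it does not reproduce SQL Server's finer-grained blocking behavior.

```python
import sqlite3

def insert_all_or_nothing(conn, rows1, rows2):
    """Insert into both tables inside ONE transaction.

    Returns True if everything was committed, False if the whole
    transaction was rolled back because some row violated a constraint.
    """
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK on any exception
            conn.executemany("INSERT INTO table1 (id) VALUES (?)", rows1)
            conn.executemany("INSERT INTO table2 (id, t1_id) VALUES (?, ?)", rows2)
        return True
    except sqlite3.IntegrityError:
        return False

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER)")
    return conn

if __name__ == "__main__":
    conn = make_db()
    rows1 = [(i,) for i in range(10_000)]
    # Hypothetical failure: the last row repeats primary key 0.
    rows2 = [(i, i % 10_000) for i in range(60_000)] + [(0, 0)]
    print(insert_all_or_nothing(conn, rows1, rows2))
    print(conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0])
```

Because both tables are written inside one `with conn:` block, the duplicate key in `table2` also undoes the rows already inserted into `table1`.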

How to efficiently insert data into an index-rich Oracle DB?

How do we insert about 2 million rows into an Oracle database table that has many indexes?
I know that one option is disabling the indexes and then inserting the data. Can anyone tell me what the other options are?
Bulk load with data presorted in index key order.
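A small sketch of presorting, using Python's sqlite3 module as a stand-in for Oracle (an assumption). The idea is that rows arriving in index key order let the B-tree index be filled sequentially rather than at random positions. The table `keyed`, the index `ix_keyed_k`, and the generated keys are hypothetical; no particular speedup is claimed.

```python
import random
import sqlite3

def load_presorted(conn, rows):
    # Sort on the indexed column before loading, so the index B-tree is
    # filled in key order instead of being updated at random positions.
    ordered = sorted(rows, key=lambda row: row[0])
    with conn:
        conn.executemany("INSERT INTO keyed (k, payload) VALUES (?, ?)", ordered)

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keyed (k TEXT, payload TEXT)")
    conn.execute("CREATE INDEX ix_keyed_k ON keyed (k)")
    return conn

if __name__ == "__main__":
    rows = [(f"k{i:06d}", "x") for i in range(100_000)]
    random.shuffle(rows)  # simulate an unsorted input file
    conn = make_db()
    load_presorted(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM keyed").fetchone()[0])
```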
Check out SQL*Loader (especially the section on performance optimization): it is the standard bulk loading utility for Oracle, and it does a good job once you know how to use it (as always with Oracle).
There are many tricks to speed up the insert; here are some of them:
If you use sequence.nextval for the insert, make sure the sequence has a large cache value (1000 is usually enough).
Drop the indexes before the insert and re-create them afterwards (make sure you save the index creation scripts before dropping them); when re-creating them you can use the PARALLEL option.
If the target table has FK dependencies, disable them before the insert and enable them again afterwards. If you are sure of your data, you can use the NOVALIDATE option (NOVALIDATE is valid for Oracle; other RDBMSs probably have a similar option).
If you select and insert, you can give a PARALLEL hint for the SELECT statement, and for the insert you can use the APPEND hint (direct-path insert). Direct-path insert is an Oracle concept; other RDBMSs probably have a similar option.
I'm not sure how you are inserting the records, but if you can, insert the data in smaller chunks. In my experience, 50 sets of 20k records is often quicker than 1 x 1,000,000.
Make sure your database files are large enough before you start; this saves you from database growth during the insert.
If you are sure about the data, then besides the indexes you can also disable referential and constraint checks. You can also lower the transaction isolation level.
All these options come with a price, though. Each option increases your risk of having corrupt data, in the sense that you may end up with null FKs, etc.
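What "disable the checks now, pay the price later" looks like can be sketched with Python's sqlite3 module (a stand-in and an assumption; in Oracle the counterparts would be ALTER TABLE ... DISABLE CONSTRAINT and, optionally, ENABLE NOVALIDATE). The `parent` and `child` tables are hypothetical. The orphaned `parent_id` 99 loads without complaint, and only an explicit check afterwards reveals it.

```python
import sqlite3

def load_without_fk_checks(conn, child_rows):
    # PRAGMA foreign_keys is a no-op inside an open transaction, so it is
    # switched off before the load transaction and back on after it.
    conn.execute("PRAGMA foreign_keys = OFF")
    with conn:
        conn.executemany(
            "INSERT INTO child (id, parent_id) VALUES (?, ?)", child_rows
        )
    conn.execute("PRAGMA foreign_keys = ON")
    # Re-enabling does not re-check rows that are already there, which is
    # how orphaned references slip in; validate explicitly afterwards.
    return conn.execute("PRAGMA foreign_key_check(child)").fetchall()

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES parent(id))"
    )
    with conn:
        conn.execute("INSERT INTO parent (id) VALUES (1)")
    return conn

if __name__ == "__main__":
    conn = make_db()
    print(load_without_fk_checks(conn, [(10, 1), (11, 99)]))
```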
As another option, you can use Oracle's more advanced and faster Data Pump utilities (expdp, impdp), available from 10g onward. Oracle still supports the old export/import utilities (exp, imp), though.
Oracle provides us with many choices for data loading, some way faster than others:
Oracle10 Data Pump
Oracle import utility
SQL insert and merge statements
PL/SQL bulk loads for the forall PL/SQL operator
SQL*Loader
The pros and cons of each can be found here.