Logging of different kinds of INSERT statement

There are four ways of inserting data into a table in SQL:
INSERT INTO TableName (ColumnList) VALUES (ValuesList)
INSERT INTO TableName (ColumnList) SELECT Columns FROM OtherTable
INSERT INTO TableName (ColumnList) EXEC SomeProc
SELECT Columns INTO TableName FROM OtherTable
Every INSERT statement is logged in the transaction log, and my question is: which kind of INSERT gets minimal logging?
How would you rank them, based on performance?

The Data Loading Performance Guide has a good summary of minimally logged operations:
To support high-volume data loading scenarios, SQL Server implements
minimally logged operations. Unlike fully logged operations, which use
the transaction log to keep track of every row change, minimally
logged operations keep track of extent allocations and metadata
changes only. Because much less information is tracked in the
transaction log, a minimally logged operation is often faster than a
fully logged operation if logging is the bottleneck. Furthermore,
because fewer writes go to the transaction log, a much smaller log file
with a lighter I/O requirement becomes viable.
Out of the different types of insert statements you provided, two can be classified as bulk load operations, which have the opportunity to be minimally logged if other prerequisites have been met:
INSERT ... SELECT – The method for performing an in-process bulk load with SQL Server from local queries or any OLE DB source. This method is only available as a minimally logged operation in SQL Server 2008.
SELECT INTO – The method for creating a new table containing the results of a query; utilizes bulk load optimizations.
However, note that there are prerequisites and conditions that need to be met in order for one of these bulk load operations to be minimally logged...
Running the database under the Bulk-Logged or Simple recovery models
Enabling Trace Flag 610 if you're running SQL Server 2008 or newer
Whether or not the table has a clustered index
Whether or not the table is empty
Even the execution plan chosen by the optimizer
If you meet these conditions, then you may see better performance by performing a bulk logged insert as described in the article...
But again, the prerequisites for this happening are pretty complex, so I would recommend reading the article before creating / changing commands with the expectation that they will be minimally logged.
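To make this concrete, here is a hedged sketch of the kind of setup the article describes. The database and table names are placeholders, not taken from the question, and the insert is only minimally logged if the other prerequisites above actually hold:
-- MyDatabase, dbo.TargetHeap and dbo.SourceTable are placeholder names.
-- Use a recovery model on the destination database that permits minimal logging.
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;
-- SQL Server 2008+: trace flag 610 extends minimal logging to inserts into
-- tables that already have a clustered index (for newly allocated pages).
DBCC TRACEON (610, -1);
-- INSERT ... SELECT into a heap with a TABLOCK hint is the classic pattern.
INSERT INTO dbo.TargetHeap WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;
-- Switch back (and take a log backup) once the load has finished.
ALTER DATABASE MyDatabase SET RECOVERY FULL;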
EDIT:
One clarification... Note that it is the recovery model of the destination database that matters. For example, if you're inserting into a temporary table from tables in a database that uses the full recovery model, then because the temporary table resides in tempdb, which uses the simple recovery model, the insert into the temporary table is a good candidate to be written as a bulk-load operation so that it can be minimally logged.
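For illustration only (the table and database names here are made up), a load of this shape targets tempdb's simple recovery model regardless of how the source database is configured:
-- FullRecoveryDb.dbo.SourceTable is a placeholder name. #Staging lives in tempdb
-- (simple recovery), so the bulk load optimizations can apply even though the
-- source database uses the full recovery model.
SELECT Col1, Col2
INTO #Staging
FROM FullRecoveryDb.dbo.SourceTable;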

Related

Data flow insert lock

I have an issue with my data flow task locking. The task compares a couple of tables from the same server, and the result is inserted into one of the tables being compared. The table being inserted into is checked with a NOT EXISTS clause.
When performing a fast load, the task freezes without errors; when doing a regular insert, the task gives a deadlock error.
I have 2 other tasks that perform the same action against the same table and they work fine, but the amount of information being inserted is a lot smaller. I am not running these tasks in parallel.
I am considering using a NOLOCK hint to get around this, because this is the only task that writes to a certain table partition. However, I am only coming to this conclusion because I cannot figure out anything else, aside from using a temp table or a hashed anti join.
You probably have a so-called deadlock situation. Your Data Flow Task (DFT) has two separate connection instances to the same table: the first instance runs the SELECT and places a shared lock on the table, the second runs the INSERT and places a page or table lock.
A few words on the likely cause. The SSIS DFT reads table rows and processes them in batches. When the number of rows is small, the read completes within a single batch, and the shared lock is gone by the time the insert takes place. When the number of rows is substantial, SSIS splits them into several batches and processes them sequentially. This allows the steps following the DFT Data Source to run before the Data Source has finished reading.
This design, reading and writing the same table in the same Data Flow, invites exactly this kind of locking issue. Ways to work around it:
Move all the DFT logic into a single INSERT statement and get rid of the DFT. Might not be possible.
Split the DFT: move the data into an intermediate table, then move it to the target table with a subsequent DFT or SQL Command. An additional table is needed.
Set Read Committed Snapshot Isolation (RCSI) on the database and use Read Committed on the SELECT, as sketched below. Applicable to MS SQL databases only.
The most universal approach is the second one, with the additional table. The third is for MS SQL only.
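A hedged sketch of the third option (the database name is a placeholder); note that changing this setting needs exclusive access to the database, or the termination clause shown here:
-- MyDatabase is a placeholder name. Enable Read Committed Snapshot Isolation so
-- readers use row versions instead of shared locks (SQL Server only).
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- The DFT source query can then stay at the default READ COMMITTED level
-- without holding shared locks that collide with the insert destination.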

Why is Bulk Import faster than a bunch of INSERTs?

I'm writing my graduate work about methods of importing data from a file into a SQL Server table. I have created my own program and am now comparing it with standard methods such as bcp, BULK INSERT, INSERT ... SELECT * FROM OPENROWSET(BULK...), etc. My program reads lines from a source file, parses them and imports them one by one using ordinary INSERTs. The file contains 1 million lines with 4 columns each. The result is that my program takes 160 seconds while the standard methods take 5-10 seconds.
So the question is why are BULK operations faster? Do they use special means or something? Can you please explain it or give me some useful links or something?
BULK INSERT can be a minimally logged operation (depending on various
parameters like indexes, constraints on the tables, recovery model of
the database etc). Minimally logged operations only log allocations
and deallocations. In case of BULK INSERT, only extent allocations are
logged instead of the actual data being inserted. This will provide
much better performance than INSERT.
Compare Bulk Insert vs Insert
The actual advantage is to reduce the amount of data written to the transaction log.
In the case of the BULK_LOGGED or SIMPLE recovery model, the advantage is significant.
Optimizing BULK Import Performance
You should also consider reading this answer: Insert into table select * from table vs bulk insert
By the way, there are factors that will influence the BULK INSERT performance:
Whether the table has constraints or triggers, or both.
The recovery model used by the database.
Whether the table into which data is copied is empty.
Whether the table has indexes.
Whether TABLOCK is being specified.
Whether the data is being copied from a single client or copied in
parallel from multiple clients.
Whether the data is to be copied between two computers on which SQL
Server is running.
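To make some of those factors concrete, here is a hedged sketch of a bulk load that aims for minimal logging; the file path, table name and terminators are assumptions, not taken from the question:
-- dbo.ImportTarget and the file are hypothetical. TABLOCK enables the minimally
-- logged path on a heap; BATCHSIZE commits in chunks so no single transaction
-- grows too large.
BULK INSERT dbo.ImportTarget
FROM 'C:\data\source_file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK,
    BATCHSIZE = 100000
);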
I think you can find a lot of articles on it, just search for "why bulk insert is faster". For example this seems to be a good analysis:
https://www.simple-talk.com/sql/performance/comparing-multiple-rows-insert-vs-single-row-insert-with-three-data-load-methods/
Generally, a database does a lot of work for every single insert: checking constraints, maintaining indexes, flushing to disk. The database can optimize this work when many rows are handled in one operation instead of calling the engine row by row.
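As a rough illustration of that difference (dbo.Measurements is a hypothetical table), compare one statement per row with a single multi-row statement, or with one explicit transaction around the whole batch:
-- One round trip, one constraint check and one log flush per autocommitted row.
INSERT INTO dbo.Measurements (Id, Val) VALUES (1, 10.0);
INSERT INTO dbo.Measurements (Id, Val) VALUES (2, 10.5);
-- One statement amortizes that work across many rows.
INSERT INTO dbo.Measurements (Id, Val)
VALUES (3, 11.0), (4, 11.5), (5, 12.0);
-- Or keep single-row inserts but commit once, so the log is flushed at COMMIT
-- rather than after every autocommitted statement.
BEGIN TRANSACTION;
INSERT INTO dbo.Measurements (Id, Val) VALUES (6, 12.5);
INSERT INTO dbo.Measurements (Id, Val) VALUES (7, 13.0);
COMMIT TRANSACTION;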
First of all, inserting row for row is not optimal. See this article on set logic and this article on what's the fastest way to load data into SQL Server.
Second, BULK import is optimized for large loads. This has everything to do with page flushing, writing to the log, indexes and various other things in SQL Server. There's a TechNet article on how you can optimize BULK INSERTs, which sheds some light on how BULK is faster. But I can't link more than twice, so you'll have to google for "Optimizing Bulk Import Performance".

SQL Server locking on SELECT

I have built a web service that queries new data using an iterator (bigint) from a big table (100+ million rows) in an accounting system (SQL Server 2008 R2 Standard Edition).
The provider of the database has forced us to read uncommitted transactions to ensure that we do not lock up the table for inserts (updates are never made).
Lately this has caused us trouble due to rollbacks. The accounting system has rolled back rows that were already read by the web service, due to errors and timeouts, causing my system to store data that should never have existed.
I think reading committed data would solve this, but the accounting system provider will not let us, since they are worried that it will block inserts into the table.
Can the select actually block inserts and how would we best solve it?
Try selecting only data that you know isn't dirty (i.e. in the process of being written). For instance, if your table has a createddate column holding the date the row was inserted, add a WHERE condition that retrieves only rows inserted more than 5 minutes ago.
Example
SELECT col1, col2, col3, createddate
FROM table WITH(NOLOCK)
WHERE createddate < DATEADD(MINUTE, -5, GETDATE())
This is not a technical issue as much as a design issue. I would recommend that you create a new table that is a copy of this table, holding only transactions committed longer ago than whatever time frame is needed for the data to be considered settled. Otherwise I would recommend finding another vendor for your accounting system.
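A hedged sketch of that copy approach (the table, key and timestamp columns are assumptions): a scheduled job moves rows that are old enough to be considered settled into a separate copy, and the web service queries only the copy:
-- Runs on a schedule; dbo.Ledger and dbo.LedgerRead are hypothetical names,
-- with LedgerRead keyed the same way as the source table.
INSERT INTO dbo.LedgerRead (Id, Col1, Col2, CreatedDate)
SELECT s.Id, s.Col1, s.Col2, s.CreatedDate
FROM dbo.Ledger AS s
WHERE s.CreatedDate < DATEADD(MINUTE, -5, GETDATE())
  AND NOT EXISTS (SELECT 1 FROM dbo.LedgerRead AS r WHERE r.Id = s.Id);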

Minimally Logged Insert Into

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without
logging at all. SELECT INTO is the
best way to minimize logging in T-SQL,
using SSIS you can do the same sort of
light logging using Bulk Insert.
From your requirements, I would
probably use SSIS, drop all
constraints, especially unique and
primary key ones, load the data in,
add the constraints back. I load
about 100GB in just over an hour like
this, with fairly minimal overhead. I
am using BULK LOGGED recovery model,
which just logs the existence of new
extents during the logging, and then
you can remove them later.
The key is to start with barebones
tables, and it just screams. Building
the index once leaves you with no
indexes to maintain, just the one
index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and thus should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
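Under the simple recovery model you are already using, here is a hedged sketch of the two patterns mentioned above; it assumes B is either created fresh by the statement or is a constraint-free heap:
-- SELECT ... INTO creates B and is the easiest statement to get minimally logged.
SELECT *
INTO dbo.B
FROM dbo.A;
-- Alternatively, if B must already exist, INSERT ... SELECT into a heap with
-- TABLOCK (constraints and nonclustered indexes dropped first) can also be
-- minimally logged.
INSERT INTO dbo.B WITH (TABLOCK)
SELECT *
FROM dbo.A;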
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610, see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag
610, which controls minimally logged
inserts into indexed tables.
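Coming back to the batching suggestion above, a hedged sketch of copying in chunks so that no single transaction grows large; the Id key column and the batch size are assumptions:
-- Copy 100,000 rows at a time, keyed on the assumed Id column, until nothing
-- remains to copy. Under simple recovery, committed batches free up log space.
DECLARE @BatchSize INT = 100000;
DECLARE @Rows INT = 1;
WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.B WITH (TABLOCK) (Id, Col1, Col2)
    SELECT TOP (@BatchSize) a.Id, a.Col1, a.Col2
    FROM dbo.A AS a
    WHERE NOT EXISTS (SELECT 1 FROM dbo.B AS b WHERE b.Id = a.Id)
    ORDER BY a.Id;
    SET @Rows = @@ROWCOUNT;
END;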

read-access to a MyISAM table during a long INSERT?

On MySQL, using only MyISAM tables, I need to access the contents of a table during the course of a long-running INSERT.
Is there a way to prevent the INSERT from locking the table in a way that keeps a concurrent SELECT from running?
This is what I am driving at: I want to inspect how many records have been inserted up to now. Unfortunately WITH (NOLOCK) does not work on MySQL, and I could only find commands that control transaction locks (e.g., setting the transaction isolation level to READ UNCOMMITTED) -- which, from my understanding, should not apply to MyISAM tables at all since they don't support transactions in the first place.
MyISAM locking will block selects. Is there a reason for using MyISAM over InnoDB? Switching engines is sketched after the options below. If you don't want to change your engine, I suspect one of the following might be a solution for you:
1: Create a materialized view of the table using a cron job (or other scheduled task) that your application can query without blocking.
2: Use a trigger to count up the number of inserts that have occurred, and look up the running count in that metadata table.
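If switching the engine is an option, a minimal sketch (the table name big_table is assumed); InnoDB's row-level locking and MVCC mean a long INSERT no longer blocks readers the way MyISAM's table-level write lock does:
-- big_table is a placeholder name; the ALTER itself can take a while on a large table.
ALTER TABLE big_table ENGINE = InnoDB;
-- A second session can then check progress without blocking; READ UNCOMMITTED
-- makes the not-yet-committed rows of the running INSERT visible.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM big_table;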