Minimally Logged Insert Into - sql

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.

From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS: drop all constraints, especially unique and primary key ones, load the data in, then add the constraints back. I load about 100GB in just over an hour like this, with fairly minimal overhead. I am using the BULK LOGGED recovery model, which just logs the existence of new extents during the load, and then you can remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and thus should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
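For illustration, a minimal sketch of that approach using the A and B tables from the question; StagingDB is a placeholder database name. (The question's database already uses the simple recovery model, which also allows minimal logging; the BULK LOGGED switch is shown only because the answer recommends it.)

ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;  -- StagingDB is a placeholder name

-- Option 1: SELECT INTO creates B as a new heap and is eligible for minimal logging.
SELECT *
INTO dbo.B
FROM dbo.A;

-- Option 2: if B already exists, drop its constraints and indexes first; on SQL Server 2008
-- an INSERT ... SELECT into an empty heap with TABLOCK can also be minimally logged.
INSERT INTO dbo.B WITH (TABLOCK)
SELECT * FROM dbo.A;

-- Rebuild constraints and indexes afterwards, then switch the recovery model back if desired.
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;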

Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610, see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
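A rough sketch of that batching pattern, assuming a staging table #stage built in tempdb, a destination table dbo.Destination with an ascending Id key, and a batch size of 100,000 rows (all of these names and numbers are placeholders):

DBCC TRACEON (610, -1);  -- optional: trace flag 610 (SQL Server 2008) extends minimal logging
                         -- to inserts into indexed tables; read the guide above before using it

DECLARE @batch  int    = 100000,
        @fromId bigint = 0,
        @toId   bigint;

WHILE 1 = 1
BEGIN
    -- Find the upper key of the next batch.
    SELECT @toId = MAX(Id)
    FROM (SELECT TOP (@batch) Id
          FROM #stage
          WHERE Id > @fromId
          ORDER BY Id) AS nextBatch;

    IF @toId IS NULL BREAK;  -- nothing left to copy

    -- Each iteration commits on its own, so no single transaction grows very large.
    INSERT INTO dbo.Destination
    SELECT *
    FROM #stage
    WHERE Id > @fromId AND Id <= @toId;

    SET @fromId = @toId;
END;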

Related

DB2: Working with concurrent DDL operations

We are working on a data warehouse using IBM DB2 and we wanted to load data by partition exchange. That means we prepare a temporary table with the data we want to load into the target table and then use that entire table as a data partition in the target table. If there was previous data we just discard the old partition.
Basically you just do "ALTER TABLE target_table ATTACH PARTITION pname [starting and ending clauses] FROM temp_table".
It works wonderfully, but only for one operation at a time. If we do multiple loads in parallel or try to attach multiple partitions to the same table it's raining deadlock errors from the database.
From what I understand, the problem isn't necessarily with parallel access to the target table itself (locking it changes nothing), but accesses to system catalog tables in the background.
I have combed through the DB2 documentation, but the only reference to the topic of concurrent DDL statements I found at all was to avoid doing them. Surely the answer to this question can't simply be to not attempt it?
Does anyone know a way to deal with this problem?
I tried to have a global, single synchronization table to lock if you want to attach any partitions, but it didn't help either. Either I'm missing something (implicit commits somewhere?) or some of the data catalog updates even happen asynchronously, which makes the whole problem much worse. If that is the case, is there any chance at all to query whether the attach is safe to perform at any given moment?
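For reference, the full DB2 sequence looks roughly like the sketch below; the partition name and range boundaries are placeholder assumptions, and after the ATTACH the target table normally sits in SET INTEGRITY PENDING state until the new data is validated:

ALTER TABLE target_table
    ATTACH PARTITION pname
    STARTING FROM ('2014-01-01') ENDING AT ('2014-01-31')
    FROM temp_table;

COMMIT;

-- Validate the attached rows and make the new partition visible to queries.
SET INTEGRITY FOR target_table IMMEDIATE CHECKED;

COMMIT;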

Why is Bulk Import faster than a bunch of INSERTs?

I'm writing my graduate work about methods of importing data from a file to SQL Server table. I have created my own program and now I'm comparing it with some standard methods such as bcp, BULK INSERT, INSERT ... SELECT * FROM OPENROWSET(BULK...) etc. My program reads in lines from a source file, parses them and imports them one by one using ordinary INSERTs. The file contains 1 million lines with 4 columns each. And now I have the situation that my program takes 160 seconds while the standard methods take 5-10 seconds.
So the question is why are BULK operations faster? Do they use special means or something? Can you please explain it or give me some useful links or something?
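For reference, the T-SQL forms of two of the standard methods named in the question look roughly like this; the file path, table name and format file are assumptions:

-- BULK INSERT: the server reads and parses the file itself, in large batches.
BULK INSERT dbo.TargetTable
FROM 'C:\data\rows.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- OPENROWSET(BULK ...): the same bulk-load code path exposed as a rowset source.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT *
FROM OPENROWSET(BULK 'C:\data\rows.csv',
                FORMATFILE = 'C:\data\rows.fmt') AS src;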
BULK INSERT can be a minimally logged operation (depending on various parameters like indexes, constraints on the tables, recovery model of the database etc). Minimally logged operations only log allocations and deallocations. In case of BULK INSERT, only extent allocations are logged instead of the actual data being inserted. This will provide much better performance than INSERT.
Compare Bulk Insert vs Insert
The actual advantage is to reduce the amount of data written to the transaction log. Under the BULK LOGGED or SIMPLE recovery model, the advantage is significant.
Optimizing BULK Import Performance
You should also consider reading this answer : Insert into table select * from table vs bulk insert
By the way, there are factors that will influence BULK INSERT performance:
Whether the table has constraints or triggers, or both.
The recovery model used by the database.
Whether the table into which data is copied is empty.
Whether the table has indexes.
Whether TABLOCK is being specified.
Whether the data is being copied from a single client or copied in parallel from multiple clients.
Whether the data is to be copied between two computers on which SQL Server is running.
I think you can find a lot of articles on it; just search for "why bulk insert is faster". For example, this seems to be a good analysis:
https://www.simple-talk.com/sql/performance/comparing-multiple-rows-insert-vs-single-row-insert-with-three-data-load-methods/
Generally, any database does a lot of work for a single insert: checking constraints, maintaining indexes, flushing to disk. The database can optimize this when the rows arrive as one set-based operation instead of the engine being called for each row individually.
First of all, inserting row for row is not optimal. See this article on set logic and this article on what's the fastest way to load data into SQL Server.
Second, BULK import is optimized for large loads. This all has to do with page flushing, writing to the log, indexes and various other things in SQL Server. There's a TechNet article on how you can optimize BULK INSERTs, and this sheds some light on how BULK is faster. But I can't link more than twice, so you'll have to google for "Optimizing Bulk Import Performance".
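To make the per-row versus set-based point concrete, here is a small hedged illustration; table and column names are placeholders, and the real bulk APIs go further by also minimizing what gets written to the log:

-- One million of these still parse, execute and log one at a time; wrapping them in a
-- single transaction at least avoids a commit and log flush per statement.
BEGIN TRANSACTION;
INSERT INTO dbo.TargetTable (c1, c2, c3, c4) VALUES (1, 'a', 'b', 'c');
INSERT INTO dbo.TargetTable (c1, c2, c3, c4) VALUES (2, 'd', 'e', 'f');
-- ...
COMMIT;

-- Set-based alternative: one statement, one pass through the engine.
INSERT INTO dbo.TargetTable (c1, c2, c3, c4)
SELECT c1, c2, c3, c4
FROM dbo.StagingTable;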

Logging of different species of INSERT statement

There are four kinds of INSERT for loading data into a table in SQL:
INSERT INTO TableName (ColumnList) VALUES (ValuesList)
INSERT INTO TableName (ColumnList) SELECT Columns FROM OtherTable
INSERT INTO TableName (ColumnList) EXEC SomeProc
SELECT Columns INTO TableName FROM OtherTable
Every INSERT statement is logged in the transaction log, and my question is: which kind of INSERT has minimal logging?
How would you rank them in order of performance?
The Data Loading Performance Guide has a good summary of minimally logged operations:
To support high-volume data loading scenarios, SQL Server implements minimally logged operations. Unlike fully logged operations, which use the transaction log to keep track of every row change, minimally logged operations keep track of extent allocations and metadata changes only. Because much less information is tracked in the transaction log, a minimally logged operation is often faster than a fully logged operation if logging is the bottleneck. Furthermore, because fewer writes go to the transaction log, a much smaller log file with a lighter I/O requirement becomes viable.
Out of the different types of insert statements you provided, two can be classified as bulk load operations, which have the opportunity to be minimally logged if other prerequisites have been met:
INSERT ... SELECT – The method for performing bulk load in process with SQL Server from local queries or any OLE DB source. This method is only available as a minimally logged operation in SQL Server 2008.
SELECT INTO – The method for creating a new table containing the results of a query; utilizes bulk load optimizations.
However, note that there are prerequisites and conditions that need to be met in order for one of these bulk load operations to be minimally logged...
Running the database under the Bulk-Logged or Simple recovery models
Enabling Trace Flag 610 if you're running Sql Server 2008 or newer
Whether or not the table has a clustered index
Whether or not the table is empty
Even the execution plan chosen by the optimizer
If you meet these conditions, then you may see better performance by performing a bulk logged insert as described in the article...
But again, the prerequisites for this happening are pretty complex, so I would recommend reading the article before creating / changing commands with the expectation that they will be minimally logged.
EDIT:
One clarification: note that it is the recovery model of the destination database that is relevant. For example, suppose you're inserting into a temporary table from tables in a database that uses the full recovery model. Since the temporary table resides in tempdb, which uses the simple recovery model, the insert into the temporary table is a good candidate for a bulk-logged operation and thus for minimal logging.
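If you want to check whether a given statement actually ended up minimally logged, one common trick is to compare the number of log records it generates on a test system. This is only a sketch: sys.fn_dblog is undocumented, and the table names are placeholders.

CHECKPOINT;  -- in the simple recovery model this clears the inactive log, giving a clean baseline

INSERT INTO dbo.Target WITH (TABLOCK)  -- candidate for minimal logging, e.g. into an empty heap
SELECT * FROM dbo.Source;

-- A minimally logged insert produces far fewer per-row log records (LOP_INSERT_ROWS)
-- than a fully logged one.
SELECT Operation, COUNT(*) AS log_records
FROM sys.fn_dblog(NULL, NULL)
GROUP BY Operation
ORDER BY COUNT(*) DESC;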

skip-lock-tables and mysqldump

Daily we run mysql dumps on about 50 individual databases, package them up and then store them offsite. Some of these databases are rather large and contain MyISAM tables (which CANNOT be changed, so suggesting that is pointless). I have been reading up on using the skip-lock-tables option when doing a dump but have not read what the downside would be. All I see are basically different iterations of "it could have adverse effects if data is inserted to a table while it is dumping."
What are these adverse effects? Does it just mean we will miss those queries upon a restore or will it mean the dump file will be broken and useless? I honestly could care less if we lose NEW data posted after the dump has started as I am just looking for a snapshot in time.
Can I rely on these database dumps to contain all the data that was saved before issuing the dump?
The --skip-lock-tables parameter instructs the mysqldump utility not to issue a LOCK TABLES command before taking the dump, which would otherwise acquire a READ lock on every table. Locking all tables in the database improves consistency for a backup procedure. Even with skip-lock-tables, though, a table will not receive any INSERTs or UPDATEs while it is being dumped, because it is locked by the SELECT required to obtain all records from the table. It looks like this
SELECT SQL_NO_CACHE * FROM my_large_table
and you can see it in the process list with the SHOW PROCESSLIST command.
If you are using the MyISAM engine, which is non-transactional, locking the tables will not guarantee referential integrity and data consistency in any case, so I personally use the --skip-lock-tables parameter almost always. For InnoDB, use the --single-transaction parameter for the expected effect.
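As a rough example (the database name and output path are placeholders; the flags themselves are real mysqldump options):

# MyISAM-heavy database: skip the global LOCK TABLES; each table is still read-locked
# by its own SELECT while it is being dumped.
mysqldump --skip-lock-tables mydb > /backups/mydb.sql

# InnoDB-only database: a consistent snapshot without locking anything.
mysqldump --single-transaction mydb > /backups/mydb.sql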
Hope this helps.

Need to alter column types in production database (SQL Server 2005)

I need help writing a TSQL script to modify two columns' data type.
We are changing two columns:
uniqueidentifier -> varchar(36) (this column has a primary key constraint)
xml -> nvarchar(4000)
My main concern is production deployment of the script...
The table is actively used by a public website that gets thousands of hits per hour. Consequently, we need the script to run quickly, without affecting service on the front end. Also, we need to be able to automatically rollback the transaction if an error occurs.
Fortunately, the table only contains about 25 rows, so I am guessing the update will be quick.
This database is SQL Server 2005.
(FYI - the type changes are required because of a 3rd-party tool which is not compatible with SQL Server's xml and uniqueidentifier types. We've already tested the change in dev and there are no functional issues with the change.)
As David said, executing a script against a production database without taking a backup or stopping the site is not the best idea. That said, if you want to change only one table with a reduced number of rows, you can prepare a script to:
Begin a transaction
Create a new table with the final structure you want
Copy the data from the original table to the new table
Rename the old table to, for example, original_name_old
Rename the new table to original_table_name
Commit the transaction
This leaves you with a table that has the original name but the new structure you want, and in addition you keep the original table under a backup name, so if you want to roll back the change you can create a script that simply drops the new table and renames the original one back.
If the table has foreign keys the script will be a little more complicated, but is still possible without much work.
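A hedged sketch of such a script for the question's two type changes; the table name dbo.MyTable, its column names and the constraint name are placeholders:

BEGIN TRANSACTION;

CREATE TABLE dbo.MyTable_new
(
    Id      varchar(36)    NOT NULL CONSTRAINT PK_MyTable_new PRIMARY KEY,  -- was uniqueidentifier
    Payload nvarchar(4000) NULL                                             -- was xml
);

INSERT INTO dbo.MyTable_new (Id, Payload)
SELECT CONVERT(varchar(36), Id),
       CONVERT(nvarchar(4000), Payload)
FROM dbo.MyTable;

EXEC sp_rename 'dbo.MyTable', 'MyTable_old';
EXEC sp_rename 'dbo.MyTable_new', 'MyTable';

COMMIT TRANSACTION;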
Consequently, we need the script to run quickly, without affecting service on the front end.
This is just an opinion, but it's based on experience: That's a bad idea. It's better to have a short, (pre-announced if possible) scheduled downtime than to take the risk.
The only exception is if you really don't care if the data in these tables gets corrupted, and you can be down for an extended period.
In this situation, based on the types of changes you're making and the testing you've already performed, it sounds like the risk is very minimal, since you've tested the changes and you SHOULD be able to do it safely, but nothing is guaranteed.
First, you need to have a fall-back plan in case something goes wrong. The short version of a MINIMAL reasonable plan would include:
Shut down the website
Make a backup of the database
Run your script
Test the DB for integrity
Bring the website back online
It would be very unwise to attempt to make such an update while the website is live. You run the risk of being down for an extended period if something goes wrong.
A GOOD plan would also have you testing this against a copy of the database and a copy of the website (a test/staging environment) first and then taking the steps outlined above for the live server update. You have already done this. Kudos to you!
There are even better methods for making such an update, but the trade-off of down time for safety is a no-brainer in most cases.
And if you absolutely need to do this live then you might consider this:
1) Build an offline version of the table with the new datatypes and copied data.
2) Build all the required keys and indexes on the offline tables.
3) Swap the tables out in a transaction. You could rename the old table to something else as an emergency backup.
sp_help 'sp_rename'
But TEST all of this FIRST in a prod-like environment. And make sure your backups are up to date. AND do this when you are least busy.
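And a hedged sketch of the emergency fallback mentioned above, using the same placeholder names as the swap script earlier in this section:

BEGIN TRANSACTION;

EXEC sp_rename 'dbo.MyTable', 'MyTable_broken';   -- set the faulty new table aside
EXEC sp_rename 'dbo.MyTable_old', 'MyTable';      -- bring the original back

COMMIT TRANSACTION;

-- DROP TABLE dbo.MyTable_broken;  -- only once you are sure you no longer need it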