Truncate Statement Taking Too Much Time - SQL

I have a table which has more than 1 million records. I have created a stored procedure to insert data into that table. Before inserting the data I need to truncate the table, but the truncate is taking too long.
I have read in a few places that if a table is in use by another session, or locks are held on it, a truncate can take a long time, but here I am the only user and I have taken no locks on it.
Also, no other transactions were open when I tried to truncate the table.
As my database is on SQL Azure, I am not able to simply drop the indexes, because it does not allow me to insert data into a table without a clustered index.

Drop all the indexes from the table and then truncate. If you want to insert data, insert it, and then recreate the indexes after the insert has finished.

When deleting from Azure you can get into all sorts of trouble, but a slow truncate is almost always a locking issue. If you can't fix the locking, you can always fall back to this trick of deleting in batches:
declare @iDeleteCounter int = 1
while @iDeleteCounter > 0
begin
    begin transaction deletes;

    with deleteTable as
    (
        -- mytable / mywhere are placeholders for your table and filter
        select top 100000 * from mytable where mywhere
    )
    delete from deleteTable;

    commit transaction deletes;

    select @iDeleteCounter = count(1) from mytable where mywhere;
    print 'deleted up to 100000 rows from mytable';
end

Related

How to force a running t-sql query (half done) to commit?

I have database on Sql Server 2008 R2.
On that database a delete query on 400 Million records, has been running for 4 days , but I need to reboot the machine. How can I force it to commit whatever is deleted so far? I want to reject that data which is deleted by running query so far.
But problem is that query is still running and will not complete before the server reboot.
Note : I have not set any isolation / begin/end transaction for the query. The query is running in SSMS studio.
If machine reboot or I cancelled the query, then database will go in recovery mode and it will recovering for next 2 days, then I need to re-run the delete and it will cost me another 4 days.
I really appreciate any suggestion / help or guidance in this.
I am novice user of sql server.
Thanks in Advance
Regards
There is no way to stop SQL Server from trying to bring the database into a transactionally consistent state. Every single statement is implicitly a transaction itself (if not part of an outer transaction) and executes either completely or not at all. So if you cancel the query, disconnect, or reboot the server, SQL Server will use the transaction log to write the original values back to the updated data pages.
Next time you delete that many rows, don't do it all at once. Divide the job into smaller chunks (I always use 5,000 as a magic number, meaning I delete 5,000 rows at a time in the loop) to minimize transaction log use and locking.
set rowcount 5000
delete from MyTable   -- MyTable stands in for your table name
while @@rowcount = 5000
    delete from MyTable
set rowcount 0
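SET ROWCOUNT is deprecated for affecting DELETE, INSERT and UPDATE, so on newer versions a rough equivalent sketch uses DELETE TOP instead (MyTable is again a placeholder; add a WHERE clause if you only want some rows gone):

while 1 = 1
begin
    delete top (5000) from MyTable
    if @@rowcount < 5000 break
end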
If you are deleting that many rows you may have a better time with truncate. Truncate deletes all rows from the table very efficiently. However, I'm assuming that you would like to keep some of the records in the table. The stored procedure below backs up the data you would like to keep into a work table, truncates the original table, and then re-inserts the records that were saved. This can clean a huge table very quickly.
Note that truncate doesn't play well with foreign key constraints, so you may need to drop those and recreate them after the table is cleaned.
CREATE PROCEDURE [dbo].[deleteTableFast] (
    @TableName VARCHAR(100),
    @WhereClause VARCHAR(1000))
AS
BEGIN
    -- input:
    --   table name:   the table to clean
    --   where clause: the where clause of the records to KEEP
    declare @tempTableName varchar(100);
    set @tempTableName = @TableName + '_temp_to_truncate';

    -- error checking
    if exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = @tempTableName) begin
        print 'ERROR: temp table already exists ... exiting'
        return
    end
    if not exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = @TableName) begin
        print 'ERROR: table does not exist ... exiting'
        return
    end

    -- save the wanted records in a work table so the original can be truncated
    exec ('select * into ' + @tempTableName + ' from ' + @TableName + ' WHERE ' + @WhereClause);
    exec ('truncate table ' + @TableName);
    exec ('insert into ' + @TableName + ' select * from ' + @tempTableName);
    exec ('drop table ' + @tempTableName);
END
GO
You need to understand the D (Durability) in ACID before you can see why the database goes into recovery mode.
Generally speaking, you should avoid long-running SQL if possible. Long-running SQL means more lock time on resources, a larger transaction log, and a huge rollback time when it fails.
Consider dividing your task by some id or time range. For example, if you want to insert a large volume of data from TableSrc into TableTarget, you can write a query like:
DECLARE @BatchCount INT = 1000;
DECLARE @Id INT = 0;
DECLARE @Max INT = ...;   -- the maximum PrimaryKey value in TableSrc (left unspecified here)
WHILE @Id < @Max
BEGIN
    INSERT INTO TableTarget
    SELECT *
    FROM TableSrc
    WHERE PrimaryKey >= @Id AND PrimaryKey < @Id + @BatchCount;

    SET @Id = @Id + @BatchCount;
END
It's uglier, more code and more error prone, but it's the only way I know of to deal with huge data volumes.

SQL Truncate, Delete, Drop advice

I have a table in a SQL database that I want to remove the data from, but I want to keep the columns.
e.g. my table has 3 columns: Name, Age, Date. I don't want to remove these, I just want to remove the data.
Should I use Truncate, Delete or Drop?
Don't drop - it will delete the data and the definition.
If you delete - the data is gone and auto-increment values carry on from the last value.
If you truncate - it is as if you had just created the table: no data, and all the counters are reset.
Truncate is very fast - like a quick format of the table. It does not require extra log space while deleting, it cannot be rolled back once committed, and you cannot specify conditions. It is the best choice for deleting all data from a table.
Delete is much slower and needs extra log space, because every deleted row must be able to be rolled back. If you need to delete all data, use truncate. If you need to specify conditions, use delete.
Drop table - you can delete data by dropping the table and creating it again, much like a truncate, but it is slower and can cause other problems, like foreign key dependencies. I definitely don't recommend this approach.
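A small sketch illustrating the identity-counter difference, using a throwaway table (purely illustrative):

create table dbo.DemoIdentity (Id int identity(1,1), Name varchar(20))
insert into dbo.DemoIdentity (Name) values ('a')
insert into dbo.DemoIdentity (Name) values ('b')

delete from dbo.DemoIdentity
insert into dbo.DemoIdentity (Name) values ('c')   -- gets Id = 3: the counter carries on

truncate table dbo.DemoIdentity
insert into dbo.DemoIdentity (Name) values ('d')   -- gets Id = 1: the counter is reset

drop table dbo.DemoIdentity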
delete from TableName should do the trick
DROPping the table will remove the columns too.
Delete: The DELETE Statement is used to delete rows from a table.
Truncate: The SQL TRUNCATE command is used to delete all the rows from the table and free the space used by the table.
Drop: The SQL DROP command is used to remove an object from the database. If you drop a table, all the rows in the table are deleted and the table structure is removed from the database.
Truncate the table. That would be a good option in your case.
You can roll back Delete, Truncate and Drop, but you must issue BEGIN TRANSACTION before executing the Delete, Drop or Truncate statement.
Here is an example:
Create Database Ankit
Create Table Tbl_Ankit(Name varchar(11))
insert into tbl_ankit(name) values('ankit');
insert into tbl_ankit(name) values('ankur');
insert into tbl_ankit(name) values('arti');
Select * From Tbl_Ankit
/*======================For Delete==================*/
Begin Transaction
Delete From Tbl_Ankit where Name='ankit'
Rollback
Select * From Tbl_Ankit
/*======================For Truncate==================*/
Begin Transaction
Truncate Table Tbl_Ankit
Rollback
Select * From Tbl_Ankit
/*======================For Drop==================*/
Begin Transaction
Drop Table Tbl_Ankit
Rollback
Select * From Tbl_Ankit

Lock table while inserting

I have a large table that gets populated from a view. This is done because the view takes a long time to run, and it is easier to have the data readily available in a table. A procedure is run every so often to update the table.
TRUNCATE TABLE LargeTable
INSERT INTO LargeTable
SELECT *
FROM viewLargeView
WITH (HOLDLOCK)
I would like to lock this table while inserting so that if someone selects from it they do not get an empty result right after the truncate. The lock I am using seems to lock the view and not the table.
Is there a better way to approach this problem?
It's true that the locking hint, placed where it is, affects the source view rather than the target table.
To make it so that nobody can read from the table while you're inserting:
insert into LargeTable with (tablockx)
...
You don't have to do anything extra to keep partially inserted rows hidden until the insert completes. An insert always runs in a transaction, and no other process can read the uncommitted rows, unless it explicitly specifies with (nolock) or sets transaction isolation level read uncommitted. There is no way to protect against that, as far as I know.
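A minimal sketch of the whole pattern with the hint moved to the target table, reusing the table and view names from the question:

begin transaction

    truncate table LargeTable

    insert into LargeTable with (tablockx)
    select *
    from viewLargeView

commit transaction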
BEGIN TRY
    BEGIN TRANSACTION t_Transaction

    TRUNCATE TABLE LargeTable

    INSERT INTO LargeTable
    SELECT *
    FROM viewLargeView WITH (HOLDLOCK)

    COMMIT TRANSACTION t_Transaction
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION t_Transaction
END CATCH

Fastest way to update 120 Million records

I need to initialize a new field with the value -1 in a 120 Million record table.
Update table
set int_field = -1;
I let it run for 5 hours before canceling it.
I tried running it with transaction level set to read uncommitted with the same results.
Recovery Model = Simple.
MS SQL Server 2005
Any advice on getting this done faster?
The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.
Simple Case
For a table w/out a clustered index, during a time w/out concurrent DML:
SELECT *, new_col = 1 INTO clone.BaseTable FROM dbo.BaseTable
recreate indexes, constraints, etc on new table
switch old and new w/ ALTER SCHEMA ... TRANSFER.
drop old table
If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
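For the rename step, sp_rename also works on constraints and indexes; a small sketch with hypothetical object names:

-- give the switched-in table's objects the original names (names are hypothetical)
EXEC sp_rename N'dbo.PK_BaseTable_new', N'PK_BaseTable', N'OBJECT'
EXEC sp_rename N'dbo.BaseTable.IX_BaseTable_new_Col2', N'IX_BaseTable_Col2', N'INDEX'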
Non-simple Case
First, recreate your BaseTable with the same name under a different schema, eg clone.BaseTable. Using a separate schema will simplify the rename process later.
Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
Include identity columns and computed columns, if applicable.
Include your new INT column, wherever it belongs.
Do not include any of the following:
triggers
foreign key constraints
non-clustered indexes/primary keys/unique constraints
check constraints or default constraints. Defaults don't make much of a difference, but we're trying to keep things minimal.
Then, test your insert w/ 1000 rows:
-- assuming an IDENTITY column in BaseTable
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT TOP 1000 Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF
Examine the results. If everything appears in order:
truncate the clone table
make sure the database is in the bulk-logged or simple recovery model
perform the full insert.
This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.
Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:
ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
GO
-- next constraint/index/trigger definition here
Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).
-- -- perform first true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
-- GO
-- -- create a backup schema, if necessary
-- CREATE SCHEMA backup_20100914
-- GO
BEGIN TRY
BEGIN TRANSACTION
ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
-- -- perform second true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
ALTER SCHEMA dbo TRANSFER clone.BaseTable
COMMIT TRANSACTION
END TRY
BEGIN CATCH
SELECT ERROR_MESSAGE() -- add more info here if necessary
ROLLBACK TRANSACTION
END CATCH
GO
If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.
Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
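A rough sketch of such a DML-logging trigger, assuming a hypothetical Id key column on dbo.BaseTable (the true-up logic itself is left out):

CREATE TABLE dbo.BaseTable_DmlLog
(
    Id       INT      NOT NULL,
    Op       CHAR(1)  NOT NULL,             -- 'I', 'U' or 'D'
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
)
GO
CREATE TRIGGER trg_BaseTable_DmlLog
ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- rows only in inserted are inserts, rows in both are updates
    INSERT dbo.BaseTable_DmlLog (Id, Op)
    SELECT i.Id, CASE WHEN d.Id IS NULL THEN 'I' ELSE 'U' END
    FROM inserted i
    LEFT JOIN deleted d ON d.Id = i.Id;

    -- rows only in deleted are deletes
    INSERT dbo.BaseTable_DmlLog (Id, Op)
    SELECT d.Id, 'D'
    FROM deleted d
    LEFT JOIN inserted i ON i.Id = d.Id
    WHERE i.Id IS NULL;
END
GO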
If you have the disk space, you could use SELECT INTO and create a new table. It's minimally logged, so it would go much faster
select t.*, int_field = CAST(-1 as int)
into mytable_new
from mytable t
-- create your indexes and constraints
GO
exec sp_rename mytable, mytable_old
exec sp_rename mytable_new, mytable
drop table mytable_old
I break the task up into smaller units. Test with different batch size intervals for your table, until you find an interval that performs optimally. Here is a sample that I have used in the past.
declare @counter int
declare @numOfRecords int
declare @batchsize int

set @numOfRecords = (SELECT COUNT(*) FROM <TABLE> with (nolock))   -- <TABLE> = your table name
set @counter = 0
set @batchsize = 2500

set rowcount @batchsize
while @counter < (@numOfRecords / @batchsize) + 1
begin
    set @counter = @counter + 1
    update <TABLE> set int_field = -1 where int_field <> -1;
end
set rowcount 0
If your int_field is indexed, remove the index before running the update. Then create your index again...
5 hours seems like a lot for 120 million recs.
set rowcount 1000000
update MyTable set int_field = -1 where int_field <> -1   -- MyTable = your table
see how long that takes, adjust and repeat as necessary
What I'd try first is
to drop all constraints, indexes, triggers and full text indexes before you update.
If the above isn't performant enough, my next move would be
to create a CSV file with the 120 million records and bulk import it using bcp.
Lastly, I'd create a new heap table (meaning a table with no clustered index) with no indexes on a different filegroup and populate it with -1. Then partition the old table, and add the new partition using "switch".
When adding a new column ("initialize a new field") and setting a single value to each existing row, I use the following tactic:
ALTER TABLE MyTable
add NewColumn int not null
constraint MyTable_TemporaryDefault
default -1
ALTER TABLE MyTable
drop constraint MyTable_TemporaryDefault
If the column is nullable and the default is not applied to the existing rows (no WITH VALUES), the column will be set to null for all existing rows.
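If the column needs to stay nullable but you still want existing rows populated, adding the default WITH VALUES does that; a sketch using the same hypothetical names:

ALTER TABLE MyTable
add NewColumn int null
constraint MyTable_TemporaryDefault
default -1 with values   -- WITH VALUES applies the default to existing rows

ALTER TABLE MyTable
drop constraint MyTable_TemporaryDefault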
declare @cnt bigint
set @cnt = 1

while @cnt * 100 < 10000000
begin
    UPDATE top(100) [Imp].[dbo].[tablename]
    SET [col1] = xxxx        -- xxxx = the value to assign
    WHERE [col1] is null

    print '@cnt: ' + convert(varchar, @cnt)
    set @cnt = @cnt + 1
end
Sounds like an indexing problem, like Pabla Santa Cruz mentioned. Since your update is not conditional, you can DROP the column and RE-ADD it with a DEFAULT value.
In general, the recommendations are as follows:
Remove or just disable all INDEXES, TRIGGERS and CONSTRAINTS on the table (a sketch follows below);
Perform the COMMIT more often (e.g. after every 1000 updated records);
Use select ... into.
But in this particular case you should choose the most appropriate solution or a combination of them.
Also bear in mind that sometimes an index can be useful, e.g. when you perform an update of a non-indexed column filtered by some condition.
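A sketch of the disable/rebuild approach for a nonclustered index (index and table names are hypothetical; don't disable the clustered index, as that takes the whole table offline):

ALTER INDEX IX_MyTable_int_field ON dbo.MyTable DISABLE

-- ... run the batched update here ...

ALTER INDEX IX_MyTable_int_field ON dbo.MyTable REBUILD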
If the table has an index which you can iterate over, I would put an UPDATE TOP(10000) statement in a while loop moving over the data. That would keep the transaction log slim and won't have such a huge impact on the disk system. I would also recommend playing with the maxdop option (setting it closer to 1).
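A minimal sketch of such a loop, assuming int_field starts out as NULL or some value other than -1 (the table name is a placeholder):

declare @rows int
set @rows = 1

while @rows > 0
begin
    update top (10000) dbo.MyTable
    set int_field = -1
    where int_field is null or int_field <> -1
    option (maxdop 1)          -- limit parallelism, as suggested above

    set @rows = @@rowcount
end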

Copy one column to another for over a billion rows in SQL Server database

Database : SQL Server 2005
Problem : Copy values from one column to another column in the same table with a billion+ rows.
test_table (int id, bigint bigid)
Things tried 1: update query
update test_table set bigid = id
fills up the transaction log and rolls back due to lack of transaction log space.
Tried 2 - a procedure along the following lines:
set nocount on
set rowcount 500000

declare @rowcount int, @rowsupdated int
set @rowcount = 1
set @rowsupdated = 0

while @rowcount > 0
begin
    update test_table set bigid = id where bigid is null
    set @rowcount = @@rowcount
    set @rowsupdated = @rowsupdated + @rowcount
end

print @rowsupdated
The above procedure starts slowing down as it proceeds.
Tried 3 - creating a cursor for the update.
This is generally discouraged in the SQL Server documentation, and this approach updates one row at a time, which is too time consuming.
Is there an approach that can speed up the copying of values from one column to another. Basically I am looking for some 'magic' keyword or logic that will allow the update query to rip through the billion rows half a million at a time sequentially.
Any hints, pointers will be much appreciated.
I'm going to guess that you are closing in on the 2.1 billion limit of the INT datatype on an artificial key column. Yes, that's a pain. Much easier to fix before the fact than after you've actually hit that limit and production is shut down while you are trying to fix it :)
Anyway, several of the ideas here will work. Let's talk about speed, efficiency, indexes, and log size, though.
Log Growth
The log blew up originally because it was trying to commit all 2b rows at once. The suggestions in other posts for "chunking it up" will work, but that may not totally resolve the log issue.
If the database is in SIMPLE mode, you'll be fine (the log will re-use itself after each batch). If the database is in FULL or BULK_LOGGED recovery mode, you'll have to run log backups frequently during the running of your operation so that SQL can re-use the log space. This might mean increasing the frequency of the backups during this time, or just monitoring the log usage while running.
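In that case a plain log backup, run in the loop or on a tight schedule, keeps the log from growing; the database name and path here are hypothetical:

BACKUP LOG MyDatabase
TO DISK = N'D:\Backups\MyDatabase_log.trn'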
Indexes and Speed
ALL of the where bigid is null answers will slow down as the table is populated, because there is (presumably) no index on the new BIGID field. You could, of course, just add an index on BIGID, but I'm not convinced that is the right answer.
The key (pun intended) is my assumption that the original ID field is probably the primary key, or the clustered index, or both. In that case, let's take advantage of that fact and do a variation of Jess' idea:
declare @counter int
set @counter = 1
while @counter < 2000000000 --or whatever
begin
    update test_table set bigid = id
    where id between @counter and (@counter + 499999) --BETWEEN is inclusive

    set @counter = @counter + 500000
end
This should be extremely fast, because of the existing indexes on ID.
The ISNULL check really wasn't necessary anyway, neither is my (-1) on the interval. If we duplicate some rows between calls, that's not a big deal.
Use TOP in the UPDATE statement:
DECLARE @row_limit INT
SET @row_limit = 500000   -- batch size

UPDATE TOP (@row_limit) dbo.test_table
SET bigid = id
WHERE bigid IS NULL
You could try to use something like SET ROWCOUNT and do batch updates:
SET ROWCOUNT 5000;
UPDATE dbo.test_table
SET bigid = id
WHERE bigid IS NULL
GO
and then repeat this as many times as you need to.
This way, you're avoiding the RBAR (row-by-agonizing-row) symptoms of cursors and while loops, and yet, you don't unnecessarily fill up your transaction log.
Of course, in between runs, you'd have to do backups (especially of your log) to keep its size within reasonable limits.
Is this a one time thing? If so, just do it by ranges:
declare @counter int
set @counter = 500000
while @counter < 2000000000 --or whatever your max id
begin
    update test_table set bigid = id
    where id between (@counter - 500000) and @counter and bigid is null

    set @counter = @counter + 500000
end
I didn't run this to try it, but if you can get it to update 500k at a time I think you're moving in the right direction.
set rowcount 500000

update tt1
set bigid = (SELECT tt2.id FROM test_table tt2 WHERE tt1.id = tt2.id)
from test_table tt1
where tt1.bigid IS NULL
You can also try changing the recovery model so the transaction log can be reused more aggressively (the update is still logged, but in SIMPLE recovery the log is truncated at each checkpoint):
ALTER DATABASE db1
SET RECOVERY SIMPLE
GO
update test_table
set bigid = id
GO
ALTER DATABASE db1
SET RECOVERY FULL
GO
The first step, if there are any, would be to drop the indexes before the operation. They are probably what is causing the speed to degrade over time.
The other option, a little outside-the-box thinking: can you express the update in such a way that you can materialize the column values in a select? If you can, then you can create what amounts to a NEW table using SELECT INTO, which is a minimally logged operation (assuming in 2005 that you are set to a recovery model of SIMPLE or BULK LOGGED). This would be pretty fast, and then you can drop the old table, rename this table to the old table name, and recreate any indexes.
select id, CAST(id as bigint) bigid into test_table_temp from test_table
drop table test_table
exec sp_rename 'test_table_temp', 'test_table'
I second the
UPDATE TOP(X) statement
I'd also suggest, if you're in a loop, adding a WAITFOR DELAY or a COMMIT between batches, to allow other processes some time to use the table if needed, instead of blocking until all the updates are completed.
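A sketch combining both suggestions for the bigid copy from this question (the delay length is arbitrary):

declare @done bit
set @done = 0

while @done = 0
begin
    update top (500000) dbo.test_table
    set bigid = id
    where bigid is null

    if @@rowcount = 0
        set @done = 1
    else
        waitfor delay '00:00:05'   -- give other sessions a chance at the table
end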