What is the best way to delete all of a large table in t-sql? - sql

We've run across a slightly odd situation. Basically there are two tables in one of our databases that are fed tons and tons of logging info we don't need or care about. Partially because of this we're running out of disk space.
I'm trying to clean out the tables, but it's taking forever (there are still 57,000,000+ records after letting this run through the weekend... and that's just the first table!)
Just using a plain DELETE is taking forever and eats up drive space (I believe because of the transaction log). Right now I'm using a WHILE loop to delete records X at a time, while playing around with X to determine what's actually fastest. For instance, X=1,000 takes 3 seconds, while X=100,000 takes 26 seconds... which, doing the math, is slightly faster.
But the question is whether or not there is a better way?
(Once this is done, I'm going to run a SQL Agent job to clean the table out once a day... but it needs to be cleared out first.)

TRUNCATE the table or disable indexes before deleting
TRUNCATE TABLE [tablename]
Truncating will remove all records from the table without logging each deletion separately.

To add to the other responses, if you want to hold onto the past day's data (or past month or year or whatever), then save that off, do the TRUNCATE TABLE, then insert it back into the original table:
SELECT *
INTO tmp_My_Table
FROM My_Table
WHERE <Some_Criteria>

TRUNCATE TABLE My_Table
INSERT INTO My_Table SELECT * FROM tmp_My_Table
The next thing to do is ask yourself why you're inserting all of this information into a log if no one cares about it. If you really don't need it at all then turn off the logging at the source.

1) Truncate table
2) script out the table, drop and recreate the table
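For option 2, a minimal sketch under assumed names (dbo.LogTable and its columns are invented; generate the real CREATE script first, e.g. via SSMS "Script Table as > CREATE To"):
-- Hypothetical table definition for illustration only.
DROP TABLE dbo.LogTable;
CREATE TABLE dbo.LogTable
(
    LogId    bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LoggedAt datetime NOT NULL,
    Message  nvarchar(max) NULL
);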

TRUNCATE TABLE [tablename]
will remove all the records with minimal logging (it logs page deallocations rather than individual row deletes).

Depending on how much you want to keep, you could just copy the records you want to a temp table, truncate the log table, and copy the temp table records back to the log table.

If you can work out the optimum X, this will loop around the delete at the quickest rate. Setting ROWCOUNT limits the number of records that get deleted in each step of the loop. If the log file is getting too big, stick a counter in the loop and truncate the log every million rows or so.
set rowcount x   -- x is your batch size, e.g. 10000
while 1=1
Begin
    delete from table
    If @@Rowcount = 0 break
End
set rowcount 0   -- reset so later statements are no longer limited
Changing the recovery model on the db to SIMPLE or BULK_LOGGED will reduce some of the delete overhead.
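A minimal sketch of that recovery model switch, assuming a made-up database name MyLogDb and that you restore the original model (and take a fresh full backup) once the purge is done:
-- Hypothetical database name; reduces log overhead while the batched delete runs.
ALTER DATABASE MyLogDb SET RECOVERY BULK_LOGGED;
-- ... run the batched delete loop above ...
ALTER DATABASE MyLogDb SET RECOVERY FULL;   -- switch back, then take a full backup to restart the log chain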

Check this article from MSDN: Delete_a_Huge_Amount_of_Data_from, along with Information on Recovery Models and View or Change the Recovery Model of a Database.

Related

SQL - Delete table with 372 million rows starting from first row

I have a table with 372 million rows, and I want to delete old rows, starting from the first ones, without blocking the DB. How can I achieve that?
The table has columns:
id | memberid | type | timeStamp               | message
1  | 123      | 10   | 2014-03-26 13:17:02.000 | text
UPDATE:
I deleted about 30 GB of data in the DB, but my disk still shows only 6 GB of free space. Any suggestion on how to reclaim that space?
Thank you in advance!
select 1;   -- primes @@ROWCOUNT so the loop is entered
while (@@ROWCOUNT > 0)
begin
    WAITFOR DELAY '00:00:10';
    delete top(10) from tab where <your-condition>;
end
Delete in chunks using the SQL above.
You may want to consider another approach:
Create a table based on the existing one
Adjust the identity column in the empty table to start from the latest value from the old table (if there is any)
Swap the two tables using sp_rename (a sketch follows below)
Copy the records in batches into the new table from the old table
You can do whatever you want with the old table.
BACKUP your database before you start deleting records / play with tables.
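A rough sketch of the swap step, using invented names (dbo.LogTable is the busy table, dbo.LogTable_New its empty copy with the identity reseeded):
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.LogTable', 'LogTable_Old';
    EXEC sp_rename 'dbo.LogTable_New', 'LogTable';
COMMIT;
-- New writes now land in the empty table; backfill the rows you still need
-- from dbo.LogTable_Old in batches, then archive or drop it.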
The best performance comes from querying the data by id, so:
delete from TABLENAME where id>XXXXX
is the lowest-impact statement you can execute.
You can also divide the operation into sub-operations, limiting the number of deleted rows per call by adding a ROWCOUNT declaration. For example, if you want to delete only 5,000,000 rows per call, you can do this:
SET ROWCOUNT 5000000;
delete from TABLENAME where id>XXXXX;
Here you can find a reference: https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396
The answer to the best way to delete rows from an Oracle table is: it depends! In a perfect world where you can take the table offline for maintenance, a complete reorganization is always best because it does the delete and places the table back into a pristine state. We will address the tools for doing large-scale deletes and the appropriate methods for each environment.
Factors and tools for massive deletes
The choice of the delete methods depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have a large time-based table partition, and deleting elderly rows from these tables can be as simple as dropping the desired partition. See these notes on managing partitioned tables.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table it is faster to use CTAS to delete from a table than to do a vanilla delete and a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be re-used by subsequent DML then you will want to leave the empty space within the table. Conversely, if you want to release the space back to the tablespace then you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This procedure will punch off the definitions of all table indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every zillion rows to free up the undo logs. This approach will be re-startable automatically because the delete will pick up where it left off, as of your last commit checkpoint.
For more information, visit here.

T-SQL Optimize DELETE of many records

I have a table that can grow to millions of records (50 million, for example). Every 20 minutes, records that are older than 20 minutes are deleted.
The problem is that if the table has so many records, such a deletion can take a lot of time, and I want to make it faster.
I cannot do a "truncate table" because I want to remove only records that are older than 20 minutes. I suppose that when doing the "delete" and filtering the information that needs to be deleted, the server creates a log file or something, and this takes a lot of time?
Am I right? Is there some flag or option I can turn off to optimize the delete, and then turn back on afterwards?
To expand on the batch delete suggestion, I'd suggest you do this far more regularly (every 20 seconds perhaps) - batch deletions are easy:
WHILE 1 = 1
BEGIN
    DELETE TOP ( 4000 )
    FROM YOURTABLE
    WHERE YourIndexedDateColumn < DATEADD(MINUTE, -20, GETDATE())

    IF @@ROWCOUNT = 0
        BREAK
END
Your inserts may lag slightly whilst they wait for the locks to release but they should insert rather than error.
In regards to your table though, a table with this much traffic I'd expect to see on a very fast RAID 10 array, perhaps even partitioned - are your disks up to it? Are your transaction logs on different disks from your data files? They should be.
EDIT 1 - Response to your comment
To put a database into SIMPLE recovery:
ALTER DATABASE [DatabaseName] SET RECOVERY SIMPLE
This basically turns off transaction logging on the given database, meaning that in the event of a failure you would lose all data since your last full backup. If you're OK with that, this should save a lot of time when running large transactions. (NOTE that while a transaction is running, logging still takes place in SIMPLE - to enable rolling back the transaction.)
If there are tables within your database where you can't afford to lose data, you'll need to leave your database in FULL recovery mode (i.e. every transaction gets logged, and hopefully flushed to *.trn files by your server's maintenance plans). As I stated earlier though, there is nothing stopping you having two databases, one in FULL and one in SIMPLE: the FULL database would be for tables where you can't afford to lose any data (i.e. you could apply the transaction logs to restore data to a specific time) and the SIMPLE database would be for these massive high-traffic tables where you can allow data loss in the event of a failure.
All of this is relevant assuming you're creating full (*.bak) backups every night and flushing your log files to *.trn files every half hour or so.
In regards to your index question, it's imperative that your date column is indexed; if you check your execution plan and see any "TABLE SCAN", that would be an indicator of a missing index.
Your date column, I presume, is DATETIME with a constraint setting the DEFAULT to getdate()?
You may find that you get better performance by replacing that with a BIGINT in YYYYMMDDHHMMSS form and then applying a CLUSTERED index to that column - note however that you can only have one clustered index per table, so if that table already has one you'll need to use a non-clustered index. (In case you didn't know, a clustered index basically tells SQL to store the information in that order, meaning that when you delete rows older than 20 minutes SQL can literally delete stuff sequentially rather than hopping from page to page.)
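A hedged sketch of that layout (the table and column names are invented; FORMAT requires SQL Server 2012+, so on older versions build the value with CONVERT instead):
-- Hypothetical table whose BIGINT key encodes the timestamp as YYYYMMDDHHMMSS.
CREATE TABLE dbo.EventLog
(
    EventKey bigint        NOT NULL,   -- e.g. 20140326131702
    Payload  nvarchar(400) NULL
);
CREATE CLUSTERED INDEX CIX_EventLog_EventKey ON dbo.EventLog (EventKey);

-- "Older than 20 minutes" then becomes a sequential range delete at the head of the index:
DECLARE @cutoff bigint =
    CAST(FORMAT(DATEADD(MINUTE, -20, GETDATE()), 'yyyyMMddHHmmss') AS bigint);
DELETE TOP (4000) FROM dbo.EventLog WHERE EventKey < @cutoff;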
The log problem is probably due to the number of records deleted in the transaction; to make things worse, the engine may be requesting a lock per record (or per page, which is not so bad).
The one big thing here is how you determine the records to be deleted. I'm assuming you use a datetime field; if so, make sure you have an index on that column, otherwise it's a sequential scan of the table that will really penalize your process.
There are two things you may do, depending on the concurrency of users and the timing of the delete:
If you can guarantee that no one is going to read or write when you delete, you can lock the table in exclusive mode and delete (this takes only one lock from the engine) and release the lock.
You can use batch deletes: make a script with a cursor that provides the rows you want to delete, and begin a transaction and commit every X records (ideally 5,000), so you keep the transactions short and don't take that many locks. A sketch follows below.
Take a look at the query plan for the delete process and see what it shows; a sequential scan of a big table is never good.
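A rough sketch of that batch-delete idea (the table, column, and 20-minute filter are placeholders borrowed from the question; note that BEGIN/COMMIT TRANSACTION reset @@ROWCOUNT, so the count is captured immediately after the DELETE):
DECLARE @batch int = 5000, @deleted int = 1;
WHILE @deleted > 0
BEGIN
    BEGIN TRANSACTION;
        DELETE TOP (@batch)
        FROM dbo.EventLog
        WHERE LoggedAt < DATEADD(MINUTE, -20, GETDATE());
        SET @deleted = @@ROWCOUNT;   -- capture before COMMIT resets @@ROWCOUNT
    COMMIT;
END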
Unfortunately for the purpose of this question and fortunately for the sake of consistency and recoverability of the databases in SQL server, putting a database into Simple recovery mode DOES NOT disable logging.
Every transaction still gets logged before committing it to the data file(s), the only difference would be that the space in the log would get released (in most cases) right after the transaction is either rolled back or committed in the Simple recovery mode, but this is not going to affect the performance of the DELETE statement in one way or another.
I had a similar problem when I needed to delete more than 70% of the rows from a big table with 3 indexes and a lot of foreign keys.
For this scenario, I saved the rows I wanted in a temp table, truncated the original table and reinserted the rows, something like:
SELECT * INTO #tempuser FROM [User] WHERE [Status] >= 600;
TRUNCATE TABLE [User];
INSERT [User] SELECT * FROM #tempuser;
I learned this technique from this link, which explains:
DELETE is a fully logged operation and can be rolled back if something goes wrong
TRUNCATE removes all rows from a table without logging the individual row deletions
In the article you can explore other strategies to resolve the delay in deleting many records; that one worked for me.

MS SQL : The best way to delete rows from a ginormous table

I have a very large table [X], which has 170 million rows, and we need to archive data to keep only used records in [X]. We are doing this to keep our system fast, as it is slowing down. We are only using a small portion of the rows from the whole table (less than 10%), so we can afford to archive a lot of data into, for example, Archive.[X].
The problem is that when we try to delete records, it takes a lot of time. We have run the following troubleshooting checks to see why it takes so long:
1) The table is indexed
2) No un-indexed foreign keys
3) No triggers doing extra work in the background on delete
Have any of you ever encountered a similar scenario? What is the best procedure to follow when doing something similar? And are there any tools out there that can help?
I appreciate your help!
Options
Why not take the 10% into a new table?
Batch delete/insert not in a transaction (see below)
Partition table (aka let the engine deal with it)
To populate an archive table
SELECT 'starting' -- sets @@ROWCOUNT
WHILE @@ROWCOUNT <> 0
BEGIN
    DELETE TOP (50000) dbo.Mytable
    OUTPUT DELETED.* INTO ArchiveTable
    WHERE SomeCol < <Afilter>
    -- maybe CHECKPOINT
    WAITFOR DELAY ...
END
You should go for partitioning your database/table.
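A hedged sketch of what partition-based cleanup can look like, assuming the table is already partitioned on the date key (the names are invented; TRUNCATE ... WITH (PARTITIONS ...) needs SQL Server 2016+, while ALTER TABLE ... SWITCH works on older versions):
-- SQL Server 2016+: empty just the partition that holds the old rows.
TRUNCATE TABLE dbo.EventLog WITH (PARTITIONS (1));

-- Older versions: switch the partition out to an empty, identically structured
-- staging table on the same filegroup, then truncate or drop the staging table.
ALTER TABLE dbo.EventLog SWITCH PARTITION 1 TO dbo.EventLog_Staging;
TRUNCATE TABLE dbo.EventLog_Staging;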

Delete large portion of huge tables

I have a very large table (more than 300 million records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table but no external relationships.
The best solution I came up with, so far, is to put the database into "simple" recovery mode, copy all the records I want to keep to a temporary table, truncate the original table, set identity insert on, and copy back the data from the temp table.
It works, but it's still taking several hours to complete. Is there a faster way to do this?
As per the comments, my suggestion would be to simply dispense with the copy-back step and promote the table containing the records to be kept to become the new main table, by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non clustered indexes.
A couple of points I'm not sure about though.
Whether it would be quicker to insert into a heap and then create the clustered index afterwards. (I guess not, if the insert can be done in clustered index order.)
Whether the original table should be truncated before being dropped (I guess yes)
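A minimal sketch of that index recreation order on the replacement table (names are invented placeholders):
-- Clustered index first; building it after the non-clustered indexes would force them to be rebuilt.
CREATE CLUSTERED INDEX CIX_BigTable_Id ON dbo.BigTable (Id);
CREATE NONCLUSTERED INDEX IX_BigTable_CreatedAt ON dbo.BigTable (CreatedAt);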
@uriDium -- chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via ALTER TABLE (sql2k8) or other various locking tricks.
I am not sure what the structure of your data is. When does a row become eligible for deletion? If it is a purely ID-based or date-based thing, then you can create a new table for each day, insert your new data into the new tables, and when it comes to cleaning simply drop the required tables. Then, for any selects, construct a view over all the tables. Just an idea.
EDIT: (In response to comments)
If you are maintaining a view over all the tables then no, it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
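A rough sketch of the day-per-table idea with a view on top (all names are made up for illustration):
-- One table per day; cleanup is simply DROP TABLE on the oldest one.
CREATE TABLE dbo.Log_20140326 (Id bigint, MemberId int, LoggedAt datetime, Message nvarchar(400));
CREATE TABLE dbo.Log_20140327 (Id bigint, MemberId int, LoggedAt datetime, Message nvarchar(400));
GO
CREATE VIEW dbo.AllLogs AS
    SELECT * FROM dbo.Log_20140326
    UNION ALL
    SELECT * FROM dbo.Log_20140327;
GO
-- To clean up: ALTER the view so it no longer references the oldest table, then drop that table.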
I am assuming that you don't want your data to be locked down too much during deletes. Why not chunk the delete operations? Create a stored procedure that deletes the data in chunks, 50,000 rows at a time. This should help make sure that SQL Server keeps row locks instead of a table lock. Use the
WAITFOR DELAY 'x'
in your WHILE loop so that you can give other queries a bit of breathing room. Your problem is the age-old computer science trade-off: space vs. time.

How-To delete 8,500,000 Records from one table on sql server

delete activities
where unt_uid is null
would be the fastest way, but nobody can access the database/table until this statement has finished, so this is a no-go.
I defined a cursor to get this task done during working hours, but the impact on productivity is still too big.
So how do I delete these records so that normal use of the database is guaranteed?
It's SQL Server 2005 on 32-bit Win2003. Second question: how long would you estimate this job to take (6 hours or 60 hours)? (Yes, I know that depends on the load, but assume this is a small-business environment.)
You can do it in chunks. For example, every 10 seconds execute:
delete from activities where activityid in
(select top 1000 activityid from activities where unt_uid is null)
Obviously, define the row count (I arbitrarily picked 1000) and interval (I picked 10 seconds) that make the most sense for your application.
Perhaps instead of deleting the records from your table, you could create a new identical table, insert the records you want to keep, and then rename the tables so the new one replaces the old one. This would still take some time, but the downtime on your site would be pretty minimal (just when swapping the tables).
Who can access the table will depend on your transaction isolation mode, I'd guess.
However, you're broadly right - lots of deletes is bad, particularly if your where clause means it cannot use an index - this means the database probably won't be able to lock only the rows it needs to delete, so it will end up taking a big lock on the whole table.
My best recommendation would be to redesign your application so you don't need to delete these rows, or possibly any rows.
You can either do this by partitioning the table such that you can simply drop partitions instead, or use the "copy the rows you want to keep then drop the table" recipe suggested by others.
I'd use the "nibbling delete" technique. From http://sqladvice.com/blogs/repeatableread/archive/2005/09/20/12795.aspx:
DECLARE @target int
SET @target = 2000
DECLARE @count int
SET @count = 2000
WHILE @count = 2000 BEGIN
    DELETE FROM myBigTable
    WHERE targetID IN
        (SELECT TOP (@target) targetID
         FROM myBigTable WITH(NOLOCK)
         WHERE something = somethingElse)
    SELECT @count = @@ROWCOUNT
    WAITFOR DELAY '00:00:00.200'
END
I've used it for exactly this type of scenario.
The WAITFOR is important to keep, it allows other queries to do their work in between deletes.
In a small-business environment, it seems odd that you would need to delete 500,000 rows in standard operational behavior without affecting any other users. Typically for deletes that large, we're making a new table and using TRUNCATE/INSERT or sp_rename to overwrite the old one.
Having said that, in a special case, one of my monthly processes regularly can delete 200m rows in batches of around 3m at a time if it detects that it needs to re-run the process which generated those 200m rows. But this is a single-user process in a dedicated data warehouse database, and I wouldn't call it a small-business scenario.
I second the answers recommending seeking alternative approaches to your design.
I would create a task for this and schedule it to run during off-peak hours. But I would not suggest deleting from the table while it is being used. Move the rows you want to keep to a new table and completely drop the current table with the rows you want to delete.