I have a classic "sales" database that contains millions of rows in certain tables. On each of these large tables, I have an associated "delete" trigger and "backup" table.
This backup table keeps "deleted" rows for the last 7 days: the trigger starts by copying the deleted rows into that backup table, then performs a delete on the backup, in this fashion:
CREATE TRIGGER dbo.TRIGGER
ON dbo.EXAMPLE_DATA
FOR DELETE AS
-- copy the deleted rows into the backup table, stamped with the current date
INSERT INTO EXAMPLE_BACKUP
SELECT getdate(), *
FROM deleted
-- purge backup rows older than 7 days
DELETE FROM EXAMPLE_BACKUP
WHERE modified < dateadd(dd, -7, getdate())
The structure of each backup table is similar to the original data table (keys, values). The only difference is that the backup table has an extra "modified" column, which I include in the key.
A colleague of mine told me I should use "a loop", because my delete statement will cause timeouts/issues as soon as the backup table contains several million rows. Will that delete actually blow up at some point? Should I do this differently?
It looks like Sybase 12.5 supports table partitioning; if your design is such that the data can be retained for exactly 7 days (using a hard breakpoint), you could partition your table on the day of the year, and construct a view to represent the current data. As the clock ticks past a certain day, you could truncate the older partitions explicitly.
Just a thought.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/X15880.htm
Otherwise, deleting in a loop is a reasonable method for deleting large subsets of data without blowing up your transaction log. Here's one method using SQL Server:
http://sqlserverperformance.wordpress.com/2011/08/13/gradually-deleting-data-in-sql-server/
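For illustration, here is a minimal sketch of such a batched delete loop in T-SQL, reusing the purge condition from the trigger above; the batch size of 10,000 is an arbitrary assumption to tune for your workload:
WHILE 1 = 1
BEGIN
    -- delete one small batch so each transaction stays short
    DELETE TOP (10000) FROM EXAMPLE_BACKUP
    WHERE modified < dateadd(dd, -7, getdate());

    -- stop once a batch comes back empty
    IF @@ROWCOUNT = 0 BREAK;
END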
I have a table with 372 million rows, and I want to delete the old rows, starting from the first ones, without blocking the DB. How can I achieve that?
The table has:
id | memberid | type | timeStamp | message
1 | 123 | 10 | 2014-03-26 13:17:02.000 | text
UPDATE:
I deleted about 30 GB of data from the DB, but my disk still has only 6 GB free. Any suggestions for reclaiming that space?
Thank you in advance!
select 1;  -- prime @@ROWCOUNT so the loop body runs at least once
while (@@ROWCOUNT > 0)
begin
    WAITFOR DELAY '00:00:10';  -- pause between batches to let other work through
    delete top (10) from tab where <your-condition>;  -- tune the TOP batch size to your workload
end
Delete in chunks using the SQL above.
You may want to consider another approach:
Create a table based on the existing one
Adjust the identity column in the empty table to start from the latest value in the old table (if there is an identity column)
Swap the two tables using sp_rename (a sketch follows below)
Copy the records in batches into the new table from the old table
You can do whatever you want with the old table.
BACKUP your database before you start deleting records / playing with tables.
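Here is a minimal sketch of that approach, assuming a hypothetical dbo.Messages table with an identity column id; all names and the reseed value are placeholders:
-- create an empty copy and reseed its identity to continue from the old table
SELECT TOP (0) * INTO dbo.Messages_New FROM dbo.Messages;
DBCC CHECKIDENT ('dbo.Messages_New', RESEED, 372000000);

-- swap the names inside a transaction so readers always see exactly one table
BEGIN TRAN;
EXEC sp_rename 'dbo.Messages', 'Messages_Old';
EXEC sp_rename 'dbo.Messages_New', 'Messages';
COMMIT TRAN;

-- copy the rows worth keeping back into dbo.Messages in batches,
-- then drop dbo.Messages_Old whenever convenient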
The best performance comes from deleting by id:
DELETE FROM TABLENAME WHERE id < XXXXX
is the lowest-impact statement you can execute.
You can also divide the operation into sub-operations, limiting the number of rows deleted per operation by adding a SET ROWCOUNT declaration.
For example, if you want to delete only 5,000,000 rows per call, you can do this:
SET ROWCOUNT 5000000;  -- cap the number of rows affected by the next DELETE
DELETE FROM TABLENAME WHERE id < XXXXX;
SET ROWCOUNT 0;  -- reset the cap afterwards
You can find a reference here: https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396
The answer to the best way to delete rows from an Oracle table is: it depends! In a perfect world where you can take the table offline for maintenance, a complete reorganization is always best, because it does the delete and places the table back into a pristine state. We will address the tools for doing large-scale deletes and the appropriate methods for each environment.
Factors and tools for massive deletes
The choice of delete method depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have a large time-partitioned table, and deleting old rows from such tables can be as simple as dropping the relevant partition. See these notes on managing partitioned tables.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table it is faster to use CTAS to delete from a table than to do a vanilla delete and a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be re-used by subsequent DML, then you will want to leave the empty space within the table. Conversely, if you want to release the space back to the tablespace, then you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This procedure will extract the definitions of all table indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information, and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every few hundred thousand rows to free up the undo logs (a sketch follows below). This approach is automatically restartable, because the delete will pick up where it left off as of your last commit checkpoint.
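A minimal sketch of that commit loop in PL/SQL, assuming a hypothetical big_table with a created_date column; the two-year cutoff and the 10,000-row batch size are placeholders to tune against your undo capacity:
BEGIN
  LOOP
    DELETE FROM big_table
     WHERE created_date < ADD_MONTHS(SYSDATE, -24)
       AND ROWNUM <= 10000;        -- one bounded batch per iteration
    EXIT WHEN SQL%ROWCOUNT = 0;    -- nothing left to delete
    COMMIT;                        -- free the undo consumed by this batch
  END LOOP;
  COMMIT;
END;
/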
I have a huge database with a table containing billions of records. I need to do a monthly cleanup of this table (delete the oldest records based on a date field).
Since I need to delete a few hundred million records for one month's worth of data, doing a DELETE, or even deleting in chunks, takes too long because of the indexes that slow the process.
bcp data out + truncate + bcp data in is also too long.
Now the solution I want to try is to partition the table into different filegroups (one month per partition). I get the part of building the partitions, but how will I delete a filegroup along with its data?
You can switch partitions to a new table and then drop that table. Filegroups do not really have anything to do with it other than the restriction that the table you switch to must be on the same filegroup. You do not necessarily have to map your partitions to separate filegroups although you may want to do that for other reasons.
Here's a good example of a partition-wise roll-off in SQL Server.
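A minimal sketch of the switch-and-drop pattern, with hypothetical names, dates, and a monthly range-partitioned table; note that the staging table must match the source schema and sit on the same filegroup as the partition being switched (and partitioning requires Enterprise edition on older SQL Server versions):
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('2014-02-01', '2014-03-01');

CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.BigTable
(
    id        bigint   NOT NULL,
    eventDate datetime NOT NULL,
    CONSTRAINT PK_BigTable PRIMARY KEY (id, eventDate)
) ON psMonthly (eventDate);

-- empty staging table: same schema, same filegroup as partition 1
CREATE TABLE dbo.BigTable_Stage
(
    id        bigint   NOT NULL,
    eventDate datetime NOT NULL,
    CONSTRAINT PK_BigTable_Stage PRIMARY KEY (id, eventDate)
);

-- metadata-only move of the oldest month, then drop the staging table
ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Stage;
DROP TABLE dbo.BigTable_Stage;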
I have some database tables that contain aggregated data. Their records (a few thousand per table) are recomputed periodically by an external .NET app, so the old data should be deleted and the new data inserted periodically. An update is not an option in this case.
Between the delete and the insert there is an intermediate period when the table state is inconsistent (the old rows are deleted, the new ones are not in the table yet), so a select query run in that state returns incorrect results.
I use the SubSonic SimpleRepository to handle database features.
What is the best practice / pattern to work around / handle this state?
Three options come to my mind:
Create a transaction and lock out reads until it is done. This only works if the processes are relatively fast. A few thousand records shouldn't be too bad if you transact/lock one table at a time; if you lock the whole process, that could be costly! But if the data is related, this is what you'd have to do (a minimal sketch follows this list).
Write to temporary versions of the table, then drop old tables and rename temp tables.
Same as above, except bulk copy from temp tables (not necessarily SQL temporary tables; ancillary holding tables would suffice) into the correct tables, first deleting from the main table. You'd still want to use a transaction for this.
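A minimal sketch of the first option in T-SQL, with hypothetical table and column names: readers running under the default READ COMMITTED isolation level block on the exclusive lock until the COMMIT, so they never see the half-updated state.
BEGIN TRAN;
    -- take an exclusive table lock for the whole swap
    DELETE FROM dbo.AggregatedData WITH (TABLOCKX);
    INSERT INTO dbo.AggregatedData (id, total)
    SELECT id, total FROM dbo.AggregatedStaging;
COMMIT TRAN;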
First off let me say I am running on SQL Server 2005 so I don't have access to MERGE.
I have a table with ~150k rows that I am updating daily from a text file. As rows fall out of the text file I need to delete them from the database and if they change or are new I need to update/insert accordingly.
After some testing I've found that, performance-wise, it is dramatically faster to do a full delete and then bulk insert from the text file rather than read through the file line by line doing an update/insert. However, I recently came across some posts discussing mimicking the MERGE functionality of SQL Server 2008 using a temp table and the OUTPUT of the UPDATE statement.
I was interested in this because I am looking into how I can eliminate the window in my delete/bulk-insert method during which the table has no rows. I still think this method will be the fastest, so I am looking for the best way to solve the empty-table problem.
Thanks
I think your fastest method would be to:
Drop all foreign keys and indexes from your table.
Truncate your table.
Bulk insert your data.
Recreate your foreign keys and indexes.
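A minimal sketch of those steps, with hypothetical table, index, and constraint names:
-- 1) drop the constraints and indexes
ALTER TABLE dbo.Orders DROP CONSTRAINT FK_Orders_Customers;
DROP INDEX IX_Orders_CreateDate ON dbo.Orders;

-- 2) empty the table (minimally logged)
TRUNCATE TABLE dbo.Orders;

-- 3) reload from the text file
BULK INSERT dbo.Orders FROM 'C:\datafile.txt';

-- 4) recreate what was dropped
CREATE INDEX IX_Orders_CreateDate ON dbo.Orders (CreateDate);
ALTER TABLE dbo.Orders ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId);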
Is the problem that Joe's solution is not fast enough, or that you cannot have any activity against the target table while your process runs? If you just need to prevent users from running queries against your target table, you should contain your process within a transaction block. This way, when your TRUNCATE TABLE executes, it will take a table lock that is held for the duration of the transaction, like so:
begin tran;
    truncate table stage_table;  -- the table lock taken here is held until commit
    bulk insert stage_table
    from N'C:\datafile.txt';
commit tran;
An alternative solution which would satisfy your requirement for not having "down time" for the table you are updating:
It sounds like originally you were reading the file and doing an INSERT/UPDATE/DELETE one row at a time. A more performant approach that does not involve clearing down the table is as follows:
1) bulk load the file into a new, separate table (no indexes)
2) then create the PK on it
3) Run 3 statements to update the original table from this new (temporary) table (sketched after this list):
DELETE rows in the main table that don't exist in the new table
UPDATE rows in the main table where there is a matching row in the new table
INSERT rows into main table from the new table where they don't already exist
This will perform better than row-by-row operations and should hopefully satisfy your overall requirements
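Those three statements might look like this on SQL Server 2005 (no MERGE needed); dbo.MainTable, dbo.NewTable, and the id/val columns are hypothetical names:
-- 1) delete rows that fell out of the file
DELETE m
FROM dbo.MainTable AS m
WHERE NOT EXISTS (SELECT 1 FROM dbo.NewTable AS n WHERE n.id = m.id);

-- 2) update rows present in both tables
UPDATE m
SET m.val = n.val
FROM dbo.MainTable AS m
JOIN dbo.NewTable AS n ON n.id = m.id;

-- 3) insert rows that are new
INSERT INTO dbo.MainTable (id, val)
SELECT n.id, n.val
FROM dbo.NewTable AS n
WHERE NOT EXISTS (SELECT 1 FROM dbo.MainTable AS m WHERE m.id = n.id);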
There is a way to update the table with zero downtime: keep two days' data in the table, and delete the old rows after loading the new ones!
Add a DataDate column representing the date for which your ~150K rows are valid.
Create a one-row, one-column table with "today's" DataDate.
Create a view of the two tables that selects only rows matching the row in the DataDate table. Index it if you like. Readers will now refer to this view, not the table.
Bulk insert the rows. (You'll obviously need to add the DataDate to each row.)
Update the DataDate table. The view updates instantly!
Delete yesterday's rows at your leisure.
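A minimal sketch of this setup, with hypothetical names (dbo.DailyRows for the data, dbo.CurrentDataDate for the one-row date table):
CREATE TABLE dbo.CurrentDataDate (DataDate datetime NOT NULL);
INSERT INTO dbo.CurrentDataDate VALUES ('20140326');

CREATE TABLE dbo.DailyRows
(
    id       int      NOT NULL,
    DataDate datetime NOT NULL,
    CONSTRAINT PK_DailyRows PRIMARY KEY (id, DataDate)
);
GO
-- readers query the view, never the table
CREATE VIEW dbo.CurrentRows AS
SELECT d.id, d.DataDate
FROM dbo.DailyRows AS d
JOIN dbo.CurrentDataDate AS c ON c.DataDate = d.DataDate;
GO

-- after bulk loading tomorrow's rows, flip the date: readers switch instantly
UPDATE dbo.CurrentDataDate SET DataDate = '20140327';

-- then delete yesterday's rows at your leisure
DELETE FROM dbo.DailyRows
WHERE DataDate < (SELECT DataDate FROM dbo.CurrentDataDate);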
SELECT performance won't suffer; joining one row to 150,000 rows along the primary key should present no problem to any server less than 15 years old.
I have used this technique often, and have also struggled with processes that relied on sp_rename. Production processes that modify the schema are a headache. Don't.
For raw speed, I think with ~150K rows in the table, I'd just drop the table, recreate it from scratch (without indexes) and then bulk load afresh. Once the bulk load has been done, then create the indexes.
This assumes, of course, that having a period of time when the table is empty or doesn't exist is acceptable, which does sound like it could be the case here.
I need to clean out a very bloated SQL database by deleting records that are older than two years from a number of tables. What is the most efficient way of doing this?
Do you have any way to determine how "old" a record is? (i.e., is there a column in the table that represents either the age of the row or a date that can be used to calculate the age?). If so, it should be a simple
DELETE FROM Table WHERE Age > 2
For example, if you have a DateTime column called CreateDate, you could do this:
DELETE FROM Table WHERE DATEADD(year, 2, CreateDate) < getdate()
In addition to Adam Robinson's good answer: When performing this type of operation:
Run a SELECT query with the DELETE's WHERE clause first to make sure you're getting "the right data" (see the sketch after this list)
Do a full backup
Run the thing in "off" hours so as not to affect users too much
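That sanity check might look like this, against a hypothetical dbo.Orders table:
-- same WHERE clause as the planned DELETE; eyeball the count (or the rows) first
SELECT COUNT(*)
FROM dbo.Orders
WHERE DATEADD(year, 2, CreateDate) < getdate();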
I've seen DBAs do this at a few different companies, and it always seems to follow this format:
Backup the table
Drop any indexes
Select the rows you want to keep into a temp table
Truncate the original table
Insert (into your source table) from your temp table
Recreate the indexes
The benefit of this approach is that these operations are minimally logged, so the log doesn't get blown up by thousands of delete entries. It's also faster.
The drawback is that, because the operations are minimally logged, your only recovery option is to restore your backup.
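A minimal sketch of the keep-truncate-reload steps, assuming a hypothetical dbo.Orders table with an identity column id and a CreateDate column:
-- keep the rows younger than two years (SELECT INTO is minimally logged)
SELECT *
INTO dbo.Orders_Keep
FROM dbo.Orders
WHERE CreateDate >= DATEADD(year, -2, GETDATE());

TRUNCATE TABLE dbo.Orders;  -- minimally logged, near-instant

-- reload the keepers, preserving the original identity values
SET IDENTITY_INSERT dbo.Orders ON;
INSERT INTO dbo.Orders (id, CreateDate)
SELECT id, CreateDate FROM dbo.Orders_Keep;
SET IDENTITY_INSERT dbo.Orders OFF;

DROP TABLE dbo.Orders_Keep;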
You should think about putting housekeeping in place. If the above is too scary, you could also use housekeeping to winnow the database down over a period of time.
In MSSQL, you could create a job to run daily which deletes the first 1000 rows of your query. To steal Adam's query:
DELETE TOP (1000) FROM Table WHERE DATEADD(year, 2, CreateDate) < getdate()
This would be very safe, would get rid of your data in three months or so, and would then also maintain the size of the db in the future.
Your database will get to reuse this space in the future, but if you want to recover the space now you will need to shrink the database. Read around if you are interested; whether it is worth it depends on the amount of space to recover versus the total size of the db.
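If you do decide to shrink, a minimal sketch (SalesDb is a hypothetical database name; note that shrinking fragments indexes, so rebuild them afterwards):
DBCC SHRINKDATABASE (SalesDb, 10);  -- shrink, leaving 10 percent free space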