SQL - Delete table with 372 million rows starting from first row

SQL - Delete table with 372 million rows starting from first row - sql

I have a table with 372 million rows, I want to delete old rows starting from the first ones without blocking the DB. How can I reach that?
The table have
id | memberid | type | timeStamp | message |
1 123 10 2014-03-26 13:17:02.000 text
UPDATE:
I deleted about 30 GB of Space in DB, but my DISC is ON 6gb space yet..
Any suggestion to get that free space?
Thank you in advance!

select 1;
while(##ROWCOUNT > 0)
begin
WAITFOR DELAY '00:00:10';
delete top(10) from tab where <your-condition>;
end
delete in chunks using above sql

You may want to consider another approach:
Create a table based on the existing one
Adjust the identity column in the empty table to start from the latest value from the old table (if there is any)
Swap the two tables using sp_rename
Copy the records in batches into the new table from the old table
You can do whatever you want with the old table.
BACKUP your database before you start deleting records / play with tables.

the best performance is to query data by id, then:
delete from TABLENAME where id>XXXXX
is the lowest impact you can execute.
You can also divide the operation in suboperations limiting the number of deleted rows for each operation adding ROWCONT declatarion,
example if you want to delete only 5.000.000 of rows per call you can do this:
SET ROWCOUNT=5000000;
delete from TABLENAME where id>XXXXX;
here you can find a reference https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396

The answer to the best way to delete rows from an Oracle table is: It
depends! In a perfect world where you can take the table offline for
maintenance, a complete reorganization is always best because it does
the delete and places the table back into a pristine state. We will
address the tools for doing large scale deletes and the appropriate
methods for each environment.
Factors and tools for massive deletes
The choice of the delete methods depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have a large time-based table partition and deleting elderly rows from these table can be as simple as dropping the desired partition. See these notes on managing partitioned tables.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table it is faster to use CTAS to delete from a table than to do a vanilla delete and a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be re-used by subsequent DML then you will want to leave the empty space within the table. Conversely, if you want to released the space back onto the tablespace then you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This procedure wil punch-off the definitions of all table indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every zillion-rows to free-up the undo logs. This approach will be re-startable automatically because the delete will pick-up where it left off as on your last commit checkpoint.
More information visit here

Related

How to delete records without generating redo log

I have a table like following:
Create Table Txn_History nologging (
ID number,
Comment varchar2(300),
... (Another 20 columns),
Std_hash raw(1000)
);
This table is 8GB with 19 Million rows with a growth of around 50,000 rows daily.
I need to delete 300,000 rows and update 100,000 rows. I know that normally delete and update statement will cause Oracle database to generate redo log. The only way I know to avoid this is to create a new table with the updated result.
However, consider that the delete and update statement is only talking about 2% of the entire table, it appears not very worth to create a new table, follow by all corresponding indexes.
Do you have any new idea?

To be honest I don't think that the redo generation is a big problem here: just 300k rows to delete and 100k rows to update... For such batch operations Oracle uses fast "array update" REDO operation. Probably you need to trace your operation to find out real bottlenecks and load profile(IO/CPU, access paths, triggers, indexes, etc).
Basically it's better to use the partitioning option properly to update/delete(or truncate) by whole partitions.
There is also new alter table ... move including rows where ... feature starting from Oracle 12.2:
https://blogs.oracle.com/sql/how-to-delete-millions-of-rows-fast-with-sql

ORACLE 11g SET COLUMN NULL for specific Partition of large table

I have a Composite-List-List partitioned table with 19 Columns and about 400 million rows. Once a week new data is inserted in this table and before the insert I need to set the values of 2 columns to null for specific partitions.
Obvious approach would be something like the following where COLUMN_1 is the partition criteria:
UPDATE BLABLA_TABLE
SET COLUMN_18 = NULL, SET COLUMN_19 = NULL
WHERE COLUMN_1 IN (VALUE1, VALUE2…)
Of course this would be awfully slow.
My second thought was to use CTAS for every partition that I need to set those two columns to null and then use EXCHANGE PARTITION to update the data in my big table. Unfortunately that wouldn’t work because it´s a Composite-Partition.
I could use the same approach with subpartitions but then I would have to use CATS about 8000 times and drop those tables afterwards every week. I guess that would not pass the upcoming code-review.
May somebody has another idea how to performantly solve this?
PS: I’m using ORACLE 11g as database.
PPS: Sorry for my bad English…..

You've ruled out updating through DDL (switch partitions), so this lets us with only DML to consider.
I don't think that it's actually that bad an update with a table so heavily partitioned. You can easily split the update in 8k mini updates (each a single tiny partition):
UPDATE BLABLA_TABLE SUBPARTITION (partition1) SET COLUMN_18 = NULL...
Each subpartition would contain 15k rows to be updated on average so the update would be relatively tiny.
While it still represents a very big amount of work, it should be easy to set to run in parallel, hopefully during hours where database activity is very light. Also the individual updates are easy to restart if one of them fails (rows locked?) whereas a 120M update would take such a long time to rollback in case of error.

If I were to update almost 90% of rows in table, I would check feasibility/duration of just inserting to another table of same structure (less redo, no row chaining/migration, bypass cache and so on via direct insert. drop indexes and triggers first. exclude columns to leave them null in target table), rename the tables to "swap" them, rebuild indexes and triggers, then drop the old table.
From my experience in data warehousing, plain direct insert is better than update/delete. More steps needed but it's done in less time overall. I agree, partition swap is easier said than done when you have to process most of the table and just makes it more complex for the ETL developer (logic/algorithm bound to what's in the physical layer), we haven't encountered need to do partition swaps so far.
I would also isolate this table in its own tablespaces, then alternate storage between these two tablespaces (insert to 2nd drop table from 1st, vice-versa in next run, resize empty tablespace to reclaim space).

Delete All / Bulk Insert

First off let me say I am running on SQL Server 2005 so I don't have access to MERGE.
I have a table with ~150k rows that I am updating daily from a text file. As rows fall out of the text file I need to delete them from the database and if they change or are new I need to update/insert accordingly.
After some testing I've found that performance wise it is exponentially faster to do a full delete and then bulk insert from the text file rather than read through the file line by line doing an update/insert. However I recently came across some posts discussing mimicking the MERGE functionality of SQL Server 2008 using a temp table and the output of the UPDATE statement.
I was interested in this because I am looking into how I can eliminate the time in my Delete/Bulk Insert method when the table has no rows. I still think that this method will be the fastest so I am looking for the best way to solve the empty table problem.
Thanks

I think your fastest method would be to:
Drop all foreign keys and indexes
from your table.
Truncate your
table.
Bulk insert your data.
Recreate your foreign keys and
indexes.

Is the problem that Joe's solution is not fast enough, or that you can not have any activity against the target table while your process runs? If you just need to prevent users from running queries against your target table, you should contain your process within a transaction block. This way, when your TRUNCATE TABLE executes, it will create a table lock that will be held for the duration of the transaction, like so:
begin tran;
truncate table stage_table
bulk insert stage_table
from N'C:\datafile.txt'
commit tran;

An alternative solution which would satsify your requirement for not having "down time" for the table you are updating.
It sounds like originally you were reading the file and doing an INSERT/UPDATE/DELETE 1 row at a time. A more performant approach than that, that does not involve clearing down the table is as follows:
1) bulk load the file into a new, separate table (no indexes)
2) then create the PK on it
3) Run 3 statements to update the original table from this new (temporary) table:
DELETE rows in the main table that don't exist in the new table
UPDATE rows in the main table where there is a matching row in the new table
INSERT rows into main table from the new table where they don't already exist
This will perform better than row-by-row operations and should hopefully satisfy your overall requirements

There is a way to update the table with zero downtime: keep two day's data in the table, and delete the old rows after loading the new ones!
Add a DataDate column representing the date for which your ~150K rows are valid.
Create a one-row, one-column table with "today's" DataDate.
Create a view of the two tables that selects only rows matching the row in the DataDate table. Index it if you like. Readers will now refer to this view, not the table.
Bulk insert the rows. (You'll obviously need to add the DataDate to each row.)
Update the DataDate table. View updates Instantly!
Delete yesterday's rows at your leisure.
SELECT performance won't suffer; joining one row to 150,000 rows along the primary key should present no problem to any server less than 15 years old.
I have used this technique often, and have also struggled with processes that relied on sp_rename. Production processes that modify the schema are a headache. Don't.

For raw speed, I think with ~150K rows in the table, I'd just drop the table, recreate it from scratch (without indexes) and then bulk load afresh. Once the bulk load has been done, then create the indexes.
This assumes of course that having a period of time when the table is empty/doesn't exist is acceptable which it does sound like could be the case.

Delete large portion of huge tables

I have a very large table (more than 300 millions records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table but not external relationships.
The best solution I came up with, so far, is to put the database into "simple" recovery mode, copy all the records I want to keep to a temporary table, truncate the original table, set identity insert to on and copy back the data from the temp table.
It works but it's still taking several hours to complete. Is there a faster way to do this ?

As per the comments my suggestion would be to simply dispense with the copy back step and promote the table containing records to be kept to become the new main table by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non clustered indexes.
A couple of points I'm not sure about though.
Whether it would be quicker to insert into a heap then create the clustered index afterwards. (I guess no if the insert can be done in clustered index order)
Whether the original table should be truncated before being dropped (I guess yes)

#uriDium -- Chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via alter table (sql2k8) or other various locking tricks.

I am not sure what the structure of your data is. When does a row become eligible for deletion? If it is a purely ID based on date based thing then you can create a new table for each day, insert your new data into the new tables and when it comes to cleaning simply drop the required tables. Then for any selects construct a view over all the tables. Just an idea.
EDIT: (In response to comments)
If you are maintaining a view over all the tables then no it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
I am assuming that you don't want you data to be locked down too much during deletes. Why not chunk the delete operations. Created a SP that will delete the data in chunks, 50 000 rows at a time. This should make sure that SQL Server keeps a row lock instead of a table lock. Use the
WAITFOR DELAY 'x'
In your while loop so that you can give other queries a bit of breathing room. Your problem is the old age computer science, space vs time.

What is the best way to delete all of a large table in t-sql?

We've run across a slightly odd situation. Basically there are two tables in one of our databases that are fed tons and tons of logging info we don't need or care about. Partially because of this we're running out of disk space.
I'm trying to clean out the tables, but it's taking forever (there are still 57,000,000+ records after letting this run through the weekend... and that's just the first table!)
Just using delete table is taking forever and eats up drive space (I believe because of the transaction log.) Right now I'm using a while loop to delete records X at a time, while playing around with X to determine what's actually fastest. For instance X=1000 takes 3 seconds, while X=100,000 takes 26 seconds... which doing the math is slightly faster.
But the question is whether or not there is a better way?
(Once this is done, going to run a SQL Agent job go clean the table out once a day... but need it cleared out first.)

TRUNCATE the table or disable indexes before deleting
TRUNCATE TABLE [tablename]
Truncating will remove all records from the table without logging each deletion separately.

To add to the other responses, if you want to hold onto the past day's data (or past month or year or whatever), then save that off, do the TRUNCATE TABLE, then insert it back into the original table:
SELECT
*
INTO
tmp_My_Table
FROM
My_Table
WHERE
<Some_Criteria>
TRUNCATE TABLE My_Table
INSERT INTO My_Table SELECT * FROM tmp_My_Table
The next thing to do is ask yourself why you're inserting all of this information into a log if no one cares about it. If you really don't need it at all then turn off the logging at the source.

1) Truncate table
2) script out the table, drop and recreate the table

TRUNCATE TABLE [tablename]
will delete all the records without logging.

Depending on how much you want to keep, you could just copy the records you want to a temp table, truncate the log table, and copy the temp table records back to the log table.

If you can work out the optimum x this will constantly loop around the delete at the quickest rate. Setting the rowcount limits the number of records that will get deleted in each step of the loop. If the logfile is getting too big; stick a counter in the loop and truncate every million rows or so.
set ##rowcount x
while 1=1
Begin
delete from table
If ##Rowcount = 0 break
End
Change the logging mode on the db to simple or bulk logged will reduce some of the delete overhead.

check this
article from MSDN Delete_a_Huge_Amount_of_Data_from
Information on Recovery Models
and View or Change the Recovery Model of a Database

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas