How to delete a filegroup from a partitioned table (SQL Server 2012)

I have a huge database with a table containing billions of records. I need to do a monthly cleanup of this table (delete the oldest records based on a date field).
Since I need to delete a few hundred million records for one month's worth of data, doing a DELETE, or even deleting in chunks, takes too long because the indexes slow the process down.
bcp data out + truncate + bcp data in also takes too long.
Now the solution I want to try is to partition the table into different filegroups (one month per partition). I understand the part about building the partitions, but how do I delete a filegroup along with its data?

You can switch partitions to a new table and then drop that table. Filegroups do not really have anything to do with it other than the restriction that the table you switch to must be on the same filegroup. You do not necessarily have to map your partitions to separate filegroups although you may want to do that for other reasons.
Here's a good example of a partition-wise roll-off in SQL Server.
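A rough sketch of that switch-and-drop pattern (all object, filegroup and partition names here are placeholders; the staging table must exactly match the schema and indexes of the partitioned table and sit on the same filegroup as the partition being switched out):
CREATE TABLE dbo.BigTable_Staging (
    Id        bigint       NOT NULL,
    EventDate datetime     NOT NULL,
    Payload   varchar(100) NULL
) ON [FG_OldestMonth];   -- same filegroup as partition 1

ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Staging;  -- metadata-only operation
DROP TABLE dbo.BigTable_Staging;                                      -- the month of data is gone almost instantly

-- If you also want to retire the filegroup, merge the now-empty range first,
-- then remove the (now empty) file and filegroup:
-- ALTER PARTITION FUNCTION pf_Monthly() MERGE RANGE ('2014-01-01');
-- ALTER DATABASE MyDb REMOVE FILE FG_OldestMonth_File1;
-- ALTER DATABASE MyDb REMOVE FILEGROUP [FG_OldestMonth];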

Related

Most efficient way to delete records from a huge table

I have a table tblcalldatastore which produces around 4,000,000 records daily. I want to create a daily job to delete any record older than 24 hours. What is the most efficient and least time-consuming way? The query below shows my requirement.
delete from [tblcalldatastore]
where istestcase=0
and datediff(hour,receiveddate,GETDATE())>24
The better approach is to avoid deletes entirely by using partitions on your table. Instead of deleting records, drop partitions.
For example, you can create a partition for each hour. Then you can drop the entire partition for the 25th hour in the past. Or you can basically have two partitions by day and drop the older one after 24 hours.
This approach has a big performance advantage, because partition drops are not logged at the record level, saving lots of time. They also do not invoke triggers or other checks, saving more effort.
The documentation on partitioning is here.
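A minimal sketch of the daily version of that idea (names are assumptions; SQL Server removes a partition's data by switching it out rather than by a literal drop, and the purge table must have the same structure and filegroup as the partition being switched):
CREATE PARTITION FUNCTION pf_CallDataDay (datetime)
AS RANGE RIGHT FOR VALUES ('2017-01-01', '2017-01-02', '2017-01-03');

CREATE PARTITION SCHEME ps_CallDataDay
AS PARTITION pf_CallDataDay ALL TO ([PRIMARY]);

-- tblcalldatastore would be created ON ps_CallDataDay(receiveddate); purging a whole day is then:
ALTER TABLE tblcalldatastore SWITCH PARTITION 2 TO tblcalldatastore_purge;   -- metadata-only
TRUNCATE TABLE tblcalldatastore_purge;
ALTER PARTITION FUNCTION pf_CallDataDay() MERGE RANGE ('2017-01-01');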
You might not want to go down the Partitions route.
It looks like you will typically be deleting approx half the data in your table every day.
Deletes are very expensive...
A much faster way to do this is to:
SELECT INTO a new table (the data you want to keep)
Rename (or drop) your old table
Then rename your new table to the old table name
This should work out quicker, unless you have heaps of indexes & FKs...
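A hedged sketch of that copy-and-swap, using the names from the question (indexes, constraints and permissions have to be recreated on the new table before or right after the swap):
-- keep only the rows that should survive: test cases, plus anything from the last 24 hours
SELECT *
INTO dbo.tblcalldatastore_new
FROM dbo.tblcalldatastore
WHERE istestcase <> 0
   OR receiveddate >= DATEADD(hour, -24, GETDATE());

EXEC sp_rename 'dbo.tblcalldatastore', 'tblcalldatastore_old';
EXEC sp_rename 'dbo.tblcalldatastore_new', 'tblcalldatastore';
-- DROP TABLE dbo.tblcalldatastore_old;   -- once the swap has been verified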

SQL - Delete table with 372 million rows starting from first row

I have a table with 372 million rows, and I want to delete old rows, starting from the first ones, without blocking the DB. How can I achieve that?
The table has the following columns:
id | memberid | type | timeStamp | message
1 | 123 | 10 | 2014-03-26 13:17:02.000 | text
UPDATE:
I deleted about 30 GB of data from the DB, but my disk still shows only 6 GB of free space.
Any suggestion on how to reclaim that free space?
Thank you in advance!
SELECT 1;
WHILE (@@ROWCOUNT > 0)
BEGIN
    WAITFOR DELAY '00:00:10';
    DELETE TOP (10) FROM tab WHERE <your-condition>;   -- raise the batch size as appropriate
END
Delete in chunks using the SQL above.
You may want to consider another approach:
Create a table based on the existing one
Adjust the identity column in the empty table to start from the latest value in the old table (if there is one)
Swap the two tables using sp_rename
Copy the records in batches into the new table from the old table
You can do whatever you want with the old table.
BACK UP your database before you start deleting records / playing with tables.
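A rough sketch of those steps, with placeholder names and an assumed identity column id (the reseed value and the date cut-off are examples only):
SELECT TOP (0) * INTO dbo.tab_new FROM dbo.tab;       -- empty copy of the structure
DBCC CHECKIDENT ('dbo.tab_new', RESEED, 372000000);   -- continue from the old table's current max id
EXEC sp_rename 'dbo.tab', 'tab_old';
EXEC sp_rename 'dbo.tab_new', 'tab';

-- copy back the rows worth keeping, preserving their original ids
-- (on a real 372-million-row table you would do this in id or date batches)
SET IDENTITY_INSERT dbo.tab ON;
INSERT INTO dbo.tab (id, memberid, type, timeStamp, message)
SELECT id, memberid, type, timeStamp, message
FROM dbo.tab_old
WHERE timeStamp >= DATEADD(month, -3, GETDATE());
SET IDENTITY_INSERT dbo.tab OFF;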
The best performance is to query the data by id; then:
delete from TABLENAME where id > XXXXX
is the lowest-impact statement you can execute.
You can also divide the operation into sub-operations, limiting the number of rows deleted per operation by adding a ROWCOUNT declaration.
For example, if you want to delete only 5,000,000 rows per call you can do this:
SET ROWCOUNT 5000000;
delete from TABLENAME where id > XXXXX;
You can find a reference here: https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396
The answer to the best way to delete rows from an Oracle table is: it depends! In a perfect world where you can take the table offline for maintenance, a complete reorganization is always best because it does the delete and places the table back into a pristine state. We will address the tools for doing large-scale deletes and the appropriate methods for each environment.
Factors and tools for massive deletes
The choice of delete method depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have large time-based table partitions, and deleting older rows from these tables can be as simple as dropping the desired partition. See these notes on managing partitioned tables.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table, it is faster to use CTAS than to do a vanilla delete followed by a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be re-used by subsequent DML, you will want to leave the empty space within the table. Conversely, if you want to release the space back to the tablespace, you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This procedure will punch off the definitions of all table indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information, and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every so many rows to free up the undo logs. This approach is automatically restartable, because the delete will pick up where it left off at the last commit checkpoint.
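A hedged CTAS sketch (Oracle syntax; table and column names are placeholders, and indexes, constraints and grants have to be recreated on the new table):
CREATE TABLE big_table_keep AS
  SELECT * FROM big_table
  WHERE created_date >= ADD_MONTHS(SYSDATE, -1);   -- keep only the last month

RENAME big_table TO big_table_old;
RENAME big_table_keep TO big_table;
-- DROP TABLE big_table_old PURGE;   -- once the new table has been verified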
For more information, visit here.

Relation with DB size and performance

Is there any relation between DB size and performance in my case:
There is a table in my Oracle DB that is used for logging. It now has over 120 million rows and grows at a rate of 1,000 rows per minute. Each row has 6-7 columns of basic string data.
It is for our client. We never read data from it, but we might need it in case of any issues. It's fine, however, if we clean it up every month or so.
The actual question is: will it affect the performance of other transactional tables in the same DB? Assume disk space is unlimited.
If 1,000 rows/minute are being inserted into this table then about 40 million rows would be added per month. If this table has indexes, I'd say the biggest issue will be that index maintenance eventually becomes a burden on the system, so in that case I'd expect performance to be affected.
This table seems like a good candidate for partitioning. If it's partitioned on the date/time that each row is added, with each partition containing one month's worth of data, maintenance is much simpler. The partitioning scheme can be set up so that partitions are created automatically as needed (assuming you're on Oracle 11 or higher). When you need to drop a month's worth of data, you can just drop the partition containing that data, which is a quick operation that doesn't burden the system with a large number of DELETE operations.
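A minimal sketch of such a table (Oracle 11g+ interval partitioning; all names, types and dates are assumptions):
CREATE TABLE app_log (
  log_id   NUMBER,
  log_time DATE,
  message  VARCHAR2(4000)
)
PARTITION BY RANGE (log_time)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_first VALUES LESS THAN (DATE '2015-01-01')
);

-- removing a month of old log data is then a metadata operation, not millions of DELETEs
ALTER TABLE app_log DROP PARTITION FOR (DATE '2015-03-15') UPDATE INDEXES;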
Best of luck.

SQL Server 2012 - Column Store indexes - Reporting Solution

We (the team) are in the process of putting together an audit reporting solution for a huge online transactional website.
Our auditing solution is to enable CDC on the source tables, track every change that happens on those objects, grab the changes and push them into destination tables for reporting.
As of now we have a one-to-one mapping between source and destination tables.
There will be only inserts into the destination, no updates or deletes.
So eventually the audit tables will grow larger than the actual source tables, since they keep a history of changes.
My plan is to flatten the destination tables into fewer tables based on subject/module, enable columnstore indexes, and then use those for reporting.
Is there any suggestion on the above approach, or is there an alternative?
I would recommend that you keep the structure in a single table and have a look at Partitioned Tables and Indexes:
SQL Server supports table and index partitioning. The data of partitioned tables and indexes is divided into units that can be spread across more than one filegroup in a database. The data is partitioned horizontally, so that groups of rows are mapped into individual partitions. All partitions of a single index or table must reside in the same database. The table or index is treated as a single logical entity when queries or updates are performed on the data.
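A minimal sketch of what a monthly-partitioned audit table could look like on SQL Server 2012 (all names, types and boundary values here are assumptions):
CREATE PARTITION FUNCTION pf_AuditMonth (datetime2)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

CREATE PARTITION SCHEME ps_AuditMonth
AS PARTITION pf_AuditMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.AuditTrail (
    AuditId   bigint IDENTITY(1,1) NOT NULL,
    ChangedAt datetime2            NOT NULL,
    TableName sysname              NOT NULL,
    Operation char(1)              NOT NULL
) ON ps_AuditMonth (ChangedAt);

-- Note: on SQL Server 2012 a nonclustered columnstore index makes the table read-only,
-- so it has to be dropped and rebuilt around loads (or built on switched-in partitions):
-- CREATE NONCLUSTERED COLUMNSTORE INDEX ix_AuditTrail_cs
--     ON dbo.AuditTrail (ChangedAt, TableName, Operation);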

Very large "delete from" statement with "where" clause, should I optimize?

I have a classic "sales" database that contains millions of rows in certain tables. On each of these large tables I have an associated "delete" trigger and a "backup" table.
The backup table keeps "deleted" rows for the last 7 days: the trigger starts by copying the deleted rows into the backup table, then performs a delete on the backup, in this fashion:
CREATE TRIGGER dbo.TRIGGER
ON dbo.EXAMPLE_DATA
FOR DELETE AS
    INSERT INTO EXAMPLE_BACKUP
    SELECT getDate(), *
    FROM deleted

    DELETE FROM EXAMPLE_BACKUP
    WHERE modified < dateadd(dd, -7, getDate())
The structure of the backup table is similar to the original data table (keys, values). The only difference is that I add a "modified" field to the backup tables, which I make part of the key.
A colleague of mine told me I should use "a loop", because my delete statement will cause timeouts/issues once the backup table contains several million rows. Will that delete actually blow up at some point? Should I do it in a different manner?
It looks like Sybase 12.5 supports table partitioning; if your design is such that the data can be retained for exactly 7 days (using a hard breakpoint), you could partition your table on the day of the year, and construct a view to represent the current data. As the clock ticks past a certain day, you could truncate the older partitions explicitly.
Just a thought.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/X15880.htm
Otherwise, deleting in a loop is a reasonable method for deleting large subsets of data without blowing up your transaction log. Here's one method using SQL Server:
http://sqlserverperformance.wordpress.com/2011/08/13/gradually-deleting-data-in-sql-server/
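Roughly along these lines (a sketch only, using the table and column names from the question; the batch size is arbitrary):
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM EXAMPLE_BACKUP
    WHERE modified < DATEADD(dd, -7, GETDATE());
    SET @rows = @@ROWCOUNT;
END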