Truncation of a large table in a SQL Server database

I would like to completely clear one table in my SQL Server database.
Unfortunately, the table is large (> 90 GB). I am going to use the TRUNCATE TABLE statement.
Is there anything I should pay attention to beforehand?
I am also wondering whether it will affect the server's disk space in any way (currently about 110 GB free).
After the operation, a SHRINK DATABASE will probably be necessary.

TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE with no WHERE clause. If you need an even faster solution, you can create a new version of the table (table1), drop the old table, and rename table1 back to the original name.
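For illustration, a minimal T-SQL sketch of that drop-and-swap approach (the names dbo.BigTable and dbo.BigTable_new are placeholders, and any indexes, constraints, and permissions would need to be recreated on the new table):

-- Create an empty copy with the same column definitions (placeholder names)
SELECT *
INTO   dbo.BigTable_new
FROM   dbo.BigTable
WHERE  1 = 0;     -- copies structure only, no rows

-- Swap the new table into place
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_new', 'BigTable';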

Related

DROP TABLE or DELETE TABLE? Which is best practice?

I am working on redesigning some databases in my SQL Server 2012 instance.
I have databases where I put my raw data (from vendors), and then I have client databases where I will (based on client name) create a view that only shows data for a specific client.
Because this data is volatile (Google AdWords & Google DFA), I typically just delete the last 6 days and insert 7 days every day from the vendor databases. Doing this gives me comfort in knowing that Google has had time to solidify its data.
The question I am trying to answer is:
1. Instead of using views, would it be better to use a 'SELECT INTO' statement and DROP the table every day in the client database?
I'm afraid that automating my process using the 'DROP TABLE' method will not scale well long-term. While testing it myself, it seems that performance is improved because it does not have to scan the entire table for the date range. I've also tested this with an index on the 'date' column, and performance still seemed better with the 'DROP TABLE' method.
I am looking for best practices here.
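For illustration, a rough sketch of the 'SELECT INTO'/DROP pattern I am asking about (the database, table, and column names here are placeholders):

-- Rebuild the client table from the vendor data each day (placeholder names)
IF OBJECT_ID('ClientDB.dbo.AdwordsData', 'U') IS NOT NULL
    DROP TABLE ClientDB.dbo.AdwordsData;

SELECT *
INTO   ClientDB.dbo.AdwordsData
FROM   VendorDB.dbo.AdwordsRaw
WHERE  ClientName = 'SomeClient';   -- only this client's rows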
NOTE: This is my first post. So I am not too familiar with how to format correctly. :)
Deleting rows from a table is a time-consuming process. All the deleted records get logged, and performance of the server suffers.
Instead, databases offer truncate table. This removes all the rows of the table without logging the rows, but keeps the structure intact. Also, triggers, indexes, constraints, stored procedures, and so on are not affected by the removal of rows.
In some databases, if you delete all rows from a table, the operation really is a truncate table. However, SQL Server is not one of those databases. In fact, the documentation lists TRUNCATE as a best practice for deleting all rows:
To delete all the rows in a table, use TRUNCATE TABLE. TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction log resources. TRUNCATE TABLE has restrictions, for example, the table cannot participate in replication. For more information, see TRUNCATE TABLE (Transact-SQL).
You can drop the table. But then you lose auxiliary metadata as well -- all the things listed above.
I would recommend that you truncate the table and reload the data using insert into or bulk insert.
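For illustration, a minimal sketch of that truncate-and-reload pattern (the table, source, and file path are placeholders):

-- Remove all rows but keep the table, its indexes, and its permissions
TRUNCATE TABLE ClientDB.dbo.AdwordsData;

-- Reload from the vendor source (placeholder names)
INSERT INTO ClientDB.dbo.AdwordsData (ClientName, ActivityDate, Clicks)
SELECT ClientName, ActivityDate, Clicks
FROM   VendorDB.dbo.AdwordsRaw
WHERE  ClientName = 'SomeClient';

-- Or, if the data arrives as a flat file:
-- BULK INSERT ClientDB.dbo.AdwordsData
-- FROM 'C:\feeds\adwords.csv'
-- WITH (FIELDTERMINATOR = ',', FIRSTROW = 2);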

Teradata Drop Column returns with "no more room"

I am trying to drop a varchar(100) column of a 150 GB table (4.6 billion records). All the data in this column is null. I have 30GB more space in the database.
When I attempt to drop the column, it says "no more room in database XY". Why does such an action need so much space?
The ALTER TABLE statement needs temporary storage for the altered version before overwriting the original table. I would guess the table that you are trying to alter occupies at least 1/3 of your total storage.
This could happen for a variety of reasons. It's possible that one of the AMPs in your database is full; this would cause that error even with a minor table alteration.
Try running the following SQL to check space:
select VProc, CurrentPerm, MaxPerm
from dbc.DiskSpace
where DatabaseName='XY';
Also, you should check which column the primary index is on in this very large table. If the table is badly skewed, you can also run into space issues when trying to alter it or when running a query against it.
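To see whether the table itself is skewed across AMPs, a query along these lines against dbc.TableSizeV (dbc.TableSize on older releases) can help; the database and table names are placeholders:

-- Per-AMP space for one table; a big gap between max and average suggests skew
SELECT TableName,
       MAX(CurrentPerm) AS MaxPermPerAmp,
       AVG(CurrentPerm) AS AvgPermPerAmp
FROM   dbc.TableSizeV
WHERE  DatabaseName = 'XY'
AND    TableName    = 'MyBigTable'
GROUP BY TableName;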
For additional suggestions, I found a decent article on the kinds of things you may want to investigate when the "no more room in database" error occurs - Teradata SQL Tutorial. Some of the suggestions include:
dropping any intermediary work or "sandbox" tables
implementing single-value or multi-value compression
dropping unwanted/unnecessary secondary indexes
removing data in dbc tables such as the access log or DBQL tables
removing and archiving old tables that are no longer used

Does Adding a Column Lock a Table in SQL Server 2008?

I want to run the following on a table of about 12 million records.
ALTER TABLE t1
ADD c1 int NULL;
ALTER TABLE t2
ADD c2 bit NOT NULL
DEFAULT(0);
I've done it in staging and the timing seemed fine, but before I do it in production, I wanted to know how locking works on the table during new column creation (especially when a default value is specified). So, does anyone know? Does the whole table get locked, or do the rows get locked one by one during default value insertion? Or does something different altogether happen?
Prior to SQL Server 11 (Denali), adding a non-null column with a default will run an update behind the scenes to populate the new default values. Thus it will lock the table for the duration of the 12-million-row update. In SQL Server 11 this is no longer the case; the column is added online and no update occurs. See Online non-NULL with values column add in SQL Server 11.
Both in SQL Server 11 and prior, a Sch-M lock is acquired on the table to modify the definition (add the new column metadata). This lock is incompatible with any other possible access (including dirty reads). The difference is in the duration: prior to SQL Server 11, this lock is held for a size-of-data operation (the update of 12M rows); in SQL Server 11 the lock is held only briefly. In the pre-SQL Server 11 update of the rows, no row locks need to be acquired, because the Sch-M lock on the table guarantees that there cannot be any conflict on any individual row.
Yes, it will lock the table.
A table, as a whole, has a single schema (set of columns, with associated types). So, at a minimum, a schema lock would be required to update the definition of the table.
Try to think about how things would work otherwise: if each row were updated individually, how would any parallel queries work (especially ones that involved the new columns)?
And default values are only useful during INSERT and DDL statements - so if you specify a new default for 10,000,000 rows, that default value has to be applied to all of those rows.
Yes, it will lock.
DDL statements issue a Schema Lock (see this link) which will prevent access to the table until the operation completes.
There's not really a way around this, and it makes sense if you think about it. SQL needs to know how many fields are in a table, and during this operation some rows will have more fields than others.
The alternative is to make a new table with the correct fields, insert into, then rename the tables to swap them out.
I have not read how the lock mechanism works when adding a column, but I am almost 100% sure row by row is impossible.
Be careful when you do these types of things in SQL Server Management Studio with drag and drop (I know you are not doing this here, but this is a public forum), as some changes are destructive (fortunately, SQL Server 2008, at least R2, is safer here, as it tells you "no can do" rather than just doing it).
You can run both column additions in a single statement, however, and reduce the churn.
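For example, a minimal sketch of combining the two additions into one statement (assuming both columns really do go on the same table; the default constraint name is made up):

-- Add both columns in one ALTER TABLE, so the Sch-M lock is taken only once
ALTER TABLE t1
ADD c1 int NULL,
    c2 bit NOT NULL CONSTRAINT DF_t1_c2 DEFAULT (0);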

How to figure out which record has been deleted in an efficient way?

I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We need to transfer data incrementally (Change Data Capture?) into db2.
I have only read access to tables, so I can't create any table or trigger in Oracle db1.
The challenge I am facing is, how to detect record deletion in Oracle?
The solution I can think of is to use an additional standalone/embedded db (e.g. Derby, H2, etc.). This db contains 2 tables, namely old_data and new_data.
old_data contains the primary key field from the table of interest in Oracle.
Every time the ETL process runs, the new_data table will be populated with the primary key field from the Oracle table. After that, I will run the following SQL command to get the deleted rows:
SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data)
I think this will be a very expensive operation when the volume of data become very large. Do you have any better idea of doing this?
Thanks.
Which edition of Oracle? If you have Enterprise Edition, look into Oracle Streams.
You can grab the deletes out of the REDO log rather than the database itself.
One approach you could take is using the Oracle flashback capability (if you're using version 9i or later):
http://forums.oracle.com/forums/thread.jspa?messageID=2608773
This will allow you to select from a prior database state.
If there may not always be deleted records, you could be more efficient by:
Storing a row count with each query iteration.
Comparing that row count to the previous row count.
If they are different, you know you have a delete and you have to compare the current set with the historical data set from flashback. If not, then don't bother and you've saved a lot of cycles.
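For illustration, a minimal sketch of finding deleted keys with a flashback query (the table name, key column, and one-day window are placeholders; this assumes the FLASHBACK privilege and enough undo retention):

-- Keys that existed a day ago but are gone now (placeholder names)
SELECT id
FROM   source_table AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY)
MINUS
SELECT id
FROM   source_table;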
A quick note on your solution if flashback isn't an option: I don't think your select query is a big deal - it's all those inserts to populate those side tables that will really take a lot of time. Why not just run that query against the Sybase production server before doing your update?

Bulk delete (truncate vs delete)

We have a table with 150+ million records. We need to clear/delete all rows. A delete operation would take forever because it writes to the transaction log, and we cannot change our recovery model for the whole DB. We have tested the truncate table option.
What we realized is that truncate deallocates pages from the table and, if I am not wrong, makes them available for reuse, but it doesn't shrink the db automatically. So, if we want to reduce the DB size, we would need to run the shrink command after truncating the table.
Is this normal procedure? Anything we need to be careful or aware about, or are there any better alternatives?
Truncate is what you're looking for. If you need to slim down the db afterwards, run a shrink.
This MSDN reference (if you're talking T-SQL) compares what happens behind the scenes when deleting rows versus truncating.
"Delete all rows"... wouldn't DROP TABLE (and re-creating an empty one with the same schema/indexes) be preferable? (I personally like "fresh starts" ;-) )
That said, TRUNCATE TABLE is quite OK too, and yes, DBCC SHRINKFILE may be required afterwards if you wish to recover the space.
Depending on the size of the full database, the shrink may take a while; I've found it to go faster if it is shrunk in smaller chunks, rather than trying to get it back all at once.
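For illustration, a minimal sketch of shrinking in chunks with DBCC SHRINKFILE (the logical file name and target sizes in MB are placeholders):

-- Shrink the data file in steps instead of one huge operation
DBCC SHRINKFILE (MyDatabase_Data, 90000);   -- target size in MB
DBCC SHRINKFILE (MyDatabase_Data, 70000);
DBCC SHRINKFILE (MyDatabase_Data, 50000);   -- repeat until the desired size is reached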
One thing to remember with TRUNCATE TABLE (as well as DROP TABLE) is that, going forward, this will not work if you ever have foreign keys referencing the table.
As pointed out, if you can't use truncate or drop, you can delete in batches:
SELECT 1;                             -- seeds @@ROWCOUNT so the loop runs at least once
WHILE @@ROWCOUNT <> 0
    DELETE TOP (100000) FROM MyTable; -- repeat until no rows are left, keeping each transaction small
You have a normal solution (truncate + shrink db) to remove all the records from a table.
As Irwin pointed out, the TRUNCATE command won't work while the table is referenced by a foreign key constraint. So first drop the constraints, truncate the table, and recreate the constraints.
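A minimal sketch of that sequence, assuming a single child table whose foreign key (the names here are made up) references the table being truncated:

-- 1. Drop the referencing foreign key (placeholder names)
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_MyTable;

-- 2. Now the truncate is allowed
TRUNCATE TABLE dbo.MyTable;

-- 3. Recreate the constraint
ALTER TABLE dbo.ChildTable
    ADD CONSTRAINT FK_ChildTable_MyTable
    FOREIGN KEY (MyTableId) REFERENCES dbo.MyTable (MyTableId);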
If you're concerned about performance and this is a regular routine for your system, you might want to look into moving this table to its own data file, then run the shrink only against the target data file!