Does Adding a Column Lock a Table in SQL Server 2008?

I want to run the following on a table of about 12 million records.
ALTER TABLE t1
ADD c1 int NULL;
ALTER TABLE t1
ADD c2 bit NOT NULL
DEFAULT(0);
I've done it in staging and the timing seemed fine, but before I do it in production, I wanted to know how locking works on the table during new column creation (especially when a default value is specified). So, does anyone know? Does the whole table get locked, or do the rows get locked one by one during default value insertion? Or does something different altogether happen?

Prior to SQL Server 11 (Denali), adding a non-null column with a default runs an update behind the scenes to populate the new default values, so it will lock the table for the duration of the 12 million row update. In SQL Server 11 this is no longer the case: the column is added online and no update occurs; see Online non-NULL with values column add in SQL Server 11.
Both in SQL Server 11 and prior, a Sch-M lock is acquired on the table to modify the definition (add the new column metadata). This lock is incompatible with any other access (including dirty reads). The difference is in the duration: prior to SQL Server 11 the lock is held for a size-of-data operation (the update of 12 million rows); in SQL Server 11 it is held only very briefly. During the pre-SQL Server 11 update of the rows, no row locks need to be acquired, because the Sch-M lock on the table guarantees there cannot be any conflict on any individual row.
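On a pre-SQL Server 11 instance, if locking the table for the whole 12 million row update is not acceptable, a common workaround is to add the column as nullable first and backfill it in batches. A rough sketch, reusing t1 and c2 from the question (the constraint name DF_t1_c2 and the batch size are my own illustration):

-- 1. Add the column as nullable with a default: a metadata-only change,
--    so the Sch-M lock is held only for a moment. Existing rows stay NULL,
--    new rows pick up the default.
ALTER TABLE t1
ADD c2 bit NULL CONSTRAINT DF_t1_c2 DEFAULT (0);
GO

-- 2. Backfill the existing rows in small batches so no single update
--    holds locks on all 12 million rows at once.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) t1
    SET    c2 = 0
    WHERE  c2 IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END
GO

-- 3. Once every row is populated, enforce NOT NULL. This takes a Sch-M
--    lock and scans the table to validate, but it does not rewrite rows.
ALTER TABLE t1 ALTER COLUMN c2 bit NOT NULL;
GO

Each batch commits on its own, so readers and writers are only ever blocked briefly.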

Yes, it will lock the table.
A table, as a whole, has a single schema (set of columns, with associated types). So, at a minimum, a schema lock would be required to update the definition of the table.
Try to think about how things would work the other way around: if each row were updated individually, how would any parallel queries work (especially ones involving the new columns)?
And default values only come into play during INSERT and DDL statements, so if you specify a new default for 10,000,000 rows, that default value has to be applied to all of those rows.

Yes, it will lock.
DDL statements take a schema modification lock, which prevents any access to the table until the operation completes.
There's not really a way around this, and it makes sense if you think about it: SQL Server needs to know how many columns a table has, and halfway through this operation some rows would have more columns than others.
The alternative is to create a new table with the correct columns, insert the existing data into it, and then rename the tables to swap them out.
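A minimal sketch of that swap, reusing t1, c1 and c2 from the question (t1_new and t1_old are made-up names; note that SELECT INTO does not copy indexes, constraints, or triggers, so those have to be recreated):

-- Build the replacement table with the new columns already populated.
SELECT t.*,
       CAST(NULL AS int) AS c1,
       CAST(0 AS bit)    AS c2
INTO   t1_new
FROM   t1 AS t;
GO

-- Swap the names; sp_rename is a metadata-only operation.
BEGIN TRANSACTION;
EXEC sp_rename 't1', 't1_old';
EXEC sp_rename 't1_new', 't1';
COMMIT TRANSACTION;
GO

-- Drop t1_old once the new table has been verified.

Rows written to t1 after the copy starts are not carried over, so either block writers while this runs or do a differential copy before the rename.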

I have not read exactly how the locking works when adding a column, but I am almost 100% sure row-by-row is impossible.
Be careful when you make these kinds of changes by drag and drop in SQL Server Management Studio (I know you are not doing that here, but this is a public forum), as some changes are destructive. Fortunately SQL Server 2008, at least R2, is safer here: it tells you "no can do" rather than just doing it.
You can, however, run both column additions in a single statement and reduce the churn; see the sketch below.
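For example, both columns from the question can be added in one statement:

-- Both columns are added under a single schema modification (Sch-M) lock.
ALTER TABLE t1
ADD c1 int NULL,
    c2 bit NOT NULL DEFAULT(0);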

Related

Truncation of large table in SQL Server database

I would like to completely clear one table in my SQL Server database.
Unfortunately, the table is large (> 90GB). I am going to use the TRUNCATE statement.
The question is whether there is anything I should pay attention to beforehand.
I am also wondering whether it will affect the server's disk space in any way (currently about 110 GB free).
After the whole operation, a DBCC SHRINKDATABASE will probably be necessary.
TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE with no WHERE clause, but if you need an even faster solution you can create a new version of the table (table1), drop the old table, and rename table1 to table.
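A rough sketch of both options, with big_table and MyDatabase standing in for the real names:

-- Option 1: truncate in place. Pages are deallocated with minimal logging;
-- the freed space stays inside the database files, so the server's free
-- disk space does not change until you shrink the files.
TRUNCATE TABLE big_table;

-- Option 2: recreate and swap. Script out the full definition first,
-- since indexes, constraints, and permissions have to be recreated.
-- CREATE TABLE big_table_new (...);
-- DROP TABLE big_table;
-- EXEC sp_rename 'big_table_new', 'big_table';

-- Shrink only if you genuinely need the space back on disk; shrinking
-- fragments indexes and the files will simply grow again on the next load.
-- DBCC SHRINKDATABASE (MyDatabase);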

Best way to do a long running schema change (or data update) in MS Sql Server?

I need to alter the size of a column on a large table (millions of rows). It will be set to nvarchar(n) rather than nvarchar(max), so from what I understand it should not be a long-running change. But since I will be doing this in production, I wanted to understand the ramifications in case it does take long.
Should I just hit F5 in SSMS the way I execute normal queries? What happens if my machine crashes, or goes to sleep? What's the general best practice for long-running updates? Should it be scheduled as a job on the server, maybe?
Thanks
Please DO NOT just hit F5. I did this once and lost all the data in the table. Depending on the change, the update statement that is generated for you actually stores the data in memory, drops the table, creates the new one with the change you want, and repopulates it from memory. In my case, however, one of the changes was adding a unique constraint, so the repopulation failed, and since the statement had already finished, the data held in memory was discarded. That left me with a new, empty table.
Instead, I would create the table you are changing, with the change(s) you want, as a new table, then SELECT * INTO the new table and rename the two tables in a single operation. If data may be written to the table while this is running and that is a problem, you may want to lock the table (see the sketch below).
Depending on the size of the table and the duration of the statement, you may want to save the locking and renaming for later: after the initial population of the new table, do a differential copy of any new data and then rename the tables.
Sorry for the long post.
Edit:
Also, if the connection times out because of the duration, run the insert statement locally on the DB server. You could also create a job and run that, but it is essentially the same thing.
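A rough sketch of the copy-and-swap described above, with made-up names (orders, orders_new) and an assumed key column id:

-- 1. Create the replacement table with the changed column definition.
--    (A made-up two-column table; in practice, script out the real one.)
CREATE TABLE orders_new
(
    id    int           NOT NULL PRIMARY KEY,
    notes nvarchar(500) NULL          -- resized from nvarchar(max)
);
GO

-- 2. Bulk-copy the existing rows while the old table stays online.
INSERT INTO orders_new (id, notes)
SELECT id, notes
FROM   orders;
GO

-- 3. In one short transaction: block writers, copy anything that arrived
--    since the bulk copy, then swap the names.
BEGIN TRANSACTION;

INSERT INTO orders_new (id, notes)
SELECT o.id, o.notes
FROM   orders AS o WITH (TABLOCKX, HOLDLOCK)   -- exclusive table lock held until COMMIT
WHERE  NOT EXISTS (SELECT 1 FROM orders_new AS n WHERE n.id = o.id);

EXEC sp_rename 'orders', 'orders_old';
EXEC sp_rename 'orders_new', 'orders';

COMMIT TRANSACTION;
GO

-- 4. Drop orders_old once everything has been verified.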

How is the ALTER TABLE command handled by SQL Server?

We are using SQL Server 2008. We have an existing database and needed to add a new column to one of the tables, which has only 2,700 rows, but one of its columns is of type VARCHAR(8000). When I try to add the new column (CHAR(1) NULL) using the ALTER TABLE command, it takes far too long: after 5 minutes the command was still running, so I stopped it.
Below is the command I was using to add the new column:
ALTER TABLE myTable Add ColumnName CHAR(1) NULL
Can someone help me understand how SQL Server handles the ALTER TABLE command? What exactly happens, and why does it take so much time to add a new column?
EDIT:
What is the effect of table size on the ALTER command?
Altering a table requires a schema modification lock, and many other operations require a schema lock too. After all, it wouldn't make sense to add a column halfway through a SELECT statement.
So a likely explanation is that another process had the table locked for those 5 minutes, and the ALTER had to wait until it could acquire the lock itself.
You can see blocked processes, and the process blocking them, in the Activity Monitor in SQL Server Management Studio.
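If you prefer a query over Activity Monitor, here is a quick sketch using the standard DMVs (run it in a separate session while the ALTER is waiting; it requires VIEW SERVER STATE permission):

-- Sessions that are currently blocked, and who is blocking them.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM   sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE  r.blocking_session_id <> 0;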
Well, one thing to bear in mind is that you were adding a new fixed-length column to the table. Because of the way rows are laid out in storage, all fixed-length columns are placed before all of the variable-length columns in each row, so every row would have had to be updated in storage to make this change.
If, in turn, this changed the number of rows that fit on each page, a great many new allocations may have been required.
That being said, for the number of rows involved I wouldn't have thought it should take 5 minutes, unless, as Andomar indicated, there was also some lock contention involved.

Add an identity column to an existing table that is constantly changing

I have an existing table with 15 million rows in it. I want to add an identity column and make it the primary key. The problem is that this table is always changing (inserts, updates, deletes). Is it possible to add the identity column while that is going on, or do I have to stop the background processes (a tedious task) that update this table?
Thanks
Vikram
Given that you have 15 million rows, it might take a non-trivial amount of time to execute the ALTER TABLE statement.
Since SQL Server doesn't provide table hints for ALTER TABLE, it's pretty safe to assume that SQL Server takes a table lock when it executes an ALTER TABLE statement.
During that time no other process will be allowed to select, insert, update, or delete, so you don't have to worry about a race condition with some other process.
If the operation takes long enough, your other processes will experience timeout errors. Depending on how those processes are written this is either a bad thing or a non-issue, but you'll need to figure that out. If it were me, I would turn them off.
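For reference, a minimal sketch of the change itself, with big_table standing in for the real table name:

-- Adding the identity column is a size-of-data operation: every one of the
-- 15 million rows has to be touched to receive its value, and the table is
-- locked for the duration.
ALTER TABLE big_table
ADD id int IDENTITY(1, 1) NOT NULL;
GO

-- Making it the primary key builds an index (clustered by default when the
-- table has no clustered index yet), which is another long scan and sort.
ALTER TABLE big_table
ADD CONSTRAINT PK_big_table PRIMARY KEY (id);
GO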

How to figure out which records have been deleted in an efficient way?

I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We need to transfer data incrementally (Change Data Capture?) into db2.
I have only read access to the tables, so I can't create any table or trigger in the Oracle db1.
The challenge I am facing is: how can I detect record deletions in Oracle?
The only solution I can think of is to use an additional standalone/embedded db (e.g. Derby, H2, etc.). This db contains two tables, old_data and new_data.
old_data contains the primary key field from the table of interest in Oracle.
Every time the ETL process runs, the new_data table is populated with the primary key field from the Oracle table. After that, I run the following SQL command to get the deleted rows:
SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data)
I think this will become a very expensive operation as the volume of data grows. Do you have any better idea of how to do this?
Thanks.
Which edition of Oracle? If you have Enterprise Edition, look into Oracle Streams.
You can grab the deletes out of the redo log rather than the database itself.
One approach you could take is to use Oracle's flashback capability (if you're on version 9i or later):
http://forums.oracle.com/forums/thread.jspa?messageID=2608773
This will allow you to select from a prior database state.
If there may not always be deleted records, you can be more efficient by:
Storing a row count with each query iteration.
Comparing that row count to the previous row count.
If they are different, you know you have a delete and you have to compare the current set with the historical data set from flashback (a sketch follows below). If not, don't bother, and you've saved a lot of cycles.
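A rough sketch of that flashback comparison, with made-up names (src_table, primary key id) and an arbitrary one-hour window; how far back you can look is limited by the database's UNDO retention:

-- Rows that existed an hour ago but are gone now, i.e. the deletes.
SELECT id
FROM   src_table AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)
MINUS
SELECT id
FROM   src_table;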
A quick note on your own solution, in case flashback isn't an option: I don't think your SELECT query is the big problem; it's all the inserts needed to populate those side tables that will really take time. Why not just run that query against the Sybase production server before doing your update?