Need to alter column types in production database (SQL Server 2005)

I need help writing a TSQL script to modify two columns' data types.
We are changing two columns:
uniqueidentifier -> varchar(36) (this column has a primary key constraint)
xml -> nvarchar(4000)
My main concern is production deployment of the script...
The table is actively used by a public website that gets thousands of hits per hour. Consequently, we need the script to run quickly, without affecting service on the front end. Also, we need to be able to automatically roll back the transaction if an error occurs.
Fortunately, the table only contains about 25 rows, so I am guessing the update will be quick.
This database is SQL Server 2005.
(FYI - the type changes are required because of a 3rd-party tool which is not compatible with SQL Server's xml and uniqueidentifier types. We've already tested the change in dev and there are no functional issues with the change.)

As David said, executing a script in a production database without taking a backup or stopping the site is not the best idea. That said, if you only want to change one table with a small number of rows, you can prepare a script that does the following:
Begin a transaction.
Create a new table with the final structure you want.
Copy the data from the original table to the new table.
Rename the old table to a backup name.
Rename the new table to the original table's name.
End (commit) the transaction.
This leaves you with a table that has the original name but the new structure you want, and you also keep the original table under a backup name. If you want to roll back the change, you can write a script that simply drops the new table and renames the original one back.
If the table has foreign keys, the script will be a little more complicated, but it is still possible without much work.
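A minimal sketch of those steps, assuming SQLite (through Python's built-in sqlite3 module) as a runnable stand-in for SQL Server: ALTER TABLE ... RENAME TO plays the role of sp_rename, VARCHAR(36) and NVARCHAR(4000) are only declared type names to SQLite, and the table and column names (Widget, Widget_new, Widget_backup, id, doc) are hypothetical. It covers both the forward swap and the rollback script that drops the new table and renames the original back.

```python
import sqlite3


def _run_in_transaction(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Run all statements atomically; roll back and re-raise on any error."""
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def swap_in_new_structure(conn: sqlite3.Connection) -> None:
    # Hypothetical new structure: varchar(36) / nvarchar(4000) columns.
    _run_in_transaction(conn, [
        "CREATE TABLE Widget_new (id VARCHAR(36) PRIMARY KEY, doc NVARCHAR(4000))",
        "INSERT INTO Widget_new (id, doc) SELECT id, doc FROM Widget",
        "ALTER TABLE Widget RENAME TO Widget_backup",  # keep the original
        "ALTER TABLE Widget_new RENAME TO Widget",
    ])


def undo_swap(conn: sqlite3.Connection) -> None:
    # The rollback script: drop the new table, restore the original name.
    _run_in_transaction(conn, [
        "DROP TABLE Widget",
        "ALTER TABLE Widget_backup RENAME TO Widget",
    ])


if __name__ == "__main__":
    # isolation_level=None lets the code issue BEGIN/COMMIT itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Widget (id TEXT PRIMARY KEY, doc TEXT)")
    conn.execute("INSERT INTO Widget VALUES ('a1', '<doc/>')")
    swap_in_new_structure(conn)
    print([row[2] for row in conn.execute("PRAGMA table_info(Widget)")])
```

Because every step runs inside one transaction, a failure at any point (for example, a name clash on the backup table) leaves the original table exactly as it was.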

"Consequently, we need the script to run quickly, without affecting service on the front end."
This is just an opinion, but it's based on experience: that's a bad idea. It's better to have a short (pre-announced if possible) scheduled downtime than to take the risk.
The only exception is if you really don't care if the data in these tables gets corrupted, and you can be down for an extended period.
In this situation, based on the types of changes you're making and the testing you've already performed, the risk sounds minimal: you've tested the changes and you SHOULD be able to do it safely, but nothing is guaranteed.
First, you need to have a fall-back plan in case something goes wrong. The short version of a MINIMAL reasonable plan would include:
Shut down the website
Make a backup of the database
Run your script
Test the DB for integrity
Bring the website back online
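Steps 2 to 4 of that plan can be sketched as follows, again assuming SQLite through Python's sqlite3 module as a stand-in. On SQL Server the equivalents would be BACKUP DATABASE and DBCC CHECKDB rather than Connection.backup and PRAGMA integrity_check. The migration statements passed in are hypothetical, and shutting the site down and bringing it back are left outside the code.

```python
import sqlite3


def run_with_backup(live: sqlite3.Connection, statements: list[str],
                    backup_target: sqlite3.Connection) -> bool:
    """Back up, apply the script atomically, then check integrity.

    Returns True only if the script committed and the integrity check passed.
    `live` is expected to be opened with isolation_level=None.
    """
    # Make a backup: copy the whole database before changing anything.
    live.backup(backup_target)

    # Run your script: one transaction, so a failure leaves no partial change.
    live.execute("BEGIN")
    try:
        for stmt in statements:
            live.execute(stmt)
        live.execute("COMMIT")
    except sqlite3.Error:
        if live.in_transaction:
            live.execute("ROLLBACK")
        return False

    # Test for integrity: SQLite returns a single row 'ok' when clean.
    return live.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


if __name__ == "__main__":
    live = sqlite3.connect(":memory:", isolation_level=None)
    live.execute("CREATE TABLE Item (id INTEGER PRIMARY KEY, note TEXT)")
    backup = sqlite3.connect(":memory:")  # in practice, a file on disk
    ok = run_with_backup(live, ["ALTER TABLE Item ADD COLUMN owner TEXT"], backup)
    print("bring the site back" if ok else "restore from the backup")
```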
It would be very unwise to attempt to make such an update while the website is live. You run the risk of being down for an extended period if something goes wrong.
A GOOD plan would also have you testing this against a copy of the database and a copy of the website (a test/staging environment) first and then taking the steps outlined above for the live server update. You have already done this. Kudos to you!
There are even better methods for making such an update, but the trade-off of down time for safety is a no-brainer in most cases.

And if you absolutely need to do this in live then you might consider this:
1) Build an offline version of the table with the new datatypes and copied data.
2) Build all the required keys and indexes on the offline tables.
3) Swap the tables in a transaction. You could rename the old table to something else as an emergency backup.
The renames are done with sp_rename (sp_help 'sp_rename' shows its details).
But TEST all of this FIRST in a prod-like environment. And make sure your backups are up to date. AND do this when you are least busy.
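A sketch of those three steps, assuming SQLite via Python's sqlite3 module as a stand-in for SQL Server; the Orders, Orders_offline, Orders_old and ix_orders_customer names are hypothetical. The slow work (copying and indexing) happens on the offline table while the live one keeps serving, and only the two renames run inside the transaction, so a failure there rolls both back automatically.

```python
import sqlite3


def build_offline(conn: sqlite3.Connection) -> None:
    """Steps 1 and 2: build and index the offline copy while Orders stays live."""
    conn.execute("DROP TABLE IF EXISTS Orders_offline")
    conn.execute("CREATE TABLE Orders_offline "
                 "(id INTEGER PRIMARY KEY, customer VARCHAR(36), note NVARCHAR(4000))")
    conn.execute("INSERT INTO Orders_offline (id, customer, note) "
                 "SELECT id, customer, note FROM Orders")
    conn.execute("CREATE INDEX ix_orders_customer ON Orders_offline (customer)")


def swap_tables(conn: sqlite3.Connection) -> None:
    """Step 3: the only step that touches the live name, done atomically."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE Orders RENAME TO Orders_old")  # emergency backup
        conn.execute("ALTER TABLE Orders_offline RENAME TO Orders")
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Any failure (e.g. Orders_old already exists) undoes both renames.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

The index travels with the table when it is renamed, so the new Orders is fully indexed the moment it goes live.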


Best way to do a long running schema change (or data update) in MS Sql Server?

I need to alter the size of a column on a large table (millions of rows). It will be set to an nvarchar(n) rather than nvarchar(max), so from what I understand, it will not be a long change. But since I will be doing this on production I wanted to understand the ramifications in case it does take long.
Should I just hit F5 from SSMS like I execute normal queries? What happens if my machine crashes? Or goes to sleep? What's the general best practice for doing long running updates? Should it be scheduled as a job on the server maybe?
Please DO NOT just hit F5. I did this once and lost all the data in the table. Depending on the change, the update script that is generated for you actually stores the data in memory, drops the table, creates a new one with the change you want, and repopulates it from memory. However, in my case one of the changes I made was adding a unique constraint, so the repopulation failed, and when the statement finished, the data held in memory was discarded. This left me with a new, empty table.
I would create the table you are changing, with the change(s) you want, as a new table. Then copy the data into it with INSERT INTO ... SELECT * FROM the old table, and rename the tables in a single transaction. If data could be written to the table while this is running and that is an issue, you may want to lock the table.
Depending on the size of the table and the duration of the statement, you may want to defer the locking and renaming: after the initial population of the new table, do a differential population of any new data, and then rename the tables.
Sorry for the long post.
Also, if the connection times out due to duration, then run the insert statement locally on the DB server. You could also create a job and run that, however it is essentially the same thing.
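The initial-copy, differential-copy and rename sequence described above can be sketched like this, assuming SQLite via Python's sqlite3 module as a stand-in. Assumptions: the table (hypothetically Big, with a new Big_new) has an increasing integer id, and existing rows are not updated or deleted during the initial copy, since the differential step only picks up newly added rows. BEGIN IMMEDIATE, which takes SQLite's write lock, stands in for locking the table.

```python
import sqlite3


def initial_copy(conn: sqlite3.Connection) -> None:
    """Long-running part: copy existing rows while Big stays in use."""
    conn.execute("CREATE TABLE Big_new (id INTEGER PRIMARY KEY, name NVARCHAR(200))")
    conn.execute("INSERT INTO Big_new (id, name) SELECT id, name FROM Big")


def catch_up_and_swap(conn: sqlite3.Connection) -> None:
    """Short part: lock, copy rows added since the initial copy, rename."""
    # No other writer can add rows between the differential copy and the renames.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO Big_new (id, name) SELECT id, name FROM Big "
            "WHERE id > (SELECT COALESCE(MAX(id), 0) FROM Big_new)")
        conn.execute("ALTER TABLE Big RENAME TO Big_old")
        conn.execute("ALTER TABLE Big_new RENAME TO Big")
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Splitting the work this way keeps the locked window proportional to the rows that arrived during the copy, not to the size of the whole table.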

SQL Create or Replace Table in Oracle

We have an Oracle database and we have been running into problems with our build and install procedures: when we update the table schema (add or modify columns, triggers, etc.), the change doesn't always get deployed to all the instances.
Right now we handle schema updates by putting notes on the install steps for the build to run alter table commands, etc. But these always assume you are going from the last build (i.e. build 3 is installed and we are going to 4). If 1 is installed, there might be alter scripts going from 1 to 2, then 2 to 3, then 3 to 4. So this is a giant pain of a manual process that we often mess up, and we miss an alter.
Is there an easy way to do a "create or replace" on a table without dropping it and losing data? Essentially we want to compare the current table to what it should be and update it. We do not want to back up the table, drop it, create it, and then restore it.
"Essentially we want to compare the current table to what it should be and update it"
Assuming you have a good source version that you want to use to update the other instances, you can use Toad's schema compare (you need the DBA Admin module or Toad Xpert Edition) to generate the scripts needed to update a single table, a set of tables, or whatever list of objects you choose.
I would say that the scripts should still be checked/verified before running against the target instance. Some changes may be best handled in a different way (rename a column vs drop/create for example). So be careful.
One more note that others will probably bring up is that this problem shows definite holes in your company's change management process (which is a much bigger topic than this question).
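For the chained alter scripts described in the question (1 to 2, 2 to 3, 3 to 4), the manual bookkeeping can be taken over by recording the installed schema version in the database itself and applying only the missing scripts, in order. A minimal sketch, assuming SQLite via Python's sqlite3 module as a stand-in: SQLite's PRAGMA user_version holds the version here, whereas on Oracle you would typically keep it in a small table of your own. The MIGRATIONS list is hypothetical.

```python
import sqlite3

# Hypothetical scripts: MIGRATIONS[n] upgrades schema version n to n + 1.
MIGRATIONS: list[list[str]] = [
    ["CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT)"],
    ["ALTER TABLE Customer ADD COLUMN email TEXT"],
    ["CREATE INDEX ix_customer_email ON Customer (email)"],
]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply only the scripts this instance is missing; return the new version.

    `conn` is expected to be opened with isolation_level=None.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute("BEGIN")
        try:
            for stmt in MIGRATIONS[version]:
                conn.execute(stmt)
            # Record the new version in the same transaction as the change.
            conn.execute(f"PRAGMA user_version = {version + 1}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

An instance at build 1 runs scripts 2 and 3; an instance already at the latest version runs nothing, so the same upgrade step can be part of every install.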

Why does a temp table work but not a permanent table?

I've written a SQL query for a report that creates a permanent table and then performs a bunch of inserts and updates to get all the data, according to company policy. It runs fine in SQL Server Management Studio and in Crystal Reports 2008 on my machine. However, when I schedule it to run on the server with SAP BusinessObjects Central Management Console, it fails with the error "Associated statement not prepared."
I have found that changing this permanent table to be a temp table makes the query work. Why would this be?
Some research shows that this error is sometimes sent instead of the true error. Other people reporting it talk of foreign key and (I would also assume) duplicate key errors.
Things I would check:
Does your permanent table have any unique constraints that might be violated? Or any foreign key constraints?
Are you creating indexes on the table after it has been created?
Are you creating any views over this permanent table?
What happens if the table already exists before the job is run?
What happens to the table if the job fails?
Are there any intermediate steps (such as within a stored procedure) that might involve additional temp or permanent tables?
ETA: Also check what schema the permanent table belongs to: is it usually created with "dbo"? Are you specifying that explicitly? Is there any chance that there might be a permissions problem?
That is often a generic error. Are you able to run it on the server as the account that it is scheduled to run as? It is most likely a permission error or constraint issue.
Assuming you really need a regular table, why isn't it possible to create the permanent table once, instead of creating it every time you run the query?
Recreating a regular user table each time the query runs does not seem right. But to make it work, you could try recreating the table in a separate batch or query (e.g. put GO in the script, which splits it into separate batches).
Regarding why it happens, I'm thinking about statement caching. The server compiles the query and caches the result for some time in case the same query has to run again. So my speculation is that it tries to run the compiled query, which refers to the table you have already dropped and recreated under the same name. The name is the same, but physically it's a new table. You could hit a bug in the server this way. This is just speculation; it could be a different kind of problem.
Without seeing code it's a guess, but since you are creating a permanent table every time you run the report, I assume you must be dropping the table at some point? (Or you'd have a LOT of tables building up over time.)
I suggest a couple angles to consider:
1) Make certain to prefix tables (perhaps with a session ID or something) if you are concerned about concurrency/locking issues and the like, so each report run has a table exclusive to itself.
2) If you are dropping the table at the end, instead adjust your logic to leave the table be. Write code that drops when you (re)start the operation. It's possible the report is clinging to the table and you are destroying it prematurely.
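Both angles can be sketched together, assuming SQLite via Python's sqlite3 module as a stand-in; the sales source table, the report_work_ prefix and the run_id are hypothetical. Each run gets its own table name, and the drop happens at the start of the run rather than the end, so the reporting tool can still read the table afterwards. Table names cannot be bound as query parameters, so the run id is checked to be an integer before it is pasted into the SQL.

```python
import sqlite3


def run_report(conn: sqlite3.Connection, run_id: int) -> str:
    """Rebuild this run's work table and leave it in place for the report."""
    # Only accept a non-negative integer, since it becomes part of a table name.
    if not isinstance(run_id, int) or run_id < 0:
        raise ValueError("run_id must be a non-negative integer")
    table = f"report_work_{run_id}"
    # Drop leftovers from a previous (possibly failed) run at the START.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (item TEXT, total INTEGER)")
    conn.execute(f"INSERT INTO {table} (item, total) "
                 "SELECT item, SUM(qty) FROM sales GROUP BY item")
    return table
```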

Keep table downtime to a minimum by renaming old table, then filling a new version?

I have a handful of permanent tables that need to be rebuilt on a nightly basis.
In order to keep these tables "live" for as long as possible, and also to offer the possibility of having a backup of just the previous day's data, another developer vaguely suggested taking a route similar to this when the nightly build happens:
create a permanent table (a build version; e.g., tbl_build_Client)
re-name the live table (tbl_Client gets re-named to tbl_Client_old)
rename the build version to become the live version (tbl_build_Client gets re-named to tbl_Client)
To rename the tables, sp_rename would be in use.
Do you see any more efficient ways to go about this, or any serious pitfalls in the approach? Thanks in advance.
Trying to flesh out gbn's answer and recommendation to use synonyms,
would this be a rational approach, or am I getting some part horribly wrong?
Three real tables for "Client":
1. dbo.build_Client
2. dbo.hold_Client
3. dbo.prev_Client
Because "Client" is how other procs reference the "Client" data, the default synonym is
CREATE SYNONYM Client FOR dbo.hold_Client
Then take these steps to refresh the data yet keep uninterrupted access.
(1.a.) TRUNCATE TABLE dbo.prev_Client (it had yesterday's data)
(1.b.) INSERT INTO dbo.prev_Client the records from dbo.build_Client, as dbo.build_Client still has yesterday's data
(2.a.) TRUNCATE TABLE dbo.build_Client
(2.b.) INSERT INTO dbo.build_Client the new data from the data build process
(2.c.) change the synonym (drop and recreate it):
DROP SYNONYM Client; CREATE SYNONYM Client FOR dbo.build_Client
(3.a.) TRUNCATE TABLE dbo.hold_Client
(3.b.) INSERT INTO dbo.hold_Client the records from dbo.build_Client
(3.c.) change the synonym back:
DROP SYNONYM Client; CREATE SYNONYM Client FOR dbo.hold_Client
Use indirection to avoid manipulating tables directly:
Have 3 tables: Client1, Client2, Client3 with all indexes, constraints and triggers etc
Use synonyms to hide the real table eg Client, ClientOld, ClientToLoad
To generate the new table, you truncate/write to "ClientToLoad"
Then you DROP and CREATE the synonyms in a transaction so that
Client -> what was ClientToLoad
ClientOld -> what was Client
ClientToLoad -> what was ClientOld
You can use SELECT base_object_name FROM sys.synonyms WHERE name = 'Client' to work out what the current indirection is
This works on all editions of SQL Server; the other way is "partition switching", which requires Enterprise Edition.
Some things to keep in mind:
Replication - if you use replication, I don't believe you'll be able to easily implement this strategy
Indexes - make sure that any indexes you have on the tables are carried over to your new/old tables as needed
Logging - I don't remember whether or not sp_rename is fully logged, so you may want to test that in case you need to be able to roll back, etc.
Those are the possible drawbacks I can think of off the top of my head. It otherwise seems to be an effective way to handle the situation.
Except for the missing step 0 (drop tbl_Client_old if it exists), the solution seems fine, especially if you run it in an explicit transaction. There is no backup of any older data, however.
The other solution, without renames and drops, and which I personally would prefer is to:
Copy all rows from tbl_Client to tbl_Client_old;
Truncate tbl_Client.
(Optional) Remove obsolete records from tbl_Client_old.
It's better in that you can control how much old data you keep in tbl_Client_old. Which solution is faster depends on how much data is stored in the tables and what indexes they have.
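A sketch of this copy-then-truncate variant, including the optional pruning step, assuming SQLite via Python's sqlite3 module as a stand-in. The id, name and load_date columns are hypothetical: pruning needs some date column to decide what counts as obsolete, and here anything in tbl_Client_old loaded before a cutoff date is removed.

```python
import sqlite3


def archive_and_clear(conn: sqlite3.Connection, keep_from: str) -> None:
    """Copy tbl_Client into tbl_Client_old, empty tbl_Client, prune old rows.

    keep_from is an ISO date ('YYYY-MM-DD'); archived rows loaded before it
    are removed. `conn` is expected to use isolation_level=None.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO tbl_Client_old (id, name, load_date) "
                     "SELECT id, name, load_date FROM tbl_Client")
        # SQLite has no TRUNCATE; an unqualified DELETE empties the table.
        conn.execute("DELETE FROM tbl_Client")
        # Optional step: keep only as much history as you want.
        conn.execute("DELETE FROM tbl_Client_old WHERE load_date < ?", (keep_from,))
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

The keep_from cutoff is where you control how much old data tbl_Client_old retains.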
If you use SQL Server 2008, why not try horizontal partitioning? All the data lives in one table, but new and old data are stored in separate partitions.

What is the best way to maintain a LastUpdatedDate column in SQL?

Suppose I have a database table that has a datetime column holding the last time the row was updated or inserted. Which would be preferable:
Have a trigger update the field.
Have the program that's doing the insertion/update set the field.
The first option seems to be the easiest since I don't even have to recompile to do it, but that's not really a huge deal. Other than that, I'm having trouble thinking of any reasons to do one over the other. Any suggestions?
The first option can be more robust because the database will be maintaining the field. This comes with the possible overhead of using triggers.
If you could have other apps writing to this table in the future, via their own interfaces, I'd go with a trigger so you're not repeating that logic anywhere else.
If your app is pretty much it, or any other apps would access the database through the same datalayer, then I'd avoid that nightmare that triggers can induce and put the logic directly in your datalayer (SQL, ORM, stored procs, etc.).
Of course you'd have to make sure your time-source (your app, your users' pcs, your SQL server) is accurate in either case.
Regarding why I don't like triggers:
Perhaps I was rash by calling them a nightmare. Like everything else, they are appropriate in moderation. If you use them for very simple things like this, I could get on board.
It's when the trigger code gets complex (and expensive) that triggers start to cause lots of problems. They are a hidden tax on every insert/update/delete query you execute (depending on the type of trigger). If that tax is acceptable then they can be the right tool for the job.
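For a simple case like this one, the trigger option can stay small. A sketch, assuming SQLite via Python's sqlite3 module as a stand-in and a hypothetical Item table. SQLite triggers fire FOR EACH ROW, and because SQLite's recursive_triggers setting is off by default, the UPDATE inside the trigger does not fire it again. A SQL Server trigger fires once per statement instead and would join to the inserted pseudo-table to stamp every affected row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Item (
    id INTEGER PRIMARY KEY,
    name TEXT,
    last_updated TEXT DEFAULT CURRENT_TIMESTAMP   -- covers inserts
);
-- Covers updates: stamp the row after any change to it.
CREATE TRIGGER trg_item_last_updated AFTER UPDATE ON Item
FOR EACH ROW
BEGIN
    UPDATE Item SET last_updated = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO Item (id, name) VALUES (1, 'widget')")
    # The application never mentions last_updated; the trigger keeps it current.
    conn.execute("UPDATE Item SET name = 'gadget' WHERE id = 1")
    print(conn.execute("SELECT name, last_updated FROM Item").fetchone())
```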
You didn't mention option 3: use a stored procedure to update the table. The procedure can set timestamps as desired.
Perhaps that's not feasible for you, but I didn't see it mentioned.
As long as I'm using a DBMS whose triggers I trust, I'd always go with the trigger option. It lets the DBMS take care of as many things as possible, which is usually a good thing.
It would make sure, under all circumstances, that the timestamp column has the correct value. The overhead would be negligible.
The only thing that would be against triggers is portability. If that's not an issue, I don't think there is a question which direction to go.
I would say trigger, just in case someone uses something besides your app to update the table. You probably also want a LastUpdatedBy column populated with SUSER_SNAME(); this way you can see who did the update.
I'm a proponent of stored procedures for everything. Your update proc could contain a GETDATE() for the column.
And I don't like triggers for this kind of update. Lack of visibility of triggers tends to cause confusion.
This sounds like business logic to me ... I would be more disposed to putting this in the code. Let the database manage the storage of data ... No more and no less.
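Both the stored-procedure answer and the put-it-in-the-code answer come down to routing every write through one piece of code that sets the column explicitly. A sketch, assuming SQLite via Python's sqlite3 module as a stand-in for a stored procedure; the Item table and the rename_item function are hypothetical. The timestamp comes from CURRENT_TIMESTAMP, i.e. the database server's clock, not the client machine's.

```python
import sqlite3


def rename_item(conn: sqlite3.Connection, item_id: int, new_name: str) -> int:
    """The single write path for Item names; stamps last_updated itself.

    CURRENT_TIMESTAMP is evaluated by the database, so a wrong clock on the
    client machine cannot skew the value. Returns the number of rows changed.
    """
    cur = conn.execute(
        "UPDATE Item SET name = ?, last_updated = CURRENT_TIMESTAMP WHERE id = ?",
        (new_name, item_id),
    )
    return cur.rowcount
```

The trade-off raised above still applies: this only works if nothing writes to Item except through this function.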
Triggers are a blessing and a curse.
Blessing: You can use them to enable all kinds of custom constraint checking and data management without backend systems knowledge or changes.
Curse: You don't know what's happening behind your back: concurrency issues and deadlocks from additional objects brought into transactions that were not originally expected; phantom behavior, including session environment changes and unreliable row counts; and excessive triggering of conditions, with additional hotspot and performance penalties.
The answer to this question (update dates implicitly with a trigger, or explicitly in code) usually depends heavily on context. For example, if you are using the last change date as an informational field, you might want to change it only when a 'user' actually makes salient changes to a row, not when an automated process simply updates some internal marker users don't care about.
If you are using the trigger for change synchronization, or you have no control over the code that is executing, a trigger makes a lot more sense.
My advice on trigger use is to be careful. Most systems allow you to filter execution based on the operation and the fields changed. Proper use of 'before' vs. 'after' triggers can have a significant performance impact.
Finally, some systems execute a single trigger for multiple changes (multiple rows affected by a single statement), so your code should be prepared to apply itself as a bulk update to multiple rows.
Normally I'd say do it database side, but it depends on your application. If you're using LINQ-to-SQL you can just set the field as Timestamp and have your DAL use the Timestamp field for concurrency. It handles it for you automatically, so having to repeat code is a non event.
If you're writing your DAL yourself though, then I'd be more likely to handle this on the database side as it makes writing user interfaces far more flexible - although, I'd likely do this in a stored procedure that has "public" access and the tables locked down - you don't want just any clown coming along and bypassing your stored procedure by writing to the tables directly... unless you plan on making your DAL a standalone component that any future application must use to access the database, in which case, you could code it directly into the DAL - of course, you should only do this if you can guarantee that everyone accessing the database is doing so through your DAL component.
If you're going to allow "public" access to the database to insert into tables, then you'll have to go with the trigger because otherwise anyone can insert/update a single field in the table and the updated field could never get updated.
I would have the date maintained at the database, i.e., a trigger, stored procedure, etc. In most of your database-driven applications the user app is not going to be the only means by which the business users get at data. There are reporting tools, extracts, user SQL, etc. There's also updates and corrections that are done by the DBA that the application won't be providing the date for as well.
But honestly the #1 reason I wouldn't do it from the application is you have no control over the date/time on the client machine. They might be rolling it back to get more days out of a trial license on something or may just want to do bad things to your program.
You can do this without the trigger if your database supports default values on the fields. For example, in SQL Server 2005 I have a table with a field created like this:
create table dbo.Repository
   (
   -- ... other columns ...
   last_updated datetime default getdate()
   );
then the insert code just leaves that field out of the insert field list.
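A quick check of what a column default does and does not cover, assuming SQLite via Python's sqlite3 module as a stand-in (CURRENT_TIMESTAMP in place of getdate(); the config_id and filename columns are borrowed from the Repository trigger for illustration). The default fills the column when it is left out of an insert, but a later update leaves it untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Repository ("
    " config_id INTEGER PRIMARY KEY,"
    " filename TEXT,"
    " last_updated TEXT DEFAULT CURRENT_TIMESTAMP)"  # like 'default getdate()'
)

# Leave last_updated out of the column list so the default fills it in.
conn.execute("INSERT INTO Repository (config_id, filename) VALUES (1, 'app.cfg')")
inserted = conn.execute("SELECT last_updated FROM Repository").fetchone()[0]

# A later UPDATE does not re-apply the default: the value stays the same.
conn.execute("UPDATE Repository SET filename = 'app2.cfg' WHERE config_id = 1")
after_update = conn.execute("SELECT last_updated FROM Repository").fetchone()[0]
```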
I forgot that the default only applies on insert. I do have an update trigger as well, which updates the date fields and puts a copy of the updated record into my history table:
create trigger dbo.Repository_Upd on dbo.Repository instead of update
as
-- Trigger: Repository_Upd
-- Author: Ron Savage
-- Date: 09/28/2008
-- Description:
-- This trigger sets the last_updated and updated_by fields before the update
-- and puts a copy of the updated row into the Repository_History table.
-- Modification History:
-- Date Init Comment
-- 10/22/2008 RS Blocked .prm files from updating the history as they
-- get updated every time the cfg file is run.
-- 10/21/2008 RS Updated the insert into the history table to use the
-- d.last_updated field from the Repository table rather
-- than getdate() to avoid micro second differences.
-- 09/28/2008 RS Created.
begin
   set nocount on;

   -- Update the record but fill in the updated_by, updated_system and
   -- last_updated date with current information.
   update cr set
      cr.filename = i.filename,
      cr.created_by = i.created_by,
      cr.created_system = i.created_system,
      cr.create_date = i.create_date,
      cr.updated_by = user,
      cr.updated_system = host_name(),
      cr.last_updated = getdate(),
      cr.content = i.content
   from
      Repository cr
      JOIN Inserted i
         on (i.config_id = cr.config_id);

   -- Put a copy in the history table, skipping .prm files. The extension is
   -- checked per row in the WHERE clause (rather than in one variable read
   -- from Inserted) so that multi-row updates are handled correctly.
   -- NOTE: this select list is a reconstruction; it assumes Repository_History
   -- has these columns in this order. Adjust it to the real table definition.
   insert into Repository_History
   select
      i.config_id,
      i.filename,
      i.created_by,
      i.created_system,
      i.create_date,
      user as updated_by,
      host_name() as updated_system,
      d.last_updated,
      i.content
   from
      Inserted i
      JOIN Repository d
         on (d.config_id = i.config_id)
   where
      lower(right(i.filename, 3)) <> 'prm';
end