SQL/Schema comparison and upgrade - sql-server-2005

I have a simple situation. A large organisation is using several different versions of some (desktop) application and each version has it's own database structure. There are about 200 offices and each office will have it's own version, which can be one of 7 different ones. The company wants to upgrade all applications to the latest versions, which will be version 8.
The problem is that they don't have a separate database for each version. Nor do they have a separate database for each office. They have one single database which is handled by a dedicated server, thus keeping things like management and backups easier. Every office has it's own database schema and within the schema there's the whole database structure for their specific application version. As a result, I'm dealing with 200 different schema's which need to be upgraded, each with 7 possible versions. Fortunately, every schema knows the proper version so checking the version isn't difficult.
But my problem is that I need to create upgrade scripts which can upgrade from version 1 to version 2 to version 3 to etc... Basically, all schema's need to be bumped up one version until they're all version 8. Writing the code that will do this is no problem. the challenge is how to create the upgrade script from one version to the other? Preferably with some automated tool. I've examined RedGate's SQL Compare and Altova's DatabaseSpy but they're not practical. Altova is way too slow. RedGate requires too much processing afterwards, since the generated SQL Script still has a few errors and it refers to the schema name. Furthermore, the code needs to become part of a stored procedure and the code generated by RedGate doesn't really fit inside a single procedure. (Plus, it's doing too much transaction-handling, while I need everything within a single transaction.
I have been considering using another SQL Comparison tool but it seems to me that my case is just too different from what standard tools can deliver. So I'm going to write my own comparison tool. To do this, I'll be using ADOX with Delphi to read the catalogues for every schema version in the database, then use this to write the SQL Statements that will need to upgrade these schema's to their next version. (Comparing 1 with 2, 2 with 3, 3 with 4, etc.) I'm not unfamiliar with generating SQL-Script-Generators so I don't expect too many problems. And I'll only be upgrading the table structures, not any of the other database objects.
So, does anyone have some good tips and tricks to apply when doing this kind of comparisons? Things to be aware of? Practical tips to increase speed?

I still think RedGate is the way to go. It is true that it does not always catch all the dependencies, and you may need to hack on it a bit, but it gets you 95% of the way there, and would be a huge timesaver IMO.
Once you have the script generated, you can easily hack on the way error handling and transactions are done, the output is very well documented, so it is trivial to see what is going on.
One possibility would be, rather than modify each database in place, do this:
create your a new version 8 database (DB_NEW)
migrate all of the data from the old database (DB) (you will need up to 7 different data migration scripts for this)
validate new database
if success, rename DB to DB_OLD and rename DB_NEW to DB

Creating new database then migrating data is the best way. Probably you will need to create number of data transformation scripts, but I assume that differences between data structure are not huge. After migration I recommend to use any data comparison tool which allows sql-query results comparing to verify migration success.

Redgate is the answer, you can compare the different schemas and will also generate scripts for you based on the difference.

Related

How to keep 2 Database Schemas consistent without effecting the data at all?

I have two server machines (One for development, other for Clients) with SQL Server 2008 installations. Whenever a developer makes changes to tables/views/stored procedures in the Development Server, it needs to reflect the Client Server as well.
Currently, I am manually handling all changes like new columns in Tables, changes in Stored procedures etc. Can DB scripts or replication automate the entire procedure for me? Or is there some better solution to keep database schemas consistent.
Help will be highly appreciated.
Thanks!
I highly recommend to create an environment where all schema changes are done exclusively through SQL scripts - never "manually" in any environment. Each developer has to commit the script related to his/her bugfixed (or new features) to a version control system.
Typically you'd have one big script that creates the database from scratch and one for each version upgrade (from 1.0 to 1.1, one from 1.1 to 1.2 and so on)
If you have the man power it is also very handy to maintain one "from-scratch" script for each version. Whether you need that or not depends on how often an installation on an empty system is done.
We have very good experience with using Liquibase to maintain all this. It automatically keeps track which patches have been applied to a database and which need to be run during an upgrade. It also prevents you to run the same migration twice.
A problem that all database applications have, and a difficult one to resolve. Such a solution cannot be scheduled, as the changes made by developers need to be tested first, and you certainly don't want untested code merged with your live database. This question is of interest to me because I'm currently writing a generic solution to resolve this issue once and for all.
But in the meantime, we're using an open-source product called Open DBDiff (Google it - you can't miss it), which could do with some polishing but works well enough. You pass it your source and target databases, and it generates a script to make the target the same as the source. It does seem to have some trouble copying assemblies and user roles, but for everything, I haven't had any trouble.
I believe a human should do the deployments, after making sure the changes have been tested and properly checked into the source control. This is not something to automate fully.
Human should use the tools though. I use Visual Studio 2010 Professional, which has a powerful schema comparison tool, generates and executes deployment scripts and has source control integration.

How can I run multiple versions of my application against the same database?

Backround: I have multiple versions of my application running in my production environment. Depending on the user account that's used, the user will have access to a different version of the software.
Environment: Currently SQL Server 2005, imminently migrating to SQL Server 2008, ASP.Net
Problem: Each version of the software may or may not use different versions of the stored procedures that interact with the data in the database. Currently when a version is changed, whoever changes it creates a new copy and appends an incremental version number to the end. At this point we have numerous versions of some stored procs and only one version of others and nobody is sure which version of the application is pointing to which version of the stored procs - it's a mess.
I'm looking for a solution that will neatly package the stored procs up for any given version of the application to be used as the basis for a new version. This means that the new version of the application can be pointed to the new set of procs which can be rewritten or modified to the end user's content without affecting other versions of the applications that are currently available in production.
I originally thought about schemas, but one part of the issue is that the procs are heavily coupled with other procs and user defined functions, so when copying to another schema, we'd new to map and replace all these links which isn't ideal.
It seems like this is a problem that should already be solved, but I don't know what I'm looking for in order to find a viable solution.
Does anyone have any ideas?
Given the mess you describe, if you must stick with stored procs, I would be inclined to have only one copy of each stored proc, but have all stored procs take versionNumber as the final parameter. Then each stored proc has one name, and any version specific code is there in the context of the stored proc it relates to. Stored procs that have no version specific code just ignore that versionNumber paremter (but still have it in their signature).
One of the big problems I see in your current situation is that if there are 4 nearly identical versions of a stored proc, and someone fixes a bug in one of them, it is too easy to forget to make the identical fix in all the other versions. The solution I described would at least reduce the amount of duplicate code, and reduce the chance of bug fixes only fixing one version of the stored proc.
In this situation though, I would also feel a strong urge to get application version specific logic out of the database and into the codebase, and leave in the database that logic that is data specific but application agnostic.
Assuming that there are no backwards-incompatible schema and data changes between versions, I would move all of the application logic into the application code. Once you have that, then you can use the wealth of existing versioning technologies to deploy multiple versions of the application that all point at the same database.
Trying to version store procedures is, in many cases, a fruitless effort because there simply isn't the needed access to the code in most RDBMSystems. In general, stored procedures are considered to be a potentially good idea than just didn't pan out.

Database schemas WAY out of sync - need to get up to date without losing data

The problem: we have one application that has a portion which is used by a very small subset of the total users, and that part of the application is running off of a separate database as well. In a perfect world, the schemas of the two databases would be synced up, but such is not the case. Some migrations have been run on the smaller database, most haven't; and furthermore, there is nothing such as revision number to be able to easily identify which have and which haven't. We would like to solve this quandary for future projects. During a discussion we've come up with the following possible plan of action, and I am wondering if anyone knows of any project which has already solved this problem:
What we would like to do is create an empty database from the schema of the large fully-migrated database, and then move all of the data from the smaller non-migrated database into that empty one. If it makes things easier, it can probably be assumed for the sake of this problem specifically that no migrations have ever removed anything, only added.
Else, if there are other known solutions, I'd like to hear them as well.
You could use a schema comparison tool like Red-Gate's SQL Compare. You can synchronize the changes and not lose any data. I wrote about this and many alternative tools ranging widely in price here:
http://bertrandaaron.wordpress.com/2012/04/20/re-blog-the-cost-of-reinventing-the-wheel/
The nice thing is that most tools have trial versions. So, you can try them our for 14 days (fully functional) and only buy it if it meets your expectations. I can't speak for the other tools, but I've been using RG for years and it is a very capable and reliable tool.
(Updated 2012-06-23 to help prevent link-rot.)
Red-Gate's SQL Compare as Aaron Bertrand mentions in his answer is a very good option. However, if you are not permitted to purchase something, an option is to try something like:
1) For each database, script out all the tables, constraints, indexes, views, procedures, etc.
2) run a DIFF, and go through all the differences and make sure that the small DB can accept them. If not implement any changes (including data) necessary onto the small DB so it can accept the changes.
3) create a new empty database from the schema of the large DB
4) import the data from the small DB into the nee DB.
You could also reverse engineer your database into Visual Studio as a database project. Visual Studio Team Suite Database Edition GDR R2 (I know long name) has the capability to do a schema comparison and data comparison, but the beauty of this approach is that you get all of your database into a nice database project where you can manage change and integrate with source control. This would allow you to build from a common source and deploy consistent changes.

How is Database Migration done?

i remember in my previous job, i needed to do data migration. in that case, i needed to migrate to a new system, i was to develop, so it has a different table schema. i think 1st, i should know:
in general, how is data migrated (with the same schema) to a different DB engine. eg. MySQL -> MSSQL. in my case, my destination DB was MySQL and i used MySQL Migration Toolkit
i am thinking, in an enterprise app, there may be stored procedures, triggers that also need to be imported.
if table schema is different, how will i then go abt doing this? in my prev job, what i did was import data (in my case, from Access) into my destination (MySQL) leaving table structures. then use SQL to select data and manipulate as required into final destination tables.
in my case, where i dont have documentation for the old db, and the columns was not named correctly, eg. it uses say 'field1', 'field2' etc. i needed to trace from the application code what the columns mean. any better way? or sometimes, columns contain multiple values in delimited data, is reading code the only way?
I really depends, but from your question I assume you want to hear what other people do.
So here is what I do in my current project.
I have to migrate from Oracle to Oracle but to a completely different schema.
The old system was 2-tier (old client, old database) the new system is 3-tier (new client, business logic, new database). We have more than 600 tables in the new schema.
After much pondering we scraped the idea of doing a migration from old database to new database in SQL. We decided that in our case i would be much easier to go:
old database -> old client -> business logic -> new database
In the old database much of the data is stored in strange ways and the old client
mangles it in complex ways. We have access to the source code of the old client but it is a very large system.
We wrote a migration tool that sits above the old client and the business logic.
We have some SQL before and some SQL after that but the bulk of data is migrated via
old client and business logic.
The downside is that it is slow, a complete migration taking more than 190 hours in our case but otherwise it works well.
UPDATE
As far as stored procedures and triggers are concerned:
Even as we use the same DBMS in old and new system (both Oracle) the procedures and
triggers are written from scratch for the new system.
When I've performed database migrations, I've used the application instead a general tool to migrate the database. The application connects to two databases and copies objects from one to the other. You don't have to worry about schema or permissions or whatnot since all that is handled in the application, just like what happens when you set up the application in the first place.
Of course, this may not help you if your application doesn't support this. But if you're writing an application, I strongly recommend doing it this way.
I recommend the wikipedia article for a good overview and links to the main commercial tools (and some non-commercial ones). Stored procedures (and kin, e.g. user-defined function), if abundant, are going to be the "hot spots" in the migration, requiring rare abd costly human skills -- as soon as you get away from the "declarative" mood of mainstream SQL, and into procedural code, you cannot expect automated tools to do a decent job (Turing's Theorem says that they actually can't, in a sufficiently general case;-). So, you need engineers with a good understanding of the procedural trappings of BOTH engines -- the one you're migrating from, the one you're migrating to. You can buy that -- it's one of the niches where consultants make REALLY good money!-)
If you are using MS SQL Server, you can use SSMS to script out the schema and all data in one go: SQL Server 2008: Script Data as Inserts.
If you are not using any/many non-standard SQL constructs, then you might be able to manually edit this scipt without too much effort.

Dynamic patching of databases

Please forgive my long question. I have an idea for a design that I could use some comments on. Is it a good idea to do this? And what are the pit falls I should be aware of? Are there other similar implementations that are better?
My situation is as follows:
I am working on a rewrite of a windows forms application that connects to a SQL 2008 (earlier it was SQL 2005) server. The application is an "expert-system" for an engineering company where we store structured data about constructions. We have control of all installations of the client software, we have no external customers or users, they are all internal to the company, and they are all be trusted not to do anything malicious to the software or database.
The current design doesn't have too many tables (about 10 - 20) but some of them have millions of records that belong to several hundred constructions. The systems performance has been ok so far, but it is starting to degrade as we are pushing the limits of the design.
As part of the rewrite I am considering splitting the database into one master database and several "child" databases where each describes one construction. Each child database should be of identical design. This should eliminate the performance problems we are seeing today since the data stored in each database would be less than one percent of the total data amount.
My concern is that instead of maintaining one database we will now get hundreds of databases that must be kept up to date. The system is constantly evolving as the companys requirements change (you know how it is), and while we try to look forward to reduce the number of changes the changes will come. So we will need a system where we keep track of all database changes done to the system so they can be applied to the child databases. Updating the client application won't be a problem, we have good control of that aspect.
I am thinking of a change tracing system where we store database scripts for all changes in a table in the master database. We can then give each change a version number and we can store a current version number in each child database. When the client program connects to a child database we can then check the version number of the database against the current version number of the master database and if there are patches with version numbers greater than the version number of the child database we run these and update the child database to the latest version.
As I see it this should work well. Any changes to the system will first be tested and validated before committed as a new version of the database. The change will then be applied to the database the first time a user opens it. I suppose we would open the database in exclusive mode while applying the changes, but as long as the changes aren't too frequent this should not be a problem.
So what do you think? Will this work? Have any of you done something similar? Should we scrap the solution and go for the monolithic system instead?
Have you considered partitioning your large tables by 'construction'? This could alleviate some of the growing pains by splitting the storage for the tables across files/physical devices without needing to change your application.
Adding spindles (more drives) and performing a few hours of DBA work can often be cheaper than modifying/adapting software.
Otherwise, I'd agree with #heikogerlach and these similar posts:
How do I version my ms sql database
Mechanisms for tracking DB schema changes
How do you manage databases in development, test and production?
I have a similar situation here, though I use MySQL. Every database has a versions table that contains the version (simply an integer) and a short comment of what has changed in this version. I use a script to update the databases. Every database change can be in one function or sometimes one change is made by multiple functions. Functions contain the version number in the function name. The script looks up the highest version number in a database and applies only the functions that have a higher version number in order.
This makes it easy to update databases (just add new change functions) and allows me to quickly upgrade a recovered database if necessary (just run the script again).
Even when testing the changes before this allows for defensive changes. If you make some heavy changes on a table and you want to play it safe:
def change103(...):
"Create new table."
def change104(...):
"""Transfer data from old table to new table and make
complicated changes in the process.
"""
def change105(...):
"Drop old table"
def change106(...):
"Rename new table to old table"
if in change104() is something going wrong (and throws an exception) you can simply delete the already converted data from the new table, fix your change function and run the script again.
But I don't think that changing a database dynamically when a client connects is a good idea. Sometimes changes can take some time. And the software that accesses a database should match the schema of the database. You have somehow to keep them in sync. Maybe you could distribute a new software version and then you want to upgrade the database when a client is actually starting to use this new software. But I haven't tried that.
Better don't create additional databases. At first glance you may think that you'll get some performance gain, but actually you get support nightmare. Remember - what can break, does break sooner or later.
It is way simpler to perform and optimize queries in single database. It is much easier manage user permissions in single database. It is much easier to make consistent backups for single database.
Like KenG said, if you need break your large tables - consider partitioning them. And add some drives :)
But at first run SQL profiler on your database and optimize indexes and queries. Several million rows is usually not a big problem to handle (unless your customer needs live totaling over half of these, in which case no partitioning can help).
I know that this is a crazy answer but here it goes...
I currently have a similar scenario where I need to keep control of database versions in multiple locations for a system using MS SQL Server.
What I am doing now is using Ruby on Rails ActiveRecord Migrations to keep control of database versions. Yes I know that we are talking about Windows systems but this works fine for me. (By the way, my system is programmed in VB and .NET)
I have installed Rails on each server, when I need to update the database schema I copy the migration files to the server and run rake db:migrate which updates the database to the latest version or rollbacks it to a desired version.
As a side effect you will have a set of migration files for your database schema in an database independent language (in this case ruby) that you can apply to other database engines and that you can put under source control too.
I know that this is a strange solution in which a totally different technology is used but it does not hurt to learn new approaches. You can find additional information here.
I have become a better .Net programmer since I learned Ruby on Rails. I asked here before a question about this approach.