Multi System Database structure based copying/updating best practice - sql

so after searching and not finding similar cases I want to open a new question.
So here is the case:
We are working with a large database with a very complicated data structure. Also we are working on multiple systems to ensure stability (development, testing, quality and productive) and its always a struggle so move data between those systems. As I said the data structure is very large and there is also a lot of logic inside the database. Customers are able to add new data parts as configuration and there is also a static income of data which are used for statistics and monitoring. So let me explain the problem with a small example:
Lets take this Database as an example. We have some families making some contest with each other. And they will create some statistics about the points they make.
The Purple Tables are fixed configurations. They are created once and they can only be changed via an Operator. Those changes will be done and tested in the development system first.
The Yellow Tables are changing configurations. Each Family is able to create or delete multiple Contests and assign their kids.
The Red Table is just plain data. Each time a kid makes points, a new row is added with the amount and current time and the relation to the kid and contest.
This table will be the base for the later statistics.
This Database is developed on two systems a productive one which is used by the families and a develop system which is used by the programmers/operators.
While developing the programmers will add test data like kids families contests and points. And while using the families will create new contests and assign new kids and will fill up the point table.
It's necessary to copy new/tested/fixed families from the development to the productive system.
Its also necessary to copy Contests, Contest-Kid-Assignments and Points from the productive to the development system to find new errors.
Also it must be possible to change the table structure on the development system and transmit this change to the productive system. (This shouldn't be the main topic here sometimes it can be such a large changes that there just is no easy way, so lets keep this point simple but keep it in mind.)
I want to copy parts of the tables to another system but be able to ignore some tables (for example: Points) and I want to make sure to not copy kids without their parent family so there is no "parentless" object in the database.
Question: What would be a good and save way to do this?
I don't need a solution for a specific database type or some scripts. I'm looking for tools, libraries or good practice. (But just as a note we're using mssql.)
We are currently making a tool for this problem (not going well: unstable, overly complicated, slow and possible reinventing the wheel).
Also a lot of devs I know just copy the whole database (making a backup and running it into another server) But this is also making problems: users are being copied and their guid change so they loose permissions etc. I don't think this is a good solution. Also the database is down for quite a long time and its never a smooth process.
Making it manually is sometimes the easiest way but considering the size of our data structure its not just a huge piece of work there is also a large possibility for mistakes.
So I'm hoping someone knows a tool or something similar to help me out.

Welcome to the pains of development for a Stateful entity like a database. :) RedGate makes a tool called SQL Source Control that is good for moving changed data and Schema into Production, and it can interface with source control solutions such as GIT. It's a bit pricey, but it's the best I've found. One option for keeping dev up to date with prod data and dev changes is one I concocted at my last place of employment, which was... not 100% perfect, but better than nothing, and free. It was developed in Powershell, and it went something like this:
Create Pre-restore, Pre-dacpac and Post-dacpac SQL scripts to store data and
permission diffs between dev and prod
Use SQLPackage.EXE to make DacPac of Dev(Dacpac is basically an xml schema of db, no
data)
Execute Pre-restore Proc (Often copying out test data that needs to be persisted)
Restore Prod over Dev
Execute Pre-dacpac script (any DDL That could cause data loss may need to go here)
Use SQLPackage.EXE to apply DacPac made in step 2 to Newly restored database
Execute Post-Dacpac Script (Permissions, restoration of data copied in step 3)
Again, like I said, it worked and automated the restoration of prod data into our dev environment while keeping our dev changes intact, but it required a good bit of upkeep and maintenance. Also, keep in mind, once your DB reaches a certain size, doing a nightly restore is no longer a viable option due to the time it takes to restore.

Related

Is it good practice to count on the file system as a database?

I'm working on an ASP.net web application that uses SQL as a database back-end. One issue that I have is that it sometimes takes a while to get my DBA to create or modify tables in the database which under no circumstance am I allowed to modify on my own.
Here is something that I do is when I expect users to upload files with their data.
Suppose the user uploads a new record for a table called Student_Records. The user uploads a record with fname Bob and lname Smith. The record is assigned primary key 123 The user also uploads two files: attendance_record.pdf and homework_record.pdf. Let's suppose that I have a network share: \\foo\bar where the files are saved.
One way of handling this situtation would be to have a table Student_Records_Files that associates the key 123 with Bob Smith. However, since I have trouble getting tables created, I've gone and done something different: When I save the files on the server, I call them 123_attendance_record.pdf and 123_homework_record.pdf. That way, I can easily identify what table record each file is associated with without having to create a new SQL table. I am, in essence, using the file system itself as a join table (Obviously, the file system is a type of database).
In my code for retrieving the files, I scan the directory \\foo\bar and look for files that begin with each primary key number from Student_Records.
It seems to work very well, but is it good practice?
There is nothing wrong with using the file system to store files. It's what it is used for.
There are a few things to keep in mind though.
I would consider a better method of storing the files - perhaps a directory for each user, rather than simply appending the user id to the filename.
Ensure that the file store is resilient and backed up with the same regularity as your database. If your database is configured to give you a backup every 10 minutes, but your file store only does a backup every day (or worse week) then you might be in for a world of pain.
Also consider what would happen if the user uploads two documents that are the same name.
First of all, I think it's a bad practice, in general, to design your architecture based on how responsive your DBA is. Any given compromise based on this approach may or may not be a big deal, but over time it will result in a poorly designed system.
Second, making the file name this critical seems dangerous to me; there's no protection against a person or application modifying the filename without realizing its importance.
Third, one of the advantages of having a table to maintain the join between the person and the file is that you can add additional data, such as: when was the file uploaded, what is the MIME type, has the file been read by anyone through the system, is this file a newer version of a previous file, etc. etc. Metadata can be very powerful, and the filesystem offers only limited ways to store it.
There are really two questions here. One is, given that for administrative reasons you cannot get changes made to the database schema, is it acceptable to devise some workaround. To that I'd have to say yes. What else can you do? In theory, if it takes two weeks to get the DBA to make a schema change for you, then this two weeks should be added to any deadline that you are given. In practice, this almost never happens. I've often worked places where some paperwork or whatever required two weeks before I could even begin work, and then I'd be given two weeks and one day to do the project. Sometimes you just have to put it together with rubber bands and bandaids.
Two is, is it a good idea to build a naming convention into file names and use this to identify files and their relationship to other data. I've done this at times and it's generally worked for me, though I have a perhaps irrational emotional feeling that it's not a good idea.
On the plus side, (a) By building information into a file name, you make it easy for both the computer and a human being to identify file associations. (Human readable as long as the naming convention is straightforward enough, anyway.) (b) By eliminating the separate storage of a link, you eliminate the possibility of a bad link. A file with the appropriate name may not exist, of course, but a database record with appropriate keys may not exist, or the file reference in such a record may be null or invalid. So it seems to solve one problem there without creating any new problems.
Potential minuses are: (a) You may have characters in the key that are not legal in file names. You may be able to just strip such characters out, or this may cause duplicates. The only safe thing to do is to escape them in some way, which is a pain. (b) You may exceed the legal length of a file name. Not as much of an issue as it was in the bad old 8.3 days. (c) You can't share files. If a database record points to a file, then two db records could point to the same file. If you must make two copies of a file, not only does this waste disk space, but it also means that if the file is updated, you must be sure to update all copies. If in your application it would make no sense to share files, than this isn't an issue.
You have to manage the files in some way, but you had to do that anyway.
I really can't think of any over-riding minuses. As I say, I've done this on occassion and didn't run into any particular problems. I'm interested in seeing others' responses.
I think it is not good practice because you are making your working application very dependent on specific implementation details and it would make it pretty hard to work with in the future to maintain, or if other people later needed access to your code/api.
Now weather you should do this or not is a whole different question. If you are really taking that much of a performance hit and it is significantly easier to work with how you have it, then I would say go ahead and break the rules. Ideally its good to follow best practice methods, but sometimes you have to bend the rules a little to make things work.
First, why is this a table change as opposed to a data change? Once you have the tables set up you should only need to update rows in that table every time that a user adds new files. If you have to put up with this one-time, two-week delay then bite the bullet and just get it done right.
Second, instead of trying to work around the problem why don't you try to fix the problem? Why is the process of implementing table changes so slow? Are you at least able to work on a development database (in which you have control to test and try out these changes)? Even if it's your own laptop you can at least continue on with development. Work with your manager, the DBA, and whoever else you need to, in order to improve the process. Would it help to speed things up if your scripts went through a formal testing process before you handed them off to the DBA so that he doesn't need to test the scripts, etc. himself?
Third, if this is a production database then you should probably be building in this two-week delay into your development cycle. You know that it takes two weeks for the DBA to review and implement changes in production, so make sure that if you have a deadline for releasing functionality that you have enough lead time for it.
Building this kind of "data" into a filename has inherent problems as others have pointed out. You have no relational integrity guarantees and the "data" can be changed without knowledge of the rest of the application/database.
It's best to keep everything in the database.
Network file I/O is spotty at best. In addition, its slower than the DB I/O.
If the DBA is difficult in getting small changes into the database, you
may be dealing with:
A political control issue. Maybe he just knows DB stuff and is threatened
when he perceives others moving in on his turf. Whatever his reasons, you need
to GET WORK DONE. Period. Document all the extra time / communication / work
you need to do for each small change and take that up with the management.
If the first level of management is unwilling to see things your way,
(it does not matter what their reasons are), escalate the issue
to the next level of management. In the past, I've gotten results this way.
It was more of a political territory problem than a technical problem.
The DBA eventually gave up and gave me full access to the TEST system BUT
he also stipulated that I would need to learn his testing process,
naming convention, his DB standards and practices, his way of testing, etc.
I was game.
I would also need to fix any database problems arising from changes I introduced.
This was fair and I got to wear the DBA hat in addition to the developer hat.
I got the freedom I needed and he got one less thing to worry about.
A process issue. Maybe the DBA needs to put every small DB change you submit
through a gauntlet of testing and performance analysis. Maybe he has a highly
normalized DB schema and because he has the big picture, he needs to normalize or
denormalize your requested DB changes to fit into the existing schema.
Ask to work with him. Ask him for a full DB design diagram.
Get a good sense of his DB design philosophy. Implement your DB changes with
his DB design philosophy in mind. Show that you understand that he's trying
to keep the DB in good order (understand normalization, relational constraints,
check constraints) Give him less to worry about. He needs to trust that you
will not muck up his database.
Accumulate all the small changes into a lengthy script and submit them to the DBA.
This way, you won't have to wait for each small change to go through all of his
process / testing. In addition, you're giving him a bigger picture view of your
development planning (that is in step with his DB design philosophy) instead of
just the play by play.

Anybody using SQL Source Control from Red Gate

We have been looking into possible solutions for our SQL Source control. I just came across Red Gates SQL Source control and wondered if anyone has implemented it? I am going to download the trial and give it a shot, but just wanted to see if others have real experience.
As always greatly appreciate the input
--S
I have updated my original post below to reflect changes in the latest versions of SQL Source Control (3.0) and SQL Compare (10.1).
Since this question was asked over a year ago, my response may not be that helpful to you, but for others who may currently be evaluating SSC, I thought I would throw in my two cents. We just started using SQL Source Control (SSC) and overall I am fairly satisfied with it so far. It does have some quirks though, especially if you are working in a shared database environment (as opposed to every developer working locally) and particularly working in a legacy environment where objects in the same database are divided haphazardly between development teams.
To give a brief overview of how we are using the product in our organization, we are working in a shared environment where we all make changes to the same development database, so we attached the shared database to the source control repository. Each developer is responsible for making changes to the objects in the database through SQL Server Management Studio (SSMS), and when they are finished, they can commit their changes to source control. When we are ready to deploy to staging, the build master (me) merges the development branch of the database code to the main (staging) branch and then runs SQL Compare using the main branch repository version of the database as the source and the live staging database as the target, and SQL Compare generates the necessary scripts to deploy the changes made to the staging environment. Staging to production deployments works in similar fashion. One other important point to note is that, given the fact that we are sharing the same database with other development teams, we use a built in feature of SSC that allows you to create filters on database objects by name, type, etc. We manually set up filters on our specific team's objects, excluding all other objects, so that we don't accidentally commit other development team's changes when we do our deployments.
So in general it's a fairly simple product to set up and use and it's really nice because you're always working with live objects in SSMS, as opposed to disconnected script files stored in a separate source repository that run the risk of getting out of sync. It's also nice because SQL Compare generates the deployment scripts for you so you don't have to worry about introducing errors as you would if you were creating the scripts on your own. And as SQL Compare is a very mature and stable product, you can feel pretty confident that it's going to create the proper scripts for you.
With that being said, however, here are some of the quirks that I have run into so far:
SSC is pretty chatty out of the box in terms of communicating with the db server in order to keep track of database items that are out of sync with the source control repository. It polls every few milliseconds and if you add in multiple developers all working against the same database using SSC, you can imagine that our dba's weren't very happy. Fortunately, you can easily reduce your polling frequency to something more acceptable, although at the cost of sacrificing responsive visual notifications of when objects have been changed.
Using the object filtering feature, you can't easily tell from looking at objects in SSMS which objects are included in your filter. So you don’t know for sure if an object is under source control, unlike in Visual Studio, where icons are used to indicate source controlled objects.
The object filtering GUI is very clunky. Due to the fact that we are working in a legacy database environment, there is currently not a clear separation between the objects that our team owns and those owned by other teams, so in order to prevent us from accidentally committing/deploying other teams’ changes, we have set up a filtering scheme to explicitly include each specific object that we own. As you can imagine, this becomes quite cumbersome, and as the GUI to edit the filters is set up to enter one object at a time, it could become quite painful, especially trying to set up your environment for the first time (I ended up writing an application to do this). Going forward, we are creating a new schema for our application to better facilitate object filtering (besides being a better practice anyway).
Using the shared database model, developers are allowed to commit any pending changes to a source controlled database, even if the changes are not theirs. SSC does give you a warning if you try to check in a bunch of changes that these changes might not be yours, but other than that you’re on your own. I actually find this to be one of SSC’s most dangerous “quirks”.
SQL Compare can’t currently share the object filters created by SSC, so you would have to manually create a matching filter in SQL Compare, so there is a danger that these could get out of sync. I just ended up cut-and-pasting the filters from the underlying SSC filter file into the SQL Compare project filter to avoid dealing with the clunky object filtering GUI. I believe that the next version of SQL Compare will allow it to share filters with SSC, so at least this problem is only a short term one. (NOTE: This issue has been resolved in the latest version of SQL Compare. SQL Compare can now use the object filters created by SSC.)
SQL Compare also can’t compare against a SSC database repository when launched directly. It has to be launched from within SSMS. I believe that the next version of SQL Compare will provide this functionality, so again it’s another short term problem. (NOTE: This issue has been resolved in the latest version of SQL Compare.)
Sometimes SQL Compare isn’t able to create the proper scripts to get the target database from one state to another, usually in the case where you are updating the schema of existing tables that aren’t empty, so you currently have to write manual scripts and manage the process yourself. Fortunately, this will be addressed through “migration scripts” in the next release of SSC, and from looking at the early release version of the product, it appears that the implementation of this new feature was well thought out and designed. (NOTE: Migration scripts functionality has been officially released. However, it does not currently support branching. If you want to use migration scripts, you will need to run sql compare against your original development code branch... the one where you checked in your changes... which is pretty clunky and has forced me to modify my build process in a less than ideal way in order to work around this limitation. Hopefully this will be addressed in a future release.)
Overall, I am pretty happy with the product and with Redgate’s responsiveness to user feedback and the direction that the product is taking. The product is very easy to use and well designed, and I feel that in the next release or two the product will probably give us most, if not all, of what we need.
I use SQL Compare for generating scripts when going from dev -> test -> production and it saves me tons of time.
For source control though, we use SVN and ScriptDB (http://scriptdb.codeplex.com/) though. I mainly use source control of SQL scripts for keeping track of changes. I think that rolling back a version of the database seldomly (if ever) works since data may have changed when making structure changes.
This works fine for a few of our current projects (largest is 200 tables and 2000 sprocs). The main reason for doing this though is cost since not all team members have to buy SQL Compare (I avoid adding dependencies to commercial projects unless really needed).
We performed an extensive evaluation of Red Gate's product and found a few major flaws. If you want to look at who changed an object, you can't do it without SysAdmin privileges. The product needs to look at the trace on your server, which requires those rights. I'm on a 5+ person team, and not knowing who had pending changes is what will stop us from using the product.
I just started working for a new company and they use Redgate SQL Source Control for all their projects, amonst them a large and complex one. It does the job well in tandem with TFS. The only drawback from my point of view is that the SQL Server Management Studio integration is highly unstable. Frequent crashes of SQL Server Management Studio happen when the tools are installed.

Getting a significant amount of data into a SQL Server (Express) database at time of deployment

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.
I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.
I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.
I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.
Questions
In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?
What method do you recommend for populating your database with startup or test data and why?
Clarification
Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?
Which leads me to...
New questions
How would you write a SQL script to take the country data on this page and insert it into the database?
With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.
What if you later realized you needed a new column, CapitalCity?
With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.
I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern--how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time consuming and error prone. Somehow, this script needs to be machine generated, but how?
I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and works for you. If you've got a good database design with adequate constraints then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts but why bother?
SSIS has a steep learning curve, doesn't work well with source control (impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also an issue with how many rows it reads ahead to determine the data type -- you're in deep trouble if your first x rows contain numbers stored as text.
1) I prefer to use scripts for several reasons.
• Scripts are easy to modify, and plus when I get ready to deploy my application to a production environment, I already have the scripts written so I'm all set.
• If I need to deploy my database to a different platform (like Oracle or MySQL) then it's easy to make minor modifications to the scripts to work on the target database.
• With scripts, I'm not dependent on a tool like Visual Studio to build and maintain the database.
2) I like good old fashioned insert statements using a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy our applications we have to have scripts ready for the DBA's to run, as that's what they expect.
I just find that scripts are simple, easy to maintain, and the "least common denominator" when it comes to creating a database and loading up data to it. By least common denominator, I mean that the majority of people (i.e. DBA's, other people in your shop that might not have visual studio) will be able to use them without any trouble.
The other thing that's important with scripts is that it forces you to learn SQL and more specfically DDL (data definition language). While the hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside out. I've found that those skills are invaluable to have in almost any shop.
Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.
I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.
I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.
On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.
Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.
If your problem is that the test or startup data comes from an external source - a web page, a CSV file, etc. - then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPS as in Excel, you can actually enforce them.
Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database - if you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.
If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.
When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know - when you right click on the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.
It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
Since I use an object-relational mapper (Hibernate, there is also a .NET version), I prefer to generate such data in my programming language. The ORM then takes care of writing things into the database. I don't have to worry about changing column names in the data because I need to fix the mapping anyway. If refactoring is involved, it usually takes care of the startup/test data also.
Excel is an unnecessary component of this process.
Script the current version the database components that you want to reuse, and add the script to your source control system. When you need to make changes in the future, either modify the entities in the database and regenerate the script, or modify the script and regenerate the database.
Avoid mixing Visual Studio's db designer and Excel as they only add complexity. Scripts and SQL Management Studio are your friends.

Dynamic patching of databases

Please forgive my long question. I have an idea for a design that I could use some comments on. Is it a good idea to do this? And what are the pit falls I should be aware of? Are there other similar implementations that are better?
My situation is as follows:
I am working on a rewrite of a windows forms application that connects to a SQL 2008 (earlier it was SQL 2005) server. The application is an "expert-system" for an engineering company where we store structured data about constructions. We have control of all installations of the client software, we have no external customers or users, they are all internal to the company, and they are all be trusted not to do anything malicious to the software or database.
The current design doesn't have too many tables (about 10 - 20) but some of them have millions of records that belong to several hundred constructions. The systems performance has been ok so far, but it is starting to degrade as we are pushing the limits of the design.
As part of the rewrite I am considering splitting the database into one master database and several "child" databases where each describes one construction. Each child database should be of identical design. This should eliminate the performance problems we are seeing today since the data stored in each database would be less than one percent of the total data amount.
My concern is that instead of maintaining one database we will now get hundreds of databases that must be kept up to date. The system is constantly evolving as the companys requirements change (you know how it is), and while we try to look forward to reduce the number of changes the changes will come. So we will need a system where we keep track of all database changes done to the system so they can be applied to the child databases. Updating the client application won't be a problem, we have good control of that aspect.
I am thinking of a change tracing system where we store database scripts for all changes in a table in the master database. We can then give each change a version number and we can store a current version number in each child database. When the client program connects to a child database we can then check the version number of the database against the current version number of the master database and if there are patches with version numbers greater than the version number of the child database we run these and update the child database to the latest version.
As I see it this should work well. Any changes to the system will first be tested and validated before committed as a new version of the database. The change will then be applied to the database the first time a user opens it. I suppose we would open the database in exclusive mode while applying the changes, but as long as the changes aren't too frequent this should not be a problem.
So what do you think? Will this work? Have any of you done something similar? Should we scrap the solution and go for the monolithic system instead?
Have you considered partitioning your large tables by 'construction'? This could alleviate some of the growing pains by splitting the storage for the tables across files/physical devices without needing to change your application.
Adding spindles (more drives) and performing a few hours of DBA work can often be cheaper than modifying/adapting software.
Otherwise, I'd agree with #heikogerlach and these similar posts:
How do I version my ms sql database
Mechanisms for tracking DB schema changes
How do you manage databases in development, test and production?
I have a similar situation here, though I use MySQL. Every database has a versions table that contains the version (simply an integer) and a short comment of what has changed in this version. I use a script to update the databases. Every database change can be in one function or sometimes one change is made by multiple functions. Functions contain the version number in the function name. The script looks up the highest version number in a database and applies only the functions that have a higher version number in order.
This makes it easy to update databases (just add new change functions) and allows me to quickly upgrade a recovered database if necessary (just run the script again).
Even when testing the changes before this allows for defensive changes. If you make some heavy changes on a table and you want to play it safe:
def change103(...):
"Create new table."
def change104(...):
"""Transfer data from old table to new table and make
complicated changes in the process.
"""
def change105(...):
"Drop old table"
def change106(...):
"Rename new table to old table"
if in change104() is something going wrong (and throws an exception) you can simply delete the already converted data from the new table, fix your change function and run the script again.
But I don't think that changing a database dynamically when a client connects is a good idea. Sometimes changes can take some time. And the software that accesses a database should match the schema of the database. You have somehow to keep them in sync. Maybe you could distribute a new software version and then you want to upgrade the database when a client is actually starting to use this new software. But I haven't tried that.
Better don't create additional databases. At first glance you may think that you'll get some performance gain, but actually you get support nightmare. Remember - what can break, does break sooner or later.
It is way simpler to perform and optimize queries in single database. It is much easier manage user permissions in single database. It is much easier to make consistent backups for single database.
Like KenG said, if you need break your large tables - consider partitioning them. And add some drives :)
But at first run SQL profiler on your database and optimize indexes and queries. Several million rows is usually not a big problem to handle (unless your customer needs live totaling over half of these, in which case no partitioning can help).
I know that this is a crazy answer but here it goes...
I currently have a similar scenario where I need to keep control of database versions in multiple locations for a system using MS SQL Server.
What I am doing now is using Ruby on Rails ActiveRecord Migrations to keep control of database versions. Yes I know that we are talking about Windows systems but this works fine for me. (By the way, my system is programmed in VB and .NET)
I have installed Rails on each server, when I need to update the database schema I copy the migration files to the server and run rake db:migrate which updates the database to the latest version or rollbacks it to a desired version.
As a side effect you will have a set of migration files for your database schema in an database independent language (in this case ruby) that you can apply to other database engines and that you can put under source control too.
I know that this is a strange solution in which a totally different technology is used but it does not hurt to learn new approaches. You can find additional information here.
I have become a better .Net programmer since I learned Ruby on Rails. I asked here before a question about this approach.

Best Database Change Control Methodologies

As a database architect, developer, and consultant, there are many questions that can be answered. One, though I was asked recently and still can't answer good, is...
"What is one of, or some of, the best methods or techniques to keep database changes documented, organized, and yet able to roll out effectively either in a single-developer or multi-developer environment."
This may involve stored procedures and other object scripts, but especially schemas - from documentation, to the new physical update scripts, to rollout, and then full-circle. There are applications to make this happen, but require schema hooks and overhead. I would rather like to know about techniques used without a lot of extra third-party involvement.
The easiest way I have seen this done without the aid of an external tool is to create a "schema patch" if you will. The schema patch is just a simple t-sql script. The schema patch is given a version number within the script and this number is stored in a table in the database to receive the changes.
Any new changes to the database involve creating a new schema patch that you can then run in sequence which would then detect what version the database is currently on and run all schema patches in between. Afterwards the schema version table is updated with whatever date/time the patch was executed to store for the next run.
A good book that goes into details like this is called Refactoring Databases.
If you wish to use an external tool you can look at Ruby's Migrations project or a similar tool in C# called Migrator.NET. These tools work by creating c# classes/ruby classes with an "Forward" and "Backward" migration. These tools are more feature rich because they know how to go forward as well as backwards in the schema patches. As you stated however, you are not interested in an external tool, but I thought I would add that for other readers anyways.
I rather liked this series:
http://odetocode.com/Blogs/scott/archive/2008/02/03/11746.aspx
In my case I have a script generate every time I change the database, I named the script like 00001.sql, n.sql and I have a table with de number of last script I have execute. You can also see Database Documentation
as long as you add columns/tables to your database it will be an easy task by scripting these changes in advance in sql-files. you just execute them. maybe you have some order to execute them.
a good solution would be to make one file per table, so that all changes belonging to this table would be visible to who-ever is working on the table (its like working on a class). the same is valid for stored procedures or views.
a more difficult task (and therefore maybe tools would be good) is to step back. as long as you just added tables/columns maybe this would not be a big issue. but if you have dropped columns on an update, and now you have to undo your update, the data is not there anymore. you will need to get this data from the backup. but keep in mind, if you have more then a few tables this could be a big task, and in the normal case you should undo your update very fast!
if you could just restore the backup, then its fine in this moment. but, if you update on monday, your clients work till wednesday and then they see that some data is missing (which you just dropped out of a table) then you could not just restore the old database.
i have a model-based approach in my mind (sorry, not implemented at the moment) in which schema-changes are "modeled" (e.g. per xml) and during an update a processor (e.g. a c# program) creates all necessary "sql" and e.g. moves data to a "dropDatabase". the data can reside there, and if for some reason i need to restore some of the dropped data, i can just do it with the processor. i think over some time (years) this approach is not as bad because otherwise developers don't touch "old" tables because they don't know anymore if the table or column is really necessary. with this approach you don't risk too lot if you drop something!
What I do is:
All the DDL commands required to recreate the schema (and the stored procedures and the indexes, etc) are in a script.
To be sure the script is OK, it is tested from time to time (create a database, run the script and restore the backup and check the database works well).
For change control, the script is kept in a Version Control System (I typically use Subversion).
The trick is that, if the database cannot be brought down to recreate with, say, an added column, I have two changes to make, an ALTER TABLE + a modification in the script. A bit more work but, in the long term, it wins.