Here's the scenario.
Two identical databases:
One Live database and one Archive database. They're supposed to have exactly the same schema (tables, views, indexes, SPs, functions); the only difference is the data. Data in the Live DB gets archived according to some business rules, so the data in the Archive DB will naturally differ from the Live DB.
The challenge is that we keep patching changes (SP changes, function changes, data changes, even table schema changes) into the Live DB with each release. Unfortunately, the corresponding changes to the Archive DB have been neglected for a long time and the issue still hasn't been addressed. One day these out-of-sync DBs are going to come back and bite us.
Here's what I want to do: synchronize the non-data-related changes from the Live DB to the Archive DB, either automatically or manually.
Any ideas are welcome. Here are some that have come to mind:
Replication? I don't think replication fits this scenario very well.
Scripting the SP/function/view changes? I can manually pull out the scripts and combine them. But what about table schema changes? It's hard to track back and work out what has happened to the table schemas.
I know Red Gate and other products can do the job, but I'd like to explore all the options.
If anybody can point out a feasible way, that'd be great.
If you are using SQL Server, Visual Studio Team System Database Edition has a schema comparison and patching tool. Have a look at this article
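If you want a quick, do-it-yourself read on how far the two databases have drifted before reaching for a tool, a query against the catalog views gets you part of the way. This is only a sketch: it assumes both databases sit on the same instance and are named LiveDB and ArchiveDB (placeholders), and it only covers programmable objects.

    -- List SPs, views and functions that are missing from ArchiveDB
    -- or whose definitions differ from LiveDB
    SELECT ls.name AS schema_name, l.name AS object_name, l.type_desc
    FROM LiveDB.sys.objects AS l
    JOIN LiveDB.sys.schemas AS ls ON ls.schema_id = l.schema_id
    JOIN LiveDB.sys.sql_modules AS lm ON lm.object_id = l.object_id
    LEFT JOIN ArchiveDB.sys.schemas AS rs ON rs.name = ls.name
    LEFT JOIN ArchiveDB.sys.objects AS r ON r.schema_id = rs.schema_id
                                        AND r.name = l.name
                                        AND r.type = l.type
    LEFT JOIN ArchiveDB.sys.sql_modules AS rm ON rm.object_id = r.object_id
    WHERE l.type IN ('P', 'V', 'FN', 'IF', 'TF')
      AND (r.object_id IS NULL OR lm.definition <> rm.definition);

A similar LEFT JOIN against sys.columns (or INFORMATION_SCHEMA.COLUMNS) in each database will flag missing tables and columns. It won't write the patch script for you, but it tells you exactly what a comparison tool would be fixing.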
Here is the use case: we need to back up some of the tables from a client's server, copy them to our servers, restore them, and then run some queries over ODBC.
I managed to do this for the entire database by using probkup for the backup, prorest for the restore, and proserve to make it accessible to SQL queries.
However, some of the databases are big (> 8 GB), so we are looking for a way to back up only the tables we need. I didn't find anything in the probkup documentation about how this can be done.
Progress only supports full database backups.
To get the effect that you are looking for you could dump (export) the tables that you want and then load them into an empty database.
"proutil dump" and "proutil load" are where you want to start digging.
The details will vary depending on exactly what you want to do and what resources and capabilities you have available to you.
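Very roughly, and assuming the source database is called srcdb, the target tgtdb, and the table customer (all placeholder names), the dump/load cycle looks something like this; check the options against the docs for your OpenEdge version:

    # Binary dump of one table into ./dumpdir
    proutil srcdb -C dump customer ./dumpdir

    # Load the .bd file into a target DB that already has the matching schema
    # (create it empty and load the .df definitions first), then rebuild indexes
    proutil tgtdb -C load ./dumpdir/customer.bd
    proutil tgtdb -C idxbuild all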
Another option would be to replicate the tables in question to a partial database. Progress has a product called "pro2" that can help with that. It is usually pointed at SQL targets but you could also point it at a Progress database.
Or, if you have programming skills, you could put together a solution using replication triggers (under the covers that's what pro2 does...)
probkup and prorest are block-level programs and can't do a backup or restore by table.
To do what you're asking for, you'll need to dump the data from the source DB's tables and then load it into the target DB.
If your objective is simply to maintain a copy of the DB, you might also try incremental backups. Depending on your situation, that might speed things up a bit.
Other options include various forms of DB replication, which allow you to keep real- or near-real-time copies of your database.
* OpenEdge Replication. With the correct license, you can do query-only access on the replication target, which is good for reporting and analysis.
* Third-party replication products. These can be more flexible in terms of both target DBs and limiting the tables to be replicated.
* Home-grown replication (by copying and applying AI files). This is not terribly complicated, but you have to factor in the cost of doing the work and maintaining the system. There are some scripts out there that can get you started.
* Or, as Tom said, you can get clever with replication via triggers.
Firstly, let me apologize for the title, as it probably isn't as clear as I think it is.
What I'm looking for is a way to keep sample data in a database (SQL Server 2005, 2008 and Express) that gets modified every so often. At present I have a handful of scripts to populate the database with a specific set of data, but every time the database changes, all the scripts have to be more or less rewritten, so I'm looking for alternatives.
I've seen a number of tools and other software for creating sample data in a database, some free and some not. Are there any other methods I haven’t considered?
Thanks in advance for any input.
Edit: Also, if anyone has any advice at all in dealing with keeping data in sync with a changing application or database, that would be of some help as well.
If you are looking for tools for SQL Server, go visit Red Gate Software; they have the best tools. They have a data compare tool you can use to keep lookup-type tables up to date, and a SQL compare tool you can use to keep the tables in sync between two databases. So, using SQL Data Compare, create a database with all the sample data you want. Then periodically refresh your testing DB (or your prod DB, if these are strictly lookup-type tables) using the compare tool.
I also like the alternative of having a script (you can use Red Gate's tool to create scripts) because that means you can store this info in your source control and use it as part of a deployment package to other servers.
You could save them in another database, or in the same DB in different tables distinguished by name, like employee_test.
Joseph,
Do you need to keep just the data in sync, or the schema as well?
One solution to the data question would be SQL Server snapshots. You create a snapshot of your initial configuration, so any changes to the "real" database don't show up in the snapshot. Then, when you need to reset the table, select from the snapshot into a new table. I'm not sure how it will work if the schema changes, but it might be worth a try.
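For reference, the snapshot round trip looks roughly like this; every name below (SampleDb, its logical file SampleDb_Data, the snapshot, the table) is a placeholder, and note that database snapshots require an edition that supports them (Enterprise/Developer on 2005/2008), so this won't help on Express:

    -- Take the baseline once the sample data is loaded
    CREATE DATABASE SampleDb_Baseline
    ON (NAME = SampleDb_Data, FILENAME = 'C:\Snapshots\SampleDb_Baseline.ss')
    AS SNAPSHOT OF SampleDb;

    -- Reset one table from the baseline (add SET IDENTITY_INSERT if it has an identity column)
    DELETE FROM SampleDb.dbo.Customers;
    INSERT INTO SampleDb.dbo.Customers
    SELECT * FROM SampleDb_Baseline.dbo.Customers;

    -- Or throw away all changes and revert the whole database
    -- RESTORE DATABASE SampleDb FROM DATABASE_SNAPSHOT = 'SampleDb_Baseline';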
For generation of sample data, the Database project in Visual Studio has functionality that will create fake/random data.
Let me know if this makes sense.
Erick
Here's a more general question on how you handle database schema changes in a development team.
We are a team of developers, and the databases used during development run locally on everyone's box, as we want to avoid needing web access all the time. So running a single central database instance somewhere is not a real option.
Whenever one of us decides that it is time to extend or change the DB schema, we mail database files (MYI/MYD) or SQL files around to execute, or give the others instructions on the phone about what they need to do to get the changed code running against their local DBs. That's certainly not the perfect approach. The same problem arises when we need to adjust the DB schema on staging or production once a new release is ready.
I was wondering ... how do you guys handle this kind of stuff? For source code, we use SVN.
Really appreciate your input!
Thanks,
Michael
One approach we've used in the past is to script the entire DDL for the database, along with any test/setup data needed. Store that in SVN, then when there's a change, any developer can pull down the changes, drop the database, and rebuild it from the script files.
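The top of such a rebuild script is only a few lines; DevDb is a made-up name here (SQL Server flavour shown, adjust for your engine), and the object/test-data scripts from SVN get replayed right after it with sqlcmd or whatever runner you prefer:

    -- Drop and recreate the local dev database before replaying the DDL scripts
    IF DB_ID('DevDb') IS NOT NULL
    BEGIN
        ALTER DATABASE DevDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        DROP DATABASE DevDb;
    END;
    CREATE DATABASE DevDb;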
At the very least you should have the scripts of all the objects in the database (tables, stored procedures, etc) under source control.
I don't think mailing schema changes is a real option for a professional development team.
We had a system on one of my previous teams that was the best I've encountered for dealing with this situation.
The nightly build of the application included a build of a database (SQL Server). The database got built to the Test DB server. Each developer then had a DTS package (this was a while ago, and I'm sure they upgraded to SSIS packages) to pull down that nightly DB build to their local DB environment.
This kept the master copy in one location and put the onus on the developers to keep their local dev databases fresh.
At my work, we deal with pretty large databases that are time-consuming to generate, so for us, starting from scratch with a new DB isn't ideal. Like Harper, we have our DDL in SVN. Additionally, we store a version number in a database table. Every check-in that changes the DB must be accompanied by a script that:
* will upgrade the database schema and modify any existing data appropriately, and
* will update the version number in the database.
Further, we number the scripts and database versions such that a script we've written knows how to upgrade further along a branch or from an older branch to a newer one without any input from the developer (apart from the database name and the directory to the upgrade scripts).
Thus, if I've got a copy of a customer's 4GB DB from a year-old version and I want to test how their data will work with the version we cut yesterday, I can just run our script and let it handle the upgrades, rather than having to start from scratch and redo every INSERT, UPDATE and DELETE performed since the database was created.
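For what it's worth, one of those numbered scripts boils down to something like the sketch below; the table, the version numbers and dbo.SchemaVersion are all invented for the example (SQL Server syntax shown, but the pattern works in any engine):

    -- 043_Add_Customer_PreferredLanguage.sql (names and numbers are illustrative)
    IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE VersionNumber = 42)
    BEGIN
        RAISERROR ('Database is not at version 42; aborting this upgrade step.', 16, 1);
        RETURN;
    END;

    BEGIN TRANSACTION;

    -- The schema change itself; WITH VALUES back-fills existing rows with the default
    ALTER TABLE dbo.Customer
        ADD PreferredLanguage varchar(10) NULL
            CONSTRAINT DF_Customer_PreferredLanguage DEFAULT ('en') WITH VALUES;

    -- Record the new version so later scripts (and the app) know where this DB stands
    INSERT INTO dbo.SchemaVersion (VersionNumber, AppliedOn) VALUES (43, GETDATE());

    COMMIT TRANSACTION;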
We have a non-SQL description of the database schema. When the application starts, it compares the desired database schema with the actual database schema, and performs whatever ADD TABLE, ADD COLUMN, ADD INDEX, etc. statements it needs to do to get the database to look right.
This doesn't handle every case; sometimes you have to delete the database and recreate it if you've changed something that the schema resolver can't handle, but most of the time we don't need to worry about it.
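The checks the resolver performs boil down to statements like these (SQL Server syntax with invented table and column names; other engines expose the same information through INFORMATION_SCHEMA):

    -- Create a table the desired schema has but the database lacks
    IF OBJECT_ID('dbo.OrderAudit', 'U') IS NULL
        CREATE TABLE dbo.OrderAudit (
            AuditId   int IDENTITY(1,1) PRIMARY KEY,
            OrderId   int NOT NULL,
            ChangedOn datetime NOT NULL DEFAULT (GETDATE())
        );

    -- Add a missing column to an existing table
    IF COL_LENGTH('dbo.Orders', 'ShippedDate') IS NULL
        ALTER TABLE dbo.Orders ADD ShippedDate datetime NULL;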
I'd certainly keep the database schema in source code control.
At my present job, every time there's a schema change, we write the SQL for the change (alter table xyz add column ...) and put it in SVN. Then developers can update test databases by running this script. It's pretty clumsy but it works.
At a previous job I wrote some code that at application start-up would automatically compare the actual database schema to what it expected, and if it was not up to date perform the updates. Mostly this was done for deployment reasons: When we shipped new copies of the software, it would then automatically update the user's database. But it was also handy for developers.
I think there should be some generic SQL tool to do this. Maybe there is, but I've never seen one.
I have a database server with a few main databases and a few dozen small ones.
These small databases are intermediary/staging databases for importing data from various sources into the main database. Data import is a daily task. They are all quite similar in structure, as the implementations of these data imports are similar, so basically they have configuration tables, which define mappings, conversions, etc., and data tables, which contain the results of the import.
Some time ago there were only a handful of the small ones, but now I have more than 20 of them, and the number will grow further with the number of supported data feeds.
I have just migrated the whole server environment to SQL Server 2008, and since I now have some time for clean-up/refactoring, I am thinking of merging all the data-import databases into just one database and using database schemas to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from separate databases to separate schemas?
Question-2 (!!!): Are there any tricky things to be careful about in a database schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your situation, I would probably merge the databases into one. I don't really see a reason to keep them separated, and merging them will reduce the amount of work you have to do to support backups etc. If you were importing data from a source once and then never using the staging tables again, I could see a reason to set up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together so that I only have to go to one place to find the full end-to-end state of the production data and the data-load states.
SQL Server 2008 is also really good at handling partitioning; if the DB gets too large, or you need to separate data for security reasons, you get the benefit of a single DB with many of the advantages of several smaller ones. You won't get that with multiple smaller DBs.
When we migrated we had a very similar situation, and I ended up moving everything into one somewhat large importing database, as you have hinted at. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple filegroups and different backups, but it will require a lot more design work.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns, and the amount of effort needed to design and maintain your filegroup layout.
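As a rough illustration of the filegroup route (every name here is made up, and it assumes a staging schema already exists):

    -- Put staging data on its own filegroup...
    ALTER DATABASE MainDb ADD FILEGROUP Staging;
    ALTER DATABASE MainDb ADD FILE
        (NAME = MainDb_Staging, FILENAME = 'D:\Data\MainDb_Staging.ndf')
        TO FILEGROUP Staging;

    CREATE TABLE staging.FeedRow
    (
        FeedRowId int IDENTITY(1,1) PRIMARY KEY,
        RawLine   varchar(max) NOT NULL
    ) ON Staging;

    -- ...so it can be backed up (or skipped) on its own schedule
    BACKUP DATABASE MainDb FILEGROUP = 'Staging'
        TO DISK = 'D:\Backups\MainDb_Staging.bak';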
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
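In other words (schema and table names invented):

    -- Explicit two-part name: always resolves to the object you meant
    SELECT COUNT(*) FROM clientA.ImportRun;

    -- Schema-less name: resolves via the caller's default schema, then dbo,
    -- so different logins can silently end up hitting different tables
    SELECT COUNT(*) FROM ImportRun;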
I would put all your import staging tables in one database, separate from your regular production database, as the backup needs may be very different. This database should also contain things like your configuration management for SSIS packages, any logging tables, and any import metadata tables (we keep track of every run of the imports and the status of that run, as well as a bazillion other things about the import, like the filename, the normal file size, etc.). That comes in handy for researching problems and for adding checks to the processing. We use one schema per client, and then an additional schema for objects related to the importing/exporting process (logs, metadata, etc.).
We've got a product which utilizes multiple SQL Server 2005 databases with triggers. We're looking for a sustainable solution for deploying and upgrading the database schemas on customer servers.
Currently, we're using Red Gate's SQL Packager, which appears to be the wrong tool for this particular job. Not only does SQL Packager appear to be geared toward individual databases, but the particular (old) version we own has some issues with SQL Server 2005. (Our version of SQL Packager worked fine with SQL Server 2000, even though we had to do a lot of workarounds to make it handle multiple databases with triggers.)
Can someone suggest a product which can create an EXE or a .NET project to do the following things?
* Create a main database with some default data.
* Create an audit trail database.
* Put triggers on the main database so audit data will automatically be inserted into the audit trail database (along the lines of the sketch after this list).
* Create a secondary database that has nothing to do with the main database and audit trail database.
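To illustrate the third bullet, the kind of trigger involved looks roughly like this (MainDb, AuditDb and all table/column names here are simplified placeholders):

    -- Lives in the main database; writes one audit row per change into the audit DB
    CREATE TRIGGER dbo.trg_Customer_Audit
    ON dbo.Customer
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO AuditDb.dbo.Customer_Audit (CustomerId, ChangedOn, ChangeType)
        SELECT COALESCE(i.CustomerId, d.CustomerId),
               GETDATE(),
               CASE WHEN d.CustomerId IS NULL THEN 'I'
                    WHEN i.CustomerId IS NULL THEN 'D'
                    ELSE 'U' END
        FROM inserted AS i
        FULL OUTER JOIN deleted AS d ON d.CustomerId = i.CustomerId;
    END;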
And then, when a customer needs to update their database schema, the product can look at the changes between the original set of databases and the updated set of databases on our server. Then the product can create an EXE or .NET project which can, on the customer's server...
* Temporarily drop triggers on the main database so alterations can be made (see the sketch after this list).
* Alter database schemas, triggers, stored procedures, etc. on any of the original databases, while leaving the customer's data alone.
* Put the triggers back on the main database.
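In other words, each generated upgrade would essentially wrap its changes like this (table and column names are placeholders again):

    -- Suspend auditing while the table is altered, then switch it back on
    DISABLE TRIGGER ALL ON dbo.Customer;

    ALTER TABLE dbo.Customer ADD LoyaltyTier tinyint NULL;

    ENABLE TRIGGER ALL ON dbo.Customer;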
Basically, we're looking for a product similar to SQL Packager, but one which will handle multiple databases easily. If no such product exists, we'll have to make our own.
Thanks in advance for your suggestions!
I was looking for this kind of product myself, knowing that the Red Gate solution worked fine for "one" DB; unfortunately, I have been unable to find such a tool :(
In the end, I had to roll my own solution to do something "similar". It was a pain in the… but it worked.
My scenario was way simpler than yours, as we didn't have triggers and T-SQL.
Later, I decided to take a different approach:
Every DB change had a SCRIPT. Numbered: 001_Create_Table_xXX.SQL, 002_AlterTable_whatever.SQL, etc.
No matter how small the change is, there's got to be a script. The new version of the updater does this:
* Makes a backup of the customer's DB, just in case (see the sketch after this list).
* Starts executing the scripts in alphabetical order (001, 002, ...).
* If a script fails, it drops the DB, logs the script error and script number, etc., and restores the customer's DB from the backup.
* If it finishes, it makes another backup of the customer's DB (after the "migration") and updates a table where we store the DB version; this table is checked by the app to make sure that the DB and the app are in sync.
* Shows a nice success message.
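For the record, the backup and the version bookkeeping are plain T-SQL; CustomerDb, the file path and dbo.DbVersion below are placeholders:

    -- Step 1: safety backup before any script runs
    BACKUP DATABASE CustomerDb
        TO DISK = 'D:\Backups\CustomerDb_pre_upgrade.bak'
        WITH INIT;

    -- ...the numbered scripts (001, 002, ...) run here...

    -- Last step: record the version the app checks at start-up
    UPDATE dbo.DbVersion SET CurrentVersion = 37, UpgradedOn = GETDATE();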
This turned out to be a bit more "manual", but it has been working with little effort for three years now.
The secret lies in keeping a few testing DBs to test the "upgrade" before deploying. Apart from a few isolated DBs where some scripts failed because of data inconsistencies, this has worked fine.
Since your scenario is a bit more complex, I don't know whether this kind of approach will work for you.
As of this writing (June 2009) there's still no product on the market that'll do all this for multiple databases. I work for Quest Software, makers of Change Director for SQL Server, another database change automation system. Ours doesn't handle multiple databases like you're after, and I've seen the others out there. No dice.
I wouldn't hold out hope for it either, given the directions I've seen in SQL Server management. Things are going more toward packaged applications being contained in a single database, and most of the code is focusing on that.