I'm currently performing a migration operation from a legacy database. I need to perform migration of millions of originating rows, breaking the original content apart into multiple destination parent / child rows.
As it's not a simple 1 to 1 migration and the resulting rows are parent / child rows based on identity-generated keys, what's the best mechanism for performing the migration?
I'm assuming that I can't use bulk insert as the identity values for the child rows cannot be determined at the point of generating the script content? The only solution I can currently think of is to set the identity explicitly and then have a predetermined starting point for the import.
If anyone else has any input I'd appreciate the feedback.
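For what it's worth, the "set the identity explicitly from a predetermined starting point" idea can be sketched roughly as below. This is only an illustration under assumptions: the legacy layout (a name plus a ';'-separated phone list), the table names, and the starting value 1,000,000 are all made up, and SQLite stands in for the real target. On SQL Server the explicit key inserts would additionally need something like SET IDENTITY_INSERT on the parent table.

```python
import sqlite3
from itertools import count


def split_legacy_rows(legacy_rows, first_parent_id=1_000_000):
    """Split each legacy row into one parent row and N child rows.

    Parent keys are handed out from a predetermined starting point, so the
    child rows can reference them before anything is written to the database.
    """
    next_id = count(first_parent_id)
    parents, children = [], []
    for name, phone_csv in legacy_rows:
        parent_id = next(next_id)
        parents.append((parent_id, name))
        for phone in phone_csv.split(";"):
            if phone.strip():
                children.append((parent_id, phone.strip()))
    return parents, children


# Hypothetical legacy content: (customer name, ';'-separated phone numbers).
legacy = [("Alice", "555-0100;555-0101"), ("Bob", "555-0200")]
parents, children = split_legacy_rows(legacy)

conn = sqlite3.connect(":memory:")  # stand-in for the destination database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE phone (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), number TEXT)"
)
with conn:  # both batches commit together or not at all
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", parents)
    conn.executemany(
        "INSERT INTO phone (customer_id, number) VALUES (?, ?)", children
    )
```

Because both row sets are fully known before the first insert, each one could be handed to a bulk loader instead of executemany.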
This is my standard approach:
create your new data model
pull the data into the new DB unchanged
write (and run) a SQL script to perform the migration
test
(optional) drop the tables with the legacy data
You can get a long way towards migrating the data with plain SQL. For the case you described, you might not need to deal with a single Cursor to get it across.
Running the process in Query Analyzer (or an analog in your dbms), you'll have the advantage that you can wrap everything in a Transaction so that you can roll back if anything goes wacky along the way. Write it in little bits and test it in chunks, on your dev database. Once everything is working correctly, set the script loose on your production database.
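To make the "plain SQL, no cursor, wrapped in a transaction" idea a bit more concrete, here is a small sketch. It assumes the legacy rows carry a natural key (a hypothetical order_no column) that can be used to look up the generated parent id when inserting the children. SQLite is used only so the example runs anywhere; the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_order (order_no TEXT, customer TEXT, item TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, order_no TEXT UNIQUE, customer TEXT);
    CREATE TABLE order_item (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        item TEXT);
    INSERT INTO legacy_order VALUES
        ('A1', 'Alice', 'widget'),
        ('A1', 'Alice', 'gadget'),
        ('B7', 'Bob', 'widget');
""")


def migrate(conn):
    """Split legacy rows into parent/child rows using set-based SQL only."""
    with conn:  # commits on success, rolls back if any statement raises
        # Parents: one row per distinct legacy order; ids are generated.
        conn.execute(
            "INSERT INTO orders (order_no, customer) "
            "SELECT DISTINCT order_no, customer FROM legacy_order"
        )
        # Children: join back on the natural key to pick up the generated id.
        conn.execute(
            "INSERT INTO order_item (order_id, item) "
            "SELECT o.id, l.item FROM legacy_order l "
            "JOIN orders o ON o.order_no = l.order_no"
        )


migrate(conn)
```

If the second statement failed, the parent rows inserted by the first would be rolled back as well, which is the safety net described above.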
Sorted.
Thanks for the suggestion but I'd prefer to produce a programmatic solution. I'm currently using Nant / CruiseControl to automate the tests and need something I can recreate on the fly based on the current live legacy content.
In my application (a C# application using Entity Framework and a SQL Server database), I need to create a daily task that updates/inserts data from a third-party application (both applications use SQL Server databases). For efficiency's sake, I am looking for a way to determine which records were modified during the previous day so that I can import only those records.
I know I can add a modified_on column to the source table and create a trigger to update that column when something is changed on that record, but that will need me to make changes to the third-party application's database schema which I want to avoid.
There's the change tracking feature but it's of limited use to you as you're using EF and that makes the way the data is queried awkward. You may be able to use it somehow, but I doubt it's elegant.
It is much easier to change the schema after all, but add only a single column of type rowversion. That binary data type (loaded as byte[] in EF) is special: its value gets larger every time something (such as the third-party application) updates the row. No triggers are needed. You can look up the largest value you have already processed and then query for all rows whose value is larger than that.
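The high-water-mark logic can be illustrated in a few lines. This is a simulation only: the rows are plain dictionaries, the column name RowVer is hypothetical, and the 8-byte values are built by hand rather than generated by SQL Server. In a real job the filter would normally be done by the database, with a WHERE clause comparing the rowversion column against the last processed value.

```python
def changed_since(rows, last_seen):
    """Return rows with a version above last_seen, plus the new high-water mark.

    Versions are 8-byte big-endian values, so comparing them as bytes gives
    the same order as comparing the numbers they encode.
    """
    changed = [row for row in rows if row["RowVer"] > last_seen]
    new_mark = max((row["RowVer"] for row in changed), default=last_seen)
    return changed, new_mark


def rv(number):
    """Build a fake 8-byte version value for the demonstration."""
    return number.to_bytes(8, "big")


rows = [
    {"Id": 1, "RowVer": rv(2001)},
    {"Id": 2, "RowVer": rv(2005)},
    {"Id": 3, "RowVer": rv(2010)},
]

# Everything up to version 2001 was imported by the previous run.
changed, mark = changed_since(rows, rv(2001))
```

The returned mark would be stored somewhere on the importing side and passed in as last_seen on the next daily run.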
In addition to change tracking suggested by John, in another answer, you can think of setting up Temporal tables.
You can run queries against the temporal tables to identify the changed records and pull them accordingly from main table.
I requested that a client send me a copy of their current MS SQL database. Instead of being given a database backup, or small set of scripts I could use to recreate the database, I was provided with hundreds upon hundreds of individual SQL scripts, and no instructions on the order in which they'd need to be run.
The scripts cannot simply be executed in one batch operation, as there are foreign key dependencies between tables. It appears as though they've limited these scripts to creating a single table or stored procedure per script.
Normally, I'd simply ask a client to provide the information in a more usable format, but they're not known for getting back to us in a timely manner, and our project timeline is already in jeopardy due to delays on their end.
Are there any tools I can use to recreate the database from this enormous set of scripts?
This may sound a bit arcane, but you can do the following, iteratively:
1. Put all the scripts into a list of "scripts to be run"
2. Run all the scripts in the "to be run" list
3. Remove the successful runs
4. Repeat steps 2-3 until no scripts are left
The scripts with no dependencies will finish in the first round. The ones that depend on them in the next round, and then so on and so on.
I would suggest that you operate all this from a metascript, that uses a database table to store the names of the available scripts.
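A rough sketch of that loop is below. The run_script callback is whatever actually executes a script against your server; here it is replaced by a fake that only checks whether referenced tables exist, and the three script names and their contents are invented. The loop also stops when a whole pass makes no progress, so a genuinely broken script does not spin forever.

```python
import re


def run_until_stable(scripts, run_script):
    """Run scripts in repeated passes until all succeed or a pass stalls.

    scripts: dict mapping script name -> SQL text.
    run_script: callable that executes one script and raises on failure.
    Returns (names in the order they succeeded, dict of name -> last error).
    """
    pending = dict(scripts)
    done = []
    while pending:
        failed = {}
        for name, sql in pending.items():
            try:
                run_script(sql)
                done.append(name)
            except Exception as exc:
                failed[name] = exc
        if len(failed) == len(pending):
            return done, failed  # no progress in this pass: give up
        pending = {name: pending[name] for name in failed}
    return done, {}


# Fake runner standing in for a real database call (an assumption for the
# demo): a script fails if it references a table that does not exist yet.
existing_tables = set()


def fake_run(sql):
    table = re.search(r"CREATE TABLE (\w+)", sql).group(1)
    for ref in re.findall(r"REFERENCES (\w+)", sql):
        if ref not in existing_tables:
            raise RuntimeError(f"{table}: missing table {ref}")
    existing_tables.add(table)


scripts = {
    "order_item.sql": "CREATE TABLE order_item (id INT, order_id INT REFERENCES orders)",
    "orders.sql": "CREATE TABLE orders (id INT, customer_id INT REFERENCES customer)",
    "customer.sql": "CREATE TABLE customer (id INT)",
}
done, failed = run_until_stable(scripts, fake_run)
```

With a real server, each script should be run so that a failure leaves nothing behind (for example inside a transaction that is rolled back), otherwise a half-applied script will fail for a different reason on the next pass.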
Good luck.
If you set your folder of scripts as a data source in Red Gate SQL Compare, and specify a blank database as the target, it should allow you to compare and deploy to the target database. This is because the tool is able to read all SQL creation scripts recursively from the folder you specify. This is available as a fully functional 14-day trial, so you can easily test it in your scenario.
http://www.red-gate.com/products/sql-development/sql-compare/
The quickest (and by far the dirtiest) way of (maybe) doing this is to concatenate all of the scripts together, ensuring that you have a GO statement in between each one. Make sure there are no DROP statements in your scripts, or this technique won't work.
Run the concatenated script repeatedly for... I don't know, 10 or so iterations. Chances are you will have their database recreated properly in your test system.
If you're feeling more rigorous, go with Gordon's suggestion. I'm not really aware of a tool which will be able to reconstruct the dependencies, but you may want to take a look at Red-Gate's SQL Compare, which you can get a demo of for free, and can do some pretty magical things.
You can remove all the foreign key constraints. Then organize the scripts so that they first create all the tables and then add back all the foreign keys. Finally, create the indexes.
Building on Gordon.
Split them up into one table each.
Count the number of FK and sort starting with least first.
Then remove the scripts that run as Gordon suggests.
Another potential problem is that a script creates the table, fails on the FK, and leaves the table behind.
When you come back later to create the table, it is already there, so the script fails again.
If you parse them out with Table FKs
Start with Tables with no FK in a list
Then loop thru Tables with FK
Only add to the List if all the FK are already in the List.
If you know .NET, use a class with a string property for the table, a string property for the script, and a List<string> property holding the FK names.
They should parse out pretty cleanly with a regex.
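Here is roughly what that could look like, written in Python rather than .NET purely for brevity. The regexes are deliberately naive assumptions: they expect plain identifiers after CREATE TABLE and REFERENCES, so bracketed or schema-qualified names such as [dbo].[Orders] would need a more careful pattern. The sample scripts are invented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableScript:
    table: str
    script: str
    fk_tables: list = field(default_factory=list)


def parse(script):
    """Pull the table name and the tables its FKs reference out of a script."""
    table = re.search(r"CREATE\s+TABLE\s+(\w+)", script, re.I).group(1)
    refs = re.findall(r"REFERENCES\s+(\w+)", script, re.I)
    # A self-reference does not constrain the ordering, so drop it.
    return TableScript(table, script, [r for r in refs if r != table])


def order_by_dependencies(items):
    """Return the scripts ordered so every FK target is created first."""
    ordered, placed = [], set()
    remaining = list(items)
    while remaining:
        ready = [t for t in remaining
                 if all(fk in placed for fk in t.fk_tables)]
        if not ready:
            names = ", ".join(t.table for t in remaining)
            raise ValueError(f"circular or missing references among: {names}")
        for t in ready:
            ordered.append(t)
            placed.add(t.table)
        remaining = [t for t in remaining if t.table not in placed]
    return ordered


raw_scripts = [
    "CREATE TABLE order_item (id INT, order_id INT REFERENCES orders)",
    "CREATE TABLE orders (id INT, customer_id INT REFERENCES customer)",
    "CREATE TABLE customer (id INT)",
]
run_order = [t.table for t in order_by_dependencies([parse(s) for s in raw_scripts])]
```

Unlike the retry-until-it-works approach, this never runs a script before its dependencies exist, so the "table left behind after a failed FK" problem does not come up.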
I'm writing an application that is using a database (currently MySQL 4) to store data.
It is likely that I will make changes to this in the form of updates later to add additional data. Updating the application is simple, it essentially comes down to overwriting the program files with the new ones. However how do I go about updating the database schema?
The database is remote and so my application might exist in several places, so simply dumping the ALTER and CREATE statements in an installer would result in the changes being made multiple times, and I have been asked explicitly for an automatic solution that allows for the application copies to be updated over a transition period, and for schema updates to be automatic.
I considered examining the schema at start-up to look for missing tables and columns, and adding them as needed, however this does not seem like a clean solution. I also considered putting some kind of “schema version” number on the database, but can’t see any way to do this short of a single row table with an int “Version” column which doesn’t seem a good way either.
I can highly recommend Liquibase. It really does work - I've used it and was very impressed.
Essentially, it keeps its own log of statements run on a database and runs them only if not already run/needed. It is XML driven and allows you to use optional pre- and post-execution statements and conditions. You check your XML files into your source control and invoke it from your build tool. It's even suitable for driving production releases.
It's magic.
Rather than rolling your own system for versioning your database it's probably worth looking into an existing framework that will manage it for you.
I use Liquibase and have integrated it into my build using the Maven plugin. Worth checking out!
Just as you proposed, add a table where you store the current version of the database schema. Then you only have to apply the changes between your last schema update and the new release, and set the new version number accordingly. I've done this to update our production database about 300 times, it just works.
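A minimal sketch of the version-table approach follows, assuming a single-row table named schema_version and a hard-coded list of numbered changes (both names and the sample DDL are made up). SQLite is used so the example is runnable; note that on MySQL a DDL statement commits implicitly, so the change and the version bump would not be atomic there.

```python
import sqlite3

# Ordered list of (version, statement); new releases only append to it.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER)"),
]


def current_version(conn):
    """Read the stored schema version, creating the table on first use."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
        )
        row = conn.execute("SELECT version FROM schema_version").fetchone()
        if row is None:
            conn.execute("INSERT INTO schema_version (version) VALUES (0)")
            return 0
    return row[0]


def upgrade(conn):
    """Apply every migration newer than the stored version, in order."""
    applied = []
    version = current_version(conn)
    for number, sql in MIGRATIONS:
        if number > version:
            with conn:  # change and version bump are committed together
                conn.execute(sql)
                conn.execute("UPDATE schema_version SET version = ?", (number,))
            applied.append(number)
    return applied


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)   # a fresh database gets every change
second_run = upgrade(conn)  # an up-to-date database gets none
```

Since several copies of the application may start at the same time, a real implementation would also need some form of locking around upgrade so that two copies do not apply the same change concurrently.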
I was wondering what are the best practices in order to write SQL scripts to set up databases for production and/or development, for instance:
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
Is it correct to disable FK checks before executing the body of the script?
May I include the whole script in a transaction?
Is it better to generate one script per database than one script for all of them?
Thanks!
The problem with your question is that it is hard to answer, as it depends on how the scripts are used and what you are trying to achieve. You also don't say which DB server you are using, and there are tools provided which can make some tasks easier.
Taking your points in order, here are some suggestions, which will probably be very different from everyone else's :)
Should I include the CREATE DATABASE statement?
What alternative are you thinking of using? If your question is whether you should put the CREATE DATABASE statement in the same script as the table creation, it depends. When developing a DB, I use a separate create-DB script, as I have a script to drop all objects and so I don't need to create the database again.
Should I create users for the database in the same script?
I wouldn't, simply because the users may well change but your schema has not. Might as well manage those changes in a smaller script.
Is it correct to disable FK checks before executing the body of the script?
If you are importing the data in an attempt to recover the database, then you may well have to if you are using auto-increment IDs and want to keep the same values. You may also end up importing the tables "out of order" and not want the checks performed.
May I include the whole script in a transaction?
Yes, you can, but again it depends on the type of script you are running. If you are importing data after rebuilding a db, then the whole import should either work or fail. However, your transaction log is going to be huge during the import.
Is it better to generate one script per database than one script for all of them?
Again, for maintenance purposes it's probably better to keep them separate.
This probably depends on what kind of database it is and how it is used and deployed. I am developing a standard n-tier application that is deployed at many different customer sites.
I do not add a CREATE DATABASE statement to the script. Creating the database is part of the installation script, which allows the user to choose the server, database name and collation.
I have no knowledge of the users at my customers' sites, so I don't add CREATE USER statements. Also, the only user that needs access to the database is the user executing the middle-tier application.
I do not disable FK checks. I need them to protect the consistency of the database, even if it is I who wrote the body scripts. I use FK to capture my errors.
I do not include the entire script in one transaction. I require users to take a backup of the db before they run any db upgrade scripts. When creating a new database there is nothing to protect, so running in a transaction is unnecessary. For upgrades there are sometimes extensive changes to the db. A couple of years ago we switched from varchar to nvarchar in about 250 tables. Not something you would like to do in one transaction.
I would recommend you to generate one script per database and version control the scripts separately.
Direct answers, please ask if you need to expand on any point
* Should I include the CREATE DATABASE statement?
Normally I would include it since you are creating and owning the database.
* Should I create users for the database in the same script?
This is also a good idea, especially if your application uses specific users.
* Is it correct to disable FK checks before executing the body of the script?
If the script includes data population, then it helps to disable it so that the order is not too important, otherwise you can get into complex scripts to insert (without fk link), create fk record, update fk column.
* May I include the whole script in a transaction?
This is normally not a good idea, especially if data population is included, as the transaction can become unwieldy. Since you are creating the database, just drop it and start again if something goes awry.
* Is it better to generate one script per database than one script for all of them?
One per database is my recommendation so that they are isolated and easier to troubleshoot if the need arises.
For development purposes it's a good idea to create one script per database object (one script for each table, stored procedure, etc). If you check them into your source control system that way then developers can check out individual objects and you can easily keep track of versions and know what changed and when.
When you deploy you may want to combine the changes for each release into one single script. Tools like Red Gate SQL compare or Visual Studio Team System will help you do that.
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
That depends on your DBMS and your customer.
In an Oracle environment you will probably never be allowed to do such a thing (mainly because in the Oracle world a "database" is something completely different than e.g. in the PostgreSQL or MySQL world).
Sometimes the customer will have a DBA that won't let you create databases (or schemas or users - depending on the DBMS in use). So you will need to supply that information to the DBA in order for him/her to prepare the environment for your script.
May I include the whole script in a transaction?
That totally depends on the DBMS that you are using.
Some DBMS don't support transactional DDL and will implicitly commit any open transaction when you execute a DDL statement, so you need to consider the order of the statements in your installation script.
For populating the tables with data I would definitely try to do that in a single transaction, but again this depends on your DBMS.
Some DBMS are faster if you commit only once or very seldom (Oracle and PostgreSQL fall into this category) and will slow down if you commit more often.
Other DBMS handle many smaller transactions better and will slow down if the transactions get too big (SQL Server and MySQL tend to fall into this category).
The best practices will differ considerably depending on whether it is the first-time set-up or a new version being pushed. For the first-time set-up, yes, you need create database and create table scripts. For a new version, you need to script only the changes from the previous version, so no create database and no create table unless it is a new table. Now you need alter table statements because you don't want to lose the existing data. I do usually write stored procs, functions and views with a drop and create statement, as dropping those objects doesn't generally affect the underlying data.
I find it best to create all database changes with scripts that are stored in source control under the version. So if a client is new, you run the create version 1.0 scripts, then apply all the other versions in order. If a client is just upgrading from version 1.2 to version 1.3, then you run just the scripts in version 1.3 source control repository. This would also include scripts to populate or add records to lookup tables.
For transactions, you may want to break them up into several chunks so as not to leave a prod database locked in one transaction.
We also write reversal scripts to return to the old version if need be. This makes life easier if you have a part of a change that causes unanticipated problems on prod (usually performance issues).
I am writing code to migrate data from our live Access database to a new Sql Server database which has a different schema with a reorganized structure. This Sql Server database will be used with a new version of our application in development.
I've been writing migrating code in C# that calls Sql Server and Access and transforms the data as required. I migrated for the first time a table which has entries related to new entries of another table that I have not updated recently, and that caused an error because the record in the corresponding table in SQL Server could not be found
So, my SqlServer productions table has data only up to 1/14/09, and I'm continuing to migrate more tables from Access. So I want to write an update method that can figure out what the new stuff is in Access that hasn't been reflected in Sql Server.
My current idea is to write a query on the SQL side which does SELECT Max(RunDate) FROM ProductionRuns, to give me the latest date in that field in the table. On the Access side, I would write a query that does SELECT * FROM ProductionRuns WHERE RunDate > ?, where the parameter is that max date found in SQL Server, and perform my translation step in code, and then insert the new data in Sql Server.
What I'm wondering is, do I have the syntax right for getting the latest date in that Sql Server table? And is there a better way to do this kind of migration of a live database?
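The two-query idea can be sketched end to end as below. Two in-memory SQLite databases stand in for Access and SQL Server, the transformation step is left out, and the RunId column and sample dates are assumptions made for the demo; only the ProductionRuns table and RunDate column come from the question.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for the Access database
target = sqlite3.connect(":memory:")  # stands in for the SQL Server database
for db in (source, target):
    db.execute(
        "CREATE TABLE ProductionRuns (RunId INTEGER PRIMARY KEY, RunDate TEXT)"
    )

source.executemany(
    "INSERT INTO ProductionRuns VALUES (?, ?)",
    [(1, "2009-01-10"), (2, "2009-01-14"), (3, "2009-01-20")],
)
target.executemany(
    "INSERT INTO ProductionRuns VALUES (?, ?)",
    [(1, "2009-01-10"), (2, "2009-01-14")],
)


def copy_new_runs(source, target):
    """Copy rows newer than the latest RunDate already in the target."""
    (latest,) = target.execute(
        "SELECT MAX(RunDate) FROM ProductionRuns"
    ).fetchone()
    if latest is None:  # empty target: MAX() returns NULL, so take everything
        rows = source.execute(
            "SELECT RunId, RunDate FROM ProductionRuns"
        ).fetchall()
    else:
        rows = source.execute(
            "SELECT RunId, RunDate FROM ProductionRuns WHERE RunDate > ?",
            (latest,),
        ).fetchall()
    with target:
        target.executemany(
            "INSERT INTO ProductionRuns (RunId, RunDate) VALUES (?, ?)", rows
        )
    return len(rows)


copied = copy_new_runs(source, target)
```

One caveat with a strict greater-than comparison: a row added to the source later with the same RunDate as the current maximum would be skipped, so a date-only column may need a key comparison as well.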
Edit: What I've done is make a copy of the current live database, which I can migrate without worrying about changes and then use for testing during development. I can then migrate the latest data whenever the new database and application go live.
I personally would divide the process into two steps.
I would create an exact copy of Access DB in SQLServer and copy all the data
Copy the data from this temporary SQLServer DB to your destination database
That way you can write a set of SQL scripts to accomplish the second step.
Alternatively use SSIS
Generally, when you convert data to a new database that will take its place in production, you shut out all users of the database for a period of time, run the migration and turn on the new database. This ensures no changes to the data are made while doing the conversion. Of course, I never would have done this using C# either. Data migration is a database task and should have been done in SSIS (or DTS if you have an older version of SQL Server).
If the database you are converting to is just in development, I would create a backup of the Access database and load the data from there to test the data loading process and to get the data in so you can do the application development. Then, when it is time to do the real load, you just close down the real database to users and use it to load from. If you are trying to keep both in sync while you develop, well, I wouldn't do that, but if you must, make a nightly backup of the file and load it first thing in the morning using your process.
You may want to look at investing in a tool like SQL Data Compare.
I believe it has support for Access databases too, and you can download a trial.
If you are happy with your C# code but it fails because of the constraints in your destination database, you can temporarily disable them and then enable them again after you have copied the whole lot.
I am assuming that your destination database is brand new DB with no data, and not used by anyone when the transfer happens
It sounds like you have two problems:
You're migrating data from one database to another.
You're changing your schema.
Doing either of these things is tricky if you are trying to migrate the data while people are using the data.
The simplest approach is to migrate the data based on a static copy of the data, and also to queue updates to that data from the moment you captured the static copy. I don't know how easy this is in Access, but in SQLServer or Oracle you can use the redo logs for this or a manual solution using triggers. The poor-man's way of doing this is to make triggers for all the relevant tables that log the primary key of the records that have changed. Then after the old database is shut off you can iterate over those keys and get those records from the old database and put them into the new database. Just copy the whole record; if the record was deleted then delete it from the new database.
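The poor-man's trigger approach might look something like the following. It is a sketch under assumptions: SQLite plays both the old and the new database, there is a single hypothetical customer table, and the records are copied as-is with no transformation step.

```python
import sqlite3

old = sqlite3.connect(":memory:")
new = sqlite3.connect(":memory:")

old.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
""")

# 1. Take the static copy.
new.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
new.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    old.execute("SELECT id, name FROM customer").fetchall(),
)

# 2. From that moment on, log the primary key of every changed record.
old.executescript("""
    CREATE TABLE customer_changes (id INTEGER);
    CREATE TRIGGER customer_ins AFTER INSERT ON customer
        BEGIN INSERT INTO customer_changes VALUES (NEW.id); END;
    CREATE TRIGGER customer_upd AFTER UPDATE ON customer
        BEGIN INSERT INTO customer_changes VALUES (NEW.id); END;
    CREATE TRIGGER customer_del AFTER DELETE ON customer
        BEGIN INSERT INTO customer_changes VALUES (OLD.id); END;
""")

# 3. Users keep working on the old database while the migration is built.
old.execute("UPDATE customer SET name = 'Robert' WHERE id = 2")
old.execute("DELETE FROM customer WHERE id = 3")
old.execute("INSERT INTO customer VALUES (4, 'Dana')")


def replay_changes(old, new):
    """After the old database is shut off, re-copy every logged key."""
    keys = sorted(
        key for (key,) in old.execute("SELECT DISTINCT id FROM customer_changes")
    )
    with new:
        for key in keys:
            row = old.execute(
                "SELECT id, name FROM customer WHERE id = ?", (key,)
            ).fetchone()
            new.execute("DELETE FROM customer WHERE id = ?", (key,))
            if row is not None:  # gone from the old side means deleted
                new.execute("INSERT INTO customer VALUES (?, ?)", row)
    return keys


replayed = replay_changes(old, new)
```

Only the keys are logged, so the replay always copies the current state of the record and it does not matter how many times a row changed in between.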
Your problem is compounded by the fact that you can't simply copy the data, you have to transform it. This means you probably have to shut down both databases and re-migrate the records based on the change list. It will take a lot of planning to ensure you get things right and I'd recommend writing a testing script that can validate that the resulting data is correct.
Also I'd ensure that the code for the migration runs inside one of the databases if possible. Otherwise you are copying the data twice and this will significantly harm the performance.