I've been using Entity Framework code first migrations for a while now on a brand new project, and they've been working fine so far on a the main trunk version of the application.
However, we're now at a point in the project where branches are being created as we have multiple workstreams going on. As part of the last lot of work, we realised that using migrations across branches can be problematic - so my question is what have people found the best way of managing this?
For example (I've obviously simplified these for the sake of discussion):
Branch A:
Developer 1 adds an 'Add-UserDateCreated' migration which adds a field to the User entity. The migration file contains the code to add the field, and has the model state at the time.
Branch B:
Developer 2 adds an 'Add-UserMiddleName' migration which adds another field to the User entity. The migration file contains the code to add the field, and the has the model state at the time (but it obviously DOESN'T have the field added in the other migration).
These migrations work fine on their branches, but when you merge them back into the trunk, you get stuck:
You can't just keep the individual migration files as the stored model state won't be correct. For example, the 'Add-UserMiddleName' migration SHOULD have the state of the model with the 'Add-UserDateCreated' field added - but it won't.
You can't merge the migrations into one file as you hit the same issue*
Which means the only way to truely avoid these issues is to work on a different copy of the database for each branch, ignore the migrations when you do the merge into the trunk, and add a one shot master migration when the merge is complete (on the trunk version of the database) - but then you potentially end up with lots of changes in one migration which is never a good idea (and also lose any custom code you've written in your migration class).
So how do other people manage these kind of situations? I'd be curious to know other peoples' opinions.
*I'm curious what issues this would actually cause in the real world, if anyone knows?
When you are working on a branch and require a migration, you always need to consider the state of the other (active) branches. For example, if there are migrations on trunk that you do not have on the branch, you should merge them to the branch before creating your new migration. Once you have created the migration on the branch, it is probably a good idea to merge it back to trunk.
So in your example, developer A and B needs to communicate with each other. Developer B, realizing that a migration is needed, should check if other migrations have been done and merge them to her branch before creating the "middle user name" migration.
I find that it is useful to think of migrations as a dead-lock for the entire team (if that makes sense to you.) New migrations or plans to create them should be mentioned in daily stand ups. Sometimes, when things are hectic, it might be useful to keep a list of all migrations on a white board for all team members to see.
I also find that it is crucial that the migrations are small commits, making them easily portable between branches.
It should also be noted that it helps to use a version control system that is good with branching, merging and cherry-picking, like Git, for example.
Related
I work in a development team using very basic git principles to develop our project. So every feature is developed in a feature branch and gets merged to when ready.
Often it is necessary to make changes to our database, adding tables, altering columns. Sometimes this includes migration needs. (Casting datatypes, etc)
At the moment we simply write a SQL file containing these changes. And the one "bringing the stuff to production" has to keep track which SQL files are already applied and which still needs to be. If migrations need to be applied comments in the sql file tell you that - Frankly it's a mess ;D
Are there any buzzwords, projects, principles which apply to this scenario?
I stumbled across goose which fullfills all my dreams :) You can make "simple" migrations via plain sql files or you can make complex programmatic changes via go.
Right now I'm working on updating a Rails app and the database has some issues. It's also being converted from MySQL to PostgreSQL.
There's 3 columns being used to track one time value. The time the facility opens on Monday is being recorded as monday_open_hour, monday_open_minute, monday_open_ampm. I'd like to merge these into a single time field.
There are also several fields being used for only 1% of the 3000+ records, so I'd like to break those out into a separate table.
What would be the best way to do this? I imagine it could probably be done in SQL with some kind of stored procedures/cursors. Is there a way to do it with Ruby/Rails?
The Rails way to deal with incremental database changes is to use migrations. Migrations let you apply incremental changes to your schema or database contents in an orderly fashion, even as you're collaborating with a team. There are nice helpers for common tasks like creating and dropping tables, renaming columns, and simple things like that, but you can drop to arbitrary SQL if you need to (although, be aware that that will most likely tie you to your current database, and make further moves more difficult).
Basically, you can generate a new migration with rails generate migration ConsolidateDateColumns (for example). This will create a template for you in the db/migrate directory; see the Rails Guides entry to get started on writing them. When you're ready to apply it, run rake db:migrate.
The advantages of doing it this way are that it lets you easily apply the same changes to different environments (development, test, production, staging, or across your development team) and keep them in sync, and it encourages you to keep things reversible whenever possible, so you maintain some degree of freedom to migrate back and forth if you need to.
One more thing: it sounds like you're going to be doing a lot of major changes in quick succession. Make sure that you take a backup of your original database before you begin, and thoroughly test your work against a reduced test set in a separate environment before you run it against the real thing!
I'm recently started using Play2 on a project, and read the section on evolutions. And while the example they cite seems fine if my project had 1 table, it seems like it would be very messy if I had 10-20 tables in 1.sql and then changes to them split up over 2.sql, 3.sql and so on.
In Ruby on Rails, Symfony, and others, you define your up/down migrations per entity.
My question is, what is the best way to setup your evolutions in Play2? Should I have all my tables in 1.sql and then make little changes to them over 2.sql and so on? Or is there a way to have a separate .sql file for each table?
Also, are there any examples of large, open source Play2 projects where I could see how it would look?
Actually Play has not possibility to divide evolutions by entities.
IMHO it's rather matter of taste, you can add each entity in single next evolution, anyway only difference will be that counter of evolution will be bigger, I don't think that will help you to keep evolutions cleaner.
Typical workflow is starting from ... good planning. Just create some graph representation of your schema and try to add there as many things as you need. It helps a lot while the project startup and also in next steps of development.
If you are gonna to use Ebean, create all models from your graph and let the plugin to create automatic first evolution file, probably you will save a lot of time on writing evolutions for relations, constraints, etc. Spend some time for fixing and checking initial schema before further development.
After that you need to disable automatic updates as they drops whole DB and recreates tables them from the scratch (there's no diff schema update in Ebean).
It's also matter of taste but I prefer to combine several changes into single evolutions (so again planning...) instead of creating next and next files for every single change ad hoc.
I'm in the research phase trying to adopt 2012 Database Projects on an existing small project. I'm a C# developer, not a DBA, so I'm not particularly fluent with best practices. I've been searching google and stackoverflow for a few hours now but I still don't know how to handle some key deployment scenarios properly.
1) Over the course of several development cycles, how do I manage multiple versions of my database? If I have a client on v3 of my database and I want to upgrade them to v8, how do I manage this? We currently manage hand-crafted schema and data migration scripts for every version of our product. Do we still need to do this separately or is there something in the new paradigm that supports or replaces this?
2) If the schema changes in such a way that requires data to be moved around, what is the best way to handle this? I assume some work goes in the Pre-Deployment script to preserve the data and then the Post-Deploy script puts it back in the right place. Is that the way of it or is there something better?
3) Any other advice or guidance on how best to work with these new technologies is also greately appreciated!
UPDATE: My understanding of the problem has grown a little since I originally asked this question and while I came up with a workable solution, it wasn't quite the solution I was hoping for. Here's a rewording of my problem:
The problem I'm having is purely data related. If I have a client on version 1 of my application and I want to upgrade them to version 5 of my application, I would have no problems doing so if their database had no data. I'd simply let SSDT intelligently compare schemas and migrate the database in one shot. Unfortunately clients have data so it's not that simple. Schema changes from version 1 of my application to version 2 to version 3 (etc) all impact data. My current strategy for managing data requires I maintain a script for each version upgrade (1 to 2, 2 to 3, etc). This prevents me from going straight from version 1 of my application to version 5 because I have no data migration script to go straight there. The prospect creating custom upgrade scripts for every client or managing upgrade scripts to go from every version to every greater version is exponentially unmanageable. What I was hoping was that there was some sort of strategy SSDT enables that makes managing the data side of things easier, maybe even as easy as the schema side of things. My recent experience with SSDT has not given me any hope of such a strategy existing but I would love to find out differently.
I've been working on this myself, and I can tell you it's not easy.
First, to address the reply by JT - you cannot dismiss "versions", even with declarative updating mechanics that SSDT has. SSDT does a "pretty decent" job (provided you know all the switches and gotchas) of moving any source schema to any target schema, and it's true that this doesn't require verioning per se, but it has no idea how to manage "data motion" (at least not that i can see!). So, just like DBProj, you left to your own devices in Pre/Post scripts. Because the data motion scripts depend on a known start and end schema state, you cannot avoid versioning the DB. The "data motion" scripts, therefore, must be applied to a versioned snapshot of the schema, which means you cannot arbitrarily update a DB from v1 to v8 and expect the data motion scripts v2 to v8 to work (presumably, you wouldn't need a v1 data motion script).
Sadly, I can't see any mechanism in SSDT publishing that allows me to handle this scenario in an integrated way. That means you'll have to add your own scafolding.
The first trick is to track versions within the database (and SSDT project). I started using a trick in DBProj, and brought it over to SSDT, and after doing some research, it turns out that others are using this too. You can apply a DB Extended Property to the database itself (call it "BuildVersion" or "AppVersion" or something like that), and apply the version value to it. You can then capture this extended property in the SSDT project itself, and SSDT will add it as a script (you can then check the publish option that includes extended properties). I then use SQLCMD variables to identify the source and target versions being applied in the current pass. Once you identify the delta of versions between the source (project snapshot) and target (target db about to be updated), you can find all the snapshots that need to be applied. Sadly, this is tricky to do from inside the SSDT deployment, and you'll probably have to move it to the build or deployment pipeline (we use TFS automated deployments and have custom actions to do this).
The next hurdle is to keep snapshots of the schema with their associated data motion scripts. In this case, it helps to make the scripts as idempotent as possible (meaning, you can rerun the scripts without any ill side-effects). It helps to split scripts that can safely be rerun from scripts that must be executed one time only. We're doing the same thing with static reference data (dictionary or lookup tables) - in other words, we have a library of MERGE scripts (one per table) that keep the reference data in sync, and these scripts are included in the post-deployment scripts (via the SQLCMD :r command). The important thing to note here is that you must execute them in the correct order in case any of these reference tables have FK references to each other. We include them in the main post-deploy script in order, and it helps that we created a tool that generates these scripts for us - it also resolves dependency order. We run this generation tool at the close of a "version" to capture the current state of the static reference data. All your other data motion scripts are basically going to be special-case and most likely will be single-use only. In that case, you can do one of two things: you can use an IF statement against the db build/app version, or you can wipe out the 1 time scripts after creating each snapshot package.
It helps to remember that SSDT will disable FK check constraints and only re-enable them after the post-deployment scripts run. This gives you a chance to populate new non-null fields, for example (by the way, you have to enable the option to generate temporary "smart" defaults for non-null columns to make this work). However, FK check constraints are only disabled for tables that SSDT is recreating because of a schema change. For other cases, you are responsible for ensuring that data motion scripts run in the proper order to avoid check constraints complaints (or you manually have disable/re-enable them in your scripts).
DACPAC can help you because DACPAC is essentially a snapshot. It will contain several XML files describing the schema (similar to the build output of the project), but frozen in time at the moment you create it. You can then use SQLPACKAGE.EXE or the deploy provider to publish that package snapshot. I haven't quite figured out how to use the DACPAC versioning, because it's more tied to "registered" data apps, so we're stuck with our own versioning scheme, but we do put our own version info into the DACPAC filename.
I wish I had a more conclusive and exhasutive example to provide, but we're still working out the issues here too.
One thing that really sucks about SSDT is that unlike DBProj, it's currently not extensible. Although it does a much better job than DBProj at a lot of different things, you can't override its default behavior unless you can find some method inside of pre/post scripts of getting around a problem. One of the issues we're trying to resolve right now is that the default method of recreating a table for updates (CCDR) really stinks when you have tens of millions of records.
-UPDATE: I haven't seen this post in some time, but apparently it's been active lately, so I thought I'd add a couple of important notes: if you are using VS2012, the June 2013 release of SSDT now has a Data Comparison tool built-in, and also provides extensibility points - that is to say, you can now include Build Contributors and Deployment Plan Modifiers for the project.
I haven't really found any more useful information on the subject but I've spent some time getting to know the tools, tinkering and playing, and I think I've come up with some acceptable answers to my question. These aren't necessarily the best answers. I still don't know if there are other mechanisms or best practices to better support these scenarios, but here's what I've come up with:
The Pre- and Post-Deploy scripts for a given version of the database are only used migrate data from the previous version. At the start of every development cycle, the scripts are cleaned out and as development proceeds they get fleshed out with whatever sql is needed to safely migrate data from the previous version to the new one. The one exception here is static data in the database. This data is known at design time and maintains a permanent presence in the Post-Deploy scripts in the form of T-SQL MERGE statements. This helps make it possible to deploy any version of the database to a new environment with just the latest publish script. At the end of every development cycle, a publish script is generated from the previous version to the new one. This script will include generated sql to migrate the schema and the hand crafted deploy scripts. Yes, I know the Publish tool can be used directly against a database but that's not a good option for our clients. I am also aware of dacpac files but I'm not really sure how to use them. The generated publish script seems to be the best option I know for production upgrades.
So to answer my scenarios:
1) To upgrade a database from v3 to v8, I would have to execute the generated publish script for v4, then for v5, then for v6, etc. This is very similar to how we do it now. It's well understood and Database Projects seem to make creating/maintaining these scripts much easier.
2) When the schema changes from underneath data, the Pre- and Post-Deploy scripts are used to migrate the data to where it needs to go for the new version. Affected data is essentially backed-up in the Pre-Deploy script and put back into place in the Post-Deploy script.
3) I'm still looking for advice on how best to work with these tools in these scenarios and others. If I got anything wrong here, or if there are any other gotchas I should be aware of, please let me know! Thanks!
In my experience of using SSDT the notion of version numbers (i.e. v1, v2...vX etc...) for databases kinda goes away. This is because SSDT offers a development paradigm known as declarative database development which loosely means that you tell SSDT what state you want your schema to be in and then let SSDT take responsibility for getting it into that state by comparing against what you already have. In this paradigm the notion of deploying v4 then v5 etc.... goes away.
Your pre and post deployment scripts, as you correctly state, exist for the purposes of managing data.
Hope that helps.
JT
I just wanted to say that this thread so far has been excellent.
I have been wrestling with the exact same concerns and am attempting to tackle this problem in our organization, on a fairly large legacy application. We've begun the process of moving toward SSDT (on a TFS branch) but are at the point where we really need to understand the deployment process, and managing custom migrations, and reference/lookup data, along the way.
To complicate things further, our application is one code-base but can be customized per 'customer', so we have about 190 databases we are dealing with, for this one project, not just 3 or so as is probably normal. We do deployments all the time and even setup new customers fairly often. We rely heavily on PowerShell now with old-school incremental release scripts (and associated scripts to create a new customer at that version). I plan to contribute once we figure this all out but please share whatever else you've learned. I do believe we will end up maintaining custom release scripts per version, but we'll see. The idea about maintaining each script within the project, and including a From and To SqlCmd variable is very interesting. If we did that, we would probably prune along the way, physically deleting the really old upgrade scripts once everybody was past that version.
BTW - Side note - On the topic of minimizing waste, we also just spent a bunch of time figuring out how to automate the enforcement of proper naming/data type conventions for columns, as well as automatic generation for all primary and foreign keys, based on naming conventions, as well as index and check constraints etc. The hardest part was dealing with the 'deviants' that didn't follow the rules. Maybe I'll share that too one day if anyone is interested, but for now, I need to pursue this deployment, migration, and reference data story heavily. Thanks again. It's like you guys were speaking exactly what was in my head and looking for this morning.
There is a development_structure.sql inside my /db folder of my rails application (rails 2.3.4, ruby 1.8.7) and I am not sure exactly what it does.
Is it needed for some specific environment? (I think I read somewhere that it's used for tests)
Do I need to add it to my git repository?
This post has been used as a reference by a coworker of mine, but the two answers are not exact or informative enough.
development_structure.sql is a low-level dump of the schema, which is necessary when you start to use proprietary database features - either you want to or not, you're going to use them at some point.
Regarding the question of storing it or not, there's some debate. Here is an informative post: http://www.saturnflyer.com/blog/jim/2010/09/14/always-check-in-schema-rb/.
And my take on this follows.
The objective of the development_structure.sql is to sync, for any given commit, the database structure with the code, without having previous knowledge of the schema structure, that is, without having to rely on a pre-existing state of the schema to get the new one.
In a nutshell, by having a schema structure available, whenever you change branch/commit, you load it directly and forget it.
This is mostly valid for dynamic and "crowded" projects, where different branches have differences in the underlying schema structure.
Without having the schema structure stored, you would need to always use an existing reference schema in your database, and migrate it back or forward every time you change branch/commit; several real-world cases can make this process inefficient (e.g. when another branch doesn't have some migrations you currently have, or some migrations can't be rolled back).
Another problem is automated builds, which suffer from the same problems, and even worse, they can't apply manual changes.
The only downside is that it requires a certain habit, which is, to store it every time you run a migration. Easy to say, but also easy to forget.
I don't say you can't live without development_structure.sql - of course you can.
But if you have it, when changing branch/commit you just load-and-forget; if you don't, you [may] have to go through a series of manual steps.
You should not add it to your git repository.
It is a file created automatically by rails when you run migrations with your database.yml configured to connect against a mysql database.
You can view it as an alternative to schema.rb
I believe you can force rails to create it by adding in your environment.rb:
config.active_record.schema_format = :sql
When present this file is used for example by:
rake db:test:clone_structure
Edit
Relevant section in Ruby On Rails Guides.
http://guides.rubyonrails.org/migrations.html#schema-dumping-and-you
They recommend to check it into source control on the wiki.
I personally like to keep it out of it. I like to be able to run all migrations very quickly. It is for me a good sign. If migrations become slow I feel like I am not in total control of my environment anymore. Slowness in migrations generally means I have a lot of data in my development database which I feel wrong.
However, It seems to be a matter of personal taste nowadays.
Follow your instincts on this one.
It's created when you run a rake task to clone your development database to your test database. The development database is outputted to SQL which is then read in to your test DB. You can safely delete it.
In rails 3, you don't even have to write this line,
config.active_record.schema_format = :sql
You can generate this structure.sql file by simply running the above rake command mentioned above