Database migration with south: how to deal with added fields - migration

I am developing a django project which lives on Heroku. As the project evolves, a problem occurs when new fields are added to a data model: during makemigration, south will ask something like the following:
...
? 1. Quit now, and add a default to the field in models.py
? 2. Specify a one-off value to use for existing columns now
...
It isn't easy get around this (I tried to choose the second answer but south wouldn't accept whatever I entered).
Questions:
1) Some suggest that null=True should be added to the new field to avoid such a problem. However, the field should not be null. I end up making a workaround: makemigration with null=True, delete null=True, perform makemigration again, and then push to Heroku.
Is this the right way of doing it?
2) Using south migration causes another problem. Because we are a development team, every member may perform migration in his/her environment. When it comes to pushing to GitHub, the 00*_* files in each app's migration folder may conflict each other. How to solve it? Currently, we just ignore the 00*_* files since our project is not in production yet. What if it is in production later?
3) I only ran makemigrations and migrate and never used schemamigration. The system runs fine though. Do I have to run schemamigration?

1)
When you get an error message like that, basically it's asking for a one-off default. What exactly you should put in there when selecting option 2 depends entirely on your situation, but it must be a valid Python expression. For example:
If I have an IntegerField I want to add, I could enter 0
For a CharField, I could use "my string" (Note the quotes must be there)
For a ForeignKey, I would specify the primary key of the object I want to use.
2)
This is the hardest part about South. Conflicting migrations have to be merged carefully, and manually. The Django 1.7 migrations that replace South are much better at doing this automatically, so if at all possible, upgrade to 1.7+.
3)
I believe makemigrations is an alias for schemamigration. Get used to makemigrations because it's what the command is called in Django 1.7+.

Related

Schema migration in Redis

I have an application using redis. I used a key name user:<id> to store user info. Then locally I changed my app code to use key name user:<id>:data for that purpose.
I am scared by the fact that if I git push this new code to my production server things will break. And the reason for this is that since my production redis server would already have the keys will older key names.
So the only way I think is to stop my app, change all the older key names to new ones & then restart it. Do you have a better alternative? Thanks for help :)
Pushing new code to your production environment is always a scary business (that's why only the toughest survive in this profession ;)). I strongly recommend that before you change your production code and database, make sure that you test the workflow and its results locally.
Almost any update to the application requires its stoppage - even if only to replace the relevant files. This is even truer for any changes that involve a database exactly because of the reason you had mentioned.
Even if you can deploy your code changes without stopping the application per se (e.g. a PHP page), you will still want the database change to be done "atomically" - i.e. without any application requests intervening and possibly breaking. While some database can be taken offline for maintenance, even then you usually stop the app or else errors will be generated all over the place.
If that is indeed the case, you'll be stopping the app (or putting it into maintenance mode) regardless of the database change, so we take your question to actually mean: what's the fastest way to rename all/some keys in my database?
To answer that question, similarly to the pseudo-code suggested above, I suggest you use a Lua script such as the following and EVAL it once you stop the app:
for _,k in ipairs(redis.call('keys', 'user:*')) do
if k:sub(-5) ~= ':data' then
redis.call('rename', k, k .. ':data')
end
end
A few notes about this script that you should keep in mind:
Although the KEYS command is not safe to use in production, since you are doing maintenance it can be used safely. For all other use cases where you need to scan you keys, Redis' SCAN is much more advisable.
Since Lua scripts are "atomic", you can in theory run this script without stopping the app - as long as the script runs (which depends on the size of your dataset) the app's requests will be blocked. Put differently, this approach solves the concern of getting mixed key names (old & new). This, however, is probably not what you'd want to do in any case because a) your app may still error/timeout during that time but mainly because b) it will need to be able to handle both types of key names (i.e. running with old keys -> short/long pause -> running with new keys) making your code much more complex.
The if condition is not required if you're going to run the script only once and succeed.
Depending on the actual contents of your database, you may want to further filter out keys that should not be renamed.
To ensure compatibility, refrain from hardcoding / computationally generating key names - instead, they should be passed as arguments to the script.
You can run a migration script in your redis client language, using RENAME.
If you don't have any other control over the total of keys, you first issue a KEYS user:* to list all keys, then substring for getting the numeric id, then renaming.
You can issue all of this in a transaction.
So a little pseudocode and redis commands:
MULTI
KEYS user:*
For each key {
id = <Get id from key>
RENAME user:id user:id:data
}
EXEC
Got it ?

Migrations don't run on hosting

I'm using MigratorDotNet to manage Rails-style migrations for my web app. I have a workflow where, if I delete all the tables in the database, I can access an installation view that will run MigratorDotNet and create all the necessary tables.
This works locally. For some reason, when I upload my code to my Arvixe hosting, the migrations just never run. I get this odd error:
There is already an object named 'SchemaInfo' in the database.
This is odd because, prior to running migrations, I manually deleted all the tables in the database (to make sure it wasn't left over from a previous install).
My code essentially boils down to:
new Migrator.Migrator("SqlServer", connectionString.ToString(), migrationsAssembly).MigrateToLastVersion();
I've already verified by logging that the connection string is correct (production/hosting settings), and the assembly is correctly loaded (name and version).
Works locally, but not on Arvixe. How do I troubleshoot this?
This is a dark day.
It turns out (oddly) that the root cause was my hosting company used a schema other than dbo for my database. Because of this, the error message I saw (SchemaInfo already exists) was talking about their table.
My solution, unfortunately, was to rip out MigratorDotNet and go with FluentMigator instead. not only did this solve the problem, but it also gave me a more intelligible error message (one referring to the schema names).
While it doesn't seem possible to auto-set the schema, and while I need to switch the schema on my dev vs. production machine, it's still a solvable problem (and a better API, IMO). I googled, but did not find any way to change the default schema in migratordotnet.
I'm sorry for the issues that you were having. On shared hosting, unfortunately the only way that we may be able to change the schema is manually. If you are still looking for a solution that requires our assistance, please forward your ticket ID to qa .at. arvixe.com as well as arvand .at. arvixe.com and we can look into the best way to resolve this.

Rails: Best practice for handling development data

I have the following scenario:
I'm starting development of a long project (around 6 months) and I need to have some information on the database in order to test my features. The problem is that right now, I don't have the forms to insert this information (I will in the future) but I need the information loaded on the DB, what's the best way to handle this? Specially considering that once the app is complete, I won't need this process anymore.
As an example, lets say I have tasks that need to be categorized. I've begun working on the tasks, but I need to have some categories loaded on my db already.
I'm working with Rails 3.1 btw.
Thanks in advance!
Edit
About seeds:I've been told that seeds are not the way to go if your data may vary a bit, since you'd have to delete all information and reinsert it again. Say.. I want to change or add categories, then I'd have to edit the seeds.rb file, do my modifications and then delete and reload all data...., is there another way? Or are seeds the defenitely best way to solve this problem?
So it sounds like you'll possibly be adding, changing, or deleting data along the way that will be intermingled amongst other data. So seeds.rb is out. What you need to use are migrations. That way you can search for and identify the data you want to change through a sequential process, which migrations are exactly designed for. Otherwise I think your best bet is to change the data manually through the rails console.
EDIT: A good example would be as follows.
You're using Capistrano to handle your deployment. You want to add a new Category, Toys, to your system. In a migration file then you would add Category.create(:name => "Toys") or something similar in your migration function (I forget what they call it now in Rails 3.1, I know there's only a single method though), run rake db:migrate locally, test your changes, commit them, then if it's acceptable deploy it using cap:deploy and that will run the new migration against your production database, insert the new category, and make it available for use in the deployed application.
That example aside, it really depends on your workflow. If you think that adding new data via migrations won't hose your application, then go for it. I will say that DHH (David Heinemeier Hansson) is not a fan of it, as he uses it strictly for changing the structure of the database over time. If you didn't know DHH is the creator of Rails.
EDIT 2:
A thought I just had, which would let you skip the notion of using migrations if you weren't comfortable with it. You could 100% rely on your db/seeds.rb file. When you think of "seeds.rb" you think of creating information, but this doesn't necessarily have to be the case. Rather than just blindly creating data, you can check to see if the pertinent data already exists, and if it does then modify and save it, but if it doesn't exist then just create a new record plain and simple.
db/seeds.rb
toys = Category.find_by_name("Toys")
if toys then
toys.name = "More Toys"
toys.save
else
Category.create(:name => "More Toys")
end
Run rake db:seeds and that code will run. You just need to consistently update the seeds.rb file every time you change your data, so that 1) it's searching for the right data value and 2) it's updating the correct attributes.
In the end there's no right or wrong way to do this, it's just whatever works for you and your workflow.
The place to load development data is db/seeds.rb. Since you can write arbitrary Ruby code there, you can even load your dev data from external files, for instance.
there is a file called db/seeds.rb
you can instantiate records using it
user1=User.create(:email=>"user#test.com",
:first_name=>"user",
:last_name=>"name",
:bio=>"User bio...",
:website=>"http://www.website.com",
:occupation=>"WebDeveloper",
:password=>"changeme",
:password_confirmation=>"changeme",
:avatar => File.open(File.join(Rails.root, '/app/assets/images/profiles/image.png'))
)
user2=User.create(:email=>"user2#test.com",
:first_name=>"user2",
:last_name=>"name2",
:bio=>"User2 bio...",
:website=>"http://www.website.com",
:occupation=>"WebDeveloper",
:password=>"changeme",
:password_confirmation=>"changeme",
:avatar => File.open(File.join(Rails.root, '/app/assets/images/profiles/image.png'))
)
Just run rake db:seed from command line to get it into the db

Can I change a model from embedded to referenced without losing data?

I made a bad decision as I was designing a MongoDB database to embed a model rather than reference it in an associated model. Now I need to make the embedded model a referenced model, but there is already a healthy amount of data in the database (or document?).
I'm using Mongoid, so I reasoned I can just change embedded_in to referenced_in. Before I start, I figured I'd ask people who know better than I do. How can I transition the embedded data already in the database to the document for the associated model.
class Building
embeds_many :landlords
..
end
class Landlord
embedded_in :building
...
end
Short answer - Incrementally.
Create a copy of Landlord, name it Landlord2.
Make it referenced in Building.
Copy all data from Landlord to Landlord2.
Delete Landlord.
Rename Landlord2 to Landlord.
Users should not be able to CRUD Landlord during steps 3-5 (ideally). You still can get away with only locking CRUD on 4-5. Just make sure you make all updates that happened during copying, before removing Landlords.
Just chan ging the model like you have above will not work, the old data will still be in a different strucutre in the db.
Very similar the previous answer, one of the things I have done to do this migration before is to do it dynamically, while the system is running and being used by the users.
I had the data layer separated from the logic, so it let me add some preprocessors and inject code to do the following.
Lets say we start with the old datamodel, then release new code that does the following:
On every access to the document, you would have to check whether the embedded property exists, if it does, create a new entry associated as a reference and save to the database and delete the embedded property from the documents. Once this is done for a couple of days, a lot of my data got migrated and then I just had to run a similar script for everything that was not touched, made the job of migrating the data much easier and simpler and I did not have to run long running scripts or get the system offline to perform the conversion.
You may not ha ve that requirement, so Pick accordingly.

Do you put your database static data into source-control ? How?

I'm using SQL-Server 2008 with Visual Studio Database Edition.
With this setup, keeping your schema in sync is very easy. Basically, there's a 'compare schema' tool that allow me to sync the schema of two databases and/or a database schema with a source-controlled creation script folder.
However, the situation is less clear when it comes to data, which can be of three different kind :
static data referenced in the code. typical example : my users can change their setting, and their configuration is stored on the server. However, there's a system-wide default value for each setting that is used in case the user didn't override it. The table containing those default settings grows as more options are added to the program. This means that when a new feature/option is checked in, the system-wide default setting is usually created in the database as well.
static data. eg. a product list populating a dropdown list. The program doesn't rely on the existence of a specific product in the list to work. This can be for example a list of unicode-encoded products that should be deployed in production when the new "unicode version" of the program is deployed.
other data, ie everything else (logs, user accounts, user data, etc.)
It seems obvious to me that my third item shouldn't be source-controlled (of course, it should be backuped on a regular basis)
But regarding the static data, I'm wondering what to do.
Should I append the insert scripts to the creation scripts? or maybe use separate scripts?
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement ?
Should I differentiate my two kind of data? (the first one being usually created by a dev, while the second one is usually created by a non-dev)
How do you manage your DB static data ?
I have explained the technique I used in my blog Version Control and Your Database. I use database metadata (in this case SQL Server extended properties) to store the deployed application version. I only have scripts that upgrade from version to version. At startup the application reads the deployed version from the database metadata (lack of metadata is interpreted as version 0, ie. nothing is yet deployed). For each version there is an application function that upgrades to the next version. Usually this function runs an internal resource T-SQL script that does the upgrade, but it can be something else, like deploying a CLR assembly in the database.
There is no script to deploy the 'current' database schema. New installments iterate trough all intermediate versions, from version 1 to current version.
There are several advantages I enjoy by this technique:
Is easy for me to test a new version. I have a backup of the previous version, I apply the upgrade script, then I can revert to the previous version, change the script, try again, until I'm happy with the result.
My application can be deployed on top of any previous version. Various clients have various deployed version. When they upgrade, my application supports upgrade from any previous version.
There is no difference between a fresh install and an upgrade, it runs the same code, so I have fewer code paths to maintain and test.
There is no difference between DML and DDL changes (your original question). they all treated the same way, as script run to change from one version to next. When I need to make a change like you describe (change a default), I actually increase the schema version even if no other DDL change occurs. So at version 5.1 the default was 'foo', in 5.2 the default is 'bar' and that is the only difference between the two versions, and the 'upgrade' step is simply an UPDATE statement (followed of course by the version metadata change, ie. sp_updateextendedproperty).
All changes are in source control, part of the application sources (T-SQL scripts mostly).
I can easily get to any previous schema version, eg. to repro a customer complaint, simply by running the upgrade sequence and stopping at the version I'm interested in.
This approach saved my skin a number of times and I'm a true believer now. There is only one disadvantage: there is no obvious place to look in source to find 'what is the current form of procedure foo?'. Because the latest version of foo might have been upgraded 2 or 3 versions ago and it wasn't changed since, I need to look at the upgrade script for that version. I usually resort to just looking into the database and see what's in there, rather than searching through the upgrade scripts.
One final note: this is actually not my invention. This is modeled exactly after how SQL Server itself upgrades the database metadata (mssqlsystemresource).
If you are changing the static data (adding a new item to the table that is used to generate a drop-down list) then the insert should be in source control and deployed with the rest of the code. This is especially true if the insert is needed for the rest of the code to work. Otherwise, this step may be forgotten when the code is deployed and not so nice things happen.
If static data comes from another source (such as an import of the current airport codes in the US), then you may simply need to run an already documented import process. The import process itself should be in source control (we do this with all our SSIS packages), but the data need not be.
Here at Red Gate we recently added a feature to SQL Data Compare allowing static data to be stored as DML (one .sql file for each table) alongside the schema DDL that is currently supported by SQL Compare.
To understand how this works, here is a diagram that explains how it works.
The idea is that when you want to push changes to your target server, you do a comparison using the scripts as the source data source, which generates the necessary DML synchronization script to update the target. This means you don't have to assume that the target is being recreated from scratch each time. In time we hope to support static data in our upcoming SQL Source Control tool.
David Atkinson, Product Manager, Red Gate Software
I have come across this when developing CMS systems.
I went with appending the static data (the stuff referenced in the code) to the database creation scripts, then a separate script to add in any 'initialisation data' (like countries, initial product population etc).
For the first two steps, you could consider using an intermediate format (ie XML) for the data, then using a home grown tool, or something like CodeSmith to generate the SQL, and possible source files as well, if (for example) you have lookup tables which relate to enumerations used in the code - this helps enforce consistency.
This has another benefit that if the schema changes, in many cases you don't have to regenerate all your INSERT statements - you just change the tool.
I really like your distinction of the three types of data.
I agree for the third.
In our application, we try to avoid putting in the database the first, because it is duplicated (as it has to be in the code, the database is a duplicate). A secondary benefice is that we need no join or query to get access to that value from the code, so this speed things up.
If there is additional information that we would like to have in the database, for example if it can be changed per customer site, we separate the two. Other tables can still reference that data (either by index ex: 0, 1, 2, 3 or by code ex: EMPTY, SIMPLE, DOUBLE, ALL).
For the second, the scripts should be in source-control. We separate them from the structure (I think they typically are replaced as time goes, while the structures keeps adding deltas).
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement ?
We have a complete procedure for that, and a readme coming with each release, with scripts and so on...
First off, I have never used Visual Studio Database Edition. You are blessed (or cursed) with whatever tools this utility gives you. Hopefully that includes a lot of flexibility.
I don't know that I'd make that big a difference between your type 1 and type 2 static data. Both are sets of data that are defined once and then never updated, barring subsequent releases and updates, right? In which case the main difference is in how or why the data is as it is, and not so much in how it is stored or initialized. (Unless the data is environment-specific, as in "A" for development, "B" for Production. This would be "type 4" data, and I shall cheerfully ignore it in this post, because I've solved it useing SQLCMD variables and they give me a headache.)
First, I would make a script to create all the tables in the database--preferably only one script, otherwise you can have a LOT of scripts lying about (and find-and-replace when renaming columns becomes very awkward). Then, I would make a script to populate the static data in these tables. This script could be appended to the end of the table script, or made it's own script, or even made one script per table, a good idea if you have hundreds or thousands of rows to load. (Some folks make a csv file and then issue a BULK INSERT on it, but I'd avoid that is it just gives you two files and a complex process [configuring drive mappings on deployment] to manage.)
The key thing to remember is that data (as stored in databases) can and will change over time. Rarely (if ever!) will you have the luxury of deleting your Production database and replacing it with a fresh, shiny, new one devoid of all that crufty data from the past umpteen years. Databases are all about changes over time, and that's where scripts come into their own. You start with the scripts to create the database, and then over time you add scripts that modify the database as changes come along -- and this applies to your static data (of any type) as well.
(Ultimately, my methodology is analogous to accounting: you have accounts, and as changes come in you adjust the accounts with journal entries. If you find you made a mistake, you never go back and modify your entries, you just make a subsequent entries to reverse and fix them. It's only an analogy, but the logic is sound.)
The solution I use is to have create and change scripts in source control, coupled with version information stored in the database.
Then, I have an install wizard that can detect whether it needs to create or update the db - the update process is managed by picking appropriate scripts based on the stored version information in the database.
See this thread's answer. Static data from your first two points should be in source control, IMHO.
Edit: *new
all-in-one or a separate script? it does not really matter as long as you (dev team) agree with your deployment team. I prefer to separate files, but I still can always create all-in-one.sql from those in the proper order [Logins, Roles, Users; Tables; Views; Stored Procedures; UDFs; Static Data; (Audit Tables, Audit Triggers)]
how do you make sure they execute it: well, make it another step in your application/database deployment documentation. If you roll out application which really needs specific (new) static data in the database, then you might want to perform a DB version check in your application. and you update the DB_VERSION to your new release number as part of that script. Then your application on a start-up should check it and report an error if the new DB version is required.
dev and non-dev static data: I have never seen this case actually. More often there is real static data, which you might call "dev", which is major configuration, ISO static data etc. The other type is default lookup data, which is there for users to start with, but they might add more. The mechanism to INSERT these data might be different, because you need to ensure you do not destoy (power-)user-created data.