How to scale Play2 Evolutions - sql

I recently started using Play2 on a project and read the section on evolutions. While the example they cite seems fine if my project had one table, it seems like it would get very messy if I had 10-20 tables in 1.sql and then changes to them split up over 2.sql, 3.sql and so on.
In Ruby on Rails, Symfony, and others, you define your up/down migrations per entity.
My question is, what is the best way to set up your evolutions in Play2? Should I have all my tables in 1.sql and then make little changes to them over 2.sql and so on? Or is there a way to have a separate .sql file for each table?
Also, are there any examples of large, open source Play2 projects where I could see how it would look?

Actually, Play has no way to divide evolutions by entity.
IMHO it's rather a matter of taste. You could put each entity in its own evolution; the only difference would be that the evolution counter grows faster, and I don't think that would help you keep your evolutions cleaner.
The typical workflow starts with ... good planning. Just create some graph representation of your schema and try to add as many things to it as you need. It helps a lot during project startup and also in later stages of development.
If you are going to use Ebean, create all the models from your graph and let the plugin generate the first evolution file automatically; you will probably save a lot of time on writing evolutions for relations, constraints, etc. Spend some time fixing and checking the initial schema before further development.
After that you need to disable automatic updates, as they drop the whole DB and recreate the tables from scratch (there is no diff-based schema update in Ebean).
It's also a matter of taste, but I prefer to combine several changes into a single evolution (so again, planning...) instead of creating file after file for every single ad hoc change.
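For reference, a first evolution covering several tables at once is still just one plain SQL file, conf/evolutions/default/1.sql; the tables below are only illustrative:

# --- !Ups

CREATE TABLE account (
  id    BIGINT PRIMARY KEY,
  email VARCHAR(255) NOT NULL
);

CREATE TABLE post (
  id         BIGINT PRIMARY KEY,
  account_id BIGINT NOT NULL REFERENCES account(id),
  title      VARCHAR(255) NOT NULL
);

# --- !Downs

DROP TABLE post;
DROP TABLE account;

2.sql, 3.sql and so on then hold the subsequent batches of changes, so grouping related changes per evolution (rather than one file per table) keeps the number of files manageable.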

Related

Multi System Database structure based copying/updating best practice

So after searching and not finding similar cases, I want to open a new question.
So here is the case:
We are working with a large database with a very complicated data structure. We also work on multiple systems to ensure stability (development, testing, quality and production), and it's always a struggle to move data between those systems. As I said, the data structure is very large and there is also a lot of logic inside the database. Customers are able to add new data parts as configuration, and there is also a constant inflow of data that is used for statistics and monitoring. So let me explain the problem with a small example:
Let's take this database as an example. We have some families holding contests with each other, and they create statistics about the points they score.
The Purple Tables are fixed configurations. They are created once and they can only be changed via an Operator. Those changes will be done and tested in the development system first.
The Yellow Tables are changing configurations. Each Family is able to create or delete multiple Contests and assign their kids.
The Red Table is just plain data. Each time a kid scores points, a new row is added with the amount, the current time and the relation to the kid and the contest.
This table will be the basis for the later statistics.
This database is developed on two systems: a production one, which is used by the families, and a development system, which is used by the programmers/operators.
While developing, the programmers will add test data like kids, families, contests and points. In production, the families will create new contests, assign new kids and fill up the points table.
It's necessary to copy new/tested/fixed families from the development to the production system.
It's also necessary to copy Contests, Contest-Kid-Assignments and Points from the production to the development system to find new errors.
Also, it must be possible to change the table structure on the development system and transmit this change to the production system. (This shouldn't be the main topic here; sometimes the changes are so large that there just is no easy way, so let's keep this point simple but keep it in mind.)
I want to copy parts of the tables to another system but be able to ignore some tables (for example: Points), and I want to make sure not to copy kids without their parent family, so there are no "parentless" objects in the database.
Question: What would be a good and safe way to do this?
I don't need a solution for a specific database type or some scripts. I'm looking for tools, libraries or good practices. (But just as a note, we're using MSSQL.)
We are currently building a tool for this problem (not going well: unstable, overly complicated, slow and possibly reinventing the wheel).
Also, a lot of devs I know just copy the whole database (making a backup and restoring it onto another server). But this causes problems too: users get copied along and their GUIDs change, so they lose permissions, etc. I don't think this is a good solution. Also, the database is down for quite a long time, and it's never a smooth process.
Doing it manually is sometimes the easiest way, but considering the size of our data structure, it's not just a huge piece of work; there is also a large potential for mistakes.
So I'm hoping someone knows a tool or something similar to help me out.
Welcome to the pains of development for a stateful entity like a database. :) RedGate makes a tool called SQL Source Control that is good for moving changed data and schema into production, and it can interface with source control solutions such as Git. It's a bit pricey, but it's the best I've found. One option for keeping dev up to date with prod data and dev changes is one I concocted at my last place of employment, which was... not 100% perfect, but better than nothing, and free. It was developed in PowerShell, and it went something like this:
1. Create pre-restore, pre-dacpac and post-dacpac SQL scripts to store data and permission diffs between dev and prod
2. Use SqlPackage.exe to make a DacPac of dev (a DacPac is basically an XML schema of the DB, no data)
3. Execute the pre-restore script (often copying out test data that needs to be persisted)
4. Restore prod over dev
5. Execute the pre-dacpac script (any DDL that could cause data loss may need to go here)
6. Use SqlPackage.exe to apply the DacPac made in step 2 to the newly restored database
7. Execute the post-dacpac script (permissions, restoration of data copied in step 3)
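To make those scripts concrete, a post-dacpac script (step 7) might contain something along these lines; the table, schema and role names are made up for illustration:

-- Hypothetical post-dacpac script: re-grant dev permissions and restore the
-- test data the pre-restore script copied into a staging table.
GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.Contest TO dev_team;

INSERT INTO dbo.Contest (ContestId, Name, CreatedAt)
SELECT s.ContestId, s.Name, s.CreatedAt
FROM staging.Contest_DevOnly AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Contest AS c WHERE c.ContestId = s.ContestId);

DROP TABLE staging.Contest_DevOnly;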
Again, like I said, it worked and automated the restoration of prod data into our dev environment while keeping our dev changes intact, but it required a good bit of upkeep and maintenance. Also, keep in mind, once your DB reaches a certain size, doing a nightly restore is no longer a viable option due to the time it takes to restore.

Keeping track of development changes on database structure

I work in a development team using very basic Git principles to develop our project. So every feature is developed in a feature branch and gets merged when ready.
Often it is necessary to make changes to our database: adding tables, altering columns. Sometimes this includes migration needs (casting data types, etc.).
At the moment we simply write a SQL file containing these changes. And the one "bringing the stuff to production" has to keep track of which SQL files are already applied and which still need to be. If migrations need to be applied, comments in the SQL file tell you that. Frankly, it's a mess ;D
Are there any buzzwords, projects, principles which apply to this scenario?
I stumbled across goose, which fulfills all my dreams :) You can make "simple" migrations via plain SQL files, or you can make complex programmatic changes via Go.
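For reference, a goose SQL migration is just a plain SQL file with Up/Down markers in comments; a minimal sketch (the file name and table are made up), e.g. 00002_add_users.sql:

-- +goose Up
CREATE TABLE users (
  id    INTEGER PRIMARY KEY,
  email TEXT NOT NULL
);

-- +goose Down
DROP TABLE users;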

Rails update old database

Right now I'm working on updating a Rails app and the database has some issues. It's also being converted from MySQL to PostgreSQL.
There are three columns being used to track one time value. The time the facility opens on Monday is being recorded as monday_open_hour, monday_open_minute, monday_open_ampm. I'd like to merge these into a single time field.
There are also several fields being used for only 1% of the 3000+ records, so I'd like to break those out into a separate table.
What would be the best way to do this? I imagine it could probably be done in SQL with some kind of stored procedures/cursors. Is there a way to do it with Ruby/Rails?
The Rails way to deal with incremental database changes is to use migrations. Migrations let you apply incremental changes to your schema or database contents in an orderly fashion, even as you're collaborating with a team. There are nice helpers for common tasks like creating and dropping tables, renaming columns, and simple things like that, but you can drop to arbitrary SQL if you need to (although, be aware that that will most likely tie you to your current database, and make further moves more difficult).
Basically, you can generate a new migration with rails generate migration ConsolidateDateColumns (for example). This will create a template for you in the db/migrate directory; see the Rails Guides entry to get started on writing them. When you're ready to apply it, run rake db:migrate.
The advantages of doing it this way are that it lets you easily apply the same changes to different environments (development, test, production, staging, or across your development team) and keep them in sync, and it encourages you to keep things reversible whenever possible, so you maintain some degree of freedom to migrate back and forth if you need to.
One more thing: it sounds like you're going to be doing a lot of major changes in quick succession. Make sure that you take a backup of your original database before you begin, and thoroughly test your work against a reduced test set in a separate environment before you run it against the real thing!
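Whichever route you take, the data transformation inside that migration boils down to SQL along these lines (PostgreSQL syntax; the facilities table, the new monday_open_time column and the 'AM'/'PM' values are assumptions based on the question, and the hour/minute columns are assumed to be integers):

ALTER TABLE facilities ADD COLUMN monday_open_time time;

-- make_time is PostgreSQL 9.4+; older versions can build the value with
-- string concatenation and a ::time cast instead.
UPDATE facilities
SET monday_open_time = make_time(
      CASE
        WHEN monday_open_ampm = 'PM' AND monday_open_hour <> 12 THEN monday_open_hour + 12
        WHEN monday_open_ampm = 'AM' AND monday_open_hour = 12 THEN 0
        ELSE monday_open_hour
      END,
      monday_open_minute, 0);

ALTER TABLE facilities
  DROP COLUMN monday_open_hour,
  DROP COLUMN monday_open_minute,
  DROP COLUMN monday_open_ampm;

Inside a Rails migration you would wrap the UPDATE in execute(...) and use add_column/remove_column for the DDL, which keeps the change replayable across your environments.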

Why use database migrations instead of a version controlled schema

Migrations are undoubtedly better than just firing up phpMyAdmin and changing the schema willy-nilly (as I did during my PHP days), but after using them for a while, I think they're fatally flawed.
Version control is a solved problem. The main function of migrations is to keep a history of changes to your database. But storing a different file for each change is a clumsy way to track them. You don't create a new version of post.rb (or a file representing the delta) when you want to add a new virtual attribute -- why should you create a new migration when you want to add a new non-virtual attribute?
Put another way, just as you check post.rb into version control, why not check schema.rb into version control and make the changes to the file directly?
This is functionally the same as keeping a file for each delta, but it's much easier to work with. My mental model is "I want table X to have such and such columns (or really, I want model X to have such and such properties)" -- why should you have to infer from this how to get there from the existing schema; just open up schema.rb and give table X the right columns!
But even the idea that classes wrap tables is an implementation detail! Why can't I just open up post.rb and say:
class Post
t.string :title
t.text :body
end
If you went with a model like this, you'd have to make a decision about what to do with existing data. But even then, migrations are overkill -- when you migrate data, you're going to lose fidelity when you use a migration's down method.
Anyway, my question is, even if you can't think of a better way, aren't migrations kind of gross?
why not check schema.rb into version control and make the changes to the file directly?
Because the database itself is not in sync with version control.
For instance, you could be using the head of the source tree. But you're connecting to a database that was defined as some past version, not the version you have checked out. The migrations allow you to upgrade or downgrade the database schema from any version and to any version, incrementally.
But to answer your last question, yes, migrations are kind of gross. They implement a redundant revision control system on top of another revision control system. However, neither of these revision control systems is really in sync with the database.
Just to paraphrase what others have said: migrations allow you to protect the data as your schema evolves. The notion of maintaining a single schema.rb file is attractive only until your app goes into production. Thereafter, you'll need a way to migrate your existing users' data as your schema changes.
There are also data-related issues that are important to consider, which migrations solve.
Say an old version of my schema has feet and inches columns. For efficiency purposes, I want to combine them into just an inches column to make sorting and searching easier.
My migration can combine all of the feet and inches data into the inches column (feet * 12 + inches) while it's updating the database (i.e. just before it removes the feet column).
Obviously this being in a migration makes it automatically work when you later apply the changes to your production database.
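In plain SQL, the data-preserving step of that migration is just this (the measurements table name is only illustrative):

-- Fold feet into inches before dropping the feet column.
UPDATE measurements SET inches = feet * 12 + inches;
ALTER TABLE measurements DROP COLUMN feet;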
As it stands, they're annoying and inadequate but quite possibly the best option we have available to us at present. Quite a few smart people have spent quite a lot of time working on the problem and this, so far, is about the best they've been able to come up with. After about 20 years of mostly hand-coding database version updates, I came very rapidly to appreciate migrations as a major improvement when I found ActiveRecord.
As you say, version control is a solved problem. Up to a point I'd agree: it's very solved for text files in particular, less so for other file types and not really very much at all for resources such as databases.
How do migrations look if you view them as version control deltas for databases? They're the sum of the deltas you have to apply to get a schema from one version to another. I'm not aware that even git, for all its super-powerfulness, can take two schema files and generate the necessary DDL to do that.
As far as declaring table content in the model, I believe that's what DataMapper does (no personal experience). I think there may be some DDL inference capabilities there as well.
"even if you can't think of a better way, aren't migrations kind of gross?"
Yes. But they're less gross than anything else we have. Do please let us know when you've completed the non-gross alternative.
I suppose given "even if you can't think of a better way", then yes, in the grand scheme of things, migrations are kind of gross. So are Ruby, Rails, ORMs, SQL, web apps, ...
Migrations have the (not insignificant) advantage that they exist. Gross-but-exists tends to win out over Pleasant-but-nonexistent. I'm sure there probably are pleasant and nonexistent ways to migrate your data, but I'm not sure what that means. :-)
OK, I'm going to take a wild guess here and say that you're probably working all by yourself. In a group development project the power of each individual to take responsibility for just his/her changes to the database required for the code that developer is writing is much much more important.
The alternative is that larger groups of programmers (e.g. 10-15 Java developers where I work) end up relying on a couple of dedicated full time database administrators to do that along with their other maintenance, optimization, etc. duties.

Do you put your indexes in source control?

And how do you keep them in sync between test and production environments?
When it comes to indexes on database tables, my philosophy is that they are an integral part of writing any code that queries the database. You can't introduce new queries or change a query without analyzing the impact to the indexes.
So I do my best to keep my indexes in sync between all of my environments, but to be honest, I'm not doing very well at automating this. It's a sort of haphazard, manual process.
I periodically review index stats and delete unnecessary indexes. I usually do this by creating a delete script that I then copy back to the other environments.
But here and there indexes get created and deleted outside of the normal process and it's really tough to see where the differences are.
One thing I've found that really helps is to go with simple, numeric index names, like
idx_t_01
idx_t_02
where t is a short abbreviation for a table. I find index maintenance impossible when I try to get clever with all the columns involved, like,
idx_c1_c2_c5_c9_c3_c11_5
It's too hard to differentiate indexes like that.
Does anybody have a really good way to integrate index maintenance into source control and the development lifecycle?
Indexes are a part of the database schema and hence should be source controlled along with everything else. Nobody should go around creating indexes on production without going through the normal QA and release process, particularly performance testing.
There have been numerous other threads on schema versioning.
The full schema for your database should be in source control right beside your code. When I say "full schema" I mean table definitions, queries, stored procedures, indexes, the whole lot.
When doing a fresh installation, you do:
- check out version X of the product.
- from the "database" directory of your checkout, run the database script(s) to create your database.
- use the codebase from your checkout to interact with the database.
When you're developing, every developer should be working against their own private database instance. When they make schema changes, they check in a new set of schema definition files that work against their revised codebase.
With this approach you never have codebase-database sync issues.
Yes, any DML or DDL changes are scripted and checked in to source control, mostly through ActiveRecord migrations in Rails. I hate to continually toot Rails' horn, but in many years of building DB-based systems I find the migration route to be so much better than any home-grown system I've used or built.
However, I do name all my indexes (don't let the DBMS come up with whatever crazy name it picks). Don't prefix them, that's silly (because you have type metadata in sysobjects, or in whatever db you have), but I do include the table name and columns, e.g. tablename_col1_col2.
That way if I'm browsing sysobjects I can easily see the indexes for a particular table (also it's a force of habit; wayyyy back in the day, on some DBMS I used, index names were unique across the whole DB, so the only way to ensure that was to use unique names).
I think there are two issues here: the index naming convention, and adding database changes to your source control/lifecycle. I'll tackle the latter issue.
I've been a Java programmer for a long time now, but have recently been introduced to a system that uses Ruby on Rails for database access for part of the system. One thing that I like about RoR is the notion of "migrations". Basically, you have a directory full of files that look like 001_add_foo_table.rb, 002_add_bar_table.rb, 003_add_blah_column_to_foo.rb, etc. These Ruby source files extend a parent class, overriding methods called "up" and "down". The "up" method contains the set of database changes that need to be made to bring the previous version of the database schema to the current version. Similarly, the "down" method reverts the change back to the previous version. When you want to set the schema for a specific version, the Rails migration scripts check the database to see what the current version is, then finds the .rb files that get you from there up (or down) to the desired revision.
To make this part of your development process, you can check these into source control, and season to taste.
There's nothing specific or special about Rails here, just that it's the first time I've seen this technique widely used. You can probably use pairs of SQL DDL files, too, like 001_UP_add_foo_table.sql and 001_DOWN_remove_foo_table.sql. The rest is a small matter of shell scripting, an exercise left to the reader.
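As a sketch, such a hand-written pair of DDL files might contain nothing more than this (the foo table and its columns are placeholders):

-- 001_UP_add_foo_table.sql
CREATE TABLE foo (
  id   INTEGER PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

-- 001_DOWN_remove_foo_table.sql
DROP TABLE foo;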
I always source-control SQL (DDL, DML, etc). It's code like any other. It's good practice.
I am not sure indexes should be the same across different environments since they have different data sizes. Unless your test and production environments have the same exact data, the indexes would be different.
As to whether they belong in source control, I'm not really sure.
I do not put my indexes in source control, but rather the creation scripts for the indexes. ;-)
Index-naming:
IX_CUSTOMER_NAME for the field "name" in the table "customer"
PK_CUSTOMER_ID for the primary key,
UI_CUSTOMER_GUID for the GUID field of the customer, which is unique (hence the "UI" - unique index).
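In DDL terms that convention looks roughly like this (the customer columns are illustrative):

CREATE INDEX IX_CUSTOMER_NAME ON customer (name);

ALTER TABLE customer ADD CONSTRAINT PK_CUSTOMER_ID PRIMARY KEY (id);

CREATE UNIQUE INDEX UI_CUSTOMER_GUID ON customer (guid);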
On my current project, I have two things in source control - a full dump of an empty database (using pg_dump -c so it has all the ddl to create tables and indexes) and a script that determines what version of the database you have, and applies alters/drops/adds to bring it up to the current version. The former is run when we're installing on a new site, and also when QA is starting a new round of testing, and the latter is run at every upgrade. When you make database changes, you're required to update both of those files.
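The version-check script in a setup like that is usually just a version table plus guarded blocks of DDL; a rough sketch (the schema_version table and the example change are made up):

-- The wrapper script reads the current version and applies each pending block in order.
CREATE TABLE IF NOT EXISTS schema_version (version integer NOT NULL);

-- upgrade 1 -> 2
ALTER TABLE customer ADD COLUMN email text;
CREATE INDEX customer_email ON customer (email);
UPDATE schema_version SET version = 2;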
In a Grails app, the indexes are stored in source control by default, since you define the index inside the file that represents your domain object. Just offering the Grails perspective as an FYI.