Where do I store the .sql file that creates my database? [closed] - sql

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I am trying to develop a web application that, like most of applications do, uses a database. To create the database I write a .sql-file that contains all the changes I make to the database. I don't know why exactly but in the past I always found it hard to empty my database or modify it later on when I find that a change would make sense. Since I am still learning all this database related stuff the first layout of my database will always be changed. To keep track of all those changes I grew the habit to create a this .sql file.
In this file I am always dropping all the tables that are already existing and creating all the tables new. I kind of do this to have a reference of the actual state of my database always on hand. Changes to a file are way easier for me than directly using the command line database tool. The first question would really be: Is this actually a good practice or is there another way of organizing things that I didn't hear of yet?
The real question is: Where do I store this file? Is it good practice to store it in the same git-repository as the actual code? Should I put it in git at all? I also think of git/github as a cloud storage, if my harddrive burns, all my projects will still be there since I got them on github. If I don't have the .sql file in there I would have to set up the database from new.

The general category for this sort of code is "database migrations." There are tools specialized to these tasks and various web application frameworks may support various different DB migrations tools or they may have their own features.
Probably the most popular tool/suite in this category for Python (models implemented in SQLAlchemy) would be Alembic. Some of the other options include Flyway, Liquibase, and Sqitch.
In all of these cases you manage some abstraction of your database schema (SQL, XML, YAML) and these tools generate the necessary SQL and other code to perform "migrations" from each version of the schema to the next throughout the history of your project. Usually these are generated as increments. They start with initial database schema (possibly completely empty), build and initialize the schema into that version; then build and migrate through each step to arrive that the desired version.
This can become arbitrarily complex if you're making more radical changes to your schema. Adding an extra column to a table without a "NOT NULL" constraint is trivial. Adding new M:N relations through "NOT NULL" junction tables and migrating from some denormalized schema to a higher normal form, for example, can entail intermediary stages where you might need to drop some constraints and tolerate referential integrity violations through some transitional state.
I highly recommend reading these websites, their tutorials, HOWTOs and other documentation to gain a deeper understanding of why these tools exist and how they approach this problem space.

Yu are reinventing the wheel, my friend, your solution is versioning the DB Schema. And Yes, the changes should be added to the project files as almost all framework do. I recommend that you read the following question
How do you version your database schema?

Yes you should absolutely keep this code in source control. However, the exact location of the code within the repository is up to you and or your development team or management. Some good options would be an install, setup, SQL, or ddl folder.

Yes, what you are doing is correct.
Compare with ruby on rails: a file db/schema.rb contains the whole schema. It is good too have such a complete file. This allows you, for example, to easily bootstrap a new environment (e.g. a new testing environment). This file is obviously never used in production as it would wipe out all your data.
Then there are separate small files db/migrations/20171003_add_name_to_person_table.rb or whatever with incremental changes to the schema - called migrations. Those are used to change existing environments without losing data, with some mechanism to make sure each one is only run once per DB.
At your stage it is perfectly fine to be doing all of this manually. You can try to automate it as needed, later. It is good enough that you noticed that something is going on here.
That stuff must go into your code repository, wherever seems natural. /db, /schema, /etc might be some choices.

Related

Keeping track of development changes on database structure

I work in a development team using very basic git principles to develop our project. So every feature is developed in a feature branch and gets merged to when ready.
Often it is necessary to make changes to our database, adding tables, altering columns. Sometimes this includes migration needs. (Casting datatypes, etc)
At the moment we simply write a SQL file containing these changes. And the one "bringing the stuff to production" has to keep track which SQL files are already applied and which still needs to be. If migrations need to be applied comments in the sql file tell you that - Frankly it's a mess ;D
Are there any buzzwords, projects, principles which apply to this scenario?
I stumbled across goose which fullfills all my dreams :) You can make "simple" migrations via plain sql files or you can make complex programmatic changes via go.

Best way to migrate data from Access to SQL Server

The problem
Ok, sorry that my question is somewhat abstract and subjective, but will try to make it as specific as possible. So, the situation I am in is simple - I am remaking a very old MS Access application on a new website using ASP.NET MVC. As currently the MVC site is using SQL Server 2008 (for many well known reasons) I need to find a way to migrate the tables AND the data, because the information in the old database will be used in the new application.
Alright, so far so good, however there are a few problems. The old application is written in a different language, meaning that I want to translate table names, field names, and all other names that are there to English. Furthermore, I will be making some changes on the models themselves (change the type of some fields, add additional fields to some tables, remove old unnecessary ones and more). So technically I'll be 'having my way' with everything.
Researched solutions
With those things in mind I researched for the ways to migrate data from Access database to a SQL Server. Of course, there is a lot of information on the matter, in Stack Overflow alone there are more than a few questions and solutions. So why am I struggling to find the answer ? Well I found a few solutions that will be sufficient to some extend (actually will definitely solve my problems) but I am writing to ask if someone experienced has a better perspective on it than I do. Alright, the solutions and why I am still looking for advice: /I'll be listing just a couple of the most common and popular ones that I found, many of the others share the same capabilities and/or results /
Upsize Wizzard (Access) - this is a tool devised specifically for migrating tables and data from Access. It is my most favourite one for the moment as I find it kind of straightforward to work with and it provides good overall results. I was able to migrate the tables to SQL Server (along with the data of course) which more or less is what I am intending to do. It is fast, it seems like it allows you to migrate indexes, primary keys and even to my knowledge foreign keys (table relationships). The downsides of this tool, however, include that it ignores your queries (which I don't really need honestly) and it doesn't provide a way to change the model, names or types of the properties of the table you migrate - which is the thing I kind of prefer, because I will have to make more than a few changes, adding, renaming, deleting, etc. And then continue with the development process (of the application) which will lead to a few additional minor changes. And finally I would need to apply all changes (migration + all changes) on the production server, which overall is prone to mistakes as I will be doing it by hand (and there are more than a few tables).
SQL Server Migration Assistant (SSMA) - ok, this is a separate tool (not included in Access) with again the same idea - to migrate data from Access to ... possibly everywhere, haven't researched that. Overall it offers more functionality and customizing from the Upsize Wizard, but of course it does it in a more complicated way. I haven't put enough effort to make a migration with this tool yet, as it involves a lot of installations and additional work, but according to my research it provides almost all (if not all) of the functionality I require. The downside however comes with the naming. As I mentioned it allows you to apply changes on the tables, schema, fields, indexes, keys and probably everything, but the articles advice that I change the names in Access first, as it will be easier and the migration process will run more smoothly. I am not allowed to make changes on the original Access database, as it will remain functional until the publish of the 'renewed' project, and the data inside it is being used, so a mere copy of the file is a solution I am not particularly fond of, because I might loose new records. Also I cant predict the changes I would want to make in the development process (as I said I believe I would want/need to apply some additional changes later on when I find 'weaknesses' in my data design in the development process) so I find it to be a little half baked solution.
Conclusion
The options presented, the way I see them, are two:
Use the Upsize Wizard to migrate the access tables, then write a script that applies the changes I want to make. Then in the development process add any additional changes to the script. When ready to publish on the production server, reapply the migration with the wizard, run the changes script and pray everything is fine.
Get more involved with the SSMA tool and try producing an updated version of the tables with the migration process. (See how efficient the renaming is and decide whether to use copied file to rename and then find a way to migrate only new records or do it all in the SSMA). Then again write a script for the changes that occur in the development process and re-do and apply it all on the production server when ready and then pray everything is fine.
Option I have not yet seen, apply it and then pray everything is fine.
I have researched the matter for a couple of days now, and found a few more solutions that I do not believe are better by the mentioned. However I include the possibility of missing the 'big red X on the map', a practical and easy solution which seems like it was designed specifically for me (though I doubt that a little). Anyway, reducing all the madness that I have written so far to a few simple questions will look like:
Is anyone aware if my conclusions are correct? I am leaning towards option one as it is easier to accomplish.
Has anyone experienced/found a better way to do that, or just found some 'logic-leaps' in my writings as I am overthinking the entire thing a little and may be doing some obvious miscalculation.
Very sorry for asking a trivial question and one that includes decision making that may involve deeper understanding of my project and situation, yet I am working with rather sensitive data and would appreciate feedback, even if only to improve my confidence into the chosen approach.
There is one other tool/method you might want to consider that seems to cater to your specific needs more. This would be to use the data import/export tool that ships with sqlserver to do a complete copy of all data into a temporary location within sql server and then write custom queries to reorganize the names and other changes you want to make. Is a bit more work but you could use the end product as a seed method for your migrations ;) (if you are doing code first anyway)

What is db/development_structure.sql in a rails project?

There is a development_structure.sql inside my /db folder of my rails application (rails 2.3.4, ruby 1.8.7) and I am not sure exactly what it does.
Is it needed for some specific environment? (I think I read somewhere that it's used for tests)
Do I need to add it to my git repository?
This post has been used as a reference by a coworker of mine, but the two answers are not exact or informative enough.
development_structure.sql is a low-level dump of the schema, which is necessary when you start to use proprietary database features - either you want to or not, you're going to use them at some point.
Regarding the question of storing it or not, there's some debate. Here is an informative post: http://www.saturnflyer.com/blog/jim/2010/09/14/always-check-in-schema-rb/.
And my take on this follows.
The objective of the development_structure.sql is to sync, for any given commit, the database structure with the code, without having previous knowledge of the schema structure, that is, without having to rely on a pre-existing state of the schema to get the new one.
In a nutshell, by having a schema structure available, whenever you change branch/commit, you load it directly and forget it.
This is mostly valid for dynamic and "crowded" projects, where different branches have differences in the underlying schema structure.
Without having the schema structure stored, you would need to always use an existing reference schema in your database, and migrate it back or forward every time you change branch/commit; several real-world cases can make this process inefficient (e.g. when another branch doesn't have some migrations you currently have, or some migrations can't be rolled back).
Another problem is automated builds, which suffer from the same problems, and even worse, they can't apply manual changes.
The only downside is that it requires a certain habit, which is, to store it every time you run a migration. Easy to say, but also easy to forget.
I don't say you can't live without development_structure.sql - of course you can.
But if you have it, when changing branch/commit you just load-and-forget; if you don't, you [may] have to go through a series of manual steps.
You should not add it to your git repository.
It is a file created automatically by rails when you run migrations with your database.yml configured to connect against a mysql database.
You can view it as an alternative to schema.rb
I believe you can force rails to create it by adding in your environment.rb:
config.active_record.schema_format = :sql
When present this file is used for example by:
rake db:test:clone_structure
Edit
Relevant section in Ruby On Rails Guides.
http://guides.rubyonrails.org/migrations.html#schema-dumping-and-you
They recommend to check it into source control on the wiki.
I personally like to keep it out of it. I like to be able to run all migrations very quickly. It is for me a good sign. If migrations become slow I feel like I am not in total control of my environment anymore. Slowness in migrations generally means I have a lot of data in my development database which I feel wrong.
However, It seems to be a matter of personal taste nowadays.
Follow your instincts on this one.
It's created when you run a rake task to clone your development database to your test database. The development database is outputted to SQL which is then read in to your test DB. You can safely delete it.
In rails 3, you don't even have to write this line,
config.active_record.schema_format = :sql
You can generate this structure.sql file by simply running the above rake command mentioned above

Why use database migrations instead of a version controlled schema

Migrations are undoubtedly better than just firing up phpMyAdmin and changing the schema willy-nilly (as I did during my php days), but after using them for awhile, I think they're fatally flawed.
Version control is a solved problem. The main function of migrations is to keep a history of changes to your database. But storing a different file for each change is a clumsy way to track them. You don't create a new version of post.rb (or a file representing the delta) when you want to add a new virtual attribute -- why should you create a new migration when you want to add a new non-virtual attribute?
Put another way, just as you check post.rb into version control, why not check schema.rb into version control and make the changes to the file directly?
This is functionally the same as keeping a file for each delta, but it's much easier to work with. My mental model is "I want table X to have such and such columns (or really, I want model X to have such and such properties)" -- why should you have to infer from this how to get there from the existing schema; just open up schema.rb and give table X the right columns!
But even the idea that classes wrap tables is an implementation detail! Why can't I just open up post.rb and say:
Class Post
t.string :title
t.text :body
end
If you went with a model like this, you'd have to make a decision about what to do with existing data. But even then, migrations are overkill -- when you migrate data, you're going to lose fidelity when you use a migration's down method.
Anyway, my question is, even if you can't think of a better way, aren't migrations kind of gross?
why not check schema.rb into version control and make the changes to the file directly?
Because the database itself is not in sync with version control.
For instance, you could be using the head of the source tree. But you're connecting to a database that was defined as some past version, not the version you have checked out. The migrations allow you to upgrade or downgrade the database schema from any version and to any version, incrementally.
But to answer your last question, yes, migrations are kind of gross. They implement a redundant revision control system on top of another revision control system. However, neither of these revision control systems is really in sync with the database.
Just to paraphrase what others have said: migrations allow you to protect the data as your schema evolves. The notion of maintaining a single schema.rb file is attractive only until your app goes into production. Thereafter, you'll need a way to migrate your existing users' data as your schema changes.
There are also data-related issues that are important to consider, which migrations solve.
Say an old version of my schema has a feet and inches column. For efficiency purposes, I want to combine that into just an inches column to make sorting and searching easier.
My migration can combine all of the feet and inches data into the inches column (feet * 12 + inches) while it's updating the database (i.e. just before it removes the feet column)
Obviously this being in a migration makes it automatically work when you later apply the changes to your production database.
As it stands, they're annoying and inadequate but quite possibly the best option we have available to us at present. Quite a few smart people have spent quite a lot of time working on the problem and this, so far, is about the best they've been able to come up with. After about 20 years of mostly hand-coding database version updates, I came very rapidly to appreciate migrations as a major improvement when I found ActiveRecord.
As you say, version control is a solved problem. Up to a point I'd agree: it's very solved for text files in particular, less so for other file types and not really very much at all for resources such as databases.
How do migrations look if you view them as version control deltas for databases? They're the sum of the deltas you have to apply to get a schema from one version to another. I'm not aware that even git, for all its super-powerfulness, can take two schema files and generate the necessary DDL to do that.
As far as declaring table content in the model, I believe that's what DataMapper does (no personal experience). I think there may be some DDL inference capabilities there as well.
"even if you can't think of a better way, aren't migrations kind of gross?"
Yes. But they're less gross than anything else we have. Do please let us know when you've completed the non-gross alternative.
I suppose given "even if you can't think of a better way", then yes, in the grand scheme of things, migrations are kind of gross. So are Ruby, Rails, ORMs, SQL, web apps, ...
Migrations have the (not insignificant) advantage that they exist. Gross-but-exists tends to win out over Pleasant-but-nonexistent. I'm sure there probably are pleasant and nonexistent ways to migrate your data, but I'm not sure what that means. :-)
OK, I'm going to take a wild guess here and say that you're probably working all by yourself. In a group development project the power of each individual to take responsibility for just his/her changes to the database required for the code that developer is writing is much much more important.
The alternative is that larger groups of programmers (e.g. 10-15 Java developers where I work) end up relying on a couple of dedicated full time database administrators to do that along with their other maintenance, optimization, etc. duties.

Do you put your indexes in source control?

And how do you keep them in synch between test and production environments?
When it comes to indexes on database tables, my philosophy is that they are an integral part of writing any code that queries the database. You can't introduce new queries or change a query without analyzing the impact to the indexes.
So I do my best to keep my indexes in synch betweeen all of my environments, but to be honest, I'm not doing very well at automating this. It's a sort of haphazard, manual process.
I periodocally review index stats and delete unnecessary indexes. I usually do this by creating a delete script that I then copy back to the other environments.
But here and there indexes get created and deleted outside of the normal process and it's really tough to see where the differences are.
I've found one thing that really helps is to go with simple, numeric index names, like
idx_t_01
idx_t_02
where t is a short abbreviation for a table. I find index maintenance impossible when I try to get clever with all the columns involved, like,
idx_c1_c2_c5_c9_c3_c11_5
It's too hard to differentiate indexes like that.
Does anybody have a really good way to integrate index maintenance into source control and the development lifecycle?
Indexes are a part of the database schema and hence should be source controlled along with everything else. Nobody should go around creating indexes on production without going through the normal QA and release process- particularly performance testing.
There have been numerous other threads on schema versioning.
The full schema for your database should be in source control right beside your code. When I say "full schema" I mean table definitions, queries, stored procedures, indexes, the whole lot.
When doing a fresh installation, then you do:
- check out version X of the product.
- from the "database" directory of your checkout, run the database script(s) to create your database.
- use the codebase from your checkout to interact with the database.
When you're developing, every developer should be working against their own private database instance. When they make schema changes they checkin a new set of schema definition files that work against their revised codebase.
With this approach you never have codebase-database sync issues.
Yes, any DML or DDL changes are scripted and checked in to source control, mostly thru activerecord migrations in rails. I hate to continually toot rails' horn, but in many years of building DB-based systems I find the migration route to be so much better than any home-grown system I've used or built.
However, I do name all my indexes (don't let the DBMS come up with whatever crazy name it picks). Don't prefix them, that's silly (because you have type metadata in sysobjects, or in whatever db you have), but I do include the table name and columns, e.g. tablename_col1_col2.
That way if I'm browsing sysobjects I can easily see the indexes for a particular table (also it's a force of habit, wayyyy back in the day on some dBMS I used, index names were unique across the whole DB, so the only way to ensure that is to use unique names).
I think there are two issues here: the index naming convention, and adding database changes to your source control/lifecycle. I'll tackle the latter issue.
I've been a Java programmer for a long time now, but have recently been introduced to a system that uses Ruby on Rails for database access for part of the system. One thing that I like about RoR is the notion of "migrations". Basically, you have a directory full of files that look like 001_add_foo_table.rb, 002_add_bar_table.rb, 003_add_blah_column_to_foo.rb, etc. These Ruby source files extend a parent class, overriding methods called "up" and "down". The "up" method contains the set of database changes that need to be made to bring the previous version of the database schema to the current version. Similarly, the "down" method reverts the change back to the previous version. When you want to set the schema for a specific version, the Rails migration scripts check the database to see what the current version is, then finds the .rb files that get you from there up (or down) to the desired revision.
To make this part of your development process, you can check these into source control, and season to taste.
There's nothing specific or special about Rails here, just that it's the first time I've seen this technique widely used. You can probably use pairs of SQL DDL files, too, like 001_UP_add_foo_table.sql and 001_DOWN_remove_foo_table.sql. The rest is a small matter of shell scripting, an exercise left to the reader.
I always source-control SQL (DDL, DML, etc). Its code like any other. Its good practice.
I am not sure indexes should be the same across different environments since they have different data sizes. Unless your test and production environments have the same exact data, the indexes would be different.
As to whether they belong in source control, am not really sure.
I do not put my indexes in source control but the creation script of the indexes. ;-)
Index-naming:
IX_CUSTOMER_NAME for the field "name" in the table "customer"
PK_CUSTOMER_ID for the primary key,
UI_CUSTOMER_GUID, for the GUID-field of the customer which is unique (therefore the "UI" - unique index).
On my current project, I have two things in source control - a full dump of an empty database (using pg_dump -c so it has all the ddl to create tables and indexes) and a script that determines what version of the database you have, and applies alters/drops/adds to bring it up to the current version. The former is run when we're installing on a new site, and also when QA is starting a new round of testing, and the latter is run at every upgrade. When you make database changes, you're required to update both of those files.
Using a grails app the indexes are stored in source control by default since you are defining the index definition inside of a file that represents your domain object. Just offering the 'Grails' perspective as an FYI.