When I started coding my Sinatra application I had never used it before. Note that I had, and still have, no experience with RoR. I had one .rb file and one .haml file and was happy. Now I have had to split the .rb file into about ten 'library' files as the whole application gets more and more complex.
I store some application logs/info in csv files, and now I am getting conflicts when accessing the csv files. So I think I need to introduce a 'proper' database solution. I want it to be part of my ruby (Sinatra) application.
How can I introduce a 'light' SQL database into my Sinatra application?
I am on ruby 1.8.7 (2010-08-16 patchlevel 302) [i386-mingw32] soon upgrading to 1.9 (hopefully)
I'd recommend looking at Sequel. It's very flexible and powerful, and works well with SQLite, MySQL, Postgres, Oracle and other DBMSs. It's not opinionated about how you talk to the database: you can use it as an ORM or with simple datasets, and it allows embedded SQL or more programmatic approaches.
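To give a feel for it, here is a minimal sketch against SQLite (the table and column names are just examples; assumes the sequel and sqlite3 gems are installed):

require 'rubygems'
require 'sequel'

# Connect to (or create) a SQLite database file
DB = Sequel.sqlite('myapp.db')

# Create the table unless it already exists
DB.create_table? :log_entries do
  primary_key :id
  String :message
  Time :created_at
end

# Datasets cover simple cases without hand-written SQL...
logs = DB[:log_entries]
logs.insert(:message => 'hello', :created_at => Time.now)

# ...but embedded SQL works just as well
DB['SELECT COUNT(*) AS n FROM log_entries'].each { |row| p row }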
For an ORM, both ActiveRecord and Sequel come recommended. As for the database, I guess sqlite3 will be good enough for your needs, but you could also choose mysql or pg.
If you want to use active_record, you'll find this article very useful.
And if Sequel is the choice, just read the Sequel documents here.
Once the gems are installed, you can start writing some code to connect to the db. Then write some migration tasks to build the database tables (and don't forget to build some corresponding models). Both gems have a similar syntax for migrations. After that, import your csv data and you're done.
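For the csv import step, a rough sketch (the LogEntry model and the column order are assumptions; on 1.8.7 the FasterCSV gem is the usual choice):

require 'rubygems'
require 'fastercsv'   # on 1.9+ the stdlib 'csv' has the same interface

# Re-create each csv row as a database record
FasterCSV.foreach('logs.csv') do |row|
  LogEntry.create(:logged_at => row[0], :message => row[1])
end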
I have had no trouble using either Active Record or DataMapper to add object persistence to my Sinatra apps. People also tell me Sequel is very good, but philosophically it is not worlds apart from Active Record imho.
Active Record and Sequel favour a more database-centric approach, whereby you spell out your tables as a set of database and table definitions in a collection of migration files and merge them into a schema which is then used to build or update your database tables. If you really care about the underlying SQL database then one of these is for you. I find them to be six of one, half a dozen of the other.
DataMapper is more object-centric: it lets you define the properties and object relationships you need in your object's own class definition. Then, when your app launches, you make sure you call DataMapper.auto_upgrade! and it upgrades the database to suit your object graph. The upside is that you only have one place to look to find what properties your object might have. The downside is that you have less control over the specifics of the underlying database, though it's not impossible to tightly define the mappings. DataMapper works well when you care about object graphs more than database tables.
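As a rough sketch of that style (property names invented):

require 'rubygems'
require 'data_mapper'

DataMapper.setup(:default, 'sqlite::memory:')   # or a mysql://... / postgres://... URI

class Post
  include DataMapper::Resource

  # The schema lives right here in the class definition
  property :id,    Serial
  property :title, String, :required => true
  property :body,  Text
end

DataMapper.finalize
DataMapper.auto_upgrade!   # alters the tables to match the models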
The good news is they pretty much all work the same way once you have your mappings from object graph to SQL database tables defined. All support lazy or pre-emptive loading of related collections of objects, many-to-many relationships, polymorphism, etc., and tend to vary only in configuration and seeding details.
I often start projects using DataMapper just for its speed at throwing up and tearing down database schemas while the app's object graph is still in flux; I quickly refactor to use Active Record when the schema has settled down. Next project I think I'll give Sequel a go, though, as people do seem to rave about it.
I have had success using DataMapper with Sinatra, but like the other posts say, you can also use Sequel and Active Record. One advantage of using Active Record, though, is that if you ever want to use or learn RoR, Active Record is the default ORM there, so that might be something you want to consider.
If you don't want to go the ORM route you can always use the sqlite-ruby gem, which allows you to create and run sql queries directly. Here is some example code from the website http://sqlite-ruby.rubyforge.org/
require 'sqlite'

# Open (or create) the database file
db = SQLite::Database.new( "data.db" )

# Yields each result row of the query to the block
db.execute( "select * from table" ) do |row|
  p row
end

db.close
I learned sql long ago, but over the last years of application development I have realized that I only rarely play with a real sql console or raw sql commands at all, especially since I have mainly been working with rails applications for a while now.
But right now I am working on getting a few Microsoft certifications, and for that I ended up relearning sql from scratch. In doing so, so many things came back to mind that, I have to admit, I had forgotten over the years. Yes, from a developer's view, sql is important, but somehow I didn't need much of it... like stored procedures, functions, triggers, etc...
While I already found a nice blog post from Nasir about using Views in Rails,
I am still wondering if I can use
functions
stored procedures
triggers
in Rails.
Triggers: Of course I wouldn't need to define triggers within a rails application; I would create them directly on the database management console. I'd just have to remember the things that are done 'automatically'. I would like to use them for logging purposes or pre-calculation of quick-access tables...
Functions: These should be easy to use, I would think. Is it possible to use them via the 'select' method of ActiveRecord?
Stored Procedures: How would one use these from Rails? I mean, they could be valuable if you have several complex queries with multiple joins and calculation-based dependencies. I wonder a) how to call one and b) how to receive the results.
Well, if you have more insight into the inner workings of Rails in relation to sql and can point out whether these native sql elements are available for/from a Rails application, it would be lovely if you could also point at some howtos or tutorials.
Another thing I am wondering about is the use of foreign keys. Rails doesn't use them explicitly on the sql-side... would it be useful/helpful to manually add them to the database relations? Or would they hinder Rails' data access?
Thanks for any response, I am eager to find out what I can do between Rails and Sql to combine them in a maybe more efficient way.
As you will have noticed, and mention yourself: rails does a good job of hiding much of the sql/implementation details.
Still, I believe it is very important to use your sql and database wisely.
Validations
Validations should be defined in your database as much as possible, not only in rails. You define them in Rails to give nice user feedback if needed. But ultimately you do not know how data gets into the database: some operator can use sql directly, other programs may interface with the database, or, more frequently, two rails processes can insert data nearly simultaneously.
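As a sketch of that pairing (model and column names invented), the same rule can be expressed in both places:

# Rails-side validation: gives friendly feedback in the UI
class User < ActiveRecord::Base
  validates_presence_of :email
end

# Database-side constraint in a migration: holds no matter who writes
class MakeEmailMandatory < ActiveRecord::Migration
  def self.up
    change_column :users, :email, :string, :null => false
  end

  def self.down
    change_column :users, :email, :string, :null => true
  end
end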
Foreign keys
These should most definitely be defined in your database. Rails does not need them (it will write its queries correctly anyway), but they will guard your database against wrong data and safeguard your data integrity. If someone deletes a record while another record still points to it, your database will complain.
Indexes
This is even more easily overlooked: create indexes! On your primary key (done automatically), on frequently searched fields (like name), and on foreign key fields!
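The foreign keys above and these indexes can all be set up in one migration; a sketch (table names invented, and the raw DDL is needed because older Rails has no built-in foreign key helper):

class AddIndexesAndForeignKey < ActiveRecord::Migration
  def self.up
    add_index :posts, :user_id   # foreign key field
    add_index :users, :name      # frequently searched field
    # MySQL-style DDL; PostgreSQL uses DROP CONSTRAINT on the way down
    execute 'ALTER TABLE posts ADD CONSTRAINT fk_posts_users FOREIGN KEY (user_id) REFERENCES users(id)'
  end

  def self.down
    execute 'ALTER TABLE posts DROP FOREIGN KEY fk_posts_users'
    remove_index :posts, :user_id
    remove_index :users, :name
  end
end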
Complicated queries
As much as rails helps you when retrieving items, for some queries it is much more efficient to write the query yourself. While I avoid it for as long as possible, find_by_sql is a powerful tool, and rails is extremely powerful/helpful in treating the result of a find_by_sql as a normal result.
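As a hypothetical example (tables and columns invented), note how the extra selected column shows up as an attribute on the returned objects:

# One query with a join and an aggregate, instead of many small ones
orders = Order.find_by_sql([
  'SELECT o.*, SUM(li.price) AS total
     FROM orders o
     JOIN line_items li ON li.order_id = o.id
    WHERE o.created_at > ?
    GROUP BY o.id',
  1.week.ago])
puts orders.first.total   # the aggregate rides along on the model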
Stored procedures, functions, ...
Normally you do not need them when using rails, but there are some very valid cases where they are very useful. For instance, I have created a geographic information system where we used stored procedures to create various spatial objects. In rails you can execute sql directly using
YourModel.connection.execute(' .. your sql here .. ')
and so even execute stored procedures. It will not be for everyone, but there are some very valid reasons to move work to the database. For example, if you have to perform an operation over a whole lot of tables or rows, it can be much more efficient to call a stored procedure than to retrieve all the data, change the rows, and save them back. It depends on your problem at hand.
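For example (the procedure and table names here are invented, and the call syntax varies by database):

# MySQL-style invocation; on PostgreSQL this would be
# YourModel.connection.execute('SELECT recalculate_totals(2012)')
YourModel.connection.execute('CALL recalculate_totals(2012)')

# select_all returns the result rows as an array of hashes,
# handy for reading data a stored procedure or function produces
rows = YourModel.connection.select_all('SELECT * FROM monthly_totals')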
Conclusion
I want to make absolutely clear that Rails nicely abstracts the database away, and for everyday use this is just great. Still, you should define foreign keys, indexes and constraints in your database. For the more advanced stuff like functions, stored procedures and complicated queries: Rails does not stop you from doing anything complicated if needed. You should consider your database as much a tool as Rails itself. But remember:
Premature optimization is the root of all evil. --Donald Knuth
So the options are available, but only use them if it is really necessary.
First of all, I don't completely share your sentiments with regard to the following:
Yes, from a developer's view, sql is important, but somehow I didn't need much of it... like stored procedures, functions, triggers, etc...
SQL is important if you're developing all the time for extreme "scalability" (and do note I'm using the term very loosely here, as Rails does scale if you know how to go about it, but with limits; stories such as Twitter's switch away from Rails are quite ubiquitous).
However, Rails isn't always about extreme performance. It is first and foremost about developer 'happiness', increased productivity, and reduced time-to-market. It does this quite well, especially in Rails 3.x+ with the high level of abstraction introduced, and it is this very abstraction that has ultimately resulted in Rails being considered slow, which by most standards it definitely is.
Do note that I do not intend to start any 'language' flame wars here; this is just my personal take on the purpose of Rails. I like to call a duck a duck myself (much like two of my favourite languages: python & ruby).
The simple answer to your questions can be summarised as follows:
Triggers: AR callbacks (see the sketch after this list)
Functions: the AR query DSL, e.g. .where('name = ?', full_name)
Stored procedures: tied into model business logic, typically via an AR callback.
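For illustration, a callback standing in for a trigger might look like this (model and columns invented); note that, unlike a real database trigger, it only fires for changes made through AR:

class Order < ActiveRecord::Base
  # Recalculate the cached total before every save, in Ruby
  before_save :cache_total

  private

  def cache_total
    self.total = line_items.inject(0) { |sum, li| sum + li.price }
  end
end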
Another thing I am wondering about is the use of foreign keys. Rails doesn't use them explicitly on the sql-side... would it be useful/helpful to manually add them to the database relations?
Au contraire, mon ami! All AR relations require that the appropriate foreign keys are included in your migrations. It's a common misconception that 'Rails will walk your dog and clean up after it as well'. Remember that AR queries are ultimately expressed in SQL against your backend solution, whichever SQL database that is (SQLite/PostgreSQL etc...).
However*, rails does not make use of FK constraints, because you can actually make an _id field point to a non-existent record; they aren't really foreign keys in the strictest sense.
Note: the * indicates that credit for this statement, and for the revision to the last paragraph above, should go to my friend David Workman (workmad3).
A colleague of mine is currently designing SQL queries like the one below to produce reports, which are displayed in excel files through an external data query.
At present, only reporting processes on the DB are required (no CRUD operations).
I am trying to convince him that it would be better to use a ruby ORM in order to be able to display the data in a rails/sinatra app.
Despite the obvious advantages in displaying the data, what advantages are there for him in learning to use an ORM like Sequel or Datamapper?
The SQL queries he is writing are clearly quite complex, and being relatively new to SQL, he often complains that it is very time-consuming and confusing.
Is it possible to write extremely complex queries with an ORM? If so, which is the most suitable (I have heard Sequel is good for legacy dbs)? And what are the advantages of learning ruby and using an ORM, versus sticking with plain SQL, for making complex database queries?
I'm the DataMapper maintainer, and I think for complex reporting you should use SQL.
While I do think someday we'll have a DSL that provides the power and conciseness of SQL, everything I've seen so far requires you to write more Ruby code than SQL for complex queries. I would much rather maintain a 5 line SQL query than 10-15 lines of Ruby code to describe the same complex operation.
Please note I say complex... if you have something simple, use the ORM's built-in finders. However, I do believe there is a line you can cross where SQL becomes simpler. Now, most apps aren't just reporting. You may have a lot of CRUD-type operations, for which an ORM is perfectly suited and far better than doing those things by hand.
One thing that an ORM will usually provide is some sort of organization for your application logic. You can group code based around each model in the same file. It's usually there that I'll put the complex SQL query, rather than embedding it in the controller, e.g.:
class User
  include DataMapper::Resource

  property :id,   Serial
  property :name, String,  :length => 1..100, :required => true
  property :age,  Integer, :min => 1, :max => 130

  def self.some_complex_query
    repository.adapter.select <<-SQL
      SELECT ...
      FROM ...
      WHERE ...
      ... more complex stuff here ...
    SQL
  end
end
Then I can just generate the report using User.some_complex_query. You could also push the SQL query into a view if you wanted to clean this code up further.
EDIT: By "view" in the above sentence I meant RDBMS view, rather than view in the MVC context. Just wanted to clear up any potential confusion.
If you are writing your queries by hand you have the chance to optimize them. When I look at that query I see some potential for optimization (E.ICGROUPNAME LIKE '%san-fransisco%' OR E.ICGROUPNAME LIKE '%bordeaux%' won't use an index = table scan).
When using an OR mapper (with the native objects/tables) for reporting, you have little or no control over the resulting SQL query.
But: you could put that query in a view or stored procedure and map that view/proc with the OR mapper. Then you can optimize your queries and still use all the features of your application framework.
Unless you're dealing with objects, an ORM is not necessary. It sounds like your friend simply needs to generate reports, in which case pure SQL is just fine so long as he knows what he's doing (e.g. avoiding SQL injection issues).
ORM stands for "Object-Relational Mapping". If you don't have the "O" (objects), then it's probably not a good fit for your app. Where ORMs really shine is in persisting objects to the database and loading them from a database.
ORM stands for Object-Relational Mapping, but looking at the query, your friend seems to want a pretty specific table of sums and other items. I've not used Ruby's Sequel, but I have used Hibernate and Python's SQLAlchemy (for Django/TurboGears), and while you can do these sorts of queries with them, I don't believe that is their strength.
The power of an ORM comes from being able to traverse Foo->Bar object relationships: say you want all the Bar objects for Foos whose field is greater than X, that sort of thing. Therefore I would not classify an ORM as a "good" solution here, though moving to a real programming language like Ruby and doing the SQL through it instead of Excel is in itself a win.
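In (older) ActiveRecord terms, for instance, that example might read (names hypothetical):

# All the Bar objects whose Foo has a field greater than x
Bar.all(:joins => :foo, :conditions => ['foos.field > ?', x])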
Just my 2 cents.
In a situation like that, I'd probably write the queries by hand, or use a view (if the DB you're using supports views).
ORMs are used when you have objects (business objects). I am therefore assuming that you have an application with which you create and manage the business objects that are ultimately saved into the database. If you have, then you almost definitely have some representation of the relationships, and probably many of the calculations you are going to use in reports. The problem with using SQL to access your database directly for reports is simply maintainability.
You typically put a lot of effort into ensuring that your business objects hide any details of their database. You implement business rules and do common calculations in your business objects, build a common language for all members of the team, and so on. You then use an ORM such as Habanero or NHibernate to map to the database. This is all done in the name of maintainability, and it is great: you can migrate your application, change your design, etc.
You now go and write SQL to run reports, and over time you have hundreds of reports. Firstly, they often duplicate logic you already have in your business objects (usually without any tests). Even worse, maintainability is now stuffed: forget about moving a field from one table to another, splitting a table into two, or changing a relationship; you now have a number of reports that are going to break unexpectedly.
The problem with querying through your domain objects/business objects is simply one of performance.
In summary: if you are using Domain-Driven Design or business object concepts, try to use these for reports too. (You will probably run reports directly from the DB using SQL or stored procs for performance reasons, but try to limit these; use your business objects first and only then fall back to SQL.)
The other option, of course, is using a separate reporting database (as in some BI concepts). The mapping from your transactional DB to your reporting DB is then in one place and easily changeable in cases where you want to change your design.
Domain objects (business objects) and ORMs have all the knowledge needed to let you build high-performing queries that run directly on the database while using the domain terminology. Let's hope they continue to evolve to the point where this is a reality.
Until then, if you are using business objects in your application, try to use them for reporting as well; when performance is an issue, resort to SQL.
Migrations are undoubtedly better than just firing up phpMyAdmin and changing the schema willy-nilly (as I did during my php days), but after using them for a while, I think they're fatally flawed.
Version control is a solved problem. The main function of migrations is to keep a history of changes to your database, but storing a different file for each change is a clumsy way to track them. You don't create a new version of post.rb (or a file representing the delta) when you want to add a new virtual attribute -- why should you create a new migration when you want to add a new non-virtual attribute?
Put another way, just as you check post.rb into version control, why not check schema.rb into version control and make the changes to the file directly?
This is functionally the same as keeping a file for each delta, but it's much easier to work with. My mental model is "I want table X to have such-and-such columns (or really, I want model X to have such-and-such properties)" -- why should you have to infer from this how to get there from the existing schema? Just open up schema.rb and give table X the right columns!
But even the idea that classes wrap tables is an implementation detail! Why can't I just open up post.rb and say:
class Post
  t.string :title
  t.text :body
end
If you went with a model like this, you'd have to decide what to do with existing data. But even then, migrations are overkill: when you migrate data, you're going to lose fidelity when you use a migration's down method.
Anyway, my question is, even if you can't think of a better way, aren't migrations kind of gross?
why not check schema.rb into version control and make the changes to the file directly?
Because the database itself is not in sync with version control.
For instance, you could be using the head of the source tree while connecting to a database that was defined at some past version, not the version you have checked out. Migrations allow you to upgrade or downgrade the database schema from any version to any other version, incrementally.
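For illustration, a minimal migration (table and column invented); rake db:migrate walks the database up, and rake db:migrate VERSION=n walks it back down:

class AddAuthorToPosts < ActiveRecord::Migration
  def self.up
    add_column :posts, :author, :string
  end

  def self.down
    remove_column :posts, :author
  end
end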
But to answer your last question, yes, migrations are kind of gross. They implement a redundant revision control system on top of another revision control system. However, neither of these revision control systems is really in sync with the database.
Just to paraphrase what others have said: migrations allow you to protect the data as your schema evolves. The notion of maintaining a single schema.rb file is attractive only until your app goes into production. Thereafter, you'll need a way to migrate your existing users' data as your schema changes.
There are also data-related issues that are important to consider, which migrations solve.
Say an old version of my schema has feet and inches columns. For efficiency purposes, I want to combine them into a single inches column to make sorting and searching easier.
My migration can combine all of the feet and inches data into the inches column (feet * 12 + inches) while it's updating the database (i.e. just before it removes the feet column).
Obviously, because this lives in a migration, it is applied automatically when you later migrate your production database.
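A sketch of such a migration (table name invented):

class CombineFeetAndInches < ActiveRecord::Migration
  def self.up
    # Fold feet into inches while both columns still exist
    execute 'UPDATE measurements SET inches = feet * 12 + inches'
    remove_column :measurements, :feet
  end

  def self.down
    add_column :measurements, :feet, :integer, :default => 0
    execute 'UPDATE measurements SET feet = inches / 12, inches = inches % 12'
  end
end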
As it stands, they're annoying and inadequate but quite possibly the best option we have available to us at present. Quite a few smart people have spent quite a lot of time working on the problem and this, so far, is about the best they've been able to come up with. After about 20 years of mostly hand-coding database version updates, I came very rapidly to appreciate migrations as a major improvement when I found ActiveRecord.
As you say, version control is a solved problem. Up to a point I'd agree: it's very solved for text files in particular, less so for other file types and not really very much at all for resources such as databases.
How do migrations look if you view them as version control deltas for databases? They're the sum of the deltas you have to apply to get a schema from one version to another. I'm not aware that even git, for all its super-powerfulness, can take two schema files and generate the necessary DDL to do that.
As for declaring table content in the model, I believe that's what DataMapper does (no personal experience). I think there may be some DDL-inference capabilities there as well.
"even if you can't think of a better way, aren't migrations kind of gross?"
Yes. But they're less gross than anything else we have. Do please let us know when you've completed the non-gross alternative.
I suppose given "even if you can't think of a better way", then yes, in the grand scheme of things, migrations are kind of gross. So are Ruby, Rails, ORMs, SQL, web apps, ...
Migrations have the (not insignificant) advantage that they exist. Gross-but-exists tends to win out over Pleasant-but-nonexistent. I'm sure there probably are pleasant and nonexistent ways to migrate your data, but I'm not sure what that means. :-)
OK, I'm going to take a wild guess here and say that you're probably working all by yourself. In a group development project it is much, much more important that each individual can take responsibility for just his/her own changes to the database, i.e. those required by the code that developer is writing.
The alternative is that larger groups of programmers (e.g. 10-15 Java developers where I work) end up relying on a couple of dedicated full-time database administrators to do that along with their other maintenance, optimization, etc. duties.
Assume Hibernate for the ORM.
I'm not sure how to ask this. I want to build an application that can replace part of another. For example, say I have an application with various modules, called the "big" app. This application may handle HR, financials, purchases, skill sets, etc. But maybe, for whatever reason, I don't like the skill set module, though I like the rest of the application. I want to build an app that uses the same database the rest of the "big" app uses, but with my software as the front end for that piece.
I could build my app and have it hit the database directly, with no ORM. My question is: is there an advantage to using an ORM here? I'm thinking there is, because if the "big" app goes away and another app is purchased, we could continue to use my version of skill set, since I am using hibernate instead of hitting the database directly. I'm still learning, but I thought that my application would use objects that I named, and that in the case I just described I'd only have to change my mapping files and/or very little of my code.
Here is another example. I have a legacy application and a legacy database, using database X. I decide that I no longer like the old terminal emulator application that is used to get at the data, and that I want a graphical version. Can I use hibernate with my new application so that, when I finally decide to get rid of the legacy database and change to the latest Oracle or SQL Server, I can do so with minimal headache? Or will the database change so much that it wouldn't have mattered anyway (I'm suggesting that upon changing to a new database, more information will want to be captured)?
I was hoping for comments on whether I am misunderstanding why hibernate/ORM might or might not be a benefit here.
Thank you.
I do not think you will get a huge benefit from hibernate if the database schema changes to something completely different; you might have to change more than just your mapping, especially if more "structure" is added to the database (tables, columns and other schema things). That said, if the database stays structured mostly the same way and, let's say, just the column names and table names change and a couple of tables are merged or something like that, you can get by with just changing your mapping.
But I would really recommend using hibernate for database agnosticism; that is a pretty easy path.
And even though it doesn't exactly help you if your entire database is changed, it helps in such an incredible number of other ways that I would choose it over direct DB access most of the time.
Lastly, you could think about using a service layer such as the repository pattern, which abstracts away the data access, so that the business logic of your application wouldn't need to change if the database changes.
Switching from one DBMS to another (ala Oracle to SQL Server) is one thing that using an ORM would certainly make much easier.
As for switching from one "big app" to another "big app", I doubt an ORM would help that much. It's likely that the database structure and business logic would be different enough that you would find yourself rewriting lots of code anyway.
You can generate domain objects with Hibernate Tools; if you do that, it will be painless and fast. However, if you write all the objects by hand you will die. I think it's a good idea to rewrite part of the app and get to know hibernate better.
I think it's generally a bad idea to make any decision based on the unknowns versus the knowns. Whether you're deciding on a data access/persistence strategy, what car to buy, or what college to go to, you should put the most weight on the things you know you want today, rather than worrying about what may or may not happen tomorrow.
So when considering ORMs, I wouldn't worry too much about things such as apps "going away" or DBMSs changing (unless that's either already been talked about, or there's a history of this in your company). I'm not saying these things will never happen, but rather that they should take a back seat to the generally much more important considerations of maintainability, performance, and developer productivity.
So in short, choose an ORM based on its ability to solve the problems and satisfy the requirements that you have today.
And how do you keep them in synch between test and production environments?
When it comes to indexes on database tables, my philosophy is that they are an integral part of writing any code that queries the database. You can't introduce new queries or change a query without analyzing the impact to the indexes.
So I do my best to keep my indexes in synch between all of my environments, but to be honest, I'm not doing very well at automating this. It's a sort of haphazard, manual process.
I periodically review index stats and delete unnecessary indexes. I usually do this by creating a delete script that I then copy back to the other environments.
But here and there indexes get created and deleted outside of the normal process and it's really tough to see where the differences are.
I've found one thing that really helps is to go with simple, numeric index names, like
idx_t_01
idx_t_02
where t is a short abbreviation for the table name. I find index maintenance impossible when I try to get clever and encode all the columns involved, like
idx_c1_c2_c5_c9_c3_c11_5
It's too hard to differentiate indexes like that.
Does anybody have a really good way to integrate index maintenance into source control and the development lifecycle?
Indexes are part of the database schema and hence should be source-controlled along with everything else. Nobody should go around creating indexes on production without going through the normal QA and release process, particularly performance testing.
There have been numerous other threads on schema versioning.
The full schema for your database should be in source control right beside your code. When I say "full schema" I mean table definitions, queries, stored procedures, indexes, the whole lot.
When doing a fresh installation, you do:
- check out version X of the product.
- from the "database" directory of your checkout, run the database script(s) to create your database.
- use the codebase from your checkout to interact with the database.
When you're developing, every developer should work against their own private database instance. When they make schema changes, they check in a new set of schema definition files that work against their revised codebase.
With this approach you never have codebase-database sync issues.
Yes, any DML or DDL changes are scripted and checked in to source control, mostly through activerecord migrations in rails. I hate to continually toot rails' horn, but in many years of building DB-based systems I have found the migration route to be so much better than any home-grown system I've used or built.
However, I do name all my indexes myself (don't let the DBMS come up with whatever crazy name it picks). Don't prefix them, that's silly (because you have type metadata in sysobjects, or in whatever db you have), but I do include the table name and columns, e.g. tablename_col1_col2.
That way, if I'm browsing sysobjects I can easily see the indexes for a particular table. (It's also a force of habit: way back in the day, on some DBMS I used, index names had to be unique across the whole DB, and the only way to ensure that is to use unique names.)
I think there are two issues here: the index naming convention, and adding database changes to your source control/lifecycle. I'll tackle the latter issue.
I've been a Java programmer for a long time now, but have recently been introduced to a system that uses Ruby on Rails for database access for part of the system. One thing that I like about RoR is the notion of "migrations". Basically, you have a directory full of files that look like 001_add_foo_table.rb, 002_add_bar_table.rb, 003_add_blah_column_to_foo.rb, etc. These Ruby source files extend a parent class, overriding methods called "up" and "down". The "up" method contains the set of database changes that need to be made to bring the previous version of the database schema to the current version. Similarly, the "down" method reverts the change back to the previous version. When you want to set the schema for a specific version, the Rails migration scripts check the database to see what the current version is, then finds the .rb files that get you from there up (or down) to the desired revision.
To make this part of your development process, you can check these into source control, and season to taste.
There's nothing specific or special about Rails here, just that it's the first time I've seen this technique widely used. You can probably use pairs of SQL DDL files, too, like 001_UP_add_foo_table.sql and 001_DOWN_remove_foo_table.sql. The rest is a small matter of shell scripting, an exercise left to the reader.
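For what it's worth, a toy Ruby sketch of that idea (the directory layout and the use of psql are assumptions):

# Apply every UP script in order via the psql command-line client
Dir.glob('ddl/*_UP_*.sql').sort.each do |script|
  system('psql', '-d', 'mydb', '-f', script) or abort "failed: #{script}"
end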
I always source-control SQL (DDL, DML, etc). It's code like any other. It's good practice.
I am not sure indexes should be the same across different environments, since the environments hold different data sizes. Unless your test and production environments have exactly the same data, the indexes will differ.
As to whether they belong in source control, I am not really sure.
I do not put my indexes in source control but the creation script of the indexes. ;-)
Index-naming:
IX_CUSTOMER_NAME for the field "name" in the table "customer"
PK_CUSTOMER_ID for the primary key,
UI_CUSTOMER_GUID, for the GUID-field of the customer which is unique (therefore the "UI" - unique index).
On my current project, I have two things in source control: a full dump of an empty database (using pg_dump -c, so it has all the ddl to create tables and indexes), and a script that determines what version of the database you have and applies alters/drops/adds to bring it up to the current version. The former is run when we're installing on a new site, and also when QA starts a new round of testing; the latter is run at every upgrade. When you make database changes, you're required to update both of those files.
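A stripped-down sketch of what such an upgrade script might look like (the schema_version table and the file naming scheme are assumptions, not the actual project code):

require 'rubygems'
require 'pg'

conn = PG.connect(:dbname => 'myapp')
current = conn.exec('SELECT version FROM schema_version').getvalue(0, 0).to_i

# e.g. upgrades/0007_add_index.sql applies as version 7
Dir.glob('upgrades/*.sql').sort.each do |path|
  version = File.basename(path).to_i
  next unless version > current
  conn.exec(File.read(path))
  conn.exec_params('UPDATE schema_version SET version = $1', [version])
end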
With a grails app, the indexes are stored in source control by default, since you define the index inside the file that represents your domain object. Just offering the 'Grails' perspective as an FYI.