We're making an application using Ruby on Rails with Mongoid.
I've been reading the MongoDB and Mongoid gem documentation on how to design a database schema for a document-based database. I've been going over it in my head for a while now, and would really appreciate some input from people with more experience than me to know whether or not I'm completely lost :p
Anyway, here's an overview of the application we're making (I've tried to keep it as simple as possible to keep the question factual):
The application consists of the following entities:
Users, Subjects, Skills, Tasks, Hints and Tutorials.
They are organized in the following manner:
Subjects consist of a set of 1..n Skills.
Skills consist of a set of Tasks, (sub-)Skills or both (i.e. skills can be a tree
structure, where one main skill (say, Geometry) is the root and other skills are
child nodes (for instance, the Pythagorean Rule might be a sub-skill)). However,
all skills, regardless of whether they have sub-skills or not, should consist of
0..n tasks.
Tasks have a set of 1..n Hints associated with them.
Hints are each associated with a particular task.
Tutorials are associated with 1..n Skills (each of these skills can be either a root
node or a leaf node in a skill tree).
Users can complete 0..n Tasks in order to complete 0..n Skills.
Now, we imagine that most queries will be reads: fetching the collection of skills/tasks completed by a given user, and displaying the various skill trees associated with a subject. The main writes will probably concern the relationship between users and tasks, in the following form:
User A completes Task B
and so forth. Also, we imagine that the number of entities of each kind would rank as follows: Users > Hints > Tasks > Skills > Tutorials > Subjects
Currently, this is the solution we have in mind:
Subject.rb
  has_and_belongs_to_many :skills
Skill.rb (uses Mongoid::Tree)
  has_and_belongs_to_many :subjects
  embeds_many :tasks
Task.rb
  embedded_in :skill, :inverse_of => :tasks
  embeds_many :hints
Hint.rb
  embedded_in :task, :inverse_of => :hints
We haven't really started implementing the tutorials and the connection between user and skills/tasks yet, but we imagine that the relationship between user and skill/tasks necessarily has to be N:N (which I guess is rather inefficient).
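As a rough illustration of the user/task link (a sketch only, not something from the Mongoid docs), the plain-Ruby model below keeps a set of completed task ids on the user and derives skill completion from it. The Struct definitions, the sample skills and the rule "a skill is complete when every task in its subtree is complete" are all assumptions made for the example.

```ruby
require 'set'

# Hypothetical in-memory stand-ins for the documents; these are not Mongoid models.
Task  = Struct.new(:id, :name)
Skill = Struct.new(:name, :tasks, :children) do
  # All tasks of this skill plus the tasks of every sub-skill below it.
  def all_tasks
    tasks + children.flat_map(&:all_tasks)
  end
end

class User
  attr_reader :completed_task_ids

  def initialize
    @completed_task_ids = Set.new
  end

  # "User A completes Task B" becomes one write to the user's own record.
  def complete(task)
    @completed_task_ids << task.id
  end

  # Assumed rule: a skill is complete once every task in its subtree is done.
  def completed?(skill)
    skill.all_tasks.all? { |task| @completed_task_ids.include?(task.id) }
  end
end

pythagoras = Skill.new('Pythagorean Rule', [Task.new(1, 'Find the hypotenuse')], [])
geometry   = Skill.new('Geometry', [Task.new(2, 'Name the shapes')], [pythagoras])

user = User.new
user.complete(pythagoras.tasks.first)
```

With this layout the frequent write touches only the user, and the N:N relationship is reduced to a list of ids that can be checked against any skill tree.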
Is using a document-based database for this kind of application a bad idea? And if not, how can we improve our schema to make it as efficient as possible?
Cheers, sorry for the wall of text :-)
In my honest opinion, unless some part of your data is going to be completely unstructured, I see no need to use a NoSQL solution for your problem.
I'm assuming you have some database knowledge and are familiar with MySQL/PostgreSQL/etc.
I sincerely believe that PostgreSQL (or MySQL, for that matter) will be easier for you to set up, to write code against, to maintain and perhaps eventually to scale up.
My take on NoSQL is to use it when you have unstructured datasets and need the flexibility of adding fields on the fly. There are pitfalls to using MongoDB.
It's not a silver bullet.
Some of the pitfalls: in() queries are slow; if you write something to Mongo and then immediately want to read it back, you have to expect that you may not get it (sharding in Mongo); map-reduce is kind of a hassle (I haven't tried the aggregation framework with Moped, but it does look promising); and indexing can sometimes have issues with sharded collections.
I'm trying to figure out most of the database design and normalization before I do much for my current project. Unfortunately I don't have much experience with database design, so it's a fairly slow process. One of the issues I'm trying to figure out is what's the best way to deal with a situation where one table may, or may not, be associated with another table.
A little background will help clear the question: I'm building a web application, using Rails 3.2, that helps manage races. People will be able to create accounts (/user accounts), host races, and manage the various aspects.
One thing is that the participants in a given race may or may not be users. In fact, we can assume that most of them will not be users. But for those who are, it would be nice to be able to link to their profiles (and, going the other way, link from their profiles to the races they've participated in).
It's sort of like blog posts where people can post anonymous comments, but if they do decide to log in and use their account then it's linked with the post in various ways.
I've searched for a while, but haven't really found solutions. I figure the way to do it is to have the Participants model note "has_one UserParticipation", which would usually be nil.
Is that a valid solution?
Is there a better way to go about this?
Here's a small diagram I threw together in Paint to concisely show the issue:
Question 2:
This is a little less important, but I figured I'd ask it in the same question because I've already posted the relevant question: several things will reference participants, is there any reason to set up a composite {Race_ID, Participant_Number} super key rather than always reference it using "race.participants"? (As far as I can tell, these would work very similarly.)
You may be over thinking it a bit. If I am following you correctly, this is a simple entity relationship diagram I whipped up in Dia:
Some explanation of the associations between a User and Participants:
A Participant will have the belongs_to :user association, which is nil if there is no associated User.
A User will have the has_many :participants association, allowing zero or more Participant records. If there are none, user.participants will be an empty collection for that user instance.
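The behaviour of that optional link can be mimicked in plain Ruby. The sketch below is not ActiveRecord code; the Structs and the link_to helper are hypothetical stand-ins that only show the two states described above (a participant with no user, and a user with no participants).

```ruby
# Plain-Ruby stand-ins (hypothetical), imitating an optional belongs_to / has_many pair.
User = Struct.new(:name) do
  # Empty until a participant is linked, like user.participants with no rows.
  def participants
    @participants ||= []
  end
end

Participant = Struct.new(:name, :user) do
  # Link this participant to an account and register the reverse side as well.
  def link_to(account)
    self.user = account
    account.participants << self
  end
end

guest  = Participant.new('Anonymous runner') # user stays nil
member = Participant.new('Sam')
sam    = User.new('Sam')
member.link_to(sam)
```

Here guest.user is nil, member.user is sam, and a freshly created user reports an empty participants list.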
As to the second question, you would only need to use both keys if you are querying for a specific participant for specific race, e.g. where participant_id = 7 and race_id = 4.
So a race has many participants (some of whom are users), and a participant has many races (hopefully :-).
Taking the user part of things out of the picture for a moment, this is a simple many-to-many relationship, which Rails handles beautifully with has_and_belongs_to_many on both the Race and Participant models, described here: http://guides.rubyonrails.org/association_basics.html#the-has_and_belongs_to_many-association. An alternative is has_many :through, which creates a first-class model backed by the join table, but what you have described makes this unnecessary.
The relationship between User and participant is one-to-one, and conditional. It's not clear to me if you can be a user without being a participant but if you have a User who is a Participant, you want them related. This is a :has_one relation.
The cool part of Rails that I'll bet you're looking for is that relationships can be conditional, so in this case a Participant has_one User conditionally. The linked Rails Guide document describes how to define all of this.
Issue: Is there a better way to model the following or create a basic recommendation system than the database diagram below? If an extremely lengthy answer is necessary, you could instead just point me in the right direction and suggest things to research further.
I'm building a rudimentary event recommendation system by allowing users to answer questions and storing their user-answer relationship in the responses model. Each question's answer relates to a tag. Each event is also tagged. Thus, I should be able to provide users with recommended events via selecting matching tags and using this 5 table relationship. However, it seems like I would have to go through several has_many :through relationships in order to accomplish this, which I don't believe is preferred using Rails.
Would it be better to instead create a relationship from users to events via a background rake task or something, computing the relationship after questions are answered? Am I missing the concepts completely here and looking at this from the wrong angle? Eventually this system would be replaced with a more robust algorithm, perhaps using Mahout or something, but for now I'm just trying to get a simple proof of concept working.
Here's a link to the database diagram: Database Diagram
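The tag-matching idea can be tried out without any database. The following is a minimal sketch with made-up answers, tags and events (none of them come from the question); it ranks events by how many tags they share with the tags implied by a user's answers.

```ruby
# Hypothetical sample data; the names and tags are invented for illustration.
ANSWER_TAGS = {
  'likes_outdoors' => 'hiking',
  'likes_music'    => 'concert',
  'likes_food'     => 'tasting'
}.freeze

EVENTS = {
  'Trail Day'   => %w[hiking],
  'Jazz Picnic' => %w[concert tasting],
  'Board Games' => %w[indoor]
}.freeze

# Rank events by the number of tags they share with the user's answers.
def recommend(responses, answer_tags: ANSWER_TAGS, events: EVENTS)
  user_tags = responses.map { |answer| answer_tags[answer] }.compact.uniq
  scored = events.map { |name, tags| [name, (tags & user_tags).size] }
  scored.select { |_, score| score.positive? }
        .sort_by { |name, score| [-score, name] }
        .map(&:first)
end

recommend(%w[likes_music likes_food]) # => ["Jazz Picnic"]
```

The same computation could run in a background job after each set of answers and store the resulting user-to-event links, which is the precomputation option raised in the question.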
I'm working on a contact database in Rails 3.
One thing that's really frustrating is how ugly the family relationship code is.
Is there a clean way of doing this in Rails?
Basically all contacts are of the contact class (go figure!)
And contacts have many family_relationships (another model)
and many relatives through family_relationships. The family relationship model also has one family relationship type (another model).
So far I've implemented this using the methods here http://railscasts.com/episodes/163-self-referential-association (using inverse relationships etc.)
But this just doesn't feel very clean, and if I want to get all of a contact's relatives, relationships etc., I have to drop to raw SQL or join the arrays.
Is there a better (or definitive) way that this kind of thing is done in Rails?
The Ancestry gem seems like it solves exactly this kind of problem:
Ancestry is a gem/plugin that allows the records of a Ruby on Rails ActiveRecord model to be organised as a tree structure (or hierarchy). It uses a single, intuitively formatted database column, using a variation on the materialised path pattern. It exposes all the standard tree structure relations (ancestors, parent, root, children, siblings, descendants) and all of them can be fetched in a single sql query. Additional features are STI support, scopes, depth caching, depth constraints, easy migration from older plugins/gems, integrity checking, integrity restoration, arrangement of (sub)tree into hashes and different strategies for dealing with orphaned records.
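To show what the materialised path pattern mentioned above means, here is a small plain-Ruby sketch. It illustrates the general idea only and is not the Ancestry gem's code or API; the Node struct, the "/" separator and the helper names are assumptions for the example.

```ruby
# Illustration of the materialised path idea only; this is not the Ancestry gem's code.
# Each node stores the ids of its ancestors, root first, joined by "/".
Node = Struct.new(:id, :name, :path) do
  def ancestor_ids
    path.to_s.split('/').map(&:to_i)
  end

  # The path that a direct child of this node would store.
  def child_path
    [path, id].compact.join('/')
  end
end

# Direct children store exactly this node's child_path.
def children(nodes, node)
  nodes.select { |n| n.path == node.child_path }
end

# Descendants store a path that equals or extends this node's child_path.
def descendants(nodes, node)
  prefix = node.child_path
  nodes.select { |n| n.path == prefix || n.path.to_s.start_with?("#{prefix}/") }
end

grandparent = Node.new(1, 'Grandparent', nil)
parent      = Node.new(4, 'Parent', grandparent.child_path) # path "1"
child       = Node.new(7, 'Child', parent.child_path)       # path "1/4"
nodes = [grandparent, parent, child]
```

Because every lookup is an equality or prefix match on one column, each relation can be expressed as a single query, which is the property the gem description highlights.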
I'm looking for the best way (i.e. architecture) to have different kinds of DBs (MySQL + MongoDB) backing the same Rails app.
I was speculating about a main Rails 3.1 app mounting Rails 3.1 engines, each linked to a different kind of DB ...
... or having a main Rails 3.0.x app routing to a Sinatra endpoint for each MySQL/MongoDB instance ...
Do you think it's possible? Any ideas or suggestions?
I notice some other similar questions here, but I think that "mounting apps" is moving fast in Rails 3.1 / Rack / Sinatra and we all need to adjust our paradigms.
Thanks in advance
Luca G. Soave
There's no need to completely over-complicate things by running two apps just to have two types of database. It sounds like you need DataMapper. It'll do exactly what you need out of the box. Get the dm-rails gem to integrate it with Rails.
In DataMapper, unlike ActiveRecord, you have to provide all the details about your underlying data store: what fields it has, how they map the attributes in your models, what the table names are (if in a database), what backend it uses etc etc.
Read the documentation... there's a bucket-load of code to give you an idea.
Each model is just a plain old Ruby object. The class definition just mixes in DataMapper::Resource, which gives you access to all of the DataMapper functionality:
class User
  include DataMapper::Resource
  property :id, Serial
  property :username, String
  property :password_hash, String
  property :created_at, DateTime
end
You have a lot of control, however. For example, I can specify that this model is not stored in my default data store (repository) but in one of the other configured data stores (which can be a NoSQL store, if you like).
class User
  include DataMapper::Resource
  storage_names[:some_other_repo] = 'whatever'
  # ... SNIP ...
end
Mostly DM behaves like ActiveRecord on steroids. You get all the basics, like finding records (except you never have to use the original field names if your model abstracts them away):
new_users = User.all(:created_at.gte => 1.week.ago)
You get validations, you get observers, you get aggregate handling... then you get a bunch of other stuff, like strategic eager-loading (which solves the n+1 query problem), lazy loading of large text/blob fields, and multiple repository support. The query logic is much nicer than AR, in my opinion. Just have a read of the docs. They're human-friendly, not just an API reference.
What's the downside? Well, many gems don't take into account that you might not be using ActiveRecord, so there's a bit more searching to do when you need a gem for something. This will get better over time though, since before Rails 3.x seamlessly integrating DM with Rails wasn't so easy.
I don't fully understand your question. For example:
what problem are you facing right now using Mongo and MySQL in the same app, and
what's the reason for going with multiple Rails apps dealing with different DBs?
Though I'm not an expert in Ruby & Rails (I picked them up a few months ago), I'd like to add something here.
I am currently building a Rails app that uses both Mongo and MySQL in the back end, with Mongoid and ActiveRecord as the mapping layers. MySQL is for transactions and Mongo for all other kinds of data (mainly geospatial). It's straightforward. You can create different models, some including Mongoid::Document and others inheriting from ActiveRecord::Base.
class Item
  include Mongoid::Document
  field :name, :type => String
  field :category, :type => String
end
and
class User < ActiveRecord::Base
end
And you can query both in the same way (except for complex SQL joins; Mongoid also has some additional querying patterns for geospatial queries):
Item.where(:category => 'car').skip(0).limit(10)
User.where(:name => 'ram')
It's a breeze. But there are some important points you need to know:
Create your ActiveRecord models before the Mongoid models. Once Mongoid is activated (rails g mongoid:config adds mongoid.yml), all the scaffolding and generators work against MongoDB. Otherwise you need to delete mongoid.yml every time before creating ActiveRecord models.
And don't use Mongoid in a relational way. I know Mongoid provides a lot of options for defining relations. For example, belongs_to relations store the reference ids in the child documents, which is quite the opposite of the Mongo DBRef. It gets very confusing when you leave the Mongo idioms in favour of an ActiveRecord feel, so try to stick with its document nature. Use embedding and DBRef whenever necessary. (Maybe someone can correct me if I'm wrong.)
Still, Mongoid is a great piece of work. It's fully loaded with features.
I have a couple specific needs for my search and I'm interested to get people's opinions on what search approach makes the most sense. Based on my explanation below, would you recommend that I use basic sql queries? Or step up to a more advanced search solution, like Sphinx?
I have two models that I want to search in: products and varieties.
product has_many :varieties
variety belongs_to :product
I need my search to recognize the relationship between products and varieties. However, varieties do not have their own existence on the site. So, when a user searches for a variety that's in the system, I need the search to return the corresponding product page on which the variety resides.
For example, let's say that the product is ball and the variety is bouncy. If a user searches for 'bouncy', I want the search to return the ball/show view.
The other tweak involves the results. If there's only one result for a given search, I want to render the product/show page. However, if there are multiple results, I want to render the product/index page, displaying the multiple results. My dataset is a pretty limited universe, so I think it's going to be fairly common that we have only one result.
Those are my requirements. Can I satisfy these requirements with standard sql queries and conditions? Or would you recommend a more advanced search approach?
Thanks!
Either solution will satisfy your requirements, but standard SQL queries will only hold up if your dataset is small. In that case, a DB index on the searched columns is important. You could take a look at scoped_search, which I've used for small projects and which gets the job done.
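For a sense of how little logic the requirements in the question involve, here is a plain-Ruby sketch with an in-memory catalogue. The Product struct, the sample data and the search_products and view_for helpers are hypothetical; in a real app the matching would be a SQL query joining products to varieties.

```ruby
# Hypothetical in-memory catalogue standing in for the products and varieties tables.
Product = Struct.new(:name, :varieties)

CATALOGUE = [
  Product.new('ball', %w[bouncy foam]),
  Product.new('kite', %w[delta box]),
  Product.new('bat',  %w[foam wooden])
].freeze

# Match on the product name or any of its varieties, but always return products.
def search_products(term, catalogue = CATALOGUE)
  needle = term.downcase
  catalogue.select do |product|
    product.name.include?(needle) || product.varieties.any? { |v| v.include?(needle) }
  end
end

# A single hit goes straight to the show page; anything else goes to the index page.
def view_for(results)
  results.size == 1 ? [:show, results.first] : [:index, results]
end

view, payload = view_for(search_products('bouncy'))
```

Searching for 'bouncy' yields the show view for the ball product, while 'foam' matches two products and falls through to the index view.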
If you have a big dataset and plain SQL queries slow you down, sphinx (and thinking-sphinx) is the way to go. The only disadvantage of this approach is having to monitor and maintain another daemon, although it is very stable and lightweight. This solution is also very easy to implement, and there's a good community around thinking-sphinx.
Lastly, you may consider your database's full text search capabilities. If you are using PostgreSQL, tsearch is a great solution because it is very fast and built into your database process. There are a couple of Rails plugins for interacting with it: acts-as-tsearch and tsearchable. Try them out and see which one feels better to you.