Retrieving data across 5 tables in has_many :through

Issue: Is there a better way to model the following or create a basic recommendation system than the database diagram below? If an extremely lengthy answer is necessary, you could instead just point me in the right direction and suggest things to research further.
I'm building a rudimentary event recommendation system by allowing users to answer questions and storing their user-answer relationship in the responses model. Each question's answer relates to a tag. Each event is also tagged. Thus, I should be able to provide users with recommended events via selecting matching tags and using this 5 table relationship. However, it seems like I would have to go through several has_many :through relationships in order to accomplish this, which I don't believe is preferred using Rails.
Would it be better to instead create a relationship from users to events via a background rake task or something, computing the relationship after questions are answered? Am I missing the concepts completely here and looking at this from the wrong angle? Eventually this system would be replaced with a more robust algorithm, perhaps using Mahout or something, but for now I'm just trying to get a simple proof of concept working.
Here's a link to the database diagram: Database Diagram
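A minimal sketch of the tag-matching idea described above, written in plain Ruby rather than ActiveRecord so that it runs on its own. The row layouts, the sample data, and the `recommended_events` helper are all assumptions made for illustration; they are not taken from the linked diagram.

```ruby
require "set"

# Hypothetical rows for the chain user -> response -> answer -> tag -> event.
RESPONSES = [
  { user_id: 1, answer_id: 10 },
  { user_id: 1, answer_id: 11 },
  { user_id: 2, answer_id: 12 }
]
ANSWERS = [
  { id: 10, tag_id: 100 },
  { id: 11, tag_id: 101 },
  { id: 12, tag_id: 102 }
]
EVENT_TAGS = [
  { event_id: 1000, tag_id: 100 },
  { event_id: 1000, tag_id: 101 },
  { event_id: 1001, tag_id: 101 },
  { event_id: 1002, tag_id: 102 }
]

# Returns event ids, ordered by how many tags they share with the user's answers.
def recommended_events(user_id)
  answer_ids = RESPONSES.select { |r| r[:user_id] == user_id }.map { |r| r[:answer_id] }.to_set
  tag_ids    = ANSWERS.select { |a| answer_ids.include?(a[:id]) }.map { |a| a[:tag_id] }.to_set

  # Count matching tags per event.
  scores = Hash.new(0)
  EVENT_TAGS.each { |et| scores[et[:event_id]] += 1 if tag_ids.include?(et[:tag_id]) }

  # Highest score first; ties broken by event id for a stable order.
  scores.sort_by { |event_id, score| [-score, event_id] }.map(&:first)
end
```

The same chain can usually be expressed as a single SQL query with joins, so precomputing user-to-event links in a background task is probably worth considering only if that query turns out to be too slow.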

Related

Embeddable vs one to many

I have seen an article in Dzone regarding Post and Post Details (two different entities) and the relations between them. There the post and its details are in different tables. But as I see it, Post Detail is an embeddable part because it cannot be used without the "parent" Post. So what is the logic to separate it in another table?
Please give me a more clear explanation when to use which one?

Embeddable classes represent the state of their parent classes. So to take your example, a StackOverflow POST has an ID which is invariant and used in an unbreakable URL for sharing e.g. http://stackoverflow.com/q/44017535/146325. There are a series of other attributes (state, votes, etc) which are scalar properties. When the post gets edited we have various versions of the text (which are kept and visible to people with sufficient rep). Those are your POST DETAILS.
"what is the logic to separate it in another table?"
Because keeping different things in separate tables is what relational databases do. The standard way of representing this data model is a parent table POST and child table POST_DETAIL with a defined relationship enforced through a foreign key.
Embeddable is a concept from object-oriented programming. Oracle does support object-relational constructs in the database. So it would be possible to define a POST_DETAIL Type and create a POST Table which has a column declared as a nested table of that Type. However, that would be a bad design for two reasons:
The SQL for working with nested tables is clunky. For instance, to get the POST and the latest version of its text would require unnesting the collection of details every time we need to display it. Computationally not much different from joining to a child table and filtering on latest version flag, but harder to optimise.
Children can have children themselves. In the case of Posts, Tags are details because they can vary due to editing. But if you embed TAG in POST_DETAIL embedded in POST how easy would it be to find all the Posts with an [oracle] tag?
This is the difference between Object-Oriented design and relational design.
OO is strongly hierarchical: everything belongs to something, and the way to get the detail is through the parent. This approach works well when dealing with single instances of things, and so is appropriate for UI design.
Relational prioritises commonality: everything of the same type is grouped together with links to other things. This approach is suited for dealing with sets of things, and so is appropriate for data management tasks (do you want to find all the employees who work in BERLIN or whose job is ENGINEER or who are managed by ELLIOTT?)
"give me a more clear explanation when to use which one"
Always store the data relationally in separate tables. Build APIs using OO patterns when it makes sense to do so.
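To make the tag example concrete, here is a small Ruby sketch with made-up data (the row layouts and helper names are assumptions, not anything from the article). With tags stored as their own rows, finding every post with a given tag is a filter over one collection. With tags nested inside details nested inside posts, the same question means walking each post's hierarchy.

```ruby
# Relational shape: flat rows linked by ids (current tags only).
POST_TAGS = [
  { post_id: 1, tag: "oracle" },
  { post_id: 1, tag: "sql" },
  { post_id: 2, tag: "java" },
  { post_id: 3, tag: "oracle" }
]

# Embedded shape: tags live inside details, which live inside posts.
NESTED_POSTS = [
  { id: 1, details: [{ version: 1, tags: ["sql"] }, { version: 2, tags: ["oracle", "sql"] }] },
  { id: 2, details: [{ version: 1, tags: ["java"] }] },
  { id: 3, details: [{ version: 1, tags: ["oracle"] }] }
]

# Set-oriented lookup: filter the tag rows directly.
def post_ids_with_tag_flat(tag)
  POST_TAGS.select { |row| row[:tag] == tag }.map { |row| row[:post_id] }.uniq
end

# Hierarchical lookup: visit every post, pick its latest detail, then inspect its tags.
def post_ids_with_tag_nested(tag)
  NESTED_POSTS.select { |post|
    latest = post[:details].max_by { |d| d[:version] }
    latest[:tags].include?(tag)
  }.map { |post| post[:id] }
end
```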

Mongoid database schema

We're making an application using Ruby on Rails with Mongoid.
I've been reading up on the documentation of MongoDB and the Mongoid gem regarding how to design the database schema when utilizing a document-based database. I've been going over it in my head for a while now, and would really appreciate some input from people with more experience than me to know whether or not I'm completely lost :p
Anywho, here's an overview (I've tried to keep it as simple as possible to keep the question factual) of the application we're making:
The application consists of the following entities:
Users, Subjects, Skills, Tasks, Hints and Tutorials.
They are organized in the following manner:
Subjects consist of a set of 1..n Skills.
Skills consist of a set of Tasks, (sub-)Skills or both (i.e. skills can form a tree structure, where one main skill (say, Geometry) is the root and other skills are child nodes (for instance, the Pythagorean Rule might be a sub-skill)). However, all skills, regardless of whether they have sub-skills or not, should consist of 0..n tasks.
Tasks have a set of 1..n Hints associated with them.
Hints are each associated with a particular task.
Tutorials are associated with 1..n Skills (each such skill can be either a root node or a leaf node in a skill tree).
Users can complete 0..n Tasks in order to complete 0..n Skills.
Now, we imagine that there will mostly be read queries called to the database for the collection of skills/tasks completed by certain users, and read queries to display the various skill trees associated with a subject. The main write queries will probably be related to relationship between various users and tasks, in the following form
User A completes Task B
and so forth. Also, we imagine that the relative number of entities would be as follows: Users > Hints > Tasks > Skills > Tutorials > Subjects
Currently, this is the solution we have in mind:
Subject.rb
  has_and_belongs_to_many :skills
Skill.rb (uses Mongoid::Tree)
  has_and_belongs_to_many :subjects
  embeds_many :tasks
Task.rb
  embedded_in :skill, :inverse_of => :tasks
  embeds_many :hints
Hint.rb
  embedded_in :task, :inverse_of => :hints
We haven't really started implementing the tutorials and the connection between user and skills/tasks yet, but we imagine that the relationship between user and skill/tasks necessarily has to be N:N (which I guess is rather inefficient).
Is using a document-based database for this kind of application a bad idea? And if not, how can we improve our schema to make it as efficient as possible?
Cheers, sorry for the wall of text :-)

In my honest opinion, unless some part of your data is going to be completely unstructured, I see no need to use a NoSQL solution for your problem.
I'm assuming you have some database knowledge, so you are familiar with MySQL, PostgreSQL, etc.
I sincerely believe that PostgreSQL (or MySQL, for that matter) will be easier for you to set up, write code against, maintain and perhaps eventually scale up.
My take on NoSQL is: use it when you have unstructured datasets and need the flexibility of adding fields on the fly. There are pitfalls to using MongoDB.
It's not a silver bullet.
Some of the pitfalls, for instance: in()s are slow; if you write something to Mongo and then want to read it immediately, you have to expect that you won't get it (sharding in Mongo); and map reduce is kind of a hassle (I haven't tried the aggregation framework with Moped, but it does look promising). Indexing can also sometimes have issues with sharded collections.

The Rails way for databases: person may_have (has_one ?) account

I'm trying to figure out most of the database design and normalization before I do much for my current project. Unfortunately I don't have much experience with database design, so it's a fairly slow process. One of the issues I'm trying to figure out is what's the best way to deal with a situation where one table may, or may not, be associated with another table.
A little background will help clear the question: I'm building a web application, using Rails 3.2, that helps manage races. People will be able to create accounts (/user accounts), host races, and manage the various aspects.
One thing is that the participants in a given race may or may not be users. In fact, we can assume that most of them will not be users. But for those who are, it would be nice to be able to link to their profiles (and, going the other way, link from their profiles to the races they've participated in).
It's sort of like blog posts where people can post anonymous comments, but if they do decide to log in and use their account then it's linked with the post in various ways.
I've searched for a while, but haven't really found solutions. I figure the way to do it is to have the Participants model note "has_one UserParticipation", which would usually be nil.
Is that a valid solution?
Is there a better way to go about this?
Here's a small diagram I threw together in Paint to concisely show the issue:
Question 2:
This is a little less important, but I figured I'd ask it in the same question because I've already posted the relevant question: several things will reference participants, is there any reason to set up a composite {Race_ID, Participant_Number} super key rather than always reference it using "race.participants"? (As far as I can tell, these would work very similarly.)

You may be overthinking it a bit. If I am following you correctly, this is a simple entity relationship diagram I whipped up in Dia:
Some explanation of the associations between a User and Participants:
A Participant will have the belongs_to :user association, which is nil if there is no associated User.
A User will have the has_many :participants association, allowing zero to many Participant relations. If there are none, user.participants will be an empty array for that user instance.
As to the second question, you would only need to use both keys if you are querying for a specific participant for specific race, e.g. where participant_id = 7 and race_id = 4.
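The optional link can be sketched in plain Ruby as a nullable `user_id` on each participant row. This is only an illustration of the idea, not Rails code: the `Participant` struct, its fields (including `number` for the per-race participant number), and both helpers are assumptions.

```ruby
# Hypothetical participant rows; user_id is nil for people without an account.
Participant = Struct.new(:id, :race_id, :number, :name, :user_id)

PARTICIPANTS = [
  Participant.new(1, 4, 1, "Ann", 42),  # linked to user 42
  Participant.new(2, 4, 2, "Bob", nil), # not a registered user
  Participant.new(3, 5, 1, "Ann", 42)   # same user, different race
]

# Plays the role of user.participants: an empty array when the user never raced.
def participants_for_user(user_id)
  PARTICIPANTS.select { |part| part.user_id == user_id }
end

# Lookup by race and participant number, the pair discussed in the second question.
def find_participant(race_id, number)
  PARTICIPANTS.find { |part| part.race_id == race_id && part.number == number }
end
```

Because each row already carries its own `id`, the race-plus-number pair is needed only when a query is phrased in those terms.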

So a race has many participants (some of whom are users), and a participant has many races (hopefully :-).
Taking the user part of things out of the picture for a moment, this is a simple many-to-many relationship which Rails handles beautifully with has_and_belongs_to_many on both Race and Participant models, described here http://guides.rubyonrails.org/association_basics.html#the-has_and_belongs_to_many-association. Another alternative, not necessary in your case, is has_many :through, which creates a first-class model backed by the join table. But what you have described makes this unnecessary.
The relationship between User and participant is one-to-one, and conditional. It's not clear to me if you can be a user without being a participant but if you have a User who is a Participant, you want them related. This is a :has_one relation.
The cool part of Rails that I'll bet you're looking for is that relationships can be conditional, so in this case a Participant has_one User conditionally. The linked Rails Guide document describes how to define all of this.

Following functionality - database design problem - Rails 3

I need following functionality in my app (Twitter-like): one user can follow another user.
I have a User model, and I tried a self-referential many-to-many relation, but I don't know how to implement this in my model.
Can you give me an example of how to do this?

Michael Hartl's tutorial has an entire section on follower relationships. I recommend reading it to get a better understanding of self many-to-many relations. It helped me a lot:
http://ruby.railstutorial.org/chapters/following-users
You could also use a gem like acts_as_follower, which abstracts much of the design details out for you:
https://github.com/tcocca/acts_as_follower
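For orientation, a self many-to-many relation boils down to a join table whose two columns both point back at users. The sketch below models that in plain Ruby; the `FollowGraph` class and the follower/followed naming are assumptions for illustration, and it is not the API of the tutorial or of the gem.

```ruby
require "set"

# Each edge is a [follower_id, followed_id] pair, like one row of a join table.
class FollowGraph
  def initialize
    @edges = Set.new
  end

  # Returns true if a new edge was created, false for duplicates or self-follows.
  def follow(follower_id, followed_id)
    return false if follower_id == followed_id
    !@edges.add?([follower_id, followed_id]).nil?
  end

  # Returns true if an existing edge was removed.
  def unfollow(follower_id, followed_id)
    !@edges.delete?([follower_id, followed_id]).nil?
  end

  # Ids of the users that user_id follows.
  def following(user_id)
    @edges.select { |follower, _| follower == user_id }.map { |_, followed| followed }.sort
  end

  # Ids of the users who follow user_id.
  def followers(user_id)
    @edges.select { |_, followed| followed == user_id }.map { |follower, _| follower }.sort
  end
end
```

The point to notice is that "following" and "followers" read the same rows from opposite ends, which is why one join table is enough for both directions.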

How do I structure my database so that two tables that constitute the same "element" link to another?

I read up on database structuring and normalization and decided to remodel the database behind my learning thingie to reduce redundancy.
I have different types of entries that can be learned. Gap texts/cloze tests (one text, many gaps) and simple known-unknown (one question, one answer) types.
Now I'm in a bit of a pickle:
gaps need exactly the same columns in the user table as question-answer types
but they need fewer columns than question-answer types (all that info is in the clozetests table)
I'm wishing for a "magic" foreign key that can point both to the gap and the terms table. Of course their ids would overlap though. I don't like having both a term_id and gap_id in the user_terms, that seems inelegant (but is the most elegant I can come up with after googling for a while, not knowing what name this pickle goes by).
I don't want a user_gaps analogue to user_terms, because then I'd be in the same pickle when it comes to the table user_terms_answers.
I put up this cardboard cutout collage of my schema. I didn't remove the stuff that isn't relevant for this question, but I can do that if anyone's confusion can be remedied like that. I think it looks super tidy already. Tidier than my mental concept of this at least.
Did I say any help would be greatly appreciated? Answerers might find themselves adulated for their wisdom.
Background story if you care, it's not really relevant to the question.
Before remodeling I had them all in one table (because I added the gap texts in a hurry), so that the gap texts were "normal" items without answers, while the gaps were items without questions. The application linked them together.
Edit
I added an answer after SO coughed up some helpful posts. I'm not yet 100% satisfied. I'm now trying to write views for common queries against this setup, and again I feel like I'll have to pull in application logic for something that is database turf.

As mentioned in the comment, it is hard to answer without knowing the whole story. So, here is a story and a model to match. See if you can adapt this to your example.
A school of (foreign) languages offers exams for several levels of language proficiency. The school maintains many pre-made tests for each level of each language (LangLevelTestNo).
Each test contains several (many) questions. Each question can be simple or of the cloze-text type. Correct answers are stored for each simple question. Correct terms are stored for each gap of each cloze-text question.
A student can take an exam for a language level and is presented with one of the pre-made tests. For each student exam, an exam form is maintained which stores the student's answers for each question of the exam. Like a question, an answer may be simple or of the cloze-text type.

After I edited my question, Stack Overflow started suggesting the right related questions to me.
I knew this was a common problem, but I really couldn't find it, just couldn't come up with the right search terms, I guess.
The following threads address similar problems and I'll try to apply that logic to my own design. They all propose adding a higher-level entity, such as items, that covers (in my case) terms and gaps. That makes sense and reflects the logic behind my application.
Relation Database Design
Foreign Key on multiple columns in one of several tables
Foreign Key refering to primary key across multiple tables
And this good person illustrates how to retrieve the data once it's broken up across tables. He also clued me in to the keyword class table inheritance, so now I know what to google.
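A rough sketch of what class table inheritance could look like for this case, in plain Ruby with in-memory stand-ins for tables. The `items` supertype, the column names, and the sample rows are assumptions for illustration and not the actual edited schema.

```ruby
# Supertype table: every learnable thing gets exactly one row and one id.
ITEMS = [
  { id: 1, kind: "term" },
  { id: 2, kind: "gap" },
  { id: 3, kind: "gap" }
]

# Subtype tables reuse the supertype's id as their own key, so ids never overlap.
TERMS = { 1 => { question: "chien", answer: "dog" } }
GAPS  = {
  2 => { clozetest_id: 7, answer: "went" },
  3 => { clozetest_id: 7, answer: "home" }
}

# Per-user learning state needs a single foreign key: item_id.
USER_ITEMS = [
  { user_id: 5, item_id: 1, correct: 3 },
  { user_id: 5, item_id: 2, correct: 0 }
]

# Joins a supertype row with its subtype row; returns nil for unknown ids.
def item_details(item_id)
  item = ITEMS.find { |i| i[:id] == item_id }
  return nil unless item
  subtype = item[:kind] == "term" ? TERMS : GAPS
  item.merge(subtype.fetch(item_id))
end

# Everything a user is learning, regardless of whether it is a term or a gap.
def items_for_user(user_id)
  USER_ITEMS.select { |ui| ui[:user_id] == user_id }
            .map { |ui| item_details(ui[:item_id]).merge(correct: ui[:correct]) }
end
```

With this shape, a table such as user_terms_answers could also reference `item_id` alone, instead of carrying both a term_id and a gap_id.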
I'll post back with my edited schema once I've applied this. It does seem more elegant like this.
Edited schema
