Let's say I have a Rails 3 application with the following model associations:
user
belongs_to :group
item
belongs_to :group
belongs_to :user
If code is not carefully written, this can result in data discrepancies where:
item.group
and
item.user.group
no longer return the same group, when they should. An item should only ever belong to one group.
My understanding is that this duplicate association may have been created to make querying simpler (reduce the number of tables joined).
So my question is: is this just an outright terrible practice, or is it a question of valid trade-offs, where the data and association duplication are acceptable because querying becomes simpler with fewer joins?
UPDATE
So far it seems like the answer is "trade-offs" and not "bad practice/code smell".
There seem to be multiple ways this can be handled, each with its own mix of constraints, advantages, disadvantages, and use cases:
1) denormalized, duplicated data as above
2) item has_one :group, :through => :user
3) item delegate :group, :to => :user
I'm trying to understand the differences between approaches #2 and #3. After experimenting with both in the console, it seems the queries Rails produces when item.group is called are different: (2) produces a single query that joins groups and users, while (3) produces two queries, first to find the user and then to find the group based on the user.
I think this is a question of valid trade-offs. Strictly speaking, in a fully normalized database your items table wouldn't have a group column, instead it would always go through the users table to find the group. That has the least amount of duplication, and thus the highest data integrity, but at the cost of doing that extra join every time you want to find an item's group. I'm assuming that a user also only belongs to one group. If a user can belong to many groups, then I think you would have to have that items.group_id column to know to which of those groups an item belongs.
If you want the faster query performance on lookup, you can keep the extra association as you have it, and add a before_* hook or validation that checks item.group_id == item.user.group_id and raises a validation error if they don't match. This would make validating/inserting slightly slower, but it would protect your data integrity and still give you slightly better performance when reading from the database.
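The consistency check described above can be sketched in plain Ruby. This is only an illustration of the comparison, not Rails code: User and Item are hypothetical Struct stand-ins for the models, and group_consistent? and sync_group! are invented helper names. In a real app the same logic would presumably live in a validation or a before_save callback on Item.

```ruby
# Plain-Ruby stand-ins for the models (hypothetical, not ActiveRecord).
User = Struct.new(:id, :group_id)

Item = Struct.new(:id, :group_id, :user) do
  # True when the denormalized group_id agrees with the owner's group.
  def group_consistent?
    user.nil? || group_id == user.group_id
  end

  # Copy the group from the user, as a before_save hook might do.
  def sync_group!
    self.group_id = user.group_id if user
    self
  end
end

user = User.new(1, 10)
good = Item.new(1, 10, user)
bad  = Item.new(2, 99, user)

puts good.group_consistent?             # true
puts bad.group_consistent?              # false
puts bad.sync_group!.group_consistent?  # true
```

Rejecting the mismatch (the validation approach) surfaces the bug in the calling code, while silently syncing hides it; which one fits depends on whether a mismatch is ever expected.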
Related
I am trying to use where query with relationships.
How can I query using where with relations in this case?
This is model
User
has_many :projects
has_many :reasons, through: :projects
Project
belongs_to :user
has_many :reasons
Reason
belongs_to :project
This is the code that doesn't work
# GET /reasons
def index
reasons = current_user.reasons
updated_at = params[:updated_at]
# Filter with updated_at for reloading from mobile app
if updated_at.present?
# This one doesn't work!!!!!!!!!!!!
reasons = reasons.includes(:projects).where("updated_at > ?", Time.at(updated_at.to_i))
# Get all non deleted objects when logging in from mobile app
else
reasons = reasons.where(deleted: false)
end
render json: reasons
end
---Update---
This is correct, thanks to @AmitA.
reasons = reasons.joins(:project).where("projects.updated_at > ?", Time.at(updated_at.to_i))
If you want to query all reasons whose projects have some constraints, you need to use joins instead of includes:
reasons = reasons.joins(:project).where("projects.updated_at > ?", Time.at(updated_at.to_i))
Note that when both includes and joins receive a symbol they look for association with that precise name. That's why you can't actually do includes(:projects), but must do includes(:project) or joins(:project).
Also note that the constraints on joined tables specified by where must refer to the table name, not the association name. That's why I used projects.updated_at (in plural) rather than anything else. In other words, when calling the where method you are in "SQL domain".
There is a difference between includes and joins. includes runs a separate query to load the dependents, and then populates them into the fetched active record objects. So:
reasons = Reason.where('id IN (1, 2, 3)').includes(:project)
Will do the following:
Run the query SELECT * FROM reasons WHERE id IN (1,2,3), and construct the ActiveRecord objects Reason for each record.
Look into each reason fetched and extract its project_id. Let's say these are 11,12,13. Then run the query SELECT * FROM projects WHERE id IN (11,12,13) and construct the ActiveRecord objects Project for each record.
Pre-populate the project association of each Reason ActiveRecord object fetched in step 1.
The last step above means you can then safely do:
reasons.first.project
And no query will be initiated to fetch the project of the first reason. This is why includes is used to solve N+1 queries. However, note that no JOIN clause appears in the SQL: these are separate queries. So you cannot add SQL constraints on the included table when you use includes on its own.
That's where joins comes in. It simply joins the tables so that you can add where constraints on the joined tables. However, it does not pre-populate the associations for you. In fact, Reason.joins(:project) will never instantiate Project ActiveRecord objects.
If you want to do both joins and includes, you can use a third method called eager_load. You can read more about the differences here.
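To make the three includes steps concrete, here is a plain-Ruby sketch that imitates them with in-memory arrays instead of a database. Everything in it (the REASONS and PROJECTS rows, fetch_reasons, fetch_projects, reasons_with_projects) is made up for illustration and is not how ActiveRecord is implemented internally.

```ruby
# In-memory stand-ins for the two tables (made-up rows).
REASONS = [
  { id: 1, project_id: 11 },
  { id: 2, project_id: 12 },
  { id: 3, project_id: 11 },
]
PROJECTS = [
  { id: 11, name: "alpha" },
  { id: 12, name: "beta" },
  { id: 13, name: "gamma" },
]

# Step 1: SELECT * FROM reasons WHERE id IN (...)
def fetch_reasons(ids)
  REASONS.select { |r| ids.include?(r[:id]) }.map(&:dup)
end

# Step 2: SELECT * FROM projects WHERE id IN (...), one query for all reasons
def fetch_projects(ids)
  PROJECTS.select { |p| ids.include?(p[:id]) }
end

def reasons_with_projects(ids)
  reasons     = fetch_reasons(ids)
  project_ids = reasons.map { |r| r[:project_id] }.uniq
  by_id       = fetch_projects(project_ids).each_with_object({}) { |p, h| h[p[:id]] = p }
  # Step 3: attach each project to its reason so no further lookup is needed
  reasons.each { |r| r[:project] = by_id[r[:project_id]] }
end

loaded = reasons_with_projects([1, 2, 3])
puts loaded.first[:project][:name]   # alpha
```

Note that the project lookup happens once for the whole batch, which is the point of includes: two queries in total instead of one per reason.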
Or: "What happens if you decide wrong"
Doc says:
You should use has_many :through if you need validations, callbacks,
or extra attributes on the join model.
But should the "you should use" be a "you must use"?
The point is, we have one attribute in a join table, and this one is heavily discussed.
So what happens if I (we) decide to use the simpler HABTM and in one year the friendly attribute pops up? Is it possible to access it (more complex, OK), or do we have to go back to the start and redesign?
All the answers (and there are a lot of them) to "HABTM or :through" are more or less easy decisions: "take this or the other".
I want to know how to correct the error if we decide wrong.
Is it, e.g., possible to "push a model in between", or how do we access this one attribute if it pops up?
Or is the better strategy to start with :through, just to be safe?
Yes, it's possible to convert a HABTM into a HMT.
The join table follows a naming convention of table + table in lexical order like "developers_projects".
Later, if you want to make the relationship HMT just create a model called DevelopersProject and use that as the join table. It's the same table. But you can then use migrations to add fields, and use the model to add validations, etc. etc.
The advantage of starting with HMT is that you get to call the join table whatever you want, but that's not hugely important.
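As a rough illustration of the naming convention mentioned above, the following plain-Ruby helper derives a join table name from two table names. It is a simplified sketch under the assumption that plain lexical sorting is enough; habtm_join_table is a hypothetical name, and Rails' own rule has edge cases that this ignores.

```ruby
# Simplified version of the HABTM naming convention: the two table names
# joined by an underscore in lexical order. Rails' real rule has extra
# edge cases that this sketch ignores.
def habtm_join_table(table_a, table_b)
  [table_a, table_b].sort.join("_")
end

puts habtm_join_table("projects", "developers")   # developers_projects
puts habtm_join_table("posts", "categories")      # categories_posts
```

A model class named DevelopersProject pluralizes to the same developers_projects table name, which is why it can be pointed at the existing join table.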
I'm working on a Rails 3 application that has (for the sake of this question) Posts linked to multiple Categories and vice versa through has_and_belongs_to_many associations:
class Post < ActiveRecord::Base
  has_and_belongs_to_many :categories
end
class Category < ActiveRecord::Base
  has_and_belongs_to_many :posts
end
I'm trying to figure out how to write an ActiveRecord (or ARel) finder that retrieves all Posts where each Post is linked to both of two Categories. I understand the SQL I'm ultimately trying to generate (two INNER JOINS with aliases to be able to distinguish each one for the matching on each of the two Categories), but so far I haven't figured out a way to create the query without resorting to raw SQL bits.
The reason avoiding custom SQL is so important in this case is that the code I'm writing is generic and heavily data-driven, and it needs to mix with other filtering (and sorting) qualifiers on the query for Post objects, so I can't just hard code either method calls on Post (e.g. to access the collection of Categories) or custom SQL that might not mix well with the SQL generated by the other filters.
I'm open to switching to using a join model (has_many :through) if that somehow makes things easier, or even looking at other ORM options (DataMapper, Mongoid, etc.), but that seems like a huge change just to get something so basic working.
I'm stunned that this isn't easier/more obvious in ActiveRecord/ARel, but maybe I just don't know the magic keywords to search to find the answer. The documentation for ARel is also surprisingly slim, so I'm at a loss. Any help would be much appreciated!
After quite a bit more Googling, I found these two articles (from 2006!) that eventually led me to the correct answer in ActiveRecord/ARel:
http://blog.hasmanythrough.com/2006/6/12/when-associations-arent-enough
http://blog.hasmanythrough.com/2006/6/12/when-associations-arent-enough-part-2
The (nearly) SQL-free code that produces what I'm looking for is a pretty clever use of the GROUP BY and HAVING operators in SQL:
Post.joins(:categories).where("categories.name" => ['catA','catB']).group('posts.id').having('COUNT(posts.id) = 2')
Basically, it finds all Posts associated with any of the given Categories (including duplicates if, as we hope, there are Posts that match multiple Categories), groups that list by the id field on Post, then trims the results down to only include groups that have exactly the number of matches we want.
I haven't tried mixing this code with my filters on other fields on Post, but I'm pretty sure it'll work.
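For anyone who wants to see why the GROUP BY / HAVING trick works, here is the same filter, group, and count logic replayed in plain Ruby over a made-up list of join rows. JOIN_ROWS and posts_in_all are invented for this sketch; the real query runs in the database, not in Ruby.

```ruby
# Made-up rows of the posts/categories join: [post_id, category_name].
JOIN_ROWS = [
  [1, "catA"], [1, "catB"],
  [2, "catA"],
  [3, "catB"], [3, "catC"],
  [4, "catA"], [4, "catB"], [4, "catC"],
]

def posts_in_all(rows, wanted)
  rows
    .select { |_, cat| wanted.include?(cat) }         # WHERE categories.name IN (...)
    .group_by { |post_id, _| post_id }                # GROUP BY posts.id
    .select { |_, group| group.size == wanted.size }  # HAVING COUNT(posts.id) = n
    .keys
end

p posts_in_all(JOIN_ROWS, ["catA", "catB"])   # [1, 4]
```

The count in the HAVING clause has to equal the number of categories asked for, and the approach assumes the join table never links the same post to the same category twice, since a duplicate row would inflate the count.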
I'm doing my first rails(3) application.
Associations don't make sense. First, even the rails guides don't
really explain what they do, they just explain how to use them.
From what I gather, associations do two things:
a) Allow ActiveRecord to optimize the structure of the database.
b) Allow ActiveRecord to offer an alternate ruby syntax for
joins and the like (SQL queries). I want this.
I'm trying to understand associations, and how to properly use them. Based
on the example below, it seems like associations are 'broken' or at least
the documentation is.
Consider a trivial version of my application. A teacher modifying wordlists
for study.
There are 3 relevant tables for this discussion. For clarity, I've simply
included the annotate(1) tool's definition of the table, and removed
unnecessary fields/columns.
A wordlist management table:
Table name: wordlist_mgmnt_records
id :integer not null, primary key
byline_id :integer(8) not null
A table that maps words to a word list:
Table name: wordlists
wordlist_mgmnt_id :integer not null
word_id :integer not null
We don't actually care about the words themselves. But we do care about
the last table, the bylines:
Table name: bylines
id :integer(8) not null, primary key
teacher_id :integer not null
comment :text not null
Bylines record who, what tool was used, where, when, etc. Bylines are
mainly used to troubleshoot what happened so I can explain to users what
they should have done (and/or repair their mistakes).
A teacher may modify one or more word list management records at a time
(aka single byline). Said another way, a single change may update multiple
word lists.
For wordlist_mgmnt_records the associations would be:
has_many :bylines # the same byline id can exist
# in many wordlist_mgmnt_records
But what's the corresponding entry for bylines?
The Beginning Rails 3 (Carneiro, et al) book says:
"Note: For has_one and has_many associations, adding a belongs_to
on the other side of the association is always recommended. The
rule of thumb is that the belongs_to declaration always goes in
the class with the foreign key."
[ Yes, I've also looked at the online rails guide(s) for this. Didn't
help. ]
For the bylines table/class do I really want to say?
belongs_to :wordlist_mgmnt_records
That really doesn't make sense. The bylines table basically belongs_to
every table in the database with a byline_id. So would I really say
belongs_to all of them? Wouldn't that set up foreign keys in all of the
other tables? That in turn would make changes more expensive (too many
CPU cycles) than I really want. Some changes hit lots of tables, some of
them very large. I prize speed in normal use, and am willing to wait to
find bylines without foreign keys when using bylines for cleanup/repair.
Which brings us full circle. What are associations really doing in rails,
and how does one use them intelligently?
Just using associations because you can doesn't seem to be the right
answer, but how do you get the added join syntax otherwise?
I'll try to help your confusion....
A byline can have multiple wordlist_mgmnt_records, so defining the has_many there seems to make sense.
I'm not sure I understand your confusion in the other direction. Since you have defined the attribute wordlist_mgmnt_records.byline_id, any given wordlist_mgmnt_record can only 'have' (belongs_to) a single byline. You're simply defining the crow's foot via Ruby (if you like database diagrams):
wordlist_mgmnt_records (many)>>----------(one) byline
Or read in English: "One byline can have many wordlist_mgmnt_records, and many individual wordlist_mgmnt_records can belong to a single byline"
Adding the belongs_to definition to the wordlist_mgmnt model doesn't affect the performance of the queries, it just lets you do things like:
@record = WordlistMgmntRecord.find(8)
@record_byline = @record.byline
Additionally you're able to do joins on tables like:
@records = WordlistMgmntRecord.joins(:byline).where({:bylines => {:teacher_id => current_user.id}})
Which will execute this SQL:
SELECT wordlist_mgmnt_records.*
FROM wordlist_mgmnt_records
INNER JOIN bylines
ON wordlist_mgmnt_records.byline_id = bylines.id
WHERE bylines.teacher_id = 25
(Assuming current_user.id returned 25)
This is based on your current DB design. If you find that there's a way to implement the functionality you want without having byline_id as a foreign key in the wordlist_mgmnt_records table, then you would modify your models to accommodate it. However, this seems to be how a normalized database should look, and I'm not really sure how else you would do it.
I was asked to build a reporting (logging) service. The employer has a web application (a dynamic website written in PHP) installed locally at many companies. The web app is a kind of survey. All data is saved in a local database, but the new requirement is that this data (the survey results) also be sent to a central server after every form submit.
There are four types of surveys. They are organised so that there are many Projects, each Project can have only one Survey of each type (STI here?), and each Survey belongs to one Project. Each Survey will receive reports from the local app, so it will have many Reports. The Rails 3 app that logs these reports should somehow mimic this logic. First question: does this AR structure make sense to you?
Project-1--------1-Survey-1-------*-Report
Project
has_one :survey
has_many :reports, :through => :survey
Survey
belongs_to :project
has_many :reports
Report
belongs_to :survey
Second question is about having multiple tables for one AR Model. If all data will be stored in reports table, the table will become huge very quickly, and efficient querying for reports that belong to specific survey might be a problem after some time. Maybe it would be better to have separate tables for each Survey? Like reports_<survey_id>. Is this possible?
Also, I am somehow forced to use MySQL, but if there is another, much better solution for this, I could try to push it through.
If you are still here, thank you for reading this :)
Second question is about having multiple tables for one AR Model. If all data will be stored in reports table, the table will become huge very quickly, and efficient querying for reports that belong to specific survey might be a problem after some time. Maybe it would be better to have separate tables for each Survey? Like reports_<survey_id>. Is this possible?
Yes, it is possible.
You can do it this way:
class Report < ActiveRecord::Base
  # self. is required; a bare assignment would only create a local variable
  self.table_name_suffix = ''
end
Report.table_name_suffix = 'foo'
UPDATE
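# simple example of sharding a model across per-survey tables
# (not thread-safe: the suffix is global state on Report)
class Survey < ActiveRecord::Base
  has_many :reports
  def shard_reports
    Report.table_name_suffix = "_survey_#{id}"
    Report.reset_table_name   # the computed table name is cached, so recompute it
    reports.to_a              # load now, while the suffix is in effect
  ensure
    Report.table_name_suffix = ""
    Report.reset_table_name
  end
end
Survey.first.shard_reports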
Put an index on each of your foreign keys (e.g. Reports.survey_id) and take a breather. You're worrying entirely too much about the performance right now. You will need at least millions of records in your Reports table before you will see any performance problems from MySQL.