How to prevent rails `has_many` relation joining two huge tables - sql

I am using Ruby on Rails 3.1.10 in developing a web application.
Objective is to find all users that a user is following.
Let there be two models User and Following
In User model:
has_many :following_users, :through => :followings
When calling user.following_users, Rails by default generates a query that INNER JOINs the users and followings tables.
When the users table has over 50,000 records and the followings table has over 10,000,000 records, the generated inner join is resource-intensive.
Any thoughts on how to optimize the performance by avoiding inner joining two big tables?

To avoid a single query with inner join, you can do 2 select queries by using the following method
# User.rb
# assuming that Following has a followed_id column for the user that is being followed
def following_users_nojoin
  # memoize the relation so repeated calls reuse it
  @following_users_nojoin ||= User.where("id IN (?)", followings.map(&:followed_id))
end
This will not generate a join, but it will make two SQL queries: one to get all the followings that belong to the user (unless they are already cached), and a second to find all the followed users. A user_id index on followings, as suggested in the comment, would speed up the first query, where we get all the followings for the user.
The above method would be faster than a single join query if the followings of a user have already been retrieved.
Read this for details on whether it is faster to make multiple select queries over a single query with join. The best way to find out which one is faster is to benchmark both methods on your production database.
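As a rough sketch of such a comparison, Ruby's standard Benchmark module can time both strategies side by side. The two methods below are stand-ins that work on in-memory arrays so the snippet runs on its own; in a real application their bodies would be replaced with calls such as user.following_users.to_a and user.following_users_nojoin.to_a. The data, the constant names, and the method names join_strategy and two_query_strategy are all illustrative assumptions.

```ruby
require "benchmark"

# Stand-in data (hypothetical): in a real app these would be database tables.
USERS      = (1..500).map { |id| { id: id, name: "user#{id}" } }.freeze
FOLLOWINGS = (2..100).map { |id| { user_id: 1, followed_id: id } }.freeze

# Stand-in for user.following_users.to_a (one query that joins both tables).
def join_strategy(user_id)
  USERS.select do |u|
    FOLLOWINGS.any? { |f| f[:user_id] == user_id && f[:followed_id] == u[:id] }
  end
end

# Stand-in for user.following_users_nojoin.to_a (two separate lookups).
def two_query_strategy(user_id)
  followed_ids = FOLLOWINGS.select { |f| f[:user_id] == user_id }
                           .map { |f| f[:followed_id] }
  USERS.select { |u| followed_ids.include?(u[:id]) }
end

# Run each strategy several times and record the total elapsed seconds.
RUNS = 10
TIMINGS = {
  join:        Benchmark.realtime { RUNS.times { join_strategy(1) } },
  two_queries: Benchmark.realtime { RUNS.times { two_query_strategy(1) } }
}.freeze

TIMINGS.each { |name, seconds| puts format("%-12s %.4fs", name, seconds) }
```

The in-memory timings say nothing about database behaviour; only the harness carries over, and the numbers become meaningful once the stand-ins are swapped for real queries against production-sized data.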

Related

How do I access the joined columns when using custom Arel joins?

I have a simple database with the following schema:
Book has many Tags through Taggings
Book has many Users through ReadingStatuses
What I want to do is to list all of the books, their tags, and a reading status of the currently logged in user with each book. I've managed to write this using Arel (with the arel-helpers gem), but I don't know how to access the results in each book entry while iterating over the books array.
Here's the query
join_params = Book.arel_table.join(ReadingStatus.arel_table, Arel::OuterJoin)
  .on(Book[:id].eq(ReadingStatus[:book_id])
    .and(ReadingStatus[:user_id].eq(User.first.id)))
  .join_sources
books = Book.all.includes(:tags).joins(join_params)
and the respective SQL it generates
SELECT "books".* FROM "books"
LEFT OUTER JOIN "reading_statuses"
ON "books"."id" = "reading_statuses"."book_id"
AND "reading_statuses"."user_id" = 'XXX'
There's nothing really to be done with the tags, since includes will automatically make everything work when calling book.tags, but what I don't know is how to access the ReadingStatus that is joined to each Book when iterating over the books result?
Try using includes instead of joins. includes does eager loading, but if you don't mind that, it might make your query look a lot simpler.
You will also not have to explicitly mention the left outer join.
See if that helps:
Pulling multiple levels of data efficiently in Rails
Rails: How to fetch records that are 2 'has_many' levels deep?
Eager loading
Make sure you include the call to references
"ReadingStatus[:user_id].eq(User.first.id)" can be shifted into the where clause

How to eager load associated table with conditions and current user?

I'm trying to reduce the number of queries from n+1 to a couple for hacks that have favorites.
User has many hacks. (the creator of the hack)
Hack has many favorites
User has many favorites
Favorites belongs to hack and user.
Favorites is just a join table. It only contains foreign keys, nothing else.
On the hacks index page, I display a star icon based on whether a favorite record exists for the given hack and current_user.
index.html
- @hacks.each do |h|
  - if h.favorites(current_user)
    #star image
How do I use active record or raw SQL in a '.SQL' method call to eager load the relevant favorites?
I'm thinking first get the paginated hacks, get the current user, and query all of the favorites that have the hack and the current user.
I'm not sure of the syntax. Also, current_user cannot be called from models.
Update:
I tried these queries
@hacks = Hack.includes(:tags, :users, :favorites).where("favorites.user_id = ?", current_user.id).references(:favorites)
I've also tried the following, and both of them without the .references method
@hacks = Hack.includes(:tags, :users, :favorites).where(favorites: { user_id: current_user.id } ).references(:favorites)
The WHERE clause acts on hacks to limit the type of hacks.
But what I want is to limit the favorites to the ones with the condition instead (i.e. favorites.user_id = current_user.id).
(After all, many thousands of users may favorite a hack, but the client side is only concerned with the current user, so loading all the users that favorited the hack could be very expensive.)
How can I make conditions apply to eager loaded associations rather than the original model?
If you have belongs_to :user and belongs_to :hack, would it be safe to assume that you have a "has many through" relationship set up between favorites, hack and user? Following is how you could eager load Hack with Favorites and Users.
# app/controllers/hacks_controller.rb
def index
  @hacks = Hack.includes(favorites: :user)
end
The above code is going to run three queries: one to select all hacks, a second to select the favorites belonging to those hacks, and a third to select the users referenced by those favorites.
Another way to execute these joins in a single query would be to use an inner join using joins:
# app/controllers/hacks_controller.rb
def index
  @hacks = Hack.joins(favorites: :user)
end
With this query, there is only one query with inner join between the three tables, users, favorites and hacks.
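Neither variant restricts the loaded favorites to the current user. One possible approach, following the idea in the question, is a second query that fetches only the current user's favorites for the hacks on the page, followed by a set lookup in the view. The sketch below uses plain Ruby structs in place of ActiveRecord models so that it runs standalone; the helper name favorited_hack_ids and the query shown in the comment are assumptions, not code from the original post.

```ruby
require "set"

# Plain structs standing in for ActiveRecord models (illustrative only).
Hack     = Struct.new(:id, :title)
Favorite = Struct.new(:user_id, :hack_id)

# Returns the ids of the given hacks that the user has favorited.
# In a Rails app, `favorites` might come from a query along the lines of
#   Favorite.where(user_id: current_user.id, hack_id: hacks.map(&:id))
# (an assumption based on the schema described in the question).
def favorited_hack_ids(hacks, favorites, user_id)
  page_ids = hacks.map(&:id).to_set
  favorites
    .select { |f| f.user_id == user_id && page_ids.include?(f.hack_id) }
    .map(&:hack_id)
    .to_set
end

hacks     = [Hack.new(1, "first"), Hack.new(2, "second"), Hack.new(3, "third")]
favorites = [Favorite.new(10, 1), Favorite.new(10, 3), Favorite.new(11, 2)]

starred = favorited_hack_ids(hacks, favorites, 10)

# In the view, one set lookup per hack replaces one query per hack.
hacks.each do |hack|
  puts "#{hack.title}: #{starred.include?(hack.id) ? 'starred' : 'not starred'}"
end
```

This keeps the total at two queries per page regardless of how many users have favorited each hack, because only the current user's rows are ever fetched.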

How do I use "includes" in a Rails 3 AREL statement?

I am trying to see the SQL behind an AREL statement:
Brand.where(:subdomain => "coke").includes(:products).to_sql
brand has_many products, and product belongs_to brand.
However, the above statement yields only:
"SELECT `brands`.* FROM `brands` WHERE `brands`.`subdomain` = 'coke'"
Why don't I see the information for the products table in the SQL?
When you use an includes statement, Rails will generate a separate query per table (so if you include 2 other tables, there will be 3 queries total).
You can use a joins statement instead and it will lump it all into one query, however you may experience a performance hit. Also, if any of your where(...) conditions query against the included table, it will be lumped into one query.
See this other similar question for more information on Rails' behavior.

Can you perform a Query on a Table that exists but has no Model in Rails?

Rails auto-generated a join table in this relationship:
# User.rb
has_and_belongs_to_many :topics
# Topic.rb
has_and_belongs_to_many :users
I want to query the join table, topics_users, directly to obtain the IDs.
I feel this strategy would be the fastest way of getting the User IDs from the Topics, rather than looking up the join table, looking up the Users, and getting the IDs from the users.
If you really want to, you can execute manual SQL queries as follows:
ActiveRecord::Base.connection.execute("SELECT user_id FROM topics_users WHERE etc")

Is eager loading same as join fetch?

Is eager fetch same as join fetch?
I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?
How does Rails Active Record implement a join fetch of associations when it doesn't know the table's metadata beforehand (I mean the columns in the table)? Say, for example, I have
people - id, name
things - id, person_id, name
Person has a one-to-many relation with things. So how does it generate the query with all the column aliases, even though it cannot know them, when I do a join fetch on people?
An answer hasn't been accepted so I will try to answer your questions as I understand them:
"how does it know all the fields available in a table?"
It does a SQL query for every class that inherits from ActiveRecord::Base. If the class is 'Dog', it will do a query to find the column names of the table 'dogs'. In production mode it should only do this query once per run of the server -- in development mode it does it a lot. The query will differ depending on the database you use, and it is usually an expensive query.
"Say if i have a same name for column in a table and in an associated table how does it resolve this?"
If you are doing a join, it generates SQL using the table names as prefixes to avoid ambiguities. In fact, if you are doing a join in Rails and want to add a condition (using custom SQL) for name, but both the main table and the joined table have a name column, you need to specify the table name in your SQL. (e.g. Human.joins(:pets).where("humans.name = 'John'"))
"I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?"
Different Rails versions are different. I think that early versions did a single join query at all times. Later versions would sometimes do multiple queries and sometimes a single join query, based on the realization that a single join query isn't always as performant as multiple queries. I'm not sure of the exact logic that it uses to decide. Recently, in Rails 3, I am seeing multiple queries happening in my current codebase -- but maybe it sometimes does a join as well, I'm not sure.
It knows the columns through a type of reflection. Ruby is very flexible and allows you to build functionality that will be used/defined during runtime and doesn't need to be stated ahead of time. It learns the associated "person_id" column by interpreting the "belongs_to :person" and knowing that "person_id" is the field that would be associated and the table would be called "people".
If you do People.includes(:things) then it will generate 2 queries, 1 that gets the people and a second that gets the things that have a relation to the people that exist.
http://guides.rubyonrails.org/active_record_querying.html
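To make the two-query behaviour concrete, here is a small plain-Ruby model of what eager loading with separate queries amounts to: load the parents, collect their ids, load the children with an IN condition, and group them in memory. The arrays stand in for the people and things tables from the question; the method name preload_things and the logged SQL strings are illustrative assumptions, not the exact SQL that Rails emits.

```ruby
# In-memory stand-ins for the people and things tables (illustrative data).
PEOPLE = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }].freeze
THINGS = [
  { id: 1, person_id: 1, name: "bike" },
  { id: 2, person_id: 1, name: "lamp" },
  { id: 3, person_id: 2, name: "desk" }
].freeze

# Mimics the two-step load and returns [people_with_things, query_log].
def preload_things
  log = []

  # Query 1: fetch the parent rows.
  log << "SELECT people.* FROM people"
  people = PEOPLE.map(&:dup)

  # Query 2: fetch only the children that belong to those parents.
  ids = people.map { |person| person[:id] }
  log << "SELECT things.* FROM things WHERE things.person_id IN (#{ids.join(', ')})"
  things_by_person = THINGS.select { |thing| ids.include?(thing[:person_id]) }
                           .group_by { |thing| thing[:person_id] }

  # Attach the children in memory; no join and no column aliases are needed.
  people.each { |person| person[:things] = things_by_person.fetch(person[:id], []) }
  [people, log]
end

people, log = preload_things
log.each { |sql| puts sql }
people.each do |person|
  puts "#{person[:name]}: #{person[:things].map { |thing| thing[:name] }.join(', ')}"
end
```

Because each table is read by its own SELECT, column names never collide, which is why this strategy does not need the aliasing that a single join query requires.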