Rails 3: Turning sort_by into Lazy Loading

I have a series of users who each make numerous posts (and the posts receive "views"), and I have sorted them in various ways in the past.
For example, here are the 8 most viewed users:
@most_viewed = User.all.sort_by { |user| user.views }
And then in my model, user.rb file:
def views
  self.posts.sum(:views) || 0
end
This worked fine in the past.
The problem is that my sorting used the sort_by method, which doesn't play nice with the will_paginate gem because it returns an Array instead of a Relation.
How can I do the following things in Rails 3, Active Record Relation, Lazy Loading style? For example, as named scopes?
Users with at least one post
Users with the greatest number of total views on their posts
Users with the posts with the greatest number of views (not total)
Users with the most recent posts
Users with the most number of posts

You can still paginate an array, but you will have to require 'will_paginate/array'.
To do this, create a file (e.g. will_paginate_array_fix.rb) in your config/initializers directory, with this as its only line:
require 'will_paginate/array'
Restart your server, and you should be able to paginate plain Ruby arrays.
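With the initializer in place, the original sort_by result can feed will_paginate directly. A minimal sketch, assuming the usual page parameter from the controller:
@most_viewed = User.all.sort_by { |user| user.views }.reverse
@most_viewed = @most_viewed.paginate(:page => params[:page], :per_page => 8)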
You should use the order method. This performs the sorting in SQL, which gives better performance in the first place :)
User.order("users.views DESC")
or for ascending order
User.order("users.views ASC")
This will return a relation, so you can add other stuff such as where clauses, scoping, etc..
User.order("views DESC").where("name like 'r%'")
for an ordered list of users whose names start with 'r'.
See http://guides.rubyonrails.org/active_record_querying.html for the official explanation.
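To get the five lists from the question as relations, joins, group, and order can do each one in SQL. A hedged sketch in Rails 3 scope syntax, assuming posts has views and created_at columns (untested against the asker's schema):
class User < ActiveRecord::Base
  has_many :posts

  # Users with at least one post
  scope :with_posts, joins(:posts).group('users.id')

  # Users with the greatest number of total views on their posts
  scope :by_total_views, joins(:posts).group('users.id').order('SUM(posts.views) DESC')

  # Users with the single most-viewed post
  scope :by_top_post, joins(:posts).group('users.id').order('MAX(posts.views) DESC')

  # Users with the most recent posts
  scope :by_recent_post, joins(:posts).group('users.id').order('MAX(posts.created_at) DESC')

  # Users with the most posts
  scope :by_post_count, joins(:posts).group('users.id').order('COUNT(posts.id) DESC')
end
Each scope returns a Relation, so will_paginate's paginate can chain straight onto it.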


How can I speed up this query in a Rails app?

I need help optimizing a series of queries in a Rails 5 app. The following explains what I am doing, but if it isn't clear let me know and I will try to go into better detail.
I have the following methods in my models:
In my IncomeReport model:
class IncomeReport < ApplicationRecord
  def self.net_incomes_2015_totals_collection
    all.map(&:net_incomes_2015).compact
  end

  def net_incomes_2015
    incomes - producer.expenses_2015
  end

  def incomes
    total_yield * 1.15
  end
end
In my Producer model I have the following:
class Producer < ApplicationRecord
  def expenses_2015
    expenses.sum(&:expense_per_ha)
  end
end
In the Expense model I have:
class Expense < ApplicationRecord
  def expense_per_ha
    total_cost / area
  end
end
In the controller I have this (I am using a gem called descriptive_statistics to get min, max, quartiles, etc., in case you are wondering about that part at the end):
@income_reports_2015 = IncomeReport.net_incomes_2015_totals_collection.extend(DescriptiveStatistics)
Then in my view I use:
<%= @income_reports_2015.descriptive_statistics[:min] %>
This code works when there are only a few objects in the database. However, now that there are thousands the query takes forever to give a result. It takes so long that it times out!
How can I optimize this to get the most performant outcome?
One approach might be to architect your application differently. I think a service-oriented architecture might be of use in this circumstance.
Instead of querying when the user goes to this view, you might want to use a worker to run the query intermittently and write the results to a CSV, so that when a user navigates to the view you read from the CSV instead. This runs much faster because instead of doing the query then and there (when the user navigates to the page), you're simply reading from a file that was created earlier by a background process.
Obviously, this has its own set of challenges, but I've done it in the past to solve a similar problem. I wrote an app that fetched data from 10 different external APIs once a minute. The 10 fetches resulted in 10 objects in the DB, so 10 * 60 * 24 = 14,400 records per day. When a user loaded the page requiring this data, they would load 7 days' worth of records: 100,800 database rows. I ran into the same problem where the query done at runtime resulted in a timeout, so I wrote to a CSV and read from it as a workaround.
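A minimal sketch of that idea, assuming Sidekiq for the worker (the class name, file path, and schedule are hypothetical):
require 'csv'

class IncomeReportExportWorker
  include Sidekiq::Worker

  # Run this periodically (e.g. via a scheduler); it precomputes the values.
  def perform
    CSV.open(Rails.root.join('tmp', 'net_incomes_2015.csv'), 'w') do |csv|
      IncomeReport.net_incomes_2015_totals_collection.each { |value| csv << [value] }
    end
  end
end

# In the controller, read the precomputed file instead of querying:
@income_reports_2015 = CSV.read(Rails.root.join('tmp', 'net_incomes_2015.csv'))
                          .flatten.map(&:to_f).extend(DescriptiveStatistics)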
What's the structure of IncomeReport? Looking at the code, your problem lies in the all inside net_incomes_2015_totals_collection: all hits the database and returns every record, and then you map over them in Ruby. Overkill. Try to filter the data, query less, select less, and get the info you want directly with ActiveRecord; Ruby loops slow things down.
So, without knowing the table structure and its data, I'd do something like the following:
def self.net_incomes_2015_totals_collection
  where(created_at: Date.new(2015, 1, 1)..Date.new(2015, 12, 31))
    .where.not(net_incomes_2015: nil)
    .pluck(:net_incomes_2015)
end
Plus, I'd make sure there's a composite index on created_at and net_incomes_2015.
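In a migration, that might look like this (assuming the table is named income_reports):
add_index :income_reports, [:created_at, :net_incomes_2015]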
It will probably still be slow, but better than it is now. You should also think about aggregating the data in the background (Resque, Sidekiq, etc.) at midnight (and caching it?).
Hope it helps.
It looks like you have a few N+1 queries here. Each report grabs its producer in an individual query. Then each producer grabs each of its expenses in yet another query.
You could avoid the first issue by throwing a preload(:producer) in before the all. However, the sums later are harder to avoid, since sum automatically fires a query each time.
You can avoid that issue with something like
def self.net_incomes_2015_totals_collection
  joins(producer: :expenses)
    .select(:id, 'income_reports.total_yield * 1.15 - SUM(expenses.total_cost / expenses.area) AS net_incomes_2015')
    .group(:id)
    .map(&:net_incomes_2015)
    .compact
end
to get everything in one query.

Order results by weighted variables?

I have a listing of ~10,000 apps and I'd like to order them by certain columns, but I want to give certain columns more "weight" than others.
For instance, each app has overall_ratings and current_ratings. If the app has a lot of overall_ratings, that's worth 1.5, but the number of current_ratings would be worth, say 2, since the number of current_ratings shows the app is active and currently popular.
Right now there are probably 4-6 of these variables I want to take into account.
So, how can I pull that off? In the query itself? After the fact using just Ruby (remember, there are over 10,000 rows that would need to be processed here)? Something else?
This is a Rails 3.2 app.
Sorting 10,000 objects in plain Ruby doesn't seem like a good idea, especially if you just want the first 10 or so.
You can try to put your math formula in the query (using the order method from Active Record).
However, my favourite approach would be to create a float attribute to store the score and update that value with a before_save method.
I would read about dirty attributes so you only perform this scoring when one of your criteria is updated.
You may also create a rake task that re-scores your current objects.
This way you would keep the scoring functionality in Ruby (you could test it easily) and you could add an index to your float attribute so database queries have better performance.
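A hedged sketch of that approach, assuming the rating counts are numeric columns on apps and a float score column exists (the weights come from the question):
class App < ActiveRecord::Base
  # Recompute the score only when one of the weighted inputs changes.
  before_save :compute_score, :if => :ratings_changed?

  private

  def compute_score
    self.score = 1.5 * overall_ratings + 2.0 * current_ratings
  end

  def ratings_changed?
    overall_ratings_changed? || current_ratings_changed?
  end
end
Ordering then becomes a plain indexed query: App.order('score DESC').limit(10).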
One attempt would be to let the DB do this work for you, with a query something like the following (I can't really test it for lack of a DB schema):
ActiveRecord::Base.connection.execute("SELECT a.*,
  (1.5 * (SELECT COUNT(*) FROM overall_ratings WHERE app_id = a.id) +
   2.0 * (SELECT COUNT(*) FROM current_ratings WHERE app_id = a.id))
  AS rating FROM apps a
  HAVING rating > 3 ORDER BY rating DESC")
The idea is to count the ratings found for each app (matching on the app id in the subqueries) and weight the overall and current counts as desired.

Memory leak in Rails 3.0.11 migration

A migration contains the following:
last_service_id = nil
Service.find_by_sql("select
                       service_id,
                       registrations.regulator_given_id,
                       registrations.regulator_id
                     from
                       registrations
                     order by
                       service_id, updated_at desc").each do |s|
  this_service_id = s["service_id"]
  if this_service_id != last_service_id
    Service.find(this_service_id).update_attributes!(:regulator_id => s["regulator_id"],
                                                     :regulator_given_id => s["regulator_given_id"])
    last_service_id = this_service_id
  end
end
and it is eating up memory, to the point where it will not run in the 512MB allowed on Heroku (the registrations table has 60,000 rows). Is there a known problem? Workaround? Fix in a later version of Rails?
Thanks in advance
Edit following request to clarify:
That is all the relevant source - the rest of the migration creates the two new columns that are being populated. The situation is that I have data about services from multiple sources (regulators of the services) in the registrations table. I have decided to 'promote' some of the data ([prime]regulator_id and [prime]regulator_given_key) into the services table for the prime regulators to speed up certain queries.
This will load all 60,000 rows in one go and keep those 60,000 AR objects around, which consumes a fair amount of memory. Rails does provide a find_each method for breaking a query like that into batches of 1000 objects at a time, but it doesn't allow you to specify an ordering the way you do.
You're probably best off implementing your own paging scheme. Using limit/offset is a possibility; however, large OFFSET values are usually inefficient, because the database server has to generate a bunch of results that it then discards.
An alternative is to add conditions to your query that ensure you don't return already-processed items, for example by specifying that service_id be greater than the last value you processed. This is more complicated if some items compare as equal. With both of these paging schemes you probably need to think about what happens if a row gets inserted into your registrations table while you are processing it (probably not a problem with migrations, assuming you run them with access to the site disabled).
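A minimal sketch of that keyset-style paging, assuming it runs inside the migration (the batch size is arbitrary):
last_seen = 0
loop do
  rows = Registration.where("service_id > ?", last_seen)
                     .order("service_id, updated_at desc")
                     .limit(1000).all
  break if rows.empty?
  done = nil
  rows.each do |r|
    # The first row per service_id is the newest, thanks to the ordering.
    next if r.service_id == done
    Service.find(r.service_id).update_attributes!(
      :regulator_id => r.regulator_id,
      :regulator_given_id => r.regulator_given_id)
    done = r.service_id
  end
  last_seen = rows.last.service_id
end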
(Note: OP reports this didn't work)
Try something like this:
previous = nil
Registration.select('service_id, regulator_id, regulator_given_id')
            .order('service_id, updated_at DESC')
            .each do |r|
  if previous != r.service_id
    service = Service.find(r.service_id)
    service.update_attributes(:regulator_id => r.regulator_id,
                              :regulator_given_id => r.regulator_given_id)
    previous = r.service_id
  end
end
This is a kind of hacky way of getting the most recent record per service; there's undoubtedly a better way to do it with DISTINCT or GROUP BY in SQL, all in a single query, which would not only be a lot faster but also more elegant. But this is just a migration, right? And I didn't promise elegant. I'm also not sure it will work and resolve the problem, but I think so :-)
The key change is that instead of raw SQL via find_by_sql, this uses an ActiveRecord relation, meaning (I think) the update operation is performed on each record as the relation yields it. With find_by_sql, you return them all, store them in an array, and then update them all. I also don't think the .select(...) clause is strictly necessary.
Very interested in the result, so let me know if it works!
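For the record, the single-query shape hinted at above might look like this classic greatest-per-group SQL (untested; ties on updated_at would return duplicate rows):
SELECT r.service_id, r.regulator_id, r.regulator_given_id
FROM registrations r
WHERE r.updated_at = (SELECT MAX(r2.updated_at)
                      FROM registrations r2
                      WHERE r2.service_id = r.service_id)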

Ruby Rails Postgis - Finding all the points in a polygon

I would like some help constructing SQL queries for use in Rails with activerecord-postgis-adapter. I have been doing quite a bit of reading but am now a bit stuck; any help would be much appreciated.
I have the two models Events and Areas:
Events have a 'geometry' column which is of type Point:
class Event < ActiveRecord::Base
  self.rgeo_factory_generator = RGeo::Geos.factory_generator
end

t.spatial "geometry", :limit => {:srid=>4326, :type=>"point", :geographic=>true}

Areas have a 'geometry' column which is of type Polygon:
class Area < ActiveRecord::Base
  self.rgeo_factory_generator = RGeo::Geos.factory_generator
end

t.spatial "geometry", :limit => {:srid=>4326, :type=>"polygon", :geographic=>true}
I can create and plot both events and areas on a google map, and create areas by clicking on a map and saving to the database.
I want to be able to do the following 2 queries:
@area.events - show all the events in an area
@event.areas - show all the areas a single event is in
I know I might be asking a bit much here, but any help would be much appreciated.
Many thanks
Here's a quick way to do this. These will simply return arrays of ActiveRecord objects.
class Area
  def events
    Event.joins("INNER JOIN areas ON areas.id=#{id} AND st_contains(areas.geometry, events.geometry)").all
  end
end

class Event
  def areas
    Area.joins("INNER JOIN events ON events.id=#{id} AND st_contains(areas.geometry, events.geometry)").all
  end
end
You probably should memoize (cache the result) so that you don't query the database every time you call the method. That should be straightforward; I leave it as an exercise for the reader.
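For instance, a minimal memoization sketch using an instance-variable cache (it lives only as long as the object does):
class Area
  def events
    @events ||= Event.joins("INNER JOIN areas ON areas.id=#{id} AND st_contains(areas.geometry, events.geometry)").all
  end
end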
It may be possible to get sophisticated and wrap this up in a true Rails association proxy (so you can get all the Rails association goodies). I haven't looked into this though. It wouldn't be a standard Rails association in any case, because you're not storing IDs.
Twelfth is right: you should create spatial indexes for both tables. Activerecord-postgis-adapter should make those easy to do in your migration.
change_table :events do |t|
  t.index :geometry, :spatial => true
end

change_table :areas do |t|
  t.index :geometry, :spatial => true
end
If you're having trouble with installing postgis, I recently wrote up a bunch of blog entries on this stuff. Check out http://www.daniel-azuma.com/blog/archives/category/tech/georails. I'm also the author of rgeo and activerecord-postgis-adapter themselves, so I'm happy to help if you're stuck on stuff.
This answer will be a bit of a work in progress for you. I'm weak with Ruby on Rails, but I should be able to help you through the DB section.
You have two tables: Area, which holds a polygon, and Event, which holds the event as a single point. (It's a bit more complicated if the event is also an area and you're trying to pick out overlapping areas; if events are single points, this works.)
Select *
from area a inner join event e on 1=1
This is going to create a list of every area joined to every event; if you have 500 events and 20 areas, this query will return 10,000 rows. Now you want to filter this so you only keep events that are within the area they've been joined to. We can use st_contains for this, as st_contains(polygon, point):
where st_contains(a.polygon,e.point) = 't'
If you run this, it should give you a.*, e.* for all events within areas. Now it's just a matter of counting what you want to count.
select a.id, count(1)
from area a inner join event e on 1=1
where st_contains(a.polygon,e.point) = 't'
group by 1
This will give you a list of all your areas (by id) and the count of the events in each. Switching out a.id for e.id will give a list of event ids and the number of areas each one is in.
Unfortunately I have no idea how to express these queries within Ruby, but the DB concepts you'll need are here.
For speed considerations, you should look into the GiST indexing that Postgres has; indexed polygons perform dramatically better.
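For what it's worth, a hedged translation of the counting query into ActiveRecord, reusing the areas/events tables from the models above (untested):
Area.joins("INNER JOIN events ON st_contains(areas.geometry, events.geometry)")
    .group("areas.id")
    .count
The grouped count should return a hash mapping each area id to its event count.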
Edit:
PostGIS is a contrib module that comes with Postgres but is not part of a standard install; you'll need to find and run the contrib files. These install a series of GIS functions in your database, including ST_Contains. (Functions reside in a database, so make sure you install them in the DB you are using.)
The second thing the PostGIS contrib files install is the template_postgis database, which is required for the geometry datatypes (geom as a data type won't exist until this is installed).

Ruby Rails Complex SQL with aggregate function and DayOfWeek

Rails 2.3.4
I have searched google, and have not found an answer to my dilemma.
For this discussion, I have two models. Users and Entries. Users can have many Entries (one for each day).
Entries have values and sent_at dates.
I want to query and display the average value of entries for a user BY DAY OF WEEK. So if a user has entered values for, say, the past 3 weeks, I want to show the average value for Sundays, Mondays, etc. In MySQL, it is simple:
SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = ? GROUP BY 1
That query will return between 0 and 7 records, depending upon how many days a user has had at least one entry.
I've looked at find_by_sql, but while I am searching Entry, I don't want to return an Entry object; instead, I need an array of up to 7 days and averages...
Also, I am concerned a bit about the performance of this, as we would like to load this to the user model when a user logs in, so that it can be displayed on their dashboard. Any advice/pointers are welcome. I am relatively new to Rails.
You can query the database directly, no need to use an actual ActiveRecord object. For example:
ActiveRecord::Base.connection.execute "SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at);"
This will give you a Mysql::Result or Mysql2::Result that you can then enumerate with each (or convert with to_a) to view your results.
As for caching, I would recommend using memcached, but any other rails caching strategy will work as well. The nice benefit of memcached is that you can have your cache expire after a certain amount of time. For example:
result = Rails.cache.fetch("user/#{user.id}/averages", :expires_in => 1.day) do
  # Your sql query and results go here
end
This would put your results into memcached for one day under the key "user/#{user.id}/averages". For example, if you were the user with id 10, your averages would live in memcached under "user/10/averages", and the next time you performed this query (within the same day) the cached version would be used instead of actually hitting the database.
Untested, but something like this should work:
@user.entries.select('DAYOFWEEK(sent_at) as day, AVG(value) as average').group('1').all
NOTE: When you use select to specify columns explicitly, the returned objects are read-only. Rails can't reliably determine which columns can and can't be modified. In this case you probably wouldn't try to modify the selected columns, but you can't modify your sent_at or value columns through the resulting objects either.
Check out the ActiveRecord Querying Guide for a breakdown of what's going on here in a fairly newb-friendly format. Oh, and if that query doesn't work, please post back so others that may stumble across this can see that (and I can possibly update).
Since that won't work due to entries returning an array, we can try using a join instead:
User.where(:id => params[:id]).joins(:entries).select('...').group('1').all
Again, I don't know if this will work. Usually you can specify where after joins, but I haven't seen select combined in there. A tricky bit here is that the select is probably going to eliminate returning any data about the user at all. It might make more sense to eschew the find_by_* methods in favor of writing a method in the Entry model that runs your query with select_all (docs) and skips the association mapping.
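A hedged sketch of that last suggestion (the method name is hypothetical; select_all returns an array of plain hashes, not Entry objects, and the to_i guards the interpolated id):
class Entry < ActiveRecord::Base
  def self.daily_averages_for(user_id)
    connection.select_all(
      "SELECT DAYOFWEEK(sent_at) AS day, AVG(value) AS average
       FROM entries WHERE user_id = #{user_id.to_i} GROUP BY 1")
  end
end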