Rails 3 query matching attribute of has_one association that is a subset of has_many association - sql

The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with attributes of its latest status. So the model has_many statuses, along with a has_one to easily access its latest one:
class Car < ActiveRecord::Base
has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"
delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true
# ...
end
To give you an idea of what the statuses hold:
loc = Car.first.latest_location # Location object (id = 1 for example)
loc.name # "Miami, FL"
Let's say I wanted to have a (chainable) scope to find all cars with a latest location id of 1. Currently I have a sort of complex method:
# car.rb
def self.by_location_id(id)
ids = []
find_each(include: :latest_status) do |car|
ids << car.id if car.latest_status.try(:location_id) == id.to_i
end
where("id in (?)", ids)
end
There may be a quicker way to do this using SQL, but not sure how to only get the latest status for each car. There may be many status records with a location_id of 1, but if that's not the latest location for its car, it should not be included.
To make it harder... let's add another level and be able to scope by location name. I have this method, preloading statuses along with their location objects to be able to access the name:
def self.by_location_name(loc)
ids = []
find_each(include: {latest_status: :location}) do |car|
ids << car.id if car.latest_location.try(:name) =~ /#{loc}/i
end
where("id in (?)", ids)
end
This will match the location above with "miami", "fl", "MIA", etc... Does anyone have any suggestions on how I can make this more succinct/efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
Using Postgres 9.1 (hosted on Heroku cedar stack)

All right. Since you're using postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (scope to filter by location of last status):
This solution takes advantage of PostgreSQL's support for analytic (window) functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/
I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
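To make the intent of the rn = 1 filter concrete, here is a rough plain-Ruby model of the same "latest row per car, then filter by location" logic. It runs on in-memory data only; the Status struct, the STATUSES sample rows and the method name are made up for illustration and are not part of the question's schema.

```ruby
# Hypothetical in-memory rows standing in for the statuses table.
Status = Struct.new(:id, :car_id, :location_id, :created_at)

# Mimics row_number() over (partition by car_id order by created_at desc),
# keeping only the rn = 1 row per car and then filtering on location_id.
def car_ids_with_latest_location(statuses, location_id)
  statuses
    .group_by(&:car_id)                                  # partition by car
    .map { |_car_id, rows| rows.max_by(&:created_at) }   # newest row per car
    .select { |status| status.location_id == location_id }
    .map(&:car_id)
end

STATUSES = [
  Status.new(1, 10, 1, Time.utc(2012, 1, 1)),
  Status.new(2, 10, 2, Time.utc(2012, 1, 2)), # car 10 has since left location 1
  Status.new(3, 11, 2, Time.utc(2012, 1, 1)),
  Status.new(4, 11, 1, Time.utc(2012, 1, 3))  # car 11 is currently at location 1
].freeze

p car_ids_with_latest_location(STATUSES, 1) # prints [11]
```

Car 10 has an old status at location 1, but it is excluded because its newest status points elsewhere, which is the behaviour the question asks for.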
Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:
def self.by_location(loc)
joins(
self.escape_sql('join (
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
) as subquery on subquery.car_id = cars.id order by subquery.created_at desc', loc)
)
end
Join will act as a filter, giving you only the Car objects that were involved in the subquery.
Note: In order to refer to escape_sql as I do above, you'll need to modify ActiveRecord::Base slightly. I do this by adding the following to an initializer in the app (which I place in config/initializers/active_record.rb):
class ActiveRecord::Base
def self.escape_sql(clause, *rest)
self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
end
end
This allows you to call .escape_sql on any of your models that are based on AR::B. I find this profoundly useful, but if you've got some other way to sanitize sql, feel free to use that instead.
For the second part of the question - unless there are multiple locations with the same name, I'd just do a Location.find_by_name to turn it into an id to pass into the above. Basically this:
def self.by_location_name(name)
loc = Location.find_by_name(name)
by_location(loc)
end

Related

Select oldest HABTM record with group by clause

I want to show a line chart on the admin page (with chartkick) with the incremental number of scores related to their earliest export date.
I have the following models:
# score.rb
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
end
# export_order.rb
class ExportOrder < ApplicationRecord
has_and_belongs_to_many :scores, join_table: :scores_export_orders
end
How do I select, for each Score having at least one ExportOrder, the corresponding ExportOrder with the earliest created_at (in date only format)?
I had a look at this, but my situation has a HABTM relationship instead of a simple has_many.
I tried this code, to get at least a mapping between oldest export date and number of scores:
sql = "
SELECT
COUNT(DISTINCT scores.id), MIN(export_orders.created_at::date)
FROM
scores
INNER JOIN
scores_export_orders
ON
scores.id = scores_export_orders.score_id
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
export_orders.created_at::date
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
but the total number of scores is greater than all scores having an export date.
Any ideas?
Try:
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
def earliest_export_date
export_orders.pluck(:created_at).min
end
end
This will let you call @score.earliest_export_date, which should return the value you want.
I also think it's the most performant way to do it in ruby, although someone may correct me on that.
The following has better performance than Mark's solution since it relies on pure SQL. Basically, the GROUP BY clause required grouping by scores_export_orders.score_id rather than export_orders.created_at:
sql = "
SELECT
COUNT(DISTINCT scores_export_orders.score_id), MIN(export_orders.created_at::date)
FROM
scores_export_orders
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
scores_export_orders.score_id
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
I couldn't find an exact equivalent in ActiveRecord instructions (all of such attempts were giving me strange results), so executing the SQL will also do the trick.
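For what it's worth, the grouping idea can be checked on a handful of rows in plain Ruby before trusting the SQL. The sketch below is only an in-memory model: EXPORT_ROWS and both method names are invented for illustration, and the second step (a running total per earliest date, which is what the chart in the question seems to need) is my assumption about the intended output.

```ruby
require "date"

# Hypothetical joined rows: [score_id, export order created_at as a Date].
EXPORT_ROWS = [
  [1, Date.new(2020, 1, 5)],
  [1, Date.new(2020, 1, 2)],
  [2, Date.new(2020, 1, 2)],
  [3, Date.new(2020, 1, 7)],
  [3, Date.new(2020, 1, 5)]
].freeze

# Step 1: GROUP BY score_id with MIN(created_at::date).
def earliest_export_by_score(rows)
  rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last).min }
end

# Step 2: count scores per earliest date, then accumulate a running total.
def cumulative_scores_by_date(rows)
  per_day = earliest_export_by_score(rows).values.tally.sort
  total = 0
  per_day.map { |date, count| [date, total += count] }
end

cumulative_scores_by_date(EXPORT_ROWS).each do |date, total|
  puts "#{date}: #{total}"
end
```

Each score is counted once, on its earliest export date, so the final running total equals the number of scores that have at least one export order.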

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins myself like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours of head-scratching and trying all sorts of ways to eager load a constrained set of associated records, I came across @dbenhur's answer in this thread, which works fine for me. However, the condition isn't something I'm passing in (it's a date relative to Date.today). Basically, it creates an association whose has_many conditions are the ones I wanted to put into the LEFT JOIN's ON clause.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
@property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])
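In case the ON versus WHERE distinction from the question is not obvious, here is a small in-memory model of a LEFT OUTER JOIN in plain Ruby. Nothing here touches ActiveRecord; OFFERS, HISTORICAL_OFFERS and the two method names are invented purely to show why the placement of the day condition changes which offers survive.

```ruby
require "date"

OFFERS = [{ id: 1 }, { id: 2 }].freeze
HISTORICAL_OFFERS = [
  { offer_id: 1, day: Date.new(2012, 11, 9) },
  { offer_id: 1, day: Date.new(2012, 11, 8) },
  { offer_id: 2, day: Date.new(2012, 11, 8) }
].freeze

# LEFT OUTER JOIN ... ON offer_id matches AND day = ?
# Offers without a row for that day are kept and paired with nil.
def left_join_with_day_in_on(offers, history, day)
  offers.flat_map do |offer|
    matches = history.select { |h| h[:offer_id] == offer[:id] && h[:day] == day }
    matches.empty? ? [[offer, nil]] : matches.map { |h| [offer, h] }
  end
end

# LEFT OUTER JOIN ... ON offer_id matches, followed by WHERE day = ?
# The WHERE runs after the join, so offers with no row for that day vanish.
def left_join_with_day_in_where(offers, history, day)
  joined = offers.flat_map do |offer|
    matches = history.select { |h| h[:offer_id] == offer[:id] }
    matches.empty? ? [[offer, nil]] : matches.map { |h| [offer, h] }
  end
  joined.select { |_offer, h| h && h[:day] == day }
end

day = Date.new(2012, 11, 9)
p left_join_with_day_in_on(OFFERS, HISTORICAL_OFFERS, day).map { |o, _| o[:id] }    # prints [1, 2]
p left_join_with_day_in_where(OFFERS, HISTORICAL_OFFERS, day).map { |o, _| o[:id] } # prints [1]
```

The conditional association above gets the first behaviour because its conditions are applied when the association itself is loaded, not as a filter on the parent records.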

Rails 3 Order Records By Grand-child Count

I'm trying to do some fairly complicated record sorting that I was having a bit of trouble with. I have three models:
class User < ActiveRecord::Base
has_many :registers
has_many :results, :through => :registers
#Find all the Users that exist as registrants for a tournament
scope :with_tournament_entrees, :include => :registers, :conditions => "registers.id IS NOT NULL"
end
Register
class Register < ActiveRecord::Base
belongs_to :user
has_many :results
end
Result
class Result < ActiveRecord::Base
belongs_to :register
end
Now on a Tournament result page I list all users by their total wins (wins is calculated through the results table). First thing first I find all users who have entered a tournament with the query:
User.with_tournament_entrees
With this I can simply loop through the returned users and query each individual record with the following to retrieve each user's "Total Wins":
user.results.where("win = true").count()
However I would also like to take this a step further and order all of the users by their "Total Wins", and this is the best I could come up with:
User.with_tournament_entrees.select('SELECT *,
(SELECT count(*)
FROM results
INNER JOIN "registers"
ON "results"."register_id" = "registers"."id"
WHERE "registers"."user_id" = "users.id"
AND (win = true)
) AS total_wins
FROM users ORDER BY total_wins DESC')
I think it's close, but it doesn't actually order by the total_wins in descending order as I instruct it to. I'm using a PostgreSQL database.
Edit:
There are actually three selects taking place; the first occurs in User.with_tournament_entrees, which just performs a quick filter on the users table. If I ignore that and try
SELECT *, (SELECT count(*) FROM results INNER JOIN "registers" ON "results"."register_id" = "registers"."id" WHERE "registers"."user_id" = "users.id" AND (win = true)) AS total_wins FROM users ORDER BY total_wins DESC;
it fails in both psql and the IRB console. I get the error message:
PGError: ERROR: column "users.id" does not exist
I think this happens because the inner select runs before the outer select, so it doesn't have access to the user id beforehand. I'm not sure how to give it access to all user ids before the inner select runs, but this isn't an issue when I call User.with_tournament_entrees followed by the query.
In your SQL, "users.id" is quoted wrong -- it's telling Postgres to look for a column named, literally, "users.id".
It should be "users"."id", or, just users.id (you only need to quote it if you have a table/column name that conflicts with a postgres keyword, or have punctuation or something else unusual).
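If you build such SQL strings in Ruby and want the quoting done consistently, the rule above (quote each part on its own) is easy to sketch. The helper below is hypothetical and only illustrates the rule; in a Rails app the connection's quote_table_name and quote_column_name methods are the usual tools for this.

```ruby
# Hypothetical helper: quote each part of a dotted identifier separately,
# doubling any embedded double quotes as PostgreSQL expects.
def quote_qualified_identifier(name)
  name.split(".").map { |part| '"' + part.gsub('"', '""') + '"' }.join(".")
end

puts quote_qualified_identifier("users.id") # prints "users"."id"
```

Quoting the whole string at once would produce "users.id", which is exactly the single-identifier form Postgres complained about.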

Rails/Sql - order/group search results such that repetition of entities occurs only after appearance of others

In my application, say, animals have many photos. I'm querying photos of animals such that I want all photos of all animals to be displayed. However, I want each animal to appear as a photo before repetition occurs.
Example:
animal instance 1, 'cat', has four photos,
animal instance 2, 'dog', has two photos:
photos should appear ordered as so:
#photo belongs to #animal
tiddles.jpg , cat
fido.jpg dog
meow.jpg cat
rover.jpg dog
puss.jpg cat
felix.jpg, cat (no more dogs so two consecutive cats)
Pagination is required so I can't order on an array.
Filename structure/convention provides no help, though the animal_id exists on each photo.
Though there are two types of animal in this example this is an active record model with hundreds of records.
Animals may be selectively queried.
If this isn't possible with active_record then I'll happily use sql; I'm using postgresql.
My brain is frazzled so if anyone can come up with a better title, please go ahead and edit it or suggest in comments.
Here is a PostgreSQL specific solution:
batch_id_sql = "RANK() OVER (PARTITION BY animal_id ORDER BY id ASC)"
Photo.paginate(
:select => "DISTINCT photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Here is a DB agnostic solution:
batch_id_sql = "
SELECT COUNT(*)
FROM photos bm
WHERE bm.animal_id = photos.animal_id AND
bm.id <= photos.id
"
Photo.paginate(
:select => "photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Both queries work even when you have a where condition. Benchmark the query using expected data set to check if it meets the expected throughput and latency requirements.
Reference
PostgreSQL Window function
Having no experience in activerecord. Using plain PostgreSQL I would try something like this:
Define a window function over all previous rows which counts how many times the current animal has appeared, then order by this count.
SELECT
filename,
animal_id,
COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
FROM
photos
ORDER BY
cnt,
animal_id,
filename
Filtering on certain animal_id's will work. This will always order the same way. I don't know if you want something random in there, but it should be easily added.
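To convince yourself that numbering rows per animal and then sorting by that number gives the interleaving from the question, here is the same idea on in-memory data. This is only a plain-Ruby model of the window function; PhotoRow, PHOTO_ROWS and the ids are made up, with animal 1 standing for the cat and animal 2 for the dog.

```ruby
PhotoRow = Struct.new(:id, :animal_id, :filename)

# Mimics row_number() OVER (PARTITION BY animal_id ORDER BY id),
# followed by ORDER BY that number, animal_id.
def interleave_by_animal(photos)
  numbered = photos.group_by(&:animal_id).flat_map do |_animal_id, rows|
    rows.sort_by(&:id).each_with_index.map { |photo, i| [i + 1, photo] }
  end
  numbered
    .sort_by { |batch, photo| [batch, photo.animal_id] }
    .map { |_batch, photo| photo }
end

PHOTO_ROWS = [
  PhotoRow.new(1, 1, "tiddles.jpg"),
  PhotoRow.new(2, 1, "meow.jpg"),
  PhotoRow.new(3, 2, "fido.jpg"),
  PhotoRow.new(4, 1, "puss.jpg"),
  PhotoRow.new(5, 2, "rover.jpg"),
  PhotoRow.new(6, 1, "felix.jpg")
].freeze

puts interleave_by_animal(PHOTO_ROWS).map(&:filename)
# prints tiddles.jpg, fido.jpg, meow.jpg, rover.jpg, puss.jpg, felix.jpg (one per line)
```

In the database the sort and the LIMIT/OFFSET happen server side, so unlike this sketch the full set never has to be loaded into Ruby.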
New solution
Add an integer column called batch_id to the photos table.
class AddBatchIdToPhotos < ActiveRecord::Migration
def self.up
add_column :photos, :batch_id, :integer
set_batch_id
change_column :photos, :batch_id, :integer, :null => false
add_index :photos, :batch_id
end
def self.down
remove_column :photos, :batch_id
end
def self.set_batch_id
# set the batch id to existing rows
# implement this
end
end
Now add a before_create on the Photo model to set the batch id.
class Photo < ActiveRecord::Base
belongs_to :animal
before_create :batch_photo_add
after_update :batch_photo_update
after_destroy :batch_photo_remove
private
def batch_photo_add
self.batch_id = next_batch_id_for_animal(animal_id)
true
end
def batch_photo_update
return true unless animal_id_changed?
batch_photo_remove(batch_id, animal_id_was)
batch_photo_add
end
def batch_photo_remove(b_id=batch_id, a_id=animal_id)
Photo.update_all("batch_id = batch_id- 1",
["animal_id = ? AND batch_id > ?", a_id, b_id])
true
end
def next_batch_id_for_animal(a_id)
(Photo.maximum(:batch_id, :conditions => {:animal_id => a_id}) || 0) + 1
end
end
Now you can get the desired result by issuing simple paginate command
@animal_photos = Photo.paginate(:page => 1, :per_page => 10,
:order => :batch_id)
How does this work?
Let's consider we have data set as given below:
id  Photo Description  Batch Id
1   Cat_photo_1        1
2   Cat_photo_2        2
3   Dog_photo_1        1
2   Cat_photo_3        3
4   Dog_photo_2        2
5   Lion_photo_1       1
6   Cat_photo_4        4
Now if we were to execute a query ordered by batch_id we get this
# batch 1 (cat, dog, lion)
Cat_photo_1
Dog_photo_1
Lion_photo_1
# batch 2 (cat, dog)
Cat_photo_2
Dog_photo_2
# batch 3,4 (cat)
Cat_photo_3
Cat_photo_4
The batch distribution is not random, the animals are filled from the top. The number of animals displayed in a page is governed by per_page parameter passed to paginate method (not the batch size).
Old solution
Have you tried this?
If you are using the will_paginate gem:
# assuming you want to order by animal name
animal_photos = Photo.paginate(:include => :animal, :page => 1,
:order => "animals.name")
animal_photos.each do |animal_photo|
puts animal_photo.file_name
puts animal_photo.animal.name
end
I'd recommend something hybrid/corrected based on KandadaBoggu's input.
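First off, the correct way to do it on paper is with row_number() over (partition by animal_id order by id). The suggested rank() is also evaluated within each partition, and with a unique id as the sort key it yields the same numbers, but it repeats values on ties, whereas row_number() always gives each row a distinct position within its partition.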
Using a window function is also the most flexible solution (in fact, the only solution) if you want to plan to change the sort order here and there.
Take note that this won't necessarily scale well, however, because in order to sort the results you'll need to:
fetch the whole result set that matches your criteria
sort the whole result set to create the partitions and obtain a rank_id
top-n sort/limit over the result set a second time to get them in their final order
The correct way to do this in practice, if your sort order is immutable, is to maintain a pre-calculated rank_id. KandadaBoggu's other suggestion points in the correct direction in this sense.
When it comes to deletes (and possibly updates, if you don't want them sorted by id), you may run into issues because you end up trading faster reads for slower writes. If deleting the cat with an index of 1 leads to updating the next 50k cats, you're going to be in trouble.
If you've very small sets, the overhead might be very acceptable (don't forget to index animal_id).
If not, there's a workaround if you find the order in which specific animals appear is irrelevant. It goes like this:
Start a transaction.
If the rank_id is going to change (i.e. insert or delete), obtain an advisory lock to ensure that two sessions can't impact the rank_id of the same animal class, e.g.:
SELECT pg_try_advisory_lock('the_table'::regclass, the_animal_id);
(Sleep for .05s if you don't obtain it.)
On insert, find max(rank_id) for that animal_id. Assign it rank_id + 1. Then insert it.
On delete, select the animal with the same animal_id and the largest rank_id. Delete your animal, and assign its old rank_id to the fetched animal (unless you were deleting the last one, of course).
Release the advisory lock.
Commit the work.
Note that the above will make good use of an index on (animal_id, rank_id) and can be done using plpgsql triggers:
create trigger "__animals_rank_id__ins"
before insert on animals
for each row execute procedure lock_animal_id_and_assign_rank_id();
create trigger "_00_animals_rank_id__ins"
after insert on animals
for each row execute procedure unlock_animal_id();
create trigger "__animals_rank_id__del"
before delete on animals
for each row execute procedure lock_animal_id();
create trigger "_00_animals_rank_id__del"
after delete on animals
for each row execute procedure reassign_rank_id_and_unlock_animal_id();
You can then create a multi-column index on your sort criteria if you're not joining all over the place, e.g. (rank_id, name). And you'll end up with a snappy site for reads and writes.
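The insert and delete bookkeeping from the steps above can be modelled in a few lines of plain Ruby. This is only a sketch of the rank_id arithmetic for a single animal_id: the hash, the method names and the ids are hypothetical, and the advisory lock and transaction handling are deliberately left out.

```ruby
# In-memory stand-in for the rows of one animal_id: { row_id => rank_id }.

# Insert: the new row gets max(rank_id) + 1.
def assign_rank_on_insert(ranks, new_id)
  ranks[new_id] = (ranks.values.max || 0) + 1
  ranks
end

# Delete: move the row holding the largest rank_id into the freed slot,
# so one other row is updated instead of every later row.
def reassign_rank_on_delete(ranks, deleted_id)
  freed = ranks.delete(deleted_id)
  return ranks if freed.nil? || ranks.empty?

  last_id, last_rank = ranks.max_by { |_id, rank| rank }
  ranks[last_id] = freed if last_rank > freed
  ranks
end

ranks = {}
%i[a b c d].each { |id| assign_rank_on_insert(ranks, id) }
reassign_rank_on_delete(ranks, :b)
p ranks # prints {:a=>1, :c=>3, :d=>2} (newer Rubies print {a: 1, c: 3, d: 2})
```

The ranks stay gap-free after the delete, at the cost of the moved row changing position, which is why this only suits the case where the order within an animal is irrelevant.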
You should be able to get the pictures (or filenames, anyway) using ActiveRecord, ordered by name.
Then you can use Enumerable#group_by and Enumerable#zip to zip all the arrays together.
If you give me more information about how your filenames are really arranged (i.e., are they all for sure with an underscore before the number and a constant name before the underscore for each "type"? etc.), then I can give you an example. I'll write one up momentarily showing how you'd do it for your current example.
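As a rough idea of the group_by plus zip approach, assuming the photos are already loaded as [filename, animal] pairs (the sample data and the round_robin name are mine, not from the question):

```ruby
# Hypothetical [filename, animal] pairs, already in the desired order per animal.
PHOTO_PAIRS = [
  ["tiddles.jpg", "cat"], ["meow.jpg", "cat"], ["puss.jpg", "cat"], ["felix.jpg", "cat"],
  ["fido.jpg", "dog"], ["rover.jpg", "dog"]
].freeze

def round_robin(pairs)
  groups = pairs.group_by(&:last).values
  # zip only pads with nil when the receiver is the longest array,
  # so put the largest group first.
  first, *rest = groups.sort_by { |group| -group.size }
  return [] if first.nil?

  first.zip(*rest).flatten(1).compact
end

puts round_robin(PHOTO_PAIRS).map(&:first)
# prints tiddles.jpg, fido.jpg, meow.jpg, rover.jpg, puss.jpg, felix.jpg (one per line)
```

Keep in mind this builds the whole array in memory, so pagination would have to be done on the array rather than in the query.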
You could run two sorts and build one array as follows:
result1= The first of each animal type only. use the ruby "find" method for this search.
result2= All animals, sorted by group. Use "find" to again find the first occurrence of each animal and then use "drop" to remove those "first occurrences" from result2.
Then:
markCustomResult = result1 + result2
Then:
You can use will_paginate on markCustomResult

SQL: Get a selected row index from a query

I have an application that stores players' ratings for each tournament, so I have a many-to-many association:
Tournament
has_many :participations, :order => 'rating desc'
has_many :players, :through => :participations
Participation
belongs_to :tournament
belongs_to :player
Player
has_many :participations
has_many :tournaments, :through => :participations
The Participation model has a rating field (float) that stores rating value (it's like score points) for each player at each tournament.
What I want is to get the last 10 ranks of a player (rank is the player's position in a particular tournament based on rating: the higher the rating, the higher the rank). For now, to get a player's rank in a tournament, I load all participations for that tournament, sort them by the rating field, and find the index of the player's participation in Ruby:
class Participation < ActiveRecord::Base
belongs_to :player
belongs_to :tournament
def rank
tournament.participations.index(self)
end
end
The rank method gets the participation's parent tournament, loads all of the tournament's participations (ordered by rating desc), and finds its own index in that collection.
Then I call something like:
player.participations.last.rank
The one thing I don't like is that it needs to load all participations for the tournament; if I need a player's ranks for the last 10 tournaments, it loads over 5,000 items (and that number will grow as new players are added).
I believe that there should be way to use SQL for it. Actually I tried to use SQL variables:
find_by_sql("select @row:=@row+1 `rank`, p.* from participations p, (SELECT @row:=0) r where(p.tournament_id = #{tournament_id}) order by rating desc limit 10;")
This query selects top-10 ratings from the given tournament. I've been trying to modify it to select last 10 participations for a given user and his rank.
Will appreciate any kind of help. (I think solution will be a SQL request, since it's pretty complex for ActiveRecord).
P.S. I'm using rails3.0.0.beta4
UPD:
Here is final sql request that gets last 10 ranks of the player (in addition it loads the participated tournaments as well)
SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;
First of all, should this:
Participation
belongs_to :tournament
belongs_to :players
be this?
Participation
belongs_to :tournament
belongs_to :player
Ie, singular player after the belongs_to?
I'm struggling to get my head around what this is doing:
class Participation
def rank_at_tour(tour)
tour.participations.index(self)
end
end
You don't really explain enough about your schema to make it easy to reverse engineer. Is it doing the following...?
"Get all the participations for the given tour and return the position of this current participation in that list"? Is that how you calculate rank? If so i agree it seems like a very convoluted way of doing it.
Do you do the above for the ten participation objects you get back for the player and then take the average? What is rating? Does that have anything to do with rank? Basically, can you explain your schema a bit more and then restate what you want to do?
EDIT
I think you just need a more efficient way of finding the position. There's one way i could think of off the top of my head - get the record you want and then count how many are above it. Add 1 to that and you get the position. eg
class Participation
def rank_at_tour(tour)
tour.participations.where("rating > ?", rating).count + 1
end
end
You should see in your log file (eg while experimenting in the console) that this just makes a count query. If you have an index on the rating field (which you should have if you don't) then this will be a very fast query to execute.
Also - if tour and tournament are the same thing (as i said you seem to use them interchangeably) then you don't need to pass tour to participation since it belongs to a tour anyway. Just change the method to rank:
class Participation
def rank
self.tour.participations.where("rating > ?", rating).count + 1
end
end
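The counting idea is easy to check on a few made-up numbers. This is only an in-memory illustration; RATINGS, the player ids and rank_for are hypothetical and stand in for the count query against participations.

```ruby
# Hypothetical ratings for one tournament, keyed by player id.
RATINGS = { 68 => 1510.5, 12 => 1720.0, 7 => 1490.0, 33 => 1720.0 }.freeze

# Rank = number of participations with a strictly higher rating, plus one.
# Players with equal ratings share a rank.
def rank_for(ratings, player_id)
  own = ratings.fetch(player_id)
  ratings.values.count { |rating| rating > own } + 1
end

puts rank_for(RATINGS, 68) # prints 3
puts rank_for(RATINGS, 12) # prints 1
```

Note that ties get the same rank here, which differs slightly from taking the index in a sorted list.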