Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
has_many :courses
ransacker :sum_of_total_attendees do
query = "SELECT SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id"
Arel.sql(query)
end
end
From Model - course.rb:
class Course < ApplicationRecord
belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(#q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
#q = all_campaigns.ransack(params[:q])
#campaigns = #q.result
Errors:
The ransacker query gives me the data I want, but I don't know what to do to get the right information .
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search parameter, #q and the ransacker query don't work together. There are two selects in this request, when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.
There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
get all the custom information whose age is over 18, and how many big orders(pay for 1,000 dollars) they have?
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one left join sql statement will do all my job, and don't construct unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of order objects which I don't need, and it will calculate the count by Array#count function which will waste time.
UPDATE 2:
My own solution is wrong, it will remove customs who doesn't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
AND orders.amount > 1000})
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so if you had a dedicated customs.big_order_count column that was either refreshed regularly or updated by an observer that watches for big Order records.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is mysql only. To get this to work in postgresql I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
Edited.
#custom_over_18 = Custom.where("age > ?", "18").orders.where("amount > ?", "1000").count
By example:
r = Model.arel_table
s = SomeOtherModel.arel_table
Model.select(r[:id], s[:othercolumn].as('othercolumn')).
joins(:someothermodel)
Will product the sql:
`SELECT `model`.`id`, `someothermodel`.`othercolumn` AS othercolumn FROM `model` INNER JOIN `someothermodel` ON `model`.`id` = `someothermodel`.`model_id`
Which is correct. However, when the models are loaded, the attribute othercolumn is ignored because it is not an attribute of Model.
It's similar to eager loading and includes, but I don't want all columns, only the one specified so include is no good.
There must be an easy way of getting columns from other models? I'd preferably have the items return as instances of Model than simple arrays/hashes
When you do a select with joins or includes, you will be returned an ActiveRecordRelation. This ActiveRecordRelation is composed of only the objects of the class which you use to call select on. The selected columns from the joined models are added to the objects returned. Because these attributes are not Model's attribute they don't show up when you inspect these objects, and I believe this is the primary reason for confusion.
You could try this out in your rails console:
> result = Model.select(r[:id], s[:othercolumn].as('othercolumn')).joins(:someothermodel)
=> #<ActiveRecord::Relation [#<Model id: 1>]>
# "othercolumn" is not shown in the result but doing the following will yield correct result
> result.first.othercolumn
=> "myothercolumnvalue"
In my application, say, animals have many photos. I'm querying photos of animals such that I want all photos of all animals to be displayed. However, I want each animal to appear as a photo before repetition occurs.
Example:
animal instance 1, 'cat', has four photos,
animal instance 2, 'dog', has two photos:
photos should appear ordered as so:
#photo belongs to #animal
tiddles.jpg , cat
fido.jpg dog
meow.jpg cat
rover.jpg dog
puss.jpg cat
felix.jpg, cat (no more dogs so two consecutive cats)
Pagination is required so I can't
order on an array.
Filename
structure/convention provides no
help, though the animal_id exists on
each photo.
Though there are two
types of animal in this example this
is an active record model with
hundreds of records.
Animals may be
selectively queried.
If this isn't possible with active_record then I'll happily use sql; I'm using postgresql.
My brain is frazzled so if anyone can come up with a better title, please go ahead and edit it or suggest in comments.
Here is a PostgreSQL specific solution:
batch_id_sql = "RANK() OVER (PARTITION BY animal_id ORDER BY id ASC)"
Photo.paginate(
:select => "DISTINCT photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Here is a DB agnostic solution:
batch_id_sql = "
SELECT COUNT(bm.*)
FROM photos bm
WHERE bm.animal_id = photos.animal_id AND
bm.id <= photos.id
"
Photo.paginate(
:select => "photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Both queries work even when you have a where condition. Benchmark the query using expected data set to check if it meets the expected throughput and latency requirements.
Reference
PostgreSQL Window function
Having no experience in activerecord. Using plain PostgreSQL I would try something like this:
Define a window function over all previous rows which counts how many time the current animal has appeared, then order by this count.
SELECT
filename,
animal_id,
COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
FROM
photos
ORDER BY
cnt,
animal_id,
filename
Filtering on certain animal_id's will work. This will always order the same way. I don't know if you want something random in there, but it should be easily added.
New solution
Add an integer column called batch_id to the animals table.
class AddBatchIdToPhotos < ActiveRecord::Migration
def self.up
add_column :photos, :batch_id, :integer
set_batch_id
change_column :photos, :batch_id, :integer, :nil => false
add_index :photos, :batch_id
end
def self.down
remove_column :photos, :batch_id
end
def self.set_batch_id
# set the batch id to existing rows
# implement this
end
end
Now add a before_create on the Photo model to set the batch id.
class Photo
belongs_to :animal
before_create :batch_photo_add
after_update :batch_photo_update
after_destroy :batch_photo_remove
private
def batch_photo_add
self.batch_id = next_batch_id_for_animal(animal_id)
true
end
def batch_photo_update
return true unless animal_id_changed?
batch_photo_remove(batch_id, animal_id_was)
batch_photo_add
end
def batch_photo_remove(b_id=batch_id, a_id=animal_id)
Photo.update_all("batch_id = batch_id- 1",
["animal_id = ? AND batch_id > ?", a_id, b_id])
true
end
def next_batch_id_for_animal(a_id)
(Photo.maximum(:batch_id, :conditions => {:animal_id => a_id}) || 0) + 1
end
end
Now you can get the desired result by issuing simple paginate command
#animal_photos = Photo.paginate(:page => 1, :per_page => 10,
:order => :batch_id)
How does this work?
Let's consider we have data set as given below:
id Photo Description Batch Id
1 Cat_photo_1 1
2 Cat_photo_2 2
3 Dog_photo_1 1
2 Cat_photo_3 3
4 Dog_photo_2 2
5 Lion_photo_1 1
6 Cat_photo_4 4
Now if we were to execute a query ordered by batch_id we get this
# batch 1 (cat, dog, lion)
Cat_photo_1
Dog_photo_1
Lion_photo_1
# batch 2 (cat, dog)
Cat_photo_2
Dog_photo_2
# batch 3,4 (cat)
Cat_photo_3
Cat_photo_4
The batch distribution is not random, the animals are filled from the top. The number of animals displayed in a page is governed by per_page parameter passed to paginate method (not the batch size).
Old solution
Have you tried this?
If you are using the will_paginate gem:
# assuming you want to order by animal name
animal_photos = Photo.paginate(:include => :animal, :page => 1,
:order => "animals.name")
animal_photos.each do |animal_photo|
puts animal_photo.file_name
puts animal_photo.animal.name
end
I'd recommend something hybrid/corrected based on KandadaBoggu's input.
First off, the correct way to do it on paper is with row_number() over (partition by animal_id order by id). The suggested rank() will generate a global row number, but you want the one within its partition.
Using a window function is also the most flexible solution (in fact, the only solution) if you want to plan to change the sort order here and there.
Take note that this won't necessarily scale well, however, because in order to sort the results you'll need to:
fetch the whole result set that matches your criteria
sort the whole result set to create the partitions and obtain a rank_id
top-n sort/limit over the result set a second time to get them in their final order
The correct way to do this in practice, if your sort order is immutable, is to maintain a pre-calculated rank_id. KandadaBoggu's other suggestion points in the correct direction in this sense.
When it comes to deletes (and possibly updates, if you don't want them sorted by id), you may run into issues because you end up trading faster reads for slower writes. If deleting the cat with an index of 1 leads to updating the next 50k cats, you're going to be in trouble.
If you've very small sets, the overhead might be very acceptable (don't forget to index animal_id).
If not, there's a workaround if you find the order in which specific animals appear is irrelevant. It goes like this:
Start a transaction.
If the rank_id is going to change (i.e. insert or delete), obtain an advisory lock to ensure that two sessions can't impact the rank_id of the same animal class, e.g.:
SELECT pg_try_advisory_lock('the_table'::regclass, the_animal_id);
(Sleep for .05s if you don't obtain it.)
On insert, find max(rank_id) for that animal_id. Assign it rank_id + 1. Then insert it.
On delete, select the animal with the same animal_id and the largest rank_id. Delete your animal, and assign its old rank_id to the fetched animal (unless you were deleting the last one, of course).
Release the advisory lock.
Commit the work.
Note that the above will make good use of an index on (animal_id, rank_id) and can be done using plpgsql triggers:
create trigger "__animals_rank_id__ins"
before insert on animals
for each row execute procedure lock_animal_id_and_assign_rank_id();
create trigger "_00_animals_rank_id__ins"
after insert on animals
for each row execute procedure unlock_animal_id();
create trigger "__animals_rank_id__del"
before delete on animals
for each row execute procedure lock_animal_id();
create trigger "_00_animals_rank_id__del"
after delete on animals
for each row execute procedure reassign_rank_id_and_unlock_animal_id();
You can then create a multi-column index on your sort criteria if you're not joining all over them place, e.g. (rank_id, name). And you'll end up with a snappy site for reads and writes.
You should be able to get the pictures (or filenames, anyway) using ActiveRecord, ordered by name.
Then you can use Enumerable#group_by and Enumerable#zip to zip all the arrays together.
If you give me more information about how your filenames are really arranged (i.e., are they all for sure with an underscore before the number and a constant name before the underscore for each "type"? etc.), then I can give you an example. I'll write one up momentarily showing how you'd do it for your current example.
You could run two sorts and build one array as follows:
result1= The first of each animal type only. use the ruby "find" method for this search.
result2= All animals, sorted by group. Use "find" to again find the first occurrence of each animal and then use "drop" to remove those "first occurrences" from result2.
Then:
markCustomResult = result1 + result2
Then:
You can use willpaginate on markCustomResult
I was hoping somebody may be able to point me in the right direction...
I have a database called Info and use a find command to select the rows in this database which match a certain criteria
#matching = Info.find( :all, :conditions => ["product_name = ?", distinctproduct], :order => 'Price ASC')
I then pull out the cheapest of these items
#cheapest = #matching.first
Finally, I would like to create an instantaneous array which contains a list of #cheapest for a number of different search criteria. i.e. row 1 in #allcheapest is #cheapest for criteria 1, row 2 in #allcheapest is #cheapest for criteria 2, ...
Any help would be great, thanks in advance
Info.where(:product_name => distinct_product.to_s).order('Price ASC').first
to select the cheapest price for the product_name. Without more insight into how your database is structured, it is difficult to suggest how to obtain the latter, but you may try
Info.where(:product_name => distinct_product.to_s).order('Price ASC').group(:product_name)