SUM operation on attributes of children of multiple parent records - sql

I have this method in my Product model that does what I need:
def self.available
  available = []
  Product.all.each do |product|
    if product.quantity > product.sales.sum(:quantity)
      available << product
    end
  end
  return available
end
But, I am wondering if there is a more efficient way to do this, maybe with only one call to the database.

Well you might try:
Product.where("products.quantity > Coalesce((select sum(s.quantity) from sales s where s.product_id = products.id), 0)")

This creates a number of queries equal to the number of products you have, due to the sum query. Here is a way I thought of that will reduce database queries.
map = Sale.joins(:product)
          .group("products.id", "products.quantity")
          .sum(:quantity)
          .to_a
Which will produce an array similar to
[[[1, 20], 30], [[2, 45], 20]]
where each element has the form [[product_id, product_quantity], sold_quantity].
Now loop over this array and compare the values.
available = []
map.each do |item|
  if item[0][1] > item[1]
    available << item[0][0]
  end
end
Now that you have the available array populated, perform another query.
available = Product.where(id: available)
Now you get the same output in two queries instead of N (Product.count) queries. This solution can still be inefficient in some cases, but I will update it if I have better ideas.
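For comparison, the whole check can also be pushed into one grouped query (a minimal sketch, assuming Rails 5+ for left_joins and the has_many :sales association from the question):

# Products whose stock exceeds the total sold; unsold products are
# included thanks to the LEFT JOIN plus COALESCE:
Product.left_joins(:sales)
       .group("products.id")
       .having("products.quantity > COALESCE(SUM(sales.quantity), 0)")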

Related

Ruby on Rails - Active Record Filter where the value of a referenced table is > 0

I am currently trying to filter, from selected data in Ruby on Rails, those records where the attribute "amount_available" is greater than zero. This would be no problem via @events.where(ticket_categories.amount_available > 0), but ticket_categories is an array without a fixed length, because there can be multiple categories. How can you easily iterate through the array in the where clause and do this comparison?
I only need the events in the output where at least one associated category has the amount_available > 0.
This is my code:
@upcoming_events = @events.where("date >= ?", Date.current)
@available_events = @upcoming_events.where(ticket_categories[0].amount_available > 0)
json_response(@available_events)
You can chain where conditions and you can add conditions that are based on associated models with joins:
available_events = @events
  .where('date >= ?', Date.current)
  .joins(:ticket_categories)
  .where('ticket_categories.amount_available > 0')
  .group(:id)
render json: available_events
Note: database joins might return duplicate records (depending on your database structure and the condition), hence the need to group the result set by id.
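Alternatively, distinct collapses those duplicate rows as well; a minimal sketch of the same query (assuming the amount_available column from the question):

available_events = @events
  .where('date >= ?', Date.current)
  .joins(:ticket_categories)
  .where('ticket_categories.amount_available > 0')
  .distinct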
That was only a representation; the Events table is linked to TicketCategories via has_many. I use PostgreSQL and was able to solve it with the following code:
@upcoming_events = @close_events.where("date >= ?", Date.current)
available_events = []
@upcoming_events.each do |event|
  event.ticket_categories.each do |category|
    if category.amount_available > 0
      available_events.push(event)
      break
    end
  end
end
render json: available_events

How to retrieve a list of records and the count of each one's children with condition in Active Record?

There are two models with our familiar one-to-many relationship:
class Custom
  has_many :orders
end

class Order
  belongs_to :custom
end
I want to do the following work:
get all the custom records whose age is over 18, and how many big orders (over 1,000 dollars) they have?
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one LEFT JOIN SQL statement will do the whole job, without constructing unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of Order objects which I don't need, and it will calculate the count with Array#count, which wastes time.
UPDATE 2:
My own solution is wrong: it removes customs who don't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
  id: Order.where("amount > ?", "1000").select(:custom_id)
)
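For reference, you can confirm it stays a single statement with to_sql; the output below is a rough sketch (exact quoting varies by adapter):

puts big_customers.to_sql
# Roughly:
# SELECT "customs".* FROM "customs"
# WHERE (age > '18')
#   AND "customs"."id" IN (SELECT "orders"."custom_id" FROM "orders" WHERE (amount > '1000'))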
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
  Custom.arel_table["*"],
  "count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
           ON orders.custom_id = customs.id
           AND orders.amount > 1000}).group("customs.id")
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so it would help to have a dedicated customs.big_order_count column that is either refreshed regularly or updated by an observer that watches for big Order records.
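A minimal sketch of that counter-cache idea (the big_orders_count column and the 1,000 threshold are assumptions for illustration):

# Assumes a big_orders_count integer column has been added to customs.
class Order < ActiveRecord::Base
  belongs_to :custom
  after_save :refresh_big_orders_count

  private

  # Recount this custom's big orders whenever an order is saved.
  def refresh_big_orders_count
    custom.update_column(
      :big_orders_count,
      custom.orders.where("amount > ?", 1000).count
    )
  end
end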
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is MySQL-only. To get this to work in PostgreSQL I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
  %{"customs".*},
  %{(
    SELECT count(*)
    FROM orders
    WHERE orders.custom_id = customs.id
    AND orders.amount > 1000
  ) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against PostgreSQL with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
Edited.
@custom_over_18 = Custom.where("age > ?", "18").joins(:orders).where("orders.amount > ?", "1000").count

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via a has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian options, and so on. So now, when the user wants to filter the restaurants and, let's say, checks pizza and delivery, I want to display all the restaurants that have both features: pizza, delivery, and maybe some more, but each HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either one - pizza or delivery or both - which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
  SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn
  FROM restaurants
  JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
  JOIN features ON features_restaurants.feature_id = features.id
  WHERE features.id IN (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr
join features f on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
       sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause checks for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> (
  select array_agg(id) from features where name in ('pizza', 'delivery')
)
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
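For completeness, a hypothetical migration for the denormalized column described above (a sketch; assumes Rails 4's PostgreSQL array support, and note the name-clash caveat in the comment):

class AddFeatureIdsToRestaurants < ActiveRecord::Migration
  def change
    # Note: if the HABTM association is kept, a column named feature_ids
    # will clash with the association's feature_ids method; rename one.
    add_column :restaurants, :feature_ids, :integer, array: true, default: []
    # GIN index so the @> containment operator can use an index scan.
    add_index :restaurants, :feature_ids, using: :gin
  end
end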
This is not "copy and paste" solution but if you consider following steps you will have fast working query.
Index the feature_name column (I'm assuming that the feature_id column is indexed on both tables).
Place each feature_name param in exists():
select fr.restaurant_id
from features_restaurants fr
where exists(select true from features f
             where fr.feature_id = f.feature_id and f.feature_name = 'pizza')
  and exists(select true from features f
             where fr.feature_id = f.feature_id and f.feature_name = 'delivery')
group by fr.restaurant_id
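A hypothetical migration for the first step (a sketch; assumes the features table from the question):

class IndexFeaturesOnFeatureName < ActiveRecord::Migration
  def change
    add_index :features, :feature_name
  end
end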
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
  # accepts an array of feature names
  restaurants = Restaurant.all
  features.each do |f|
    feature_restaurants = Feature.find_by_name(f).restaurants
    restaurants = feature_restaurants & restaurants
  end
  return restaurants
end
Since it's an AND condition (OR conditions get dicey with Arel), I reread your stated problem, ignoring the SQL. I think this is what you want:
# in Restaurant
has_and_belongs_to_many :features

# in Feature
has_and_belongs_to_many :restaurants

# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
  where(pizza: true)
end

def self.delivery
  where(delivery: true)
end

# query
Restaurant.joins(:features).merge(Feature.pizza).merge(Feature.delivery)
Basically you join the features association and then merge in the scopes defined on Feature. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
  .joins(:features)
  .where(features: { name: ['pizza', 'delivery'] })
  .group(:id)
  .having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.
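If the join can ever yield duplicate feature rows per restaurant, counting distinct names is a safer variant (a sketch along the same lines, parameterized for any feature list):

names = params[:features] # e.g. ['pizza', 'delivery']
Restaurant
  .joins(:features)
  .where(features: { name: names })
  .group(:id)
  .having('count(distinct features.name) = ?', names.size)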

ActiveRecord find_each combined with limit and order

I'm trying to run a query of about 50,000 records using ActiveRecord's find_each method, but it seems to be ignoring my other parameters like so:
Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }
Instead of stopping at 50,000 and sorting by created_at as I'd like, it runs over the entire dataset; here's the resulting query that gets executed:
Thing Load (198.8ms) SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000
Is there a way to get similar behavior to find_each but with a total max limit and respecting my sort criteria?
The documentation says that find_each and find_in_batches don't retain sort order and limit because:
Sorting ASC on the PK is used to make the batch ordering work.
Limit is used to control the batch sizes.
You could write your own version of this function like @rorra did. But you can get into trouble when mutating the objects. If, for example, you sort by created_at and save the object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of results has changed while executing the query to get the next batch. Only use that solution with read-only objects.
Now my primary concern was that I didn't want to load 30,000+ objects into memory at once. My concern was not the execution time of the query itself. Therefore I used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects, because the sort order is kept in memory.
Here is a minimal example similar to what I did:
batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace the order with your own scope
ids.each_slice(batch_size) do |chunk|
  Thing.where(id: chunk).order("field(id, #{chunk.join(',')})").each do |thing|
    # Do things with thing
  end
end
The trade-offs to this solution are:
The complete query is executed to get the IDs
An array of all the IDs is kept in memory
Uses the MySQL specific FIELD() function
Hope this helps!
find_each uses find_in_batches under the hood.
It's not possible to select the order of the records; as described in find_in_batches, it is automatically set to ascending on the primary key ("id ASC") to make the batch ordering work.
However, the other criteria are applied; what you can do is:
Thing.active.find_each(batch_size: 50000) { |t| puts t.id }
Regarding the limit, it wasn't implemented yet: https://github.com/rails/rails/pull/5696
Answering your second question, you can create the logic yourself:
total_records = 50000
batch = 1000
(0..(total_records - batch)).step(batch) do |i|
  puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end
Retrieving the ids first and processing them in_groups_of
ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)
ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  photos = Photo.order(likes_count: :desc).where(id: photo_ids)
  # ...
end
It's important to also add the ORDER BY query to the inner call.
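Alternatively, you can drop the inner ORDER BY and restore the order in Ruby with index_by (a sketch; trades a small in-memory index for a simpler query):

ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  # Fetch the chunk unordered, then reassemble it in the original order.
  by_id = Photo.where(id: photo_ids).index_by(&:id)
  photo_ids.map { |id| by_id[id] }.each do |photo|
    # ...
  end
end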
Rails 6.1 adds support for descending order in find_each, find_in_batches and in_batches.
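On 6.1+ that looks like this (a sketch of the new option; it orders by the primary key descending, not by an arbitrary column):

Thing.active.find_each(order: :desc) do |t|
  puts t.id
end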
One option is to put an implementation tailored for your particular model into the model itself (speaking of which, id is usually a better choice for ordering records, since created_at may have duplicates):
class Thing < ActiveRecord::Base
  def self.find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(created_at: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      # Paginate on created_at to match the sort order (duplicate
      # timestamps may cause records to be skipped).
      records = self.order(created_at: :desc).where('created_at < ?', records.last.created_at).limit(batch_size)
    end
  end
end
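Usage would look something like this (a sketch; the block receives the record and its running index):

Thing.find_each_desc(50_000) do |thing, i|
  puts "#{i}: #{thing.id}"
end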
Or else you can generalize things a bit, and make it work for all the models:
lib/active_record_extensions.rb:
ActiveRecord::Batches.module_eval do
  def find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(id: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(id: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end

ActiveRecord::Querying.module_eval do
  delegate :find_each_desc, :to => :all
end
config/initializers/extensions.rb:
require "active_record_extensions"
P.S. I'm putting the code in files according to this answer.
You can iterate backwards with standard Ruby iterators:
Thing.last.id.step(0, -1000) do |i|
  Thing.where(id: (i - 1000 + 1)..i).order('id DESC').each do |thing|
    # ...
  end
end
Note: the +1 is because BETWEEN (which will appear in the query) includes both bounds, but we only need to include one.
Sure, with this approach fewer than 1000 records may be fetched in a batch because some of them have already been deleted, but this is OK in my case.
As remarked by @Kirk in one of the comments, find_each supports limit as of version 5.1.0.
Example from the changelog:
Post.limit(10_000).find_each do |post|
  # ...
end
The documentation says:
Limits are honored, and if present there is no requirement for the batch size: it can be less than, equal to, or greater than the limit.
(setting a custom order is still not supported though)
I was looking for the same behaviour and thought up this solution. This DOES NOT order by created_at, but I thought I would post it anyway.
max_records_to_retrieve = 50000
last_index = Thing.count
start_index = [(last_index - max_records_to_retrieve), 0].max
Thing.active.find_each(:start => start_index) do |u|
  # do stuff
end
Drawbacks of this approach:
- You need 2 queries (first one should be fast)
- This guarantees a max of 50K records, but if ids are skipped you will get fewer.
You can try the ar-as-batches gem.
From their documentation you can do something like this:
Users.where(country_id: 44).order(:joined_at).offset(200).as_batches do |user|
  user.party_all_night!
end
Using Kaminari or something similar, it will be easy.
Create a batch loader class.
module BatchLoader
  extend ActiveSupport::Concern

  def batch_by_page(options = {})
    options = init_batch_options!(options)
    next_page = 1
    loop do
      next_page = yield(next_page, options[:batch_size])
      break next_page if next_page.nil?
    end
  end

  private

  def default_batch_options
    {
      batch_size: 50
    }
  end

  def init_batch_options!(options)
    options ||= {}
    default_batch_options.merge!(options)
  end
end
Create a repository.
class ThingRepository
  include BatchLoader

  # @param [Integer] per_page
  # @param [Proc] block
  def batch_changes(per_page = 100, &block)
    relation = Thing.active.order("created_at DESC")
    batch_by_page do |next_page|
      query = relation.page(next_page).per(per_page)
      yield query if block_given?
      query.next_page
    end
  end
end
Use the repository
repo = ThingRepository.new
repo.batch_changes(5000) do |g|
  g.each do |t|
    # ...
  end
end
Adding find_in_batches_with_order solved my use case, where I already had the ids but needed batching and ordering. It was inspired by @dirk-geurs' solution.
# Create file config/initializers/find_in_batches_with_order.rb with the following code.
ActiveRecord::Batches.class_eval do
  ## Only flat order structure is supported now
  ## example: [:forename, :surname] is supported but [:forename, {surname: :asc}] is not supported
  def find_in_batches_with_order(ids: nil, order: [], batch_size: 1000)
    relation = self
    arrangement = order.dup
    index = order.find_index(:id)
    unless index
      arrangement.push(:id)
      index = arrangement.length - 1
    end
    # pluck returns scalars for a single column, arrays otherwise
    ids ||= relation.order(*arrangement).pluck(*arrangement).map { |tuple| tuple.is_a?(Array) ? tuple[index] : tuple }
    ids.each_slice(batch_size) do |chunk_ids|
      chunk_relation = relation.where(id: chunk_ids).order(*order)
      yield(chunk_relation)
    end
  end
end
Leaving the Gist here: https://gist.github.com/the-spectator/28b1176f98cc2f66e870755bb2334545
I had the same problem with a query with DISTINCT ON where you need an ORDER BY with that field, so this is my approach with Postgres:
def filtered_model_ids
  Model.joins(:father_model)
       .select('DISTINCT ON (model.field) model.id')
       .order(:field)
       .map(&:id)
end

def processor
  filtered_model_ids.each_slice(BATCH_SIZE).each do |batch|
    Model.find(batch).each do |record|
      # Code
    end
  end
end
My code:
batch_size = 100
total_count = klass.count
offset = 0
processed_count = 0
while processed_count < total_count
  relation = klass.order({ active_at: :asc, created_at: :desc }).offset(offset).limit(batch_size)
  relation.each do |record|
    record.process
  end
  offset += batch_size
  processed_count += batch_size
end
Do it in one query and avoid iterating:
User.offset(2).order('name DESC').last(3)
will produce a query like this
SELECT "users".* FROM "users" ORDER BY name ASC LIMIT $1 OFFSET $2 [["LIMIT", 3], ["OFFSET", 2]]
(note that last reverses the given order, which is why the generated SQL says ASC)

Rails (or maybe SQL): Finding and deleting duplicate AR objects

ActiveRecord objects of the class 'Location' (representing the db-table Locations) have the attributes 'url', 'lat' (latitude) and 'lng' (longitude).
Lat-lng-combinations on this model should be unique. The problem is, that there are a lot of Location-objects in the database having duplicate lat-lng-combinations.
I need help in doing the following
Find objects that share the same lat-lng-combination.
If the 'url' attribute of the object isn't empty, keep this object and delete the other duplicates. Otherwise just choose the oldest object (by checking the attribute 'created_at') and delete the other duplicates.
As this is a one-time-operation, solutions in SQL (MySQL 5.1 compatible) are welcome too.
If it's a one-time thing then I'd just do it in Ruby and not worry too much about efficiency. I haven't tested this thoroughly; check the sorting and such to make sure it'll do exactly what you want before running this on your db :)
keep = []
locations = Location.all
locations.each do |loc|
  # get all Locations with the same coords as this one
  same_coords = locations.select { |l| l.lat == loc.lat and \
                                       l.lng == loc.lng }
  with_urls = same_coords.select { |l| !l.url.blank? }
  # decide which list to use depending if there were any urls
  same_coords = with_urls.any? ? with_urls : same_coords
  # pick the oldest one
  keep << same_coords.sort { |a, b| a.created_at <=> b.created_at }.first.id
end
# only keep unique ids
keep.uniq!
# now we just delete all the rows we didn't decide to keep
locations.each do |loc|
  loc.destroy unless keep.include?(loc.id)
end
Now like I said, this is definitely poor, poor code. But sometimes just hacking out the thing that works is worth the time saved in thinking up something 'better', especially if it's just a one-off.
If you have 2 MySQL columns, you can use the CONCAT function.
SELECT * FROM table1 GROUP BY CONCAT(column_lat, column_lng)
If you need to know the total
SELECT COUNT(*) AS total FROM table1 GROUP BY CONCAT(column_lat, column_lng)
Or, you can combine both
SELECT COUNT(*) AS total, table1.* FROM table1
GROUP BY CONCAT(column_lat, column_lng)
But if you can explain more on your question, perhaps we can have more relevant answers.
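The GROUP BY above only surfaces the duplicates; here is a sketch of the actual delete step (MySQL; it assumes the simplest keeper rule, lowest id per lat-lng pair, so adapt it if you need the url/oldest logic from the question):

# Self-join: delete any row that has a duplicate with a smaller id.
ActiveRecord::Base.connection.execute(<<-SQL)
  DELETE l1 FROM locations l1
  JOIN locations l2
    ON l1.lat = l2.lat
   AND l1.lng = l2.lng
   AND l1.id > l2.id
SQL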