I have two Sidekiq workers, both of which connect to a database and save data to it. The specific database can change based on the Account.
Here is the one that works: (there are other methods in the worker)
class GaDataloadProcessor < GaWorker
  attr_reader :profile, :dataload, :connection, :results

  def perform(id)
    @dataload = Dataload::GoogleAnalytics.find(id)
    connector = dataload.google_analytics_connector
    with_session(connector.username, connector.password) do
      load_profile(connector.account, dataload.profile)
      run_report
      setup_db_connection
      import_results
    end
  end

  private

  def setup_db_connection
    @connection = ActiveRecord::RdsDb.get_connection(dataload.account)
  end
I start this worker in the console with a valid ID and I get:
irb(main):004:0> ga_worker.perform(2)
Dataload::GoogleAnalytics Load (3.0ms) SELECT "dataload_google_analytics".* FROM "dataload_google_analytics" WHERE "dataload_google_analytics"."id" = $1 LIMIT 1 [["id", 2]]
Connector::GoogleAnalytics Load (0.7ms) SELECT "connector_google_analytics".* FROM "connector_google_analytics" WHERE "connector_google_analytics"."id" = 7 LIMIT 1
Account Load (1.2ms) SELECT "accounts".* FROM "accounts" WHERE "accounts"."id" = 16 LIMIT 1
(76.1ms) insert into google_analytics_test2 (visitors,date) values ('4','20131018')
Awesome, that works as expected. This next worker does not work (the bad worker); again, there are other methods in the worker:
class CmpUpdateWorker < BaseWorker
  attr_reader :dataload, :connection

  def perform(dataload_id)
    @dataload = DataloadMailchimp.find(dataload_id)
    @gibbon = Gibbon.new(@dataload.api_key)
    setup_db_connection
    run_report
    import_results
  end

  private

  def setup_db_connection
    @connection = ActiveRecord::RdsDb.get_connection(dataload.account)
  end
And when I execute this worker from the console I get this:
irb(main):002:0> cmp_worker.perform(5)
DataloadMailchimp Load (68.0ms) SELECT "dataload_mailchimps".* FROM "dataload_mailchimps" WHERE "dataload_mailchimps"."id" = $1 LIMIT 1 [["id", 5]]
NoMethodError: undefined method `rds_username' for 16:Fixnum
16 is the correct account_id and it is in the database. That record also has the correct rds_username and other db connection values.
Both models referenced in the workers have belongs_to :account. I have been testing and trying to narrow this down but am stuck. I know there is a lot more code than I have posted here; I am happy to add anything that may help in solving this to a gist, or to provide access to the repo.
I appreciate any advice or direction. Thanks.
I got the CmpUpdateWorker to run by updating setup_db_connection to:
def setup_db_connection
  @connection = ActiveRecord::RdsDb.get_connection(@dataload.account)
end
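For anyone hitting a similar mismatch, a quick way to check what the reader resolves to versus the instance variable is to log both just before the connection call (temporary instrumentation only, not part of the fix; the logger lines are purely illustrative):
def setup_db_connection
  # Temporary debugging: compare the attr_reader with the instance variable
  Rails.logger.debug "dataload (reader): #{dataload.inspect} (#{dataload.class})"
  Rails.logger.debug "@dataload (ivar):  #{@dataload.inspect} (#{@dataload.class})"
  Rails.logger.debug "account:           #{@dataload.account.inspect}"

  @connection = ActiveRecord::RdsDb.get_connection(@dataload.account)
end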
Related
I want to do something like "Find all Books where book.pages.count < books.max_pages".
So the models are:
class Book
  has_many :pages
end

class Page
  belongs_to :book
end
I know I can find books w/ a set number of pages. eg:
# Get books w/ < 5 pages.
Book.joins(:pages).group("books.id").having("count(pages.id) < ?", 5)
Is there a good way to do this with a dynamic page count? eg:
Book.joins(:pages).group("books.id").select(.having("count(pages.id) <= book.max_pages")
If not I can always just store something inside the Book model (eg book.is_full = false until a save causes it to be full), but this is a bit less flexible if max_pages gets updated.
You could create a scope like this:
def self.page_count_under(amount)
  joins(:pages)
    .group('books.id')
    .having('COUNT(pages.id) < ?', amount)
end
UPDATE
This should work if max_pages is an attribute of the Book model.
def self.page_count_under_max
  joins(:pages)
    .group('books.id')
    .having('COUNT(pages.id) < books.max_pages')
end
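Since these class methods return relations, they chain like any other scope; for example (the :title column is just an illustrative attribute):
# Books that still have room for more pages, further restricted as needed
Book.page_count_under_max.limit(10)
Book.page_count_under(5).order(:title)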
Use counter_cache!
See section 4.1.2.3 (:counter_cache) of the guide: http://guides.rubyonrails.org/association_basics.html
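A minimal sketch of what that looks like (the column and migration names follow Rails conventions and are assumptions about your schema):
class Page < ActiveRecord::Base
  # Keeps books.pages_count in sync on create/destroy
  belongs_to :book, counter_cache: true
end

# Migration adding the backing column; backfill existing rows with
# Book.reset_counters(book.id, :pages)
class AddPagesCountToBooks < ActiveRecord::Migration
  def change
    add_column :books, :pages_count, :integer, default: 0, null: false
  end
end

# The original query then needs no join or GROUP BY:
Book.where('pages_count < max_pages')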
I found some weird behaviour with ActiveRecord's pluck.
My query is
Friend.joins('INNER JOIN users ON friends.friend_id = users.id').where("user_id=? AND (status=? or status=?)", 4,"true","").pluck("users.first_name, users.last_name")
It joins friends with users and gets each user's first name and last name.
The generated SQL is:
SELECT users.first_name, users.last_name FROM "friends" INNER JOIN users ON friends.friend_id = users.id WHERE (user_id=4 AND (status='true' or status=''))
If I run the above command in a SQLite browser tool, I get a response like this:
first_name last_name
user4 y
user5 y
but from the Rails console, pluck gives me:
["y", "y"]
and find_by_sql gives me:
[#<Friend >, #<Friend >]
What's wrong in my code, or is it a problem with pluck and find_by_sql?
How can I resolve the problem?
Thanks in advance.
If you're using Rails 4 you can pass the columns as separate arguments.
Instead of
.pluck("users.first_name, users.last_name")
Try
.pluck("users.first_name", "users.last_name")
In Rails 3 you'll want to use select to fetch those specific fields:
.select("users.first_name, users.last_name")
config/initializers/pluck_all.rb
module ActiveRecord
  class Relation
    def pluck_all(*args)
      args.map! do |column_name|
        if column_name.is_a?(Symbol) && column_names.include?(column_name.to_s)
          "#{connection.quote_table_name(table_name)}.#{connection.quote_column_name(column_name)}"
        else
          column_name.to_s
        end
      end

      relation = clone
      relation.select_values = args
      klass.connection.select_all(relation.arel).map! do |attributes|
        initialized_attributes = klass.initialize_attributes(attributes)
        attributes.each do |key, attribute|
          attributes[key] = klass.type_cast_attribute(key, initialized_attributes)
        end
      end
    end
  end
end
Friend.joins('INNER JOIN users ON friends.friend_id = users.id').where("user_id=? AND (status=? or status=?)", 4,"true","").pluck_all("users.first_name","users.last_name")
resolves my issue; it's purely a pluck problem.
Thanks for a great tutorial.
Here is a Rails 3 method that behaves the same as the Rails 4 pluck with multiple columns.
It outputs a similar array (rather than a collection of key/value hashes).
module ActiveRecord
  class Relation
    def pluck_all(*args)
      args.map! do |column_name|
        if column_name.is_a?(Symbol) && column_names.include?(column_name.to_s)
          "#{connection.quote_table_name(table_name)}.#{connection.quote_column_name(column_name)}"
        else
          column_name.to_s
        end
      end

      relation = clone
      relation.select_values = args
      klass.connection.select_all(relation.arel).map! do |attributes|
        initialized_attributes = klass.initialize_attributes(attributes)
        attributes.map do |key, attribute|
          klass.type_cast_attribute(key, initialized_attributes)
        end
      end
    end
  end
end
Standing on the shoulders of giants and all (@santosh).
The article @santosh shared was great, until I needed to upgrade from Rails 3.2 to Rails 4.
Here is a gem, pluck_all, that solves this by making the pluck_all method work not only in Rails 3 but also in Rails 4 and Rails 5. Hope this will help those who are going to upgrade their Rails version.
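A hedged usage sketch of that gem (consult its README for the exact options; this assumes it mirrors the pluck_all method above and returns an array of hashes):
# Gemfile
gem 'pluck_all'

# Works across Rails 3 through 5 according to the gem's description
Friend.joins('INNER JOIN users ON friends.friend_id = users.id')
      .where(user_id: 4)
      .pluck_all('users.first_name', 'users.last_name')
# => [{"first_name" => "user4", "last_name" => "y"}, ...]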
I have 3 models:
class Student < ActiveRecord::Base
  has_many :student_enrollments, dependent: :destroy
  has_many :courses, through: :student_enrollments
end

class Course < ActiveRecord::Base
  has_many :student_enrollments, dependent: :destroy
  has_many :students, through: :student_enrollments
end

class StudentEnrollment < ActiveRecord::Base
  belongs_to :student
  belongs_to :course
end
I wish to query for a list of courses in the Courses table that have no StudentEnrollments record associated with a certain student.
I found that perhaps a LEFT JOIN is the way to go, but it seems that joins() in Rails only accepts an association name as an argument.
The SQL query that I think would do what I want is:
SELECT *
FROM Courses c LEFT JOIN StudentEnrollment se ON c.id = se.course_id
WHERE se.id IS NULL AND se.student_id = <SOME_STUDENT_ID_VALUE> and c.active = true
How do I execute this query the Rails 4 way?
Any input is appreciated.
You can also pass a string containing the join SQL, e.g. joins("LEFT JOIN StudentEnrollment se ON c.id = se.course_id").
Though I'd use rails-standard table naming for clarity:
joins("LEFT JOIN student_enrollments ON courses.id = student_enrollments.course_id")
If anyone came here looking for a generic way to do a left outer join in Rails 5, you can use the #left_outer_joins function.
Multi-join example:
Ruby:
Source.
select('sources.id', 'count(metrics.id)').
left_outer_joins(:metrics).
joins(:port).
where('ports.auto_delete = ?', true).
group('sources.id').
having('count(metrics.id) = 0').
all
SQL:
SELECT sources.id, count(metrics.id)
FROM "sources"
INNER JOIN "ports" ON "ports"."id" = "sources"."port_id"
LEFT OUTER JOIN "metrics" ON "metrics"."source_id" = "sources"."id"
WHERE (ports.auto_delete = 't')
GROUP BY sources.id
HAVING (count(metrics.id) = 0)
ORDER BY "sources"."id" ASC
There is actually a "Rails Way" to do this.
You could use Arel, which is what Rails uses to construct queries for ActiveRecord.
I would wrap it in a method so that you can call it nicely and pass in whatever argument you would like, something like:
class Course < ActiveRecord::Base
  # ...
  def self.left_join_student_enrollments(some_user)
    courses = Course.arel_table
    student_enrollments = StudentEnrollment.arel_table

    enrollments = courses.join(student_enrollments, Arel::Nodes::OuterJoin).
                  on(courses[:id].eq(student_enrollments[:course_id])).
                  join_sources

    joins(enrollments).where(
      student_enrollments: { student_id: some_user.id, id: nil },
      active: true
    )
  end
  # ...
end
There is also the quick (and slightly dirty) way that many use
Course.eager_load(:students).where(
  student_enrollments: { student_id: some_user.id, id: nil },
  active: true
)
eager_load works great; it just has the "side effect" of loading models into memory that you might not need (as in your case).
Please see Rails ActiveRecord::QueryMethods .eager_load
It does exactly what you are asking in a neat way.
Combining includes and where results in ActiveRecord performing a LEFT OUTER JOIN behind the scenes (without the where this would generate the normal set of two queries).
So you could do something like:
Course.includes(:student_enrollments).where(student_enrollments: { course_id: nil })
Docs here: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-eager-loaded-associations
Adding to the answer above: to make includes produce an OUTER JOIN when you don't reference the table in the where as a hash (for example, when the id-is-nil condition is written as a string), you can use references. That would look like this:
Course.includes(:student_enrollments).references(:student_enrollments)
or
Course.includes(:student_enrollments).references(:student_enrollments).where('student_enrollments.id = ?', nil)
http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-references
You'd execute the query as:
Course.joins('LEFT JOIN student_enrollments ON courses.id = student_enrollments.course_id')
      .where(active: true, student_enrollments: { student_id: SOME_VALUE, id: nil })
I know that this is an old question and an old thread but in Rails 5, you could simply do
Course.left_outer_joins(:student_enrollments)
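Applied with the same conditions the other answers use (a sketch; adjust the student id to your case):
Course.left_outer_joins(:student_enrollments)
      .where(student_enrollments: { student_id: some_user.id, id: nil }, active: true)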
You could use the left_joins gem, which backports the left_joins method from Rails 5 to Rails 4 and 3.
Course.left_joins(:student_enrollments)
.where('student_enrollments.id' => nil)
I've been struggling with this kind of problem for quite some while, and decided to do something to solve it once and for all. I published a Gist that addresses this issue: https://gist.github.com/nerde/b867cd87d580e97549f2
I created a little AR hack that uses Arel Table to dynamically build the left joins for you, without having to write raw SQL in your code:
class ActiveRecord::Base
  # Does a left join through an association. Usage:
  #
  #   Book.left_join(:category)
  #   # SELECT "books".* FROM "books"
  #   # LEFT OUTER JOIN "categories"
  #   # ON "books"."category_id" = "categories"."id"
  #
  # It also works through association's associations, like `joins` does:
  #
  #   Book.left_join(category: :master_category)
  def self.left_join(*columns)
    _do_left_join columns.compact.flatten
  end

  private

  def self._do_left_join(column, this = self) # :nodoc:
    collection = self
    if column.is_a? Array
      column.each do |col|
        collection = collection._do_left_join(col, this)
      end
    elsif column.is_a? Hash
      column.each do |key, value|
        assoc = this.reflect_on_association(key)
        raise "#{this} has no association: #{key}." unless assoc
        collection = collection._left_join(assoc)
        collection = collection._do_left_join value, assoc.klass
      end
    else
      assoc = this.reflect_on_association(column)
      raise "#{this} has no association: #{column}." unless assoc
      collection = collection._left_join(assoc)
    end
    collection
  end

  def self._left_join(assoc) # :nodoc:
    source = assoc.active_record.arel_table
    pk = assoc.association_primary_key.to_sym
    joins source.join(assoc.klass.arel_table, Arel::Nodes::OuterJoin)
                .on(source[assoc.foreign_key].eq(assoc.klass.arel_table[pk]))
                .join_sources
  end
end
Hope it helps.
See below my original post to this question.
Since then, I have implemented my own .left_joins() for ActiveRecord v4.0.x (sorry, my app is frozen at this version so I've had no need to port it to other versions):
In file app/models/concerns/active_record_extensions.rb, put the following:
module ActiveRecordBaseExtensions
  extend ActiveSupport::Concern

  def left_joins(*args)
    self.class.left_joins(args)
  end

  module ClassMethods
    def left_joins(*args)
      all.left_joins(args)
    end
  end
end

module ActiveRecordRelationExtensions
  extend ActiveSupport::Concern

  # a #left_joins implementation for Rails 4.0 (WARNING: this uses Rails 4.0 internals
  # and so probably only works for Rails 4.0; it'll probably need to be modified if
  # upgrading to a new Rails version, and will be obsolete in Rails 5 since it has its
  # own #left_joins implementation)
  def left_joins(*args)
    eager_load(args).construct_relation_for_association_calculations
  end
end

ActiveRecord::Base.send(:include, ActiveRecordBaseExtensions)
ActiveRecord::Relation.send(:include, ActiveRecordRelationExtensions)
Now I can use .left_joins() everywhere I'd normally use .joins().
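With the concern above loaded, a usage sketch against the question's models might look like the following; whether the hash conditions resolve exactly as with the built-in Rails 5 left_joins depends on the eager_load internals, so treat it as a starting point:
Course.left_joins(:student_enrollments)
      .where(student_enrollments: { student_id: some_user.id, id: nil }, active: true)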
----------------- ORIGINAL POST BELOW -----------------
If you want OUTER JOINs without all the extra eagerly loaded ActiveRecord objects, use .pluck(:id) after .eager_load() to abort the eager load while preserving the OUTER JOIN. Using .pluck(:id) thwarts eager loading because the column name aliases (items.location AS t1_r9, for example) disappear from the generated query when used (these independently named fields are used to instantiate all the eagerly loaded ActiveRecord objects).
A disadvantage of this approach is that you then need to run a second query to pull in the desired ActiveRecord objects identified in the first query:
# first query
idents = Course
  .eager_load(:students)  # eager load for OUTER JOIN
  .where(
    student_enrollments: { student_id: some_user.id, id: nil },
    active: true
  )
  .distinct
  .pluck(:id)             # abort eager loading but preserve OUTER JOIN

# second query
Course.where(id: idents)
It's a join query in Active Record in Rails.
See the Active Record querying guide for more info on the query format: http://guides.rubyonrails.org/active_record_querying.html
@course = Course.joins("LEFT OUTER JOIN StudentEnrollment
            ON StudentEnrollment.course_id = Courses.id").
          where("StudentEnrollment.id IS NULL AND StudentEnrollment.student_id =
            <SOME_STUDENT_ID_VALUE> AND Courses.active = true")
Use Squeel:
Person.joins{articles.inner}
Person.joins{articles.outer}
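Translated to the models in this question (assuming the Squeel gem is set up), the outer-join form would be roughly:
# Squeel's block DSL: .outer requests a LEFT OUTER JOIN on the association
Course.joins{student_enrollments.outer}.where('student_enrollments.id IS NULL')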
If anyone out there still needs true left_outer_joins support in Rails 4.2 then if you install the gem "brick" on Rails 4.2.0 or later it automatically adds the Rails 5.0 implementation of left_outer_joins. You would probably want to turn off the rest of its functionality, that is unless you want an automatic "admin panel" kind of thing available in your app!
I have set up a Rails REST Service and I am having a problem showing a single record. Here is the URL that I am trying to hit:
http://website:3000/users/2/timesheets/21
Controller code:
def show
  puts "SHOW"
  puts params.inspect
  @timesheets = User.find(params[:user_id]).timesheets(params[:id])
  respond_to do |format|
    format.json { render json: @timesheets }
  end
end
I know the params are getting to the controller, but it is not using the :timesheet_id. Here is the console output to show what I mean:
Started GET "/users/2/timesheets/21" for **.**.***.** at 2013-03-19 06:12:11 -0400
Processing by TimesheetsController#show as */*
Parameters: {"user_id"=>"2", "id"=>"21"}
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT 1 [["id", "2"]]
Timesheet Load (0.5ms) SELECT "timesheets".* FROM "timesheets" WHERE "timesheets"."user_id" = 2
Completed 200 OK in 120ms (Views: 36.5ms | ActiveRecord: 2.9ms)
I see the timesheet id of 21 in the parameters hash. A query is then made to get the user, but then all of the timesheets for that user are grabbed. Any help would be appreciated. Thanks.
What Prakash suggests works, but it executes two queries: one to get the user and one to get the timesheet. There does not seem to be any reason to do User.find(...) first. Might as well query the timesheets table only, which executes a single query and is thus faster:
@timesheet = Timesheet.where('user_id = ? and id = ?', params[:user_id], params[:id]).first
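On Rails 4 and later, the same single-query lookup can also be written with find_by, which returns nil instead of raising when nothing matches:
@timesheet = Timesheet.find_by(user_id: params[:user_id], id: params[:id])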
Try this:
@user = User.find(params[:user_id])
@timesheet = @user.timesheets.find(params[:id])
This should run a query as follows:
SELECT "timesheets".* FROM "timesheets" WHERE "timesheets"."id" = 21 AND ("timesheets"."user_id" = 2) LIMIT 1
The corresponding view should refer to the @timesheet variable and not @timesheets.
I'm trying to run a query of about 50,000 records using ActiveRecord's find_each method, but it seems to be ignoring my other parameters like so:
Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }
Instead of stopping at 50,000 and sorting by created_at as I'd like, here's the resulting query, which gets executed over the entire dataset:
Thing Load (198.8ms) SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000
Is there a way to get similar behavior to find_each but with a total max limit and respecting my sort criteria?
The documentation says that find_each and find_in_batches don't retain sort order and limit because:
Sorting ASC on the PK is used to make the batch ordering work.
Limit is used to control the batch sizes.
You could write your own version of this function like @rorra did, but you can get into trouble when mutating the objects: if, for example, you sort by created_at and save an object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of results has changed while executing the query to get the next batch. Only use that solution with read-only objects.
Now my primary concern was that I didn't want to load 30000+ objects into memory at once; my concern was not the execution time of the query itself. Therefore I used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects, because the sort order is kept in memory.
Here is a minimal example similar to what I did:
batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order(:created_at) with your own scope

ids.each_slice(batch_size) do |chunk|
  Thing.find(chunk, :order => "field(id, #{chunk.join(',')})").each do |thing|
    # Do things with thing
  end
end
The trade-offs of this solution are:
- The complete query is executed to get the IDs
- An array of all the IDs is kept in memory
- It uses the MySQL-specific FIELD() function
Hope this helps!
find_each uses find_in_batches under the hood.
It's not possible to select the order of the records; as described in find_in_batches, the order is automatically set to ascending on the primary key ("id ASC") to make the batch ordering work.
However, the other criteria are applied. What you can do is:
Thing.active.find_each(batch_size: 50000) { |t| puts t.id }
Regarding the limit, it wasn't implemented yet: https://github.com/rails/rails/pull/5696
Answering your second question, you can create the logic yourself:
total_records = 50000
batch = 1000

(0..(total_records - batch)).step(batch) do |i|
  puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end
Retrieve the ids first and process them with in_groups_of:
ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)

ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  photos = Photo.order(likes_count: :desc).where(id: photo_ids)
  # ...
end
It's important to also include the ORDER BY clause in the inner query.
Rails 6.1 adds support for descending order in find_each, find_in_batches and in_batches.
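For example (a sketch for Rails 6.1+, where the order: keyword controls the primary-key direction used for batching):
Thing.active.find_each(order: :desc) do |thing|
  puts thing.id
end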
One option is to put an implementation tailored for your particular model into the model itself (speaking of which, id is usually a better choice for ordering records, since created_at may have duplicates):
class Thing < ActiveRecord::Base
  def self.find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(created_at: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(created_at: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end
Or else you can generalize things a bit, and make it work for all the models:
lib/active_record_extensions.rb:
ActiveRecord::Batches.module_eval do
  def find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(id: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(id: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end

ActiveRecord::Querying.module_eval do
  delegate :find_each_desc, :to => :all
end
config/initializers/extensions.rb:
require "active_record_extensions"
P.S. I'm putting the code in files according to this answer.
You can iterate backwards by standard ruby iterators:
Thing.last.id.step(0, -1000) do |i|
  Thing.where(id: (i - 1000 + 1)..i).order('id DESC').each do |thing|
    # ...
  end
end
Note: the +1 is because the BETWEEN that ends up in the query includes both bounds, but we only need to include one.
Sure, with this approach a batch could fetch fewer than 1000 records because some of them have already been deleted, but this is OK in my case.
As remarked by @Kirk in one of the comments, find_each supports limit as of version 5.1.0.
Example from the changelog:
Post.limit(10_000).find_each do |post|
# ...
end
The documentation says:
Limits are honored, and if present there is no requirement for the batch size: it can be less than, equal to, or greater than the limit.
(setting a custom order is still not supported though)
I was looking for the same behaviour and thought up this solution. It DOES NOT order by created_at, but I thought I would post it anyway.
max_records_to_retrieve = 50000
last_index = Thing.count
start_index = [(last_index - max_records_to_retrieve), 0].max

Thing.active.find_each(:start => start_index) do |u|
  # do stuff
end
Drawbacks of this approach:
- You need 2 queries (first one should be fast)
- This guarantees a maximum of 50K records, but if ids are skipped you will get fewer.
You can try the ar-as-batches gem.
From its documentation you can do something like this:
Users.where(country_id: 44).order(:joined_at).offset(200).as_batches do |user|
  user.party_all_night!
end
Using Kaminari or something similar, this is easy.
Create a batch loader class.
module BatchLoader
  extend ActiveSupport::Concern

  def batch_by_page(options = {})
    options = init_batch_options!(options)

    next_page = 1

    loop do
      next_page = yield(next_page, options[:batch_size])
      break next_page if next_page.nil?
    end
  end

  private

  def default_batch_options
    {
      batch_size: 50
    }
  end

  def init_batch_options!(options)
    options ||= {}
    default_batch_options.merge!(options)
  end
end
Create a repository.
class ThingRepository
  include BatchLoader

  # @param [Integer] per_page
  # @param [Proc] block
  def batch_changes(per_page = 100, &block)
    relation = Thing.active.order("created_at DESC")

    batch_by_page do |next_page|
      query = relation.page(next_page).per(per_page)
      yield query if block_given?
      query.next_page
    end
  end
end
Use the repository
repo = ThingRepository.new
repo.batch_changes(5000).each do |g|
  g.each do |t|
    # ...
  end
end
Adding find_in_batches_with_order solved my use case, where I already had the ids but needed batching and ordering. It was inspired by @dirk-geurs' solution.
# Create file config/initializers/find_in_batches_with_order.rb with the following code.
ActiveRecord::Batches.class_eval do
  ## Only a flat order structure is supported for now
  ## example: [:forename, :surname] is supported but [:forename, {surname: :asc}] is not supported
  def find_in_batches_with_order(ids: nil, order: [], batch_size: 1000)
    relation = self

    arrangement = order.dup
    index = order.find_index(:id)

    unless index
      arrangement.push(:id)
      index = arrangement.length - 1
    end

    ids ||= relation.order(*arrangement).pluck(*arrangement).map { |tuple| tuple[index] }
    ids.each_slice(batch_size) do |chunk_ids|
      chunk_relation = relation.where(id: chunk_ids).order(*order)
      yield(chunk_relation)
    end
  end
end
Leaving Gist here https://gist.github.com/the-spectator/28b1176f98cc2f66e870755bb2334545
I had the same problem with a query with DISTINCT ON where you need an ORDER BY with that field, so this is my approach with Postgres:
def filtered_model_ids
  Model.joins(:father_model)
       .select('DISTINCT ON (model.field) model.id')
       .order(:field)
       .map(&:id)
end

def processor
  filtered_model_ids.each_slice(BATCH_SIZE).lazy.each do |batch|
    Model.find(batch).each do |record|
      # Code
    end
  end
end
My code
batch_size = 100
total_count = klass.count
offset = 0
processed_count = 0

while processed_count < total_count
  relation = klass.order({ active_at: :asc, created_at: :desc }).offset(offset).limit(batch_size)
  relation.each do |record|
    record.process
  end
  offset += batch_size
  processed_count += batch_size
end
Do it in one query and avoid iterating:
User.offset(2).order('name DESC').last(3)
will produce a query like this (last inverts the declared order, which is why the SQL sorts by name ASC):
SELECT "users".* FROM "users" ORDER BY name ASC LIMIT $1 OFFSET $2 [["LIMIT", 3], ["OFFSET", 2]]