Have more than 400,000 records; repopulating the DB takes 5 hours

Simply running:
ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    ElectricityProfileSummary.create(ep)
  end
end
takes ages (5 hours) to populate the table. Is there a better way to populate the DB?
Let's say I get all the data from the DB, store it in an array, hash, etc., and then push it all at once to create the records:
array_of_electricity_profiles = []
ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    array_of_electricity_profiles.push(ep)
  end
end
ElectricityProfileSummary.mass_create(array_of_electricity_profiles) # => or any other method :)
Sorry, I forgot to mention that I have overridden the create method; it takes multiple models and creates an ElectricityProfileSummary:
create!(:owner_id => electricity_profile.owner_id,
        :owner_type => electricity_profile.owner_type,
        :property_type => electricity_profile.owner.user_property_type,
        :household_size => electricity_profile.owner.user_num_of_people,
        :has_swimming_pool => electricity_profile.has_swimming_pool,
        :bill => electricity_bill,
        :contract => electricity_profile.on_contract,
        :dirty => true,
        :provider => electricity_profile.supplier_id,
        :plan => electricity_profile.plan_id,
        :state => state,
        :postcode => postcode,
        :discount => discount,
        :has_air_conditioner => electricity_profile.has_air_conditioner,
        :has_electric_hot_water => electricity_profile.has_electric_hot_water,
        :has_electric_central_heating => electricity_profile.has_electric_central_heating,
        :has_electric_cooktup => electricity_profile.has_electric_cooktup
)

Doing this in a stored procedure or raw SQL would probably be the best way to go since ActiveRecord can be very expensive when dealing with that many records. However, you can speed it up quite a bit by using includes or joins.
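As a rough illustration of what a raw-SQL bulk load buys you, here is a plain-Ruby sketch (not ActiveRecord) that turns attribute hashes into multi-row INSERT statements, so 400,000 rows become a few hundred statements instead of 400,000. The column names come from the create! call in the question; the table name electricity_profile_summaries, the sample values, the batch size and the tiny sql_literal quoter are assumptions. In a real app you would quote with ActiveRecord::Base.connection.quote and run each statement with ActiveRecord::Base.connection.execute; on Rails 6 or later, insert_all builds such statements for you.

```ruby
# Plain-Ruby sketch (not ActiveRecord): build multi-row INSERT statements,
# batch_size rows per statement, from an array of attribute hashes.
# Assumptions: every hash has the same keys, and sql_literal is a tiny demo
# quoter; a real app should use ActiveRecord::Base.connection.quote.
def sql_literal(value)
  case value
  when nil then "NULL"
  when true then "TRUE"   # boolean literal syntax varies by database
  when false then "FALSE"
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'" # double any single quotes
  end
end

def bulk_insert_statements(table, rows, batch_size: 1000)
  return [] if rows.empty?

  columns = rows.first.keys
  rows.each_slice(batch_size).map do |batch|
    tuples = batch.map do |row|
      "(" + columns.map { |column| sql_literal(row.fetch(column)) }.join(", ") + ")"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
  end
end

rows = [
  { owner_id: 1, owner_type: "UserProfile", state: "NSW", dirty: true },
  { owner_id: 2, owner_type: "UserProfile", state: "VIC", dirty: true },
  { owner_id: 3, owner_type: "UserProfile", state: "QLD", dirty: false }
]

# batch_size: 2 only so the demo produces more than one statement.
statements = bulk_insert_statements("electricity_profile_summaries", rows, batch_size: 2)
puts statements
```

With 400,000 rows and the default batch_size of 1,000 that is 400 statements. How large a statement your database accepts depends on its limits (for example MySQL's max_allowed_packet), so treat the batch size as something to tune.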
It looks like you only want to create ElectricityProfileSummary models. I am a little unsure of how your relationships look, but assuming you have the following:
class ElectricityProfile < ActiveRecord::Base
  belongs_to :owner, polymorphic: true
end
class UserProfile < ActiveRecord::Base
  has_many :electricity_profiles, as: :owner
end
... you should be able to do something like this:
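ElectricityProfile.where(owner_type: 'UserProfile').includes(:owner).find_each do |ep|
  next if ep.owner.nil? # owner row is missing; checks the preloaded owner, no extra query
  ElectricityProfileSummary.create(ep)
end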
Now, I am basing this on the assumption that you are using a polymorphic relationship between ElectricityProfile and UserProfile. If that is not the case, let me know. (I made the assumption because you have owner_id and owner_type, which as a pair make up the two fields necessary for polymorphic relationships.)
Why is includes better? It makes ActiveRecord eager-load the relationship between the two models, so the owners are fetched in bulk instead of with one query per ElectricityProfile. You still issue one INSERT per ElectricityProfileSummary, but what you are doing now is worse than a plain n+1: for every single ElectricityProfile you query UserProfile once for the exists? check, and then again when creating the ElectricityProfileSummary, because the relationship between EP and UP is lazy loaded.
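Be aware that includes does not perform an inner join. By default it preloads the owners with a separate query (one per owner type for a polymorphic association), and a JOIN-based eager load is not possible for a polymorphic association at all. So includes will not drop ElectricityProfiles whose owner is missing or is not a UserProfile: keep a where(owner_type: 'UserProfile') condition and skip records whose ep.owner is nil. The nil check uses the preloaded owner, so it adds no per-row query.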
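To make the query counts concrete, here is a toy model in plain Ruby (Ruby 2.7+ for filter_map). CountingStore is a hypothetical stand-in that counts one "query" per call; it is not ActiveRecord, and the data is made up. The first loop mirrors the question's exists?-then-lazy-load shape, the second mirrors an includes-style preload that filters orphaned profiles in memory.

```ruby
# Toy stand-in for a database that counts round trips. This is NOT ActiveRecord;
# it only models how many queries each loop shape issues.
class CountingStore
  attr_reader :queries

  def initialize(profiles, owners)
    @profiles = profiles # e.g. [{ id: 1, owner_id: 10 }, ...]
    @owners = owners     # e.g. { 10 => { name: "Ann" }, ... }
    @queries = 0
  end

  def all_profiles
    @queries += 1
    @profiles
  end

  def owner_exists?(id) # like UserProfile.exists?(id)
    @queries += 1
    @owners.key?(id)
  end

  def find_owner(id) # like a lazy ep.owner load
    @queries += 1
    @owners[id]
  end

  def find_owners(ids) # like a preload: WHERE id IN (...)
    @queries += 1
    @owners.slice(*ids)
  end
end

# Shape of the original loop: exists? check, then a lazy owner load per row.
def lazy_summaries(store)
  store.all_profiles.filter_map do |p|
    next unless store.owner_exists?(p[:owner_id])
    { profile_id: p[:id], owner_name: store.find_owner(p[:owner_id])[:name] }
  end
end

# Shape of an includes-style loop: load all owners at once, filter in memory.
def preloaded_summaries(store)
  profiles = store.all_profiles
  owners = store.find_owners(profiles.map { |p| p[:owner_id] }.uniq)
  profiles.filter_map do |p|
    owner = owners[p[:owner_id]]
    next unless owner # orphaned profile: its owner row is gone
    { profile_id: p[:id], owner_name: owner[:name] }
  end
end

profiles = [{ id: 1, owner_id: 10 }, { id: 2, owner_id: 11 }, { id: 3, owner_id: 99 }]
owners = { 10 => { name: "Ann" }, 11 => { name: "Bob" } }

lazy = CountingStore.new(profiles, owners)
eager = CountingStore.new(profiles, owners)
lazy_result = lazy_summaries(lazy)
eager_result = preloaded_summaries(eager)
puts lazy.queries  # => 6
puts eager.queries # => 2
```

Both shapes produce the same two summaries, but the lazy count grows as roughly 1 + 2N. With find_each, a real preload runs once per batch (1,000 rows by default), so it stays at about two queries per batch.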

If you can wrap your import loop in one transaction block, it should speed up the import immensely. Read about ActiveRecord transactions in the Rails API documentation for ActiveRecord::Transactions::ClassMethods.
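Why a transaction helps: in many databases each commit has to be written durably, so 400,000 autocommitted INSERTs pay that cost 400,000 times. The plain-Ruby sketch below uses a hypothetical LoggingConnection that only records SQL strings, to count commits for autocommit versus one transaction per batch of 1,000 rows. Batching is a variation on the single transaction suggested above that keeps each transaction shorter; in Rails the real wrapper would be ElectricityProfileSummary.transaction with a block. The summaries table name is made up.

```ruby
# Toy connection that just records the SQL it is given; a stand-in for a
# real database connection, used only to count BEGIN/COMMIT pairs.
class LoggingConnection
  attr_reader :log

  def initialize
    @log = []
  end

  def execute(sql)
    @log << sql
  end

  # Wrap the block in BEGIN/COMMIT; on an error, log ROLLBACK and re-raise.
  def transaction
    execute("BEGIN")
    result = yield
    execute("COMMIT")
    result
  rescue StandardError
    execute("ROLLBACK")
    raise
  end

  def commits
    @log.count("COMMIT")
  end
end

ids = (1..2_500).to_a

# Autocommit: every INSERT is its own transaction, so every row pays for a commit.
autocommit = LoggingConnection.new
ids.each do |id|
  autocommit.transaction { autocommit.execute("INSERT INTO summaries VALUES (#{id})") }
end

# One transaction per batch of 1,000 rows.
batched = LoggingConnection.new
ids.each_slice(1_000) do |batch|
  batched.transaction do
    batch.each { |id| batched.execute("INSERT INTO summaries VALUES (#{id})") }
  end
end

puts autocommit.commits # => 2500
puts batched.commits    # => 3
```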

Related

Rails deleting a returned relation from subtraction of two queries

I have three models
class Account < ActiveRecord::Base
  has_many :suppliers
  has_many :link_requests, :through => :suppliers
end
class Supplier < ActiveRecord::Base
  has_many :link_requests, :dependent => :destroy
end
class LinkRequest < ActiveRecord::Base
  belongs_to :supplier
  belongs_to :account
end
I have it set up so that suppliers can be entered individually or in bulk through Excel.
When a supplier is deleted individually, all the link_requests that belong to it are removed. But we have a method that deletes suppliers in bulk through a SQL query, and it leaves all their link_requests behind. I have now updated that query so it handles this case, but we are still left with a whole bunch of dead data.
This is really the only way I can figure out to find the 'dead data':
LinkRequest.where(account_id: current_account)
will return all the link_requests belonging to the current_account (both those with deleted suppliers and those with existing suppliers) as an ActiveRecord::Relation.
Then I can call current_account.link_requests, which returns all link_requests with existing suppliers, also as an ActiveRecord::Relation.
There is no attribute on a link_request that separates the ones with existing suppliers from the ones with deleted suppliers, so I can't just query for the invalid ones. The only way I know which ones are valid is that current_account gets its link_requests through its existing suppliers.
So my initial thought was to subtract the valid ones from the full set of link_requests, which would leave me with all the 'invalid requests', and then call destroy_all on them.
Only, when I subtract the two relations, I get an Array:
LinkRequest.where(account_id: current_account).count
=> 10
current_account.link_requests.count
=> 4
(LinkRequest.where(account_id: current_account) - current_account.link_requests).count
=> 6 # these are all the invalid ones.
dead_links = (LinkRequest.where(account_id: current_account) - current_account.link_requests)
dead_links.class
=> Array
There is no way I can then call destroy_all on an array, since it has no relation to the suppliers DB anymore. I get: #<NoMethodError: undefined method 'destroy_all' for #<Array:0x007fdb59a67728>>
Comparing every link_request's supplier_id with the existing suppliers and deleting the ones that don't match would be a very expensive operation, since the DB has hundreds of millions of rows of both types. So I was thinking of creating a before filter on the LinkRequestsController that deletes them for the current account when the index action runs:
private
def clean_up_boogle_link_requests
  #dead_requests = LinkRequest.where(account_id: current_account) - current_account.manual_boogle_link_requests
  #dead_requests.delete_all if !dead_requests.empty?
end
Something like this. Any other ideas would be much appreciated; I'm hoping to find a way to get a relation object back after comparing the two queries and then act on that.
Thanks for all the help!
I think this will do the trick:
dead_link_ids = LinkRequest.where(account_id: current_account).pluck(:id) -
                current_account.link_requests.pluck(:id)
LinkRequest.where(id: dead_link_ids).delete_all
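The id-difference approach can be checked in plain Ruby. In the sketch below the two arrays stand in for the two pluck(:id) results (the numbers are made up), and the dead ids are deleted in slices so that no single IN (...) list gets huge; in Rails each slice would be LinkRequest.where(id: slice).delete_all. With hundreds of millions of rows, plucking ids into Ruby may itself be slow; a single DELETE with a NOT IN (SELECT id FROM suppliers) condition on supplier_id avoids that, though whether it is faster depends on your indexes.

```ruby
# Stand-ins for the two pluck(:id) results (made-up ids).
all_ids  = (1..10).to_a # LinkRequest.where(account_id: ...).pluck(:id)
live_ids = [2, 4, 6, 8] # current_account.link_requests.pluck(:id)

# Array#- keeps the order of all_ids and drops every id present in live_ids.
dead_ids = all_ids - live_ids

# Delete in slices so no single IN (...) list gets huge. A slice size of 3
# keeps the demo small; something like 1,000 is more typical.
delete_statements = dead_ids.each_slice(3).map do |slice|
  "DELETE FROM link_requests WHERE id IN (#{slice.join(', ')})"
end

puts delete_statements
```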

BEGINNER: Correct seeds.rb in Rails 3

I've just created two models and one "join table": Person, Address (create_addresses_persons)
class Person < ActiveRecord::Base
  has_and_belongs_to_many :streets
end
class Street < ActiveRecord::Base
  has_and_belongs_to_many :persons
end
Now I want to add some data to these models in the db/seeds.rb file. The tutorial I follow just adds the objects:
person = Person.create :name => 'Dexter'
street.create[{:streetname => 'street1'},
              {:streetname => 'street2'},
              {:streetname => 'julianave'},
              {:streetname => 'street3'}]
Question 1: Why is persons' data added differently than streets'? Is it just the tutorial that wants to show that there are many ways of adding data in the seeds.rb?
Question 2: The tutorial doesn't make the connections/joins in seeds.rb. It does that in the Rails console:
>>p1 = Person.find(1)
>>s1 = Street.find(1)
>>p1.streets << s1
Can't these connections be made in the seeds.rb file?
Question 3: Would it be better to do this join with a "rich many-to-many association"?
Thanks for your time and patience with a beginner ;)
1) The first method creates one object; the second creates multiple objects. However, for the second method you would need to call Street.create([...]) (the class name, with the array passed as an argument), not street.create[...].
2) Yes, you can do that in the seed file the same way.
3) The "rich many-to-many" association you're talking about is, I guess, an association with a join model. This is opposed to just a join table, which is what has_and_belongs_to_many uses. To use a join model, look up has_many :through. It's generally considered better to always use a proper join model, although I still use HABTM when I just need a quick, simple association. has_many :through allows more options and more flexibility, but it is a little more complicated to set up (not by much, though). It's your decision.
One way I like to create seed data for many-to-many associations is to set up one of the models, then add a tap block that sets up the other models through the association:
Person.create!(:name => "Fubar").tap do |person|
  3.times do |n|
    person.streets.create!(:streetname => "street #{n}")
  end
  # OR
  person.streets.create!([
    {:streetname => "street 1"},
    {:streetname => "street 2"}
    # ... and so on
  ])
end
All tap does is execute the block with the object as its only parameter and then return the object itself, whatever the block returns. I find it convenient for seeds.
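A quick plain-Ruby check of what tap does, using a Struct as a stand-in for the Person model (an assumption; in seeds.rb you would use the real model and person.streets.create!):

```ruby
# Struct stand-in for the Person model; streets is just an Array here.
Person = Struct.new(:name, :streets)

person = Person.new("Fubar", []).tap do |record|
  3.times { |n| record.streets << "street #{n}" }
end

p person.name    # => "Fubar"
p person.streets # => ["street 0", "street 1", "street 2"]

# tap ignores the block's return value and hands back the receiver.
p 5.tap { |n| n + 1 } # => 5
```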
One other tip I would toss out there: separate the words in your model attribute names with underscores.
:street_name instead of :streetname
The difference becomes more noticeable when you start using some of the ActiveSupport helpers that take model attribute names and turn them into text strings for use in the UI.
For example:
:streetname.to_s.titleize # "Streetname"
:street_name.to_s.titleize # "Street Name"
And one last nitpick: you might want your join table to be addresses_people rather than addresses_persons, since the Rails inflector pluralizes person as people. The same goes for the controller for your Person model: PeopleController instead of PersonsController. Though it may work with persons as well.
:person.to_s.pluralize # "people"
:people.to_s.singularize # "person"
:persons.to_s.singularize # "person"

Should I drop a polymorphic association?

My code is still just in development, not production, and I'm hitting a wall with generating data that I want for some views.
Without burying you guys in details, I basically want to navigate through multiple model associations to get some information at each level. The one association giving me problems is a polymorphic belongs_to. Here are the most relevant associations
Model Post
  belongs_to :subtopic
  has_many :flags, :as => :flaggable
Model Subtopic
  has_many :flags, :as => :flaggable
Model Flag
  belongs_to :flaggable, :polymorphic => true
I'd like to display multiple flags in a Flags#index view. There's information from other models that I want to display, as well, but I'm leaving out the specifics here to keep this simpler.
In my FlagsController#index action, I'm currently using @flags = paginate_by_sql to pull everything I want from the database. I can get the data successfully, but I can't get the associated model objects eager-loaded (even though all the data I want is in memory). I'm looking at a few options now:
1) Rewrite my views to work on the SQL data in the @flags object. This should work and will prevent the 5-6 association-model SQL queries per row on the index page, but it will look very hackish. I'd like to avoid this if possible.
2) Simplify my views and create additional pages for the more detailed information, to be loaded only when viewing one individual flag.
3) Change the model hierarchy/definitions away from polymorphic associations to inheritance. Effectively, make a module or class FlaggableObject that would be the parent of both Subtopic and Post.
I'm leaning towards the third option, but I'm not certain that I'll be able to cleanly pull all the information I want using Rails' ActiveRecord helpers only.
I would like insight on whether this would work and, more importantly, whether you have a better solution.
EDIT: Some nit-picky include behavior I've encountered
@flags = Flag.find(:all,:conditions=> "flaggable_type = 'Post'", :include => [{:flaggable=>[:user,{:subtopic=>:category}]},:user]).paginate(:page => 1)
=> (valid response)
@flags = Flag.find(:all,:conditions=> ["flaggable_type = 'Post' AND
post.subtopic.category_id IN ?", [2,3,4,5]], :include => [{:flaggable=>
[:user, {:subtopic=>:category}]},:user]).paginate(:page => 1)
=> ActiveRecord::EagerLoadPolymorphicError: Can not eagerly load the polymorphic association :flaggable
Don't drop the polymorphic association. Use includes(:association_name) to eager-load the associated objects. paginate_by_sql won't work, but paginate will.
@flags = Flag.includes(:flaggable).paginate(:page => 1)
It will do exactly what you want, using one query per table.
See A Guide to Active Record Associations. You may see older examples using the :include option, but the includes method is the new interface in Rails 3.0 and 3.1.
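Roughly how a preload can handle a polymorphic belongs_to: group the flags by flaggable_type and run one lookup per type. This is a toy model in plain Ruby with made-up hash "tables", not Rails' own preloader code, but it shows why includes(:flaggable) costs one query for the flags plus one per flaggable type rather than one per flag.

```ruby
# Struct stand-in for the Flag model's polymorphic columns.
Flag = Struct.new(:id, :flaggable_type, :flaggable_id)

# Made-up "tables", keyed by the type name stored in flaggable_type.
TABLES = {
  "Post"     => { 1 => "post 1", 2 => "post 2" },
  "Subtopic" => { 7 => "subtopic 7" }
}

# Returns the loaded rows keyed by [type, id], plus how many lookups ran.
def preload_flaggables(flags, tables)
  queries = 0
  loaded = {}
  flags.group_by(&:flaggable_type).each do |type, group|
    queries += 1 # stands in for SELECT * FROM <type's table> WHERE id IN (...)
    ids = group.map(&:flaggable_id).uniq
    tables.fetch(type).slice(*ids).each { |id, row| loaded[[type, id]] = row }
  end
  [loaded, queries]
end

flags = [
  Flag.new(1, "Post", 1),
  Flag.new(2, "Post", 2),
  Flag.new(3, "Subtopic", 7),
  Flag.new(4, "Post", 1)
]
loaded, queries = preload_flaggables(flags, TABLES)
flags.each { |f| puts "#{f.id}: #{loaded[[f.flaggable_type, f.flaggable_id]]}" }
```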
Update from original poster:
If you're getting this error: Can not eagerly load the polymorphic association :flaggable, try something like the following:
Flag.where("flaggable_type = 'Post'").includes([{:flaggable=>[:user, {:subtopic=>:category}]}, :user]).paginate(:page => 1)
See comments for more details.
Issues: Count over a polymorphic association.
@flags = Flag.find(:all,:conditions => ["flaggable_type = 'Post' AND post.subtopic.category_id IN ?",
[2,3,4,5]], :include => [{:flaggable => [:user, {:subtopic=>:category}]},:user])
.paginate(:page => 1)
Try the following, which passes :total_entries explicitly, computed with a separate Flag.count:
@flags = Flag.find(:all,:conditions => ["flaggable_type = 'Post' AND post.subtopic.category_id IN (?)",
[2,3,4,5]], :include => [{:flaggable => [:user, {:subtopic=>:category}]},:user])
.paginate(:page => 1, :total_entries => Flag.count(:conditions =>
["flaggable_type = 'Post' AND post.subtopic.category_id IN (?)", [2,3,4,5]]))

Looping through a Rails data object from the controller

I have called a function from a class to find all the items related to a particular ID in a many to many HABTM relationship.
Procedures -> Tasks with a join table: procedures_tasks
I call the information like @example = Procedure.get_tasks(1,1)
I would like to be able to iterate through the data returned so that I can create an instance of each task_id related to the procedure in question
def self.get_tasks(model_id, operating_system_id)
  find(:first, :select => 'tasks.id, procedures.id', :conditions => ["model_id = ? AND operating_system_id = ?", model_id, operating_system_id], :include => [:tasks])
end
I tried rendering the data as I normally would and then using .each do |f| in the view layer, but I get:
undefined method `each' for #<Procedure:0x2b879be1db30>
Original Question:
I am creating a Rails application to track processes we perform. When a new instance of a process is created, I want to automatically create rows for all the tasks that will need to be performed.
tables:
decommissions
models
operating_systems
procedures
tasks
procedures_tasks
host_tasks
procedures -> tasks is many-to-many through the procedures_tasks join table.
When you start a new decommissioning process, you specify a model and OS; the model and OS determine which procedure you follow, and each procedure has a list of tasks in the join table. I want to create an entry in host_tasks for each task in the procedure that applies to the decommission being created.
I've done my head in over this for days, any suggestions?
class Procedure < ActiveRecord::Base
  has_and_belongs_to_many :tasks
  #has_many :tasks, :through => :procedures_tasks
  # has_many :procedures_tasks
  belongs_to :model
  belongs_to :operating_system
  validates_presence_of :name
  validates_presence_of :operating_system_id
  validates_presence_of :model_id
  def self.get_tasks(model_id, operating_system_id)
    find(:first, :select => 'tasks.id, procedures.id', :conditions => ["model_id = ? AND operating_system_id = ?", model_id, operating_system_id], :include => [:tasks])
  end
end
The get_tasks method retrieves the tasks associated with the procedure, but I don't know how to manipulate the data pulled from the database in Rails. I haven't been able to access the attributes of the returned object through the controller; is that because they haven't been rendered yet?
Ideally I would like to format this data so that I only have an array of the task_ids, which I can then loop through, creating new rows in the appropriate table.
It wasn't looping because I was using the :first option when finding the data, which returns a single Procedure object rather than a collection. I changed it to :all, which returns an array, and that allowed me to use .each do |f| etc.
Not the best solution, but there will only ever be one matching procedure anyway, so it won't cause a problem.
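An alternative that keeps find(:first): iterate the returned record's tasks association instead of the record itself, which also gives the array of task ids the question asks for. The sketch uses Structs as stand-ins for the models; the host_tasks columns (decommission_id, task_id), the task names and the id 42 are hypothetical, and with ActiveRecord you would create one row per hash through a (hypothetical) HostTask model.

```ruby
# Struct stand-ins for the models; with ActiveRecord, procedure.tasks would be
# the HABTM association collection.
Task      = Struct.new(:id, :name)
Procedure = Struct.new(:id, :tasks)

procedure = Procedure.new(1, [Task.new(3, "wipe disks"), Task.new(5, "remove from DNS")])

# find(:first) hands back ONE record, so iterate its association, not the record.
task_ids = procedure.tasks.map(&:id)

decommission_id = 42 # hypothetical id of the decommission being created
host_tasks = task_ids.map { |task_id| { decommission_id: decommission_id, task_id: task_id } }

p task_ids   # => [3, 5]
p host_tasks.size # => 2
```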

My Rails queries are starting to get complicated, should I switch to raw SQL queries? What do you do?

My Rails app is starting to need complicated queries. Should I just start using raw SQL queries? What is the trend in the Rails community?
Update:
I haven't written any queries yet; I wanted to ask this question before I start. But here is an example of what I want to do:
I have books which have categories. I want to say:
Give me all books that were:
-created_at (added to store) between date1 and date2
-updated_at before date3
-joined with books that exist in shopping carts right now
I haven't written the query yet but I think the rails version will be something like this:
books_to_consider = Book.find(:all,
  :conditions => "created_at <= '#{date2}' AND created_at >= '#{date1}' AND updated_at <= '#{date3}'",
  :joins => "as b inner join carts as c on c.book_id = b.id")
I am not saying ActiveRecord can't handle this query, but is it more accepted to go with raw SQL for readability (or maybe there are other limitations I don't know of yet)?
The general idea is to stick to ActiveRecord-generated queries as much as possible, and use SQL fragments only where necessary. SQL fragments are explicitly supported because the creators of ActiveRecord realised that SQL cannot be completely abstracted away.
Using the find method without SQL fragments is generally rewarded with better maintainability. Given your example, try:
Book.find(:all,
  :conditions => ["created_at >= ? AND created_at <= ? AND updated_at <= ?",
                  date1, date2, date3],
  :include => :carts)
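The :include => :carts option loads the associated carts, provided you have added has_many :carts to your Book model. As you can see, there does not have to be much SQL involved. Even the quoting and escaping of input can be left to Rails, while still using SQL literals to handle the >= and <= operators. Note that :include does not guarantee an INNER JOIN: depending on the Rails version it preloads the carts with a separate query or uses a LEFT OUTER JOIN, so books that are in no cart are still returned. If you only want books that are currently in a cart, use :joins => :carts, which performs an INNER JOIN, and qualify the columns (books.created_at, books.updated_at) if carts has timestamp columns of its own.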
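To see why the array form of :conditions is preferable to interpolating dates into the string, here is a toy version of "?" placeholder binding in plain Ruby. The quote and bind helpers are illustrative only and much simpler than what Rails does (for example, bind would mis-handle a literal ? inside a quoted string), so do not use them in place of Rails' own quoting. The dates are made up.

```ruby
require "date"

# Toy quoting: NOT Rails' implementation, just enough to show the idea.
def quote(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  when Date then "'#{value.iso8601}'"
  else "'#{value.to_s.gsub("'", "''")}'" # double single quotes inside strings
  end
end

# Replace each "?" with the next value, quoted; insist the counts match.
def bind(sql, *values)
  expected = sql.count("?")
  unless expected == values.size
    raise ArgumentError, "expected #{expected} values, got #{values.size}"
  end
  queue = values.dup
  sql.gsub("?") { quote(queue.shift) }
end

sql = bind("created_at >= ? AND created_at <= ? AND updated_at <= ?",
           Date.new(2009, 1, 1), Date.new(2009, 6, 30), Date.new(2009, 7, 15))
puts sql
# => created_at >= '2009-01-01' AND created_at <= '2009-06-30' AND updated_at <= '2009-07-15'

# A hostile value stays inside one string literal instead of changing the query.
evil = bind("title = ?", "x' OR '1'='1")
puts evil
# => title = 'x'' OR ''1''=''1'
```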
Going a little bit further, you can make it even clearer:
class Book < ActiveRecord::Base
  # Somewhere in your Book model:
  named_scope :created_between, lambda { |start_date, end_date|
    { :conditions => { :created_at => start_date..end_date } }
  }
  named_scope :updated_before, lambda { |date|
    { :conditions => ["updated_at <= ?", date] }
  }
  # ...
end
Book.created_between(date1, date2).updated_before(date3).find(:all,
  :include => :carts)
Update: the point of the named_scopes is, of course, to reuse the conditions. It's up to you to decide whether or not it makes sense to put a set of conditions in a named scope or not.
As molf says about :include, .find() has the advantage of eager loading children.
Also, there are several plugins, like pagination, that will wrap the find function. You'll have to use .find() to use the plugins.
If you have a really complex SQL query, remember that .find() uses your exact parameter string. You can always inject your own SQL code:
:conditions => ["id in union (select * from table...
And don't forget there are a lot of optional parameters for .find()
:conditions - An SQL fragment like "administrator = 1", [ "user_name = ?", username ], or ["user_name = :user_name", { :user_name => user_name }]. See conditions in the intro.
:order - An SQL fragment like "created_at DESC, name".
:group - An attribute name by which the result should be grouped. Uses the GROUP BY SQL-clause.
:having - Combined with :group this can be used to filter the records that a GROUP BY returns. Uses the HAVING SQL-clause.
:limit - An integer determining the limit on the number of rows that should be returned.
:offset - An integer determining the offset from where the rows should be fetched. So at 5, it would skip rows 0 through 4.
:joins - Either an SQL fragment for additional joins like "LEFT JOIN comments ON comments.post_id = id" (rarely needed), named associations in the same form used for the :include option, which will perform an INNER JOIN on the associated table(s), or an array containing a mixture of both strings and named associations. If the value is a string, then the records will be returned read-only since they will have attributes that do not correspond to the table's columns. Pass :readonly => false to override.
:include - Names associations that should be loaded alongside. The symbols named refer to already defined associations. See eager loading under Associations.
:select - By default, this is "*" as in "SELECT * FROM", but can be changed if you, for example, want to do a join but not include the joined columns. Takes a string with the SELECT SQL fragment (e.g. "id, name").
:from - By default, this is the table name of the class, but can be changed to an alternate table name (or even the name of a database view).
:readonly - Mark the returned records read-only so they cannot be saved or updated.
:lock - An SQL fragment like "FOR UPDATE" or "LOCK IN SHARE MODE". :lock => true gives the connection's default exclusive lock, usually "FOR UPDATE".
src: http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002553