Updating attributes on several indexes in an array, using Mongoid in Rails 3

In my test database, I would like to trigger a "new_item" flag for testing. The method already works in my tests. So now I am setting the created_at and published_at fields of all records to 1 month ago, and then I want to set a select few in an array to be published yesterday.
I use the following code to update all records and then a single one (in Rails 3.1.1):
yesterday = Time.now - 1.day
last_month = Time.now - 1.month
Item.update_all(:created_at => last_month, :published_at => last_month)
Item.visible[1].update_attributes(:created_at => yesterday, :published_at => yesterday)
This works. However, how can I select multiple records in that array instead of just index [1], e.g. [1, 4, 5, 8, 10]?
I believe update_attributes doesn't work on multiple records, and I'm not sure how to select multiple indexes in an existing array.
I hope this makes sense...
Thanks in advance,
Adam.

update_attributes is an instance method, i.e. it works on a single instance of the model. update_all is a criteria method (the model class delegates it to a criteria), so it can be chained onto Mongoid criteria. If you have a condition that selects the documents you want, you can do:
Item.visible.where(:some_other_field => "some_other_value").
update_all(:created_at => yesterday, :published_at => yesterday)
If you have no such criteria and are hand-picking the items instead, you can do:
# instead of `model_array.update_attributes`
ids = model_array.map(&:id)
Item.all.for_ids(ids).update_all(:created_at => yesterday, :published_at => yesterday)
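To build model_array from hand-picked positions in the first place, plain Ruby's Array#values_at returns the elements at several indexes in one call. The sketch below uses a stand-in Struct instead of real Mongoid documents (an assumption, so it runs without Rails); in the app you would call values_at on something like Item.visible.to_a and pass the resulting ids to the for_ids criteria shown above.

```ruby
# Stand-in for Mongoid documents: only an id is needed for this sketch.
Item = Struct.new(:id, :title)

# Hypothetical data standing in for Item.visible.to_a
items = (0..11).map { |i| Item.new("id#{i}", "Item #{i}") }

# Pick several positions at once instead of a single [1].
# values_at yields nil for an index past the end, so compact drops those.
chosen = items.values_at(1, 4, 5, 8, 10).compact

# These ids are what you would hand to Item.all.for_ids(ids).update_all(...)
ids = chosen.map(&:id)
puts ids.inspect
```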


More than 400,000 records; repopulating the DB takes 5 hours

Simply running
ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    ElectricityProfileSummary.create(ep)
  end
end
This takes ages (5 hours) to populate the table. Is there a better way to populate the DB?
Let's say I get all the data from the DB, store it in an array, hash, etc., and then push it all in at once to create the records:
array_of_electricity_profiles = []
ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    array_of_electricity_profiles.push(ep)
  end
end
ElectricityProfileSummary.mass_create(array_of_electricity_profiles) # => or any other method :)
Sorry, I forgot to mention that I have overridden the create method; it takes multiple models and creates the ElectricityProfileSummary like this:
create!(:owner_id => electricity_profile.owner_id,
:owner_type => electricity_profile.owner_type,
:property_type => electricity_profile.owner.user_property_type,
:household_size => electricity_profile.owner.user_num_of_people,
:has_swimming_pool => electricity_profile.has_swimming_pool,
:bill => electricity_bill,
:contract => electricity_profile.on_contract,
:dirty => true,
:provider => electricity_profile.supplier_id,
:plan => electricity_profile.plan_id,
:state => state,
:postcode => postcode,
:discount => discount,
:has_air_conditioner => electricity_profile.has_air_conditioner,
:has_electric_hot_water => electricity_profile.has_electric_hot_water,
:has_electric_central_heating => electricity_profile.has_electric_central_heating,
:has_electric_cooktup => electricity_profile.has_electric_cooktup
)
Doing this in a stored procedure or raw SQL would probably be the best way to go since ActiveRecord can be very expensive when dealing with that many records. However, you can speed it up quite a bit by using includes or joins.
It looks like you only want to create ElectricityProfileSummary models. I am a little unsure of how your relationships look, but assuming you have the following:
class ElectricityProfile < ActiveRecord::Base
  belongs_to :owner, polymorphic: true
end
class UserProfile < ActiveRecord::Base
  has_many :electricity_profiles, as: :owner
end
... you should be able to do something like this:
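# find_each loads the rows in batches; includes preloads the owners for each batch
ElectricityProfile.includes(:owner).find_each do |ep|
  next unless ep.owner_type == 'UserProfile' && ep.owner
  ElectricityProfileSummary.create(ep)
end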
Now, I am basing this on the assumption that you are using a polymorphic relationship between ElectricityProfile and UserProfile. If that is not the case, let me know. (I made the assumption because you have owner_id and owner_type, which as a pair make up the two fields necessary for polymorphic relationships.)
Why is using includes better? includes makes ActiveRecord eager-load the relationship between the two models, so you avoid the N+1 queries you are doing now. Strictly speaking, because you create one record per ElectricityProfile you still issue N+1 queries, but what you are doing now is more expensive than that: you query UserProfile for every single ElectricityProfile, and then query UserProfile again when creating the ElectricityProfileSummary, because the relationship between EP and UP is lazy-loaded.
Note, though, that includes on its own does not perform an inner join: Rails preloads the owners with a separate query (or uses a LEFT OUTER JOIN when your conditions reference the included table), so it does not drop profiles whose owner is missing, and the existence check is still needed. To filter in the database instead, use an explicit INNER JOIN; because owner is polymorphic, Rails cannot join it by association name, so write the join as an SQL fragment and add a condition on owner_type.
If you can wrap your import loop in a single transaction block (see ActiveRecord::Base.transaction), the import should speed up considerably, since the database no longer commits after every single insert.
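The main cost in the original loop is one UserProfile.exists? query per ElectricityProfile. A minimal plain-Ruby sketch of the alternative: load the existing UserProfile ids once into a Set, filter in memory, and hand the survivors over in fixed-size batches, each of which you could wrap in one transaction. Profile, the sample data and the batch size of 2 are illustrative assumptions; in the app the ids might come from something like UserProfile.pluck(:id) (Rails 3.2 and later), and each batch would go to your own create method.

```ruby
require "set"

# Stand-in for an ElectricityProfile row (hypothetical; only the fields used here)
Profile = Struct.new(:id, :owner_id, :owner_type)

profiles = [
  Profile.new(1, 10, "UserProfile"),
  Profile.new(2, 11, "UserProfile"),  # owner 11 does not exist
  Profile.new(3, 10, "Company"),      # wrong owner type
  Profile.new(4, 12, "UserProfile"),
  Profile.new(5, 13, "UserProfile")
]

# One lookup up front instead of one exists? call per profile
existing_user_ids = Set.new([10, 12, 13])

# Set#include? is an in-memory hash lookup, so no per-row query is needed
to_summarise = profiles.select do |ep|
  ep.owner_type == "UserProfile" && existing_user_ids.include?(ep.owner_id)
end

# Group into batches; each batch could be created inside one transaction
batches = to_summarise.each_slice(2).map { |batch| batch.map(&:id) }
p batches
```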

Rails Order Find By Boolean

I have a simple find in Rails 3 that gathers a user's accounts.
Account.where(:user_id => @user)
The Account model has a 'default' boolean field. Since a user can add many accounts, I would like the default account to always be first in the loop. Order doesn't seem to work with a boolean field.
Account.where(:user_id => @user, :order => "default DESC")
Is there a way to order the query to handle this or should I just split the queries and find the default account in a separate find?
Try Account.where(:user_id => @user).order("default DESC"): putting :order in your where() clause isn't going to sort the result set. Note also that default is a reserved word in SQL, so in a raw order string you may need to quote the column name (backticks on MySQL, double quotes on PostgreSQL); the Arel version below quotes it for you.
A cleaner solution might be to add a scope, though.
scope :default_first, order(arel_table[:default].desc)
Then you could just call (assuming your relations are set up properly):
@user.accounts.default_first.all
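If the accounts are already loaded (or the list is small), you don't strictly need a second query either: the list can be reordered in Ruby. A minimal sketch with a stand-in Struct (an assumption; real records would come from @user.accounts). Enumerable#partition keeps the original relative order inside each group, whereas sort_by is not guaranteed to be stable, which is why partition is used here.

```ruby
# Stand-in for Account records; only the fields needed here
Account = Struct.new(:name, :default)

accounts = [
  Account.new("Savings", false),
  Account.new("Everyday", true),
  Account.new("Travel", false)
]

# partition returns [matching, non_matching], each in original order
defaults, others = accounts.partition(&:default)
ordered = defaults + others

puts ordered.map(&:name).join(", ")
```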

Problem with ActiveRecord and SQL

I have a little problem: I can't compose an SQL query inside AR.
I have Project and Task models; Project has_many Tasks. Task has an AASM field (e.g. "status"; but that doesn't matter, it could be a simple int or string field).
On my projects index page I want to list all (latest) projects, and for every project I want to count its active, pending and resolved (for example) tasks.
Like this:
First project (1 active, 2 pending, 10 resolved)
Second project (4 active, 2 pending, 2 resolved)
Of course I can do it with @projects = Project.all and then, in the view:
- @projects.each do |project|
  = project.title
  = project.tasks(:conditions => {:status => "active"}).count #sure it should be in model, just for example
  = project.tasks(:conditions => {:status => "pending"}).count
  -# ...
This works, but it makes 1 + 3N queries (for 3 task statuses); I want 1. The question is simple: how?
You could do a find with grouping and counting. Something like:
status_counts = project.tasks.find(:all,
:group => 'status',
:select => 'status, count(*) as how_many')
This will return you a list of Task-like objects with status and how_many attributes which you can then use to give your summary. E.g.
<%= status_counts.map { |sc| "#{sc.how_many} #{sc.status}" }.to_sentence %>
Maybe you could, in your Project controller:
Fetch all your projects: @projects = Project.all
Fetch all your tasks: @tasks = Task.all
Then create a hash with something like:
@statuses = Hash.new { |hash, project_id| hash[project_id] = Hash.new(0) }
@tasks.each do |t|
  @statuses[t.project_id][t.status.to_sym] += 1
end
And then use it in your view:
First project (<%= @statuses[project.id][:active] %> active)
This is not the perfect solution, but it is easy to implement and uses only two (big) queries. Of course, this rebuilds the hash on every request, so you might want to look into database indexes or caching.
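A bare Hash.new would not work for this two-level count: looking up a project id that isn't there yet returns nil, so the nested += 1 raises NoMethodError. A minimal runnable sketch, using a stand-in Task Struct and sample data in place of Task.all (assumptions): the outer hash's default block stores a fresh Hash.new(0) the first time a project id is seen, and the inner default of 0 lets += 1 work for a status seen for the first time.

```ruby
# Stand-in for Task records (hypothetical; in the app this is Task.all)
Task = Struct.new(:project_id, :status)

tasks = [
  Task.new(1, "active"), Task.new(1, "pending"), Task.new(1, "pending"),
  Task.new(2, "resolved"), Task.new(2, "active")
]

# Outer default block: first access to a project id stores a new counter hash.
# Inner Hash.new(0): a status not seen yet reads as 0.
statuses = Hash.new { |hash, project_id| hash[project_id] = Hash.new(0) }

tasks.each do |t|
  statuses[t.project_id][t.status.to_sym] += 1
end

puts statuses[1][:pending]  # 2
puts statuses[2][:pending]  # 0 (status never seen for this project)
```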
Also, named scopes would be interesting, like Task.active.
I'd suggest using a counter cache in your project model to prevent needing to recount all tasks on each display of the index page - have an active_count, pending_count and resolved_count, and update them whenever the task changes state.
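The counter-cache idea boils down to adjusting two counters on every state change: decrement the old status and increment the new one. A plain-Ruby sketch of that bookkeeping (ProjectCounters, task_added and transition are hypothetical names); in Rails you would do the same from a Task callback, for example via Project.update_counters.

```ruby
# Hypothetical in-memory stand-in for the active_count / pending_count /
# resolved_count columns on a project row.
class ProjectCounters
  attr_reader :counts

  def initialize
    @counts = { active: 0, pending: 0, resolved: 0 }
  end

  # Called when a task is created in its initial status
  def task_added(status)
    @counts[status] += 1
  end

  # Called when a task moves from one status to another
  def transition(from, to)
    return if from == to
    @counts[from] -= 1
    @counts[to] += 1
  end
end

counters = ProjectCounters.new
counters.task_added(:pending)
counters.task_added(:pending)
counters.transition(:pending, :active)
p counters.counts
```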
If you just want to modify your existing code, try:
project.tasks.count(:conditions => "status = 'active'")
You could also add a scope to your task model that would enable you to do something like:
project.tasks.active.count
EDIT
Ok so I'm half asleep - got the wrong impression from your question :/
Yep, you can do it in one query: use a find with :joins, :select and :group (or find_by_sql) to get your projects along with the grouped counts for the tasks. You'll be able to access the group counts as attributes on the resulting projects.
So, the right answer is:
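Project.all(:joins => :tasks,
  :select => "projects.*,
    SUM(CASE WHEN tasks.status = 'active' THEN 1 ELSE 0 END) AS active_count,
    SUM(CASE WHEN tasks.status = 'pending' THEN 1 ELSE 0 END) AS pending_count,
    SUM(CASE WHEN tasks.status = 'resolved' THEN 1 ELSE 0 END) AS resolved_count",
  :group => 'projects.id')
The CASE form works across databases (MySQL also accepts the shorter SUM(tasks.status = 'pending')). Note that :joins => :tasks is an INNER JOIN, so projects without any tasks are left out.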

In Rails, how can I return a set of records based on a count of items in a relation OR criteria about the relation?

I'm writing a Rails app in which I have two models: Machine and MachineUpdate. Machine has many MachineUpdates, and MachineUpdate has a date/time field. I'm trying to retrieve all Machine records that meet the following criteria:
1. The Machine has not had a MachineUpdate within the last 2 weeks, OR
2. The Machine has never had any updates.
Currently, I'm accomplishing #1 with a named scope:
named_scope :needs_updates,
:include => :machine_updates,
:conditions => ['machine_updates.date < ?', UPDATE_THRESHOLD.days.ago]
However, this only returns machines that have had at least one update. I also want to retrieve machines that have had no updates. How can I modify needs_updates so that the records it returns meet that criterion as well?
One solution is to introduce a counter cache:
# add a machine_updates_count integer column to machines (with default 0)
# and declare the counter cache in your MachineUpdate model:
belongs_to :machine, :counter_cache => true
and then add OR machine_updates_count = 0 to your SQL conditions.
However, you can also solve the problem without a counter cache by using a LEFT JOIN:
named_scope :needs_updates, lambda { {
  :select => "machines.*, MAX(machine_updates.date) AS last_update",
  :joins => "LEFT JOIN machine_updates ON machine_updates.machine_id = machines.id",
  :group => "machines.id",
  :having => ["last_update IS NULL OR last_update < ?", UPDATE_THRESHOLD.days.ago]
} }
The LEFT JOIN keeps machines that have no updates at all (their last_update is NULL), and grouping by machine with MAX(date) makes sure you compare against the most recent MachineUpdate.
Note also that you have to put your condition in a lambda so it is evaluated every time the query is run. Otherwise it will be evaluated only once (when your model is loaded on application boot-up), and you will not be able to find Machines that have come to need updates since your app started.
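The difference between a value computed once at load time and one computed inside a lambda can be seen in plain Ruby. In this sketch the constant names are hypothetical and a 14-day threshold stands in for UPDATE_THRESHOLD: the frozen cutoff stays at the moment the file was loaded, while the lambda recomputes it on every call.

```ruby
THRESHOLD_SECONDS = 14 * 24 * 60 * 60  # 14 days, standing in for UPDATE_THRESHOLD

# Evaluated once, like a scope option written outside a lambda at class load
FROZEN_CUTOFF = Time.now - THRESHOLD_SECONDS

# Evaluated on every call, like a lambda-wrapped named_scope
current_cutoff = lambda { Time.now - THRESHOLD_SECONDS }

sleep 0.05
later_cutoff = current_cutoff.call

puts later_cutoff > FROZEN_CUTOFF  # true: the lambda sees the current time
```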
UPDATE:
This solution works in MySQL and SQLite, but not in PostgreSQL. Postgres does not allow selecting columns that are not listed in the GROUP BY clause. I'm very unfamiliar with PostgreSQL, but I did get this to work as expected:
named_scope :needs_updates, lambda{
cols = Machine.column_names.collect{ |c| "\"machines\".\"#{c}\"" }.join(",")
{
:select => cols,
:group => cols,
:joins => 'LEFT JOIN "machine_updates" ON "machine_updates"."machine_id" = "machines"."id"',
:having => ['MAX("machine_updates"."date") IS NULL OR MAX("machine_updates"."date") < ?', UPDATE_THRESHOLD.days.ago]
}
}
If you can make changes to the table, you can use the :touch option of the belongs_to association.
For instance, add a datetime column named last_machine_update to Machine. Then, in MachineUpdate, declare belongs_to :machine, :touch => :last_machine_update. This sets that column (in addition to updated_at) whenever you add or modify a MachineUpdate connected to that Machine, so the scope reduces to a simple condition such as last_machine_update IS NULL OR last_machine_update < ?.
Otherwise I would probably do it like Alex proposes.
I just ran into a similar problem. It's actually pretty simple:
Machine.all(
  :include => :machine_updates,
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago])
If you are defining a named scope, just use a lambda to ensure that the date is recalculated every time the named scope is called:
named_scope :needs_updates, lambda { {
  :include => :machine_updates,
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago]
} }
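If you want to avoid returning all of the MachineUpdate records in your query, replace :include with an explicit join and use the :select option to return only the machine columns:
named_scope :needs_updates, lambda { {
  :select => "DISTINCT machines.*",
  :joins => "LEFT JOIN machine_updates ON machine_updates.machine_id = machines.id",
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago]
} }
Note that these conditions match any machine with at least one update older than the threshold, even if it also has a recent one; to test only the most recent update, group by machine and compare MAX(machine_updates.date), as in the LEFT JOIN answer above.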
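Whichever SQL form you choose, the rule being encoded is: a machine needs an update if it has no updates at all, or if its most recent update is older than the threshold. A plain-Ruby sketch of that rule (Machine, the sample dates and the 14-day threshold are assumptions) can serve as a reference when writing tests for the scope; it checks the latest update, as the MAX(...) versions do, so a machine with one old and one recent update is not returned.

```ruby
Machine = Struct.new(:name, :update_times)

day = 24 * 60 * 60
threshold = 14 * day  # assumed 14-day threshold
now = Time.now

machines = [
  Machine.new("never updated", []),
  Machine.new("stale", [now - 30 * day]),
  Machine.new("recent", [now - 30 * day, now - 2 * day])
]

# Needs an update: no updates at all, or the latest one is older than the cutoff
def needs_update?(machine, cutoff)
  latest = machine.update_times.max  # nil when there are no updates
  latest.nil? || latest < cutoff
end

cutoff = now - threshold
due = machines.select { |m| needs_update?(m, cutoff) }.map(&:name)
p due
```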

My Rails queries are starting to get complicated, should I switch to raw SQL queries? What do you do?

My Rails app is starting to need complicated queries. Should I just start using raw SQL queries? What is the trend in the Rails community?
Update:
I haven't written the queries yet; I wanted to ask this question before I start. But here is an example of what I want to do:
I have books which have categories. I want to say:
Give me all books that were:
-created_at (added to store) between date1 and date2
-updated_at before date3
-joined with books that exist in shopping carts right now
I haven't written the query yet but I think the rails version will be something like this:
books_to_consider = Book.find(:all,
:conditions => "created_at <= '#{date2}' AND created_at >= '#{date1}' AND updated_at <= '#{date3}'",
:joins => "as b inner join carts as c on c.book_id = b.id")
I am not saying ActiveRecord can't handle this query, but is it more accepted to go with raw SQL for readability (or maybe there are other limitations I don't know of yet)?
The general idea is to stick to ActiveRecord-generated queries as much as possible, and use SQL fragments only where necessary. SQL fragments are explicitly supported because the creators of ActiveRecord realised that SQL cannot be completely abstracted away.
Using the find method without SQL fragments generally pays off in better maintainability. Given your example, try:
Book.find(:all,
  :conditions => ["books.created_at >= ? AND books.created_at <= ? AND books.updated_at <= ?",
    date1, date2, date3],
  :include => :carts)
The :include => :carts eager-loads each book's carts, provided you added has_many :carts to your Book model; if you only want books that are currently in a cart, use :joins => :carts instead, which performs an INNER JOIN. As you can see, there does not have to be much SQL involved. Even the quoting and escaping of input can be left to Rails, while still using SQL literals to handle the >= and <= operators.
Going a little bit further, you can make it even clearer:
class Book < ActiveRecord::Base
# Somewhere in your Book model:
named_scope :created_between, lambda { |start_date, end_date|
{ :conditions => { :created_at => start_date..end_date } }
}
named_scope :updated_before, lambda { |date|
{ :conditions => ["updated_at <= ?", date] }
}
# ...
end
Book.created_between(date1, date2).updated_before(date3).find(:all,
:include => :carts)
Update: the point of the named_scopes is, of course, to reuse the conditions. It's up to you to decide whether it makes sense to put a set of conditions in a named scope.
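Named scopes compose because each one contributes a single condition and returns something you can keep chaining. As a rough in-memory analogy (not how ActiveRecord builds SQL), the same two conditions can be written as reusable Ruby lambdas over a stand-in Book Struct with sample dates (assumptions); Range#cover? plays the role of the BETWEEN that a Range in :conditions generates.

```ruby
require "date"

Book = Struct.new(:title, :created_at, :updated_at)

books = [
  Book.new("A", Date.new(2009, 1, 10), Date.new(2009, 2, 1)),
  Book.new("B", Date.new(2009, 3, 5),  Date.new(2009, 3, 6)),
  Book.new("C", Date.new(2009, 1, 20), Date.new(2009, 6, 1))
]

# Each "scope" builds a predicate; predicates can be combined freely
created_between = ->(from, to) { ->(book) { (from..to).cover?(book.created_at) } }
updated_before  = ->(date) { ->(book) { book.updated_at <= date } }

# Keep records that satisfy every predicate, like chaining scopes
def apply_scopes(records, *predicates)
  records.select { |r| predicates.all? { |pred| pred.call(r) } }
end

result = apply_scopes(books,
                      created_between.call(Date.new(2009, 1, 1), Date.new(2009, 1, 31)),
                      updated_before.call(Date.new(2009, 4, 1)))
p result.map(&:title)
```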
Like molf is saying with :include, .find() has the advantage of eager loading of children.
Also, there are several plugins, like pagination, that will wrap the find function. You'll have to use .find() to use the plugins.
If you have a really complex SQL query, remember that .find() uses your exact parameter string. You can always inject your own SQL code:
:conditions => ["id in union (select * from table...
And don't forget there are a lot of optional parameters for .find()
:conditions - An SQL fragment like "administrator = 1", [ "user_name = ?", username ], or ["user_name = :user_name", { :user_name => user_name }]. See conditions in the intro.
:order - An SQL fragment like "created_at DESC, name".
:group - An attribute name by which the result should be grouped. Uses the GROUP BY SQL-clause.
:having - Combined with :group this can be used to filter the records that a GROUP BY returns. Uses the HAVING SQL-clause.
:limit - An integer determining the limit on the number of rows that should be returned.
:offset - An integer determining the offset from where the rows should be fetched. So at 5, it would skip rows 0 through 4.
:joins - Either an SQL fragment for additional joins like "LEFT JOIN comments ON comments.post_id = id" (rarely needed), named associations in the same form used for the :include option, which will perform an INNER JOIN on the associated table(s), or an array containing a mixture of both strings and named associations. If the value is a string, then the records will be returned read-only since they will have attributes that do not correspond to the table's columns. Pass :readonly => false to override.
:include - Names associations that should be loaded alongside. The symbols named refer to already defined associations. See eager loading under Associations.
:select - By default, this is "*" as in "SELECT * FROM", but can be changed if you, for example, want to do a join but not include the joined columns. Takes a string with the SELECT SQL fragment (e.g. "id, name").
:from - By default, this is the table name of the class, but can be changed to an alternate table name (or even the name of a database view).
:readonly - Mark the returned records read-only so they cannot be saved or updated.
:lock - An SQL fragment like "FOR UPDATE" or "LOCK IN SHARE MODE". :lock => true gives the connection's default exclusive lock, usually "FOR UPDATE".
src: http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002553