Rails/Active Record .save! efficiency question - sql

New to rails/ruby (using rails 3 and ruby 1.9.2), and am trying to get rid of some unnecessary queries being executed.
When I'm running an each do:
apples.to_a.each do |apple|
  new_apple = apple.clone
  new_apple.save!
end
and I check the sql LOG, I see three select statements followed by one insert statement. The select statements seem completely unnecessary. For example, they're something like:
SELECT Fruit.* from Fruits where Fruit.ID = 5 LIMIT 1;
SELECT Color.* from Colors where Color.ID = 6 LIMIT 1;
SELECT TreeType.* from TreeTypes where TreeType.ID = 7 LIMIT 1;
INSERT into Apples (Fruit_id, color_id, treetype_id) values (6, 7, 8) RETURNING "id";
Seemingly, this wouldn't take much time, but when I've got 70k inserts to run, I'm betting those three selects for each insert will take up a decent amount of time.
So I'm wondering the following:
Is this typical of ActiveRecord/Rails .save! method, or did the previous developer add some sort of custom code?
Would those three select statements, being executed for each item, cause a noticeable amount of extra time?
If it is built into rails/active record, would it be easily bypassed, if that would make it run more efficiently?

You must be validating your associations on save for such a thing to occur, something like this:
class Apple < ActiveRecord::Base
  validates :fruit,
    :presence => true
end
In order to validate the relationship, the associated record must be loaded, and this needs to happen for each validation individually, for each record in turn. That's the standard behavior of save!
You could save without validations if you feel like living dangerously:
apples.to_a.each do |apple|
  new_apple = apple.clone
  new_apple.save(:validate => false)
end
The better approach is to manipulate the records directly in SQL by doing a mass insert if your RDBMS supports it. For instance, MySQL will let you insert thousands of rows with one INSERT call. You can usually do this through Apple.connection, the access layer that lets you make arbitrary SQL calls with methods like execute.
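A minimal sketch of the multi-row INSERT idea in plain Ruby, so it can be run without a database. The helper names (quote_value, bulk_insert_sql), the batch size and the sample rows are assumptions for illustration. Inside Rails you would more likely quote values through the connection (for example Apple.connection.quote) and pass each generated statement to Apple.connection.execute.

```ruby
# Hypothetical helper: render a Ruby value as a SQL literal.
# Only nil, integers and strings are handled in this sketch.
def quote_value(value)
  case value
  when nil     then "NULL"
  when Integer then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Hypothetical helper: build one multi-row INSERT statement per batch,
# so 70k rows become a few dozen statements instead of 70k.
def bulk_insert_sql(table, columns, rows, batch_size = 1000)
  rows.each_slice(batch_size).map do |batch|
    tuples = batch.map { |row| "(" + row.map { |v| quote_value(v) }.join(", ") + ")" }
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
  end
end

# Sample data (made up); a batch size of 2 is used only to show the slicing.
rows = [[5, 6, 7], [5, 6, nil], [8, 9, 10]]
statements = bulk_insert_sql("apples", %w[fruit_id color_id treetype_id], rows, 2)
statements.each { |sql| puts sql }
```

Note that this bypasses validations and callbacks entirely, just like save(:validate => false), so it only fits data you already trust.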

I'm guessing that there is a before_save method (EDIT: or a validation, as suggested above) being called that looks up the color and type of the fruit and stores them with the rest of the attributes when the fruit is saved, in which case these lookups are necessary ...
Normally I wouldn't expect activerecord to do unnecessary lookups - though that does not mean it is always efficient ...

Related

Django - SQL bulk get_or_create possible?

I am using get_or_create to insert objects into the database, but the problem is that doing 1000 at once takes too long.
I tried bulk_create but it doesn't provide functionality I need (creates duplicates, ignores unique value, doesn't trigger post_save signals I need).
Is it even possible to do get_or_create in bulk via customized sql query?
Here is my example code:
related_data = json.loads(urllib2.urlopen(final_url).read())
for item in related_data:
    kw = item['keyword']
    e, c = KW.objects.get_or_create(KWuser=kw, author=author)
    e.project.add(id)
    #Add m2m to parent project
related_data contains 1000 rows looking like this:
[{"cmp":0,"ams":3350000,"cpc":0.71,"keyword":"apple."},
{"cmp":0.01,"ams":3350000,"cpc":1.54,"keyword":"apple -10810"}......]
KW model also sends signal I use to create another parent model:
@receiver(post_save, sender=KW)
def grepw(sender, **kwargs):
    if kwargs.get('created', False):
        id = kwargs['instance'].id
        kww = kwargs['instance'].KWuser
        # KeyO
        a, b = KeyO.objects.get_or_create(defaults={'keyword': kww}, keyword__iexact=kww)
        KW.objects.filter(id=id).update(KWF=a.id)
This works, but as you can imagine, doing thousands of rows at once takes a long time and even crashes my tiny server. What bulk options do I have?
As of Django 2.2, bulk_create has an ignore_conflicts flag. Per the docs:
On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values
This post may be of use to you:
stackoverflow.com/questions/3395236/aggregating-saves-in-django
Note that the answer recommends using the commit_on_success decorator which is deprecated. It is replaced by the transaction.atomic decorator. Documentation is here:
transactions
from django.db import transaction
@transaction.atomic
def lot_of_saves(queryset):
    for item in queryset:
        modify_item(item)
        item.save()
If I understand correctly, "get_or_create" means SELECT or INSERT on the Postgres side.
You have a table with a UNIQUE constraint or index and a large number of rows to either INSERT (if not yet there) and get the newly created ID, or otherwise SELECT the ID of the existing row. Not as simple as it may seem on the outside. With concurrent write load, the matter is even more complicated.
And there are various parameters that need to be defined (how to handle conflicts exactly):
How to use RETURNING with ON CONFLICT in PostgreSQL?

Complex SQL Query in Rails 4

I have a complicated query I need for a scope in my Rails app and have tried a lot of things with no luck. I've resorted to raw SQL via find_by_sql but wondering if any gurus wanted to take a shot. I will simplify the verbiage a bit for clarity, but the problem should be stated accurately.
I have Users. Users own many Records. One of them is marked current (#is_current = true) and the rest are not. Each Record has many Relationships. Relationships have a value for when they were active (active_when) which takes four values, [1..4].
Values 1 and 2 are considered recent. Values 3 and 4 are not.
The problem was ultimately to have scopes (has_recent_relationships and has_no_recent_relationships) on User that filter on whether or not they have recent Relationships on their current Record. (Old Records are irrelevant for this.) I tried creating recent and not_recent scopes on Relationship, and then building the scopes on Record, combined with a check for is_current == 1. Here is where I failed. I have to move on with the app, so I had no choice but to use raw SQL for now, hoping to revisit this later. I put that on User, the only context where I really need it, and set aside the code for the scopes on the other objects.
The SQL that works, that correctly finds the Users who have recent relationships, is below. The other just uses "= 0" instead of "> 0" in the HAVING clause.
SELECT * FROM users WHERE `users`.`id` IN (
  SELECT
    records.owner_id
  FROM `records`
  LEFT OUTER JOIN `relationships` ON `relationships`.`record_id` = `records`.`id`
  WHERE `records`.`is_current` = 1
  HAVING (
    SELECT count(*)
    FROM relationships
    WHERE ((record_id = records.id) AND ((active_when = 1) OR (active_when = 2)))
  ) > 0
)
My instincts tell me this is complicated enough that my modeling probably could be redesigned and simplified, but the individual objects are pretty simple, just getting at this specific data from two objects away has become complicated.
Anyway, I'd appreciate any thoughts. I'm not expecting a full solution because, ick. Just thought the masochists among you might find this amusing.
Have you tried using Arel directly and this website?
Just copy-and-pasting your query you get this:
User.select(Arel.star).where(
  User.arel_table[:id].in(
    Relationship.select(Arel.star.count).where(
      Arel::Nodes::Group.new(
        Relationship.arel_table[:record_id].eq(Record.arel_table[:id]).and(
          Relationship.arel_table[:active_when].eq(1).or(Relationship.arel_table[:active_when].eq(2))
        )
      )
    ).joins(
      CoiRecord.arel_table.join(Relationship.arel_table, Arel::Nodes::OuterJoin).on(
        Relationship.arel_table[:record_id].eq(Record.arel_table[:id])
      ).join_sources
    ).ast
  )
)
I managed to find a way to create what I needed which returns ActiveRecord::Relation objects, which simplifies a lot of other code. Here's what I came up with. This might not scale well, but this app will probably not end up with so much data that it will be a problem.
I created two scope methods. The second depends on the first to simplify things:
def self.has_recent_relationships
  joins(records_owned: :relationships)
    .merge(Record.current)
    .where("(active_when = 1) OR (active_when = 2)")
    .distinct
end
def self.has_no_recent_relationships
  users_with_recent_relationships = User.has_recent_relationships.pluck(:id)
  if users_with_recent_relationships.length == 0
    User.all
  else
    User.where("id not in (?)", users_with_recent_relationships.to_a)
  end
end
The first finds Users with recent relationships by just joining Record, merging with a scope that selects current records (should be only one), and looks for the correct active_when values. Easy enough.
The second method finds Users who DO have recent relationships (using the first method.) If there are none, then all Users are in the set of those with no recent relationships, and I return User.all (this will really never happen in the wild, but in theory it could.) Otherwise I return the inverse of those who do have recent relationships, using the SQL keywords NOT IN and an array. It's this part that could be non-performant if the array gets large, but I'm going with it for the moment.

Building sql insert strings on the fly in Ruby on Rails to improve performance

I use the following code to get a sql insert string to build an insert statement for bulk insertion of large numbers of records. I need this because saving them singly via ActiveRecord is slow.
Adding them in via activerecord-import is also too slow -- much slower than my current method. E.g., my method takes 15 seconds where activerecord-import takes 4 minutes 10 seconds on a sample bulk insert of 100k records (which is typical for me).
class MyClass < ActiveRecord::Base
  def get_sql_insert_string
    "('#{entry_date.strftime('%Y-%m-%d')}',
    #{field1.nil? ? "NULL" : field1},
    #{field2.nil? ? "NULL" : field2},
    #{field3.nil? ? "NULL" : field3})"
  end
end
The problem is that this is fragile. Every time I add a field, I need to remember to include it in this method. Can I use metaprogramming (define_method) to build this once when the class loads?
There are a few tools that exist that will make your life a lot easier. Rather than write your own sql insert statements I would recommend using a gem such as activerecord-import which will handle doing the bulk inserts for you.
To squeeze the last bit of performance out of activerecord-import you can use the columns property of your Active Record classes. For instance:
columns = MyClass.columns.map(&:name)
values = my_instances.map { |x| MyClass.columns.map { |y| x.send(y.name) } }
MyClass.import columns, values, :validate => false
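If you keep the hand-built SQL for speed, the same column-list idea removes the fragility the question describes. Below is a rough sketch in plain Ruby, assuming a Struct as a stand-in for the model. Entry, sql_literal and sql_insert_string are hypothetical names, and in Rails the list of names would come from MyClass.columns.map(&:name) rather than from Struct#members.

```ruby
require "date"

# Stand-in for an ActiveRecord model so the sketch runs without Rails.
Entry = Struct.new(:entry_date, :field1, :field2, :field3) do
  # Hypothetical helper: render an attribute value as a SQL literal.
  def self.sql_literal(value)
    case value
    when nil     then "NULL"
    when Numeric then value.to_s
    when Date    then "'#{value.strftime('%Y-%m-%d')}'"
    else "'" + value.to_s.gsub("'", "''") + "'"
    end
  end

  # Build the VALUES tuple from whatever members (columns) exist, so a
  # newly added field is picked up without editing this method.
  def sql_insert_string
    "(" + members.map { |name| self.class.sql_literal(send(name)) }.join(", ") + ")"
  end
end

# Made-up sample record.
entry = Entry.new(Date.new(2012, 3, 4), 1, nil, 2.5)
puts entry.sql_insert_string
```

Dispatching on the value's class also covers string columns, which the original method would have emitted unquoted.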

Rails way to reset seed on id field

I have found the "pure SQL" answers to this question. Is there a way, in Rails, to reset the id field for a specific table?
Why do I want to do this? Because I have tables with constantly moving data - rarely more than 100 rows, but always different. It is up to 25k now, and there's just no point in that. I intend on using a scheduler internal to the Rails app (rufus-scheduler) to run the id field reset monthly or so.
You never mentioned what DBMS you're using. If this is PostgreSQL, the ActiveRecord postgres adapter has a reset_pk_sequence! method that you could use:
ActiveRecord::Base.connection.reset_pk_sequence!('table_name')
I came out with a solution based on hgimenez's answer and this other one.
Since I usually work with either Sqlite or PostgreSQL, I've only developed for those; but extending it to, say MySQL, shouldn't be too troublesome.
Put this inside lib/ and require it on an initializer:
# lib/active_record/add_reset_pk_sequence_to_base.rb
module ActiveRecord
  class Base
    def self.reset_pk_sequence
      case ActiveRecord::Base.connection.adapter_name
      when 'SQLite'
        new_max = maximum(primary_key) || 0
        update_seq_sql = "update sqlite_sequence set seq = #{new_max} where name = '#{table_name}';"
        ActiveRecord::Base.connection.execute(update_seq_sql)
      when 'PostgreSQL'
        ActiveRecord::Base.connection.reset_pk_sequence!(table_name)
      else
        raise "Task not implemented for this DB adapter"
      end
    end
  end
end
Usage:
Client.count # 10
Client.destroy_all
Client.reset_pk_sequence
Client.create(:name => 'Peter') # this client will have id=1
EDIT: Since the most usual case in which you will want to do this is after clearing a database table, I recommend taking a look at database_cleaner. It handles the ID resetting automatically. You can tell it to delete just selected tables like this:
DatabaseCleaner.clean_with(:truncation, :only => %w[clients employees])
I assume you don't care about the data:
def self.truncate!
  connection.execute("truncate table #{quoted_table_name}")
end
Or if you do, but not too much (there is a slice of time where the data only exists in memory):
def self.truncate_preserving_data!
  data = all.map(&:clone).each{|r| raise "Record would not be able to be saved" unless r.valid? }
  connection.execute("truncate table #{quoted_table_name}")
  data.each(&:save)
end
This will give new records, with the same attributes, but id's starting at 1.
Anything belongs_toing this table could get screwy.
Based on @hgmnz's answer, I made this method that will set the sequence to any value you like... (Only tested with the Postgres adapter.)
# change the database sequence to force the next record to have a given id
def set_next_id table_name, next_id
  connection = ActiveRecord::Base.connection
  def connection.set_next_id table, next_id
    pk, sequence = pk_and_sequence_for(table)
    quoted_sequence = quote_table_name(sequence)
    select_value <<-end_sql, 'SCHEMA'
      SELECT setval('#{quoted_sequence}', #{next_id}, false)
    end_sql
  end
  connection.set_next_id(table_name, next_id)
end
One problem is that these kinds of fields are implemented differently for different databases- sequences, auto-increments, etc.
You can always drop and re-add the table.
No, there is no such thing in Rails. If you need nice ids to show the users then store them in a separate table and reuse them.
You could only do this in rails if the _ids are being set by rails. As long as the _ids are being set by your database, you won't be able to control them without using SQL.
Side note: I guess using rails to regularly call a SQL procedure that resets or drops and recreates a sequence wouldn't be a purely SQL solution, but I don't think that is what you're asking...
EDIT:
Disclaimer: I don't know much about rails.
From the SQL perspective, if you have a table with columns id first_name last_name and you usually insert into table (first_name, last_name) values ('bob', 'smith') you can just change your queries to insert into table (id, first_name, last_name) values ([variable set by rails], 'bob', 'smith') This way, the _id is set by a variable, instead of being automatically set by SQL. At that point, rails has entire control over what the _ids are (although if it is a PK you need to make sure you don't use the same value while it's still in there).
If you are going to leave the assignment up to the database, you have to have rails run (on whatever time schedule) something like:
DROP SEQUENCE MY_SEQ;
CREATE SEQUENCE MY_SEQ START WITH 1 INCREMENT BY 1 MINVALUE 1;
against whatever sequence controls the ids for your table. This will get rid of the current sequence and create a new one. This is the simplest way I know of to 'reset' a sequence.
A Rails way for e.g. MySQL, though all data in the users table is lost:
ActiveRecord::Base.connection.execute('TRUNCATE TABLE users;')
Maybe helps someone ;)
There are CounterCache methods:
https://www.rubydoc.info/docs/rails/4.1.7/ActiveRecord/CounterCache/ClassMethods
I used Article.reset_counters Article.all.length - 1 and it seemed to work.

Lazy loading/caching of SQL query results with a model

I'm developing a system (with Rails 2.3.2, Ruby 1.8.7-p72) that has a sizable reporting component. In order to improve performance, I've created a Report model to archive old reports. The idea is that if a matching report already exists for an arbitrary set of conditions then use it, otherwise generate the report and save the results.
Moreover, I'd like to design the Report model in such a way that only the requested attributes have their corresponding SQL queries run. This all stems from the fact that each attribute takes a long time to compute and I'd rather not generate results that won't be used. I.e. I would like to do something like:
def foo
  @foo ||= read_attribute(:foo)
  if @foo.nil?
    @foo = write_attribute(:foo, (expensive SQL query result))
  end
  @foo
end
The problem I'm experiencing, however, is that results aren't being properly written out to my database and, as a result, the code is constantly reevaluating the SQL query.
Can anyone tell me why write_attribute isn't working? Furthermore, is there a better approach?
Turns out that what I was doing was fine. The real problem was that the object's "id" lookup was being trumped by a piece of code I had elsewhere. I.e. the actual write was occurring to the database, but with the wrong primary key.
Don't you need to call "save" after doing write_attribute?
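On the save question: in ActiveRecord, write_attribute only changes the in-memory attribute, so something has to persist it before a later load can skip the query. Here is a rough sketch of the compute-once-then-store pattern in plain Ruby, assuming a hash as a stand-in for the database row. The Report class below, its query_count counter and the expensive_query placeholder are all hypothetical and only mimic the ActiveRecord method names.

```ruby
# Stand-in for a model: a hash plays the role of the persisted row.
class Report
  attr_reader :query_count

  def initialize(stored = {})
    @stored = stored            # what has been "persisted"
    @attributes = stored.dup    # in-memory copy, like a loaded record
    @query_count = 0
  end

  def read_attribute(name)
    @attributes[name]
  end

  # Changes the in-memory value only.
  def write_attribute(name, value)
    @attributes[name] = value
  end

  # Copies the in-memory attributes back to the persisted row.
  def save
    @stored.replace(@attributes)
    true
  end

  # Run the slow query only when no stored value exists, then persist it.
  def foo
    value = read_attribute(:foo)
    if value.nil?
      value = expensive_query
      write_attribute(:foo, value)
      save
    end
    value
  end

  private

  # Placeholder for the expensive SQL query.
  def expensive_query
    @query_count += 1
    42
  end
end

store = {}
report = Report.new(store)
report.foo   # runs the placeholder query and saves the result
report.foo   # answered from the stored attribute
puts report.query_count
```

Checking nil? explicitly rather than relying on ||= alone keeps a legitimately false result from being recomputed on every call.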