Preload all linked records using ecto - sql

I have a linked-list kind of structure:
defmodule Data.Record do
  use Data.Web, :model
  alias Data.{Record, Repo}

  schema "records" do
    field(:date_start, :date)
    field(:date_end, :date)
    field(:change_reason, :string)
    field(:is_active, :boolean, default: true)
    field(:notes, :string)

    belongs_to(
      :changed_from,
      Data.Record,
      foreign_key: :changed_from_id
    )

    belongs_to(
      :changed_to,
      Data.Record,
      foreign_key: :changed_to_id
    )

    timestamps()
  end
end
But the problem is that we need all the nested records preloaded dynamically. E.g. the list can be record1 changed_to -> record2 changed_to -> record3 changed_to. But Ecto doesn't/can't preload to a dynamic depth, e.g. record |> preload([{:changed_to, :changed_to}]) hard-codes two levels.
What is the best way/workaround to preload all the linked changed_to records?

Well, the quickest (dirty) workaround would be something like this. It builds the arguments for preload up to a certain depth:
def preload_args(relation, max_level \\ 50) do
  preload_args(relation, max_level - 1, relation)
end

defp preload_args(_relation, level, acc) when level <= 0, do: acc

defp preload_args(relation, level, acc) do
  preload_args(relation, level - 1, [{relation, acc}])
end
To use it:
Repo.preload record, Record.preload_args(:changed_to)
This will preload every :changed_to relation up to a certain level or until there are no more. Of course, this is not a solution you would really want to use: it performs a query for every preload level, and you don't know upfront how long the chain will be; it might be much longer than 50 steps.
(please don't roast me for this code/suggestion, you specifically asked for workarounds too. ;)
I think that this comment about a 'closure table' by Aetherus, which pointed me to this article, will probably lead you to a better solution. It also strengthens my presumption that you don't need to store both parent and child ids in the first place; the parent_id alone should be enough. That would also make it easier to insert a new Record: you wouldn't need to update the parent too.
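To illustrate why the parent id alone can be enough, here is a minimal sketch of rebuilding a chain in memory from rows that only store changed_from_id. It is written in Ruby purely as an illustration of the data-structure idea, not as Ecto code; Row and chain_from are hypothetical names, and it assumes the rows were already fetched in a single query and that each record has at most one successor.

```ruby
# Hypothetical row shape: each record only knows the record it was changed from.
Row = Struct.new(:id, :changed_from_id)

# Follow the chain forward from start_id using an in-memory index
# keyed by parent id, so no further queries are needed.
def chain_from(rows, start_id)
  next_by_parent = rows.each_with_object({}) do |row, index|
    index[row.changed_from_id] = row if row.changed_from_id
  end

  chain = []
  seen = {} # guards against accidental cycles in the data
  current = rows.find { |row| row.id == start_id }
  while current && !seen[current.id]
    seen[current.id] = true
    chain << current
    current = next_by_parent[current.id]
  end
  chain
end

rows = [Row.new(1, nil), Row.new(2, 1), Row.new(3, 2), Row.new(9, nil)]
chain_ids = chain_from(rows, 1).map(&:id)
```

The number of queries no longer depends on the chain length, at the cost of loading every candidate row up front.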


Why does Postgres not accept my count column?

I am building a Rails app with the following models:
# vote.rb
class Vote < ApplicationRecord
belongs_to :person
belongs_to :show
scope :fulfilled, -> { where(fulfilled: true) }
scope :unfulfilled, -> { where(fulfilled: false) }
end
# person.rb
class Person < ApplicationRecord
has_many :votes, dependent: :destroy
def self.order_by_votes(show = nil)
count = 'nullif(votes.fulfilled, true)'
count = "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end" if show
people = left_joins(:votes).group(:id).uniq!(:group)
people = people.select("people.*, COUNT(#{count}) AS people.vote_count")
people.order('people.vote_count DESC')
end
end
The idea behind order_by_votes is to sort People by the number of unfulfilled votes, either counting all votes, or counting only votes associated with a given Show.
This seems to work fine when I test against SQLite. But when I switch to Postgres I get this error:
Error:
PeopleControllerIndexTest#test_should_get_previously_on_show:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column people.vote_count does not exist
LINE 1: ...s"."show_id" = $1 GROUP BY "people"."id" ORDER BY people.vot...
^
If I dump the SQL using @people.to_sql, this is what I get:
SELECT people.*, COUNT(nullif(votes.fulfilled, true)) AS people.vote_count FROM "people" LEFT OUTER JOIN "votes" ON "votes"."person_id" = "people"."id" GROUP BY "people"."id" ORDER BY people.vote_count DESC
Why is this failing on Postgres but working on SQLite? And what should I be doing instead to make it work on Postgres?
(PS: I named the field people.vote_count, with a dot, so I can access it in my view without having to do another SQL query to actually view the vote count for each person in the view (not sure if this works) but I get the same error even if I name the field simply vote_count.)
(PS2: I recently added the .uniq!(:group) because of some deprecation warning for Rails 6.2, but I couldn't find any documentation for it so I am not sure I am doing it right, still the error is there without that part.)
Are you sure you're not getting a syntax error from PostgreSQL somewhere? If I do something like this:
select count(*) as t.vote_count from t ... order by t.vote_count
I get a syntax error before PostgreSQL gets to complain about there being no t.vote_count column.
No matter, the solution is to not try to put your vote_count in the people table:
people = people.select("people.*, COUNT(#{count}) AS vote_count")
...
people.order(vote_count: :desc)
You don't need it there; you'll still be able to reference vote_count just like any "normal" column in people. Anything in the select list will appear as an accessor on the resulting model instances, whether it is a column or not. Such values won't show up in the #inspect output (since that's generated from the table's columns), but you can call the accessor methods nonetheless.
Historically there have been quite a few AR problems (and bugs) in getting the right count by just calling count on a scope, and I am not sure they are all gone.
That depends on the scope (AR version, relations, group, sort, uniq, etc.). A default count call that a gem has to use generically on any scope is not a one-size-fits-all solution. For that known reason Pagy allows you to pass the right count to its pagy method, as explained in the Pagy documentation.
Your scope might become complex and the default pagy collection.count(:all) may not get the actual count. In that case you can get the right count with some custom statement, and pass it to pagy.
@pagy, @records = pagy(collection, count: your_count)
Notice: pagy will efficiently skip its internal count query and will just use the passed :count variable.
So... just get your own calculated count and pass it to pagy, and it will not even try to use the default.
EDIT: I forgot to mention: you may want to try the pagy arel extra that:
adds specialized pagination for collections from sql databases with GROUP BY clauses, by computing the total number of results with COUNT(*) OVER ().
Thanks to all the comments and answers I have finally found a solution which I think is the best way to solve this.
First off, the issue occurred when I called pagy, which tried to count my scope by appending .count(:all). This is what caused the errors. The solution was to not rely on a "field" created in select() when ordering, and to repeat the COUNT expression in .order() instead.
So here is the proper code:
def self.order_by_votes(show = nil)
  count = if show
    "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end"
  else
    'nullif(votes.fulfilled, true)'
  end
  left_joins(:votes).group(:id)
    .uniq!(:group)
    .select("people.*, COUNT(#{count}) as vote_count")
    .order(Arel.sql("COUNT(#{count}) DESC"))
end
This sorts people by the number of unfulfilled votes for them, with the ability to count only votes for a given show. It works with pagy() and with pagy_arel(), which in my case is a much better fit, so the results can be properly paginated.
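The counting rule itself can be sketched in plain Ruby, which may make the SQL easier to check. COUNT skips NULLs, so nullif(votes.fulfilled, true) only counts rows where fulfilled is false, and people without votes get zero. This is an in-memory illustration only, not the Active Record scope; the Vote struct, order_by_unfulfilled_votes and the tie-break on id are assumptions made for the example.

```ruby
# Hypothetical in-memory stand-in for rows of the votes table.
Vote = Struct.new(:person_id, :show_id, :fulfilled)

# Returns [person_id, unfulfilled_count] pairs, highest count first.
# Ties are broken by id here only to make the result deterministic;
# the SQL version makes no such promise.
def order_by_unfulfilled_votes(person_ids, votes, show_id = nil)
  counts = Hash.new(0)
  votes.each do |vote|
    next if vote.fulfilled                          # fulfilled votes become NULL and are not counted
    next if show_id && vote.show_id != show_id      # optional per-show filter
    counts[vote.person_id] += 1
  end
  person_ids.map { |id| [id, counts[id]] }.sort_by { |id, count| [-count, id] }
end

votes = [
  Vote.new(1, 10, false), Vote.new(1, 11, true),
  Vote.new(2, 10, false), Vote.new(2, 11, false)
]
all_shows = order_by_unfulfilled_votes([1, 2, 3], votes)
show_ten  = order_by_unfulfilled_votes([1, 2, 3], votes, 10)
```

Person 3 has no votes at all and still appears with a count of zero, which is the role the LEFT OUTER JOIN plays in the real query.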

Rails advance ordering with SQL

I have a User model with attributes such as is_admin and is_verified, and with associations to other models such as Badges and Activities.
I have a table in HTML where, by default, users should be ordered by is_verified, then is_admin, then number of badges, then number of activities, in that order. But I don't know how to build that query.
I have tried sample code like this:
users = User.all.limit(10)
users.order(is_verified: :true).order(is_admin: :true).order(users.map{|user| user.badges.count}).order(users.map{|user| user.activites.count})
But this will not work, since order only accepts one of [:asc, :desc, :ASC, :DESC, "asc", "desc", "ASC", "DESC"] as a direction.
Is there a way to do this? I'm new to writing queries. Thank you very much for your help.
I have sample design like this:
order accepts multiple args, so you could use something like the following:
users.order(is_verified: :desc, is_admin: :desc)
.order('badges_count DESC', 'activities_count DESC')
For the boolean columns, true sorts after false (many databases store them as 1 for true and 0 for false), hence the desc direction. If these columns don't default to one value or the other (presumably false) and can be NULL, you might want to adjust to, e.g., order("is_verified DESC NULLS LAST", ...).
This would require a bit of a refactor to run efficiently, relying on counter_caches for badges and activities. If you're going to be querying this regularly, I'd highly advise adding this in. There's a good guide here.
All that will be required is:
new columns for activities_count and badges_count
counter_cache: true set on the belongs_to side of the association
resetting the counters with reset_counters, which takes a record id followed by the counter names, e.g. User.find_each { |user| User.reset_counters(user.id, :badges, :activities) }
Otherwise, as is, you could group by the user and count the joined rows (DISTINCT keeps the two joins from inflating each other's counts):
User.left_joins(:badges, :activities)
    .group('users.id')
    .order(is_verified: :desc, is_admin: :desc)
    .order(Arel.sql('COUNT(DISTINCT badges.id) DESC'), Arel.sql('COUNT(DISTINCT activities.id) DESC'))
If you're curious about its output, you can call to_sql to see how it actually fetches the data from the db.
Have a go at that and let me know how you get on - happy to help if you have any questions!
You could try ordering in the database itself, which is faster than sorting an array of loaded records. If you order by a boolean field in descending order, all records with the value true will come first. The final query may look like:
User.order(is_verified: :desc, is_admin: :desc).
  left_joins(:badges, :activities).
  group(:id).
  order(Arel.sql("count(distinct badges.id) desc")).
  order(Arel.sql("count(distinct activities.id) desc"))
Left joins (left outer joins) include users with zero or more associated records. You can find the details in the documentation.
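To make the intended ordering concrete, here is a small in-memory sketch of the same four-level sort. It only illustrates which row should come first; for real data the database ordering above is the better tool. The User struct, the flag helper and the sample rows are made up for the example, and the counts are assumed to be already available on each row.

```ruby
# Hypothetical in-memory rows; the counts stand in for counter-cache columns.
User = Struct.new(:name, :is_verified, :is_admin, :badges_count, :activities_count)

# Map a boolean to 1/0 so it can take part in a numeric sort key.
def flag(value)
  value ? 1 : 0
end

# Negating each component turns Ruby's ascending sort into "DESC" per column,
# and the array compares left to right like a multi-column ORDER BY.
def default_order(users)
  users.sort_by do |u|
    [-flag(u.is_verified), -flag(u.is_admin), -u.badges_count, -u.activities_count]
  end
end

users = [
  User.new("ann", false, true, 5, 1),
  User.new("bob", true, false, 2, 9),
  User.new("cid", true, true, 0, 0),
  User.new("dee", true, false, 2, 3)
]
ordered_names = default_order(users).map(&:name)
```

Note that ann has the most badges but still sorts last, because is_verified is compared before anything else.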
Hope it helps !

Inserting Nested Object Efficiently

If I have the following schema:
class Sinbad < ActiveRecord::Base
  has_many :tinbads
  accepts_nested_attributes_for :tinbads
end

class Tinbad < ActiveRecord::Base
  belongs_to :sinbad
  has_many :pinbads
  accepts_nested_attributes_for :pinbads
end

class Pinbad < ActiveRecord::Base
  belongs_to :tinbad
end
and it's not uncommon for a Tinbad to have a few hundred Pinbads, is there a common way to create a nested Sinbad without invoking hundreds of queries?
I've come to the sad understanding that Active Record doesn't support batch inserts but is there a way around this that doesn't involve handwritten SQL? I've looked at https://github.com/zdennis/activerecord-import, but it doesn't support nested objects. Currently, the SinbadController#create action averages >400 insert transactions and it's the most common action used.
Here's an example of what I want not to happen:
https://gist.github.com/adamkuipers/12578343d31a651bee4a
Instead of inserting into the photos table N times, I want to insert only once.
I'm having the exact same problem. I'm parsing big spreadsheets and the schema used to store the data is nested, so I'm inserting only a single "Sinbad" but thousands of "Pinbad" can get inserted at once...
What I came up with to speed up the insert is to bulk insert the bottom leaves of the schema (visualize the schema as a tree), as this must be the model with the highest number of instances to create; in your case, the Pinbad instances. We cannot bulk insert the middle levels, as a bulk insert doesn't let us fetch the ids of the multiple records inserted (see the discussion here concerning PostgreSQL, for instance). So it's not ideal, but that's the only way I found to make the inserts more efficient (without changing the schema itself).
You'll have to remove accepts_nested_attributes_for as you need to save the objects yourself, and it's convenient to use activerecord-import for the bulk insert:
class Sinbad
  #
  # Let's imagine you still receive the params as if you were using accepts_nested_attributes_for,
  # meaning :pinbads_attributes will be nested under :tinbads_attributes,
  # which will be nested under :sinbad
  #
  def self.efficient_create(params)
    # I think AR doesn't like it when attributes don't exist,
    # so we pull the tinbads attributes out before creating the Sinbad
    tinbads_attributes = params.delete(:tinbads_attributes) || []
    sinbad = create!(params)

    # Array that will contain the attributes of the pinbads to bulk insert
    pinbads_to_save = []
    # activerecord-import needs to know which columns of Pinbad you insert
    pinbads_cols = [:tinbad_id, :name, :other]

    # We need to manually save the tinbads one by one,
    # but that's what happens anyway when using accepts_nested_attributes_for
    tinbads_attributes.each do |attrs|
      # delete returns the removed value, so this both extracts and strips the key
      pinbads_attributes = attrs.delete(:pinbads_attributes) || []
      tinbad = sinbad.tinbads.create!(attrs)

      pinbads_attributes.each do |p_attrs|
        # Take care to put the attributes
        # in the same order as the pinbads_cols array
        pinbads_to_save << [tinbad.id, p_attrs[:name], p_attrs[:other]]
      end
    end

    # Now we can bulk insert the pinbads, using activerecord-import
    Pinbad.import_without_validations_or_callbacks(pinbads_cols, pinbads_to_save)
    sinbad
  end
end
That's what I've done in my situation and as the last level in the schema hierarchy has the most instances to create, the overall insert time was greatly reduced. In your case, you would replace the ~400 inserts of Pinbad with 1 bulk insert.
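The row-building step is where column order is easy to get wrong, so here is a minimal sketch of deriving each row from the column list instead of writing the values out by hand. It is plain Ruby with no database involved; pinbad_rows is a hypothetical helper, and :name and :other are the same placeholder columns used above.

```ruby
# Build one value array per pinbad, in exactly the order given by `columns`.
# :tinbad_id comes from the freshly created parent; every other value comes
# from the nested attributes hash (missing keys become nil).
def pinbad_rows(tinbad_id, pinbads_attributes, columns)
  pinbads_attributes.map do |attrs|
    columns.map { |col| col == :tinbad_id ? tinbad_id : attrs[col] }
  end
end

columns = [:tinbad_id, :name, :other]
rows = pinbad_rows(7, [{ name: "a", other: 1 }, { name: "b", other: 2 }], columns)
```

Because the rows are generated from the same array that is handed to the import call, adding or reordering a column only needs one change.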
Hope that helps, and I'm open to any suggestion or alternative solution!

Sorting a Rails database table by a column in an associated model with additional scoping

I am new to Rails. I am trying to implement something like Ryan Bates' sortable table columns code (Railscast #228) on a legacy database. My question is very similar to "Sorting a Rails database table by a column in an associated model", but I can't seem to solve mine based on the answers there.
I want to be able to sort my list of projects in the index view by the udtid in the entityudfstorage table (class Ent), i.e., by project.ent.udtid. I have an additional consideration, in that each project matches a number of ent rows, so I need to scope to match to where ent.rowindex != 0.
The Models:
class Project < ActiveRecord::Base
  has_many :ent, :foreign_key => "attachtoid"
  has_many :samples, :foreign_key => "projectid"
end

class Ent < ActiveRecord::Base
  set_table_name("entityudfstorage")
  belongs_to :project, :foreign_key => "attachtoid"
  scope :rowindex, where('entityudfstorage.rowindex != ? ', "0")
end
project index view:
<tr>
<th><%= sortable "name", "Name" %></th>
<th><%= sortable "projecttype", "Project Type" %> </th>
</tr>
<tr>
<td><%= project.name %></td>
<td><%= project.ent.rowindex.first.udtid %></td>
</tr>
project controller
def list
  @projects = Project.order(sort_column + " " + sort_direction)
end
I've been trying to figure out what I can put in the "sort_column" for projecttype which would get it to sort by the associated field project.ent.rowindex.first.udtid (the same way that "name" works in the controller to sort by project.name).
I tried putting in a scope in projects of
scope :by_udtids, Project.joins("left join ent on projects.projectid = ent.attachtoid").where('ent.rowindex != ?', 0).order("ent.udtid DESC")
and then tried this in the project controller.
if sort_column == "projecttype"
  @projects = Project.by_udtids
else
  @projects = Project.order(sort_column + " " + sort_direction)
end
The result is that the project index page shows up with the proper data in the columns, but when I click on the "Project Type" header link, it does not sort (whereas if I click on the "Name" header link, it does sort). The logs I can see in the server terminal are the same for both clicks, and the queries seem correct:
Started GET "/projects?direction=asc&sort=projecttype" for 128.208.10.200 at 2013-08-29 07:47:52 -0700
Processing by ProjectsController#index as HTML
Parameters: {"direction"=>"asc", "sort"=>"projecttype"}
Project Load (1.5ms) SELECT "project".* FROM "project" ORDER BY name asc
Ent Load (0.4ms) SELECT "entityudfstorage".* FROM "entityudfstorage" WHERE "entityudfstorage"."attachtoid" = 602 AND (entityudfstorage.rowindex != '0' ) LIMIT 1
CACHE (0.0ms) SELECT "entityudfstorage".* FROM "entityudfstorage" WHERE "entityudfstorage"."attachtoid" = 602 AND (entityudfstorage.rowindex != '0' ) LIMIT 1
(0.3ms) SELECT COUNT(*) FROM "sample" WHERE "sample"."projectid" = 602
Ent Load (0.3ms) SELECT "entityudfstorage".* FROM "entityudfstorage" WHERE "entityudfstorage"."attachtoid" = 603 AND (entityudfstorage.rowindex != '0' ) LIMIT 1
CACHE (0.0ms) SELECT "entityudfstorage".* FROM "entityudfstorage" WHERE "entityudfstorage"."attachtoid" = 603 AND (entityudfstorage.rowindex != '0' ) LIMIT 1
(0.2ms) SELECT COUNT(*) FROM "sample" WHERE "sample"."projectid" = 603
Rendered projects/list.html.erb within layouts/admin (478.7ms)
Completed 200 OK in 487ms (Views: 398.7ms | ActiveRecord: 87.9ms)
[2013-08-29 07:55:27] WARN Could not determine content-length of response body. Set content-length of the response or set Response#chunked = true
Started GET "/assets/jquery.js?body=1" for 128.208.10.200 at 2013-08-29 07:55:28 -0700
Served asset /jquery.js - 304 Not Modified (0ms)
[2013-08-29 07:55:28] WARN Could not determine content-length of response body. Set content-length of the response or set Response#chunked = true
Much appreciate any insight!
You don't actually say what "doesn't work" means in this context. I would guess that it either means you get an error message or that the Projects aren't sorted the way you expect.
If it's an error message, you should include it with your question, but the first thing I'd do in your place is probably to start using the Rails conventions for naming and referring to things. If your associated class is UserDefinedField, Rails is going to expect the table to be named user_defined_fields and the association to be specified as has_many :user_defined_fields. It also expects fields to be in snake case (attach_to_id, not attachtoid), but aside from having to specify every foreign key everywhere, that probably won't cause errors. Looking at your code, I would expect Rails to be complaining about the association name any time it loaded the Project model. You should either change these things to match the conventions, or (if you have a good reason to use these names), tell Rails what's up by specifying the class and table names in your associations.
If it's an incorrect response order, I'm less certain what the issue is. One thing that jumps out immediately is that you're going to get the same Project returned multiple times with this scope, once for each associated UserDefinedField. If you don't want that to happen, you'll need to add a group clause to your scope and some sort of aggregation for the userdefinedfields.udtids. This might look something like:
scope :by_udtids, Project.
  joins("left join userdefinedfields on projects.projectid = userdefinedfields.attachtoid").
  where('userdefinedfields.rowindex != ?', 0).
  group('projects.id').
  order("max(userdefinedfields.udtid) DESC")
Edit:
It looks like your current problem is that the scope isn't getting used at all. Here are a couple of reasons that might be the case:
I notice that your controller action is called ProjectsController#list, but that your log says the request with the parameters is being processed by ProjectsController#index. The call to the list action doesn't appear to be running any queries, which sounds as though perhaps it isn't doing anything that triggers actually loading the objects. Could this simply be a routing or template error?
Given that you say the query being run is the same in both cases, if the correct action is being called, it seems likely that your conditional (sort_column == "projecttype") is returning false, even when you pass the parameters. Even if the scope wasn't quite correct, you would otherwise still see at least a different query there. Is it possible that you're simply forgetting to set sort_column = params[:sort]? Try temporarily removing the conditional and always using @projects = Project.by_udtids. See if you get a different query then.
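One way the conditional could end up false is sketched below. It assumes sort_column is a helper that only accepts known column names and falls back to "name", which is a common pattern for sortable tables, but your helper isn't shown, so treat this as a guess. PROJECT_COLUMNS and SCOPE_SORTS are hypothetical stand-ins, not values taken from your app.

```ruby
# Hypothetical stand-in for the column names of the legacy project table.
PROJECT_COLUMNS = %w[projectid name].freeze
# Extra sort keys that are handled by a scope rather than by a plain column.
SCOPE_SORTS = %w[projecttype].freeze

# Whitelist the requested sort key; anything unknown falls back to "name".
def sort_column(params, allowed)
  allowed.include?(params[:sort]) ? params[:sort] : "name"
end

params = { sort: "projecttype", direction: "asc" }
columns_only   = sort_column(params, PROJECT_COLUMNS)
with_scope_key = sort_column(params, PROJECT_COLUMNS + SCOPE_SORTS)
```

With only real columns allowed, "projecttype" is silently replaced by "name", which would match the ORDER BY name asc in your log. Allowing the extra key lets the sort_column == "projecttype" branch run.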
Side note on the scope/query itself. If I understand your comment correctly, you want to sort by the udtid of the Ent with the lowest nonzero rowindex. This is going to be tricky, and the details will depend on your database (grouping, especially complex grouping, is one of the things that works fairly differently in MySQL vs. PostgreSQL, the two most common databases for Rails apps). The concept is called a 'groupwise maximum' (here, really a groupwise minimum), and your best bet is probably to search for that phrase in conjunction with your database name.
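As a rough in-memory illustration of what that query has to compute (not a replacement for doing it in SQL), the sketch below picks, per project, the Ent row with the lowest nonzero rowindex and sorts projects by that row's udtid. The Ent struct mirrors the column names from the question; the helper names, the sample data, and the choice to put projects without a matching row last are assumptions.

```ruby
# Hypothetical in-memory stand-in for rows of entityudfstorage.
Ent = Struct.new(:attachtoid, :rowindex, :udtid)

# For each project id, the udtid of the row with the lowest nonzero rowindex.
def sort_key_by_project(ents)
  ents.reject { |e| e.rowindex.zero? }
      .group_by(&:attachtoid)
      .transform_values { |rows| rows.min_by(&:rowindex).udtid }
end

# Projects with a key are sorted ascending by it; the rest keep their order at the end.
def sort_projects(project_ids, ents)
  keys = sort_key_by_project(ents)
  with_key, without_key = project_ids.partition { |id| keys.key?(id) }
  with_key.sort_by { |id| [keys[id], id] } + without_key
end

ents = [
  Ent.new(602, 0, 99), Ent.new(602, 2, 5), Ent.new(602, 1, 30),
  Ent.new(603, 1, 10), Ent.new(603, 3, 1)
]
sorted_ids = sort_projects([602, 603, 604], ents)
```

Project 602 is ranked by udtid 30 (its rowindex 1 row), not by 99 or 5, which is the part a plain max(udtid) in the scope would get wrong.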

Update more record in one query with Active Record in Rails

Is there a better way to update multiple records with different values in one query in Ruby on Rails? I solved it using CASE in SQL, but is there an Active Record solution for that?
Basically, I save a new sort order when a new list arrives from a jQuery AJAX post.
#List of product ids in sorted order. Get from jqueryui sortable plugin.
#product_ids = [3,1,2,4,7,6,5]
# Simple solution which generate a loads of queries. Working but slow.
#product_ids.each_with_index do |id, index|
# Product.where(id: id).update_all(sort_order: index+1)
#end
##CASE syntax example:
##Product.where(id: product_ids).update_all("sort_order = CASE id WHEN 539 THEN 1 WHEN 540 THEN 2 WHEN 542 THEN 3 END")
case_string = "sort_order = CASE id "
product_ids.each_with_index do |id, index|
case_string += "WHEN #{id} THEN #{index+1} "
end
case_string += "END"
Product.where(id: product_ids).update_all(case_string)
This solution is fast and uses only one query, but I build the query string by hand, like in PHP. :) What would be your suggestion?
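If the hand-built string stays, one hedged refinement is to coerce every id to an integer before interpolating it, since the ids arrive from the browser. The sketch below only builds the SQL fragment and does not touch the database; sort_order_case_sql is a hypothetical helper name, and the column names are the ones from the question.

```ruby
# Build the "sort_order = CASE id ... END" fragment for update_all.
# Integer() raises ArgumentError on anything that is not a clean integer,
# so a tampered id cannot end up inside the SQL string.
def sort_order_case_sql(product_ids)
  ids = product_ids.map { |id| Integer(id) }
  raise ArgumentError, "no product ids given" if ids.empty?

  whens = ids.each_with_index.map { |id, index| "WHEN #{id} THEN #{index + 1}" }
  "sort_order = CASE id #{whens.join(' ')} END"
end

sql = sort_order_case_sql([3, 1, "2"])
```

The result would be passed to update_all the same way as case_string above, with the where(id: product_ids) clause still limiting which rows are touched.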
You should check out the acts_as_list gem. It does everything you need and it uses 1-3 queries behind the scenes. It's a perfect match for the jQuery UI sortable plugin. It relies on incrementing/decrementing the position (sort_order) field directly in SQL.
This won't be a good solution for you if your UI/UX relies on the user saving the order manually (the user sorts things out and then clicks update/save). However, I strongly discourage this kind of interface unless there is a specific reason for it (for example, you cannot have an intermediate state in the database between the old and new order, because something else depends on that order).
If that's not the case, then by all means just do an asynchronous update after the user moves one element (and acts_as_list will be a great help in accomplishing that).
Check out:
https://github.com/swanandp/acts_as_list/blob/master/lib/acts_as_list/active_record/acts/list.rb#L324
# This has the effect of moving all the higher items down one.
def increment_positions_on_higher_items
return unless in_list?
acts_as_list_class.unscoped.where(
"#{scope_condition} AND #{position_column} < #{send(position_column).to_i}"
).update_all(
"#{position_column} = (#{position_column} + 1)"
)
end