Can I :select multiple fields (*, foo) without the extra ones being added to my instances (Instance.foo=>bar) - sql

I'm trying to write a named scope that will order my 'Products' class based on the average 'Review' value. The basic model looks like this
Product < ActiveRecord::Base
has_many :reviews
Review < ActiveRecord::Base
belongs_to :product
# integer value
I've defined the following named scope on Product:
named_scope :best_reviews,
:select => "*, AVG(reviews.value) score",
:joins => "INNER JOIN (SELECT * FROM reviews GROUP BY reviews.product_id) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "score desc"
This seems to be working properly, except that it's adding the 'score' value in the select to my Product instances, which causes problems if I try to save them, and makes comparisons return false (@BestProduct != Product.best_reviews.first, because Product.best_reviews.first has score=whatever).
Is there a better way to structure the named_scope? Or a way to make Rails ignore the extra field in the select?

I'm not a Rails developer, but I know SQL allows you to sort by a field that is not in the select-list.
Can you do this:
:select => "*",
:joins => "INNER JOIN (SELECT * FROM reviews GROUP BY reviews.product_id) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "AVG(reviews.value) desc"

Wow, so I should really wait before asking questions. Here's one solution (I'd love to hear if there are better approaches):
I moved the score field into the inner join. That makes it available for ordering but doesn't seem to add it to the instance:
named_scope :best_reviews,
:joins => "INNER JOIN (
SELECT *, AVG(value) score FROM reviews GROUP BY reviews.product_id
) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "reviews.score desc"
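To make it clearer what the subquery approach computes, here is a minimal in-memory sketch in plain Ruby (no database, no Rails). The Struct names and the sample data are made up for illustration. It only mirrors the idea: one average per product, used as a sort key but never stored on the product itself.

```ruby
# Hypothetical stand-ins for the two tables; these are not ActiveRecord models.
Product = Struct.new(:id, :name)
Review  = Struct.new(:product_id, :value)

products = [Product.new(1, "A"), Product.new(2, "B"), Product.new(3, "C")]
reviews  = [Review.new(1, 3), Review.new(1, 4), Review.new(2, 5), Review.new(3, 1)]

# Roughly: SELECT product_id, AVG(value) score FROM reviews GROUP BY product_id
scores = reviews.group_by(&:product_id).map do |product_id, rows|
  [product_id, rows.sum { |r| r.value }.to_f / rows.size]
end.to_h

# Roughly the INNER JOIN plus ORDER BY score DESC: products without reviews
# drop out, and the score is only a sort key, not a product attribute.
best_reviewed = products
  .select  { |p| scores.key?(p.id) }
  .sort_by { |p| -scores[p.id] }

puts best_reviewed.map(&:name).inspect
```

The products that come back carry only their own fields (id and name here), which is the property the named_scope above is after.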

Related

Sequence of finder methods in Active Record Query in Rails

I am writing a complex Active Record query to fetch data from multiple tables. The query has joins, select, order, group, and where.
@posts = Post.published.paginate(:order => 'popularity desc, id',
:joins => [:comments, :images, :updates, :user],
:conditions => conditions,
:group => "posts.id",
:select => "posts.*",
:per_page => 10,
:page => params[:page])
I wanted to know what the sequence of where, joins, etc. should be, both by convention and to maximize the performance of the query. It would be great if someone could write a query that shows the sequence, like
@posts = Post.published.joins(:comments, :images, :updates, :user).where(....
As far as I know the sequence does not matter.
An example would be:
@posts = Post.published.select('posts.id').
joins(:comments, :images, :updates, :user).
where('users.email = ?', 'john@doe.com').
group('posts.id')
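One way to picture why the call order does not change the result: the chained methods only record clauses, and the SQL is assembled in a fixed order afterwards. The class below is a hypothetical, heavily simplified builder written for this illustration. It is not ActiveRecord's implementation and the name QuerySketch is made up.

```ruby
# Hypothetical, simplified query builder; NOT how ActiveRecord is implemented.
class QuerySketch
  CLAUSE_ORDER = [:select, :joins, :where, :group, :order].freeze

  def initialize(table, clauses = {})
    @table = table
    @clauses = clauses
  end

  # Each chain method returns a new object with one more clause recorded.
  CLAUSE_ORDER.each do |name|
    define_method(name) do |fragment|
      QuerySketch.new(@table, @clauses.merge(name => fragment))
    end
  end

  # Clauses are emitted in SQL's fixed order, whatever the call order was.
  def to_sql
    sql = "SELECT #{@clauses.fetch(:select, '*')} FROM #{@table}"
    sql += " #{@clauses[:joins]}" if @clauses[:joins]
    sql += " WHERE #{@clauses[:where]}" if @clauses[:where]
    sql += " GROUP BY #{@clauses[:group]}" if @clauses[:group]
    sql += " ORDER BY #{@clauses[:order]}" if @clauses[:order]
    sql
  end
end

join = "JOIN users ON users.id = posts.user_id"
a = QuerySketch.new("posts").select("posts.id").joins(join).where("users.id = 1").group("posts.id")
b = QuerySketch.new("posts").group("posts.id").where("users.id = 1").joins(join).select("posts.id")

puts a.to_sql
puts a.to_sql == b.to_sql
```

Both chains render the same statement, so any performance difference has to come from the generated SQL and the indexes behind it, not from the order of the Ruby calls.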

SQL problems when migrating from MySQL to PostgreSQL

I have a Ruby on Rails 2.3.x application that I'm trying to migrate from my own VPS to Heroku, including porting from SQLite (development) and MySQL (production) to Postgres.
This is a typical Rails call I'm using:
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => :thing_id, :order => order, :conditions => conditions, :page => page, :per_page => per_page)
Question 1: I get a lot of errors like PG::Error: ERROR: column "spots.id" must appear in the GROUP BY clause or be used in an aggregate function. SQLite/MySQL was evidently more forgiving here. Of course I can easily fix these by adding the specified fields to my :group parameter, but I feel I'm messing up my code. Is there a better way?
Question 2: If I throw in all the GROUP BY columns that Postgres is missing I end up with the following statement (only :group has changed):
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)
This in turn produces the following SQL code:
SELECT * FROM (SELECT DISTINCT ON ("spots".id) "spots".id, spots.created_at AS alias_0 FROM "spots"
LEFT OUTER JOIN "things" ON "things".id = "spots".thing_id
WHERE (spots.recommended_to_user_id = 1 OR spots.user_id IN (1) OR things.is_featured = 't')
GROUP BY thing_id,things.id,users.id,spots.id) AS id_list
ORDER BY id_list.alias_0 DESC LIMIT 16 OFFSET 0;
...which produces the error PG::Error: ERROR: missing FROM-clause entry for table "users". How can I solve this?
Question 1:
...Is there a better way?
Yes. Since PostgreSQL 9.1 the primary key of a table logically covers all columns of a table in the GROUP BY clause. I quote the release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
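If the server is older than 9.1, every selected column still has to be named in GROUP BY. Rather than hand-writing that list in each call, it can be generated. The helper below is a hypothetical sketch in plain Ruby; the method name and the sample column names are assumptions made for illustration.

```ruby
# Hypothetical helper: builds a GROUP BY list naming every column of a table.
def group_by_all_columns(table, column_names)
  column_names.map { |column| "#{table}.#{column}" }.join(", ")
end

# The column names here are made up; in a Rails model they could come
# from Spot.column_names instead of a literal array.
group_clause = group_by_all_columns("spots", %w[id thing_id user_id created_at])
puts group_clause
```

The resulting string could then be passed as the :group option. On 9.1 or later, grouping by the primary key alone should be enough, as the release note says.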
Question 2:
The following statement ... produces the error
PG::Error: ERROR: missing FROM-clause entry for table "users"
How can I solve this?
First (as always!), I formatted your query to make it easier to understand. The culprit is users.id in the GROUP BY clause, because the table users never appears in the FROM clause:
SELECT *
FROM (
SELECT DISTINCT ON (spots.id)
spots.id, spots.created_at AS alias_0
FROM spots
LEFT JOIN things ON things.id = spots.thing_id
WHERE (spots.recommended_to_user_id = 1 OR
spots.user_id IN (1) OR
things.is_featured = 't')
GROUP BY thing_id, things.id, users.id, spots.id
) id_list
ORDER BY id_list.alias_0 DESC
LIMIT 16
OFFSET 0;
It's all obvious now, right?
Well, not all of it. There is a lot more. For one, DISTINCT ON and GROUP BY appear in the same query, which has its uses, but not here. Radically simplify to:
SELECT s.id, s.created_at AS alias_0
FROM spots s
WHERE s.recommended_to_user_id = 1 OR
s.user_id = 1 OR
EXISTS (
SELECT 1 FROM things t
WHERE t.id = s.thing_id
AND t.is_featured = 't')
ORDER BY s.created_at DESC
LIMIT 16;
The EXISTS semi-join avoids the later need to GROUP BY a priori. This should be much faster (besides being correct) - if my assumptions about the missing table definitions hold.
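The semi-join idea can be sketched in plain Ruby: each spot is tested once against a yes/no condition, so no duplicate rows appear that would need to be grouped away afterwards. The Struct names and sample rows are assumptions for illustration, and created_at is a plain integer here to keep the sketch short.

```ruby
# Hypothetical in-memory stand-ins for the spots and things tables.
SpotRow  = Struct.new(:id, :user_id, :recommended_to_user_id, :thing_id, :created_at)
ThingRow = Struct.new(:id, :is_featured)

things = [ThingRow.new(10, true), ThingRow.new(11, false)]
spots = [
  SpotRow.new(1, 1, nil, 11, 100),
  SpotRow.new(2, 2, nil, 10, 300),
  SpotRow.new(3, 2, nil, 11, 200),
  SpotRow.new(4, 2, 1,   11, 400)
]

# WHERE ... OR ... OR EXISTS (...): any? plays the role of EXISTS and
# only answers true or false, so each spot is kept at most once.
visible = spots.select do |s|
  s.recommended_to_user_id == 1 ||
    s.user_id == 1 ||
    things.any? { |t| t.id == s.thing_id && t.is_featured }
end

# ORDER BY created_at DESC LIMIT 16
latest = visible.sort_by { |s| -s.created_at }.first(16)
puts latest.map(&:id).inspect
```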
Going the "pure SQL" route opened up a can of worms for me, so I tried keeping the will_paginate gem and tweaking the Spot.paginate parameters instead. The :joins parameter turned out to be very helpful.
This is currently working for me:
spots = Spot.paginate(:all, :include => [:thing, {:thing => :tags}, {:thing => :brand}], :joins => [:user, :store, :thing], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)

Advanced count and join in Rails

I am trying to find the top n categories by number of related articles; there is a habtm relationship set up between the two. This is the SQL I want to execute, but I am unsure how to do this with ActiveRecord, aside from using the find_by_sql method. Is there any way of doing this with ActiveRecord methods?
SELECT
"categories".id,
"categories".name,
count("articles".id) as counter
FROM "categories"
JOIN "articles_categories"
ON "articles_categories".category_id = "categories".id
JOIN "articles"
ON "articles".id = "articles_categories".article_id
GROUP BY "categories".id
ORDER BY counter DESC
LIMIT 5;
You can use find with options to achieve the same query:
Category.find(:all,
:select => '"categories".id, "categories".name, count("articles".id) as counter',
:joins => :articles,
:group => '"categories".id',
:order => 'counter DESC',
:limit => 5
)
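For checking what the query should return, the same count, order, and limit can be reproduced on a small data set in plain Ruby. The category names and join rows below are made up for illustration, and the tie-break on category id is an assumption the SQL above does not specify.

```ruby
# Hypothetical sample data: category id => name, and rows of the
# articles_categories join table as [article_id, category_id].
categories = { 1 => "ruby", 2 => "sql", 3 => "rails" }
links = [[100, 1], [101, 1], [102, 1], [100, 2], [101, 3], [102, 3]]

# count(articles.id) ... GROUP BY categories.id
counts = Hash.new(0)
links.each { |_article_id, category_id| counts[category_id] += 1 }

# ORDER BY counter DESC LIMIT 2 (ties broken by category id, an assumption)
top = counts.sort_by { |category_id, counter| [-counter, category_id] }
            .first(2)
            .map { |category_id, counter| [categories[category_id], counter] }

puts top.inspect
```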

Sorting ActiveRecord models by sub-model attributes?

Let's assume I have a model Category which has_many Items. Now, I'd like to present a table of Categories sorted on various attributes of Items. For example, have the category with the highest priced item at the top. Or sort the categories based on their best rated item. Or sort the categories based on the most recent item (i.e., the category with the most recent item would be first).
class Category < ActiveRecord::Base
has_many :items
# attributes: name
end
class Item < ActiveRecord::Base
belongs_to :category
# attributes: price, rating, date,
end
Which is the best approach?
Maintain additional columns on the Category model to hold the attributes for sorting (i.e., the highest item price or the best rating of an item in that category). I've done this before but it's kinda ugly and requires updating the category model each time an Item changes
Some magic SQL incantation in the order clause?
Something else?
The best I can come up with is this SQL, for producing a list of Category sorted by the max price of the contained Items.
select categories.name, max(items.price) from categories join items group by categories.name
Not sure how this translates into Rails code though. This SQL also doesn't work if I wanted the Categories sorted by the price of the most recent item. I'm really trying to keep this in the database for obvious performance reasons.
Assuming the attributes listed in the items model are database columns there are many things you could do.
The easiest is probably named_scopes
/app/models/category.rb
class Category < ActiveRecord::Base
has_many :items
# attributes: name
named_scope :sorted_by_price, :joins => :items, :group => 'categories.id', :order => "MAX(items.price) DESC"
named_scope :sorted_by_rating, :joins => :items, :group => 'categories.id', :order => "MAX(items.rating) DESC"
named_scope :active, :conditions => {:active => true}
end
Then you could just use Category.sorted_by_price to return a list of categories sorted by price, highest to lowest. The advantage of named_scopes is that they let you chain multiple similar queries. Using the code above, if your Category had a boolean column named active, you could use
Category.active.sorted_by_price to get a list of active categories ordered by their most expensive item.
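The two orderings asked about in the question (highest-priced item, and price of the most recent item) differ in what gets picked per category before sorting. Here is a minimal in-memory sketch in plain Ruby. The Struct names and sample rows are assumptions for illustration, and dates are ISO strings so they compare correctly as text.

```ruby
# Hypothetical in-memory stand-ins for the categories and items tables.
CategoryRow = Struct.new(:id, :name)
ItemRow     = Struct.new(:category_id, :price, :date)

categories = [CategoryRow.new(1, "books"), CategoryRow.new(2, "games")]
items = [
  ItemRow.new(1, 30, "2010-01-01"),
  ItemRow.new(1, 5,  "2010-03-01"),
  ItemRow.new(2, 20, "2010-02-01")
]

items_by_category = items.group_by(&:category_id)

# Highest-priced item first: one MAX(price) per category, then sort.
by_max_price = categories.sort_by do |c|
  -items_by_category.fetch(c.id, []).map(&:price).max.to_i
end

# Price of the most recent item: pick the newest row per category first,
# then sort by that row's price. A single MAX() cannot express this.
by_latest_price = categories.sort_by do |c|
  newest = items_by_category.fetch(c.id, []).max_by(&:date)
  newest ? -newest.price : 0
end

puts by_max_price.map(&:name).inspect
puts by_latest_price.map(&:name).inspect
```

With this data the two orderings disagree, which is why the "most recent item" case needs a per-category subquery in SQL rather than a plain aggregate.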
Isn't that exactly what :joins is for?
Category.find(:all, :joins => :items, :order => 'price')
Category.find(:all, :joins => :items, :order => 'rating')

named_scope to order posts by last comment date

Posts has_many Comments
I'm using searchlogic which will order by named scopes. So, I'd like a named scope that orders by each post's most recent comment.
named_scope :ascend_by_comment, :order => ...comments.created_at??...
I'm not sure how to do a :joins and get only the most recent comment and sort by its created_at field, all in a named_scope.
I'm using mysql, fyi.
EDIT:
This is the SQL query I'd be trying to emulate:
SELECT tickets.*, comments.created_at AS comment_created_at FROM tickets
INNER JOIN
(SELECT comments.ticket_id, MAX(comments.created_at) AS created_at
FROM comments group by ticket_id) comments
ON tickets.id = comments.ticket_id ORDER BY comment_created_at DESC;
named_scope :ascend_by_comment,
:joins => "LEFT JOIN comments ON comments.post_id = posts.id",
:group => "posts.id",
:select => "posts.*, max(comments.created_at) AS comment_created_max",
:order => "comment_created_max ASC"
You can try to optimize it, but it should work and give you some hints how to do it.
Edit:
Since your edit shows that you want an inner join (no posts without comments?), you can of course replace :joins => "..." with :joins => :comments.
You can do that by joining or including the associated model through the scope, something like this will do the trick:
named_scope :ascend_by_comment, :joins => :comments, :order => "comments.created_at DESC"
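One thing worth checking with a plain join is that it yields one row per comment, so a post with several comments shows up several times unless the rows are grouped. The sketch below shows the difference on made-up data in plain Ruby, with created_at as a plain integer for brevity.

```ruby
# Hypothetical comment rows as [post_id, created_at].
comments = [[1, 10], [1, 50], [2, 30]]

# Plain join ordered by comments.created_at DESC: one row per comment.
joined_post_ids = comments.sort_by { |_post_id, created_at| -created_at }
                          .map(&:first)

# Grouped by post with MAX(created_at): one row per post.
grouped_post_ids = comments.group_by(&:first)
                           .map { |post_id, rows| [post_id, rows.map(&:last).max] }
                           .sort_by { |_post_id, latest| -latest }
                           .map(&:first)

puts joined_post_ids.inspect
puts grouped_post_ids.inspect
```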