SQL problems when migrating from MySQL to PostgreSQL

I have a Ruby on Rails 2.3.x application that I'm trying to migrate from my own VPS to Heroku, including porting from SQLite (development) and MySQL (production) to Postgres.
This is a typical Rails call I'm using:
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => :thing_id, :order => order, :conditions => conditions, :page => page, :per_page => per_page)
Question 1: I get a lot of errors like PG::Error: ERROR: column "spots.id" must appear in the GROUP BY clause or be used in an aggregate function. SQLite/MySQL was evidently more forgiving here. Of course I can easily fix these by adding the specified fields to my :group parameter, but I feel I'm messing up my code. Is there a better way?
Question 2: If I throw in all the GROUP BY columns that Postgres is missing I end up with the following statement (only :group has changed):
spots = Spot.paginate(:all, :include => [:thing, :user, :store, {:thing => :tags}, {:thing => :brand}], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)
This in turn produces the following SQL code:
SELECT * FROM (SELECT DISTINCT ON ("spots".id) "spots".id, spots.created_at AS alias_0 FROM "spots"
LEFT OUTER JOIN "things" ON "things".id = "spots".thing_id
WHERE (spots.recommended_to_user_id = 1 OR spots.user_id IN (1) OR things.is_featured = 't')
GROUP BY thing_id,things.id,users.id,spots.id) AS id_list
ORDER BY id_list.alias_0 DESC LIMIT 16 OFFSET 0;
...which produces the error PG::Error: ERROR: missing FROM-clause entry for table "users". How can I solve this?

Question 1:
...Is there a better way?
Yes. Since PostgreSQL 9.1 the primary key of a table logically covers all columns of a table in the GROUP BY clause. I quote the release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
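To see why PostgreSQL rejects the original :group => :thing_id but accepts grouping by the primary key, here is a small in-memory sketch in plain Ruby. No database is involved and the sample rows are made up for illustration. Grouping by thing_id can leave several rows in one group, so a bare spots.id is ambiguous; grouping by id leaves exactly one row per group, so every other column is determined.

```ruby
# Hypothetical sample rows standing in for the spots table.
spots = [
  { :id => 1, :thing_id => 10, :user_id => 1 },
  { :id => 2, :thing_id => 10, :user_id => 2 },
  { :id => 3, :thing_id => 20, :user_id => 1 }
]

# GROUP BY thing_id: a group may hold several rows, so "the" id of a group is undefined.
by_thing = spots.group_by { |row| row[:thing_id] }
ambiguous = by_thing.select { |_thing_id, rows| rows.size > 1 }.keys

# GROUP BY id (the primary key): every group holds exactly one row.
by_id = spots.group_by { |row| row[:id] }
unambiguous = by_id.values.all? { |rows| rows.size == 1 }

puts "thing_ids with more than one spot: #{ambiguous.inspect}"
puts "one row per primary key: #{unambiguous}"
```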
Question 2:
The following statement ... produces the error
PG::Error: ERROR: missing FROM-clause entry for table "users"
How can I solve this?
First (as always!), I formatted your query to make it easier to understand. The culprit is users.id in the GROUP BY clause:
SELECT *
FROM (
SELECT DISTINCT ON (spots.id)
spots.id, spots.created_at AS alias_0
FROM spots
LEFT JOIN things ON things.id = spots.thing_id
WHERE (spots.recommended_to_user_id = 1 OR
spots.user_id IN (1) OR
things.is_featured = 't')
GROUP BY thing_id, things.id, users.id, spots.id
) id_list
ORDER BY id_list.alias_0 DESC
LIMIT 16
OFFSET 0;
It's all obvious now, right? The GROUP BY clause references the table users, but users is never joined in the FROM clause.
Well, not all of it. There is a lot more. DISTINCT ON and GROUP BY in the same query for one, which has its uses, but not here. Radically simplify to:
SELECT s.id, s.created_at AS alias_0
FROM spots s
WHERE s.recommended_to_user_id = 1 OR
s.user_id = 1 OR
EXISTS (
SELECT 1 FROM things t
WHERE t.id = s.thing_id
AND t.is_featured = 't')
ORDER BY s.created_at DESC
LIMIT 16;
The EXISTS semi-join avoids the later need to GROUP BY a priori. This should be much faster (besides being correct) - if my assumptions about the missing table definitions hold.
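As a rough illustration of what the semi-join does, the following plain-Ruby sketch applies the same three conditions to in-memory rows. The data is invented for the example and the table definitions are assumptions. The existence check only filters spots and never adds rows, which is why no GROUP BY or DISTINCT is needed afterwards.

```ruby
# Hypothetical sample data; column names follow the query above.
spots = [
  { :id => 1, :thing_id => 10, :user_id => 1, :recommended_to_user_id => nil },
  { :id => 2, :thing_id => 20, :user_id => 2, :recommended_to_user_id => 1 },
  { :id => 3, :thing_id => 30, :user_id => 2, :recommended_to_user_id => nil },
  { :id => 4, :thing_id => 40, :user_id => 3, :recommended_to_user_id => nil }
]
things = [
  { :id => 10, :is_featured => false },
  { :id => 20, :is_featured => false },
  { :id => 30, :is_featured => true },
  { :id => 40, :is_featured => false }
]

user_id = 1

# Same shape as the WHERE clause: two plain conditions plus an existence check.
# any? stops at the first match and never multiplies rows, much like EXISTS.
matching = spots.select do |s|
  s[:recommended_to_user_id] == user_id ||
    s[:user_id] == user_id ||
    things.any? { |t| t[:id] == s[:thing_id] && t[:is_featured] }
end

matching_ids = matching.map { |s| s[:id] }
puts matching_ids.inspect
```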

Going the "pure SQL" route opened up a can of worms for me, so I tried keeping the will_paginate gem and tweaking the Spot.paginate parameters instead. The :joins parameter turned out to be very helpful.
This is currently working for me:
spots = Spot.paginate(:all, :include => [:thing, {:thing => :tags}, {:thing => :brand}], :joins => [:user, :store, :thing], :group => 'thing_id,things.id,users.id,spots.id', :order => order, :conditions => conditions, :page => page, :per_page => per_page)

Related

Which of these two AR queries is more efficient? (i.e., is it better to outsource some query work to Ruby?)

@users = Hash.new
@users[:count] = User.count(:all, :joins => my_join, :conditions => my_conditions)
@users[:data] = User.find(:all, :joins => my_join, :conditions => my_conditions)
or
@users = Hash.new
@users[:data] = User.find(:all, :joins => my_join, :conditions => my_conditions)
@users[:count] = @users[:data].count
It seems like the first option consists of two database queries (which from what I read is expensive) while in the second one, we only make one database query and do the counting work at the Ruby level.
Which one is more efficient?
The second one is better, since, just like you said, it saves a database query.
P.S.
Be careful if you use the new finder methods introduced in Rails 3: calling count on their result fires a separate COUNT(*) query:
users = User.where(...) # SELECT "users".* FROM "users" WHERE ...
users_count = users.count # SELECT COUNT(*) FROM "users" WHERE ...
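To prevent that, call size after the records have been loaded. size reuses the loaded records and only falls back to a COUNT(*) query when the relation has not been loaded yet:
users = User.where(...).to_a # SELECT "users".* FROM "users" WHERE ... (records are loaded here)
users_count = users.size # No database query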
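The difference between count and size can be sketched with a toy class. This is not ActiveRecord: ToyRelation and its query log are hypothetical stand-ins that only mimic the behavior as far as I know it, namely that count always asks the database while size reuses records that are already loaded.

```ruby
# Toy stand-in for a lazy relation; NOT ActiveRecord. It only logs "queries".
class ToyRelation
  attr_reader :queries

  def initialize(rows)
    @rows = rows      # pretend these rows live in the database
    @records = nil    # nothing loaded yet
    @queries = []
  end

  # Always asks the "database" for a count.
  def count
    @queries << "SELECT COUNT(*)"
    @rows.size
  end

  # Loads the records once and caches them.
  def to_a
    if @records.nil?
      @queries << "SELECT *"
      @records = @rows.dup
    end
    @records
  end

  # Uses the cached records when present, otherwise falls back to count.
  def size
    @records ? @records.size : count
  end
end

users = ToyRelation.new(%w[ann bob cy])
users.to_a    # one SELECT *
users.size    # served from the cached records, no query logged
users.count   # logs a COUNT(*) even though the records are loaded
puts users.queries.inspect
```

Calling size on a fresh ToyRelation that has not been loaded logs a COUNT(*) instead, which is the case to watch out for.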

Sequence of finder methods in Active Record Query in Rails

I am writing a complex Active Record query to fetch data from multiple tables. The query has joins, select, order, group and where clauses.
@posts = Post.published.paginate(:order => 'popularity desc, id',
:joins => [:comments, :images, :updates, :user],
:conditions => conditions,
:group => "posts.id",
:select => "posts.*",
:per_page => 10,
:page => params[:page])
I want to know what the order of where, joins, etc. should be, both by convention and to maximize the performance of the query. It would be great if someone could write a query that shows the order, like:
@posts = Post.published.joins(:comments, :images, :updates, :user).where(....
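As far as I know, the sequence does not matter: the chained calls only build up a single SQL statement, and each clause ends up in its usual place regardless of the order in which you call the methods.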
An example would be:
@posts = Post.published.select('posts.id').
joins(:comments, :images, :updates, :user).
where('users.email = ?', 'john@doe.com').
group('posts.id')
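Why the call order should not matter can be sketched with a toy query builder. ToyQuery is a hypothetical class written for this example, not ActiveRecord's implementation: each call only records a clause, and to_sql emits the clauses in SQL's fixed order.

```ruby
# Toy query builder; NOT ActiveRecord. Each call returns a new object
# with one more clause recorded.
class ToyQuery
  def initialize(table, parts = {})
    @table = table
    @parts = parts
  end

  def select(cols)
    with(:select, cols)
  end

  def joins(sql)
    with(:joins, sql)
  end

  def where(cond)
    with(:where, cond)
  end

  def group(cols)
    with(:group, cols)
  end

  # Clauses are emitted in SQL's fixed order, regardless of call order.
  def to_sql
    sql = "SELECT #{@parts[:select] || '*'} FROM #{@table}"
    sql += " #{@parts[:joins]}" if @parts[:joins]
    sql += " WHERE #{@parts[:where]}" if @parts[:where]
    sql += " GROUP BY #{@parts[:group]}" if @parts[:group]
    sql
  end

  private

  def with(key, value)
    ToyQuery.new(@table, @parts.merge(key => value))
  end
end

join_sql = "INNER JOIN users ON users.id = posts.user_id"
cond = "users.email = 'john@doe.com'"

a = ToyQuery.new("posts").select("posts.id").joins(join_sql).where(cond).group("posts.id")
b = ToyQuery.new("posts").group("posts.id").where(cond).joins(join_sql).select("posts.id")

puts a.to_sql
puts a.to_sql == b.to_sql
```

Performance is then decided by the database's query planner working on the final statement, not by the order of the Ruby calls.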

Advanced count and join in Rails

I am trying to find the top n categories by number of related articles; there is a habtm relationship set up between the two. This is the SQL I want to execute, but I am unsure how to do this with ActiveRecord, aside from using the find_by_sql method. Is there any way of doing this with ActiveRecord methods?
SELECT
"categories".id,
"categories".name,
count("articles".id) as counter
FROM "categories"
JOIN "articles_categories"
ON "articles_categories".category_id = "categories".id
JOIN "articles"
ON "articles".id = "articles_categories".article_id
GROUP BY "categories".id
ORDER BY counter DESC
LIMIT 5;
You can use find with options to achieve the same query:
Category.find(:all,
:select => '"categories".id, "categories".name, count("articles".id) as counter',
:joins => :articles,
:group => '"categories".id',
:order => 'counter DESC',
:limit => 5
)
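To check what the query is expected to return, the same count, order and limit can be worked through on in-memory rows in plain Ruby. The sample data is made up, and the tie-break on category id is an assumption added here so the output is deterministic; the SQL above has no tie-break.

```ruby
# Hypothetical sample data: category names and rows of the join table.
categories = { 1 => "ruby", 2 => "sql", 3 => "rails" }
articles_categories = [
  { :article_id => 1, :category_id => 1 },
  { :article_id => 2, :category_id => 1 },
  { :article_id => 3, :category_id => 2 },
  { :article_id => 1, :category_id => 3 },
  { :article_id => 2, :category_id => 3 },
  { :article_id => 3, :category_id => 3 }
]

# count(articles.id) ... GROUP BY categories.id
counts = Hash.new(0)
articles_categories.each { |row| counts[row[:category_id]] += 1 }

# ORDER BY counter DESC LIMIT 5 (ties broken by id, an assumption for this sketch)
top = counts.sort_by { |category_id, counter| [-counter, category_id] }.first(5)
top = top.map do |category_id, counter|
  { :id => category_id, :name => categories[category_id], :counter => counter }
end

top.each { |row| puts "#{row[:name]}: #{row[:counter]}" }
```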

Can I :select multiple fields (*, foo) without the extra ones being added to my instances (Instance.foo=>bar)

I'm trying to write a named scope that will order my 'Products' class based on the average 'Review' value. The basic model looks like this
class Product < ActiveRecord::Base
  has_many :reviews
end
class Review < ActiveRecord::Base
  belongs_to :product
  # has an integer column named value
end
I've defined the following named scope on Product:
named_scope :best_reviews,
:select => "*, AVG(reviews.value) score",
:joins => "INNER JOIN (SELECT * FROM reviews GROUP BY reviews.product_id) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "score desc"
This seems to be working properly, except that it adds the 'score' value from the select to my Product instances, which causes problems if I try to save them, and makes comparisons return false (@BestProduct != Product.best_reviews.first, because Product.best_reviews.first has score=whatever).
Is there a better way to structure the named_scope? Or a way to make Rails ignore the extra field in the select?
I'm not a Rails developer, but I know SQL allows you to sort by a field that is not in the select-list.
Can you do this:
:select => "*",
:joins => "INNER JOIN (SELECT * FROM reviews GROUP BY reviews.product_id) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "AVG(reviews.value) desc"
Wow, so I should really wait before asking questions. Here's one solution (I'd love to hear if there are better approaches):
I moved the score field into the inner join. That makes it available for ordering but doesn't seem to add it to the instance:
named_scope :best_reviews,
:joins => "INNER JOIN (
SELECT *, AVG(value) score FROM reviews GROUP BY reviews.product_id
) reviews ON reviews.product_id = products.id",
:group => "reviews.product_id",
:order => "reviews.score desc"
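The idea of computing the score on the side and using it only for ordering can be sketched in plain Ruby on in-memory rows. The sample products and reviews are invented for the example; the point is that the product records come back sorted without gaining a score attribute.

```ruby
# Hypothetical sample data.
products = [
  { :id => 1, :name => "kettle" },
  { :id => 2, :name => "toaster" },
  { :id => 3, :name => "mug" }
]
reviews = [
  { :product_id => 1, :value => 3 },
  { :product_id => 1, :value => 4 },
  { :product_id => 2, :value => 5 },
  { :product_id => 3, :value => 1 }
]

# AVG(value) ... GROUP BY product_id, kept in a separate lookup table.
scores = {}
reviews.group_by { |r| r[:product_id] }.each do |product_id, rows|
  scores[product_id] = rows.map { |r| r[:value] }.inject(:+).to_f / rows.size
end

# INNER JOIN drops products without reviews; ORDER BY score DESC.
best = products.select { |product| scores.key?(product[:id]) }
best = best.sort_by { |product| -scores[product[:id]] }

puts best.map { |product| product[:name] }.inspect
```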

named_scope to order posts by last comment date

Posts has_many Comments
I'm using searchlogic which will order by named scopes. So, I'd like a named scope that orders by each post's most recent comment.
named_scope :ascend_by_comment, :order => ...comments.created_at??...
I'm not sure how to do a :joins and get only the most recent comment and sort by its created_at field, all in a named_scope.
I'm using mysql, fyi.
EDIT:
This is the SQL query I'd be trying to emulate:
SELECT tickets.*, comments.created_at AS comment_created_at FROM tickets
INNER JOIN
(SELECT comments.ticket_id, MAX(comments.created_at) AS created_at
FROM comments group by ticket_id) comments
ON tickets.id = comments.ticket_id ORDER BY comment_created_at DESC;
named_scope :ascend_by_comment,
:joins => "LEFT JOIN comments ON comments.post_id = posts.id",
:group => "posts.id",
:select => "posts.*, max(comments.created_at) AS comment_created_max",
:order => "comment_created_max ASC"
You can try to optimize it, but it should work and give you some hints how to do it.
Edit:
Now that you have edited the question and shown that you want an inner join (no posts without comments?), you can of course replace :joins => "..." with :joins => :comments.
You can do that by joining or including the associated model through the scope. Something like this will do the trick:
named_scope :ascend_by_comment, :joins => :comments, :order => "comments.created_at DESC"
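The two approaches in this thread differ in how many rows they return, which a plain-Ruby sketch on made-up data can show. A plain join ordered by comment date yields one row per comment, so a post with several comments appears several times; grouping and taking the latest comment date yields one row per post. Integers stand in for timestamps here.

```ruby
# Hypothetical sample data; created_at uses integers in place of timestamps.
posts = [{ :id => 1, :title => "first" }, { :id => 2, :title => "second" }]
comments = [
  { :post_id => 1, :created_at => 1 },
  { :post_id => 2, :created_at => 2 },
  { :post_id => 1, :created_at => 3 }
]

# Plain join ordered by comment date: one row per comment, so posts repeat.
joined = comments.sort_by { |c| -c[:created_at] }.map { |c| c[:post_id] }

# MAX(created_at) ... GROUP BY post_id: one row per post.
latest = {}
comments.each do |c|
  current = latest[c[:post_id]]
  latest[c[:post_id]] = c[:created_at] if current.nil? || c[:created_at] > current
end
ordered = posts.select { |post| latest.key?(post[:id]) }
ordered = ordered.sort_by { |post| -latest[post[:id]] }.map { |post| post[:id] }

puts "plain join:   #{joined.inspect}"
puts "grouped max:  #{ordered.inspect}"
```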