How best to work around PostgreSQL's stricter grouping

I am trying to migrate from MySQL to PostgreSQL. I have this type of structure:
User(entity) -> Follow -> Business(entity) -> Story
The user needs to see all the news updates put out by the businesses they follow. The following query works well with MySQL because it simply returns all associated stories and groups by stories.id. Unfortunately, because PostgreSQL interprets the SQL standard much more literally, if I want to use the GROUP BY clause I have to ask for each field individually, either in the GROUP BY clause or together with a DISTINCT clause.
Story.find(:all, :joins => { :entity => { :followers => :follower } }, :conditions => ['followers_follows.id = ?', 4],
:group => 'stories.id')
PostgreSQL spits out: "ERROR: column "stories.entity_id" must appear in the GROUP BY clause or be used in an aggregate function"
Having to specify each field individually seems inelegant. If anybody can give me a clean way to get the same result as MySQL without resorting to getting duplicate rows (removing the GROUP BY) or specifying each individual field along with the DISTINCT clause, I'd appreciate it!
Thanks!

Well, I certainly wouldn't say that PostgreSQL's interpretation of the SQL standard is too strict. In fact, it's the other way around: MySQL's is too lax.
Here's a possible solution:
Story.all( :joins => { :entity => { :followers=> :follower } },
:conditions => ['followers_follows.id = ?', 4],
:group => Story.column_names.map { |c| "stories.#{c}" }.join(', ') )
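The :group value above is just every column of the stories table, qualified with the table name and joined by commas. A minimal plain-Ruby sketch of that string-building step, assuming a hard-coded column list in place of Story.column_names (which needs a database connection); group_by_all and grouped_select are hypothetical helpers:

```ruby
# Builds "table.col1, table.col2, ..." so that every selected column also
# appears in GROUP BY, which is what PostgreSQL insists on here.
# Hypothetical helper: in the app the list would come from Story.column_names.
def group_by_all(table, columns)
  columns.map { |c| "#{table}.#{c}" }.join(", ")
end

# Assembles a full statement to show where the list ends up.
def grouped_select(table, columns)
  cols = group_by_all(table, columns)
  "SELECT #{cols} FROM #{table} GROUP BY #{cols}"
end

columns = %w[id entity_id title body] # assumed columns, for illustration only
puts grouped_select("stories", columns)
```

Worth knowing: since PostgreSQL 9.1, grouping by the table's primary key alone is enough to select that table's other columns, because they are functionally dependent on it.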
But there are many alternative queries. I blogged about this a few weeks ago. Here's the post: http://awesomeful.net/posts/72-postgresql-s-group-by

Related

Ruby Rails Query In conditional

I have this query that works, but I would like to expand it so that I can check for multiple ids by passing in an array of ids, e.g. [1,2,3,5]. I have tried using SQL IN with no luck.
EventType.find(3).events.all(:include => {:sheet => :rink}, :conditions => ["rinks.id = ?", 2])
You were on the right track with IN. Here's syntax that will work in Rails 3+:
EventType.find(3).events.where("id IN (?)", [1,2,3]).includes(:sheet => :rink)
Improvement from a comment removes SQL entirely:
EventType.find(3).events.where(:id => [1,2,3]).includes(:sheet => :rink)
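Passing an array for a `?` placeholder works because ActiveRecord expands it into a comma-separated list of quoted values, and substitutes NULL when the array is empty. The sketch below imitates that expansion in plain Ruby; expand_in and its simple quoting rule are hypothetical stand-ins, not ActiveRecord's implementation, and real code should keep using placeholders rather than building SQL by hand.

```ruby
# Hypothetical quoting rule: integers as-is, everything else
# single-quoted with embedded single quotes doubled.
def quote_value(v)
  v.is_a?(Integer) ? v.to_s : "'#{v.to_s.gsub("'", "''")}'"
end

# Replaces the first "?" with the quoted list. An empty array becomes
# NULL, so the SQL stays valid ("IN ()" would be a syntax error).
def expand_in(sql, values)
  list = values.empty? ? "NULL" : values.map { |v| quote_value(v) }.join(",")
  sql.sub("?") { list } # block form: no backslash interpretation in the replacement
end

puts expand_in("id IN (?)", [1, 2, 3, 5])
```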

Rails ActiveRecord Find with Date

I'm having a hard time figuring this out: how do I tell my finder statement to ignore the time portion of the datetime field in the database?
def trips_leaving_in_two_weeks
Trip.find(:all, :conditions => ["depart_date = ?", 2.weeks.from_now.to_date])
end
I want depart_date to come back as just a date, but it keeps returning the time as well, which makes this equality fail. Is there some way to compare only the dates? Thanks
Edit
Here's the code I'm using now that works:
Trip.find(:all, :conditions => ["DATE(depart_date) = ?", 2.weeks.from_now.to_date])
Not sure which DB you're using, but does this work?
"depart_date = DATE(?)"
I would use this approach:
Rails 3.x
Trip.where(
:depart_date => 2.weeks.from_now.beginning_of_day..2.weeks.from_now.end_of_day
)
Rails 2.x
Trip.all(
:conditions => {
:depart_date => 2.weeks.from_now.beginning_of_day..2.weeks.from_now.end_of_day
})
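If you index the depart_date column, this solution will be efficient because the query can use the index. This solution is also DB-neutral.
When a calculated expression such as DATE(depart_date) is used in a WHERE clause, performance degrades, because an ordinary index on the column cannot be used (unless there is a special index, such as an expression index on that calculation).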
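The range-based condition compares the raw column against two constant timestamps, which is what lets an index on depart_date do the work. The plain-Ruby sketch below computes the same kind of day boundaries without ActiveSupport and filters an in-memory list; day_bounds and departing_on are hypothetical names, and it uses a half-open interval [midnight, next midnight) in local time instead of beginning_of_day..end_of_day.

```ruby
require "date"

# Half-open interval [midnight, next midnight) in local time for a Date.
# Using the next midnight as an exclusive bound avoids guessing the last
# representable instant of the day.
def day_bounds(date)
  nxt = date.next_day
  [Time.new(date.year, date.month, date.day), Time.new(nxt.year, nxt.month, nxt.day)]
end

# Keeps the timestamps that fall on the given calendar day. The test is
# on the raw value, like the SQL
#   depart_date >= :start AND depart_date < :finish
# which an ordinary index on depart_date can serve.
def departing_on(times, date)
  start, finish = day_bounds(date)
  times.select { |t| t >= start && t < finish }
end

target = Date.today + 14 # "two weeks from now", as in the question
p day_bounds(target)
```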

How can I find records whose :name do NOT equal an array provided by another query?

Currently my users can add locations to their profiles via a form which includes this statement: (I'm using RoR3, HAML, sqlite3 for dev, and mysql for prod)
= select_tag "id", options_from_collection_for_select(Location.all, 'id', 'name')
However, this allows the user to add the same location multiple times. I would like to list only the locations which the user has NOT already posted. So I would like to do something like:
Location.find(:all, :conditions => ["name != ?", user.locations])
This, of course, does not work, whereas this does:
Location.find(:all, :conditions => ["name != ?", "New York"])
That's because user.locations returns an array. I haven't the slightest idea how to proceed at this point, other than learning SQL, I suppose. Is there a method for this that I'm not finding?
Something like:
Location.find(:all, :conditions => ["name NOT IN (?)", user.locations.map(&:name)])
should do it (although it is admittedly less efficient than an outer join that filters on NULL user_ids). The array has to contain names rather than Location objects, hence the map(&:name); adjust it to whatever your user.locations actually returns.
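One more pitfall with NOT IN: if the user has no locations yet, the placeholder expands to NULL, and `name NOT IN (NULL)` evaluates to NULL for every row, so nothing is returned. A small guard, sketched with a hypothetical helper that returns a :conditions-style array, plus the equivalent in-memory set difference:

```ruby
# Hypothetical helper returning a :conditions-style array.
# With an empty list, "NOT IN (NULL)" would match no rows at all,
# so fall back to a condition that is always true.
def not_in_condition(column, values)
  return ["1 = 1"] if values.empty?
  ["#{column} NOT IN (?)", values]
end

# In-memory equivalent of the query: every location name minus the
# names the user already has.
def available_names(all_names, user_names)
  all_names - user_names
end
```

In the app it would be used as `Location.find(:all, :conditions => not_in_condition("name", user.locations.map(&:name)))`.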
As a side note, learning SQL will make you a much more capable (and marketable) web-developer ... food for thought.

In Rails, how can I return a set of records based on a count of items in a relation OR criteria about the relation?

I'm writing a Rails app in which I have two models: a Machine model and a MachineUpdate model. The Machine model has many MachineUpdates. The MachineUpdate has a date/time field. I'm trying to retrieve all Machine records that have the following criteria:
The Machine model has not had a MachineUpdate within the last 2 weeks, OR
The Machine model has never had any updates.
Currently, I'm accomplishing #1 with a named scope:
named_scope :needs_updates,
:include => :machine_updates,
:conditions => ['machine_updates.date < ?', UPDATE_THRESHOLD.days.ago]
However, this only gets Machine models that have had at least one update. I also want to retrieve Machine models that have had no updates. How can I modify needs_updates so that the items it returns fulfill that criterion as well?
One solution is to introduce a counter cache:
# add a machine_updates_count integer column to machines (with default 0),
# then declare the counter cache on the belongs_to side, in the MachineUpdate model:
belongs_to :machine, :counter_cache => true
and then add OR machine_updates_count = 0 to your SQL conditions.
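A counter cache just keeps a per-machine count that goes up when a child row is created and down when one is destroyed, so "never updated" becomes a check on one column instead of a join. The in-memory sketch below illustrates that bookkeeping only; CountedMachine, record_update and remove_update are hypothetical, and in Rails the increments come from the belongs_to declaration.

```ruby
# Hypothetical in-memory model; machine_updates_count mirrors the
# column a counter cache maintains.
class CountedMachine
  attr_reader :id, :machine_updates_count

  def initialize(id)
    @id = id
    @machine_updates_count = 0 # column default 0
  end

  # What the counter cache does when a MachineUpdate is created...
  def record_update
    @machine_updates_count += 1
  end

  # ...and when one is destroyed.
  def remove_update
    @machine_updates_count -= 1
  end
end

# Equivalent of the extra "OR machine_updates_count = 0" condition.
def never_updated(machines)
  machines.select { |m| m.machine_updates_count.zero? }
end
```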
However, you can also solve the problem without a counter cache by using a LEFT JOIN:
named_scope :needs_updates, lambda { {
  :select => "machines.*, MAX(machine_updates.date) as last_update",
  :joins => "LEFT JOIN machine_updates ON machine_updates.machine_id = machines.id",
  :group => "machines.id",
  :having => ["last_update IS NULL OR last_update < ?", UPDATE_THRESHOLD.days.ago]
} }
The LEFT JOIN is necessary so that machines with no updates at all are still returned (their last_update is NULL), and MAX(machine_updates.date) ensures you are comparing against the most recent MachineUpdate.
Note also that you have to put your condition in a lambda so it is evaluated every time the query is run. Otherwise it will be evaluated only once (when your model is loaded on application boot-up), and you will not be able to find Machines that have come to need updates since your app started.
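What the LEFT JOIN + GROUP BY + HAVING combination computes can be shown in memory: take each machine's latest update date (the MAX), treat machines with no update rows as NULL, and keep those whose latest update is missing or older than the threshold. A plain-Ruby sketch of that logic, with hypothetical MachineRow/UpdateRow structs standing in for the two tables:

```ruby
require "date"

MachineRow = Struct.new(:id, :name)       # stands in for a machines row
UpdateRow  = Struct.new(:machine_id, :date) # stands in for a machine_updates row

# Latest update date per machine, like
# MAX(machine_updates.date) ... GROUP BY machines.id.
def last_updates(updates)
  updates.group_by(&:machine_id).transform_values { |rows| rows.map(&:date).max }
end

# Machines whose last update is nil (no matching row in the LEFT JOIN)
# or older than the threshold, like the HAVING clause.
def needs_updates(machines, updates, threshold)
  last = last_updates(updates)
  machines.select { |m| last[m.id].nil? || last[m.id] < threshold }
end
```

A machine that has one old update but a recent latest update is correctly left out; filtering individual update rows by date, without the MAX, would wrongly include it.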
UPDATE:
This solution works in MySQL and SQLite, but not PostgreSQL. Postgres does not allow naming of columns in the SELECT clause that are not used in the GROUP BY clause (see discussion). I'm very unfamiliar with PostgreSQL, but I did get this to work as expected:
named_scope :needs_updates, lambda{
cols = Machine.column_names.collect{ |c| "\"machines\".\"#{c}\"" }.join(",")
{
:select => cols,
:group => cols,
:joins => 'LEFT JOIN "machine_updates" ON "machine_updates"."machine_id" = "machines"."id"',
:having => ['MAX("machine_updates"."date") IS NULL OR MAX("machine_updates"."date") < ?', UPDATE_THRESHOLD.days.ago]
}
}
If you can make changes to the table, then you can use the :touch option of the belongs_to association.
For instance, add a datetime column named last_machine_update to Machine. Then, in the belongs_to of MachineUpdate, add :touch => :last_machine_update. This will update that column whenever you add or modify a MachineUpdate connected to that Machine, so the named scope only needs a simple condition on last_machine_update instead of a join.
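With :touch, saving a child record stamps a time on its parent, so the "needs update" check becomes a comparison on a single column of machines. An in-memory sketch of that idea, with hypothetical TouchedMachine/TouchingUpdate classes standing in for the models and an explicit timestamp argument in place of the current time:

```ruby
# Hypothetical parent with a last_machine_update attribute (nil until touched).
class TouchedMachine
  attr_accessor :id, :last_machine_update

  def initialize(id)
    @id = id
    @last_machine_update = nil
  end
end

# Hypothetical child: saving it stamps the parent's column, which is the
# effect :touch => :last_machine_update has on the real association.
class TouchingUpdate
  def initialize(machine)
    @machine = machine
  end

  def save(at)
    @machine.last_machine_update = at
    true
  end
end

# The scope's condition:
#   last_machine_update IS NULL OR last_machine_update < threshold
def needs_updates(machines, threshold)
  machines.select { |m| m.last_machine_update.nil? || m.last_machine_update < threshold }
end
```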
Otherwise I would probably do it like Alex proposes.
I just ran into a similar problem. It's actually pretty simple:
Machine.all(
  :include => :machine_updates,
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago])
If you were doing a named scope, just use a lambda to ensure that the date is recalculated every time the named scope is called:
named_scope :needs_updates, lambda { {
  :include => :machine_updates,
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago]
} }
If you want to avoid returning all of the MachineUpdate records in your query, replace :include with an explicit LEFT JOIN and use the :select option to return only the columns you want:
named_scope :needs_updates, lambda { {
  :select => "DISTINCT machines.*",
  :joins => "LEFT JOIN machine_updates ON machine_updates.machine_id = machines.id",
  :conditions => ["machine_updates.machine_id IS NULL OR machine_updates.date < ?", UPDATE_THRESHOLD.days.ago]
} }

My Rails queries are starting to get complicated, should I switch to raw SQL queries? What do you do?

My Rails app is starting to need complicated queries. Should I just start using raw SQL queries? What is the trend in the Rails community?
Update:
I haven't written the queries yet; I wanted to ask this question before I start. But here is an example of what I want to do:
I have books which have categories. I want to say:
Give me all books that were:
- created_at (added to store) between date1 and date2
- updated_at before date3
- joined with books that exist in shopping carts right now
I haven't written the query yet but I think the rails version will be something like this:
books_to_consider = Book.find(:all,
:conditions => "created_at <= '#{date2}' AND created_at >= '#{date1}' AND updated_at <= '#{date3}'",
:joins => "as b inner join carts as c on c.book_id = b.id")
I am not saying ActiveRecord can't handle this query, but is it more accepted to go with raw SQL for readability (or maybe there are other limitations I don't know of yet)?
The general idea is to stick to ActiveRecord-generated queries as much as possible, and use SQL fragments only where necessary. SQL fragments are explicitly supported because the creators of ActiveRecord realised that SQL cannot be completely abstracted away.
Using the find method without SQL fragments is generally rewarded with better maintainability. Given your example, try:
Book.find(:all,
  :conditions => ["created_at >= ? AND created_at <= ? AND updated_at <= ?",
                  date1, date2, date3],
  :include => :carts)
The :include => :carts will do the join if you added has_many :carts to your Book model. As you can see, there does not have to be much SQL involved. Even the quoting and escaping of input can be left to Rails, while still using SQL literals to handle the >= and <= operators.
Going a little bit further, you can make it even clearer:
class Book < ActiveRecord::Base
# Somewhere in your Book model:
named_scope :created_between, lambda { |start_date, end_date|
{ :conditions => { :created_at => start_date..end_date } }
}
named_scope :updated_before, lambda { |date|
{ :conditions => ["updated_at <= ?", date] }
}
# ...
end
Book.created_between(date1, date2).updated_before(date3).find(:all,
:include => :carts)
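Named scopes compose because each call returns a scope object that merges its conditions with the ones already accumulated, and the SQL is only built at the end. A toy version of that idea in plain Ruby; BookScope, its methods and its AND-joined output are hypothetical and far simpler than ActiveRecord's own scoping:

```ruby
# Hypothetical immutable scope: each call returns a new object with one
# more condition, so calls can be chained and the base scope is untouched.
class BookScope
  attr_reader :conditions, :binds

  def initialize(conditions = [], binds = [])
    @conditions = conditions
    @binds = binds
  end

  def created_between(start_date, end_date)
    where("created_at BETWEEN ? AND ?", start_date, end_date)
  end

  def updated_before(date)
    where("updated_at <= ?", date)
  end

  # Combined WHERE fragment followed by its bind values,
  # in the shape of a :conditions array.
  def to_condition
    [conditions.map { |c| "(#{c})" }.join(" AND "), *binds]
  end

  private

  def where(fragment, *values)
    BookScope.new(conditions + [fragment], binds + values)
  end
end
```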
Update: the point of the named_scopes is, of course, to reuse the conditions. It's up to you to decide whether it makes sense to put a set of conditions in a named scope.
As molf points out with :include, .find() has the advantage of eager loading associated records.
Also, there are several plugins, like pagination, that will wrap the find function. You'll have to use .find() to use the plugins.
If you have a really complex SQL query, remember that .find() uses your exact parameter string, so you can always inject your own SQL code:
:conditions => ["id in union (select * from table...
And don't forget that there are a lot of optional parameters for .find():
:conditions - An SQL fragment like "administrator = 1", [ "user_name = ?", username ], or ["user_name = :user_name", { :user_name => user_name }]. See conditions in the intro.
:order - An SQL fragment like "created_at DESC, name".
:group - An attribute name by which the result should be grouped. Uses the GROUP BY SQL-clause.
:having - Combined with :group this can be used to filter the records that a GROUP BY returns. Uses the HAVING SQL-clause.
:limit - An integer determining the limit on the number of rows that should be returned.
:offset - An integer determining the offset from where the rows should be fetched. So at 5, it would skip rows 0 through 4.
:joins - Either an SQL fragment for additional joins like "LEFT JOIN comments ON comments.post_id = id" (rarely needed), named associations in the same form used for the :include option, which will perform an INNER JOIN on the associated table(s), or an array containing a mixture of both strings and named associations. If the value is a string, then the records will be returned read-only since they will have attributes that do not correspond to the table's columns. Pass :readonly => false to override.
:include - Names associations that should be loaded alongside. The symbols named refer to already defined associations. See eager loading under Associations.
:select - By default, this is "*" as in "SELECT * FROM", but can be changed if you, for example, want to do a join but not include the joined columns. Takes a string with the SELECT SQL fragment (e.g. "id, name").
:from - By default, this is the table name of the class, but can be changed to an alternate table name (or even the name of a database view).
:readonly - Mark the returned records read-only so they cannot be saved or updated.
:lock - An SQL fragment like "FOR UPDATE" or "LOCK IN SHARE MODE". :lock => true gives connection's default exclusive lock, usually "FOR UPDATE".
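Each of these options ends up as one clause of a single SELECT statement, in a fixed order. The sketch below shows that mapping for a subset of the options; build_find_sql is a hypothetical helper that uses the fragments verbatim (it assumes they are already sanitized), hardcodes FOR UPDATE for :lock => true, and is not how ActiveRecord actually assembles queries.

```ruby
# Hypothetical: assemble a SELECT from find-style options, clause by clause.
# Only a subset of the options is handled; fragments are used verbatim.
def build_find_sql(table, options = {})
  sql = "SELECT #{options.fetch(:select, '*')} FROM #{options.fetch(:from, table)}"
  sql << " #{options[:joins]}" if options[:joins]
  sql << " WHERE #{options[:conditions]}" if options[:conditions]
  sql << " GROUP BY #{options[:group]}" if options[:group]
  sql << " HAVING #{options[:having]}" if options[:having]
  sql << " ORDER BY #{options[:order]}" if options[:order]
  sql << " LIMIT #{Integer(options[:limit])}" if options[:limit]
  sql << " OFFSET #{Integer(options[:offset])}" if options[:offset]
  # :lock => true stands for the default exclusive lock; a string is used as-is.
  sql << (options[:lock] == true ? " FOR UPDATE" : " #{options[:lock]}") if options[:lock]
  sql
end

puts build_find_sql("books", :select => "id, name", :order => "created_at DESC", :limit => 10)
```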
src: http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002553