How do I create this active_records query - sql

I have a user model and a story model. Users have many stories.
I want to create a scope that returns the twenty-five user records for the users who have created the most stories today, along with the number of stories that each of them has created.
I know that there are people on SO that are great with active_records queries. I also know that I am not one of those guys:-(. Help would be greatly appreciated and readily accepted!
#UPDATE with the solution
I've been working with @MrYoshiji's suggestion, and here is what I came up with (note, I'm using this query in my active_admin dashboard):
panel "Today's Top Posters" do
  time_range = Date.today.beginning_of_day..Date.today.end_of_day
  table_for User.joins(:stories)
                .select('users.username, count(stories.*) as story_count')
                .group('users.id')
                .where(:stories => {:created_at => time_range})
                .order('story_count DESC')
                .limit(25) do
    column :username
    column "story_count"
  end
end
And lo and behold, it works!!!!
Note, when I tried a simplified version of MrYoshiji's suggestion:
User.includes(:stories)
.select('users.username, count(stories.*) as story_count')
.order('story_count DESC') #with or without the group statement
.limit(25)
I got the following error:
> User.includes(:stories).select('users.username, count(stories.*) as story_count').order('story_count DESC').limit(25)
User Load (1.6ms) SELECT DISTINCT "users".id, story_count AS alias_0 FROM "users" LEFT OUTER JOIN "stories" ON "stories"."user_id" = "users"."id" ORDER BY story_count DESC LIMIT 25
ActiveRecord::StatementInvalid: PG::Error: ERROR: column "story_count" does not exist
LINE 1: SELECT DISTINCT "users".id, story_count AS alias_0 FROM "us...
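It seems like includes doesn't like aliasing: the generated SQL above no longer contains my custom select, so the story_count alias is never defined.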

I can't test this right now, running under Windows... Can you try it?
User.includes(:stories)
.select('users.*, count(stories.*) as story_count')
.group('users.id')
.order('story_count DESC')
.where('stories.created_at BETWEEN ? AND ?', Date.today.beginning_of_day, Date.today.end_of_day)
.limit(25)

Using .includes, as suggested by MrYoshiji, did not work due to aliasing problems (see the original question for more info on this). Instead, I used .joins as follows to get the query results that I wanted (note, this query is inside my active_admin dashboard):
panel "Today's Top Posters" do
  time_range = Date.today.beginning_of_day..Date.today.end_of_day
  table_for User.joins(:stories)
                .select('users.username, count(stories.*) as story_count')
                .group('users.id')
                .where(:stories => {:created_at => time_range})
                .order('story_count DESC')
                .limit(25) do
    column :username
    column "story_count"
  end
end
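As a way to sanity-check what the query above should return, the same "filter to today, group by user, count, order descending, limit" logic can be sketched in plain Ruby over in-memory data. This is only an illustration and not a replacement for the database query: the Story struct, the top_posters helper, and the sample data are all hypothetical names made up for this example.

```ruby
require 'date'

# Hypothetical in-memory stand-in for a stories row joined to its user.
Story = Struct.new(:username, :created_on)

# Counts the stories created on `day` per user and returns the top `limit`
# [username, story_count] pairs, highest count first (ties broken by name).
def top_posters(stories, day, limit = 25)
  counts = Hash.new(0)
  stories.each { |story| counts[story.username] += 1 if story.created_on == day }
  counts.sort_by { |username, count| [-count, username] }.first(limit)
end

TODAY = Date.new(2013, 1, 15) # fixed date so the example is reproducible
STORIES = [
  Story.new('alice', TODAY),
  Story.new('bob', TODAY),
  Story.new('bob', TODAY),
  Story.new('carol', TODAY - 1) # yesterday's story, filtered out
].freeze

top_posters(STORIES, TODAY).each { |username, count| puts "#{username}: #{count}" }
```

Users with no stories today never appear in the result, which matches the inner join produced by .joins(:stories).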

Related

How to connect ransacker query to ransack sort search parameter

Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
has_many :courses
ransacker :sum_of_total_attendees do
query = "SELECT SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id"
Arel.sql(query)
end
end
From Model - course.rb:
class Course < ApplicationRecord
belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(@q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
@q = all_campaigns.ransack(params[:q])
@campaigns = @q.result
Errors:
The ransacker query gives me the data I want, but I don't know what to do to get the right information.
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search parameter, @q, and the ransacker query don't work together. There are two selects in this request, when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.

Query: getting the last record for each member

Given a table ("Table") as follows (sorry about the CSV style since I don't know how to make it look like a table with the Stack Overflow editor):
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
2,001,def,1/1/2009,11/30/2012
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
5,003,mno,8/1/2011,12/31/2011
If using Ruby Sequel, how should I write my query so I will get the following dataset in return.
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
I get the most current (largest end date value) record for EACH (distinct) member from the original table.
I can get the answer if I convert the table to an Array, but I am looking for a solution in SQL or Ruby Sequel query, if possible. Thank you.
Extra credit: The title of this post is lame...but I can't come up with a good one. Please offer a better title if you have one. Thank you.
The Sequel version of this is a bit scary. The best I can figure out is to use a subselect and, because you need to join the table and the subselect on two columns, a "join block" as described in Querying in Sequel. Here's a modified version of Knut's program above:
require 'csv'
require 'sequel'
# Create Test data
DB = Sequel.sqlite()
DB.create_table(:mytable){
field :id
String :member
String :data
String :start # Treat as string to keep it simple
String :end # Ditto
}
CSV.parse(<<xx
1,"001","abc","2012-12-01","2999-12-31"
2,"001","def","2009-01-01","2012-11-30"
3,"002","ghi","2009-01-01","2999-12-31"
4,"003","jkl","2012-01-01","2012-10-31"
5,"003","mno","2011-08-01","2011-12-31"
xx
).each{|x|
DB[:mytable].insert(*x)
}
# That was all setup, here's the query
ds = DB[:mytable]
result = ds.join(ds.select_group(:member).select_append{max(:end).as(:end)}, :member=>:member) do |j, lj, js|
Sequel.expr(Sequel.qualify(j, :end) => Sequel.qualify(lj, :end))
end
puts result.all
This gives you:
{:id=>1, :member=>"001", :data=>"abc", :start=>"2012-12-01", :end=>"2999-12-31"}
{:id=>3, :member=>"002", :data=>"ghi", :start=>"2009-01-01", :end=>"2999-12-31"}
{:id=>4, :member=>"003", :data=>"jkl", :start=>"2012-01-01", :end=>"2012-10-31"}
In this case it's probably easier to replace the last four lines with straight SQL. Something like:
puts DB[
"SELECT a.* from mytable as a
join (SELECT member, max(end) AS end FROM mytable GROUP BY member) as b
on a.member = b.member and a.end=b.end"].all
Which gives you the same result.
What are the criteria for your result?
If it is the keys 1, 3 and 4, you may use DB[:mytable].filter( :id => [1,3,4]) (complete example below).
For more information about filtering with Sequel, please refer to the Sequel documentation, especially Dataset Filtering.
require 'csv'
require 'sequel'
#Create Test data
DB = Sequel.sqlite()
DB.create_table(:mytable){
field :id
field :member
field :data
field :start #should be date, not implemented in example
field :end #should be date, not implemented in example
}
CSV.parse(<<xx
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
2,001,def,1/1/2009,11/30/2012
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
5,003,mno,8/1/2011,12/31/2011
xx
).each{|x|
DB[:mytable].insert(*x)
}
#Create Test data - end -
puts DB[:mytable].filter( :id => [1,3,4]).all
In my opinion, you're approaching the problem from the wrong side. ORMs (and Sequel as well) represent a nice, DSL-ish layer above the database, but, underneath, it's all SQL down there. So, I would try to formulate the question and the answer in a way to get SQL query which would return what you need, and then see how it would translate to Sequel's language.
You need to group by member and get the latest record for each member, right?
I'd go with the following idea (roughly):
SELECT t1.*
FROM table t1
LEFT JOIN table t2 ON t1.member = t2.member AND t2.end > t1.end
WHERE t2.id IS NULL
Now you should see how to perform left joins in Sequel, and you'll need to alias tables as well. Shouldn't be that hard.
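To make the logic of that LEFT JOIN ... WHERE t2.id IS NULL query concrete, here is a rough plain-Ruby sketch of the same idea using the sample rows from the question: a row is kept only when no other row for the same member has a later end date. It is not a Sequel query, and the names ROWS, parse_us_date, and latest_per_member are made up for this illustration.

```ruby
require 'date'

# Sample rows from the question (dates kept in the original m/d/yyyy format).
ROWS = [
  { id: 1, member: '001', data: 'abc', start: '12/1/2012', end: '12/31/2999' },
  { id: 2, member: '001', data: 'def', start: '1/1/2009',  end: '11/30/2012' },
  { id: 3, member: '002', data: 'ghi', start: '1/1/2009',  end: '12/31/2999' },
  { id: 4, member: '003', data: 'jkl', start: '1/1/2012',  end: '10/31/2012' },
  { id: 5, member: '003', data: 'mno', start: '8/1/2011',  end: '12/31/2011' }
].freeze

# Hypothetical helper: parses the m/d/yyyy strings into Date objects.
def parse_us_date(text)
  Date.strptime(text, '%m/%d/%Y')
end

# Mirrors the anti-join: keep t1 when there is no t2 with the same member
# and a strictly later end date.
def latest_per_member(rows)
  rows.select do |t1|
    rows.none? do |t2|
      t2[:member] == t1[:member] &&
        parse_us_date(t2[:end]) > parse_us_date(t1[:end])
    end
  end
end

latest_per_member(ROWS).each { |row| puts row.inspect }
```

Like the SQL version, this keeps every tied row if a member has two records with the same latest end date.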

Chaining ActiveRecord::QueryMethods#from with joins gives bad SQL, missing joins

I am using postgres tsearch on a project with Rails 3.0.9. To make a tsearch query, I need to include extra SQL in my "from" clause. For example, say I have this model:
class User < ActiveRecord::Base
has_one :profile
has_many :memberships
has_many :groups, :through => :memberships
end
I want to do a fulltext search on users and their profiles. I can do this:
User.joins(:profile).where(
"(profiles.vectors @@ tsearch_query) or (users.vectors @@ tsearch_query)"
).from(
"to_tsquery('MYQUERY') as tsearch_query, users")
This produces the following SQL and it works fine:
"SELECT \"users\".* FROM to_tsquery('MYQUERY') as tsearch_query, users INNER JOIN \"profiles\" ON \"profiles\".\"user_id\" = \"users\".\"id\" WHERE ((profiles.vectors @@ tsearch_query) or (users.vectors @@ tsearch_query))"
But if I tack on another join I get some bad SQL:
User.joins(:profile).where(
"(profiles.vectors @@ tsearch_query) or (users.vectors @@ tsearch_query)"
).from(
"to_tsquery('MYQUERY') as tsearch_query, users").joins(:groups)
Here's the error:
ActiveRecord::StatementInvalid: PGError: ERROR: missing FROM-clause entry for table "memberships" at character 108
: SELECT "users".* FROM to_tsquery('MYQUERY') as tsearch_query, users INNER JOIN "groups" ON "groups"."id" = "memberships"."group_id" WHERE AND ((profiles.vectors @@ tsearch_query) or (users.vectors @@ tsearch_query))
There should be three join statements in this query. users-to-profiles, users-to-memberships and memberships-to-groups. Only the last join is included, so we get an error for referencing the memberships table without joining it earlier.
But AR::Relation does know about both joins:
irb(main)> _.send(:joins_values)
=> [:profile, :groups]
I think the problem is from adding that "from" scope call. If I cut it out, I get both my joins. For example, I can even provide a dummy "from" call and get the same error:
User.joins(:profile).from( "users" ).joins(:groups)
ActiveRecord::StatementInvalid: PGError: ERROR: missing FROM-clause entry for table "memberships" at character 68
: SELECT "users".* FROM users INNER JOIN "groups" ON "groups"."id" = "memberships"."group_id"
irb(main)> _.send(:joins_values)
=> [:profile, :groups]
Removing the "from" call, this works fine:
User.joins(:profile).joins(:groups)
irb(main)> _.to_sql
=> "SELECT \"users\".* FROM \"users\" INNER JOIN \"profiles\" ON \"profiles\".\"user_id\" = \"users\".\"id\" INNER JOIN \"memberships\" ON \"users\".\"id\" = \"memberships\".\"user_id\" INNER JOIN \"groups\" ON \"groups\".\"id\" = \"memberships\".\"group_id\""
So I'm not sure how to work around this.
My ultimate goal is to be able to do a tsearch search on User and their profile, while also limiting the results by the groups the user is in.
FWIW, this isn't a great answer, but a decent work around: I can get this to work by passing the actual SQL for the joins to User.joins, rather than using the relation names. Might have to do for now.
In the meantime I suppose I'll put together a bug report.
Another not so great solution is to wait for rails 3.1.
Added this test to activerecord's inner_join_association_test.rb:
def test_from_clause_clobbers_multiple_joins
result = Author.joins(:posts).from('authors').joins(:categorizations).where(:categorizations => {:id => 1}, :posts => {:id => 1}).to_a
assert_equal authors(:david), result.first
end
This test fails on 3.0.9 but passes on 3.1-rc1.

Help optimizing ActiveRecord query (voting system)

I have a voting system with two models: Item(id, name) and Vote(id, item_id, user_id).
Here's the code I have so far:
class Item < ActiveRecord::Base
  has_many :votes
  def self.most_popular
    items = Item.all #where can I optimize here?
    items.sort {|x,y| y.votes.length <=> x.votes.length}.first #so I don't need to do anything here?
  end
end
There's a few things wrong with this, mainly that I retrieve all the Item records, THEN use Ruby to compute popularity. I am almost certain there is a simple solution to this, but I can't quite put my finger on it.
I'd much rather gather records and run the calculations in the initial query. This way, I can add a simple :limit => 1 (or LIMIT 1) to the query.
Any help would be great--either a rewrite in all ActiveRecord or even in raw SQL. The latter would actually give me a much clearer picture of the nature of the query I want to execute.
Group the votes by item id, order them by count and then take the item of the first one. In rails 3 the code for this is:
Vote.group(:item_id).order("count(*) DESC").first.item
In rails 2, this should work:
Vote.all(:order => "count(*) DESC", :group => :item_id).first.item
sepp2k has the right idea. In case you're not using Rails 3, the equivalent is:
Vote.first(:group => :item_id, :order => "count(*) DESC", :include => :item).item
Probably there's a better way to do this in ruby, but in SQL (mysql at least) you could try something like this to get a top 10 ranking:
SELECT i.id, i.name, COUNT( v.id ) AS total_votes
FROM Item i
LEFT JOIN Vote v ON ( i.id = v.item_id )
GROUP BY i.id
ORDER BY total_votes DESC
LIMIT 10
One easy way of handling this is to add a vote count field to the Item, and update that each time there is a vote. Rails used to do that automatically for you, but I'm not sure if that's still the case in 2.x and 3.0. It's easy enough to do it yourself in any case, using an Observer pattern or else just by putting an "after_save" in the Vote model.
Then your query is very easy, by simply adding a "VOTE_COUNT DESC" order to your query.
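The vote count idea can be sketched in plain Ruby, outside of Rails, to show the trade-off: a little bookkeeping on every vote in exchange for a trivial "most popular" lookup. The classes below are hypothetical stand-ins and not ActiveRecord models; Vote#initialize plays the role that an after_save callback would play in the real Vote model.

```ruby
# Hypothetical plain-Ruby stand-in for the Item model with a cached count.
class Item
  attr_reader :name, :vote_count

  def initialize(name)
    @name = name
    @vote_count = 0
  end

  # Called once per recorded vote to keep the cached count current.
  def increment_vote_count!
    @vote_count += 1
  end
end

# Hypothetical stand-in for the Vote model; creating a vote updates the item.
class Vote
  attr_reader :item, :user_id

  def initialize(item, user_id)
    @item = item
    @user_id = user_id
    item.increment_vote_count! # what an after_save callback would do
  end
end

# Equivalent of ordering by the cached vote count, descending, with a limit of 1.
def most_popular(items)
  items.max_by(&:vote_count)
end

apple = Item.new('apple')
pear = Item.new('pear')
Vote.new(apple, 1)
Vote.new(pear, 1)
Vote.new(pear, 2)

puts most_popular([apple, pear]).name
```

The cost of this design is that the cached count has to be kept in sync whenever votes are removed as well, which this sketch does not handle.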

SQL: how to find a complement to a set with a derived function/value

This one has me stumped, so I'm hoping someone who's smarter than me can help me out.
I'm working on a rails project in which I've got a User model which has an association of clock_periods joined to it, having the following partial definition:
User
has_many :clock_periods
#clock_periods has the following properties:
#clock_in_time:datetime
#clock_out_time:datetime
named_scope :clocked_in, :select => "users.*",
:joins => :clock_periods, :conditions => 'clock_periods.clock_out_time IS NULL'
def clocked_in?
#default scope on clock periods sorts by date
clock_periods.last.clock_out_time.nil?
end
The SQL query to retrieve all clocked in users is trivial:
SELECT users.* FROM users INNER JOIN clock_periods ON clock_periods.user_id = users.id
WHERE clock_periods.clock_out_time IS NULL
The converse, however--finding all users who are currently clocked out--is deceptively difficult. I ended up using the following named scope definition, though it's hackish:
named_scope :clocked_out, lambda{{
:conditions => ["users.id not in (?)", clocked_in.map(&:id)+ [-1]]
}}
What bothers me about it is that it seems like there ought to be a way to do this in SQL without resorting to generating statements like
SELECT users.* FROM users WHERE users.id NOT IN (1,3,5)
Anybody got a better way, or is this really the only way to handle it?
Besides @Eric's suggestion there's the issue (unless you've guaranteed against it in some other way you're not showing us) that a user might not have any clock period -- then the inner join would fail to include that user and he wouldn't show either as clocked in or as clocked out. Assuming you also want to show those users as clocked out, the SQL should be something like:
SELECT users.*
FROM users
LEFT JOIN clock_periods ON clock_periods.user_id = users.id
WHERE (clock_periods.user_id IS NULL) OR
(getdate() BETWEEN clock_periods.clock_out_time AND
clock_periods.clock_in_time)
(this kind of thing is the main use of outer joins such as LEFT JOIN).
Assuming getdate() is the function in your SQL implementation that returns a datetime representing right now:
SELECT users.* FROM users INNER JOIN clock_periods ON clock_periods.user_id = users.id
WHERE getdate() > clock_periods.clock_out_time and getdate() < clock_periods.clock_in_time
In rails, Eric H's answer should look something like:
users = ClockPeriod.find(:all, :select => 'users.*', :include => :user,
:conditions => ['? > clock_periods.clock_out_time AND ? < clock_periods.clock_in_time',
Time.now, Time.now])
At least, I think that would work...
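To illustrate the complement being asked for, including the point above that users without any clock period should count as clocked out, here is a small plain-Ruby sketch over in-memory data. It follows the question's definition that an open period (clock_out_time is nil) means clocked in; the ClockPeriod struct and both helper methods are hypothetical names for this example and are not part of the poster's models.

```ruby
# Hypothetical in-memory stand-in for a clock_periods row.
ClockPeriod = Struct.new(:user_id, :clock_in_time, :clock_out_time)

# A user is clocked in when at least one of their periods is still open.
def clocked_in_ids(periods)
  periods.select { |period| period.clock_out_time.nil? }.map(&:user_id).uniq
end

# The complement: every known user who has no open period. Users with no
# periods at all fall into this set too, like the LEFT JOIN variant.
def clocked_out_ids(all_user_ids, periods)
  all_user_ids - clocked_in_ids(periods)
end

USER_IDS = [1, 2, 3].freeze
PERIODS = [
  ClockPeriod.new(1, Time.utc(2010, 1, 1, 9), Time.utc(2010, 1, 1, 17)), # closed
  ClockPeriod.new(1, Time.utc(2010, 1, 2, 9), nil),                      # open
  ClockPeriod.new(2, Time.utc(2010, 1, 2, 9), Time.utc(2010, 1, 2, 12))  # closed
].freeze

puts "clocked in:  #{clocked_in_ids(PERIODS).inspect}"
puts "clocked out: #{clocked_out_ids(USER_IDS, PERIODS).inspect}"
```

In SQL the same set difference is usually written with NOT EXISTS or an outer join rather than by materializing the list of ids first.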