Return only unique records in this ActiveRecord query - sql

I have a mildly-complex ActiveRecord query in Rails 3.2 / Postgres that returns documents that are related and most relevant to all documents a user has favorited in the past.
The problem is that despite specifying uniq my query does not return distinct document records:
Document.joins("INNER JOIN related_documents ON
documents.docid = related_documents.docid_id")
.select("documents.*, related_documents.relevance_score")
.where("related_documents.document_id IN (?)",
some_user.favorited_documents)
.order("related_documents.relevance_score DESC")
.uniq
.limit(10)
I use a RelatedDocument join table, ranking each relation by a related_document.relevance_score which I use to order the query result before sampling the top 10. (See this question for schema description.)
The problem is that because I select("documents.*, related_documents.relevance_score"), the same document returned multiple times with different relevance_scores is treated as several unique results (i.e. when the document is a related_document for multiple favorited documents).
How do I return unique Documents regardless of the related_document.relevance_score?
I have tried splitting the select into two separate selects and changing the position of uniq in the query, with no success.
Unfortunately I must select("related_documents.relevance_score") so as to order the results by this field.
Thanks!
UPDATE - SOLUTION
Thanks to Jethroo below, GROUP BY is the needed addition, giving me the following working query:
Document.joins("INNER JOIN related_documents ON
documents.docid = related_documents.docid_id")
.select("documents.*, max(related_documents.relevance_score)")
.where("related_documents.document_id IN (?)",
some_user.favorited_documents)
.order("related_documents.relevance_score DESC")
.group("documents.id")
.uniq
.limit(10)

Have you tried grouping by documents.docid? See http://guides.rubyonrails.org/active_record_querying.html#group
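To see why grouping helps, here is a minimal in-memory sketch of what GROUP BY plus MAX does to the joined rows. It is plain Ruby rather than ActiveRecord, and the ids, titles and scores are made-up sample data, not taken from the question's schema.

```ruby
# Rows as the INNER JOIN would produce them: one row per
# (document, favorited document) pair, so a document can repeat.
# All values here are hypothetical sample data.
joined_rows = [
  { id: 1, title: "A", relevance_score: 0.9 },
  { id: 2, title: "B", relevance_score: 0.7 },
  { id: 1, title: "A", relevance_score: 0.4 },
  { id: 3, title: "C", relevance_score: 0.8 },
  { id: 2, title: "B", relevance_score: 0.95 }
]

# GROUP BY documents.id with MAX(relevance_score):
# collapse each document to its single best-scoring row.
best_per_document = joined_rows
  .group_by { |row| row[:id] }
  .map { |_id, rows| rows.max_by { |row| row[:relevance_score] } }

# ORDER BY the aggregated score DESC, then LIMIT 10.
top_documents = best_per_document
  .sort_by { |row| -row[:relevance_score] }
  .first(10)

puts top_documents.map { |row| row[:id] }.inspect
```

Each document now appears once, ranked by its highest score. In SQL terms this suggests ordering by the aggregate (the max) rather than by the raw column.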

Related

How to connect ransacker query to ransack sort search parameter

Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
has_many :courses
ransacker :sum_of_total_attendees do
query = "SELECT SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id"
Arel.sql(query)
end
end
From Model - course.rb:
class Course < ApplicationRecord
belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(@q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
@q = all_campaigns.ransack(params[:q])
@campaigns = @q.result
Errors:
The ransacker query gives me the data I want, but I don't know what to do to get the sort to use it correctly.
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search parameter, @q, and the ransacker query don't work together. There are two SELECTs in this request when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.
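Both errors above are consistent with the ransacker expression being spliced directly after ORDER BY. An unparenthesized SELECT is a syntax error there, and a subquery that returns one row per campaign cannot act as a single value. A possible fix, offered as an untested assumption, is a parenthesized subquery correlated to the outer campaigns row. The sketch below only builds the SQL strings in plain Ruby to show the difference. The helper campaigns_sorted_by is hypothetical and is not part of ransack.

```ruby
# Hypothetical helper that mimics how the sort expression ends up in the
# final statement: "... ORDER BY <expression> <direction>".
def campaigns_sorted_by(expression, direction = "ASC")
  %(SELECT "campaigns".* FROM "campaigns" ORDER BY #{expression} #{direction})
end

# The expression from the question: no parentheses, and not tied to the
# outer campaigns row, so it yields one row per campaign.
uncorrelated = "SELECT SUM(r.total_attendees) FROM campaigns c " \
  "LEFT OUTER JOIN courses r ON r.campaign_id = c.id GROUP BY c.id"

# Assumed alternative: a scalar subquery in parentheses, correlated to the
# outer row through campaigns.id, with COALESCE so campaigns without
# courses sort as 0.
correlated = "(SELECT COALESCE(SUM(courses.total_attendees), 0) " \
  "FROM courses WHERE courses.campaign_id = campaigns.id)"

puts campaigns_sorted_by(uncorrelated)
puts campaigns_sorted_by(correlated)
```

If this reasoning holds, the ransacker block in the model would return Arel.sql with the correlated string, and the controller would stay unchanged.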

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
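As a rough illustration of why the LEFT OUTER JOIN plus IS NULL filter works, here is the same logic over in-memory arrays. This is plain Ruby with made-up sample records, not ActiveRecord.

```ruby
# Hypothetical sample data standing in for the two tables.
press_releases = [
  { id: 1, title: "Launch" },
  { id: 2, title: "Funding" },
  { id: 3, title: "Hiring" }
]
publications = [
  { id: 10, press_release_id: 1 },
  { id: 11, press_release_id: 1 },
  { id: 12, press_release_id: 3 }
]

# LEFT OUTER JOIN: every press release appears at least once; when no
# publication matches, the publication side is nil (NULL in SQL).
joined = press_releases.flat_map do |pr|
  matches = publications.select { |pub| pub[:press_release_id] == pr[:id] }
  matches = [nil] if matches.empty?
  matches.map { |pub| { press_release: pr, publication: pub } }
end

# WHERE publications.id IS NULL: keep only the unmatched rows.
without_publications = joined
  .select { |row| row[:publication].nil? }
  .map { |row| row[:press_release] }

puts without_publications.map { |pr| pr[:id] }.inspect
```

Only the press release that never matched a publication survives the filter, which is what the scope returns.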
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" # WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
There are a couple of ways to do this. The first one requires two db queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with a given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

How to retrieve a list of records and the count of each one's children with condition in Active Record?

There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
Get the information for all customs whose age is over 18, along with how many big orders (paying more than 1,000 dollars) each of them has.
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one left join sql statement will do all my job, and don't construct unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of order objects which I don't need, and it will calculate the count by Array#count function which will waste time.
UPDATE 2:
My own solution is wrong: it removes customs who don't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is obtained by joining on orders and putting the count of orders into a result column called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
AND orders.amount > 1000})
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so it may be worth keeping a dedicated customs.big_order_count column that is either refreshed regularly or updated by an observer that watches for big Order records.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is mysql only. To get this to work in postgresql I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
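For reference, this is the result shape both counting solutions aim for: every adult is kept, and those without big orders get a count of 0. The sketch is plain Ruby over made-up sample data, only meant to make the expected output concrete.

```ruby
# Hypothetical sample data standing in for the customs and orders tables.
customs = [
  { id: 1, name: "Ann", age: 30 },
  { id: 2, name: "Bob", age: 17 },
  { id: 3, name: "Cy",  age: 45 }
]
orders = [
  { id: 1, custom_id: 1, amount: 1500 },
  { id: 2, custom_id: 1, amount: 200 },
  { id: 3, custom_id: 1, amount: 2500 },
  { id: 4, custom_id: 2, amount: 5000 },
  { id: 5, custom_id: 3, amount: 999 }
]

# WHERE age > 18, plus a per-row count of matching big orders
# (the correlated sub-query in the PostgreSQL version).
adults = customs.select { |c| c[:age] > 18 }.map do |c|
  big = orders.count { |o| o[:custom_id] == c[:id] && o[:amount] > 1000 }
  c.merge(big_orders_count: big)
end

adults.each { |c| puts "#{c[:name]}: #{c[:big_orders_count]}" }
```

Note that the adult with no big orders still appears, which is the behavior the includes-based attempt in the question loses.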
Edited.
@custom_over_18 = Custom.where("age > ?", "18").orders.where("amount > ?", "1000").count

Ordering a found set by number of times a user has viewed the page

I'm trying to order a list of locations based on the number of times a user has viewed them. I'm using the impressionist gem to track the views.
The problem I'm having is that my query completely excludes those locations the user's never viewed. I need to display these at the bottom of the results and order by the created_at timestamp.
I can do this to get a list of location_ids:
@location_ids = @user.impressions.
  select('count(id) as counter, impressionable_id').
  group(:impressionable_id).
  order('counter DESC')
@location_ids.map(&:impressionable_id)
Which gives [3,5,8,44,99] and so on..
However, that doesn't get me far so I tried this:
@user.locations.
joins(:impressions).
select("count(impressions.id) as counter, impressionable_id, locations.location_name, locations.id").
group(:impressionable_id).
order("counter desc")
Which is better but it omits those locations with zero views.
How should I do this to get all the locations?
By default, Rails uses an inner join when you use .joins. That's why you don't see the locations with no associated impressions. You need to tell it to use a left join instead, probably like so:
@user.locations.
joins("left join impressions on impressions.impressionable_id = locations.id and impressions.impressionable_type = 'Location'").
select("count(impressions.id) as counter, impressionable_id, locations.location_name, locations.id").
group('locations.id').
order("counter desc")
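To make the intended ordering concrete, here is a small in-memory sketch of the left join: locations with views come first by view count, and unviewed locations follow, ordered by created_at. It is plain Ruby with made-up sample data. Ascending created_at for the unviewed ones is an assumption, since the question does not state a direction.

```ruby
# Hypothetical sample data standing in for locations and impressions.
locations = [
  { id: 3,  location_name: "Cafe",   created_at: Time.utc(2013, 1, 5) },
  { id: 5,  location_name: "Park",   created_at: Time.utc(2013, 1, 2) },
  { id: 8,  location_name: "Museum", created_at: Time.utc(2013, 1, 10) },
  { id: 44, location_name: "Pier",   created_at: Time.utc(2013, 1, 3) }
]
impressions = [
  { impressionable_id: 3, impressionable_type: "Location" },
  { impressionable_id: 3, impressionable_type: "Location" },
  { impressionable_id: 5, impressionable_type: "Location" },
  { impressionable_id: 8, impressionable_type: "Post" } # other type, ignored
]

# LEFT JOIN + COUNT(impressions.id): locations with no matching
# impression keep a counter of 0 instead of disappearing.
with_counts = locations.map do |loc|
  counter = impressions.count do |imp|
    imp[:impressionable_id] == loc[:id] &&
      imp[:impressionable_type] == "Location"
  end
  loc.merge(counter: counter)
end

# ORDER BY counter DESC, created_at ASC (direction assumed).
ordered = with_counts.sort_by { |loc| [-loc[:counter], loc[:created_at]] }

puts ordered.map { |loc| loc[:id] }.inspect
```

In the SQL version the same tie-break would mean adding a second ORDER BY term after the counter.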

Query: getting the last record for each member

Given a table ("Table") as follows (sorry about the CSV style since I don't know how to make it look like a table with the Stack Overflow editor):
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
2,001,def,1/1/2009,11/30/2012
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
5,003,mno,8/1/2011,12/31/2011
If using Ruby Sequel, how should I write my query so I will get the following dataset in return.
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
I get the most current (largest end date value) record for EACH (distinct) member from the original table.
I can get the answer if I convert the table to an Array, but I am looking for a solution in SQL or Ruby Sequel query, if possible. Thank you.
Extra credit: The title of this post is lame...but I can't come up with a good one. Please offer a better title if you have one. Thank you.
The Sequel version of this is a bit scary. The best I can figure out is to use a subselect and, because you need to join the table and the subselect on two columns, a "join block" as described in Querying in Sequel. Here's a modified version of Knut's program above:
require 'csv'
require 'sequel'
# Create Test data
DB = Sequel.sqlite()
DB.create_table(:mytable){
field :id
String :member
String :data
String :start # Treat as string to keep it simple
String :end # Ditto
}
CSV.parse(<<xx
1,"001","abc","2012-12-01","2999-12-31"
2,"001","def","2009-01-01","2012-11-30"
3,"002","ghi","2009-01-01","2999-12-31"
4,"003","jkl","2012-01-01","2012-10-31"
5,"003","mno","2011-08-01","2011-12-31"
xx
).each{|x|
DB[:mytable].insert(*x)
}
# That was all setup, here's the query
ds = DB[:mytable]
result = ds.join(ds.select_group(:member).select_append{max(:end).as(:end)}, :member=>:member) do |j, lj, js|
Sequel.expr(Sequel.qualify(j, :end) => Sequel.qualify(lj, :end))
end
puts result.all
This gives you:
{:id=>1, :member=>"001", :data=>"abc", :start=>"2012-12-01", :end=>"2999-12-31"}
{:id=>3, :member=>"002", :data=>"ghi", :start=>"2009-01-01", :end=>"2999-12-31"}
{:id=>4, :member=>"003", :data=>"jkl", :start=>"2012-01-01", :end=>"2012-10-31"}
In this case it's probably easier to replace the last four lines with straight SQL. Something like:
puts DB[
"SELECT a.* from mytable as a
join (SELECT member, max(end) AS end FROM mytable GROUP BY member) as b
on a.member = b.member and a.end=b.end"].all
Which gives you the same result.
What are the criteria for your result?
If it is the keys 1,3 and 4 you may use DB[:mytable].filter( :id => [1,3,4]) (complete example below)
For more information about filtering with Sequel, please refer to the Sequel documentation, especially Dataset Filtering.
require 'csv'
require 'sequel'
#Create Test data
DB = Sequel.sqlite()
DB.create_table(:mytable){
field :id
field :member
field :data
field :start #should be date, not implemented in example
field :end #should be date, not implemented in example
}
CSV.parse(<<xx
id,member,data,start,end
1,001,abc,12/1/2012,12/31/2999
2,001,def,1/1/2009,11/30/2012
3,002,ghi,1/1/2009,12/31/2999
4,003,jkl,1/1/2012,10/31/2012
5,003,mno,8/1/2011,12/31/2011
xx
).each{|x|
DB[:mytable].insert(*x)
}
#Create Test data - end -
puts DB[:mytable].filter( :id => [1,3,4]).all
In my opinion, you're approaching the problem from the wrong side. ORMs (and Sequel as well) represent a nice, DSL-ish layer above the database, but, underneath, it's all SQL down there. So, I would try to formulate the question and the answer in a way to get SQL query which would return what you need, and then see how it would translate to Sequel's language.
You need to group by member and get the latest record for each member, right?
I'd go with the following idea (roughly):
SELECT t1.*
FROM table t1
LEFT JOIN table t2 ON t1.member = t2.member AND t2.end > t1.end
WHERE t2.id IS NULL
Now you should see how to perform left joins in Sequel, and you'll need to alias tables as well. Shouldn't be that hard.
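The idea behind that left join can be checked in memory first: a row is the latest for its member exactly when no other row for the same member has a later end. Below is a minimal plain-Ruby sketch of that rule, not Sequel code, using the ISO-formatted dates from the test data earlier in this thread so that string comparison matches date order.

```ruby
# Same rows as the test data above, with ISO dates (sortable as strings).
rows = [
  { :id => 1, :member => "001", :data => "abc", :start => "2012-12-01", :end => "2999-12-31" },
  { :id => 2, :member => "001", :data => "def", :start => "2009-01-01", :end => "2012-11-30" },
  { :id => 3, :member => "002", :data => "ghi", :start => "2009-01-01", :end => "2999-12-31" },
  { :id => 4, :member => "003", :data => "jkl", :start => "2012-01-01", :end => "2012-10-31" },
  { :id => 5, :member => "003", :data => "mno", :start => "2011-08-01", :end => "2011-12-31" }
]

# LEFT JOIN t2 ON same member AND t2.end > t1.end, WHERE t2.id IS NULL:
# keep t1 only if no later row exists for the same member.
latest = rows.select do |t1|
  rows.none? { |t2| t2[:member] == t1[:member] && t2[:end] > t1[:end] }
end

puts latest.map { |row| row[:id] }.inspect
```

If two rows for a member share the same latest end, both are kept, which is also how the SQL version behaves.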