How to perform a conditional count in a Rails active record query? - sql

In a Rails 3.2 app I'm trying to construct a query that will return an array with two calculated counts. The app has three models: a User has many Events, and a Location has many Events. I need to return an array for a user that contains the number of events they have at each location, as well as the number of active events.
e.g., [#<Location id: 1, name: "Location Name", total_events_count: 4, active_events_count: 4>]
I can get the total_events_count:
user = User.find(params[:id])
user.locations
.select("locations.id, locations.name, count(events.id) AS total_events_count")
.joins(:events)
.group("locations.id")
Given that my Event model has a string status field that can take a value active, how would I also include an active_events_count in this query?
EDIT
After some useful suggestions from xdazz and mu, I'm still struggling with this. My problem appears to be that the count is counting all events, rather than events that belong to the location AND the user.
I'm going to try to rephrase my question, hopefully someone can help me understand this.
Events belong to both a User and a Location (i.e. User has_many :locations, through: :events)
Events have several fields, including status (a string) and participants (an integer).
In a User show view, I'm trying to generate a list of a User's locations. For each location, I want to display a "success rate". The success rate is the total number of a User's events with participants, divided by the total number of that User's events. I.e., if User1 has 4 events at LocationA, but only two of those events had participants, User1's success rate at LocationA is 0.5 (or 50%).
The way I thought to achieve this is via a select query that also includes calculated counts for total_events_count and successful_events_count. (There may be a better way?)
So I do something like:
user = User.find(params[:id])
user.locations
.select("locations.id, locations.name, count(events.id) AS total_events_count, sum(case events.marked when NOT 0 then 1 else 0 end) AS successful_events_count")
.joins(:events)
.group("locations.id")
This is returning an array with the correct keys, but the values are not correct. I am getting the total number of all events (and all successful events) for that location, not just a count of those events that belong to the current user.
I have been looking at this problem for so long that I'm getting myself very confused. Would be very grateful for a fresh perspective and ideas!!
EDIT2
After a break and fresh eyes, I have managed to get the result I need using the following code. It seems quite convoluted. If there is a better way, please let me know. Otherwise I will tidy up this question in case anyone else runs into the same problem.
class User
  def location_ratings
    events = self.events
    successful_events = events.where('events.participants > 0')
    # Per-location count of all of this user's events
    user_events_by_location = Event.
      select("events.location_id AS l_id, count(events.id) AS location_events_count").
      where( id: events.pluck(:id) ).
      group("l_id").
      to_sql
    # Per-location count of this user's events that had participants
    user_successful_events_by_location = Event.
      select("events.location_id AS l_id, count(events.id) AS location_successful_events_count").
      where( id: successful_events.pluck(:id) ).
      group("l_id").
      to_sql
    # Join both derived tables back onto locations
    Location.
      joins("JOIN (#{user_events_by_location}) AS user_events ON user_events.l_id = locations.id").
      joins("JOIN (#{user_successful_events_by_location}) AS successful_user_events ON successful_user_events.l_id = locations.id").
      select('locations.id, locations.name, user_events.location_events_count, successful_user_events.location_successful_events_count').
      order("user_events.location_events_count DESC")
  end
end

You could use sum(events.marked='active') to get it:
user.locations
.select("locations.id, locations.name, count(events.id) AS total_events_count, sum(events.marked='active') AS marked_event_count")
.joins(:events)
.group("locations.id")
Update:
If you are using PostgreSQL, then you have to cast the boolean to int before using the SUM function.
user.locations
.select("locations.id, locations.name, count(events.id) AS total_events_count, sum((events.marked='active')::int) AS marked_event_count")
.joins(:events)
.group("locations.id")
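
The EDIT in the question asks for counts restricted to one user's events at each location. As a minimal sketch of what such a query has to compute, here is the plain SQL run through Python's sqlite3 module rather than through Rails. The table layout, the sample rows and the helper name location_ratings are assumptions made for illustration; only the column names come from the question.

```python
import sqlite3

# In-memory database with a hypothetical schema modelled on the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        location_id INTEGER,
        participants INTEGER
    );
    INSERT INTO locations (id, name) VALUES (1, 'LocationA'), (2, 'LocationB');
    -- User 1: four events at LocationA (two with participants), one at LocationB.
    -- User 2: three events at LocationA, all with participants.
    INSERT INTO events (user_id, location_id, participants) VALUES
        (1, 1, 5), (1, 1, 0), (1, 1, 3), (1, 1, 0),
        (1, 2, 0),
        (2, 1, 2), (2, 1, 7), (2, 1, 1);
""")

# One join, filtered by user, with a conditional SUM next to the plain COUNT.
PER_USER_COUNTS = """
    SELECT locations.id,
           locations.name,
           COUNT(events.id) AS total_events_count,
           SUM(CASE WHEN events.participants > 0 THEN 1 ELSE 0 END)
               AS successful_events_count
    FROM locations
    JOIN events ON events.location_id = locations.id
    WHERE events.user_id = ?
    GROUP BY locations.id, locations.name
    ORDER BY locations.id
"""


def location_ratings(user_id):
    """Return (id, name, total, successful) rows for one user's events."""
    return conn.execute(PER_USER_COUNTS, (user_id,)).fetchall()


for loc_id, name, total, successful in location_ratings(1):
    print(name, total, successful, successful / total)
```

The CASE expression is the portable form of the conditional count; the sum(condition) shorthand above relies on the database treating booleans as integers.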

Related

How can you use distinct in rails while still using ActiveRecord's

I am struggling with the following problem:
I want to have two different tabs, one that displays all recent chugs (Done), and one that displays the chugs that are the fastest per person.
However, this needs to remain an ActiveRecord, since I need to use it with link_to and gravatar, thus restraining me from group_by, as far as I understand it.
AKA: If there are three users who each have three chugs, I want to show 1 chug per user, which contains the fastest time of that particular user.
The current code looks like this, where chugs_unique should be edited:
def show
  @pagy, @chugs_all_newest = pagy(@chugtype.chugs.order('created_at DESC'), items: 10, page: params[:page])
  @chugs_unique = @chugtype.chugs.order('secs ASC, milis ASC, created_at DESC').uniq
  breadcrumb @chugtype.name, chugtypes_path(@chugtype)
end
In this case, a chug belongs to both a chugtype and user, and the chugtype has multiple chugs.
Thanks in advance!
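
One possible way to express "the fastest chug per user" in SQL is a correlated subquery that picks, for each user, the id of their best row. The sketch below runs that SQL with Python's sqlite3 module. The chugs table shape and the sample rows are assumptions; the ordering (secs, then milis, then newest first) is taken from the question's order clause. Because the outer query still selects rows of chugs, a condition of this shape could in principle be passed to a where clause so that the result stays a relation, but that part is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chugs (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        secs INTEGER,
        milis INTEGER,
        created_at TEXT
    );
    INSERT INTO chugs (id, user_id, secs, milis, created_at) VALUES
        (1, 1, 5, 300, '2020-01-01'),
        (2, 1, 4, 900, '2020-01-02'),
        (3, 1, 6, 100, '2020-01-03'),
        (4, 2, 4, 200, '2020-01-01'),
        (5, 2, 4, 200, '2020-01-05'),
        (6, 3, 7, 0,   '2020-01-02');
""")

# For every chug, keep it only if it is the best row of its own user.
# Ties on time are broken by the most recent created_at.
FASTEST_PER_USER = """
    SELECT c.id, c.user_id, c.secs, c.milis
    FROM chugs c
    WHERE c.id = (
        SELECT c2.id
        FROM chugs c2
        WHERE c2.user_id = c.user_id
        ORDER BY c2.secs ASC, c2.milis ASC, c2.created_at DESC
        LIMIT 1
    )
    ORDER BY c.secs ASC, c.milis ASC
"""

fastest = conn.execute(FASTEST_PER_USER).fetchall()
for row in fastest:
    print(row)
```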

Why does Postgres not accept my count column?

I am building a Rails app with the following models:
# vote.rb
class Vote < ApplicationRecord
belongs_to :person
belongs_to :show
scope :fulfilled, -> { where(fulfilled: true) }
scope :unfulfilled, -> { where(fulfilled: false) }
end
# person.rb
class Person < ApplicationRecord
has_many :votes, dependent: :destroy
def self.order_by_votes(show = nil)
count = 'nullif(votes.fulfilled, true)'
count = "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end" if show
people = left_joins(:votes).group(:id).uniq!(:group)
people = people.select("people.*, COUNT(#{count}) AS people.vote_count")
people.order('people.vote_count DESC')
end
end
The idea behind order_by_votes is to sort People by the number of unfulfilled votes, either counting all votes, or counting only votes associated with a given Show.
This seems to work fine when I test against SQLite. But when I switch to Postgres I get this error:
Error:
PeopleControllerIndexTest#test_should_get_previously_on_show:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column people.vote_count does not exist
LINE 1: ...s"."show_id" = $1 GROUP BY "people"."id" ORDER BY people.vot...
^
If I dump the SQL using @people.to_sql, this is what I get:
SELECT people.*, COUNT(nullif(votes.fulfilled, true)) AS people.vote_count FROM "people" LEFT OUTER JOIN "votes" ON "votes"."person_id" = "people"."id" GROUP BY "people"."id" ORDER BY people.vote_count DESC
Why is this failing on Postgres but working on SQLite? And what should I be doing instead to make it work on Postgres?
(PS: I named the field people.vote_count, with a dot, so I can access it in my view without having to do another SQL query to actually view the vote count for each person in the view (not sure if this works) but I get the same error even if I name the field simply vote_count.)
(PS2: I recently added the .uniq!(:group) because of some deprecation warning for Rails 6.2, but I couldn't find any documentation for it so I am not sure I am doing it right, still the error is there without that part.)
Are you sure you're not getting a syntax error from PostgreSQL somewhere? If you do something like this:
select count(*) as t.vote_count from t ... order by t.vote_count
I get a syntax error before PostgreSQL gets to complain about there being no t.vote_count column.
No matter, the solution is to not try to put your vote_count in the people table:
people = people.select("people.*, COUNT(#{count}) AS vote_count")
...
people.order(vote_count: :desc)
You don't need it there; you'll still be able to reference vote_count just like any "normal" column in people. Anything in the select list appears as an accessor on the resulting model instances, whether or not it is a real column. Such values won't show up in the #inspect output (since that is generated from the table's columns), but you can call the accessor methods nonetheless.
Historically there have been quite a few AR problems (and bugs) in getting the right count by just using count on a scope, and I am not sure they are actually all gone.
That depends on the scope (AR version, relations, group, sort, uniq, etc.). A default count call that a gem has to use generically on any scope is not a one-size-fits-all solution. For that known reason, Pagy allows you to pass the right count to its pagy method, as explained in the Pagy documentation.
Your scope might become complex and the default pagy collection.count(:all) may not get the actual count. In that case you can get the right count with some custom statement, and pass it to pagy.
@pagy, @records = pagy(collection, count: your_count)
Notice: pagy will efficiently skip its internal count query and will just use the passed :count variable.
So... just get your own calculated count and pass it to pagy, and it will not even try to use the default.
EDIT: I forgot to mention: you may want to try the pagy arel extra that:
adds specialized pagination for collections from sql databases with GROUP BY clauses, by computing the total number of results with COUNT(*) OVER ().
Thanks to all the comments and answers I have finally found a solution which I think is the best way to solve this.
First off, the issue occurred when I called pagy, which tried to count my scope by appending .count(:all). This is what caused the errors. The fix was to stop ordering by the alias created in select() and to repeat the COUNT expression in .order() instead.
So here is the proper code:
def self.order_by_votes(show = nil)
  count = if show
    "case when votes.show_id = #{show.id} AND NOT votes.fulfilled then 1 else null end"
  else
    'nullif(votes.fulfilled, true)'
  end
  left_joins(:votes).group(:id)
    .uniq!(:group)
    .select("people.*, COUNT(#{count}) as vote_count")
    .order(Arel.sql("COUNT(#{count}) DESC"))
end
This sorts people by the number of unfulfilled votes for them, with the ability to count only votes for a given show. It works with pagy(), and with pagy_arel(), which in my case is a much better fit, so the results can be properly paginated.
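
To check the counting logic independently of Rails, the following sketch runs the equivalent SQL through Python's sqlite3 module. The table shapes, the sample rows and the function name order_by_votes are assumptions modelled on the code in this thread. It uses a LEFT JOIN so that people without any votes still appear with a count of 0, and it orders by the unqualified alias vote_count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        person_id INTEGER,
        show_id INTEGER,
        fulfilled INTEGER  -- 0 = false, 1 = true
    );
    INSERT INTO people (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO votes (person_id, show_id, fulfilled) VALUES
        (1, 10, 0), (1, 10, 1), (1, 20, 0),
        (2, 10, 0), (2, 10, 0), (2, 20, 0);
""")


def order_by_votes(show_id=None):
    """People sorted by unfulfilled votes, optionally for a single show."""
    if show_id is None:
        condition = "votes.fulfilled = 0"
        params = ()
    else:
        # The show id is bound as a parameter, not interpolated into the SQL.
        condition = "votes.show_id = ? AND votes.fulfilled = 0"
        params = (show_id,)
    # COUNT ignores NULLs, so a CASE without ELSE counts only matching rows.
    sql = f"""
        SELECT people.id, people.name,
               COUNT(CASE WHEN {condition} THEN 1 END) AS vote_count
        FROM people
        LEFT JOIN votes ON votes.person_id = people.id
        GROUP BY people.id, people.name
        ORDER BY vote_count DESC, people.id
    """
    return conn.execute(sql, params).fetchall()


print(order_by_votes())
print(order_by_votes(10))
```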

Rails joins query killed or just too slow. Please recommend the proper way to create queries

I am having trouble properly creating a query with joins. It starts to talk to the server, but it ends up clogged and says "killed" (in the Rails console).
I have two models.
One is 'User', the other one is 'Availability'
Some users have availabilities opening within the next 2 weeks, and I'd like to fetch the users matching this condition 50 at a time, using a page variable (there will be many of them, so I'd like to fetch 50 on every call).
Availability has two columns: user_id and start_time(datetime)
And the association is that user has many availabilities.
The query looks like the below.
people = User
.where(role: SOMETHING)
.includes(:availabilities)
.joins(:availabilities)
.where('availabilities.start_time > ?', Time.now)
.where('availabilities.start_time < ?', Time.now + 2.weeks)
.limit(5)
.offset(50 * (n-1))
where n is integer from 1
However, this query never returns a result in production (in the console it gets killed). Before the console kills the process, it keeps printing normal query statements (SQL 30ms, for instance) forever. Locally, where the data set is small, it works. Is there anything missing here?
Please give me any advice!!
And weird thing is ,
people = User
.where(role: SOMETHING)
.includes(:availabilities)
.joins(:availabilities)
.limit(5)
.offset(50 * (n-1))
then if
people.map(&:id) => [18,18,18,18,18]
which means people is inappropriately fetched. I am just confused here..!
includes(:availabilities) instantiates Availability model objects for the matching rows after querying.
If there are many availability rows, this will take a lot of time.
If you won't use the availabilities after querying, please try:
people = User
.where(role: ROLES)
.joins(:availabilities)
.where(availabilities: {start_time: (Time.now)..(2.weeks.from_now)})
.offset(50 * (n-1))
.limit(5)
I found a way to make it work:
people = User
.where(role: ROLES)
.joins(:availabilities)
.where(availabilities: { start_time: (Time.now)..(2.weeks.from_now) })
.distinct
.offset(50 * (n-1))
.limit(5)
Then:
people.map(&:id) => [x,y,z,l,m]
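
The repeated ids seen in the question are what an inner join produces when one user has several matching availabilities: there is one result row per availability, and LIMIT is applied to those rows. The sketch below reproduces the effect with plain SQL via Python's sqlite3 module. The schema, the role value 'host', the fixed time window and the helper name user_ids are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, role TEXT);
    CREATE TABLE availabilities (
        id INTEGER PRIMARY KEY, user_id INTEGER, start_time TEXT
    );
    INSERT INTO users (id, role) VALUES (18, 'host'), (19, 'host'), (20, 'host');
    INSERT INTO availabilities (user_id, start_time) VALUES
        (18, '2024-01-02 09:00:00'), (18, '2024-01-03 09:00:00'),
        (18, '2024-01-04 09:00:00'), (18, '2024-01-05 09:00:00'),
        (18, '2024-01-06 09:00:00'),
        (19, '2024-01-03 10:00:00'),
        (20, '2024-03-01 10:00:00');  -- outside the two-week window
""")

# Fixed window standing in for Time.now..2.weeks.from_now.
WINDOW = ("2024-01-01 00:00:00", "2024-01-15 00:00:00")


def user_ids(distinct, page=1, per_page=5):
    """Ids of users with an availability in the window, one page at a time."""
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT {keyword} users.id
        FROM users
        JOIN availabilities ON availabilities.user_id = users.id
        WHERE users.role = 'host'
          AND availabilities.start_time BETWEEN ? AND ?
        ORDER BY users.id
        LIMIT ? OFFSET ?
    """
    rows = conn.execute(sql, (*WINDOW, per_page, per_page * (page - 1)))
    return [row[0] for row in rows]


print(user_ids(distinct=False))  # one row per availability
print(user_ids(distinct=True))   # one row per user
```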

Django Q Queries & on the same field?

So here are my models:
class Event(models.Model):
user = models.ForeignKey(User, blank=True, null=True, db_index=True)
name = models.CharField(max_length = 200, db_index=True)
platform = models.CharField(choices = (("ios", "ios"), ("android", "android")), max_length=50)
class User(AbstractUser):
email = models.CharField(max_length=50, null=False, blank=False, unique=True)
Event is like an analytics event, so it's very possible that I could have multiple events for one user, some with platform=ios and some with platform=android, if a user has logged in on multiple devices. I want to query to see how many users have both ios and android devices. So I wrote a query like this:
User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()
Which returns 0 results. I know this isn't correct. I then thought I would try to just query for iOS users:
User.objects.filter(Q(event__platform="ios")).count()
Which returned 6,717,622 results, which is unexpected because I only have 39,294 users. I'm guessing it's not counting the Users, but counting the Event instances, which seems like incorrect behavior to me. Does anyone have any insights into this problem?
You can use annotations instead:
from django.db.models import Count
User.objects.all().annotate(events_count=Count('event')).filter(events_count=2)
This keeps only the users that have exactly two events (it counts events, not distinct platforms).
You can also use chained filters:
User.objects.filter(event__platform='android').filter(event__platform='ios')
The first filter gets all users that have an event on the android platform, and the second one narrows them down to users that also have an event on the iOS platform.
This is generally an answer for a queryset with two or more conditions related to children objects.
Solution: A simple solution with two subqueries is possible, even without any join:
base_subq = Event.objects.values('user_id').order_by().distinct()
user_qs = User.objects.filter(
Q(pk__in=base_subq.filter(platform="android")) &
Q(pk__in=base_subq.filter(platform="ios"))
)
The method .order_by() is important if the model Event has a default ordering (see it in the docs about distinct() method).
Notes:
Verify the only SQL request that will be executed: (Simplified by removing "app_" prefix.)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user WHERE (
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'android')
AND
user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'ios')
)
The function Q() is used because the same keyword parameter (pk__in) cannot be repeated in a single filter() call, but chained filters could be used instead: .filter(...).filter(...). (The order of the filter conditions is not important; the SQL server's optimizer decides the execution order based on its own estimates.)
The temporary variable base_subq is just an "alias" queryset, so that the shared part of the expression is not repeated; it is never evaluated on its own.
One join between User (parent) and Event (child) wouldn't be a problem, and a solution with one subquery is also possible, but a join of Event with Event (a join with a repeated child object or with two child objects) should be avoided by using a subquery in any case. Two subqueries are nice for readability, because they show the symmetry of the two filter conditions.
Another solution, with two nested subqueries: this asymmetric solution can be faster if we know that one subquery (which we put innermost) has a much more restrictive filter than the other subquery, which has a huge set of results (for example, if the number of Android users were huge).
ios_user_ids = (Event.objects.filter(platform="ios")
.values('user_id').order_by().distinct())
user_ids = (Event.objects.filter(platform="android", user_id__in=ios_user_ids)
.values('user_id').order_by().distinct())
user_qs = User.objects.filter(pk__in=user_ids)
Verify how it is compiled to SQL: (simplified again by removing app_ prefix and ".)
>>> print(str(user_qs.query))
SELECT user.id, user.email FROM user
WHERE user.id IN (
    SELECT DISTINCT V0.user_id FROM event V0
    WHERE V0.platform = 'android' AND V0.user_id IN (
        SELECT DISTINCT U0.user_id FROM event U0
        WHERE U0.platform = 'ios'
    )
)
(These solutions work also in an old Django e.g. 1.8. A special subquery function Subquery() exists since Django 1.11 for more complicated cases, but we didn't need it for this simple question.)
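
The numbers reported in the question can be reproduced at the SQL level. The sketch below uses Python's sqlite3 module with a tiny hypothetical data set (the table names follow the simplified SQL shown above; the rows and the helper name scalar are assumptions). It shows that a join counts one row per event, that a single joined row can never be both ios and android, and that the two-subquery form counts each user once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INTEGER, platform TEXT);
    INSERT INTO user (id, email) VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    -- User 1 has both platforms, user 2 only ios, user 3 only android.
    INSERT INTO event (user_id, platform) VALUES
        (1, 'ios'), (1, 'ios'), (1, 'android'),
        (2, 'ios'), (2, 'ios'), (2, 'ios'),
        (3, 'android');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# A join yields one row per matching event, so users are counted repeatedly.
joined_ios_rows = scalar("""
    SELECT COUNT(*) FROM user
    JOIN event ON event.user_id = user.id
    WHERE event.platform = 'ios'
""")

# Both conditions applied to the same joined event row can never hold.
same_row_both = scalar("""
    SELECT COUNT(*) FROM user
    JOIN event ON event.user_id = user.id
    WHERE event.platform = 'ios' AND event.platform = 'android'
""")

# Counting distinct user ids removes the duplication caused by the join.
distinct_ios_users = scalar("""
    SELECT COUNT(DISTINCT user.id) FROM user
    JOIN event ON event.user_id = user.id
    WHERE event.platform = 'ios'
""")

# The two-subquery form: each condition is checked against its own event rows.
users_with_both = scalar("""
    SELECT COUNT(*) FROM user
    WHERE user.id IN (SELECT DISTINCT U0.user_id FROM event U0
                      WHERE U0.platform = 'android')
      AND user.id IN (SELECT DISTINCT U0.user_id FROM event U0
                      WHERE U0.platform = 'ios')
""")

print(joined_ios_rows, same_row_both, distinct_ios_users, users_with_both)
```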

Rails - get distinct events, sorted by the start date of associated event instances

I've spent several hours going through StackOverflow and playing around with this query, but still can't get it to work! Hopefully an expert here on SO can make the pain go away...
I have two models, Event and EventInstance. An Event has_many EventInstances.
What I want to do is easily get a list of Events (not EventInstances), where:
Events are distinct and not repeated
Events are sorted by the start_date of the nearest EventInstance
Event instances have the attribute :active => true
Only event instances that have a start date in the future are returned
I currently have the query
Event.joins(:event_instances).select('distinct events.*').where('event_instances.start_date >= ?', Time.now).where('event_instances.active = true')
This returns a list of events, but not sorted by date. Excellent - so I am almost there!
If I change the query to add this on the end:
.order('event_instances.start_date')
I get the error:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
So I moved it to the select statement:
select('distinct event_instances.start_date, events.*')
Now I get
PG::UndefinedFunction: ERROR: function count(date, events) does not exist
I've tried moving methods around, using includes, everything but I still can't get it to work. Any help would be really appreciated! Thank you.
try changing
.order('event_instances.start_date')
to
.order(:event_instances.start_date)
or if you need descending order add the .reverse_order method to the end of the query
This is the exact query which worked for my models Post and PostComments on both MySQL and PostgreSQL:
Post.joins(:post_comments).select('distinct post_comments.body, post_comments.created_at').order('post_comments.created_at desc')
So for you, its equivalent should work too. If it still doesn't, then please update your post with the fields of your model.
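
An alternative to SELECT DISTINCT for this kind of problem is to group by the event and order by the earliest qualifying instance, which sidesteps the "ORDER BY expressions must appear in select list" restriction. The sketch below shows the plain SQL, run with Python's sqlite3 module. The schema, the sample rows, the alias next_start and the helper name upcoming_events are assumptions for illustration, and the Active Record equivalent (a group on events.id plus an order on the MIN expression) is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_instances (
        id INTEGER PRIMARY KEY, event_id INTEGER, start_date TEXT, active INTEGER
    );
    INSERT INTO events (id, name) VALUES (1, 'Yoga'), (2, 'Chess'), (3, 'Choir');
    INSERT INTO event_instances (event_id, start_date, active) VALUES
        (1, '2030-03-01', 1), (1, '2030-01-10', 1), (1, '2020-01-01', 1),
        (2, '2030-01-05', 1), (2, '2030-01-01', 0),
        (3, '2020-06-01', 1);
""")

# One row per event, sorted by its nearest active instance in the future.
UPCOMING = """
    SELECT events.id, events.name,
           MIN(event_instances.start_date) AS next_start
    FROM events
    JOIN event_instances ON event_instances.event_id = events.id
    WHERE event_instances.start_date >= ?
      AND event_instances.active = 1
    GROUP BY events.id, events.name
    ORDER BY next_start
"""


def upcoming_events(now):
    """Distinct events that have an active instance on or after `now`."""
    return conn.execute(UPCOMING, (now,)).fetchall()


for row in upcoming_events("2025-01-01"):
    print(row)
```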