Rails query for associated model with max column value - sql

I have 3 models in Rails: Author, Book, and Page. Pages belong to a book, and books belong to an author, like so:
class Author < ActiveRecord::Base
has_many :books
end
class Book < ActiveRecord::Base
belongs_to :author
has_many :pages
end
class Page < ActiveRecord::Base
belongs_to :book
end
The Page model has a column called page_number. I'm using Postgres.
My question is this: assuming I have an author @author, how do I query for all of that author's last pages? In other words, I want the last page of each book written by that author. I am trying the following, which isn't working:
Page.where(book_id: @author.books.pluck(:id)).select('MAX(page_number), *').group(:book_id)
EDIT
The following 2 lines work, but I would love to learn of a faster/cleaner solution:
all_pages = Page.where(book: @author.books)
last_pages = all_pages.select{ |a| !all_pages.select{ |b| b.book_id == a.book_id}.any?{ |c| c.page_number > a.page_number } }

The most efficient way might be to leverage Postgres' window functions.
A query like this doesn't fit into the activerecord common use case, so you may have to use find_by_sql, but it may be well worth it.
In your case, grabbing the book ids first may be a good call, as joining or an additional subquery may not be advantageous—your call.
Let's say you have a list of book ids from @author.books.ids. The next thing we want is a list of pages "grouped by" book so we can pluck off the last page for each group. Let 1,2 be the book ids for the author in question.
We can use a window function and the rank function in postgres to create a resultset where pages are ranked over partitions (group) of book. We'll even sort those partitions of pages by the page number so that the maximum page number (last page) is at the top of each partition. The query would look like this:
select
*,
rank() over (
partition by book_id order by page_number desc
) as reverse_page_index
from pages
where book_id in (1,2)
Our imagined pages result set would look like this.
author 1, book 1, page 3, rank 1
author 1, book 1, page 2, rank 2
author 1, book 1, page 1, rank 3
author 1, book 2, page 6, rank 1
author 1, book 2, page 5, rank 2
author 1, book 2, page 4, rank 3
author 1, book 2, page 3, rank 4
author 1, book 2, page 2, rank 5
author 1, book 2, page 1, rank 6
The page records are partitioned by book, sorted by page number descending, and given a rank within their partition.
If we then want only the first ranked (last page) for each book after the window calculation is performed, we can use a sub-select like so:
select *
from
(
select
*,
rank() over (
partition by book_id order by page_number desc
) as reverse_page_index
from pages
where book_id in (1,2)
) as pages
where reverse_page_index = 1;
This filters the imagined result set above down to just the page records whose rank (reverse_page_index) is 1 (i.e. the last page).
And now our result set would be:
author 1, book 1, page 3, rank 1
author 1, book 2, page 6, rank 1
You could also order this resultset by last modified or whatever you need.
Toss that query in a find_by_sql and you'll have some activerecord objects to use.
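As a rough way to try the ranked sub-select without a Rails app, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or newer) rather than Postgres. The table layout and sample rows are made up to mirror the imagined result set above.

```python
import sqlite3

# In-memory stand-in for the pages table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, book_id INTEGER, page_number INTEGER)"
)
rows = [(1, n) for n in (1, 2, 3)] + [(2, n) for n in range(1, 7)]
conn.executemany("INSERT INTO pages (book_id, page_number) VALUES (?, ?)", rows)

# Rank pages within each book, highest page number first, then keep rank 1.
LAST_PAGES_SQL = """
SELECT book_id, page_number
FROM (
    SELECT pages.*,
           rank() OVER (PARTITION BY book_id ORDER BY page_number DESC) AS reverse_page_index
    FROM pages
    WHERE book_id IN (1, 2)
) AS ranked
WHERE reverse_page_index = 1
ORDER BY book_id
"""
last_pages = conn.execute(LAST_PAGES_SQL).fetchall()
print(last_pages)  # [(1, 3), (2, 6)]
```

Note that rank() gives tied rows the same rank, so a book with two pages sharing the highest page_number would return both of them.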

Related

Rails 4 ActiveRecord: Order records by attribute and association if it exists

I have three models that I am having trouble ordering:
User(:id, :name, :email)
Capsule(:id, :name)
Outfit(:id, :name, :capsule_id, :likes_count)
Like(:id, :outfit_id, :user_id)
I want to get all the Outfits that belong to a Capsule and order them by the likes_count.
This is fairly trivial and I can get them like this:
Outfit.where(capsule_id: capsule.id).includes(:likes).order(likes_count: :desc)
However, I then want to also order the outfits so that if a given user has liked it, it appears higher in the list.
Example if I have the following outfit records:
Outfit(id: 1, capsule_id: 2, likes_count: 1)
Outfit(id: 2, capsule_id: 2, likes_count: 2)
Outfit(id: 3, capsule_id: 2, likes_count: 2)
And the given user has only liked outfit with id 3, the returned order should be IDs: 3, 2, 1
I'm sure this is fairly easy, but I can't seem to get it. Any help would be greatly appreciated :)
Postgres SQL with a subquery
SELECT outfits.*
FROM outfits
LEFT OUTER JOIN (SELECT likes.outfit_id, 1 AS weight
FROM likes
WHERE likes.user_id = #user_id) AS user_likes
ON user_likes.outfit_id = outfits.id
WHERE outfits.capsule_id = #capsule_id
ORDER BY user_likes.weight ASC, outfits.likes_count DESC;
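Postgres sorts NULL values as larger than any non-NULL value by default, so with ASC the outfits the user has liked (weight 1) come first and the rest (weight NULL) come last. I am not sure how this would look as an Arel query. You can try converting it using this cheatsheet.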
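The ordering can be checked against the example records with a small sketch using Python's sqlite3 module. One assumption to flag: SQLite sorts NULLs first under ASC (the opposite of Postgres), so this sketch orders by an explicit "weight IS NULL" flag instead of relying on NULL placement. The likes rows and the user id 7 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outfits (id INTEGER PRIMARY KEY, capsule_id INTEGER, likes_count INTEGER);
CREATE TABLE likes (id INTEGER PRIMARY KEY, outfit_id INTEGER, user_id INTEGER);
INSERT INTO outfits (id, capsule_id, likes_count) VALUES (1, 2, 1), (2, 2, 2), (3, 2, 2);
-- hypothetical likes: user 7 has liked only outfit 3
INSERT INTO likes (outfit_id, user_id) VALUES (3, 7), (3, 8), (2, 8), (2, 9), (1, 9);
""")

ORDER_SQL = """
SELECT outfits.id
FROM outfits
LEFT OUTER JOIN (SELECT outfit_id, 1 AS weight
                 FROM likes
                 WHERE user_id = :user_id) AS user_likes
  ON user_likes.outfit_id = outfits.id
WHERE outfits.capsule_id = :capsule_id
-- liked outfits (flag 0) first, then everything else by likes_count
ORDER BY (user_likes.weight IS NULL) ASC, outfits.likes_count DESC
"""
ordered_ids = [row[0] for row in conn.execute(ORDER_SQL, {"user_id": 7, "capsule_id": 2})]
print(ordered_ids)  # [3, 2, 1]
```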

Django ALL in query (as opposed to OR / plain IN clause)

Say I have 2 models joined via many-to-many:
class Person(models.Model):
name = models.CharField(max_length=200, null=False, blank=False)
sports = models.ManyToManyField('Sport')
class Sport(models.Model):
name = models.CharField(max_length=200, null=False, blank=False)
people = models.ManyToManyField('Person')
I'd like to perform an AND query to filter Person by those who play ALL sports given a list of sport ids. So something like:
Person.objects.filter(sports__id__all=[1,2,3])
Or, said differently, exclude anyone that doesn't play ALL the sports.
Filtering (retaining People that play all given Sports)
The solution is not trivial. However, if you can calculate the length of the list, it can be done by counting the overlap between the list of sports and the sports a Person plays:
from django.db.models import Count
sports_list = [1, 2, 3]
Person.objects.filter(
sports__in=sports_list
).annotate(
overlap=Count('sports')
).filter(overlap=len(sports_list))
So in case the number of sports of a Person that are in the sports_list is the number of elements in the sports_list, then we know that person plays all those sports.
Non-unique Sports in the sports_list
Note that sports_list should contain unique Sport objects (or ids). You can however build a set of sports, for example:
# in case a sport can occur multiple times in the list
from django.db.models import Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.filter(
sports__in=sports_set
).annotate(
overlap=Count('sports')
).filter(overlap=len(sports_set))
SQL query
Behind the curtains, we will construct a query like:
SELECT `person`.*
FROM `person`
INNER JOIN `person_sport` ON `person`.`id` = `person_sport`.`person_id`
WHERE `person_sport`.`sport_id` IN (1, 2, 3)
GROUP BY `person`.`id`
HAVING COUNT(`person_sport`.`sport_id`) = 3
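To see the GROUP BY / HAVING query in action without a Django project, here is a small sketch using Python's sqlite3 module. The person and person_sport tables and their rows are hypothetical stand-ins for the tables Django would generate, and the SQL is written by hand rather than captured from the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_sport (person_id INTEGER, sport_id INTEGER);
INSERT INTO person (id, name) VALUES (1, 'plays all'), (2, 'plays two'), (3, 'plays none');
INSERT INTO person_sport (person_id, sport_id)
VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
""")

# De-duplicate the requested sports first, as the answer recommends.
sports = sorted({1, 2, 3, 2, 3, 3})
placeholders = ", ".join("?" for _ in sports)
sql = f"""
SELECT person.id
FROM person
INNER JOIN person_sport ON person.id = person_sport.person_id
WHERE person_sport.sport_id IN ({placeholders})
GROUP BY person.id
HAVING COUNT(person_sport.sport_id) = ?
"""
# A person matches only when the number of overlapping sports equals the list size.
matching_ids = [row[0] for row in conn.execute(sql, [*sports, len(sports)])]
print(matching_ids)  # [1]
```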
Excluding (retaining People that do not play all given Sports)
A related problem is the opposite: excluding the people who play all the specified sports, and keeping everyone else. We can do this as well, but there is a catch: people who play no sport at all would also be dropped, since the first .filter(..) removes them. We can slightly change the code so that they are included as well:
# opposite problem: excluding those people
from django.db.models import Q, Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.filter(
Q(sports__in=sports_set) | Q(sports__isnull=True)
).annotate(
overlap=Count('sports')
).exclude(overlap=len(sports_set))

Group by query, each group has to not have any item not in a List

I need a query that will help me solve this.
Here's my table 'tags':
id (int)
name (String)
user_id (int)
hardware_id (int)
I am grouping the results of the 'tags' table by hardware_id. I also have a List of tags (List<string>).
I want to get the hardware_id of each group in which every tag in the custom List matches a name in the table above.
In other words, I want the hardware_ids whose rows' names cover all the tags in the custom List. A group may contain names that have no match in the custom List, but every tag in the custom List must be present in the group; if a group satisfies this, I want the Id of that group.
I found it hard to explain and I didn't get an answer for that. I thought about doing it with foreach because it was so hard to solve, but I couldn't do it either and it's very inefficient to do it that way.
Example:
List : ['tag1', 'tag2']
Table Rows:
1, tag1, 5, 1
2, tag2, 5, 1
3, tag3, 5, 1
4, tag4, 5, 2
5, tag5, 6, 2
In this case, I should get the hardware_id of 1: although that group also has tag3, which is not in the List, it contains every tag that is in the List. IF the List had 'tag4', hardware_id = 1 WOULDN'T be returned, because the List would have a tag that the hardware_id group doesn't have.
If the Group doesn't have an item that the List has, it won't appear in the final result.
Someone gave me this code, but it didn't work:
List<decimal> matchingHardareIds = db.tags.GroupBy(x => x.hardware_id)
.Where(x => x.All(s => tags.Contains(s.name.ToLower()) || 0 == tags.Count() && (s.user_id == userId)))
.Select(x => x.Key).ToList();
With that query, when I have one tag in the List and the table has several rows with hardware_id 1, one of which has a 'name' equal to the value in the List, it returns empty results. This is because that hardware_id group also has a row whose name doesn't appear in the custom List.
I need a query in either Entity Framework or Linq. Thanks a lot.
Use this:
var t = db.tags.GroupBy(x => x.hardware_id)
.Where(x => tags.All(y =>
x.Any(z=> z.name == y)))
.Select(x=>x.Key).ToList();
I can't provide the Entity Framework or LINQ query, but the SQL solution is to count the matches like this:
SELECT hardware_id
FROM tags
WHERE name IN (<list>)
GROUP BY hardware_id
HAVING COUNT(DISTINCT name) = <listDistinctCount>
<listDistinctCount> is the count of distinct values in the list, which you can compute before running the query.
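The rule both answers implement (keep a hardware_id only when its group contains every tag in the list) can be sketched in plain Python against the example rows from the question. The function name is hypothetical, and this is an in-memory illustration of the logic, not a translation of the LINQ or SQL to run against a database.

```python
from collections import defaultdict

# Example rows from the question: (id, name, user_id, hardware_id).
rows = [
    (1, "tag1", 5, 1),
    (2, "tag2", 5, 1),
    (3, "tag3", 5, 1),
    (4, "tag4", 5, 2),
    (5, "tag5", 6, 2),
]


def hardware_ids_with_all_tags(rows, wanted):
    """Return hardware_ids whose group contains every tag in `wanted`."""
    names_by_hardware = defaultdict(set)
    for _id, name, _user_id, hardware_id in rows:
        names_by_hardware[hardware_id].add(name)
    wanted = set(wanted)  # duplicates in the list do not matter
    # A group qualifies when the wanted tags are a subset of its names;
    # extra names in the group (like tag3) are allowed.
    return sorted(hw for hw, names in names_by_hardware.items() if wanted <= names)


print(hardware_ids_with_all_tags(rows, ["tag1", "tag2"]))  # [1]
print(hardware_ids_with_all_tags(rows, ["tag1", "tag4"]))  # []
```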

How to order by largest amount of identical entries with Rails?

I have a survey where users can post answers and since the answers are being saved in the db as a foreign key for each question, I'd like to know which answer got the highest rating.
So if the DB looks somewhat like this:
answer_id
1
1
2
how can I find that the answer with an id of 1 was selected more times than the one with an id of 2 ?
EDIT
So far I've done this:
@question = AnswerContainer.where(user_id: params[:user_id]) which lists the things a given user has voted for, but, obviously, that's not what I need.
You could try:
YourModel.group(:answer_id).count
For your example this returns something like: {1 => 2, 2 => 1}
You can do group by and then sort
Select answer_id, count(*) as maxsel
From poll
Group by answer_id
Order by maxsel desc
As stated in rails documentation (http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html) when you use group with count, active record "returns a Hash whose keys represent the aggregated column, and the values are the respective amounts"
Person.group(:city).count
# => { 'Rome' => 5, 'Paris' => 3 }
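The "group by, then sort" SQL from the second answer can be tried with a short sketch using Python's sqlite3 module. The poll table name comes from that answer; the rows are the three answer_id values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (answer_id INTEGER)")
conn.executemany("INSERT INTO poll (answer_id) VALUES (?)", [(1,), (1,), (2,)])

# Count votes per answer and put the most-selected answer first.
counts = conn.execute("""
SELECT answer_id, COUNT(*) AS maxsel
FROM poll
GROUP BY answer_id
ORDER BY maxsel DESC
""").fetchall()

print(counts)        # [(1, 2), (2, 1)]
print(counts[0][0])  # 1, the answer chosen most often
```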

SQL: Get a selected row index from a query

I have an application that stores players' ratings for each tournament. So I have a many-to-many association:
Tournament
has_many :participations, :order => 'rating desc'
has_many :players, :through => :participations
Participation
belongs_to :tournament
belongs_to :player
Player
has_many :participations
has_many :tournaments, :through => :participations
The Participation model has a rating field (float) that stores rating value (it's like score points) for each player at each tournament.
What I want is the last 10 ranks of a player (a rank is the player's position in a particular tournament based on his rating: the higher the rating, the higher the rank). For now, to get a player's rank in a tournament, I load all participations for that tournament, sort them by the rating field, and get the index of the player's participation with Ruby code:
class Participation < ActiveRecord::Base
belongs_to :player
belongs_to :tournament
def rank
tournament.participations.index(self)
end
end
The rank method of a participation gets its parent tournament, loads all of the tournament's participations (ordered by rating desc) and gets its own index inside this collection,
and then something like:
player.participations.last.rank
The one thing I don't like is that it needs to load all participations for the tournament, and if I need a player's ranks for the last 10 tournaments it loads over 5,000 items (and that number will grow as new players are added).
I believe that there should be way to use SQL for it. Actually I tried to use SQL variables:
find_by_sql("select @row:=@row+1 `rank`, p.* from participations p, (SELECT @row:=0) r where(p.tournament_id = #{tournament_id}) order by rating desc limit 10;")
This query selects top-10 ratings from the given tournament. I've been trying to modify it to select last 10 participations for a given user and his rank.
I'd appreciate any kind of help. (I think the solution will be a SQL query, since it's pretty complex for ActiveRecord.)
P.S. I'm using rails3.0.0.beta4
UPD:
Here is final sql request that gets last 10 ranks of the player (in addition it loads the participated tournaments as well)
SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;
First of all, should this:
Participation
belongs_to :tournament
belongs_to :players
be this?
Participation
belongs_to :tournament
belongs_to :player
Ie, singular player after the belongs_to?
I'm struggling to get my head around what this is doing:
class Participation
def rank_at_tour(tour)
tour.participations.index(self)
end
end
You don't really explain enough about your schema to make it easy to reverse engineer. Is it doing the following...?
"Get all the participations for the given tour and return the position of this current participation in that list"? Is that how you calculate rank? If so i agree it seems like a very convoluted way of doing it.
Do you do the above for the ten participation objects you get back for the player and then take the average? What is rating? Does that have anything to do with rank? Basically, can you explain your schema a bit more and then restate what you want to do?
EDIT
I think you just need a more efficient way of finding the position. There's one way i could think of off the top of my head - get the record you want and then count how many are above it. Add 1 to that and you get the position. eg
class Participation
def rank_at_tour(tour)
tour.participations.where("rating > ?", rating).count + 1
end
end
You should see in your log file (eg while experimenting in the console) that this just makes a count query. If you have an index on the rating field (which you should add if you don't have one) then this will be a very fast query to execute.
Also - if tour and tournament are the same thing (as i said you seem to use them interchangeably) then you don't need to pass tour to participation since it belongs to a tour anyway. Just change the method to rank:
class Participation
def rank
tour.participations.where("rating > ?", rating).count + 1
end
end
SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;
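The "count how many are above it" idea behind this query can be exercised with a small sketch using Python's sqlite3 module. The tours and participations rows are invented sample data, and the column is aliased player_rank (a made-up name) with 1 added so that the best rating gets rank 1, as in the Ruby version above; the SQL above without the + 1 yields a zero-based position instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tours (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE participations (
    id INTEGER PRIMARY KEY, tour_id INTEGER, player_id INTEGER, rating REAL
);
-- hypothetical sample data: two tours, player 68 took part in both
INSERT INTO tours (id, date) VALUES (1, '2010-06-01'), (2, '2010-07-01');
INSERT INTO participations (tour_id, player_id, rating) VALUES
    (1, 68, 10.0), (1, 69, 20.0), (1, 70, 5.0),
    (2, 68, 30.0), (2, 69, 15.0);
""")

RANK_SQL = """
SELECT p1.tour_id,
       (SELECT COUNT(*) FROM participations AS p2
        WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating) + 1 AS player_rank
FROM participations AS p1
LEFT JOIN tours ON tours.id = p1.tour_id
WHERE p1.player_id = ?
ORDER BY tours.date DESC
LIMIT 10
"""
# Each row is (tour_id, rank of player 68 in that tour), newest tour first.
ranks = conn.execute(RANK_SQL, (68,)).fetchall()
print(ranks)  # [(2, 1), (1, 2)]
```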