SQL: Get a selected row index from a query - sql

I have an application that stores player ratings for each tournament, so I have a many-to-many association:
Tournament
has_many :participations, :order => 'rating desc'
has_many :players, :through => :participations
Participation
belongs_to :tournament
belongs_to :player
Player
has_many :participations
has_many :tournaments, :through => :participations
The Participation model has a rating field (float) that stores rating value (it's like score points) for each player at each tournament.
What I want is to get the last 10 ranks of a player (rank is the player's position in a particular tournament based on his rating: the higher the rating, the higher the rank). For now, to get a player's rank in a tournament I load all participations for that tournament, sort them by the rating field and get the index of the player's participation with Ruby code:
class Participation < ActiveRecord::Base
belongs_to :player
belongs_to :tournament
def rank
tournament.participations.index(self)
end
end
The rank method of a participation gets its parent tournament, loads all of the tournament's participations (ordered by rating desc) and returns its own index inside this collection.
And then something like:
player.participations.last.rank
The one thing I don't like is that it needs to load all participations for the tournament, and if I need the player's ranks for the last 10 tournaments it loads over 5,000 items (and that number will grow as new players are added).
I believe there should be a way to use SQL for this. I actually tried to use SQL variables:
find_by_sql("select @row:=@row+1 `rank`, p.* from participations p, (SELECT @row:=0) r where (p.tournament_id = #{tournament_id}) order by rating desc limit 10;")
This query selects the top 10 ratings from the given tournament. I've been trying to modify it to select the last 10 participations for a given player together with his rank in each.
I will appreciate any kind of help. (I think the solution will be a SQL query, since this is pretty complex for ActiveRecord.)
P.S. I'm using rails3.0.0.beta4
UPD:
Here is the final SQL query that gets the last 10 ranks of the player (in addition it loads the tournaments he participated in):
SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;

First of all, should this:
Participation
belongs_to :tournament
belongs_to :players
be this?
Participation
belongs_to :tournament
belongs_to :player
Ie, singular player after the belongs_to?
I'm struggling to get my head around what this is doing:
class Participation
def rank_at_tour(tour)
tour.participations.index(self)
end
end
You don't really explain enough about your schema to make it easy to reverse engineer. Is it doing the following...?
"Get all the participations for the given tour and return the position of this current participation in that list"? Is that how you calculate rank? If so i agree it seems like a very convoluted way of doing it.
Do you do the above for the ten participation objects you get back for the player and then take the average? What is rating? Does that have anything to do with rank? Basically, can you explain your schema a bit more and then restate what you want to do?
EDIT
I think you just need a more efficient way of finding the position. There's one way i could think of off the top of my head - get the record you want and then count how many are above it. Add 1 to that and you get the position. eg
class Participation
def rank_at_tour(tour)
tour.participations.where("rating > ?", self.rating).count + 1
end
end
You should see in your log file (eg while experimenting in the console) that this just makes a count query. If you have an index on the rating field (and you should add one if you don't) then this will be a very fast query to execute.
Also - if tour and tournament are the same thing (as i said, you seem to use them interchangeably) then you don't need to pass tour to the participation, since it belongs to a tour anyway. Just change the method to rank:
class Participation
def rank
self.tour.participations.where("rating > ?", self.rating).count + 1
end
end
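To see why "count how many are above it and add 1" gives the same answer as sorting everything and looking up the index, here is a minimal plain-Ruby sketch with no database involved. The Entry struct and the sample ratings are made up for illustration; in the real app the counting would be done by the SQL count query.

```ruby
# Hypothetical in-memory stand-in for the participations of one tournament.
Entry = Struct.new(:player, :rating)

entries = [
  Entry.new("ann", 1510.0),
  Entry.new("bob", 1720.5),
  Entry.new("cid", 1490.0),
  Entry.new("dee", 1650.0)
]

# Approach from the question: sort everything, then find the index (0-based).
def rank_by_index(entries, entry)
  entries.sort_by { |e| -e.rating }.index(entry)
end

# Approach from the answer: count strictly higher ratings, add 1 (1-based).
def rank_by_count(entries, entry)
  entries.count { |e| e.rating > entry.rating } + 1
end

dee = entries[3]
puts rank_by_index(entries, dee) # 0-based position in the sorted list
puts rank_by_count(entries, dee) # 1-based rank
```

One difference worth knowing: with tied ratings the counting approach gives tied players the same rank, while the index approach depends on how the sort happens to order them.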

SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;

Related

rails return list of users with average rating above 5

I have 2 models User and Rating as follows:
class User < ActiveRecord::Base
has_many :ratings
end
class Rating < ActiveRecord::Base
belongs_to :user
end
Each user receives multiple ratings from 1 - 10. I want to return all users with an average rating of > 5. I've got this so far...
User.joins(:ratings).where('rating > ?', 5)
But that code returns all Users with any rating above 5. I want Users with an Average rating above 5.
I've seen other posts like this and that asking similar questions, but I'm having a brainfart today and can't translate their answers to my case.
If you're looking at all users, why join first?
@avg = Rating.group(:user_id).average("rating") # returns a hash of user_id => average rating
@avg.each do |user_id, average|
puts user_id if average > 5
end
You need to define an average method for the user's ratings.
Check the link below, it is a good example of computing an average as a float.
How do I create an average from a Ruby array?
Hope this helps someone in the future. This will find the average rating of each user through the ratings table, and return all users with an average rating above 5.
User.joins(:ratings).merge(Rating.group(:user_id).having('AVG(rating) > 5'))
.having was my missing link. More examples of .having here and here
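For anyone wondering how HAVING differs from the WHERE in the question, here is a rough plain-Ruby sketch of the same logic on an in-memory array. The sample rows are invented; the point is only that WHERE filters individual ratings while HAVING filters whole groups after averaging.

```ruby
# Hypothetical rows standing in for the ratings table: [user_id, rating].
ratings = [
  [1, 9], [1, 4],   # user 1 averages 6.5
  [2, 6], [2, 2],   # user 2 averages 4.0 but has one rating above 5
  [3, 5], [3, 5]    # user 3 averages exactly 5.0
]

# WHERE rating > 5: any user with at least one high rating gets through.
naive = ratings.select { |_, rating| rating > 5 }.map(&:first).uniq

# GROUP BY user_id, then AVG(rating) per group.
averages = ratings.group_by { |user_id, _| user_id }.map do |user_id, rows|
  [user_id, rows.sum { |_, rating| rating }.to_f / rows.size]
end.to_h

# HAVING AVG(rating) > 5: keeps or drops each group as a whole.
above_five = averages.select { |_, avg| avg > 5 }.keys

p naive
p averages
p above_five
```

User 2 shows up in the naive result because of a single rating of 6, but is dropped once the filter is applied to the average.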

Resume of product query

I have the following schema table:
I have three activerecord models with their associations. I am struggling with a query which will show the following information for each product:
Product Name, Money Total, Quantity Sold Total
It should also take into account the status of the order that the product lines are associated with, which should be equal to "successful".
I also want a second query which shows the above but is restricted by month (based on the orders.created_at column), for example the sales of each product for January:
Product Name, Total Money so far, Quantity total, Month
I managed to create something, but I think it isn't very optimized, and I used Ruby's group_by, which triggers many additional queries in the view. I would appreciate hearing how you usually start thinking about a query like that.
Update
I think I almost managed to solve the first query and it is the following:
products = Product.joins(:product_lines).select("products.name, SUM(product_lines.quantity) as sum_amount, SUM(product_lines.quantity*products.price) as money_total").group("products.id")
I tried to work out each column separately. I haven't taken the order status into account yet, though.
The associations are the following:
ProductLine
class ProductLine < ActiveRecord::Base
belongs_to :order
belongs_to :cart
belongs_to :product
end
Product
class Product < ActiveRecord::Base
has_many :product_lines
end
Order
class Order < ActiveRecord::Base
has_many :product_lines, dependent: :destroy
end
I finally did it.
First query:
@best_products_so_far = Product.joins(product_lines: :order)
.select("products.*, SUM(product_lines.quantity) as amount_total, SUM(product_lines.quantity*products.price) as money_total")
.where("orders.status = 'successful'")
.group("products.id")
Second query:
@best_products_this_month = Product.joins(product_lines: :order)
.select("products.*, SUM(product_lines.quantity) as amount_total, SUM(product_lines.quantity*products.price) as money_total")
.where("orders.status = 'successful'")
.where("extract(month from orders.completed_at) = ?", Date.today.strftime("%m"))
.group("products.id")
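As a way of thinking about queries like these, it can help to write the same steps on plain arrays first: filter the rows (WHERE), group them (GROUP BY), then sum per group (SUM). Below is a small sketch of that with no database; all names, prices and quantities are invented, and the month is stored as a plain integer to keep it short.

```ruby
# Hypothetical data standing in for products, orders and product_lines.
products = { 1 => { name: "Pen", price: 2.0 }, 2 => { name: "Ink", price: 5.0 } }
orders = {
  10 => { status: "successful", month: 1 },
  11 => { status: "cancelled",  month: 1 },
  12 => { status: "successful", month: 2 }
}
lines = [
  { product_id: 1, order_id: 10, quantity: 3 },
  { product_id: 1, order_id: 11, quantity: 7 }, # ignored: order not successful
  { product_id: 1, order_id: 12, quantity: 1 },
  { product_id: 2, order_id: 12, quantity: 2 }
]

def totals(products, orders, lines, month: nil)
  # WHERE orders.status = 'successful' (and optionally the month filter).
  kept = lines.select do |line|
    order = orders.fetch(line[:order_id])
    order[:status] == "successful" && (month.nil? || order[:month] == month)
  end
  # GROUP BY products.id, then SUM(quantity) and SUM(quantity * price).
  kept.group_by { |line| line[:product_id] }.map do |product_id, rows|
    product = products.fetch(product_id)
    quantity = rows.sum { |row| row[:quantity] }
    { name: product[:name], amount_total: quantity, money_total: quantity * product[:price] }
  end
end

p totals(products, orders, lines)           # all successful orders
p totals(products, orders, lines, month: 1) # successful orders in month 1 only
```

Once the steps are clear on arrays, each one maps to a clause of the ActiveRecord chain, and the database does the work in a single query.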

Rails 3 query matching attribute of has_one association that is a subset of has_many association

The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with attributes of its latest status. So the model has_many statuses, along with a has_one to easily access its latest one:
class Car < ActiveRecord::Base
has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"
delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true
# ...
end
To give you an idea of what the statuses hold:
loc = Car.first.latest_location # Location object (id = 1 for example)
loc.name # "Miami, FL"
Let's say I wanted to have a (chainable) scope to find all cars with a latest location id of 1. Currently I have a sort of complex method:
# car.rb
def self.by_location_id(id)
ids = []
find_each(include: :latest_status) do |car|
ids << car.id if car.latest_status.try(:location_id) == id.to_i
end
where("id in (?)", ids)
end
There may be a quicker way to do this using SQL, but not sure how to only get the latest status for each car. There may be many status records with a location_id of 1, but if that's not the latest location for its car, it should not be included.
To make it harder... let's add another level and be able to scope by location name. I have this method, preloading statuses along with their location objects to be able to access the name:
def self.by_location_name(loc)
ids = []
find_each(include: {latest_status: :location}) do |car|
ids << car.id if car.latest_location.try(:name) =~ /#{loc}/i
end
where("id in (?)", ids)
end
This will match the location above with "miami", "fl", "MIA", etc... Does anyone have any suggestions on how I can make this more succinct/efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
Using Postgres 9.1 (hosted on Heroku cedar stack)
All right. Since you're using Postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (scope to filter by location of last status):
This solution takes advantage of PostgreSQL's support for window (analytic) functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/
I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
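The idea behind the window-function query (number each car's statuses from newest to oldest, keep row 1, and only then filter by location) can be sketched in plain Ruby. The Status struct, the integer timestamps and the sample rows are invented for illustration.

```ruby
# Hypothetical stand-in for status rows; created_at is a plain integer here.
Status = Struct.new(:car_id, :location_id, :created_at)

statuses = [
  Status.new(1, 1, 100),
  Status.new(1, 2, 200), # car 1's latest status is at location 2
  Status.new(2, 3, 150),
  Status.new(2, 1, 300), # car 2's latest status is at location 1
  Status.new(3, 1, 50)   # car 3 has a single status, at location 1
]

# Like row_number() per car ordered newest first, keeping only row 1.
latest = statuses.group_by(&:car_id).map do |_, rows|
  rows.max_by(&:created_at)
end

# Like the outer "where location_id = 1", applied after picking the latest row.
car_ids = latest.select { |s| s.location_id == 1 }.map(&:car_id)

p car_ids
```

Car 1 has an older status at location 1, but it is excluded because the filter only sees each car's newest row. Filtering by location before picking the latest row would wrongly include it.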
Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:
def self.by_location(loc)
joins(
self.escape_sql('join (
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
) as subquery on subquery.car_id = cars.id', loc)
).order('subquery.created_at desc')
end
Join will act as a filter, giving you only the Car objects that were involved in the subquery.
Note: In order to refer to escape_sql as I do above, you'll need to modify ActiveRecord::Base slightly. I do this by adding this to an initializer in the app (which I place in config/initializers/active_record.rb):
class ActiveRecord::Base
def self.escape_sql(clause, *rest)
self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
end
end
This allows you to call .escape_sql on any of your models that are based on AR::B. I find this profoundly useful, but if you've got some other way to sanitize sql, feel free to use that instead.
For the second part of the question - unless there are multiple locations with the same name, I'd just do a Location.find_by_name to turn it into an id to pass into the above. Basically this:
def self.by_location_name(name)
loc = Location.find_by_name(name)
by_location(loc)
end

Combining Active Record group, join, maximum & minimum

I'm trying to get to grips with the Active Record query interface. I have two models:
class Movie < ActiveRecord::Base
has_many :datapoints
attr_accessible :genre
end
class Datapoint < ActiveRecord::Base
belongs_to :movie
attr_accessible :cumulative_downloads, :timestamp
end
I want to find the incremental downloads per genre for a given time period.
So far I've managed to get the maximum and minimum downloads per movie within a time period, like so:
maximums = Datapoint.joins(:movie)
.where(["datapoints.timestamp > ?", Date.today - @timespan])
.group('datapoints.movie_id')
.maximum(:cumulative_downloads)
This then allows me to calculate the incremental per movie, before aggregating this into the incremental per genre.
Clearly this is a bit ham-fisted, and I'm sure it would be possible to do this in one step (and using hash conditions). I just can't get my head around how. Can you help?
Much appreciated!
Derek.
I think this will allow you to calculate maximum per genre:
Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre).maximum(:cumulative_downloads)
Edit 1
You can get the diffs in a couple of steps:
rel = Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre)
mins = rel.minimum(:cumulative_downloads)
maxs = rel.maximum(:cumulative_downloads)
res = {}
maxs.each{|k,v| res[k] = v-mins[k]}
Edit 2
Your initial direction was almost there. All you have to do is calculate the diff per movie in the SQL and stage the data so you can collect it with one pass. I'm sure there's a way to do it all in SQL, but I'm not sure it will be as simple.
# get the genre and diff per movie
result = Movie.select('movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff').joins(:datapoints).group(:movie_id)
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
Edit 3
Including the movie_id in the select and the genre in the group:
# get the genre and diff per movie
result = Movie
.select('movies.id, movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff')
.joins(:datapoints)
.group('movies.id, movies.genre')
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
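Here is the same two-step calculation (diff per movie, then sum per genre) on plain arrays, as a sanity check of the logic. The sample rows are invented and no database is involved.

```ruby
# Hypothetical datapoints: [movie_id, genre, cumulative_downloads].
points = [
  [1, "drama", 100], [1, "drama", 160],
  [2, "drama", 40],  [2, "drama", 45],
  [3, "comedy", 10], [3, "comedy", 90]
]

per_genre = Hash.new(0)

# Step 1: group per movie and take MAX - MIN of the cumulative counter.
# Step 2: add each movie's diff to its genre's running total.
points.group_by { |movie_id, genre, _| [movie_id, genre] }.each do |(_, genre), rows|
  downloads = rows.map { |_, _, count| count }
  per_genre[genre] += downloads.max - downloads.min
end

p per_genre
```

Note that taking the maximum and minimum per genre directly would not give the same numbers: for "drama" above that would be 160 - 40, which mixes the counters of two different movies.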

Rails 3 Order Records By Grand-child Count

I'm trying to do some fairly complicated record sorting that I was having a bit of trouble with. I have three models:
class User < ActiveRecord::Base
has_many :registers
has_many :results, :through => :registers
#Find all the Users that exist as registrants for a tournament
scope :with_tournament_entrees, :include => :registers, :conditions => "registers.id IS NOT NULL"
end
Register
class Register < ActiveRecord::Base
belongs_to :user
has_many :results
end
Result
class Result < ActiveRecord::Base
belongs_to :register
end
Now on a Tournament result page I list all users by their total wins (wins is calculated through the results table). First thing first I find all users who have entered a tournament with the query:
User.with_tournament_entrees
With this I can simply loop through the returned users and query each individual record with the following to retrieve each user's "Total Wins":
user.results.where("win = true").count()
However I would also like to take this a step further and order all of the users by their "Total Wins", and this is the best I could come up with:
User.with_tournament_entrees.select('SELECT *,
(SELECT count(*)
FROM results
INNER JOIN "registers"
ON "results"."register_id" = "registers"."id"
WHERE "registers"."user_id" = "users.id"
AND (win = true)
) AS total_wins
FROM users ORDER BY total_wins DESC')
I think it's close, but it doesn't actually order by the total_wins in descending order as I instruct it to. I'm using a PostgreSQL database.
Edit:
There are actually three selects taking place; the first occurs in User.with_tournament_entrees, which just performs a quick filter on the users table. If I ignore that and try
SELECT *, (SELECT count(*) FROM results INNER JOIN "registers" ON "results"."register_id" = "registers"."id" WHERE "registers"."user_id" = "users.id" AND (win = true)) AS total_wins FROM users ORDER BY total_wins DESC;
it fails in both psql and the IRB console. I get the error message:
PGError: ERROR: column "users.id" does not exist
I think this happens because the inner select runs before the outer select, so it doesn't have access to the user id beforehand. I'm not sure how to give it access to all user ids before the inner select runs, but this isn't an issue when I do User.with_tournament_entrees followed by the query.
In your SQL, "users.id" is quoted wrong -- it's telling Postgres to look for a column named, literally, "users.id".
It should be "users"."id", or, just users.id (you only need to quote it if you have a table/column name that conflicts with a postgres keyword, or have punctuation or something else unusual).
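To make the quoting rule concrete, here is a tiny Ruby helper that quotes each part of a dotted name separately. quote_identifier is a made-up name for this sketch; in a Rails app you would more likely rely on the connection's own quote_table_name and quote_column_name.

```ruby
# Hypothetical helper: quote each part of a dotted identifier on its own,
# PostgreSQL style. Embedded double quotes are doubled.
def quote_identifier(name)
  name.split(".").map { |part| %("#{part.gsub('"', '""')}") }.join(".")
end

puts quote_identifier("users.id")
puts quote_identifier("id")
```

Wrapping the whole string in one pair of quotes instead produces a single identifier that contains a dot, which is exactly what the error message is complaining about.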