Order results by weighted variables? - sql

I have a listing of ~10,000 apps and I'd like to order them by certain columns, but I want to give certain columns more "weight" than others.
For instance, each app has overall_ratings and current_ratings. If the app has a lot of overall_ratings, that's worth 1.5, but the number of current_ratings would be worth, say 2, since the number of current_ratings shows the app is active and currently popular.
Right now there are probably 4-6 of these variables I want to take into account.
So, how can I pull that off? In the query itself? After the fact using just Ruby (remember, there are over 10,000 rows that would need to be processed here)? Something else?
This is a Rails 3.2 app.

Sorting 10000 objects in plain Ruby doesn't seem like a good idea, specially if you just want the first 10 or so.
You can try to put your math formula in the query (using the order method from Active Record).
However, my favourite approach would be to create a float attribute to store the score and update that value with a before_save method.
I would read about dirty attributes so you only perform this scoring when some of you're criteria is updated.
You may also create a rake task that re-scores your current objects.
This way you would keep the scoring functionality in Ruby (you could test it easily) and you could add an index to your float attribute so database queries have better performance.

One attempt would be to let the DB do this work for you with some query like: (can not really test it because of laking db schema):
ActiveRecord::Base.connection.execute("SELECT *,
(2*(SELECT COUNT(*) FROM overall_ratings
WHERE app_id = a.id) +
1.5*(SELECT COUNT(*) FROM current_ratings
WHERE app_id = a.id)
AS rating FROM apps a
WHERE true HAVING rating > 3 ORDER BY rating desc")
Idea is to sum the number of ratings found for each current and overall rating with the subqueries for an specific app id and weight them as desired.

Related

Best way for getting users friends top rating with Redis SORTED SET

I have SORTED SET user_id:rating for every level in the game(2000+ levels). There is 2 000 000 users in set.
I need to create 2 ratings - first - all users top 100, second - top 5 friends each player
First can be solved very easily with ZRANGE
But there is a problem with second, because in average - every user has 500 friends
There is 2 ways:
1) I can do 500 requests with ZSCORE\ZRANK and sort users on by backend (too many requests, bad performance)
2) I can create SORTED SET for each user and update it on background on every users update. (more data, more ram, more complex)
May be there are any others options I missed?
I believe your main concern here should be your data model. Does every user have a sorted set of his friends?
I would recommend something like this:
users:{id}:friends values as the ids of friends
users:scoreboard values as the users ids and score as the rating
of each
As an answer to your first concern, you can consider using pipelines, which will reduce the number of requests drastically, none the less you will still need to handle ordering the results.
The better answer for you problem would be, in case you have the two sorted sets as described earlier:
Get the intersection between the two, using the "zinterstore" command and storing the result in a sorted set created solely for this purpose. As a result, the new sorted set will contain all the user's friends ids with their rating as the score (need to be careful here since you will need to specify the score of the new sorted set, it can either be the SUM, MIN or MAX of the scores).
ref: http://redis.io/commands/zinterstore
At this point using a simple "zrevrangebyscore" and specifying a limit, will leverage the sorted result you are looking for.

Rails 3 and mongoid: How would you about sorting / grouping collect to display as a 2 dimension table?

I have User model with many fields and I would like to display a
table as a matrix of 2 of those fields:
- created_at
- type
For the created_at I simply used a group_by as so:
(User.where(:type => "blabla" ).all.group_by { |item|
item.send(:created_at).strftime("%Y-%m-%d") }).sort.each do |
creation_date, users|
This gives me a nice array of all the users per creation_date, so the
lines on my table are ok. However I want to display multiple lines,
each representing the sub selection of the users per type.
So for the moment, I am performing one request per line (per type,
simply replacing the "blabla").
For the moment it's ok because I have
just a few type, but this number will soon increase a lot more, and at
this will not be efficient I am afraid.
Any suggestion on how I could achieve my expected results ?
Thanks,
Alex
The general answer here is to perform a Map / Reduce. Generally, you do not perform the map-reduce in real time due to performance constraints. Instead you run the map-reduce on a schedule and query against the results directly.
Here's a primer on map-reduce for Ruby. Here's another example using Mongoid specifically.

Ruby Rails Complex SQL with aggregate function and DayOfWeek

Rails 2.3.4
I have searched google, and have not found an answer to my dilemma.
For this discussion, I have two models. Users and Entries. Users can have many Entries (one for each day).
Entries have values and sent_at dates.
I want to query and display the average value of entries for a user BY DAY OF WEEK. So if a user has entered values for, say, the past 3 weeks, I want to show the average value for Sundays, Mondays, etc. In MySQL, it is simple:
SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = ? GROUP BY 1
That query will return between 0 and 7 records, depending upon how many days a user has had at least one entry.
I've looked at find_by_sql, but while I am searching Entry, I don't want to return an Entry object; instead, I need an array of up to 7 days and averages...
Also, I am concerned a bit about the performance of this, as we would like to load this to the user model when a user logs in, so that it can be displayed on their dashboard. Any advice/pointers are welcome. I am relatively new to Rails.
You can query the database directly, no need to use an actual ActiveRecord object. For example:
ActiveRecord::Base.connection.execute "SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at);"
This will give you a MySql::Result or MySql2::Result that you can then use each or all on this enumerable, to view your results.
As for caching, I would recommend using memcached, but any other rails caching strategy will work as well. The nice benefit of memcached is that you can have your cache expire after a certain amount of time. For example:
result = Rails.cache.fetch('user/#{user.id}/averages', :expires_in => 1.day) do
# Your sql query and results go here
end
This would put your results into memcached for one day under the key 'user//averages'. For example if you were user with id 10 your averages would be in memcached under 'user/10/average' and the next time you went to perform this query (within the same day) the cached version would be used instead of actually hitting the database.
Untested, but something like this should work:
#user.entries.select('DAYOFWEEK(sent_at) as day, AVG(value) as average').group('1').all
NOTE: When you use select to specify columns explicitly, the returned objects are read only. Rails can't reliably determine what columns can and can't be modified. In this case, you probably wouldn't try to modify the selected columns, but you can'd modify your sent_at or value columns through the resulting objects either.
Check out the ActiveRecord Querying Guide for a breakdown of what's going on here in a fairly newb-friendly format. Oh, and if that query doesn't work, please post back so others that may stumble across this can see that (and I can possibly update).
Since that won't work due to entries returning an array, we can try using join instead:
User.where(:user_id => params[:id]).joins(:entries).select('...').group('1').all
Again, I don't know if this will work. Usually you can specify where after joins, but I haven't seen select combined in there. A tricky bit here is that the select is probably going to eliminate returning any data about the user at all. It might make more sense just to eschew find_by_* methods in favor of writing a method in the Entry model that just calls your query with select_all (docs) and skips the association mapping.

Position of object in database

I have got model Team and I've got (i.e.) team = Team.first :offset => 20. Now I need to get number of position of my team in db table.
I can do it in ruby:
Team.all.index team #=> 20
But I am sure that I can write it on SQL and it will be less expensive for me with big tables.
Assuming the order is made by ID desc:
class Team < ActiveRecord::Base
def position
self.class.count(:conditions => ['id <= ?', self.id])
end
end
I'm not sure I'm following along, but will position value be used a lot it might be worth making sure it is set on save. Will a team change position over time or is it static?
I don't think the position of a record in the database is one of the things a database really cares for, that kind of data should be up to you to set.
Depending on the database you may be able to use the LIMIT clause, but you can't use it to find the position of a certain row. Just to select rows number 20 to 21 for example, but it'll differ depending on sorting order.
SELECT * FROM teams LIMIT 20,21
I think setting a position attribute on your model or having a join model is the way to go.

HABTM finds with "AND" joins, NOT "OR"

I have two models, associated with a HABTM (actually using has_many :through on both ends, along with a join table). I need to retrieve all ModelAs that is associated with BOTH of two ModelBs. I do NOT want all ModelAs for ModelB_1 concatenated with all ModelAs for ModelB_2. I literally want all ModelAs that are associated with BOTH ModelB_1 and ModelB_2. It is not limited to only 2 ModelBs, it may be up to 50 ModelBs, so this must scale.
I can describe the problem using a variety of analogies, that I think better describes my problem than the previous paragraph:
* Find all books that were written by all 3 authors together.
* Find all movies that had the following 4 actors in them.
* Find all blog posts that belonged to BOTH the Rails and Ruby categories for each post.
* Find all users that had all 5 of the following tags: funny, thirsty, smart, thoughtful, and quick. (silly example!)
* Find all people that have worked in both San Francisco AND San Jose AND New York AND Paris in their lifetimes.
I've thought of a variety of ways to accomplish this, but they're grossly inefficient and very frowned upon.
Taking an analogy above, say the last one, you could do something like query for all the people in each city, then find items in each array that exist across each array. That's a minimum of 5 queries, all the data of those queries transfered back to the app, then the app has to intensively compare all 5 arrays to each other (loops galore!). That's nasty, right?
Another possible solution would be to chain the finds on top of each other, which would essentially do the same as above, but won't eliminate the multiple queries and processing. Also, how would you dynamicize the chain if you had user submitted checkboxes or values that could be as high as 50 options? Seems dirty. You'd need a loop. And again, that would intensify the search duration.
Obviously, if possible, we'd like to have the database perform this for us, so, people have suggested to me that I simply put multiple conditions in. Unfortunately, you can only do an OR with HABTM typically.
Another solution I've run across is to use a search engine, like sphinx or UltraSphinx. For my particular situation, I feel this is overkill, and I'd rather avoid it. I still feel there should be a solution that will let a user craft a query for an arbitrary number of ModelBs and find all ModelAs.
How would you solve this problem?
You may do this:
build a query from your ModelA, joining ModelB (through the join model), filtering the ModelBs that have one of the values that you are looking for, that is putting them in OR (i.e. where ModelB = 'ModelB_1' or ModelB = 'ModelB_2'). With this query the result set will have multiple 'ModelA' rows, exactly one row for each ModelB condition satisfied.
add a group by condition to the query on the ModelA columns you need (even all of them if you wish). The count() for each row is equal to the number of ModelB conditions satisfied*.
add a 'having' condition selecting only the rows whose count(*) is equal to the number of ModelB conditions you need to have satisfied
example:
model_bs_to_find = [100, 200]
ModelA.all( :joins=>{:model_a_to_b=>:model_bs},
:group=>"model_as.id",
:select=>"model_as.*",
:conditions=>["model_bs.id in (?)", model_bs_to_find],
:having=>"count(*)=#{model_bs_to_find.size}")
N.B. the group and select parameters specified in that way will work in MySQL, the standard SQL way to do so would be to put the whole list of model_as columns in both the group and select parameters.