How to solve a Redis sorted set draw?

I'm using Redis to handle user point rankings. I need to:
Store users points
Obtain users ranking positions
So I'm using zincrby to update the ranking position, zrevrangebyscore for the top list, and zscore and zrevrank to obtain everything else I need.
So, when there is a draw (and I actually have lots of them), I can't rely on the order Redis chooses.
My winning criteria for draw cases is Date, where oldest is first. These are MongoDB ids which I'm storing, so I could actually retrieve the date from _id.
So, if I want to know the actual rank for a user, I would:
Obtain the points the user achieved, maybe using zrevrank.
Obtain all the users with the same points.
Consider upper and lower bounds.
Order users with the same points by the date obtained from the Mongo ID.
Assign a position according to the rest of the users in the same draw and the upper and lower neighbors.
I will code all this with the default node.js driver, so it's Javascript code which we're talking about.
How can any Redis command help me to achieve this?

Since a Sorted Set's score can be a floating point value, you can store a combination of the timestamp and the ranking in it and use the decimal point as your "delimiter". That gives you an ordering by ranking that also takes the date into account.
For example, if my ranking is 50 and the timestamp now is 1438594593, the score in the set would be approximately 50.856140541 according to the following "formula":
score = ranking + (1 - timestamp / 10^10)
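A minimal sketch of that formula in JavaScript, assuming integer points and a timestamp in seconds taken from the first 4 bytes of the MongoDB ObjectId. The helper names (timestampFromObjectId, compositeScore, pointsFromScore) are made up for illustration and are not part of any driver. Because the fraction grows as the timestamp shrinks, the older of two tied users gets the higher score and therefore comes first in a reverse-ordered range.

```javascript
// Hypothetical helpers, not part of any Redis or MongoDB driver.

// The first 8 hex characters of a MongoDB ObjectId encode its creation
// time as seconds since the Unix epoch.
function timestampFromObjectId(objectIdHex) {
  return parseInt(objectIdHex.substring(0, 8), 16);
}

// score = points + (1 - timestamp / 10^10)
// Assumes integer points and a timestamp in seconds below 10^10.
function compositeScore(points, timestampSeconds) {
  return points + (1 - timestampSeconds / 1e10);
}

// The integer part of the stored score is still the points value.
function pointsFromScore(score) {
  return Math.floor(score);
}

const older = compositeScore(50, 1438594593);
const newer = compositeScore(50, 1438594600);
console.log(older > newer); // does the older user outrank the newer one on a tie?
console.log(pointsFromScore(older)); // points recovered from the combined score
```

Keep in mind that a double carries roughly 15 to 17 significant decimal digits, so very large point totals leave fewer digits for the timestamp part.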

Related

RavenDB -- More Like This -- Need a (similarity) metric; not just rank-orders

I have a RavenDB / 'More Like This' example running (C#) as per
Creating more like this in RavenDB
However, in addition to receiving similar documents back, I really need some measure of similarity back for those documents.
I am assuming (correctly?) that the order in which I get the similar documents back represents the rank-order scores of the documents' similarities (first one back has the highest similarity, second one back has the second highest similarity, etc.).
However, rather than rank orders I need the metric similarity results. This assumes (of course) that the rank orders are computed from a more continuous metric; e.g., tf-idf. If that is true, can I get a hold of those metric scores?
When using MoreLikeThis, you can issue a query such as the following:
from index 'Product/Search'
where morelikethis(id() = 'products/1-A')
And assuming you have set up the TermVector on the index properly, you'll get the results.
In the metadata of the results, you have the index score, which is what I think you are looking for.

Redis georadius but with different sorting order

I am using redis to store and fetch interesting info around the user and show it as a feed.
Let's say I need to fetch all listings within a given radius R (WITHDIST) BUT sorted in reverse chronological order and NOT by distance (as with the Redis GEORADIUS command). To be more specific, the most recent listing (within radius R) should be at the top even if it is the farthest of all.
Is it somehow possible to do this with a geoset alone? If not, how can I achieve this using some combination of Redis data structures?
Looking for some clean and efficient approaches.
You'll need to do the radius query and intersect the results with another Sorted Set that has the same elements but where the scores are timestamps. Then, page through the resulting intersection in reverse order.
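The idea can be sketched in memory as follows. This is only an illustration of the intersect-then-page logic, not a Redis client call; the function name newestWithinRadius, the key names in the comments and the sample data are all assumptions.

```javascript
// In Redis this could correspond to something like (hypothetical key names):
//   GEORADIUS listings:geo <lon> <lat> <R> km STORE tmp:nearby
//   ZINTERSTORE tmp:feed 2 tmp:nearby listings:by_time WEIGHTS 0 1
//   ZREVRANGE tmp:feed <start> <stop>
// WEIGHTS 0 1 drops the geohash score and keeps only the timestamp.

// nearbyIds: members returned by the radius query.
// timestamps: Map of member -> creation timestamp (the second sorted set).
function newestWithinRadius(nearbyIds, timestamps, offset, count) {
  return nearbyIds
    .filter((id) => timestamps.has(id)) // intersection of both sets
    .map((id) => ({ id, timestamp: timestamps.get(id) }))
    .sort((a, b) => b.timestamp - a.timestamp) // newest first
    .slice(offset, offset + count); // one page of results
}

const nearby = ["listing:1", "listing:2", "listing:3"];
const createdAt = new Map([
  ["listing:1", 1700000300],
  ["listing:2", 1700000100],
  ["listing:3", 1700000200],
  ["listing:9", 1700000900], // not in the radius result, so never returned
]);
console.log(newestWithinRadius(nearby, createdAt, 0, 2));
```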

Best way for getting users friends top rating with Redis SORTED SET

I have a SORTED SET of user_id:rating for every level in the game (2000+ levels). There are 2,000,000 users in the set.
I need to create 2 ratings: first, the top 100 of all users; second, the top 5 friends of each player.
The first can be solved very easily with ZRANGE.
But there is a problem with the second, because on average every user has 500 friends.
There are 2 ways:
1) I can do 500 requests with ZSCORE/ZRANK and sort the users on the backend (too many requests, bad performance)
2) I can create a SORTED SET for each user and update it in the background on every user's update (more data, more RAM, more complexity)
Maybe there are other options I missed?
I believe your main concern here should be your data model. Does every user have a sorted set of his friends?
I would recommend something like this:
users:{id}:friends with the ids of the user's friends as values
users:scoreboard with the user ids as values and the rating of each as the score
As an answer to your first concern, you can consider using pipelines, which will reduce the number of round trips drastically; nonetheless, you will still need to handle ordering the results yourself.
The better answer for your problem, in case you have the two sorted sets described earlier, would be:
Get the intersection between the two, using the "zinterstore" command and storing the result in a sorted set created solely for this purpose. As a result, the new sorted set will contain all of the user's friends' ids with their rating as the score (be careful here: you need to specify how the scores of the new sorted set are aggregated, which can be the SUM, MIN or MAX of the input scores).
ref: http://redis.io/commands/zinterstore
At this point, a simple "zrevrangebyscore" with a limit will give you the sorted result you are looking for.
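A rough in-memory sketch of what the intersection produces, for illustration only. The function name topFriends, the key names in the comments and the sample ratings are assumptions, and the WEIGHTS 0 1 option is one possible way (not stated in the answer) to keep only the rating as the resulting score.

```javascript
// In Redis this could correspond to something like (hypothetical key names):
//   ZINTERSTORE tmp:users:42:top 2 users:42:friends users:scoreboard WEIGHTS 0 1
//   ZREVRANGE tmp:users:42:top 0 4 WITHSCORES

// friendIds: Set of the user's friend ids.
// scoreboard: Map of user id -> rating (the global sorted set).
function topFriends(friendIds, scoreboard, limit) {
  const result = [];
  for (const id of friendIds) {
    // Keep only members present in both collections (the intersection).
    if (scoreboard.has(id)) {
      result.push({ id, rating: scoreboard.get(id) });
    }
  }
  result.sort((a, b) => b.rating - a.rating); // highest rating first
  return result.slice(0, limit);
}

const friends = new Set(["u2", "u3", "u4", "u7"]);
const ratings = new Map([
  ["u1", 900],
  ["u2", 500],
  ["u3", 700],
  ["u4", 100],
]);
console.log(topFriends(friends, ratings, 2));
```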

Redis: Maximum score size for sorted sets? Score + Unique ids = Unique Scores?

I'm using timestamps as the score. I want to prevent duplicates by appending a unique object-id to the score. Currently, this id is a 6 digit number (the highest id right now is 221849), but it is expected to increase over a million. So, the score will be something like
1407971846221849 (timestamp:1407971846 id:221849) and will eventually reach 14079718461000001 (timestamp:1407971846 id:1000001).
My concern is not being able to store scores because they've reached the max allowed.
I've read the docs, but I'm a bit confused. I know, basic math. But bear with me, I want to get this right.
Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number, that is able to represent precisely integer numbers between -(2^53) and +(2^53) included. In more practical terms, all the integers between -9007199254740992 and 9007199254740992 are perfectly representable. Larger integers, or fractions, are internally represented in exponential form, so it is possible that you get only an approximation of the decimal number, or of the very big integer, that you set as score.
There's another thing bothering me right now. Would the increase in ids break the chronological sort sequence ?
I would appreciate any insights, suggestions, different perspectives, or being told flat out that what I'm trying to do is nonsense.
Thanks for any help.
No, it won't break the "chronological" order, but you may lose the precision of the last digits, so two members may end up having the same score (i.e. non-unique).
There is no problem with duplicate scores. It is just maintaining a sorted set in memory. Members are unique but the scores may be the same. If you want chronological processing I would just rely on the timestamp without adding an id to it.
Appending an id would break the chronological sort if your ids are mixed such that you could have timestamps 1, 2, 3 (simple example) and ids 100, 10, 1, you won't get the correct sort. If your ids will always be added monotonically then you should just use the id as the score.
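Both concerns can be checked with a small JavaScript sketch, since a JavaScript Number is the same IEEE 754 double that Redis uses for scores. The helper name concatScore is hypothetical, and the plain decimal concatenation is an assumption about how the score in the question is built.

```javascript
// Hypothetical helper: build the score by concatenating decimal digits.
function concatScore(timestampSeconds, id) {
  return Number(String(timestampSeconds) + String(id));
}

// 16 digits: still below 2^53, so it is stored exactly.
const sixDigitId = concatScore(1407971846, 221849);
// 17 digits: above 2^53, so the last digit cannot be relied on.
const sevenDigitId = concatScore(1407971846, 1000001);

console.log(Number.isSafeInteger(sixDigitId));
console.log(Number.isSafeInteger(sevenDigitId));

// Ordering: an older entry with a longer id compared against a newer
// entry with a shorter id. Without fixed-width padding of the id, the
// extra digit dominates the comparison.
const olderLongId = concatScore(1407971846, 1000001);
const newerShortId = concatScore(1407971847, 221849);
console.log(olderLongId > newerShortId);
```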

How to normalize Lucene scores?

I need to normalize the Lucene scores between 0 and 1.
For example, a random query returns the following scores...
8.864665
2.792687
2.792687
2.792687
2.792687
0.49009037
0.33730242
0.33730242
0.33730242
0.33730242
What's the biggest possible score? 10.0?
thanks
You can divide all scores by the maximum score to get scores between 0 and 1.
However, please note that the normalised scores should be used to compare the results of a single query only. It is not correct to compare the scores (normalised or not) of results from 2 different queries.
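As a minimal sketch of dividing by the maximum, using the scores from the question (the function name normalizeByMax is made up for illustration, and the scores are assumed to be non-negative):

```javascript
// Divide every score by the largest score of the same result list.
// Assumes non-negative scores; returns zeros if the maximum is not positive.
function normalizeByMax(scores) {
  const max = Math.max(...scores);
  if (!(max > 0)) {
    return scores.map(() => 0);
  }
  return scores.map((s) => s / max);
}

const luceneScores = [
  8.864665, 2.792687, 2.792687, 2.792687, 2.792687,
  0.49009037, 0.33730242, 0.33730242, 0.33730242, 0.33730242,
];
console.log(normalizeByMax(luceneScores));
```

The top hit always ends up at 1, which is exactly why the normalized values say nothing about how good the results of one query are compared with those of another.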
There is no good standard way to normalize scores with lucene. Read this: ScoresAsPercentages and this explanation
In your case the highest score is the score of the first result, if the results are sorted by score. But this score will be different for every other query.
See also how-do-i-normalise-a-solr-lucene-score
There is no maximum score in Solr, it depends on too many variables, so it can't be predicted.
But you can implement something called normalized score (Scores As Percentages) which is not recommended.
See related links for more details:
Is it possible to set a Solr Score threshold 'reasonably', independent of results returned? (i.e. Is Solr Scoring standardized in any way)
how do I normalise a solr/lucene score?
Remove results below a certain score threshold in Solr/Lucene?
A regular normalization will only help you to compare the scoring distribution among queries (and their retrieved lists).
You cannot simply normalize the score to compare the performance between queries.
Think of a query for which all retrieved documents are highly relevant and received the same (high) score, and of another query whose retrieved list comprises barely relevant documents (again, all with the same score). No matter which per-query normalization you apply, the normalized scores will be the same.
You need to think of a cross-query factor that can bring all the scores to the same level.
For example, maybe compute the similarity between the query and the whole index, and use that score somehow along with the document score.
If you want to compare two or more queries, I found a workaround.
You can compare your highest-scored document with your query term using the LevenshteinDistance or LuceneLevenshteinDistance (Damerau) class to get the similarity between your query term and your result. Do this for each query you want to compare. Now you have a tool to compare your queries using the similarity between each query term and its highest result. You can then choose the query with the highest similarity and use it for the next steps.
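// Damerau-Levenshtein similarity between the query term and the top result
import org.apache.lucene.search.spell.LuceneLevenshteinDistance;
LuceneLevenshteinDistance d = new LuceneLevenshteinDistance();
// getDistance returns a similarity between 0 and 1, where 1 means identical strings
float similarity = d.getDistance(queryterm, yourResult);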
I applied a non-linear function in order to compress the scores of every query.
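The answer does not say which function was used. As one possible illustration only, the sketch below assumes the mapping s / (1 + s), which squeezes any non-negative score into the range from 0 up to (but not including) 1 while preserving order. The name compressScore is hypothetical.

```javascript
// Assumed example of a compressing non-linear function: s / (1 + s).
// Valid for non-negative scores; larger scores approach 1 but never reach it.
function compressScore(score) {
  return score / (1 + score);
}

const rawScores = [8.864665, 2.792687, 0.49009037, 0.33730242];
console.log(rawScores.map(compressScore));
```

Unlike dividing by the maximum, this mapping does not depend on the other results of the query, but it is no more comparable across queries than the raw scores are.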