What are the top movies in this Redis sorted set? - redis

I've downloaded the movies from IMDB (movies-list) from here: http://www.imdb.com/interfaces
I want to count how often a given movie appears in the list using Redis sorted sets, but I am a bit confused by the outcome:
redis 127.0.0.1:6379> zrangebyscore 'movies:title' 5000 +inf WITHSCORES
1) "Countdown"
2) "5254"
3) "The Bold and the Beautiful"
4) "5322"
5) "Days of Our Lives"
6) "5451"
7) "Neighbours"
8) "6442"
9) "The New Price Is Right"
10) "7633"
11) "Coronation Street"
12) "8097"
I would like to have the movie that appears most often at the top. Also, I am a bit confused by the score. What do these values (5k, 6k, 7k) mean?
The script I use for my experiment is a Rake task like this:
require 'redis'

task :import do
  redis = Redis.new
  # Every line that starts with a quoted title counts as one occurrence.
  File.foreach(ENV['file']) do |l|
    if l =~ /^"(.*)"/
      puts $1
      redis.zincrby 'movies:title', 1, $1
    end
  end
end

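Use ZREVRANGEBYSCORE instead of ZRANGEBYSCORE to get the highest scores first. The score of each member is the running total added by ZINCRBY, so in your case it is the number of lines on which that title appeared.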
Syntax:
ZREVRANGEBYSCORE key max min [WITHSCORES] [LIMIT offset count]
For example, in your case:
zrevrangebyscore 'movies:title' +inf 5000 WITHSCORES
Reference
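For readers using Python, here is a minimal sketch of the same import-and-query flow. It assumes a redis-py style client (version 3 or later, where zincrby takes name, amount, value); the function names count_titles and most_frequent are made up for illustration and are not part of any library.

```python
import re

# Same pattern as the Rake task: a line that starts with a quoted title.
TITLE_RE = re.compile(r'^"(.*)"')


def count_titles(client, lines, key="movies:title"):
    """Add 1 to a title's score for every line that starts with that title."""
    for line in lines:
        match = TITLE_RE.match(line)
        if match:
            # The score ends up being the number of matching lines.
            client.zincrby(key, 1, match.group(1))


def most_frequent(client, min_count, key="movies:title"):
    """Return (title, count) pairs with count >= min_count, highest first."""
    # ZREVRANGEBYSCORE takes max before min and sorts from high to low.
    return client.zrevrangebyscore(key, "+inf", min_count, withscores=True)
```

With a real server this would be called as most_frequent(redis.Redis(decode_responses=True), 5000), which should list the most frequent title first.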

Related

Is there any way to get Redis keys sorted by number of occurrences?

I have a set of keys and values that I eventually need to sort by the number of occurrences of each key. I'm aware that Redis isn't supposed to work like this, but I'm hoping there may be some smart workaround*.
Schema requirements:
Allow each key to occur any number of times.
Have each value expire after a certain amount of time (and the key with it).
Keep each full pair unique.
Known constraints:
No inbuilt way to expire values, just keys.
Keys can't be duplicated even when they have different values (or can they?)
Using sets or other methods doesn't allow easy counting either (tried that too...)
So apparently the requirements can only be met by grouping both key and value in the Redis key (while assigning them with null/random values and ttls), like this...
Input keys:
"apples:123"
"oranges:123"
"bananas:456"
"apples:456"
"oranges:789"
"apples:789"
[then maybe another hundred or so such pairs]
Expected output:
apples, oranges, bananas
[or apples(3), oranges(2), bananas(1) – but I'll then ditch the numbers anyway.]
* While it can be done in the app's logic, I think that is less efficient, since it has to fetch all the data at once and loop over every item when all I need is a rather limited subset.
So right now I'd have to do it like this (node.js)...
client.keys('*').then(response => {
  let occurences = {}
  response.forEach(function (pair) {
    let fruit = pair.split(':')[0]
    occurences[fruit] = (occurences[fruit] || 0) + 1
  })
  let topfruits = Object.keys(occurences).sort((a, b) => occurences[a] - occurences[b]).reverse().slice(0, 3)
  console.log(topfruits)
})
// (client.scan in production, which makes it more complicated and doesn't help that much for this use case)
...migrating from a SQL query that does it in one line:
let topfruits = 'SELECT fruit, number, count (fruit) AS occurences FROM fruits GROUP BY fruit ORDER BY occurences DESC LIMIT 3'
This is a great RediSearch aggregation problem:
You can have multiple rows of fruits and counts.
They can be expired by using TTLs (EXPIRE command).
They can all be unique (I used an order number, but it could be a UUID or some other generated information).
127.0.0.1:6379> FT.CREATE fruitIndex ON HASH PREFIX 1 fruit_order: SCHEMA fruit TEXT quantity NUMERIC
OK
127.0.0.1:6379> HSET fruit_order:100 fruit bananas quantity 2
(integer) 2
127.0.0.1:6379> HSET fruit_order:101 fruit bananas quantity 200
(integer) 2
127.0.0.1:6379> HSET fruit_order:103 fruit apples quantity 12
(integer) 2
127.0.0.1:6379> FT.AGGREGATE fruitIndex "*" GROUPBY 1 @fruit REDUCE SUM 1 quantity as totals SORTBY 2 @totals DESC
1) (integer) 3
2) 1) "fruit"
   2) "bananas"
   3) "totals"
   4) "202"
3) 1) "fruit"
   2) "oranges"
   3) "totals"
   4) "25"
4) 1) "fruit"
   2) "apples"
   3) "totals"
   4) "12"
127.0.0.1:6379> EXPIRE fruit_order:101 5
(integer) 1
## Wait 5 seconds and re-run the query and you can see that order drop out
127.0.0.1:6379> FT.AGGREGATE fruitIndex "*" GROUPBY 1 @fruit REDUCE SUM 1 quantity as totals SORTBY 2 @totals DESC
1) (integer) 3
2) 1) "fruit"
   2) "oranges"
   3) "totals"
   4) "25"
3) 1) "fruit"
   2) "apples"
   3) "totals"
   4) "12"
4) 1) "fruit"
   2) "bananas"
   3) "totals"
   4) "2"
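The question asked for the number of occurrences rather than a sum of quantities. The sketch below shows how that might look from Python, swapping the SUM reducer for COUNT and adding LIMIT 0 3. It assumes a redis-py style client created with decode_responses=True and a RESP2 connection, where execute_command returns the raw nested reply shown above. The helper names rows_from_aggregate and top_fruits_by_count are hypothetical.

```python
def rows_from_aggregate(reply):
    """Turn a raw FT.AGGREGATE reply into a list of dicts.

    reply[0] is the result count; every later entry is a flat
    [field, value, field, value, ...] list, as in the redis-cli output.
    """
    return [dict(zip(flat[0::2], flat[1::2])) for flat in reply[1:]]


def top_fruits_by_count(client, index="fruitIndex", limit=3):
    """Return the fruit names with the most indexed rows, most frequent first."""
    reply = client.execute_command(
        "FT.AGGREGATE", index, "*",
        "GROUPBY", 1, "@fruit",
        "REDUCE", "COUNT", 0, "AS", "occurrences",  # count rows per fruit
        "SORTBY", 2, "@occurrences", "DESC",
        "LIMIT", 0, limit,
    )
    return [row["fruit"] for row in rows_from_aggregate(reply)]
```

Because expired hashes drop out of the index, the counts should shrink on their own as TTLs fire, just as the totals do in the session above.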

Redis: Sorted Set sorted by member?

The Redis Sorted Set (ZSET) stores (member, score) pairs and orders them by score.
A Redis SET is an unordered collection of unique strings.
What I need is a method that returns the members of a Sorted Set matching a pattern, as ZRANGEBYLEX does, but for members with different scores.
Is it possible at all with Redis?
Well, it seems I did not know about the SCAN family of commands. In this case ZSCAN solves the issue, but at O(N) cost, where N is the number of items in the sorted set, because it iterates over the whole set.
Example elements in the set:
127.0.0.1:6379> zrange l 0 -1 WITHSCORES
1) "foodgood:irene"
2) "1"
3) "foodgood:pep"
4) "1"
5) "sleep:irene"
6) "1"
7) "sleep:juan"
8) "1"
9) "sleep:pep"
10) "1"
11) "sleep:toni"
12) "1"
13) "foodgood:juan"
14) "2"
Now ZSCAN for those with prefix foodgood:
127.0.0.1:6379> ZSCAN l 0 match foodgood:*
1) "0"
2) 1) "foodgood:irene"
   2) "1"
   3) "foodgood:pep"
   4) "1"
   5) "foodgood:juan"
   6) "2"
The first returned value is the cursor; a value of "0" indicates that the collection was completely explored.
What I would have liked is for this to be O(log(N)+M), where M is the number of elements retrieved, similar to a binary search tree.
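When the set is too large for one reply, ZSCAN has to be called repeatedly until the cursor returns to 0. A minimal sketch of that loop, assuming a redis-py style client whose zscan(name, cursor, match, count) returns a (cursor, [(member, score), ...]) pair; members_with_prefix is a made-up helper name, and the prefix is assumed to contain no glob characters.

```python
def members_with_prefix(client, key, prefix, count=100):
    """Collect {member: score} for members starting with prefix, via ZSCAN."""
    found = {}
    cursor = 0
    while True:
        cursor, batch = client.zscan(key, cursor=cursor, match=prefix + "*", count=count)
        # A dict absorbs the duplicates that SCAN-style iteration may return.
        found.update(batch)
        if int(cursor) == 0:  # cursor 0 means the whole set was visited
            break
    return found
```

This is still O(N) on the server side; the loop only spreads the work over several short calls.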

Redis - ZRANGEBYSCORE with key matching a regex

I'm trying to get the value of the best key in a sorted set.
This is my query at the moment:
ZREVRANGEBYSCORE genre1 +inf -inf WITHSCORES LIMIT 0 1
This is an example of an add in my set:
ZADD "genre1|genre2|genre3" 3.25153 "film"
I'd like to use the query in a way like this
ZREVRANGEBYSCORE *genre1* +inf -inf WITHSCORES LIMIT 0 1
to match keys containing "...|genre1|..." and not only keys like "genre1".
Any help will be appreciated
This can be accomplished in two or three steps:
1) Use SCAN or KEYS to find the keys matching your pattern. SCAN is cursor-based: repeat the call with the returned cursor until it comes back as 0.
SCAN 0 MATCH "*genre1*"
1) "9"
2) 1) "genre1|genre2|genre3"
   2) "genre1|genre4"
2) For each key, use TYPE to test whether it is a Sorted Set. This only matters if the database may hold other keys matching genre1 that are not sorted sets.
TYPE "genre1|genre4"
zset
3) Run your ZREVRANGEBYSCORE <key> +inf -inf WITHSCORES LIMIT 0 1 for each key.
See this answer on how you can SCAN for a given type. You can modify the Lua script to include the ZREVRANGEBYSCORE and get your results atomically on a single call.
Finally, consider whether storing the genre combinations as key names is the best fit for your case. You could keep one sorted set per genre and then use ZUNIONSTORE or ZINTERSTORE to get scored combinations.
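The three steps can be sketched client-side in Python as follows. This assumes a redis-py style client (scan_iter, type, zrevrangebyscore) and is not atomic, unlike the Lua variant mentioned above; best_per_matching_key is a hypothetical helper name.

```python
def best_per_matching_key(client, pattern):
    """Map each sorted-set key matching pattern to its top (member, score)."""
    best = {}
    # Step 1: scan_iter follows the SCAN cursor until it returns to 0.
    for key in client.scan_iter(match=pattern):
        # Step 2: skip keys that are not sorted sets.
        kind = client.type(key)
        if isinstance(kind, bytes):  # without decode_responses, TYPE is bytes
            kind = kind.decode()
        if kind != "zset":
            continue
        # Step 3: highest-scored member only (LIMIT 0 1).
        top = client.zrevrangebyscore(
            key, "+inf", "-inf", start=0, num=1, withscores=True
        )
        if top:
            best[key] = top[0]
    return best
```

Note that a pattern such as "*genre1*" would also match a key like "genre10", so a stricter pattern or an extra check on the key name may be needed.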

Redis - getting 5 elements at a time from a sorted set

I have a sorted set that keeps growing in real time. It contains IDs which I want to retrieve 5 at a time in reverse order of rank, basically to implement pagination. These IDs are keys to a hash map. Is there any way to get 5 elements at a time efficiently using Redis ZSET operations?
For example, in the sorted set below, let's say I want to get the 5 elements before "572c7d87e53156245a3fd167". How could I do that, given that new IDs may keep being added after my last element at run time? The expected result should give me the IDs 572c7c58e53156245a3fd166, 572c7ad2e53156245a3fd165, 572c746e1eeba6b059b08f1b, 572c74531eeba6b059b08f1a, and 572c6fc9612ad65757cca4f9.
1) "572b58c0dd319a1a4703eba8"
2) "1462429760.8629999"
3) "572c697e612ad65757cca4f7"
4) "1462499582.6889999"
5) "572c6a8e612ad65757cca4f8"
6) "1462499854.056"
7) "572c6fc9612ad65757cca4f9"
8) "1462501193.927"
9) "572c74531eeba6b059b08f1a"
10) "1462502355.5250001"
11) "572c746e1eeba6b059b08f1b"
12) "1462502382.313"
13) "572c7ad2e53156245a3fd165"
14) "1462504018.325"
15) "572c7c58e53156245a3fd166"
16) "1462504408.1370001"
17) "572c7d87e53156245a3fd167"
18) "1462504711.4200001"
19) "572c7da3e53156245a3fd168"
20) "1462504739.352"
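One option is ZRANGEBYSCORE (or ZREVRANGEBYSCORE, for reverse order) with its LIMIT offset count argument. ZRANGEBYLEX also accepts LIMIT, but it only behaves predictably when all members share the same score.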
However, what I usually do for pagination is create a new list (a snapshot of the original) that doesn't change dynamically, and load data from there. That way it doesn't feel like chasing a moving target. I just set a TTL on it and forget about it.
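The score-based option can be sketched as follows: look up the score of the last member already shown, then ask for the next page strictly below that score. This assumes a redis-py style client and that scores are unique (they look like timestamps here); page_before is a made-up helper name.

```python
def page_before(client, key, last_member, page_size=5):
    """Return up to page_size members ranked just below last_member."""
    score = client.zscore(key, last_member)
    if score is None:  # last_member is no longer in the set
        return []
    # A leading "(" makes the upper bound exclusive, so last_member is skipped.
    upper = f"({score!r}"
    return client.zrevrangebyscore(key, upper, "-inf", start=0, num=page_size)
```

Because the page is anchored on a score rather than a rank, members added later with higher scores do not shift it. If several members can share one score, an exclusive bound would skip the ties, so that case needs extra handling.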

ordered sets in redis: random output in case of score ties

I have an ordered set in Redis (I am actually using a python client https://github.com/andymccurdy/redis-py), for example:
zadd myset 1 key1
zadd myset 1 key2
zadd myset 1 key3
zadd myset 0 key4
Note that 3 keys have the same score.
Using ZRANGE, I would like to get the top 2 entries (i.e. the lowest scores). "key4" will always be the first result as it has the lowest score, but I would like the second return value to be randomly selected among the ties: key1, key2, key3. ZRANGE actually returns members with equal scores in lexicographic order, so "key1" is always my second result:
zrange myset 0 -1 WITHSCORES
1) "key4"
2) "0"
3) "key1"
4) "1"
5) "key2"
6) "1"
7) "key3"
8) "1"
any idea?
thanks,
J.
As kindly requested by Linus G Thiel, here are more details about my use case:
I would like to use zsets to build a simple ranking system. I have a list of items, each with a score representing the relevance of the item. During the cold start of my system, most of the scores will be identical (i.e. 0), and I would like to select randomly among the items that share a score. Otherwise I will always get the same lexicographic ordering back, which will introduce a bias in the system.
The solution you propose, using one specific set for each duplicated score value, will work. I will give it a try.
Thanks,
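For comparison, here is a client-side sketch of random tie-breaking. It is not the one-set-per-score approach mentioned above but a simpler alternative: fetch (member, score) pairs, for example from ZRANGE ... WITHSCORES, and shuffle within each group of equal scores. The function name top_n_random_ties is made up for illustration.

```python
import random
from itertools import groupby


def top_n_random_ties(pairs, n, rng=random):
    """Return the n lowest-scored members, ordering ties at random.

    pairs is an iterable of (member, score) tuples; rng is anything with a
    shuffle method, such as the random module or a random.Random instance.
    """
    ordered = sorted(pairs, key=lambda p: p[1])
    result = []
    for _, group in groupby(ordered, key=lambda p: p[1]):
        tied = [member for member, _ in group]
        rng.shuffle(tied)  # random order among members sharing this score
        result.extend(tied)
        if len(result) >= n:
            break
    return result[:n]
```

With the four members above and n=2, "key4" always comes first and the second entry is one of key1, key2, or key3. This reads every tied member into the client, so it suits small sets better than large ones.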