Redis: Sorted Set sorted by member?

A Redis Sorted Set (ZSET) stores (member, score) pairs and orders them by score.
A Redis SET is an unordered collection of unique strings.
What I need is a way to return the members of a Sorted Set that match a pattern, as ZRANGEBYLEX does, but for members that have different scores.
Is it possible at all with Redis?

Well, it seems I did not know about the SCAN family of commands. In this case ZSCAN solves the issue, but at a cost of O(N), where N is the number of items in the sorted set, because it iterates over the whole set.
Example of the elements in the sorted set:
127.0.0.1:6379> zrange l 0 -1 WITHSCORES
1) "foodgood:irene"
2) "1"
3) "foodgood:pep"
4) "1"
5) "sleep:irene"
6) "1"
7) "sleep:juan"
8) "1"
9) "sleep:pep"
10) "1"
11) "sleep:toni"
12) "1"
13) "foodgood:juan"
14) "2"
Now ZSCAN for those with prefix foodgood:
127.0.0.1:6379> ZSCAN l 0 match foodgood:*
1) "0"
2) 1) "foodgood:irene"
   2) "1"
   3) "foodgood:pep"
   4) "1"
   5) "foodgood:juan"
   6) "2"
The first element of the reply is the cursor; a returned cursor of "0" indicates that the collection was completely explored.
What I would have liked is for this to be O(log(N)+M), where M is the number of elements retrieved, similar to a lookup in a binary search tree.
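One possible workaround, not confirmed anywhere in this thread, is to keep a companion sorted set that holds the same members with score 0 for all of them. ZRANGEBYLEX is valid on that set and runs in O(log(N)+M); the real scores are then looked up in the original set. The sketch below assumes `client` is a redis-py `redis.Redis` instance; the function names and the `lex_key` companion key are hypothetical.

```python
def add_member(client, lex_key, score_key, member, score):
    """Store the real score in score_key and mirror the member into lex_key."""
    client.zadd(score_key, {member: score})
    # Every member gets score 0 here, so the set is ordered purely by member.
    client.zadd(lex_key, {member: 0})


def members_with_prefix(client, lex_key, score_key, prefix):
    """Return [(member, score)] for all members that start with prefix."""
    start = b"[" + prefix.encode()
    # Byte 0xff never occurs in UTF-8 text, so it works as an upper bound.
    stop = b"[" + prefix.encode() + b"\xff"
    members = client.zrangebylex(lex_key, start, stop)
    # One ZSCORE per match; this keeps the sketch simple at the cost of round trips.
    return [(member, client.zscore(score_key, member)) for member in members]
```

The trade-off is that every write has to touch both sets, and the matches come back in member order rather than score order.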

Related

Is there any way to get Redis keys sorted by number of occurrences?

I have this set of keys and values that I need to eventually sort by the number of occurrences of each key. I'm aware that Redis isn't supposed to work like this, but I'm hoping there may be some smart workaround*.
Schema requirements:
Allow each key to occur any number of times.
Have each value expire after a certain amount of time (and the key with it).
Keep each full pair unique.
Known constraints:
No inbuilt way to expire values, just keys.
Keys can't be duplicated even when they have different values (or can they?)
Using sets or other methods doesn't allow easy counting either (tried that too...)
So apparently the requirements can only be met by grouping both key and value in the Redis key (while assigning them with null/random values and ttls), like this...
Input keys:
"apples:123"
"oranges:123"
"bananas:456"
"apples:456"
"oranges:789"
"apples:789"
[then maybe another hundred or so such pairs]
Expected output:
apples, oranges, bananas
[or apples(3), oranges(2), bananas(1) – but I'll then ditch the numbers anyway.]
* while it can be done in app's logic, I think it loses in efficiency as it needs to get all data at once and cycle through each item, when all I need is a rather limited subset.
So right now I'd have to do it like this (node.js)...
client.keys('*').then(response => {
  // Count how many keys share each prefix (the part before ':').
  let occurences = {}
  response.forEach(function (pair) {
    let fruit = pair.split(':')[0]
    occurences[fruit] = (occurences[fruit] || 0) + 1
  })
  // Sort the prefixes by count, highest first, and keep the top three.
  let topfruits = Object.keys(occurences).sort((a, b) => occurences[a] - occurences[b]).reverse().slice(0, 3)
  console.log(topfruits)
})
// (client.scan in production, which makes it more complicated and doesn't help that much for this use case)
...migrating from a SQL query that does it in one line:
let topfruits = 'SELECT fruit, number, count (fruit) AS occurences FROM fruits GROUP BY fruit ORDER BY occurences DESC LIMIT 3'
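For comparison, the SCAN-based variant of the app-side counting could look roughly like the sketch below. It is written in Python and assumes `client` is a redis-py `redis.Redis` instance; `top_prefixes` is a hypothetical helper name. SCAN may return the same key more than once, so the sketch deduplicates before counting.

```python
from collections import Counter


def top_prefixes(client, n=3, pattern="*:*"):
    """Return the n most frequent key prefixes (the text before the first ':')."""
    seen = set()
    counts = Counter()
    # scan_iter walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(match=pattern, count=1000):
        if isinstance(key, bytes):
            key = key.decode()
        if key in seen:
            continue  # SCAN can yield duplicates during a full iteration
        seen.add(key)
        counts[key.split(":", 1)[0]] += 1
    return [prefix for prefix, _ in counts.most_common(n)]
```

This still reads every matching key, so it does not remove the O(N) cost the question complains about; it only avoids blocking the server the way KEYS does.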
This is a great RediSearch aggregation problem.
You can have multiple rows of fruits and counts.
They can be expired by using TTLs (the EXPIRE command).
They can all be unique (I used an order number, but it could be a UUID or some other generated information).
127.0.0.1:6379> FT.CREATE fruitIndex ON HASH PREFIX 1 fruit_order: SCHEMA fruit TEXT quantity NUMERIC
OK
127.0.0.1:6379> HSET fruit_order:100 fruit bananas quantity 2
(integer) 2
127.0.0.1:6379> HSET fruit_order:101 fruit bananas quantity 200
(integer) 2
127.0.0.1:6379> HSET fruit_order:103 fruit apples quantity 12
(integer) 2
127.0.0.1:6379> FT.AGGREGATE fruitIndex "*" GROUPBY 1 @fruit REDUCE SUM 1 quantity as totals SORTBY 2 @totals DESC
1) (integer) 3
2) 1) "fruit"
   2) "bananas"
   3) "totals"
   4) "202"
3) 1) "fruit"
   2) "oranges"
   3) "totals"
   4) "25"
4) 1) "fruit"
   2) "apples"
   3) "totals"
   4) "12"
127.0.0.1:6379> EXPIRE fruit_order:101 5
(integer) 1
## Wait 5 seconds and re-run the query and you can see that order drop out
127.0.0.1:6379> FT.AGGREGATE fruitIndex "*" GROUPBY 1 @fruit REDUCE SUM 1 quantity as totals SORTBY 2 @totals DESC
1) (integer) 3
2) 1) "fruit"
   2) "oranges"
   3) "totals"
   4) "25"
3) 1) "fruit"
   2) "apples"
   3) "totals"
   4) "12"
4) 1) "fruit"
   2) "bananas"
   3) "totals"
   4) "2"

Redis HSCAN command does not limit the count

My Redis version: 3.0.2
The hash data is shown below.
Key name: test
Contents (fields and values):
1) "xx1"
2) "1"
3) "xx2"
4) "2"
5) "xx3"
6) "3"
7) "xx4"
8) "4"
9) "xx5"
10) "5"
I use the command HSCAN test 0 COUNT 2
Redis returns every field and value, not just the first 2 fields and values!
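The COUNT option of the SCAN family does not limit the number of key-value pairs returned.
It is only a hint for how much work the command should do per call, so a larger COUNT usually increases the number of key-value pairs returned per iteration.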
Redis COUNT option doc:
When iterating Sets encoded as intsets (small sets composed of just
integers), or Hashes and Sorted Sets encoded as ziplists (small hashes
and sets composed of small individual values), usually all the
elements are returned in the first SCAN call regardless of the COUNT
value.
So, run HSCAN test 0 and take the first two field-value pairs from the result on the client side.
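Taking the first pairs on the client side could be sketched as follows in Python, assuming `client` is a redis-py `redis.Redis` instance; `first_fields` is a hypothetical helper name. HSCAN gives no ordering guarantee, so "first" only means "the first ones the iteration happens to return".

```python
from itertools import islice


def first_fields(client, key, n):
    """Return at most n (field, value) pairs from the hash stored at key."""
    # hscan_iter wraps HSCAN and follows the cursor; islice stops after n pairs,
    # so no further HSCAN calls are issued once enough pairs were collected.
    return list(islice(client.hscan_iter(key), n))
```

For a small hash like the one above, the whole hash still arrives in the first HSCAN reply; the slice only trims what the application sees.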

What are the top movies in this Redis sorted set?

I've downloaded the movies from IMDB (movies-list) from here: http://www.imdb.com/interfaces
I want to count how often a given movie appears in the list with the help of Redis sorted sets, but I am a bit confused by the outcome:
redis 127.0.0.1:6379> zrangebyscore 'movies:title' 5000 +inf WITHSCORES
1) "Countdown"
2) "5254"
3) "The Bold and the Beautiful"
4) "5322"
5) "Days of Our Lives"
6) "5451"
7) "Neighbours"
8) "6442"
9) "The New Price Is Right"
10) "7633"
11) "Coronation Street"
12) "8097"
I would like to have the movie that appears most often at the top. Also, I am a bit confused by the score. What do these 5k, 6k, 7k values mean?
The script I use for my experiment is a Rake task like this:
require 'redis'

task :import do
  redis = Redis.new
  # Each quoted title at the start of a line counts as one occurrence.
  File.foreach(ENV['file']) do |l|
    if l =~ /^"(.*)"/
      puts $1
      redis.zincrby 'movies:title', 1, $1
    end
  end
end
You might want to try ZREVRANGEBYSCORE instead of ZRANGEBYSCORE; it returns the members from the highest score to the lowest.
Syntax:
ZREVRANGEBYSCORE key max min [WITHSCORES] [LIMIT offset count]
For example, in your case:
zrevrangebyscore 'movies:title' +inf 5000 WITHSCORES
Reference
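As for the score itself: the import task calls ZINCRBY with an increment of 1 for every matching line, so the score of a title is simply the number of times that title was seen. A minimal Python sketch of the same idea, assuming `client` is a redis-py `redis.Redis` instance and with hypothetical helper names, reads the top entries by rank rather than by a score threshold:

```python
def record_titles(client, key, titles):
    """Increment the score of each title by 1, once per occurrence."""
    for title in titles:
        client.zincrby(key, 1, title)


def top_titles(client, key, n):
    """Return the n titles with the highest counts as (title, count) pairs."""
    # ZREVRANGE orders by score from high to low; ranks 0..n-1 are the top n.
    return client.zrevrange(key, 0, n - 1, withscores=True)
```

Reading by rank avoids having to guess a cut-off such as 5000 when only the top few titles are needed.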

ordered sets in redis: random output in case of score ties

I have an ordered set in Redis (I am actually using a python client https://github.com/andymccurdy/redis-py), for example:
zadd myset 1 key1
zadd myset 1 key2
zadd myset 1 key3
zadd myset 0 key4
Note that 3 keys have the same score.
Using ZRANGE, I would like to get the top 2 entries (i.e. the lowest scores). "key4" will always be the first result as it has a lower score, but I would like the second return value to be randomly selected among the ties: key1, key2, key3. ZRANGE actually returns tied members in lexicographic order, so "key1" is always my second result:
zrange myset 0 -1 WITHSCORES
1) "key4"
2) "0"
3) "key1"
4) "1"
5) "key2"
6) "1"
7) "key3"
8) "1"
any idea?
thanks,
J.
As kindly requested by Linus G Thiel, here are more details about my use case:
I would like to use zsets to build a simple ranking system. I have a list of items, each with a score representing the relevance of the item. For the cold start of my system, most of the scores will be identical (i.e. 0), and I would like to randomly select among the items having the same score. Otherwise I will always return the exact same lexicographic ordering, which will introduce a bias in the system.
The solution you propose, using one specific set for each duplicated score value, will work. I will give it a try.
Thanks,
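A different technique from the per-score sets mentioned above is to break ties on the client: fetch the members with their scores and shuffle each run of equal scores before taking the top entries. The Python sketch below only illustrates that idea; `shuffle_ties` is a hypothetical helper, and its input is assumed to be the list of (member, score) pairs that redis-py returns from `zrange(..., withscores=True)`.

```python
import random
from itertools import groupby


def shuffle_ties(pairs, rng=random):
    """Shuffle members that share a score, keeping the overall score order.

    pairs must already be sorted by score, as ZRANGE ... WITHSCORES returns them.
    """
    result = []
    for _, group in groupby(pairs, key=lambda pair: pair[1]):
        tied = list(group)
        rng.shuffle(tied)  # randomise only within one score value
        result.extend(tied)
    return result
```

For example, `shuffle_ties(client.zrange("myset", 0, -1, withscores=True))[:2]` would return "key4" followed by a random one of the three tied keys. The whole tie group has to be fetched for the choice to be fair, so this suits small sets or bounded score ranges.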

Redis relationships between data

I have a list in Redis containing a sequence of ids. Each id is unique to a single object, which I am storing as a JSON string under a separate key.
So I have something like:
redis> LRANGE mylist 0 -1
1) "one"
2) "two"
3) "three"
And I have separate keys mylist:one, mylist:two, mylist:three.
I am saving the ids to a list in order to build a simple FIFO queue on my application.
What is the most efficient way to get all the ids in mylist and their matching values from each individual key? Is there a better way to go about it?
The most efficient way is probably to use the SORT command:
# Populate list
rpush mylist one two three
set mylist:one 1
set mylist:two 2
set mylist:three 3
# Retrieve all items with their corresponding values
sort mylist by nosort get # get mylist:*
1) "one"
2) "1"
3) "two"
4) "2"
5) "three"
6) "3"
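An alternative to SORT, shown here only as a sketch, is to read the ids with LRANGE and fetch all values in one MGET, which costs two round trips instead of one. The Python code assumes `client` is a redis-py `redis.Redis` instance and that the values live under `<list key>:<id>` as in the question; `list_with_values` is a hypothetical helper name.

```python
def list_with_values(client, list_key):
    """Return [(id, value)] for every id in the list, in list order."""
    ids = client.lrange(list_key, 0, -1)
    if not ids:
        return []  # MGET requires at least one key
    keys = []
    for item in ids:
        text = item.decode() if isinstance(item, bytes) else item
        keys.append(f"{list_key}:{text}")
    # MGET returns values in the same order as the keys, None for missing ones.
    values = client.mget(keys)
    return list(zip(ids, values))
```

Because the list can change between the two calls, an id may come back paired with `None` if its key was deleted in between.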