Redis ZRANGEBYSCORE returning empty set

This is probably something idiotic...
I'm doing this in the redis console:
zincrby model 1 20140101
zincrby model 1 20141010
zincrby model 1 20141010
why does this work
zrangebyscore model 00000000 99999999 withscores
1) "20140101"
2) "1"
3) "20141010"
4) "2"
but this doesn't
zrangebyscore model 20140000 20149999 withscores
(empty list or set)

ZRANGEBYSCORE is for score range lookups, and you're passing your member values instead (in the 3rd snippet). Since the scores are 1 and 2, far below 20140000-20149999, you're getting nothing back.
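For instance, querying by the actual scores returns both members:
zrangebyscore model 1 2 withscores
1) "20140101"
2) "1"
3) "20141010"
4) "2"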
EDIT, after some comments back and forth:
Generally, you need to make a decision about the space/time tradeoff, i.e. more RAM and less CPU or vice versa, and that really depends on your performance and data-size requirements. Usually I'd try to use a sorted set for each model/event being tracked, per aggregation level needed. Key expiration is useful, but sometimes manually removing members from sorted sets is also needed.
IIUC, you only need per-model daily counters so following your initial design, my "schema" would probably be:
Sorted set key name pattern: <model>:daily
|
+- Member value: <day timestamp at 12AM UTC>
+- Member score: <count>
Use ZINCRBY to increment today's hits:
ZINCRBY <model>:daily 1 <today's timestamp at 12AM UTC>
Get a date's hits:
ZSCORE <model>:daily <date timestamp at 12AM UTC>
Notes:
You can't easily do date ranges with this approach because your scores hold the counts. You'll basically need to do multiple ZSCORE calls (O(1) each), looping through each date in the range.
You can keep additional rolling or static aggregates to speed up commonly accessed ranges.
You'll have to manually "expire" the older set members for housekeeping.
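As a rough illustration of this first layout, here is a minimal redis-py sketch; the helper names and the use of a seconds-at-midnight-UTC timestamp as the member are my own assumptions, not part of the original design:
import datetime
import redis

r = redis.Redis()

def day_ts(d):
    # Timestamp of the given date at 12AM UTC, used as the member value
    return int(datetime.datetime(d.year, d.month, d.day,
                                 tzinfo=datetime.timezone.utc).timestamp())

def track_hit(model):
    # ZINCRBY creates the member with score 1 on first use, then keeps incrementing
    today = datetime.datetime.now(datetime.timezone.utc).date()
    r.zincrby(f"{model}:daily", 1, day_ts(today))

def hits_on(model, d):
    # ZSCORE returns None when there is no member for that day yet
    score = r.zscore(f"{model}:daily", day_ts(d))
    return int(score) if score else 0
A date-range query would just loop over the days and call hits_on for each, as per the first note above.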
An alternative approach that allows ranges is to have the following in place:
Sorted set key name pattern: <model>:daily
|
+- Member value: <day timestamp at 12AM UTC>:<count>
+- Member score: 0
Here, you can use ZRANGEBYLEX to get a range of dates, but since the timestamp and count are concatenated you'll have to do a little processing client-side or with Lua to get the count (ZSCORE will always return 0) or to increment it (you can't use ZINCRBY anymore).
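Here is a hedged sketch of reading a date range under this second layout with redis-py, assuming members are encoded as "<day timestamp>:<count>" with score 0 as described; the helper name is illustrative:
def daily_counts(r, model, start_ts, end_ts):
    key = f"{model}:daily"
    # '[' makes the lex bounds inclusive; the trailing 0xff byte sorts after any
    # digit, so every "<end_ts>:<count>" member falls inside the range
    lo = ("[" + str(start_ts)).encode()
    hi = ("[" + str(end_ts) + ":").encode() + b"\xff"
    counts = {}
    for member in r.zrangebylex(key, lo, hi):
        day, count = member.decode().rsplit(":", 1)
        counts[day] = int(count)
    return counts
Incrementing a day's count in this layout means removing the old member and adding a new one, which is where the client-side or Lua processing mentioned above comes in.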

Related

Redis `SCAN`: how to balance newly arriving keys that might match against ensuring an eventual result in a reasonable time?

I am not that familiar with Redis. At the moment I am designing a realtime service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be SET with some reasonable EX, and to match over them using SCAN rarely enough not to run into performance bottlenecks.
What I'm unsure about is the "in/out rate" and the possibility of being flooded with keys that might match some SCAN query, so that it never terminates (i.e. it always replies with the latest cursor position and forces you to continue; that could easily happen if one consumes x items per second while x + y items per second are coming in, with y > 0).
Obviously, I could set the desired SCAN size high enough; but I wonder if there is a better solution, or does Redis itself guarantee that SCAN will grow its size automatically in such a case?
First some context, solution at the end:
From SCAN command > Guarantee of termination
The SCAN algorithm is guaranteed to terminate only if the size of the
iterated collection remains bounded to a given maximum size, otherwise
iterating a collection that always grows may result into SCAN to never
terminate a full iteration.
This is easy to see intuitively: if the collection grows there is more
and more work to do in order to visit all the possible elements, and
the ability to terminate the iteration depends on the number of calls
to SCAN and its COUNT option value compared with the rate at which the
collection grows.
But in The COUNT option it says:
Important: there is no need to use the same COUNT value for every
iteration. The caller is free to change the count from one iteration
to the other as required, as long as the cursor passed in the next
call is the one obtained in the previous call to the command.
Important to keep in mind, from Scan guarantees:
A given element may be returned multiple times. It is up to the
application to handle the case of duplicated elements, for example
only using the returned elements in order to perform operations that
are safe when re-applied multiple times.
Elements that were not
constantly present in the collection during a full iteration, may be
returned or not: it is undefined.
The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percentage of progress of your scan because the cursor is really the bit-reversed index into the hash table.
Using DBSIZE or INFO keyspace command you can get how many keys you have at any time:
> DBSIZE
(integer) 200032
> info keyspace
# Keyspace
db0:keys=200032,expires=0,avg_ttl=0
Another source of information is the undocumented DEBUG HTSTATS <db-index> command, just to get a feeling:
> DEBUG htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 262144
number of elements: 200032
different slots: 139805
max chain length: 8
avg chain length (counted): 1.43
avg chain length (computed): 1.43
Chain length distribution:
0: 122339 (46.67%)
1: 93163 (35.54%)
2: 35502 (13.54%)
3: 9071 (3.46%)
4: 1754 (0.67%)
5: 264 (0.10%)
6: 43 (0.02%)
7: 6 (0.00%)
8: 2 (0.00%)
[Expires HT]
No stats available for empty dictionaries
The table size is the power of 2 following your number of keys:
Keys: 200032 => Table size: 262144
The solution:
We will calculate a desired COUNT argument for every scan.
Say you will be calling SCAN with a frequency (F in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T in s). So you want this finished in N = F*T calls, N = 50 in this example.
Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).
Before every SCAN call (or every given number of calls, if you want to save the round-trip time (RTT) of a DBSIZE call), you call DBSIZE to get the number of keys K.
You will use COUNT = K*RP/N
For the first call, this is COUNT = 200032*1/50 = 4000.
For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).
For example, let's say you have done 20 calls already, so now N = 30 (the remaining number of calls). You called DBSIZE and got K = 281569. This means NextPowerOfTwo(K) = 524288, which is 2^19.
Your next cursor is 14509 in decimal = 000011100010101101 in binary. As the table size is 2^19, we represent it with 18 bits.
You reverse the bits and get 101101010001110000 in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:
RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 185456 / 524288 = 0.65 or 65%
So you have to adjust:
COUNT = K*RP/N = 281569 * 0.65 / 30 = 6100
So in your next SCAN call you use 6100. It makes sense that it increased, because:
The amount of keys has increased from 200032 to 281569.
Although we have only 60% of our initial estimate of calls remaining, progress is behind as 65% of the keyspace is pending to be scanned.
All this assumed you are getting all keys. If you're pattern-matching, you need to use the past to estimate the remaining number of keys to be found, adding a factor PM (percent of matches) to the COUNT calculation.
COUNT = PM * K*RP/N
PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))
If after 20 calls, you have found only keysFound = 2000 keys, then:
PM = 2000 / ( 281569 * 185456 / 524288) = 0.02
This means only 2% of the keys are matching our pattern so far, so
COUNT = PM * K*RP/N = 0.02 * 6100 = 122
This algorithm can probably be improved, but you get the idea.
Make sure to run some benchmarks on the COUNT number you'll start with, to measure how many milliseconds your SCAN is taking, as you may need to moderate your expectations about how many calls (N) you need to do this in a reasonable time without blocking the server, and adjust your F and T accordingly.
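Pulling the pieces together, here is a rough redis-py sketch of the adaptive-COUNT loop. The helper names are mine, and the cursor-reversal math assumes the cursor is the bit-reversed bucket index over the current table size, as described in the linked post:
import redis

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def reversed_cursor(cursor, table_size):
    # Reverse the cursor's bits over the table's index width to estimate progress
    bits = max(table_size.bit_length() - 1, 1)
    rev = 0
    for i in range(bits):
        rev = (rev << 1) | ((cursor >> i) & 1)
    return rev

def adaptive_scan(r, pattern="*", freq_hz=10, target_seconds=5):
    calls_left = freq_hz * target_seconds            # N = F * T
    cursor, found = 0, 0
    while True:
        k = r.dbsize()                               # K
        table = next_power_of_two(k)
        progress = reversed_cursor(cursor, table) / table
        rp = 1 - progress                            # remaining percent of the keyspace
        if progress == 0:
            pm = 1.0                                 # no history yet, assume everything matches
        else:
            pm = min(max(found / (k * progress), 0.01), 1.0)
        count = max(int(pm * k * rp / max(calls_left, 1)), 10)
        cursor, keys = r.scan(cursor=cursor, match=pattern, count=count)
        found += len(keys)
        yield from keys
        calls_left -= 1
        if cursor == 0:
            break

# Usage (the pattern is just an example):
# for key in adaptive_scan(redis.Redis(), pattern="*"):
#     ...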

Redis sorted set leader board ranking on same score

I'm using a Redis sorted set to implement the leaderboard of my game, where I show the user ranking in descending order. I'm stuck on a case where two or more users have the same score. In that case, I want the higher rank to go to the user who reached the score first. For example, I'm adding the following entries in Redis:
127.0.0.1:6379> zadd testing-key 5 a
(integer) 1
127.0.0.1:6379> zadd testing-key 4 b
(integer) 1
127.0.0.1:6379> zadd testing-key 5 c
(integer) 1
and when I'm querying for the rank in reverse order, I'm getting this
127.0.0.1:6379> zrevrange testing-key 0 10
1) "c"
2) "a"
3) "b"
but in my case, the ranking should be like
1) "a"
2) "c"
3) "b"
So is there any provision in Redis to give higher precedence to the entity that entered the set first when scores are equal?
I found one solution to this problem. In my case the score is an integer, so I kept it as the integer part and put Long.MAX_VALUE - System.nanoTime() after the decimal point. The final score calculation looks like:
double finalScore = Double.parseDouble(score + "." + (Long.MAX_VALUE - System.nanoTime()));
So the final score of the player who scored first will be higher than that of the second one. Please let me know if you have a better solution.
If your leaderboard's scores are "small" enough, you may get away with using a combination of the score and the timestamp (e.g. 123.111455234, where 123 is the score). However, since the Sorted Set score is a double floating point, you may lose precision.
Alternatively, keep two Sorted Sets - one with each player's leaderboard score and the other with each player's score timestamp, and use both to determine the order.
Or, use a single sorted set for the leader board, encode the timestamp as part of the member and rely on lexicographical ordering.
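As a minimal redis-py sketch of the first option (score plus an inverted-timestamp fraction), assuming small integer scores; the key and helper names are illustrative:
import time
import redis

r = redis.Redis()

def add_score(board, member, score):
    # Earlier submissions get a larger fractional part, so they rank higher
    # among equal integer scores; double precision only allows this for small scores.
    tiebreak = (2**31 - time.time()) / 2**31      # in (0, 1), decreasing over time
    r.zadd(board, {member: score + tiebreak})

add_score("leaderboard", "a", 5)
add_score("leaderboard", "b", 4)
add_score("leaderboard", "c", 5)
print(r.zrevrange("leaderboard", 0, 10))          # expected: [b'a', b'c', b'b'] (inserts a moment apart)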

Checking if IP falls within a range with Redis

I am interested in using Redis to check if an IP address (converted into an integer) falls within a range of IPs. It is very likely that the ranges will overlap.
I have found this question/answer, although I am not able to fully understand the logic behind it.
Thank you for your help!
@DidierSpezia's answer in your linked question is a good one, but it is not trivial (and it is expensive) to build, and it becomes hard to maintain if you are adding/removing ranges.
I have an answer that is easier to maintain, but it could get slow and memory-expensive to compute with many ranges, as it requires cloning a set of all ranges.
You need to save all ranges twice, in two sets. The score of each range will be its border values.
Going with the sets in @DidierSpezia's example:
A 2-8
B 4-6
C 2-9
D 7-10
Your two sets will be:
ZADD ranges:low 2 "2-8" 4 "4-6" 2 "2-9" 7 "7-10"
ZADD ranges:high 8 "2-8" 6 "4-6" 9 "2-9" 10 "7-10"
To query which ranges a value belongs to, you need to trim the ranges whose lower border is higher than the queried value, and trim the ranges whose higher border is lower.
The most efficient way I can think of is cloning one of the sets, trimming one of its sides by the rule given above, changing the scores of the ranges to reflect the other border, and then trimming the second side.
Here's how to find the ranges 5 belongs to:
ZUNIONSTORE tmp 1 ranges:low
ZREMRANGEBYSCORE tmp (5 +inf
ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1
ZREMRANGEBYSCORE tmp -inf (5
ZRANGE tmp 0 -1
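Wrapped up as a redis-py helper (a sketch only; the tmp key name is arbitrary and ranges:low / ranges:high follow the example above):
import redis

r = redis.Redis()

def ranges_containing(value, tmp="tmp:ranges"):
    r.zunionstore(tmp, ["ranges:low"])              # clone the low-border set
    r.zremrangebyscore(tmp, f"({value}", "+inf")    # drop ranges that start after the value
    r.zinterstore(tmp, {tmp: 0, "ranges:high": 1})  # switch every score to the high border
    r.zremrangebyscore(tmp, "-inf", f"({value}")    # drop ranges that end before the value
    members = r.zrange(tmp, 0, -1)
    r.delete(tmp)
    return members

print(ranges_containing(5))   # [b'4-6', b'2-8', b'2-9'], ordered by high border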
In this discussion, Dvir Volk and @antirez suggested using a sorted set in which each entry represents a range and has the following form:
Member = "min-max" range
Score = max value
For example:
ZADD z 10 "0-10"
ZADD z 20 "10-20"
ZADD z 100 "50-100"
And in order to check if a value falls within a range, you can use ZRANGEBYSCORE and parse the member returned.
For example, to check value 5:
ZRANGEBYSCORE z 5 +inf LIMIT 0 1
this will return the "0-10" member, and you only need to parse the string and validate if your value is in between.
To check value 25:
ZRANGEBYSCORE z 25 +inf LIMIT 0 1
will return "50-100", but the value is not between that range.

option for lexicographical order in zrange?

When I add a score for a member using zincrby, it increases the score and puts the element in lexicographical order.
Can I get this list in the order in which the elements were updated or added?
e.g.
If I execute
zincrby A 100 g
zincrby A 100 a
zincrby A 100 z
and then
zrange A 0 -1
then the result is
a->g->z
whereas I want the result in the order the entries were made, so
g->a->z
As the score is the same for all, Redis is placing the elements in lexicographical order. Is there any way to prevent it?
I don't think it is possible, but if you want to keep the order of insertion along with the scores, you should use a composite score like this:
<score><timestamp>
instead of
<score>
You will have to pick a good time resolution (millis should be OK). Then, instead of incrementing by the raw score, increment by the score shifted past the timestamp digits:
zincrby A <score * 10^nbdigitsformillis> <member>
For instance:
Score = 100 and the timestamp is 1381377600 seconds (10 digits)
That gives: 100 * 10^10 + 1381377600 = 1001381377600
You incr the score by 200: 1001381377600 + 200 * 10^10 = 3001381377600
Be careful with zsets, as they store scores as double values (64-bit, with a 52-bit mantissa), so integers are only exact up to 2^53: don't store more than 15-16 digits.
If you can't do that (you need high timestamp precision and high score precision), you will have to manage two zsets (one for the actual score, one for the timestamp) and work out your ranking manually from the two values.
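A rough redis-py sketch of the composite-score idea, assuming millisecond timestamps (13 digits) and small scores; the names are illustrative and the check-then-set is not atomic (a Lua script could make it so):
import time
import redis

r = redis.Redis()
TS_DIGITS = 10 ** 13   # room for a millisecond timestamp in the low digits

def incr_score(key, member, amount):
    # On first sight of a member, record its insertion time in the low digits;
    # after that, only the high digits (the real score) are incremented.
    if r.zscore(key, member) is None:
        r.zadd(key, {member: time.time_ns() // 1_000_000})
    r.zincrby(key, amount * TS_DIGITS, member)

def real_score(composite):
    return int(composite) // TS_DIGITS

incr_score("A", "g", 100)
incr_score("A", "a", 100)
incr_score("A", "z", 100)
print(r.zrange("A", 0, -1))   # insertion order [b'g', b'a', b'z'], given distinct milliseconds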

Get all members in Sorted Set

I have a sorted set and want to get all members of the set. How do I identify the max/min score for the command:
zrange key min max
?
You're in luck, as zrange does not take scores, but indices. 0 is the first index, and -1 will be interpreted as the last index:
zrange key 0 -1
To get a range by score, you would call zrangebyscore instead -- where -inf and +inf can be used to denote negative and positive infinity, respectively, as Didier Spezia notes in his comment:
zrangebyscore key -inf +inf
To get all the members and their scores together in a single query, add WITHSCORES:
zrange <KEY> 0 -1 WITHSCORES
The optional WITHSCORES argument supplements the command's reply with the scores of elements returned. The returned list contains value1,score1,...,valueN,scoreN instead of value1,...,valueN. Client libraries can return a more appropriate data type (suggestion: an array with (value, score) arrays/tuples).
In newer versions of Redis (>= v6.2.0), if you want to get all members of a sorted set between two scores, you should use:
ZRANGE key min max BYSCORE
Adding the BYSCORE option makes Redis treat the min & max arguments as scores rather than indices.
(As of this writing, ZRANGEBYSCORE still works, but is considered deprecated.)
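For example, with a hypothetical key myset, the old and new forms return the same members:
zadd myset 1 one 2 two 3 three
(integer) 3
zrangebyscore myset 1 2
1) "one"
2) "two"
zrange myset 1 2 BYSCORE
1) "one"
2) "two"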