I am interested in using Redis to check whether an IP address (converted into an integer) falls within a range of IPs. It is very likely that the ranges will overlap.
I have found this question/answer, although I am not able to fully understand the logic behind it.
Thank you for your help!
EDIT - Since I got a downvote (a comment to explain why would be nice), I've removed some clutter from my answer.
Didier Spezia's answer in your linked question is a good answer, but it becomes hard to maintain if you are adding or removing ranges. Building and maintaining it is also non-trivial (and expensive).
I have an answer that is easier to maintain, but it can get slow and memory-expensive with many ranges, because it requires cloning a set of all ranges.
You need to save all ranges twice, in two sorted sets. In one set each range's score is its lower bound; in the other, its upper bound.
Using the ranges from Didier Spezia's example:
A 2-8
B 4-6
C 2-9
D 7-10
Your two sets will be:
ZADD ranges:low 2 "2-8" 4 "4-6" 2 "2-9" 7 "7-10"
ZADD ranges:high 8 "2-8" 6 "4-6" 9 "2-9" 10 "7-10"
To find which ranges a value belongs to, you need to remove the ranges whose lower bound is greater than the queried value, and the ranges whose upper bound is less than it.
The most efficient way I can think of is to clone one of the sets, trim one side by the rules given above, change the scores of the remaining ranges to their other bound, and then trim the second side.
Here's how to find the ranges 5 belongs to:
ZUNIONSTORE tmp 1 ranges:low
ZREMRANGEBYSCORE tmp (5 +inf
ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1
ZREMRANGEBYSCORE tmp -inf (5
ZRANGE tmp 0 -1
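The five commands above can be traced with a minimal pure-Python sketch. Plain dicts (member to score) stand in for ranges:low, ranges:high and tmp; this is an illustrative simulation, not a Redis client, and each step is annotated with the command it imitates.

```python
# In-memory stand-ins for the two sorted sets (member -> score)
ranges_low = {"2-8": 2, "4-6": 4, "2-9": 2, "7-10": 7}
ranges_high = {"2-8": 8, "4-6": 6, "2-9": 9, "7-10": 10}


def ranges_containing(value: float) -> list:
    """Imitate the ZUNIONSTORE / ZREMRANGEBYSCORE / ZINTERSTORE sequence."""
    # ZUNIONSTORE tmp 1 ranges:low -> clone the lower-bound set
    tmp = dict(ranges_low)
    # ZREMRANGEBYSCORE tmp (value +inf -> drop ranges that start after value
    tmp = {m: s for m, s in tmp.items() if s <= value}
    # ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1
    # -> keep common members, new score = 0 * low + 1 * high = upper bound
    tmp = {m: 0 * s + 1 * ranges_high[m] for m, s in tmp.items() if m in ranges_high}
    # ZREMRANGEBYSCORE tmp -inf (value -> drop ranges that end before value
    tmp = {m: s for m, s in tmp.items() if s >= value}
    # ZRANGE tmp 0 -1 -> ascending score, ties broken lexicographically
    return [m for m, _ in sorted(tmp.items(), key=lambda kv: (kv[1], kv[0]))]


print(ranges_containing(5))
```

For 5 this keeps "2-8", "4-6" and "2-9" (ordered by upper bound) and drops "7-10", whose lower bound is above 5.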
In this discussion, Dvir Volk and antirez suggested using a sorted set in which each entry represents a range and has the following form:
Member = "min-max" range
Score = max value
For example:
ZADD z 10 "0-10"
ZADD z 20 "10-20"
ZADD z 100 "50-100"
To check whether a value falls within a range, use ZRANGEBYSCORE and parse the returned member.
For example, to check value 5:
ZRANGEBYSCORE z 5 +inf LIMIT 0 1
This will return the "0-10" member; you only need to parse the string and check whether your value lies between its bounds.
To check value 25:
ZRANGEBYSCORE z 25 +inf LIMIT 0 1
will return "50-100", but 25 is not within that range, so the value does not fall in any range.
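A minimal sketch of the score = max lookup, assuming the sorted set z is mirrored in a sorted Python list and bisect stands in for ZRANGEBYSCORE z value +inf LIMIT 0 1. Because only the first range whose max is at least the value is examined, this approach assumes the ranges do not overlap; with overlapping ranges (as in the original question) it can miss matches.

```python
import bisect

# Hypothetical in-memory copy of sorted set z: (score = max, member = "min-max")
entries = sorted([(10, "0-10"), (20, "10-20"), (100, "50-100")])
scores = [score for score, _ in entries]


def find_range(value: int):
    """Return the "min-max" member containing value, or None."""
    # ZRANGEBYSCORE z value +inf LIMIT 0 1: first entry whose max >= value
    i = bisect.bisect_left(scores, value)
    if i == len(entries):
        return None  # value is above every range
    member = entries[i][1]
    low, high = (int(part) for part in member.split("-"))
    # Parse the member and also check the lower bound
    return member if low <= value <= high else None


print(find_range(5), find_range(25))
```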
I'm using a Redis sorted set to implement the leaderboard of my game, where I show user rankings in descending order. I'm stuck on the case where two or more users have the same score: in that case, I want the user who reached the score first to rank higher. For example, I'm adding the following entries in Redis.
127.0.0.1:6379> zadd testing-key 5 a
(integer) 1
127.0.0.1:6379> zadd testing-key 4 b
(integer) 1
127.0.0.1:6379> zadd testing-key 5 c
(integer) 1
When I query the ranking in reverse order, I get this:
127.0.0.1:6379> zrevrange testing-key 0 10
1) "c"
2) "a"
3) "b"
but in my case, the ranking should be like
1) "a"
2) "c"
3) "b"
So is there any provision in Redis to give higher precedence to the entity which entered first in the set with the same score?
I found one solution to this problem. In my case, the score is an integer, so I turned it into a decimal and put Long.MAX_VALUE - System.nanoTime() after the decimal point. The final score is computed like this:
double finalScore = Double.parseDouble(score + "." + (Long.MAX_VALUE - System.nanoTime()));
This way, the final score of the player who scored first is higher than the second player's. Please let me know if you have a better solution.
If your leaderboard's scores are "small" enough, you may get away with using a combination of the score and the timestamp (e.g. 123.111455234, where 123 is the score). However, since the Sorted Set score is a double floating point, you may lose precision.
Alternatively, keep two Sorted Sets - one with each player's leaderboard score and the other with each player's score timestamp, and use both to determine the order.
Or, use a single sorted set for the leaderboard, encode the timestamp as part of the member, and rely on lexicographical ordering.
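A minimal sketch of the member-encoding option, simulated in Python. Assumptions: millisecond timestamps below a hypothetical bound MAX_TS, and a `player:timestamp`-style member whose timestamp is inverted and zero-padded. ZREVRANGE sorts ties in reverse lexicographic order, so inverting the timestamp makes earlier scorers come first. One consequence of this design is that the member string changes whenever the timestamp does, so an update means removing the old member and adding a new one.

```python
MAX_TS = 10**13  # hypothetical upper bound for millisecond timestamps


def member_for(player: str, ts_ms: int) -> str:
    # Invert and zero-pad the timestamp so that an earlier time produces a
    # lexicographically larger string, which ZREVRANGE lists first on ties.
    return f"{MAX_TS - ts_ms:013d}:{player}"


def zrevrange(zset: dict) -> list:
    # ZREVRANGE: score descending, ties in reverse lexicographic member order
    return sorted(zset, key=lambda m: (zset[m], m), reverse=True)


board = {
    member_for("a", 1_000): 5,
    member_for("b", 2_000): 4,
    member_for("c", 3_000): 5,
}
print([m.split(":", 1)[1] for m in zrevrange(board)])
```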
I have a data model like this:
Fields:
counter number (e.g. 00888, 00777, 00123 etc)
counter code (e.g. XA, XD, ZA, SI etc)
start date (e.g. 2017-12-31 ...)
end date (e.g. 2017-12-31 ...)
Other counter data (e.g. xxxxx)
The current data structure is organized like this (a root with multiple children):
counter_num + counter_code
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
Example:
00888 + XA
---> Jan 10 + Jan 20 --> xxxxxxxx
---> Jan 21 + Jan 31 --> xxxxxxxx
---> Feb 01 + Dec 31 --> xxxxxxxx
00888 + ZI
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
00777 + XA
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
Today the retrieval happens in 2 ways:
//Fetch unique counter data using all the composite keys
counter_number + counter_code + date (start_date <= date <= end_date)
//Fetch all the counter codes and corresponding data matching the below conditions
counter_number + date (start_date <= date <= end_date)
What's the best way to model this in Redis? I need to cache some of the frequently accessed data. I feel sorted sets should be able to do this somehow, but I'm unable to model it.
UPDATE:
Just to clear up the confusion: I'm not asking for an SQL "BETWEEN"-like query, because I don't know what the start_date and end_date values are. Think of them as just column names.
What I don't want is
SELECT * FROM redis_db
WHERE counter_num AND
date_value BETWEEN start_date AND end_date
What I want is
SELECT * FROM redis_db
WHERE counter_num AND
start_date <= specific_date AND end_date >= specific_date
NOTE: The requirement is very close to the 2D indexing proposed in the Redis multi-dimensional indexing document:
https://redis.io/topics/indexes#multi-dimensional-indexes
I understand the concept but can't digest the implementation details given there.
I'm unlikely to get this done in time for the bounty, but what the hell...
This sounds like a job for geohashing. Geohashing is what you do when you want to index a 2-dimensional (or higher) dataset. For example, if you have a database of cities and you want to be able to quickly respond to queries like "find all the cities within 50km of X", you use geohashing.
For the purposes of this question, you can think of start_date and end_date as x and y coordinates. Normally in geohashing you're searching for points in your dataset near a particular point in space, or in a certain bounded region of space. In this case you just have a lower bound on one of the coordinates and an upper bound on the other one. But I suppose in practice the whole dataset is bounded anyway, so that's not a problem.
It would be nice if there was a library for doing this in Redis. There probably is, if you look hard enough. The newer versions of Redis have built-in geohashing functionality. See the commands starting with GEO. But it doesn't claim to be very accurate, and it's designed for the surface of a sphere rather than a flat surface.
So as far as I can see you have 3 options:
Map your search space to a small part of the sphere, preferably near the equator. Use the Redis GEO commands. To search, use GEORADIUS (or GEOSEARCH in Redis 6.2+) on a circle covering the triangle you're trying to search, taking into account the inbuilt inaccuracy and the distortion you get by mapping onto the sphere, then filter the results to get the ones that are actually inside the triangle.
Find some 3rd-party geohashing client for Redis which works on flat space and is more accurate than GEO.
Read the rest of this answer, or some other primer on geohashing, then implement it yourself on top of Redis. This is the hardest (but most educational) option.
If you have a database that indexes data using a numerical ordering, such that you can do queries like "find all the rows/records for which z is between a and b", you can build a geohash index on top of it. Suppose the coordinates are (non-negative) integers x and y. Then you add an integer-valued column z, and index by z. To calculate z, write x and y in binary, then take alternate digits from each. Example:
x = 969 = 0 1 1 1 1 0 0 1 0 0 1
y = 1130 = 1 0 0 0 1 1 0 1 0 1 0
z = 1750214 = 0110101011010011000110
Note that the index allows you to find, for example, all records positioned with z between 0101100000000000000000 and 0101101111111111111111 inclusive. In other words, all records for which z starts with 010110. Or to put it another way, you can find all records for which x starts with 001 and y starts with 110. This set of records corresponds to a square in the 2-dimensional space we are trying to search.
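The interleaving step and the "z starts with a prefix" query can be checked with a short Python sketch. Both function names are hypothetical helpers. In Redis, each (low, high) pair would become a ZRANGEBYSCORE key low high on a sorted set scored by z. That works as long as z stays below 2^53, because sorted-set scores are doubles.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Build z by taking one bit from x, then one from y, most significant first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def prefix_range(prefix: str, total_bits: int) -> tuple:
    """Inclusive range of z values whose binary form starts with prefix."""
    free = total_bits - len(prefix)
    low = int(prefix, 2) << free
    high = low | ((1 << free) - 1)
    return low, high


z = interleave(969, 1130, 11)
print(z, format(z, "022b"))
print(prefix_range("010110", 22))
```

This reproduces the worked example (z = 1750214), and the prefix 010110 maps to exactly the two boundary values quoted above.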
Not all squares can be searched in this way. We'll call these ones searchable squares. Suppose the client sends a request for all records for which (x,y) is inside a particular rectangle. (Or a circle, or some other reasonable geometric shape.) Then you need to find a set of searchable squares which cover the rectangle. Then, for each of these squares you've chosen, query the database for records inside that square and send the results to the client. (But you'll have to filter the results, because not all the records in the square are actually in the original rectangle.)
There's a balance to be struck. If you choose a small number of large searchable squares, you'll probably end up covering a much larger area of the map than you need; the database queries will return lots of extra results that you'll have to filter out. Alternatively, if you use lots of small searchable squares, you'll be doing lots of queries to the database, many of which will return no results.
I said above that x and y could be start_date and end_date. But the distribution of your dataset won't be as symmetrical as in most uses of geohashing, so performance might be better (or worse) if you use x = end_date + start_date and y = end_date - start_date.
Because your question is still a bit vague about how you want to query your data, it is unclear how best to solve it. With that in mind, however, here are my thoughts on how I might model your data:
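The covering and filtering steps can be put together in a minimal Python sketch. Assumptions: dates are small day numbers (a hypothetical 5-bit space, 0..31), x = start_date and y = end_date, and a sorted list searched with bisect stands in for ZRANGEBYSCORE on a sorted set scored by z. The query start <= day <= end is the rectangle x in [0, day], y in [day, 31]. The max_depth parameter is the knob for the trade-off just described: a small value means a few big squares and more filtering, a large value means many small squares and more range queries.

```python
import bisect

BITS = 5  # hypothetical: coordinates are day numbers 0..31


def interleave(x: int, y: int, bits: int = BITS) -> int:
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def cover(x_lo, x_hi, y_lo, y_hi, max_depth, bits=BITS):
    """Inclusive z ranges of searchable squares covering the rectangle."""
    out = []

    def visit(x0, y0, depth):
        side = 1 << (bits - depth)
        x1, y1 = x0 + side - 1, y0 + side - 1
        if x1 < x_lo or x0 > x_hi or y1 < y_lo or y0 > y_hi:
            return  # square misses the rectangle entirely
        inside = x_lo <= x0 and x1 <= x_hi and y_lo <= y0 and y1 <= y_hi
        if inside or depth == max_depth:
            # Aligned square: its z values form one contiguous block
            out.append((interleave(x0, y0, bits), interleave(x1, y1, bits)))
            return
        half = side // 2
        for dx in (0, half):  # children visited in ascending z order
            for dy in (0, half):
                visit(x0 + dx, y0 + dy, depth + 1)

    visit(0, 0, 0)
    return out


def query(index, day, max_depth=3):
    """Names of records with start <= day <= end (x <= day and y >= day)."""
    zs = [z for z, _ in index]
    hits = []
    for z_min, z_max in cover(0, day, day, (1 << BITS) - 1, max_depth):
        # Equivalent of ZRANGEBYSCORE index z_min z_max
        lo, hi = bisect.bisect_left(zs, z_min), bisect.bisect_right(zs, z_max)
        for _, (start, end, name) in index[lo:hi]:
            if start <= day <= end:  # drop records from squares that overshoot
                hits.append(name)
    return sorted(hits)


records = [(3, 10, "r1"), (9, 24, "r2"), (20, 31, "r3"), (0, 5, "r4")]
index = sorted((interleave(s, e), (s, e, n)) for s, e, n in records)
print(query(index, 9))
```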
Updated answer, detailing how to use SORTED SET
I have edited this answer so that you can store your values in a way that lets you query by dynamic date ranges. This edit assumes that each database value is tied to a single timestamp, not to 2 dates (start and end) as in your current setup.
Yes, you are correct that using Sorted Sets will be able to accomplish this. I suggest that you always use a Unix timestamp value for the score component in these sorted sets.
In case you are not already familiar with Redis, let me explain its indexing limitations. Redis is a simple key-value store designed to retrieve values by key quickly. Because of this design, it lacks many features of a traditional DBMS, such as indexing a column.
In Redis, you accomplish indexing by using a key. The most deeply nested key-like structures are available in HASH and SORTED SET, but even they give you only 2 levels. In a HASH, you have the key (same as any data type) and an inner hash key, which can be any string.
In a SORTED SET, you have the key (same as any data type) and a numeric score.
A HASH is useful for keeping grouped data organized.
A SORTED SET is nice if you want to query by a range of values. This could be a good fit for your data.
Your SORTED SET would look like the following:
key
00888:XA =>
score (date value) value
1452427200 (2016-01-10) xxxxxxxx
1452859200 (2016-01-15) yyyyxxxx
1453291200 (2016-01-20) zzzzxxxx
Let's use a more intuitive example, the 2017 Juventus roster:
To produce the SORTED SET in the table below, issue this command in your redis client:
ZADD JUVENTUS 32 "Emil Audero" 1 "Gianluigi Buffon" 42 "Mattia Del Favero" 36 "Leonardo Loria" 25 "Neto" 15 "Andrea Barzagli" 4 "Medhi Benatia" 19 "Leonardo Bonucci" 3 "Giorgio Chiellini" 40 "Luca Coccolo" 29 "Paolo De Ceglie" 26 "Stephan Lichtsteiner" 12 "Alex Sandro" 24 "Daniele Rugani" 43 "Alessandro Semprini" 23 "Dani Alves" 22 "Kwadwo Asamoah" 7 "Juan Cuadrado" 6 "Sami Khedira" 18 "Mario Lemina" 46 "Mehdi Leris" 38 "Rolando Mandragora" 8 "Claudio Marchisio" 14 "Federico Mattiello" 45 "Simone Muratore" 20 "Marko Pjaca" 5 "Miralem Pjanic" 28 "Tomás Rincón" 27 "Stefano Sturaro" 21 "Paulo Dybala" 9 "Gonzalo Higuaín" 34 "Moise Kean" 17 "Mario Mandzukic"
Jersey Name Jersey Name
32 Emil Audero 23 Dani Alves
1 Gianluigi Buffon 42 Mattia Del Favero
36 Leonardo Loria 25 Neto
15 Andrea Barzagli 4 Medhi Benatia
19 Leonardo Bonucci 3 Giorgio Chiellini
40 Luca Coccolo 29 Paolo De Ceglie
26 Stephan Lichtsteiner 12 Alex Sandro
24 Daniele Rugani 43 Alessandro Semprini
22 Kwadwo Asamoah 7 Juan Cuadrado
6 Sami Khedira 18 Mario Lemina
46 Mehdi Leris 38 Rolando Mandragora
8 Claudio Marchisio 14 Federico Mattiello
45 Simone Muratore 20 Marko Pjaca
5 Miralem Pjanic 28 Tomás Rincón
27 Stefano Sturaro 21 Paulo Dybala
9 Gonzalo Higuaín 34 Moise Kean
17 Mario Mandzukic
To query the roster by a range of jersey numbers:
ZRANGEBYSCORE JUVENTUS 1 5
Output:
1) "Gianluigi Buffon"
2) "Giorgio Chiellini"
3) "Medhi Benatia"
4) "Miralem Pjanic"
Note that the scores are not returned; however, ZRANGEBYSCORE returns the results in ascending order of score.
To include the scores, append WITHSCORES to the command, like so: ZRANGEBYSCORE JUVENTUS 1 5 WITHSCORES
By using ZRANGEBYSCORE, you should be able to query any key (counter number + counter code) with a date range and get the values in that range.
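A minimal sketch of that date-range query, simulated in Python. Assumptions: the sorted set 00888:XA is mirrored as a sorted list of (Unix timestamp, value) pairs, dates are converted at midnight UTC, and bisect stands in for ZRANGEBYSCORE's inclusive score range.

```python
import bisect
from datetime import datetime, timezone

# Hypothetical in-memory copy of sorted set 00888:XA: (score = Unix timestamp, member)
zset_00888_XA = sorted([
    (1452427200, "xxxxxxxx"),
    (1452859200, "yyyyxxxx"),
    (1453291200, "zzzzxxxx"),
])


def ts(date_str: str) -> int:
    """Unix timestamp for midnight UTC of an ISO date such as '2016-01-12'."""
    return int(datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc).timestamp())


def zrangebyscore(zset, min_score, max_score, withscores=False):
    """Inclusive score range in ascending score order, like ZRANGEBYSCORE."""
    scores = [s for s, _ in zset]
    lo = bisect.bisect_left(scores, min_score)
    hi = bisect.bisect_right(scores, max_score)
    if withscores:
        return [(m, s) for s, m in zset[lo:hi]]
    return [m for _, m in zset[lo:hi]]


print(zrangebyscore(zset_00888_XA, ts("2016-01-12"), ts("2016-01-31")))
```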
Original: Below is my original answer, recommending HASH
Based on your examples, I recommend you use a HASH.
With a hash, you would have a main key to find the hash (Ex. 00888:XA). Then within the hash, you have key -> value pairs (Ex. 2017-01-10:2017-01-20 -> xxxxxxxx). I prefer to delimit or tokenize my keys' components with the colon char :, but you can use any delimiter.
HASH follows your example data structure very well:
key
00888:XA =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 yyyyxxxx
2016-02-01:2016-12-31 zzzzxxxx
key
00888:ZI =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 xxxxyyyy
2016-02-01:2016-12-31 xxxxzzzz
When querying for data, instead of GET key, you would use HGET key hashkey. Likewise for setting values: instead of SET key value, use HSET key hashkey value.
Example commands
HSET 00777:XA 2017-01-10:2017-01-20 xxxxxxxx
HSET 00777:XA 2017-01-21:2017-01-31 yyyyyyyy
HSET 00777:XA 2016-02-01:2016-12-31 zzzzzzzz
(Note: HMSET, or HSET with several field/value pairs since Redis 4.0, can do this in a single command.)
Then:
HGET 00777:XA 2017-01-21:2017-01-31
Would return yyyyyyyy
Unless there is some specific performance consideration, or other goal for your data, I think Hashes will work great for your system.
It's also very convenient if you want to get all hashkeys or all values for a given hash, using commands like HKEYS, HVALS, or HGETALL.
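A minimal sketch of the hash layout, with a Python dict standing in for the HASH 00777:XA. HGET only answers exact-key lookups. The question's start_date <= date <= end_date lookup is not something HGET does, so the second helper (a hypothetical addition, not part of the answer above) scans the HGETALL-style contents client-side. That costs time proportional to the number of fields per key.

```python
from datetime import date

# Hypothetical in-memory mirror of the HASH 00777:XA (hashkey -> value)
hashes = {
    "00777:XA": {
        "2017-01-10:2017-01-20": "xxxxxxxx",
        "2017-01-21:2017-01-31": "yyyyyyyy",
        "2016-02-01:2016-12-31": "zzzzzzzz",
    }
}


def hget(key: str, hashkey: str):
    """Exact lookup, like HGET key hashkey."""
    return hashes.get(key, {}).get(hashkey)


def values_for_date(key: str, day: str) -> list:
    """HGETALL-style scan keeping entries whose start <= day <= end."""
    target = date.fromisoformat(day)
    hits = []
    for hashkey, value in hashes.get(key, {}).items():
        start, end = (date.fromisoformat(part) for part in hashkey.split(":"))
        if start <= target <= end:
            hits.append(value)
    return hits


print(hget("00777:XA", "2017-01-21:2017-01-31"), values_for_date("00777:XA", "2017-01-25"))
```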
I have a table in sql with 3 columns: BIGINT StartNumber, BIGINT EndNumber, BIGINT LocationId, and I need to be able to do something like this
Select LocationId where StartNumber < #number and EndNumber > #number.
for example:
StartNumber EndNumber LocationId
1 5 1
6 9 1
10 16 2
and when I have #number = 7 I should get LocationId = 1
How can I do this in redis?
I was thinking of moving this table to Redis and using a sorted set with ZRANGEBYSCORE, but it didn't work for me:
1) When I use ZADD key score member [score] [member], I am unable to add 2 elements with the same member and different scores, even with the NX option:
zadd myset nx 1 "17" 2 "17" - it will add one element and then update its score instead of adding two elements.
2) When I add this: zadd set1 2 "a" 4 "b" 6 "c" 10 "d" and then try zrangebyscore set1 3 3 (to get the member whose range includes 3), I get an empty result.
P.s. All commands are executed on the example pages of redis website.
So, as I understand the task, you don't have overlaps, each interval maps to only one location (?), and the intervals don't have gaps. Based on this, you can use a single sorted set with the lower (or upper) bound values as scores:
ZADD StartNumber 1 "1:5:1" 6 "6:9:1" 10 "10:16:2"
Then you can use:
ZREVRANGEBYSCORE StartNumber 7 -inf LIMIT 0 1
And it will be O(log(N)).
Put differently, your question is "how can I map N ranges of numbers to a location". One way of doing this is using two Sorted Sets, one for the StartNumber and the other one for EndNumber. Since members have to be unique, we'll also need to ensure that by using the Start/End values as part of the member. For example, with your example data, this could be done like so:
ZADD StartNumber 1 "1:5:1" 6 "6:9:1" 10 "10:16:2"
ZADD EndNumber 5 "1:5:1" 9 "6:9:1" 16 "10:16:2"
To find the location for #number=7, run ZRANGEBYSCORE StartNumber -inf 7 and ZRANGEBYSCORE EndNumber 7 +inf and intersect the results. All that remains is to split the intersection's result(s) on the colon (:) and use the 3rd element as the location.
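A minimal sketch of the two-set lookup, with dicts standing in for the StartNumber and EndNumber sorted sets and a client-side set intersection. Note that ZRANGEBYSCORE bounds are inclusive here, unlike the strict < and > in the original SQL.

```python
# Hypothetical in-memory copies of the StartNumber and EndNumber sorted sets
start_number = {"1:5:1": 1, "6:9:1": 6, "10:16:2": 10}
end_number = {"1:5:1": 5, "6:9:1": 9, "10:16:2": 16}


def locations_for(number: int) -> list:
    # ZRANGEBYSCORE StartNumber -inf number
    started = {m for m, s in start_number.items() if s <= number}
    # ZRANGEBYSCORE EndNumber number +inf
    not_ended = {m for m, s in end_number.items() if s >= number}
    # Intersect client-side and take the 3rd colon-separated field
    return sorted(m.split(":")[2] for m in started & not_ended)


print(locations_for(7))
```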
Note: if your app ensures that there are no overlapping ranges and that there can be only one location per "number", you can get the same results with only one set.
(this is the first time that I'm giving two answers to the same question - maybe I'll get a badge or sumthin' ;))
The double Sorted Set approach is a generalization and, as such, aims to solve a bigger set of problems than what the OP needs (as put in the comments to the first answer). That approach is also not efficient, since the query is O(log(N))+O(N), so when N is large (e.g. 5M) it's probably not a good idea.
However, to satisfy the requirements, and given that the ranges do not overlap, one could use only a single Sorted Set and a simpler query. Each member should be the concatenation of the EndNumber and LocationId, and its score should be the respective StartNumber, so for the sake of the example:
ZADD ranges 1 "5:1" 6 "9:1" 10 "16:2"
Given #number, obtain the relevant LocationId with the following Redis Lua code (O(logn)):
-- rangelookup.lua
-- http://stackoverflow.com/questions/32185898/redis-get-member-where-score-is-between-min-and-max/32186675
-- A **non inclusive** range search on a Sorted Set with the following data:
-- score = <StartNumber>
-- member = <EndNumber>:<LocationId>
--
-- KEYS[1] - Sorted Set key name
-- ARGV[1] - the number to search
--
-- reply - the relevant id, nil if range doesn't exist
--
-- usage example: redis-cli --eval rangelookup.lua ranges , 7
local number = tonumber(ARGV[1])
local data = redis.call('ZREVRANGEBYSCORE', KEYS[1], number, '-inf', 'WITHSCORES', 'LIMIT', 0, 1)
local reply = nil
-- an empty reply arrives as an empty table (not nil), so check its length
if #data > 0 and number > tonumber(data[2]) then
local to, id = data[1]:match( '(.*):(.*)' )
if tonumber(to) > number then
reply = id
end
end
return reply
Sample output:
$ redis-cli --eval rangelookup.lua ranges , 7
"1"
$ redis-cli --eval rangelookup.lua ranges , 9
(nil)
$ redis-cli --eval rangelookup.lua ranges , 99
(nil)
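For readers who want to trace the script's logic outside Redis, here is a minimal Python port, assuming the sorted set is mirrored as a sorted list of (StartNumber, "EndNumber:LocationId") pairs. bisect plays the role of ZREVRANGEBYSCORE ranges number -inf WITHSCORES LIMIT 0 1. Both bounds are strict, as in the script, and a number below every StartNumber returns None instead of failing.

```python
import bisect

# Hypothetical in-memory copy of: ZADD ranges 1 "5:1" 6 "9:1" 10 "16:2"
ranges = sorted([(1, "5:1"), (6, "9:1"), (10, "16:2")])
starts = [score for score, _ in ranges]


def range_lookup(number: float):
    """Port of rangelookup.lua: StartNumber < number < EndNumber, else None."""
    # ZREVRANGEBYSCORE ranges number -inf WITHSCORES LIMIT 0 1
    i = bisect.bisect_right(starts, number) - 1
    if i < 0:
        return None  # no range starts at or below number
    start, member = ranges[i]
    end, location_id = member.split(":")
    if number > start and float(end) > number:
        return location_id
    return None


print(range_lookup(7), range_lookup(9), range_lookup(99))
```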
When I add to a member's score using ZINCRBY, it increases the score, and members with the same score end up in lexicographical order.
Can I get this list in the order in which the elements were updated or added?
For example, if I execute
zincrby A 100 g
zincrby A 100 a
zincrby A 100 z
and then
zrange A 0 -1
then the result is
a->g->z
whereas I want the result in the order the entries were made, i.e.
g->a->z
As the score is the same for all of them, Redis places the elements in lexicographical order. Is there any way to prevent this?
I don't think it is possible, but if you want to keep the order of insertion with scores, you should manipulate something like this:
<score><timestamp>
instead of
<score>
You will have to choose a suitable timestamp resolution (milliseconds should be OK). Then you can use
zincrby A (100 * 10^nbdigits) member
where nbdigits is the number of digits in your timestamp.
For instance:
Score = 100 and timestamps is 1381377600 seconds
That gives: 1001381377600
You increment the score by 200: 1001381377600 + 200 * 10^10 = 3001381377600
Be careful with zsets: scores are stored as double-precision floats (64 bits, of which only 52 hold the mantissa), so integers are exact only up to 2^53 (about 9.007e15). To stay safe, don't store more than 15 digits.
If you can't do that (because you need both high timestamp precision and high score precision), you will have to manage two zsets (one for the actual score, one for the timestamp) and handle the ranking manually using both values.
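A minimal Python sketch of this encoding, assuming 10-digit (seconds) timestamps as in the example and using float to mimic the double Redis stores. The helper names are hypothetical. Increments only add multiples of 10^10, so the timestamp part stays fixed at the time the member was first added. That is what yields the g->a->z order for equal scores.

```python
TS_DIGITS = 10  # seconds-resolution Unix timestamps have 10 digits


def encode(score: int, ts: int) -> float:
    """Shift the score left by TS_DIGITS decimal places and append the timestamp."""
    return float(score * 10**TS_DIGITS + ts)


def incr(encoded: float, delta: int) -> float:
    """Effect of ZINCRBY A (delta * 10^TS_DIGITS) member on the stored double."""
    return encoded + delta * 10**TS_DIGITS


def decode(encoded: float) -> tuple:
    """Split a combined score back into (score, timestamp)."""
    return divmod(int(encoded), 10**TS_DIGITS)


s = encode(100, 1381377600)
print(int(s), int(incr(s, 200)), decode(incr(s, 200)))

# Same score, different insertion times: ascending (ZRANGE) order follows the timestamp
zset = {"g": encode(100, 1381377600), "a": encode(100, 1381377601), "z": encode(100, 1381377602)}
print(sorted(zset, key=zset.get))
```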
Assume I have a set (or sorted set or list if that would be better) A of 100 to 1000 strings.
Then I have a sorted set B of many more strings, say one million.
Now C should be the intersection of A and B (of the strings of course).
I want to have every tuple (X, SCORE_OF_X_IN_B) where X is in C.
Any ideas?
I have two ideas:
Interstore
store A as a sorted set with every score being 0
ZINTERSTORE into D
get every item of D
delete D
Simple loop in client
loop over A in my client program
get zscore for every string
While 1. has too much overhead on the Redis side (it has to write, for example, and the Redis page lists a fairly high time complexity: http://redis.io/commands/zinterstore), 2. would need |A| separate database requests and won't be a good choice.
Maybe I could write a Redis Lua script that works like ZSCORE but with an arbitrary number of strings, but I'm not sure my hosting provider allows scripts...
So I just wanted to ask SO, if there is an elegant and fast solution available without scripting!
There is a simple solution to your problem: ZINTERSTORE will work with a SET and a ZSET. Try:
redis> sadd foo a
(integer) 1
redis> zadd bar 1 a
(integer) 1
redis> zadd bar 2 b
(integer) 1
redis> zinterstore baz 2 foo bar AGGREGATE MAX
(integer) 1
redis> zrange baz 0 -1 withscores
1) "a"
2) "1"
Edit: I added AGGREGATE MAX above, since redis will give each member of the (non-sorted) set foo a default score of 1, and SUM that with whatever score it has in the (sorted) set bar.
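The scoring rules can be traced with a minimal Python simulation of ZINTERSTORE, where `zinterstore` is a hypothetical helper and plain sets get a score of 1 as described above. It also shows a caveat of AGGREGATE MAX: if bar can hold scores below 1, MAX returns the set's default 1 instead of bar's score. WEIGHTS 0 1 with the default SUM returns bar's score exactly in every case.

```python
def zinterstore(sources, weights=None, aggregate="SUM"):
    """Imitate ZINTERSTORE; plain sets count as sorted sets whose scores are all 1."""
    as_zsets = [s if isinstance(s, dict) else {m: 1.0 for m in s} for s in sources]
    weights = weights or [1] * len(as_zsets)
    common = set(as_zsets[0]).intersection(*as_zsets[1:])
    combine = {"SUM": sum, "MIN": min, "MAX": max}[aggregate]
    return {m: combine(w * z[m] for w, z in zip(weights, as_zsets)) for m in common}


foo = {"a"}             # SADD foo a
bar = {"a": 1, "b": 2}  # ZADD bar 1 a, ZADD bar 2 b
print(zinterstore([foo, bar], aggregate="MAX"))

low = {"a": 0.5}        # a sorted set with a score below 1
print(zinterstore([foo, low], aggregate="MAX"), zinterstore([foo, low], weights=[0, 1]))
```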