Background:
I'm using PHP + Redis as my backend to store a ranking.
A zset seems to be a good way to handle this.
However, each entry in the ranking has multiple scores: if the first scores are equal, I need to compare the second scores to decide the ordering. There are 3 scores in total.
I expected there to be an interface for setting a custom compare function on a specific zset so that the sorting could happen inside Redis, but I couldn't find one. Besides, I'd like the ranking to be sorted when an entry is added; re-sorting every time the ranking is requested seems wasteful.
Expected result:
zadd myset 1000_100_3000 matchId1
zadd myset 1000_2500_250 matchId2
zadd myset 1000_2500_200 matchId3
zrange myset 0 -1 returns:
matchId2
matchId3
matchId1
something like this
Short answer: no, you can't do that.
However, you can often compose multiple keys in such a way as to make them sortable. In your case, an obvious candidate would be to determine the maximum range of the three pieces and compose them all into a single big number. For example, 1000_100_3000 could perhaps be the number 100001003000 (4 decimal digits per chunk), which can be trivially compared or decomposed. You might also want to think in terms of bits rather than digits, though. For example, maybe allow 20 bits per segment, and use shift/mask bit operations to compose/decompose (i.e. (1000 << 40) | (100 << 20) | (3000)). Keep in mind that Redis stores scores as double-precision floats, so the composed value has to stay below 2^53 to remain exact: three completely filled 20-bit segments (60 bits) are too wide, and about 17 bits each is the safe limit for three segments.
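The shift/mask idea can be sketched in Python as follows. This is a minimal illustration, not code from the answer: the 17-bit field width and the names pack_score and unpack_score are assumptions, chosen so that three fields stay within the 53 bits a double can hold exactly.

```python
from typing import Tuple

# Redis stores sorted-set scores as IEEE 754 doubles, so only integers up to
# 2**53 survive a round trip exactly. Three 17-bit fields (51 bits) fit.
BITS = 17                      # assumed field width; each value must be < 2**17
MASK = (1 << BITS) - 1


def pack_score(first: int, second: int, third: int) -> int:
    """Combine three non-negative scores so that numeric order equals
    (first, second, third) lexicographic order."""
    for value in (first, second, third):
        if not 0 <= value <= MASK:
            raise ValueError(f"{value} does not fit in {BITS} bits")
    return (first << (2 * BITS)) | (second << BITS) | third


def unpack_score(packed: int) -> Tuple[int, int, int]:
    """Recover the three original scores with shifts and masks."""
    return (packed >> (2 * BITS)) & MASK, (packed >> BITS) & MASK, packed & MASK


matches = {
    "matchId1": pack_score(1000, 100, 3000),
    "matchId2": pack_score(1000, 2500, 250),
    "matchId3": pack_score(1000, 2500, 200),
}
# Highest packed score first, which is the order ZREVRANGE myset 0 -1 would use.
ranking = sorted(matches, key=matches.get, reverse=True)
print(ranking)  # ['matchId2', 'matchId3', 'matchId1']
```

The packed integer is what you would pass to ZADD as the score; the original three values can be recovered on the client with unpack_score.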
Related
I need an indexed structure in Redis.
I have a ranking algorithm and should store data with ZADD
ZADD myset 125 id::1
ZADD myset 17 id::2
And then, I need both the count and topmost scored data.
ZCOUNT myset -inf +inf
ZPOPMAX myset
provide what I need. But ZPOPMAX not only returns the topmost member, it also removes it.
QUESTION 1
I need the top most scored one, but I must NOT remove it from the sorted list.
QUESTION 2
I need to put an expiration time for each item in the sorted list. I know, expiration can be only set on keys in Redis. But I need a kind of dynamic ranking, e.g. Currently my topmost ranked data is 125, but after 12 hours, it will be 17 because the first item will be a tombstone after its TTL.
Thank you
You can explore the sorted set commands in the Redis command reference.
QUESTION 1
I need the top most scored one, but I must NOT remove it from the sorted list.
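You can use these commands:
ZRANGE myset 0 0 REV fetches the member with the highest score (the REV option requires Redis 6.2 or later; on older versions use ZREVRANGE myset 0 0).
ZRANGE myset 0 0 fetches the member with the lowest score.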
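As a rough sketch, the read-only lookup could be wrapped like this in Python. The functions peek_top and count_and_top are hypothetical names; client is assumed to be a redis-py style object (for example redis.Redis(decode_responses=True)) that exposes zrevrange and zcard.

```python
def peek_top(client, key):
    """Return (member, score) for the highest-scored member without removing
    it, or None when the sorted set is empty."""
    # ZREVRANGE key 0 0 WITHSCORES is read-only, unlike ZPOPMAX.
    top = client.zrevrange(key, 0, 0, withscores=True)
    return top[0] if top else None


def count_and_top(client, key):
    """Return the member count (ZCARD) together with the current top entry."""
    return client.zcard(key), peek_top(client, key)
```

ZCARD is used here instead of ZCOUNT myset -inf +inf because both return the total number of members and ZCARD does not need a score range.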
QUESTION 2
I need to put an expiration time for each item in the sorted list. I know, expiration can be only set on keys in Redis. But I need a kind of dynamic ranking, e.g. Currently my topmost ranked data is 125, but after 12 hours, it will be 17 because the first item will be a tombstone after its TTL.
For this, Redis has no command that expires individual members of a sorted set. I recommend using two separate sorted sets: one with your regular scoring system, whose scores you increase or decrease, and a second one that holds the same members scored by time. Then run a periodic job that queries the outdated members and removes them from both sorted sets.
You can query sorted sets by score with ZRANGEBYSCORE.
For getting more ideas on the second Question you can READ THIS.
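The periodic job described above might look roughly like this in Python. It is a sketch under assumptions: sweep_expired and the key names are made up for illustration, the second sorted set is assumed to store an absolute expiry timestamp as the score, and client is assumed to be a redis-py style object with zrangebyscore and zrem.

```python
import time


def sweep_expired(client, rank_key, expiry_key, now=None):
    """Remove every member whose expiry timestamp is <= now from both sets.

    rank_key   -- sorted set holding the real ranking scores
    expiry_key -- sorted set with the same members, scored by expiry time
    """
    now = time.time() if now is None else now
    # Members that are due: ZRANGEBYSCORE expiry_key -inf now
    expired = client.zrangebyscore(expiry_key, "-inf", now)
    if expired:
        client.zrem(rank_key, *expired)
        client.zrem(expiry_key, *expired)
    return expired
```

The three calls are not atomic as written; if a member could be re-added between them, wrapping the sweep in a transaction or a Lua script would be the safer choice.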
I am new to Redis. For example, if I have the following schema:
INCR id:product
SET product:<id:product> value
SADD color:red <id:product>
(Aside: I am not sure how to express a variable in Redis. I will just use <id:product> as the primary key value. In production, I will use golang client for this job)
To query products which have red color, I can do:
SMEMBERS color:red
But the problem is I just want to display 10 of them in the first page, and then next 10 in the second page and so on. How to let Redis return only part of them by specifying offset and limit arguments?
What do redis experts normally do for this case? Return all IDs even if I just want 10 of them? Is that efficient? What if it has millions of values in the set, but I only want 10?
Edited 1
Incidentally, I use sets instead of lists and sorted sets because I will need to do SINTER jobs for other queries.
For example:
SADD type:foo <id:product>
SINTER color:red type:foo
And then I will have pagination problem again. Because I actually just want to find 10 of the intersection at a time. (eg: if the intersection returns millions of keys, but actually I just want 10 of them at a time for pagination).
Edited 2
Should I use a sorted set instead? I am not sure if this is the expert choice or not. Something like:
ZADD color:red <id:product> <id:product>
ZADD type:foo <id:product> <id:product>
ZRANGE color:red 0 9 // for the first page of red color products
ZINTERSTORE out 2 color:red type:foo AGGREGATE MIN
ZRANGE out 0 9 // for the first page of red color and type foo products
I have no idea whether the above approach is recommended.
What will happen if multiple clients are creating the same out sorted set?
Is it meaningful to use the same value for both score and member?
Using sorted sets is the standard way to do pagination in Redis.
The documentation of ZINTERSTORE says that: "If destination already exists, it is overwritten."
Therefore, you shouldn't use "out" as the destination key name. You should instead use a unique or sufficiently random key name and then delete it when you're done.
I'm not sure what you mean by "meaningful". It's a fine choice if that's the order you want them to be in.
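A possible shape for the "unique destination key, then delete" advice, sketched in Python. The function name page_intersection and the tmp:zinter: key prefix are assumptions, and client is assumed to be a redis-py style object with zinterstore, zrange and delete.

```python
import uuid


def page_intersection(client, keys, page, page_size=10):
    """Return one page of members present in every sorted set in `keys`."""
    # A per-request destination key, so concurrent callers never overwrite
    # each other's ZINTERSTORE result.
    dest = f"tmp:zinter:{uuid.uuid4().hex}"
    start = page * page_size
    try:
        client.zinterstore(dest, keys, aggregate="MIN")
        return client.zrange(dest, start, start + page_size - 1)
    finally:
        client.delete(dest)  # always clean up the temporary key
```

This recomputes the intersection for every page. If users usually browse several pages, keeping the temporary key for a short time with EXPIRE instead of deleting it right away may be the better trade-off.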
There are proposals for sorted set item expiration in Redis (see https://groups.google.com/d/msg/redis-db/rXXMCLNkNSs/Bcbd5Ae12qQJ and https://quickleft.com/blog/how-to-create-and-expire-list-items-in-redis/). I tried the worker approach to expire members of a geospatial index with the ZREMRANGEBYSCORE and ZREMRANGEBYRANK commands, without success (nothing was removed).
I succeeded using ZREMRANGEBYLEX.
Is there a way to work with the score of geospatial items other than as strings?
Update:
For example, if time to live(ttl) of an item is 30sec, I add it as:
geoadd 1 -8.616021 41.154503 30
Now, suppose worker executes after 40sec, I was expecting that
zremrangebyscore 1 0 40
would do the job, but it does not,
ZREMRANGEBYLEX 1 [0 [40
does it. Why does it behave this way? Does that mean the score of a geospatial item supports only lexicographical operations?
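Sorted Sets have elements (strings), and every element has a score (floating-point). Geosets use the score to encode a coordinate. In your GEOADD call, 30 is therefore the member name and not the score, which is why ZREMRANGEBYLEX (which compares member names) removed it and ZREMRANGEBYSCORE did not.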
Redis doesn't expire members in a Sorted Set (or a Geoset). You have to remove them yourself if that is required.
In your case, you'll need to keep two Sorted Sets - one as your GeoSet and one for managing TTLs as scores.
For example, assuming your member is called 'foo', to add it:
ZADD ttls 30 foo
GEOADD elems -8.616021 41.154503 foo
To manually expire, first find the members with a call to ZRANGEBYSCORE ttls, and then remove them from both Sets.
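Tip: it is preferable to use an absolute expiry timestamp as the score instead of a TTL in seconds, so the worker can simply remove everything scored at or below the current time.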
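Put together, the Geoset plus TTL set could be handled roughly as below. This is a sketch, not the answer's own code: geo_add_with_ttl and geo_expire are hypothetical helpers, and client is assumed to be a redis-py style object. GEOADD is sent through execute_command so the sketch does not depend on a particular geoadd wrapper signature.

```python
import time


def geo_add_with_ttl(client, geo_key, ttl_key, lon, lat, member, ttl_seconds, now=None):
    """Add a geo member and record its absolute expiry time in a second set."""
    now = time.time() if now is None else now
    client.execute_command("GEOADD", geo_key, lon, lat, member)
    client.zadd(ttl_key, {member: now + ttl_seconds})  # score = expiry timestamp


def geo_expire(client, geo_key, ttl_key, now=None):
    """Drop members whose expiry timestamp has passed from both sets."""
    now = time.time() if now is None else now
    expired = client.zrangebyscore(ttl_key, "-inf", now)
    if expired:
        client.zrem(geo_key, *expired)   # a Geoset is a sorted set, so ZREM works
        client.zrem(ttl_key, *expired)
    return expired
```

With the question's numbers, a member added at time t with a 30 second TTL gets the score t + 30 in the ttls set, and a worker running 40 seconds later finds it with a plain score range query.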
In redis I store objects in a sorted set.
In my solution, it's important to be able to run a ranged query by dates, so I store the items with the score being the timestamp of each items, for example:
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
However, in other situations I need to find a single item in the set based on its ID.
I know I can't just query this data structure as if it were a nosql db, but I tried using ZSCAN, which didn't work.
ZSCAN MySet 0 MATCH Id:92 count 1
It returns; "empty list or set"
Maybe I need to serialize differently?
I have serialized using Json.NET.
How, if at all, can I achieve this: using dates as the score and still being able to look up an item by its ID?
Many thanks,
Lars
Edit:
Assume it's not possible, but any thoughts or inputs are welcome:
Ref: http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
In Redis, data can only be queried by its key. Even if we use a hash,
we can't say get me the keys wherever the field race is equal to
sayan.
Edit 2:
I tried to do:
ZSCAN MySet 0 MATCH *87*
127.0.0.1:6379> ZSCAN MySet 0 MATCH *87*
1) "192"
2) 1) "{\"Id\":\"64\",\"Ref\":\"XQH4\",\"DTime\":1443837798,\"ATime\":1444187707,\"ExTime\":0,\"SPName\":\"XQH4BPGW47FM\",\"StPName\":\"XQH4BPGW47FM\"}"
2) "1443837798"
3) "{\"Id\":\"87\",\"Ref\":\"5CY6\",\"DTime\":1443519199,\"ATime\":1444172326,\"ExTime\":0,\"SPName\":\"5CY6DHP23RXB\",\"StPName\":\"5CY6DHP23RXB\"}"
4) "1443519199"
It finds the desired item, but it also finds another one with an occurrence of 87 in the property ATime. Having more unique, longer IDs might make this approach work, but I would still have to filter the results in code to find the one with the exact value in its Id property.
Still open for suggestions.
I think it's very simple.
Solution 1(Inferior, not recommended)
Your ZSCAN MySet 0 MATCH Id:92 count 1 didn't work because the stored string is "{\"Id\":\"92\"..., not "{\"Id:92\".... The quotes are part of the serialized member. In addition, MATCH takes a glob-style pattern that has to match the whole member, so it needs wildcards on both sides. Try a pattern like *\"Id\":\"92\"* to match the JSON-serialized data in Redis. I'm not familiar with Json.NET, so the exact string is left for you to discover.
By the way, did ZSCAN MySet 0 MATCH Id:92 count 1 return a cursor? I suspect you used ZSCAN in the wrong way: a non-zero cursor means the scan is not finished, and you have to keep calling ZSCAN with the returned cursor until it comes back as 0.
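To see why the pattern matters, here is a small Python check that uses fnmatch as a stand-in for Redis glob matching. That substitution is an assumption: the two matchers are similar for plain * wildcards but not identical in every detail. The member strings are shortened versions of the ones in the question.

```python
from fnmatch import fnmatchcase

# Two members shaped like the JSON strings stored in the sorted set.
member_92 = '{"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554}'
member_11 = '{"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326}'

# A pattern without wildcards must equal the whole member, so it never matches.
print(fnmatchcase(member_92, "Id:92"))            # False
# Wildcards on both sides plus the exact JSON punctuation select only Id 92.
print(fnmatchcase(member_92, '*"Id":"92"*'))      # True
print(fnmatchcase(member_11, '*"Id":"92"*'))      # False
```

Including the "Id": prefix and the closing quote in the pattern also avoids accidental hits on other fields, which is the problem seen with the looser *87* pattern.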
Solution 2(Better, strongly recommended)
ZSCAN is fine when your sorted set is not large and you know how to save network round trips with a Redis Lua script. It still makes the "look up by ID" operation O(n). Therefore, a better solution is to change your data model in the following way:
change sorted set
from
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
to
# Score Value
0 1443476076 Id:92
1 1443482969 Id:11
Move the remaining detail data into separate hash keys:
# Key field-value field-value ...
0 Id:92 Ref-7ADT DTime-1443476076 ...
1 Id:11 Ref-DAJT DTime-1443482969 ...
Then you look up an item by ID with HGETALL Id:92. For a ranged query by date, run ZRANGEBYSCORE sortedset mindate maxdate and then HGETALL each returned ID. It is best to wrap these commands in a single Lua script, and it will still be very fast.
Data in a NoSQL database often needs to be organized redundantly like this. It can make some common operations take more than one command and round trip, but Redis's Lua scripting handles that. I strongly recommend it, because a script wraps the commands into one network round trip, and they are all executed on the Redis server side, atomically and very fast.
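The reorganized model could be exercised from client code roughly like this. It is a sketch with hypothetical helper names (save_item, get_by_id, get_by_date_range), and client is assumed to be a redis-py style object with hset(mapping=...), zadd, hgetall and zrangebyscore. It issues the commands one by one instead of using the Lua wrapping recommended above.

```python
def save_item(client, index_key, item):
    """Store the details in a hash and index the ID by timestamp."""
    item_key = f"Id:{item['Id']}"
    client.hset(item_key, mapping=item)                  # HSET Id:92 Ref 7ADT ...
    client.zadd(index_key, {item_key: item["DTime"]})    # ZADD index DTime Id:92


def get_by_id(client, item_id):
    """Lookup by ID is a single HGETALL instead of a scan over the set."""
    return client.hgetall(f"Id:{item_id}")


def get_by_date_range(client, index_key, min_time, max_time):
    """Range query by date: ZRANGEBYSCORE for the IDs, then one HGETALL each."""
    keys = client.zrangebyscore(index_key, min_time, max_time)
    return [client.hgetall(key) for key in keys]
```

Against a real server, get_by_date_range costs one round trip per item; a pipeline or a Lua script would collapse that into a single round trip.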
Reply if there's anything you don't know
I'm caching fan-out news feeds with Redis in the following way:
each feed activity is a key/value, like activity:id where the value is a JSON string of the data.
each news feed is currently a list, the key is feed:user:user_id and the list contains the keys of the relevant activities.
to retrieve a news feed I use for example: 'sort feed:user:user_id by nosort get * limit 0 40'
I'm considering changing the feed to a sorted set where the score is the activity's timestamp, this way the feed is always sorted by time.
I read http://arindam.quora.com/Redis-sorted-sets-and-lists-Pertaining-to-Newsfeed, which recommends using lists because of the time complexity of sorted sets, but if I keep using lists I have to take care of the insert order:
inserting a past story requires iterating through the list and finding the right index to push to (which can cause new problems in distributed environments).
should I keep using lists or go for sorted sets?
is there a way to retrieve the news feed instantly from a sorted set, (like with the sort ... get * command for a list) or does it have to be zrange and then iterating through the results and getting each value?
Yes, sorted sets are very fast and powerful. They seem a much better match for your requirements than SORT operations. The time complexity is often misunderstood: O(log(N)) is very fast and scales just fine. We use it for tens of millions of members in one sorted set, and retrieval and insertion are sub-millisecond.
Use ZRANGEBYSCORE key min max WITHSCORES [LIMIT offset count] to get your results.
Depending on how you store the timestamps as 'scores', ZREVRANGEBYSCORE might be better.
A small remark about the timestamps: sorted set scores that don't need a decimal part should use 15 digits or fewer, so the score has to stay in the range -999999999999999 to 999999999999999. Note: the limit exists because Redis stores the score as a 64-bit double, which represents integers exactly only up to 2^53 (about 9.007e15), so any integer of up to 15 digits is safe.
I therefore recommend this format, converted to Zulu Time: -20140313122802 for second-precision. You may add 1 digit for 100ms-precision, but no more if you want no loss in precision. It's still a float64 by the way, so loss of precision could be fine in some scenarios, but your case fits in the 'perfect precision' range, so that's what I recommend.
If your data expires within 10 years, you can also skip the three first digits (CCY of CCYY), to achieve .0001 second precision.
I suggest negative scores here, so you can use the simpler ZRANGEBYSCORE instead of the REV one. You can use -inf as the start score (minus infinity) and LIMIT 0 100 to get the top 100 results.
Two sorted set members (or 'keys', but that's ambiguous since the sorted set is also a key in itself) may share a score. That's no problem: members with an identical score are returned in lexicographical order.
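The recommended score format can be produced like this in Python. A small sketch: feed_score is a made-up helper name, and it covers only the second-precision variant.

```python
from datetime import datetime, timezone


def feed_score(moment: datetime) -> int:
    """Negative UTC timestamp in CCYYMMDDhhmmss form, so newer sorts first."""
    utc = moment.astimezone(timezone.utc)
    return -int(utc.strftime("%Y%m%d%H%M%S"))


older = feed_score(datetime(2014, 3, 13, 12, 28, 2, tzinfo=timezone.utc))
newer = feed_score(datetime(2014, 3, 14, 9, 0, 0, tzinfo=timezone.utc))
print(older)          # -20140313122802
print(newer < older)  # True: an ascending ZRANGEBYSCORE returns the newer item first
```

The result has 14 digits, which leaves room for the one extra digit of sub-second precision mentioned above.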
Hope this helps, TW
Edit after chat
The OP wanted to collect data (using a ZSET) from different keys (GET/SET or HGET/HSET keys). SORT with GET can do that kind of join for you; ZRANGEBYSCORE can't.
The preferred way of doing this, is a simple Lua script. The Lua script is executed on the server. In the example below I use EVAL for simplicity, in production you would use SCRIPT EXISTS, SCRIPT LOAD and EVALSHA. Most client libraries have some bookkeeping logic built-in, so you don't upload the script each time.
Here's an example.lua:
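local r = {}
local zkey = KEYS[1]
-- ARGV[1], ARGV[2]: min and max score; ARGV[3]: maximum number of members
local a = redis.call('zrangebyscore', zkey, ARGV[1], ARGV[2], 'withscores', 'limit', 0, ARGV[3])
-- a is a flat list: member1, score1, member2, score2, ...
for i = 1, #a, 2 do
  r[i] = a[i + 1]                     -- the score
  r[i + 1] = redis.call('get', a[i])  -- the value stored under the member's key
end
return r
You use it like this (raw example, not coded for performance). Only the feed is passed as a key, the score range and the limit are plain arguments, and the script needs double quotes so the shell expands $(cat example.lua):
redis-cli -p 14322 set activity:1 act1JSON
redis-cli -p 14322 set activity:2 act2JSON
redis-cli -p 14322 zadd feed 1 activity:1
redis-cli -p 14322 zadd feed 2 activity:2
redis-cli -p 14322 eval "$(cat example.lua)" 1 feed '-inf' '+inf' 100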
Result:
1) "1"
2) "act1JSON"
3) "2"
4) "act2JSON"
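For comparison, the same result can be assembled on the client without a script, at the cost of a second round trip. This is an alternative technique (ZRANGEBYSCORE followed by MGET), not the Lua approach above. read_feed is a hypothetical name, and client is assumed to be a redis-py style object whose zrangebyscore accepts start, num and withscores.

```python
def read_feed(client, feed_key, min_score="-inf", max_score="+inf", limit=100):
    """Return [(score, payload), ...] for one slice of the feed."""
    # First round trip: members and scores, like the zrangebyscore in the script.
    entries = client.zrangebyscore(
        feed_key, min_score, max_score, start=0, num=limit, withscores=True
    )
    if not entries:
        return []
    # Second round trip: all payloads at once with MGET.
    payloads = client.mget([member for member, _ in entries])
    return [(score, payload) for (_, score), payload in zip(entries, payloads)]
```

Unlike the script, the two calls are not atomic, so an activity deleted in between shows up with a missing payload.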