I am working on to find usage pattern. For every request I will take the corresponding second of the day and mark an entry. At the end I will see the usage pattern with this. What is the best structure to perform this in redis?
You can store it in three ways:
1) setbit operations storing in a single key
You can use setbit operations if the frequency is very high. That is if you mark for almost all the seconds then you have to store 86400 values inside that. But this will hardly take 0.1 Mb to store.
Even if you store only one entry at 86400th second then you have to loose that 0.1 Mb. But it always have the fixed size as 0.1 Mb. Also beware that you can get the whole thing as a string and you have to convert them as bits.
setbit date second
get date
2) sets
You can use sets if the frequency is little low. So only those seconds in which the request comes will be in your set.
Sadd date second
smembers date
3) hashes
You can use hashes if want to know the count for each second.
Hincrby date second 1
hgetall date
Also do a sample test with all these and compare the size and efficiency.
I'd use a hash to group by day.
// Set the counter to 0 if it doesn't exist.
HSETNX [DAY] [SECOND] 0
// Increment the counter by 1 each request
HINCRBY [DAY] [SECOND] 1
Then use HKEYS or HSCAN to get your results.
Related
In my case I upload a lot of records to Redis sorted set, but I need to store only 10 highest scored items. I don't have ability to influence on the data which is uploaded (to sort and to limit it before uploading).
Currently I just execute
ZREMRANGEBYRANK key 0 -11
after uploading finish, but such approach looks not very optimal (it's slow and it will be better if Redis could handle that).
So does Redis provide something out of the box to limit count of items in sorted sets?
Nopes, redis does not provide any such functionality apart from ZREMRANGEBYRANK .
There is a similar problem about keeping a redis list of constant size, say 10 elements only when elements are being pushed from left using LPUSH.
The solution lies in optimizing the pruning operation.
Truncate your sorted set, once a while, not everytime
Methods:
Run a ZREMRANGEBYRANK with 1/5 probability everytime, using a random integer.
Use redis pipeline or Lua scripting to achieve this , this would even save the two network calls happening at almost every 5th call.
This is optimal enough for the purpose mentioned.
Algorithm example:
ZADD key member1 score1
random_int = some random number between 1-5
if random_int == 1: # trim sorted set with 1/5 chance
ZREMRANGEBYRANK key 0 -11
I wanna use Redis to keep track of certain numbers. Basically, they're counters. Is there a way to use Redis to sort of track the rate at which these counters increase?
For example, let's say a counter is being incremented at a rate of 10 per minute for the most of the time but suddenly it's being incremented at a rate of 40 per minute. How can I detect that?
You cannot do that directly, but you can do that with a sorted set for example, with a bit of client side, or Lua based processing.
Let's say that you use a sorted set, for each time window you increment the value:
ZINCRBY mykey timestamp 1
Then you have a simple counter per timestamp.
When you want to analyze it, you can take a range by time with ZRANGE or ZREVRANGE, getting the scores by using WITHSCORES, and do some processing on the differences for detecting anomalies. There are many ways to do it, here's a link with a few pointers: https://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series
There are proposals for sorted set item expiration in Redis (see https://groups.google.com/d/msg/redis-db/rXXMCLNkNSs/Bcbd5Ae12qQJ and https://quickleft.com/blog/how-to-create-and-expire-list-items-in-redis/), I tried the worker approach to expire geospatial indexes with ZREMRANGEBYSCORE and ZREMRANGEBYRANK commands unsuccessfully (nothing removed).
I succeded using ZREMRANGEBYLEX.
Is there a way to work with geospatial items score other than Strings?
Update:
For example, if time to live(ttl) of an item is 30sec, I add it as:
geoadd 1 -8.616021 41.154503 30
Now, suppose worker executes after 40sec, I was expecting that
zremrangebyscore 1 0 40
would do the job, but it does not,
ZREMRANGEBYLEX 1 [0 [40
does it. Why is this behavior? That means the score of a geospatial item supports only lexicographical operations?
Sorted Sets have elements (strings), and every element has a score (floating-point). Geosets use the score to encode a coordinate.
Redis doesn't expire members in a Sorted Set (or a Geoset). You have to remove them yourself if that is required.
In your case, you'll need to keep two Sorted Sets - one as your GeoSet and one for managing TTLs as scores.
For example, assuming your member is called 'foo', to add it:
ZADD ttls 30 foo
ZADD elems -8.616021 41.154503 foo
To manually expire, first find the members with a call to ZRANGEBYSCORE ttls, and then remove them from both Sets.
Tip: it is preferable to use a timestamp as score instead of seconds.
I am no an expert in redis at all. Today I run into one idea, but I don't know if it is possible in redis.
I want to store list of values but only for some time, for example list of ip addresses which visited page in last 5 minutes. As far as I know I can't set EXPIRE on single list/hash item, right? So I am pushing 1, 2, 3 into list/hash but after certain constant time I want each item to expire/disapear? Or maybe instead of list hash structure will be more suitable { '1': timestamp-when-disapear, ... }?
Or maybe only solution is
SET test.1.1 1
EXPIRE test.1.1 60
SET test.1.2 2
EXPIRE test.1.2 60
SET test.1.3 3
EXPIRE test.1.3 60
# to retrieve, can I pipeline KEYS output to MGET?
KEYS test.1.*
Use a sorted set instead.
log the server IP along with the timestamp in a sorted set. During retrieval make use of that timestamp to get things you need. In a scheduler periodically delete keys which goes beyond the range.
Example:
zadd test 1465371055 1.1
zadd test 1465381055 1.3
zadd test 1465391055 1.1
your sorted set will have 1.1 and 1.3, where 1.1 is with the new value 1465391055.
Now on retrieval use
zrangebyscore test min max
min -> currenttime - (5*60*1000)
max -> currenttime
you will get IP's visited in last 5 mins.
In another scheduler kind of thread you need to delete unwanted entries.
zremrangebyscore test min max
min -> currenttime - (10*60*1000) -> you can give it to any minute you want.
max -> currenttime
Also understand that if number of distinct IP's are too large then the sorted set will grow rapidly. Your scheduler thread must work properly to keep the memory in control.
I'm caching fan-out news feeds with Redis in the following way:
each feed activity is a key/value, like activity:id where the value is a JSON string of the data.
each news feed is currently a list, the key is feed:user:user_id and the list contains the keys of the relevant activities.
to retrieve a news feed I use for example: 'sort feed:user:user_id by nosort get * limit 0 40'
I'm considering changing the feed to a sorted set where the score is the activity's timestamp, this way the feed is always sorted by time.
I read http://arindam.quora.com/Redis-sorted-sets-and-lists-Pertaining-to-Newsfeed which recommend using lists because of the time complexity of sorted sets, but by keep using lists I have to take care of the insert order,
inserting a past story requires to iterate through the list and finding the right index to push to. (which can cause new problems in distributed environments).
should I keep using lists or go for sorted sets?
is there a way to retrieve the news feed instantly from a sorted set, (like with the sort ... get * command for a list) or does it have to be zrange and then iterating through the results and getting each value?
Yes, sorted sets are very fast and powerful. They seem a much better match for your requirements than SORT operations. The time complexity is often misunderstood. O(log(N)) is very fast, and scales just fine. We use it for tens of millions of members in one sorted set. Retrieval and insertion is sub-millisecond.
Use ZRANGEBYSCORE key min max WITHSCORES [LIMIT offset count] to get your results.
Depending on how you store the timestamps as 'scores', ZREVRANGEBYSCORE might be better.
A small remark about the timestamps: Sorted set SCORES which don't need a decimal part should be using 15 digits or less. So the SCORE has to stay in the range -999999999999999 to 999999999999999. Note: These limits exist because Redis server actually stores the score (float) as a redis-string representation internally.
I therefore recommend this format, converted to Zulu Time: -20140313122802 for second-precision. You may add 1 digit for 100ms-precision, but no more if you want no loss in precision. It's still a float64 by the way, so loss of precision could be fine in some scenarios, but your case fits in the 'perfect precision' range, so that's what I recommend.
If your data expires within 10 years, you can also skip the three first digits (CCY of CCYY), to achieve .0001 second precision.
I suggest negative scores here, so you can use the simpler ZRANGEBYSCORE instead of the REV one. You can use -inf as the start score (minus infinity) and LIMIT 0 100 to get the top 100 results.
Two sorted set members (or 'keys' but that's ambiguous since the sorted set is also a key in itself) may share a score, that's no problem, the results within an identical score are alphabetical.
Hope this helps, TW
Edit after chat
The OP wanted to collect data (using a ZSET) from different keys (GET/SET or HGET/HSET keys). JOIN can do that for you, ZRANGEBYSCORE can't.
The preferred way of doing this, is a simple Lua script. The Lua script is executed on the server. In the example below I use EVAL for simplicity, in production you would use SCRIPT EXISTS, SCRIPT LOAD and EVALSHA. Most client libraries have some bookkeeping logic built-in, so you don't upload the script each time.
Here's an example.lua:
local r={}
local zkey=KEYS[1]
local a=redis.call('zrangebyscore', zkey, KEYS[2], KEYS[3], 'withscores', 'limit', 0, KEYS[4])
for i=1,#a,2 do
r[i]=a[i+1]
r[i+1]=redis.call('get', a[i])
end
return r
You use it like this (raw example, not coded for performance):
redis-cli -p 14322 set activity:1 act1JSON
redis-cli -p 14322 set activity:2 act2JSON
redis-cli -p 14322 zadd feed 1 activity:1
redis-cli -p 14322 zadd feed 2 activity:2
redis-cli -p 14322 eval '$(cat example.lua)' 4 feed '-inf' '+inf' 100
Result:
1) "1"
2) "act1JSON"
3) "2"
4) "act2JSON"