Accuracy of redis dbsize command - redis

How accurate is the dbsize command in redis?
I've noticed that the count of keys returned by dbsize does not match the number of actual keys returned by the keys command.
Here's an example:
redis-cli dbsize
(integer) 3057
redis-cli keys "*" | wc -l
2072
Why is dbsize so much higher than the actual number of keys?

I would say it is linked to key expiration.
Key/value stores like Redis or memcached cannot afford to define a physical timer per object to expire. There would be too many of them. Instead they define a data structure to easily track items to be expired, and multiplex all the expiration events to a single physical timer. They also tend to implement a lazy strategy to deal with these events.
With Redis, when an item expires, nothing happens. However, before each item access, a check is systematically done to avoid returning expired items, and potentially delete the item. On top of this lazy strategy, every 100 ms, a scavenger algorithm is triggered to physically expire a number of items (i.e. remove them from the main dictionary). The number of considered keys at each iteration depends on the expiration workload (the algorithm is adaptative).
The consequence is Redis may have a backlog of items to expire at a given point in time, when you have a steady flow of expiration events.
Now coming back to the question, the DBSIZE command just return the size of the main dictionary, so it includes expired items that have not yet been removed. The KEYS command walks through the whole dictionary, accessing individual keys, so it excludes all expired items. The number of items may therefore not match.

Related

Redis - prioritizing two sets of keys (eviction policy)

We have two separate set of keys in one Redis instance (set1 and set2). All keys in both sets have an expire time set.
If Redis instance hits max memory cap, we want keys from set1 (and only from it!) be evicted to free some memory, but we need to have a guarantee that keys from set2 will not be evicted until their time limit and, thus, will always expire in a normal way.
Is there any possibility to achieve it?
Thanx in advance!
Redis doesn't provide this finely grained of a level of control over cache invalidation. You're restricted to the following options:
noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with expire field set to true.
volatile-ttl: Removes keys with expire field set to true and the shortest remaining time-to-live (TTL) value.
The best you could do would be to set the policy to noeviction and then write your own cache-invalidation process. Or maybe set it to volatile-ttl and then have set2 be non-volatile keys that you remove manually. A fair bit of work and possibly not worth it.
The documentation describing these options also provides some good insight into how Redis actually removes things and might be worth perusing.

Redis LRANGE Pop Atomicity

I am having a redis data store in which there are unique keys stored. Now my app server will send multiple requests to redis to get some 100 keys from start and I am planning to use LRANGE command for the same.
But my requirement is that each request should receive unique set of keys,which means that if one request goes to redis for 100 keys then those keys will never be returned to any request in future.
As I saw that redis operations are atomic, so can i assume that if there multiple requests coming from app server at same time to redis, as redis is single thrreaded, so it will execute LRANGE mylist 0 100 and once it is completed (means once 100 keys taken and removes from List), only then next request will be processed, so atomicity is inbuild,is it correct?
Is it ever possible under any circumstance that two requests can get same 100 keys?
It sounds like the command you actually want is LPOP, since LRANGE doesn't remove anything from the list.
LPOP mylist 101
And, yes, this command is atomic, so no two clients will receive the same elements.

redis: delete oldest keys when memory limited

I have keys I want to keep indefinitely in redis provided I have enough memory. However, if redis runs low on memory, then I'd like it to remove the oldest keys first. I looked at the "eviction policy" options and it appears redis doesn't support this out of the box. https://support.redislabs.com/hc/en-us/articles/203290657-What-eviction-policies-do-you-support-
How could I implement this myself using commands available as part of the redis-client api?
Here's some pseudocode that might work to give a flavor for what I need:
1. Get the first N keys from a list sorted by key date asc.
2. Delete the oldest keys.
3. Repeat until memory is no longer constrained.
The eviction policy determines what happens when a database reaches its memory limit. To make room for new data, older data is evicted (removed) according to the selected policy.
You can select the policies from the reference link below based on your requirement. The one I am using in the example below is "allkeys-lru"
reference link
Example -
127.0.0.1:6379> CONFIG SET maxmemory-policy allkeys-lru
OK
Example in Python -
import redis
client = redis.Redis(host='localhost', port=6379, db=0)
client.config_set('maxmemory-policy', "allkeys-lru")

how to set expiry for every item in redis queue

I am using jedis, a redis java client. I have a queue of string items. As per normal I am using lpush lpop rpush rpop for the necessary operations. But I will like to set expiry for each individual items in the queue. Is it possible?
This is not possible in redis by design for the sake of keeping redis simple and fast.
You can either store an expire value along with the string in the list, or store a separate list of expire times to let your application know if the key has expired.
There is also an alternative solution discussed here. You can store values in a sorted set with expire timestamps as scores and only select those members, whose scores are greater than certain timestamp. (This of course leaves it up to your app to clear the expired elements in a set)

Real time analytic processing system design

I am designing a system that should analyze large number of user transactions and produce aggregated measures (such as trends and etc).
The system should work fast, be robust and scalable.
System is java based (on Linux).
The data arrives from a system that generate log files (CSV based) of user transactions.
The system generates a file every minute and each file contains the transactions of different users (sorted by time), each file may contain thousands of users.
A sample data structure for a CSV file:
10:30:01,user 1,...
10:30:01,user 1,...
10:30:02,user 78,...
10:30:02,user 2,...
10:30:03,user 1,...
10:30:04,user 2,...
.
.
.
The system I am planning should process the files and perform some analysis in real-time.
It has to gather the input, send it to several algorithms and other systems and store computed results in a database. The database does not hold the actual input records but only high level aggregated analysis about the transactions. For example trends and etc.
The first algorithm I am planning to use requires for best operation at least 10 user records, if it can not find 10 records after 5 minutes, it should use what ever data available.
I would like to use Storm for the implementation, but I would prefer to leave this discussion in the design level as much as possible.
A list of system components:
A task that monitors incoming files every minute.
A task that read the file, parse it and make it available for other system components and algorithms.
A component to buffer 10 records for a user (no longer than 5 minutes), when 10 records are gathered, or 5 minute have passed, it is time to send the data to the algorithm for further processing.
Since the requirement is to supply at least 10 records for the algorithm, I thought of using Storm Field Grouping (which means the same task gets called for the same user) and track the collection of 10 user's records inside the task, of course I plan to have several of these tasks, each handles a portion of the users.
There are other components that work on a single transaction, for them I plan on creating other tasks that receive each transaction as it gets parsed (in parallel to other tasks).
I need your help with #3.
What are the best practice for designing such a component?
It is obvious that it needs to maintain the data for 10 records per users.
A key value map may help, Is it better to have the map managed in the task itself or using a distributed cache?
For example Redis a key value store (I never used it before).
Thanks for your help
I had worked with redis quite a bit. So, I'll comment on your thought of using redis
#3 has 3 requirements
Buffer per user
Buffer for 10 Tasks
Should Expire every 5 min
1. Buffer Per User:
Redis is just a key value store. Although it supports wide variety of datatypes, they are always values mapped to a STRING key. So, You should decide how to identify a user uniquely incase you need have per user buffer. Because In redis you will never get an error when you override a key new value. One solution might be check the existence before write.
2. Buffer for 10 Tasks: You obviously can implement a queue in redis. But restricting its size is left to you. Ex: Using LPUSH and LTRIM or Using LLEN to check the length and decide whether to trigger your process. The key associated with this queue should be the one you decided in part 1.
3. Buffer Expires in 5 min: This is a toughest task. In redis every key irrespective of underlying datatype it value has, can have an expiry. But the expiry process is silent. You won't get notified on expiry of any key. So, you will silently lose your buffer if you use this property. One work around for this is, having an index. Means, the index will map a timestamp to the keys who are all need to be expired at that timestamp value. Then in background you can read the index every minute and manually delete the key [after reading] out of redis and call your desired process with the buffer data. To have such an index you can look at Sorted Sets. Where timestamp will be your score and set member will be the keys [unique key per user decided in part 1 which maps to a queue] you wish to delete at that timestamp. You can do zrangebyscore to read all set members with specified timestamp
Overall:
Use Redis List to implement a queue.
Use LLEN to make sure you are not exceeding your 10 limit.
Whenever you create a new list make an entry into index [Sorted Set] with Score as Current Timestamp + 5 min and Value as the list's key.
When LLEN reaches 10, remember to read then remove the key from the index [sorted set] and from the db [delete the key->list]. Then trigger your process with data.
For every one min, generate current timestamp, read the index and for every key, read data then remove the key from db and trigger your process.
This might be my way to implement it. There might be some other better way to model your data in redis
For your requirements 1 & 2: [Apache Flume or Kafka]
For your requirement #3: [Esper Bolt inside Storm. In Redis for accomplishing this you will have to rewrite the Esper Logic.]