Redis expire on large set of keys - redis

My problem is: i have a set of values that each of them has to have an expire value.
code:
set a:11111:22222 someValue
expire a:11111:22222 604800 \\usually equal a week
In a perfect world i would have put all those values in a hash and give each of them it's appropriate expire value, but redis does not allow expire on a hash fields.
problem is that i also have a process that need to get all those keys about once an hour
keys a:*
this command is really expensive and according to redis documentation can cause performance issues. I have about 25000-30000 keys at each given moment.
Does someone knows how can i solve such a problem?
thumbs up it guarantee (-;
Roy

Let me propose an alternative solution.
Rather than asking Redis to scan all the keys, why not perform a background dump, and parse the dump to extract the keys? This way, there is zero impact on the Redis instance itself.
Parsing the dump file is not as scary as it sounds, because you can use the excellent redis-rdb-tools package:
https://github.com/sripathikrishnan/redis-rdb-tools
You can either convert the dump file into a json file, and then parse the json file, or use the Python API to extract the keys by yourself.

As you've already mentioned, using keys is not a good solution to get your keys:
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using sets.
Source: Redis docs for KEYS
As the docs are suggesting, you should build your own indices!
A common way of building an index is to use a sorted set. You can read more on how it's working on my question over here.
Building references to your a:* keys using a sorted set, will also allow you to only select the required keys in relation to a date or any other int value, in case you're filtering the results!
And yes: it would be awesome if hashes could expire. Sadly it looks like its not going to happen, but there are in fact creative alternatives to take care about it by yourself.

Why don't you use a sorted set.
Here is some data creation sequence.
redis 127.0.0.1:6379> setex a:11111:22222 604800 someValue
OK
redis 127.0.0.1:6379> zadd user:index 1385112435 a:11111:22222 // 1384507635 + 604800
(integer) 1
redis 127.0.0.1:6379> setex a:11111:22223 604800 someValue2
OK
redis 127.0.0.1:6379> zadd user:index 1385113289 a:11111:22223 // 1384508489 + 604800
(integer) 1
redis 127.0.0.1:6379> zrangebyscore user:index 1385112435 1385113289
1) "a:11111:22222"
2) "a:11111:22223"
This is no select performance issue.
but, It spends more memory and insert cost.

Related

Efficiently delete RedisKeys in bulk via wildcard pattern

Problem:
I need to efficiently delete keys from my Redis Cache using a wildcard pattern. I don't need atomicity; eventual consistency is acceptable.
Tech stack:
.NET 6 (async all the way through)
StackExchange.Redis 2.6.66
Redis Server 6.2.6
I currently have ~500k keys in Redis.
I'm not able to use RedisJSON for various reasons
Example:
I store the following 3 STRING types with keys:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
where each STRING stores JSON like so:
> dump dailynote:getitemsforuser:region:fl:user:123
"{\"Name\":\"john\",\"Age\":22}"
The original solution used the KeysAsync method to retrieve the list of keys to delete via a wildcard pattern. Since the Redis Server is 6.x, the SCAN feature is being used by KeysAsync internally by the StackExchange.Redis nuget.
Original implementation used a wildcard pattern dailynote:getitemsforuser:region:*. As one would expect, this solution didn't scale well and we started seeing RedisTimeoutExceptions.
I'm aware of the "avoid this in PROD if you can" and have seen Marc Gravell respond to a couple other questions/issues on SO and StackExchange.Redis GitHub. The only potential alternative I could think of is to use a Redis SET to "track" each RedisKey and then retrieve the list of values from the SET (which are the keys I need to remove). Then delete the SET as well as the returned keys.
Potential Solution?:
Create a Redis SET with a key of dailynote:getitemsforuser with a value which is the key of the form dailynote:getitemsforuser:region:XX...
The SET would look like:
dailynote:getitemsforuser (KEY)
dailynote:getitemsforuser:region:sw:user:123 (VALUE)
dailynote:getitemsforuser:region:fl:user:123 (VALUE)
dailynote:getitemsforuser:region:sw:user:456 (VALUE)
...
I would still have each individual STRING type as well:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
when it is time to do the "wildcard" remove, I get the members of the dailynote:getitemsforuser SET, then call RemoveAsync passing the members of the set as the RedisKey[]. Then call RemoveAsync with the key of the SET (dailynote:getitemsforuser)
I'm looking for feedback on how viable of a solution this is, alternative ideas, gotchas, and suggestions for improvement. TIA
UPDATE
Added my solution I went with below...
The big problem with both KEYS and SCAN with Redis is that they require a complete scan of the massive hash table that stores every Redis key. Even if you use a pattern, it still needs to check each entry in that hash table to see if it matches.
Assuming you are calling SADD when you are also setting the value in your key—and thus avoiding the call to SCAN—this should work. It is worth noting that calls to SMEMBERS to get all the members of a Set can also cause issues if the Set is big. Redis—being single-threaded—will block while all the members are returned. You can mitigate this by using SSCAN instead. StackExchange.Redis might do this already. I'm not sure.
You might also be able to write a Lua script that reads the Set and UNLINKs all the keys atomically. This would reduce network but could tie Redis up if this takes too long.
I ended up using the solution I suggested above where I use a Redis SET with a known/fixed key to "track" each of the necessary keys.
When a key that needs to be tracked is added, I call StackExchange.Redis.IDatabase.SetAddAsync (SADD) while calling StackExchange.Redis.IDatabase.HashSetAsync (HSET) for adding the "tracked" key (along with its TTL).
When it is time to remove the "tracked" key, I first call StackExchange.Redis.IDatabase.SetScanAsync (SSCAN) (with a page size of 250) iterating on the IAsyncEnumerable and call StackExchange.Redis.IDatabase.KeyDeleteAsync (HDEL) on chunks of the members of the SET. I then call StackExchange.Redis.IDatabase.KeyDeleteAsync on the actual key of the SET itself.
Hope this helps someone else.

String vs Hash for string type? Hash will have only one key instead of many

For example, I see many people are doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
In the second way, it will only create one data key instead of creating many keys in the first way.
Which way is recommended?
I just see some people and tutorial are doing hset data:10 01 xxx. This is actually not related to my question. My question is simply asking what it's recommended between set data:1001 xxx and hset data 1001 xxx.
And I don't plan to modify hash-max-zipmap-entries and hash-max-zipmap-value. That means the hash will exceed the two values eventually. In such a config, are the two ways the same? or Which way is recommended?
Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on cluster and want the values to be sharded over different nodes to spread load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)
This all depends on the tradeoffs you want to support. In general, using hashes will have a smaller memory footprint than using simple keys. In fact, it is about an order of magnitude less memory. And access to hash values is constant time. So, if you are using redis simply as a key-value store, then hashes are way more efficient than simple keys.
However, you will want to use simple keys if you need to support expiration, keyspace notifications, etc, then you will need to use simple keys.
Just be careful to tweak the values for hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf in order to ensure that hashes are treated correctly for your environment.
You can read all about the details in the memory optimization section of the documentation.

In Redis is it possible to find all hashes with a key containing a specified value?

I am using Jedis, and new to both that and Redis itself. I have db that stores hashes, and need to find all keys in the db that contain an entry with a specified key and a specified value. EG: "find all hashes in the db that have key/value of STATUS=ERROR". Is this possible in Jedis? From what I can tell from googling, hscan will find keys in a specified hash.
More generally, by way of teaching me to fish, any pointers for where to look this up? It seems there is no real jedis api doc, and not even the Redis doc itself seems to have nothing on hscan.
As you mentioned, you can use HSCAN to find the specified key-value pair from a hash. Also, you need to use the SCAN command to find all hashes.
However, this is NOT an efficient solution. In order to achieve your goal efficiently, you need to build an extra index, i.e. use a Redis SET to save keys of all hashes that have the specified key-value pair.
HSET hash1 STATUS ERROR
// ...
// HSET other members
// ...
// add it to index
SADD status:error hash1
// get all hashes have the specified key-value pair
SMEMBERS status:error
UPDATE:
As #Itamar Haber mentioned in the comments, if you have many records in the SET, you should use SSCAN to get these members. Since in this case, SMEMBERS might block Redis for a long time.

Get all hashes exists in redis

I'm have hashes in redis cache like:
Hash Key Value
hashme:1 Hello World
hashme:2 Here Iam
myhash:1 Next One
My goal is to get the Hashes as output in the CLI like:
hashme
myhash
If there's no such option, this is ok too:
hashme:1
hashme:2
myhash:1
I didn't find any relevant command for it in Redis API.
Any suggestions ?
You can use the SCAN command to get all keys from Redis. Then for each key, use the TYPE command to check if it's a hash.
UPDATE:
With Redis 6.0, the SCAN command supports TYPE subcommand, and you can use this subcommand to scan all keys of a specified type:
SCAN 0 TYPE hash
Also never use KEYS command in production environment!!! It's a dangerous command which might block Redis for a long time.
keys *
is work for me. you Can try it.
The idea of redis (and others K/v stores) is for you to build an index. The database won't do it for you. It's a major difference with relational databases, which conributes to better performances.
So when your app creates a hash, put its key into a SET. When your app deletes a hash, remove its key from the SET. And then to get the list of hash IDs, just use SMEMBERS to get the content of the SET.
connection.keys('*') this will bring all the keys irrespective of the data type as everything in redis is stored as key value format
for redis in python, you can use below command to retrieve keys from redis db
def standardize_list(bytelist):
return [x.decode('utf-8') for x in bytelist]
>>> standardize_list(r.keys())
['hat:1236154736', 'hat:56854717', 'hat:1326692461']
here r variable is redis connection object

Redis, partial match keys with end of line

This is a 2 part question.
I have a redis db storing items with the following keys:
record type 1: "site_id:1_item_id:3"
record type 2: "site_id:1_item_id:3_user_id:6"
I've been using KEYS site_id:1_item_id:* to grab record type 1 items (in this case for site 1)
Unfortunately, it returns all type 1 and type 2 items.
Whats the best way to grab all "site_id:1_item_id:3" type records? While avoiding the ones including user_id? Is there an EOL match I can use?
Secondly, I've read using KEYS is a bad choice, can anyone recommend a different approach here? I'm open to editing the key names if I must.
First thing first: unless your are the only redis user on your local developpment machine, you are right using KEYS is wrong. It blocks the redis instance until completed, so anyone querying it while you are using keys will have to wait for you keys query to be finished. Use SCAN instead.
SCAN will iterate over the entries in a non blocking way, and you are guaranteed to get all of them.
I don't know which language you use to query it, but in python it is quite easy to query the keys with scan and filter them on the fly.
But let's say you would like to use keys anyway. Looks to me like either doing KEYS site_id:1_item_id:? or KEYS site_id:1_item_id:3 does the trick.
wether you want the keys finishing with "3" or not (I am not sure I completely understood your question here).
Here is an example that I tried on my local machine:
redis 127.0.0.1:6379> flushall
OK
redis 127.0.0.1:6379> set site_id:1_item_id:3 a
OK
redis 127.0.0.1:6379> set site_id:1_item_id:3_user_id:6 b
OK
redis 127.0.0.1:6379> set site_id:1_item_id:4 c
OK
// ok so we have got the database cleaned and set up
redis 127.0.0.1:6379> keys *
1) "site_id:1_item_id:3"
2) "site_id:1_item_id:4"
3) "site_id:1_item_id:3_user_id:6"
// gets all the keys like site_id:1_item_id:X
redis 127.0.0.1:6379> keys site_id:1_item_id:?
1) "site_id:1_item_id:3"
2) "site_id:1_item_id:4"
// gets all the keys like site_id:1_item_id:3
redis 127.0.0.1:6379> keys site_id:1_item_id:3
1) "site_id:1_item_id:3"
Don't forget that Redis KEYS uses GLOB style pattern, which is not exactly like a regex.
You can check out the keys documentation examples to make sure you understand
The correct approach here, is to use an index of keys - maintained by you. Redis should not be queried in any conventional sense.