I have a Spring Boot application that is connected to Redis.
I want to perform a Redis operation to fetch the keys that match a certain pattern.
I understand this can be achieved in multiple ways:
RedisTemplate and the KEYS command: not suitable for large data sets, as it may block the client (not the server) for a long time and also exhaust server memory because of the response buffer size.
RedisTemplate and the SCAN command: the Redis docs recommend SCAN over KEYS, as it scans iteratively in smaller, faster operations and is easier on server resources.
Spring Data Redis repository: fetch all by creating a hash on the pattern in the Redis entity.
But I am not sure which will give the best overall performance for fetching all the matching keys under high load, and which would be recommended for my scenario.
Best Regards,
Saurav
Redis is single-threaded, so traversing all keys on a large dataset (a large number of keys) may block the server for a long time (even several seconds). It is therefore not advised to run KEYS in production at all.
The SCAN operation is built to run iteratively, but note that you might get the same key more than once, and keys that are added or removed while the iteration is in progress may or may not be returned. Overall your system will stay more responsive with SCAN, because each call does only a small amount of work on the server.
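A minimal sketch of the SCAN approach, written in Python and assuming a redis-py client rather than the Spring RedisTemplate from the question. The helper name `scan_matching_keys` and the `count` value are illustrative and not from the original posts. Results go into a set because SCAN may return the same key more than once.

```python
def scan_matching_keys(client, pattern, count=1000):
    """Return the set of keys matching `pattern`, found with SCAN instead of KEYS.

    `client` is assumed to be a redis-py client, e.g. redis.Redis().
    """
    seen = set()
    # scan_iter drives the SCAN cursor for us; `count` is only a hint for
    # how many keys the server examines per call.
    for key in client.scan_iter(match=pattern, count=count):
        seen.add(key)  # the set drops duplicates that SCAN may return
    return seen
```

With a real connection this would be called as `scan_matching_keys(redis.Redis(), "user:*")`. Keys created or deleted while the scan runs may or may not appear in the result.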
Is there an advantage to setting a default value for an entry that will be heavily queried in Redis, or will querying for the unset key take the same time?
Given that the keys are stored in a distributed hash, Redis has to check that the key is not in the bucket before returning on a miss, which may be a bit slower than finding the key and stopping at a hit. Is the bucket sorted or linear? Does anything else make it slower either way?
Redis is set up in a cluster and has many millions of entries in this case.
I'm assuming you're just talking about strings and hashes here (so the only operations you care about are SET/GET, maybe HGET/HSET). From Redis' perspective, a cache hit and a cache miss have the same time complexity. If anything, a cache miss will be faster, because Redis does not have to transfer any data back over the socket to your app.
Redis 4.0
The KEYS command can list all keys that match a pattern.
MEMORY USAGE [key] returns the memory used by a key.
How can I use them together to get the total memory used by the keys matching a pattern?
You'd have to implement that logic using any language you're most comfortable with. In pseudo code:
Get all key names using KEYS
For each key, get its MEMORY USAGE
Sum up the numbers
Note: don't use KEYS in production, use SCAN.
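The steps above can be sketched in Python, assuming a redis-py client and using SCAN in place of KEYS as the note recommends. The function name `total_memory_for_pattern` is illustrative. MEMORY USAGE is called once per key, so this makes one round trip per key; pipelining would reduce that.

```python
def total_memory_for_pattern(client, pattern, count=1000):
    """Sum MEMORY USAGE over every key matching `pattern`.

    `client` is assumed to be a redis-py client connected to Redis 4.0+.
    """
    total = 0
    seen = set()
    for key in client.scan_iter(match=pattern, count=count):
        if key in seen:
            continue  # SCAN may return a key more than once
        seen.add(key)
        size = client.memory_usage(key)
        if size is not None:  # None means the key vanished in the meantime
            total += size
    return total
```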
As @Itamar pointed out, do not use KEYS <pattern> in production, as this command does a complete scan of all the keys in the Redis server. It will degrade Redis performance, and almost all other Redis queries will take a considerable amount of time while it runs (as Redis is a single-threaded application).
What you want can be achieved by writing a Lua script. That said, I would recommend against custom solutions: there are dashboards (like Zabbix) for monitoring Redis and its memory usage.
I have multiple servers that all store set members in a shared Redis cache. When the cache fills up, I need to persist the data to disk to free up RAM. I then plan to parse the dumped data such that I will be able to combine all of the values that belong to a given key in MongoDB.
My first plan was to have each server process attempt an sadd operation. If the request fails because Redis has reached maxmemory, I planned to query for each of my set keys, and write each to disk.
However, I am wondering if there is a way to use one of the inbuilt persistence methods in Redis to write the Redis data to disk and delete the key/value pairs after writing. If this is possible I could just parse the rdb dump and work with the data in that fashion. I'd be grateful for any help others can offer on this question.
Redis' persistence is meant to be used for whatever's in the RAM. Put differently, you can't persist what ain't in RAM.
To answer your question: no, you can't use persistence to "offload" data from RAM.
I have a very large set of keys (200M) with small values (<100 bytes) to store, and I'm trying to use Redis. I have 10 Redis DBs to split the keys over, but currently all 10 are on a single server. By a Redis DB I mean using SELECT. From my calculations it looks like I'm going to blow out memory. I think I'll need over 4TB of memory for this case! What are my options? First, my calculation is based on 10000 keys with 100 byte values taking 220MB of RAM (this is from a table I found). So simply put, (2*10^8 / 10^4) * 220MB = 4.4TB.
If my calculation looks correct, what are my options? I've read on different posts that Redis VM is no longer an option. Can I use a Redis cluster? This still appears to require too many servers to be practical. I understand I could switch to another DB, but I'd like that to be the last resort option.
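The estimate in the question can be reproduced with a few lines of Python. This only checks the arithmetic; it does not validate the 220MB-per-10000-keys figure, which comes from a table the poster found. For comparison, it also computes the raw size of the values alone (an upper bound of 100 bytes each), ignoring key names and Redis overhead. Decimal units are assumed (1 TB = 10^6 MB).

```python
keys_total = 200_000_000   # 200M keys, from the question
sample_keys = 10_000       # sample size quoted from the poster's table
sample_mb = 220            # RAM quoted for that sample (unverified figure)

# Scale the sample figure up linearly to the full key count.
estimated_mb = (keys_total // sample_keys) * sample_mb
estimated_tb = estimated_mb / 1_000_000

# Raw value bytes alone, at the stated maximum of 100 bytes per value.
raw_value_gb = keys_total * 100 / 1_000_000_000

print(estimated_tb)   # 4.4
print(raw_value_gb)   # 20.0
```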
Firstly, using shared databases (i.e. the SELECT command) isn't a recommended practice, since all of these databases are essentially managed by the same Redis process. It is preferable to have 10 separate Redis processes (even on the same server) in order to avoid contention (more info here).
Next, there are ways to reduce the memory footprint of your database. You could, for example, perform client-side compression (see here) or consider other optimizations such as using Hashes to keep multiple values (as described here).
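As a rough illustration of the Hashes idea, the sketch below groups many small values into a smaller number of hashes instead of storing one top-level key per value. It assumes numeric ids and a redis-py client; the `bucket:` prefix, the function names and the bucket size are all hypothetical. The savings depend on each hash staying small enough for Redis to keep it in its compact encoding, so the bucket size should stay below the server's `hash-max-ziplist-entries` setting.

```python
def bucket_location(item_id, bucket_size=500):
    """Map a numeric id to a (hash_key, field) pair."""
    bucket = item_id // bucket_size
    return f"bucket:{bucket}", str(item_id % bucket_size)


def bucket_set(client, item_id, value, bucket_size=500):
    """Store `value` under the hash bucket that owns `item_id`."""
    hash_key, field = bucket_location(item_id, bucket_size)
    client.hset(hash_key, field, value)


def bucket_get(client, item_id, bucket_size=500):
    """Read the value for `item_id` back from its hash bucket."""
    hash_key, field = bucket_location(item_id, bucket_size)
    return client.hget(hash_key, field)
```

The same `bucket_size` has to be used for reads and writes, since it determines where each id lives.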
That said, a Redis server is ultimately bound by the amount of RAM that the host provides. Once you've reached that limit you'll need to shard your database and use a Redis cluster. Since you're already using multiple databases this shouldn't pose a big challenge as your code should already be compatible with that to a degree. Sharding can be done in one of three approaches: client, proxy or Redis Cluster. Client-side sharding can be implemented in your code or by the Redis client that you're using (if the client library that you're using supports that). Redis Cluster (v3) is expected to be released in the very near future and already has a stable release candidate. As for proxy-based sharding, there are several open source solutions out there, including Twitter's twemproxy, Netflix's dynomite and codis. Additional information about sharding and partitioning can be found here.
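To make the client-side option concrete, here is a minimal sketch that picks a Redis connection by hashing the key. It assumes `clients` is a list of already-connected client objects (for example one redis-py client per Redis process); the function name `shard_for` is illustrative. This is plain modulo hashing, not the slot scheme Redis Cluster uses, and changing the number of shards would remap most keys.

```python
import zlib


def shard_for(key, clients):
    """Pick one client from `clients` based on a stable hash of `key`."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    # crc32 is stable across processes, unlike Python's built-in hash().
    return clients[zlib.crc32(data) % len(clients)]
```

A write would then look like `shard_for("user:42", clients).set("user:42", value)`, and reads must go through the same function to reach the same shard.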
Disclaimer: I work at Redis Labs. Lastly, AFAIK there's only one Redis-as-a-Service provider that already provides built-in support for clustering Redis. Redis Labs' Redis Cloud is a fully-managed service that can scale seamlessly to any required capacity. Our clusters support both the '{}' hashtag standard as well as sharding by RegEx - more about this can be found here.
You can use LMDB with Dynomite to store data beyond your memory capacity. LMDB uses both disk and memory to store data, and Dynomite makes LMDB distributed.
We have done a POC with this combo and they work nicely together.
For more information, please check out our open issue here:
https://github.com/Netflix/dynomite/issues/254
I have a buffer that needs to read all values (hashes, fields and values) from Redis after a reboot. Is there a fast way to do that? I have approximately 100,000 hashes with 4 fields each.
Thanks!
EDIT:
The slow way: the current implementation gets all the hashes using
KEYS *
then
HGETALL xxx
to get all the fields' values.
There are two ways to approach this problem.
The first one is to try to optimize the KEYS/HGETALL combination you have described. Because you do not have millions of keys (100K is not so high by Redis standards), the KEYS command will not block the instance for a long time, and the output buffer size required to return 100K items is probably acceptable. Once the list of keys has been received by your program, the next challenge is to run many HGETALL commands as fast as possible. The key is to pipeline them (for instance in synchronous batches of 1000 items), which is quite easy to implement with hiredis (just use redisAppendCommand / redisGetReply). The 100K items will then be retrieved in only 100 roundtrips. Because most Redis instances can sustain 100K op/s or more, it should not take more than a few seconds. A more efficient solution would be to use the asynchronous interface of hiredis to try to maximize the throughput, but it is more complex to implement. I'm not sure it is worth it for 100K items.
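The batching idea can be sketched as follows. This is a Python analogue that assumes a redis-py client, not the hiredis C calls named above, and the function name `fetch_all_hashes` is illustrative. Each batch of HGETALL commands is sent in one round trip.

```python
def fetch_all_hashes(client, keys, batch_size=1000):
    """Return {key: {field: value}} using pipelined HGETALL calls.

    `client` is assumed to be a redis-py client; `keys` is any iterable
    of hash key names (for instance the result of a KEYS or SCAN call).
    """
    keys = list(keys)
    result = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # transaction=False gives plain pipelining without MULTI/EXEC.
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.hgetall(key)  # queued locally, not sent yet
        # execute() sends the whole batch and returns replies in order.
        for key, fields in zip(batch, pipe.execute()):
            result[key] = fields
    return result
```

With 100,000 keys and the default batch size, this issues 100 pipelined round trips, matching the estimate above.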
The second approach is to use a BGSAVE command to take a snapshot of Redis content, retrieve the generated dump file, and then parse the file to extract the data. You can have a look at the excellent redis-rdb-tools package for a Python implementation. The main benefit of this approach is there is no impact on the Redis instance (no KEYS command to block the event loop) while still retrieving consistent data.