We observed a sporadic issue (roughly once every month or two) with our Redis. Suddenly, Redis read/write calls from our application return failures. At that point we are still able to connect to Redis from redis-cli, and doing a FLUSHALL resolves the issue.
Has anyone encountered this kind of issue? Our Redis version is 4.0.9. Since we are not observing any issue connecting to Redis, we do not suspect a problem with the Jedis client.
What kind of redis calls? Something that writes the data or reads the data?
All in all, just as an idea, examine your redis.conf file
It has a maxmemory configuration directive, and when Redis reaches that memory limit it acts according to the eviction policy.
If the policy is set to 'noeviction', the behavior sounds like what you've described (from their documentation):
return errors when the memory limit was reached and the client is trying to execute commands that could result in more memory to be used (most write commands, but DEL and a few more exceptions).
You can configure redis as a cache so that it will delete some entries by itself.
Read about this here
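For illustration, a minimal redis.conf sketch of that cache-style setup (the 1gb limit is just a placeholder value):

# Cap memory usage and evict least-recently-used keys instead of returning errors
maxmemory 1gb
maxmemory-policy allkeys-lru

The same settings can be inspected or changed at runtime with CONFIG GET maxmemory and CONFIG SET maxmemory-policy allkeys-lru.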
I'm facing a challenge where we're getting spikes of Redis problems causing RedisCommandTimeout errors, and consequently the @Cacheable annotation's behavior tries to put the objects into Redis.
When we hit this timeout peak in Redis, we jumped from 30 command timeouts per minute to 60 thousand, and consequently we get the puts for those 60 thousand entries, which helps degrade the Redis environment further.
I'm studying the component to check whether there is any configuration or customization so that an error on the cache get does not trigger the put.
Has anyone gone through something similar?
Our application is written in Java with Spring Boot, uses the Lettuce Redis client library, and we run Redis in Sentinel mode.
In case anyone has gone through something similar and can share.
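Not a full answer, but as a sketch of the kind of customization that exists in this area: Spring's cache abstraction lets you register a CacheErrorHandler, which is where cache get/put failures can be intercepted (it doesn't by itself skip the put, but it keeps Redis errors from propagating). A minimal illustration, assuming the standard Spring cache API; the class name and comments are made up:

import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurerSupport;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CacheConfig extends CachingConfigurerSupport {

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
                // Treat a failed get (e.g. a Redis timeout) as a plain cache miss.
            }
            @Override
            public void handleCachePutError(RuntimeException e, Cache cache, Object key, Object value) {
                // Swallow put failures so a degraded Redis does not break the request path.
            }
            @Override
            public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
            }
            @Override
            public void handleCacheClearError(RuntimeException e, Cache cache) {
            }
        };
    }
}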
Recently, I started to have some trouble with one of my Redis clusters: used_memory and used_memory_rss are increasing constantly.
According to some Googling, I found the following discussion:
https://github.com/antirez/redis/issues/4570
Now I am wondering: is it safe to run the SCRIPT FLUSH command on my production Redis cluster?
Yes - you can run the SCRIPT FLUSH command safely in a production cluster. The only potential side effect is blocking the server while it executes. Note, however, that you'll want to call it in each of your nodes.
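For illustration only, a rough Jedis sketch of issuing SCRIPT FLUSH against every node; the node addresses are placeholders and you'd substitute your own:

import redis.clients.jedis.Jedis;
import java.util.Arrays;
import java.util.List;

public class ScriptFlushAllNodes {
    public static void main(String[] args) {
        // Hypothetical node list - replace with the actual nodes of your cluster.
        List<String> nodes = Arrays.asList("10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379");
        for (String node : nodes) {
            String[] hostPort = node.split(":");
            try (Jedis jedis = new Jedis(hostPort[0], Integer.parseInt(hostPort[1]))) {
                // SCRIPT FLUSH only clears the Lua script cache on the node it is sent to.
                jedis.scriptFlush();
            }
        }
    }
}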
We have a Redis configuration with two Redis servers. We also have 3 sentinels to monitor the two instances and initiate a failover when needed.
We currently have a process where we periodically have to do a FLUSHALL on the Redis server. This is a blocking operation that takes longer than the time we have allotted for the sentinels to time out. In other words, we have our sentinel configuration with:
sentinel down-after-milliseconds OurMasterName 5000
and doing a redis-cli FLUSHALL on the server takes > 5000 milliseconds, so the sentinels initiate a failover.
We acknowledge that doing a FLUSHALL isn't great, and we also know that we could increase down-after-milliseconds, but for the purposes of this question assume that neither of these is an option.
The question is: how can we do a FLUSHALL (or an equivalent operation) WITHOUT having our sentinels initiate a failover because the FLUSHALL blocks for more than 5000 milliseconds? Has anyone encountered and solved this problem?
You could just create new instances: if you are using something like AWS or Azure, then you have APIs for creating a new Redis cluster. Start it, load it with data, and once it's ready just modify the DNS, again with an API call - all of this can be handled by some part of your application. On premises, though, things can get more complex because it will require some automation with Ansible/Chef/Puppet.
The next best option you currently have is to delete keys in batches to reduce the amount of work done at once. You can build a list of keys, assuming you don't have one, using SCAN, then delete in whatever batch size works for you.
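A rough sketch of that SCAN-and-delete loop, assuming a recent Jedis client; the batch size is an arbitrary illustration:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;
import java.util.List;

public class BatchedFlush {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            String cursor = ScanParams.SCAN_POINTER_START; // "0"
            ScanParams params = new ScanParams().count(500); // ask for roughly 500 keys per SCAN call
            do {
                ScanResult<String> result = jedis.scan(cursor, params);
                List<String> keys = result.getResult();
                if (!keys.isEmpty()) {
                    // Delete this batch; on Redis >= 4, UNLINK could be used instead of DEL
                    // so the actual memory reclamation happens in a background thread.
                    jedis.del(keys.toArray(new String[0]));
                }
                cursor = result.getCursor();
            } while (!"0".equals(cursor));
        }
    }
}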
Edit: as you are not interested in keeping the data, disable persistence, delete the RDB file, then just restart the instance. This way you don't have to update Sentinel like you would if you provision new hosts.
Out of curiosity, if you're just going to be flushing all the time and don't care about the data as you'll be wiping it, why bother with sentinel?
I have a Redis server with maxmemory 512MB and maxmemory-policy allkeys-lru but once the server has filled up after a day of usage, I can't add any more items:
redis 127.0.0.1:6379[3]> set foooo 123
(error) OOM command not allowed when used memory > 'maxmemory'.
IMHO that never should happen with the LRU policy.
I copied some server info to this Pastebin: http://pastebin.com/qkax4C7A
How can I solve this problem?
Note: I'm trying to use maxmemory because my Redis server is continuously eating up memory even though nearly all keys have an expire setting, and because FLUSHDB does not release system memory - perhaps this is related..
In the end I'm trying to use Redis as a cache.
Your info output suggests that a lot of your server's memory is taken by Lua scripts:
used_memory_lua:625938432
Note that Lua scripts remain in memory until the server is restarted or SCRIPT FLUSH is called. It would appear as if you're generating Lua scripts on the fly...
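To confirm and clear this, something along these lines from redis-cli (the value shown is just the one from your info output):

redis 127.0.0.1:6379> INFO memory
...
used_memory_lua:625938432
...
redis 127.0.0.1:6379> SCRIPT FLUSH
OK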
I'm wondering what the pros and cons are of using Redis as a broker in an infrastructure like this.
At the moment, all my agents are sending to a central NXLog server which proxies the requests to logstash --> ES.
What would I gain by using a Redis server in between my NXLog collector and Logstash? To me, it seems pointless, as NXLog already has good memory and disk buffers in case Logstash is down.
What would I gain?
Thank you
Under heavy load: calling ES (HTTP) directly can be dangerous, and you can have problems if ES breaks down.
Redis can handle more (much more) write requests and send them asynchronously to ES (HTTP).
I started using Redis because I felt that it would separate the input part from the filter part.
At least during periods in which I change the configuration a lot.
As you know, if you change the Logstash configuration you have to restart the thing. All clients (in my case via syslog) are doomed to reconnect to the Logstash daemon when it is back in business.
By putting an indexer in front which holds the relatively static input configuration and pushing everything to Redis, I am able to restart Logstash without causing hiccups throughout the datacenter.
I encountered some issues because our developers hadn't found time (yet) to reduce the amount of useless logs sent to syslog, thus overflowing the server. Before we had Logstash they overflowed the disk space for logs - a more general issue, though... :)
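For context, a minimal sketch of that split using the Logstash redis output/input plugins; the host names, the key and the syslog port are all placeholders:

# shipper side: relatively static input config, just pushes everything to Redis
input {
  syslog { port => 514 }
}
output {
  redis {
    host      => "redis.example.local"
    data_type => "list"
    key       => "logstash"
  }
}

# indexer side: can be restarted freely while events keep buffering in Redis
input {
  redis {
    host      => "redis.example.local"
    data_type => "list"
    key       => "logstash"
  }
}
output {
  elasticsearch { hosts => ["es.example.local:9200"] }
}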
When used with Logstash, Redis acts as a message queue. You can have multiple writers and multiple readers.
Using Redis (or any other queueing service) allows you to scale Logstash horizontally by adding more servers to the 'cluster'. This will not matter for small operations but can be extremely useful for larger installations.
When using Logstash with Redis, you can configure Redis to store all the log entries only in memory, which works like an in-memory queue (like memcache).
You may come to the point where the number of logs being sent cannot be processed by Logstash, and it can bring your system down on a constant basis (observed in our environment).
If you feel Redis is an overhead for your disk, you can configure it to store all the logs in memory until they are processed by logstash.
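If you go that route, a minimal redis.conf sketch for a purely in-memory queue (i.e. persistence switched off entirely):

# no RDB snapshots and no append-only file - entries live only in memory
# until Logstash pops them off the list
save ""
appendonly no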
As we built our ELK infrastructure, we originally had a lot of problems with the logstash indexer (reading from redis). Redis would back up and eventually die. I believe this was because, in the hope of not losing log files, redis was configured to persist the cache to disk once in a while. When the queue got "too large" (but still within available disk space), redis would die, taking all of the cached entries with it.
If this is the best redis can do, I wouldn't recommend it.
Fortunately, we were able to resolve the issues with the indexer, which typically kept the redis queue empty. We set our monitoring to alert quickly when the queue did back up, and it was a good sign that the indexer was unhappy again.
Hope that helps.