Redis replication for large data to new slave

I have a Redis master holding 30 GB of data on a machine with 90 GB of memory. We use this setup because we have few writes and many reads. Normally, we would provision a machine with RAM 3x the DB size.
The problem: one slave got corrupted, and when we later added it back using Sentinel, it got stuck in the wait_bgsave state on the master (as shown by INFO on the master).
The reason was this setting:
client-output-buffer-limit slave 256mb 64mb 60
This was set on the master, and since the maximum memory is not available, it breaks replication for the new slave.
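For readers unfamiliar with the directive: the three values after the client class are a hard limit, a soft limit, and a soft-limit duration in seconds. The minimal Python sketch below models how those values decide when the master drops a replica connection (immediately at the hard limit, or after the soft limit has been exceeded continuously for longer than the given number of seconds). It is only an illustration of the documented rule, not Redis's own code; the function names and the sample buffer sizes are made up for the example.

```python
# Illustrative model only; function names and sample data are hypothetical.
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert sizes such as '256mb' to bytes (only kb/mb/gb and plain integers)."""
    lowered = text.lower()
    for suffix, factor in UNITS.items():
        if lowered.endswith(suffix):
            return int(lowered[: -len(suffix)]) * factor
    return int(lowered)


def parse_limit(directive):
    """Split 'client-output-buffer-limit <class> <hard> <soft> <seconds>'."""
    _name, client_class, hard, soft, seconds = directive.split()
    return {
        "class": client_class,
        "hard": parse_size(hard),
        "soft": parse_size(soft),
        "soft_seconds": int(seconds),
    }


def disconnect_time(samples, limit):
    """Return the time at which the connection would be closed, or None.

    samples is a time-ordered list of (seconds, buffered_bytes) pairs.
    A limit of 0 is treated as disabled.
    """
    soft_since = None
    for now, buffered in samples:
        if limit["hard"] and buffered >= limit["hard"]:
            return now  # hard limit: closed immediately
        if limit["soft"] and buffered >= limit["soft"]:
            if soft_since is None:
                soft_since = now  # soft limit reached for the first time
            elif now - soft_since > limit["soft_seconds"]:
                return now  # soft limit exceeded continuously for too long
        else:
            soft_since = None  # buffer drained below the soft limit: reset
    return None


limit = parse_limit("client-output-buffer-limit slave 256mb 64mb 60")
```

With the setting above, a replica whose pending output stays above 64 MB for more than 60 seconds, or ever reaches 256 MB, is disconnected, which is what interrupts a slow full sync.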
I saw the question "Redis replication and client-output-buffer-limit", where a similar issue is discussed, but my question has a broader scope.
We can't use a lot of memory. So what are the possible ways to do replication in this context without causing any failure on the master (with respect to memory and latency impact)?
I have a few things in mind:
1 - Should I do diskless replication? Will it have any impact on the latency of writes and reads?
2 - Should I just copy the dump file from another slave to this new slave and restart Redis? Will that work?
3 - Should I increase the slave output buffer limit? If yes, by how much? I want to do this temporarily until replication completes and then revert to the normal setting. I am skeptical about this approach.

You got this problem because you have a slow replica that cannot read the replication data as fast as needed.
To solve the problem, you can try increasing client-output-buffer-limit. You can also try disabling persistence on the replica while it is syncing from the master, and enabling it again afterwards. With persistence disabled, the replica might consume the data faster. However, if the bandwidth between master and replica is really small, you might need to redeploy the replica closer to the master, where more bandwidth is available.
1 - Should I do diskless replication? Will it have any impact on the latency of writes and reads?
IMHO, it has nothing to do with diskless replication.
2 - Should I just copy the dump file from another slave to this new slave and restart Redis? Will that work?
NO, it won't work.
3 - Should I increase the slave output buffer limit? If yes, by how much? I want to do this temporarily until replication completes and then revert to the normal setting.
YES, you can try to increase the limit. In your case, since your data size is 30 GB, a hard limit of 30 GB should solve the problem. However, that is a lot and might have other impacts. You need to run some benchmarks to find the right limit.
YES, you can dynamically change this setting by the CONFIG SET command.
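The "raise it for a while, then revert" idea could be sketched in Python roughly as follows. This is a hedged sketch, not a tested procedure: it assumes a client object exposing config_get and config_set in the shape redis-py provides, the limit values are placeholders to be replaced by benchmarked numbers, and FakeClient is a hypothetical stand-in so the example runs without a server.

```python
from contextlib import contextmanager

KEY = "client-output-buffer-limit"


@contextmanager
def temporary_replica_limit(client, hard_bytes, soft_bytes, soft_seconds):
    """Raise the replica output buffer limit, then restore the old value."""
    previous = client.config_get(KEY)[KEY]  # full current setting as one string
    client.config_set(KEY, f"slave {hard_bytes} {soft_bytes} {soft_seconds}")
    try:
        yield previous
    finally:
        # Restore the original limits even if the sync is aborted.
        client.config_set(KEY, previous)


class FakeClient:
    """Hypothetical stand-in for a real client, used only to run the sketch.

    Unlike a real server, it overwrites the whole setting instead of
    updating only the 'slave' class.
    """

    def __init__(self):
        self.config = {
            KEY: "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
        }

    def config_get(self, pattern):
        return {pattern: self.config[pattern]}

    def config_set(self, name, value):
        self.config[name] = value


client = FakeClient()
with temporary_replica_limit(client, 1073741824, 536870912, 120):
    pass  # here one would wait until the replica reports a completed sync
```

Using a context manager keeps the revert step from being forgotten, which matters because a permanently large limit lets a stalled replica consume that much memory on the master.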


redis async replication of a bitset

I am using redis to store some pretty large bitsets. Redis is run in master/slave sentinel mode.
I got curious about the replication performance for very big bitsets (my bitset is roughly 100 KB in size).
From the documentation: Async replication works by sending a stream of commands between master and slave.
Can I expect those commands to update a single bit in a slave or do they copy entire keys each time? Obviously I would prefer SETBIT commands to be passed instead of setting entire keys in order to decrease network traffic.
Async replication will only pass the write command, e.g. SETBIT, to the replica in most cases.
If the replica falls too far behind, however, the replica will get flushed (cleared out) and a full resync will occur. This can happen when there is a lot of latency and a large number of writes. If you see this happening, you can tune your replication buffers to lower the likelihood of a full sync.
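To get a feel for the difference in traffic, here is a small Python sketch that encodes commands in the Redis wire protocol (RESP) and compares the size of one SETBIT with the size of rewriting a whole 100 KB value. The key name, bit offset, and the encode_command helper are made up for illustration, and the sketch ignores TCP and replication-stream overhead.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Hypothetical key name and bit offset, for illustration only.
single_bit = encode_command("SETBIT", "my:bitset", 12345, 1)

# What sending the entire 100 KB value would cost instead.
whole_key = encode_command("SET", "my:bitset", bytes(100 * 1024))

print(len(single_bit), "bytes for SETBIT vs", len(whole_key), "bytes for the full value")
```

A single-bit update therefore costs a few dozen bytes on the replication link, while a full resync ships the whole dataset.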

Behaviour of redis client-output-buffer-limit during resynchronization

I'm assuming that during replica resynchronisation (full or partial), the master will attempt to send data as fast as possible to the replica. Wouldn't this mean the replica output buffer on the master would rapidly fill up since the speed the master can write is likely to be faster than the throughput of the network? If I have client-output-buffer-limit set for replicas, wouldn't the master end up closing the connection before the resynchronisation can complete?
Yes, the Redis master will close the connection and the synchronization will start again from the beginning. Some details follow:
Do you need to touch this configuration parameter and what is the purpose/benefit/cost of it?
There is almost zero chance of this happening with the default configuration and reasonably modern hardware.
"By default normal clients are not limited because they don't receive data without asking (in a push way), but just after a request, so only asynchronous clients may create a scenario where data is requested faster than it can read." (excerpt from the documentation)
Even if that happens, replication will restart from the beginning, but it may lead to an infinite loop in which slaves request synchronization over and over. Each time, the Redis master needs to fork to produce a memory snapshot (perform a BGSAVE) and may use up to 3 times the RAM of the initial snapshot size during synchronization. That causes higher CPU utilization, memory spikes, network utilization (if any), and I/O.
General recommendations to avoid production issues when tweaking this configuration parameter:
Don't decrease this buffer, and before increasing its size, make sure you have enough memory on your box.
Estimate the total amount of RAM as the snapshot memory size (doubled for the copy-on-write BGSAVE process) plus the size of any other configured buffers plus some extra capacity.
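The rule of thumb above can be written down as a quick calculation. The Python sketch below is only that arithmetic; the 30 GiB dataset, the two replicas with 256 MiB hard limits, and the 4 GiB of headroom are assumed example values, not recommendations.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def required_ram(snapshot_bytes, buffer_limits, headroom_bytes):
    """Snapshot doubled for copy-on-write, plus buffers, plus spare capacity."""
    return 2 * snapshot_bytes + sum(buffer_limits) + headroom_bytes


# Assumed example: 30 GiB of data, two replicas, 4 GiB spare.
estimate = required_ram(
    snapshot_bytes=30 * GIB,
    buffer_limits=[256 * MIB, 256 * MIB],
    headroom_bytes=4 * GIB,
)
print(f"{estimate / GIB:.1f} GiB")
```

Doubling the snapshot is the worst case for copy-on-write; a read-heavy workload touches fewer pages during the BGSAVE and will usually need less.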
Please find more details here

AOF and RDB backups in redis

This question is about Redis persistence.
I'm using redis as a 'fast backend' for a social networking website. It's a single server set up. I've been transferring PostgreSQL responsibilities to Redis steadily. Currently in etc/redis/redis.conf, the appendonly setting is set to appendonly no. Snapshotting settings are save 900 1, save 300 10, save 60 10000. All this is true for production and development both. As per production logs, save 60 10000 gets invoked heavily. Does this mean that practically, I'm getting backups every 60 seconds?
Some literature suggests using AOF and RDB backups together. Thus I was considering turning appendonly on and using appendfsync everysec. For anyone who has had experience of both sides of the coin:
1) Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
2) Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
3) Are restart times something to worry about? My dump.rdb file is small; ~90MB.
I'm trying to find out more about redis persistence, and getting my expectations right. Personally, I'm fine with losing 60s of data in the case of a catastrophe, thus whether I should use AOF is also something I'm pondering. Feel free to chime in. Thanks!
Does this mean that practically, I'm getting backups every 60 seconds?
NO. Under that rule, Redis does a background save after 60 seconds only if at least 10000 keys have changed. Otherwise, that rule does not trigger a background save.
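As a rough illustration of how the three save rules from the question combine, here is a small Python model. It mirrors the idea that a snapshot becomes due as soon as any one rule has both its time and its change threshold met; the function name is invented, and the real server applies additional conditions (for example, it does not start a save while another one is running).

```python
# (seconds, changes) pairs taken from the question's configuration.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def snapshot_due(seconds_since_last_save, changes_since_last_save, rules=SAVE_RULES):
    """Return True if any save rule is satisfied."""
    return any(
        changes_since_last_save >= min_changes
        and seconds_since_last_save > period
        for period, min_changes in rules
    )


print(snapshot_due(61, 10000))  # busy minute: the 60/10000 rule fires
print(snapshot_due(61, 9999))   # just under the threshold: no rule fires yet
print(snapshot_due(301, 10))    # quieter period: the 300/10 rule fires
```

So with a heavy write load the snapshot interval is about a minute, but in quiet periods it stretches to 5 or 15 minutes.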
Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
It depends on many things, e.g. disk performance (SSD vs. HDD), write/read load (QPS), data model, and so on. You need to run a benchmark with your own data in your specific environment.
Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
NO. If you turn on both AOF and RDB, the AOF file will be used to rebuild the database when Redis restarts. Since you configured appendfsync everysec, you will only lose about the last 1 second of data.
Are restart times something to worry about? My dump.rdb file is small; ~90MB.
If you turn on AOF, then when Redis restarts, it replays the log in the AOF file to rebuild the database. Normally the AOF file is larger than the RDB file, so recovery might be slower than from an RDB file. Should you worry about that? Run a benchmark with your own data in your specific environment.
Assume that you already set Redis to use RDB saving, and write lots of data to Redis. After a while, you want to turn on AOF saving. NEVER MODIFY THE CONFIG FILE TO TURN ON AOF AND RESTART REDIS, OTHERWISE YOU'LL LOSE EVERYTHING.
Because, once you set appendonly yes in redis.conf, and restart Redis, it will load data from AOF file, no matter whether the file exists or not. If the file doesn't exist, it creates an empty file, and tries to load data from that empty file. So you'll lose everything.
In fact, you don't have to restart Redis to turn on AOF. Instead, you can use config set command to dynamically turn it on: config set appendonly yes.
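A hedged Python sketch of that live switch follows. It assumes a client exposing config_set, info, and config_rewrite in the shape redis-py provides. Waiting for the initial rewrite and then optionally persisting the setting with CONFIG REWRITE are additions of this sketch rather than steps from the answer, and CONFIG REWRITE only works when the server was started with a config file. StubClient is a hypothetical stand-in so the example runs without a server.

```python
import time


def enable_aof_live(client, poll_interval=1.0, timeout=600.0, persist=False):
    """Turn on AOF without a restart and wait for the initial rewrite."""
    client.config_set("appendonly", "yes")
    deadline = time.monotonic() + timeout
    while True:
        info = client.info("persistence")
        busy = info.get("aof_rewrite_in_progress") or info.get("aof_rewrite_scheduled")
        if not busy:
            break  # the AOF file now reflects the dataset
        if time.monotonic() > deadline:
            raise TimeoutError("AOF rewrite did not finish in time")
        time.sleep(poll_interval)
    if persist:
        # Assumption of this sketch: write the running config back to
        # redis.conf only after the AOF file has been fully built.
        client.config_rewrite()


class StubClient:
    """Hypothetical stand-in reporting a running rewrite for two polls."""

    def __init__(self):
        self.settings = {}
        self.polls = 0
        self.rewritten = False

    def config_set(self, name, value):
        self.settings[name] = value

    def info(self, section):
        self.polls += 1
        in_progress = 1 if self.polls <= 2 else 0
        return {"aof_rewrite_in_progress": in_progress, "aof_rewrite_scheduled": 0}

    def config_rewrite(self):
        self.rewritten = True


stub = StubClient()
enable_aof_live(stub, poll_interval=0, persist=True)
```

The ordering is the point: the config file is touched only once a complete AOF file exists, so a later restart loads real data instead of an empty file.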

Can redis be configured to save only to disk and not in memory?

I am facing some scaling issues with my redis instances and was wondering if there's a way to configure redis to save data only to disk (and not hold it in memory). That way I could just increase disk space and not RAM.
Right now my instances are getting stuck and just hang when they reach the memory limit.
No - Redis, atm, is an in-memory database. That means that all data that it manages resides first and foremost in RAM.

Does Redis Replication help in load balancing?

We continuously write and update events in Redis, so whenever we want to read data (a lot of it, upwards of 500000 key-value pairs), Redis has performance issues. So we decided to fetch the data via multiple threads. But because of the single Redis instance, the performance issues persisted. Will replication help us? That is, if we set up master and slave Redis instances, will our reads of the events be distributed to the slaves? We are thinking of making the master write-only.
Any other suggestion for performance improvements?
(One of) replication's declared purposes is to help in scaling reads, so yes to the question in the title.
Note that after you've set up the slave, you'll need to specify its address for your reader threads and processes. Make sure that you start with read-slaves if you don't have a clear separation between writers and readers.
If a single slave isn't enough, you can actually add more slaves. If you add them directly to the master, you'll get fresher reads but there'll eventually be a performance impact on the master. Alternatively, replication chaining is a great solution for most use cases, i.e. 1 master -> 1 slave -> n slaves.
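Pointing readers at the slaves and writers at the master can be as simple as the following Python sketch. It only picks an address; the ReadWriteRouter class and the host names are hypothetical, and a real setup would also need connection handling and failover (for example via Sentinel).

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves round-robin."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = cycle(replicas) if replicas else None

    def for_write(self):
        return self.master

    def for_read(self):
        # Fall back to the master when no slave is configured.
        if self._replicas is None:
            return self.master
        return next(self._replicas)


# Hypothetical addresses, for illustration only.
router = ReadWriteRouter(
    master=("redis-master.example", 6379),
    replicas=[("redis-slave-1.example", 6379), ("redis-slave-2.example", 6379)],
)
first_read = router.for_read()
second_read = router.for_read()
third_read = router.for_read()
```

Because replication is asynchronous, a reader routed to a slave may briefly see slightly older data than the master holds.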
There are probably other ways to scale Redis for your use case (e.g. clustering), but that really depends on what you're trying/wanting to do :)