Behaviour of redis client-output-buffer-limit during resynchronization - redis

I'm assuming that during replica resynchronisation (full or partial), the master will attempt to send data as fast as possible to the replica. Wouldn't this mean the replica output buffer on the master would rapidly fill up since the speed the master can write is likely to be faster than the throughput of the network? If I have client-output-buffer-limit set for replicas, wouldn't the master end up closing the connection before the resynchronisation can complete?

Yes, Redis Master will close the connection and the synchronization will be started from beginning again. But, please find some details below:
Do you need to touch this configuration parameter and what is the purpose/benefit/cost of it?
There is a zero (almost) chance it will happen with default configuration and pretty much moderate modern hardware.
"By default normal clients are not limited because they don't receive data
without asking (in a push way), but just after a request, so only asynchronous clients may create a scenario where data is requested faster than it can read." - the chunk from documentation .
Even if that happens, the replication will be started from beginning but it may lead up to infinite loop when slaves will continuously ask for synchronization over and over. Redis Master will need to fork whole memory snapshot (perform BGSAVE) and use up to 3 times of RAM from initial snapshot size each time during synchronization. That will be causing higher CPU utilization, memory spikes network utilization (if any) and IO.
General recommendations to avoid production issues tweaking this configuration parameter:
Don't decrease this buffer and before increasing the size of the buffer make sure you have enough memory on your box.
Please consider total amount of RAM as snapshot memory size (doubled for copy-on-write BGSAVE process) plus the size of any other buffers configured plus some extra capacity.
Please find more details here

Related

redis async replication of a bitset

I am using redis to store some pretty large bitsets. Redis is run in master/slave sentinel mode.
I got curious about the replication performance for very big bitsets (my bitset has a size of +-100Kbyte).
From the documentation: Async replication works by sending a stream of commands between master and slave.
Can I expect those commands to update a single bit in a slave or do they copy entire keys each time? Obviously I would prefer SETBIT commands to be passed instead of setting entire keys in order to decrease network traffic.
Async replication will only pass the write command eg SETBIT to the replica in most cases.
If the replica falls too far behind however, the replica will get flushed (cleared out) and a full resync will occur. This happens if there is a lot of latency and if there are a large number of writes. If you see this happening you can tune your replication buffers to lower the possibility of a full sync

Redis replication for large data to new slave

I have a redis master which has 30 GB of data and the memory there is 90 GB. We have this setup as we have less writes and more reads. Normally, we would have a 3X db size RAM machine.
The problem here is, one slave went corrupt and later on when we added it back using sentinel. it got stuck in wait_bgsave state on master (after seeing the info on master)
The reason was that :
client-output-buffer-limit slave 256mb 64mb 60
This was set on master and since max memory is not available it breaks replication for the new slave.
I saw this question Redis replication and client-output-buffer-limit where similar issue is being discussed but i have a broader scope of question.
We can't use a lot of memory. So, what are the possible ways to do replication in this context to prevent any failure on master (wrt. memory and latency impacts)
I have few things on mind:
1 - Should i do diskless replication - will it have any impact on latency of writes and reads?
2 - Should i just copy the dump file from another slave to this new slave and restart redis. ? will that work.
3 - Should i increase the output-buffer-limit slave to a greater limit? If yes, then how much? I want to do this for sometime till replication happens and then revert it back to normal setting? I am skeptic about this approach.
You got this problem, because you have a slow replica, and it cannot read the replication data as fast as needed.
In order to solve the problem, you can try to increase the client-output-buffer-limit buffer limit. Also you can try to disable persistence on replica when it syncing from master, and enable persistence after that. By disabling persistence, replica might consume the data faster. However, if the bandwidth between master and replica is really small, you might need to consider re-deploy your replica to make it near the master, and have a large bandwidth.
1 - Should i do diskless replication - will it have any impact on latency of writes and reads?
IMHO, I think it has nothing to do with diskless replication.
2 - Should i just copy the dump file from another slave to this new slave and restart redis. ? will that work.
NO, it won't work.
3 - Should i increase the output-buffer-limit slave to a greater limit? If yes, then how much? I want to do this for sometime till replication happens and then revert it back to normal setting?
YES, you can try to increase the limit. And in your case, since your data size is 30G, so a hard limit of 30G should slove the problem. However, that's too much, and might have other impact. You need to do some benchmark to get a right limit.
YES, you can dynamically change this setting by the CONFIG SET command.

Couchbase 3.1.0 - Hard out of memory error when performing full backup

We recently migrated to Couchbase 3.1.0. The odd thing is - when performing full backup of a bucket, web UI alerts "Hard Out Of Memory Error. Bucket X on node Y is full. All memory allocated to this bucket is used for metadata". The numbers from RAM usage in the web UI contradict that - about 75% is used, but not 100%. I looked into the logs, but haven't find any similar errors there.
Is that even normal?
This is a known issue in the Couchbase Server 3.x releases.
To understand the problem, we must also first understand Database Change Protocol (DCP), the protocol used to transfer data throughout the system. At a high level the flow-control for DCP is as follows:
The Consumer creates a connection with the Producer and sends an Open Connection message. The Consumer then sends a Control message to indicate per stream flow control. This messages will contain “stream_buffer_size” in the key section and the buffer size the Consumer would like each stream to have in the value section.
The Consumer will then start opening streams so that is can receive data from the server.
The Producer will then continue to send data for the stream that has buffer space available until it reaches the maximum send size.
Steps 1-3 continue until the connection is closed, as the Consumer continues to consume items from the stream.
The cbbackup utility does not implement any flow control (data buffer limits) however, and it will try to stream all vbuckets from all nodes at once, with no cap on the buffer size.
While this does not mean that it will use the same amount of memory as your overall data size (as the streams are being drained slowly by the cbbackup process), it does mean that a large memory overhead is required to be able to store the data streams.
When you are in a heavy DGM (disk greater than memory) scenario, the amount of memory required to store the streams is likely to grow more rapidly than cbbackup can drain them as it is streaming large quantities of data off of disk, leading to very large streams, which take up a lot of memory as previously mentioned.
The slightly misleading message about metadata taking up all of the memory is displayed as there is no memory left for the data, so all of the remaining memory is allocated to the metadata, which when using value eviction cannot be ejected from memory.
The reason that this only affects Couchbase Server versions prior to 4.0 is that in 4.0 a server-side improvement to DCP stream management was made that allows the pausing of DCP streams to keep the memory footprint down, this is tracked as MB-12179.
As a result, you should not experience the same issue on Couchbase Server versions 4.x+, regardless of how DGM your bucket is.
Workaround
If you find yourself in a situation where this issue is occurring, then terminating the backup job should release all of the memory consumed by the streams immediately.
Unfortunately if you have already had most of your data evicted from memory as a result of the backup, then you will have to retrieve a large quantity of data off of disk instead of RAM for a small period of time, which is likely to increase your get latencies.
Over time 'hot' data will be brought into memory when requested, so this will only be a problem for a small period of time, however this is still a fairly undesirable situation to be in.
The workaround to avoid this issue completely is to only stream a small number of vbuckets at once when performing the backup, as opposed to all vbuckets which cbbackup does by default.
This can be achieved using cbbackupwrapper which comes bundled with all Couchbase Server releases 3.1.0 and later, details of using cbbackupwrapper can be found in the Couchbase Server documentation.
In particular the parameter to pay attention to is the -n flag, which specifies the number of vbuckets to be backed up in a batch at once.
As the name suggests, cbbackupwrapper is simply a wrapper script on top of cbbackup which partitions the vbuckets up and automatically handles all of the directory creation and backup generation, while still using cbbackup under the hood.
As an example, with a batch size of 50, cbbackupwrapper would backup vbuckets 0-49 first, followed by 50-99, then 100-149 etc.
It is suggested that you test with cbbackupwrapper in a testing environment which mirrors your production environment to find a suitable value for -n and -P (which controls how many backup processes run at once, the combination of these two controls the amount of memory pressure caused by backup as well as the overall speed).
You should not find that lowering the value of -n from its default 100 decreases the backup speed, in some cases you may find that the backup speed actually increases due to the fact that there is far less memory pressure on the server.
You may however wish to sensibly adjust the -P parameter if you wish to speed up the backup further.
Below is an example command:
cbbackupwrapper http://[host]:8091 [backup_dir] -u [user_name] -p [password] -n 50
It should be noted that if you use cbbackupwrapper to perform your backup then you must also use cbrestorewrapper to restore the data, as cbrestorewrapper is automatically aware of the directory structures used by cbbackupwrapper.
When you run a full backup, by default the backup tool streams data from all nodes over the network. This is not the best way, because it causes a lot of extra load and increased memory usage, especially of you run cbbackup on one of the Couchbase nodes. I would use the data-copy mode of cbbackup, which copies data directly from the files on disk:
> sudo /opt/couchbase/bin/cbbackup couchstore-files:///opt/couchbase/var/lib/couchbase/data/ /tmp/backup
Of course, change the data path to wherever your Couchbase data is actually stored. (In my example it runs as sudo because only root has read access to /opt/couchbase/blabla..) Do this on every node, then collect all the backup folders and put them somewhere. Note that the backups are very compressible, so you might want to zip them before copying over the network.

Does Redis persist data?

I understand that Redis serves all data from memory, but does it persist as well across server reboot so that when the server reboots it reads into memory all the data from disk. Or is it always a blank store which is only to store data while apps are running with no persistence?
I suggest you read about this on http://redis.io/topics/persistence . Basically you lose the guaranteed persistence when you increase performance by using only in-memory storing. Imagine a scenario where you INSERT into memory, but before it gets persisted to disk lose power. There will be data loss.
Redis supports so-called "snapshots". This means that it will do a complete copy of whats in memory at some points in time (e.g. every full hour). When you lose power between two snapshots, you will lose the data from the time between the last snapshot and the crash (doesn't have to be a power outage..). Redis trades data safety versus performance, like most NoSQL-DBs do.
Most NoSQL-databases follow a concept of replication among multiple nodes to minimize this risk. Redis is considered more a speedy cache instead of a database that guarantees data consistency. Therefore its use cases typically differ from those of real databases:
You can, for example, store sessions, performance counters or whatever in it with unmatched performance and no real loss in case of a crash. But processing orders/purchase histories and so on is considered a job for traditional databases.
Redis server saves all its data to HDD from time to time, thus providing some level of persistence.
It saves data in one of the following cases:
automatically from time to time
when you manually call BGSAVE command
when redis is shutting down
But data in redis is not really persistent, because:
crash of redis process means losing all changes since last save
BGSAVE operation can only be performed if you have enough free RAM (the amount of extra RAM is equal to the size of redis DB)
N.B.: BGSAVE RAM requirement is a real problem, because redis continues to work up until there is no more RAM to run in, but it stops saving data to HDD much earlier (at approx. 50% of RAM).
For more information see Redis Persistence.
It is a matter of configuration. You can have none, partial or full persistence of your data on Redis. The best decision will be driven by the project's technical and business needs.
According to the Redis documentation about persistence you can set up your instance to save data into disk from time to time or on each query, in a nutshell. They provide two strategies/methods AOF and RDB (read the documentation to see details about then), you can use each one alone or together.
If you want a "SQL like persistence", they have said:
The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.
The answer is generally yes, however a fuller answer really depends on what type of data you're trying to store. In general, the more complete short answer is:
Redis isn't the best fit for persistent storage as it's mainly performance focused
Redis is really more suitable for reliable in-memory storage/cacheing of current state data, particularly for allowing scalability by providing a central source for data used across multiple clients/servers
Having said this, by default Redis will persist data snapshots at a periodic interval (apparently this is every 1 minute, but I haven't verified this - this is described by the article below, which is a good basic intro):
http://qnimate.com/redis-permanent-storage/
TL;DR
From the official docs:
RDB persistence [the default] performs point-in-time snapshots of your dataset at specified intervals.
AOF persistence [needs to be explicitly configured] logs every write operation received by the server, that will be played again at server startup, reconstructing the
original dataset.
Redis must be explicitly configured for AOF persistence, if this is required, and this will result in a performance penalty as well as growing logs. It may suffice for relatively reliable persistence of a limited amount of data flow.
You can choose no persistence at all.Better performance but all the data lose when Redis shutting down.
Redis has two persistence mechanisms: RDB and AOF.RDB uses a scheduler global snapshooting and AOF writes update to an apappend-only log file similar to MySql.
You can use one of them or both.When Redis reboots,it constructes data from reading the RDB file or AOF file.
All the answers in this thread are talking about the possibility of redis to persist the data: https://redis.io/topics/persistence (Using AOF + after every write (change)).
It's a great link to get you started, but it is defenently not showing you the full picture.
Can/Should You Really Persist Unrecoverable Data/State On Redis?
Redis docs does not talk about:
Which redis providers support this (AOF + after every write) option:
Almost none of them - redis labs on the cloud does NOT provide this option. You may need to buy the on-premise version of redis-labs to support it. As not all companies are willing to go on-premise, then they will have a problem.
Other Redis Providers does not specify if they support this option at all. AWS Cache, Aiven,...
AOF + after every write - This option is slow. you will have to test it your self on your production hardware to see if it fits your requirements.
Redis enterpice provide this option and from this link: https://redislabs.com/blog/your-cloud-cant-do-that-0-5m-ops-acid-1msec-latency/ let's see some banchmarks:
1x x1.16xlarge instance on AWS - They could not achieve less than 2ms latency:
where latency was measured from the time the first byte of the request arrived at the cluster until the first byte of the ‘write’ response was sent back to the client
They had additional banchmarking on a much better harddisk (Dell-EMC VMAX) which results < 1ms operation latency (!!) and from 70K ops/sec (write intensive test) to 660K ops/sec (read intensive test). Pretty impresive!!!
But it defenetly required a (very) skilled devops to help you create this infrastructure and maintain it over time.
One could (falsy) argue that if you have a cluster of redis nodes (with replicas), now you have full persistency. this is false claim:
All DBs (sql,non-sql,redis,...) have the same problem - For example, running set x 1 on node1, how much time it takes for this (or any) change to be made in all the other nodes. So additional reads will receive the same output. well, it depends on alot of fuctors and configurations.
It is a nightmare to deal with inconsistency of a value of a key in multiple nodes (any DB type). You can read more about it from Redis Author (antirez): http://antirez.com/news/66. Here is a short example of the actual ngihtmare of storing a state in redis (+ a solution - WAIT command to know how much other redis nodes received the latest change change):
def save_payment(payment_id)
redis.rpush(payment_id,”in progress”) # Return false on exception
if redis.wait(3,1000) >= 3 then
redis.rpush(payment_id,”confirmed”) # Return false on exception
if redis.wait(3,1000) >= 3 then
return true
else
redis.rpush(payment_id,”cancelled”)
return false
end
else
return false
end
The above example is not suffeint and has a real problem of knowing in advance how much nodes there actually are (and alive) at every moment.
Other DBs will have the same problem as well. Maybe they have better APIs but the problem still exists.
As far as I know, alot of applications are not even aware of this problem.
All in all, picking more dbs nodes is not a one click configuration. It involves alot more.
To conclude this research, what to do depends on:
How much devs your team has (so this task won't slow you down)?
Do you have a skilled devops?
What is the distributed-system skills in your team?
Money to buy hardware?
Time to invest in the solution?
And probably more...
Many Not well-informed and relatively new users think that Redis is a cache only and NOT an ideal choice for Reliable Persistence.
The reality is that the lines between DB, Cache (and many more types) are blurred nowadays.
It's all configurable and as users/engineers we have choices to configure it as a cache, as a DB (and even as a hybrid).
Each choice comes with benefits and costs. And this is NOT an exception for Redis but all well-known Distributed systems provide options to configure different aspects (Persistence, Availability, Consistency, etc). So, if you configure Redis in default mode hoping that it will magically give you highly reliable persistence then it's team/engineer fault (and NOT that of Redis).
I have discussed these aspects in more detail on my blog here.
Also, here is a link from Redis itself.

Cassandra Commit and Recovery on a Single Node

I am a newbie to Cassandra - I have been searching for information related to commits and crash recovery in Cassandra on a single node. And, hoping someone can clarify the details.
I am testing Cassandra - so, set it up on a single node. I am using stresstool on datastax to insert millions of rows. What happens if there is an electrical failure or system shutdown? Will all the data that was in Cassandra's memory get written to disk upon Cassandra restart (I guess commitlog acts as intermediary)? How long is this process?
Thanks!
Cassandra's commit log gives Cassandra durable writes. When you write to Cassandra, the write is appended to the commit log before the write is acknowledged to the client. This means every write that the client receives a successful response for is guaranteed to be written to the commit log. The write is also made to the current memtable, which will eventually be written to disk as an SSTable when large enough. This could be a long time after the write is made.
However, the commit log is not immediately synced to disk for performance reasons. The default is periodic mode (set by the commitlog_sync param in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds. With this behaviour you could lose up to 10 seconds of writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor of greater than one you would need to lose power to multiple nodes within 10 seconds to lose any data.
If this risk window isn't acceptable, you can use batch mode for the commit log. This mode won't acknowledge writes to the client until the commit log has been synced to disk. The time window is set by commitlog_sync_batch_window_in_ms, default is 50 ms. This will significantly increase your write latency and probably decrease the throughput as well so only use this if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.
In the event that your server loses power, on startup Cassandra replays the commit log to rebuild its memtable. This process will take seconds (possibly minutes) on very write heavy servers.
If you want to ensure that the data in the memtables is written to disk you can run 'nodetool flush' (this operates per node). This will create a new SSTable and delete the commit logs referring to data in the memtables flushed.
You are asking something like
What happen if there is a network failure at the time data is being loaded in Oracle using SQL*Loader ?
Or what happens Sqoop stops processing due to some condition while transferring data?
Simply whatever data is being transferred before electrical failure or system shutdown, it will remain the same.
Coming to second question, when ever the memtable runs out of space, i.e when the number of keys exceed certain limit (128 is default) or when it reaches the time duration (cluster clock), it is being stored into sstable, immutable space.