Since Redis is single-threaded, it shouldn't allow any other SET commands to start until the initial one finishes, right?
In my application each SET saves between 1 and 300 MB of data, so it would probably be a bottleneck (network overhead, etc.).
I am new to Redis and a little confused about when to use pipelining. Should I use it whenever there is more than one command to send?
For example, if I want to send 10 SET commands to the Redis server at a time, should I simply run the 10 commands one by one, or should I pipeline them?
Is there any disadvantage to pipelining 10 SET commands instead of sending them one by one?
when I should use pipelining
Pipelining reduces the number of network round trips (RTT), which improves performance when you need to send many commands to Redis.
should I use it all the time when there are more than 1 command to be sent?
It depends; you have to decide case by case.
if I want to send 10 SET commands to redis server at a time, should I simply run the 10 commands one by one or should I pipeline them?
Pipelining these commands will be much faster than sending the 10 commands one by one. However, in this particular case, the best choice is the MSET command, which sets all the keys with a single command.
Are there any disadvantage to pipeline 10 SET commands instead of sending them one by one?
With a pipeline, Redis needs more memory to hold the replies to all the pipelined commands before sending them to the client. So if you pipeline too many commands at once, that might be a problem.
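To make the trade-off concrete, here is a minimal Python sketch that does not talk to a real server. It encodes commands in the Redis wire format (RESP) and counts round trips under a simplified model in which every request/response exchange costs one round trip. The helper names (encode_command, round_trips) and the key and value names are made up for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def round_trips(num_commands: int, pipelined: bool) -> int:
    """Simplified model: one round trip per request/response exchange."""
    if num_commands == 0:
        return 0
    return 1 if pipelined else num_commands


# Ten hypothetical SET commands.
commands = [("SET", f"key{i}", f"value{i}") for i in range(10)]

# One by one: ten separate writes, each waiting for its reply.
one_by_one = [encode_command(*c) for c in commands]

# Pipelined: the same bytes, concatenated and sent in a single write.
pipelined = b"".join(one_by_one)

# MSET: one command carrying all ten key/value pairs.
mset_args = ["MSET"] + [x for _, k, v in commands for x in (k, v)]
mset = encode_command(*mset_args)

print(round_trips(10, pipelined=False), "round trips one by one")
print(round_trips(10, pipelined=True), "round trip pipelined")
print(len(pipelined), "bytes pipelined vs", len(mset), "bytes with MSET")
```

In this model the pipeline and MSET both need a single round trip, but MSET also sends fewer bytes and produces a single reply instead of ten, which is why it is the better fit when every command is a plain SET.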
Looking at the slow log of one of my clusters, I found some slow SET and EXPIRE commands, some of which took more than 60 ms.
Since the slow log records only execution time, not queuing or round-trip time, this means that a command like the following took more than 60 ms just to execute:
SET I2D_5b5e89403dc4e6580c0f4f45 a-24-length-string
My Redis cluster has 234 GB in total, of which 32 GB is currently used. The commands are sent to Redis by JedisCluster.
Imagine a Redis Cluster setup, or just a regular sharded setup, where we have N > 1 Redis processes per physical node. All the processes share the same redis.conf, with the SAVE options enabled and the same SAVE period. So if all the main Redis processes were started at the same time, all of them will start SAVE at about the same time.
When we have 9 Redis processes and all of them start RDB snapshotting at the same time, it:
Affects performance, because we create 9 forked processes that start consuming CPU and doing I/O at the same time.
Requires too much reserved additional memory that can't be used as actual storage, because in a write-heavy application Redis may use up to 2x its normal memory during snapshotting. So if we want the Redis processes on this node to hold 100 GB, we should reserve an additional 100 GB to be safe when all processes fork at the same time.
Is there a best practice for modifying this setup so that the Redis processes start saving one by one, or at least with some randomization?
My only idea is to disable the schedule in redis.conf and write a cron script that starts the saves one by one with a time lag. But this solution looks like a hack, and there should be a better practice for this.
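As a rough illustration of the cron-style idea, the Python sketch below computes evenly spaced start offsets for the snapshots and the worst-case extra memory for a given number of simultaneous forks. It only does the arithmetic; an external scheduler would still have to issue BGSAVE to each instance at its offset, with the save schedule disabled in redis.conf. The port numbers, the 900-second period, and the function names are assumptions made for the example.

```python
def staggered_offsets(ports, period_s):
    """Spread snapshot start times evenly across one save period."""
    step = period_s / len(ports)
    return {port: round(i * step) for i, port in enumerate(ports)}


def worst_case_extra_gb(dataset_gb_per_instance, concurrent_snapshots):
    """Upper bound on copy-on-write memory: up to 1x the dataset of
    every instance that is forking at the same time."""
    return dataset_gb_per_instance * concurrent_snapshots


ports = list(range(7000, 7009))                  # nine hypothetical instances
offsets = staggered_offsets(ports, period_s=900)
for port, offset in offsets.items():
    print(f"instance on port {port}: start snapshot at +{offset}s")

per_instance_gb = 100 / 9                        # 100 GB spread over 9 processes
print("all at once:", round(worst_case_extra_gb(per_instance_gb, 9), 1), "GB extra")
print("one at a time:", round(worst_case_extra_gb(per_instance_gb, 1), 1), "GB extra")
```

The memory saving only holds if each snapshot finishes before the next one starts, so the spacing between offsets has to be longer than the slowest snapshot.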
I am trying to find out about custom AOF configuration. I found only this:
There are three options:
fsync every time a new command is appended to the AOF. Very very slow, very safe.
fsync every second. Fast enough (in 2.4 likely to be as fast as snapshotting), and you can lose 1 second of data if there is a disaster.
Never fsync, just put your data in the hands of the Operating System. The faster and less safe method.
Can I configure Redis to fsync on every append to the AOF only for a specific command (INCR)?
Is it possible?
You could do that with a MULTI/EXEC block, i.e.:
MULTI
CONFIG SET appendfsync always
INCR somekey
CONFIG SET appendfsync no
EXEC
I am a newbie to Cassandra - I have been searching for information related to commits and crash recovery in Cassandra on a single node. And, hoping someone can clarify the details.
I am testing Cassandra - so, set it up on a single node. I am using stresstool on datastax to insert millions of rows. What happens if there is an electrical failure or system shutdown? Will all the data that was in Cassandra's memory get written to disk upon Cassandra restart (I guess commitlog acts as intermediary)? How long is this process?
Thanks!
Cassandra's commit log gives Cassandra durable writes. When you write to Cassandra, the write is appended to the commit log before the write is acknowledged to the client. This means every write that the client receives a successful response for is guaranteed to be written to the commit log. The write is also made to the current memtable, which will eventually be written to disk as an SSTable when large enough. This could be a long time after the write is made.
However, the commit log is not immediately synced to disk for performance reasons. The default is periodic mode (set by the commitlog_sync param in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds. With this behaviour you could lose up to 10 seconds of writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor of greater than one you would need to lose power to multiple nodes within 10 seconds to lose any data.
If this risk window isn't acceptable, you can use batch mode for the commit log. This mode won't acknowledge writes to the client until the commit log has been synced to disk. The time window is set by commitlog_sync_batch_window_in_ms, default is 50 ms. This will significantly increase your write latency and probably decrease the throughput as well so only use this if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.
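The difference between the two modes can be sketched with a small Python model. It assumes, as a simplification, that periodic syncs happen instantly at exact multiples of the sync period and that the listed times are when writes were acknowledged. The function name and the sample timestamps are invented for the example; only the 10-second default period comes from the text above.

```python
def lost_acked_writes(write_times_ms, crash_ms, mode, sync_period_ms=10_000):
    """Return the acknowledged writes that would be missing from the
    on-disk commit log if power were lost at crash_ms."""
    if mode == "batch":
        # Batch mode acknowledges only after the sync, so no
        # acknowledged write can be lost.
        return []
    if mode != "periodic":
        raise ValueError(f"unknown commitlog_sync mode: {mode}")
    # Simplified model: the last sync happened at the latest multiple
    # of the period that is not after the crash.
    last_sync_ms = (crash_ms // sync_period_ms) * sync_period_ms
    return [t for t in write_times_ms if last_sync_ms < t <= crash_ms]


writes = [1_000, 9_000, 12_000, 19_000]   # acknowledgement times in ms
print(lost_acked_writes(writes, crash_ms=19_500, mode="periodic"))
print(lost_acked_writes(writes, crash_ms=19_500, mode="batch"))
```

In the periodic case the two writes acknowledged after the sync at 10,000 ms are lost, while batch mode trades that risk for higher write latency.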
In the event that your server loses power, on startup Cassandra replays the commit log to rebuild its memtable. This process will take seconds (possibly minutes) on very write heavy servers.
If you want to ensure that the data in the memtables is written to disk you can run 'nodetool flush' (this operates per node). This will create a new SSTable and delete the commit logs referring to data in the memtables flushed.
You are asking something like
What happens if there is a network failure while data is being loaded into Oracle using SQL*Loader?
Or what happens if Sqoop stops processing due to some condition while transferring data?
Simply put, whatever data was transferred before the electrical failure or system shutdown will remain as it is.
Coming to the second question: whenever the memtable runs out of space, i.e. when the number of keys exceeds a certain limit (128 is the default) or when it reaches the time limit (cluster clock), it is flushed to an SSTable, which is immutable.