Redis running out of memory causes slow queries that don't show up in the slow log

Sometimes a query that gets a key from Redis takes several seconds.
Redis INFO shows that used_memory is twice as large as used_memory_rss, and the OS starts to use swap.
After I clean up the useless data, used_memory drops below used_memory_rss and everything works fine again.
What confuses me is this: if a query took something like 10 seconds and blocked every other query to Redis, it would cause serious problems in other parts of the app, but the app seems fine.
I also can't find any of these long queries in the slow log, so I checked the documentation for the Redis SLOWLOG command, which says:
The execution time does not include I/O operations like talking with the client, sending the reply and so forth, but just the time needed to actually execute the command (this is the only stage of command execution where the thread is blocked and can not serve other requests in the meantime)
Does this mean the query executes normally and doesn't block any other queries? What happens to a query when memory runs short that makes it take so long? Which part of the query takes that time, given that the "actually execute the command" time is not long enough to get into the slow log?
Thanks!

When memory runs short, Redis will definitely slow down, because the OS starts swapping its memory to disk. You can use INFO to report how much memory Redis is using, and you can cap memory usage with the maxmemory option in the config file. If this limit is reached and the eviction policy is noeviction (the default in recent versions), Redis will reply with an error to write commands, but will continue to accept read-only commands.
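The symptom described in the question (used_memory about twice used_memory_rss, with the OS swapping) can be spotted from INFO output. A minimal Python sketch, assuming you already have the raw text of INFO memory (for example from redis-cli info memory); parse_info and memory_report are hypothetical helper names, and treating a ratio below 1 as swapping is a heuristic rather than a guarantee:

```python
def parse_info(text):
    """Parse the "field:value" lines of a Redis INFO reply into a dict of strings."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_report(info_text):
    """Compare used_memory with used_memory_rss (and maxmemory, if reported)."""
    info = parse_info(info_text)
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    # Roughly what Redis itself reports as mem_fragmentation_ratio.
    ratio = rss / used
    return {
        "fragmentation_ratio": round(ratio, 2),
        # Heuristic: RSS well below used_memory means part of the data
        # Redis has allocated is not resident in RAM, i.e. it was swapped out.
        "likely_swapping": ratio < 1.0,
        # maxmemory:0 (or a missing field) means no limit is configured.
        "maxmemory": int(info.get("maxmemory", "0")),
    }
```

Running it periodically against INFO memory output and alerting on likely_swapping would catch the situation before latency climbs.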

Related

How to troubleshoot periodically CPU jump in redis

I use AWS ElastiCache Redis for our prod. I see CPU every 30 minutes of the round hour from average of 2-3% to 20%.
This is constant, which tells me it comes from schedule job.
From cloudwatch I have a suspicion it is related to KEY (and maybe SET) commands and it's latency is the only one which jumps in the same exact time as the CPU jumps.
I would like to understand what KEY (and maybe SET) commands run on a specific time, or some other way which can help me investigate this.
Thanks for any advice.
With redis-cli monitor I was able to stream most of the commands running on the server and find the ones causing the high usage.
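To find which commands line up with the spike, the MONITOR stream can be bucketed by minute. A rough Python sketch, assuming lines in the usual redis-cli monitor format (timestamp, [db client], then the quoted command and arguments); summarize_monitor is a hypothetical name. MONITOR itself adds load on the server, so it's best run only briefly around the expected spike:

```python
import re
from collections import Counter
from datetime import datetime, timezone

# e.g. 1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
LINE_RE = re.compile(r'^(\d+\.\d+) \[\d+ [^\]]+\] "([^"]+)"')


def summarize_monitor(lines):
    """Count commands per (UTC minute, command name) from redis-cli monitor output."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the initial "OK" line
        ts = float(m.group(1))
        minute = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M")
        # Normalize the command name so "keys" and "KEYS" are counted together.
        counts[(minute, m.group(2).upper())] += 1
    return counts
```

For example, save the stream with redis-cli monitor > monitor.log and then call summarize_monitor(open("monitor.log")); sorting the result shows which commands dominate the minutes where CPU jumps.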

Efficient way to take hot snapshots from redis in production?

We have a Redis cluster that holds more than 2 million keys, and these keys are updated every minute. We now need to take a snapshot of the Redis database at a regular interval, e.g. every 10 minutes. Taking the snapshot should not pause Redis command execution.
Is there an asynchronous way to take a snapshot of Redis?
Suggestions for open-source tools or frameworks would be really helpful.
Redis's BGSAVE command is asynchronous: it takes the snapshot in the background.
It calls the fork() function of the OS. According to the Redis manual,
fork() can be time consuming if the dataset is big, and may result in Redis stopping serving clients for some milliseconds or even for one second if the dataset is very big and the CPU performance is not great
Two million updates in one minute is 30K+ (about 33K) updates per second.
So you really have to try it out: run a benchmark that simulates your workload, issue BGSAVE, monitor your system's I/O and CPU usage, and check whether there's a spike in your Redis call latency.
Then issue LASTSAVE, which tells you when the last successful snapshot happened, so you can adjust your backup schedule.
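The BGSAVE-then-LASTSAVE check can be automated. A minimal Python sketch, assuming a client object with bgsave() and lastsave() methods (redis-py's Redis client has both); snapshot_and_wait, the 600-second timeout and the 1-second poll interval are illustrative choices, not recommendations. In a cluster each node writes its own snapshot, so this would run once per node:

```python
import time


def snapshot_and_wait(client, timeout_s=600.0, poll_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Trigger BGSAVE and wait until LASTSAVE reports a newer snapshot.

    `client` is any object with bgsave() and lastsave() methods.
    Returns the new LASTSAVE value.
    """
    before = client.lastsave()
    client.bgsave()  # returns immediately; a forked child writes the RDB file
    deadline = clock() + timeout_s
    while clock() < deadline:
        sleep(poll_s)
        after = client.lastsave()
        if after != before:
            return after  # LASTSAVE only advances after a successful save
    raise TimeoutError("BGSAVE did not finish (or failed) before the timeout")
```

The sleep and clock parameters are injectable so the polling logic can be exercised without a live server.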

Why does a simple SET command become a slow query in Redis?

Looking into the slow log of one of my clusters, I found some slow queries that are SET and EXPIRE commands, some taking more than 60 ms.
Since the logged time is only the execution time, which excludes queuing time and round-trip time, this means that a command like
SET I2D_5b5e89403dc4e6580c0f4f45 a-24-length-string
itself spent that long executing.
My Redis has 234 GB in total, of which 32 GB is currently used. The commands are sent to Redis by JedisCluster.
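To pull out the entries in question, the raw SLOWLOG GET reply can be filtered by duration. A Python sketch, assuming each entry has the standard layout [id, unix timestamp, execution time in microseconds, [arguments], ...] (Redis 4.0 and later append the client address and name, which are ignored here); slow_entries and the 60 ms cutoff are illustrative. In a cluster every node keeps its own slow log, so the query has to be repeated per node:

```python
def slow_entries(entries, min_ms=60.0):
    """Return (id, duration_ms, command line) for SLOWLOG entries at or above min_ms."""
    result = []
    for entry in entries:
        entry_id, duration_us, args = entry[0], entry[2], entry[3]
        duration_ms = duration_us / 1000.0  # SLOWLOG reports microseconds
        if duration_ms >= min_ms:
            # Clients may return arguments as bytes; decode them for display.
            words = [a.decode() if isinstance(a, bytes) else str(a) for a in args]
            result.append((entry_id, duration_ms, " ".join(words)))
    return result
```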

How to forcefully stop long postgres query under heavy load?

I am working on a Rails app with Postgres on Ubuntu. Unfortunately for me, this legacy app uses some heavyweight stored procedures in the database. What's more, the database is quite large (5 GB) and my computer is not particularly fast. Every now and then, if I pass bad parameters from my code to the database, my computer becomes so slow that I cannot get to the console and kill the postgres process. I assume this is due to some very expensive query. My only solution is to hard reset my laptop. So my question is: is there a way to forcefully kill a long-running query? Or, alternatively, is there a way to limit the CPU and RAM the database is allowed to use, so that I still have enough resources left to manually restart postgres?
You can set a maximum time for statements to take with the statement_timeout configuration option:
Abort any statement that takes more than the specified number of milliseconds, starting from the time the command arrives at the server from the client. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. A value of zero (the default) turns this off.
You can set this option a variety of ways, such as in postgresql.conf for everyone, per session with the SET command, or even per database or per role. More information on setting options is in the documentation.
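The per-session, per-role and per-database scopes correspond to standard SQL statements. A small Python sketch that builds them; statement_timeout_sql and quote_ident are hypothetical helpers, and 5000/30000 ms are arbitrary example values. Role- and database-level settings take effect for new sessions:

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier (double quotes, embedded quotes doubled)."""
    return '"' + name.replace('"', '""') + '"'


def statement_timeout_sql(timeout_ms, scope="session", name=None):
    """Build a statement that sets statement_timeout at the given scope.

    scope: "session" (SET), "role" (ALTER ROLE) or "database" (ALTER DATABASE).
    A timeout of 0 disables the limit, as in PostgreSQL itself.
    """
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be >= 0")
    value = f"'{int(timeout_ms)}ms'"
    if scope == "session":
        return f"SET statement_timeout = {value};"
    if name is None:
        raise ValueError(f"scope {scope!r} needs a role or database name")
    if scope == "role":
        return f"ALTER ROLE {quote_ident(name)} SET statement_timeout = {value};"
    if scope == "database":
        return f"ALTER DATABASE {quote_ident(name)} SET statement_timeout = {value};"
    raise ValueError(f"unknown scope {scope!r}")
```

For example, statement_timeout_sql(5000) returns SET statement_timeout = '5000ms'; which can be run in psql before testing the stored procedures.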

Cassandra Commit and Recovery on a Single Node

I am new to Cassandra. I have been searching for information about commits and crash recovery in Cassandra on a single node, and I'm hoping someone can clarify the details.
I am testing Cassandra, so I set it up on a single node. I am using the DataStax stress tool to insert millions of rows. What happens if there is an electrical failure or system shutdown? Will all the data that was in Cassandra's memory be written to disk when Cassandra restarts (I guess the commit log acts as an intermediary)? How long does this process take?
Thanks!
Cassandra's commit log gives Cassandra durable writes. When you write to Cassandra, the write is appended to the commit log before the write is acknowledged to the client. This means every write that the client receives a successful response for is guaranteed to be written to the commit log. The write is also made to the current memtable, which will eventually be written to disk as an SSTable when large enough. This could be a long time after the write is made.
However, the commit log is not immediately synced to disk for performance reasons. The default is periodic mode (set by the commitlog_sync param in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds. With this behaviour you could lose up to 10 seconds of writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor of greater than one you would need to lose power to multiple nodes within 10 seconds to lose any data.
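The 10-second risk window can be made concrete with a toy model. The Python sketch below assumes syncs happen exactly every sync_period seconds starting at time 0, which is a simplification of Cassandra's actual scheduling; writes_at_risk is a hypothetical name. Any write acknowledged after the last completed sync and before the power loss is at risk:

```python
def writes_at_risk(write_times, crash_time, sync_period=10.0):
    """Return acknowledged writes not yet fsynced when the node loses power.

    Toy model of periodic commit-log sync: syncs happen at 0, P, 2P, ...
    (P = sync_period, i.e. commitlog_sync_period_in_ms / 1000). A write at
    time t is durable once a sync at or after t has completed.
    """
    last_sync = (crash_time // sync_period) * sync_period
    return [t for t in write_times if last_sync < t <= crash_time]
```

In the worst case the crash lands just before the next sync, so almost a full period of writes is exposed, which is the "up to 10 seconds" above.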
If this risk window isn't acceptable, you can use batch mode for the commit log. This mode won't acknowledge writes to the client until the commit log has been synced to disk. The time window is set by commitlog_sync_batch_window_in_ms, default is 50 ms. This will significantly increase your write latency and probably decrease the throughput as well so only use this if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.
In the event that your server loses power, on startup Cassandra replays the commit log to rebuild its memtable. This process will take seconds (possibly minutes) on very write heavy servers.
If you want to ensure that the data in the memtables is written to disk, you can run 'nodetool flush' (this operates per node). This will create a new SSTable and delete the commit logs referring to the data in the flushed memtables.
You are asking something like:
What happens if there is a network failure while data is being loaded into Oracle using SQL*Loader?
Or what happens if Sqoop stops processing due to some condition while transferring data?
Simply put, whatever data was transferred before the electrical failure or system shutdown will remain as it was.
Coming to the second question: whenever the memtable runs out of space, i.e. when the number of keys exceeds a certain limit (128 is the default) or when a time duration elapses (cluster clock), it is flushed to an SSTable, which is immutable.