Aerospike - change storage type and size

I'm just starting to learn about Aerospike and thinking about the storage type for my application. Right now there is not much data, so I'm planning to use storage-engine device with data-in-memory true.
However, should the size of the data increase in the future, would it be possible to change the settings to store only the indexes in RAM (remove data-in-memory true)? Would I need to do some kind of "database migration", or could I just comment out the line in the config and restart the service?
Same question about the filesize and memory-size parameters: if I increase these settings, should I somehow "resave" the data on disk?

For data-in-memory, you can simply change true to false and restart the asd daemon. A rolling restart (one node at a time) works, so there is no downtime.
memory-size is dynamically configurable; see the documentation.
filesize requires an asd daemon restart, again one node at a time.
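A minimal sketch of the change, assuming a namespace named Test like the one shown later in this thread (sizes are illustrative, and the asinfo form is the usual dynamic-config pattern; check your server version's docs):

# In aerospike.conf - flip the flag, then rolling-restart asd node by node:
namespace Test {
    memory-size 4G              # sizes the primary index (and data, while in memory)
    storage-engine device {
        file /opt/aerospike/data/Test.dat
        filesize 16G            # static: changing it also needs a rolling restart
        data-in-memory false    # was true; now only the index stays in RAM
    }
}

# memory-size can be raised at runtime without a restart, e.g.:
asinfo -v 'set-config:context=namespace;id=Test;memory-size=8G'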

Related

Redis stream 50k consumer support parallel - capacity requirement

What are the Redis capacity requirements to support 50k consumers within one consumer group consuming and processing messages in parallel? I'm looking to test an infrastructure for this scenario and need to understand the considerations.
Disclaimer: I worked at a company which used Redis at a somewhat large scale (probably fewer consumers than in your case, but our consumers were very active). I wasn't on the infrastructure team, but I was involved in some DevOps tasks.
I don't think you will find an exact number, so I'll try to share some tips and tricks to help you:
Be sure to read the entire Redis Admin page. There's a lot of useful information there. I'll highlight some of the tips from there:
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set a high net.core.somaxconn (RabbitMQ suggests 4096). Check the documentation of the tcp-backlog config in redis.conf for an explanation of this.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set vm.overcommit_memory = 1. Read below for a detailed explanation.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set fs.file-max. This is very important for your use case. The Open File Handles / File Descriptors Limit is essentially the maximum number of file descriptors (each client represents a file descriptor) the OS can handle. Please check the Redis documentation on this. The RabbitMQ documentation also presents some useful information about it.
If you edit the /etc/sysctl.conf file, run sysctl -p to reload it (a consolidated sketch follows these tips).
"Make sure to disable Linux kernel feature transparent huge pages, it will affect greatly both memory usage and latency in a negative way. This is accomplished with the following command: echo never > /sys/kernel/mm/transparent_hugepage/enabled." Add this command also to /etc/rc.local to make it permanent over reboots.
In my experience Redis is not very resource-hungry, so I don't believe you'll have issues with CPU. Memory usage is directly related to how much data you intend to store.
If you set up a server with many cores, consider running more than one Redis server. Redis is (mostly) single-threaded and will not use all your CPU resources if you run a single instance in a multicore environment.
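For example, a hypothetical two-instance layout on one host, with one config file per instance, each using a distinct port and data directory (paths are illustrative):

# Start one instance per core you want to use:
redis-server /etc/redis/redis-6379.conf   # sets port 6379, dir /var/lib/redis-6379
redis-server /etc/redis/redis-6380.conf   # sets port 6380, dir /var/lib/redis-6380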
The Redis server also warns about wrong/risky configurations on startup (the original answer included a screenshot of these startup warnings).
Explanation of Overcommit Memory (vm.overcommit_memory)
"Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis" [from the Redis FAQ]
There are three possible settings for vm.overcommit_memory.
0 (zero): Heuristic overcommit, the default. The kernel checks whether enough memory appears to be available and, if so, allows the allocation; obviously excessive requests are denied with an error.
1 (one): Always overcommit. The kernel's equivalent of "all bets are off": it always returns success to an application's request for memory, without checking. This is the setting recommended for Redis, so that a background-save fork() succeeds even when memory looks tight.
2 (two): Never overcommit. Allocations are permitted only up to swap plus a percentage of physical RAM defined by vm.overcommit_ratio. For instance, a vm.overcommit_ratio of 50 and 1 GB of RAM would mean the kernel permits up to 0.5 GB, plus swap, to be allocated before a request fails.
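To inspect the current mode, and the ceiling the kernel enforces under mode 2, something like the following works on a typical Linux host:

# Current overcommit mode (0, 1 or 2):
cat /proc/sys/vm/overcommit_memory

# CommitLimit is the mode-2 ceiling; Committed_AS is what's currently promised:
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# Set mode 1 at runtime (persist it in /etc/sysctl.conf as shown earlier):
sysctl -w vm.overcommit_memory=1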

AOF and RDB backups in redis

This question is about Redis persistence.
I'm using Redis as a 'fast backend' for a social networking website. It's a single-server setup. I've been steadily transferring PostgreSQL responsibilities to Redis. Currently in /etc/redis/redis.conf, the appendonly setting is set to appendonly no. The snapshotting settings are save 900 1, save 300 10, save 60 10000. All this is true for both production and development. According to the production logs, save 60 10000 gets invoked heavily. Does this mean that, practically, I'm getting backups every 60 seconds?
Some literature suggests using AOF and RDB backups together. Thus I was considering turning appendonly on and using appendfsync everysec. For anyone who has experience with both sides of the coin:
1) Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
2) Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
3) Are restart times something to worry about? My dump.rdb file is small; ~90MB.
I'm trying to find out more about redis persistence, and getting my expectations right. Personally, I'm fine with losing 60s of data in the case of a catastrophe, thus whether I should use AOF is also something I'm pondering. Feel free to chime in. Thanks!
Does this mean that practically, I'm getting backups every 60 seconds?
NO. Redis does a background save after 60 seconds only if at least 10000 keys have been changed. Otherwise, it doesn't do a background save.
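For reference, each save line reads as "save <seconds> <changes>", so the rules from the question mean:

save 900 1      # snapshot if at least 1 key changed within 900 s
save 300 10     # snapshot if at least 10 keys changed within 300 s
save 60 10000   # snapshot if at least 10000 keys changed within 60 s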
Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
It depends on many things, e.g. disk performance (SSD vs. HDD), write/read load (QPS), the data model, and so on. You need to do a benchmark with your own data in your specific environment.
Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
NO. If you turn on both AOF and RDB, then when Redis restarts, the AOF file will be used to rebuild the database. Since you configure it with appendfsync everysec, you will lose at most the last 1 second of data.
Are restart times something to worry about? My dump.rdb file is small; ~90MB.
If you turn on AOF, then when Redis restarts, it replays the logs in the AOF file to rebuild the database. Normally the AOF file is larger than the RDB file, so recovery might be slower than from the RDB file. Should you worry about that? Do a benchmark with your own data in your specific environment.
EDIT
IMPORTANT NOTICE
Assume that you already set Redis to use RDB saving, and write lots of data to Redis. After a while, you want to turn on AOF saving. NEVER MODIFY THE CONFIG FILE TO TURN ON AOF AND RESTART REDIS, OTHERWISE YOU'LL LOSE EVERYTHING.
Because, once you set appendonly yes in redis.conf and restart Redis, it will load data from the AOF file, whether or not that file exists. If the file doesn't exist, Redis creates an empty file and tries to load data from that empty file, so you'll lose everything.
In fact, you don't have to restart Redis to turn on AOF. Instead, you can use the CONFIG SET command to turn it on dynamically: config set appendonly yes.
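A minimal sketch of that safe sequence from the shell (CONFIG REWRITE is optional and only needed if you want the change written back to redis.conf):

# Enable AOF on the running instance; Redis rewrites the AOF from the
# current in-memory dataset, so nothing is lost:
redis-cli CONFIG SET appendonly yes

# Optionally persist the setting so it survives a restart:
redis-cli CONFIG REWRITE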

Aerospike namespace configuration options

I've recently been involved in implementing an Aerospike data store in our product. We've been trying to work out the best configuration for our namespace. The requirement to persist data means we need to use storage-engine device, and we have specified data-in-memory as true.
My question is: does data-in-memory attempt to load ALL the backing store data into memory as the vague description implies?
Keep a copy of all data in memory always.
Or will it pay attention to the memory-size setting on the namespace and only load memory-size amount of data from the backing store?
The description of the setting quoted above was retrieved from the documentation.
I have been talking to the person who first implemented Aerospike here to try to find out, but he wasn't sure either, so I'm looking for clarification.
For reference, my namespace config looks something like this, with an obviously smaller memory quota than the backing store:
namespace Test {
    replication-factor 2
    memory-size 4G
    default-ttl 0
    storage-engine device {
        file /opt/aerospike/data/Test.dat
        filesize 16G
        data-in-memory true
    }
}
It will keep all the data in memory. Aerospike does not yet have a partial cache implementation to keep the most used data in the provided memory.
Your data will only be read from memory, while the disk is used for persistence so the data can be recovered in the event of a server restart. The reason the filesize is larger than the memory-size is that disk space is needed for maintenance operations such as defragmentation of blocks. Disk devices are block devices, and with the default 1MB write-block-size multiple records fit in each block, so defrag works by moving records out of blocks that are less than defrag-lwm-pct full. This takes extra blocks, so you need that spare capacity.
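A sketch of the storage sub-stanza with the parameters mentioned above made explicit (the values shown are the usual defaults; confirm them against your server version's docs):

storage-engine device {
    file /opt/aerospike/data/Test.dat
    filesize 16G            # larger than memory-size: spare blocks for defrag
    data-in-memory true     # all records kept in RAM; the device is for recovery
    write-block-size 1M     # default block size; many records fit per block
    defrag-lwm-pct 50       # blocks below this fill level are defragmented
}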

Redis: Fail instead of evicting?

Is there a way to configure Redis so it will never evict data and instead fail hard if it runs out of memory? I need to ensure no data is lost; I am not using this as a permanent data storage mechanism, but rather as a temporary one for high-volume/high-performance data transformations.
Is there an alternative NoSQL data store that could come close in performance but utilize disk reads/writes if memory runs out? That is not ideal, but it is better than losing data. I am reading/writing/updating millions of JSON documents (12+ million and growing).
Yes.
First, make sure that you set the maxmemory directive (in the conf file or with CONFIG SET) to a value other than 0. This will instruct Redis to use that value as its memory upper limit.
Next, set the maxmemory-policy directive to noeviction - this will cause Redis to return an OOM (out of memory) error when maxmemory is reached while trying to write to it.
See the config file in-file documentation for more details on these directives: http://download.redis.io/redis-stable/redis.conf
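Put together, a minimal sketch (the 2gb cap is illustrative):

# In redis.conf:
maxmemory 2gb                  # illustrative cap; 0 means no limit
maxmemory-policy noeviction    # writes fail with an OOM error at the cap

# Or on a running instance:
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy noeviction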

What if Redis keys are never deleted programmatically?

What will happen to my Redis data if no expiry is set and no DEL command is used?
Will it be removed after some default time?
One more question:
How does Redis store data - is it in some file format? I ask because I can access the data even after restarting the computer. So which files are created by Redis, and where?
Thanks.
Redis is an in-memory data store, meaning all your data is kept in RAM (i.e. volatile). So theoretically your data will live as long as you don't turn the power off.
However, it also provides persistence in two modes:
RDB mode, which takes snapshots of your dataset and saves them to disk in a file called dump.rdb. This is the default mode.
AOF mode, which records every write operation executed by the server in an append-only file and then replays it on startup, thus reconstructing the original data.
Redis persistence is explained very well here and here by the creator of Redis himself.
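A minimal redis.conf sketch contrasting the two modes (values illustrative):

# RDB (default): periodic snapshots written to dump.rdb
save 900 1           # snapshot if at least 1 key changed within 900 s
dbfilename dump.rdb
dir /var/lib/redis   # illustrative data directory

# AOF: log every write, fsync once per second
appendonly yes
appendfsync everysec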