Memory required by Primary Index - aerospike

According to Aerospike's website, Primary Indexes will occupy space in RAM given by:
64 bytes × (replication factor) × (number of records)
I was confused whether this is the space that will be occupied on each replica, or the total space occupied by the Primary Index, i.e. the sum of the space required across all replicas.

Every record that you store in Aerospike has two components: the data part and a Primary Index entry in RAM (which takes 64 bytes) that is used to locate the data, wherever you have stored it. (You get to choose where you want to store the data part; it can be in the process RAM, an SSD drive, or other more exotic options.) Aerospike is a distributed database, so it typically has more than one node over which you store your records, and it scales horizontally easily. To avoid losing data when you lose a node, you will typically ask Aerospike to store two copies (r = 2) of every record, each on a different node. So when you look at the RAM usage across the entire cluster for just the primary index, you need n x r x 64 bytes of RAM, where n is the number of records and r is the replication factor. This is all the RAM required just to store the primary index for master records and replica records over all the nodes in the cluster.
So if you had 100 records with 2 copies on a 5-node cluster, the RAM needed just for the PI will be 100 x 2 x 64 bytes spread over 5 servers, i.e. each server will need about (100 x 2 x 64)/5 bytes of RAM for PI storage. (In reality, RAM for the PI is allocated in minimum 1 GB chunks in Enterprise Edition.)
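To make the arithmetic concrete, here is a small Python sketch of the sizing formula above; the 64-byte figure and the 100-record example come from this answer, and the larger second call is purely illustrative.

```python
# Rough sizing sketch for Aerospike primary index RAM (64 bytes per record copy).
# Numbers mirror the example in the answer; adjust for your own cluster.

PI_BYTES_PER_RECORD = 64  # bytes of RAM per record copy in the primary index

def primary_index_ram(records, replication_factor, nodes):
    """Return (total_bytes, bytes_per_node) for the primary index."""
    total = records * replication_factor * PI_BYTES_PER_RECORD
    return total, total / nodes

total, per_node = primary_index_ram(records=100, replication_factor=2, nodes=5)
print(f"cluster-wide PI RAM: {total} bytes, per node: {per_node:.0f} bytes")

# Illustrative larger cluster: 1 billion records, 2 copies, 5 nodes
total, per_node = primary_index_ram(10**9, 2, 5)
print(f"cluster-wide PI RAM: {total / 2**30:.1f} GiB, per node: {per_node / 2**30:.1f} GiB")
```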

Related

Redis using too much memory for a small number of keys

I have a Redis standalone server with around 8000 keys at a given instance.
The used_memory is showing to be around 8.5 GB.
My individual key-value size is at most around 50 KB; by that calculation the used_memory should be less than 1 GB (50 KB * 8000).
I am using Spring RedisTemplate with the default pool configuration to connect to Redis.
Any idea what I should look into to narrow down where the memory is being consumed?
A zset internally uses two data structures to hold the same elements, in order to get O(log(N)) insert and remove operations on a sorted data structure.
Specifically, the two data structures are:
Hash table
Skip list
In ideal cases, storage according to my research is in the following order (smallest to largest memory footprint):
hset < set < zset
I would recommend you start using hset if you have hierarchical data to store. This will lower your memory consumption but might make searching a tiny bit slower (only if one key has more than, say, a couple of hundred fields).
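To illustrate the difference in practice, here is a minimal redis-py sketch, assuming a local Redis instance and made-up key names; it stores the same N small values once as individual string keys and once as a single hash, then compares the two with MEMORY USAGE (available in Redis 4.0+).

```python
# Minimal sketch (assumed: local Redis, redis-py installed, hypothetical key names).
# Compares memory for N small values stored as individual string keys vs. one hash.
import redis

r = redis.Redis()  # localhost:6379, db 0
N = 500

# Variant 1: one top-level key per value
for i in range(N):
    r.set(f"user:1000:field:{i}", i)

# Variant 2: a single hash holding the same fields
r.hset("user:1000", mapping={str(i): i for i in range(N)})

string_bytes = sum(r.memory_usage(f"user:1000:field:{i}") for i in range(N))
hash_bytes = r.memory_usage("user:1000")
print(f"individual keys: ~{string_bytes} bytes, single hash: ~{hash_bytes} bytes")
```

With N below hash-max-ziplist-entries, the hash stays in the compact encoding, which is where most of the savings come from.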

Unable to upload data even after partitioning in VoltDB

We are trying to upload 80 GB of data onto 2 host servers, each with 48 GB RAM (96 GB in total). We have partitioned the table, too. But even after partitioning, we are able to upload only up to 10 GB of data. In the VMC interface, we checked the size worksheet. The number of rows in the table is 400,000,000, the table maximum size is 1,053,200,000 KB, and the minimum size is 98,000,000 KB. So what is the issue in uploading 80 GB even after partitioning, and what does this table size mean?
The size worksheet provides the minimum and maximum amount of memory that the given number of rows would take, based on the schema of the table. If you have VARCHAR or VARBINARY columns, then the difference between min and max can be quite substantial, and your actual memory use is usually somewhere in between, but it can be difficult to predict because it depends on the actual size of the strings that you load.
But I think the issue is that the minimum size is 98 GB according to the worksheet, i.e. the size even if every nullable string were null and every not-null string were an empty string. Even without taking into account the heap size and any overhead, this is higher than your 96 GB capacity.
What is your kfactor setting? If it is 0, there will be only one copy of each record. If it is 1, there will be two copies of each record, so you would really need 196 GB minimum in that configuration.
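To make the capacity check explicit, here is a rough Python sketch; the 98 GB minimum and the 2 x 48 GB hosts come from the question, while the per-host heap reserve is only a placeholder, not a VoltDB-documented figure.

```python
# Back-of-the-envelope VoltDB capacity check (illustrative only).
# min_table_gb comes from the size worksheet; heap_reserve_gb_per_host is a rough placeholder.

def required_cluster_ram_gb(min_table_gb, kfactor, heap_reserve_gb_per_host, hosts):
    copies = kfactor + 1                      # kfactor=1 means two copies of every row
    data_gb = min_table_gb * copies           # minimum table data across the cluster
    return data_gb + heap_reserve_gb_per_host * hosts

hosts, ram_per_host_gb = 2, 48
needed = required_cluster_ram_gb(min_table_gb=98, kfactor=1,
                                 heap_reserve_gb_per_host=4, hosts=hosts)
print(f"needed >= {needed} GB, available = {hosts * ram_per_host_gb} GB")
# -> roughly 204 GB needed vs. 96 GB available, so the data cannot fit as configured
```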
The size per record in RAM depends on the datatypes chosen and if there are any indexes. Also, VARCHAR values longer than 15 characters or 63 bytes are stored in pooled memory which carries more overhead than fixed-width storage, although it can reduce the wasted space if the values are smaller than the maximum size.
If you want some advice on how to minimize the per-record size in memory, please share the definition of your table and any indexes, and I might be able to suggest adjustments that could reduce the size.
You can add more nodes to the cluster, or use servers with more RAM to add capacity.
Disclaimer: I work for VoltDB.

How can Redis be optimized for storing lists of GUIDs?

We are using Redis to store shuffled decks of cards. A card is represented by a 20-character GUID, and a deck is an array of shuffled card GUIDs. The primary operations called on the deck list are LLEN (length) and LPOP (pop). The only time that we push to a deck is (a) when the deck is initially created and (b) when the deck runs out of cards and is re-shuffled (which happens rarely). Currently, the length of a deck varies from 10 to 700 items.
What type of memory optimizations can be made in Redis for this sort of problem? Is there any sort of setting we can configure to reduce the memory overhead, or optimize how (zip)list data types are used?
Related Article: http://redis.io/topics/memory-optimization
My first suggestion would be to use 8-byte unsigned integers as your identifier keys instead of GUIDs; that saves you several bytes per entry in memory and improves overall performance in any database you are using, Redis included.
In case you want to go with GUIDs, and considering the size of your lists and the operations you are doing on them, you can tune the Redis defaults to suit your needs:
Redis defaults :
list-max-ziplist-entries 512
list-max-ziplist-value 64
You can change this to :
list-max-ziplist-entries 1024 # to accommodate your 700-card lists
list-max-ziplist-value 256 # to accommodate your 20-byte GUIDs
YMMV, so you need to benchmark Redis with both settings, for storage as well as read/write performance, using your sample data.
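If you would rather experiment without restarting Redis, the same limits can be changed at runtime and verified with OBJECT ENCODING. Below is a minimal redis-py sketch assuming a local instance and a hypothetical deck key; note that the directive names above come from older Redis versions, and Redis 3.2+ uses list-max-ziplist-size (later list-max-listpack-size) instead.

```python
# Sketch: apply the tuned list limits at runtime and check how a deck is encoded.
import uuid
import redis

r = redis.Redis()  # assumed local instance

for name, value in [("list-max-ziplist-entries", 1024),
                    ("list-max-ziplist-value", 256)]:
    try:
        r.config_set(name, value)
    except redis.ResponseError:
        # Redis >= 3.2 replaced these with list-max-ziplist-size / list-max-listpack-size
        pass

deck_key = "deck:example"                 # hypothetical key name
r.delete(deck_key)
r.rpush(deck_key, *(uuid.uuid4().hex[:20] for _ in range(700)))  # 700 20-char ids

print(r.object("encoding", deck_key))     # e.g. b'ziplist', b'quicklist' or b'listpack'
print(r.llen(deck_key), r.lpop(deck_key))
```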

Are redis hashes kept in ziplist after changing hash-max-ziplist-entries?

I'm running a redis instance where I have stored a lot of hashes with integer fields and values. Specifically, there are many hashes of the form
{1: <int>, 2: <int>, ..., ~10000: <int>}
I was initially running redis with the default values for hash-max-ziplist-entries:
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
and redis was using approximately 3.2 GB of memory.
I then changed these values to
hash-max-ziplist-entries 10240
hash-max-ziplist-value 10000
and restarted redis. My memory usage went down to approximately 480 MB, but redis was using 100% CPU. I reverted the values back to 512 and 64, and restarted redis, but it was still only using 480 MB of memory.
I assume that the memory usage went down because a lot of my hashes were stored as ziplists. I would have guessed that after changing the values and restarting redis they would automatically be converted back into hash tables, but this doesn't appear to be the case.
So, are these hashes still being stored as a ziplist?
They are still in optimized "ziplist" format.
Redis will store a hash (created via HSET or similar) in the optimized way as long as it does not end up having more than hash-max-ziplist-entries entries and its values are smaller than hash-max-ziplist-value bytes.
If these limits are exceeded, Redis will store the item "normally", i.e. not optimized.
Relevant section in documentation (http://redis.io/topics/memory-optimization):
If a specially encoded value will overflow the configured max size, Redis will automatically convert it into normal encoding.
Once the values have been written in the optimized way, they are not "unpacked", even if you lower the max size settings later. The settings only apply to new keys that Redis stores.
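One way to see this for yourself is to inspect a key's encoding directly; below is a minimal redis-py sketch using a hypothetical key name.

```python
# Sketch: inspect the encoding of a (hypothetical) hash key to see whether it is
# still stored compactly after the config change.
import redis

r = redis.Redis()
key = "myhash:example"            # hypothetical key name
print(r.object("encoding", key))  # b'ziplist'/b'listpack' if compact, b'hashtable' if not

# Writing to the key so that it exceeds the current limits (e.g. adding a field value
# larger than hash-max-ziplist-value) converts that key to the normal encoding.
r.hset(key, "big_field", "x" * 100)   # 100 bytes > the default 64-byte limit
print(r.object("encoding", key))
```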

Memory utilization in redis for each database

Redis allows storing data in 16 different 'databases' (0 to 15). Is there a way to get the utilized memory & disk space per database? The INFO command only lists the number of keys per database.
No, you cannot control each database individually. These "databases" are just for logical partitioning of your data.
What you can do (depending on your specific requirements and setup) is spin up multiple Redis instances, where each one does a different task and each one has its own redis.conf file with a memory cap. Disk space can't be capped, though, at least not at the Redis level.
Side note: bear in mind that the number of databases (16) is not hardcoded; you can set it in redis.conf.
I did it by calling DUMP on all the keys in a Redis DB and measuring the total number of bytes used. This will slow down your server and take a while. It seems the size DUMP returns is about 4 times smaller than the actual memory use. These numbers will give you an idea of which DB is using the most space.
Here's my code:
https://gist.github.com/mathieulongtin/fa2efceb7b546cbb6626ee899e2cfa0b
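For reference, a rough redis-py sketch of the same idea is shown below; it uses SCAN plus MEMORY USAGE (Redis 4.0+), which avoids the roughly 4x underestimate of DUMP but will still take a while on a large instance.

```python
# Rough per-database memory estimate using MEMORY USAGE (Redis >= 4.0).
# Slower but closer to real usage than summing DUMP sizes; still scans every key.
import redis

def db_memory_bytes(db, host="localhost", port=6379):
    r = redis.Redis(host=host, port=port, db=db)
    return sum(r.memory_usage(k) or 0 for k in r.scan_iter(count=1000))

num_dbs = int(redis.Redis().config_get("databases")["databases"])
for db in range(num_dbs):
    print(f"db{db}: ~{db_memory_bytes(db) / 2**20:.1f} MiB")
```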