What is the memory formula of Redis Sorted Set?

What is the memory formula of Redis Sorted Set? - redis

I need to calculate how much memory a Redis SortedSet takes assuming my average element of the Sorted Set is X bytes.

If you know the average size of an element before it's stored in redis, just do this:
Clear redis of all data: command flushall (dumps all databases)
Command info, check field used_memory_human (should be zero or close to it)
Add/store data in redis
info again, check used_memory_human, size indicates memory used by redis to store objects.
Hope it helps

Related

How to interpret "evicted_keys" from Redis Info

We are using ElastiCache for Redis, and are confused by its Evictions metric.
I'm curious what the unit is on the evicted_keys metric from Redis Info? The ElastiCache docs say it is a count: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.Redis.html but for our application we have observed the "Evictions" metric (which is derived from evicted_keys) fluctuates up and down, indicating it's not a count. I would expect a count to never decrease, since we cannot "un-evict" a key. I'm wondering if evicted_keys is actually a rate (eg, evictions/sec), which would explain why it can fluctuate.
Thanks you in advance for any responses!

From INFO command:
evicted_keys: Number of evicted keys due to maxmemory limit
To learn more about evictions see Using Redis as an LRU cache - Eviction policies
This counter is zero when the server starts, and it is only reset if you issue the CONFIG RESETSTAT command. However, on ElastiCache, this command is not available.
That said, ElastiCache derives the metric from this value, by calculating the difference between data-points.
Redis evicted_keys 0 5 12 18 22 ....
CloudWatch Evictions 0 5 7 6 4 ....
This is the usual pattern in CloudWatch metrics. This allows you to use SUM if you want the cumulative value, but also to detect rate changes or spikes easily.
Think for example you want to alarm if evictions are more than 10,000 over one minute period. If ElastiCache stores the cumulative value from Redis straight as a metric, this would be hard to accomplish.
Also, by committing the metric only as evicted keys for the period, you are protected of the data distortion of a server-reset or a value overflow. While the Redis INFO value would go back to zero, on ElastiCache you still get the value for the period and you can still do running sum over any period.

Storing 30M records in redis

I'm wondering the most efficient way to store this data.
I need to track 30-50 million data points per day. It needs to be extremely fast read/write, so I'm using redis.
The data only needs to last for 24 hours, at which point it will EXPIRE.
The data looks like this as a key/value hash
{
"statistics:a5ded391ce974a1b9a86aa5322ea9e90": {
xbi: 1,
bid: 0.24024,
xpl: 25.0,
acc: 40,
pid: 43,
cos: 0.025,
xmp: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
clu: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
}
I've replaced the actual string with a lot of x but that IS the proper length of the string.
So far, according to my calculations.... this will use hundreds of GB of memory. Does that seem correct?
This is mostly ephemeral logging data thats important, but not important enough to try to support writing to disk or failovers. I am comfortable keeping it on 1 machine, if that helps make this easier.
What would be the best way to reduce memory space in this scenario? Is there a better way I can do this? Does redis support 300GB on a single instance?

In redis.conf - set hash-max-ziplist-value to 1 more than the length of the field 'xmp'. Then restart redis, and watch your memory go down significantly.
The default value is 64. Increasing it increases cpu utilization when you modify or add new fields in the hash. But your use case seems to be create-only, and in that case there shouldn't be any drawbacks of increasing the setting.

this will use hundreds of GB of memory. Does that seem correct?
YES
Does redis support 300GB on a single instance?
YES
Is there a better way I can do this?
You can try the following methods:
Avoid Using Hash
Since you always get all fields of the log with HGETALL, there's NO need to save the log as HASH. HASH consumes more memory than STRING.
You can serialize all fields into a string, and save the log as a key-value pair:
SET 'statistics:a5ded391ce974a1b9a86aa5322ea9e90' '{xbi: 1, bid: 0.24024, and other fields}'
#Sripathi Krishnan's answer gives another way to avoid HASH, i.e. config Redis to encode the HASH into ZIPLIST. It's a good idea if you don't share your Redis with other applications. Otherwise, this modification might cause problem to others.
Compress The Data
In order to reduce memory usage, you can try to compress your data. Redis can store binary strings, so you can use gzip, snappy or other compression algorithm to compress the log text into binary string, and save it into Redis.
Normally, you can get better compression when the input is bigger. So you'd better compress the whole log, instead of compress each field one by one.
The side-effect is that the producer and consumer of the log need to cost some CPU to compress and decompress the data. However, normally that's NOT a problem, and also it can reduce some network bandwidth.
Batch Write and Batch Read
As I mentioned above, if you want to get better compression, you should get a bigger input. So if you can write multiple logs in a batch, you can compress the batch of logs to get better compression.
Compress multiple logs into a batch: compress(log1, log2, log3) -> batch1: batch-result
Put the batch result into Redis as a key-value pair: SET batch1 batch-result
Build an index for the batch: MSET log1 batch1 log2 batch1 log3 batch1
When you need to get the log:
Search the index to get the batch key: GET log1 -> batch1
Get the batch result: GET batch1 -> batch-result
Decompress the batch result and look up the log from the result
The last method is the most complicated one, and the extra index will cost some extra memory. However, it can largely reduce the size of your data.
Also what these methods can achieve, largely depends on your log. You should do lots of benchmark :)

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently

A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

Redis 1+ min query time on lists with 15 larger JSON objects (20MB total)

I use Redis to cache database inserts. For this I created a list CACHE into which I push serialized JSON lists. In pseudocode:
let entries = [{a}, {b}, {c}, ...];
redis.rpush("CACHE", JSON.stringify(entries));
The idea is to run this code for an hour, then later do an
let all = redis.lrange("CACHE", 0, LIMIT);
processAndInsert(all);
redis.ltrim("CACHE", 0, all.length);
Now the thing is that each entries can be relatively large (but far below 512MB / whatever Redis limit I read about). Each of the a, b, c is an object of probably 20 bytes, and entries itself can easily have 100k+ objects / 2MB.
My problem now is that even for very short CACHE lists of only 15 entries a simple lrange can take many minutes(!) even from the redis-cli (my node.js actually dies with an "FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory", but that's a side comment).
The debug output for the list looks like this:
127.0.0.1:6379> debug object "CACHE"
Value at:00007FF202F4E330 refcount:1 encoding:linkedlist serializedlength:18104464 lru:12984004 lru_seconds_idle:1078
What is happening? Why is this so massively slow, and what can I do about it? This does not seem to be a normal slowness, something seems to be fundamentally wrong.
I am using a local Redis 2.8.2101 (x64), ioredis 1.6.1, node.js 0.12 on a relatively hardcore Windows 10 gaming machine (i5, 16GB RAM, 840 EVO SSD, ...) by the way.

Redis is great at doing lots of small operations,
but not so great at doing small numbers of "very big" operations.
I think you should re-evaluate your algorithm, and try to break apart your data in to smaller chunks. Not only you'll save the bandwidth, you'll also will not lock your redis instance long amounts of time.
Redis offers many data structures you should be able to use for more fine grain control over your data.
Well, still, in this case, since you are running the redis locally, and assuming you are not running anything else but this code, I doubt that the bandwidth, nor the redis is the problem. I'm more thinking this line:
JSON.stringify()
is the main culprit why you are seeing the slow execution.
JSON serialization of 20MB of string is not something simple,
The process needs allocate many small strings, and also has to go through all of your array and inspect each item individually. All of this will take a long time for a big object like this one.
Again, if you were breaking apart your data, and doing smaller operations with redis, you'd not need the JSON serializer at all.

What is the maximum value size you can store in redis?

Does anyone know what the maximum value size you can store in redis? I want to use redis as a message queue with celery to store some small documents that need to be processed by a worker on another server, and I want to make sure the documents aren't going to be too big.
I found one page with a reference to 1GB, but when I followed the link on the page for where they got that answer the link wasn't valid anymore. Here is the link:
http://news.ycombinator.com/item?id=1182005

All string values are limited to 512 MiB. This is the size limit you probably care most about.
EDIT: Because keys in Redis are strings, the maximum key size is 512 MiB. The maximum number of keys is 2^32 - 1 = 4,294,967,295.
Values, on the other hand, can vary in size depending on their type. For aggregate data types (i.e. hash, list, set, and sorted set), the maximum value size is 512 MiB for each element, although the data structure itself can have up to 2^32 - 1 elements.
https://redis.io/topics/data-types
https://redis.io/topics/faq#what-is-the-maximum-number-of-keys-a-single-redis-instance-can-hold-and-what-is-the-max-number-of-elements-in-a-hash-list-set-sorted-set
http://groups.google.com/group/redis-db/browse_thread/thread/1c7e33fbc98734b3?fwc=2

Article about Redis Memory Usage can help you to roughly determine how much memory your database would take.

It's in the order of the amount of RAM you have, at least, so unless you plan on puting multi-gigabyte objects in there I wouldn't worry. I've had sets that were hundreds of megabytes big without a problem, but I don't know the exact limits.

A String value can accommodate the size of max 512MB. But according to this link, the size can be increased.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas