Single Object having 2GB size(may be it more in future) to store in Redis Cache - redis

we are planning to implement distributed Cache(Redis Cache) for our application. We have a data and stored it in map with having size around 2GB and it is a single object. Currently it is storing in Context scope similarly we have plenty of objects storing into context scope.
Now we are planning to store all these context data into Redis Cache. Here the map data taking high amount of memory and we have to store this map data as single key-value object.
Is it suitable Redis Cache for my requirement. And which data type is suitable to store this data into Redis Cache.
Please suggest the way to implement this.

So, you didn't finish discussion in the other question and started a new one? 2GB is A LOT. Suppose, you have 1Gb/s link between your servers. You need 16 seconds just to transfer raw data. Add protocol costs, add deserialization costs. And you're at 20 seconds now. This is hardware limitations. Of course you may get 10Gb/s link. Or even multiplex it for 20Gb/s. But is it the way? The real solution is to break this data into parts and perform only partial updates.
To the topic: use String (basic) type, there are no options. Other types are complex structures and you need just one value.

Related

How to operate on batches with high efficiency in redis cluster?

I have a map data to cache in redis cluster, such as:
{
"demoKey:{1}":{"key1":"value1"},
"demoKey:{2}":{"key2":"value2"},
"demoKey:{3}":{"key3":"value3"}
}
If using lua script, I get RedisCommandExecutionException with accessing a non local key in a cluster node.
I know there is a way to tag {demokey} to avoid the RedisCommandExecutionException with accessing a non local key, but it leads to all data cache in the same slot, is it a good way?
I can also use RedisTemplate instance to cache data iteratively, but is it with high efficiency?
Appreciating for any help.
Both the answer of your questions depend on the application and use-cases.
... it leads to all data cache in the same slot, is it a good way?
If the number of data with same hashtag is small, say not more than a hundred, it should be fine.
But if the number is large, say hundreds of thousands or even more, it is unlikely to be a good idea. In that case,
... to cache data iteratively
is the option you may have to go for.

How to use chronicle-map instead of redis as a data cache

I intend to use chronicle-map instead of redis, the application scenario is the memoryData module starts every day from the database to load hundreds of millions of records to chronicle-map, and dozens of jvm continue to read chronicle-map records. Each jvm has hundreds of threads. But probably because of the lack of understanding of the chronicle-map, the code poor performance, running slower, until the memory overflow. I wonder if the above practice is the correct use of chronicle-map.
Because Chronicle map stores your data off-heap it's able to store more data than you can hold in main memory, but will perform better if all the data can fit into memory, ( so if possible, consider increasing your machine memory, if this is not possible try to use an SSD drive ), another reason for poor performance maybe down to how you have sized the map in the chronicle map builder, for example how you have set the number of max entries, if this is too large it will effect performance.

Redis vs RocksDB

I have read about Redis and RocksDB, I don't get the advantages of Redis over RocksDB.
I know that Redis is all in-memory and RocksDB is in-memory and uses flash storage. If all data fits in-memory, which one should I choose? do they have the same performance? Redis scales linearly with the number of CPU's? I guess that there are others differences that I don't get.
I have a dataset which fits in-memory and I was going to choose Redis but it seems that RocksDB offers me the same and if one day the dataset grows too much I wouldn't have to be worried about the memory.
They have nothing in common. You are trying to compare apples and oranges here.
Redis is a remote in-memory data store (similar to memcached). It is a server. A single Redis instance is very efficient, but totally non scalable (regarding CPU). A Redis cluster is scalable (regarding CPU).
RocksDB is an embedded key/value store (similar to BerkeleyDB or more exactly LevelDB). It is a library, supporting multi-threading and a persistence based on log-structured merge trees.
While Didier Spezia's answer is correct in his distinction between the two projects, they are linked by a project called LedisDB. LedisDB is an abstraction layer written in Go that implements much of the Redis API on top of storage engines like RocksDB. In many cases you can use the same Redis client library directly with LedisDB, making it almost a drop in replacement for Redis in certain situations. Redis is obviously faster, but as OP mentioned in his question, the main benefit of using RocksDB is that your dataset is not limited to the amount of available memory. I find that useful not because I'm processing super large datasets, but because RAM is expensive and you can get more milage out of smaller virtual servers.
Redis, in general, has more functionalities than RocksDB. It can natively understand the semantics of complex data structures such as lists and arrays . RocksDB, in contrast, looks at the stored values as a blob of data. If you want to do any further processing, you need to bring the data to your program and process it there (in other words, you can't delegate the processing to the database engine aka RocksDB).
RocksDB only runs on a single server. Redis has a clustered version (though it is not free)
Redis is built for in-memory computation, though it also support backing the data up to the persistent storage, but the main use cases are in memory use cases. RocksDB by contrast is usually used for persisting data and in most cases store the data on persistent medium.
RocksDB has a better multi-threaded support (specially for reads --writes still suffer from concurrent access).
Many memcached servers use Redis (where the protocol used is memcached but underlying server is Redis). This doesn't used most of Redis's functionality but is one case that Redis and RocksDB both function similarly (as a KVS though still in different context, where Redis based memcached is a cache but RocksDB is a database, though not an enterprise grade one)
#Guille If you know the behavior of hot data(getting fetched frequently) is based of time-stamp then Rocksdb would a smart choice, but do optimize it for fallback using bloom-filters .If your hot data is random ,then go for Redis .Using rocksDB entirely in memory is not generally recommended in log-structured databases like Rocksdb and its specifically optimized for SSD and flash storage .So my recommendation would be to understand the usecase and pick a DB for that particular usecase .
Redis is distributed, in-memory data store where as Rocks DB is embedded key-value store and not distributed.
Both are Key-Value Stores, so they have something in common.
As others mentioned RocksDB is embedded (as a library), while Redis is a standalone server. Moreover, Redis can sharded.
RocksDB
Redis
persisted on disk
stored in memory
strictly serializable
eventually consistent
sorted collections
no sorting
vertical scaling
horizontal scaling
If you don't need horizontal scaling, RocksDB is often a superior choice. Some people would assume that an in-memory store would be strictly faster than a persistent one, but it is not always true. Embedded storage doesn't have networking bottlenecks, which matters greatly in practice, especially for vertical scaling on bigger machines.
If you need to server RocksDB over a network or need high-level language bindings, the most efficient approach would be using project UKV. It, however, also supports other embedded stores as engines and provides higher-level functionality, such as Graph collections, similar to RedisGraph, and Document collections, like RedisJSON.

Improving the performance of the titanic pattern

I am referring to the titanic pattern explained in the zeromq guide. Can someone please explain why it recommends not using a key-value store as compared to reading/writing disk files for persistence. Quoting from the guide:
"What I'd not recommend is storing messages in a database, not even a "fast" key/value
store, unless you really like a specific database and don't have performance worries. You
will pay a steep price for the abstraction, ten to a thousand times over a raw disk file."
There are other recommendations given in the guide, like storing the messages on to a disk file in a circular buffer fashion. But would it not be faster to store the messages, and retrieving them from a redis store? Any ideas? Thank you.
In the zeromq guide, the example provided for this pattern uses simple files in a very naive way (using buffered I/O, without any fsync'ing). The broker is directly responsible of storing things on the filesystem. So the performance is mostly linked to the efficiency of the VFS and the filesystem cache. There is no real I/O in the picture.
In this context, the cost of an extra hop to store and retrieve the data into Redis will be very noticeable, especially if it is implemented using synchronous queries/replies.
Redis is very efficient as a remote key/value store, but it cannot compete with an embedded store (even a store implemented on top of a filesystem cache).

Redis as a database

I want to use Redis as a database, not a cache. From my (limited) understanding, Redis is an in-memory datastore. What are the risks of using Redis, and how can I mitigate them?
You can use Redis as an authoritative store in a number of different ways:
Turn on AOF (Append-only File store) see AOF docs. This will keep a log of all Redis commands made against your dataset in real-time.
Run Redis using Master-Slave replication see replication docs. This will allow you to provide high-availability if one of your instances fails.
If you're running on something like EC2 you can EBS back your Redis partition to provide another layer of protection against instance failure.
On the horizon is Redis Cluster - this is specifically designed as a way to run Redis in a way that should help with HA and scalability. However, this won't appear for at least another six months or so.
Redis is an in-memory store which can also write the data back to disc. You can specify how many times to do a fsync to make redis safer(but also slower => trade-off) .
But still I am not certain if redis is in state yet to really store (mission) critical data in it (yet?). If for example it is not a huge problem when 1 more tweets(twitter.com) or something similiar get losts then I would certainly use redis. There is also a lot of information available about persistence at redis's own website.
You should also be aware of some persistence problems which could occur by reading antirez(redis maintainers) blog article. You should read his blog because he has some interesting articles.
I would like to share a few things that we have learned by using Redis as a primary Database in our service. We choose Redis since we had data that could not be partitioned. We wanted to get the best performance we could get out of one box
Pros:
Redis was unbeatable in raw performance. We got 10K transactions per second out of the box (Note that one transaction involved multiple Redis commands). We were able to hit a rate of 25K+ transactions per second after a few optimizations, along with LUA scripts. So when it comes to performance per box, Redis is unmatched.
Redis is very simple to setup and has a very small learning curve as opposed to other SQL and NoSQL datastores.
Cons:
Redis supports only few primitive Data Structures like Hashes, Sets, Lists etc. and operations on these Data Structures. These are more than sufficient when you are using Redis as a cache, but if you want to use Redis as a full fledged primary data store, you will feel constrained. We had a tough time modelling our data requirements using these simple types.
The biggest problem we have seen with Redis was the lack of flexibility. Once you have solutioned the structure of your data, any modifications to storage requirements or access patterns virtually requires re-thinking of the entire solution. Not sure if this is the case with all NoSQL data stores though (I have heard MongoDB is more flexible, but haven't used it myself)
Since Redis is single threaded, CPU utilization is very low. You can't put multiple Redis instances on the same machine to improve CPU utilization as they will compete for the same disk, making disk as the bottleneck.
Lack of horizontal scalability is a problem as mentioned by other answers.
As Redis is an in-memory storage, you cannot store large data that won't fit you machine's memory size. Redis usually work very bad when the data it stores is larger than 1/3 of the RAM size. So, this is the fatal limitation of using Redis as a database.
Certainly, you can distribute you big data into several Redis instances, but you have to do it all on your own manually. The operation usually be done like this(assuming you have only 1 instance from start):
Use its master-slave mechanism to replicate data to the second machine, Now you have 2 copies of the same data.
Cut off the connection between master and slave.
Delete the first half(split by hashing, etc) of data on the first machine, and delete the second half of data on the second machine.
Tell all clients(PHP, C, etc...) to operate on the first machine if the specified keys are on that machine, otherwise operate on the second machine.
This is the way how Redis scales! You also have to stop your service to prevent any writes during the migration.
To the expierence we encounter, we have this conclusion to Redis: Redis is not the right choice to store more than 30G data, Redis is not scalable, Redis is quite suitable for prototype development.
We later find an alternative to Redis, that is SSDB(https://github.com/ideawu/ssdb), a leveldb server that supports nearly all the APIs of Redis, it is suitable for storing more than 1TB of data, that only depends on the size of you harddisk.
Redis is a database, that means we can use it for persisting information for any kind of app, information like user accounts, blog posts, comments and so on. After storing information we can retrieve it later on by writing queries.
Now this behavior is similar to just about every other database, but what is the difference? Or rather why would we use it over any other database?
Redis is fast.
Redis is not fast because it's written in a special programming language or anything like that, it's fast because all data is stored in-memory.
Most databases store all their information between both the memory of a computer and the hard drive. Accessing data in-memory is fast, but getting it stored on a hard disk is relatively slow.
So rather than storing memory in hard disk, Redis decided to store it in memory.
Now, the downside to this is that working with data that is larger than the amount of memory your computer has, that is not going to work.
That may sound like a tremendous problem, but Redis has clear strategies for working around this limitation.
The above is just the first reason why Redis is so fast.
The second reason is that Redis stores all of its data or rather organizes all of its data in simple data structures such as Doubly Linked Lists, Sorted Sets and so on.
These data structures have well-known and well-understood performance characteristics. So as developers we can decide exactly how our information is organized and how to efficiently query data.
It's also very fast because Redis is simple in nature, it's not feature heavy; feature heavy datastores like Postgres have performance penalties.
So to use Redis as a database you have to know how to store in limited space, you have to know how to organize it into these simple data structures mentioned above and you have to understand how to work around the limited feature set.
So as far as mitigating risks, the way you start to do that is to start to think Redis Design Methodology and not SQL Database Design Methodology. What do I mean?
So instead of, step 1. Put the data in tables, step 2. figure out how we will query it.
With Redis it's more:
Step 1. Figure out what queries we need to answer.
Step 2. Structure data to best answer those queries.