Apache Ignite 2.x BinaryObject deserialize performance - ignite

I'm observing a two-orders-of-magnitude performance difference scanning a local off-heap cache between binary and deserialized mode (200k/sec vs 2k/sec). I have not profiled it with tools yet.
Is the default reflection-based binary codec recommended for production, or is there a better one?
What's the best source to read for a description of the binary layout (the official documentation is missing that)?
Or, in the most generic form: what's the expected data-retrieval performance with an Ignite scan query, and how do I achieve it?

Since version 2.0.0, Ignite stores all data in off-heap memory, so it's expected that BinaryObjects work faster: a BinaryObject doesn't deserialize your objects into classes, but works directly with the underlying bytes.
So yes, it's recommended to use BinaryObjects where possible for performance's sake.
Read the following doc, which explains how to use BinaryObjects:
https://apacheignite.readme.io/docs/binary-marshaller
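To illustrate why binary mode avoids the deserialization cost, here is a minimal sketch. The layout below is invented for the example and is not Ignite's actual binary format; the point is that reading a single field at a known offset skips object construction entirely, which is conceptually what `BinaryObject.field()` gives you.

```python
import struct

# Hypothetical fixed layout (NOT Ignite's real binary format):
# an int32 "id" followed by a float64 "price".
RECORD = struct.Struct("<id")  # 4 + 8 = 12 bytes, no padding

def serialize(rec_id, price):
    return RECORD.pack(rec_id, price)

def read_price_binary(buf):
    # Field-at-offset read: no object construction, no full decode.
    (price,) = struct.unpack_from("<d", buf, offset=4)
    return price

class Trade:
    """Full deserialization materializes one of these per record."""
    def __init__(self, rec_id, price):
        self.id, self.price = rec_id, price

def read_price_deserialized(buf):
    return Trade(*RECORD.unpack(buf)).price

buf = serialize(7, 99.5)
assert read_price_binary(buf) == read_price_deserialized(buf) == 99.5
```

In a scan query touching millions of entries, the per-record object allocation and field copying of the deserialized path is exactly the overhead the binary path avoids.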

Related

Using Redis for in-memory caching and DynamoDB (or Cassandra) for URL shortener

I am a beginner programmer building a web service for a URL shortener, and I'm considering which NoSQL store to use. I only need to store original URLs and shortened ones, so Redis is an obvious choice as it is very fast. But Redis is limited by memory size, whereas other key-value NoSQL stores like DynamoDB or Cassandra keep data on disk. Does it make sense to use Redis as a cache for heavy-read requests and another NoSQL store as the database at the same time?
Yes, you can use Redis for caching and another NoSQL tool for persistence. In fact, you should consider the complexity of your project: the number of concurrent visitors, the hardware you can afford, etc. Redis can also persist data to disk, so when you restart the server your cached data is retained; however, it also keeps all of it in memory, which is the secret of Redis being so fast. You should also consider storing the data in Redis as binary rather than JSON, which will decrease memory use a lot; encoding libraries such as Protobuf can save a huge amount of memory. If your project seems likely to get more complex in the near future, you can also use an RDBMS as a database server for future needs that require its distinct features.
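The binary-vs-JSON point is easy to demonstrate. The record shape below is made up for the example (a real schema would use something like Protobuf), but a fixed binary encoding is much smaller than self-describing JSON text:

```python
import json
import struct

# A toy URL-shortener counter record (field names invented for the demo).
record = {"id": 123456789, "clicks": 42}

as_json = json.dumps(record).encode()  # self-describing text, repeats keys
as_binary = struct.pack("<QI", record["id"], record["clicks"])  # uint64 + uint32

assert len(as_binary) == 12
assert len(as_binary) < len(as_json)  # binary is a fraction of the JSON size
```

The saving compounds across millions of keys, since Redis holds every value in RAM.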

Single Object having 2GB size(may be it more in future) to store in Redis Cache

We are planning to implement a distributed cache (Redis) for our application. We have data stored in a map of around 2GB, and it is a single object. Currently it is stored in context scope, and we have plenty of other objects in context scope as well.
Now we are planning to move all this context data into Redis. The map data takes a large amount of memory, and we would have to store it as a single key-value object.
Is Redis suitable for this requirement, and which data type is suitable for storing this data in Redis?
Please suggest a way to implement this.
So, you didn't finish the discussion in the other question and started a new one? 2GB is A LOT. Suppose you have a 1Gb/s link between your servers: you need 16 seconds just to transfer the raw data. Add protocol costs, add deserialization costs, and you're at 20 seconds. These are hardware limitations. Of course you may get a 10Gb/s link, or even multiplex it to 20Gb/s. But is that the way? The real solution is to break this data into parts and perform only partial updates.
To the topic: use the basic String type; there are no other options. The other types are complex structures, and you need just one value.
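The arithmetic above is simply 2 GB ≈ 16 Gb, divided by a 1 Gb/s link ≈ 16 s per full transfer. Also note that Redis has historically capped a single String value at 512 MB, so a 2 GB object likely cannot live under one key at all. Here is a sketch of the "break it into parts" suggestion; a plain dict stands in for the Redis connection (with a real client, the reads and writes would be the client's set/get calls):

```python
CHUNK = 4  # tiny for the demo; something like 1-8 MB in practice

def put_chunked(store, key, blob: bytes):
    """Split one huge value into fixed-size chunks under derived keys."""
    chunks = [blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)]
    store[f"{key}:meta"] = len(chunks)       # remember how many parts
    for i, chunk in enumerate(chunks):
        store[f"{key}:{i}"] = chunk          # each part updatable on its own

def get_chunked(store, key) -> bytes:
    n = store[f"{key}:meta"]
    return b"".join(store[f"{key}:{i}"] for i in range(n))

store = {}
put_chunked(store, "bigmap", b"0123456789ABCDEF!")
assert get_chunked(store, "bigmap") == b"0123456789ABCDEF!"
```

With this layout, updating one part moves only `CHUNK` bytes over the wire instead of the whole 2 GB.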

Hazelcast vs Redis vs S3

I am currently evaluating the fastest possible caching solutions we can use among the technologies in question. We know that Redis and Hazelcast are caching solutions by their very intent and definition (and there is a clear Stack Overflow link, redis vs hazelcast), but there is also AWS S3, which may not be a caching solution but is nevertheless a storage and retrieval service, and it supports SQL as well, which in my opinion qualifies it for the race. Considering this, are there any thoughts on comparing the three based on speed, volumes of data, etc.?
Hazelcast also provides SQL-like capabilities: you can run queries to fetch data as a result set. Technology-wise, Hazelcast/Redis and S3 are fundamentally different, for the latter is a disk-bound data store, and those are known to be significantly slower than their in-memory counterparts.
To put things in a logical perspective: S3, or any other disk-bound data store, cannot match the performance of accessing data from an in-memory data store.
However, it is also common practice to run Hazelcast on top of a disk-bound data store to get a performance boost. In such architectures, your applications interact only with Hazelcast, and you can then use Hazelcast tools to keep the cached data in sync with the underlying database.
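The "Hazelcast in front of a database" idea is a read-through cache. Hazelcast's MapLoader/MapStore interfaces implement this pattern; the classes below are simplified stand-ins for the idea, not the Hazelcast API:

```python
class SlowDatabase:
    """Stand-in for a disk-bound backing store (e.g. an RDBMS or S3)."""
    def __init__(self):
        self.rows = {"user:1": "alice"}
        self.reads = 0  # count how often the slow path is hit

    def load(self, key):
        self.reads += 1
        return self.rows[key]

class ReadThroughCache:
    """The application talks only to this layer; misses load from the DB."""
    def __init__(self, backing):
        self.backing, self.mem = backing, {}

    def get(self, key):
        if key not in self.mem:          # cache miss -> hit the database
            self.mem[key] = self.backing.load(key)
        return self.mem[key]

db = SlowDatabase()
cache = ReadThroughCache(db)
assert cache.get("user:1") == "alice"    # first read loads from the DB
assert cache.get("user:1") == "alice"    # second read served from memory
assert db.reads == 1
```

Write-through (propagating updates back to the store) completes the sync the answer mentions.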

Redis vs RocksDB

I have read about Redis and RocksDB, and I don't get the advantages of Redis over RocksDB.
I know that Redis is all in-memory, while RocksDB is in-memory plus flash storage. If all data fits in memory, which one should I choose? Do they have the same performance? Does Redis scale linearly with the number of CPUs? I guess there are other differences that I don't get.
I have a dataset that fits in memory and I was going to choose Redis, but it seems that RocksDB offers me the same, and if one day the dataset grows too much I wouldn't have to worry about memory.
They have nothing in common. You are trying to compare apples and oranges here.
Redis is a remote in-memory data store (similar to memcached). It is a server. A single Redis instance is very efficient, but totally non-scalable regarding CPU. A Redis cluster is scalable regarding CPU.
RocksDB is an embedded key/value store (similar to BerkeleyDB or more exactly LevelDB). It is a library, supporting multi-threading and a persistence based on log-structured merge trees.
While Didier Spezia's answer is correct in its distinction between the two projects, they are linked by a project called LedisDB. LedisDB is an abstraction layer written in Go that implements much of the Redis API on top of storage engines like RocksDB. In many cases you can use the same Redis client library directly with LedisDB, making it almost a drop-in replacement for Redis in certain situations. Redis is obviously faster, but as the OP mentioned in his question, the main benefit of using RocksDB is that your dataset is not limited to the amount of available memory. I find that useful not because I'm processing super-large datasets, but because RAM is expensive and you can get more mileage out of smaller virtual servers.
Redis, in general, has more functionality than RocksDB. It natively understands the semantics of complex data structures such as lists and sets. RocksDB, in contrast, treats stored values as opaque blobs of data. If you want to do any further processing, you have to bring the data to your program and process it there (in other words, you can't delegate the processing to the database engine, i.e. RocksDB).
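The delegation point can be made concrete with a sketch. A structure-aware store (Redis-style) can append to a list server-side, while a blob store (RocksDB-style) forces the client to read, modify, and rewrite the whole value; both "stores" here are plain in-process dicts, invented for the comparison:

```python
import pickle

class StructureAwareStore:
    """Redis-style: the engine understands list semantics."""
    def __init__(self): self.d = {}
    def rpush(self, k, v): self.d.setdefault(k, []).append(v)
    def lrange(self, k): return list(self.d.get(k, []))

class BlobStore:
    """RocksDB-style: the engine sees only opaque bytes."""
    def __init__(self): self.d = {}
    def get(self, k): return self.d.get(k)
    def put(self, k, b): self.d[k] = b

def blob_append(store, k, v):
    raw = store.get(k)
    items = pickle.loads(raw) if raw else []
    items.append(v)                      # processing happens client-side
    store.put(k, pickle.dumps(items))    # whole value rewritten each time

s = StructureAwareStore(); s.rpush("q", 1); s.rpush("q", 2)
b = BlobStore(); blob_append(b, "q", 1); blob_append(b, "q", 2)
assert s.lrange("q") == pickle.loads(b.get("q")) == [1, 2]
```

Over a network, the blob path also ships the full value both ways on every append, which is the cost of not delegating.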
RocksDB runs only on a single server. Redis has a clustered version (though it is not free).
Redis is built for in-memory computation; it also supports backing the data up to persistent storage, but the main use cases are in-memory ones. RocksDB, by contrast, is usually used for persisting data and in most cases stores the data on a persistent medium.
RocksDB has better multi-threaded support (especially for reads; writes still suffer from concurrent access).
Many memcached servers use Redis underneath (the protocol is memcached but the underlying server is Redis). This doesn't use most of Redis's functionality, but it is one case where Redis and RocksDB function similarly (as a KVS, though still in different contexts: a Redis-based memcached is a cache, whereas RocksDB is a database, though not an enterprise-grade one).
@Guille If you know that the hot data (fetched frequently) is time-stamp based, then RocksDB would be a smart choice, but do optimize it with bloom filters for the fallback path. If your hot data is random, go for Redis. Using RocksDB entirely in memory is not generally recommended; log-structured databases like RocksDB are specifically optimized for SSD and flash storage. So my recommendation is to understand the use case and pick the DB for that particular use case.
Redis is a distributed, in-memory data store, whereas RocksDB is an embedded key-value store and is not distributed.
Both are Key-Value Stores, so they have something in common.
As others mentioned, RocksDB is embedded (used as a library), while Redis is a standalone server. Moreover, Redis can be sharded.
| RocksDB | Redis |
| --- | --- |
| persisted on disk | stored in memory |
| strictly serializable | eventually consistent |
| sorted collections | no sorting |
| vertical scaling | horizontal scaling |
If you don't need horizontal scaling, RocksDB is often a superior choice. Some people would assume that an in-memory store would be strictly faster than a persistent one, but it is not always true. Embedded storage doesn't have networking bottlenecks, which matters greatly in practice, especially for vertical scaling on bigger machines.
If you need to serve RocksDB over a network or need high-level language bindings, the most efficient approach would be using the UKV project. It, however, also supports other embedded stores as engines and provides higher-level functionality, such as Graph collections, similar to RedisGraph, and Document collections, like RedisJSON.

Improving the performance of the titanic pattern

I am referring to the Titanic pattern explained in the ZeroMQ guide. Can someone please explain why it recommends against using a key-value store, as compared to reading/writing disk files, for persistence? Quoting from the guide:
"What I'd not recommend is storing messages in a database, not even a "fast" key/value
store, unless you really like a specific database and don't have performance worries. You
will pay a steep price for the abstraction, ten to a thousand times over a raw disk file."
There are other recommendations in the guide, like storing the messages in a disk file in a circular-buffer fashion. But would it not be faster to store the messages in, and retrieve them from, a Redis store? Any ideas? Thank you.
In the ZeroMQ guide, the example provided for this pattern uses simple files in a very naive way (buffered I/O, without any fsync'ing). The broker is directly responsible for storing things on the filesystem, so the performance is mostly linked to the efficiency of the VFS and the filesystem cache. There is no real I/O in the picture.
In this context, the cost of an extra hop to store and retrieve the data in Redis will be very noticeable, especially if it is implemented with synchronous queries/replies.
Redis is very efficient as a remote key/value store, but it cannot compete with an embedded store (even a store implemented on top of a filesystem cache).
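To make the comparison concrete, here is a naive sketch of the guide's approach: persist messages by appending to a plain file with buffered I/O (no fsync), then replay them. The length-prefixed framing is invented for the example; the point is that the whole path stays inside the broker process and the filesystem cache, with no network hop at all.

```python
import os
import tempfile

def append_msg(f, payload: bytes):
    """Length-prefixed append: 4-byte big-endian size, then the payload."""
    f.write(len(payload).to_bytes(4, "big") + payload)

def replay(path):
    """Read every message back in order."""
    out = []
    with open(path, "rb") as f:
        while (hdr := f.read(4)):        # b"" at EOF ends the loop
            out.append(f.read(int.from_bytes(hdr, "big")))
    return out

path = os.path.join(tempfile.mkdtemp(), "titanic.log")
with open(path, "ab") as f:
    for msg in (b"hello", b"world"):
        append_msg(f, msg)
assert replay(path) == [b"hello", b"world"]
```

Each append here is a buffered in-process write; the Redis alternative would replace it with a serialize, a network round trip, and a reply to wait for, which is the "steep price for the abstraction" the guide warns about.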