Apache Ignite In-Memory Database


I want to use Ignite as a pure in-memory DB (all data and indexes in memory) with persistence to disk (similar to Redis). When the service restarts, all the data and indexes should be loaded into memory again (automatic warm-up).
Update:
There are actually two questions here:
Can I guarantee that the data, including indexes, is in memory? (AFAIK the persistent store doesn't guarantee it.)
Can I warm the cache (including indexes) without writing Java? (That is, without implementing CacheStore as described here: https://apacheignite.readme.io/docs/data-loading#section-ignitecacheloadcache)

You can enable persistence through XML configuration: https://apacheignite.readme.io/v2.6/docs/distributed-persistent-store#section-usage

Related

Why does the Ignite server show heap usage without any activity?

Ignite version: 2.12
OS: Windows 10
I am trying to understand Ignite's heap usage.
I started the Ignite server with the command below and no special VM args, as suggested by https://ignite.apache.org/docs/latest/quick-start/java
ignite.bat -v ..\examples\config\example-ignite.xml
After that, I analyzed its heap usage with VisualVM, and the heap usage looks like this:
Next, I increased the heap size and restarted the server.
Surprisingly, Ignite now consumes even more memory, as seen in this graph:
I know the GC is working to clear the heap, but why does Ignite's memory consumption increase as the heap size increases?
How will this affect a server with ~40-60 GB of memory? How much memory can I expect Ignite to consume?
I'm planning to use Ignite as an in-memory cache, with Cassandra as the database.

Just like Cassandra, Hadoop or Kafka, Ignite is Java middleware that uses the Java heap for various needs. But your data is always stored off-heap, which lets Ignite use all available memory without worrying about garbage collection. This gives Ignite complete control over how the data is managed and ensures the long-term performance of the system.
Ignite uses a page memory model for storing everything, including user data, indexes, metadata, etc. This lets Ignite manage memory itself, improves performance, and also allows it to use the whole disk without any data modifications.
In other words, data pages are accessed directly through memory pointers (outside the JVM heap), but some internal tasks, such as bootstrapping Ignite itself or performing local SQL processing, still require the JVM heap, because Ignite itself is written in Java.
See the Ignite documentation on its memory architecture for details.
How will this impact a server with ~40-60 GB of memory? How much memory can I expect Ignite to consume?
You would need 40-60 GB of RAM plus something for the JVM itself (Java heap). Recommended values may differ, but 2 GB of Java heap should be enough.
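The sizing above is simple addition. A minimal sketch, assuming a single node whose off-heap data region holds the whole dataset; required_ram_gb is a hypothetical helper and os_reserve_gb is a hypothetical extra for the OS and JVM non-heap memory, not an Ignite setting:

```python
def required_ram_gb(data_region_gb: float, heap_gb: float = 2.0,
                    os_reserve_gb: float = 0.0) -> float:
    """Rough RAM estimate for one Ignite node: off-heap data + JVM heap.

    data_region_gb: off-heap page memory for data and indexes
    heap_gb:        Java heap; ~2 GB is the answer's rule of thumb
    os_reserve_gb:  hypothetical headroom for the OS, JVM non-heap memory, etc.
    """
    sizes = (data_region_gb, heap_gb, os_reserve_gb)
    if any(s < 0 for s in sizes):
        raise ValueError("sizes must be non-negative")
    return sum(sizes)


for data_gb in (40, 60):
    print(f"{data_gb} GB of data -> about {required_ram_gb(data_gb):.0f} GB of RAM")
```

In practice you would leave some memory for the operating system as well, which is what the optional reserve parameter models.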

How does Redis achieve the high throughput and performance?

I know this is a very generic question, but I want to understand the major architectural decisions that allow Redis (or caches like Memcached and Cassandra) to achieve such high performance.
How are connections maintained?
Are connections TCP or HTTP?
I know that it is written entirely in C. How is the memory managed?
What synchronization techniques are used to achieve high throughput in spite of competing reads/writes?
Basically, what is the difference between a plain vanilla implementation of a machine with an in-memory cache and a server that can respond to commands, and a Redis box? I understand that a complete answer would be huge and full of complex details, but what I'm looking for are the general techniques used rather than all the nuances.

There is a wealth of information in the Redis documentation about how it works. Now, to answer your questions specifically:
1) How are connections maintained?
Connections are maintained and managed using the ae event loop (designed by the Redis author). All network I/O operations are non-blocking. You can see ae as a minimalistic implementation using the best network I/O demultiplexing mechanism of the platform (epoll on Linux, kqueue on BSD, etc.), just like libevent, libev or libuv.
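Python's selectors module is not ae, but it wraps the same OS mechanisms (epoll, kqueue, ...) and makes the idea easy to try. A minimal sketch, assuming a toy protocol where the "server" simply upper-cases each request; run_once is a hypothetical helper, and socketpair() stands in for real TCP connections:

```python
import selectors
import socket


def run_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """One pass of a single-threaded event loop: wait until some sockets are
    readable, then serve each one without blocking. Returns events handled."""
    events = sel.select(timeout)
    for key, _mask in events:
        conn = key.fileobj
        data = conn.recv(4096)  # readiness was reported, so this won't block
        if data:
            # Toy "command": reply with the upper-cased request. A real server
            # would buffer output and wait for EVENT_WRITE instead of sendall().
            conn.sendall(data.upper())
        else:
            sel.unregister(conn)  # empty read: the peer closed the connection
            conn.close()
    return len(events)


# Three "clients" served by one loop in one thread.
sel = selectors.DefaultSelector()  # picks epoll on Linux, kqueue on BSD/macOS
clients = []
for _ in range(3):
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)
    clients.append(client_side)

for i, client in enumerate(clients):
    client.sendall(f"ping {i}".encode())

handled = 0
while handled < len(clients):
    handled += run_once(sel)

replies = [client.recv(4096) for client in clients]
print(replies)
```

The key property is that one thread waits on many connections at once and only touches a socket when the OS says it is ready, so no thread per connection is needed.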
2) Are connections TCP or HTTP?
Connections are TCP, using the Redis protocol, a simple, telnet-compatible, text-oriented protocol that supports binary data. This protocol is typically more efficient than HTTP.
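For illustration, this is how a client frames a command in that protocol (RESP): an array header followed by length-prefixed bulk strings, which is why arbitrary binary data is safe. A minimal Python encoder; encode_command is a hypothetical helper name, not part of any client library:

```python
def encode_command(*args) -> bytes:
    """Encode a command as a RESP array of bulk strings, e.g. SET key value."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Bulk strings carry an explicit length, so CR/LF inside data is fine
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("SET", "key", "value"))
# b'*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n'
```

Redis also accepts plain inline commands (for example typing PING in a telnet session), which is what makes the protocol telnet-compatible.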
3) How is the memory managed?
Memory is managed by relying on a general purpose memory allocator. On some platforms, this is actually the system memory allocator. On some other platforms (including Linux), jemalloc has been selected since it offers a good balance between CPU consumption, concurrency support, fragmentation and memory footprint. jemalloc source code is part of the Redis distribution.
Contrary to other products (such as memcached), there is no implementation of a slab allocator in Redis.
A number of optimized data structures have been implemented on top of the general purpose allocator to reduce the memory footprint.
4) What synchronization techniques are used to achieve high throughput in spite of competing reads/writes?
Redis is a single-threaded event loop, so there is no synchronization to be done since all commands are serialized. Now, some threads also run in the background for internal purposes. In the rare cases they access the data managed by the main thread, classical pthread synchronization primitives are used (mutexes for instance). But 100% of the data accesses made on behalf of multiple client connections do not require any synchronization.
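To make the "no synchronization" point concrete, here is a toy model in Python (not Redis's actual code): a hypothetical SingleThreadedStore that applies queued commands from several clients one at a time. Because the read-modify-write of INCR runs to completion before the next command starts, the counter needs no lock:

```python
from collections import deque


class SingleThreadedStore:
    """Toy model: one thread pops commands from a queue and applies them
    strictly one after another, so shared data needs no locks."""

    def __init__(self):
        self.data = {}
        self.queue = deque()

    def submit(self, client_id, *command):
        self.queue.append((client_id, command))

    def run(self):
        replies = []
        while self.queue:
            client_id, (name, *args) = self.queue.popleft()
            replies.append((client_id, self.execute(name.upper(), *args)))
        return replies

    def execute(self, name, *args):
        if name == "SET":
            key, value = args
            self.data[key] = value
            return "OK"
        if name == "GET":
            return self.data.get(args[0])
        if name == "INCR":
            # Safe without a mutex: nothing else runs between read and write
            key = args[0]
            try:
                value = int(self.data.get(key, "0")) + 1
            except ValueError:
                return "ERR value is not an integer or out of range"
            self.data[key] = str(value)
            return value
        return f"ERR unknown command '{name}'"


store = SingleThreadedStore()
for _ in range(100):
    for client in ("a", "b", "c"):  # interleaved requests from three clients
        store.submit(client, "INCR", "counter")
store.run()
print(store.data["counter"])  # 300
```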
You can find more information here:
Redis is single-threaded, then how does it do concurrent I/O?
What is the difference between a plain vanilla implementation of a machine with in memory cache and server that can respond to commands and a Redis box?
There is no difference. Redis is a plain vanilla implementation of a machine with in memory cache and server that can respond to commands. But it is an implementation which is done right:
using the single threaded event loop model
using simple and minimalistic data structures optimized for their corresponding use cases
offering a set of commands carefully chosen to balance minimalism and usefulness
constantly targeting the best raw performance
well adapted to modern OS mechanisms
providing multiple persistence mechanisms, because a "one size fits all" approach is only a dream
providing the building blocks for HA mechanisms (replication system for instance)
avoiding stacking up useless abstraction layers like pancakes
resulting in a clean and understandable code base that any good C developer can be comfortable with

Does Redis DB have a built-in compression option?

Redis is a "memory monster". Storing data as compressed JSON strings minimizes memory usage.
Is there any built-in compression option in Redis DB?

Redis uses the lightweight LZF compressor only at dump time, so it won't reduce memory consumption: Redis does not compress the data in memory and stores it as a string. You must implement your own client-side compression.
Lua scripting also provides a compression algorithm, but the branch is relatively new and therefore not advisable for production use.

No, there isn't any runtime compression option.
However, as dan-boa said, it might be a good idea to implement compression on the application side. That way you save CPU on the Redis server: the database server won't spend any CPU time on compression.
In one of our Redis clusters we saved about 82% of memory (from circa 340 GB to 60 GB) by gzipping our JSON-based blobs. Some more thoughts about it, and other ways of optimizing memory usage, can be found in our article:
http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
Note: link moved to archive.org backup
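A minimal sketch of that client-side approach using Python's standard gzip and json modules; pack and unpack are hypothetical helper names. With a client such as redis-py you would store pack(obj) via set() and call unpack() on the bytes returned by get(); the sample records below are made up, and real savings depend entirely on your data:

```python
import gzip
import json


def pack(obj) -> bytes:
    """Serialize to compact JSON and gzip it before it goes to Redis."""
    return gzip.compress(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def unpack(blob: bytes):
    """Reverse of pack(): gunzip, then parse the JSON read back from Redis."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical, highly repetitive JSON, which is where gzip shines
records = [{"user_id": i, "status": "active", "tags": ["a", "b"]} for i in range(1000)]
raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
blob = pack(records)
print(f"raw: {len(raw)} bytes, gzipped: {len(blob)} bytes")
```

Compression and decompression both happen in the application, so the Redis server only ever sees opaque byte strings.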

Distributed Cache that supports incr

I'm looking for a distributed key/value store that supports a balanced load of reads and writes.
Necessary Features:
Get, Set, Incr
Disk backed
Blazingly fast (i.e. eventual consistency is OK)
High availability (i.e. rebalancing load upon node failures)
Nice to have Features:
Overflow to disk (Assuming the load has nice locality properties)
Platform-agnostic (e.g. java based)
Because many distributed caching solutions support get/set but not incr, it looks like the only option that fits the requirements is Terracotta (though Redis has a cluster model in its unstable branch).
Any Suggestions?

I can speak mainly for Redis.
Necessary Features:
Yes, along with other advanced data structures such as hashes, (sorted) sets and lists.
Yes, by default Redis saves snapshots of the dataset to disk.
Yes.
In terms of the CAP theorem, rebalancing load upon node failures is closer to partition tolerance than to high availability. Redis supports replication, and clustering is in development.
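For the partitioning side, the Redis Cluster design (which later became stable) maps every key to one of 16384 hash slots using CRC16 (XMODEM variant) of the key, or of just the part inside {...} (a "hash tag") so related keys land on the same node. A sketch of that mapping following the Redis Cluster specification; crc16_xmodem and key_slot are hypothetical helper names:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # empty "{}" is not a tag
            key = key[start + 1:end]  # hash only the tag
    return crc16_xmodem(key) % 16384


print(key_slot(b"123456789"))  # 12739
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))  # True
```

Each node owns a range of slots, so rebalancing after a failure or when adding nodes means moving slots rather than rehashing every key.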
Nice to have Features:
Read the article about virtual memory.
Runs on most POSIX systems.
You might also take a look at Membase or Couchbase Server.

Riak (http://www.basho.com/) will do this for you.

Redis - Can data size be greater than memory size?

I'm rather new to Redis, and before using it I'd like to learn some details that are important to me.
Redis uses RAM and HDD for storing data: RAM is used as fast read/write storage, and HDD is used to make this data persistent. When Redis starts, does it load all data from HDD into RAM, or only frequently queried data? What if I have 500 MB of Redis storage on HDD but only 100 MB of RAM for Redis? Where can I read about it?

Redis loads everything into RAM. All the data is written to disk, but will only be read for things like restarting the server or making a backup.
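A rough sketch of that write-then-reload cycle, assuming JSON instead of Redis's binary RDB format and leaving out the fork() Redis uses to snapshot in the background; save_snapshot and load_snapshot are hypothetical names. Writing to a temporary file and renaming it over the old snapshot mirrors what Redis does, so a crash mid-write leaves the previous snapshot intact:

```python
import json
import os
import tempfile


def save_snapshot(data: dict, path: str) -> None:
    """Write the whole dataset to a temp file, then atomically replace the
    previous snapshot so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit the disk before renaming
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_snapshot(path: str) -> dict:
    """On restart, read the whole snapshot back into memory (empty if none)."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Note that loading reads everything back into memory, which is exactly why the dataset has to fit in RAM.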
There are a couple of ways you can use it with less RAM than data though. You can set it up in combination with MySQL or another disk based store to work much like memcached - you manage cache misses and persistence manually.
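That "manage cache misses and persistence manually" approach is usually called cache-aside. A minimal sketch in which a Python dict stands in for Redis, and two hypothetical callables, load and save, stand in for queries against MySQL or another disk-based store:

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Read from the cache, fall back to the backing store on a miss, then
    populate the cache. Writes go to the store, then invalidate the cache."""

    def __init__(self, load: Callable[[str], Optional[str]],
                 save: Callable[[str, str], None]):
        self.cache: Dict[str, str] = {}  # stands in for Redis
        self.load = load                 # hypothetical: SELECT from MySQL
        self.save = save                 # hypothetical: INSERT/UPDATE in MySQL
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.load(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        self.save(key, value)      # the disk-based store stays the source of truth
        self.cache.pop(key, None)  # next read reloads the fresh value


db = {"user:1": "alice"}  # hypothetical backing store
c = CacheAside(db.get, db.__setitem__)
c.get("user:1")
c.get("user:1")
print(c.misses)  # 1
c.put("user:1", "bob")
print(c.get("user:1"), c.misses)  # bob 2
```

With this pattern Redis only needs RAM for the hot subset of the data; everything else stays in the disk-based store.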
Redis has a VM mode where all keys must fit in RAM but infrequently accessed data can be on disk. However, I'm not sure if this is in the stable builds yet.

Recent versions (>2.0) have improved significantly, and memory management is more efficient. See this blog post, which explains how to use hashes to optimize the RAM footprint: http://antirez.com/post/redis-weekly-update-7.html
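The hashes trick is to group many small keys into small hashes, because Redis stores small hashes in a compact encoding (governed by settings such as hash-max-ziplist-entries). For example, the key object:1234 becomes field 34 of the hash object:12. A sketch of the key split, assuming numeric IDs; bucket, hset_bucketed and hget_bucketed are hypothetical helpers, and a dict of dicts stands in for HSET/HGET:

```python
def bucket(key: str, suffix_len: int = 2) -> tuple:
    """Split 'object:1234' into hash key 'object:12' and field '34'.
    With numeric IDs and suffix_len=2, each hash holds at most 100 fields."""
    if len(key) <= suffix_len:
        raise ValueError("key is too short to bucket")
    return key[:-suffix_len], key[-suffix_len:]


def hset_bucketed(store: dict, key: str, value: str) -> None:
    hash_key, field = bucket(key)  # HSET hash_key field value
    store.setdefault(hash_key, {})[field] = value


def hget_bucketed(store: dict, key: str):
    hash_key, field = bucket(key)  # HGET hash_key field
    return store.get(hash_key, {}).get(field)


store = {}
for i in range(1000, 1200):
    hset_bucketed(store, f"object:{i}", f"value-{i}")
print(sorted(store))  # ['object:10', 'object:11']
```

200 top-level keys collapse into 2 small hashes; the memory saving comes from Redis's compact encoding of those small hashes, not from the Python code itself.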

The feature is called Virtual Memory, and it is officially deprecated:
Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence.
More information about VM: https://redis.io/topics/virtual-memory
