I deployed an ignite cluster in yarn. The cluster has 5 servers. Each server has 10GB memory and 8GB heap. I was trying to write a lot of data to ignite cache. Each item is an integer array whose length is 100K. The backups is 2. When I write 3980 items to the ignite cache, the cluster's heap is almost full. But instead of reject writing, the servers went down one by one.
My questions are:
Is there a configuration or way to control the cache ratio of servers, so the heap won't be full and servers won't go down?
Apparently, servers go down when write too much into cache seems not good for users. I'm wondering why ignite will let this happen, if user uses default configuration.
Apache Ignite, as well as Java Virtual Machine, is NOT responsible for managing or controlling a size of data sets that are placed into Java heap. This is the reason why OutOfMemoryError is presented in Java API because it's a responsibility of an application to handle its data sets and make sure that they fit into the heap.
You can set up eviction policy and Ignite may either move data to off-heap region or swap or completely remove from the memory.
Refer to my foreword above. This is the responsibility of an application. Ignite can assist here wit its eviction policy, off-heap mode and ability to scale out.
Related
Could you please answer these 2 questions and correct me if wrong.
I assume Both Redis Database and Redis Cache are stored in Memory and not in Disk. Am I correct?
If Yes, What are the major difference between both. I am assuming both are stored in memory and it should not make much difference between them both. I mean the speed should be the same as they are in memory only. Do we still need Cache again?
Could you please tell me what are the differences and advantages between the both.
Second Question: Can the server restart remove all data in the Redis database? Cache must be deleted for sure I believe.
Thanks
Not sure what do you mean?
Redis is a product first of all - its an in-memory data structures store.
Depending on its configurations it can be targeted to different use cases:
Database
Cache
Even message broker
If you're coming from the cloud world, cloud providers can call this "Cache" and this means that they offer a redis that is pre-configured to be used as a cache (remove the oldest records when the memory becomes next to be fully utilized, etc).
But after you'll you will work with some kind of redis client that will interact with remote redis server.
We are evaluating ChronicleMap and our application runs cluster mode with nodes ranging from 5 to 45. The plan is to have the ChronicleMap persisted in shared NFS folder so that all the nodes can read/write.
There are more likely chance that individual nodes could go down for various reasons in the middle of a read/write operation with this said. I have some questions
If node-1 goes down during a write operation, can another healthy node-2 in the cluster still continue to read/write to the files?
Lets say we implement some logic to detect a server crash and call the .recoverPersistedTo() on restart. Will this cause any issues while other healthy nodes in the cluster are reading/writing to the files? The reason I ask this question is that the document says
“You must ensure that no other process is accessing the Chronicle Map
store when calling .recoverPersistedTo()”
I have read that using .recoverPersistedTo() in place is createPersistedTo() is not a good practice, but what are the downsides?
First of all, we (Chronicle) don't support putting Chronicle Map files on NFS (as we use memory mapping and NFS is known to cause problems with it). Additionally, trying to use recovery on NFS will cause data corruption as there's no adequate file locking on NFS, and recovery tries to lock the file to prevent simultaneous recovery by multiple processes. In general, open source Chronicle Map is supposed to be used by multiple processes on the same host.
The solution to your problem is commercial Map Enterprise which supports map replication between nodes, please contact sales#chronicle.software for details.
My EMR master node has become full and I need to attach some ESB volumne to it, is there any way to do it without terminating the cluster?
You can add additional EBS volumes & also resize
How to explained here :
https://superuser.com/questions/1409373/how-to-add-an-ebs-volume-by-snapshot-id-to-amazon-emr
https://github.com/qyjohn/AWS_Tutorials/wiki/Grow-EBS-volumes-on-EMR-clusters
I don't think so. This is because you set up Amazon Elastic Block Store (Amazon EBS) volumes and configure mount points when the cluster is launched, so it’s difficult to modify the storage capacity after the cluster is running.
The feasible solutions usually involve adding more nodes to your
cluster, backing up your data to a data lake, and then launching a new
cluster with a higher storage capacity. Or, if the data that occupies
the storage is expendable, removing the excess data is usually the way
to go.
For more details,have a look at: https://aws.amazon.com/blogs/big-data/dynamically-scale-up-storage-on-amazon-emr-clusters/
I am facing some scaling issues with my redis instances and was wondering if there's a way to configure redis to save data only to disk (and not hold it in memory). That way I could just increase disk space and not RAM.
Right now my instances are getting stuck and just hang when they reach the memory limit.
Thanks!
No - Redis, atm, is an in-memory database. That means that all data that it manages resides first and foremost in RAM.
According to this link which belongs to JBoss documentation, I understood that Infinispan is a better product than JBoss Cache and kind of improvement the reason for which they recommend to migrate from JBoss Cache to Infinispan, that is supported by JBoss as well. Am I right in what I understood? Otherwise, are there differences?
One more question : Talking about replication and distribution, can any one of them be better than the other according to the need?
Thank you
Question:
Talking about replication and distribution, can any one of them be better than the other according to the need?
Answer:
I am taking a reference directly from Clustering modes - Infinispan
Distributed:
Number of copies represents the tradeoff between performance and durability of data
The more copies you maintain, the lower performance will be, but also the lower the risk of losing data due to server outages
use of a consistent hash algorithm to determine where in a cluster entries should be stored
No need to replicate data on each node that takes more time than just communicating hash code
Best suitable if no of nodes are high
Best suitable if size of data stored in cache is high.
Replicated:
Entries added to any of these cache instances will be replicated to all other cache instances in the cluster
This clustered mode provides a quick and easy way to share state across a cluster
replication practically only performs well in small clusters (under 10 servers), due to the number of replication messages that need to happen - as the cluster size increases
Practical Experience:
I are using Infinispan cache in my running live application on Jboss server having 8 nodes. Initially I used replicated cache but it took much longer time to respond due to large size of data. Finally we come back to Distributed and now its working fine.
Use replicated or distributed cache only for data specific to any user session. If data is common regardless of any user than prefer Local cache that's created separately for each node.