java.lang.OutOfMemoryError: Java heap space on Ignite cluster

We are migrating a web application from an ad hoc in-memory cache solution to an Apache Ignite cluster, where the JBoss instance that runs the webapp acts as a client node and two external VMs act as Ignite server nodes.
When testing performance with one client node and one server node, everything works fine. But when testing with one client node and two server nodes in a cluster, the server nodes crash with an OutOfMemoryError.
The JVM of both server nodes is started with -server -Xms1024M -Xmx1024M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -XX:MaxGCPauseMillis=1000 -XX:InitiatingHeapOccupancyPercent=50 -XX:+UseCompressedOops -XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 -XX:+DisableExplicitGC
Any idea why a two-node cluster fails when a single-node one runs the same test perfectly?
I don't know if it's relevant, but the test consists of 10 parallel HTTP requests launched against the JBoss server, each of which starts a process that writes several entries into the cache.

Communication between nodes adds some overhead, so apparently 1GB is not enough for both the data and Ignite itself. In general 1GB is too little; I would recommend allocating at least 2GB, ideally 4GB, per node.

In the end the problem wasn't the amount of memory required by the two nodes, but the synchronization between them. My test cache was running with PRIMARY_SYNC, but the write/read cycles were faster than replication across the cluster and ended up in an inconsistent read that triggered an infinite loop writing endless values to the cluster.
Changing to FULL_SYNC fixed the problem.
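
For reference, here is a minimal sketch of how the write synchronization mode can be set through the Java API; the cache name and key/value types are placeholders, not taken from the original setup:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class FullSyncCacheSketch {
    public static void main(String[] args) {
        // Placeholder cache name and types.
        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("testCache");

        // Wait for writes to complete on primaries and backups before returning,
        // instead of only on the primary node (PRIMARY_SYNC), which allowed stale reads.
        cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, String> cache = ignite.getOrCreateCache(cacheCfg);
            cache.put("key", "value");
        }
    }
}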

Related

Apache Ignite takes a long time to create a new cache

My application creates new caches on demand, but Apache Ignite always seems to take several seconds to create a new cache once there are hundreds of caches already. I find there are two stages that consume most of the time when creating a new cache:
stage1: Waiting in exchange queue
stage2: Waiting for full message
Is there any way I can optimize this process?
Apache Ignite: 2.10.0, cluster mode, two nodes, JDBC thin client
JVM: Java HotSpot(TM) 64-Bit Server VM, 1.8.0_60
Cache creation is not cheap, as you correctly highlighted: it is a cluster-wide operation that requires a partition map exchange (PME) and other internal routines. For that reason, consider reusing existing caches if you need the best performance.
You can speed up cache processing and reduce resource usage if you group the caches into a single cache group, though network communication will still be required.
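
As a rough sketch of the cache group suggestion (the cache and group names below are invented for illustration), caches are assigned to a group via CacheConfiguration.setGroupName, so they share partition maps and other internal structures:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheGroupSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Both caches join the same group, reducing per-cache overhead.
            CacheConfiguration<Long, String> ordersCfg = new CacheConfiguration<>("ordersCache");
            ordersCfg.setGroupName("appCaches");

            CacheConfiguration<Long, String> customersCfg = new CacheConfiguration<>("customersCache");
            customersCfg.setGroupName("appCaches");

            ignite.getOrCreateCache(ordersCfg);
            ignite.getOrCreateCache(customersCfg);
        }
    }
}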

HTTPD response time is increasing after 90 TPS

I am doing a load test to tune my Apache httpd to serve the maximum number of concurrent HTTPS requests. Below are the details of my test.
System
I dockerized my httpd and deployed it in OpenShift with a pod configuration of 4 CPU and 8GB RAM.
I am running load from JMeter with 200 threads, a 600-second ramp-up time, and an infinite loop count for a long-running test (JMeter runs in the same network, on a VM with 16 CPU and 32GB RAM).
I compiled httpd with the worker MPM and deployed it in OpenShift.
Issue
1. httpd does not scale beyond 90 TPS, even after trying multiple mpm worker configurations (no difference between the default and higher settings).
2. After 90 TPS, the average response time increases and TPS drops.
Please let me know what could be the issue; I can provide further information if required, and any suggestions are welcome.
I don't have the answer, but I do have questions.
1/ What does your Dockerfile look like?
2/ What does your OpenShift cluster look like? How many nodes? Separate control plane and workers? What version?
2b/ Specifically, how is traffic entering the pod? (If you are going in via a route, you'll want to look at your load balancer; if you want to exclude OpenShift from the equation for the short term, expose a NodePort and have JMeter hit that directly.)
3/ Do I read correctly that your single pod was assigned 8G ram limit? Did you mean the worker node has 8G ram?
4/ How did you deploy the app -- raw pod, deployment config? Any cpu/memory limits set, or assumed? Assuming a deployment, how many pods does it spawn? What happens if you double it? Doubled TPS or not - that'll help point to whether the problem is inside httpd or inside the ingress route.
5/ What's the nature of the test request? Does it make use of any files stored on the network, or "local" files provisioned in a network PV?
And,
6/ What are you looking to achieve? Maximum concurrent requests in one container, or maximum requests in the cluster? If you've not already, look to divide and conquer -- more pods on more nodes.
Most likely you have run into a bottleneck or limitation at the system under test (SUT). See the following post for a detailed answer:
JMeter load is not increasing when we increase the threads count

How can I configure Apache Zookeeper with redundancy on only two physical frames?

I would like to have a high-availability/redundant installation of Zookeeper running in my production environment. The problem is that I only have 2 physical frames available, so that rules out configuring a Zookeeper cluster/ensemble since I'd only have redundancy if the frame with the minority of servers goes down. What is the best practice in this situation? Is it possible to have a separate standalone install running on each frame connected to the same set of SOLR nodes or to use one server as primary and one as backup?
ZooKeeper requires at least 3 nodes for a fault-tolerant ensemble. In your scenario, if you cannot get another machine, you can set up multiple ZooKeeper nodes on the same machine, each with its own data directory and its own set of ports.

Ignite Server mode vs Client Mode

Ignite has two modes: one is server mode, and the other is client mode. I am reading https://apacheignite.readme.io/docs/clients-vs-servers, but I didn't get a good understanding of these two modes.
In my opinion, there are two use cases:
If Ignite is used as an embedded server in a Java application, then Ignite should be in server mode, that is, it should be started with:
Ignite ignite = Ignition.start(configFile);
If I have set up an Ignite cluster that runs as standalone processes, then in my Java code I should start Ignite in client mode, so that the client-mode Ignite can connect to the cluster and perform CRUD operations on the cache data that resides in it?
Ignition.setClientMode(true);
Ignite ignite = Ignition.start(configFile);
Yes, this is the correct understanding.
Ignite client mode is intended as a lightweight mode, which does not store data and does not execute compute tasks. A client node communicates with the cluster without contributing its own resources.
A client will not even start unless there is a server node present in the topology.
To further add to @Makros' answer: an Ignite client does store data if a near cache is enabled. This is done in order to increase the performance of cache retrievals.
Yes, you are right: an Ignite client has IgniteConfiguration.setClientMode(true), while a server has IgniteConfiguration.setClientMode(false), which is the default value. If you set IgniteConfiguration.setClientMode(false) in your code, or forget to call setClientMode() at all, the node will work as a server.
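
A minimal sketch of both points, starting a node in client mode programmatically and enabling a near cache on the client so it keeps local copies of recently read entries (the cache name and types are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class ClientModeSketch {
    public static void main(String[] args) {
        // Mark this node as a client: it joins the topology but does not
        // store cache partitions or execute compute tasks by default.
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");

            // The near cache is the case where a client does hold data locally.
            IgniteCache<Integer, String> cache =
                ignite.getOrCreateCache(cacheCfg, new NearCacheConfiguration<>());

            cache.put(1, "value");
            System.out.println(cache.get(1));
        }
    }
}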

Does Apache Ignite support WAN replication?

I've been doing some experiments with Apache Ignite and I've started to look into WAN replication. By this I mean there would be 2 (or more) data centres each running an Ignite cluster. There would be some caches that I would like kept in sync between the two data centres.
Does Apache Ignite support this? If so how is this configured as I can't find any mention of this in the documentation.
At the moment Ignite does not support caches spanning multiple clusters (nor cache mirroring). If, however, you mean a single Ignite cluster consisting of nodes located in different data centers (across the WAN), that would be possible, though most likely inefficient, since you would have to use replicated mode.
GridGain provides asynchronous WAN replication on top of Ignite as part of their paid solution: https://www.gridgain.com/docs/latest/administrators-guide/data-center-replication/configuring-replication
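
For the single stretched-cluster option mentioned above, a minimal sketch of a replicated cache follows (the cache name is a placeholder); since every node keeps a full copy of the data, every update has to cross the WAN link, which is what makes this setup inefficient:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class ReplicatedCacheSketch {
    public static void main(String[] args) {
        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("sharedCache");

        // REPLICATED mode keeps a full copy of the cache on every node.
        cacheCfg.setCacheMode(CacheMode.REPLICATED);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, String> cache = ignite.getOrCreateCache(cacheCfg);
            cache.put("key", "value");
        }
    }
}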