High JVM long pause in Ignite client microservice - ignite

A high JVM long pause is occurring in an Ignite thick client, causing the client to drop out of the cluster.
[2022:02:17:19:02:15] [org.apache.ignite.internal.IgniteKernal] [WARN] Possible too long JVM pause: 28690 milliseconds.
[2022:02:17:19:02:15] [org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] [INFO] Client node disconnected from cluster, will try to reconnect with new id
There are no issues reported in the GC logs around the time of the issue, and the Ignite client logs also show no high memory or CPU usage during that period.
[2022:02:17:19:00:56] [org.apache.ignite.internal.IgniteKernal] [INFO]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=35a3b804, uptime=4 days, 09:08:44.781]
^-- H/N/C [hosts=9, nodes=62, CPUs=120]
^-- CPU [cur=0.3%, avg=0.77%, GC=0%]
^-- PageMemory [pages=0]
^-- Heap [used=3958MB, free=35.58%, comm=6144MB]
^-- Off-heap [used=0MB, free=-1%, comm=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=8, qSize=0]
GC Log:
2022-02-17T19:01:05.911+0000: 378554.327: [GC pause (G1 Evacuation Pause) (young), 0.0196024 secs]
[Eden: 3678.0M(3678.0M)->0.0B(3678.0M) Survivors: 8192.0K->8192.0K Heap: 3994.3M(6144.0M)->317.1M(6144.0M)]
[Times: user=0.06 sys=0.00, real=0.02 secs]
Can anyone please suggest what the reasons for the high JVM long pause could be?

Take a look at this documentation. It can also happen in the case of a VM freeze for a backup procedure, a JVM safepoint, or similar.
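For context, that warning typically comes from a watchdog-style check: a thread sleeps for a short, fixed interval and compares the wall-clock time that actually elapsed against it, so anything that freezes the whole JVM (a GC pause, a long safepoint, a hypervisor snapshot/backup freeze, heavy swapping) gets reported even when the GC log looks clean, as it does here (the nearby G1 pause is only ~20 ms). A minimal sketch of that detection technique, not Ignite's actual code, with an illustrative interval and threshold:

```java
// Minimal sketch of a JVM pause watchdog, similar in spirit to the check behind
// the "Possible too long JVM pause" warning. Interval/threshold are illustrative.
public class PauseWatchdog {
    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 500;       // how long we ask to sleep
        final long warnThresholdMs = 1000; // report pauses longer than this

        long last = System.nanoTime();
        while (true) {
            Thread.sleep(intervalMs);
            long elapsedMs = (System.nanoTime() - last) / 1_000_000;
            last = System.nanoTime();

            // If the whole JVM was frozen (GC, safepoint, hypervisor snapshot,
            // swapping), the sleep takes much longer than requested even though
            // this thread did no work of its own.
            long pauseMs = elapsedMs - intervalMs;
            if (pauseMs > warnThresholdMs) {
                System.err.println("Possible too long JVM pause: " + pauseMs + " milliseconds");
            }
        }
    }
}
```

Since the G1 log shows no long collection, it is worth ruling out non-GC stop time as well: on Java 8, adding -XX:+PrintGCApplicationStoppedTime (and, if needed, -XX:+PrintSafepointStatistics) to the JVM options is one way to see whether threads were stopped outside of a collection, and the hypervisor's snapshot/backup schedule is worth checking against the pause timestamp.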

Related

Moving from Caffeine to Redisson Redis with AWS ElastiCache CPU increase

We are moving an in-memory cache implementation of a DB results cache to Redis (AWS ElastiCache). In performance tests, the JVM metrics for the Redisson-based Redis implementation show more CPU usage (about 30 to 50%).
The blue line is the Redisson-to-Redis implementation of the distributed cache and the yellow line is the in-memory Caffeine implementation. Is this an expected, legitimate increase due to more I/O, or is some Redisson configuration tuning needed?
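Some CPU increase is plausible on its own: a Caffeine hit is an in-process lookup, while every Redisson hit serializes the key, makes a network round trip to ElastiCache, and deserializes the value. A minimal sketch contrasting the two lookup paths, assuming the standard Caffeine and Redisson APIs (the endpoint, map name, and key are illustrative):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.redisson.Redisson;
import org.redisson.api.RMap;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class CacheLookupComparison {
    public static void main(String[] args) {
        // In-process Caffeine cache: a hit is a local map lookup, no I/O.
        Cache<String, String> local = Caffeine.newBuilder()
                .maximumSize(10_000)
                .build();
        local.put("user:42", "cached-db-result");
        String fromLocal = local.getIfPresent("user:42");

        // Redisson-backed map: every hit goes over the network to Redis
        // (ElastiCache), paying serialization plus round-trip cost, which
        // shows up as extra CPU and latency compared to Caffeine.
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379"); // illustrative endpoint
        RedissonClient redisson = Redisson.create(config);
        RMap<String, String> remote = redisson.getMap("db-results");
        remote.put("user:42", "cached-db-result");
        String fromRemote = remote.get("user:42");

        System.out.println(fromLocal + " / " + fromRemote);
        redisson.shutdown();
    }
}
```

If the round trips dominate, Redisson's near-cache variant (RLocalCachedMap) is one commonly used tuning option, since it keeps hot entries in the JVM while still invalidating them through Redis.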

How to monitor JVM memory in Apache NiFi

I am creating a memory monitoring reporting task in Apache NiFi to monitor JVM usage, but I don't know which memory pool is appropriate for monitoring the JVM. Any suggestion will be appreciated.
Memory pools available:
Code Cache
Metaspace
Compressed Class Space
G1 Eden Space
G1 Survivor Space
G1 Old Gen
As per my knowledge, G1 Eden Space, G1 Survivor Space, and G1 Old Gen are the generational heap memory pools, so these three should be used to monitor Java heap space. Correct me if I am wrong.
You can use the MonitorMemory reporting task to monitor the Java heap.
Details are here:
NIFI : Monitoring processor and nifi Service
Monitor Apache NiFi with Apache NiFi
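For reference, the heap/non-heap split the question is about can be checked directly with the standard MemoryPoolMXBean API (plain JMX, independent of NiFi): with G1 the HEAP-typed pools are G1 Eden Space, G1 Survivor Space, and G1 Old Gen, while Metaspace, Code Cache, and Compressed Class Space are NON_HEAP. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class ListMemoryPools {
    public static void main(String[] args) {
        // Lists every pool the running JVM exposes; the HEAP-typed ones are
        // what a "Java heap" monitor should sum, while the NON_HEAP ones cover
        // Metaspace, Code Cache and Compressed Class Space.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%-25s type=%-8s used=%,d max=%,d%n",
                    pool.getName(),
                    pool.getType() == MemoryType.HEAP ? "HEAP" : "NON_HEAP",
                    usage.getUsed(),
                    usage.getMax());   // -1 means no limit is set for the pool
        }
    }
}
```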

How to resolve "connection.blocked: true" in capabilities on the RabbitMQ UI

"rabbitmqctl list_connections" shows as running but on the UI in the connections tab, under client properties, i see "connection.blocked: true".
I can see that messages are in queued in RabbitMq and the connection is in idle state.
I am running Airflow with Celery. My jobs are not executing at all.
Is this the reason why jobs are not executing?
How to resolve the issue so that my jobs start running
I'm experiencing the same kind of issue just by using Celery.
It seems that when you have a lot of messages in the queue, and these are fairly chunky, and your node's memory use goes high, the RabbitMQ memory watermark gets exceeded. This triggers blocking of client connections, so no worker can use that node (and the related queues).
At the same time, publishers are happily sending stuff via the exchange, so you end up in a lose-lose situation.
The only solution we had was to avoid hitting that memory watermark and to scale up the number of consumers.
Keep messages/tasks lean so that the signature is KB rather than MB.
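On the application side, the RabbitMQ Java client can register a BlockedListener on the connection, which fires when the broker raises a resource alarm (such as the memory watermark) and again when it clears; that makes the stall show up explicitly instead of looking like idle workers. A minimal sketch, assuming a local broker with default credentials:

```java
import com.rabbitmq.client.BlockedListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class BlockedConnectionDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // illustrative broker address

        try (Connection connection = factory.newConnection()) {
            // Called when the broker blocks the connection because a resource
            // alarm fired (e.g. vm_memory_high_watermark or disk_free_limit).
            connection.addBlockedListener(new BlockedListener() {
                @Override
                public void handleBlocked(String reason) {
                    System.err.println("Connection blocked by broker: " + reason);
                }

                @Override
                public void handleUnblocked() {
                    System.out.println("Connection unblocked, publishing can resume");
                }
            });

            // ... publish/consume as usual; the listener reports watermark events.
            Thread.sleep(60_000);
        }
    }
}
```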

Celery stops processing

With roughly 0.9 million messages in RabbitMQ, Celery workers stopped processing tasks. On killing Celery and running it again, processing resumed. RabbitMQ never went out of memory. Nothing suspicious in any logs or statuses except:
** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
from /var/log/rabbitmq/rabbit.log. Similar symptoms were present before with around 1.6 million messages enqueued.
More info:
Celery concurrency: 4
RAM installed: 4GB
Swap space 8GB
disk_free_limit (Rabbit): 8GB
vm_memory_high_watermark: 2
vm_memory_high_watermark_paging_ratio: 0.75
How can the actual cause of the workers stopping be diagnosed, and how can it be prevented from recurring?
Thanks.
Probably submitting/consuming the messages from the queue too fast?
If you don't need messages to be durable and can store them in memory only, it will significantly improve RabbitMQ performance.
http://docs.celeryproject.org/en/latest/userguide/optimizing.html#using-transient-queues
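To illustrate the transient-queue suggestion from that guide: at the AMQP level it amounts to declaring the queue non-durable and publishing non-persistent messages (delivery_mode 1), so the broker does no persistence work per task; in Celery/Kombu this is driven by the queue durability and delivery_mode settings. A minimal sketch with the RabbitMQ Java client (the queue name and payload are illustrative):

```java
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class TransientQueueDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // illustrative broker address

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // durable=false: the queue is not recreated after a broker restart
            // and its messages stay in memory, avoiding per-message disk work.
            channel.queueDeclare("fast_tasks", /* durable */ false,
                    /* exclusive */ false, /* autoDelete */ false, null);

            // Non-persistent message (delivery_mode = 1): never written to disk.
            channel.basicPublish("", "fast_tasks",
                    MessageProperties.TEXT_PLAIN,
                    "small, lean payload".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```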

What is the expected behavior when a RabbitMQ durable queue runs out of RAM?

My understanding of RabbitMQ durable queues (i.e. delivery_mode = 2) is that they run in RAM, but that messages are flushed to disk so that they can be recovered in the event that the process is restarted or the machine is rebooted.
It's unclear to me though what the expected behavior is when the machine runs out of memory. If the queue gets overloaded, dies, and needs to be restored, then simply loading the messages from the disk-backed store would consume all available RAM.
Do durable queues only load a subset of the messages into RAM in this scenario?
RabbitMQ will page the messages to disk as memory fills up. See https://www.rabbitmq.com/memory.html, section "Configuring the Paging Threshold".
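As a concrete example of how the thresholds relate (values taken from the defaults described on that page, so adjust for your configuration): with vm_memory_high_watermark = 0.4 and vm_memory_high_watermark_paging_ratio = 0.5, a node starts paging message contents from memory to disk once it uses 0.4 × 0.5 = 20% of installed RAM, well before the 40% point at which publishers get blocked, so the queue does not need to keep its whole backlog in RAM.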