Ignite keeps crashing with OOM

We are running a 2-client, 3-server cluster. We keep seeing OOMs even after giving the JVM more heap. Below is the heap dump snapshot; any idea why we see so many CacheEvent objects, and what is really happening here?
Looking at the source for 'GridSelectorNioSessionImpl', it seems to have an unbounded queue that is accumulating WriteRequests. Any idea why these are not being flushed? I was reading about setting messageQueueLimit; the Ignite logs even warn that leaving it unbounded can result in an OOM. But I could not correlate that setting to the queue initialization happening in 'GridSelectorNioSessionImpl'.
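For reference, here is roughly how I expected that knob to be wired up (just a sketch; I'm assuming the setting the warning refers to is messageQueueLimit on TcpCommunicationSpi, and the event-type filtering is my own guess at cutting down the CacheEvent volume):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.events.EventType;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

    public class IgniteQueueLimit {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Bound the per-connection outbound message queue; the default
            // of 0 means unbounded, which is what the startup warning is about.
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
            commSpi.setMessageQueueLimit(1024);
            cfg.setCommunicationSpi(commSpi);

            // Record only the event types we actually consume, instead of
            // generating every cache event for slow clients to drain.
            cfg.setIncludeEventTypes(EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT);

            Ignite ignite = Ignition.start(cfg);
        }
    }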
TIA

Related

What causes a Vue.js memory leak?

A memory leak occurs in a browser with limited memory (448 MB). The Chrome debugger was used to determine the cause, and the graph from the test is shown in the image below. It has been confirmed that JS Heap, Nodes, and EventHandler counts all drop at GC time. Can the increase in Nodes and EventHandler also cause the memory leak?
Unfortunately, we can't allocate more memory...
An increase in Nodes means more DOM elements being retained, which will contribute to the memory leak. Beyond that, there are other factors that can also cause memory leaks. Please check the link below to learn how to fix those problems:
https://nolanlawson.com/2020/02/19/fixing-memory-leaks-in-web-applications/

Redis-server using all RAM at startup

I'm using Redis and noticed that it crashes with the following error:
MISCONF Redis is configured to save RDB snapshots
I tried the solution suggested in this post, but everything seems to be OK in terms of permissions and space.
The htop command tells me that Redis is consuming 70% of RAM. I tried to stop/restart Redis in order to flush it, but at startup the amount of RAM used by Redis grew dramatically and settled around 66%. I'm pretty sure that at that moment no process was using any Redis instance!
What is happening there?
The growing RAM usage is expected behaviour for Redis on its first data load after a restart, as it reads the snapshot it wrote to disk back into memory. Redis tends to allocate as much memory as it can unless you use the "maxmemory" option in your conf file.
It allocates memory but does not release it immediately; sometimes that takes hours. I have seen such cases.
A well-known fact about Redis is that it can allocate memory up to twice the size of the dataset it keeps.
I suggest you wait a couple of hours without any restart (Redis can serve get/set operations etc. in the meantime) and keep watching the memory.
Please check this too, from the Redis docs:
Redis will not always free up (return) memory to the OS when keys are removed. This is not something special about Redis, but it is how most malloc() implementations work. For example if you fill an instance with 5GB worth of data, and then remove the equivalent of 2GB of data, the Resident Set Size (also known as the RSS, which is the number of memory pages consumed by the process) will probably still be around 5GB, even if Redis will claim that the user memory is around 3GB. This happens because the underlying allocator can't easily release the memory. For example often most of the removed keys were allocated in the same pages as the other keys that still exist.
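If you want a hard cap rather than waiting, set "maxmemory" together with an eviction policy. A minimal sketch with the Jedis client (my assumption; CONFIG SET from redis-cli does the same thing, and you would edit redis.conf or run CONFIG REWRITE to make it survive restarts):

    import redis.clients.jedis.Jedis;

    public class RedisMemoryCap {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Cap the dataset and evict least-recently-used keys at the
                // cap (same effect as "maxmemory" / "maxmemory-policy" in
                // redis.conf).
                jedis.configSet("maxmemory", "2gb");
                jedis.configSet("maxmemory-policy", "allkeys-lru");

                // Compare used_memory (live dataset) with used_memory_rss
                // (what htop shows); a large gap between them is allocator
                // fragmentation, not a leak.
                System.out.println(jedis.info("memory"));
            }
        }
    }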

RabbitMQ x-max-length-bytes Argument Not Working

All,
I'm playing around with the parameters of RabbitMQ queues, trying to limit growth. I have noticed that the x-max-length argument works as expected, but when I change to x-max-length-bytes, it does not.
See my example below, where the argument is correctly listed for the queue, yet you can see unbounded growth happening and a memory footprint for the queue of 421 MB while I had the limit set to just 1 MB.
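For reference, this is roughly how I am declaring the queue (a sketch with the Java client; the host and queue name are placeholders). From my reading of the docs, the limit is supposed to cap the total body size of ready messages, dropping the oldest ones once it is exceeded:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    import java.util.HashMap;
    import java.util.Map;

    public class MaxLengthBytesQueue {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");

            try (Connection connection = factory.newConnection();
                 Channel channel = connection.createChannel()) {
                // Cap the queue at 1 MB of message bodies; with the default
                // overflow behaviour, messages at the head are dropped once
                // the cap is exceeded.
                Map<String, Object> arguments = new HashMap<>();
                arguments.put("x-max-length-bytes", 1048576);
                channel.queueDeclare("bounded-queue", true, false, false, arguments);
            }
        }
    }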
Any ideas on why this is happening?
Thanks and Regards.

Monitor worker crashes in apache storm

When running in a cluster, if something goes wrong, a worker generally dies (JVM shutdown). It can be caused by many factors, and most of the time it is a challenge (the biggest difficulty with Storm?) to find out what caused the crash.
Of course, the storm-supervisor restarts dead workers and liveness is quite good within a Storm cluster, but a worker crash is still a mess we should avoid: it adds overhead and latency (it can take very long until a worker is found dead and respawned), and it means data loss if you didn't design your topology to prevent that.
Is there an easy way / tool / methodology to check when and possibly why a Storm worker crashes? Crashes are not shown in storm-ui (whereas supervisors are shown), and everything needs manual monitoring (with jstack + JVM opts, for instance) done with a lot of care.
Here are some cases that can happen:
timeouts, with many possible causes: slow Java garbage collection, bad network, badly sized timeout configuration. The only output we get natively from the supervisor logs is "state: timeout" or "state: disallowed", which is poor. Also, when a worker dies, the statistics on storm-ui are reset. Because you get scared of timeouts, you end up using long ones, which does not seem to be a good solution for real-time processing.
high back pressure with unexpected behaviour, for instance starving worker heartbeats and inducing a timeout. Acking seems to be the only way to deal with back pressure and needs careful crafting of bolts according to your load. Not acking seems to be a no-go, as it would indeed crash workers and produce bad results in the end (even less data processed than an acking topology under pressure?).
code runtime exceptions, sometimes not shown in storm-ui, which need manual checking of application logs (the easiest case).
memory leaks that can be found out with JVM dumps.
The Storm supervisor logs restarts caused by timeouts.
You can monitor the supervisor log; you can also monitor the performance of your bolt's execute(tuple) method, as in the sketch below.
As for memory leaks: since the Storm supervisor does a kill -9 on the worker, the heap dump is likely to be corrupted, so I would use tools that monitor your heap dynamically, or stop the supervisor first so you can produce heap dumps of the worker via jmap. Also, try monitoring the GC logs.
I still recommend increasing the default timeouts.
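To make the execute(tuple) monitoring concrete, here is a minimal sketch (assuming the Storm 2.x API; the 500 ms threshold and the logging are placeholders for whatever metrics system you use):

    import java.util.Map;

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class TimedBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> conf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            long start = System.nanoTime();
            try {
                // ... actual tuple processing goes here ...
                collector.ack(tuple);
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                // Flag tuples slow enough to starve worker heartbeats; pair
                // this with GC logging (e.g. -verbose:gc in worker.childopts)
                // to tell slow code apart from GC pauses.
                if (elapsedMs > 500) {
                    System.err.println("slow tuple: " + elapsedMs + " ms");
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams in this sketch
        }
    }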

Recover memory from w3wp.exe

Is it possible to recover memory lost from w3wp.exe? I thought a Session.Abandon() was supposed to clear up resources like that. The thing is, I have a web application where certain pages make w3wp.exe grow significantly, like from 40 MB to 400 MB. I am definitely going to optimize my code to reduce this, but whatever amount w3wp.exe grows by, is there no way to recover the lost memory, even after the user has logged out and closed the browser?
I know this worker process will recycle after 30 minutes of idle time (the default), but what if there is no idle period for a long time and the worker process keeps that portion of memory and just keeps growing? Any thoughts, people?
The garbage collector will take care of whatever memory needs to be freed, provided that you dispose things correctly, etc. The GC doesn't immediately kick in every time you call Session.Abandon(), as that would be a major performance hit.
That said, every application has a "normal" memory usage, i.e. a stable memory usage (again, provided you don't have leaks), and this figure is different for every application. 400MB can be a lot or it can be nothing, depending on what your app does. I have apps that hover around 400MB and others around 1.5GB and that's OK as long as memory usage stabilizes somewhere. If you see unbounded memory usage then you most likely have a leak somewhere in your app.
Storing large amounts of data in the in-proc session can also quickly rack up the memory usage. Instead, use a file or a database to store this data.
Unless you are leaking memory, the memory manager will re-use it, so you should not see the process memory keep growing.