WebSphere - frequent thread/heap dump generation - JVM

Our application in the production environment generates frequent heap/thread dumps while running very large reports, eventually resulting in JVM failure. The server is WebSphere, and the heap size is set to 1024 MB initial / 2048 MB maximum across all nodes.
What are some ways to tackle this issue? I could think of the following options. Is there anything else I am missing?
Set the min/max heap size to 2048 MB or even higher?
Enable verbose garbage collection in WebSphere and analyze the optimal heap size?
Thread Analysis:
Runnable : 123(67%)
Blocked : 16(9%)
Waiting on Condition : 43(23%)

A good place to start investigating the OOM is this IBM Knowledge Center topic.

Since it seems you are experiencing an OutOfMemory issue, there are three possibilities to consider:
Your applications consistently need more memory to handle the current load.
Solution: Load test your application with production-like traffic and tune your min/max heap size accordingly (see the sketch below).
Your application has a memory leak.
Solution: Analyze the heap dumps/core dumps produced, using the IBM Support Assistant tools. A PMR to IBM would also help.
WebSphere itself has a memory leak.
Solution: Open a PMR.
Here is a nice read about Java Memory Management in WAS environments.
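If the first case applies (or to act on the two options listed in the question), both the heap size and verbose GC can be set via the server's Generic JVM arguments in the WebSphere admin console. A minimal sketch, assuming the IBM J9 JVM that WebSphere normally bundles; the heap values and log path are placeholders to adjust after load testing:
-Xms2048m -Xmx2048m
-verbose:gc
-Xverbosegclog:/tmp/verbosegc.log,5,10000
The trailing 5,10000 requests rotation across 5 files of 10,000 GC cycles each, which keeps disk usage bounded; the overhead of verbose GC is usually small enough to leave it enabled in production.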

Try to capture memory and garbage collection information from the production environment. I am not sure whether the GC log has any performance impact; however, jstat is an extremely lightweight tool and can be used in a production environment without any noticeable performance impact. Dump the output of jstat at regular intervals using the following command (here the interval is set to 1 hour):
jstat -gc <PID> 3600s
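If jstat is available (it ships with HotSpot-based JDKs and may not be present in the IBM JDK that WebSphere usually bundles), adding the timestamp column and redirecting to a file makes the samples easier to line up with the dumps later. A sketch, with the PID and log path as placeholders:
jstat -gc -t <PID> 3600s > /tmp/jstat-gc.log 2>&1 &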

High CPU with ImageResizer DiskCache plugin

We are noticing occasional periods of high CPU on a web server that happens to use ImageResizer. Here are the surprising results of a trace performed with New Relic's thread profiler during such a spike:
It would appear that the cleanup routine associated with ImageResizer's DiskCache plugin is responsible for a significant percentage of the high CPU consumption associated with this application. We have autoClean on, but otherwise we're configured to use the defaults, which I understand are optimal for most typical situations:
<diskCache autoClean="true" />
Armed with this information, is there anything I can do to relieve the CPU spikes? I'm open to disabling autoClean and setting up a simple nightly cleanup routine, but my understanding is that this plugin is built to be smart about how it uses resources. Has anyone experienced this and had any luck simply changing the default configuration?
This is an ASP.NET MVC application running on Windows Server 2008 R2 with ImageResizer.Plugins.DiskCache 3.4.3.
Sampling, or why the profiling is unhelpful
New Relic's thread profiler uses a technique called sampling - it does not instrument the calls - and therefore cannot know whether CPU time is actually being consumed.
Looking at the provided screenshot, we can see that the backtrace of the cleanup thread (there is only ever one) is frequently found at the WaitHandle.WaitAny and WaitHandle.WaitOne calls. These methods are low-level synchronization constructs that do not spin or consume CPU resources, but rather efficiently return CPU time back to other threads, and resume on a signal.
Correct profilers should be able to detect idle or waiting threads and eliminate them from their statistical analysis. Because New Relic's profiler failed to do that, there is no useful way to interpret the data it's giving you.
If you have more than 7,000 files in /imagecache, here is one way to improve performance
By default, in V3, DiskCache uses 32 subfolders with 400 items per folder (1000 hard limit). Due to imperfect hash distribution, this means that you may start seeing cleanup occur at as few as 7,000 images, and you will start thrashing the disk at ~12,000 active cache files.
This is explained in the DiskCache documentation - see the subfolders section.
I would suggest setting subfolders="8192" if you have a larger volume of images. A higher subfolder count increases overhead slightly, but also increases scalability.
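Concretely, that is a one-attribute change to the element shown in the question (8192 is the value suggested above; check the DiskCache subfolders documentation for the trade-offs on your volume):
<diskCache autoClean="true" subfolders="8192" />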

Shrinking JVM memory and swap

Virtual Machine:
4 CPUs
10 GB RAM
10 GB swap
Java 1.7
-Xms6144m -Xmx6144m
Tomcat 7
We observed some very strange behaviour with the JVM. The JVM's resident memory began to shrink and swap usage shot up to over 50%.
Please see the stats below from our monitoring tools.
http://i44.tinypic.com/206n6sp.jpg
http://i44.tinypic.com/m99hl0.jpg
Any pointers to help understand this would be appreciated.
Thanks!
Or maybe your Java program was idle and did not need that memory, and you have high swappiness? In that situation the OS frees RAM just in case and keeps only the actively used part resident.
In my opinion, that is actually good behaviour: why waste RAM on a process that will not use it?
If you run only this one process on the VM, it would be a good idea to set swappiness to 0 or another small number - the memory was given to this single process, so swapping it out can be disabled (see the sketch below).
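A minimal sketch of checking and lowering swappiness on Linux; the value 10 is only an illustrative small number:
cat /proc/sys/vm/swappiness                              # current value; 60 is the usual default
sudo sysctl vm.swappiness=10                             # apply immediately
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf   # persist across reboots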
Thanks for the response. Yes, this is closer to system troubleshooting than to Java, but I thought this was the right forum to raise the topic in case anybody has seen such a phenomenon with the JVM.
Anyway, I had already checked top, and there was no process other than Java that was hungry for memory; the second-largest process was using 72 MB (RSS).
No, swappiness is not set aggressively on this system; it is at the default of 60. One piece of information I missed sharing is that we have 4 app servers in a cluster, and all of them showed this behaviour at exactly the same time. AFAIK the JVM does not swap itself out, but the OS would, and that is what is confusing me.
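One way to confirm whether the JVM process itself was pushed out to swap (rather than some other process) is to look at the kernel's per-process counters. A sketch, assuming a reasonably recent Linux kernel, with <PID> standing in for the Tomcat JVM's process id:
grep VmSwap /proc/<PID>/status        # swap used by this process, in kB
grep '^Swap:' /proc/<PID>/smaps | awk '{ sum += $2 } END { print sum " kB" }'   # same figure, summed per mapping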
All of these app servers are in production and busy serving requests, so they are not idle. The used heap size averaged 5 GB of the 6 GB maximum.
The other interesting thing I found was some failure messages in the VMware logs at the same time, which is what I am investigating.

MongoDB high CPU usage

I have installed MongoDB 2.4.4 on Amazon EC2 with a 64-bit Ubuntu OS and 1.6 GB of RAM.
Only MongoDB is running on this server, nothing else.
But sometimes CPU usage reaches 99%, with a load average of 500.01, 400.73, 620.77.
I have also installed MMS on the server to monitor what is going on.
Here is the MMS detail.
As per the MMS details, indexing is working properly for each query.
Suspected issues are as below:
1) High non-mapped virtual memory
2) High page faults
Can anyone help me understand what exactly is causing the high CPU usage?
EDIT:
After the comments from @Dylan Tong, I have reduced the active connections, but there is still high non-mapped virtual memory.
Here's a summary of a few things to look into:
1. A large number of connections and cursors was observed (13k):
- Fix: make sure your connection pool size is appropriate. For reporting, and at your current request rate, you only need a few connections at most. Also, I'm guessing you have an m1.small instance, which means you only have 1 core.
2. Review queries and indexes:
- Run your queries with explain() to observe how they are executed (see the sketch after this list). A good model normally results in queries that pull only a few documents and use an index.
3. Memory (compact and readahead setting):
- Make the best use of memory; 1.6 GB is low. Check how much free memory you have and compare it to what is reported as resident. A common cause of low resident memory is fragmentation: if a lot of documents are moving or changing size, run the compact command to defragment your data files. A bad readahead setting can also lead to poor use of memory; check your readahead value (http://manpages.ubuntu.com/manpages/lucid/man2/readahead.2.html) and try a few values, starting low (http://docs.mongodb.org/manual/administration/production-notes/). The production notes recommend 32 (for standard 512-byte blocks); sometimes higher values are optimal if your documents are larger. The goal is for resident memory to end up close to your available memory and for page faults to start dropping.
If you are using resources to the fullest after this and are still capped out on CPU, then you need to add resources.
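A minimal sketch of the checks above, run from a shell on the server; the database, collection, query and device names are placeholders, not values from the question:
mongo --eval 'printjson(db.serverStatus().connections)'                            # how many connections are actually open
mongo mydb --eval 'printjson(db.mycollection.find({ field: "value" }).explain())'  # which index (if any) the query uses and how many documents it scans
sudo blockdev --getra /dev/xvdf                                                    # current readahead, in 512-byte sectors
sudo blockdev --setra 32 /dev/xvdf                                                 # the low value recommended by the production notes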

Memory leak in Hibernate Search

Greetings,
We have recently been facing a memory leak issue in one of our apps.
Development environment: Lucene 2.4.0, Hibernate Search 3.2.0, Hibernate 3.5.0, Spring 2.5 and Ehcache 1.4.1.
The problem is that memory in the old generation gradually grows over time. Eventually the JVM runs out of memory; the JVM stats show that the old generation capacity reaches its maximum. As a result, I have to restart the web application to release the memory.
I generated a heap dump from the app and used a memory analyzer to check it. I see this:
123,726 instances of "org.apache.lucene.index.TermInfosReader$ThreadResources", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x7f5d71ffe3c8" occupy 3,139,449,272 (79.54%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]", loaded by "<system class loader>"
Can you give me some advice, please?
Thanks.

How to measure the memory usage per active Apache Connection?

I would like to measure the memory consumption of one active Apache connection (i.e. one thread) under Ubuntu.
Is there a monitoring tool capable of doing this?
If not, does anyone know roughly how much memory an Apache connection needs?
Activate the mod_status module and you will get a report on the /server-status page; a more parseable version is available at /server-status?auto. If you enable ExtendedStatus On, you will get a lot of information on processes and threads (see the config sketch below).
This is the page used by monitoring tools to track a lot of stats parameters, so you will probably find the one you need (edit: if it is not memory...). Be careful with the security/access settings of this page; it is also a nice tool for checking how your server responds to a DoS :-)
About memory, note that Apache loves memory. How much memory each process uses depends on a lot of things (the number of modules loaded - check that you need all the ones you have - the number of virtual hosts, etc.). On a stable configuration it does not move much (unless you use PHP scripts with a high memory limit...). If you find memory leaks, try limiting the number of requests per process with MaxRequestsPerChild (Apache will kill the process and start a new one).
Edit: in fact there is not a lot of memory info in server-status. About monitoring tools, any tool using the SNMP MIB-II can track memory usage per process, with average/top/low values for the different children (Cacti, Nagios, Munin, etc.), provided you have an snmpd daemon. Check this excellent Munin example. It does not track each Apache child individually, but it will give you an idea of what you can track with these tools. If you do not need a complete monitoring system such as Nagios or Centreon, with alerts, user management and big networks (and if you do not have many days for reading books), Munin is, IMHO, a nice tool for getting monitoring reports up quite fast.
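A sketch of the mod_status setup described above, using Apache 2.2-style access control (the allowed address is a placeholder; Apache 2.4 uses Require instead of Order/Allow):
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>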
I'm not sure whether there are any tools for doing this, but you could estimate it yourself: start Apache and check how much memory it uses without any sessions, then create a large number of sessions and check again how much memory it uses.
You could use JMeter to create different workloads.
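A rough way to take that measurement from the shell, assuming the Apache processes are named apache2 (as on Ubuntu); RSS is reported in kB, and shared pages make the per-connection figure an over-estimate:
ps -o pid,rss,cmd -C apache2                                                                          # per-process resident memory at the idle baseline
ps -o pid,rss,cmd -C apache2 | awk 'NR>1 { sum += $2; n++ } END { print sum/n " kB average RSS" }'    # repeat under JMeter load and compare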