Shrinking JVM memory and Swap - jvm

Virtual Machine:
4 CPUs
10GB RAM
10GB swap
Java 1.7
-Xms6144m -Xmx6144m (initial heap = max heap)
Tomcat 7
We observed very strange behaviour with the JVM: the JVM resident memory began to shrink and swap usage shot up to over 50%.
Please see below stats from monitoring tools.
http://i44.tinypic.com/206n6sp.jpg
http://i44.tinypic.com/m99hl0.jpg
Any pointers to help understand this would be greatly appreciated.
Thanks!

Or maybe your Java program was idle and didn't need that memory, and you have high swappiness? In that situation the OS frees RAM just in case and keeps only the actively used pages resident.
In my opinion that is actually good behaviour: why waste RAM on a process that isn't using it?
If this is the only significant process on the VM, however, it would be a good idea to set swappiness to 0 or some other small number (as sketched below). That memory was given to this single process, so there is little reason to let the OS swap it out.
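A rough sketch of how that can be done on a typical Linux box; the value 10 is only illustrative, and the commands need root:
# check the current value
cat /proc/sys/vm/swappiness
# change it for the running system
sysctl -w vm.swappiness=10
# persist it across reboots
echo "vm.swappiness = 10" >> /etc/sysctl.conf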

Thanks for the response. Yes, this is closer to system troubleshooting than to Java, but I thought this was the right forum to raise the topic in case anybody has seen such a phenomenon with the JVM.
Anyway, I had already checked top and there was no process other than Java that was hungry for memory; the second-largest process was using 72 MB (RSS).
No, swappiness is not set aggressively on this system; it is at the default of 60. One additional piece of information I missed sharing: we have 4 app servers in a cluster, and all of them showed this behaviour at exactly the same time. AFAIK the JVM does not swap itself out, but the OS would, and that is what is confusing me.
All of these app servers are in production and busy serving requests, so they are not idle. Heap usage averaged about 5 GB of the 6 GB.
The other interesting thing I found was some failure messages in the VMware logs at the same time, which is what I'm investigating now.
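One way to confirm whether pages are still being actively swapped in and out (rather than just sitting in swap after a one-off event) is to watch the si/so columns of vmstat for a while:
# si = KB/s swapped in from disk, so = KB/s swapped out; sample every 5 seconds
vmstat 5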

Related

Improving performance of Redis setup (degraded after setting vm.overcommit_memory=1)

Need some help diagnosing and tuning the performance of my Redis setup (2 redis-server instances on an Ubuntu 14.04 machine). Note that a write-heavy Django web application shares the VM with Redis. The machine has 8 cores and 25 GB RAM.
I recently discovered that background saving was intermittently failing (with a fork() error) even when RAM wasn't exhausted. To remedy this, I applied the setting vm.overcommit_memory=1 (was previously default).
Moreover, vm.swappiness=2 and vm.overcommit_ratio=50. I have also disabled transparent huge pages via echo never > /sys/kernel/mm/transparent_hugepage/enabled (although I haven't done the same for /sys/kernel/mm/transparent_hugepage/defrag).
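For reference, these settings can be double-checked with read-only commands like:
sysctl vm.overcommit_memory vm.swappiness vm.overcommit_ratio
cat /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag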
Right after changing the overcommit_memory setting, I noticed that I/O utilization went from 13% to 36% (on average). I/O operations per second doubled, redis-server CPU consumption more than doubled, and the memory it consumes has gone up 66%. Consequently, server response time has gone up substantially. This is how abruptly things escalated after applying vm.overcommit_memory=1:
Note that redis-server is the only component showing this escalation; gunicorn, nginx, celery etc. are performing as before. Moreover, redis has become very spiky.
Lastly, New Relic has started showing me 3 redis instances instead of 2 (bottom-most graph). I think the forked child is being counted as the 3rd:
My question is: how can I diagnose and salvage performance here? Being new to server administration, I'm unsure how to proceed. Help me find out what's going on here and how I can fix it.
free -m has the following output (in case needed):
                     total       used       free     shared    buffers     cached
Mem:                 28136      27912        224        576         68       6778
-/+ buffers/cache:               21064       7071
Swap:                    0          0          0
As you don't have swap enabled on your system (which might be worth reconsidering if you have SSDs), and your swappiness was set to a low value anyway, you can't blame this on increased swapping due to memory contention.
You're caching about 6 GB of data in the VFS cache. Under memory contention that cache would have been depleted in favour of process working memory, so I believe it's safe to say memory is not the issue here at all.
It's a shot in the dark, but my guess is that your redis-server is configured to "sync"/"save" too often (search for "appendfsync" in the redis config file), and that by removing the memory-allocation limitation it can now actually do its job :)
If the data is not super crucial, set appendfsync to no (the least aggressive fsync policy) and perhaps tweak the save settings to cause less frequent snapshotting.
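A minimal sketch of what that could look like via redis-cli; the save threshold is only an example, run it against each instance (add -p for non-default ports), and CONFIG REWRITE assumes Redis 2.8+ started with a config file:
# relax AOF fsync frequency (default is everysec)
redis-cli CONFIG SET appendfsync no
# snapshot only after 900 s and at least 10000 changes
redis-cli CONFIG SET save "900 10000"
# write the changes back to redis.conf so they survive a restart
redis-cli CONFIG REWRITE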
BTW, regarding the redis & forked child, I believe you are correct.

Websphere - frequent thread/heap dump generation

Our application in the prod environment is generating frequent heap/thread dumps while running very large reports, eventually resulting in JVM failure. WebSphere is the application server, and the heap size is set to 1024/2048 MB (initial/max) across all nodes.
What are some ways to tackle this issue? I can think of the following options; is there anything else I am missing?
Set min/max heap size to 2048 or even higher?
Enable verbose garbage collection in WebSphere and analyze optimal heap size?
Thread Analysis:
Runnable : 123(67%)
Blocked : 16(9%)
Waiting on Condition : 43(23%)
A good place to start investigating the OOM is this IBM Knowledge Center topic.
Since it seems you are experiencing an OutOfMemory issue, there are three possibilities to consider:
Your apps consistently need more memory to handle the current load.
Solution: Load-test your application with production-like traffic and tune your min/max heap size accordingly (see the sketch of JVM arguments after this list).
Your application has a memory leak.
Solution: Analyze the heap dumps/core dumps produced, using the IBM Support Assistant tools. A PMR to IBM would help.
WebSphere itself has a memory leak.
Solution: Open a PMR with IBM.
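For the first case, a sketch of the kind of flags involved, set under the server's Generic JVM arguments in the admin console (this assumes the IBM JDK shipped with WAS; the 2048m figure and the log path are only examples):
-Xms2048m -Xmx2048m -verbose:gc -Xverbosegclog:/tmp/verbosegc.log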
Here is a nice read about Java Memory Management in WAS environments.
Try to capture memory and garbage-collection information from the production environment. I am not sure whether GC logging has any performance impact, but jstat is an extremely lightweight tool and can be used in a production environment without any noticeable impact. Dump the output of jstat at regular intervals using the following command (here I am setting the interval to 1 hour):
jstat -gc <PID> 3600s
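A sketch of how this could run unattended; the -t flag, the 24-sample count and the log path are additions of mine, and <PID> is the Java process id as above:
# -t prepends an elapsed-time column; take 24 hourly samples and log to a file
nohup jstat -gc -t <PID> 3600s 24 > /tmp/jstat-gc.log 2>&1 &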

Apache server cannot allocate memory for new process

I have an Apache server with 32 GB of RAM. When I start the server and run top to check resource usage, it shows the CPU at 95 percent. That is not normal behaviour, and after a few minutes it raises:
apache cannot allocate memory fork unable to fork new process
I don't know how to solve the problem. Any tips?
I had the same problem. To fix it there are 2 options:
1- Move from micro instances to small; this was the change that solved the problem (micro instances on Amazon tend to have large CPU steal time).
2- Tune the MySQL database server configuration and the Apache configuration to use a lot less memory (a rough way to gauge Apache's per-worker memory is sketched after this answer).
A tuning guide for a low-memory situation such as this one: http://www.narga.net/optimizing-apachephpmysql-low-memory-server/ (but don't follow its suggestion to use MyISAM tables; that part is horrible).
These 2 options will make the problem happen much less often. I am still looking for a better solution that closes processes which are done and kills the ones that hang around.
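As a rough sketch of that sizing step (the process name may be apache2 or httpd depending on the distro; column 8 of ps -ly output is RSS in KB):
# average resident memory per Apache worker
ps -ylC apache2 | awk '$8 ~ /^[0-9]+$/ {sum+=$8; n++} END {if (n) print "avg RSS (KB):", sum/n, "over", n, "workers"}'
# compare with what the machine has free to pick a sane MaxRequestWorkers / MaxClients
free -m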

How to avoid Boost ASIO reactor becoming constrained to a single core?

TL;DR: Is it possible that I am reactor throughput limited? How would I tell? How expensive and scalable (across threads) is the implementation of the io_service?
I have a fairly massively parallel application, running on a hyperthreaded dual-quad-core Xeon machine with tons of RAM and a fast SSD RAID. This is developed using boost::asio.
This application accepts connections from about 1,000 other machines, reads data, decodes a simple protocol, and shuffles data into files mapped using mmap(). The application also pre-fetches "future" mmap pages using madvise(WILLNEED) so it's unlikely to be blocking on page faults, but just to be sure, I've tried spawning up to 300 threads.
This is running on Linux kernel 2.6.32-27-generic (Ubuntu Server x64 LTS 10.04). Gcc version is 4.4.3 and boost::asio version is 1.40 (both are stock Ubuntu LTS).
Running vmstat, iostat and top, I see that disk throughput (both in TPS and data volume) is in the single digits of %. Similarly, the disk queue length is always a lot smaller than the number of threads, so I don't think I'm I/O bound. Also, the RSS climbs but then stabilizes at a few gigs (as expected) and vmstat shows no paging, so I imagine I'm not memory bound. CPU is constant at 0-1% user, 6-7% system and the rest idle. Clue! One full "core" (remember hyper-threading) is 6.25% of the CPU.
I know the system is falling behind, because the client machines block on TCP send when more than 64kB is outstanding, and report the fact; they all keep reporting this fact, and throughput to the system is much less than desired, intended, and theoretically possible.
My guess is I'm contending on a lock of some sort. I use an application-level lock to guard a look-up table that may be mutated, so I sharded this into 256 top-level locks/tables to break that dependency. However, that didn't seem to help at all.
All threads go through one global io_service instance. Running strace on the application shows that it spends most of its time dealing with futex calls, which I imagine have to do with the event-based implementation of the io_service reactor.
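For what it's worth, one way to quantify that futex-dominated profile is strace's summary mode, which aggregates time per syscall across all threads (<PID> is the server process id):
# -c summarises time per syscall, -f follows threads; stop with Ctrl-C to print the table
strace -c -f -p <PID>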
Is it possible that I am reactor throughput limited? How would I tell? How expensive and scalable (across threads) is the implementation of the io_service?
EDIT: I didn't initially find this other thread because it used a set of tags that didn't overlap mine :-/ It is quite possible my problem is excessive locking used in the implementation of the boost::asio reactor. See C++ Socket Server - Unable to saturate CPU
However, the question remains: How can I prove this? And how can I fix it?
The answer is indeed that even the latest boost::asio only calls into the epoll file descriptor from a single thread, not entering the kernel from more than one thread at a time. I can sort of understand why: thread safety and object lifetime are extremely precarious when multiple threads can each get notifications for the same file descriptor. When I code this up myself (using pthreads), it works and scales beyond a single core, although at that point I'm no longer using boost::asio. It's a shame that an otherwise well-designed and portable library should have this limitation.
I believe that if you use multiple io_service objects (say, one per CPU core), each run by a single thread, you will not have this problem. See HTTP server example 2 on the Boost.Asio page.
I have done various benchmarks against server example 2 and server example 3 and have found that the implementation I mentioned works best.
In my single-threaded application, I found out from profiling that a large portion of the processor instructions was spent on locking and unlocking in io_service::poll(). I disabled the lock operations with the BOOST_ASIO_DISABLE_THREADS macro. It may make sense for you too, depending on your threading situation.
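For reference, a sketch of how that macro is typically defined at build time (only safe when the program never touches asio from more than one thread; the source file and output names are just placeholders):
g++ -O2 -DBOOST_ASIO_DISABLE_THREADS -o server server.cpp -lboost_system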

JVM cannot use 8 CPUs on Linux

I have observed that the JVM cannot take advantage of 8 CPUs: when one thread runs for more than 1 second, the other threads end up waiting for it. There is no lock between these threads. Is there any JVM option for this?
The JVM should have no internal locks that inhibit scaling like this. There are many benchmarks (specifically SPECjbb2000 and SPECjbb2005) that show single JVMs scaling to a great number of cores. I would say that you ARE somehow locking between threads, even if you don't know how.
You don't list your JVM level, vendor, or OS. Additionally, some evidence showing the lack of scaling would be good, for example per-thread CPU usage or thread dumps (see the sketch below). All of that would be necessary to answer the question.
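A sketch of how that evidence might be collected with stock JDK and Linux tools (<PID> is the Java process id; counts and paths are arbitrary):
# take a few thread dumps a couple of seconds apart; look for threads repeatedly BLOCKED on the same monitor
for i in 1 2 3; do jstack <PID> > /tmp/jstack.$i.txt; sleep 2; done
grep -c BLOCKED /tmp/jstack.*.txt
# per-thread CPU usage, to see whether more than one thread actually runs at once
top -H -p <PID>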