How to reduce time taken on threads reaching Safepoint - Sync state - jvm

About the Issue:
During heavy I/O in the VM, we saw JVM pauses and slowness because stopping threads took a long time. The safepoint logs showed that the Sync state takes most of that time.
We also tried printing safepoint traces on timeout (-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=200) to find out which threads are causing this issue, but nothing looked suspicious. Also, with the safepoint timeout set, we do not get the timeout-detected message even when the time is spent in the 'Sync' state.
Questions about this safepoint tracing:
How does the safepoint timeout work?
After logging the thread details, does the safepoint exit and do all threads resume?
Will the VM operation still be carried out? What happens if the vmop is a GC?
Using Async-profiler:
We tried time-to-safepoint profiling with async-profiler and noticed that the VM Thread spends most of its time in SafepointSynchronize::begin(), and the C2 compiler threads take almost as much time as the VM Thread.
We suspect that the C2 compiler threads may be slow to reach the safepoint. Can someone help us resolve this issue and interpret this time-to-safepoint flamegraph? Thanks in advance.

The SafepointTimeout option affects nothing but logging: threads will not be interrupted, the VM operation will run normally, and so on.
SafepointTimeout does not always print timed out threads: a thread may already have reached the safepoint by the time printing occurs. Furthermore, SafepointTimeout may not even detect a timeout, if the entire process has been frozen by the Operating System.
For example, such 'freezes' may happen
when a process has exhausted its cpu quota in a cgroup (container);
when a system is low on physical memory, and direct reclaim occurs;
due to activity of another process (e.g. I observed long JVM pauses when atop utility inspected the system).
async-profiler indeed has a time-to-safepoint profiling option (--ttsp), though using it correctly may seem tricky. It works best in wall profiling mode with jfr output. In this configuration, async-profiler will sample all threads (both running and blocking) during safepoint synchronization, and record each individual event with a timestamp.
Such a profile can then be analyzed with JDK Mission Control: choose the time interval around the long pause, and look at the stack traces of Java threads in this interval.
Note that if the JVM process is 'frozen', the async-profiler thread does not work either, i.e. you will not see collected samples during this period. Normally, in wall clock profiling mode, all threads are sampled evenly. But if you see a 'gap' (missed events during some time interval), it most likely means the JVM process has not received CPU time. In this case, the cause of the JVM pauses is not in the Java application, but rather in the operating system / environment.
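The 'gap' idea above can also be approximated from inside the application. The following is only a minimal sketch, not part of async-profiler: the class name PauseDetector and its parameters are hypothetical. A thread sleeps in short steps and records how much longer than requested each sleep took. Note the limitation: a long oversleep can come from a JVM-wide pause (such as a safepoint) as well as from the OS not scheduling the process, so the result has to be compared against the safepoint log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: records intervals in which a sleeping thread overslept. */
final class PauseDetector {

    /**
     * Sleeps in steps of sleepMillis for roughly durationMillis and returns every
     * observed oversleep (in milliseconds) that exceeded thresholdMillis.
     */
    static List<Long> detect(long durationMillis, long sleepMillis, long thresholdMillis) {
        List<Long> gaps = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long before = System.nanoTime();
        do {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
            long after = System.nanoTime();
            // How much longer than requested did this step take?
            long oversleep = TimeUnit.NANOSECONDS.toMillis(after - before) - sleepMillis;
            if (oversleep > thresholdMillis) {
                gaps.add(oversleep);
            }
            before = after;
        } while (deadline - before > 0);
        return gaps;
    }
}
```

For example, PauseDetector.detect(60_000, 10, 200) run on a background thread would list stalls longer than about 200 ms seen during one minute. If such a stall has no matching entry in the safepoint log, that points toward the environment rather than the JVM.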

Related

Can a BLOCKED Thread cause high CPU Consumption

We saw a high CPU consumption issue in our production environment recently, and saw something strange while debugging the same. When I did a "top -H" to see the CPU stats per thread ID, I found a thread X consuming high CPU. When I took the thread dumps, I saw that this thread X was in BLOCKED state. What does this mean, can a thread which is in BLOCKED state consume high CPU ? I think this might be trivial question but I am a novice in debugging Performance issues and JVM, and not sure what I might be missing here.
Entering and exiting the BLOCKED state can be expensive. If a thread stays BLOCKED for any length of time, that is not a problem, but if it blocks only briefly, over and over, in a busy loop, it can appear blocked in a thread dump while in reality burning CPU.
I would look for multiple threads that repeatedly compete for a shared resource and enter BLOCKED very briefly.
Peter has already made a good point about busy loops (which could be the JVM's internal adaptive spinning on locks during synchronization, or a busy loop the application itself creates on some condition), which can burn CPU. There is another, indirect way in which CPU usage can go very high because of thread blocking. Typically, in a web server, if lots of threads are blocked (not on synchronization locks but, say, waiting for I/O from a back-end datastore), this may put a lot of pressure on JVM garbage collection. These worker threads are supposed to finish their work quickly so that all the objects they created on the heap are quickly dereferenced and garbage collected. If lots of threads are in this state, the garbage collection threads have to work overtime and may end up taking a lot of CPU.
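The first scenario (threads that keep entering BLOCKED briefly while still using CPU) can be reproduced with a small experiment. This is a minimal sketch under assumptions: the class ContentionDemo and its method are hypothetical names, and how often BLOCKED is actually observed depends on the machine and on timing. Several threads hammer one lock while the caller samples their states and sums the CPU time they report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical experiment: threads that look BLOCKED in samples yet consume CPU. */
final class ContentionDemo {
    private static final Object LOCK = new Object();
    private static long counter;

    /** Returns {final counter, number of BLOCKED samples, total worker CPU nanoseconds}. */
    static long[] run(int threads, int iterations) throws InterruptedException {
        counter = 0;
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] cpu = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int idx = i;
            workers[i] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    synchronized (LOCK) {   // very short critical section, heavy competition
                        counter++;
                    }
                }
                if (mx.isCurrentThreadCpuTimeSupported()) {
                    cpu[idx] = Math.max(0, mx.getCurrentThreadCpuTime());
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        long blockedSamples = 0;
        boolean anyAlive = true;
        while (anyAlive) {                  // sample thread states, like repeated thread dumps
            anyAlive = false;
            for (Thread t : workers) {
                Thread.State s = t.getState();
                if (s == Thread.State.BLOCKED) blockedSamples++;
                if (s != Thread.State.TERMINATED) anyAlive = true;
            }
            Thread.sleep(1);
        }
        long totalCpu = 0;
        for (Thread t : workers) {
            t.join();                       // join makes the workers' writes visible here
        }
        for (long c : cpu) {
            totalCpu += c;
        }
        return new long[] {counter, blockedSamples, totalCpu};
    }
}
```

With enough threads and iterations, the second value is usually nonzero while the third keeps growing, which matches what "top -H" and a thread dump show together.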

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process that is executing is in the running state, and it moves to the waiting state while it waits for an I/O request to complete. I do not understand what an I/O request is. Can you please give an example to elaborate?
Another doubt: say a process is executing and an interrupt suddenly occurs. The process stops executing and is put in the ready state. Is it possible for some other process to begin executing while the interrupt is still being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components. CPU, Hard Drive, network card, sound card, gpu, etc. All those work in parallel and independent of each other. They are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point for the OS to be stuck waiting for the result since the time it takes for that operation to complete is probably an eternity (in the CPU view point of things).
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
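The effect is visible even from a user program. Below is a minimal Java sketch, assuming the JVM supports per-thread CPU time measurement; the class name IoWaitDemo is hypothetical. A reader thread issues a blocking read on a pipe (the I/O request), and the caller measures how much CPU that thread consumes while it waits for the data.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

/** Hypothetical experiment: a thread waiting on I/O uses (almost) no CPU. */
final class IoWaitDemo {

    /** Returns the CPU nanoseconds the reader used while blocked, or -1 if not measurable. */
    static long cpuNanosWhileBlocked(long waitMillis) throws IOException, InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Pipe pipe = Pipe.open();
        try (Pipe.SourceChannel source = pipe.source(); Pipe.SinkChannel sink = pipe.sink()) {
            Thread reader = new Thread(() -> {
                try {
                    source.read(ByteBuffer.allocate(1)); // the I/O request: blocks until a byte arrives
                } catch (IOException ignored) {
                    // channel closed while waiting; nothing to do in this sketch
                }
            });
            reader.start();
            Thread.sleep(100);                           // give the reader time to block in read()

            boolean measurable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
            long before = measurable ? mx.getThreadCpuTime(reader.getId()) : -1;
            Thread.sleep(waitMillis);                    // the CPU is free for other work meanwhile
            long after = measurable ? mx.getThreadCpuTime(reader.getId()) : -1;

            sink.write(ByteBuffer.wrap(new byte[] {1})); // the "device" completes the request
            reader.join();                               // the reader becomes ready again and finishes
            return (before < 0 || after < 0) ? -1 : after - before;
        }
    }
}
```

Calling IoWaitDemo.cpuNanosWhileBlocked(1000) typically returns a value close to zero, even though the reader thread existed for a full second.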
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go and then resume whichever process it considers appropriate once the interrupt handling is done. So in this case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think about an interrupt lifetime as just another task for the CPU (From when the interrupt starts to the point the OS considers that handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as notification for the OS to start another task (that interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.
An I/O request generally just means a request to do input, output, or both. The exact meaning varies depending on your context, such as HTTP, networks, console operations, or some process in the CPU.
A process is waiting for I/O: say, for example, you were writing a program in C to accept the user's name on the command line and then print 'Hello User' back. Your code will go into the waiting state until the user enters their name and hits Enter. This is a higher-level example, but even at a very low level, a process executing on your computer's processor works on the same basic principle.
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You better hope it does. That's what scheduling algorithms and stacks are for. However, the real answer depends on what architecture you are on, whether it supports parallel or serial processing, etc.

Process states in operating system and resource utilization

What is a difference between sleep,wait and suspending a process in OS? Does any of these states consume resources or waste CPU cycles?
In all three cases, the process is not runnable, so it does not consume CPU. The process is not returned to the runnable state until some event happens. The difference is what that event is:
Sleep: This can describe two different things. Either a process is runnable after a certain (fixed) period of time elapses, or the process is runnable after the device itself wakes up from a power saving mode.
Wait: process is runnable after something finishes. That something is usually an I/O operation (disk, network) completing.
Suspend: either the OS or another process takes the process out of the run state. This can overlap with "Sleeping" above.
Processes in all three states don't consume CPU time, but they do consume memory unless the process is entirely paged out. And processes in the wait state may be consuming I/O resources.

Monitor worker crashes in apache storm

When running in a cluster, if something wrong happens, a worker generally dies (JVM shutdown). It can be caused by many factors, most of the time it is a challenge (the biggest difficulty with storm?) to find out what causes the crash.
Of course, storm-supervisor restarts dead workers and liveness is quite good within a storm cluster, still a worker crash is a mess that we should avoid as it adds overhead, latency (can be very long until a worker is found dead and respawned) and data loss if you didn't design your topology to prevent that.
Is there an easy way / tool / methodology to check when and possibly why a storm worker crashes? They are not shown in storm-ui (whereas supervisors are shown), and everything needs manual monitoring (with jstack + JVM opts for instance) with a lot of care.
Here are some cases that can happen:
timeouts and many possible reasons: slow java garbage collection, bad network, bad sizing in timeout configuration. The only output we get natively from supervisor logs is "state: timeout" or "state: disallowed" which is poor. Also when a worker dies the statistics on storm-ui are rebooted. As you get scared of timeouts you end up using long ones which does not seem to be a good solution for real-time processing.
high back pressure with unexpected behaviour, starving worker heartbeats and inducing a timeout for instance. Acking seems to be the only way to deal with back pressure and needs good crafting of bolts according to your load. Not acking seems to be a no-go as it would indeed crash workers and get bad results in the end (even less data processed than an acking topology under pressure?).
code runtime exceptions, sometimes not shown in storm-ui that need manual checking of application logs (the easiest case).
memory leaks that can be found out with JVM dumps.
The storm supervisor logs restart by timeout.
You can monitor the supervisor log, and you can also monitor the performance of your bolt's execute(tuple) method.
As for memory leaks: since the Storm supervisor kills the worker with kill -9, a heap dump is likely to be corrupted, so I would use tools that monitor the heap dynamically, or kill the supervisor so that heap dumps can be produced via jmap. Also, try monitoring the GC logs.
I still recommend increasing the default timeouts.
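One way to watch the heap dynamically from inside a worker JVM is the standard java.lang.management API. The following is only a sketch of that idea, not a Storm feature: the class name HeapSnapshot is hypothetical, and where the output goes (worker log, metrics system) is left open.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: one-line summary of heap usage and GC activity so far. */
final class HeapSnapshot {

    /** Bytes of heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Formats heap usage plus accumulated GC count and GC time across all collectors. */
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        long mb = 1024L * 1024L;
        return String.format("heapUsed=%dMB heapCommitted=%dMB gcCount=%d gcTime=%dms",
                heap.getUsed() / mb, heap.getCommitted() / mb, gcCount, gcMillis);
    }
}
```

Logging HeapSnapshot.describe() every few seconds from a background thread leaves a trail in the worker log that survives a kill -9: heap usage that keeps climbing after each GC hints at a leak, and a fast-growing gcTime hints at GC-induced timeouts.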

High CPU utilization - VB.NET

We are facing an issue with VB.NET listeners that use a lot of CPU (50% to 70%) on the server machine where they run. The listeners use threads, and we also use the FileSystemWatcher class to monitor file renames in one common location. Both are console applications run as scheduled jobs all day.
How can I control the CPU utilization with this FileSystemWatcher class?
This could all depend on the code you are running.
For instance, if you have a timer with an interval of 10 ms but only do real work every two minutes, and you do a lot of checking on each timer tick, this will take a lot of CPU to do nothing.
If you are using multiple threads and one is looping waiting for the second to release a lock (Monitor.TryEnter()) then again this may be taking up extra CPU. You can avoid this by putting the waiting thread into Monitor.Wait() and then when the busy thread is finished do Monitor.Pulse().
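The wait/pulse pattern is not specific to .NET. As an illustration only, here is the same idea in Java, where Object.wait() and notifyAll() play the roles of Monitor.Wait() and Monitor.Pulse(); the class Handoff is a hypothetical name, and this is a sketch of the pattern rather than a drop-in fix for the listeners in question.

```java
/** Hypothetical sketch: a waiting thread parks on a monitor instead of polling a lock. */
final class Handoff {
    private final Object lock = new Object();
    private boolean done;

    /** Called by the busy thread when its work is finished (the Pulse side). */
    void signalDone() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    /** Called by the waiting thread; it uses no CPU while parked (the Wait side). */
    void awaitDone() throws InterruptedException {
        synchronized (lock) {
            while (!done) {   // the loop guards against spurious wakeups
                lock.wait();  // releases the lock while waiting
            }
        }
    }

    /** Reports whether the busy thread has signalled completion. */
    boolean isDone() {
        synchronized (lock) {
            return done;
        }
    }
}
```

The waiting thread calls awaitDone() once and sleeps until signalDone() runs, instead of retrying a lock acquisition in a loop.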
Apart from the very general advice above, if you post the key parts of your code or profile results, we may be able to help more.
If you are looking for a profiler, we use RedGate's ANTS Profiler (paid, but with a free trial) and it gives good results. I haven't used any other to compare (and I am in no way affiliated with RedGate), so others may be better.