The "VM Periodic Task Thread" is run every 50 milliseconds, can I tune this? - jvm

On normal hardware today this likely never hurts, but on a Raspberry Pi it is a bit annoying that the CPU is woken up every 50 milliseconds even for a Java application that is currently doing absolutely nothing.
I verified with strace that the "VM Periodic Task Thread" is active every 50 milliseconds. A rough answer of what it does is given here, but can I tune the 50 milliseconds somehow?

Try setting -XX:PerfDataSamplingInterval=xxx; the default is 50, and the performance-data sampling matches the description you linked, so that might be it.
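For example (the interval is in milliseconds; the value 500 and the jar name below are just placeholders for illustration):
java -XX:PerfDataSamplingInterval=500 -jar MyApp.jar
If you do not need the jvmstat performance counters at all (they are what tools such as jps and jstat read), you could also try switching the feature off with -XX:-UsePerfData; whether that actually removes the periodic wakeups is something to verify with strace again.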

Related

Java daemon process performance testing not getting consistent results

I am trying to test the CPU consumption of an agent/daemon process written in Java. To avoid being skewed by garbage collection, I keep trying longer periods for each profiling run. In the beginning I tried 15 minutes, then later arrived at 2 hours. Yet I just found out that, even with 2-hour runs, I can get very inconsistent results: one 2-hour run gave me 6% CPU, another gave me 12%.
Any suggestions to get consistent results?
Are you controlling for CPU frequency? If there isn't much work to do, the OS (or the CPU itself) might reduce the clock frequency to save power. With a power-management policy that keeps the CPU at max speed whenever it's running at all, looking at CPU% can be meaningful.
On Linux on a Skylake or later CPU, you might set the EPP for each core to performance, to get it to run at max speed whenever it's running at all.
sudo sh -c 'for i in /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference;do echo performance > "$i";done'
Otherwise maybe measure in core clock cycles (like Linux perf stat java ...) instead of CPU %, or at least look at the average clock speed while it was running. (A lower clock speed relative to DRAM can skew things, since a cache miss then stalls for fewer core cycles.)
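For illustration, a perf invocation might look like this (a sketch; the event list is optional and MyApp.jar again stands in for your application):
perf stat -e task-clock,cycles,instructions java -jar MyApp.jar
perf stat also derives an average clock speed (GHz) from cycles and task-clock, which makes it easier to spot runs that executed at a lower frequency.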

How to reduce time taken on threads reaching Safepoint - Sync state

About the Issue:
During heavy I/O in the VM, we faced JVM pauses/slowness because stopping the threads took a long time. Looking at the safepoint logs showed that the Sync state takes the most time.
We also tried printing safepoint traces on a timeout delay (-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=200) to find out which threads are causing this issue, but nothing seems suspicious. Also, when setting a timeout for safepoints, we do not get the 'timeout detected' print when the time is spent in the 'Sync' state.
Questions about this safepoint tracing:
How does the safepoint timeout work?
After logging the thread details, does the safepoint end and do all threads resume?
Will that VM operation still be carried out? What will happen if the VM operation is a GC?
Using Async-profiler:
We tried time-to-safepoint profiling using async-profiler and noticed that the VM Thread spends most of its time in the SafepointSynchronize::begin() method, and that the C2 compiler threads take almost as much time as the VM Thread.
We suspect that the C2 compiler threads may be taking a long time to reach the safepoint. Can someone help us resolve this issue and interpret this time-to-safepoint flame graph? Thanks in advance.
The SafepointTimeout option affects nothing but logging: threads will not be interrupted, the VM operation will run normally, and so on.
SafepointTimeout does not always print the timed-out threads: a thread may already have reached the safepoint by the time printing occurs. Furthermore, SafepointTimeout may not even detect a timeout, if the entire process has been frozen by the operating system.
For example, such 'freezes' may happen
when a process has exhausted its CPU quota in a cgroup (container);
when a system is low on physical memory, and direct reclaim occurs;
due to the activity of another process (e.g. I observed long JVM pauses when the atop utility inspected the system).
async-profiler indeed has a time-to-safepoint profiling option (--ttsp), though using it correctly may seem tricky. It works best in wall-clock profiling mode with JFR output. In this configuration, async-profiler samples all threads (both running and blocked) during safepoint synchronization and records each individual event with a timestamp.
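For illustration, the invocation might look roughly like this (a sketch; the launcher script name and supported options vary between async-profiler versions, and the duration and output file name are arbitrary):
./profiler.sh -e wall --ttsp -f profile.jfr -d 60 <pid>
The .jfr extension of the output file (or an explicit -o jfr) selects JFR output.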
Such a profile can then be analyzed with JDK Mission Control: choose the time interval around the long pause and look at the stack traces of Java threads in this interval.
Note that if the JVM process is 'frozen', the async-profiler thread does not work either, i.e. you will not see collected samples during this period. Normally, in wall-clock profiling mode, all threads are sampled evenly. But if you see a 'gap' (missed events during some time interval), it apparently means the JVM process did not receive CPU time. In this case, the reason for the JVM pauses is not in the Java application, but rather in the operating system / environment.

What are some factors that could affect program runtime?

I'm doing some work on profiling the behavior of programs. One thing I would like to do is get the amount of time that a process has run on the CPU. I am accomplishing this by reading the sum_exec_runtime field in the Linux kernel's sched_entity data structure.
After testing this with some fairly simple programs which simply execute a loop and then exit, I am running into a peculiar issue: the program does not finish with the same runtime each time it is executed. Seeing as sum_exec_runtime is a value represented in nanoseconds, I would expect the value to differ by a few microseconds. However, I am seeing variations of several milliseconds.
My initial reaction was that this could be due to I/O waiting times; however, it is my understanding that the process should give up the CPU while waiting for I/O. Furthermore, my test programs are simply executing loops, so there should be very little to no I/O.
I am seeking any advice on the following:
Is sum_exec_runtime not the actual time that a process has had control of the CPU?
Does the process not actually give up the CPU while waiting for I/O?
Are there other factors that could affect the actual runtime of a process (besides I/O)?
Keep in mind, I am only trying to find the actual time that the process spent executing on the CPU. I do not care about the total execution time including sleeping or waiting to run.
Edit: I also want to make clear that there are no branches in my test program aside from the loop, which simply loops for a constant number of iterations.
Thanks.
Your question is really broad, but you can incur context switches for various reasons. Most system calls involve at least one context switch. Page faults cause context switches. Exceeding your time slice causes a context switch.
sum_exec_runtime is equal to utime + stime from /proc/$PID/stat, but sum_exec_runtime is measured in nanoseconds. It sounds like you only care about utime which is the time your process has been scheduled in user mode. See proc(5) for more details.
You can look at nr_switches, both voluntary and involuntary, which are also part of sched_entity. That will probably account for most of the variation, but I would not expect successive runs to be identical. The exact time that you get for each run will be affected by all of the other processes running on the system.
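For reference, here is a minimal Java sketch of reading these counters as exposed under /proc on Linux (assumptions: the usual USER_HZ of 100 clock ticks per second, and the class name and output format are just illustrative):

import java.nio.file.*;

public class ProcTimes {
    public static void main(String[] args) throws Exception {
        String pid = args.length > 0 ? args[0] : "self";

        // /proc/<pid>/stat: utime and stime are fields 14 and 15 (1-based).
        // The comm field (field 2) may contain spaces, so parse from the last ')'.
        String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
        String afterComm = stat.substring(stat.lastIndexOf(')') + 2); // starts at field 3 ("state")
        String[] f = afterComm.split(" ");
        long utimeTicks = Long.parseLong(f[11]); // field 14: user-mode time in clock ticks
        long stimeTicks = Long.parseLong(f[12]); // field 15: kernel-mode time in clock ticks
        long clkTck = 100; // assumption: USER_HZ is almost always 100; check `getconf CLK_TCK`
        System.out.printf("utime=%.2fs stime=%.2fs%n",
                (double) utimeTicks / clkTck, (double) stimeTicks / clkTck);

        // /proc/<pid>/status exposes the voluntary/involuntary context-switch counts.
        for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/status"))) {
            if (line.startsWith("voluntary_ctxt_switches")
                    || line.startsWith("nonvoluntary_ctxt_switches")) {
                System.out.println(line);
            }
        }
    }
}

Comparing the context-switch counts across runs gives a quick indication of how differently the scheduler treated each run.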
You'll also be affected by the amount of file system cache used on your system and how many file system cache hits you get in successive runs if you are doing any IO at all.
To give a very concrete and obvious example of how other processes can affect the run time of the current process, consider what happens if you exceed your physical RAM constraints. If your program asks for more RAM, then the kernel is going to spend more time swapping. That time spent swapping will be accounted for in stime, but it will vary depending on how much RAM you need and how much RAM is available. There are lots of other ways that other processes can affect your process's run time; this is just one example.
To answer your 3 points:
sum_exec_runtime is the actual time the scheduler ran the process, including system time.
If you count switching into the kernel as the process giving up the CPU, then yes; but that does not necessarily mean a different user process runs next, and your process may get the CPU back once the kernel is done.
I think I've already answered this above: there are lots of factors.

Optimum thread count NServiceBus

We're trying to figure out the optimum number of threads to use for our NServiceBus service. We're running it on a machine with 2 quad cores. We've been having problems with the queue backing up. We started with 100 threads, then bumped it to 200, and things got worse. We backed it down to 75, then 50, and it seemed even better. Is there some optimal number based on how many CPUs we have, or some rule of thumb that we should use to determine the number of threads to run?
Every thread you have running has an overhead attached to it. If you have 2 quad cores then you will be able to have exactly 8 threads running at any one time. Each thread will be consuming a core.
If you have more than 8 threads then there is a chance you will start to do LESS useful work, not more. This is because every time Windows decides to give one of the threads not currently consuming a core a turn at doing something, it needs to store the state of one of the running threads and then restore the old state of the thread that is about to run, and only then let that thread go at it. If you have a huge number of threads, you're going to spend a large amount of time just switching between them and doing nothing useful.
If you have a bunch of threads that are blocked waiting for IO (for instance, for a message to finish being written to disk so it can be read), then you might be able to run more threads than you have cores and still get something useful done, as a number of those threads will be sitting waiting for something else to complete. It's a complex subject and there is no real answer to 'how many threads should I use'. A good rule of thumb is to have a thread for every core, and then experiment a bit if you want to achieve more throughput. Testing it under real conditions is the only real way to find the sweet spot. You might find that you only need one thread to process the messages and that half the time that thread is blocked waiting for a message to come in....
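One common way to make that experimentation slightly more quantitative is the well-known sizing heuristic threads ≈ cores × (1 + wait time / compute time). This is a general rule of thumb, not anything NServiceBus-specific, and the wait/compute figures below are hypothetical placeholders you would replace with measurements of your own handlers:

// Rough thread-count heuristic: threads = cores * (1 + waitTime / computeTime).
public class ThreadCountEstimate {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors(); // e.g. 8 on two quad-core CPUs
        double waitMillisPerMessage = 40.0;    // time blocked on I/O per message (hypothetical)
        double computeMillisPerMessage = 10.0; // time actually on the CPU per message (hypothetical)
        int threads = (int) Math.ceil(cores * (1 + waitMillisPerMessage / computeMillisPerMessage));
        System.out.println("Suggested worker threads: " + threads); // 8 * (1 + 4) = 40
    }
}

If your handlers really do spend most of their time waiting, a figure in the tens rather than the hundreds falls out naturally, which would be consistent with 50-75 threads behaving better for you than 100-200.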
Obviously, even what I've described is oversimplified. Windows needs access to the cores to do OS-level things, so even if you have 8 cores, all 8 of your threads won't always be running because the Windows threads are having a turn... and then you have IO threads, etc.

High CPU utilization - VB.NET

We are facing an issue with VB.NET listeners that use a lot of CPU (50% to 70%) on the server machine where they run. The listeners use threading, and we also used the FileSystemWatcher class to keep monitoring file renames in one common location. Both are console applications and scheduled jobs that run all day.
How can I control the CPU utilization with this FileSystemWatcher class?
This could all depend on the code you are running.
For instance, if you have a timer with an interval of 10 ms but only do work every two minutes, and on each timer tick you do a lot of checking, this will use a lot of CPU to do nothing.
If you are using multiple threads and one is looping while waiting for the second to release a lock (Monitor.TryEnter()), then again this may be taking up extra CPU. You can avoid this by putting the waiting thread into Monitor.Wait() and then, when the busy thread is finished, calling Monitor.Pulse().
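To make the difference concrete, here is the same idea sketched in Java, where Object.wait()/notify() play the role of Monitor.Wait()/Monitor.Pulse(); the class and method names are invented purely for illustration, not taken from your code:

// Sketch: a blocking wait/notify handshake instead of spinning on a lock.
public class WorkQueue {
    private final Object lock = new Object();
    private boolean workReady = false;

    // Consumer: sleeps inside wait() and uses no CPU until it is signalled.
    public void awaitWork() throws InterruptedException {
        synchronized (lock) {
            while (!workReady) {
                lock.wait();   // releases the lock and blocks; no busy loop
            }
            workReady = false;
        }
    }

    // Producer: signals the waiting thread when work is available.
    public void signalWork() {
        synchronized (lock) {
            workReady = true;
            lock.notify();     // the .NET analogue is Monitor.Pulse()
        }
    }
}

The waiting thread consumes no CPU while blocked, which is exactly what a TryEnter()-style polling loop fails to achieve.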
Apart from the very general advice above, if you post the key parts of your code or your profiling results, we may be able to help more.
If you are looking for a profiler, we use Red Gate's ANTS Profiler (it costs money, but there is a free trial) and it gives good results. I haven't used any others to compare (and I am in no way affiliated with Red Gate), so others may be better.