JVM getting colder after initial warmup

We are trying to build a low latency system using Java.
As part of fine tuning, we do an initial warm-up of the application during startup so that hot methods go above the default compile threshold of 10000 invocations. Live processing latency immediately after the warm-up looks good, and it stays good while the system is continuously processing events. But when the system goes back to an idle state, with no events arriving for 10 minutes, latency starts degrading.
I initially thought it could be the JVM evicting the generated code from the code cache. I set a larger code cache size and enabled code cache events, and noticed that the code cache does not even reach 50% capacity.
Any idea what the issue could be? Basically, the initial warm-up done at startup goes in vain once the system sits idle for even a short duration.
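For what it's worth, the "not even 50%" observation can be cross-checked from inside the process; here is a minimal sketch, assuming a HotSpot JVM where the JIT code cache is exposed as the "Code Cache" / "CodeHeap ..." memory pools:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class CodeCacheProbe {
        public static void main(String[] args) {
            // HotSpot exposes the JIT code cache as one or more memory pools
            // ("Code Cache", or the segmented "CodeHeap ..." pools on newer JVMs).
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getName().contains("Code")) {
                    MemoryUsage usage = pool.getUsage();
                    System.out.printf("%s: used=%d KB, committed=%d KB, max=%d KB%n",
                            pool.getName(), usage.getUsed() / 1024,
                            usage.getCommitted() / 1024, usage.getMax() / 1024);
                }
            }
        }
    }

Sampling this periodically (or logging with -XX:+PrintCodeCache at exit) gives a view of whether compiled code is actually being flushed during the idle period.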

Related

What is the maximum thread capacity of JMeter in command-line mode?

I have to run a load test for 32000 users with a duration of 15 minutes, and I am running it in command-line mode with threads = 300, ramp-up = 100, loop = 1. But after showing some data it freezes, so I can't get the full HTML report. I can't even run it for 50 users. How can I get past this? Please let me know.
Anywhere from 0 to 2147483647 threads, depending on various factors including but not limited to:
Hardware specifications of the machine where you run JMeter
Operating system limitations (if any) of the machine where you run JMeter
JMeter Configuration
The nature of your test (protocol(s) in use, the size of request/response, presence of pre/post processors, assertions and listeners)
Application response time
Phase of the moon
etc.
There is no answer like "on my MacBook I can have about 3000 threads", as it varies from test to test: for GET requests returning a small amount of data the number will be higher; for POST requests uploading huge files and getting huge responses it will be lower.
The approach is the following:
Make sure to follow JMeter Best Practices
Set up monitoring of the machine where you run JMeter (CPU, RAM, swap usage, etc.); if you don't have a better idea you can go for the JMeter PerfMon Plugin
Start your test with 1 user and gradually increase the load while keeping an eye on resource consumption
When consumption of any monitored resource starts exceeding a reasonable threshold, e.g. 80% of maximum available capacity, stop your test and see how many users were online at that point. That is how many users you can simulate from this particular machine for this particular test (see the sketch after this list).
For another machine or another test, repeat from the beginning.
Most probably for 32000 users you will have to go for distributed testing
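To make the 80% rule from the list above concrete, here is a rough stand-alone sketch of a load watcher for the generator machine. It is only an illustration; the PerfMon Plugin mentioned above already does this properly, and getSystemCpuLoad is a HotSpot-specific com.sun.management extension:

    import com.sun.management.OperatingSystemMXBean;
    import java.lang.management.ManagementFactory;

    public class LoadWatcher {
        public static void main(String[] args) throws InterruptedException {
            // The com.sun.management extension exposes a system CPU load in [0.0, 1.0].
            OperatingSystemMXBean os = (OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();
            double threshold = 0.80; // the "reasonable threshold" from the answer above
            while (true) {
                double cpu = os.getSystemCpuLoad(); // negative if not yet available
                if (cpu >= 0) {
                    System.out.printf("system CPU load: %.0f%%%n", cpu * 100);
                    if (cpu > threshold) {
                        System.out.println("Threshold exceeded - note the current number of virtual users.");
                    }
                }
                Thread.sleep(5000);
            }
        }
    }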
If your test "hangs" even for a smaller number of users (300 can usually be simulated even with default JMeter settings, maybe even in GUI mode):
take a look at the jmeter.log file
take a thread dump and see what threads are doing (a programmatic sketch follows below)
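A thread dump is normally taken from outside the JVM (for example with jstack <pid> or jcmd <pid> Thread.print), but the same information can be printed programmatically from within a process; a minimal sketch:

    import java.util.Map;

    public class ThreadDump {
        public static void main(String[] args) {
            // Prints a stack trace for every live thread in this JVM,
            // similar in spirit to what jstack produces for an external process.
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                Thread t = entry.getKey();
                System.out.printf("\"%s\" state=%s%n", t.getName(), t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
                System.out.println();
            }
        }
    }

Threads stuck in socket reads or waiting on a listener usually stand out immediately in such a dump.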

WebLogic active thread count does not increase under high load

I am doing a load test on a WebLogic server.
When I fire a relatively high load at the server, the active thread count stops at 40-50 and never increases again.
When I fire a relatively low load at the server, the active thread count keeps increasing until it reaches the limit.
The Oracle docs say that as throughput increases, the active thread count will increase.
If there is a high load, how can the throughput increase when the active thread count never increases?

Behaviour of redis client-output-buffer-limit during resynchronization

I'm assuming that during replica resynchronisation (full or partial), the master will attempt to send data as fast as possible to the replica. Wouldn't this mean the replica output buffer on the master would rapidly fill up since the speed the master can write is likely to be faster than the throughput of the network? If I have client-output-buffer-limit set for replicas, wouldn't the master end up closing the connection before the resynchronisation can complete?
Yes, the Redis master will close the connection and the synchronization will be started from the beginning again. But please find some details below:
Do you need to touch this configuration parameter and what is the purpose/benefit/cost of it?
There is an almost zero chance it will happen with the default configuration on reasonably modern hardware.
"By default normal clients are not limited because they don't receive data
without asking (in a push way), but just after a request, so only asynchronous clients may create a scenario where data is requested faster than it can read." - the chunk from documentation .
Even if that happens, the replication will be restarted from the beginning, but it may lead to an effectively infinite loop in which replicas continuously ask for synchronization over and over. Each time, the Redis master will need to fork and take a whole memory snapshot (perform a BGSAVE), and may use up to 3 times the RAM of the initial snapshot size during synchronization. That will cause higher CPU utilization, memory spikes, network utilization and IO.
General recommendations to avoid production issues when tweaking this configuration parameter:
Don't decrease this buffer, and before increasing its size make sure you have enough memory on your box.
Budget the total amount of RAM as the snapshot memory size (doubled for the copy-on-write BGSAVE process) plus the size of any other configured buffers plus some extra capacity; a rough illustration of that arithmetic follows below.
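As a hypothetical illustration of that sizing rule (all the numbers below are made up; the doubling reflects the copy-on-write BGSAVE worst case mentioned above):

    public class ReplicaBufferBudget {
        public static void main(String[] args) {
            // Hypothetical figures - replace with your own measurements.
            long datasetGb = 8;          // current Redis dataset (snapshot) size
            long replicaHardLimitGb = 2; // client-output-buffer-limit hard limit for replicas
            long otherBuffersGb = 1;     // normal/pubsub client buffers, AOF buffer, etc.
            long headroomGb = 2;         // extra capacity for spikes

            // Rule of thumb from the recommendation above:
            // dataset doubled for the copy-on-write fork, plus buffers, plus headroom.
            long requiredRamGb = datasetGb * 2 + replicaHardLimitGb + otherBuffersGb + headroomGb;
            System.out.println("Plan for at least " + requiredRamGb + " GB of RAM on the master");
        }
    }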
Please find more details here

Google Compute Engine VM constantly crashes

On a Compute Engine VM in us-west-1b, I run 16 vCPUs at close to 99% usage. After a few hours, the VM crashes on its own. This is not a one-time incident, and I have to restart the VM manually.
There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.
There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?
How do I prevent VMs from crashing?
CPU usage graph
This could be your process manager saying that your processes are out of resources. You might want to look into kernel tuning, where you can increase the limits on the number of active processes on your VM/OS and the resources they may use. Or you can try using a bigger machine with more physical resources. In short, your machine is falling short on resources, and in order to keep the OS up the process manager shuts down processes; SSH is one of those processes. Once you reset the machine, everything comes back to normal.
How the process manager/kernel decides to kill a process varies in many ways. It could simply be that a process has stayed up long enough to consume too many resources. Also, one thing to note is that the OS images you use to create a VM on GCP are custom-hardened by Google to limit the malicious capabilities of processes running on such machines.
One of the best ways to tackle this is:
increase the resources of your VM
then go back to your code and find out whether something is leaking processes or memory
if all else fails, you might want to do some kernel tuning to make sure your processes have higher priority than other system processes, though this is a bad idea since you could end up creating a zombie VM (a small probe of the current limits is sketched below)
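Before (or instead of) tuning anything, it can help to confirm what the current kernel limits and free memory actually are; a minimal Linux-only sketch (Java 11+, reading paths under /proc):

    import java.nio.file.Files;
    import java.nio.file.Path;

    public class LimitsProbe {
        public static void main(String[] args) throws Exception {
            // A few kernel limits worth checking before tuning anything. Linux-only paths.
            for (String file : new String[] {
                    "/proc/sys/kernel/pid_max",     // max number of process IDs
                    "/proc/sys/kernel/threads-max", // max number of threads system-wide
                    "/proc/sys/vm/overcommit_memory"
            }) {
                System.out.println(file + " = " + Files.readString(Path.of(file)).trim());
            }
            // MemAvailable is a good first hint of whether the box is simply out of memory.
            Files.lines(Path.of("/proc/meminfo"))
                 .filter(line -> line.startsWith("MemTotal") || line.startsWith("MemAvailable"))
                 .forEach(System.out::println);
        }
    }

If memory is the bottleneck, the kernel log (dmesg) will usually also show OOM-killer entries around the time the VM became unreachable.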

Dashboard shows close to 100% CPU load, but actual usage is different

During peak hours of our service, I notice the CPU load goes up to 100%. However, when I SSH into the machine and use top or htop, I don't ever see the CPU usage go above 25%. This instance is a dedicated load balancer running HAProxy.
Here is a screenshot of the Dashboard: https://gyazo.com/010715208f81ec97c1bc9b78123fe4d4
Here is a screenshot of top in the instance: https://gyazo.com/afa7098331f7a3f66c018041b9be686d
During these peak times I noticed some latency, and it is not caused by the database or my other server instances, as I checked their loads and they were far below the threshold. I was wondering whether Compute Engine is throttling the instance because it mistakenly thinks it is at 100% CPU usage, or something like that?
Has anyone had a similar experience?