Why is Tomcat not scaling throughput with increased concurrent load? - tomcat8

For this test, I have a simple Java servlet that reads data in and calculates the CRC32 for it. When making serial requests of 512MB each, I get about 600MB/sec. That makes sense since I can't use all 24 cores available to me to calculate a CRC. The program driving this I/O is sitting on the local box to eliminate the possibility of networking issues. I am running Tomcat 8.0.24.0 on FreeBSD using OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode).
Next, I attempt the same test with 6 concurrent requests, expecting that the performance per request might be lower than 600MB/sec, but that the aggregate performance across all 6 requests would be significantly higher.
What I see is the CPU has some idle time at ALL times (so it doesn't appear that I'm CPU-bound). I also see that all processing threads in Tomcat are running concurrently as anticipated. However, it looks like I'm only getting around 800MB/sec in aggregate. The threads in Tomcat spend most of their time waiting to read from the socket, as shown below.
I would appreciate any thoughts on how to improve Tomcat throughput / why so much time is spent waiting for more data (which I assume is what's going on below).
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1386)
at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1388)
at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:251)
at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:232)
at org.apache.coyote.http11.InternalNioInputBuffer.fill(InternalNioInputBuffer.java:133)
at org.apache.coyote.http11.InternalNioInputBuffer$SocketInputBuffer.doRead(InternalNioInputBuffer.java:177)
at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:110)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:416)
at org.apache.coyote.Request.doRead(Request.java:469)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:342)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:395)
at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:367)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:190)
...
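One way to narrow this down is to measure the CRC loop on its own, away from the socket. The sketch below is a hypothetical stand-in for the servlet's read loop (the class name, method name, and buffer size are assumptions, not the original code). Feeding it an in-memory stream gives a rough upper bound for one thread's checksum rate, which can then be compared with the rate seen when the same loop reads from `request.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class StreamCrc {
    // Reads the stream to EOF in chunks of at most bufferSize bytes
    // and returns the CRC32 of everything that was read.
    static long crc32Of(InputStream in, int bufferSize) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[bufferSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // In-memory input: no socket involved, so this isolates the CRC cost.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        long crc = crc32Of(new ByteArrayInputStream(data), 64 * 1024);
        System.out.printf("crc32=%08x%n", crc);
    }
}
```

If the in-memory rate is far above 600MB/sec per thread, the time is going into waiting for data rather than into the checksum, which would be consistent with the stack trace above.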

Related

Thousands of TimeoutExceptions after switching to Redis Enterprise

We recently attempted to switch from Azure Redis to Redis Enterprise. Unfortunately, after about an hour we were forced to roll back due to performance issues. We're looking for advice on how to get to the root cause and proceed. Here's what I've figured out so far, but I'm happy to add any more details as necessary.
First off, the client is a .NET Framework app using StackExchange.Redis version 2.1.30. The Azure Redis instance is using 4 shards, and the Redis Enterprise instance is also configured for 4 shards.
When we switched over to Redis Enterprise, we would immediately see several thousand of these exceptions per 5 minute interval:
Timeout performing GET (5000ms), next: GET [Challenges]::306331, inst:
1, qu: 0, qs: 3079, aw: False, rs: ReadAsync, ws: Idle, in: 0,
serverEndpoint: xxxxxxx:17142, mc: 1/1/0, mgr: 9 of 10 available,
clientName: API, IOCP: (Busy=2,Free=998,Min=400,Max=1000), WORKER:
(Busy=112,Free=32655,Min=2000,Max=32767), Local-CPU: 4.5%, v:
2.1.30.38891 (Please take a look at this article for some common client-side issues that can cause timeouts:
https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Looking at this error message, it appears there are tons of things in the WORKER thread pool (things waiting on a response from Redis Enterprise), but nearly nothing in the IOCP thread pool (responses from Redis waiting to be processed by our client code). So, there seems to be some sort of bottleneck on the Redis side.
Using AppInsights, I created a graph of the busy worker threads (dark blue), busy IO threads (red), and CPU usage (light blue). We see something like this:
The CPU never really goes above 20% or so, the IO threads are barely a blip (I think the max is like 2 busy), but the worker threads kinda grow and grow until eventually everything times out and the process starts over again. A little after 7pm is when we decided to roll back to Azure Redis, so everything is great at that point. So, everything points to Redis being some sort of bottleneck. So, let's look at the Redis side of things.
During this time, Redis reported a max of around 5% CPU usage. Incoming traffic topped out around 1.4MB/s, and outgoing traffic topped out around 9.5MB/s. Ops/sec were around 4k. Latency around this time was 0.05ms, and the slowest thing in the SLOWLOG was like 15ms or so. In other words, the Redis Enterprise node was barely breaking a sweat and was easily able to keep up with the traffic being sent to it. In fact, we had 4 other nodes in the cluster that weren't even being used since Redis didn't even see the need to send anything to other nodes. Redis was basically just yawning.
From here, I was thinking maybe there were network bandwidth constraints. All of our VMs are configured for accelerated networking, and we should have 10gig connections to these machines. I decided to run an iperf between the client and the server:
I can transfer easily over 700Mbit/sec between the client and the Redis Enterprise server, yet the server is processing 9.5MB/sec easily. So, it doesn't appear the problem is network bandwidth.
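To make the unit comparison explicit (the iperf figure is in megabits, the Redis figure in megabytes), here is a small sketch of the arithmetic. The class and method names are made up for illustration; the only inputs are the 9.5MB/s and 700Mbit/sec figures quoted above.

```java
final class BandwidthCheck {
    // Converts megabytes per second to megabits per second (1 byte = 8 bits).
    static double mbytesToMbits(double mbytesPerSec) {
        return mbytesPerSec * 8.0;
    }

    // Fraction of the measured link throughput used by the observed traffic.
    static double utilisation(double trafficMbit, double linkMbit) {
        return trafficMbit / linkMbit;
    }

    public static void main(String[] args) {
        double outgoingMbit = mbytesToMbits(9.5);           // Redis outgoing traffic
        double used = utilisation(outgoingMbit, 700.0);     // against the iperf result
        System.out.printf("outgoing=%.1f Mbit/s, utilisation=%.1f%%%n",
                outgoingMbit, used * 100);
    }
}
```

So the outgoing traffic is roughly a tenth of what iperf showed the link can carry.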
So, here's where we stand:
The same code works great with Azure Redis, yet causes thousands of timeouts when we switch over to Redis Enterprise.
Redis Enterprise is handling 4,000 operations per second and sending out 9 megs a second, and can usually handle a single operation in a fraction of a ms, with the very longest being 15ms.
I can send 700+ Mb/sec between the client and server.
Yet, the WORKER thread pool builds up with pending requests to Redis and eventually times out.
I'm pretty stuck here. What's a good next step to diagnose this issue? Thanks!

RabbitMQ poor performance

We are facing bad performance in our RabbitMQ clusters, even when idle.
Once we installed the rabbitmq-top plugin, we saw many processes with very high reductions/sec: 100k and more!
Questions:
What does it mean?
How to control it?
What might be causing such slowness without any errors?
Notes:
Our clusters are running on Kubernetes 1.15.11
We allocated 3 nodes, each with 8 CPU and 8 GB limits. Set vm_watermark to 7G. Actual usage is ~1.5 CPU and 1 GB RAM
RabbitMQ 3.8.2. Erlang 22.1
We don't have many consumers and producers. The slowness is also on a fairly idle environment
The rabbitmqctl status is very slow to return details (sometimes 2 minutes) but does not show any errors
After some more investigation, we found the actual reason was made up of two issues.
The RabbitMQ (Erlang) runtime configuration by default (using the bitnami helm chart) assigns only a single scheduler. This is fine for a simple app with a few concurrent connections. A production-grade setup with 1000s of connections has to use many more schedulers. Bumping up from 1 to 8 schedulers improved throughput dramatically.
Our monitoring was hammering RabbitMQ with a lot of requests per second (about 100/sec). The monitoring hits the aliveness-test, which creates a connection, declares a queue (not mirrored), publishes a message and then consumes that message. Disabling the monitoring reduced load dramatically: an 80%-90% drop in CPU usage, and the reductions/sec also dropped by about 90%.
References
Performance:
https://www.rabbitmq.com/runtime.html#scheduling
https://www.rabbitmq.com/blog/2020/06/04/how-to-run-benchmarks/
https://www.rabbitmq.com/blog/2020/08/10/deploying-rabbitmq-to-kubernetes-whats-involved/
https://www.rabbitmq.com/runtime.html#cpu-reduce-idle-usage
Monitoring:
http://rabbitmq.1065348.n5.nabble.com/RabbitMQ-API-aliveness-test-td32723.html
https://groups.google.com/forum/#!topic/rabbitmq-users/9pOeHlhQoHA
https://www.rabbitmq.com/monitoring.html

What causes stuck threads other than contention, slow IO, and slow backends (DB queries, web services, RMI calls)?

I am trying to figure out the main reasons for stuck threads. WebLogic Server diagnoses a thread as stuck if it has been continually working (not idle) for a set period of time. A user can tune a server's stuck-thread detection behavior by changing the length of time before a thread is diagnosed as stuck (Stuck Thread Max Time), and by changing the frequency with which the server checks for stuck threads. My analysis is that stuck threads are caused either by contention or by other things such as slow IO or slow backends (DB queries, web services, RMI calls); more rarely they are caused by bad coding (infinite loops) or huge data.
Other than the reasons above, are there more reasons for a thread to get stuck?
I'm not sure what your question is here, but here's my 2 cents.
Bad coding can lead to stuck threads.
Say a developer uses a singleton map or hash that all servlets need to access. Under high load this can lead to contention for that resource, and easily to stuck threads.
Stuck threads can be caused by a slow-running server (high CPU).
Sometimes bugs in WLS can cause it to be busy with internal processes, resulting in stuck threads, for example WLS stuck in cluster communication.
You can even have stuck threads when the Admin server is waiting to hear from a managed server that failed.
The list can go on and on. Only by taking 3-4 thread dumps in a short span of time can one confirm the cause.
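As an illustration of the shared singleton map case mentioned above, here is a minimal sketch, assuming the shared structure is a simple per-key counter (the class and method names are hypothetical). It uses `ConcurrentHashMap.merge`, which updates a key atomically without one lock around the whole map, so request threads touching different keys are less likely to queue behind each other than with a single `synchronized` block or `Collections.synchronizedMap`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedCounters {
    // Shared registry touched by every request thread.
    private static final Map<String, Long> HITS = new ConcurrentHashMap<>();

    // Atomically increments the counter for key; no external locking needed.
    static void record(String key) {
        HITS.merge(key, 1L, Long::sum);
    }

    static long count(String key) {
        return HITS.getOrDefault(key, 0L);
    }

    // Simulates concurrent request threads all updating the same key,
    // then returns the final count once every worker has finished.
    static long hammer(String key, int threads, int perThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    record(key);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return count(key);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(hammer("/orders", 8, 1000));
    }
}
```

This only addresses contention on the map itself. If the work done while holding the shared resource is slow (for example a backend call), threads can still pile up, and thread dumps remain the way to confirm where they are waiting.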

Optimise play framework instance for 64mb server

I am trying to get the best optimisation for my Play framework server.
I tried to optimize with:
%prod.jvm.memory=-server -Xms64m -Xmx128m -Xoptimize
# Jobs executor
# ~~~~~~
# Size of the Jobs pool
play.jobs.pool=2
# Execution pool
# ~~~~~
# Default to 1 thread in DEV mode or (nb processors + 1) threads in PROD mode.
# Try to keep a low as possible. 1 thread will serialize all requests (very useful for debugging purpose)
play.pool=5
However, I have not managed to get good performance on a 256 MB server. It seems
that http://www.playframework.org/ runs on a 64 MB server and works
fine. How is that possible? Have I missed something in the optimization?
What do you mean by a 256 MB server? If 256 MB is all the RAM of your server, it is not enough.
When you use -Xmx64M you set a maximum limit for your heap size, but Java also needs memory outside the heap for native code, class loading, and threads.
You also need memory for your OS.
From my experience, 256 MB is the lower limit for one Java process.
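To see the gap between the heap limit and what the process actually needs, a rough sketch like the following can be run inside the application. It uses the standard `java.lang.management` beans; the class name is hypothetical, and the numbers it reports are only part of the picture, since thread stacks and other native allocations are not included in the non-heap figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class JvmFootprint {
    // Upper bound for the heap, as set by -Xmx (or the JVM default).
    static long heapMaxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Memory used outside the heap by the JVM itself (class metadata,
    // compiled code and so on). Thread stacks are NOT counted here.
    static long nonHeapUsedBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        return mem.getNonHeapMemoryUsage().getUsed();
    }

    // Each live thread also reserves its own stack on top of the above.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.printf("heap max: %d MB%n", heapMaxBytes() / (1024 * 1024));
        System.out.printf("non-heap used: %d MB%n", nonHeapUsedBytes() / (1024 * 1024));
        System.out.printf("live threads: %d%n", liveThreads());
    }
}
```

Adding the heap limit, the non-heap usage, and a stack per thread gives a lower bound for the process size, which is why the heap setting alone understates what the server needs.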
There could be many reasons external to Play that impact performance:
Server too busy (too many processes competing for CPU)
Not enough RAM and server doing Swapping (performance killer)
Slow connection that adds extra delay
You may also have issues in your application:
- Your application is getting too many requests and it requires more RAM to manage the clients
- You are creating too many objects in memory while processing requests, taking most of the RAM (and triggering many GC)
- Connection to database is slow and delays responses
To be honest, there are many reasons why your app may be slow, many related to your implementation or the server. You'll need to monitor and find the issue yourself (or give us much more data on server performance, RAM, swap, I/O, your code, etc.)

Apache KeepAlive on API Server

I have a LAMP server (Quad Core Debian with 4GB RAM, Apache 2.2 and PHP 5.3) with Rackspace which is used as an API Server. I would like to know what is the best KeepAlive option for Apache given our setup.
The API server hosts a single PHP file which responds with plain JSON. This is a fairly hefty file which performs some MySql reads/writes and quite a few Memcache lookups.
We have about 90 clients that are logged into the system at any one time.
Roughly 1/3rd of clients would be idle.
Of the active clients (roughly 60) they send a request to the API every 3 seconds.
Clients switch from active to idle and vice versa every 15 or 20 minutes or so.
With KeepAlive On, the server goes nuts and memory peaks at close to 4GB (swap is engaged etc).
With KeepAlive Off, the memory sits at 3GB; however, I notice that Apache is constantly killing and creating new processes to handle each connection.
So, my three options are:
KeepAlive On and KeepAliveTimeout Default - In this case I guess I will just need to get more RAM.
KeepAlive On and KeepAliveTimeout Low (perhaps 10 seconds?) If KeepAliveTimeout is set at 10 seconds, will a client maintain a constant connection to that one process by accessing the resource at regular 3 second intervals? When that client becomes idle for longer than 10 seconds will the process then be killed? If so I guess option 2 looks like the best one to go for?
KeepAlive Off This is clearly best for RAM, but will it have an impact on the response times due to the work involved in setting up a new process for each request?
Which option is best?
It looks like your php script is leaking memory. Before making them long running processes you should get to grips with that.
If you do not have a good idea of the memory usage per request, and of how it changes from request to request, adding memory is not a real solution. It might help for now and break again next week.
I would keep running separate processes till memory management is under control. If you have response problems currently, your best bet is to add another server to spread the load.
The very first thing you should be checking is whether the clients are actually using the keepalive functionality at all. I'm not sure what you mean by an 'API server', but if it's some sort of web service then (IME) it's rather difficult to implement well-behaved clients using keepalives. (See the %k directive for mod_log_config.)
Also, we really need to know what your objectives and constraints are. Performance / capacity / low cost?
Is this running over HTTP or HTTPS - there's a big difference in latency.
I'd have said that a keepalive time of 10 seconds is ridiculously high - not low at all.
Even if you've got 90 clients holding connections open, 4GB seems a rather large amount of memory for them to be using - I've run systems with 150-200 concurrent connections to complex PHP scripts using approx 0.5GB over resting usage. Your figures of 250 + 90 x 20M only give you a footprint of about 2GB (I know it's not that simple - but it's not much more complicated).
For the figures you've given I wouldn't expect any benefit - but a significantly bigger memory footprint - from using anything over 5 seconds for the keepalive. You could probably use a keepalive time of 2 seconds without any significant loss of throughput, but there's no substitute for measuring the effectiveness of various configs - and analysing the data to find the optimal config.
Certainly if you find that your clients are able to take advantage of keepalives and get a measurable benefit from doing so then you need to find the best way of accommodating that. Using a threaded server might help a little with memory usage, but you'll probably find a lot more benefit in running a reverse proxy in front of the webserver - particularly with SSL.
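The footprint estimate above can be written out as a small calculation. This is a sketch under the assumption that the figures mean a 250MB base, 90 Apache processes (one per connected client), and 20MB per process; the class and method names are made up for illustration.

```java
final class ApacheFootprint {
    // Simple linear model: fixed base usage plus a constant cost per process.
    static long estimateMb(long baseMb, int processes, long perProcessMb) {
        return baseMb + (long) processes * perProcessMb;
    }

    public static void main(String[] args) {
        long totalMb = estimateMb(250, 90, 20);
        System.out.println(totalMb + " MB");
    }
}
```

The same function can be reused with a measured per-process size to pick a process limit that fits in RAM, instead of letting the server grow into swap.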
Besides that you may get significant benefits through normal tuning - code profiling, output compression etc.
Instead of managing the KeepAlive settings, which clearly have no real advantage in your particular situation between the 3 options, you should consider switching Apache to an event-based or thread-based MPM, where you could easily use KeepAlive On and set the Timeout value high.
I would go as far as also considering the switch to Apache on Windows. The benefit here is that its MPM is completely thread based and takes advantage of Windows' preference for threads over processes. You can easily do 512 threads with KeepAlive On and a Timeout of 3-10 seconds on 1-2GB of RAM.
WampDeveloper Pro
Xampp
WampServer
Otherwise, your only other options are to switch MPM from Prefork to Worker...
http://httpd.apache.org/docs/2.2/mod/worker.html
Or to Event (which also got better with Apache 2.4)...
http://httpd.apache.org/docs/2.2/mod/event.html