Getting total number of requests per minute from GlassFish using the Nagios check_jmx4perl plugin

I have a Nagios monitoring system running and gathering details from MySQL, GlassFish memory footprints, etc.
Now I am trying to fetch the total number of requests that GlassFish handles per minute and the average response time of those requests.
I have the Jolokia agent installed in the GlassFish cluster and am already getting memory and other stats.
But I have no idea how to fetch the request/response data from GlassFish
using the Nagios check_jmx4perl plugin.
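For context, a minimal sketch (not the check_jmx4perl plugin itself) of what reading a single MBean attribute through the Jolokia agent looks like from Java. The agent URL is an assumption, and the GlassFish request-count MBean name varies by version and would have to be discovered first (for example via the agent's /list operation), so a well-known JVM memory MBean is used as the placeholder read here; check_jmx4perl talks to the same agent, and once the right MBean and attribute are known its delta handling can turn a raw counter into a per-interval rate.

// A minimal sketch: read one MBean attribute over Jolokia's REST "read" protocol.
// The agent URL below is an assumption; substitute the GlassFish HTTP-service
// request-count MBean (discovered e.g. via the agent's /list endpoint) for the
// placeholder java.lang Memory MBean used here.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JolokiaReadExample {
    public static void main(String[] args) throws Exception {
        String agent = "http://glassfish-host:8080/jolokia";  // assumption
        String mbean = "java.lang:type=Memory";               // placeholder MBean
        String attribute = "HeapMemoryUsage";                 // placeholder attribute

        // Jolokia GET read protocol: <agent>/read/<mbean>/<attribute>
        URL url = new URL(agent + "/read/" + mbean + "/" + attribute);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON payload with the attribute value
            }
        }
    }
}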

After searching for an hour, I came across another tool that can monitor the total number of requests per minute in GlassFish, Tomcat, etc. It has many features beyond request and response stats.
You can keep using Nagios to monitor memory footprints and other Java heap stats, but to monitor requests, responses, throughput, and so on, use the tool below.
It's JavaMelody:
https://github.com/javamelody/javamelody/wiki
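If you are on a Servlet 3.0 container, a minimal sketch of wiring JavaMelody in programmatically might look like the following. The filter class name is JavaMelody's documented net.bull.javamelody.MonitoringFilter; the listener name and URL pattern are assumptions, and newer JavaMelody versions can also register themselves automatically when the jar is on the classpath.

// A minimal sketch, assuming a Servlet 3.0 container and the javamelody jar on
// the classpath. Registers JavaMelody's monitoring filter for all requests so
// request counts, response times and throughput show up on its report page
// (by default under /monitoring).
import java.util.EnumSet;
import javax.servlet.DispatcherType;
import javax.servlet.FilterRegistration;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class JavaMelodyBootstrap implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        FilterRegistration.Dynamic monitoring = sce.getServletContext()
                .addFilter("javamelody", "net.bull.javamelody.MonitoringFilter");
        monitoring.addMappingForUrlPatterns(
                EnumSet.of(DispatcherType.REQUEST), true, "/*");
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // nothing to clean up in this sketch
    }
}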

Related

RabbitMQ as Message Broker used by Spring Websocket dies under load

I am developing an application where we need to handle 160k concurrent users connected to the backend via a WebSocket connection.
We decided to use the Spring WebSocket implementation and RabbitMQ as the message broker.
In our application every user needs to subscribe to its own user queue, /exchange/amq.direct/update, as well as to another queue to which other users can potentially subscribe too, /topic/someUniqueName.
In our first performance test we took the naive approach where every user subscribes to two new queues.
When running the test, RabbitMQ dies silently once around 800 users are connected at the same time, i.e. around 1600 queues are active (see the graph of all RabbitMQ objects here).
I have read, though, that you should be careful about opening many connections to RabbitMQ.
Now I wonder whether the approach anticipated by Spring WebSocket, opening one queue per user, is a conceptual problem for systems under high load, or whether there is some other error in my system.
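For reference, a minimal sketch of the kind of configuration described above (Spring 4.x WebSocket/STOMP relaying to RabbitMQ). Only the two destination prefixes mirror the question; the endpoint path, relay host, port and application prefix are assumptions.

// A minimal sketch, assuming Spring 4.x WebSocket/STOMP with RabbitMQ's STOMP
// plugin as the broker relay. Each client subscription to /topic/someUniqueName
// or /exchange/amq.direct/update ends up as a queue on the broker.
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws").withSockJS();  // hypothetical endpoint path
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Relay /topic/** and /exchange/** subscriptions to RabbitMQ.
        registry.enableStompBrokerRelay("/topic", "/exchange")
                .setRelayHost("rabbitmq.example.org")  // assumption
                .setRelayPort(61613);                  // STOMP plugin default port
        registry.setApplicationDestinationPrefixes("/app");
    }
}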
Limiting factors for RabbitMQ are usually:
memory (can be checked in the dashboard), which needs to grow with the number of messages and the number of queues (unless you use lazy queues, which go straight to disk; see the sketch after this list);
the maximum number of file descriptors (at least one per connection), which often defaults to a value that is too low on many distributions (ref: https://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-April/019615.html);
CPU for routing the messages.
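As an illustration of the lazy-queue point above, a sketch of declaring a per-user queue in lazy mode with Spring AMQP; the queue name is hypothetical, and lazy mode requires RabbitMQ 3.6 or later.

// A minimal sketch, assuming Spring AMQP on the application side; the queue
// name is a placeholder. "x-queue-mode: lazy" asks RabbitMQ to keep messages
// on disk instead of RAM, which reduces per-queue memory pressure.
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;

public class LazyQueueExample {
    public static Queue updateQueueFor(String userId) {
        return QueueBuilder.durable("update." + userId)
                .withArgument("x-queue-mode", "lazy")
                .build();
    }
}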
I did find the issue: I had misconfigured the RabbitMQ service and given it a file descriptor limit of only 1024. Increasing it solved the problem.

spring-data-redis cluster recovery issue

We're running a 7-node redis cluster, with all nodes as masters (no slave replication). We're using this as an in-memory cache, so we've commented out all saves in redis.conf, and we've got the following other non-defaults in redis.conf:
maxmemory 30gb
maxmemory-policy allkeys-lru
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no
The client for this cluster is a Spring Boot REST API application using spring-data-redis with Jedis as the driver. We mainly use the Spring caching annotations.
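For concreteness, a sketch of the caching-annotation style in use; the service, cache name and key are hypothetical, and it assumes @EnableCaching and a Redis-backed CacheManager are configured elsewhere.

// A hypothetical sketch of the Spring caching annotations mentioned above;
// cache name, key and entity are placeholders, not the real application code.
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    @Cacheable(value = "products", key = "#id")
    public Product findProduct(String id) {
        return loadFromDatabase(id); // hits the cache first, DB only on a miss
    }

    @CacheEvict(value = "products", key = "#id")
    public void refreshProduct(String id) {
        // evicts the cached entry so the next read repopulates it
    }

    private Product loadFromDatabase(String id) {
        return new Product(id);
    }

    public static class Product {
        private final String id;
        public Product(String id) { this.id = id; }
        public String getId() { return id; }
    }
}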
We had an issue the other day where one of the masters went down for a while. With a single master down in a 7-node cluster we noted a marked increase in the average response time for API calls involving Redis, which I would expect.
When the downed master was brought back online and rejoined the cluster, we had a massive spike in response time. Via New Relic I can see that the app started making a ton of Redis cluster calls (New Relic doesn't tell me which cluster subcommand was being used). Our normal average response time is around 5 ms; during this period it went up to 800 ms, and we had a few slow sample transactions that took more than 70 seconds. On all app JVMs I saw the number of active threads jump from a normal 8-9 up to around 300 during this time. We have configured the Tomcat HTTP thread pool to allow 400 threads max. After about 3 minutes the problem cleared itself up, but I now have people questioning the stability of the caching solution we chose. New Relic doesn't give any insight into where the additional time on the long requests was being spent (it's apparently in an area that New Relic doesn't instrument).
I've made some attempts to reproduce this by running jmeter load tests against a development environment, and while I see some moderate response-time spikes when re-attaching a Redis cluster master, I don't see anything near what we saw in production. I've also run across https://github.com/xetorthio/jedis/issues/1108, but I'm not gaining any useful insight from it. I tried reducing spring.redis.cluster.max-redirects from the default 5 to 0, which didn't seem to have much effect on my load-test results, and I'm also not sure how appropriate that change is for my use case.
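For what it's worth, the programmatic equivalent of that property would look roughly like the sketch below; the node addresses are placeholders, and whether failing fast on redirects is right for this use case is exactly the open question above.

// A minimal sketch, assuming spring-data-redis with the Jedis driver.
// Node addresses are hypothetical; maxRedirects(0) mirrors the
// spring.redis.cluster.max-redirects=0 experiment described above.
import java.util.Arrays;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;

@Configuration
public class RedisClusterClientConfig {

    @Bean
    public JedisConnectionFactory redisConnectionFactory() {
        RedisClusterConfiguration cluster = new RedisClusterConfiguration(
                Arrays.asList("redis-1.example.org:6379",   // placeholder nodes
                              "redis-2.example.org:6379"));
        // Fail fast instead of following MOVED/ASK redirects while the
        // cluster topology is settling after a master rejoins.
        cluster.setMaxRedirects(0);
        return new JedisConnectionFactory(cluster);
    }
}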

Worklight Server Adapter Errors

We are using a pair of load-balanced Worklight 6.1.0.02.20150520-1015 servers in a production environment to support a mobile app, with about 15k to 20k queries per day coming through to the Worklight server adapters.
These adapter calls are not really doing any processing; for the most part they simply pass HTTP requests along to internal servers located in the same zone as the Worklight server. The internal servers typically respond to requests within 100 ms or less.
We are seeing an average of 12 errors per thousand requests in the Worklight logs. They are roughly 2/3 UNEXPECTED_ERROR, 1/3 REQUEST_TIMEOUT, and 1/3 UNRESPONSIVE_HOST. As far as we can see, these requests never even reach the internal servers.
It is as if these requests are queuing up or failing on the Worklight servers somehow.
The adapters typically have these settings:
<loadConstraints maxConcurrentConnectionsPerNode="50" />
<procedure name=... requestTimeoutInSeconds="60" />
What should we be doing to reduce this error rate?
Does it indicate that the servers need more memory or processing power? Do we need to experiment with changing the settings? Or something else?
My suggestion would be to open an IBM PMR (support ticket) via your business/dev unit, since this question is not really suitable for Stack Overflow (it is more about infrastructure handling than programming). The support/dev team could then investigate and possibly provide a solution.

Help analyzing a GlassFish server hang problem

We are running a GlassFish server with around 20 JAX-WS Metro web services. The server is a Core2Duo with 8 GB RAM. We are using a single HTTP listener for all the web services. Development is set to true, the Request Thread Count is 2, and the Acceptor Count is 1.
The minimum and maximum heap sizes are both 1 GB and the PermGen is set to 512 MB.
The services access an Oracle database via a Hibernate layer and there are many inter-service calls between the services.
The front end is ASP.NET. Our problem is that when 4-5 users access the application simultaneously for some time (about an hour), the GlassFish server hangs with the CPU going to 100% while memory utilization stays around 10-11%.
We are not able to find any pointers on how to debug this problem. On some occasions the log file shows java.lang.OutOfMemoryError: PermGen space, but not every time; on many occasions the log shows no error at all when the server hangs. Also, the GlassFish server does not start if we try to increase the PermGen space. We need some direction on how to diagnose this problem and move towards a solution.
The GlassFish version we are using is v2.1.
We have the following observations:
1. Adding more HTTP listeners (one listener per 4-5 services) does prolong the time to failure, but not by much.
2. When calling some of the heavy services (one operation at a time) with SoapUI, we also get the hang when running many threads simultaneously (e.g. 8-10 threads).
3. We have observed that a service operation that does not call any other services rarely hangs when called with SoapUI, while a service that calls other services hangs much more frequently.
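Since the question asks for direction on diagnosis, one generic way to gather data during such a hang is to capture which threads are consuming CPU. Below is a minimal, GlassFish-agnostic sketch using the standard JMX ThreadMXBean; the same information can be obtained by running jstack against the server process a few times during the hang.

// A minimal diagnostic sketch (not GlassFish-specific): list per-thread CPU time
// and stack traces via the standard ThreadMXBean, to see what is burning CPU
// while the server appears hung. Could be run from a small servlet or a JMX client.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BusyThreadDump {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id);       // -1 if unavailable
            ThreadInfo info = threads.getThreadInfo(id, 20);     // up to 20 frames
            if (info == null || cpuNanos < 0) {
                continue;
            }
            System.out.printf("%s cpu=%dms state=%s%n",
                    info.getThreadName(), cpuNanos / 1000000L, info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}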

IIS, APACHE, YAWS runtime environment

Recently I went through an article explaining the potential of the Yaws server and the number of requests it processes per second. It was mentioned that Yaws can handle 80k requests per second and that it also runs in a multi-threaded environment to improve its request-processing limit.
How can we compare IIS and Apache with Yaws? Which one will process the most requests? Can I find any comparisons somewhere?
Check out this link: http://www.sics.se/~joe/apachevsyaws.html (Yaws vs. Apache).
You can see that Yaws handles 80,000 concurrent connections (and counting) while Apache fails at around 4,000 connections. This is because Yaws runs on the Erlang/OTP VM: processes belong to this VM, not to the operating system, and Erlang has been highly optimized for concurrent programming. In fact, other Erlang web servers and frameworks such as mochiweb and webmachine are also much more capable than Apache when it comes to handling many concurrent requests. Yaws scales better than any web server I know of today. With the ability to create appmods, you can build Erlang applications that communicate over HTTP, making use of the power of Yaws.
The Yaws home page is http://yaws.hyber.org/. Yaws actually gets much of its power from OTP (the Open Telecom Platform). This set of powerful libraries, found at http://erlang.org/, provides advanced design patterns such as failover systems, supervision trees, finite state machines, and event handlers. You should consider Erlang for your next web application!