Apache + Tomcat with mod_jk: maxThreads setting with load balancing

I have an Apache + Tomcat setup with mod_jk on 2 servers. Each server has its own Apache + Tomcat pair, and every request is load-balanced across the Tomcat workers on both servers.
I have a question about how Apache's MaxClients and Tomcat's maxThreads should be set.
The default numbers are,
Apache: MaxClients=150, Tomcat: maxThreads=200
With only 1 server, this works fine, since the Tomcat worker never receives more than 150 connections at once. However, when load balancing between 2 servers, could the Tomcat worker receive 150 connections from its local Apache plus some number from the other server, and overflow maxThreads with SEVERE: All threads (200) are currently busy?
If so, should I set Tomcat's maxThreads=300 in this case?
Thanks

Setting maxThreads to 300 should be fine - there are no fixed rules. It depends on whether you see any connections being refused.
Increasing it too much causes high memory consumption, but production Tomcats are known to run with 750 threads. See http://java-monitor.com/forum/showthread.php?t=235 as well.
Have you actually got the SEVERE error? I've tested on our Tomcat 6.0.20 and it throws an INFO message when maxThreads is exceeded:
INFO: Maximum number of threads (200) created for connector with address null and port 8080
It does not refuse connections until the acceptCount value is crossed. The default is 100.
From the Tomcat docs (http://tomcat.apache.org/tomcat-5.5-doc/config/http.html):
The maximum queue length for incoming connection requests when all possible request processing threads are in use. Any requests received when the queue is full will be refused. The default value is 100.
The way it works is:
1) As the number of simultaneous requests increases, threads are created up to the configured maximum (the value of the maxThreads attribute).
So in your case, the message "Maximum number of threads (200) created" will appear at this point; however, requests will still be queued for service.
2) If still more simultaneous requests are received, they are queued up to the configured maximum (the value of the acceptCount attribute).
Thus a total of 300 requests can be accepted without failure (assuming your acceptCount is at the default of 100).
3) Beyond this number, "connection refused" errors are thrown until resources are available to process them.
So you should be fine until you hit step 3.
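To make this concrete, here is a minimal sketch of where the two Tomcat values live in conf/server.xml (maxThreads and acceptCount are the attributes discussed above and exist on both the HTTP and the AJP connector; the port and protocol shown are only illustrative):

<!-- conf/server.xml: raise the thread limit, keep the default accept queue -->
<Connector port="8009" protocol="AJP/1.3"
           maxThreads="300"
           acceptCount="100" />

Apache's own limit (MaxClients, renamed MaxRequestWorkers in Apache 2.4) is configured separately on the Apache side in its MPM configuration.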

Related

org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable inside kubernetes environment

We have a setup wherein one Ignite server node serves 15 to 20 thick client nodes and 40 to 50 thin client nodes; the thin client connection is a singleton.
In operation, we sometimes get the error below:
org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable [sock=Socket[addr=hostnm19.hostx.com/10.13.10.19,port=30519,localport=57552]]
On the Server node, we are inserting data inside a third party store using CacheStoreAdapters
I don't know where it goes wrong, since roughly one operation out of 100 fails with the above error.
Also, please let me know what we can do to handle this failure.
Apache Ignite version: 2.8
Edit (code snippet):
ClientConfiguration cfg = new ClientConfiguration()
.setAddresses("host:port");
IgniteClient client = Ignition.startClient(cfg); // this client is singleton
client.getOrCreateCache("ABC_CACHE").put(key, val);
Stack trace:
org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable [sock=Socket[addr=hostnm19.hostx.com/10.13.10.19,port=30519,localport=57552]]
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:499)
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:491)
at org.apache.ignite.internal.client.thin.TcpClientChannel.access$100(TcpClientChannel.java:92)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:538)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.readInt(TcpClientChannel.java:572)
at org.apache.ignite.internal.client.thin.TcpClientChannel.processNextResponse(TcpClientChannel.java:272)
at org.apache.ignite.internal.client.thin.TcpClientChannel.receive(TcpClientChannel.java:234)
at org.apache.ignite.internal.client.thin.TcpClientChannel.service(TcpClientChannel.java:171)
at org.apache.ignite.internal.client.thin.ReliableChannel.service(ReliableChannel.java:160)
at org.apache.ignite.internal.client.thin.ReliableChannel.request(ReliableChannel.java:187)
at org.apache.ignite.internal.client.thin.TcpIgniteClient.getOrCreateCache(TcpIgniteClient.java:114)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:535)
... 36 more
You probably have a network device or NAT in between that resets connections when they are idle, or even sporadically.
In this case, you will have to reconnect.
Another option: are you sure you are connecting to the thin client port (10800 by default) and not some other port?
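If reconnecting is the way to go, below is a minimal sketch of how the singleton from the question could be wrapped so that a dropped connection is re-established and the operation retried once. The Ignite calls are the same as in the question; the class and method names around them are hypothetical:

import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientConnectionException;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ResilientIgnitePut {
    private static volatile IgniteClient client = newClient();

    private static IgniteClient newClient() {
        // Same configuration as in the question; "host:port" is a placeholder.
        return Ignition.startClient(new ClientConfiguration().setAddresses("host:port"));
    }

    public static void put(String cacheName, Object key, Object val) {
        try {
            client.getOrCreateCache(cacheName).put(key, val);
        } catch (ClientConnectionException e) {
            // The idle connection was reset (e.g. by a NAT/firewall timeout):
            // reconnect once and retry the operation.
            client = newClient();
            client.getOrCreateCache(cacheName).put(key, val);
        }
    }
}

Listing several server addresses in setAddresses(...) also lets the thin client fail over to another node instead of failing the call.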

Timeout during allocate while making RFC call

I am trying to create an SAP RFC connection to a new system.
AFAIK the firewall (in this case for port 3321) is open.
I get this message at the client:
RFC_COMMUNICATION_FAILURE (rc=1): key=RFC_COMMUNICATION_FAILURE, message=
LOCATION SAP-Gateway on host ax-swb-q06.prod.lokal / sapgw21
ERROR timeout during allocate
TIME Thu Jul 26 16:45:48 2018
RELEASE 753
COMPONENT SAP-Gateway
VERSION 2
RC 242
MODULE /bas/753_REL/src/krn/si/gw/gwr3cpic.c
LINE 2210
DETAIL no connect of TP sapdp21 from host 10.190.10.32 after 20 sec
COUNTER 3
[MSG: class=, type=, number=, v1-4:=;;;]
And this message on the SAP server
Any clue what needs to be done to get RFC working?
With this little information, no one can say exactly what the issue is here.
But it is something related to your network and SAP system configuration.
I guess your firewall does some network address translation (NAT) and the new IP address behind the firewall no longer matches the known one. SAP performs its own IP / host name security checks.
If not already done, check by opening ports 3221, 3321 and 4821 in the firewall (the ports follow the pattern 32NN/33NN/48NN, where NN is the instance number, 21 in your case). Also check in the SAP gateway configuration which IP addresses and host names are configured as valid for it (look at what is traced at the beginning of the gateway trace file dev_rd on the ABAP side).
Also consider whether using a SAProuter would be the better option for your needs.
It works in my case if ashost is the host name, and not an IP address!
Do not ask me why, but this fails:
Connection(user='x', passwd='...', ashost='10.190.10.32', sysnr='21', client='494')
But this works:
Connection(user='x', passwd='...', ashost='ax-swb-q06.prod.lokal', sysnr='21', client='494')
This is strange, since DNS resolution happens before TCP communication.
It seems that the ashost value also gets used inside the connection itself. Strange. For most common protocols (HTTP, FTP, POP3, ...) this does not matter, or you at least get a better error message.

Openshift online v3 - Timeout when reading response headers from daemon process

I created a Python API on OpenShift Online using the Python image. If you request all the data, it takes more than 30 seconds to respond, and the server returns a 504 Gateway Timeout HTTP response. How do you configure how long a response may take? I created an annotation on the route, which seems to set the proxy timeout:
haproxy.router.openshift.io/timeout: 600s
The problem remains, but I now have logging, and it looks like the message comes from mod_wsgi.
I want to try to change the configuration of the httpd (mod_wsgi-express) process from request-timeout 60 to request-timeout 600. Where do you configure this? I am using the base image https://github.com/sclorg/s2i-python-container/tree/master/2.7
Logging:
Timeout when reading response headers from daemon process 'localhost:8080':/tmp/mod_wsgi-localhost:8080:1000430000/htdocs
Does someone know how to fix this error on OpenShift Online?
In addition to altering the haproxy timeout on the route of my app:
haproxy.router.openshift.io/timeout: 600s
I altered the request-timeout and socket-timeout in the app.sh of my Python application, so the mod_wsgi-express server is configured with a higher timeout:
ARGS="$ARGS --request-timeout 600"
ARGS="$ARGS --socket-timeout 600"
My application now waits 10 minutes before cancelling a request.
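For completeness, the route annotation above can also be applied from the command line; a minimal sketch, assuming the route is named myapp:

oc annotate route myapp --overwrite haproxy.router.openshift.io/timeout=600s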

apache2 processes stuck in sending reply - W

I am hosting multiple sites on a server with 7.5 GB of RAM, using apache2 with mpm_prefork.
The following command gives me a value of 200-300 in production:
ps aux|grep -c 'apache2'
Using top, I see that only a few hundred megabytes of RAM are free. The error log shows nothing unusual. Is this many apache2 processes normal?
MaxRequestWorkers is set to 512.
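For reference, that directive lives in the prefork MPM configuration; a minimal sketch of the relevant section (only MaxRequestWorkers is from my setup, the other directives are shown with typical values and are only illustrative):

# mpm_prefork.conf (or the prefork section of apache2.conf / httpd.conf)
<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers      512
    MaxConnectionsPerChild   0
</IfModule>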
Update:
Now I am using mod_status to check Apache activity.
I have a row like this:
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 29342 2/2/70 W 0.07 5702 0 3.0 0.00 1.67 XXX XXX /someurl
If I check again after some time, the PID does not change and SS has a greater value than the previous time. The M column for this request is 'W' (sending reply). So does that mean the apache2 process is stuck on that request?
On my VPS and root servers, the situation is partially similar. AFAIK the OS tries to give most of the processing power/RAM to the running processes and frees resources for other processes as the need arises.

How to get tomcat worker status from jkmanager

We have 3 machines:
mod_jk with a load balancer
first worker on Tomcat 8
second worker on Tomcat 8
Everything works as expected, but when one of the Tomcats is shut down, the status page on the load balancer still shows the state of this worker as OK/IDLE.
Any ideas how to force the status page to check the real status of the worker?
Related Materials
worker.properties
### Define worker names
worker.list=status,loadbalancer
### Declare Tomcat server 1
worker.worker1.port=8409
worker.worker1.host=centureapp1
worker.worker1.type=ajp13
worker.worker1.lbfactor=1
### Declare Tomcat server 2
worker.worker2.port=8410
worker.worker2.host=centureapp2
worker.worker2.type=ajp13
worker.worker2.lbfactor=1
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=worker1,worker2
worker.loadbalancer.sticky_session=1
worker.status.type=status
By default, the load balancer's maintenance task runs every 60 seconds, so you will only see the updated state of this worker after the next maintenance run, i.e. up to 60 seconds later.
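If you want the status page to notice a dead backend sooner, one possible sketch is to shorten the maintenance interval and enable active probing of the AJP connections (the property names come from the mod_jk workers.properties reference; the values are illustrative):

# Run mod_jk's internal maintenance more often (global setting, default 60 seconds)
worker.maintain=10
# Probe the backend connections so a dead worker is detected even without traffic
worker.worker1.ping_mode=A
worker.worker1.ping_timeout=5000
worker.worker2.ping_mode=A
worker.worker2.ping_timeout=5000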