how to monitor apache load balancer nodes - apache

I have apache load balancer doing load balancing via tomcat servers ajp as follows:
Worker URL Route RouteRedir Factor Set Status Elected To From
ajp://localhost:8009/myapp s1 2 0 Ok 2292 0 22M
ajp://xx.xx.xx.64:8009/myapp s2 2 0 Ok 2291 0 23M
ajp://xx.xx.xx.228:8009/myapp s3 2 0 Ok 2292 0 23M
ajp://xx.xxx.xx.242:8009/myapp s4 1 0 Ok 2121 0 21M
Sometimes the status results in Error, is there a way I can monitor this constantly and get an email notification if it happens?

Related

Apache Server Many requests stuck in "R" Reading Request

below apache2ctl status with almost no users online.
For over 5 years we (cloud ERP supplier) deploy instances on Google Cloud with Apache with mod_perl.
This week our largest server became slow and unresponsive. No idle workers were available. It turned out increasing both MaxRequestWorkers and ServerLimit to 400 from 150 in mpm_prefork.conf got our server back fast.
Iā€™m wondering why many requests stay in "R" Reading Request, at least 10 times more requests then actually should be.
We did further checking, DoS does not seem to be the issue, as also other servers ā€“ in different clouds as ASW or Alibaba ā€“ we notice the same ratio of 10 between requests actually being processed (R/W/K) and requests that stay in Reading mode.
What could cause this?
sudo /usr/sbin/apache2ctl status
Apache Server Status for localhost (via 127.0.0.1)
Server Version: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.29 OpenSSL/1.0.1f
mod_perl/2.0.8 Perl/v5.18.2
Server MPM: prefork
Server Built: Apr 3 2019 18:04:25
Current Time: Saturday, 29-Feb-2020 10:15:35 CET
Restart Time: Thursday, 27-Feb-2020 09:45:48 CET
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 2 days 29 minutes 47 seconds
Server load: 0.75 0.77 0.75
Total accesses: 1581181 - Total Traffic: 8.6 GB
CPU Usage: u30.32 s9.64 cu0 cs0 - .0229% CPU load
9.06 requests/sec - 51.5 kB/second - 5.7 kB/request
96 requests currently being processed, 9 idle workers
RRKRRRK_RKRKKRRRRRK_RRRRKRCK_RRRC_CKK_KCRKCRK_RCR__CKKCCRCRRRRRR
RRRRR.RRRKRRRKRRR_RR..R.K.RCRKR.CKK.RRKKR.W.RRKR.....RR.........
................................................................
................................................................
................................................................
................................................................
................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

apache2 processes stuck in sending reply - W

I am hosting multiple sites on a server with 7.5gb RAM. Using apache2 mpm_prefork.
Following command gives me a value of 200-300 in production
ps aux|grep -c 'apache2'
Using top i see only some hundred megabytes of RAM is free. Error log show nothing unusual. Is this much apache2 process normal?
MaxRequestWorkers is set to 512
Update:
Now i am using mod-status to check apache activity.
I have a row like this
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 29342 2/2/70 W 0.07 5702 0 3.0 0.00 1.67 XXX XXX /someurl
If i check again after sometime PID not changes and i get SS with greater value that previous time. M of this request is in 'W` sending reply state. So that means apache2 process locked in for that request?
On my VPS and root servers, the situation is partially similar. AFAIK the os tries to distribute most of the processing power/RAM to running processes and frees the resources for other processes as the need arises.

Apache 2.4.10 hangs AH00485: scoreboard is full, not at MaxRequestWorkers

Apache server will stay up for random amount of time, usually days, but eventually enters a hung state. When hung the CPU load gradually spikes on the machine and new web server requests are unresponsive.
Error logs typically contain lots of these:
Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1] AH00485: scoreboard is full, not at MaxRequestWorkers
Environment:
LDOM (VM) SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
http Conf:
StartServers 8
MinSpareServers Not set
MaxSpareServers Not set
ServerLimit 256
MaxRequestWorkers 100
MaxConnectionsPerChild 1000
KeepAlive On
TimeOut 3000
MaxKeepAliveRequests 50
KeepAliveTimeout 2
Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers
PID Connections Threads Async connections
total accepting busy idle writing keep-alive closing
25337 0 yes 1 24 0 0 0
25338 1 yes 0 25 1 0 0
25339 1 yes 0 25 0 0 1
25340 1 yes 0 25 0 0 1
Sum 3 1 99 1 0 2
Any thoughts on http conf tuning, OS patches, apache bug fixes appreciated.
Yes I have seen the open ASF bugzilla for the same error message.
This is a production server, so you can imagine, having it go down at random times (usually when I am asleep) is not fun!

How to get tomcat worker status from jkmanager

We have 3 Machines
mod_jk with load balancer
first Worker on tomcat8
2nd Worker on tomcat8
everything works as expected but, when one of the tomcat is being shutting down the status page on the load balancer still shows that the state of this worker is OK/IDLE.
Any ideas how to force the status page to check the real status of the worker?
Related Materials
worker.properties
\### Define worker names
worker.list=status,loadbalancer
\### Declare Tomcat server 1
worker.worker1.port=8409
worker.worker1.host=centureapp1
worker.worker1.type=ajp13
worker.worker1.lbfactor=1
\### Declare Tomcat server 2
worker.worker2.port=8410
worker.worker2.host=centureapp2
worker.worker2.type=ajp13
worker.worker2.lbfactor=1
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=worker1,worker2
worker.loadbalancer.sticky_session=1
worker.status.type=status
~
By default balancer maintenance runs every 60 seconds. So, you will see the state of this worker after 60 seconds.

APC (PHP Cache) Uptime 0 minutes, not caching

My goal is to implement APC for opcode cache for a drupal 6 production site.
I have so far tested APC with several php files with and without including other php files with include_once.
Also tried to tweak the apc.ini values for shm_size, apc.include_once_override and apc.stat.
Restarted apache every time.
Resulting in apc.php not showing any changes in any values. (except of course the changed apc.ini values are shown as they should)
Every time i refresh the apc.php test page, the start time resets as the current time showing uptime 0 minutes.
apc.php -testpage shows:
General Cache InformationAPC Version 3.1.9
PHP Version 5.2.10
APC Host xxxx.xx.xx
Server Software Apache/2.2.3 (CentOS)
Shared Memory 1 Segment(s) with 128.0 MBytes
(mmap memory, pthread mutex Locks locking)
Start Time 2011/07/26 11:53:56
Uptime 0 minutes
File Upload Support 1
Cached Files 0 ( 0.0 Bytes)
Hits 1
Misses 1
Request Rate (hits, misses) 2.00 cache requests/second
Hit Rate 1.00 cache requests/second
Miss Rate 1.00 cache requests/second
Insert Rate 0.00 cache requests/second
Cache full count 0
Cached Variables 0 ( 0.0 Bytes)
Hits 0
Misses 0
Request Rate (hits, misses) 0.00 cache requests/second
Hit Rate 0.00 cache requests/second
Miss Rate 0.00 cache requests/second
Insert Rate 0.00 cache requests/second
Cache full count 0
apc.cache_by_default 1
apc.canonicalize 1
apc.coredump_unmap 0
apc.enable_cli 0
apc.enabled 1
apc.file_md5 0
apc.file_update_protection 2
apc.filters
apc.gc_ttl 3600
apc.include_once_override 0
apc.lazy_classes 0
apc.lazy_functions 0
apc.max_file_size 16
apc.mmap_file_mask /tmp/apcphp5.095eRm
apc.num_files_hint 1024
apc.preload_path
apc.report_autofilter 0
apc.rfc1867 0
apc.rfc1867_freq 0
apc.rfc1867_name APC_UPLOAD_PROGRESS
apc.rfc1867_prefix upload_
apc.rfc1867_ttl 3600
apc.serializer default
apc.shm_segments 1
apc.shm_size 128M
apc.slam_defense 0
apc.stat 0
apc.stat_ctime 0
apc.ttl 7200
apc.use_request_time 1
apc.user_entries_hint 4096
apc.user_ttl 7200
apc.write_lock 1
Host Status Diagrams:
Free: 128.0 MBytes (100.0%) Hits: 1 (50.0%)
Used: 20.3 KBytes (0.0%) Misses: 1 (50.0%)
Detailed Memory Usage and Fragmentation:
Fragmentation: 0%
phpinfo shows:
Server API CGI/FastCGI
APC:
Version 3.1.9
APC Debugging Enabled
MMAP Support Enabled
MMAP File Mask /tmp/apcphp5.JkKDk7
Locking type pthread mutex Locks
Serialization Support php
Revision $Revision: 308812 $
Build Date Jul 21 2011 14:31:12
I followed these steps to find if suexec settings would prevent caching:
http://www.litespeedtech.com/support/forum/showthread.php?t=4189
[root#host /]# ps -ef|grep lsphp
root 20402 17833 0 11:21 pts/0 00:00:00 grep lsphp
[root#host /]# ps -waux
root 17833 0.0 0.1 5004 1484 pts/0 S 10:39 0:00 bash
..indicates that there is no lsphp running on the host
also I read the following article and comments, concluding that in my case the problem is not the suexec as the user apache is the httpd process owner
http://www.brandonturner.net/blog/2009/07/fastcgi_with_php_opcode_cache/
also suexec command is not recognized when logged and launced as root # host
also i'm almost confident that there is no cPanel running on the host to check if a setting there would reset the running cache process at some interval
This leaves me with few clues where to head next.
I tried to set (with chown and chgrp) apache as the owner of the apc.php file and some test php files resulting in 500 server error.
Is there a way to check if the file permissions prevent the apc stay running?
I'm tremendously grateful for any suggestions or help.
Can you give your php.ini settings for APC ?
You must restart httpd to take setting change into account.
Try to change max file size
apc.max_file_size = 20M
You are alowing 128M of ram, it's quite low for big php applications like we have today (a single wordpress uses 32M)
apc.shm_size 128M
increase it also