Apache 2.4.10 hangs AH00485: scoreboard is full, not at MaxRequestWorkers - apache

The Apache server stays up for a random amount of time, usually days, but eventually enters a hung state. When it hangs, the CPU load on the machine gradually climbs and new web requests get no response.
Error logs typically contain lots of these:
[Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1] AH00485: scoreboard is full, not at MaxRequestWorkers
Environment:
LDOM (VM) SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
httpd conf:
StartServers 8
MinSpareServers Not set
MaxSpareServers Not set
ServerLimit 256
MaxRequestWorkers 100
MaxConnectionsPerChild 1000
KeepAlive On
TimeOut 3000
MaxKeepAliveRequests 50
KeepAliveTimeout 2
Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers
PID     Connections         Threads       Async connections
        total  accepting    busy  idle    writing  keep-alive  closing
25337   0      yes          1     24      0        0           0
25338   1      yes          0     25      1        0           0
25339   1      yes          0     25      0        0           1
25340   1      yes          0     25      0        0           1
Sum     3                   1     99      1        0           2
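For reference, the relationship between these settings under the event MPM can be sketched as below. This is a rough illustration only: ThreadsPerChild is not listed in the configuration above, and the value 25 is inferred from the busy + idle thread counts per process in the status output. The function name is hypothetical.

```python
def event_mpm_processes(max_request_workers: int, threads_per_child: int) -> int:
    """Number of child processes that together supply max_request_workers threads.

    Integer division is used because httpd is understood to round a
    non-multiple MaxRequestWorkers down to the nearest multiple of
    ThreadsPerChild.
    """
    return max_request_workers // threads_per_child

# MaxRequestWorkers and ServerLimit come from the configuration above.
# threads_per_child=25 is an inferred assumption, not a configured value.
needed = event_mpm_processes(max_request_workers=100, threads_per_child=25)
server_limit = 256
print(f"processes at full load: {needed}")
print(f"process slots allowed by ServerLimit: {server_limit}")
```

Under these assumptions the result is four processes, which agrees with the four PIDs in the status table.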
Any thoughts on http conf tuning, OS patches, apache bug fixes appreciated.
Yes I have seen the open ASF bugzilla for the same error message.
This is a production server, so you can imagine, having it go down at random times (usually when I am asleep) is not fun!

Related

Intermittent Cannot connect to shibd process, a site adminstrator should be notified

We have a Shibboleth native SP 2.5.4 that has been running for a few years without any issues. Yesterday I had to update a certificate for one of the IdPs. Since that restart I've been getting intermittent errors:
Cannot connect to shibd process, a site adminstrator should be notified.
The errors appear to occur in bursts, as these per-minute error counts show:
nb | time
58 Sep 22 09:56
82 Sep 22 10:53
82 Sep 22 11:16
80 Sep 22 11:17
89 Sep 22 11:37
71 Sep 22 11:38
130 Sep 22 11:43
47 Sep 22 11:44
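A per-minute count like the one above can be produced with a few lines of Python. This is a minimal sketch, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the native_warn.log excerpt in this post; the function name and the `needle` parameter are hypothetical.

```python
from collections import Counter

def errors_per_minute(lines, needle="Cannot connect to shibd process"):
    """Count matching log lines per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        if needle in line:
            # The first 16 characters of each line are 'YYYY-MM-DD HH:MM'.
            counts[line[:16]] += 1
    return counts

# Two sample lines taken from the native_warn.log excerpt in this post.
sample = [
    "2020-09-22 11:54:13 CRIT Shibboleth.Listener [15798] shib_check_user: "
    "socket server unavailable, failing",
    "2020-09-22 11:54:13 ERROR Shibboleth.Apache [15798] shib_check_user: "
    "Cannot connect to shibd process, a site adminstrator should be notified.",
]
for minute, n in sorted(errors_per_minute(sample).items()):
    print(n, minute)
```

To run it against a real file, pass an open file object instead of `sample`, since iterating a file yields its lines.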
Restarting httpd and shibd didn't resolve the issue. SELinux is disabled.
In /var/log/shibboleth-www/native_warn.log I have:
2020-09-22 11:54:13 ERROR Shibboleth.Listener [15798] shib_check_user: socket call (connect) resulted in error (2): no message
2020-09-22 11:54:13 WARN Shibboleth.Listener [15798] shib_check_user: cannot connect socket (21)...
2020-09-22 11:54:13 CRIT Shibboleth.Listener [15798] shib_check_user: socket server unavailable, failing
2020-09-22 11:54:13 ERROR Shibboleth.Apache [15798] shib_check_user: Cannot connect to shibd process, a site adminstrator should be notified.
Memory and CPU look good to me:
top - 12:08:08 up 25 days, 22:33, 2 users, load average: 1.01, 1.03, 1.01
Tasks: 294 total, 1 running, 293 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32880188k total, 4712256k used, 28167932k free, 426772k buffers
Swap: 5242876k total, 0k used, 5242876k free, 1993996k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16876 apache 20 0 370m 10m 4160 S 0.7 0.0 0:00.20 httpd
17418 shibd 20 0 4894m 58m 8084 S 0.7 0.2 0:01.46 shibd
2401 root 20 0 3116m 270m 19m S 0.3 0.8 128:41.88 cylancesvc
17519 apache 20 0 370m 10m 3948 S 0.3 0.0 0:00.12 httpd
17766 apache 20 0 370m 10m 3872 S 0.3 0.0 0:00.13 httpd
Any idea what could cause this?

Apache Server Many requests stuck in "R" Reading Request

Below is the apache2ctl status output with almost no users online.
For over 5 years we (a cloud ERP supplier) have deployed instances on Google Cloud running Apache with mod_perl.
This week our largest server became slow and unresponsive. No idle workers were available. Increasing both MaxRequestWorkers and ServerLimit from 150 to 400 in mpm_prefork.conf made the server fast again.
I'm wondering why so many requests stay in "R" (Reading Request): at least 10 times more than there should be.
We checked further, and a DoS attack does not seem to be the cause: on other servers in different clouds, such as AWS or Alibaba, we see the same factor of 10 between requests actually being processed (R/W/K) and requests that stay in Reading mode.
What could cause this?
sudo /usr/sbin/apache2ctl status
Apache Server Status for localhost (via 127.0.0.1)
Server Version: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.29 OpenSSL/1.0.1f
mod_perl/2.0.8 Perl/v5.18.2
Server MPM: prefork
Server Built: Apr 3 2019 18:04:25
Current Time: Saturday, 29-Feb-2020 10:15:35 CET
Restart Time: Thursday, 27-Feb-2020 09:45:48 CET
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 2 days 29 minutes 47 seconds
Server load: 0.75 0.77 0.75
Total accesses: 1581181 - Total Traffic: 8.6 GB
CPU Usage: u30.32 s9.64 cu0 cs0 - .0229% CPU load
9.06 requests/sec - 51.5 kB/second - 5.7 kB/request
96 requests currently being processed, 9 idle workers
RRKRRRK_RKRKKRRRRRK_RRRRKRCK_RRRC_CKK_KCRKCRK_RCR__CKKCCRCRRRRRR
RRRRR.RRRKRRRKRRR_RR..R.K.RCRKR.CKK.RRKKR.W.RRKR.....RR.........
................................................................
................................................................
................................................................
................................................................
................
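The scoreboard string above can be tallied per state to quantify how many workers sit in "R". A minimal sketch follows; the function name is hypothetical, and only the first scoreboard row from the output above is used as input.

```python
from collections import Counter

def tally_scoreboard(scoreboard: str) -> Counter:
    """Count workers per state character; whitespace and newlines are ignored."""
    return Counter(ch for ch in scoreboard if not ch.isspace())

# First row of the scoreboard shown above.
first_row = "RRKRRRK_RKRKKRRRRRK_RRRRKRCK_RRRC_CKK_KCRKCRK_RCR__CKKCCRCRRRRRR"
counts = tally_scoreboard(first_row)

# "_" is an idle worker and "." is an open slot; every other state is busy.
busy = sum(n for state, n in counts.items() if state not in "_.")
print(counts.most_common())
print("busy:", busy, "idle:", counts["_"])
```

Pasting the full multi-line scoreboard into `tally_scoreboard` works as well, because line breaks are skipped.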
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

CouchDB crashes after few minutes running

CouchDB is very unpleasant for me. Neither the documentation nor any tips have helped at all. The situation is as follows:
FreeBSD 9.2 amd64
couchdb-1.5.0,2 installed from ports
npm couchapp
npm semver.
I started replication in CouchDB for the node repo, and it now crashes every few minutes. I wrote a script that checks the process every 5 seconds:
13:40:53
13:48:11 7m42s [growing trend]
13:56:09 7m58s
14:04:11 8m02s
14:12:23 8m12s
14:21:14 8m12s
14:30:08 8m54s
14:40:48 10m40s
14:57:13 16m35s [growth stops]
15:08:29 11m16s
...
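The gaps between restarts can be derived from the timestamps alone. A minimal sketch, assuming same-day `HH:MM:SS` timestamps; the function name is hypothetical, and the three sample timestamps are taken from the list above.

```python
from datetime import datetime

def restart_intervals(timestamps):
    """Return the gaps between consecutive same-day HH:MM:SS timestamps as 'XmYYs'."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = []
    for earlier, later in zip(times, times[1:]):
        seconds = int((later - earlier).total_seconds())
        gaps.append(f"{seconds // 60}m{seconds % 60:02d}s")
    return gaps

print(restart_intervals(["13:48:11", "13:56:09", "14:04:11"]))
# prints ['7m58s', '8m02s']
```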
couch.log: (not always, sometimes nothing at all)
[Tue, 06 May 2014 12:59:51 GMT] [error] [<0.134.0>] Error in replication `[REPLICATION_HASH]+continuous` (triggered by document `npmjs_repl`): timeout
Restarting replication in 40 seconds.
[info] [<0.372.0>] Replication `"[REPLICATION_HASH]+continuous"` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 30000 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
source start sequence 203628
[Tue, 06 May 2014 13:00:32 GMT] [info] [<0.372.0>] Replication `"[REPLICATION_HASH]7+continuous"` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 30000 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
source start sequence 203628
err.log (in every crash)
heart: Tue May 6 15:06:14 2014: heart-beat time-out, no activity for 13 seconds
heart: Tue May 6 15:06:16 2014: Executed "/usr/local/bin/couchdb -k" -> 0. Terminating.
heart_beat_kill_pid = 52979
heart_beat_timeout = 11
truss output:
...
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 17431527424 (0x40f000000)
SIGNAL 9 (SIGKILL)
process exit, rval = 0
Thanks for helping.

Apache performance tuning with 1 GB with httpd.conf

I have a 1 GB VPS, and Apache slows to a crawl almost from startup. I ran ApacheBench against a static.html file and it is just as slow. However, the site will have both MySQL and PHP and a high volume of AJAX requests, so I'd like to tune for that.
When I restart, error logs show this almost immediately:
[error] server reached MaxClients setting, consider raising the MaxClients setting
ab -n 1000 -c 1000
shows:
Document Path: /static.html
Document Length: 7 bytes
Concurrency Level: 1000
Time taken for tests: 57.784 seconds
Complete requests: 1000
Failed requests: 64
(Connect: 0, Receive: 0, Length: 64, Exceptions: 0)
Write errors: 0
Total transferred: 309816 bytes
HTML transferred: 6552 bytes
Requests per second: 17.31 [#/sec] (mean)
Time per request: 57784.327 [ms] (mean)
Time per request: 57.784 [ms] (mean, across all concurrent requests)
Transfer rate: 5.24 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 25 13.4 25 48
Processing: 1070 16183 15379.4 9601 57737
Waiting: 0 14205 15176.5 9591 42516
Total: 1070 16208 15385.0 9635 57783
Percentage of the requests served within a certain time (ms)
50% 9635
66% 20591
75% 20629
80% 36357
90% 42518
95% 42538
98% 42556
99% 42560
100% 57783 (longest request)
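As a sanity check, ab's headline figures follow directly from the totals in the report. A small sketch, with a hypothetical function name, that recomputes them from the number of completed requests, the concurrency level and the elapsed time:

```python
def ab_summary(complete_requests: int, concurrency: int, seconds: float):
    """Recompute ab's headline figures from the raw totals."""
    requests_per_second = complete_requests / seconds
    # Mean time per request as experienced by a single client, in ms.
    per_request_ms = concurrency * seconds * 1000 / complete_requests
    # Mean time per request across all concurrent clients, in ms.
    per_request_all_ms = seconds * 1000 / complete_requests
    return requests_per_second, per_request_ms, per_request_all_ms

# Totals from the report above: 1000 requests, concurrency 1000, 57.784 s.
rps, per_req, per_req_all = ab_summary(1000, 1000, 57.784)
print(f"{rps:.2f} req/s, {per_req:.0f} ms, {per_req_all:.3f} ms")
```

With 1000 concurrent clients against a MaxClients of 40, most of each client's 57.8 seconds is presumably spent queued rather than being served.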
If I run ab against a PHP file, it sometimes finishes, but most of the time it does not, and it sometimes fails with errors like
apr_socket_recv: Connection reset by peer (104)
and
socket: No buffer space available (105)
httpd.conf items:
Timeout 10
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 1
<IfModule prefork.c>
StartServers 3
MinSpareServers 5
MaxSpareServers 9
ServerLimit 40
MaxClients 40
MaxRequestsPerChild 5000
</IfModule>
Top... (CPU and Load 1min are very erratic during testing):
top - 10:44:51 up 11:50, 3 users, load average: 0.17, 0.42, 0.90
Tasks: 84 total, 2 running, 82 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.8%us, 3.1%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1793072k total, 743604k used, 1049468k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21831 mysql 18 0 506m 71m 6688 S 0.7 4.1 4:03.18 mysqld
1828 root 15 0 113m 52m 2052 S 0.0 3.0 0:02.85 spamd
1830 popuser 18 0 113m 51m 956 S 0.0 2.9 0:00.00 spamd
8012 apache 15 0 327m 35m 17m S 3.7 2.0 0:11.83 httpd
8041 apache 15 0 320m 28m 15m S 0.0 1.6 0:11.83 httpd
8022 apache 15 0 321m 27m 14m S 2.3 1.6 0:11.05 httpd
8033 apache 15 0 320m 27m 14m S 1.7 1.6 0:10.06 httpd
Is there something obviously wrong here? If not, what would be my next step in troubleshooting?
Sounds like you don't have enough memory -- 1GB isn't much when you're running PHP with prefork and MySQL on the same server. Your MaxClients should probably be 10-20, not 40.
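The usual back-of-the-envelope estimate divides the memory left for Apache by the size of one httpd child. A rough sketch, with a hypothetical function name; the 400 MB reserved for MySQL, spamd and the OS is an illustrative guess, not a measurement, and resident size overstates per-child cost because part of it is shared.

```python
def estimate_max_clients(total_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Largest number of prefork children that fits in the memory left for Apache."""
    available = total_mb - reserved_mb
    return max(1, available // per_child_mb)

# 1024 MB VPS; 35 MB per child is the largest httpd RES in the top output above.
# reserved_mb=400 is an assumed figure for MySQL, spamd and the OS.
print(estimate_max_clients(total_mb=1024, reserved_mb=400, per_child_mb=35))
# prints 17
```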
A few weeks ago I wrote a script to tune Apache httpd that would probably help determine the maximum values for your server. You can find the weblog entry here http://surniaulula.com/2012/11/09/check-apache-httpd-mpm-config-limits/ and the script is on Google Code as well.
Enjoy!
js.

APC (PHP Cache) Uptime 0 minutes, not caching

My goal is to implement APC for opcode cache for a drupal 6 production site.
So far I have tested APC with several PHP files, with and without including other PHP files via include_once.
I also tried tweaking the apc.ini values for shm_size, apc.include_once_override and apc.stat,
restarting Apache every time.
The result: apc.php shows no change in any value (except, of course, that the changed apc.ini values are displayed as they should be).
Every time I refresh the apc.php test page, the start time resets to the current time and the uptime shows 0 minutes.
The apc.php test page shows:
General Cache Information
APC Version 3.1.9
PHP Version 5.2.10
APC Host xxxx.xx.xx
Server Software Apache/2.2.3 (CentOS)
Shared Memory 1 Segment(s) with 128.0 MBytes
(mmap memory, pthread mutex Locks locking)
Start Time 2011/07/26 11:53:56
Uptime 0 minutes
File Upload Support 1
Cached Files 0 ( 0.0 Bytes)
Hits 1
Misses 1
Request Rate (hits, misses) 2.00 cache requests/second
Hit Rate 1.00 cache requests/second
Miss Rate 1.00 cache requests/second
Insert Rate 0.00 cache requests/second
Cache full count 0
Cached Variables 0 ( 0.0 Bytes)
Hits 0
Misses 0
Request Rate (hits, misses) 0.00 cache requests/second
Hit Rate 0.00 cache requests/second
Miss Rate 0.00 cache requests/second
Insert Rate 0.00 cache requests/second
Cache full count 0
apc.cache_by_default 1
apc.canonicalize 1
apc.coredump_unmap 0
apc.enable_cli 0
apc.enabled 1
apc.file_md5 0
apc.file_update_protection 2
apc.filters
apc.gc_ttl 3600
apc.include_once_override 0
apc.lazy_classes 0
apc.lazy_functions 0
apc.max_file_size 16
apc.mmap_file_mask /tmp/apcphp5.095eRm
apc.num_files_hint 1024
apc.preload_path
apc.report_autofilter 0
apc.rfc1867 0
apc.rfc1867_freq 0
apc.rfc1867_name APC_UPLOAD_PROGRESS
apc.rfc1867_prefix upload_
apc.rfc1867_ttl 3600
apc.serializer default
apc.shm_segments 1
apc.shm_size 128M
apc.slam_defense 0
apc.stat 0
apc.stat_ctime 0
apc.ttl 7200
apc.use_request_time 1
apc.user_entries_hint 4096
apc.user_ttl 7200
apc.write_lock 1
Host Status Diagrams:
Free: 128.0 MBytes (100.0%) Hits: 1 (50.0%)
Used: 20.3 KBytes (0.0%) Misses: 1 (50.0%)
Detailed Memory Usage and Fragmentation:
Fragmentation: 0%
phpinfo shows:
Server API CGI/FastCGI
APC:
Version 3.1.9
APC Debugging Enabled
MMAP Support Enabled
MMAP File Mask /tmp/apcphp5.JkKDk7
Locking type pthread mutex Locks
Serialization Support php
Revision $Revision: 308812 $
Build Date Jul 21 2011 14:31:12
I followed these steps to find if suexec settings would prevent caching:
http://www.litespeedtech.com/support/forum/showthread.php?t=4189
[root#host /]# ps -ef|grep lsphp
root 20402 17833 0 11:21 pts/0 00:00:00 grep lsphp
[root#host /]# ps -waux
root 17833 0.0 0.1 5004 1484 pts/0 S 10:39 0:00 bash
..which indicates that there is no lsphp running on the host.
I also read the following article and its comments and concluded that in my case suexec is not the problem, as the user apache is the owner of the httpd process:
http://www.brandonturner.net/blog/2009/07/fastcgi_with_php_opcode_cache/
Also, the suexec command is not recognized when I am logged in as root on the host and try to launch it.
I'm also almost confident that there is no cPanel running on the host in which to check whether a setting resets the running cache process at some interval.
This leaves me with few clues where to head next.
I tried making apache the owner (with chown and chgrp) of the apc.php file and some test PHP files, which resulted in a 500 server error.
Is there a way to check whether file permissions prevent APC from staying running?
I'm tremendously grateful for any suggestions or help.
Can you post your php.ini settings for APC?
You must restart httpd for setting changes to take effect.
Try changing the max file size:
apc.max_file_size = 20M
You are allowing 128M of RAM, which is quite low for the big PHP applications we have today (a single WordPress uses 32M):
apc.shm_size 128M
Increase it as well.