below apache2ctl status with almost no users online.
For over 5 years we (cloud ERP supplier) deploy instances on Google Cloud with Apache with mod_perl.
This week our largest server became slow and unresponsive. No idle workers were available. It turned out increasing both MaxRequestWorkers and ServerLimit to 400 from 150 in mpm_prefork.conf got our server back fast.
I’m wondering why many requests stay in "R" Reading Request, at least 10 times more requests then actually should be.
We did further checking, DoS does not seem to be the issue, as also other servers – in different clouds as ASW or Alibaba – we notice the same ratio of 10 between requests actually being processed (R/W/K) and requests that stay in Reading mode.
What could cause this?
sudo /usr/sbin/apache2ctl status
Apache Server Status for localhost (via 127.0.0.1)
Server Version: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.29 OpenSSL/1.0.1f
mod_perl/2.0.8 Perl/v5.18.2
Server MPM: prefork
Server Built: Apr 3 2019 18:04:25
Current Time: Saturday, 29-Feb-2020 10:15:35 CET
Restart Time: Thursday, 27-Feb-2020 09:45:48 CET
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 2 days 29 minutes 47 seconds
Server load: 0.75 0.77 0.75
Total accesses: 1581181 - Total Traffic: 8.6 GB
CPU Usage: u30.32 s9.64 cu0 cs0 - .0229% CPU load
9.06 requests/sec - 51.5 kB/second - 5.7 kB/request
96 requests currently being processed, 9 idle workers
RRKRRRK_RKRKKRRRRRK_RRRRKRCK_RRRC_CKK_KCRKCRK_RCR__CKKCCRCRRRRRR
RRRRR.RRRKRRRKRRR_RR..R.K.RCRKR.CKK.RRKKR.W.RRKR.....RR.........
................................................................
................................................................
................................................................
................................................................
................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Related
I am hosting multiple sites on a server with 7.5gb RAM. Using apache2 mpm_prefork.
Following command gives me a value of 200-300 in production
ps aux|grep -c 'apache2'
Using top i see only some hundred megabytes of RAM is free. Error log show nothing unusual. Is this much apache2 process normal?
MaxRequestWorkers is set to 512
Update:
Now i am using mod-status to check apache activity.
I have a row like this
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 29342 2/2/70 W 0.07 5702 0 3.0 0.00 1.67 XXX XXX /someurl
If i check again after sometime PID not changes and i get SS with greater value that previous time. M of this request is in 'W` sending reply state. So that means apache2 process locked in for that request?
On my VPS and root servers, the situation is partially similar. AFAIK the os tries to distribute most of the processing power/RAM to running processes and frees the resources for other processes as the need arises.
Apache server will stay up for random amount of time, usually days, but eventually enters a hung state. When hung the CPU load gradually spikes on the machine and new web server requests are unresponsive.
Error logs typically contain lots of these:
Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1] AH00485: scoreboard is full, not at MaxRequestWorkers
Environment:
LDOM (VM) SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
http Conf:
StartServers 8
MinSpareServers Not set
MaxSpareServers Not set
ServerLimit 256
MaxRequestWorkers 100
MaxConnectionsPerChild 1000
KeepAlive On
TimeOut 3000
MaxKeepAliveRequests 50
KeepAliveTimeout 2
Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers
PID Connections Threads Async connections
total accepting busy idle writing keep-alive closing
25337 0 yes 1 24 0 0 0
25338 1 yes 0 25 1 0 0
25339 1 yes 0 25 0 0 1
25340 1 yes 0 25 0 0 1
Sum 3 1 99 1 0 2
Any thoughts on http conf tuning, OS patches, apache bug fixes appreciated.
Yes I have seen the open ASF bugzilla for the same error message.
This is a production server, so you can imagine, having it go down at random times (usually when I am asleep) is not fun!
We recently upgraded one of our webservers from PHP 5.3 (Debian Squeeze package, using libmysqlclient and APC) to PHP 5.4 (Debian Wheezy, Dotdeb package, using mysqlnd, Opcache and APCu). After working fine for almost one day, we experienced "mysql server has gone away" errors for every request. All other servers with the same load which still run PHP 5.3 with libmysqlclient using the same MySQL server had no problem at all. On all servers we use:
max_execution_time = 60
default_socket_timeout = 60
On our PHP 5.3 servers we did not change any mysql/my.cnf timeouts. We know about problems with read_timeout (mysql), wait_timeout (mysql), default_socket_timeout (php) and max_execution_time (php), but only in context of batch scripts with long running queries. Our webservers usually respond in about 300ms, so those timeouts should not be an issue here.
It became really strange when we removed the server from loadbalancing, so there was no load anymore, but we still had 180 busy Apache processes. Even a apache2ctl graceful did not change anything, even hours later apache2ctl status said:
Apache Server Status for localhost
Server Version: Apache/2.2.22 (Debian)
Server Built: Jun 16 2014 03:51:14
__________________________________________________________________
Current Time: Tuesday, 22-Jul-2014 10:17:44 CEST
Restart Time: Monday, 21-Jul-2014 18:43:37 CEST
Parent Server Generation: 26
Server uptime: 15 hours 34 minutes 6 seconds
Total accesses: 596973 - Total Traffic: 1.6 GB
CPU Usage: u6288.72 s463.96 cu.01 cs0 - 12% CPU load
10.7 requests/sec - 30.8 kB/second - 2962 B/request
176 requests currently being processed, 99 idle workers
GGGGGG_GGGGGGGGG_GG_GGGGGGGGGGGGGGGGGGGG_GGGGGG_GGGGGGGG_GGGGGGG
GGGGGGGGGG_G_GGGGGGGG_G_GG__GGGGGG_GGGGG_GGG___GG_GGGGGGGG_G_GGG
GGGGGGGGGGGG_G_GG__GG_GGG_GGGGGGGGG__GGG_GGG_G_G_GG_G_GGGGGGGGGG
GGG_GGG_GG_GGG_GG_G_GGG_______________.___._W___________________
____.___________.______.........................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
....
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Only apache2ctl restart solved the issue and everything worked fine again. The MySQL error is the only "useful" error message we found so far.
Could it be an issue with mysqlnd, opcache or apcu and PHP 5.4.30? Are there any known problems which could result in the behavior we have experienced?
Or do you have an idea how to debug the "mysql server has gone away" issue?
We probably found out why it comes to the "mysql server has gone away" error: On the MySQL server we configured a wait_timeout of 30 seconds, which is less than 60 seconds max_execution_time. So under certain conditions something seems to take more than 30 seconds while we are reading a result-set from MySQL, so the server closes the connection while we are still trying to get data from the server. That leads us to the next questions:
What function consumes so much time while we are in a loop reading a result-set from mysql?
Why does apache2ctl graceful not restart the Apache Processes, even when max_execution_time should abort the scripts after 60 seconds?
I think the answer to both questions is a bug in APCu. Because if I look at the hanging Apache childs, I get FUTEX_WAIT from strace:
[pid 28354] futex(0x7f3a8c3d2094, FUTEX_WAIT, 69, NULL <unfinished ...>
If I look at such a process using gdb it seems to hang at pthread_rwlock_wrlock(), what I get is:
0x00007f3adcd18abd in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
OK, pthread_rwlock is used for locking in APCu, and problems with locking mechanism is a really good explanation for what we see here, in that we definitely have code which reads/writes to APCu inside of loops through MySQL result-sets, and if there is a problem with locking (which already has been a problem for us in the past with APC as well) it could take >30 and <60 seconds so the MySQL error is what we see. And after that situation something with APCu goes really wrong, so the php script can't be aborted by max_execution_time and not be restarted by apache2ctl graceful anymore.
In APCu issue tracker I could find very similar issues:
https://github.com/krakjoe/apcu/issues/19
But we found another hint. The crash always happens, when there are about 70k keys in APCu, and it does not depend on apc.shm_size, but we found out that our APCu monitoring script produces "PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 78 bytes)" errors when calling apcu_cache_info() at line 47 at the same time when we see the crash. So we have to look into the script why it consumes so much memory, AFAIR id read's all the data for calculating memory fragmentation, perhaps we should remove that part...
But we had a lot of problems with APC in the past, we switched to APCu/Opcache only because we got seg faults with latest APC and PHP 5.4.30 and the issue mentioned above is open for one year now. We are happy to see recent activity on yac, perhaps lockless is a more stable option. If we can't fix by removing problems from our monitoring script, we will switch to local memcached instances, it will be slower but we know it's very stable.
CouchDB is very unpleasant for me. Niether documentation, nor tips could help me at all. The situation is like that:
FreeBSD 9.2 amd64
couchdb-1.5.0,2 installed from ports
npm couchapp
npm semver.
Started replication in CouchDB for node repo and amazing crashes are happening every several minutes. I wrote a script which tests process every 5 seconds:
13:40:53
13:48:11 7m42s [growing tendention]
13:56:09 7m58s
14:04:11 8m02s
14:12:23 8m12s
14:21:14 8m12s
14:30:08 8m54s
14:40:48 10m40s
14:57:13 16m35s [stop growing tendention]
15:08:29 11m16s
...
couch.log: (not always, sometimes nothing at all)
Tue, 06 May 2014 12:59:51 GMT] [error] [<0.134.0>] Error in replication `[REPLICATION_HASH]+continuous` (triggered by document `npmjs_repl`): timeout
Restarting replication in 40 seconds.
[info] [<0.372.0>] Replication `"[REPLICATION_HASH]+continuous"` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 30000 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
source start sequence 203628
[Tue, 06 May 2014 13:00:32 GMT] [info] [<0.372.0>] Replication `"[REPLICATION_HASH]7+continuous"` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 30000 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
source start sequence 203628
err.log (in every crash)
heart: Tue May 6 15:06:14 2014: heart-beat time-out, no activity for 13 seconds
heart: Tue May 6 15:06:16 2014: Executed "/usr/local/bin/couchdb -k" -> 0. Terminating.
heart_beat_kill_pid = 52979
heart_beat_timeout = 11
truss output:
...
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
kevent(3,0x0,0,{},256,{0.000000000 }) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 17431527424 (0x40f000000)
SIGNAL 9 (SIGKILL)
process exit, rval = 0
Thanks for helping.
Can someone please walk me through the process of how I can load test my website using apache bench tool (ab)?
I want to know the following:
How many people per minute can the site handle?
Please walk me through the commands I should run to figure this out.
I tried every tutorial, and they are confusing.
The Apache benchmark tool is very basic, and while it will give you a solid idea of some performance, it is a bad idea to only depend on it if you plan to have your site exposed to serious stress in production.
Having said that, here's the most common and simplest parameters:
-c: ("Concurrency"). Indicates how many clients (people/users) will be hitting the site at the same time. While ab runs, there will be -c clients hitting the site. This is what actually decides the amount of stress your site will suffer during the benchmark.
-n: Indicates how many requests are going to be made. This just decides the length of the benchmark. A high -n value with a -c value that your server can support is a good idea to ensure that things don't break under sustained stress: it's not the same to support stress for 5 seconds than for 5 hours.
-k: This does the "KeepAlive" funcionality browsers do by nature. You don't need to pass a value for -k as it it "boolean" (meaning: it indicates that you desire for your test to use the Keep Alive header from HTTP and sustain the connection). Since browsers do this and you're likely to want to simulate the stress and flow that your site will have from browsers, it is recommended you do a benchmark with this.
The final argument is simply the host. By default it will hit http:// protocol if you don't specify it.
ab -k -c 350 -n 20000 example.com/
By issuing the command above, you will be hitting http://example.com/ with 350 simultaneous connections until 20 thousand requests are met. It will be done using the keep alive header.
After the process finishes the 20 thousand requests, you will receive feedback on stats. This will tell you how well the site performed under the stress you put it when using the parameters above.
For finding out how many people the site can handle at the same time, just see if the response times (means, min and max response times, failed requests, etc) are numbers your site can accept (different sites might desire different speeds). You can run the tool with different -c values until you hit the spot where you say "If I increase it, it starts to get failed requests and it breaks".
Depending on your website, you will expect an average number of requests per minute. This varies so much, you won't be able to simulate this with ab. However, think about it this way: If your average user will be hitting 5 requests per minute and the average response time that you find valid is 2 seconds, that means that 10 seconds out of a minute 1 user will be on requests, meaning only 1/6 of the time it will be hitting the site. This also means that if you have 6 users hitting the site with ab simultaneously, you are likely to have 36 users in simulation, even though your concurrency level (-c) is only 6.
This depends on the behavior you expect from your users using the site, but you can get it from "I expect my user to hit X requests per minute and I consider an average response time valid if it is 2 seconds". Then just modify your -c level until you are hitting 2 seconds of average response time (but make sure the max response time and stddev is still valid) and see how big you can make -c.
Please walk me through the commands I should run to figure this out.
The simplest test you can do is to perform 1000 requests, 10 at a time (which approximately simulates 10 concurrent users getting 100 pages each - over the length of the test).
ab -n 1000 -c 10 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/
-n 1000 is the number of requests to make.
-c 10 tells AB to do 10 requests at a time, instead of 1 request at a time, to better simulate concurrent visitors (vs. sequential visitors).
-k sends the KeepAlive header, which asks the web server to not shut down the connection after each request is done, but to instead keep reusing it.
I'm also sending the extra header Accept-Encoding: gzip, deflate because mod_deflate is almost always used to compress the text/html output 25%-75% - the effects of which should not be dismissed due to it's impact on the overall performance of the web server (i.e., can transfer 2x the data in the same amount of time, etc).
Results:
Benchmarking www.example.com (be patient)
Completed 100 requests
...
Finished 1000 requests
Server Software: Apache/2.4.10
Server Hostname: www.example.com
Server Port: 80
Document Path: /
Document Length: 428 bytes
Concurrency Level: 10
Time taken for tests: 1.420 seconds
Complete requests: 1000
Failed requests: 0
Keep-Alive requests: 995
Total transferred: 723778 bytes
HTML transferred: 428000 bytes
Requests per second: 704.23 [#/sec] (mean)
Time per request: 14.200 [ms] (mean)
Time per request: 1.420 [ms] (mean, across all concurrent requests)
Transfer rate: 497.76 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 5 14 7.5 12 77
Waiting: 5 14 7.5 12 77
Total: 5 14 7.5 12 77
Percentage of the requests served within a certain time (ms)
50% 12
66% 14
75% 15
80% 16
90% 24
95% 29
98% 36
99% 41
100% 77 (longest request)
For the simplest interpretation, ignore everything BUT this line:
Requests per second: 704.23 [#/sec] (mean)
Multiply that by 60, and you have your requests per minute.
To get real world results, you'll want to test Wordpress instead of some static HTML or index.php file because you need to know how everything performs together: including complex PHP code, and multiple MySQL queries...
For example here is the results of testing a fresh install of Wordpress on the same system and WAMP environment (I'm using WampDeveloper, but there are also Xampp, WampServer, and others)...
Requests per second: 18.68 [#/sec] (mean)
That's 37x slower now!
After the load test, there are a number of things you can do to improve the overall performance (Requests Per Second), and also make the web server more stable under greater load (e.g., increasing the -n and the -c tends to crash Apache), that you can read about here:
Load Testing Apache with AB (Apache Bench)
Steps to set up Apache Bench(AB) on windows (IMO - Recommended).
Step 1 - Install Xampp.
Step 2 - Open CMD.
Step 3 - Go to the apache bench destination (cd C:\xampp\apache\bin) from CMD
Step 4 - Paste the command (ab -n 100 -c 10 -k -H "Accept-Encoding: gzip, deflate" http://localhost:yourport/)
Step 5 - Wait for it. Your done
Load testing your API by using just ab is not enough. However, I think it's a great tool to give you a basic idea how your site is performant.
If you want to use the ab command in to test multiple API endpoints, with different data, all at the same time in background, you need to use "nohup" command. It runs any command even when you close the terminal.
I wrote a simple script that automates the whole process, feel free to use it: http://blog.ikvasnica.com/entry/load-test-multiple-api-endpoints-concurrently-use-this-simple-shell-script
I was also curious if I can measure the speed of my script with apache abs or a construct / destruct php measure script or a php extension.
the last two have failed for me: they are approximate.
after which I thought to try "ab" and "abs".
the command "ab -k -c 350 -n 20000 example.com/" is beautiful because it's all easier!
but did anyone think to "localhost" on any apache server for example www.apachefriends.org?
you should create a folder such as "bench" in root where you have 2 files: test "bench.php" and reference "void.php".
and then: benchmark it!
bench.php
<?php
for($i=1;$i<50000;$i++){
print ('qwertyuiopasdfghjklzxcvbnm1234567890');
}
?>
void.php
<?php
?>
on your Desktop you should use a .bat file(in Windows) like this:
bench.bat
"c:\xampp\apache\bin\abs.exe" -n 10000 http://localhost/bench/void.php
"c:\xampp\apache\bin\abs.exe" -n 10000 http://localhost/bench/bench.php
pause
Now if you pay attention closely ...
the void script isn't produce zero results !!!
SO THE CONCLUSION IS: from the second result the first result should be decreased!!!
here i got :
c:\xampp\htdocs\bench>"c:\xampp\apache\bin\abs.exe" -n 10000 http://localhost/bench/void.php
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Apache/2.4.33
Server Hostname: localhost
Server Port: 80
Document Path: /bench/void.php
Document Length: 0 bytes
Concurrency Level: 1
Time taken for tests: 11.219 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 2150000 bytes
HTML transferred: 0 bytes
Requests per second: 891.34 [#/sec] (mean)
Time per request: 1.122 [ms] (mean)
Time per request: 1.122 [ms] (mean, across all concurrent requests)
Transfer rate: 187.15 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 1
Processing: 0 1 0.9 1 17
Waiting: 0 1 0.9 1 17
Total: 0 1 0.9 1 17
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 2
98% 2
99% 3
100% 17 (longest request)
c:\xampp\htdocs\bench>"c:\xampp\apache\bin\abs.exe" -n 10000 http://localhost/bench/bench.php
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Apache/2.4.33
Server Hostname: localhost
Server Port: 80
Document Path: /bench/bench.php
Document Length: 1799964 bytes
Concurrency Level: 1
Time taken for tests: 177.006 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 18001600000 bytes
HTML transferred: 17999640000 bytes
Requests per second: 56.50 [#/sec] (mean)
Time per request: 17.701 [ms] (mean)
Time per request: 17.701 [ms] (mean, across all concurrent requests)
Transfer rate: 99317.00 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 1
Processing: 12 17 3.2 17 90
Waiting: 0 1 1.1 1 26
Total: 13 18 3.2 18 90
Percentage of the requests served within a certain time (ms)
50% 18
66% 19
75% 19
80% 20
90% 21
95% 22
98% 23
99% 26
100% 90 (longest request)
c:\xampp\htdocs\bench>pause
Press any key to continue . . .
90-17= 73 the result i expect !
Steps to perform:
Commands For load testing
Step 1 - Install Xampp.
Step 2 - Open CMD.
Step 3 - Go to the Apache bench destination (cd C:\xampp\apache\bin) from CMD.
Step 4 - Paste the command
ab -n 100 -c 10 -k -H "Accept-Encoding: gzip, deflate" http://localhost:port/
Step 5 - Wait for it. You're done
Main Command Only “Get”
ab -k -c 25 -n 2000 http://192.168.1.113:3001/api/v1/filters/3
One of the best solution for post request load testing. Click here