mod_wsgi and Django app periodically hang

I've got a large Django app. It has two Apache virtual hosts pointing to different settings files, so part of the application is accessible via one URL, and part via another. The Django app uses virtualenv.
mod_wsgi is configured to run in daemon mode with the following in Apache's VirtualHost block:
# domain 1:
WSGIDaemonProcess fc processes=5 threads=5 display-name=%{GROUP} \
user=nobody group=nobody
WSGIScriptAlias / /var/www/python/mine/apache/my.wsgi \
process-group=fc application-group=%{GROUP}
# different apache.conf file for domain 2:
WSGIDaemonProcess fm processes=5 threads=5 display-name=%{GROUP} \
user=nobody group=nobody
WSGIScriptAlias / /var/www/python/mine/apache/other.wsgi \
process-group=fm application-group=%{GROUP}
Every now and again while using the sites, a request will hang. It never completes. I have to use the browser's 'refresh' button to reload the page, and then the request normally works.
Apache itself runs in prefork mode and MaxRequestsPerChild is set to 0 because I've read that could be a problem. This happens often enough for it to be a potential problem - every 100 requests perhaps, something like that.
Has anyone got any idea why this is happening?
Thanks

That should be 'application-group=%{GLOBAL}' in the WSGIScriptAlias options, not '%{GROUP}'. The '%{GLOBAL}' value is special and means the main Python interpreter. Using the main interpreter often gets around problems with third-party C extension modules for Python which don't work in sub interpreters, including ones that cause deadlocks. The '%{GROUP}' value is only relevant to the 'display-name' option of WSGIDaemonProcess.
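A sketch of the question's two daemon configurations with only the application-group value changed; paths, group names and the other options are copied from the question. Because 'fc' and 'fm' are separate daemon process groups, each running a single application, both can use the main interpreter without interfering with each other:

```apache
# domain 1
WSGIDaemonProcess fc processes=5 threads=5 display-name=%{GROUP} \
    user=nobody group=nobody
WSGIScriptAlias / /var/www/python/mine/apache/my.wsgi \
    process-group=fc application-group=%{GLOBAL}

# domain 2 (in its own apache.conf file)
WSGIDaemonProcess fm processes=5 threads=5 display-name=%{GROUP} \
    user=nobody group=nobody
WSGIScriptAlias / /var/www/python/mine/apache/other.wsgi \
    process-group=fm application-group=%{GLOBAL}
```

The same delegation can also be expressed with separate WSGIProcessGroup and WSGIApplicationGroup %{GLOBAL} directives instead of the WSGIScriptAlias options.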


mod_wsgi keeps restarting flask app [duplicate]

This question already has an answer here:
Why not generate the secret key every time Flask starts?
From Flask's documentation, I have the following in my config:
<VirtualHost *>
ServerName example.com
WSGIDaemonProcess yourapplication user=user1 group=group1 threads=5
WSGIScriptAlias / /var/www/yourapplication/yourapplication.wsgi
<Directory /var/www/yourapplication>
WSGIProcessGroup yourapplication
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>
</VirtualHost>
In my .wsgi file, I import the proper Python file and import the Flask app as application. Everything works fine, but I added logging to that file because I suspected something was wrong. Apparently, that .wsgi file gets executed every so often when a browser makes a connection. It restarts the app (or at least starts a new process). I never noticed this, nor did I see it as a problem, until I started using flask-login to manage authenticated sessions. Now whenever I log in, after a short time the WSGI app is reloaded and the session no longer exists. In effect, I have to log in every few seconds. Is this the intended way mod_wsgi works? I've tested my Flask app in standalone mode (Flask's own development server) and it works flawlessly.
In a way it's a duplicate, but it's also not. The server code isn't buggy. It's just mod_wsgi restarts the application over and over. Thanks for linking to the other post, though!
So what I found is that mod_wsgi does restart the application every so often. I suppose this is expected behavior, but it's not what I expected. My issue with getting logged out was caused by generating the app's secret key on startup, so the secret key was changing constantly. Obviously this invalidates the session cookies and logs the user out. So if you want a generated secret key rather than a plain-text one in your source, you need to generate it once externally and load it into your Flask application so that it doesn't keep changing.

Apache: include htaccess in conf with AllowOverride None, better performance?

Suppose we have the /home/example.org/public_html/ directory on the filesystem, which serves as the document root of my virtualhost.
The relevant httpd configuration for that vhost would look like this:
<VirtualHost *:80>
ServerName example.org:80
...
DocumentRoot /home/example.org/public_html
<Directory /home/example.org/public_html>
AllowOverride All
...
</Directory>
...
</VirtualHost>
In order to prevent the htaccess lookups on the filesystem without losing the htaccess functionality (at least at the DocumentRoot level), I transformed the configuration to the following:
<VirtualHost *:80>
ServerName example.org:80
...
DocumentRoot /home/example.org/public_html
<Directory /home/example.org/public_html>
AllowOverride None
Include /home/example.org/public_html/.htaccess
...
</Directory>
...
</VirtualHost>
The difference:
AllowOverride None
Include /home/example.org/public_html/.htaccess
Let's see what we have accomplished with this:
httpd does not waste any time looking for and parsing htaccess files, resulting in faster request processing.
Questions:
1. With the Include directive, does Apache load the .htaccess file only at service start, or for each request?
2. If point 1 is true, how do I refresh the Apache configuration without httpd.exe -k restart?
Firstly, note that checking for .htaccess is commonly not all that big an issue, since the relevant bits of the disk are cached in memory. It becomes an issue where, for example, you have a very large number of directories under your web root directory or directories, and the hits are scattered amongst them so that the hit rate on cached disk blocks is low. You might be better off dealing with that by disabling .htaccess selectively for directory trees where it creates a problem. Parsing the .htaccess directives creates a little CPU load, of course, but CPU should generally not be your server's bottleneck.
Answering your question as posed, though: yes, you will need to run a command as root to load the new configuration. Rather than using restart, use reload or (better) graceful:
httpd.exe -k graceful
You could (but probably shouldn't) write a cron job to periodically check whether this needs to be run. Without a lot of testing, I think something like this should work, run as a regular root cron job:
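#!/bin/bash
# Gracefully reload Apache only if the included .htaccess changed since the last
# reload done by this script. The stamp file path is an assumption; any root-owned path works.
STAMP=/var/run/httpd/htaccess.stamp
[ /home/example.org/public_html/.htaccess -nt "$STAMP" ] \
&& httpd.exe -k graceful && touch "$STAMP"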
This creates a bit of disk load itself of course. This load doesn't increase with traffic volume, but might be an issue if you have many such included files.
SECURITY WARNING: It sounds like you are setting up a situation where a non-root user is likely to be able to get Apache to Include directives at will. This is much more powerful than what can be done with a .htaccess file, and amounts to a root exploit. E.g. it gives access to things like the User and LoadModule directives, which .htaccess directives can never do.
I recommend putting Included directives in a file inside your Apache configuration directory, accessible only by root. There are other ways to make sure that only root can edit the .htaccess file, but getting these files out of the user-owned area makes it less likely that you'll inadvertently open access again later.
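A sketch of that recommendation: the vhost Includes a root-owned copy of the rules rather than the file in the user's document root. The file name conf/extra/example.org-rules.conf is hypothetical; a relative Include path is resolved against ServerRoot.

```apache
<Directory /home/example.org/public_html>
    AllowOverride None
    # Hypothetical file: the former .htaccess rules, moved into the Apache
    # configuration directory and writable only by root
    Include conf/extra/example.org-rules.conf
</Directory>
```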
While the .htaccess mechanism does incur extra disk load, it is the mechanism that's designed for use by non-root users. It would be nice to have a mechanism for untrusted users to modify configuration with a limit on how often the .htaccess file would be checked for, but if it exists, I don't know it.
Apache accesses and processes the htaccess files on each request. This is why one does not need to restart the server every time to check their current configurations.
You do need to restart the server/service for testing any changes made to apache.conf, httpd.conf or the vhost configurations.
Quoting from Apache's tutorial on htaccess file:
You should avoid using .htaccess files completely if you have access
to httpd main server config file. Using .htaccess files slows down
your Apache http server. Any directive that you can include in a
.htaccess file is better set in a Directory block, as it will have
the same effect with better performance.
Since you are already Including the .htaccess file from inside a <Directory> block, you get the same performance as if you had copied everything from the file into that block directly. There is no difference, apart from having to maintain the configuration in two places.
The htaccess file will get processed just once, at the time of server start.
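A minimal sketch of the inlined alternative, assuming mod_rewrite is loaded. The Options and RewriteRule lines are hypothetical stand-ins for whatever the .htaccess file actually contains:

```apache
<Directory /home/example.org/public_html>
    AllowOverride None
    # Former .htaccess contents pasted in directly.
    # Hypothetical rules, for illustration only:
    Options -Indexes
    RewriteEngine On
    RewriteRule ^old-page$ /new-page [R=301,L]
</Directory>
```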

How to improve the performance of Apache with mod_wsgi?

I use Apache/2.4.12 (Unix) and mod_wsgi-4.4.11 with the configuration below in apache/conf/extra:
# httpd-mpm.conf
<IfModule mpm_worker_module>
StartServers 3
MinSpareThreads 75
MaxSpareThreads 250
ThreadsPerChild 25
MaxRequestWorkers 400
MaxConnectionsPerChild 0
</IfModule>
# httpd-vhosts.conf
WSGIRestrictEmbedded On
<VirtualHost *:443>
ServerName form.xxx.com
WSGIScriptAlias / /usr/local/apache/services/form/form.wsgi
WSGIDaemonProcess paymentform user=test processes=10 threads=5 display-name=%{GROUP} maximum-requests=100
WSGIApplicationGroup %{RESOURCE}
WSGIProcessGroup form
DocumentRoot /usr/local/apache/services/form
SSLEngine On
# (SSL certificate directives omitted)
<Directory /usr/local/apache/services/form>
Require all granted
</Directory>
</VirtualHost>
With this configuration, I tested using Apache JMeter:
GET: form.xxx.com (only returns the string "index")
Number of Threads (users): 100
Ramp-up Period: 0
Loop count: 10
But the result is:
samples: 1000
Average: 3069
Min : 13
Max : 22426
Std.Dev: 6671.693614549157
Error %: 10.0%
Throughput : 24.1/sec
KB/sec : 10.06/sec
AvgBytes : 428.5
During testing, connections are refused or time out, and the server stops receiving requests after about 400 to 500 requests. The server's CPU and memory are not maxed out.
How can I improve performance?
Should I fix the MPM worker configuration, or the WSGI configuration in httpd-vhosts?
I modified httpd-mpm.conf as below, but it made no difference.
<IfModule mpm_worker_module>
StartServers 10
ServerLimit 32
MinSpareThreads 75
MaxSpareThreads 250
ThreadsPerChild 25
MaxRequestWorkers 800
MaxConnectionsPerChild 0
</IfModule>
You have a number of things which are wrong in your configuration. One may be a cut and paste error. Another is a potential security issue. And one will badly affect performance.
The first is that you have:
WSGIProcessGroup form
If that is really what you have, then the web request wouldn't even be getting to the WSGI application and should return a 500 error response. If it isn't giving an error, then your request is being delegated to a mod_wsgi daemon process group not even mentioned in the above configuration. This happens because the value of WSGIProcessGroup doesn't match the name of the daemon process group defined by the WSGIDaemonProcess directive.
What you would have to have is:
WSGIProcessGroup paymentform
I suspect you have simply mucked up the configuration when you pasted it in to the question.
A related issue with delegation is that you have:
WSGIApplicationGroup %{RESOURCE}
This is what the default is anyway. There would usually never be a need to set it explicitly. What one would normally use if only delegating one WSGI application to a daemon process group is:
WSGIApplicationGroup %{GLOBAL}
This particular value forces the use of the main Python interpreter context of each process which avoids problems with some third party extension modules that will not work properly in sub interpreter contexts.
The second issue is a potential security issue. You have:
DocumentRoot /usr/local/apache/services/form
When using WSGIScriptAlias directive, there is no need to set DocumentRoot to be a parent directory of where your WSGI script file or source code for your application is.
The danger in doing this is that if WSGIScriptAlias was accidentally disabled, or changed to a sub URL, all your source code then becomes downloadable.
In short, let DocumentRoot default to the empty default directory for the whole server, or create an empty directory just for the VirtualHost and set it to that.
The final thing and which would drastically affect your performance is the use of maximum-requests option to WSGIDaemonProcess. You should never use maximum-requests in a production system unless you understand the implications and have a specific temporary need.
Setting this option, and to such a low value, means that the daemon processes will be killed off and restarted every 100 requests. Under a high volume of requests, as with a benchmark, you would be constantly restarting your application processes.
The result of this would be increased CPU load and much slower response times, with the potential for requests to back up and response times to become very long, as the server is overloaded by everything restarting all the time.
So, absolute first thing you should do is remove maximum-requests and you should see some immediate improvement.
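Pulling the points above together, a sketch of how the question's vhost might look. The empty DocumentRoot path is hypothetical (create the directory first, or drop the directive to use the server default), and the certificate directives are omitted as in the question:

```apache
WSGIRestrictEmbedded On

<VirtualHost *:443>
    ServerName form.xxx.com

    # Hypothetical empty directory, so the application source is never
    # served if WSGIScriptAlias is disabled or moved to a sub URL
    DocumentRoot /usr/local/apache/empty-docroot

    # No maximum-requests: daemon processes are not recycled every N requests
    WSGIDaemonProcess paymentform user=test processes=10 threads=5 display-name=%{GROUP}
    # Must match the WSGIDaemonProcess name above
    WSGIProcessGroup paymentform
    # Main interpreter; avoids sub-interpreter problems with C extensions
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias / /usr/local/apache/services/form/form.wsgi

    SSLEngine On
    # (SSL certificate directives omitted)

    # Still required so Apache may serve the WSGI script file
    <Directory /usr/local/apache/services/form>
        Require all granted
    </Directory>
</VirtualHost>
```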
You also have issues with process restarts in your Apache MPM settings. This is not as major, as it only affects the Apache worker processes which are proxying requests, but it will also cause extra CPU usage, plus potentially require a higher number of worker processes.
I have talked about the issue of Apache process churn due to MPM settings before in:
http://lanyrd.com/2013/pycon/scdyzk/
One final problem with your benchmarking: if all your test returns is the 'index' string from some simple hello world type program, it bears no relationship to your real world application.
Real applications are not usually so simple, and time spent within the WSGI application is going to be much greater due to template rendering, database access, etc. This means the performance profile of a real application is going to be completely different, which changes how you should configure the server.
In other words, testing with a hello world program is going to give you the completely wrong idea of what you need to do to configure the server appropriately. You really need to understand what the real performance profile of your application is under normal traffic loads and work from there. That is, hammering the server to the point of breaking is also wrong and not realistic.
I have been blogging recently about how the typical hello world tests people use are wrong, and giving examples of specific tests which show how the performance of different WSGI servers and configurations can be markedly different. The point is to show that you can't base things on one simple test and that you do need to understand what your WSGI application is doing.
In all of this, to really truly understand what is going on and how to tune the server properly, you need to use a performance monitoring solution which is built into the WSGI server and so can give insights into the different aspects of how it works and therefore what knobs to adjust. The blog posts are covering this also.
I encountered a similar problem to the one ash84 described. I used JMeter to test performance and found the error % became non-zero when the JMeter thread count was set beyond some value (50 in my case).
After I watched Graham Dumpleton's talk, I realized it happens mainly because there are not enough spare MPM threads ready to serve the burst of incoming JMeter requests. In this case, some JMeter requests are not served at the beginning, even though the number of MPM threads catches up later.
In short, setting MinSpareThreads to a larger value fixed my problem: I raised JMeter threads from 50 to 100 and got 0% errors.
MinSpareThreads 120
MaxSpareThreads 150
MaxRequestWorkers 200
The number of WSGIDaemonProcess processes times the number of WSGIDaemonProcess threads doesn't have to be greater than the number of JMeter threads. But you may need to set them to higher values to make sure the daemon processes can handle the requests quickly enough.

Apache always gives 403 Forbidden after changing DocumentRoot

I'm just a newbie with Apache. I just installed Apache 2.2 on the FreeBSD box at my home office. The FreeBSD documentation says I can change the DocumentRoot directive in order to serve my own directory. Therefore, I replaced...
/usr/local/www/apache22/data
with
/usr/home/some_user/public_html
but something is not right. There's an index.html file inside the directory, but it seems that Apache can't read the directory/file.
Forbidden
You don't have permission to access / on this server.
The permissions of public_html are drwxr-xr-x.
I wonder what could be wrong here. Also, in my case, I am not going to host more than one website on this FreeBSD box, so I didn't look at using VirtualHost at all. Is it good practice to just change the DocumentRoot directive?
Somewhere in the apache config is a line like:
# This should be changed to whatever you set DocumentRoot to.
#
<Directory "/usr/local/www/apache22/data">
You must change this path too, to make it work. This block contains, for example:
Order allow,deny
Allow from all
These give users access to the directory.
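For the paths in the question, a sketch of the two edits in httpd.conf on Apache 2.2 (assuming the stock FreeBSD configuration otherwise). Apache must be restarted or gracefully reloaded afterwards:

```apache
# Point DocumentRoot at the new directory...
DocumentRoot "/usr/home/some_user/public_html"

# ...and change the matching <Directory> path to the same directory.
<Directory "/usr/home/some_user/public_html">
    # Keep any other directives already in this block; these two grant access.
    Order allow,deny
    Allow from all
</Directory>
```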
One possibility that comes to mind is SELinux blocking the web process from accessing that folder. If this is the case, you would see it in the SELinux log. You would have to check the context of your original web root with:
ls -Zl
and then apply it to your new web folder:
chcon whatevercontextyousaw public_html
Or, instead, if it's not a production server that requires security (like a development machine behind a firewall), you might want to just turn SELinux off.
Just one idea. Could be a number of other things.

Can you disable apache logs for a single site using htaccess or in the Virtual Host settings?

I'm working on a web site where the client doesn't want ANY logging on the site for privacy reasons. The site will be hosted on the same Apache web server as a number of other websites, which is why I can't just turn logging off in Apache. Is there some way to disable logging for an individual site using .htaccess rules or by adding something to the VirtualHost settings?
The options seem to be
Sending the logs to /dev/null on *nix or C:/nul on Windows
Removing the base logging directives and duplicating them in each vhost (so there is no logging on for vhosts by default)
Seems like there should be some better way to do this, but that's what I've found.
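A sketch of the /dev/null option for a single vhost. The ServerName is hypothetical, and 'combined' is assumed to be the LogFormat nickname defined in the stock httpd.conf. Defining both directives inside the vhost also keeps it from falling back to the main server's logs:

```apache
<VirtualHost *:80>
    # Hypothetical host name for the site that must not be logged
    ServerName private.example.com
    # Discard this vhost's error and access logs (use C:/nul on Windows)
    ErrorLog /dev/null
    CustomLog /dev/null combined
</VirtualHost>
```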
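Yes, just comment out (using a '#') the ErrorLog and CustomLog entries in the httpd conf for your virtual host. Note that a virtual host with no log directives of its own still has its requests and errors sent to the main server's logs, so this only stops logging if those are not also set at the server level.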
http://www.mydigitallife.info/how-to-disable-and-turn-off-apache-httpd-access-and-error-log/
I achieve this by making the logging dependent on a non-existing environment variable. So in the VirtualHost you can have:
CustomLog /var/log/httpd/my_access_log combined env=DISABLED
and so long as there is no environment variable called DISABLED then you'll get no logs.
I actually arrived here looking for a neater solution but this works without having to change the global httpd.conf.
Edit: removed reference to .htaccess because CustomLog only applies in the global config or in the virtual host config, as pointed out by @Basj.