Share Python Interpreter in Apache Prefork / WSGI - apache

I am attempting to run a Python application within Apache (prefork) with WSGI in such a way that a single Python interpreter is used. This is necessary because the application uses thread synchronization to prevent race conditions. Since Apache prefork spawns multiple processes, each process gets its own interpreter, so the thread synchronization is ineffective (i.e. each process only sees its own locks, which have no bearing on the other processes).
Here is the setup:
Apache 2.0 (prefork)
WSGI
Python 2.5
Here is the relevant Apache configuration:
WSGIApplicationGroup %{GLOBAL}
<VirtualHost _default_:80>
WSGIScriptAlias / /var/convergedsecurity/apache/osvm.wsgi
Alias /admin_media/ /var/www/html/admin_media/
<Directory /var/www/html/admin_media>
Order deny,allow
Allow from all
</Directory>
Alias /media/ /var/www/html/media/
<Directory /var/www/html/media>
Order deny,allow
Allow from all
</Directory>
</VirtualHost>
Here is what I tried so far (none of which worked):
Adding WSGIApplicationGroup %{GLOBAL}
Specifying WSGIDaemonProcess and WSGIProcessGroup within the virtual host:
WSGIDaemonProcess osvm threads=50
WSGIProcessGroup osvm
Is there no way to force Apache prefork to use a single Python interpreter with WSGI? The documentation seems to imply you can with the WSGIDaemonProcess and WSGIApplicationGroup options, but Apache still creates a separate Python interpreter for each process.

You can't have the WSGI application run in embedded mode on UNIX systems, whether it be prefork or worker MPM, as there will indeed be multiple processes. See:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
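To illustrate why locks from the threading module cannot coordinate separate processes, here is a rough sketch. It is not mod_wsgi code: it simply forks a child the way a prefork server does, and the function name parent_sees_child_lock is made up for the example. It assumes Python 3 on a UNIX system (os.fork is not available on Windows).

```python
import os
import threading


def parent_sees_child_lock():
    """Return True if a lock acquired in a forked child looks held in the parent."""
    lock = threading.Lock()
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # UNIX only; each process now has its own copy of `lock`
    if pid == 0:
        # Child process: acquire its copy of the lock, then signal the parent.
        try:
            os.close(read_fd)
            lock.acquire()
            os.write(write_fd, b"x")
        finally:
            os._exit(0)
    # Parent process: wait until the child reports that it holds its lock.
    os.close(write_fd)
    os.read(read_fd, 1)
    os.close(read_fd)
    held_in_parent = lock.locked()
    os.waitpid(pid, 0)
    return held_in_parent
```

The function returns False: the child acquired only its own copy of the lock, so the parent's copy is still free. That is the situation the question describes, one set of locks per Apache child process.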
Creating a daemon process group consisting of a single process and delegating the WSGI application to it should achieve what you want. You shouldn't even need to use WSGIApplicationGroup if you only have one mounted WSGI application. If you want to be absolutely sure, though, you can also set it.
Thus configuration within VirtualHost would be:
WSGIDaemonProcess osvm
WSGIProcessGroup osvm
WSGIApplicationGroup %{GLOBAL}
WSGIScriptAlias / /var/convergedsecurity/apache/osvm.wsgi
Although 'processes=1' for WSGIDaemonProcess would make it explicit that one process is created, don't provide the option; just let it default to one process. Any use of the 'processes' option, even for a single process, will see 'wsgi.multiprocess' set to True.
Rather than use your actual WSGI application, I would suggest you test with the following simple test program.
import cStringIO
import os
def application(environ, start_response):
    headers = []
    headers.append(('Content-Type', 'text/plain'))
    write = start_response('200 OK', headers)
    input = environ['wsgi.input']
    output = cStringIO.StringIO()
    print >> output, "PID: %s" % os.getpid()
    print >> output
    keys = environ.keys()
    keys.sort()
    for key in keys:
        print >> output, '%s: %s' % (key, repr(environ[key]))
    print >> output
    output.write(input.read(int(environ.get('CONTENT_LENGTH', '0'))))
    return [output.getvalue()]
In the output of that, the PID value should always be the same. The wsgi.multiprocess flag should be False. The mod_wsgi.process_group value should be whatever you called the daemon process group. And the mod_wsgi.application_group should be an empty string.
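If you would rather check those values in code than read them off the page, something like the following could work. It is only a sketch: single_interpreter_problems is a hypothetical helper, not part of mod_wsgi, and it only looks at the three environ keys mentioned above.

```python
def single_interpreter_problems(environ, expected_group):
    """Return a list of reasons the request does not look like it is running
    in a single daemon process with the main (global) application group."""
    problems = []
    # A missing flag is treated as a problem, since we cannot confirm it is False.
    if environ.get('wsgi.multiprocess', True):
        problems.append('wsgi.multiprocess is not False')
    if environ.get('mod_wsgi.process_group') != expected_group:
        problems.append('mod_wsgi.process_group is %r, expected %r'
                        % (environ.get('mod_wsgi.process_group'), expected_group))
    if environ.get('mod_wsgi.application_group') != '':
        problems.append('mod_wsgi.application_group is not the empty string')
    return problems
```

You could call it from the test application with expected_group set to 'osvm' and write the returned list into the response. An empty list means all three values match what is described above.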
If this isn't what you are seeing, ensure you actually restarted Apache after making configuration changes. Also add:
LogLevel debug
to the Apache configuration for the VirtualHost. That will cause mod_wsgi to log many more messages in the Apache error log about process creation and script loading, including which process group and application group each event applies to.
For other information on debugging, see:
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques
If you still have problems, I suggest you go to the mod_wsgi mailing list on Google Groups.

Related

mod_wsgi and WSGIScript directive

On my virtual server configuration I have this:
DocumentRoot /var/www/project/app/
and also I have this directive:
WSGIScriptAlias / /var/www/project/app/wsgi.py
from mod_wsgi documentation: "avoid placing WSGI scripts under the DocumentRoot in order to avoid accidentally revealing their source code if the configuration is ever changed"
It's clear to me that I must delete the DocumentRoot directive here! I just want to know how the code of my wsgi.py file could be revealed. What kind of request could get that file as a response?
Change:
WSGIScriptAlias / /var/www/project/app/wsgi.py
to:
WSGIScriptAlias /suburl /var/www/project/app/wsgi.py
Restart Apache and then visit /wsgi.py. It will download and show you your source code.
There is usually no reason to set DocumentRoot to be the directory your WSGI script file is in when using WSGIScriptAlias. By doing it when you don't need to, you are one step away from making your code available if you decided to change your configuration to mount the application at a sub URL and didn't understand the implications of it.
Since it isn't necessary, just don't expose yourself to the extra risk.
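As a quick sanity check for this risk, you could test whether the WSGI script path sits under the DocumentRoot. This is only a sketch: is_under_document_root is a hypothetical helper, not something provided by mod_wsgi or Apache. It compares the two paths textually and does not resolve symlinks or read the Apache configuration.

```python
import os.path


def is_under_document_root(script_path, document_root):
    """Return True if script_path is located inside document_root."""
    script = os.path.abspath(script_path)
    root = os.path.abspath(document_root)
    # commonpath compares whole path components, so '/var/www/app2' is not
    # mistaken for being inside '/var/www/app'.
    return os.path.commonpath([script, root]) == root
```

With the paths from the question, is_under_document_root('/var/www/project/app/wsgi.py', '/var/www/project/app/') is True, which is the layout the documentation warns about. Moving the script to a sibling directory outside the DocumentRoot makes it False.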

Apache mod_wsgi flask crashes after about a minute

I have an Amazon EC2 server running Apache 2.4. I am running one website on there using Python and regular CGI, and then another virtual host using mod_wsgi and an index.wsgi script. When I use a default WSGI callable class object script in my wsgi file, it works fine. However, if I use a WSGI-compatible framework like Flask or Bottle, it loads and works perfectly for about a minute, and then suddenly gives an error 503 ON BOTH OF MY SITES. Even if I change my script back to the default, this error persists for about 5 minutes and then it starts working again. I am using mod_wsgi with the usual daemon mode. Please help. I am using RedHat Linux, Apache 2.4, Python 2.7, and the latest flask and mod_wsgi.
EDIT: Here's my site-specific apache .conf file
<VirtualHost *:80>
ServerName ihave.nolife.lol
WSGIScriptAlias / /var/www/ihave/index.wsgi
WSGIDaemonProcess ihave user=apache group=apache processes=1 threads=5
<Directory /var/www/ihave>
Require all granted
WSGIProcessGroup ihave
WSGIApplicationGroup %{GLOBAL}
</Directory>
ErrorLog /var/www/html/ihave/errorlog
LogLevel debug
CustomLog /var/www/html/ihave/requests combined
Not enough information. But at a guess, it is because you are using a third-party Python package with a C extension module that does not work in sub interpreters. Read the following and set the directive it describes. It is also recommended that you make sure you are using daemon mode and not embedded mode.
http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#python-simplified-gil-state-api

How can I make Flask and Apache (mod_wsgi) update my database queries on each visit to a page?

In my Flask application I have a function that queries a database. When I changed the SQL in the file, the new results did not show up on the web page. Only after I restarted Apache with service apache2 restart (on Debian 7) did the new query results show up.
I am running my WSGI process in daemon mode using mod_wsgi, v. 3.3, Apache 2.2.
I am not using SQLAlchemy or any other ORM, straight up SQL with a pymssql connect statement.
I am using Blueprints.
If I touch the .wsgi file, Apache will load the results as expected.
I am not sure how Flask-Cache can help me (or any other Flask module).
WSGIDaemonProcess myapp python-path=/var/www/intranet/application/flask:/var/www/intranet/application/flask/lib/python2.7/site-packages
WSGIProcessGroup myapp
WSGIScriptAlias /myapp /var/www/intranet/intranet.wsgi
<Directory /var/www/intranet>
WSGIApplicationGroup %{GLOBAL}
Order allow,deny
Allow from all
</Directory>
<Location />
Options FollowSymLinks
AllowOverride None
order allow,deny
allow from all
AuthType Basic
AuthName "Subversion Repository"
Require valid-user
AuthUserFile /etc/apache2/dav_svn.passwd
<IfModule mod_php4.c>
php_flag magic_quotes_gpc Off
php_flag track_vars On
</IfModule>
I have read much of this, https://code.google.com/p/modwsgi/wiki/ReloadingSourceCode, but I do not know if this is something Flask may already have built in for production.
How can I make a code change take effect without restarting Apache?
Edit: My query is not in the .wsgi file.
What I ended up doing was use a post-receive hook in my --bare directory.
I started from here:
http://krisjordan.com/essays/setting-up-push-to-deploy-with-git
and added a touch to the end of it. Here is what I did:
#!/usr/bin/ruby
#Changed shebang a little from the website version for mine, Debian 7.
# post-receive
#johnny
require 'fileutils'
#
# 1. Read STDIN (Format: "from_commit to_commit branch_name")
from, to, branch = ARGF.read.split " "
# 2. Only deploy if master branch was pushed
if (branch =~ /master$/) == nil
  puts "Received branch #{branch}, not deploying."
  exit
end
# 3. Copy files to deploy directory
deploy_to_dir = File.expand_path('../deploy')
`GIT_WORK_TREE="#{deploy_to_dir}" git checkout -f master`
puts "DEPLOY: master(#{to}) copied to '#{deploy_to_dir}'"
# 4.TODO: Deployment Tasks
# i.e.: Run Puppet Apply, Restart Daemons, etc
#johnny
FileUtils.touch('/path/to/my/file.wsgi')
I commit:
git commit -a -m'my commit message'
then,
git push production master
After much reading, it seems most people do not like the auto update. Where I work, they need to see things immediately. Most things are database reads or static templates, so I don't mind using the "auto" touch for this particular application.
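If your deployment step is written in Python rather than Ruby, the same touch can be done with os.utime. This is a minimal sketch, assuming the daemon process reloads when the modification time of the .wsgi file changes (which is what I saw when touching the file by hand). The function name touch_wsgi_script is made up for the example.

```python
import os


def touch_wsgi_script(path):
    """Set the file's access and modification times to now, like `touch`.

    Returns the new modification time so the caller can log or check it.
    """
    os.utime(path, None)
    return os.path.getmtime(path)
```

You would call it at the end of the deploy step with the path to your own .wsgi file, in place of the FileUtils.touch line.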

CSRF token mismatch in Apache Flask due to session reset

I have an example of a CSRF protected form that runs perfectly in the development environment (Flask runs the server itself with app.run) but fails when I run the app via mod_wsgi in Apache. The versions I use are:
Server version: Apache/2.4.4 (Unix)
Python 2.7.3
Flask==0.10.1
Flask-WTF==0.9.5
WTForms==2.0
Flask-KVSession==0.4
simplekv==0.8.4
The reason that it fails is a csrf_token mismatch during form validation. I log the contents of the flask.session and flask.request.form at the beginning of the view and the contents of the session again at the end of the view. In development mode the content of the csrf_token in the session stays constant across multiple requests, for example,
<KVSession {'csrf_token': '79918c1e3191e4d4fe89a9499f576404a18be8e4'}>
The contents of the form are transmitted correctly in both cases, e.g.,
ImmutableMultiDict([('csrf_token', u'1403778775.86##34f1447f1b8c78808f4e71f2ff037bcd1df41dcd'),
('time', u'8'), ('submit', u'Go'), ('dose', u'Low')])
When I run my app via Apache the session contents are reset with each request. At the beginning of the view the session contents are empty:
<KVSession {}>
and then a new token is set each time which leads to the mismatch. Currently, my __init__.py module looks as follows:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from simplekv.memory import DictStore
from flaskext.kvsession import KVSessionExtension
app = Flask(__name__)
app.config.from_object("myapp.config.Config")
db = SQLAlchemy(app)
store = DictStore()
KVSessionExtension(store, app)
from . import views
I removed the KVSession statements and that didn't change the problem. So I think server side sessions are not the culprit.
And yes, I have set the SECRET_KEY to os.urandom(128) in the config.
The relevant (I think) section of my httpd.conf is:
Listen url.com:8090
<VirtualHost url.com:8090>
# --- Configure VirtualHost ---
LogLevel debug
ServerName url.com
DocumentRoot /path/to/flaskapp/htdocs
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory /path/to/flaskapp/htdocs/>
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Require all granted
</Directory>
# --- Configure WSGI Listening App(s) ---
WSGIDaemonProcess mysite user=me group=us processes=2 threads=10
WSGIScriptAlias / /path/to/flaskapp/wsgi/wsgi.py
<Directory /path/to/flaskapp/wsgi/>
WSGIProcessGroup mysite
WSGIApplicationGroup %{GLOBAL}
WSGIScriptReloading On
Require all granted
</Directory>
# --- Configure Static Files ---
Alias /static/ /path/to/flaskapp/htdocs/static/
Alias /tmp/ /path/to/flaskapp/htdocs/tmp/
</VirtualHost>
Does anyone know about Apache settings or mod_wsgi with Flask interactions that could cause the session not to persist between requests?
What happens here is that you store your sessions using Flask-KVSession, and provide a memory based DictStore as a storage:
from simplekv.memory import DictStore
store = DictStore()
KVSessionExtension(store, app)
Root cause
In a single-process environment, this will work. However, when multiple processes come into play, they do not share the same memory, so a separate instance of DictStore is created in each process. As a result, when two consecutive requests are served by two different processes, the first request cannot pass its session changes on to the next one.
Or, even shorter: Two processes = two CSRF tokens. Not good.
Solution
Use a persistent storage. This is what I use:
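import flask_kvsession
import simplekv.db.sql
import simplekv.memory
# config, engine and metadata come from the rest of the application (not shown).
def configure_session(app):
    with app.app_context():
        if config['other']['local_debug']:
            # In-memory store: only suitable for a single local process.
            store = simplekv.memory.DictStore()
        else:
            # Database-backed store: shared by all processes.
            store = simplekv.db.sql.SQLAlchemyStore(engine, metadata, 'sessions')
        # Attach session store
        flask_kvsession.KVSessionExtension(store, app)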
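To make the difference concrete without installing anything, here is a rough stand-in using only the standard library. It does not use simplekv or Flask-KVSession: MemoryStore, SqliteStore and demo are hypothetical names, and two store objects in one script stand in for two daemon processes.

```python
import os
import sqlite3
import tempfile


class MemoryStore(object):
    """Stand-in for an in-memory store: data lives in this object only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SqliteStore(object):
    """Stand-in for a persistent store: data lives in a shared file."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (key TEXT PRIMARY KEY, value TEXT)")
        self._conn.commit()

    def put(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions (key, value) VALUES (?, ?)",
            (key, value))
        self._conn.commit()

    def get(self, key):
        row = self._conn.execute(
            "SELECT value FROM sessions WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def close(self):
        self._conn.close()


def demo():
    # Two independent in-memory stores model two daemon processes.
    proc_a, proc_b = MemoryStore(), MemoryStore()
    proc_a.put("sid", "token-1")
    memory_result = proc_b.get("sid")  # the other "process" sees nothing

    # Two stores backed by the same file model processes sharing a database.
    path = os.path.join(tempfile.mkdtemp(), "sessions.db")
    proc_a, proc_b = SqliteStore(path), SqliteStore(path)
    proc_a.put("sid", "token-1")
    shared_result = proc_b.get("sid")  # the other "process" sees the token
    proc_a.close()
    proc_b.close()
    return memory_result, shared_result
```

demo() returns (None, 'token-1'): the session written through one in-memory store is invisible to the other, while the file-backed store gives both the same token. With processes=2 in WSGIDaemonProcess, the in-memory case is what happens whenever consecutive requests land on different processes.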

mod_wsgi (Daemon mode) is not reloading the source code

I've read through the docs, and it seems clear.
I have 2 multi-threaded mod_wsgi processes. Normally I just touch the wsgi script and the source code is reloaded. But periodically, changes aren't reloaded, and the problem persists for a few hours. I don't understand what happens to cause it to stop reloading changes, nor what caused it to start reloading again when I've had the problem in the past.
I've tried killing the mod_wsgi processes, but it made no difference. I cannot restart apache myself.
What else can I do to try to force a reload?
How can I prevent this from continuing to happen?
Here is the wsgi configuration:
WSGIScriptAlias /ms20 /var/www-dev/wsgi-scripts/ms20.wsgi
WSGIDaemonProcess ms20 user=glpp group=glab processes=2 display-name=%{GROUP}
WSGIProcessGroup ms20
<Directory "/var/www-dev/wsgi-scripts">
Order allow,deny
Allow from all
</Directory>
Did you run the tests in the documentation to validate that requests are actually being handled in the daemon process?
Use the display-name option to WSGIDaemonProcess so you can validate using 'ps' that only the mod_wsgi daemon processes are using a lot of memory and not all the Apache 'httpd' processes. It is possible that your VirtualHost configuration is wrong and your WSGI application is running in embedded mode.
http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess
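For reference, a test along the lines of the one in the mod_wsgi documentation looks roughly like this. It is a sketch from memory rather than a verbatim copy, written for Python 3 (hence the bytes output). It relies on mod_wsgi.process_group being an empty string when a request is handled in embedded mode.

```python
def application(environ, start_response):
    # mod_wsgi sets this key to the daemon process group name, or to an
    # empty string when the request is handled in embedded mode.
    if not environ.get('mod_wsgi.process_group'):
        output = b'EMBEDDED MODE'
    else:
        output = b'DAEMON MODE'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(output)))]
    start_response('200 OK', headers)
    return [output]
```

Temporarily point the WSGIScriptAlias at a script containing this and request the URL. If it reports EMBEDDED MODE, the WSGIProcessGroup directive is not being applied to the application, which would explain why touching the script does not reliably reload the code.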