Apache, LDAP and WSGI encoding issue - apache

I am using Apache 2.4.7 with mod_wsgi 3.4 on Ubuntu 14.04.2 (x86_64) and python 3.4.0. My python app relies on apache to perform user authentication against our company’s LDAP server (MS Active Directory 2008). It also passes some additional LDAP data to the python app using the OS environment. In the apache config, I query the LDAP like so:
…
AuthLDAPURL "ldap://server:389/DC=company,DC=lokal?sAMAccountName,sn,givenName,mail,memberOf?sub?(objectClass=*)"
AuthLDAPBindDN …
AuthLDAPBindPassword …
AuthLDAPRemoteUserAttribute sAMAccountName
AuthLDAPAuthorizePrefix AUTHENTICATE_
…
This passes some user data to my WSGI script where I handle the info as follows:
# Make sure the packages from the virtualenv are found
import site
site.addsitedir('/home/user/.virtualenvs/ispot-cons/lib/python3.4/site-packages')
# Patch path for app (so that libispot can be found)
import sys
sys.path.insert(0, '/var/www/my-app/')
import os
from libispot.web import app as _application
def application(environ, start_response):
    os.environ['REMOTE_USER'] = environ.get('REMOTE_USER', "")
    os.environ['REMOTE_USER_FIRST_NAME'] = environ.get('AUTHENTICATE_GIVENNAME', "")
    os.environ['REMOTE_USER_LAST_NAME'] = environ.get('AUTHENTICATE_SN', "")
    os.environ['REMOTE_USER_EMAIL'] = environ.get('AUTHENTICATE_MAIL', "")
    os.environ['REMOTE_USER_GROUPS'] = environ.get('AUTHENTICATE_MEMBEROF', "")
    return _application(environ, start_response)
I can then access this info in my python app using os.environ.get(…). (BTW: If you have a more elegant solution, please let me know!)
The problem is that some of the user names contain special characters (German umlauts, e.g., äöüÄÖÜ) that are not encoded correctly. So, for example, the name Tölle arrives in my python app as TÃ¶lle.
Obviously, this is an encoding problem, because
$ echo "TÃ¶lle" | iconv --from utf-8 --to latin1
gives me the correct Tölle.
Another observation that might help: in my apache logs I found the character ü represented as \xc3\x83\xc2\xbc.
I told my Apache in /etc/apache2/envvars to use LANG=de_DE.UTF-8 and python 3 is utf-8 aware as well. I can’t seem to specify anything about my LDAP server. So my question is: where is the encoding getting mixed up and how do I mend it?

It is bad practice to copy the values to os.environ on each request, as this will fail miserably if the WSGI server is running with a multithreaded configuration, with concurrent requests interfering with each other. Look at thread locals instead.
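A minimal sketch of the thread-local idea, assuming the same AUTHENTICATE_* keys as in the question. The names _request_state, current_user and wrap_with_user are made up for illustration; they are not part of mod_wsgi or of the asker's libispot package:

```python
import threading

# One independent attribute namespace per thread (hypothetical name).
_request_state = threading.local()


def current_user():
    """Return the user data stored for the request handled by this thread."""
    return getattr(_request_state, "user", {})


def wrap_with_user(inner_app):
    """Wrap a WSGI app so the LDAP data is kept per thread, not in os.environ."""
    def application(environ, start_response):
        _request_state.user = {
            "name": environ.get("REMOTE_USER", ""),
            "first_name": environ.get("AUTHENTICATE_GIVENNAME", ""),
            "last_name": environ.get("AUTHENTICATE_SN", ""),
            "email": environ.get("AUTHENTICATE_MAIL", ""),
            "groups": environ.get("AUTHENTICATE_MEMBEROF", ""),
        }
        return inner_app(environ, start_response)
    return application
```

In the WSGI script this would stand in for the os.environ assignments, e.g. application = wrap_with_user(_application), with the app calling current_user() where it used to read os.environ. If the web framework exposes the WSGI environ to request handlers, reading the values from there directly avoids shared state altogether.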
As to the issue of encoded data from LDAP, if I understand the problem, you would need to do:
"TÃ¶lle".encode('latin-1').decode('utf-8')

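A short sketch of the encode/decode round trip, assuming the value was UTF-8 that got decoded as Latin-1 somewhere on the way (which matches the \xc3\x83\xc2\xbc seen in the Apache logs, and the WSGI convention of handing byte values to Python 3 as Latin-1-decoded strings). The helper name repair_mojibake is hypothetical:

```python
def repair_mojibake(value):
    """Undo a UTF-8-read-as-Latin-1 mix-up; return value unchanged if it does not apply."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already correct text (or a different problem): leave it alone.
        return value


# How the sequence from the Apache log arises: the UTF-8 bytes of "ü" are
# misread as Latin-1 and then encoded as UTF-8 a second time.
double_encoded = "ü".encode("utf-8").decode("latin-1").encode("utf-8")
```

The fallback matters because a name that is already correct, such as "Tölle", cannot be decoded as UTF-8 after the Latin-1 step and would otherwise raise an exception.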
Related

apache + mod_perl + couchbase = occasional connection problems

We use couchbase as session storage for mod_perl scripts. To avoid delays on clients caused by waiting for a new connection, we preconnect to couchbase at the apache child_init stage. So during an apache restart or new child creation, each child connects to couchbase automatically and later uses that connection for the lifetime of the apache child.
Generally everything works fine, but sometimes we get the following errors during that preconnection:
Couldn't connect: 0x13 (Operation not supported) at /perl/lib64/perl5/Couchbase/Bucket.pm line 38.
Usually it appears during an apache restart and on several (or dozens of) children, and almost never on one child only. Usually restarting apache again solves the problem.
What can cause such problems? Is it a problem with the code, the server configuration, or the couchbase server itself?
Maybe it is caused by a lot of reconnections at the same time? Some ulimit settings, or selinux restrictions?
UPD: versions
OS:
Centos 6, 2.6.32-358.2.1.el6.x86_64
libcouchbase:
libcouchbase-devel.x86_64 2.4.7-1.el6
libcouchbase2-core.x86_64 2.4.7-1.el6
libcouchbase2-libevent.x86_64 2.4.7-1.el6
couchbase server:
2.2.0 community edition (build-837)
SDK:
perl (Couchbase::Core v2.0.2)
connection code (isolated & simplified):
# in mod_perl environment
use Couchbase;
use Couchbase::Bucket;
use Couchbase::Document;
use Apache2::ServerUtil ();
my $cb = undef;
# connection handler, initialized once, used during apache child lifetime
sub connect_couchbase_on_child_init {
    my ($child_pool, $s) = @_;
    my $dsn = 'couchbase://192.168.0.1,192.168.0.2/my_bucket_name?detailed_errcodes=1';
    eval { $cb = Couchbase::Bucket->new($dsn); };
    # here we get the occasional warnings during apache restarts
    if ($@) { warn "COUCHBASE CONNECTION ERROR! $@"; $cb = undef; }
    return Apache2::Const::OK;
}
Apache2::ServerUtil->server->push_handlers(PerlChildInitHandler => \&connect_couchbase_on_child_init);
# in request handlers it used with the following calls (only if connected):
# $doc = Couchbase::Document->new($key);
# $cb->get($doc);
# ...
# $cb->replace($doc);
# ...
# $cb->insert($doc);
# ...
# $cb->remove($doc);
Because you are using server 2.2.0 and because this seems to happen when you are connecting many clients at once, my theory is that you are receiving the last error from the server. The current client bootstrap process attempts using bootstrap over memcached (which is only supported from version >= 2.5.0 of the server), that fails and it attempts to use 'terse' bootstrapping (again, only supported on >= 2.5.0 of the server) and finally 'classic' HTTP (which is available on all versions).
Add the following options to your DSN/connection string to cut out some of the steps for your server. Note that should you ever upgrade to >= 2.5 these options should be removed:
bootstrap_on=http Does not try memcached bootstrap
http_urlmode=2 Uses the pre-2.5 style of bootstrapping by default
These two options will not necessarily fix your issue, but they will at least cut out some of the initial connection time, and perhaps show a clearer reason for the error (you can also set LCB_LOGLEVEL=5 in the environment to get actual logging).
In your case, the connection string would be:
couchbase://192.168.0.1,192.168.0.2/my_bucket_name?detailed_errcodes=1&bootstrap_on=http&http_urlmode=2
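If the connection string is assembled in code, the two options can be appended programmatically so they are easy to drop again after an upgrade to >= 2.5. This is only a sketch using Python's standard library for illustration (the asker's code is Perl); add_dsn_options is a hypothetical helper, not part of any Couchbase SDK:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_dsn_options(dsn, **options):
    """Return dsn with the given query options added or overridden."""
    parts = urlsplit(dsn)
    query = dict(parse_qsl(parts.query))
    query.update({key: str(value) for key, value in options.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Options suggested above for servers older than 2.5.
legacy_dsn = add_dsn_options(
    "couchbase://192.168.0.1,192.168.0.2/my_bucket_name?detailed_errcodes=1",
    bootstrap_on="http",
    http_urlmode=2,
)
```

Here legacy_dsn ends up identical to the connection string shown above.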

Install graphite with apache 2.4 on ubuntu 14 error

mod_wsgi Exception occurred processing WSGI script '/usr/share/graphite-web/graphite.wsgi'
I copied only apache-graphite.conf to /etc/apache/sites-available, why does it complain about graphite.wsgi?
Content of apache-graphite.conf:
import os, sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'graphite.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
from graphite.logger import log
log.info("graphite.wsgi - pid %d - reloading search index" % os.getpid())
import graphite.metrics.search
graphite.wsgi is the WSGI application called by your apache webserver to answer incoming requests.
The apache-graphite.conf site defines a WSGI application running django, which will process requests using graphite code. What you pasted is the content of graphite.wsgi; I guess apache-graphite.conf looks more like this: https://github.com/graphite-project/graphite-web/blob/0.9.x/examples/example-graphite-vhost.conf
graphite.wsgi usually looks like: https://github.com/graphite-project/graphite-web/blob/0.9.x/conf/graphite.wsgi.example

How to prevent Gunicorn from returning a 'Server' http header?

I would like to mask the version or remove the header altogether.
To change the 'Server:' http header, in your conf.py file:
import gunicorn
gunicorn.SERVER_SOFTWARE = 'Microsoft-IIS/6.0'
And use an invocation along the lines of gunicorn -c conf.py wsgi:app
To remove the header altogether, you can monkey-patch gunicorn by replacing its http response class with a subclass that filters out the header. This might be harmless, but is probably not recommended. Put the following in conf.py:
from gunicorn.http import wsgi
class Response(wsgi.Response):
    def default_headers(self, *args, **kwargs):
        headers = super(Response, self).default_headers(*args, **kwargs)
        return [h for h in headers if not h.startswith('Server:')]
wsgi.Response = Response
Tested with gunicorn 18
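To see what that filter does without a running server, here is a standalone sketch. The sample header lines are invented for illustration and only approximate the list of raw header strings that gunicorn's default_headers returns; strip_server_header is a hypothetical name:

```python
def strip_server_header(header_lines):
    """Drop any 'Server:' line from a list of raw 'Name: value' header strings."""
    return [line for line in header_lines if not line.startswith("Server:")]


# Invented sample data, not captured from a real gunicorn response.
sample = [
    "Server: gunicorn/18.0\r\n",
    "Date: Thu, 01 Jan 1970 00:00:00 GMT\r\n",
    "Connection: close\r\n",
]
filtered = strip_server_header(sample)
```

Every other header line passes through untouched, which is why the subclass only needs to override default_headers.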
This hasn't been clearly written here, so I'm going to confirm that the easiest way for the latest version of Gunicorn (20.1.x) is to add the following lines to the configuration file:
import gunicorn
gunicorn.SERVER = 'undisclosed'
For newer releases (20.0.4): Create a gunicorn.conf.py file with the content below in the directory from where you will run the gunicorn command:
import gunicorn
gunicorn.SERVER_SOFTWARE = 'My WebServer'
It's better to change it to something unique than remove it. You don't want to risk, e.g., spiders thinking you're noncompliant. Changing it to the name of software you aren't using can cause similar problems. Making it unique will prevent the same kind of assumptions ever being made. I recommend something like this:
import gunicorn
gunicorn.SERVER_SOFTWARE = 'intentionally-undisclosed-gensym384763'
You can edit __init__.py to set SERVER_SOFTWARE to whatever you want. But I'd really like the ability to disable this with a flag so I didn't need to reapply the patch when I upgrade.
My monkey-patch-free solution involves wrapping the default_headers method:
import gunicorn.http.wsgi
from six import wraps
def wrap_default_headers(func):
    @wraps(func)
    def default_headers(*args, **kwargs):
        return [header for header in func(*args, **kwargs) if not header.startswith('Server: ')]
    return default_headers
gunicorn.http.wsgi.Response.default_headers = wrap_default_headers(gunicorn.http.wsgi.Response.default_headers)
This doesn't directly answer the question, but it could address the issue as well, and without monkey patching gunicorn.
If you are using gunicorn behind a reverse proxy, as is usually the case, you can set, add, remove or replace a response header coming downstream from the backend. In our case, that is the Server header.
I guess every web server should have an equivalent feature.
For example, in Caddy 2 (currently in beta) it would be something as simple as:
https://localhost {
    reverse_proxy unix//tmp/foo.sock {
        header_down Server intentionally-undisclosed-12345678
    }
}
For completeness, here is a minimal (but fully working) Caddyfile that also handles Server header modification in a manual http->https redirect (Caddy 2 does the redirect automatically if you don't override it), which can be a bit tricky to figure out correctly.
http://localhost {
    # Fact: the `header` directive has less priority than `redir` (which means
    # it's evaluated later), so the header wouldn't be changed (and Caddy would
    # be shown instead of the faked value).
    #
    # To override the directive ordering only for this server, instead of
    # changing the "order" option globally, put the configuration inside a
    # route directive.
    # ref.
    # https://caddyserver.com/docs/caddyfile/options
    # https://caddyserver.com/docs/caddyfile/directives/route
    # https://caddyserver.com/docs/caddyfile/directives#directive-order
    route {
        header Server intentionally-undisclosed-12345678
        redir https://{host}{uri}
    }
}
https://localhost {
    reverse_proxy unix//tmp/foo.sock {
        header_down Server intentionally-undisclosed-12345678
    }
}
To check that it works, just use curl: curl --insecure -I http://localhost and curl --insecure -I https://localhost (--insecure because localhost certs are automatically generated as self-signed).
It's so simple to set up that you could also consider using it in development (with gunicorn --reload), especially if it resembles your staging/production environment.

TortoiseSVN Can't Authenticate

After my previous problem, TortoiseSVN Can't Connect was resolved, I ran into a new problem.
On the linux server hosting my svn repository, in the repository's directory, there is a conf/svnserve.conf file. In this file, I have the option:
anon-access = none | read | write
Initially, this line was commented out and the default value must have been read.
Of course, I want to set anon-access = none, and I want auth-access = write (which is the default).
But when I set anon-access = none, when I try to browse with TortoiseSVN Repository Browser
using url svn://host:port/repositoryname, I get the error:
Unable to connect to a repository at URL
'svn://host:port/repositoryname' No access allowed to this repository
I'd like to successfully authenticate without ssh if possible, because I gather ssh has more moving parts and might be a little slower.
The server is CloudLinux Server release 5.8
The svn server information follows. I have only tried svn protocol so far.
svn, version 1.6.17 (r1128011) compiled Jul 26 2012, 03:59:19
Copyright (C) 2000-2009 CollabNet. Subversion is open source software,
see http://subversion.apache.org/ This product includes software
developed by CollabNet (http://www.Collab.Net/).
The following repository access (RA) modules are available:
ra_neon : Module for accessing a repository via WebDAV protocol using Neon.
handles 'http' scheme
ra_svn : Module for accessing a repository using the svn network protocol.
with Cyrus SASL authentication
handles 'svn' scheme
ra_local : Module for accessing a repository on local disk.
handles 'file' scheme
ra_serf : Module for accessing a repository via WebDAV protocol using serf.
handles 'http' scheme
handles 'https' scheme
I hope this is a good question because this is kind of the "out of the box" behavior connecting to svn with windows, which might be pretty common when someone adds svn to a shared hosting account.
Thank you!
Set these lines in your svnserve.conf file:
19 anon-access = none
20 auth-access = write
[...]
27 password-db = passwd
[...]
39 realm = Name-of-your-repository
46 force-username-case = lower
The line numbers are approximate.
The realm is conventionally the name of your repository, but it can be anything. The password-db setting names the file that lists who is authorized to use the repository. By default, that line is commented out.
Next, you'll edit the passwd file that's in the same directory. The format is very simple:
<userName> = <password>
There are two commented-out entries that show you how it's done.

Unexpected Connection Reset: A PHP or an Apache issue?

I have a PHP script that keeps stopping at the same place every time and my browser reports:
The connection to the server was reset
while the page was loading.
I have tested this on Firefox and IE, same thing happens. So, I am guessing this is an Apache/PHP config problem. Here are few things I have set.
PHP.ini
max_execution_time = 300000
max_input_time = 300000
memory_limit = 256M
Apache (httpd.conf)
Timeout 300000
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 0
Are the above correct? What can be causing this and what can I set?
I am running PHP (5.2.12.12) as a module on Apache (2.2) on Windows Server 2003.
It is very likely this is an Apache or PHP issue as all browsers do the same thing. I think the script runs for exactly 10 mins (600 seconds).
I had a similar issue - turns out apache2 was segfaulting. Cause of the segfault was php5-xdebug for 5.3.2-1ubuntu4.14 on Ubuntu 10.04 LTS. Removing xdebug fixed the problem.
I also had this problem today, it turned out to be a stray break; statement in the PHP code (outside of any switch or any loop), in a function with a try...catch...finally block.
Looks like PHP crashes in this situation:
<?php
function a ()
{
    break;
    try
    {
    }
    catch (Exception $e)
    {
    }
    finally
    {
    }
}
This was with PHP version 5.5.5.
Differences between 2 PHP configs were indeed the root cause of the issue on my end. My app is based on the NuSOAP library.
On config 1 with PHP 5.2, it was running fine as PHP's SOAP extension was off.
On config 2 with PHP 5.3, it was giving "Connection Reset" errors as PHP's SOAP extension was on.
Switching the extension off allowed me to get my app running on PHP 5.3 without having to rewrite everything.
I had an issue where in certain cases PHP 5.4 + eAccelerator = connection reset. There was no error output in any log files, and it only happened on certain URLs, which made it difficult to diagnose. Turns out it only happened for certain PHP code / certain PHP files, and was due to some incompatibilities with specific PHP code and eAccelerator. Easiest solution was to disable eAccelerator for that specific site, by adding the following to .htaccess file
php_flag eaccelerator.enable 0
php_flag eaccelerator.optimizer 0
(or equivalent lines in php.ini):
eaccelerator.enable="0"
eaccelerator.optimizer="0"
It's an old post, I know, but since I couldn't find the solution to my problem anywhere and I've fixed it, I'll share my experience.
The main cause of my problem was a file_exists() function call.
The file actually existed, but for some reason an extra forward slash on the file location ("//") that normally works on a regular browser, seems not to work in PHP. Maybe your problem is related to something similar. Hope this helps someone!
I'd try setting all of the error reporting options
-b on error batch abort
-V severitylevel
-m error_level
and sending all the output to the client
<?php
echo "<div>starting sql batch</div>\n<pre>"; flush();
passthru('sqlcmd -b -m -1 -V 11 -l 3 -E -S TYHSY-01 -d newtest201 -i "E:\PHP_N\M_Create_Log_SP.sql"');
echo '</pre>done.'; flush();
My PHP was segfaulting without any additional information as to the cause of it as well. It turned out to be two classes calling each other's magic __call() method because both of them didn't have the method being called. PHP just loops until it's out of memory. But it didn't report the usual "Allowed memory size of * bytes exhausted" message, probably because the methods are "magic".
I thought I would add my own experience as well.
I was getting the same error message, which in my case was caused by a PHP error in an exception.
The culprit was a custom exception class that did some logging internally, and a fatal error occurred in that logging mechanism. This caused the exception to not be triggered as expected, and no meaningful message to be displayed either.