apache + mod_perl + couchbase = occasional connection problems

We use Couchbase as session storage for mod_perl scripts. To avoid client-facing delays caused by waiting for a new connection, we preconnect to Couchbase during Apache's child_init stage. So whenever Apache restarts or creates a new child, the child connects to Couchbase automatically and reuses that connection for the rest of its lifetime.
Generally everything works fine, but sometimes we get the following error during that preconnection:
Couldn't connect: 0x13 (Operation not supported) at /perl/lib64/perl5/Couchbase/Bucket.pm line 38.
It usually appears during an Apache restart, and on several (or dozens of) children, almost never on just one child. Restarting Apache again usually solves the problem.
What can cause such problems? Is it a problem with the code, the server configuration, or the Couchbase server itself?
Could it be caused by a lot of reconnections happening at the same time? Some ulimit setting, or SELinux restrictions?
UPD: versions
OS:
Centos 6, 2.6.32-358.2.1.el6.x86_64
libcouchbase:
libcouchbase-devel.x86_64 2.4.7-1.el6
libcouchbase2-core.x86_64 2.4.7-1.el6
libcouchbase2-libevent.x86_64 2.4.7-1.el6
couchbase server:
2.2.0 community edition (build-837)
SDK:
perl (Couchbase::Core v2.0.2)
connection code (isolated & simplified):
# in mod_perl environment
use Couchbase;
use Couchbase::Bucket;
use Couchbase::Document;
use Apache2::ServerUtil ();
use Apache2::Const -compile => qw(OK);

my $cb = undef;

# connection handler, initialized once, used during apache child lifetime
sub connect_couchbase_on_child_init {
    my ($child_pool, $s) = @_;
    my $dsn = 'couchbase://192.168.0.1,192.168.0.2/my_bucket_name?detailed_errcodes=1';
    eval { $cb = Couchbase::Bucket->new($dsn); };
    # here we get the occasional warnings during apache restarts
    if ($@) { warn "COUCHBASE CONNECTION ERROR! $@"; $cb = undef; }
    return Apache2::Const::OK;
}
Apache2::ServerUtil->server->push_handlers(PerlChildInitHandler => \&connect_couchbase_on_child_init);

# in request handlers it is used with the following calls (only if connected):
# $doc = Couchbase::Document->new($key);
# $cb->get($doc);
# ...
# $cb->replace($doc);
# ...
# $cb->insert($doc);
# ...
# $cb->remove($doc);

Because you are using server 2.2.0, and because this seems to happen when you are connecting many clients at once, my theory is that you are receiving the last error from the server. The client bootstrap process first attempts bootstrap over memcached (which is only supported by server versions >= 2.5.0); when that fails it attempts 'terse' bootstrapping (again, only supported on server >= 2.5.0), and finally falls back to 'classic' HTTP (which is available on all versions).
Add the following options to your DSN/connection string to cut out some of those steps for your server. Note that should you ever upgrade to >= 2.5, these options should be removed:
bootstrap_on=http: does not try memcached bootstrap
http_urlmode=2: uses the pre-2.5 style of bootstrapping by default
These two options will not necessarily fix your issue, but they will at least cut out some of the initial connection time, and perhaps show a clearer reason for the error (you can also set LCB_LOGLEVEL=5 in the environment to get actual logging).
In your case, the connection string would be:
couchbase://192.168.0.1,192.168.0.2/my_bucket_name?detailed_errcodes=1&bootstrap_on=http&http_urlmode=2

Related

302 Redirect from CGI script stopped working in Apache 2.4

I inherited maintenance of a self-written CGI application without
documentation and have never met the original author. The application
stopped working in Debian 8, but worked in Debian 7 and CentOS 5. The
main changes were the upgrade from Apache 2.2 (used by Debian 7 and
CentOS 5) to Apache 2.4 (used by Debian 8) and the upgrade from Perl
5.8 (in CentOS 5) or Perl 5.14 (in Debian 7) to Perl 5.20.
The problematic part boils down to the following script (a
302-redirect):
#!/usr/bin/perl
$| = 1; # activate auto-flushing of stdout
use strict;
use warnings;
my $CRLF = "\015\012";
print STDOUT "Status: 302 Moved Temporarily$CRLF" .
             "Location: /does_not_matter$CRLF" .
             "URI: /does_not_matter$CRLF" .
             "Connection: close$CRLF" .
             "Content-type: text/html; charset=UTF-8$CRLF$CRLF";
close STDOUT;
while (1) {
    sleep 1;
}
The observed behavior is that the redirect never reaches the client
as long as the script is still running when used with Apache 2.4, but
there is no error message in Apache's error.log. I changed the client
(Firefox, Chromium, wget), the Apache module (mod_cgid & mod_cgi),
sent additional headers, removed the close STDOUT, removed the
$|=1, replaced the $CRLF with \n and made the script
fork and exit the parent process (so that Apache is no longer the
parent process), all to no avail. The only things that worked: use
Apache 2.2, turn the script into an NPH-CGI-script (which has to send
complete HTTP headers that Apache will not modify in any way, even if
they contain errors), make the script exit instead of entering an
endless loop. I confirmed via tcpdump that the packets with the
redirect indeed never leave the server before the script is killed.
And from the Date-line in the response and the time of the eventual
arrival I gather that Apache receives my output immediately (&
immediately adds the Date-line to the headers), but does not send the
response to the client.
Don't bother answering, I already figured the solution out by myself
and will write an answer. I just want to make the solution available
to others who might encounter the same problem.
The problem was the missing message body. A 302 redirect may contain
a message body (see RFC 2616, section 4.3 (Message Body): "All other
responses do include a message-body, although it MAY be of zero
length"), but a Content-Length-line is optional (section 4.4 of RFC
2616 says the message body length can be determined by closing the
connection if the Content-Length-line is missing). Since Apache cannot
know whether I want to send a message body or not, it has to wait
until I close the connection or actually send the message body
(Apache 2.2 apparently behaved erroneously here by not waiting for
the message body - or maybe close STDOUT; does not do in perl
5.20 what it did in older perl versions). The correct script should
therefore look like this (verified to work both in Apache 2.2 and in
Apache 2.4) - the only difference is an additional $CRLF which
terminates a zero-length message body:
#!/usr/bin/perl
$| = 1; # activate auto-flushing of stdout
use strict;
use warnings;
my $CRLF = "\015\012";
print STDOUT "Status: 302 Moved Temporarily$CRLF" .
             "Location: /doesNotMatter$CRLF" .
             "URI: /doesNotMatter$CRLF" .
             "Connection: close$CRLF" .
             "Content-type: text/html; charset=UTF-8$CRLF$CRLF$CRLF";
close STDOUT;
while (1) {
    sleep 1;
}
It was https://stackoverflow.com/a/8062277/2845840 that pointed me in the right direction.

CouchDB 1.6.1 SSL error on Raspbian

I am trying to set up SSL in CouchDB 1.6.1 on Raspbian and I get this:
** {badarg,[{ets,select_delete,
[undefined,[{{{undefined,'_','_'},'_'},[],[true]}]],
[]},
{ets,match_delete,2,[{file,"ets.erl"},{line,655}]},
{ssl_pkix_db,remove_certs,2,[{file,"ssl_pkix_db.erl"},{line,221}]},
{ssl_connection,terminate,3,
[{file,"ssl_connection.erl"},{line,934}]},
{tls_connection,terminate,3,
[{file,"tls_connection.erl"},{line,326}]},
{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,595}]},
{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,517}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
My version of erlang is Erlang/OTP 17 [erts-6.2].
My local.ini file contains:
[daemons]
httpsd = {couch_httpd, start_link, [https]}
[ssl]
port = 6984
cert_file = /etc/couchdb/cert/couchdb.crt
key_file = /etc/couchdb/cert/couchdb.key
HTTP works fine. Any idea?
Cheers.
This looks like the crash fixed by this pull request, merged into Erlang/OTP 18.2.
To the best of my knowledge, the crash itself is harmless: it occurs during connection termination, because the code is trying to clean up something that was never set up. However, it might draw your attention away from other errors happening before it, such as an incorrect path to the key / certificate files.

Apache, LDAP and WSGI encoding issue

I am using Apache 2.4.7 with mod_wsgi 3.4 on Ubuntu 14.04.2 (x86_64) and python 3.4.0. My python app relies on apache to perform user authentication against our company’s LDAP server (MS Active Directory 2008). It also passes some additional LDAP data to the python app using the OS environment. In the apache config, I query the LDAP like so:
…
AuthLDAPURL "ldap://server:389/DC=company,DC=lokal?sAMAccountName,sn,givenName,mail,memberOf?sub?(objectClass=*)"
AuthLDAPBindDN …
AuthLDAPBindPassword …
AuthLDAPRemoteUserAttribute sAMAccountName
AuthLDAPAuthorizePrefix AUTHENTICATE_
…
This passes some user data to my WSGI script where I handle the info as follows:
# Make sure the packages from the virtualenv are found
import site
site.addsitedir('/home/user/.virtualenvs/ispot-cons/lib/python3.4/site-packages')

# Patch path for app (so that libispot can be found)
import sys
sys.path.insert(0, '/var/www/my-app/')

import os
from libispot.web import app as _application

def application(environ, start_response):
    os.environ['REMOTE_USER'] = environ.get('REMOTE_USER', "")
    os.environ['REMOTE_USER_FIRST_NAME'] = environ.get('AUTHENTICATE_GIVENNAME', "")
    os.environ['REMOTE_USER_LAST_NAME'] = environ.get('AUTHENTICATE_SN', "")
    os.environ['REMOTE_USER_EMAIL'] = environ.get('AUTHENTICATE_MAIL', "")
    os.environ['REMOTE_USER_GROUPS'] = environ.get('AUTHENTICATE_MEMBEROF', "")
    return _application(environ, start_response)
I can then access this info in my python app using os.environ.get(…). (BTW: If you have a more elegant solution, please let me know!)
The problem is that some of the user names contain special characters (German umlauts, e.g., äöüÄÖÜ) that are not encoded correctly. So, for example, the name Tölle arrives in my python app as TÃ¶lle.
Obviously, this is an encoding problem, because
$ echo "Tölle" | iconv --from utf-8 --to latin1
gives me the correct Tölle.
Another observation that might help: in my apache logs I found the character ü represented as \xc3\x83\xc2\xbc.
I told my Apache in /etc/apache2/envvars to use LANG=de_DE.UTF-8, and python 3 is utf-8 aware as well. I can’t seem to specify anything about my LDAP server. So my question is: where is the encoding getting mixed up, and how do I mend it?
It is bad practice to copy the values to os.environ on each request, as this will fail miserably if the WSGI server is running with a multithreaded configuration, with concurrent requests interfering with each other. Look at thread locals instead; see the sketch below.
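A minimal sketch of that approach, assuming the same wrapper module as in the question (the request_data name and its attributes are illustrative, not a fixed API):
import threading
from libispot.web import app as _application

# One object per process; each worker thread sees its own attributes.
request_data = threading.local()

def application(environ, start_response):
    # Store per-request values on the thread-local object instead of the
    # process-wide os.environ, so concurrent requests cannot clobber each other.
    request_data.user = environ.get('REMOTE_USER', "")
    request_data.first_name = environ.get('AUTHENTICATE_GIVENNAME', "")
    request_data.last_name = environ.get('AUTHENTICATE_SN', "")
    request_data.email = environ.get('AUTHENTICATE_MAIL', "")
    request_data.groups = environ.get('AUTHENTICATE_MEMBEROF', "")
    return _application(environ, start_response)
The rest of the app would then import request_data from this module and read, e.g., request_data.user instead of os.environ.get('REMOTE_USER').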
As to the issue of encoded data from LDAP, if I understand the problem correctly, you would need to do:
"TÃ¶lle".encode('latin-1').decode('utf-8')

How do I use Nagios to monitor a log file

We are using Nagios to monitor our network with great success. However, we have a syslog for critical application errors, and while I set up check_log, it doesn't seem to work as well as monitoring a device.
The issues are:
It only shows the last entry
There doesn't seem to be a way to acknowledge the critical error and return the monitor to a good state
Is Nagios the wrong tool, or are we just not setting up the service monitoring right?
Here are my entries:
# log file
define command{
    command_name check_log
    command_line $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}
# Define the log monitoring service
define service{
    name                  logfile-check ;
    use                   generic-service ;
    check_period          24x7 ;
    max_check_attempts    1 ;
    normal_check_interval 5 ;
    retry_check_interval  1 ;
    contact_groups        admins ;
    notification_options  w,u,c,r ;
    notification_period   24x7 ;
    register              0 ;
}
define service{
    use                 logfile-check
    host_name           localhost
    service_description CritLogFile
    check_command       check_log
}
For monitoring logs with Nagios, the log checker will typically return a warning only for newly discovered error messages each time it is invoked (so it must retain some state in order to know to ignore them on subsequent runs). Therefore I usually set:
max_check_attempts 1
is_volatile 1
This causes Nagios to send out the alert immediately, but only once, and then go back to normal.
My favorite log checker is logwarn, but I'm biased because I wrote it myself after not finding any existing ones that I liked. The logwarn package includes a Nagios plugin.
Nothing in your config jumps out at me as being misconfigured.
By design, check_log will only show either an OK message, or the last log entry that triggered an alert. If you need to see multiple entries, you'll need to modify the plugin.
However, I find the fact that you're not getting recoveries somewhat odd. The way check_log works (by comparing the current log to the previous version), you should get a recovery on the very next service check. Except of course, when there have been additional matching entries added to the log since the last check.
Does forcing another service check (or several) cause it to recover?
Also, I don't intend this in a mean way, but make sure it's really malfunctioning.
Is your log getting additional matching entries in between checks, causing it not to recover? Your check is matching "?" which will match anything new in the log. Is something else (a non-error) being added to the log and inadvertently causing a match?
If none of the above are the issue, I would suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user as nagios), and with a different oldlog. It should go something like this -
run check with a new "oldlog" - get initialization message
run check - check OK
make change to log
run check - check fails
run check - check OK
If this doesn't work, then you know to focus on the log, the oldlog, and how the check_log is doing the check.
If it works, then it points more towards a problem with your nagios configuration.
There is a Nagios plugin that you can use to check the log files: it's called check_logfiles and it's used to scan the lines of a file for regular expressions.
The following link shows how to install and configure check_logfiles for Nagios and Opsview:
https://www.opsview.com/resources/nagios-alternative/blog/syslog-monitoring-nagios-opsview
As there are many ways to achieve a goal, there is also a nice plugin from Consol available:
https://labs.consol.de/lang/en/nagios/check_logfiles/
supports regex
supports log rotation
To use it, you need a cfg file. This is an example for Oracle databases:
@searches = ({
    tag => 'oraalerts',
    options => 'sticky=28800',
    logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
    criticalpatterns => [
        'ORA\-0*204[^\d]',         # error in reading control file
        'ORA\-0*206[^\d]',         # error in writing control file
        'ORA\-0*210[^\d]',         # cannot open control file
        'ORA\-0*257[^\d]',         # archiver is stuck
        'ORA\-0*333[^\d]',         # redo log read error
        'ORA\-0*345[^\d]',         # redo log write error
        'ORA\-0*4[4-7][0-9][^\d]', # ORA-0440 - ORA-0485 background process failure
        'ORA\-0*48[0-5][^\d]',
        'ORA\-0*6[0-3][0-9][^\d]', # ORA-0600 - ORA-0639 internal errors
        'ORA\-0*1114[^\d]',        # datafile I/O write error
        'ORA\-0*1115[^\d]',        # datafile I/O read error
        'ORA\-0*1116[^\d]',        # cannot open datafile
        'ORA\-0*1118[^\d]',        # cannot add a data file
        'ORA\-0*1122[^\d]',        # database file 16 failed verification check
        'ORA\-0*1171[^\d]',        # datafile 16 going offline due to error advancing checkpoint
        'ORA\-0*1201[^\d]',        # file 16 header failed to write correctly
        'ORA\-0*1208[^\d]',        # data file is an old version - not accessing current version
        'ORA\-0*1578[^\d]',        # data block corruption
        'ORA\-0*1135[^\d]',        # file accessed for query is offline
        'ORA\-0*1547[^\d]',        # tablespace is full
        'ORA\-0*1555[^\d]',        # snapshot too old
        'ORA\-0*1562[^\d]',        # failed to extend rollback segment
        'ORA\-0*162[89][^\d]',     # ORA-1628 - ORA-1632 maximum extents exceeded
        'ORA\-0*163[0-2][^\d]',
        'ORA\-0*165[0-6][^\d]',    # ORA-1650 - ORA-1656 tablespace is full
        'ORA\-16014[^\d]',         # log cannot be archived, no available destinations
        'ORA\-16038[^\d]',         # log cannot be archived
        'ORA\-19502[^\d]',         # write error on datafile
        'ORA\-27063[^\d]',         # number of bytes read/written is incorrect
        'ORA\-0*4031[^\d]',        # out of shared memory
        'No space left on device',
        'Archival Error',
    ],
    warningpatterns => [
        'ORA\-0*3113[^\d]',        # end of file on communication channel
        'ORA\-0*6501[^\d]',        # PL/SQL internal error
        'ORA\-0*1140[^\d]',        # follows WARNING: datafile #20 was not in online backup mode
        'Archival stopped, error occurred. Will continue retrying',
    ]
});
I believe there's now a real Nagios plugin that monitors logs effectively.
http://support.nagios.com/forum/viewtopic.php?f=6&t=8851&p=42088&hilit=unixautomation#p42088
The home page of the Nagios plugin on that page is Nagios Log Monitor
Your commands.cfg file will contain:
define command {
    command_name NagiosLogMonitor
    command_line $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
OR
define command {
    command_name NagiosLogMonitor
    command_line $USER1$/NagiosLogMonitor $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
Your services.cfg file will look similar to:
define service {
    check_command NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
    max_check_attempts 1
    service_description 500_ERRORS_LOGCHECK
    host_name sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
    use fifteen-minute-interval
}
Nagios now has a solution that integrates tightly with Nagios Core, XI, etc.: Nagios Log Server, which can alert on any query on any log file on any system in your infrastructure.

Unexpected Connection Reset: A PHP or an Apache issue?

I have a PHP script that keeps stopping at the same place every time and my browser reports:
The connection to the server was reset
while the page was loading.
I have tested this on Firefox and IE, and the same thing happens. So, I am guessing this is an Apache/PHP config problem. Here are a few things I have set.
PHP.ini
max_execution_time = 300000
max_input_time = 300000
memory_limit = 256M
Apache (httpd.conf)
Timeout 300000
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 0
Are the above correct? What can be causing this and what can I set?
I am running PHP (5.2.12.12) as a
module on Apache (2.2) on a Windows
Server 2003.
It is very likely this is an Apache or PHP issue as all browsers do the same thing. I think the script runs for exactly 10 mins (600 seconds).
I had a similar issue - it turned out apache2 was segfaulting. The cause of the segfault was php5-xdebug for 5.3.2-1ubuntu4.14 on Ubuntu 10.04 LTS. Removing xdebug fixed the problem.
I also had this problem today, it turned out to be a stray break; statement in the PHP code (outside of any switch or any loop), in a function with a try...catch...finally block.
Looks like PHP crashes in this situation:
<?php
function a()
{
    break;
    try
    {
    }
    catch (Exception $e)
    {
    }
    finally
    {
    }
}
This was with PHP version 5.5.5.
Differences between 2 PHP configs were indeed the root cause of the issue on my end. My app is based on the NuSOAP library.
On config 1 with PHP 5.2, it was running fine as PHP's SOAP extension was off.
On config 2 with PHP 5.3, it was giving "Connection Reset" errors as PHP's SOAP extension was on.
Switching the extension off allowed me to get my app running on PHP 5.3 without having to rewrite everything.
I had an issue where in certain cases PHP 5.4 + eAccelerator = connection reset. There was no error output in any log files, and it only happened on certain URLs, which made it difficult to diagnose. It turned out to happen only for certain PHP code / certain PHP files, due to some incompatibilities between specific PHP code and eAccelerator. The easiest solution was to disable eAccelerator for that specific site, by adding the following to the .htaccess file
php_flag eaccelerator.enable 0
php_flag eaccelerator.optimizer 0
(or equivalent lines in php.ini):
eaccelerator.enable="0"
eaccelerator.optimizer="0"
It's an old post, I know, but since I couldn't find the solution to my problem anywhere and I've fixed it, I'll share my experience.
The main cause of my problem was a file_exists() function call.
The file actually existed, but for some reason an extra forward slash in the file location ("//"), which normally works in a regular browser, does not seem to work in PHP. Maybe your problem is related to something similar. Hope this helps someone!
I'd try setting all of the error reporting options
-b on error batch abort
-V severitylevel
-m error_level
and sending all the output to the client
<?php
echo "<div>starting sql batch</div>\n<pre>"; flush();
passthru('sqlcmd -b -m -1 -V 11 -l 3 -E -S TYHSY-01 -d newtest201 -i "E:\PHP_N\M_Create_Log_SP.sql"');
echo '</pre>done.'; flush();
My PHP was segfaulting without any additional information as to the cause of it as well. It turned out to be two classes calling each other's magic __call() method because neither of them had the method being called. PHP just loops until it's out of memory. But it didn't report the usual "Allowed memory size of * bytes exhausted" message, probably because the methods are "magic". A rough sketch of the trap is below.
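Not PHP, but a minimal Python 3 analogue of the same mutual-delegation loop (hypothetical classes; Python at least stops with a RecursionError, whereas PHP kept going until memory ran out):
class Delegator:
    def __init__(self):
        self.other = None  # partner object, set after both exist

    def __getattr__(self, name):
        # Called only when normal lookup fails, so every missing
        # attribute is forwarded to the partner object.
        return getattr(self.other, name)

a, b = Delegator(), Delegator()
a.other, b.other = b, a
a.missing_method()  # bounces between a and b until the recursion limit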
I thought I would add my own experience as well.
I was getting the same error message, which in my case was caused by a PHP error in an exception.
The culprit was a custom exception class that did some logging internally, and a fatal error occurred in that logging mechanism. This caused the exception to not be triggered as expected, and no meaningful message to be displayed either.