How to trace memory allocations in the Apache httpd server?

I am running ApacheBench (ab) against an httpd 2.4.52 server running locally. I want to track how many memory allocations the server makes and how large they are.
I run 'valgrind --trace-malloc=yes ab -n 10 http://127.0.0.1/'
But the number of allocations is ~4.6k regardless of the number of requests (I tried 10, 100, 1000).
Is this because Apache uses its own custom memory allocator?
How can I track the allocations (specifically the number of allocations and their total/average size) for this custom allocator?
This page mentions an option named ALLOC_USE_MALLOC in the APR code, but I could not find this option in the APR source code (I checked versions 1.7.0, 1.4.8 and 1.4.2, and httpd 2.0.51).
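In case it is useful: with the command above, valgrind is tracing ab's own allocations rather than the server's, which would explain why the count stays constant. A rough, untested sketch of tracing httpd itself would be to run it in single-process foreground mode under valgrind and, if the pool internals hide individual allocations, rebuild APR with pool debugging enabled (which, if I remember correctly, is the modern replacement for the old ALLOC_USE_MALLOC define). Paths and versions below are placeholders:
# rebuild APR so pool allocations become separate malloc calls (assumption: --enable-pool-debug behaves this way in apr 1.x)
cd apr-1.7.0 && ./configure --enable-pool-debug=yes && make && make install
# run httpd single-process, in the foreground, under valgrind
valgrind --trace-malloc=yes /usr/local/apache2/bin/httpd -X
# drive traffic from another terminal
ab -n 100 http://127.0.0.1/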

Related

Weblogic 10.3.6 generates empty heapdump on OutOfMemoryError

I'm trying to generate a full heapdump from Weblogic 10.3.6 due to an OutOfMemoryError generated by a Web Application deployed on the Server.
I've set the following options in the start script:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heapdump
When the OutOfMemoryError occurs, WebLogic generates an empty hprof file (0 bytes) in the /path/to/heapdump folder, and then nothing happens: the server remains in RUNNING mode even though it is no longer reachable.
The Java process is still alive, but sits at 0% CPU.
Even the server.out log seems completely frozen, without any trace of the OutOfMemoryError.
What's wrong with the configuration?
You can probably use Java Flight Recorder to record events and check which objects are generating the OOM
(any profiler should work as well).
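For example, on a HotSpot JDK that ships JFR (older Oracle JDKs also need -XX:+UnlockCommercialFeatures -XX:+FlightRecorder in the server start options), a recording could be started from the command line roughly like this; the pid and file name are placeholders:
jcmd <pid> JFR.start duration=300s filename=/tmp/wls_oom.jfr
jcmd <pid> JFR.check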
Been there :( . I remember that at the time we found it somewhat logical: since there was not enough memory for normal operation, the JVM could not automagically find enough memory to create a heap dump either. If memory serves me well, we did two things to debug the memory leak. First, we were "lucky" enough that the problem happened fairly regularly, so close manual monitoring was possible (watching gc.log for repeated full GCs and watching the performance tab in the console). Knowing when the onset of the problem was, we did some kill -3 to get the dump manually. We also used jstack {PID} (JDK 1.6 on Linux) with some luck. With those, the devs were able to identify the memory leak at the time. Hope that helps.
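For reference, the manual steps mentioned above boil down to something like the following (the pid is a placeholder; kill -3 writes the thread dump to the JVM's stdout, i.e. the server's .out file):
kill -3 <pid>
jstack <pid> > /tmp/threads.txt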
Okay, your configuration looks alright. You might want to check whether the WebLogic process user has the rights to write the heap dump file.
You can take a heap dump with the standard Java tools:
JAVA_HOME/bin/jmap -dump:format=b,file=path_of_the_file <pid>
OR
%JROCKIT_HOME%\bin\jrcmd <pid> hprofdump filename=path_of_the_file

Large commit stalls halfway through

I have a problem with our Subversion server. Small commits work fine, but as soon as someone tries to commit a large collection of sizeable files, the commit stalls halfway through and the client eventually times out. My test set consists of roughly 2000 files and the total size of the commit is about 1 GB. When I commit the files the upload starts, but about halfway through the transfer rate drops to 0 kB/s and the commit just stalls and never recovers. If I split the commit into smaller pieces (<150 MB) everything works just fine, but that breaks the atomicity of the commit and is something I really want to avoid.
When I look at the logs generated by Apache there are no error messages.
When I bumped the LogLevel from debug to trace6 on the Apache server, some errors appear at the moment the upload stalls:
...
OpenSSL: I/O error, 2229 bytes expected to read on BIO
OpenSSL: read 1460/2229 bytes from BIO
...
Versions used:
We serve the Subversion connection via Apache with mod_dav, mod_dav_svn, mod_authz_svn and mod_auth_digest. The client connects via https.
Server:
OpenSuse 42.3
svnserve: 1.9.7
Apache: 2.4.23
Client:
Windows 10 enterprise
svn client: 1.10.0-dev.
What I tried so far:
I have tried increasing the TimeOut value in the Apache configuration. The only difference is that the client stays stalled longer before reporting the timeout message.
I have tried increasing the MaxKeepAliveRequests from 100 to 1000. No change.
I have tried adding SVNAllowBulkUpdates Prefer to the svn settings. No change.
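For reference, the directives tried above sit in the Apache / mod_dav_svn configuration roughly like this; the values and paths are only illustrative, not a recommendation:
TimeOut 1200
MaxKeepAliveRequests 1000
<Location /svn>
    DAV svn
    SVNParentPath /srv/svn
    SVNAllowBulkUpdates Prefer
</Location>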
Has anyone got any hints on how to debug these types of errors?

redis cluster continuously print log WSA_IO_PENDING

When I start up all the redis-server instances of the Redis cluster, they continuously print log lines like WSA_IO_PENDING and clusterWriteDone:
[9956] 03 Feb 18:17:25.044 # WSA_IO_PENDING writing to socket fd --------------------------------------------------------
[9956] 03 Feb 18:17:25.062 # clusterWriteDone written 2520 fd 15-------------------------------------------------------------
[9956] 03 Feb 18:17:25.545 # WSA_IO_PENDING writing to socket fd --------------------------------------------------------
[9956] 03 Feb 18:17:25.568 # WSA_IO_PENDING writing to socket fd --------------------------------------------------------
There is no way to specifically turn those "warnings" off in the 3.2.x port of Redis for Windows, as the logging statements use the highest LL_WARNING level. This issue has been reported in my fork of MSOpenTech's unmaintained repo (which I updated to Redis 4.0.2) and has been fixed by decreasing that level to LL_DEBUG. More details: https://github.com/tporadowski/redis/issues/14
This change will be included in the next release (4.0.2.3) or you can get the latest source code and build it for yourself.
Current releases can be found here: https://github.com/tporadowski/redis/releases
An issue was opened in the official Redis repo about that problem 10 months ago. Unfortunately it seems to have been abandoned, and it hasn't been solved yet:
Redis cluster print "WSA_IO_PENDING writing to socket..." continuously, does it matter?
However, that issue may not be related to redis itself, but to the Windows Sockets API, as pointed out by Cy Rossignol in the comments. It's the winsock API that returns that status to the application, as seen in the documentation:
WSA_IO_PENDING (997)
Overlapped operations will complete later.
The application has initiated an overlapped operation that cannot be completed immediately. A completion indication will be given later when the operation has been completed. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.
Maybe it didn't get much attention because it's not a bug, although it's indeed an inconvenience that floods the system logs. In that case, you may not get help there.
Seems like there's no temporary fix. The Windows Redis fork is archived and I don't know if you could get any help there either.
Go to C:\Program Files\Redis and open the file redis.windows-service.conf in Notepad.
You will find a section like the one below:
# Specify the server verbosity level.
# This can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel notice
# Specify the log file name. Also 'stdout' can be used to force
# Redis to log on the standard output.
logfile "Logs/redis_log.txt"
Here, you can change the value of loglevel as per your requirement. I think changing it to warning will solve this issue because it will log only essential errors.
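So the only change needed in that section would be, for example:
loglevel warning
and then restart the Redis Windows service for it to take effect, e.g. (assuming it was installed under the default service name):
net stop Redis
net start Redis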

apache 2.2 couldn't load a module on AIX 6.1

I am testing an auth module with an Apache 2.2 server on AIX 6.1 (POWER, 64-bit). The Apache server doesn't start at all when I give my module's path in httpd.conf; it works fine on AIX 5.3 with the same module.
There is no crash, and no error message other than the following in the error_log file:
httpd: Syntax error on line 423 of /home/apache22-aix64/installApache/conf/httpd.conf: Syntax error on line 9 of /home/apache22-aix64/installApache/conf/agent.conf: Cannot load /home/agent/apache/lib/auth-module.so into server: Not enough space
I have tried increasing ThreadStackSize to 6 MB, increasing memory and other parameters, but the issue is still the same. The issue occurs in both the prefork and worker MPMs.
That's a new one on me... I'm guessing you are out of something (yeah, brilliant guess, right?). Try comparing ulimit -a between the two systems (5.3 and 6.1). I presume you are starting Apache using the same type of id (a non-root id with the same limits, permissions, etc.).
I would also suggest tagging this with apache to see if some Apache folks might be able to help out. We need to determine what it is you are out of -- memory, stack, disk space, paging space, etc.
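As a starting point, comparing the two boxes as the user that starts Apache could look something like this (standard AIX commands; which resource matters is exactly what needs to be determined):
ulimit -a     # per-process limits (data, stack, nofiles, ...)
df -k         # disk space
lsps -a       # paging space
svmon -G      # overall memory usage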
Did you build this apache version yourself?

Which tools to use and how to find file descriptors leaking from Glassfish?

We release new code to production every week and Glassfish hasn't had any problems. This weekend we had to move racks at our hosting provider. There were not any code changes (they just powered off, moved, re-racked and powered on) but we're on a new network infrastructure and suddenly we're leaking file descriptors like a sieve. So I'm guessing there is some sort of connection attempting to be made which now fails due to a network change.
I'm running Glassfish v2ur2-b04/AS9.1_02 on RHEL4 with an embedded IMQ instance. After the move I started seeing:
[#|2010-04-25T05:34:02.783+0000|SEVERE|sun-appserver9.1|javax.enterprise.system.container.web|_ThreadID=33;_ThreadName=SelectorThread-?4848;_RequestID=c4de6f6d-c1d6-416d-ac6e-49750b1a36ff;|WEB0756: Caught exception during HTTP processing.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
...
[#|2010-04-25T05:34:03.327+0000|WARNING|sun-appserver9.1|javax.enterprise.system.stream.err|_ThreadID=34;_ThreadName=Timer-1;_RequestID=d27e1b94-d359-4d90-a6e3-c7ec49a0f383;|java.lang.NullPointerException at
com.sun.jbi.management.system.AutoAdminTask.pollAutoDirectory(AutoAdminTask.java:1031)
Using lsof I check the number of file descriptors and I see quite a few entries which look like:
java 18510 root 8556u sock 0,4 1555182 can't identify protocol
java 18510 root 8557u sock 0,4 1555320 can't identify protocol
java 18510 root 8558u sock 0,4 1555736 can't identify protocol
java 18510 root 8559u sock 0,4 1555883 can't identify protocol
If I count the open file descriptors every minute, I see the number growing by 12 each time. I have no idea what these sockets are.
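A count like that can be taken with a simple loop around lsof (the pid is a placeholder):
while true; do date; lsof -p <pid> | wc -l; sleep 60; done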
I've undeployed my application so there is only a plain Glassfish instance running and I still see it leaking 12 file descriptors a minute. So I think this leak is in Glassfish or potentially IMQ.
What approach should I take to tracking down these sockets of unknown protocol? What tools can I use (or flags can I pass to lsof) to get more information about where to look?
thanks,
chuck
I found this solution; assuming GlassFish runs as the user "glassfish", add the following lines to /etc/security/limits.conf to increase the maximum number of open files for the user that GlassFish runs as:
glassfish soft nofile 32768
glassfish hard nofile 65536
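Once that change is in place (it takes effect for new login sessions of the glassfish user), it can be verified with something like:
su - glassfish -c 'ulimit -Sn; ulimit -Hn'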