How to measure the memory usage per active Apache Connection? - apache

I would like to measure the memory consumption for one active Apache connection(=Thread) under Ubuntu.
Is there a monitoring tool which is capable of doing this?
If not, does anyone knows how much memory an Apache connection roughly needs?

Activate the mod_status module, you'll get a report on /server-status page, there is a more parseable version on /server-status?q=auto. If you enable ExtendedStatus On you will have a lot of information on processes and threads.
This is the page used by monitoring tools to track a lot of stats parameters, so you will certainly find the one you need (edit: if it is not memory...) . Be careful with security/access settings of this file, it's a nice tool to check how your server respond to DOS :-)
About memory you must note that Apache loves memory, how much memory per process depends on a lot of things (number of modules loaded - check that you need all the ones you have, number of virtualHosts, etc). But on a stable configuration it does not move a lot (except if you use PHP scripts with high memory limit usage...). If you find memory leaks try to limit the number of requests per process MaxRequests (apache will kill him and put a new one).
edit: in fact not a lot of memory info in the server-status. About monitoring tools, any tool using SNMP MIB-II can track memory usage per process, with average/top/low values for the different childs (Cacti, Nagios, Munin, etc) if you had a snmpd daemon. Check this excellent Munin example. It's not a tracking of each apache child but it will give you an idea of what you can track with these tools. If you do not need a complete monitoring system such as Nagios or Centreon, with alerts, user managmenent, big networks (and if you do not have a lot of days for books reading) Munin is, IMHO, a pretty tool to get monitoring reports quite fast.

I'm not sure if there are any tools for doing this. But you could estimate it yourself. Start apache and check how much memory it uses without any sessions. Than create a big number of sessions and check again how much memory it uses.
You could use JMeter to create different workloads.

Related

When to turn off Transparent Huge Pages for redis

According to redis docs, it's advisable to disable Transparent Huge Pages.
Would the guidance be the same if the machine was shared between the redis server and the application.
Moreover, for other technologies, I've also read guidance that THP should be disabled for all production environments when setting up the server. Is this kind of pre-emptiveness applicable to redis as well, or one must first strictly monitor latency issues before deciding to turn off THP?
Turn it off. The problem lies in how THP shifts memory around to try and keep or create contiguous pages. Some applications can tolerate this, most databases cannot and it causes intermittent performance problems, some pretty bad. This is not unique to Redis by any means.
For your application, especially if it is JAVA, set up real HugePages and leave the transparent variety out of it. If you do that just make sure you alocate memory correctly for the app and redis. Though I have to say, I probably would not recommend running both the app and redis on the same instance/server/vm.
Turning off transparent hugepages is a bad idea, and redis no longer recommends it.
What you should do instead is make sure transparent_hugepage is not set to always. (This is what recent versions of redis check for.) You can check the current value of the setting with:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
And correct it like so:
# echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
Although no action is likely to be necessary, since madvise is typically the default setting in recent linux distros.
Some background:
transparent_hugepage = always: can force applications to use hugepages unless they opt out with madvise. This has a lot of problems and is rarely enabled.
transparent_hugepage = never: does not fulfill allocations with hugepages, even if the application requests it with madvise
transparent_hugepage = madvise: allows applications to opt-in to hugepages. This is normally a good idea because hugepages can improve performance in some applications, but this setting doesn't force them on applications that, like redis, don't opt in
It is rather annoying that searching for "transparent huge pages" yields top results about how to disable them because Redis and some other databases cannot handle transparent huge pages without performance degradation.
These applications should do any of:
Use prctl(PR_SET_THP_DISABLE, ...) call to opt out from using transparent huge pages.
Be started by a script which does this call for them prior to fork/exec the database process. PR_SET_THP_DISABLE get inherited by child processes/threads for exactly this scenario when an existing application cannot be modified.
prctl(PR_SET_THP_DISABLE, ...) has been available since Linux 3.15 circa 2014, so that there is little excuse for those databases to not mention this solution, instead of giving this poor/panic advice to their users to disable transparent huge pages for the entire system.
3 years after this question was asked, Redis got disable-thp config option to make prctl(PR_SET_THP_DISABLE, ...) call on its own, by default.
My production memory-intensive processes go 5-15% faster with /sys/kernel/mm/transparent_hugepage/enabled set to always. Many popular desktop applications benefit from always transparent huge pages immensely.
This is why I cannot appreciate those search results for "transparent huge pages" spammed with Redis adviŅe to disable them. That's a panic advice from Redis, not the best practice.
The overhead THP imposes occurs only during memory allocation, because of defragmentation costs.
If your redis instance has a (near-)constant memory footprint, you can only benefit from THP. Same applies to java or any other long-lived service that does its own memory management. Pre-allocate memory once and benefit.
why playing such echo-games when there is a kernel-param you can boot with?
transparent_hugepage=never

High CPU with ImageResizer DiskCache plugin

We are noticing occasional periods of high CPU on a web server that happens to use ImageResizer. Here are the surprising results of a trace performed with NewRelic's thread profiler during such a spike:
It would appear that the cleanup routine associated with ImageResizer's DiskCache plugin is responsible for a significant percentage of the high CPU consumption associated with this application. We have autoClean on, but otherwise we're configured to use the defaults, which I understand are optimal for most typical situations:
<diskCache autoClean="true" />
Armed with this information, is there anything I can do to relieve the CPU spikes? I'm open to disabling autoClean and setting up a simple nightly cleanup routine, but my understanding is that this plugin is built to be smart about how it uses resources. Has anyone experienced this and had any luck simply changing the default configuration?
This is an ASP.NET MVC application running on Windows Server 2008 R2 with ImageResizer.Plugins.DiskCache 3.4.3.
Sampling, or why the profiling is unhelpful
New Relic's thread profiler uses a technique called sampling - it does not instrument the calls - and therefore cannot know if CPU usage is actually occurring.
Looking at the provided screenshot, we can see that the backtrace of the cleanup thread (there is only ever one) is frequently found at the WaitHandle.WaitAny and WaitHandle.WaitOne calls. These methods are low-level synchronization constructs that do not spin or consume CPU resources, but rather efficiently return CPU time back to other threads, and resume on a signal.
Correct profilers should be able to detect idle or waiting threads and eliminate them from their statistical analysis. Because New Relic's profiler failed to do that, there is no useful way to interpret the data it's giving you.
If you have more than 7,000 files in /imagecache, here is one way to improve performance
By default, in V3, DiskCache uses 32 subfolders with 400 items per folder (1000 hard limit). Due to imperfect hash distribution, this means that you may start seeing cleanup occur at as few as 7,000 images, and you will start thrashing the disk at ~12,000 active cache files.
This is explained in the DiskCache documentation - see subfolders section.
I would suggest setting subfolders="8192" if you have a larger volume of images. A higher subfolder count increases overhead slightly, but also increases scalability.

Is it meaningful to monitor physical memory usage on AIX?

Due to AIX's special memory-using algorithm, is it meaning to monitor the physical memory usage in order to find out the memory bottleneck during performance tuning?
If not, then what kind of KPI am i supposed to keep eyes on so as to determine whether we need to enlarge the RAM capacity or not?
Thanks
If a program requires more memory that is available as RAM, the OS will start swapping memory sections to disk as it sees fit. You'll need to monitor the output of vmstat and look for paging activity. I don't have access to an AIX machine now to illustrate with an example, but I recall the man page is pretty good at explaining what data is represented there.
Also, this looks to be a good writeup about another AIX specfic systems monitoring tool, and watching your systems overall memory (svgmon).
http://www.aixhealthcheck.com/blog.php?id=255
To track the size of your individual application instance(s), there are several options, with the most common being ps. Again, you'll have to check the man page to get information on which options to use. There are several columns for memory sz per process. You can compare those values to the overall memory that's available on your machine, and understand, by tracking over time, if your application is only increasing is memory, or if it releases memory when it is done with a task.
Finally, there's quite a body of information from IBM on performance tuning for AIX, but I was never able to find a road map guide to reading that information. A lot of it assumes you know facts and features that aren't explained in the current doc set, so you then have to try and find an explanation, which oftens leads to searching for yet another layer of explanations. ! :^/
IHTH.

Memory requirements when hosting R in the cloud

What is the minimal size server we need to run opencpu, if we expect 100,000 hits a month?
I think opencpu is an exciting project, but need to know about memory usage when opencpu is deployed, since a cloud hosting service such as rackspace charges about $40 per month for 1 GB of RAM.
I know that if I load R without doing anything or without loading any data or package in RAM, it uses almost 700m of RAM (virtual) and 50 megabytes of RAM (in residence).
I know that opencpu uses rApache, and rApache uses preforking, but want to know how this will scale as the number of concurrent users increases. Thanks.
Thanks for the responses
I talked with Jeroen Ooms when visiting LA, and am partly convinced that opencpu will work in high concurrency environments if used correctly, and that he is available to fix issues if they arrise. Opencpu related to his dissertation, after all! In particular, what I find useful about opencpu is its integration with ubuntu's AppArmor, which can restrict processes from using too much RAM and CPU. I think apache might also be able to do this, but RAppArmor can do this and much more. Brilliant! If AppArmor were the only advantage, I would just use that and json as a backend, but it seems like opencpu can also streamline the installation of all this stuff and provides a built in API system.
Given the cost of web-hosting, I imagine a workable real-time analytics system is the following:
create R statistical models on demand, on a specialized analytical server, as often as needed (e.g. every day or hour using cron)
transfer the results of the models to a directory on the opencpu servers using ftp, as native R objects
on the opencpu server, go to the directory and grab the R objects representing the statistical models, and then make predictions or do simulations using it. For example, use the 'predict' function to provide estimates based on user supplied variables.
Does anybody else see this as a viable way to make R a backend for real time analytics?
Dirk is right, it all depends on the R functions that you are calling; the overhead of the OpenCPU architecture should be quite minimal. OpenCPU itself will run on a small server, but as we all know some functionality in R requires much more memory/cpu than others.
If you really want to know how much resources are needed just to run opencpu you could do some benchmarking. As you noted, prefork is used to branch sessions of the main process, so in most cases the copy-on-write principle of forking should make it pretty cheap.
Also there is some other stuff that you can tweak; e.g. preloading of frequently used packages.

Forming a web application cluster with 3 VMs running in the same physical box

Are there any advantages what so ever to form a cluster if all the nodes are Virtual machines running inside the same physical host? Our small company just purchased a server with 16GB of Ram. I propose to just setup IIS on the box to handle outside requests, but our 'Network Engineer' argue that it will be better to create 3 VMs on the box and form a cluster with the VMs for load balancing. But since they are all in the same box, are there actual benefits for taking the VM approach rather than no VMs?
THanks.
No, as the overheads of running four operating systems would take a toll on performance, plus, I believe all modern web servers (plus IIS) are multithreaded so are optimised for performance anyway.
Maybe the Network Engineer knows something that you don't. Just ask. Use common sense to analyze the answer.
That said, running VMs always needs resources - but you might not notice. Doesn't make sense? Well, even if you attach the computer with a Gigabit link to the Internet, you still won't be able to process more data than the ISP gives you. If your uplink is 1MB/s, that's the best you can get. Any VM today is able to process that little trickle of data while being bored 99.999% of the time.
Running the servers in VMs does have other advantages, though. First of all, you can take them down individually for maintenance. If the load surges because your company is extremely successful, you can easily add more VMs on other physical boxes and move virtual servers around with a mouse click. If the main server dies, you can set up a replacement machine and migrate the VMs without having to reinstall everything.
I'd certainly question this decision myself as from a hardware perspective you obviously still have a single point of failure so there is no benefit.
From an application perspective it could be somewhat tenuously suggested that this would allow zero downtime deployments by taking VMs out of the "farm" one at a time but you won't get any additional application redundancy or performance from virtualisation in this instance. What you will get is considerably more management overhead in terms of infrastructure and deployment for little gain.
If there's a plan to deploy to a "proper" load balanced environment in the near future this might be a good starting point to ensure your application works correctly in a farm (sticky sessions etc). Although this makes your apparently live environment also a QA server, which is far from ideal.
from a performance perspective, 3 VMs on the same hardware is slower
from an availability perspective, 2 VMs will give higher availability (better protects from app software failures, OS failures, you can perform maintenance on one node while the other is up).