Our Carbon daemon (in Graphite) takes up no more than 9% CPU on a 2-core machine. However, our Graphite webapp has recently driven HTTPD CPU usage up to about 95%, and we have noted that the "wsgi:graphite" process alone accounts for as much as 93% CPU.
Has anyone come across this problem? What is the solution? We have a lot of monitoring scripts querying Graphite via the URL/Render API. This of course increases Graphite's HTTPD activity, but we haven't made any drastic changes.
I would appreciate your comments.
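One way to confirm where the render load is coming from is to correlate the busiest processes with the most frequent render queries; a minimal sketch, assuming Apache's combined log format and a default log path (adjust for your vhost):

    # Show the busiest Apache/wsgi processes, CPU-sorted
    ps -eo pcpu,pid,args --sort=-pcpu | grep -E 'wsgi|apache|httpd' | head

    # Count the most frequent /render targets hitting the webapp
    awk '$7 ~ /^\/render/ {print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head

If a handful of monitoring queries dominate the counts, caching those queries (or widening their polling interval) is usually the first win.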
I am running Apache Guacamole on a Google Cloud Compute Engine f1-micro with CentOS 7 because it is free.
Guacamole runs fine for some time (an hour or so), then unexpectedly crashes. I get an ERR_CONNECTION_REFUSED error in Chrome, and when running htop I can see that all of the tomcat processes have stopped. To get it running again I just have to restart tomcat.
I have a message saying "Instance "guac" is overutilized. Consider switching to the machine type: g1-small (1 vCPU, 1.7 GB memory)" in the compute engine console.
I have tried limiting the memory allocation to tomcat, but that didn't seem to work.
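For reference, this is roughly how I capped the heap (a sketch; /etc/tomcat/tomcat.conf is where the CentOS 7 tomcat RPM reads JAVA_OPTS, so adjust the path if your Tomcat came from elsewhere):

    # Cap the JVM heap so Tomcat cannot outgrow the f1-micro's 0.6 GB of RAM
    echo 'JAVA_OPTS="-Xms128m -Xmx256m"' | sudo tee -a /etc/tomcat/tomcat.conf
    sudo systemctl restart tomcat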
Any suggestions?
I think the ERR_CONNECTION_REFUSED is likely due to the VM instance running short on resources: to keep the OS up, the kernel's out-of-memory (OOM) killer terminates some processes, and on a machine this small Tomcat is a prime target. Once you restart the service (or reboot the VM), everything resumes operation in full.
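You can confirm this from the kernel log; a quick check (the log path assumes CentOS 7):

    # Look for OOM-killer activity around the time Tomcat died
    dmesg -T | grep -iE 'out of memory|killed process'
    sudo grep -i oom /var/log/messages | tail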
Regarding the "over-utilization" notification recommending g1-small (1 vCPU, 1.7 GB memory): please note that f1-micro is a shared-core micro machine type with 0.2 vCPU and 0.60 GB of memory, backed by a shared physical core, and is only suitable for running smaller, non-resource-intensive applications.
Depending on your Tomcat configuration, also note that connecting to a database is a resource-intensive process.
When you create a Tomcat deployment from the Google Cloud Marketplace, the default VM setting is "1 vCPU + 3.75 GB memory (n1-standard-1)"; next to that, the recommended g1-small (1 vCPU, 1.7 GB memory) is still a modest machine, so the upgrade should be suitable in your case.
Why was the g1-small machine type recommended? Compute Engine uses the same CPU utilization numbers reported on the Compute Engine dashboard to determine what recommendations to make. These numbers are based on the average utilization of your instances over 60-second intervals, so they do not capture short CPU usage spikes. Consequently, applications with short usage spikes might need to run on a larger machine type than the one recommended by Google in order to accommodate those spikes.
In summary, my suggestion would be to upgrade as recommended. Note that rightsizing produces warnings when a VM is underutilized or overutilized; in this case it recommends increasing your VM size due to overutilization. Keep in mind that this is only a recommendation based on the available data.
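If you follow the recommendation, the resize itself is quick; a sketch, assuming the instance is named "guac" as in your console message (add --zone if your gcloud config has no default; the VM must be stopped to change its machine type):

    gcloud compute instances stop guac
    gcloud compute instances set-machine-type guac --machine-type g1-small
    gcloud compute instances start guac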
I have an SVN repository behind an Apache HTTPS server that stores small and large (>1 GB) files. When I commit a large file, the transfer speed is about 10 MB/s (over a 1 Gbit network line). Looking at CPU utilization on the server, it is saturated, with about 85% consumed by apache2 and some 15% by the disk driver.
I have already tried disabling Apache logging and SSL, but that didn't improve the transfer speed, which makes me suspect that mod_dav_svn is using most of the CPU. I have also tried increasing the number of available cores on the server (default = 1 core), but this mysteriously slows down the commits while httpd keeps using only one core. Setting SVNCompressionLevel 0 also didn't produce any noticeable speed improvement.
Is there any way to significantly increase the transfer speed through parallelization or some other optimization?
Server:
Debian 9.3
Apache 2.4.25
libapache2-mod-svn 1.9.5
svn repository: default FSFS config (i.e. everything commented out in fsfs.conf). The HDD can write up to 30 MB/s (hardware limited) without saturating the CPU (tested with copying). The FS is NTFS, mounted via ntfs-3g with big_writes enabled, which uses some 10-15% CPU while writing at ~10 MB/s.
Client:
svn 1.8.13
CPU: first generation Intel Core @ 3.20 GHz
Obviously, I would be very pleased if I could transfer at 25-30MB/sec.
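One way to pin down whether mod_dav_svn, SSL, or the filesystem driver is burning the CPU is to sample the busiest httpd worker; a minimal sketch, assuming the linux-perf package is installed:

    # Find the busiest apache2 worker and sample where its CPU time goes
    PID=$(ps -eo pid,pcpu,comm --sort=-pcpu | awk '$3=="apache2"{print $1; exit}')
    sudo perf top -p "$PID"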
Is there any way to significantly increase the transfer speed through parallelization or some other optimization?
Yes, there is. However, the question lacks necessary details about the SVN client and server versions, the server and FSFS repository configuration, and the hardware it runs on, so it is hard to tell which optimizations will help in your case. You may want to upgrade your server and client to the latest versions and disable compression in the server's config.
FYI: VisualSVN Server in my tests can deliver 1Gbps speed.
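To make the compression suggestion concrete: besides SVNCompressionLevel on the server, compression can be turned off on the client side too; a sketch (the http-compression knob lives in the client's runtime config):

    # ~/.subversion/servers -- under the existing [global] section, set:
    #   http-compression = no
    # Then re-time a large commit to see whether CPU drops:
    time svn commit -m "timing test after disabling compression" big.bin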
We have a heavy-duty I/O node that transfers hundreds of gigabytes from disk to the network in each data transmission. We observe that our transmission application almost halts when the CPU utilization of the "flush" and "kswapd" kernel threads approaches 100%.
Can we increase the number of these 2 system daemon processes?
How do we change their behavior? For example, by changing the threshold system parameters that trigger them.
Where are the executables and configuration files for them?
Many thanks!
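For reference, the thresholds that wake the writeback machinery are ordinary sysctls ("flush" and "kswapd" are kernel threads, so there are no separate executables or config files for them); a minimal sketch, with illustrative values rather than recommendations:

    # Show the dirty-page thresholds that trigger the flusher threads
    sysctl vm.dirty_background_ratio vm.dirty_ratio \
           vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

    # Example: start background writeback earlier and throttle writers later,
    # so dirty pages drain continuously instead of in large stalls
    # (persist the values in /etc/sysctl.conf)
    sudo sysctl -w vm.dirty_background_ratio=5 vm.dirty_ratio=20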
I'm running some websites on a dedicated Ubuntu web server. If I'm remembering correctly, it has 8 cores and 16 GB of memory, and runs 64-bit Ubuntu. Content and files are delivered quickly to web browsers. Everything seems like a dream... until I run gzip or zip to back up an 8.6 GB website.
When running gzip or zip, Apache stops delivering content. Internal server error messages are delivered until the compression process is complete. During the process, I can login via ssh without delays and run the top command. I can see that the zip process is taking about 50% CPU (I'm guessing that's 50% of a single CPU, not all 8?).
At first I thought this could be a log issue, with Apache logs growing out of control and not wanting to be messed with. Log files are under 5MB though and being rotated when they hit 5MB. Another current thought is that Apache only wants to run on one CPU and lets any other process take the lead. Not sure where to look to address that yet.
Any thoughts on how to troubleshoot this issue? Taking all my sites down while backups run is not an option, and I can't seem to reproduce the issue on my local machines (granted, the hardware and configuration differ). My hope is that this question is not too vague; I'm happy to provide additional details as needed.
Thanks for your brains in advance!
I'd suggest running your backup script under the "ionice" command. It will help keep the backup from starving httpd of I/O.
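A sketch of what that looks like (paths are placeholders for your backup job):

    # Idle I/O class (-c3) plus lowest CPU priority for the compression run
    ionice -c3 nice -n19 tar -czf /backup/site-$(date +%F).tar.gz /var/www/mysite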
We're experiencing a strange problem with our current Varnish configuration.
4x Web Servers (IIS 6.5 on Windows 2003 Server, each installed on an Intel(R) Xeon(R) CPU E5450 @ 3.00GHz Quad Core, 4GB RAM)
3x Varnish Servers (varnish-3.0.3 revision 9e6a70f on Ubuntu 12.04.2 LTS - 64 bit/precise, Kernel Linux 3.2.0-29-generic, each installed on an Intel(R) Xeon(R) CPU E5450 @ 3.00GHz Quad Core, 4GB RAM)
The 3 Varnish servers have a pretty much standard, vanilla config: the only thing we changed was vcl_recv and vcl_fetch, in order to handle the session cookies. They are currently configured to use an in-memory cache, but we already tried switching to an HDD cache on a high-performance RAID drive, with exactly the same results.
We had put this in place almost two years ago on our old web farm without problems, and everything worked like a charm. Now, using the machines described above and after a clean reinstall, our customers are experiencing a lot of connection problems (pending requests on clients, 404 errors, missing files, etc.) when our websites are under heavy traffic. From the console log we can clearly see that these issues start happening when each Varnish reaches roughly 700 requests per second: it just seems like they can't handle anything more. We can easily reproduce the critical scenario at any time by shutting down one or two Varnish servers and watching how the others react: they always start to skip beats every time the requests-per-second count reaches 700. Considering what we've experienced in the past, and looking at the Varnish specs, this doesn't seem normal at all.
We're trying to improve our Varnish servers' performance and/or understand where the problem actually lies. To do that, we could really use some kind of benchmark from other companies using Varnish in a similar fashion, to help us understand how far we are from the expected performance (I assume we are some way off).
EDIT (added CFG files):
This is our default.vcl file.
This is the output of the varnishadm param.show console command.
I'll also try to post a small part of our varnishlog file.
Thanks in advance,
To answer the question in the headline: A single Varnish server with the specifications you describe should easily serve 20k+ requests/sec with no other tuning than increasing the number of threads.
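For reference, the thread limits in Varnish 3.x can be inspected and raised at runtime; a sketch with illustrative values (persist them as -p options in /etc/default/varnish):

    # Show the current pool and thread settings
    varnishadm param.show thread_pools
    varnishadm param.show thread_pool_min
    varnishadm param.show thread_pool_max

    # Raise the per-pool thread limits
    varnishadm param.set thread_pool_min 200
    varnishadm param.set thread_pool_max 2000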
You don't give enough information (vcl, varnishlog) to answer your remaining questions.
My guess would be that you somehow end up serialising the backend requests. Check out your hit_for_pass objects and make sure they have a valid TTL set. (120s is fine)