I have a Kubernetes cluster and I am getting cgroup out-of-memory kills. I have resource limits declared in the YAML, but I have no idea which apache2 instance needs more memory. The kill message gives me a process ID, but how do I tell which pod is being killed?
Thank you.
It is what it is. Your Apache process is using more memory than you are allowing in your pod/container definition.
Reasons why it could need more memory:
An increase in traffic and in the number of sessions being handled.
Apache is forking more worker processes inside the container and running into the memory limit (see the sketch after this list).
Apache is not reaping lingering sessions because of a configuration issue.
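As a rough illustration of the second point, this is the kind of MPM tuning that keeps the fork count (and therefore memory) bounded. It is only a sketch: the prefork MPM and all of the numbers are assumptions, not taken from your setup, and should be sized against the container's memory limit.
# apache2.conf / httpd.conf
<IfModule mpm_prefork_module>
    StartServers             2
    MinSpareServers          2
    MaxSpareServers          5
    # hard cap on forked children
    MaxRequestWorkers       50
    # recycle children so per-process memory cannot grow without bound
    MaxConnectionsPerChild 1000
</IfModule>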
If you are running Docker as the container runtime (which most people do), you can SSH into the node in your cluster and run:
docker ps -a
You should see the Exited container where your Apache process(es) were running. Then you can run:
docker logs <container-id>
That might give you details on what Apache was doing before it was killed. If you only see minimal information, I recommend increasing the verbosity of your Apache logs.
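You can also map the kill back to a pod from the Kubernetes side rather than through Docker. This is a sketch, assuming you have kubectl access to the cluster; the pod name and namespace are placeholders.
# list recent events and look for OOMKilling / OOMKilled entries
kubectl get events --all-namespaces --sort-by=.lastTimestamp | grep -i oom
# inspect the suspect pod; an OOM-killed container shows
# "Last State: Terminated, Reason: OOMKilled"
kubectl describe pod <pod-name> -n <namespace>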
Hope it helps.
Related
I have two issues with my Kubernetes cluster.
Kubernetes version 1.12.5, Ubuntu 16.04.
The first issue:
Occasionally, containers on a specific node are restarted, including kube-proxy:
kernel: IPVS: rr TCP - no destination available
IPVS: __ip_vs_del_service:enter
net_ratelimit: callbacks suppressed
While these logs are being recorded continuously, the load average on the node is rather high, and the Docker containers scheduled on the node keep restarting.
In this case, draining the node relieves the symptoms.
The second issue:
Certain Java-based containers throw "UnknownHostException".
Restarting the container manually resolves the symptom.
Should I look at the container deployment settings?
Should I look at the cluster DNS and resolv.conf-related settings?
I want to know whether the UnknownHostException is related to DNS settings.
Can you give me some advice?
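One way to narrow down whether cluster DNS is involved is to test resolution from inside the failing pod. A sketch, assuming the image ships nslookup and that you have kubectl access; the pod name, namespace, and hostname are placeholders.
# try to resolve the name that throws UnknownHostException
kubectl exec -it <failing-pod> -n <namespace> -- nslookup <hostname>
# check which resolver the pod is actually using
kubectl exec -it <failing-pod> -n <namespace> -- cat /etc/resolv.conf
# check that the cluster DNS pods are healthy
kubectl get pods -n kube-system -l k8s-app=kube-dns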
I am doing a load test to tune my Apache to serve the maximum number of concurrent HTTPS requests. Below are the details of my test.
System
I dockerized my httpd and deployed it in OpenShift; the pod configuration is 4 CPU, 8 GB RAM.
Load is generated from JMeter with 200 threads, a 600-second ramp-up time, and an infinite loop count over a long-running duration (JMeter runs in the same network on a VM with 16 CPU, 32 GB RAM).
I compiled httpd with the worker MPM and deployed it in OpenShift.
Issue
1. Httpd does not scale beyond 90 TPS, even after trying multiple mpm_worker configurations (no difference between the default and higher settings; an illustrative worker block follows below).
2. After 90 TPS, the average response time increases and TPS drops.
Please let me know what could be the issue; I can provide further information if required.
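For context, a worker block of the kind being tuned here would look roughly like this. The values are illustrative assumptions, not the actual configuration used; total concurrency is bounded by ServerLimit x ThreadsPerChild.
# httpd.conf (worker MPM)
<IfModule mpm_worker_module>
    ServerLimit            16
    StartServers            4
    ThreadsPerChild        64
    # MaxRequestWorkers should equal ServerLimit * ThreadsPerChild
    MaxRequestWorkers    1024
    MaxConnectionsPerChild  0
</IfModule>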
I don't have the answer, but I do have questions.
1/ What does your Dockerfile look like?
2/ What does your OpenShift cluster look like? How many nodes? Separate control plane and workers? What version?
2b/ Specifically, how is traffic entering the pod? If you are going in via a route, you'll want to look at your load balancer; if you want to exclude OpenShift from the equation, then for the short term expose a NodePort and have JMeter hit that directly (see the sketch after this list).
3/ Do I read correctly that your single pod was assigned an 8 GB RAM limit? Or did you mean the worker node has 8 GB RAM?
4/ How did you deploy the app -- raw pod, deployment config? Any CPU/memory limits set, or assumed? Assuming a deployment, how many pods does it spawn? What happens if you double it? Doubled TPS or not -- that will help point to whether the problem is inside httpd or inside the ingress route.
5/ What's the nature of the test request? Does it make use of any files stored on the network, or "local" files provisioned on a network PV?
And,
6/ What are you looking to achieve? Maximum concurrent requests in one container, or maximum requests in the cluster? If you haven't already, look to divide and conquer -- more pods on more nodes.
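A minimal NodePort sketch for point 2b. The service name, labels, and ports are assumptions and must be adjusted to match the actual deployment.
# nodeport-svc.yaml -- exposes the httpd pods directly on every node
apiVersion: v1
kind: Service
metadata:
  name: httpd-nodeport
spec:
  type: NodePort
  selector:
    app: httpd          # must match the pod labels
  ports:
    - port: 80          # service port
      targetPort: 8080  # container port httpd listens on
      nodePort: 30080   # JMeter then targets http://<node-ip>:30080
Apply it with oc apply -f nodeport-svc.yaml (or kubectl apply -f nodeport-svc.yaml).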
Most likely you have run into a bottleneck or limitation at the system under test (SUT). See the following post for a detailed answer:
JMeter load is not increasing when we increase the threads count
I'm using a CentOS 6.4 (x86) VPS with Nginx.
In Webmin's Running Processes table I found up to eight "php-fpm: pool www" processes owned by the "apache" user, but Apache isn't running!
This consumes a lot of RAM.
Are these necessary for the Nginx jobs or not? Sorry for this (stupid?) question, but I'm a newbie at server management.
Thank you in advance.
The processes you see running are needed and are not being wasted.
One of the first things that should be defined in your PHP-FPM config file is what user and group PHP-FPM should be running under.
Presumably your config file says to run PHP-FPM under the user 'apache'. You can change this to whatever you like, as long as you get the file permissions right for PHP-FPM to access your PHP files.
However, if PHP-FPM is taking up a lot of memory, you should tweak the values that control how many worker processes it keeps and how much memory each one can use. In particular, you could reduce these settings:
pm.start_servers = 4
pm.min_spare_servers = 2
so that fewer PHP-FPM processes sit around idle when there is no load.
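A minimal sketch of the relevant directives in the www pool file. The path and values are assumptions; on CentOS 6 the pool file is typically /etc/php-fpm.d/www.conf.
; /etc/php-fpm.d/www.conf -- illustrative values only
; run the pool as the nginx user instead of apache
user = nginx
group = nginx
pm = dynamic
; hard cap on worker processes
pm.max_children = 8
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3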
PHP-FPM has its own separate process manager and really isn't connected to anything other than itself; other software, e.g. Nginx or Apache, connects to it. You probably see the "apache" user running the processes because of the pool configuration you have. You can easily change the configuration and then restart the FPM process.
If you do not wish to have idle processes running while they are not being used, then I would recommend that you change the pm option in the pool configuration from static/dynamic to ondemand. That way, FPM will only spin up workers when they are needed.
Many people use the static/dynamic options when they need workers kept ready for the load they are running, e.g. a site that receives a lot of constant traffic.
Depending on your FPM installation, you'll normally find the configuration in /etc/php; I keep my configurations in /usr/local/etc/php-fpm/ or /usr/local/etc/fpm.d/.
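An ondemand pool would look roughly like this (a sketch; the idle timeout and max_children values are assumptions, and ondemand requires PHP-FPM 5.3.9 or newer):
; in the pool configuration file
pm = ondemand
; upper bound on workers spawned under load
pm.max_children = 8
; kill workers that have been idle longer than this
pm.process_idle_timeout = 10s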
I am running a variable number of EC2 instances (CentOS, 64-bit) that run an Apache web server which caches a bunch of code in production mode.
Every time I make changes to the code (generally on a weekly basis), I have to log into each one of those instances and do a "su" followed by "service httpd restart".
Is there a way to automate this so that I can run a single command on one of the instances and it connects to all the others and restarts Apache? It is getting really time-consuming, especially when the application has spawned some 20-30 instances on its own (which happens on days when we get high traffic).
Thanks!
Dancer's shell, dsh, is designed specifically to do this; no 'scripting' required. As @tix3 suggests, you should probably also configure sudo on those machines (edit /etc/sudoers using visudo) so that they accept your restart command.
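A sketch of what that can look like; the group name, user, and paths are assumptions.
# /etc/sudoers entry (added via visudo): let the deploy user restart httpd without a password
#   deploy ALL=(root) NOPASSWD: /sbin/service httpd restart
# ~/.dsh/group/web lists one instance hostname or IP per line; then run concurrently:
dsh -M -g web -c 'sudo /sbin/service httpd restart'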
Is it good practice to run Redis in production with Supervisor?
I've googled around, but haven't seen many examples of doing so. If not, what is the proper way of running Redis in production?
I personally just use Monit for Redis in production. If Redis crashes, Monit will restart it, but more importantly Monit can monitor (and alert when a threshold is reached) the amount of RAM that Redis currently uses, which is the biggest issue.
The configuration could be something like this (if maxmemory was set to 1 GB in Redis):
check process redis
  with pidfile /var/run/redis.pid
  start program = "/etc/init.d/redis-server start"
  stop program = "/etc/init.d/redis-server stop"
  if 10 restarts within 10 cycles then timeout
  if failed host 127.0.0.1 port 6379 then restart
  if memory is greater than 1 GB for 2 cycles then alert
Well... it depends. If I were to use Redis under daemon control, I would use runit. I do use Monit, but only for monitoring; I like to see the green light.
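For reference, a runit service for Redis is just a small run script; this is a sketch, and the paths and service user are assumptions.
#!/bin/sh
# /etc/sv/redis/run -- runit keeps this process in the foreground,
# so daemonize must be set to "no" in redis.conf
exec 2>&1
exec chpst -u redis /usr/bin/redis-server /etc/redis/redis.conf
Enable it by symlinking the service directory, e.g. ln -s /etc/sv/redis /etc/service/.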
However, to exploit Redis's true power, you don't run Redis as a daemon, especially not a master. If a master goes down, you will have to promote a slave to master. Quite simply, I just shoot the node in the head and have a Chef recipe bring up a new node.
But then again... it also depends on how often you snapshot. I do not snapshot, thus no need for daemon control.
People use Redis for brute-force speed; that means not writing to disk and keeping all data in RAM. If a node goes down and you don't snapshot, the data is lost.