Monit cannot start/stop service

Monit cannot start or stop the service.
If I stop the service, Monit just stops monitoring it.
The log and config are attached for reference.
#Monitor vsftpd#
check process vsftpd
matching vsftpd
start program = "/usr/sbin/vsftpd start"
stop program = "/usr/sbin/vsftpd stop"
if failed port 21 protocol ftp then restart

The log states: "stop on user request". The process is stopped and monitoring is disabled, since monitoring a stopped (= nonexistent) process makes no sense.
If you restart the service (via CLI or web interface), Monit should log info: 'test' restart on user request, call the stop program, and then continue with the start program (if no dedicated restart program is provided).
One problem can arise, though: if the stop script fails to create the expected state (= NOT(check process matching vsftpd)), the start program is not called. So if any running task still matches vsftpd, Monit will not call the start program. That is why it's always better to use a PID file for monitoring where possible.
Finally, since I don't know what system or versions you are on, an assumption: the vsftpd binary on my system is really only the daemon. It does not support any options; all arguments are configuration files, as stated in the man page. So supplying start and stop only tries to launch new daemons that load configuration files named start and stop. If this is true, the problem described above applies, since your vsftpd is never stopped.

Related

Hosts in Nagios are disappearing

This may belong in ServerFault, but I wanted to approach this community first. If this is not correct, please move this thread or close and I will open on the correct thread.
PROBLEM:
Hosts, along with their associated services, disappear and reappear upon refresh (F5 / Ctrl+F5 / etc).
STEPS TO REPRODUCE:
1. Log into Nagios
2. Click Service Detail
3. See a breakdown of services, but the last one you added is missing.
4. Refresh the screen using F5 / Ctrl+F5 / etc and it still doesn't show up
5. Refresh the screen using F5 / Ctrl+F5 / etc and it still doesn't show up
6. Refresh the screen and it will show up.
(!) - Steps 4-6 vary
WHAT I'VE TRIED:
Restarting Nagios service (service nagios restart)
Restarting HTTPD service (service httpd restart)
Restarting VPS
Refresh browser including "Clear Cache and Hard Reload"
Tried different browsers
Tried different computers
Tried different networks
SCREENSHOTS:
GOOD
https://i.imgur.com/KUW5C6E.png
BAD
https://i.imgur.com/rWFLEaf.png
POSSIBLE CAUSE:
The reason we're in this situation now is that we had an intern add this latest host and its associated service. He added it correctly, and I even checked his work. He did the normal preflight, but instead of issuing the restart command via SSH he issued it from the web interface itself via "Process Info > Restart the Nagios process". It seems like that should work fine, but we've never restarted this way, and that is the only reason I suspect it's the culprit of the issue we are seeing. Does this restart do something different from the normal SSH restart?
EDIT: To add to all of this, we updated a different file today, unrelated to this host or its services, and Nagios is not updating.
Thanks for helping!
Rich
EXTRA:
Here is a screenshot of the config file:
https://i.imgur.com/2UsYZcw.png
This can happen if you have multiple Nagios instances running. There could be a secondary instance of the service that hasn't picked up the new configuration files because it technically hasn't been restarted. I've had this happen once or twice.
First, shut down Nagios
service nagios stop
Next, kill all remaining instances.
killall -9 nagios
Finally, start Nagios back up
service nagios start
That should fix your problem.

What happens AFTER Apache says "Script timed out before returning headers" to the running script?

I have a Perl web app served by Apache httpd using plain mod_cgi or optionally mod_perl with PerlHandler ModPerl::Registry. Recently the app encountered the error Script timed out before returning headers on some invocations and behaved differently afterwards: While some requests seemed to be processed successfully in the background, after httpd sent status 504 to the client, others didn't.
So how exactly does httpd behave AFTER it has reached its configured timeout and sent the error to the client? The request/response cycle is finished now, so I guess things like KeepAlive come into play to decide whether the TCP connection stays alive or not etc. But what happens to the running script in each environment, e.g. mod_cgi vs. mod_perl?
Especially in mod_cgi, where new processes are started for each request, I would have guessed that httpd keeps the processes simply running. Because all our Perl files have a shebang, I'm not even sure if httpd is able to track the processes and does so or not. That could be completely different with mod_perl, because in that case httpd is aware of the interpreters and what they are doing etc. In fact, the same operation which timed out using plain mod_cgi, succeeded using mod_perl without any timeout, but even with a timeout in mod_cgi at least one request succeeded afterwards as well.
I find this question interesting for other runtimes than Perl as well, because they share the concepts of plain mod_cgi vs. some persistent runtime embedded into the httpd processes or using some external daemons.
So, my question is NOT about how to make the error message go away. Instead I want to understand how httpd behaves AFTER the error occurred, because I don't seem to find much information on that topic. It's all just about increasing configuration values and trying to avoid the problem in the first place, which is fine, but not what I need to know currently.
Thanks!
Both mod_cgi and mod_cgid set a cleanup function on the request scope to kill the child process, but they do it in slightly different ways. This would happen shortly after the timeout is reported (allowing a little time for mod_cgi to return control, the error response to be written, the request to be logged, etc).
mod_cgi uses a core facility in httpd that does SIGTERM, sleeps for 3 seconds, then does a SIGKILL.
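To illustrate the SIGTERM, wait, SIGKILL sequence described above, here is a minimal Python sketch of the same pattern. It is not httpd's code; the function names terminate_then_kill and spawn_stubborn_child are made up for this example, and the stubborn child merely stands in for a CGI script that does not react to SIGTERM. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def terminate_then_kill(proc, grace_seconds=3.0):
    """Send SIGTERM, wait up to grace_seconds, then send SIGKILL if needed.

    Returns the child's exit status (negative signal number on POSIX
    when the child was ended by a signal).
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, cannot be caught or ignored
        return proc.wait()


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (hypothetical hung script)."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until SIGTERM is being ignored
    return proc
```

A script that handles SIGTERM gets the grace period to clean up; one that ignores it is removed by the SIGKILL anyway, so in neither case does the process keep running indefinitely after the timeout.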

How to tell supervisor to restart processes when app code changed?

I am new to Tornado and supervisor. I have deployed a Tornado app on a Debian server and it is now running fine under supervisor/nginx. After that, I made a small change to the app's template file, but it does not take effect, apparently because the Tornado processes need to be restarted. But I don't know how to do so. I tried different things like
service supervisor restart
and in the supervisorctl command line I also tried restart, reload, update etc.
But the old processes are still running and the code change is still not applied. So I'm wondering how to instruct supervisor to restart the app processes and, ideally, how to make supervisor sensitive to code changes by adding some commands to supervisor.conf
OK, I figured it out. Here is the answer:
supervisor> restart all
and check whether they really restarted:
supervisor> status
tornadoes:tornado-8000 RUNNING pid 17697, uptime 0:00:20
tornadoes:tornado-8001 RUNNING pid 17698, uptime 0:00:20
tornadoes:tornado-8002 RUNNING pid 17707, uptime 0:00:19
tornadoes:tornado-8003 RUNNING pid 17712, uptime 0:00:18

Parallel.For Termination vb.net

I have a service that scans network folders using a Parallel.For loop.
However, I have recently found that if I stop the service, Windows says the service is stopped but the process is still running in Task Manager. It sits at 0 CPU and its memory does not change. If I try to end the task (even with a forced kill from the command prompt) it just says access denied and I have to reboot the server.
What would be the best way to make sure everything terminates?
I thought of adding a global Boolean that the stop procedure sets to true; part of my parallel code would check for that and call s.stop.
Thank you
In brief, when your service is stopped, it needs to cancel all pending and running operations, then wait for those operations to actually finish. Check out the MSDN reference for Task Cancellation.
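The "cancel, then wait" idea can be sketched as follows. This is a Python analogue rather than VB.NET, so it only shows the shape of the pattern: a shared stop flag plays the role of a cancellation token, and stop() blocks until every worker has returned. The class name FolderScanner and the sleep that stands in for per-file work are assumptions made for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FolderScanner:
    """Toy service: scans folders in parallel and can be stopped cleanly."""

    def __init__(self, folders, workers=4):
        self._folders = list(folders)
        self._stop = threading.Event()   # shared cancellation flag
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._futures = []
        self.scanned = []

    def _scan(self, folder):
        for _ in range(10):              # placeholder for per-file work
            if self._stop.is_set():      # check the flag between work items
                return
            time.sleep(0.01)
        with self._lock:
            self.scanned.append(folder)

    def start(self):
        self._futures = [self._pool.submit(self._scan, f)
                         for f in self._folders]

    def stop(self):
        self._stop.set()                 # 1. request cancellation
        self._pool.shutdown(wait=True)   # 2. wait until all workers return
```

The important part is step 2: the stop procedure does not return until the workers have actually exited, so the host never reports "stopped" while loop bodies are still running. A worker blocked inside a long I/O call cannot see the flag until that call returns, so the checks need to sit between reasonably short units of work.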

Why doesn't a mono service restart after a kill -9?

I ran a mono-service with
mono-service2 -l:lockfile process.exe
It started the service and everything was fine, but I had to change something in the source. So I recompiled and deployed it. I killed the service by running
kill -9 <pid>
Then I tried to run the service again, but it doesn't start at all. What is the problem here?
When mono starts a service, it creates a lock in /tmp based on the program name or the given parameter. You should stop the service by sending SIGTERM rather than SIGKILL; had you done so, the lock would have been deleted. Now you have to delete the lock manually. Read details here.
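The difference between the two signals can be shown with a toy model in Python. It does not reproduce mono-service itself; the child script, the file name service.lock and the function lock_survives are all invented for this example. The child creates a lock file and removes it in its SIGTERM handler, while SIGKILL cannot be handled, so no cleanup runs. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a service that guards itself with a lock file.
CHILD = """
import os, signal, sys, time
lock = sys.argv[1]
def on_term(signum, frame):
    os.remove(lock)      # clean shutdown: release the lock
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
open(lock, "w").close()  # take the lock
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def lock_survives(sig):
    """Start the toy service, stop it with sig, and report whether
    the lock file is still on disk afterwards."""
    lock = os.path.join(tempfile.mkdtemp(), "service.lock")
    proc = subprocess.Popen([sys.executable, "-c", CHILD, lock],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()   # wait until the lock exists
    proc.send_signal(sig)
    proc.wait()
    proc.stdout.close()
    return os.path.exists(lock)
```

Calling lock_survives(signal.SIGTERM) and lock_survives(signal.SIGKILL) shows the contrast: only the SIGKILL case leaves a stale lock behind, which is the situation that blocks the next start.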