Monit - stop service and stay stopped?

I have a daemon which runs via the usual init.d/service scripts.
I have Monit running, which ensures these daemons are restarted if they crash.
I have a request that 'service foo stop' should stop the daemon and, because it was explicitly stopped rather than crashing, Monit should not restart it. How can I achieve this with Monit?
I could have the service script's stop() routine call 'monit unmonitor', but this seems circular and wrong.
Thanks,
Dave

I think you should use monit stop foo instead of service foo stop. That way Monit is aware that the service didn't crash -- and won't restart it.
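If people or scripts are used to typing a single command, the same idea can live in a small wrapper. This is a minimal Python sketch, assuming the `monit` binary is on PATH and `foo` has a `check` entry in monitrc with start and stop programs; `monit_service` is a hypothetical helper name, not part of Monit.

```python
import subprocess
from typing import Callable

# Per-service actions accepted by the Monit CLI.
MONIT_ACTIONS = {"start", "stop", "restart", "monitor", "unmonitor"}


def monit_service(action: str, name: str,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                  ) -> subprocess.CompletedProcess:
    """Route a start/stop request through Monit instead of the init script.

    Because Monit performs the stop itself, it knows the service was stopped
    on purpose and does not treat it as a crash.
    """
    if action not in MONIT_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # 'monit stop foo' runs the check's 'stop program' and disables monitoring.
    return run(["monit", action, name], check=True)
```

For example, `monit_service("stop", "foo")` runs `monit stop foo`; the `run` parameter exists only so the command can be inspected without a live Monit.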

There is a MODE param for that:
Monit supports three monitoring modes per service: active, passive and manual.
Syntax:
MODE <ACTIVE | PASSIVE | MANUAL>
In active mode (the default), Monit will proactively monitor a service and, in case of problems, raise alerts and/or restart the service.
In passive mode, Monit will passively monitor a service and will raise alerts, but will not try to fix a problem by executing start, stop or restart.
In manual mode, Monit will enter active mode only if a service was started via Monit.
From here: https://mmonit.com/monit/documentation/monit.html#SERVICE-MONITORING-MODE
This way, if you manage services via runit or upstart and only want Monit for alerts and dashboards, you can simply set mode passive for all such services.
For example:
check process heka with pidfile /etc/sv/myservice/supervise/pid
start program = "/usr/bin/sv start myservice"
stop program = "/usr/bin/sv stop myservice"
mode passive
If you need to enable or disable this at runtime rather than permanently, refer to the other answers; they work fine.

The model is:
Monit runs as a service under init.d and is therefore controlled (stop/start/restart) by init.d. (Others, please correct me if I am wrong.)
Applications that need to be monitored are handled by Monit.
Therefore, such applications should only be controlled (stop/start/restart) via Monit.

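Add SET ONREBOOT LASTSTATE to the Monit control file, so that after a system reboot Monit restores each service to its last saved state (a service you stopped via Monit stays stopped).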
As per: https://mmonit.com/monit/documentation/monit.html#SYSTEM-REBOOT-AND-SERVICE-STARTUP

Related

Failing a Windows Service with an exit code

I want my service to be able to be restarted remotely (by a TCP client, which is not part of this question). I configured the service to restart on failure on the Recovery tab for my service. In my code I set ServiceBase.ExitCode to a non-zero number, say 1. I did not use Environment.Exit to stop the service because it isn't necessary to terminate the process. When I test my service, it stops correctly and the Windows System Log reports that my service has stopped with an error. It also names the error. But my service does not restart! When I instead use Environment.Exit(1), the Windows System Log reports that my service has stopped unexpectedly, without naming the error. It then does restart the service as if it had failed (as it should).
My question is: why doesn't the service restart with just a non-zero exit code? The service stops with an error, but that doesn't count as failing? Is Environment.Exit the only way to properly trigger a service restart on failure? I preferred using ExitCode because the System Log is cleaner and more accurate that way.
Did you check the "Enable actions for stops with errors" checkbox on the Recovery tab?
From the technical documentation, the service's exit code is only consulted if that option is checked.
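The same checkbox can be set from the command line with `sc.exe failureflag`, which is handy for scripted deployments. This is a minimal Python sketch, assuming it runs from an elevated prompt on the target machine; `MySvc` and the helper names are hypothetical. With the flag set to 1, recovery actions are also queued when the service enters the stopped state with a non-zero exit code, which is the ServiceBase.ExitCode case in the question.

```python
import subprocess
from typing import Callable, List


def failureflag_command(service: str, enable: bool = True) -> List[str]:
    """Build the sc.exe command behind 'Enable actions for stops with errors'."""
    return ["sc.exe", "failureflag", service, "1" if enable else "0"]


def enable_recovery_on_error_exit(service: str,
                                  run: Callable[..., object] = subprocess.run) -> None:
    # Turn the flag on (requires administrator rights).
    run(failureflag_command(service, True), check=True)
    # Print the current setting so the change can be verified.
    run(["sc.exe", "qfailureflag", service], check=True)
```

Calling `enable_recovery_on_error_exit("MySvc")` is equivalent to ticking the checkbox on the Recovery tab for that service.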

mod_wsgi DaemonProcess mode gets problem with httpd graceful reload

I'm using httpd -k graceful to dynamically reload my server, and I use time.sleep in Python code to make a slow request. I expected active requests not to be interrupted by the Apache reload, but they were.
So I tried a simple Python server using CGI, and it worked well. Then I tried mod_wsgi in embedded mode, running inside the Apache processes (only specifying WSGIScriptAlias), and that worked well too.
So I found that the problem is the WSGIDaemonProcess, which I originally used.
Then in the mod_wsgi doc I found this:
eviction-timeout=sss
When a daemon process is sent the graceful restart signal, usually SIGUSR1, to restart a process, this timeout controls how many seconds the process will wait, while still accepting new requests, before it reaches an idle state with no active requests and shutdown.
If this timeout is not specified, then the value of the graceful-timeout will instead be used. If the graceful-timeout is not specified, then the restart when sent the graceful restart signal will instead happen immediately, with the process being forcibly killed, if necessary, when the shutdown timeout has expired.
Just when I thought I had found the reason, I discovered that these options (I tried graceful-timeout too) didn't work at all. The requests were still interrupted by the graceful reload. Why?
I'm using Apache 2.4.6 with the prefork MPM, and mod_wsgi 4.6.5, which I compiled myself and used to replace my old mod_wsgi.so.
Answer from GrahamDumpleton on GitHub (https://github.com/GrahamDumpleton/mod_wsgi/issues/383):
What you are seeing is exactly as expected. Apache does not pass graceful restart signals onto managed sub processes, it only passes them onto its own child worker processes. For managed processes it will send a SIGTERM and it will brutally kill them after 3 or 5 seconds (can't remember exactly how long) if they haven't shutdown. There is no way around it. It is a limitation of Apache.
The eviction timeout thus only applies, as the docs say, when a 'daemon process' is sent a graceful restart signal directly. That is, restarting Apache as a whole gracefully doesn't do anything, but sending the graceful restart signal to the PIDs of the daemon processes themselves will.
So the only solution, if this behaviour is important, is to use the display-name option to the WSGIDaemonProcess directive so that daemon processes are named uniquely compared to Apache processes, and then send signals directly to them only.
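Sending the signal "directly" means finding the daemon PIDs by their display name. This is a minimal Linux-only Python sketch, assuming the display name (for example a hypothetical `display-name=wsgi-myapp`) shows up as the first NUL-separated field of /proc/<pid>/cmdline; check with `ps` first. SIGUSR1 is the graceful restart signal named in the quoted mod_wsgi docs; the helper names are hypothetical.

```python
import os
import signal
from typing import List


def find_pids_by_display_name(display_name: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose argv[0] (read from <proc_root>/<pid>/cmdline) matches."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited, or cmdline is not readable
        # argv[0] may be padded after being overwritten, so strip whitespace.
        argv0 = raw.split(b"\0", 1)[0].decode(errors="replace").strip()
        if argv0 == display_name:
            pids.append(int(entry))
    return sorted(pids)


def graceful_restart_daemons(display_name: str) -> List[int]:
    """Send SIGUSR1 to each matching mod_wsgi daemon process."""
    pids = find_pids_by_display_name(display_name)
    for pid in pids:
        os.kill(pid, signal.SIGUSR1)
    return pids
```

Because only the daemon processes receive the signal, eviction-timeout or graceful-timeout can then let in-flight requests finish; Apache itself is not restarted.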
Usually this only becomes an issue because some Linux systems completely ignore the fact that Apache has a perfectly good log file rotation system and instead do external log file rotation by renaming log files once a day and then attempting a graceful restart. People will see issues with interrupted requests they don't expect. In this case you should use Apache's own log file rotation mechanism if it is important and not rely on external log file rotation systems.

How to use rotate_logs on an 80+ GB log file for RabbitMQ on Windows Server

I need to run rabbitmqctl rotate_logs on a RabbitMQ log file that is over 80 GB in size. When I tried this the first time, it froze RabbitMQ and no messages could be received. The freeze lasted 20 minutes before I had to kill the command and restart the RabbitMQ server.
This is a production server, so completing this quickly without losing messages or killing the broker would be ideal.
Would it be possible to shut down the service, move the current log file to another location, restart the service, and then run the rotate_logs command?
I'm fairly new to RabbitMQ and I am not sure what the best way to handle this would be.
This is installed as a service on Windows Server 2008 for a heavy-traffic production site (however, the message queue has a small load and only affects the administrative side of things).
Any help or insight would be appreciated.
I ran into a similar situation, but with a log file of only about 4 GB instead of 80.
The workaround I used was pretty much what you suggested: stop the service, move the log file and restart the service as quickly as possible.
Specifically, instead of moving the file while the service was stopped, I just renamed it. I also wrote a command-line script to do the work for me.
This allowed me to stop the service, rename the file and restart the service in a matter of seconds.
Once the service was back up and running, I was free to move, rename or otherwise handle the large log file as needed.
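A script along those lines might look like the following minimal Python sketch. It assumes the Windows service is named `RabbitMQ` (the default for the Windows installer; adjust if yours differs) and that it runs from an elevated prompt; `rotate_by_restart` is a hypothetical name. A rename on the same volume does not copy the data, which is why the outage stays short.

```python
import os
import subprocess
from datetime import datetime
from typing import Callable, Optional


def rotate_by_restart(log_path: str, service: str = "RabbitMQ",
                      run: Callable[..., object] = subprocess.run,
                      now: Optional[datetime] = None) -> str:
    """Stop the service, rename its log file, start it again; return the new name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    rotated = f"{log_path}.{stamp}"
    run(["net", "stop", service], check=True)
    try:
        # Same-directory rename: no data is copied, so this is near-instant.
        os.rename(log_path, rotated)
    finally:
        # Always bring the broker back, even if the rename failed.
        run(["net", "start", service], check=True)
    return rotated
```

The renamed file can then be compressed, moved or deleted at leisure while the broker keeps running.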

How to tell if a Windows server is in the act of shutting down

I'm working on Windows Server 2008. Is there a way to tell (via the command line) if a server is in the act of shutting down? I searched for this but have been unable to find a way.
Your best bet would be to query the system event log. If someone runs the shutdown command, an event gets logged, and I am sure that if something else initiates the shutdown, there will also be event log entries indicating that the machine is going down.
Get-EventLog system -Newest 20 -Source User32
Running this on my machine lists several shutdown events from different processes.
The process C:\WINDOWS\system32\winlogon.exe (computername) has initiated the
restart of computer COMPUTERNAME on behalf of user KEVMAR for the following
reason: No title for this reason could be found
This would be a good starting point.
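Outside PowerShell, the same query can be made with `wevtutil`, which ships with Windows Server 2008. This is a minimal Python sketch, assuming the relevant entries are event ID 1074 from the User32 provider (the "has initiated the restart" message shown above); the helper names are hypothetical. An entry only shows that a shutdown was initiated, so compare its timestamp with the current time.

```python
import subprocess
from typing import List

# User32 event 1074: a process or user initiated a shutdown or restart.
SHUTDOWN_QUERY = "*[System[Provider[@Name='User32'] and (EventID=1074)]]"


def shutdown_events_command(count: int = 20) -> List[str]:
    """Build a wevtutil query for the newest User32 shutdown events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{SHUTDOWN_QUERY}",
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # newest first
        "/f:text",      # human-readable output
    ]


def recent_shutdown_events(count: int = 20) -> str:
    result = subprocess.run(shutdown_events_command(count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```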
A trick I use is to run shutdown /a; it tells me whether it aborted a shutdown. Note that this actually cancels the pending shutdown.

asadmin start-domain fails when remote JMS queue is unreachable

I have two servers, A and B, running GlassFish 3.1.2.2. Both use a JMS queue for communication, which works fine so far. If the network connection breaks for any reason, I can see in the logs of server B (the one configured to connect to the remote queue of A) that it tries to reconnect and always succeeds as soon as A is up again.
The problem is that if I try to restart the GlassFish instance on B while server A is unreachable, the startup process fails after some retries and remains stuck in an undefined, unusable state: the Java process is started and some ports are open, but the applications are not started, not even the administration console.
IMHO the GlassFish startup process should not wait for the queues to connect; this should be done in some kind of background process.
Has anyone experienced something similar? Is there anything I can configure or tune to fix this behaviour?
Never mind, it seems to have fixed itself :(
After restarting the computer, removing the deployed EAR and deploying it again, it just worked. I haven't experienced this behaviour since then.