How to properly manage rabbitmq with supervisord - rabbitmq
The current section in my supervisord.conf looks like:
[program:rabbitmq]
command=/usr/sbin/rabbitmq-server
When I try to stop RabbitMQ with supervisord (supervisorctl stop rabbitmq), the RabbitMQ processes simply do not shut down. The RabbitMQ documentation also says never to use kill, but to use rabbitmqctl stop instead. I'm guessing supervisord simply kills the processes, hence the poor results with RabbitMQ. I couldn't find any option in supervisord to specify a custom stop command.
Do you have any recommendations?
My solution is to write a wrapper script named rabbitmq.sh as follows:
#!/bin/bash
# call "rabbitmqctl stop" when exiting
trap "{ echo Stopping rabbitmq; rabbitmqctl stop; exit 0; }" EXIT
echo Starting rabbitmq
rabbitmq-server
After that, modify supervisord.conf:
[program:rabbitmq]
command=path/to/rabbitmq.sh
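A variation on the wrapper above, offered as a minimal sketch rather than a tested production script. It assumes supervisord stops the program with its default TERM signal, so it traps TERM explicitly and runs the server in the background: bash interrupts the wait builtin when a trapped signal arrives, runs the trap, and the wrapper then waits again until the server has really exited. The function name run_with_stop_hook and its two arguments are hypothetical; for RabbitMQ they would be rabbitmq-server and "rabbitmqctl stop".

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of RabbitMQ or supervisord.
# $1: command that runs the server in the foreground (e.g. "rabbitmq-server")
# $2: command that asks the server to shut down (e.g. "rabbitmqctl stop")
run_with_stop_hook() {
    local server_cmd=$1 stop_cmd=$2 pid status=0

    # Install the trap first so a signal cannot slip in before it is set.
    trap "$stop_cmd" TERM INT

    $server_cmd &                 # run the server as a child of this shell
    pid=$!

    # Returns early with a status above 128 if a trapped signal arrives;
    # the trap (the stop command) runs right after that.
    wait "$pid" || status=$?

    # If the server is still around (running or not yet reaped), the first
    # wait was interrupted: wait again so we exit only after the server did.
    if kill -0 "$pid" 2>/dev/null; then
        status=0
        wait "$pid" || status=$?
    fi

    trap - TERM INT
    return "$status"
}

# Intended use inside rabbitmq.sh (commented out so the sketch has no side effects):
# run_with_stop_hook "rabbitmq-server" "rabbitmqctl stop"
```

If the stop command takes a while, supervisord's stopwaitsecs setting for the program may need to be raised so the wrapper is not killed before the shutdown finishes.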
You have answered your own question. In normal operation, never use kill on any process unless that is the documented normal way of managing it. In the case of RabbitMQ, the documented process is to use rabbitmqctl stop or to use rabbitmqserver stop.
There is no good reason to manage RabbitMQ with anything more sophisticated than a shell script that makes one attempt to restart via rabbitmqserver start. If that doesn't work right away, then RabbitMQ is down hard due to something like a lack of RAM, a full disk, or a rogue system management tool that deleted some of the RabbitMQ binary components.
In normal operation RabbitMQ has an internal supervisor that will attempt to shut down and restart RabbitMQ, so if you delete binaries, it will fail to restart. When using tools like Chef, Puppet, or CFEngine, don't repeatedly push out binary package files. Just check that everything is there as it should be.
This script starts RabbitMQ as a background process (using '&') which causes a pid file to be updated/created (see 'wait' under http://www.rabbitmq.com/man/rabbitmqctl.1.man.html).
After rabbit has been started, a loop is used to verify that the pid is still running. If rabbit crashes or is manually shut down (outside of supervisord) then the script will exit with 1 and supervisord takes over.
The echo ... >> ./rmq.txt lines are there for debugging purposes and can be commented out in production (I used them to monitor the startup/shutdown/died status).
supervisord is happy because it can see a running process and an EXIT will trigger the stop_rmq function which calls 'rabbitmqctl stop' for a clean shutdown.
#!/bin/bash
# Script to manage RMQ with supervisord

# Shut down rmq
function stop_rmq {
    echo "Stopping RabbitMQ..."
    echo "Stopping RabbitMQ..." >> ./rmq.txt
    rabbitmqctl stop
    echo "RabbitMQ stopped"
    echo "RabbitMQ stopped" >> ./rmq.txt
    #exit 0
}

# Set up the trap
#trap stop_rabbit TERM KILL HUP INT SIGTERM SIGKILL SIGHUP SIGINT
trap stop_rmq EXIT

# Start rmq
echo "Starting RabbitMQ..."
echo "Starting RabbitMQ..." >> ./rmq.txt

# Start Rabbitmq in the background (causes the pid file to be updated)
# Note that the pid file location can be overridden with the rmq 'RABBITMQ_PID_FILE' variable
/usr/sbin/rabbitmq-server &
rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid
echo "RabbitMQ Started"
echo "RabbitMQ Started" >> ./rmq.txt

# Poll every 10 seconds; exit with 1 once the pid is no longer running
while true; do
    #ps $(cat /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid)
    ps -o pid,cmd,etime $(cat /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid)
    if (($? > 0)); then
        echo "RabbitMQ Died"
        echo "RabbitMQ Died" >> ./rmq.txt
        exit 1
    fi
    #echo "Sleeping..."
    sleep 10
done
Here's the output generated by the script to supervisord:
foo@bar:/# supervisorctl tail rmq
Starting RabbitMQ...
Waiting for rabbit@a2d2c8f9cad2 ...
pid is 45220 ...
RabbitMQ 3.3.5. Copyright (C) 2007-2014 GoPivotal, Inc.
## ## Licensed under the MPL. See http://www.rabbitmq.com/
## ##
########## Logs: /var/log/rabbitmq/rabbit@a2d2c8f9cad2.log
###### ## /var/log/rabbitmq/rabbit@a2d2c8f9cad2-sasl.log
##########
Starting broker... completed with 0 plugins.
...done.
RabbitMQ Started
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:05
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:15
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:25
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:35
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:45
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:55
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:05
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:15
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:25
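The liveness check in the loop above calls ps every ten seconds, which is what produces the repeated PID/CMD/ELAPSED lines in the log. A quieter alternative, sketched here as an assumption rather than as the original author's method, is to probe the PID with kill -0, which sends no signal and only reports whether the process can be signalled. The helper name pid_is_alive is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the PID stored in the given file is alive.
# Note: kill -0 also fails when the process exists but belongs to another
# user and we lack permission to signal it.
pid_is_alive() {
    local pidfile=$1 pid
    pid=$(cat "$pidfile" 2>/dev/null) || return 1
    # Reject empty or non-numeric content before handing it to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
}

# Possible use in the monitoring loop (path taken from the script above):
# pid_is_alive "/var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid" || exit 1
```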
I would advise you to use Monit (http://mmonit.com/), it is better suited for daemons such as RabbitMQ and it is also feature rich.
First of all, you must install the Monit package. If you are under Ubuntu/Debian:
sudo apt-get update
sudo apt-get install monit
Afterwards, you must create a configuration file.
Here is a sample to get you running (place it in /etc/monit/conf.d/):
set daemon 1800
set logfile /var/log/monit.log
check process rabbit with pidfile /var/run/rabbitmq/pid
start program = "/etc/init.d/rabbitmq-server start"
stop program = "/etc/init.d/rabbitmq-server stop"
noalert foo@bar
Then just restart Monit and you are finished:
sudo /etc/init.d/monit restart
Related
monitor bash script execution using monit
We have just started using Monit for process monitoring and are pretty new to it. I have a bash script at /home/ubuntu/launch_example.sh that runs continuously. Is it possible to monitor it with Monit? Monit should start the script if it terminates. What should the syntax be? I tried the syntax below, but not all of the commands are executed as the ubuntu user; for example, the shell script calls some Python scripts.
check process launch_example matching "launch_example"
    start program = "/bin/bash -c '/home/ubuntu/launch_example.sh'" as uid ubuntu and gid ubuntu
    stop program = "/bin/bash -c '/home/ubuntu/launch_example.sh'" as uid ubuntu and gid ubuntu
The simple answer is "no". Monit is just for monitoring and is not some kind of supervisor/process manager. So if you want to monitor your long-running executable, you have to wrap it.
check process launch_example with pidfile /run/launch.pid
    start program = "/bin/bash -c 'nohup /home/ubuntu/launch_example.sh &'" as uid ubuntu and gid ubuntu
    stop program = "/bin/bash -c 'kill $(cat /run/launch.pid)'" as uid ubuntu and gid ubuntu
This quick'n'dirty way also needs an additional line in your launch_example.sh to write the pidfile (pidfile matching should always be preferred over string matching). It could be just the first line after the shebang. It simply writes the current process ID to the pidfile. Nothing fancy here ;)
echo $$ > /run/launch.pid
In fact, it's not even hard to convert your script into a systemd unit. Here is an example on how to. User binding, restarts, pidfile, and "start-on-boot" can then be managed through systemd (e.g. start program = "/usr/bin/systemctl start my_unit").
Is it possible to run local servers on AWS-CodeBuild?
Good morning. I'm using CodeBuild to test my application, and I was wondering if it's possible to run a local server inside a build. I created an NPM script to start a local server, but every time I run the tests, CodeBuild passes through the command without waiting. I searched the AWS documentation and it says to use the "nohup" command, but that doesn't work for me. Just to be clear, my expectation is that CodeBuild runs the command, waits for the server to be up, and proceeds to the next command without closing the open server. Does anyone have an idea?
Command:
- nohup yarn start-server
Start a background process and wait for it to complete later:
nohup sleep 30 & echo $! > pidfile
…
wait $(cat pidfile)
Start a background process and do not wait for it to ever complete:
nohup sleep 30 & disown $!
Start a background process and kill it later:
nohup sleep 30 & echo $! > pidfile
…
kill $(cat pidfile)
https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-background-tasks.html
monit insists on timing out a program that's running fine
I'm having a problem monitoring a program using monit. I'm running this on a Raspberry Pi, having built monit 5.11 from source; I tried using the version from the repositories, but it was 5.4 and didn't support some of the syntax below that I want. I'm trying to follow the "Q: I have a program that does not create its own pid file. Since monit requires all programs to have a pid file, what do I do?" entry in the FAQ.
Here's my start_sensors.sh script (which just runs my Python program, instead of the Java program in the wiki example):
#!/bin/bash
case $1 in
    start)
        echo $$ > /var/run/start_sensors.pid;
        exec 2>&1 /usr/bin/python /home/pi/temperature/post_temps.py 1>/tmp/post_temps.out
        ;;
    stop)
        kill `cat /var/run/start_sensors.pid` ;;
    *)
        echo "usage: start_sensors {start|stop}" ;;
esac
exit 0
Here's my /etc/monit/monitrc entry:
# Run temperature sensor monitor
check process start_sensors.sh with pidfile /var/run/start_sensors.pid
    start = "/home/pi/temperature/start_sensors.sh start"
    stop = "/home/pi/temperature/start_sensors.sh stop"
The output in the monit log looks like:
[EST Jan 24 14:21:16] info : 'raspberrypi' Monit reloaded
[EST Jan 24 14:21:16] error : 'start_sensors.sh' process is not running
[EST Jan 24 14:21:16] info : 'start_sensors.sh' trying to restart
[EST Jan 24 14:21:16] info : 'start_sensors.sh' start: /home/pi/temperature/start_sensors.sh
[EST Jan 24 14:21:46] error : 'start_sensors.sh' failed to start (exit status -1) -- Program /home/pi/temperature/start_sensors.sh timed out
So as you can see, monit starts up the program, it runs fine, and then monit kills it thirty seconds later due to the "timeout". My program is running fine, and producing the proper output that I'm sending to the /tmp/post_temps.out file. I don't understand why monit is timing the program out... it's supposed to be a long-running process!
I've tried changing the start_sensors.sh script so that it puts the program in the background (and has it write its own /var/run/start_sensors.pid file), but then monit starts a new instance up every thirty seconds or so, not stopping the old ones, and writing over the pid file. It's like it's not even looking at the pid file. THANKS!
The following works:
#!/bin/bash
case $1 in
    start)
        /usr/bin/python /home/pi/temperature/post_temps.py 1>/tmp/post_temps.out &
        echo $! > /var/run/start_sensors.pid ;
        ;;
    stop)
        kill `cat /var/run/start_sensors.pid` ;;
    *)
        echo "usage: start_sensors {start|stop}" ;;
esac
exit 0
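A likely reason the first version timed out: exec replaces the wrapper shell with the long-running Python process, so the start command never returns and monit gives up after its timeout. The working version backgrounds the program and records $!, so the start command returns immediately. The same pattern as a small sketch, with hypothetical helper names (start_with_pidfile, stop_from_pidfile) that are not part of monit:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and arguments are illustrative only.

# start_with_pidfile PIDFILE COMMAND [ARGS...]
start_with_pidfile() {
    local pidfile=$1
    shift
    "$@" &                       # background the program so this function returns
    echo "$!" > "$pidfile"       # $! is the PID of the job just started
}

# stop_from_pidfile PIDFILE
stop_from_pidfile() {
    local pidfile=$1 pid
    pid=$(cat "$pidfile") || return 1
    kill "$pid" && rm -f "$pidfile"
}

# Possible use in the start) and stop) branches (paths from the question):
# start_with_pidfile /var/run/start_sensors.pid /usr/bin/python /home/pi/temperature/post_temps.py
# stop_from_pidfile /var/run/start_sensors.pid
```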
Redis Daemon not creating a PID file
The Redis startup script is supposed to create a pid file at startup, but I've confirmed all the settings I can find, and no pid file is ever created. I installed redis by:
$ yum install redis
$ chkconfig redis on
$ service redis start
In my config file (/etc/redis.conf) I checked to make sure these were enabled:
daemonize yes
pidfile /var/run/redis/redis.pid
And in the startup script (/etc/init.d/redis) there is:
exec="/usr/sbin/$name"
pidfile="/var/run/redis/redis.pid"
REDIS_CONFIG="/etc/redis.conf"
[ -e /etc/sysconfig/redis ] && . /etc/sysconfig/redis
lockfile=/var/lock/subsys/redis
start() {
    [ -f $REDIS_CONFIG ] || exit 6
    [ -x $exec ] || exit 5
    echo -n $"Starting $name: "
    daemon --user ${REDIS_USER-redis} "$exec $REDIS_CONFIG"
    retval=$?
    echo
    [ $retval -eq 0 ] && touch $lockfile
    return $retval
}
stop() {
    echo -n $"Stopping $name: "
    killproc -p $pidfile $name
    retval=$?
    echo
    [ $retval -eq 0 ] && rm -f $lockfile
    return $retval
}
These are the settings that came by default with the install. Any idea why no pid file is created? I need to use it for Monit. (The system is RHEL 6.4, by the way.)
For those experiencing this on Debian Buster: edit /etc/systemd/system/redis.service (for example with nano) and add this line below [Service]:
ExecStartPost=/bin/sh -c "echo $MAINPID > /var/run/redis/redis.pid"
It is supposed to look like this:
[Service]
Type=forking
ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
ExecStop=/bin/kill -s TERM $MAINPID
ExecStartPost=/bin/sh -c "echo $MAINPID > /var/run/redis/redis.pid"
PIDFile=/run/redis/redis-server.pid
Then:
sudo systemctl daemon-reload
sudo systemctl restart redis.service
Check the redis.service status:
sudo systemctl status redis.service
The pid file should now appear.
On my Ubuntu 18.04, I was getting the same error. Error reported by redis (in /var/log/redis/redis-server.log):
# Creating Server TCP listening socket ::1:6379: bind: Cannot assign requested address
This is because I've disabled IPv6 on this host, and the redis-server package (version 5:4.0.9-1) for Ubuntu comes with:
bind 127.0.0.1 ::1
Editing /etc/redis/redis.conf and removing the ::1 address solves the problem. Example:
bind 127.0.0.1
Edit: As pointed out in the comments (thanks to @nicholas-vasilaki and @tommyalvarez), by default redis only allows connections from localhost. Commenting out the whole line, using:
# bind 127.0.0.1 ::1
works, but makes redis listen on the network (not only on localhost). More details can be found in the redis configuration file.
The problem was that the user redis did not have permission to create the pid file (or the directory it was in). Fix:
sudo mkdir /var/run/redis
sudo chown redis /var/run/redis
Then I killed and restarted redis and, sure enough, there was redis.pid.
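The two commands above can be wrapped so that they are safe to repeat, for example from a provisioning script. This is only a sketch: the function name ensure_pid_dir is hypothetical, and on a real system it would be run as root with redis as the owner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the pid directory if needed and hand it to the
# service user. Safe to run more than once.
# $1: directory (e.g. /var/run/redis)   $2: owner (user name or numeric uid)
ensure_pid_dir() {
    local dir=$1 owner=$2
    mkdir -p "$dir" && chown "$owner" "$dir"
}

# Possible use, as root:
# ensure_pid_dir /var/run/redis redis
```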
On CentOS 7 I needed to edit the file:
$ vi /usr/lib/systemd/system/redis.service
and add the following line:
ExecStartPost=/bin/sh -c "echo $MAINPID > /var/run/redis/redis.pid"
And then restart the service:
$ sudo systemctl daemon-reload
$ sudo systemctl restart redis.service
Reference: CentOS 7: Systemd & PID File
I had a similar problem on Debian Buster: systemd complains about the missing PID file, even though the file exists and redis is running.
On my system the solution using "echo $MAINPID > /run/redis/redis.pid" works by accident, although/because the real PID file is set to /run/redis/redis-server.pid (spot the different filenames!), and on my system the content of /run/redis/redis.pid (the one written by the echo) was empty.
In a discussion on systemd-devel@lists.freedesktop.org someone writes:
... systemd will add the MAINPID environment variable any time it knows what the main PID is. It learns this by reading the PID file ... So by the time ExecStartPost runs, the main PID may or may not be known.
Having an empty MAINPID environment variable can even be harmful: if you notice the different PID filenames in the suggested solution and correct them, you may end up in a situation where the PID file written by redis gets overwritten by an empty file. This happened to me; the result was that systemctl start redis.service never finished.
I also noticed that another server with 100% the same OS and configuration, but different hardware, did not have this problem. My conclusion is that it just hits some sort of race condition: systemd seems to look for a PID file just a little too early. On my system, whatever command I used as ExecStartPost added enough delay to make the error disappear. Therefore a solution is to use "sleep 1" (sleep 0.1 works too, but 1 second may be on the safe side):
ExecStartPost=/bin/sleep 1
/etc/systemd/system/redis.service now looks like:
[Service]
Type=forking
ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
ExecStartPost=/bin/sleep 1
ExecStop=/bin/kill -s TERM $MAINPID
PIDFile=/run/redis/redis-server.pid
...
An alternative solution is to use "supervised systemd". In /etc/redis/redis.conf:
# If you run Redis from upstart or systemd, Redis can interact with your
# supervision tree. Options:
# supervised no - no supervision interaction
# supervised upstart - signal upstart by putting Redis into SIGSTOP mode
# supervised systemd - signal systemd by writing READY=1 to $NOTIFY_SOCKET
# supervised auto - detect upstart or systemd method based on
# UPSTART_JOB or NOTIFY_SOCKET environment variables
# Note: these supervision methods only signal "process is ready."
# They do not enable continuous liveness pings back to your supervisor.
supervised systemd
Override the redis-server.service file using:
systemctl edit redis-server.service
and enter the following:
[Service]
Type=notify
Restart the service and the error should be gone:
sudo systemctl restart redis.service
sudo systemctl status redis.service
Here from 2018. Before I start: I am on Ubuntu 18.04. I wrote this in case anyone comes here by searching for the same error. In my case the error is the same but the problem is quite different, and none of the solutions proposed here worked. So I checked whether there were logs and looked for anything useful. I found them at:
cat /var/log/redis/redis-server.log
I searched the logs and found that the problem was another service listening on the same port.
2963:C 21 Sep 11:07:33.007 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2963:C 21 Sep 11:07:33.008 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=2963, just started
2963:C 21 Sep 11:07:33.008 # Configuration loaded
2974:M 21 Sep 11:07:33.009 # Creating Server TCP listening socket 127.0.0.1:6379: bind: Address already in use
I checked who was listening:
netstat -anp | grep 6379
Found it:
tcp6 0 0 :::6379 :::* LISTEN 3036/docker-proxy
It was a Docker image of redis that had been installed by another tool:
root@yavuz:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a6a94d401700 redis:3.2 "docker-entrypoint.s…" 20 hours ago Up 3 hours 0.0.0.0:6379->6379/tcp incubatorsuperset_redis_1
So I stopped the Docker container:
root@yavuz:~# docker stop incubatorsuperset_redis_1
And redis-server started without a problem:
root@yavuz:~# systemctl start redis-server
root@yavuz:~# systemctl status redis-server
● redis-server.service - Advanced key-value store
Active: active (running) since Fri 2018-09-21 11:10:34 +03; 1min 49s ago
Process: 3671 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCCESS)
For CentOS: in my case the name of the Redis service is redis.service. Start editing it with:
systemctl edit redis.service
Add this:
[Service]
ExecStartPost=/bin/sh -c "echo $MAINPID > /var/run/redis/redis.pid"
PIDFile=/var/run/redis/redis.pid
In my case this creates the file /etc/systemd/system/redis.service.d/override.conf.
Afterwards, restart the service:
systemctl daemon-reload
systemctl restart redis
And the pid file is there:
cat /var/run/redis/redis.pid
=> 19755
sudo nano /etc/redis/redis.conf
Inside the file, find the supervised directive. This directive allows you to declare an init system to manage Redis as a service, providing you with more control over its operation. The supervised directive is set to no by default. Since you are running Ubuntu, which uses the systemd init system, change this to systemd.
By default, Redis does not run as a daemon, and that is why it does not create a pid file. If you look at /etc/redis/redis.conf, it says so explicitly under General:
#By default Redis does not run as a daemon. Use 'yes' if you need it...
daemonize no
So all you need to do is change it to:
daemonize yes
For people struggling with getting it to work on Ubuntu 18.04 you need to edit /etc/redis/redis.conf and update the pidfile declaration to following: pidfile "/var/run/redis/redis-server.pid"
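Several answers here come down to checking what redis.conf actually says for daemonize, pidfile, or supervised. A minimal sketch for reading one directive from the file follows; the function name conf_get is hypothetical, and it assumes the usual "directive value" line format and that a later occurrence of a directive overrides an earlier one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the last uncommented "KEY value"
# line in a redis.conf-style file; fails if the directive is not set.
# Quoted values are printed with their quotes.
conf_get() {
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^[ \t]+/, ""); value = $0; found = 1 }
        END { if (found) print value; else exit 1 }
    ' "$1"
}

# Possible use:
# conf_get /etc/redis/redis.conf daemonize
# conf_get /etc/redis/redis.conf pidfile
```

Commented lines such as "# daemonize yes" are skipped because their first field is "#" rather than the directive name.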
Ubuntu 18. /var/run/redis had the wrong permissions:
drwxr-sr-x 2 redis redis 60 Apr 27 12:22 redis
Changed to 755 (drwxrwxr-x) and the pid file now appears.
Unable to start RabbitMQ
I have Googled a lot and have not found a proper answer, so I am posting this question. I have already killed the RabbitMQ server process. Now when I try to start it again, it shows:
Command
rabbitmqctl start_app
Error
{error_logger,{{2013,11,4},{11,26,8}},"Cookie file /ngs/app/ttet/.erlang.cookie must be accessible by owner only",[]}
{error_logger,{{2013,11,4},{11,26,8}},crash_report,[[{initial_call,{auth,init,['Argument__1']}}, {pid,<0.18.0>},{registered_name,[]},{error_info,{exit,{"Cookie file /ngs/app/curot/.erlang.cookie must be accessible by owner only",[{auth,init_cookie,0,[{file,"auth.erl"}, {line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,297}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"}, {line,227}]}]},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,321}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},{messages,[]},{links,[<0.16.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,24},{reductions,401}],[]]}
{error_logger,{{2013,11,4},{11,26,8}},supervisor_report,[{supervisor,{local,net_sup}}, {errorContext,start_error},{reason,{"Cookie file /ngs/app/ttet/.erlang.cookie must be accessible by owner only",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,297}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}},{offender,[{pid,undefined},{name,auth},{mfargs,{auth,start_link,[]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2013,11,4},{11,26,8}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2013,11,4},{11,26,8}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
Erlang is running. Is it required to kill the Erlang process?
[ttet@addr:17.566.98.656 Erlang]$:/ngs/app/ttet> ps -ef | grep erlang
ttet 13813 10547 0 11:57 pts/0 00:00:00 grep erlang
ttet 32155 1 0 Oct08 ? 00:00:14 /ngs/app/ttet/softwares/Erlang/lib/erlang/erts-5.9/bin/epmd -daemon
This helped me:
chmod 600 ~/.erlang.cookie
rabbitmqctl start_app
You can use rabbitmqctl start_app only after you call rabbitmqctl stop_app. These commands start and stop the RabbitMQ application, not the Erlang node. If you really killed the RabbitMQ node, you need to call rabbitmq-server to start RabbitMQ. You can check whether a RabbitMQ node is running by calling ps -ef | grep rabbit. Also, from your logs I figured out that the reason for the errors is an inappropriate access mode on the .erlang.cookie file: {error_info,{exit,{"Cookie file /ngs/app/curot/.erlang.cookie must be accessible by owner only".... Try changing it with chmod 600 /ngs/app/curot/.erlang.cookie and start the RabbitMQ server again. It is not required to kill the Erlang epmd, as it is a daemon that acts as a name server on all hosts involved in distributed Erlang computations and does not interfere with your RabbitMQ instance.
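The "must be accessible by owner only" message points at the group and other permission bits of the cookie file. A small sketch for checking and fixing that before starting the node; the function names cookie_is_private and fix_cookie_mode are hypothetical, and the check simply reads the mode string printed by ls.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of RabbitMQ or Erlang.

# Succeeds if group and others have no permissions on the file.
cookie_is_private() {
    local perms
    # Characters 5-10 of the ls mode string are the group and other bits.
    perms=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
    [ "$perms" = "------" ]
}

# Restrict the file to its owner only if it is not already private.
fix_cookie_mode() {
    cookie_is_private "$1" || chmod 600 "$1"
}

# Possible use before starting the node:
# fix_cookie_mode "$HOME/.erlang.cookie"
```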
I have solved this. The first step was changing the permissions on /ngs/app/curot/.erlang.cookie. The second step was using the rabbitmq-server -detached command to start RabbitMQ. Now it's working for me.