Monitor bash script execution using monit

We have just started using monit for process monitoring and are pretty new to it. I have a bash script at /home/ubuntu/launch_example.sh that runs continuously. Is it possible to monitor it using monit? monit should start the script if the bash script terminates. What should the syntax be? I tried the syntax below, but not all of the commands are executed as the ubuntu user; for example, the shell script calls some Python scripts.
check process launch_example
matching "launch_example"
start program = "/bin/bash -c '/home/ubuntu/launch_example.sh'"
as uid ubuntu and gid ubuntu
stop program = "/bin/bash -c '/home/ubuntu/launch_example.sh'"
as uid ubuntu and gid ubuntu

The simple answer is "no". Monit is just for monitoring and is not some kind of supervisor/process manager. So if you want to monitor your long running executable, you have to wrap it.
check process launch_example with pidfile /run/launch.pid
start program = "/bin/bash -c 'nohup /home/ubuntu/launch_example.sh &'"
as uid ubuntu and gid ubuntu
stop program = "/bin/bash -c 'kill $(cat /run/launch.pid)'"
as uid ubuntu and gid ubuntu
This quick'n'dirty way also needs an additional line in your launch_example.sh to write the pidfile (pidfile matching should always be preferred over string matching). It could be just the first line after the shebang. It simply writes the current process ID to the pidfile. Nothing fancy here ;)
echo $$ > /run/launch.pid
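A slightly more defensive variant of that one-liner could look like the sketch below. The function names write_pidfile and remove_pidfile are hypothetical. Note also that /run is normally writable only by root, so a script running as the ubuntu user may need a pidfile in a directory that user can write to (and the monit check would then have to point at the same path).

```shell
#!/bin/sh
# Hypothetical helpers for the top of launch_example.sh (names are illustrative).

# write_pidfile PATH: store the PID of the running script in PATH.
write_pidfile() {
    PIDFILE_PATH="$1"
    echo "$$" > "$PIDFILE_PATH" || return 1
}

# remove_pidfile: delete the pidfile written above, if any.
remove_pidfile() {
    [ -n "$PIDFILE_PATH" ] && rm -f "$PIDFILE_PATH"
}
```

In launch_example.sh this could be used as write_pidfile /run/launch.pid followed by trap remove_pidfile EXIT, so that a stale pidfile is less likely to be left behind when the script ends.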
In fact, it's not even hard to convert your script into a systemd unit. User binding, restarts, the pidfile, and start-on-boot can then be managed through systemd (e.g. start program = "/usr/bin/systemctl start my_unit").
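As a rough illustration of the systemd route, the sketch below prints a minimal unit file for the script from the question. The function name print_launch_unit, the Description text, and the RestartSec value are assumptions made for illustration; the user and script path come from the question.

```shell
#!/bin/sh
# Prints a minimal unit file; adjust before use. Names marked hypothetical are not from the thread.
print_launch_unit() {
    cat <<'EOF'
[Unit]
Description=launch_example.sh (hypothetical unit for illustration)
After=network.target

[Service]
User=ubuntu
Group=ubuntu
ExecStart=/home/ubuntu/launch_example.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

One possible way to install it: print_launch_unit | sudo tee /etc/systemd/system/my_unit.service, then sudo systemctl daemon-reload and sudo systemctl enable --now my_unit. With Restart=always, systemd itself restarts the script when it exits.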

Related

WSL2 Clock is out of sync with Windows

WSL2 clock goes out of sync after resuming from sleep/hibernate.
A workaround was shared on GitHub: run sudo hwclock -s to resync the clock in WSL. However, you have to do this every time you resume from sleep/hibernate.
UPDATE: THIS BUG IS FIXED, just check for updates! See the Clock Sync fix
In case anyone finds this via search and doesn't notice that there is actually a solution listed in the question, you can fix WSL clock drift via:
sudo hwclock -s
If you just need to do it occasionally, this is a fine solution. If you need to do it more frequently, consider @piouson's solution.
Update
The fix is now in WSL2 Linux kernel 5.10.16.3 and newer! Note you may need to install WSL2 from the Windows Store to get the latest kernel version per this thread with Craig from Microsoft.
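To check whether a given kernel already includes the fix, a version comparison along these lines may help. It is a sketch that assumes GNU sort (for the -V option) and a release string shaped like 5.10.16.3-microsoft-standard-WSL2; the function name kernel_has_clock_fix is hypothetical.

```shell
#!/bin/sh
# kernel_has_clock_fix RELEASE: succeeds if RELEASE is at least 5.10.16.3.
kernel_has_clock_fix() {
    required="5.10.16.3"
    # Drop everything after the first dash, e.g. "-microsoft-standard-WSL2".
    current="${1%%-*}"
    # With version sort, the smaller of the two versions comes first.
    lowest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)"
    [ "$lowest" = "$required" ]
}
```

Inside WSL it could be called as kernel_has_clock_fix "$(uname -r)" && echo "kernel includes the clock sync fix".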
Older Answer
sudo hwclock -s gets you kind of there, but for some reason doesn't get the exact time - I often find it's a minute or so in the future!
sudo ntpdate pool.ntp.org should get you the correct time.
But this is all because of a bug in the Linux kernel; the fix should be included in a Windows update at some point...
There are a number of hacks referenced in the GitHub issue which can work around this, mostly, but not always, in my experience...
Just restart WSL; it works fine for me:
wsl --shutdown
then
wsl
in PowerShell
UPDATE: as mentioned by drkvogel, the Clock Sync fix was released in WSL2 kernel version 5.10.16.3
OBSOLETE
At time of writing, this GitHub Issue was open for the bug.
The workaround I chose for my situation (single distro in WSL2) is to use Windows Task Scheduler to run hwclock in WSL whenever Windows resyncs hardware clock.
Windows: Open PowerShell as Administrator
schtasks /create /tn WSLClockSync /tr "wsl.exe sudo hwclock -s" /sc onevent /ec system /mo "*[System[Provider[@Name='Microsoft-Windows-Kernel-General'] and (EventID=1)]]"
Set-ScheduledTask WSLClockSync -Settings (New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries)
WSL2: Run sudo visudo and add hwclock to sudoers to skip password prompt
# bottom of my file looks like this
...
...
#includedir /etc/sudoers.d
<username> ALL=(ALL) NOPASSWD:/usr/sbin/hwclock, /usr/bin/apt update, /usr/bin/apt upgrade
Results
See image for how to get Event XPath from Windows Event filtering. Use as provided to let task scheduler auto-display scheduled triggers.
Use cron to schedule sudo hwclock -s
As others said before sudo hwclock -s syncs the clock,
but you will need to do this after every sleep/hibernate.
Solution is to add an hourly cron task to sync the clock.
Open crontab with sudo (must open with sudo since the command uses sudo):
sudo crontab -e
and add these lines, with a newline after the task (cron requires it):
PATH=/sbin:/bin:/usr/bin
@hourly hwclock -s
You must either set PATH, since cron runs with a minimal PATH, or use absolute paths, e.g. /usr/sbin/hwclock.
cron troubleshooting:
To verify cron is working you may add a dummy task (don't forget to add a new line):
* * * * * date > /tmp/log.txt
If no file is created, verify cron is working: pgrep cron.
If no PID shows, start cron with: sudo service cron start.
To learn cron timing method: cron timing generator
Necro'ing this: As of May 2022, this issue persists to a degree.
There are two components.
First, Windows time sync needs to be decent to start with. It's not, out of the box, on machines that aren't domain-joined.
Change w32time to start automatically. In Administrator cmd, but not PowerShell, sc triggerinfo w32time start/networkon stop/networkoff. Verify with sc qtriggerinfo w32time. To get into cmd that way, you can start Admin PowerShell and then just type cmd.
Make a few changes in regedit.
In Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Config, set MaxPollInterval to hex c, decimal 12.
Check Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Parameters\NtpServer. If it ends in 0x9 you are done. If it ends in 0x1 you need to adjust SpecialPollInterval in Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\TimeProviders\NtpClient to read 3600
Reboot, then from PowerShell run w32tm /query /status /verbose to verify that the w32time service did start. If it didn't, check the triggers again. If all else fails, set it to Automatic Delayed startup.
Second, WSL2 needs to actually stay in sync. MS will likely release another kernel fix. In the meantime a scheduled task can bring it back into sync periodically:
schtasks /Create /TN WSL2TimeSync /TR "wsl -u root hwclock -s" /SC ONEVENT /EC System /MO "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=107 or EventID=507) or Provider[@Name='Microsoft-Windows-Kernel-General'] and (EventID=1)]]" /F
This GitHub Issue was closed
You can also run the command below in a PowerShell terminal to sync it.
wsl.exe sudo hwclock -s
You can manually update the WSL2 kernel to 5.10.16 by following the method in this comment: #5650 (comment). I have fixed the issue by this method.
I've added this to Windows Task Scheduler, set to run every 12 hours:
wsl.exe -d ubuntu -u root -- ntpdate time.windows.com
To install ntpdate:
sudo apt install ntpdate
For me this issue seems to happen when the system goes to sleep, so I have registered a bash command to call whenever it goes out of sync. I did it by adding a function.sh file and sourcing it in ~/.bashrc.
function.sh:
YELLOW='\033[0;33m'
NC='\033[0m'
TIME_SERVER=ntp.ubuntu.com
# Sync wsl time
sync_date () {
echo -e "${YELLOW}sudo ntpdate $TIME_SERVER ${NC}"
echo
sudo ntpdate $TIME_SERVER
}
~/.bashrc:
source ~/Linux/function.sh
Note that I have added a bit of color and some customizations (TIME_SERVER; the Windows time server is another option).
You can sync the time using the sync_date command in the CLI.

Can't terminate a process launched at bootup using the at daemon

I have a fooinit.rt process launched at boot (from /etc/init.d/boot.local).
Here is the boot.local file:
...
/bin/fooinit.rt &
...
I create an at job, triggered from C code, in order to kill fooinit.rt,
and I wrote a stop script in which kill -9 on the pid of fooinit.rt is run.
Here is the stop script:
#!/bin/sh
proc_file="/tmp/gdg_list$$"
ps -ef | grep $USER > $proc_file
echo "Stop script is invoked!!"
suff=".rt"
pid=`fgrep "$suff" $proc_file | awk '{print $2}'`
echo "pid is '$pid'"
rm $proc_file
When the at job timer expires, the 'kill -9 pid' (of fooinit.rt) command cannot terminate the fooinit.rt process!
I checked that the printed pid number is correct and that the sentence "Stop script is invoked!!" appears.
Here is the "at" job command in the C code (I verified that the stop script is called 1 minute later):
...
case 708: /* There is a trigger signal here*/
{
result = APP_RES_PRG_OK;
system("echo '/sbin/stop' | at now + 1 min");
}
...
On the other hand, it works properly when fooinit.rt is launched manually from the shell as an ordinary command (not from /etc/init.d/boot.local). In that case kill -9 works and terminates the fooinit.rt process.
Do you have any idea why kill -9 cannot terminate the fooinit.rt process if it is launched from /etc/init.d/boot.local?
Your solution is built around a race condition. There is no guarantee it will kill the right process (an unknowable amount of time can pass between the ps call and the attempt to make use of the pid), plus it's also vulnerable to a tmp exploit: someone could create a few thousand symlinks under /tmp called "gdg_list[1-32767]" that point to /etc/shadow and your script would overwrite /etc/shadow if it runs as root.
Another potential problem is the setting of $USER -- have you made sure it's correct? Your at job will be called as the user your C program runs as, which may not be the same user your fooinit.rt runs as.
Also, your script doesn't include a kill command at all.
A much cleaner way of doing this would be to run your fooinit.rt under some process supervisor like runit and use runit to shut it down when it's no longer needed. That avoids the pid bingo as well as the /tmp attack vector.
But even using pkill -u username -f fooinit.rt would be less racy than the script you provided.
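A third option, which is neither the runit nor the pkill approach suggested above, is to record the PID when the process is launched and have the stop script read it back. The sketch below assumes boot.local is changed to something like /bin/fooinit.rt & echo $! > /var/run/fooinit.pid; that pidfile path and the function name stop_from_pidfile are hypothetical. PID reuse is still possible in principle, and the caller needs permission to signal the process (the at job runs as the user of the C program, as noted above).

```shell
#!/bin/sh
# stop_from_pidfile PIDFILE: send SIGTERM to the recorded PID, escalate to SIGKILL if needed.
stop_from_pidfile() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid="$(cat "$pidfile")"
    # Try a normal termination first; fails if the PID is gone or not ours to signal.
    kill "$pid" 2>/dev/null || return 1
    # Give the process up to 5 seconds to exit.
    tries=0
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 5 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    # Still alive after the grace period: force it.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    rm -f "$pidfile"
}
```

The stop script would then contain a single call such as stop_from_pidfile /var/run/fooinit.pid, with no temporary file under /tmp and no ps parsing.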

Running the same script over many machines

I have setup a few EC2 instances, which all have a script in the home directory. I would like to run the script simultaneously across each EC2 instance, i.e. without going through a loop.
I have seen csshX for OS X for interactive terminal usage, but was wondering what the command-line code is to execute commands like
ssh user@ip.address . test.sh
to run the test.sh script across all instances, since...
csshX user@ip.address.1 user@ip.address.2 user@ip.address.3 . test.sh
does not work...
I would like to do this over the commandline as I would like to automate this process by adding it into a shell script.
And for bonus points: if there is a way to send a message back to the machine sending the command that it has completed running the script, that would be fantastic.
Will it be good enough to have a master shell script that runs all these things in the background? E.g.,
#!/bin/sh
pidlist="ignorethis"
for ip in ip1 ip2
do
ssh user@$ip . test.sh &
pidlist="$pidlist $!" # get the process number of the last forked process
done
# Now all processes are running on the remote machines, and we want to know
# when they are done.
# (EDIT) It's probably better to use the 'wait' shell built-in; that's
# precisely what it seems to be for.
while true
do
sleep 1
alldead=true
for pid in $pidlist
do
if kill -0 $pid > /dev/null 2>&1
then
alldead=false
echo some processes alive
break
fi
done
if $alldead
then
break
fi
done
echo all done.
It will not be exactly simultaneous, but it should kick off the remote scripts in parallel.
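The EDIT comment in the script suggests the wait built-in instead of polling with kill -0. A minimal sketch of that variant follows. The names run_on_all and remote_test are hypothetical, and test.sh is assumed to sit in the remote user's home directory as described in the question.

```shell
#!/bin/sh
# run_on_all CMD HOST...: start CMD once per host in the background, then wait for each one.
run_on_all() {
    cmd="$1"
    shift
    pids=""
    for host in "$@"; do
        "$cmd" "$host" &
        pids="$pids $!"          # remember the PID of each background job
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that job, so failures can be counted.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "all done, $failed failed"
    [ "$failed" -eq 0 ]
}

# Hypothetical per-host command: source test.sh on the remote machine.
remote_test() {
    ssh "user@$1" '. ./test.sh'
}
```

A call such as run_on_all remote_test ip1 ip2 returns only after every remote script has finished, which also covers the "tell me when it is done" part of the question.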

Run a php script in background on debian (Apache)

I'm trying to make push notifications work on my Debian VPS (apache2, mysql).
I use a PHP script from this tutorial (http://www.raywenderlich.com/3525/apple-push-notification-services-tutorial-part-2).
Basically, the script runs in an infinite loop that checks a MySQL table for new records every couple of seconds. The tutorial says it should be run as a background process.
// This script should be run as a background process on the server. It checks
// every few seconds for new messages in the database table push_queue and
// sends them to the Apple Push Notification Service.
//
// Usage: php push.php development &
So I have four questions.
1 - How do I start the script from the terminal? What should I type? The script location on the server is:
/var/www/development_folder/scripts/push2/push.php
2 - How can I kill it if I need to (without having to restart Apache)?
3 - Since the push notification is essential, I need a way to check if the script is running.
The code (from the tutorial) calls a function if something goes wrong:
function fatalError($message)
{
writeToLog('Exiting with fatal error: ' . $message);
exit;
}
Maybe I can put something in there to restart the script? But it would also be nice to have a cron job or something that checks every 5 minutes or so whether the script is running, and starts it if it isn't.
4 - Can I make the script start automatically after an Apache or MySQL restart, for example if the server crashes or something else happens that requires an Apache restart?
Thanks a lot in advance
You could run the script with the following command:
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
The nohup means that the command should not quit (it ignores the hangup signal) when you e.g. close your terminal window. If you don't care about this you could just start the process with "php /var/www/development_folder/scripts/push2/push.php &" instead. PS! By default nohup logs the script output to a file called nohup.out; if you do not want this, just add > /dev/null as I've done here. The & at the end means that the process will run in the background.
I would only recommend starting the push script like this while you test your code. The script should be run as a daemon at system-startup instead (see 4.) if it's important that it runs all the time.
Just type
ps ax | grep push.php
and you will get the processid (pid). It will look something like this:
4530 pts/3 S 0:00 php /var/www/development_folder/scripts/push2/push.php
The pid is the first number you'll see. You can then run the following command to kill the script:
kill -9 4530
If you run ps ax | grep push.php again the process should now be gone.
I would recommend that you make a cronjob that checks if the php-script is running, and if not, starts it. You could do this with ps ax and grep checks inside your shell script. Something like this should do it:
if ! ps ax | grep -v grep | grep 'push.php' > /dev/null
then
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
else
echo "push-script is already running"
fi
If you want the script to start up after booting the system, you could make a file in /etc/init.d (e.g. /etc/init.d/mypushscript) with something like this inside:
php /var/www/development_folder/scripts/push2/push.php
(You should probably have a lot more in this file.)
You would also need to run the following commands:
chmod +x /etc/init.d/mypushscript
update-rc.d mypushscript defaults
to make the script start at boot-time. I have not tested this so please do more research before making your own init script!

How to shorten an inittab process entry, a.k.a., where to put environment variables that will be seen by init?

I am setting up a Debian Etch server to host ruby and php applications with nginx. I have successfully configured inittab to start the php-cgi process on boot with the respawn action. After serving 1000 requests, the php-cgi worker processes die and are respawned by init. The inittab record looks like this:
50:23:respawn:/usr/local/bin/spawn-fcgi -n -a 127.0.0.1 -p 8000 -C 3 -u someuser -- /usr/bin/php-cgi
I initially wrote the process entry (everything after the 3rd colon) in a separate script (simply because it was long) and put that script name in the inittab record, but because the script would run its single line and die, the syslog was filled with errors like this:
May 7 20:20:50 sb init: Id "50" respawning too fast: disabled for 5 minutes
Thus, I got rid of the script file and just put the whole line in the inittab. Henceforth, no errors show up in the syslog.
Now I'm attempting the same with thin to serve a rails application. I can successfully start the thin server by running this command:
sudo thin -a 127.0.0.1 -e production -l /var/log/thin/thin.log -P /var/run/thin/thin.pid -c /path/to/rails/app -p 8010 -u someuser -g somegroup -s 2 -d start
It works apparently exactly the same whether I use the -d (daemonize) flag or not. Command line control comes immediately back (the processes have been daemonized) either way. If I put that whole command (minus the sudo and with absolute paths) into inittab, init complains (in syslog) that the process entry is too long, so I put the options into an exported environment variable in /etc/profile. Now I can successfully start the server with:
sudo thin $THIN_OPTIONS start
But when I put this in an inittab record with the respawn action
51:23:respawn:/usr/local/bin/thin $THIN_OPTIONS start
the logs clearly indicate that the environment variable is not visible to init; it's as though the command were simply "thin start."
How can I shorten the inittab process entry? Is there another file than /etc/profile where I could set the THIN_OPTIONS environment variable? My earlier experience with php-cgi tells me I can't just put the whole command in a separate script.
Why don't you call a wrapper that starts thin with your options?
start_thin.sh:
#!/bin/bash
/usr/local/bin/thin -a 127.0.0.1 -e production -l /var/log/thin/thin.log -P /var/run/thin/thin.pid -c /path/to/rails/app -p 8010 -u someuser -g somegroup -s 2 -d start
and then:
51:23:respawn:/usr/local/bin/start_thin.sh
init.d script
Use a script in
/etc/rc.d/init.d
and set the runlevel
Here are some examples with thin, ruby, apache
http://articles.slicehost.com/2009/4/17/centos-apache-rails-and-thin
http://blog.fiveruns.com/2008/9/24/rails-automation-at-slicehost
http://elwoodicious.com/2008/07/15/nginx-haproxy-thin-fastcgi-php5-load-balanced-rails-with-php-support/
Which provide example initscripts to use.
edit:
Asker pointed out this will not allow respawning. I suggested forking in the init script and disowning the process so init doesn't hang (it might fork() the script itself, will check). And then creating an infinite loop that waits on the server process to die and restarts it.
edit2:
It seems init will fork the script. Just a loop should do it.
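The loop mentioned in the edits could be sketched as below. The function name respawn_loop and its MAX and DELAY parameters are hypothetical additions for illustration. The sketch only works if the server command stays in the foreground; the question reports that thin returned control immediately with the options shown, so that would need to be sorted out first.

```shell
#!/bin/sh
# respawn_loop MAX DELAY CMD...: run CMD in the foreground and restart it whenever it exits.
# MAX=0 means loop forever; DELAY is the pause in seconds between restarts.
respawn_loop() {
    max="$1"
    delay="$2"
    shift 2
    runs=0
    while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
        "$@"                 # blocks until the server process exits
        runs=$((runs + 1))
        sleep "$delay"       # avoid restarting a crashing server in a tight loop
    done
}
```

In an init script this might be started in the background, for example respawn_loop 0 5 followed by the server command and then &, so that the init script itself returns.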