Nagios NRPE command fails but local command works

I'm using a custom script to check physical memory.
https://exchange.nagios.org/components/com_mtree/attachment.php?link_id=3329&cf_id=24
(I added the performance data.)
Running it locally:
/usr/lib64/nagios/plugins/check_custom_memory.sh
output:
OK - 30405 MB (96%) Free Memory | total=31513MB used=1108MB
When I run it from the Nagios server with this command (actual IP hidden for security reasons):
/usr/lib64/nagios/plugins/check_nrpe -t 60 -H xxx.xxx.xxx.xxx -c check_custom_memory.sh -a 10 5
output:
CRITICAL - 30405 MB (%) Free Memory | total=31513MB used=1108MB
It seems that check_nrpe is dropping the % value. This happens only on this server and not on my other servers. All other checks run fine, and any other NRPE check against this remote server works fine too; it's just this one check. That makes me think it's the script, but it works on other servers and locally, so I'm at a loss.
The /tmp/memcalc file has 666 permissions and is owned by nrpe on the remote server, and I can see it being written as it should when the script runs locally. When run via check_nrpe, the file is never accessed or written.
Any ideas why?

I believe I found the issue. It seems to have something to do with SELinux. Normally we don't run it, but this server has it enabled, and it blocks the script from writing to the file it creates in the /tmp directory to calculate the percentage of free memory.
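For anyone chasing a similar symptom, the SELinux audit log is where a denial like this should show up; a hedged example (paths and tooling are the RHEL/CentOS defaults):
# Look for recent SELinux denials involving the NRPE daemon (run as root)
ausearch -m avc -ts recent | grep nrpe
# Or grep the audit log directly if ausearch is unavailable
grep denied /var/log/audit/audit.log | grep nrpe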
As a result, I just rewrote the script to drop the temp file and calculate the percentage with simple integer math. It's slightly less accurate, which is fine here.
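For reference, a minimal sketch of that approach, assuming the same output format and thresholds passed as percent-free values (the actual script is the one linked above, so details may differ):
#!/bin/bash
# Free-memory check with no temp file; args are warning and critical
# thresholds as percent free (e.g. 10 5, matching the -a 10 5 call above).
warn=${1:-10}
crit=${2:-5}
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
total_mb=$(( total_kb / 1024 ))
free_mb=$(( free_kb / 1024 ))
used_mb=$(( total_mb - free_mb ))
pct_free=$(( free_mb * 100 / total_mb ))  # integer math; close enough
perf="total=${total_mb}MB used=${used_mb}MB"
if [ "$pct_free" -le "$crit" ]; then
    echo "CRITICAL - ${free_mb} MB (${pct_free}%) Free Memory | $perf"; exit 2
elif [ "$pct_free" -le "$warn" ]; then
    echo "WARNING - ${free_mb} MB (${pct_free}%) Free Memory | $perf"; exit 1
fi
echo "OK - ${free_mb} MB (${pct_free}%) Free Memory | $perf"; exit 0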

Cannot SSH into Google Compute Engine, connecting in a loop

I am unable to connect through SSH to my GCE instance. I was connecting without any problem; the only thing I changed was my username, via the top-right corner of the browser SSH window, where I selected Change Linux Username.
When I try to SSH into my instance via the browser, I keep getting the same message in an endless loop.
When I try to SSH via Cloud Shell, I get the following error message (serial console output):
Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
[Q] Is there any way to fix this problem? Since I have no access to the instance now, I don't know what to do.
However, you can always get access back through the serial console, and from there troubleshoot the user/SSH issue internally.
1) $ gcloud compute instances add-metadata [INSTANCE_NAME] --metadata=serial-port-enable=1
You can then connect to the instance through the serial port.
NOTE: The root password must already have been set in order to log in over the serial port.
2)
$ gcloud compute connect-to-serial-port [INSTANCE_NAME]
If you never set the root password, you can set it by adding a startup script to your instance that sets the root password, using the command below.
NOTE: The instance must be rebooted in order to run the startup script.
3) $ gcloud compute instances add-metadata [instance name] --metadata startup-script='echo "root:YourPasswdHere" | chpasswd'
Reboot the instance, run the command from step 2), and authenticate yourself as root with the password you set in the startup script in step 3).
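Putting the steps together, the whole recovery might look like this (the instance name and password are placeholders, and gcloud compute instances reset is just one way to reboot from outside):
# 1) Enable the serial console on the instance
gcloud compute instances add-metadata my-instance --metadata=serial-port-enable=1
# 3) Set a root password via a startup script (placeholder password)
gcloud compute instances add-metadata my-instance --metadata startup-script='echo "root:YourPasswdHere" | chpasswd'
# Reboot so the startup script runs
gcloud compute instances reset my-instance
# 2) Connect through the serial port and log in as root
gcloud compute connect-to-serial-port my-instance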
I had the same problem; it took me several days to figure out what was happening in my case.
To find out, I created a new instance from scratch and applied, one by one, the modifications I had made to the instances I could no longer connect to, exiting the SSH session and reconnecting after each change to test it.
I tried this a couple of times, and in both cases the connection became impossible after uninstalling Python (I only needed version 3.7, so I was uninstalling all the others and installing the one I needed).
My command for uninstalling it was
sudo apt purge python2.7-minimal
and
sudo apt purge python3.5-minimal
I don't know if it was specifically because of deleting Python, or because of using purge (in which case the problem might also appear when using purge on another program).
I don't even know why this would affect the SSH connection.
Could it be that Google Cloud somehow uses Python on the target instance for the browser SSH client?
In any case, if you are experiencing this problem, try to avoid uninstalling anything from the base VM.

Desired port for Google Cloud SQL connection cannot be used

I am following the steps here, to set up a Cloud SQL DB in Google Cloud Platform. I'm stuck at the step with:
./cloud_sql_proxy -instances="[YOUR_INSTANCE_CONNECTION_NAME]"=tcp:3306
I get the message below:
2018/02/07 19:44:10 listen tcp 127.0.0.1:3306: bind: address already in use
I've tried lsof -i tcp:3306, but nothing shows up. Alternatively, I am able to start a connection on tcp:3307, but that's not what the tutorial requires, and it may prevent the rest of the tutorial from working. When I do lsof -i tcp:3307, however, I can see the PID and kill the SQL connection.
How is port 3306 already in use? I even rebooted my computer.
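One hedged thing to check: plain lsof only lists processes you own, so a listener started by another user (e.g. a system mysqld) can be invisible without sudo. For example:
# Run with sudo so listeners owned by other users show up
sudo lsof -nP -iTCP:3306 -sTCP:LISTEN
# On Linux, ss answers the same question
sudo ss -ltnp 'sport = :3306'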
The steps I took to fix it:
I stopped MySQL on my local machine:
brew services stop mysql
But then I had a problem with providing a directory; the console error asked for a "Directory to use for placing Unix sockets representing database instances".
Then I ran:
sudo mkdir /cloudsql; sudo chmod 777 /cloudsql
My final command:
./cloud_sql_proxy -instances=MyInstanceConnName=tcp:3306 -projects=myproject -dir=/cloudsql/
UPDATE: After trying many ways to kill the SQL process, find out what's actually running on the port, join the gcloud Slack group to ask around, etc., I ended up uninstalling MySQL and reinstalling it. That fixed it. :shrug:

Rundeck - reboot server job

I have a Rundeck job that reboots a server; it sends the command "sudo reboot". This works, and the server reboots.
The problem is that Rundeck doesn't get a signal back, so the job fails.
Is there a way to make this work and get a complete signal back in rundeck?
Perhaps wrap your command in a script, background the reboot operation, and return 0? I'm doing something similar with a set of development VMs, but I'm using virsh. I don't see why this couldn't be done with a physical server:
#!/bin/bash
ssh rundeck@yourserver sudo reboot &
exit 0
You may need to experiment a bit with the ssh options (perhaps '-f' and/or '-n') to get this to work properly.
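For example, a variant of the same wrapper using those options might look like this (user and host are the placeholders from the snippet above):
#!/bin/bash
# -f: drop into the background after authentication; -n: don't read stdin
ssh -f -n rundeck@yourserver "sudo reboot"
exit 0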
Well, playing around just now, I used the following as a Local Command step:
ssh ${node.username}@${node.hostname} "reboot & exit"
The return code is ZERO and everybody is happy.

sftp fails with 'message too long' error

My Java program uses SSH/SFTP to transfer files onto Linux machines (obviously...), and my library for doing so is JSch (though it's not to blame).
Now, some of these Linux machines have shell login startup scripts, which tragically cause the SSH/SFTP connection to fail with the following message:
Received message too long 1349281116
After briefly reading about it, this is clearly a known SSH design issue (not a bug; see here). All the suggested solutions are on the server side (i.e., disable scripts that output messages during shell login).
My question: is there an option to avoid this issue on the client side?
Check your .bashrc and .bash_profile on the server and remove anything that can echo; for now, comment those lines out.
Try again. You should not see the message anymore.
Put the following at the top of ~/.bashrc for the login user on the remote machine:
# If not running interactively, don't do anything; just return early from .bashrc
[[ $- == *i* ]] || return
This returns early from .bashrc instead of sourcing the entire file, which is what you want when performing an scp or sftp against that remote machine. Depending on the shell of the remote user, make this edit in ~/.bashrc, ~/.bash_profile, ~/.profile, or similar.
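A quick way to verify the fix (host is a placeholder): a non-interactive command must produce no output at all before sftp will work.
# Should print nothing; any output here is exactly what breaks sftp
ssh user@remote-host true
# Should now connect cleanly
sftp user@remote-host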
I got that error too, during an sftp get call in a bash script.
Judging by the OP's error message, which was similar to mine, it looks like the -B option of the sftp command was set, although a buffer size of 1349281116 bytes is "a bit" too high.
In my case I had also set the buffer size explicitly (with "good intentions"), which caused the same error message, followed by the value I had set.
Removing the forced value and letting sftp run with its default of 32K solved the problem for me.
-B buffer_size
Specify the size of the buffer that sftp uses when transferring
files. Larger buffers require fewer round trips at the cost of
higher memory consumption. The default is 32768 bytes.
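For reference, the two invocations might look like this (the host and the oversized value are made up for illustration):
# Forcing a large buffer explicitly -- the pattern that caused my error
sftp -B 10485760 user@example-host
# Dropping -B so sftp uses its 32768-byte default
sftp user@example-host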
If it turns out to be the same issue, that would serve as a client-side solution.
NOTE: I had to fix .bashrc output on the remote hosts, not the host that's issuing the scp or sftp command.
Here's a quick-and-dirty solution, but it seems to work, even on binary files. All credit goes to uvgroovy.
Given file 'some-file.txt', just do:
cat some-file.txt | ssh root@1.1.1.1 /bin/bash -c "cat > /root/some-new-file.txt"
Still, if anyone knows an sftp/scp built-in way to do this on the client side, that would be great.

Access Denied when executing through cygwin openssh

When I execute the command "iisreset" through an SSH terminal on a remote Windows machine, I get the following error:
Attempting stop...
Restart attempt failed.
Access denied, you must be an administrator of the remote computer to use this
command. Either have your account added to the administrator local group of
the remote computer or to the domain administrator global group.
When I type whoami, it shows that I am the administrator. My Cygwin SSH session is running as the "cyg_server" user, which has admin privileges.
My SSH server is configured with privilege separation and allows me to log in as administrator.
When I run the command locally, it works fine. The problem is execution through SSH.
I've also used Process Monitor to see what's going on, but it does not reveal the problem.
That is pretty strange, because I am able to do admin-only operations over remote SSH, such as:
echo "hi">/cygdrive/c/x.txt
rm /cygdrive/c/x.txt
Turning off UAC did not make a difference.
Any ideas?
I had a similar problem: unable to start/stop services using net start/net stop from a remote password-less (public/private key) SSH session. Attempting to start/stop a service resulted in a "System Error 5 has occurred. Access is denied." error.
I had to install Cygwin's LSA authentication package (see http://cygwin.com/cygwin-ug-net/ntsec.html#ntsec-setuid-overview) in order for (I presume) setuid to work properly for password-less logins.
The problem should go away once LSA is installed on the Cygwin/SSH host and the machine has been rebooted.
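If I remember right, installing it boils down to running Cygwin's cyglsa-config from an elevated shell and rebooting; a hedged sketch (see the linked ntsec docs for the authoritative steps):
# From an elevated Cygwin shell: register the Cygwin LSA authentication package
cyglsa-config
# Then reboot the machine for the registration to take effect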
I got scared off the LSA package mentioned in user3609241's answer because of this sentence in the LSA docs:
as soon as the LSA encounters serious problems (for instance, one of the protected LSA processes died), it triggers a system reboot.
But those same docs point to a very easy way to "runas" SYSTEM: just use the at command:
$ date
Mon, Jan 12, 2015 8:17:35 PM
$ at 20:18 iisreset
Added a new job with job ID = 1
$ at
Status ID Day Time Command Line
-------------------------------------------------------------------------------
1 Today 8:18 PM iisreset
It works, at the cost of having to wait up to 59 seconds.
(wrapping the above sequence of commands in a simple-to-call script is left as an exercise to the reader; our management util is written in Perl so it was pretty straightforward).
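For what it's worth, a minimal bash sketch of such a wrapper (the script name and the one-minute offset are my own; Cygwin ships GNU date, which supports -d):
#!/bin/bash
# run_as_system.sh -- run a command as SYSTEM via the Windows 'at' scheduler.
# Usage: ./run_as_system.sh iisreset
cmd="$*"
# 'at' only accepts HH:MM, hence the up-to-59-second wait noted above
when=$(date -d '+1 minute' +%H:%M)
at "$when" $cmd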
Run the Cygwin terminal as administrator.