Nagios host notifications not sending via email or logging - notifications

I am re-doing our nagios infrastructure with puppet but I am currently stopped at a seemingly simple problem (most likely a config issue).
Using puppet, I spit out some basic nagios config files on disk. Nagios reloads fine and everything looks okay in the UI but, when I mark a host down, it does not send a notification.
nagios.log shows:
[1470699491] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;divcont01;1;test notification
[1470699491] PASSIVE HOST CHECK: divcont01;1;test notification
[1470699491] HOST ALERT: divcont01;DOWN;HARD;1;test notification
In production (where I have changed nothing), I see the following in nagios.log after marking a host down in the UI:
[1470678186] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;PALTL12;1;test ey
[1470678187] PASSIVE HOST CHECK: PALTL12;1;test ey
[1470678187] HOST ALERT: PALTL12;DOWN;HARD;1;test ey
[1470678187] HOST NOTIFICATION: pal_infra;PALTL12;DOWN;host-notify-by-pom;test ey
[1470678187] HOST NOTIFICATION: pal_infra;PALTL12;DOWN;host-notify-by-email;test ey
[1470678192] HOST ALERT: PALTL12;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.81 ms
[1470678192] HOST NOTIFICATION: pal_infra;PALTL12;UP;host-notify-by-pom;PING OK - Packet loss = 0%, RTA = 0.81 ms
[1470678192] HOST NOTIFICATION: pal_infra;PALTL12;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.81 ms
As seen in the logs, there is a HOST NOTIFICATION logged and sent directly after the HOST ALERT in prod. I have been exhaustively comparing config files today and I cannot find a reason why the new config stops short of the notification.
I have verified that notifications are enabled at the top level. I have verified that email can be sent from this box (though I am using the logs, not email, to verify functionality). I have also tried multiple other suggestions found via Google (and will continue my search too).
Relevant config details are below. Please pardon the verbosity of my configuration and the lackluster Stack Overflow formatting. Thank you in advance.
hosts/divcont01.cfg:
define host {
address snip
host_name divcont01
use generic-host-puppetized
}
host-templates/generic-host-puppetized.cfg:
define host {
check_command check-host-alive
check_interval 1
contact_groups generic-contactgroup
checks_enabled 1
event_handler_enabled 0
flap_detection_enabled 0
name generic-host-puppetized
hostgroups +generic-host-puppetized
max_check_attempts 4
notification_interval 4
notification_options d,u,r
notification_period 24x7
notifications_enabled 1
process_perf_data 0
register 0
retain_nonstatus_information 1
retain_status_information 1
}
hostgroups/generic-host-puppetized.cfg:
define hostgroup {
hostgroup_name generic-host-puppetized
}
contactgroups/generic-contactgroup.cfg:
define contactgroup {
contactgroup_name generic-contactgroup
members generic-puppetized-contact
}
contacts/generic-puppetized-contact.cfg:
define contact {
use generic-contact
contact_name generic-puppetized-contact
email <my email>
}
objects/templates.cfg (generic-contact config only):
define contact{
use my email
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
host_notification_commands generic-puppetized-contact-host-notify-by-email-low
service_notification_commands notify-by-email,service-notify-by-pom
service_notification_options u,c,r,f ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,r,f ; send notifications for all host states, flapping events, and scheduled downtime events
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
commands/generic-puppetized-contact-host-notify-by-email-low.cfg:
define command {
command_line /etc/nagios/global/scripts/nagios-mailx.sh -t my email -s "** notification Host Alert: hostname is hoststate **" -m "***** Nagios ***** Notification Type: notification type Host: host State: hoststate Address: address Info: output Date/Time: date"
command_name generic-puppetized-contact-host-notify-by-email-low
}

Figured it out...I was building my system within another pre-existing system (dangerous, I know) and my contacts were actually pointing to a generic-contact that had its notifications disabled.
Whoops :)


Syslog-ng logs not processing certain logs possibly due to journal cursor issue

I'm using syslog-ng 3.37.1 on a VMware Photon 3.0 virtual appliance (preconfigured VM). The appliance is configured to write logs to certain files under the /var/log folder and, optionally, to remote syslog servers.
Logs from the 'auth' and 'authpriv' facilities are written to /var/log/auth.log and are also sent to the remote syslog server when enabled.
In addition, messages from the kernel, systemd services, and other processes are configured to be processed via syslog-ng.
The issue is that logs from a few facilities (such as auth, authpriv, and cron) are not processed (received?) by syslog-ng initially. As a result, SSH events and TTY login events are not logged to the file or to the remote server. However, many other events from the kernel, systemd, and other processes are logged fine.
Below is the configuration for auth.log, which does not log anything on the first boot:
filter f_auth { facility(auth) or facility(authpriv); };
destination authlog { file("/var/log/auth.log" perm(0600)); };
log { source(s_local); filter(f_auth); destination(authlog); };
I updated the filter as shown below, without any success:
filter f_auth {
facility(auth) or facility(authpriv) or
match('sshd' value('PROGRAM')) or match('systemd-logind' value('PROGRAM'));
};
In the journal I can see the relevant logs. For example, the command below shows the SSH logs:
journalctl -f -u sshd
An additional syslog-ng service restart or config reload during appliance startup does not fix this.
The log file /var/log/auth.log (and also the cron log, etc.) stays at zero size during this time. The syslog-ng log looks fine too.
However, if I generate an auth facility event (say, an SSH/TTY login) and manually restart syslog-ng, all the log entries (including old events) are immediately written to the filesystem log (/var/log/auth.log) and also sent to the remote syslog server.
In syslog-ng.log I find the entry below when it starts working that way:
syslog-ng[481]: [date] Failed to seek journal to the saved cursor position; cursor='', error='Invalid argument (22)'
It makes me wonder whether this is due to a bad cursor position. However, I can still see other systemd and kernel logs being logged fine, so I am not sure.
What could be causing such behaviour? How can I ensure that syslog-ng is able to receive and process these logs without manual intervention?
Below is more detailed configuration for reference:
@version: 3.37
@include "scl.conf"
source s_local {
system();
internal();
udp();
};
destination d_local {
file("/var/log/messages");
file("/var/log/messages-kv.log" template("$ISODATE $HOST $(format-welf --scope all-nv-pairs)\n") frac-digits(3));
};
log {
source(s_local);
# uncomment this line to open port 514 to receive messages
#source(s_network);
destination(d_local);
};
filter f_auth {
facility(auth) or facility(authpriv); # Also tried facility(auth, authpriv)
};
destination authlog { file("/var/log/auth.log" perm(0600)); };
log { source(s_local); filter(f_auth); destination(authlog); };
destination d_kern { file("/dev/console" perm(0600)); };
filter f_kern { facility(kern); };
log { source(s_local); filter(f_kern); destination(d_kern); };
destination d_cron { file("/var/log/cron" perm(0600)); };
filter f_cron { facility(cron); };
log { source(s_local); filter(f_cron); destination(d_cron); };
destination d_syslogng { file("/var/log/syslog-ng.log" perm(0600)); };
filter f_syslogng { program(syslog-ng); };
log { source(s_local); filter(f_syslogng); destination(d_syslogng); };
# A few more of above kind of configuration follows here.
# Add configuration files that have remote destination, filter and log configuration for remote servers
@include "remote/*.conf"
As can be seen, /var/log/auth.log should hold logs from the auth facility, but the log remains empty until syslog-ng is restarted again, either after a syslog config change (via API) or after a manual login to the system. However, triggering an automated restart of syslog-ng using cron (without an additional syslog config change) does not help.
Any thoughts or suggestions?
This is probably caused by your real-time clock going backwards. The notification mechanism in libsystemd does not work in this case.
There's a proof-of-concept patch in this syslog-ng issue:
https://github.com/syslog-ng/syslog-ng/issues/2836
I've increased the priority of tackling and fixing that problem, as it is causing issues more often than I anticipated.
As a workaround, you should synchronize the time for your VM, preferably so that it waits for a sync during boot and then keeps the time synchronized via NTP.

Weblogic 12.1.2. "https + t3" combination on a single managed server. Is it possible?

WLS 12.1.2 is running under JDK 1.7_60 on Windows 7
To meet the requirement "Switch to HTTPS, but leave t3", I performed the following steps in the admin console for the managed server (where the apps reside):
1) Disable the default listen port 7280 (http and t3)
2) Enable the default SSL listen port 7282 (https and t3s)
3) In order to enable t3, create a custom channel:
Protocol: t3
Port: 7280
"HTTP Enabled for This Protocol" flag set to false
After that, we have https and t3s on port 7282, and t3 only on port 7280.
In this case, we have issues with deployment of applications.
The deployer fails to start/stop the apps.
The reason is that the deployer still tries to send messages to the managed server via http.
I turned on deployment debugging and see the following messages in the admin server log:
…<DeploymentServiceTransportHttp> …<HTTPMessageSender: IOException: java.io.EOFException: Response had end of stream after 0 bytes when making a DeploymentServiceMsg request to URL: http://localhost:7280/bea_wls_deployment_internal/DeploymentService>
… <DeploymentServiceTransportHttp> …<sending message for id '-1' to 'my_srv' using URL 'http://localhost:7280' via http>
If I disable the custom t3 Channel, everything is ok. The deployer sends messages to https://localhost:7282, as expected. But in this case, we have no t3 available.
Any help is much appreciated.
Thanks

tftp retry timeout exceeded

My issue is that the retry count is exceeded when I download a kernel image to an Econa processor board (Econa is an ARM-based processor) via TFTP, as shown below:
CNS3000 # tftp 0x4000000 bootpImage.cns3420.uclibc
MAC PORT 0 : Initialize bcm53115M
MAC PORT 2 : Initialize RTL8211
TFTP from server 192.168.0.219; our IP address is 192.168.0.112
Filename 'bootpImage.cns3420.uclibc'.
Load address: 0x4000000
Loading: T T T T T T T T T T
Retry count exceeded; starting again
The following points may help in finding the cause of this error.
The ping response is OK:
CNS3000 # ping 192.168.0.219
MAC PORT 0 : Initialize bcm53115M
MAC PORT 2 : Initialize RTL8211
host 192.168.0.219 is alive
To verify that TFTP is running, I tried the steps shown below, and the TFTP server seems to be working. I placed a small file in /tftpboot:
# echo "Hello, embedded world" > /tftpboot/hello.txt
Then I fetched it via localhost:
# tftp localhost
tftp> get hello.txt
Received 23 bytes in 0.1 seconds
tftp> quit
Please note that there is no firewall or SELinux on my machine.
Please verify that the locations of these files are OK. I have placed the kernel image file bootpImage.cns3420.uclibc in /tftpboot. The TFTP service file is located at /etc/xinetd.d/tftp.
My TFTP service file is:
service tftp
{
socket_type =dgram
protocol=udp
wait=yes
user=root
server=/usr/sbin/in.tftpd
server_args=-s /tftpboot -b 512
disable=no
per_source=11
cps=100 2
flags=ipv4
}
The printenv output in U-Boot is:
CNS3000 # printenv
bootargs=root=/dev/mtdblock0 mem=256M console=ttyS0
baudrate=38400
ethaddr=00:53:43:4F:54:54
netmask=255.255.0.0
tftp_bsize=512
udp_frag_size=512
mmc_init=mmcinit
loading=fatload mmc 0 0x4000000 bootpimage-82511
running=go 0x4000000
bootcmd=run mmc_init;run loading;run running
serverip=192.168.0.219
ipaddr=192.168.0.112
bootdelay=5
port=1
bootfile=/tftpboot/bootpImage.cns3420.uclibcl
stdin=serial
stdout=serial
stderr=serial
verify=n
Environment size: 437/4092 bytes
Regards
Waqas
Loading: T T T T T T T T T T
means there is no transfer at all. This can be caused by a wrong interface setting, e.g. U-Boot is configured for 100 Mbit full duplex and you try to connect via half duplex or 10 Mbit (or some mix of the two). Another point is the MTU size, which should be 1500 (U-Boot cannot handle packet fragmentation).
Hint for Windows/VMware users:
TFTP timeouts from U-Boot are caused by Windows IP forwarding.
1) If you have a home network: switch it off.
2) If you are running the Routing and Remote Access service: shut down the service.
3) Check the registry for IP forwarding:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\IPEnableRouter
Set the value to 0 (and maybe reboot).

couldn't setup local SOCKS5 proxy on port 7777: Address already in use: JVM_Bind

While sending a message from an agent in Spark to a client in the client application,
I'm getting the following error:
couldn't setup local SOCKS5 proxy on port 7777: Address already in use: JVM_Bind
The code I wrote for sending the message to the client is below.
I wrote the following method in a class that implements org.jivesoftware.smackx.workgroup.agent.OfferListener:
Message message1 = new Message();
message1.setBody(message);
try {
    for (MultiUserChat muc : GlobalUtils.getMultiuserchat()) {
        if (muc.getRoom().equals(conf)) {
            muc.sendMessage(message1);
            System.out.println("message sent ############# agent to client..");
        }
    }
} catch (Exception ex) {
    System.out.println("exception while sending message in sendMessage() ");
    ex.printStackTrace();
}
help me
thanks
rajesh.v
This happens because you are running your server and your client on the same machine.
I assume you use Openfire for the server.
Openfire uses port 7777 by default for its file transfer proxy service, which is enabled by default,
and your client does the same, using port 7777 by default for file transfer.
Look at the Openfire settings under Server Settings > File Transfer Settings.
You can disable the proxy there,
or just run your client and your server on different machines.
I assume you are in the development stage, which is why your server and your client are on the same machine.
What is the payload of your message? Are there any & characters in it? I'm not sure why, but this seems to trip up Smack.

Authentication on a very low level TCP Server written for Node.JS?

How do I implement something similar to HTTP Basic authentication in a TCP server written for Node.JS? The code for a basic TCP server is the following:
// Load the net module to create a tcp server.
var net = require('net');
// Setup a tcp server
var server = net.createServer(function (socket) {
    // This callback runs once for every new connection, so no separate
    // "connect" listener is needed: tell the client hello and then close the connection.
    console.log("Connection from " + socket.remoteAddress);
    socket.end("Hello World\n");
});
// Fire up the server bound to port 7000 on localhost
server.listen(7000, "localhost");
// Put a friendly message on the terminal
console.log("TCP server listening on port 7000 at localhost.");
While there are several ways to provide authentication over a TCP connection, all require some form of "protocol", that is, an agreed-upon communications grammar/syntax.
For example, in the Simple Mail Transfer Protocol (SMTP), the following conversation occurs (where S: and C: designate lines provided by the SMTP server and email client, respectively):
S: 220 server.example.com
C: HELO client.example.com
S: 250 server.example.com
C: MAIL FROM:<sender@example.com>
S: 250 2.1.0 sender@example.com... Sender ok
C: RCPT TO:<recipient@example.com>
S: 250 recipient <recipient@example.com> OK
C: DATA
S: 354 enter mail, end with line containing only "."
C: full email message appears here, where any line
C: containing a single period is sent as two periods
C: to differentiate it from the "end of message" marker
C: .
S: 250 message sent
C: QUIT
S: 221 goodbye
In replies from the server, the initial numeric value indicates the success or failure of the requested operation, or that the server is waiting for more input. Using a three-digit numeric value allows for efficient parsing, because the first digit alone gives the outcome: 2xx replies indicate success, 3xx replies are intermediate (the server needs more data, as with 354 above), 4xx replies indicate transient failures that may succeed on retry, and 5xx replies indicate permanent failures. See IETF RFC 5321 - https://www.rfc-editor.org/rfc/rfc5321 for the full protocol.
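As a rough illustration of why a leading digit makes replies cheap to parse, here is a minimal JavaScript sketch. The function name classifyReply and the category labels are made up for this example; the digit meanings follow the reply-code classes in RFC 5321 (2yz positive completion, 3yz positive intermediate, 4yz transient negative, 5yz permanent negative).

```javascript
// Classify an SMTP-style reply line by the first digit of its three-digit code.
// classifyReply and the returned labels are hypothetical names for illustration.
function classifyReply(line) {
  // Three digits starting with 2-5, followed by a space, a hyphen
  // (multi-line continuation), or the end of the line.
  const match = /^([2-5])\d{2}(?:[ -]|$)/.exec(line);
  if (!match) {
    return 'invalid';
  }
  switch (match[1]) {
    case '2': return 'success';
    case '3': return 'intermediate';
    case '4': return 'transient-failure';
    case '5': return 'permanent-failure';
    default: return 'invalid';
  }
}
```

A client only needs to branch on that one category to decide whether to continue, send more data, retry later, or give up.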
So in your specific case, you might consider something as simple as:
[connect to TCP server]
S: ? # indicates the server is ready for authentication
C: username password # send authentication credentials
The server would then reply with:
S: ! # indicates successful authentication and
# that server is ready for more commands
Or
S: ? # indicates authentication failure
If too many failed attempts to authenticate are seen, the server might sever the connection to reduce the potential for abuse, such as DDOS attacks.
Once authenticated, the client could send:
C: > # begin streaming
Or any other command you wish to support.
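To make the exchange above concrete, here is a minimal Node.js sketch of the "?" / "!" handshake. It rests on several assumptions that are not part of the question: messages are newline-terminated, credentials arrive as "username password" on one line, and the USERS map, MAX_ATTEMPTS limit, and function names are invented for illustration. Sending plain-text passwords over unencrypted TCP is not secure, so a real server would likely wrap this in TLS and store hashed passwords.

```javascript
const net = require('net');

// Hypothetical in-memory credential store, for illustration only.
const USERS = new Map([['alice', 's3cret']]);
// Assumed limit: close the connection after this many failed attempts.
const MAX_ATTEMPTS = 3;

// One protocol step: takes the per-connection state and one client line,
// returns the reply to send and whether the connection should be closed.
function handleLine(state, line) {
  if (!state.authenticated) {
    const [user, pass] = line.trim().split(/\s+/);
    if (user !== undefined && pass !== undefined && USERS.get(user) === pass) {
      state.authenticated = true;
      return { reply: '!', close: false };
    }
    state.failures += 1;
    return { reply: '?', close: state.failures >= MAX_ATTEMPTS };
  }
  if (line.trim() === '>') {
    // Placeholder for whatever "begin streaming" means in your application.
    return { reply: 'streaming', close: false };
  }
  return { reply: 'unknown command', close: false };
}

// Builds a server that prompts with "?" and feeds complete lines to handleLine.
function createAuthServer() {
  return net.createServer(function (socket) {
    const state = { authenticated: false, failures: 0, closed: false };
    let buffer = '';
    socket.setEncoding('utf8');
    socket.on('error', function () { socket.destroy(); });
    socket.write('?\n'); // server is ready for credentials
    socket.on('data', function (chunk) {
      if (state.closed) return;
      buffer += chunk;
      let idx;
      // TCP is a byte stream, so split the buffered data into lines ourselves.
      while ((idx = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, idx);
        buffer = buffer.slice(idx + 1);
        const result = handleLine(state, line);
        if (result.close) {
          state.closed = true;
          socket.end(result.reply + '\n');
          return;
        }
        socket.write(result.reply + '\n');
      }
    });
  });
}
```

To try it, call createAuthServer().listen(7000, "localhost") and connect with a tool such as telnet or nc. Keeping the protocol logic in handleLine, separate from the socket handling, makes it possible to test the state transitions without opening a port.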