Apache Ignite 3 Beta 1 examples fail in ClientMessageDecoder.readMagic() checking for the MAGIC_BYTES - ignite

I'm running a new instance of the Apache Ignite 3 beta on Windows and am hoping someone might recognize the error reading the MAGIC_BYTES I'm seeing trying to run the examples.
The cluster starts successfully and I can connect via the CLI; e.g.: 'node status' shows [name: defaultNode, state: started]
However, when I attempt to run any of the examples, such as SqlJdbcExample, it fails in ClientMessageDecoder.readMagic(). In there it is attempting to read the 4 MAGIC_BYTES (representing the 4 ASCII characters 'INGI').
What I see instead are the bytes 0x01, 0x00, 0x03, 0x00
This ultimately results in an IgniteException:
IGN-CMN-65535 TraceId:xxxx Invalid magic header in thin client connection. Expected 'IGNI', but was '▯▯'.
(Note: in debug, if I read more bytes out of the buffer I can see those 4 bytes are followed by an ASCII newline character (0x12) then the ASCII 'defaultNode'.)
In the SqlJdbcExample when it is initializing the JDBC connection, I can see that is has called socket.send() with those MAGIC_BYTES IN TcpClientChannel.handshakeReq().
I am running the examples with no change to any configuration files and have set up the environment as per the documentation.
Set up Apache Ignite 3 beta and ran samples. They failed with
IgniteException: IGN-CMN-65535 TraceId:xxxx Invalid magic header in thin client connection. Expected 'IGNI', but was '▯▯'.
Verified as best I can that everything is configured correctly, but can't determine why I'm not picking up these bytes.

Ignite examples use port 10800 by default. Looks like something else is using that port on your machine, so Ignite server is on a different port.
Look into the log file in ignite3-db-3.0.0-beta1 directory (ignite3db-0.log file name), and search for Thin client protocol started successfully[port=10800] line. The port is likely different. Copy the port number and update the connection string in the example accordingly

Related

MQTT Error BR_ERR_BAD_VERSION on Shelly 1PM with Tasmota

I try to connect a Shelly 1 PM smart power relay to a managed MQTT broker.
The firmware on the device is a custom-built Tasmota 8.3.1 from the dev branch with USE_MQTT_TLS enabled. The port is set correctly to 8883 for TLS and the broker service is running at mqtt.bosch-iot-hub.com
When the device boots up, I can see the log messages on the serial port as follows:
23:03:03 MQT: Connect failed to mqtt.bosch-iot-hub.com:8883, rc 4. Retry in 10 sec
23:03:14 MQT: Attempting connection...
23:03:14 MQT: TLS connection error: 0
Return Code 4 is, according to the Tasmota documentation (https://tasmota.github.io/docs/TLS/), the code for BR_ERR_BAD_VERSION
And this error constant seems to be from BearSSL and means "Incoming record version does not match the expected version." (according to http://sources.freebsd.org/HEAD/src/contrib/bearssl/tools/errors.c)
Using an online TLS testing tool and checking mqtt.bosch-iot-hub, it supports only TLS 1.2 (1.3, 1.1 and 1.0 being disabled as well as SSLv2 and SSLv3). BearSSL website states that it supports TLS 1.2
I tried setting the log level of Tasmota in my_user_config.h , but it does not log any more verbose or detailed information.
#define SERIAL_LOG_LEVEL LOG_LEVEL_DEBUG_MORE // [SerialLog] (LOG_LEVEL_NONE, LOG_LEVEL_ERROR, LOG_LEVEL_INFO, LOG_LEVEL_DEBUG, LOG_LEVEL_DEBUG_MORE)
What is the error message supposed to mean? Is it a TLS incompatibility of the BearSSL stack or on the service side?
How can I enable verbose logging on Tasmota to see detailed TLS handshake information?
Anything else I am missing?
I appreciate after 6 months the question may have been a little expired, however the error code is not the TLS one as you describe, but rather the return code for the MQTT connection, as described in
https://tasmota.github.io/docs/MQTT/#return-codes-rc
which means your error code corresponds to
4 MQTT_CONNECT_BAD_CREDENTIALS the username/password were rejected

How to fix etcd cluster "error "tls: first record does not look like a TLS handshake""

I created a three node etcd cluester, config and start is already OK, but when I check the /var/log/messages, it shows
etcd: rejected connection from "172.17.0.3:43192" (error "tls: first
record does not look like a TLS handshake", ServerName "")
How can I fix it ?
I have checked the health of etcd :
member 48b0dff99d5c867e is healthy: got healthy result from https://172.17.0.9:2379
member 646dab89331aabab is healthy: got healthy result from https://172.17.0.8:2379
member b45603216bfac234 is healthy: got healthy result from https://172.17.0.10:2379
That shows Ok, but when I cat the /var/log/messages, it always shows this error :
Jan 12 20:08:57 master etcd: rejected connection from
"172.17.0.3:43160" (error "tls: first record does not look like a TLS
handshake", ServerName "")
Jan 12 20:08:57 master etcd: rejected
connection from "172.17.0.3:43162" (error "tls: oversized record
received with length 21536", ServerName "")
I got this message for the etcd peer communication when switching from http to https for peer communication. Apparently etcd has persistent peer information that overrides the command line options so it continued to use http for peer communication in spite of the command line options.
In the end, since this was a test cluster, I nuked /var/lib/etcd and the new cli configuration took hold
There is no solution from my side to fully help you with an issue but I've found couple of links that might help you in further investigations. Read them carefully, try solutions and I hope you will resolve the problem.
Github question #9917: check ETCDCTL_API variable, especially make sure --endpoints is configured with https.
Runtime reconfiguration: try to reconfigure you etcd by updating/removing/adding etcs members.
nginx ingress: check your nginx ingress annotations in case you are using nginx
google groups TLS handshake topic: Check this topic, especially comments related to VAULT_ADDR variable. I will copy paste last comment from thread here:
We were able to get everything to work, after understanding the
permission issues.
You asked: "Please confirm if you are seeing server error messages
before initializing Vault" Upon further examination, I did determine
that the errors were not happening before initializing the Vault.
The problem ended up not being related to VAULT_ADDR, and we used the
value: "http://127.0.0.1:8200"
I have the setup operation scripted, and it appears that not
everything was being run at the proper permissions. At first I was
running the scripts using the "sudo" command, which resulted in the
failures. I discovered that the permissions for the certificate key
were restricted and the file could not be accessed by my user. There
may have been other permission issues as well. But once I switched
user to root, and ran the script, everything behaved correctly.
Thanks

Timeout during allocate while making RFC call

I am trying to create a SAP RFC connection to a new system.
AFAIK the firewall (in this case to port 3321) is open.
I get this message at the client:
RFC_COMMUNICATION_FAILURE (rc=1): key=RFC_COMMUNICATION_FAILURE, message=
LOCATION SAP-Gateway on host ax-swb-q06.prod.lokal / sapgw21
ERROR timeout during allocate
TIME Thu Jul 26 16:45:48 2018
RELEASE 753
COMPONENT SAP-Gateway
VERSION 2
RC 242
MODULE /bas/753_REL/src/krn/si/gw/gwr3cpic.c
LINE 2210
DETAIL no connect of TP sapdp21 from host 10.190.10.32 after 20 sec
COUNTER 3
[MSG: class=, type=, number=, v1-4:=;;;]
And this message on the SAP server
Any clue what needs to be done, to get RFC working?
With this little info no one can know what the issue is here.
But it is something related to your network and SAP system configuration.
I guess your firewall does some network address translation (NAT) and the new IP behind the firewall does not match anymore with the known one. SAP is doing some own IP / host name security checks.
If not already done, check with opening the ports 3221, 3321 and 4821 in the firewall. Also check the SAP gateway configuration which IP addresses and host names are configured to be valid ones for it (look at what is traced in the beginning of the gateway trace file dev_rd at ABAP side).
Also consider if maybe the usage of a SAProuter would be the better option for your needs.
it works in my case if ashost is the host name, and not an IP address!
Do not ask me why, but this fails:
Connection(user='x', passwd='...', ashost='10.190.10.32', sysnr='21', client='494')
But this works:
Connection(user='x', passwd='...', ashost='ax-swb-q06.prod.lokal', sysnr='21', client='494')
This is strange, since DNS resolution happens before TCP communication.
It seems that the ashost value gets used inside the connection. Strange. For most normal protocols (http, ftp, pop3, ...) this does not matter. Or you get at least a better error message.

How to solve: UDP send of xxx bytes failed with error 11 in Ubuntu?

UDP send of XXXX bytes failed with error 11
I am running a WebRTC streaming app on Ubuntu 16.04.
It streams video and audio from Logitec HD Webcam c930e within an Electronjs Desktop App.
It all works fine and smooth running on my other machine Macbook Pro. But on my Ubuntu machine I receive errors after 10-20 seconds when the peer connection is established:
[2743:0513/193817.691636:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1019 bytes failed with error 11
[2743:0513/193817.691775:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696615:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696777:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.712369:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1029 bytes failed with error 11
[2743:0513/193817.712952:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.713086:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.717713:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
==> Btw, if I do NOT stream audio, but video only. I got the same error but only with the "video" between the Log lines...
somewhere in between the lines I also got one line that says:
[3441:0513/195919.377887:ERROR:stunport.cc(506)] sendto: [0x0000000b] Resource temporarily unavailable
I also looked into sysctl.conf and increased the values there. My currenct sysctl.conf looks like this:
fs.file-max=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
fs.nr_open=1048576
net.core.netdev_max_backlog=1048576
net.core.rmem_max=16777216
net.core.somaxconn=65535
net.core.wmem_max=16777216
net.ipv4.tcp_congestion_control=htcp
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_orphans=1048576
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_wmem=4096 65535 16777216
vm.max_map_count=1048576
vm.min_free_kbytes=65535
vm.overcommit_memory=1
vm.swappiness=0
vm.vfs_cache_pressure=50
Like suggested here: https://gist.github.com/cdgraff/7920db287988463aafd7ea09eef6f9f0
It does not seem to help. I am still getting these errors and I experience lagging on the other side.
Additional info: on Ubuntu the Electronjs App connects to Heroku Server (Nodejs) and the other side of the peer connection (Chrome Browser) also connects to it. Heroku Server acts as Handshaking Server to establish WebRTC connection. Both have as configuration:
{'urls': 'stun:stun1.l.google.com:19302'},
{'urls': 'stun:stun2.l.google.com:19302'},
and also an additional Turn Server from numb.viagenie.ca
Connection is established and within the first 10 seconds the quality is very high and there is no lagging at all. But then after 10-20 seconds there is lagging and on the Ubuntu console I am getting these UDP errors.
The PC that Ubuntu is running on:
PROCESSOR / CHIPSET:
CPU Intel Core i3 (2nd Gen) 2310M / 2.1 GHz
Number of Cores: Dual-Core
Cache: 3 MB
64-bit Computing: Yes
Chipset Type: Mobile Intel HM65 Express
RAM:
Memory Speed: 1333 MHz
Memory Specification Compliance: PC3-10600
Technology: DDR3 SDRAM
Installed Size: 4 GB
Rated Memory Speed: 1333 MHz
Graphics
Graphics Processor Intel HD Graphics 3000
Could please anyone give me some hints or anything that could solve this problem?
Thank you
==============EDIT=============
I found in my very large strace log somewhere these two lines:
7671 sendmsg(17, {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 11}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 11
7661 <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 12}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=7671, uid=0, gid=0}}], msg_flags=0}, 0) = 11
On top of that, somewhere near when the error happens (at the end of the log file, just before I quit the application) I see in the log file the following:
https://gist.github.com/Mcdane/2342d26923e554483237faf02cc7cfad
First, to get an impression of what is happening in the first place, I'd look with strace. Start your application with
strace -e network -o log.strace -f YOUR_APPLICATION
If your application looks for another running process to turn the work too, start it with parameters so it doesn't do that. For instance, for Chrome, pass in a --user-data-dir value that is different from your default.
Look for = 11 in the output file log.strace afterwards, and look what happened before and after. This will give you a rough picture of what is happening, and you can exclude silly mistakes like sendtos to 0.0.0.0 or so (For this reason, this is also very important information to include in a stackoverflow question, for instance by uploading the output to gist).
It may also be helpful to use Wireshark or another packet capture program to get a rough overview of what is being sent.
Assuming you can confirm with strace that a valid send call is taken place, you can then further analyze the error conditions.
Error 11 is EAGAIN. The documentation of send says when this error is supposed to happen:
EAGAIN (...) The socket is marked nonblocking and the requested operation would block. (...)
EAGAIN (Internet domain datagram sockets) The socket referred to by
sockfd had not previously been bound to an address and, upon
attempting to bind it to an ephemeral port, it was determined that all
port numbers in the ephemeral port range are currently in use. See
the discussion of /proc/sys/net/ipv4/ip_local_port_range in
ip(7).
Both conditions could apply.
The first will be obvious by the strace log if you trace the creation of the socket involved.
To exclude the second, you can run netstat -una (or, if you want to know the programs involved, sudo netstat -unap) to see which ports are open (if you want Stack Overflow users to look into it, post the output on gist or similar and link to it here). Your port range net.ipv4.ip_local_port_range=1024 65535 is not the standard 32768 60999; this looks like you attempted to do something about lacking port numbers already. It would help to trace back to the reason of why you changed that parameter, and the conditions that convinced you to do so.

How do I use Nagios to monitor a log file

We are using Nagios to monitor our network with great success. However, we have a syslog for critical application errors and while I set up check_log, it doesn't seem to work as well as monitering a device.
The issues are:
It only shows the last entry
There doesn't seem to be a way to acknowledge the critical error and
return the monitor to a good state
Is nagios the wrong tool, or are we just not setting up the service monitering right?
Here are my entries
# log file
define command{
command_name check_log
command_line $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}
# Define the log monitering service
define service{
name logfile-check ;
use generic-service ;
check_period 24x7 ;
max_check_attempts 1 ;
normal_check_interval 5 ;
retry_check_interval 1 ;
contact_groups admins ;
notification_options w,u,c,r ;
notification_period 24x7 ;
register 0 ;
}
define service{
use logfile-check
host_name localhost
service_description CritLogFile
check_command check_log
}
For monitoring logs with Nagios, typically the log checker will return a warning only for newly discovered error messages each time it is invoked (so it must retain some state in order to know to ignore them on subsequent runs). Therefore I usually set:
max_check_attempts 1
is_volatile 1
This causes Nagios to send out the alert immeidately, but only once, and then go back to normal.
My favorite log checker is logwarn, but I'm biased because I wrote it myself after not finding any existing ones that I liked. The logwarn package includes a Nagios plugin.
Nothing in your config jumps out at me as being misconfigured.
By design, check_log will only show either an OK message, or the last log entry that triggered an alert. If you need to see multiple entries, you'll need to modify the plugin.
However, I find the fact that you're not getting recoveries somewhat odd. The way check_log works (by comparing the current log to the previous version), you should get a recovery on the very next service check. Except of course, when there have been additional matching entries added to the log since the last check.
Does forcing another service check (or several) cause it to recover?
Also, I don't intend this in a mean way, but make sure it's really malfunctioning.
Is your log getting additional matching entries in between checks, causing it not to recover? Your check is matching "?" which will match anything new in the log. Is something else (a non-error) being added to the log and inadvertently causing a match?
If none of the above are the issue, I would suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user as nagios), and with a different oldlog. It should go something like this -
run check with a new "oldlog" - get initialization message
run check - check OK
make change to log
run check - check fails
run check - check OK
If this doesn't work, then you know to focus on the log, the oldlog, and how the check_log is doing the check.
If it works, then it points more towards a problem with your nagios configuration.
There is a Nagios plugin that you can use to check the log files: it's called check_logfiles and it's used to scan the lines of a file for regular expressions.
The following link shows how to install and configure check_logfiles for Nagios and Opsview:
https://www.opsview.com/resources/nagios-alternative/blog/syslog-monitoring-nagios-opsview
As there are many ways to achieve a goal, there is also a nice plugin from Consol available:
https://labs.consol.de/lang/en/nagios/check_logfiles/
supports regex
supports log rotation
To use it, you need a cfg file, this is an example for oracle databases
#searches = ({
tag => 'oraalerts',
options => 'sticky=28800',
logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
criticalpatterns => [
'ORA\-0*204[^\d]', # error in reading control file
'ORA\-0*206[^\d]', # error in writing control file
'ORA\-0*210[^\d]', # cannot open control file
'ORA\-0*257[^\d]', # archiver is stuck
'ORA\-0*333[^\d]', # redo log read error
'ORA\-0*345[^\d]', # redo log write error
'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
'ORA\-0*48[0-5][^\d]',
'ORA\-0*6[0-3][0-9][^\d]',# ORA-6000 - ORA-0639 internal errors
'ORA\-0*1114[^\d]', # datafile I/O write error
'ORA\-0*1115[^\d]', # datafile I/O read error
'ORA\-0*1116[^\d]', # cannot open datafile
'ORA\-0*1118[^\d]', # cannot add a data file
'ORA\-0*1122[^\d]', # database file 16 failed verification check
'ORA\-0*1171[^\d]', # datafile 16 going offline due to error advancing checkpoint
'ORA\-0*1201[^\d]', # file 16 header failed to write correctly
'ORA\-0*1208[^\d]', # data file is an old version - not accessing current version
'ORA\-0*1578[^\d]', # data block corruption
'ORA\-0*1135[^\d]', # file accessed for query is offline
'ORA\-0*1547[^\d]', # tablespace is full
'ORA\-0*1555[^\d]', # snapshot too old
'ORA\-0*1562[^\d]', # failed to extend rollback segment
'ORA\-0*162[89][^\d]', # ORA-1628 - ORA-1632 maximum extents exceeded
'ORA\-0*163[0-2][^\d]',
'ORA\-0*165[0-6][^\d]', # ORA-1650 - ORA-1656 tablespace is full
'ORA\-16014[^\d]', # log cannot be archived, no available destinations
'ORA\-16038[^\d]', # log cannot be archived
'ORA\-19502[^\d]', # write error on datafile
'ORA\-27063[^\d]', # number of bytes read/written is incorrect
'ORA\-0*4031[^\d]', # out of shared memory.
'No space left on device',
'Archival Error',
],
warningpatterns => [
'ORA\-0*3113[^\d]', # end of file on communication channel
'ORA\-0*6501[^\d]', # PL/SQL internal error
'ORA\-0*1140[^\d]', # follows WARNING: datafile #20 was not in online backup mode
'Archival stopped, error occurred. Will continue retrying',
]
});
I believe there's now a real Nagios plugin that monitors logs effectively.
http://support.nagios.com/forum/viewtopic.php?f=6&t=8851&p=42088&hilit=unixautomation#p42088
The home page of the Nagios plugin on that page is Nagios Log Monitor
Your [ commands.cfg file ] will contain:
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
OR
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
Your [ services.cfg file ] will look similar to:
define service {
check_command NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
max_check_attempts 1
service_description 500_ERRORS_LOGCHECK
host_name sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
use fifteen-minute-interval
}
Nagios now has a solution that integrates tightly with Nagios Core, XI, etc.
Nagios Log Server which can alert on any query on any log file on any system in your infrastructure.