How to solve: UDP send of xxx bytes failed with error 11 in Ubuntu? - udp

UDP send of XXXX bytes failed with error 11
I am running a WebRTC streaming app on Ubuntu 16.04.
It streams video and audio from Logitec HD Webcam c930e within an Electronjs Desktop App.
It all works fine and smooth running on my other machine Macbook Pro. But on my Ubuntu machine I receive errors after 10-20 seconds when the peer connection is established:
[2743:0513/193817.691636:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1019 bytes failed with error 11
[2743:0513/193817.691775:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696615:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.696777:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1020 bytes failed with error 11
[2743:0513/193817.712369:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1029 bytes failed with error 11
[2743:0513/193817.712952:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.713086:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
[2743:0513/193817.717713:ERROR:stunport.cc(282)] Jingle:Port[0xa5faa3df800:audio:1:0:local:Net[wlx0013ef503b67:192.168.0.x/24:Wifi]]: UDP send of 1030 bytes failed with error 11
==> Btw, if I do NOT stream audio, but video only. I got the same error but only with the "video" between the Log lines...
somewhere in between the lines I also got one line that says:
[3441:0513/195919.377887:ERROR:stunport.cc(506)] sendto: [0x0000000b] Resource temporarily unavailable
I also looked into sysctl.conf and increased the values there. My currenct sysctl.conf looks like this:
fs.file-max=1048576
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
fs.nr_open=1048576
net.core.netdev_max_backlog=1048576
net.core.rmem_max=16777216
net.core.somaxconn=65535
net.core.wmem_max=16777216
net.ipv4.tcp_congestion_control=htcp
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_orphans=1048576
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_wmem=4096 65535 16777216
vm.max_map_count=1048576
vm.min_free_kbytes=65535
vm.overcommit_memory=1
vm.swappiness=0
vm.vfs_cache_pressure=50
Like suggested here: https://gist.github.com/cdgraff/7920db287988463aafd7ea09eef6f9f0
It does not seem to help. I am still getting these errors and I experience lagging on the other side.
Additional info: on Ubuntu the Electronjs App connects to Heroku Server (Nodejs) and the other side of the peer connection (Chrome Browser) also connects to it. Heroku Server acts as Handshaking Server to establish WebRTC connection. Both have as configuration:
{'urls': 'stun:stun1.l.google.com:19302'},
{'urls': 'stun:stun2.l.google.com:19302'},
and also an additional Turn Server from numb.viagenie.ca
Connection is established and within the first 10 seconds the quality is very high and there is no lagging at all. But then after 10-20 seconds there is lagging and on the Ubuntu console I am getting these UDP errors.
The PC that Ubuntu is running on:
PROCESSOR / CHIPSET:
CPU Intel Core i3 (2nd Gen) 2310M / 2.1 GHz
Number of Cores: Dual-Core
Cache: 3 MB
64-bit Computing: Yes
Chipset Type: Mobile Intel HM65 Express
RAM:
Memory Speed: 1333 MHz
Memory Specification Compliance: PC3-10600
Technology: DDR3 SDRAM
Installed Size: 4 GB
Rated Memory Speed: 1333 MHz
Graphics
Graphics Processor Intel HD Graphics 3000
Could please anyone give me some hints or anything that could solve this problem?
Thank you
==============EDIT=============
I found in my very large strace log somewhere these two lines:
7671 sendmsg(17, {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 11}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 11
7661 <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"CHILD_PING\0", 12}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=7671, uid=0, gid=0}}], msg_flags=0}, 0) = 11
On top of that, somewhere near when the error happens (at the end of the log file, just before I quit the application) I see in the log file the following:
https://gist.github.com/Mcdane/2342d26923e554483237faf02cc7cfad

First, to get an impression of what is happening in the first place, I'd look with strace. Start your application with
strace -e network -o log.strace -f YOUR_APPLICATION
If your application looks for another running process to turn the work too, start it with parameters so it doesn't do that. For instance, for Chrome, pass in a --user-data-dir value that is different from your default.
Look for = 11 in the output file log.strace afterwards, and look what happened before and after. This will give you a rough picture of what is happening, and you can exclude silly mistakes like sendtos to 0.0.0.0 or so (For this reason, this is also very important information to include in a stackoverflow question, for instance by uploading the output to gist).
It may also be helpful to use Wireshark or another packet capture program to get a rough overview of what is being sent.
Assuming you can confirm with strace that a valid send call is taken place, you can then further analyze the error conditions.
Error 11 is EAGAIN. The documentation of send says when this error is supposed to happen:
EAGAIN (...) The socket is marked nonblocking and the requested operation would block. (...)
EAGAIN (Internet domain datagram sockets) The socket referred to by
sockfd had not previously been bound to an address and, upon
attempting to bind it to an ephemeral port, it was determined that all
port numbers in the ephemeral port range are currently in use. See
the discussion of /proc/sys/net/ipv4/ip_local_port_range in
ip(7).
Both conditions could apply.
The first will be obvious by the strace log if you trace the creation of the socket involved.
To exclude the second, you can run netstat -una (or, if you want to know the programs involved, sudo netstat -unap) to see which ports are open (if you want Stack Overflow users to look into it, post the output on gist or similar and link to it here). Your port range net.ipv4.ip_local_port_range=1024 65535 is not the standard 32768 60999; this looks like you attempted to do something about lacking port numbers already. It would help to trace back to the reason of why you changed that parameter, and the conditions that convinced you to do so.

Related

Apache Ignite 3 Beta 1 examples fail in ClientMessageDecoder.readMagic() checking for the MAGIC_BYTES

I'm running a new instance of the Apache Ignite 3 beta on Windows and am hoping someone might recognize the error reading the MAGIC_BYTES I'm seeing trying to run the examples.
The cluster starts successfully and I can connect via the CLI; e.g.: 'node status' shows [name: defaultNode, state: started]
However, when I attempt to run any of the examples, such as SqlJdbcExample, it fails in ClientMessageDecoder.readMagic(). In there it is attempting to read the 4 MAGIC_BYTES (representing the 4 ASCII characters 'INGI').
What I see instead are the bytes 0x01, 0x00, 0x03, 0x00
This ultimately results in an IgniteException:
IGN-CMN-65535 TraceId:xxxx Invalid magic header in thin client connection. Expected 'IGNI', but was '▯▯'.
(Note: in debug, if I read more bytes out of the buffer I can see those 4 bytes are followed by an ASCII newline character (0x12) then the ASCII 'defaultNode'.)
In the SqlJdbcExample when it is initializing the JDBC connection, I can see that is has called socket.send() with those MAGIC_BYTES IN TcpClientChannel.handshakeReq().
I am running the examples with no change to any configuration files and have set up the environment as per the documentation.
Set up Apache Ignite 3 beta and ran samples. They failed with
IgniteException: IGN-CMN-65535 TraceId:xxxx Invalid magic header in thin client connection. Expected 'IGNI', but was '▯▯'.
Verified as best I can that everything is configured correctly, but can't determine why I'm not picking up these bytes.
Ignite examples use port 10800 by default. Looks like something else is using that port on your machine, so Ignite server is on a different port.
Look into the log file in ignite3-db-3.0.0-beta1 directory (ignite3db-0.log file name), and search for Thin client protocol started successfully[port=10800] line. The port is likely different. Copy the port number and update the connection string in the example accordingly

MQTT Error BR_ERR_BAD_VERSION on Shelly 1PM with Tasmota

I try to connect a Shelly 1 PM smart power relay to a managed MQTT broker.
The firmware on the device is a custom-built Tasmota 8.3.1 from the dev branch with USE_MQTT_TLS enabled. The port is set correctly to 8883 for TLS and the broker service is running at mqtt.bosch-iot-hub.com
When the device boots up, I can see the log messages on the serial port as follows:
23:03:03 MQT: Connect failed to mqtt.bosch-iot-hub.com:8883, rc 4. Retry in 10 sec
23:03:14 MQT: Attempting connection...
23:03:14 MQT: TLS connection error: 0
Return Code 4 is, according to the Tasmota documentation (https://tasmota.github.io/docs/TLS/), the code for BR_ERR_BAD_VERSION
And this error constant seems to be from BearSSL and means "Incoming record version does not match the expected version." (according to http://sources.freebsd.org/HEAD/src/contrib/bearssl/tools/errors.c)
Using an online TLS testing tool and checking mqtt.bosch-iot-hub, it supports only TLS 1.2 (1.3, 1.1 and 1.0 being disabled as well as SSLv2 and SSLv3). BearSSL website states that it supports TLS 1.2
I tried setting the log level of Tasmota in my_user_config.h , but it does not log any more verbose or detailed information.
#define SERIAL_LOG_LEVEL LOG_LEVEL_DEBUG_MORE // [SerialLog] (LOG_LEVEL_NONE, LOG_LEVEL_ERROR, LOG_LEVEL_INFO, LOG_LEVEL_DEBUG, LOG_LEVEL_DEBUG_MORE)
What is the error message supposed to mean? Is it a TLS incompatibility of the BearSSL stack or on the service side?
How can I enable verbose logging on Tasmota to see detailed TLS handshake information?
Anything else I am missing?
I appreciate after 6 months the question may have been a little expired, however the error code is not the TLS one as you describe, but rather the return code for the MQTT connection, as described in
https://tasmota.github.io/docs/MQTT/#return-codes-rc
which means your error code corresponds to
4 MQTT_CONNECT_BAD_CREDENTIALS the username/password were rejected

Timeout during allocate while making RFC call

I am trying to create a SAP RFC connection to a new system.
AFAIK the firewall (in this case to port 3321) is open.
I get this message at the client:
RFC_COMMUNICATION_FAILURE (rc=1): key=RFC_COMMUNICATION_FAILURE, message=
LOCATION SAP-Gateway on host ax-swb-q06.prod.lokal / sapgw21
ERROR timeout during allocate
TIME Thu Jul 26 16:45:48 2018
RELEASE 753
COMPONENT SAP-Gateway
VERSION 2
RC 242
MODULE /bas/753_REL/src/krn/si/gw/gwr3cpic.c
LINE 2210
DETAIL no connect of TP sapdp21 from host 10.190.10.32 after 20 sec
COUNTER 3
[MSG: class=, type=, number=, v1-4:=;;;]
And this message on the SAP server
Any clue what needs to be done, to get RFC working?
With this little info no one can know what the issue is here.
But it is something related to your network and SAP system configuration.
I guess your firewall does some network address translation (NAT) and the new IP behind the firewall does not match anymore with the known one. SAP is doing some own IP / host name security checks.
If not already done, check with opening the ports 3221, 3321 and 4821 in the firewall. Also check the SAP gateway configuration which IP addresses and host names are configured to be valid ones for it (look at what is traced in the beginning of the gateway trace file dev_rd at ABAP side).
Also consider if maybe the usage of a SAProuter would be the better option for your needs.
it works in my case if ashost is the host name, and not an IP address!
Do not ask me why, but this fails:
Connection(user='x', passwd='...', ashost='10.190.10.32', sysnr='21', client='494')
But this works:
Connection(user='x', passwd='...', ashost='ax-swb-q06.prod.lokal', sysnr='21', client='494')
This is strange, since DNS resolution happens before TCP communication.
It seems that the ashost value gets used inside the connection. Strange. For most normal protocols (http, ftp, pop3, ...) this does not matter. Or you get at least a better error message.

apache2 processes stuck in sending reply - W

I am hosting multiple sites on a server with 7.5gb RAM. Using apache2 mpm_prefork.
Following command gives me a value of 200-300 in production
ps aux|grep -c 'apache2'
Using top i see only some hundred megabytes of RAM is free. Error log show nothing unusual. Is this much apache2 process normal?
MaxRequestWorkers is set to 512
Update:
Now i am using mod-status to check apache activity.
I have a row like this
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 29342 2/2/70 W 0.07 5702 0 3.0 0.00 1.67 XXX XXX /someurl
If i check again after sometime PID not changes and i get SS with greater value that previous time. M of this request is in 'W` sending reply state. So that means apache2 process locked in for that request?
On my VPS and root servers, the situation is partially similar. AFAIK the os tries to distribute most of the processing power/RAM to running processes and frees the resources for other processes as the need arises.

tftp retry timeout exceeded

My issue is retry count exceeds when I download kernel image to Econa processor board (Econa is ARM based processor) via TFTP as shown below
CNS3000 # tftp 0x4000000 bootpImage.cns3420.uclibc
MAC PORT 0 : Initialize bcm53115M
MAC PORT 2 : Initialize RTL8211
TFTP from server 192.168.0.219; our IP address is 192.168.0.112
Filename 'bootpImage.cns3420.uclibc'.
Load address: 0x4000000
Loading: T T T T T T T T T T
Retry count exceeded; starting again
Following are the points which may help you in finding the cause of this error.
Ping response is OK
CNS3000 # ping 192.168.0.219
MAC PORT 0 : Initialize bcm53115M
MAC PORT 2 : Initialize RTL8211
host 192.168.0.219 is alive
When I tried to verify TFTP is running, I tried as shown below. It seems TFTP server is working. I placed a small file in /tftpboot:
# echo "Hello, embedded world" > /tftpboot/hello.txt"
Then I did localhost
# tftp localhost
tftp> get hello.txt
Received 23 bytes in 0.1 seconds
tftp> quit
Please note that there is no firewall or SELinux on my machine.
Please verify location of these files are OK. I have placed kernel image file bootpImage.cns3420.uclibc in /tftpbootTFTP service file is located in /etc/xinetd.d/tftp.
My TFTP service file is:
service tftp
{
socket_type =dgram
protocol=udp
wait=yes
user=root
server=/usr/sbin/in.tftpd
server_args=-s /tftpboot -b 512
disable=no
per_source=11
cps=100 2
flags=ipv4
}
printenv response in U-boot is:
CNS3000 # printenv
bootargs=root=/dev/mtdblock0 mem=256M console=ttyS0
baudrate=38400
ethaddr=00:53:43:4F:54:54
netmask=255.255.0.0
tftp_bsize=512
udp_frag_size=512
mmc_init=mmcinit
loading=fatload mmc 0 0x4000000 bootpimage-82511
running=go 0x4000000
bootcmd=run mmc_init;run loading;run running
serverip=192.168.0.219
ipaddr=192.168.0.112
bootdelay=5
port=1
bootfile=/tftpboot/bootpImage.cns3420.uclibcl
stdin=serial
stdout=serial
stderr=serial
verify=n
Environment size: 437/4092 bytes
Regards
Waqas
Loading: T T T T T T T T T T
Means there is no transfer at all; this can be caused by wrong interface setting i.e.
u-boot is configured for 100Mbit full duplex, and you try to connect via half duplex or 10Mbit (or some mix of it). Another point is the MTU size, should be 1500 (u-boot cannot handle packet fragmentation)
Hint for windows/vmware users:
tftp timeouts from u-boot are caused by windows ip-forwarding.
1) If you have a home network : switch it of.
2) You are running Routing and Remote Access service : shut down service
3) check registry for ip forwarding:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\IPEnableRouter
set value to 0 (and maybe reboot)