IPVS (keepalived) doesn't balance UDP connections

I have two load balancers running Debian 8 and three Graylog servers running Debian 9.
Every server in my network sends logs via rsyslog to a virtual server configured on the LB. The connection is UDP.
The problem is that the packets are not balanced: all connections go to the first real server in the list.
In case of failover, the packets are correctly sent to the other real servers.
The only way I found to re-balance the connections is to remove all real servers from the LB and then restart the keepalived service.
I already tried to set:
ipvsadm --set 0 0 1
Timeout (tcp tcpfin udp): 900 120 1
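(For reference, the line above is what the matching list command prints:
ipvsadm -L --timeout
)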
I already set these two variables:
echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn
echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template
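A minimal sketch to make these two settings survive a reboot (the drop-in file name 90-ipvs.conf is illustrative):
cat <<'EOF' > /etc/sysctl.d/90-ipvs.conf
net.ipv4.vs.expire_nodest_conn = 1
net.ipv4.vs.expire_quiescent_template = 1
EOF
sysctl --system   # reload all sysctl drop-in files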
IPVS is configured as follows:
vrrp_instance logserver {
    state MASTER
    interface eth0
    virtual_router_id 195
    priority 200
    advert_int 1
    authentication {
        auth_type keepalived
        auth_pass xxxxxx
    }
    virtual_ipaddress {
        10.20.20.195/22
    }
}

virtual_server 10.20.20.195 0 {
    delay_loop 60
    protocol UDP
    lb_algo wrr
    lb_kind DR
    persistence_timeout 30
    real_server 10.20.20.196 0 {
        weight 100
        MISC_CHECK {
            connect_timeout 3
            misc_path "/etc/keepalived/checkgraylog 10.20.20.196"
        }
    }
    real_server 10.20.20.197 0 {
        weight 100
        MISC_CHECK {
            connect_timeout 3
            misc_path "/etc/keepalived/checkgraylog 10.20.20.197"
        }
    }
    real_server 10.20.20.198 0 {
        weight 100
        MISC_CHECK {
            connect_timeout 3
            misc_path "/etc/keepalived/checkgraylog 10.20.20.198"
        }
    }
}
Is there a way to effectively balance UDP connections with Direct Routing?
Thank you

virtual_server 10.20.20.195 12333 {
    delay_loop 60
    protocol UDP
    lb_algo wrr
    lb_kind DR
    ops # <<< - Try this. Works for me (Ubuntu 18.04, Keepalived v1.3.9, ipvsadm v1.28)
    real_server 10.20.20.196 12333 {
The ops option works for me only if either:
the virtual server port is explicitly defined, or
an fwmark is used in the virtual_server definition.
It does not work with the virtual_server <IP> 0 form; in that case ipvsadm -Ln shows that the persistent option is still in use.
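To confirm that ops took effect and that traffic actually spreads, something like the following should work (a sketch):
ipvsadm -Ln         # the UDP service should no longer be listed with 'persistent'
ipvsadm -Ln --stats # per-real-server packet counters; all three should now grow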

ASP.NET Core SignalR websocket connection limit

I'm load testing a SignalR (ASP.NET Core) application hosted on Windows Server 2016 Standard, using Microsoft.AspNetCore.SignalR.Client.
.NET Core Hosting 2.1.1 is installed.
I cannot create more than 3000 (2950-3050) connections.
I have already tried the recommendations described here:
How to configure concurrency in .NET Core Web API?
Limiting performance factors of WebSocket in ASP.NET 4.5?
Set limit concurrent connections for websocket on iis 8
Added limits to UseKestrel (this seems to work if I set the values to 100 or 1000):
var host = new WebHostBuilder()
    .UseKestrel(options =>
    {
        options.Limits.MaxConcurrentConnections = 50000;
        options.Limits.MaxConcurrentUpgradedConnections = 50000;
    })
Changed all aspnet.config files by adding this:
<system.web>
    <applicationPool maxConcurrentRequestsPerCPU="50000" />
</system.web>
Executed this command:
cd %windir%\System32\inetsrv
appcmd.exe set config /section:system.webServer/serverRuntime /appConcurrentRequestLimit:50000
Added performance counters for Web Service\Current Connections and Maximum Connections: Maximum Connections increases to 3300 and then stops.
There are no exceptions in the server logs, but I suspect there is some restriction in the system.
Server IIS logs contains only this:
GET /messageshub
id=A_3x1sH9kHM1Rc3oPSgP6w
80 - 172.20.192.11 - - 404 0 0 3
The client exception is basically the following:
System.Net.Http.HttpRequestException: Error while copying content to a
stream. ---> System.IO.IOException: Unable to read data from the
transport connection: An existing connection was forcibly closed by
the remote host.
On Windows you may have a dynamic port assignment issue.
By default, Windows has 5000 port numbers ready to be assigned to TCP connections, and 1024 of them are reserved for the OS itself, which leaves you with 3976 ports free to be assigned.
In your case the number is 3300 as you mentioned, but it's possible that 3300 of the connections are established and the remaining ~676 are in TIME_WAIT.
In any case, I recommend running
netstat -an | find /c "ESTABLISHED"
netstat -an | find /c "TIME_WAIT"
netstat -an | find /c "CLOSE_WAIT"
to figure out the number of ESTABLISHED, TIME_WAIT and CLOSE_WAIT connections at the time you receive the IO exception. If the total is close to 5000, add this to your registry, reboot, and test again:
[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters]
MaxUserPort = 65534 (Default = 5000, Max = 65534)
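Note that on Windows Server 2008 and later the ephemeral port range is controlled with netsh rather than the MaxUserPort registry value; a sketch (from an elevated prompt):
netsh int ipv4 show dynamicport tcp
netsh int ipv4 set dynamicport tcp start=10000 num=55535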

HyperV Gen2 VM not booting over PXE

I have two VMs in HyperV, both on the same virtual switch (internal) and on the same subnet. I am trying to set up one as a DHCP and TFTP server for PXE boot. With a Gen1 machine it works fine with pxelinux; unfortunately, Gen2 with UEFI does not.
DHCP & TFTP Server
IP 192.168.1.2
VLAN identification is disabled
DHCP - ISC DHCP Server running in a docker container with "host" network type with the following configuration:
set vendorclass = option vendor-class-identifier;
option pxe-system-type code 93 = unsigned integer 16;
set pxetype = option pxe-system-type;
authoritative;
default-lease-time 7200;
max-lease-time 7200;
option tftp-server-name "192.168.1.2";
option bootfile-name "efi/core.efi";

subnet 192.168.1.0 netmask 255.255.255.0 {
    interface "eth0:0";
    option routers 192.168.1.1;
    option subnet-mask 255.255.255.0;
    range 192.168.1.100 192.168.1.150;
    option broadcast-address 192.168.1.255;
    option domain-name-servers 8.8.8.8, 8.8.4.4;
    option domain-name "ad.lholota.net";
    option domain-search "ad.lholota.net";
    if substring(vendorclass, 0, 9) = "PXEClient" {
        if pxetype = 00:06 or pxetype = 00:07 {
            filename "efi/core.efi";
        } else {
            filename "pxelinux/pxelinux.0";
        }
    }
    next-server 192.168.1.2;
}
TFTP - tftp-hpa running in a docker container on a "host" type network. I can download the efi files manually through a standard tftp client.
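For reference, a manual download with the tftp-hpa client looks like this (a sketch; the path matches the bootfile-name above):
tftp 192.168.1.2 -c get efi/core.efi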
Booting machine
HyperV Gen2
No virtual HDD or DVD
Firmware tab has only one item in the boot sequence - network
Secure boot is disabled
VLAN identification is disabled
Network adapter pointing into the same internal switch as the first VM
Enable virtual machine queue - checked
Enable IPsec task offloading - checked, maximum number: 512
MAC Address dynamic
Enable DHCP guard - NOT checked
Enable router advertisement guard - NOT checked
Protected network - NOT checked
Mirroring mode - None
Enable device naming - NOT checked
The trouble is that the machine never even reaches the TFTP server, because it doesn't finish the DHCP Discover-Offer-Request-Ack flow. It gets stuck at the offer, as shown in the dhcpdump output below; the booting machine never sends the request message. Funnily enough, the BIOS-based Gen1 HyperV machine boots without any issue, so the DHCP flow works there.
Can you please give me a hint of what might be wrong?
TIME: 2018-07-11 19:49:37.641
IP: 0.0.0.0 (0:15:5d:0:50:d0) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: 8bf1c250
SECS: 0
FLAGS: 7f80
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:15:5d:00:50:d0:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 1 (DHCPDISCOVER)
OPTION: 57 ( 2) Maximum DHCP message size 1472
OPTION: 55 ( 35) Parameter Request List 1 (Subnet mask)
2 (Time offset)
3 (Routers)
4 (Time server)
5 (Name server)
6 (DNS server)
12 (Host name)
13 (Boot file size)
15 (Domainname)
17 (Root path)
18 (Extensions path)
22 (Maximum datagram reassembly size)
23 (Default IP TTL)
28 (Broadcast address)
40 (NIS domain)
41 (NIS servers)
42 (NTP servers)
43 (Vendor specific info)
50 (Request IP address)
51 (IP address leasetime)
54 (Server identifier)
58 (T1)
59 (T2)
60 (Vendor class identifier)
66 (TFTP server name)
67 (Bootfile name)
97 (UUID/GUID)
128 (???)
129 (???)
130 (???)
131 (???)
132 (???)
133 (???)
134 (???)
135 (???)
OPTION: 97 ( 17) UUID/GUID 008c0c7ab81331a0 ...z..1.
4297445b2e41610e B.D[.Aa.
a8 .
OPTION: 94 ( 3) Client NDI 010300 ...
OPTION: 93 ( 2) Client System 0007 ..
OPTION: 60 ( 32) Vendor class identifier PXEClient:Arch:00007:UNDI:003000
---------------------------------------------------------------------------
TIME: 2018-07-11 19:49:37.641
IP: 0.0.0.0 (0:15:5d:0:50:12) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 2 (BOOTPREPLY)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: 8bf1c250
SECS: 0
FLAGS: 7f80
CIADDR: 0.0.0.0
YIADDR: 192.168.1.105
SIADDR: 192.168.1.2
GIADDR: 0.0.0.0
CHADDR: 00:15:5d:00:50:d0:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: efi/core.efi.
OPTION: 53 ( 1) DHCP message type 2 (DHCPOFFER)
OPTION: 51 ( 4) IP address leasetime 7200 (2h)
OPTION: 1 ( 4) Subnet mask 255.255.255.0
OPTION: 3 ( 4) Routers 192.168.1.1
OPTION: 6 ( 8) DNS server 8.8.8.8,8.8.4.4
OPTION: 15 ( 14) Domainname ad.lholota.net
OPTION: 28 ( 4) Broadcast address 192.168.1.255
I have had what I believe is the same issue when booting HyperV virtual machines on Win10 2004 (19041.685): Gen1 works, Gen2 times out without ever asking for the boot file.
I strongly suspect this is an issue with the Gen2 UEFI PXE implementation, because as soon as I have at least two entries to choose from in the PXE boot menu, it requests files and downloads as expected.
I run dnsmasq for TFTP and DHCP, and my config file below works if and only if at least one of the last two rows is uncommented (pxe-service=x86-64_EFI and pxe-service=7 are equivalent).
config context: https://linuxconfig.org/how-to-configure-a-raspberry-pi-as-a-pxe-boot-server
# /etc/dnsmasq.d/03-tftpboot.conf
enable-tftp
tftp-lowercase
tftp-root=/mnt/data/netboot
pxe-prompt="Choose:"
pxe-service=x86PC,"PXELINUX (BIOS)",bios/pxelinux.0
pxe-service=x86PC,"WinPE (BIOS)",boot/pxeboot.n12
pxe-service=x86-64_EFI,"PXELINUX (EFI)",efi64/syslinux.efi
pxe-service=x86-64_EFI,"winpe (EFI)",boot/wdsmgfw.efi
#pxe-service=7,"PXELINUX (EFI-7)",efi64/syslinux.efi
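As a side note, dnsmasq can syntax-check a configuration file before you restart the service (a sketch, pointing -C at the file above):
dnsmasq --test -C /etc/dnsmasq.d/03-tftpboot.conf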
I think I am experiencing the same problem when using the Digital Rebar provisioner: it works great on Gen1 but not on Gen2, with the same configuration.
Looking at the Digital Rebar code, it seems like it should work, but it does not: https://github.com/digitalrebar/provision/blob/8269e1c7ff12a82854c19eccd114d064e2278211/midlayer/pxe.go#L252
I think this could be related:
https://wiki.fogproject.org/wiki/index.php/BIOS_and_UEFI_Co-Existence
https://serverfault.com/questions/739138/hyper-v-2016-gen2-vm-pxe-dhcp-timeout-wireshark-dhcp-discover-offer

Broken pipe error on query from aerospike

I have namespace "test" and set "demo".
When I run "select * from test.demo" in the aql terminal, I get a broken pipe error. What exactly causes a broken pipe?
I also get a warning message in the server log.
My aerospike.conf is:
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine memory
}

namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    # storage-engine device {
    #     file /opt/aerospike/data/bar.dat
    #     filesize 16G
    #     data-in-memory true # Store data in memory in addition to file.
    # }
}
Can somebody figure out the reason?
I think you are getting a socket error when trying to send the scan result to a socket that has already timed out on the client side:
Error: (-10) Socket read error: 11, [::1]:3000, 36006
By default, the aql timeout is set to 1000 ms.
It can be bumped up, e.g. to 100000 ms, using the -T command line option (or using set timeout within the aql interactive mode):
aql -T 100000
-T, --timeout <ms> Set the timeout (ms) for commands. Default: 1000
This option is equivalent to setting TotalTimeout on other clients.
Setting the timeout higher should help, but doesn't answer why a basic scan would take so long.
Here is an example with different client timeouts, showing the clients timing out before the scan result is received. In the logs you would see the TCP send error for the scan:
WARNING (proto): (proto.c:693) send error - fd 32 Broken pipe
Details from aql console:
aql> set timeout 10
TIMEOUT = 10
aql> select * from test.demo
Error: (-10) Socket read error: 11, 127.0.0.1:3000, 58496
aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58498
aql> set timeout 100
TIMEOUT = 100
aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58492
aql> set timeout 1000
TIMEOUT = 1000
aql> select * from test.demo
+-----+-------+
| foo | bar |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.341 secs)
It's still a mystery why your aql client would time out when returning one record, if the default timeout was kept at 1000 ms. Did you by any chance modify the timeout? Or do you have a huge number of records in the test namespace with null sets?
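To check the latter, asinfo (shipped with the Aerospike tools) can report object counts per set and per namespace; a sketch:
asinfo -v "sets/test"
asinfo -v "namespace/test" | tr ';' '\n' | grep object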

Aerospike heartbeat configuration for single server, error "Unable to find any suitable network device for node ID"

I want to run Aerospike server in single-server mode.
Now I have this configuration:
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    console {
        context any info
    }
}

network {
    service {
        address 127.0.0.1
        port 3000
    }
    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test {
    replication-factor 1
    memory-size 20M
    default-ttl 1d # 1 day, use 0 to never expire/evict.
    storage-engine memory
}
When I try to start the server, I get this error in the log:
"Unable to find any suitable network device for node ID"
I don't want the server to be reachable from the internet.
How can I achieve this and fix the issue?
The node ID is generated from the MAC address of an interface on the host:
https://github.com/aerospike/aerospike-server/blob/master/cf/src/socket.c#L2470
If you don't have any of the default interface names that Aerospike is aware of, you may get this error.
To fix this problem, you can specify your interface name:
http://www.aerospike.com/docs/operations/troubleshoot/startup#problem-with-network-interface
To avoid exposing your Aerospike node to the internet, you can bind it only to localhost or to a private interface, or use other network tools/devices, such as a firewall or ACLs, to shield the server port. The best way to avoid exposing Aerospike to the internet is to ensure that the server hosting it is not exposed to the internet in the first place. If that is not doable, restrict access to the Aerospike port to your Aerospike clients' IPs only, using a firewall (see the sketch after the link below). You can also use database credentials, available in the Enterprise Edition:
http://www.aerospike.com/docs/guide/security.html
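A minimal sketch of that firewall restriction with iptables (10.0.0.0/24 stands in for your client network):
iptables -A INPUT -p tcp --dport 3000 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 3000 -j DROP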

Aerospike - “n_bytes_memory went negative” with in-memory-only namespace with TTL

We have a namespace configured to store data in memory only, with a default TTL of a couple of minutes. After we start putting data into it and expiration kicks in, we get these messages in the log (a lot of them, for ~30% of expired records):
WARNING (namespace): (namespace.c::762) set_id 1 - n_bytes_memory went negative!
I have a simple client app with a server config that can reproduce this: https://github.com/akkomar/aerospike-test (it's based on Docker and is very easy to start)
Any advice on what might be the reason?
Edit:
I checked this on versions 3.6.4, 3.7.0.1 and 3.7.4
Configuration file used for testing (from https://github.com/akkomar/aerospike-test/blob/master/etc/aerospike.conf):
service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 1024
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    console {
        context any info
        context namespace detail
    }
}

network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002
        mesh-port 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test_ns {
    replication-factor 2
    memory-size 1G
    default-ttl 10S
    storage-engine memory
}
Edit 2:
It seems that this happens only when I update records via a UDF. The simplest one that reproduces it:
local VAL_KEY = "v"

-- Write val_to_add into the record's "v" bin, updating the record if it
-- already exists and creating it otherwise. (Note: ttl_to_set is unused.)
function add_data(rec, val_to_add, ttl_to_set)
    if aerospike:exists(rec) then
        rec[VAL_KEY] = val_to_add
        aerospike:update(rec)
    else
        rec[VAL_KEY] = val_to_add
        aerospike:create(rec)
    end
end
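For completeness, this is roughly how such a UDF is registered and invoked from aql (a sketch; the module file name add_data.lua, set name demo and key k1 are illustrative):
aql> register module 'add_data.lua'
aql> execute add_data.add_data('some-value', 600) on test_ns.demo where PK='k1'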
When I execute the same operation via the Java API, everything seems to work fine (the example GitHub repo mentioned earlier has been updated with a Java API example).
The meaning of the error message is that the memory space we have accounted for the set went to a negative number, which should not be possible.
This has been logged in our internal bug tracking system for resolution in a future release.
It turned out it was a bug in Aerospike.
It's fixed in version 3.7.4.1 (detailed explanation in https://discuss.aerospike.com/t/problem-with-expiring-records-in-memory-only-namespace-n-bytes-memory-went-negative/2560/6)