Tinc/SHH/IPSec: tuning for high throughput

Tinc/SHH/IPSec: tuning for high throughput - ssh

I have a dedicated 128GB ram server running memcached. 4 web servers connect to that one. They send a total of around
20k packets/sec.
Recently I decided to change connection from webservers to the memcached server from persistent SSH tunnels to using Tinc (for simplicity of setup and flexibility whenever I needed them to communicate on a new port).
This change has caused the overhead on the network roundtrip to increase significantly (see graphs). I noticed however, that the network overhead of using Tinc in favor of SSH-tunnels is a lot smaller (even faster than the previous SSH-tunnels!), when I use it for communicating between servers (e.g. my Postgresql database server), where the throughput is a lot lower < 10k packet per sec. I tried to distribute the memcached load between more servers, and suddenly the overhead from tinc/network dropped significantly.
Now, I do not understand WHY the tinc network overhead increases so dramatically, as the throughput goes up? It's like I hit some kind of bottle neck (and it defiantly is not CPU, since Newrelic report < 0.5% usage for the tinc process). Is there something I could tune in the Tinc setup, or is Tinc just a bad choice for high throughput? Should I use IPsec instead?

Related

RabbitMQ poor performance

We are facing bad performance in our RabbitMQ clusters. Even when idle.
Once installed the rabbitmq-top plugin, we see many processes with very high reductions/sec. 100k and more!
Questions:
What does it mean?
How to control it?
What might be causing such slowness without any errors?
Notes:
Our clusters are running on Kubernetes 1.15.11
We allocated 3 nodes, each with 8 CPU and 8 GB limits. Set vm_watermark to 7G. Actual usage is ~1.5 CPU and 1 GB RAM
RabbitMQ 3.8.2. Erlang 22.1
We don't have many consumers and producers. The slowness is also on a fairly idle environment
The rabbitmqctl status is very slow to return details (sometimes 2 minutes) but does not show any errors

After some more investigation, we found the actual reason was made up of two issues.
RabbitMQ (Erlang) run time configuration by default (using the bitnami helm chart) assigns only a single scheduler. This is good for some simple app with a few concurrent connections. Production grade with 1000s of connections have to use many more schedulers. Bumping up from 1 to 8 schedulers improved throughput dramatically.
Our monitoring that was hammering RabbitMQ with a lot of requests per seconds (about 100/sec). The monitoring hits the aliveness-test, which creates a connection, declares a queue (not mirrored), publishes a message and then consumes that message. Disabling the monitoring reduced load dramatically. 80%-90% drop in CPU usage and the reductions/sec also dropped by about 90%.
References
Performance:
https://www.rabbitmq.com/runtime.html#scheduling
https://www.rabbitmq.com/blog/2020/06/04/how-to-run-benchmarks/
https://www.rabbitmq.com/blog/2020/08/10/deploying-rabbitmq-to-kubernetes-whats-involved/
https://www.rabbitmq.com/runtime.html#cpu-reduce-idle-usage
Monitoring:
http://rabbitmq.1065348.n5.nabble.com/RabbitMQ-API-aliveness-test-td32723.html
https://groups.google.com/forum/#!topic/rabbitmq-users/9pOeHlhQoHA
https://www.rabbitmq.com/monitoring.html

Optimise play framework instance for 64mb server

I trying to have the best optimisation in my play
framework server.
I try to optimize with :
%prod.jvm.memory=-server -Xms64m -Xmx128m -Xoptimize
# Jobs executor
# ~~~~~~
# Size of the Jobs pool
play.jobs.pool=2
# Execution pool
# ~~~~~
# Default to 1 thread in DEV mode or (nb processors + 1) threads in
PROD mode.
# Try to keep a low as possible. 1 thread will serialize all requests
(very useful for debugging purpose)
play.pool=5
However I did not success to have good perf on 256 mb server. it seems
that http://www.playframework.org/ run on 64mb server and it work
fine. How it is possible ? Have I missed something in optimization?

What do you mean by 256 mb server ? If 256 mb is all the ram of your server, it is not enough.
When you do -Xmx64M you set a maximum limit for your heap size but java also needs memory for native, classloading, threads.
You also need memory for your os.
From my experience, 256 Mb is the lower limit for one java process.

There could be many reasons external to Play that impact performance:
Server too busy (too many processes competing for CPU)
Not enough RAM and server doing Swapping (performance killer)
Slow connection that adds extra delay
You may also have issues in your application:
- Your application is getting too many requests and it requires more RAM to manage the clients
- You are creating too many objects in memory while processing requests, taking most of the RAM (and triggering many GC)
- Connection to database is slow and delays responses
To be honest, there are many reasons why your app may eb slow, many related to your implementation or the server. You'll need to monitor and see what's the issue by yourself (or give us much more data on server performance, ram, swap, i/o, your code, etc)

Apache KeepAlive on API Server

I have a LAMP server (Quad Core Debian with 4GB RAM, Apache 2.2 and PHP 5.3) with Rackspace which is used as an API Server. I would like to know what is the best KeepAlive option for Apache given our setup.
The API server hosts a single PHP file which responds with plain JSON. This is a fairly hefty file which performs some MySql reads/writes and quite a few Memcache lookups.
We have about 90 clients that are logged into the system at any one time.
Roughly 1/3rd of clients would be idle.
Of the active clients (roughly 60) they send a request to the API every 3 seconds.
Clients switch from active to idle and vice versa every 15 or 20 minutes or so.
With KeepAlive On, the server goes nuts and memory peaks at close to 4GB (swap is engaged etc).
With KeepAlive Off, the memory sits at 3GB however I notice that Apache is constantly killing and creating new processes to handle each connection.
So, my three options are:
KeepAlive On and KeepAliveTimeout Default - In this case I guess I will just need to get more RAM.
KeepAlive On and KeepAliveTimeout Low (perhaps 10 seconds?) If KeepAliveTimeout is set at 10 seconds, will a client maintain a constant connection to that one process by accessing the resource at regular 3 second intervals? When that client becomes idle for longer than 10 seconds will the process then be killed? If so I guess option 2 looks like the best one to go for?
KeepAlive Off This is clearly best for RAM, but will it have an impact on the response times due to the work involved in setting up a new process for each request?
Which option is best?

It looks like your php script is leaking memory. Before making them long running processes you should get to grips with that.
If you have not a good idea of the memory usage per request and from request to request adding memory is not a real solution. It might help for now and break again next week.
I would keep running separate processes till memory management is under control. If you have response problems currently your best bet is add another server to spread load.

The very first thing you should be checking is whether the clients are actually using the keepalive functioality at all. I'm not sure what you mean by an 'API server' but if its some sort of webservice then (IME) its rather difficult to implement well behaved clients using keepalives.(See %k directive for mod_log_config).
ALso, we really need to know what your objectives and constraints are? Performance / capacity / low cost?
Is this running over HTTP or HTTPS - there's a big difference in latency.
I'd have said that a keeplive time of 10 seconds is ridiculously high - not low at all.
Even if you've got 90 clients holding connections open, 4Gb seems a rather large amount of memory for them to be using - I'e run systems with 150-200 concurrent connections to complex PHP scripts using approx 0.5Gb over resting usage. Your figures of 250 + 90 x 20M only gives you a footprint of about 2Gb (I know is not that simple - but its not much more complicated).
For the figures you've given I wouldn't expect any benefit - but a significantly bigger memory footprint - using anything over 5 seconds for the keepalive. You could probably use a keepalive time of 2 seconds without any significant loss of throughput, But there's no substitute for measuring the effectiveness of various configs - and analysing the data to find the optimal config.
Certainly if you find that your clients are able to take advantage of keepalives and get a measurable benefit from doing so then you need to find the best way of accomodating that. Using a threaded server might help a little with memory usage, but you'll probably find a lot more benefit in running a reverse proxy in front of the webserver - particularly which SSL.
Besides that you may get significant benefits through normal tuning - code profiling, output compression etc.

Instead of managing the KeepAlive settings, which clearly have no real advantage in your particular situation between the 3 options, you should consider switching the Apache to an event or a thread based MPM where you could easily use KeepAlive On and set the Timeout value high.
I would go as far as also considering the switch to Apache on Windows. The benefit here is that it's MPM is completely thread based and takes advantage of Windows preference for threads over processes. You can easily do 512 threads with KeepAlive On and Timeout of 3-10 seconds on 1-2GB of RAM.
WampDeveloper Pro -
Xampp -
WampServer
Otherwise, your only other options are to switch MPM from Prefork to Worker...
http://httpd.apache.org/docs/2.2/mod/worker.html
Or to Event (which also got better with Apache 2.4)...
http://httpd.apache.org/docs/2.2/mod/event.html

What is the point of WCF MaxReceivedMessageSize

The only time we notice this value appears to be when the service crashes because the value is too low. The quick way to fix this is to set it to some very large number. Then no problem.
What I was wondering about is are there any negative consiquences to setting this value high?
I can see that it can potentially give some protection from a denial of service attack, but does it have any other function?

It helps limit the strain on your WCF server. If you allow 1'000 connections, and each connection is allowed to send you 1 MB of data - you potentially need 1 GB of RAM in your server - or a lot of swapping / trashing might occur.
The limit on the message size (and the limit on the concurrent connections / calls) helps keep that RAM usage (and also CPU usage) to a manageable level.
It also allows you to scale, depending on your server. If you have a one-core CPU and 4 GB or RAM, you probably won't be able to handle quite as much traffic as if you have a 16-way CPU and 32 GB of RAM or more. With the various settings, including the MaxReceivedMessageSize, you can tweak your WCF environment to the capabilities of your underlying hardware.
And of course, as you already mention: many settings in WCF are kept OFF or set to a low value specifically to thwart malicious users from flooding your server with DoS attacks and shutting it down.

Do ping requests put a load on a server?

I have a lot of clients (around 4000).
Each client pings my server every 2 seconds.
Can these ping requests put a load on the server and slow it down?
How can I monitor this load?
Now the server response slowly but the processor is almost idle and the free memory is ok.
I'm running Apache on Ubuntu.

Assuming you mean a UDP/ICMP ping just to see if the host is alive, 4000 hosts probably isn't much load and is fairly easy to calculate. CPU and memory wise, ping is handled by you're kernel, and should be optimized to not take much resources. So, you need to look at network resources. The most critical point will be if you have a half-duplex link, because all of you're hosts are chatty, you'll cause alot of collisions and retransmissions (and dropped pings). If the links are all full duplex, let's calculate the actual amount of bandwidth required at the server.
4000 client #2 seconds
Each ping is 72 bytes on the wire (32 bytes data + 8 bytes ICMP header + 20 bytes IP header + 14 bytes Ethernet). * You might have some additional overhead if you use vlan tagging, or UDP based pings
If we can assume the pings are randomly distributed, we would have 2000 pings per second # 72 bytes = 144000 bytes
Multiple by 8 to get Bps = 1,152,000 bps or about 1.1Mbps.
On a 100Mbps Lan, this would be about 1.1% utilization just for the pings.
If this is a lan environment, I'd say this is basically no load at all, if it's going across a T1 then it's an immense amount of load. So you should basically run the same calculation on which network links may also be a bottle neck.
Lastly, if you're not using ICMP pings to check the host, but have an application level ping, you will have all the overhead of what protocol you are using, and the ping will need to go all the way up the protocol stack, and you're application needs to respond. Again, this could be a very minimal load, or it could be immense, depending on the implementation details and the network speed. If the host is idle, I doubt this is a problem for you.

Yes, they can. A ping request does not put much CPU load on, but it certainly takes up bandwidth and a nominal amount of CPU.
If you want to monitor this, you might use either tcpdump or wireshark, or perhaps set up a firewall rule and monitor the number of packets it matches.

The other problem apart from bandwidth is the CPU. If a ping is directed up to the CPU for processing, thousands of these can cause a load on any CPU. It's worth monitoring - but as you said yours is almost idle so it's probably going to be able to cope. Worth keeping in mind though.
Depending on the clients, ping packets can be different sizes - their payload could be just "aaaaaaaaa" but some may be "thequickbrownfoxjumpedoverthelazydog" - which is obviously further bandwidth requirements again.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas