I have a lot of clients (around 4000).
Each client pings my server every 2 seconds.
Can these ping requests put a load on the server and slow it down?
How can I monitor this load?
Right now the server responds slowly, but the processor is almost idle and free memory is OK.
I'm running Apache on Ubuntu.
Assuming you mean a UDP/ICMP ping just to see if the host is alive, 4000 hosts probably isn't much load, and it is fairly easy to calculate. CPU- and memory-wise, ping is handled by your kernel and should be optimized not to take many resources. So you need to look at network resources. The most critical point is a half-duplex link: because all of your hosts are chatty, you'll cause a lot of collisions and retransmissions (and dropped pings). If the links are all full duplex, let's calculate the actual amount of bandwidth required at the server.
4000 clients / 2 seconds
Each ping is 74 bytes on the wire (32 bytes data + 8 bytes ICMP header + 20 bytes IP header + 14 bytes Ethernet header). You might have some additional overhead if you use VLAN tagging or UDP-based pings.
If we can assume the pings are randomly distributed, we would have 2000 pings per second x 74 bytes = 148,000 bytes per second.
Multiply by 8 to get bits per second: 1,184,000 bps, or about 1.2 Mbps.
On a 100 Mbps LAN, this would be about 1.2% utilization just for the pings, in each direction (the echo replies are the same size).
If this is a LAN environment, I'd say this is basically no load at all; if it's going across a T1, it's an immense amount of load. So you should run the same calculation for any other network link that may be a bottleneck.
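The arithmetic above is easy to redo for any link. A small Python sketch, assuming the pings are spread evenly over time and using the 32 + 8 + 20 + 14 byte frame from the breakdown; the function names are illustrative, and it counts only one direction (echo replies add the same load on the return path):

```python
def ping_bandwidth_bps(clients: int, interval_s: float, frame_bytes: int) -> float:
    """Average bits/s of echo requests arriving at the server (one direction)."""
    pings_per_s = clients / interval_s  # assumes pings are spread evenly over time
    return pings_per_s * frame_bytes * 8


def utilization(bps: float, link_bps: float) -> float:
    """Fraction of the link's capacity consumed by `bps`."""
    return bps / link_bps


FRAME = 32 + 8 + 20 + 14  # payload + ICMP + IP + Ethernet header, in bytes

load = ping_bandwidth_bps(4000, 2.0, FRAME)
for name, link in (("100 Mbps LAN", 100e6), ("T1 (1.544 Mbps)", 1.544e6)):
    print(f"{name}: {utilization(load, link):.1%}")
```

With 4000 clients every 2 seconds this comes to about 1.2% of a 100 Mbps LAN but roughly 77% of a T1, which is why the slowest link in the path matters more than the server itself.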
Lastly, if you're not using ICMP pings to check the host but an application-level ping, you will have all the overhead of whatever protocol you are using: the ping has to go all the way up the protocol stack, and your application needs to respond. Again, this could be a very minimal load or an immense one, depending on the implementation details and the network speed. If the host is idle, I doubt this is a problem for you.
Yes, they can. A ping request does not put much load on the CPU, but it certainly takes up bandwidth and a small amount of CPU time.
If you want to monitor this, you could use tcpdump or Wireshark, or set up a firewall rule and monitor the number of packets it matches.
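On Linux, the kernel already counts incoming echo requests, so you can get a ping rate without capturing packets. A minimal sketch, assuming ICMP pings and the standard /proc/net/snmp layout (for each protocol, a header line of counter names followed by a line of values); if your clients "ping" with HTTP requests to Apache instead, this counter will not see them and the Apache access log is the place to look:

```python
import time
from typing import Dict, List


def parse_proc_net_snmp(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/snmp into {"Icmp": {"InEchos": 123, ...}, ...}."""
    headers: Dict[str, List[str]] = {}
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if proto in headers:
            # Second line for this protocol: the counter values.
            stats[proto] = dict(zip(headers.pop(proto), (int(v) for v in fields)))
        else:
            # First line for this protocol: the counter names.
            headers[proto] = fields
    return stats


def icmp_echo_requests_per_second(interval_s: float = 5.0,
                                  path: str = "/proc/net/snmp") -> float:
    """Sample the kernel's InEchos counter twice and return the average rate."""
    def read_in_echos() -> int:
        with open(path) as f:
            return parse_proc_net_snmp(f.read())["Icmp"]["InEchos"]

    before = read_in_echos()
    time.sleep(interval_s)
    after = read_in_echos()
    return (after - before) / interval_s
```

If all 4000 clients really ping every 2 seconds, icmp_echo_requests_per_second() on the server should report something close to 2000 per second.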
The other problem, apart from bandwidth, is CPU. If a ping is directed up to the CPU for processing, thousands of them can put load on any CPU. It's worth monitoring, but as you said yours is almost idle, so it will probably cope. Worth keeping in mind, though.
Depending on the clients, ping packets can be different sizes: their payload could be just "aaaaaaaaa", but some may be "thequickbrownfoxjumpedoverthelazydog", which obviously adds to the bandwidth requirements.
Related
I have a dedicated server with 128 GB of RAM running memcached. Four web servers connect to it, sending a total of around 20k packets/sec.
Recently I switched the connection between the web servers and the memcached server from persistent SSH tunnels to Tinc (for simplicity of setup, and flexibility whenever I need them to communicate on a new port).
This change has caused the network round-trip overhead to increase significantly (see graphs). However, I noticed that the network overhead of Tinc compared with SSH tunnels is a lot smaller (it's even faster than the previous SSH tunnels!) when I use it for communication between servers where the throughput is much lower, under 10k packets per second (e.g. my PostgreSQL database server). When I tried distributing the memcached load across more servers, the overhead from Tinc/the network suddenly dropped significantly.
What I do not understand is WHY the Tinc network overhead increases so dramatically as the throughput goes up. It's as if I hit some kind of bottleneck (and it is definitely not CPU, since New Relic reports < 0.5% usage for the tinc process). Is there something I could tune in the Tinc setup, or is Tinc just a bad choice for high throughput? Should I use IPsec instead?
I was testing the enqueue and dequeue rates of Redis over a network with 1 Gbps LAN speed; both machines have 1 Gbps Ethernet cards.
Redis version: 3.2.11
I LPUSHed 1L (1 lakh, i.e. 100,000) items of 1 byte each using the Python client.
Dequeuing the items with RPOP over the network took around 55 seconds, which is just 1,800 dequeues/sec, whereas the same operation completes within 5 seconds when I dequeue locally, which is around 20,000 dequeues/sec.
Enqueue rates are close to the dequeue rates.
This was done on the office network when there was not much traffic. The same is observed in production environments too!
A drop of less than 3x over the network would be acceptable, but around 10x looks like I am doing something wrong.
Please suggest whether I need to make any configuration changes on the server or client side.
Thanks in advance.
Retroactively replying in case anyone else discovers this question.
Round-trip latency and concurrency are likely your bottlenecks here. If all of the dequeue calls are made serially, then you are stacking that network latency: with 1 million calls at 2 ms latency, you'd have at least 2 million ms of latency overhead, or about 33 minutes. In other words, your application is waiting for the server to receive the payload, do something, and reply to acknowledge that the operation was successful. Some Redis clients also perform multiple calls to enqueue or dequeue a single job (pop & ack/del), potentially doubling that number.
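The latency arithmetic can be checked against the numbers in the question. A hedged sketch (the function names are illustrative) that computes the pure waiting time for serial calls, and the average time per call implied by the measured runs:

```python
def serial_latency_overhead_s(calls: int, rtt_ms: float, calls_per_msg: int = 1) -> float:
    """Minimum time spent just waiting on the network when calls run one after another."""
    return calls * calls_per_msg * rtt_ms / 1000.0


def implied_ms_per_call(total_s: float, calls: int) -> float:
    """Average time per call, given the measured total for a serial run."""
    return total_s * 1000.0 / calls


# The 1M-call, 2 ms example from the answer, in minutes.
print(serial_latency_overhead_s(1_000_000, 2.0) / 60)
# Network run (55 s) vs local run (5 s) for the question's 100,000 items.
print(implied_ms_per_call(55, 100_000), implied_ms_per_call(5, 100_000))
```

For the question's runs this gives about 0.55 ms per RPOP over the network versus about 0.05 ms locally. An extra ~0.5 ms per call is plausible for a LAN round trip plus client overhead, which is consistent with serial latency, rather than Redis itself, being the limit.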
The following post illustrates how different libraries use Redis keys (Ruby's Resque vs. Clojure's Carmine; pay attention to the multiple Redis commands executed on the Redis server for a single message). This is likely the cause of the 10x drop, compared with the 3x you were expecting.
https://kirshatrov.com/2018/07/20/redis-job-queue/
An oversimplified example of a dequeue that takes two calls per message (1 ms network latency each way, and each Redis server operation takes 1 ms):
time | client                    | server
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1ms  | pop msg     >--(1ms)-->   | receive pop request
2ms  |                           | [process request (1ms)]
3ms  | receive msg <--(1ms)--<   | send msg to client
4ms  | send del    >--(1ms)-->   | receive del
5ms  |                           | [delete msg from queue (1ms)]
6ms  | receive ack <--(1ms)--<   | reply with delete ack
Improving dequeue times often involves using a client that supports multi-threaded or multi-process concurrency (e.g. 10 concurrent workers would significantly reduce the overall time to completion). This keeps your network better utilized by sending a stream of dequeue requests, instead of waiting for one request to complete before sending the next one.
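A minimal sketch of the concurrent approach with Python threads, assuming the redis-py client (`redis.Redis`), whose instances can be shared between threads; the function name and the default of 10 workers are illustrative, not tuned values:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def drain_concurrently(client, key: str, workers: int = 10) -> List[bytes]:
    """Empty the Redis list `key` with `workers` threads issuing RPOPs in parallel.

    `client` is assumed to be a redis-py ``redis.Redis`` instance; each thread
    borrows its own connection from the client's pool. Item order across
    workers is not preserved.
    """
    def worker() -> List[bytes]:
        got: List[bytes] = []
        while True:
            item = client.rpop(key)  # still one round trip per call...
            if item is None:         # list is empty: this worker is done
                return got
            got.append(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # ...but while one thread waits on the network, the others keep
        # their own requests in flight.
        futures = [pool.submit(worker) for _ in range(workers)]
        return [item for f in futures for item in f.result()]
```

Usage would look like `drain_concurrently(redis.Redis(host="redis-host"), "jobs")`, where the host and key names are placeholders. Threads suit this workload because each worker spends nearly all of its time waiting on the socket; if processing each item were CPU-heavy, separate processes would be the better fit.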
As for 1 byte vs 500 bytes, the default Ethernet MTU is 1500 bytes. Subtracting the IP and TCP headers, the payload is ~1460 bytes (less if tunneling with GRE/IPsec, more if using jumbo frames). Since both payload sizes fit in a single TCP packet, they will have similar performance characteristics.
A 1 Gbps Ethernet interface can deliver anywhere between 81,274 and 1,488,095 packets per second (depending on frame size).
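Those two figures follow from Ethernet's fixed per-frame overhead: every frame is preceded by an 8-byte preamble/start delimiter and followed by a 12-byte inter-frame gap. A short sketch, assuming untagged frames whose size includes the header and FCS:

```python
PREAMBLE_AND_SFD = 8  # bytes transmitted before every Ethernet frame
INTERFRAME_GAP = 12   # minimum idle time between frames, in byte times


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Line-rate frames/s; frame_bytes counts header + payload + FCS (64..1518 untagged)."""
    return link_bps / ((frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8)


for frame in (64, 1518):
    print(frame, int(max_frames_per_second(1e9, frame)))
```

Minimum-size 64-byte frames give the upper figure and full 1518-byte frames the lower one. A 1-byte Redis reply travels in a frame close to the minimum size, so the packet rate, not the raw bandwidth, is the ceiling to watch.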
So really, it's a question of how many processes & threads you can run concurrently on the client to keep the network & redis server busy.
Redis is generally I/O bound, not CPU bound. It may be hitting network bandwidth limits. Given the small size of your messages, most of the bandwidth may be eaten by TCP overhead.
On a local machine you are bound by memory bandwidth, which is much faster than your 1Gbps network bandwidth. You can likely increase network throughput by increasing the amount of data you grab at a time.
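One way to grab more per round trip on Redis 3.2.11, which has no multi-item RPOP (the count argument only arrived in Redis 6.2), is pipelining: queue many RPOPs and send them together. A sketch assuming the redis-py client; the function name and the batch size of 500 are arbitrary illustrative choices:

```python
from typing import List


def drain_pipelined(client, key: str, batch_size: int = 500) -> List[bytes]:
    """Pop every item from the Redis list `key`, sending batch_size RPOPs per round trip.

    `client` is assumed to be a redis-py ``redis.Redis`` instance (anything with
    the same pipeline()/rpop()/execute() interface works).
    """
    items: List[bytes] = []
    while True:
        # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
        pipe = client.pipeline(transaction=False)
        for _ in range(batch_size):
            pipe.rpop(key)
        # All queued commands go out, and all replies come back, in one round trip.
        replies = pipe.execute()
        batch = [r for r in replies if r is not None]
        items.extend(batch)
        if len(batch) < batch_size:  # the list ran dry during this batch
            return items
```

Each execute() call costs one round trip instead of batch_size round trips, so 100,000 one-byte items need about 200 round trips at this batch size rather than 100,000.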
I have a LAMP server (quad-core Debian with 4 GB RAM, Apache 2.2 and PHP 5.3) at Rackspace, which is used as an API server. I would like to know what the best KeepAlive option for Apache is, given our setup.
The API server hosts a single PHP file that responds with plain JSON. This is a fairly hefty file that performs some MySQL reads/writes and quite a few Memcache lookups.
We have about 90 clients that are logged into the system at any one time.
Roughly a third of the clients are idle.
The active clients (roughly 60) each send a request to the API every 3 seconds.
Clients switch from active to idle and vice versa every 15 or 20 minutes or so.
With KeepAlive On, the server goes nuts and memory peaks at close to 4 GB (swap is engaged, etc.).
With KeepAlive Off, memory sits at 3 GB; however, I notice that Apache is constantly killing and creating processes to handle each connection.
So, my three options are:
1. KeepAlive On and KeepAliveTimeout at its default: in this case I guess I will just need more RAM.
2. KeepAlive On and a low KeepAliveTimeout (perhaps 10 seconds?): if KeepAliveTimeout is set to 10 seconds, will a client maintain a constant connection to that one process by accessing the resource at regular 3-second intervals? When that client becomes idle for longer than 10 seconds, will the process then be killed? If so, I guess option 2 is the best one to go for?
3. KeepAlive Off: this is clearly best for RAM, but will it hurt response times because of the work involved in setting up a new process for each request?
Which option is best?
It looks like your PHP script is leaking memory. Before making them long-running processes, you should get to grips with that.
If you don't have a good idea of the memory usage per request, and how it changes from request to request, adding memory is not a real solution. It might help for now and break again next week.
I would keep running separate processes until memory management is under control. If you currently have response-time problems, your best bet is to add another server to spread the load.
The very first thing you should check is whether the clients are actually using keepalive at all (see the %k directive in mod_log_config). I'm not sure what you mean by an 'API server', but if it's some sort of web service, then (IME) it's rather difficult to implement well-behaved clients that use keepalives.
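To check this from the logs, you could append %k to your LogFormat and count how many requests arrive on a reused connection. A minimal sketch, assuming %k is the last field on each log line and that your Apache build supports it; the format nickname and log path in the comment are illustrative:

```python
from typing import Iterable

# Assumed (illustrative) Apache config, with %k as the last field:
#   LogFormat "%h %l %u %t \"%r\" %>s %b %k" keepalive
#   CustomLog /var/log/apache2/access.log keepalive


def keepalive_reuse_ratio(lines: Iterable[str]) -> float:
    """Fraction of logged requests served on an already-open (kept-alive) connection.

    %k logs 0 for the first request on a connection and 1, 2, ... for each
    further request reusing it, so any value above 0 means keepalive was used.
    """
    total = reused = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            k = int(fields[-1])
        except ValueError:
            continue  # not in the expected format: skip it
        total += 1
        if k > 0:
            reused += 1
    return reused / total if total else 0.0
```

Passing an open log file (for example `open("/var/log/apache2/access.log")`, a typical Debian path) works because file objects iterate line by line. A ratio near 0 means the clients open a new connection for every request, in which case KeepAlive On only costs memory.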
Also, we really need to know what your objectives and constraints are: performance, capacity, or low cost?
Is this running over HTTP or HTTPS? There's a big difference in latency.
I'd have said that a keepalive time of 10 seconds is ridiculously high, not low at all.
Even if you've got 90 clients holding connections open, 4 GB seems a rather large amount of memory for them to be using. I've run systems with 150-200 concurrent connections to complex PHP scripts using approximately 0.5 GB above resting usage. Your figures of 250 MB + 90 x 20 MB only give a footprint of about 2 GB (I know it's not that simple, but it's not much more complicated).
For the figures you've given, I wouldn't expect any benefit from a keepalive time over 5 seconds, only a significantly bigger memory footprint. You could probably use a keepalive time of 2 seconds without any significant loss of throughput, but there's no substitute for measuring the effectiveness of various configs and analysing the data to find the optimal one.
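A rough model makes the memory side of this concrete. It assumes prefork, that each active client sends one request every 3 seconds, and that a child stays tied to a connection for the service time plus the keepalive wait; the 20 MB per child and 250 MB baseline are the figures quoted above, while the 0.2 s service time is purely an assumed value. It ignores idle clients and Apache's spare servers, and it says nothing about the latency benefit of keepalive:

```python
def busy_children(active_clients: int, request_gap_s: float,
                  service_s: float, keepalive_timeout_s: float) -> float:
    """Rough steady-state number of prefork children held by active clients.

    A child works for `service_s`, then (KeepAlive On) waits up to
    `keepalive_timeout_s` for the next request on the same connection. If that
    wait outlasts the gap between requests, the child is never released.
    keepalive_timeout_s=0 approximates KeepAlive Off.
    """
    held_s = min(request_gap_s, service_s + keepalive_timeout_s)
    return active_clients * held_s / request_gap_s


def footprint_mb(children: float, per_child_mb: float = 20.0,
                 baseline_mb: float = 250.0) -> float:
    """Baseline memory plus a fixed resident size per busy child."""
    return baseline_mb + children * per_child_mb


for timeout in (0, 2, 5, 10):
    # 60 active clients, one request every 3 s, 0.2 s service time (assumed).
    n = busy_children(60, 3.0, 0.2, timeout)
    print(f"KeepAliveTimeout={timeout:>2}s -> ~{n:.0f} children, ~{footprint_mb(n):.0f} MB")
```

Under these assumptions, any timeout of roughly 2.8 seconds or more pins one child per active client permanently, so going from 5 to 10 seconds changes nothing except how long idle connections linger.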
Certainly, if you find that your clients are able to take advantage of keepalives and get a measurable benefit from doing so, then you need to find the best way of accommodating that. Using a threaded server might help a little with memory usage, but you'll probably get a lot more benefit from running a reverse proxy in front of the web server, particularly with SSL.
Besides that you may get significant benefits through normal tuning - code profiling, output compression etc.
Instead of tuning the KeepAlive settings, where none of the three options has a real advantage in your particular situation, you should consider switching Apache to an event- or thread-based MPM, where you could easily use KeepAlive On and set the timeout value high.
I would go as far as considering a switch to Apache on Windows. The benefit there is that its MPM is completely thread-based and takes advantage of Windows' preference for threads over processes. You can easily run 512 threads with KeepAlive On and a timeout of 3-10 seconds on 1-2 GB of RAM.
WampDeveloper Pro
XAMPP
WampServer
Otherwise, your only other options are to switch MPM from Prefork to Worker...
http://httpd.apache.org/docs/2.2/mod/worker.html
Or to Event (which also got better with Apache 2.4)...
http://httpd.apache.org/docs/2.2/mod/event.html
I'm writing both client and server code using WCF, where I need to know the "perceived" bandwidth of traffic between the client and server. I could use ping statistics to gather this information separately, but I wonder if there is a way to configure the channel stack in WCF so that the same statistics can be gathered while performing my web service invocations. This would be particularly useful in cases where ICMP is disabled (i.e. ping won't work).
In short, while making my regular business-related web service calls (REST calls to be precise), is there a way to collect connection speed data implicitly?
Certainly I could time the web service round trip, compared to the size of data used in the round-trip, to give me an idea of throughput - but I won't know how much of that perceived bandwidth was network related, or simply due to server-processing latency. I could perhaps solve that by having the server send back a time delta, representing server latency, so that the client can compute the actual network traffic time. If a more sophisticated approach is not available, that might be my answer...
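The time-delta idea in the last paragraph comes down to simple arithmetic on the client. It is shown in Python here purely for illustration (a WCF client would do the same in C#); the server-reported processing time is the hypothetical extra field described above:

```python
def network_throughput_bps(request_bytes: int, response_bytes: int,
                           round_trip_s: float, server_time_s: float) -> float:
    """Estimate throughput from one call, excluding server-side processing time.

    round_trip_s is measured by the client around the whole call; server_time_s
    is the (hypothetical) processing delta the server sends back. What remains
    is attributed to the network.
    """
    network_s = round_trip_s - server_time_s
    if network_s <= 0:
        raise ValueError("server time must be smaller than the round trip")
    return (request_bytes + response_bytes) * 8 / network_s
```

For small payloads the network time is dominated by latency rather than bandwidth, so a single call gives at best a lower bound on link capacity; averaging over calls that move a substantial amount of data gives steadier figures.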
ICMP was not created with the intention of gathering connection-speed statistics, but rather to check whether a valid connection can be made between two hosts.
My best guess is that the amount of data sent in those REST calls or in ICMP traffic is not enough to calculate a meaningful perceived connection speed or bandwidth.
If you calculate from these metrics, you will get wildly high or low bandwidth figures; the copy dialog in Windows XP is a good example. You need a constant and substantial amount of data to be sent in order to calculate valid throughput statistics.
The only time we notice this value (MaxReceivedMessageSize) appears to be when the service crashes because the value is too low. The quick fix is to set it to some very large number, and then there's no problem.
What I was wondering is: are there any negative consequences to setting this value high?
I can see that it can potentially give some protection from a denial-of-service attack, but does it have any other function?
It helps limit the strain on your WCF server. If you allow 1,000 connections, and each connection is allowed to send you 1 MB of data, you potentially need 1 GB of RAM in your server, or a lot of swapping/thrashing might occur.
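The sizing argument is a straightforward multiplication, and it also shows what "some very large number" costs in the worst case. A sketch assuming buffered transfer mode (WCF's default), where each incoming message is held in memory in full; 65,536 bytes is WCF's default MaxReceivedMessageSize, and 2,147,483,647 (int.MaxValue) is a common "very large" choice:

```python
def worst_case_buffer_bytes(max_concurrent_calls: int, max_received_message_size: int) -> int:
    """Upper bound on memory held by buffered incoming messages at one time."""
    return max_concurrent_calls * max_received_message_size


GIB = 1024 ** 3
for size in (65_536, 1_048_576, 2_147_483_647):  # WCF default, 1 MB, int.MaxValue
    total = worst_case_buffer_bytes(1_000, size)
    print(f"{size:>13,} bytes/msg -> {total / GIB:10.2f} GiB for 1,000 concurrent calls")
```

The limit is only a ceiling, so normal traffic does not allocate this much; the point is that a huge limit removes the ceiling that would otherwise protect the server.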
The limit on the message size (and the limit on the concurrent connections / calls) helps keep that RAM usage (and also CPU usage) to a manageable level.
It also allows you to scale depending on your server. If you have a single-core CPU and 4 GB of RAM, you probably won't be able to handle quite as much traffic as with a 16-way CPU and 32 GB of RAM or more. With the various settings, including MaxReceivedMessageSize, you can tune your WCF environment to the capabilities of your underlying hardware.
And of course, as you already mention: many settings in WCF are kept OFF or set to a low value specifically to thwart malicious users from flooding your server with DoS attacks and shutting it down.