Size of a collectd packet - collectd

Can anyone help me find the size of collectd packets?
For example, I have the following plugins activated:
df
load
network
ping
swap
memory
I have an interval of ten seconds. I would like to find out the size of the collectd packets sent to the host system every ten seconds.
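
Not part of the original question, but one practical way to measure this is to capture the datagrams that collectd's network plugin sends and print their sizes. A minimal sketch, assuming the network plugin's default UDP port 25826 and that nothing else (e.g. a collectd server) is already bound to that port on the capture host:

# Rough sketch: print the size of each collectd network-plugin datagram.
# Assumes the default UDP port 25826; run it on the receiving host.
import socket
import time

PORT = 25826        # collectd network plugin default port (assumption)
BUFSIZE = 65535     # large enough for any UDP datagram

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", PORT))

total = 0
start = time.time()
while True:
    data, addr = sock.recvfrom(BUFSIZE)
    total += len(data)
    print(f"{time.time() - start:7.1f}s  {len(data):5d} bytes from {addr[0]}  (total {total})")

Summing the sizes over each ten-second window gives the per-interval traffic for the plugins you have enabled; each datagram typically carries several metrics packed together.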

Related

Huge packet loss while performing l2 forwarding using packet gen

I am facing severe packet loss while performing packet forwarding using Packet Gen (pktgen) with DPDK. Any packet count of 500 and above results in huge packet loss. I have learnt about the DPDK architecture from its main site. I have changed MBUF values and also the cache size, but still no luck. Any help would be much appreciated. I am uploading the images below to give an idea of what I am doing.
Packet Gen DPDK VM Settings
L2FWD VM

Redis dequeue rate 10x slower over the network

I was testing the enqueue and dequeue rate of Redis over a network with 1 Gbps LAN speed; both machines have 1 Gbps Ethernet cards.
Redis version: 3.2.11
I lpush 100,000 (1L) items of 1 byte each using the Python client.
Dequeuing the items using rpop took around 55 secs over the network, which is just ~1,800 dequeues/sec, whereas the same operation completes within 5 secs when I dequeue locally, which is around 20,000 dequeues/sec.
Enqueue rates are almost the same as the dequeue rates.
This was done on the office network when there was not much other usage, and the same is observed in production environments too!
A drop of less than 3x over the network would be acceptable; around 10x looks like I am doing something wrong.
Please suggest if I need to make any configuration changes on the server or client side.
Thanks in advance.
Retroactively replying in case anyone else discovers this question.
Round-trip latency and a lack of concurrency are likely your bottlenecks here. If all of the dequeue calls are made serially, then you are stacking that network latency: with 1 million calls at 2 ms latency, you'd have at least 2,000,000 ms of latency overhead (about 33 minutes). That is to say, your application waits for the server to receive the payload, do something, and reply to acknowledge that the operation succeeded. Some Redis clients also perform multiple calls to enqueue/dequeue a single job (pop plus ack/del), potentially doubling that number.
The following link illustrates the different approaches libraries take to using Redis keys (Ruby's resque vs. Clojure's carmine; note the multiple Redis commands executed on the Redis server for a single message). This is likely the cause of the 10x rather than the 3x slowdown you were expecting.
https://kirshatrov.com/2018/07/20/redis-job-queue/
An oversimplified example of two calls per message dequeued (1 ms of network latency each way, and Redis server operations take 1 ms):
time | client                                 server
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1ms | pop msg >--(1ms)--> receive pop request
2ms | [process request (1ms)]
3ms | receive msg <--(1ms)--< send msg to client
4ms | send del >--(1ms)--> receive del
5ms | [delete msg from queue (1ms)]
6ms | receive ack <--(1ms)--< reply with delete ack
Improving dequeue times often involves using a client that supports multi-threaded or multi-process concurrency (e.g. 10 concurrent workers would significantly reduce the overall time to completion). This ensures your network is better utilized by sending a stream of dequeue requests instead of waiting for one request to complete before grabbing the next one.
As for 1 byte vs 500 bytes, the default Ethernet MTU is 1500 bytes. Subtracting the IP and TCP headers, the payload is ~1460 bytes (less if tunneling with GRE/IPsec, more if using jumbo frames). Since both payload sizes fit within a single TCP segment, they will have similar performance characteristics.
A 1gbps ethernet interface can deliver anywhere between 81,274 and 1,488,096 packets per second (depending on payload size).
So really, it's a question of how many processes & threads you can run concurrently on the client to keep the network & redis server busy.
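
Not from the original answer, but as a rough illustration of the concurrency point: the sketch below uses the redis-py client and a thread pool so that several RPOP round trips are in flight at once. The host name and queue key are placeholders, and the 10 workers mirror the example above.

# Hypothetical sketch: drain a Redis list with concurrent workers so that
# round-trip latency overlaps instead of accumulating serially.
from concurrent.futures import ThreadPoolExecutor
import redis

QUEUE = "jobs"                                               # placeholder key
POOL = redis.ConnectionPool(host="redis-host", port=6379)    # placeholder host

def worker():
    r = redis.Redis(connection_pool=POOL)
    drained = 0
    while True:
        item = r.rpop(QUEUE)      # one network round trip per item
        if item is None:          # queue is empty
            return drained
        drained += 1

with ThreadPoolExecutor(max_workers=10) as ex:
    futures = [ex.submit(worker) for _ in range(10)]
    print("items drained per worker:", [f.result() for f in futures])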
Redis is generally I/O bound, not CPU bound. It may be hitting network bandwidth limits. Given the small size of your messages, most of the bandwidth may be eaten by TCP overhead.
On a local machine you are bound by memory bandwidth, which is much faster than your 1Gbps network bandwidth. You can likely increase network throughput by increasing the amount of data you grab at a time.
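
One way to "grab more at a time", as suggested above, is to pipeline many RPOP commands into a single round trip. Here is a sketch using redis-py's pipeline support, with an illustrative batch size and placeholder names (Redis 6.2+ also lets RPOP take a count argument, but the asker's 3.2.11 predates that):

# Hypothetical sketch: batch RPOPs with a pipeline so one network round trip
# carries many commands instead of one.
import redis

r = redis.Redis(host="redis-host", port=6379)   # placeholder host
BATCH = 500                                      # illustrative batch size

def drain(queue="jobs"):
    items = []
    while True:
        pipe = r.pipeline(transaction=False)
        for _ in range(BATCH):
            pipe.rpop(queue)
        results = pipe.execute()                 # one round trip for BATCH pops
        items.extend(x for x in results if x is not None)
        if results[-1] is None:                  # the queue ran dry in this batch
            return items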

Redis hotspot concerning client requests

I want to avoid hot spots, but only in the case of client requests. What criteria should I take into account?
Some papers define this threshold at 500 QPS (reads), but I want something based on metrics from a real scenario. In my case, when client requests to a master node reach a threshold, I migrate the keys to another master (one that does not exceed this threshold) and redirect the client and its requests there.
Can I define in Redis a threshold based on the number of requests on each instance?
After several experiments, I found a solution. The threshold was selected on the basis of response time: as presented in the figure below, response time increases significantly once the request rate exceeds 20,000 (see the monitoring sketch after the machine specs below).
My machine has the following configuration:
Ubuntu 14.04 LTS 64-bit
Intel® Core™ i5-4570 CPU @ 3.20 GHz × 4
7.7 GiB RAM
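
Not part of the original answer, but one way to act on such a threshold is to poll each instance's instantaneous_ops_per_sec from INFO and flag masters that cross the chosen limit. A minimal sketch with redis-py, assuming the 20,000 requests/sec threshold above and placeholder master addresses:

# Hypothetical sketch: watch each master's request rate via INFO and flag
# instances whose instantaneous_ops_per_sec exceeds the chosen threshold.
import time
import redis

THRESHOLD = 20000                     # requests/sec, from the experiment above
MASTERS = ["10.0.0.1", "10.0.0.2"]    # placeholder master addresses

clients = {h: redis.Redis(host=h, port=6379) for h in MASTERS}

while True:
    for host, r in clients.items():
        ops = r.info("stats")["instantaneous_ops_per_sec"]
        if ops > THRESHOLD:
            print(f"{host}: {ops} ops/sec over threshold -> candidate for key migration")
    time.sleep(1)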

Better way to scale out logstash and balance loading?

The question originated from: https://groups.google.com/forum/#!topic/logstash-users/cYv8ULhHeE0
Comparing the logstash scale-out strategies below, a TCP load balancer gives the best performance if traffic/CPU load is balanced.
However, it seems hard to keep traffic balanced all the time due to the nature of the logstash-forwarder <-> logstash TCP connections.
Does anyone have a better idea for keeping traffic/CPU load balanced across logstash nodes?
Thanks for any advice :)
< My scenario >
10+ service nodes equipped with logstash-forwarder to forward logs to a central logstash node (cluster)
Each service node's average log throughput, daily throughput distribution, and log-type filter complexity vary a lot:
average log throughput: e.g. service_1: 0.5k events/s; service_2: 5k events/s
daily throughput distribution: e.g. service_1 peaks in the morning, service_2 peaks at night
log-type filter complexity: consuming 100% of a single logstash node's CPU, service_1's log type can be processed at 300 events/s, while service_2's log type can be processed at 1500 events/s
< TCP load balancer >
TCP connections between logstash-forwarder and logstash are persistent, which means that even if the number of TCP connections ends up balanced across all logstash nodes (or is distributed by least-connection or least-load), there is no guarantee that traffic/CPU load is balanced across those nodes. In my scenario, each connection's traffic varies in its daily average, over time, and in its event complexity.
So in the worst case, say logstash_1 and logstash_2 both have 10 TCP connections, but logstash_1's CPU load might be 3x that of logstash_2, because logstash_1's connections carry higher traffic and more complex events.
< Manual assign logstash-forwarders to logstash >
Might face the same situation as the TCP load balancer: we can plan the distribution based on historical daily average traffic, but that changes over time, and there is no HA, of course.
< message queue >
Architecture: service nodes with logstash-forwarder -> queuers: logstash to RabbitMQ -> indexers: logstash from RabbitMQ to Elasticsearch
Around 30% CPU overhead for sending messages to or receiving messages from the queue broker, on all nodes.
I’ll focus on one aspect of your question: load-balancing a RabbitMQ cluster. RabbitMQ clusters always consist of a single Master node and 0…n Slave nodes. It is therefore favourable to force connections to the Master node, rather than implement round-robin, leastconn, etc.
RabbitMQ will automatically route traffic directly to the Master node, even if your load balancer routes to a different node. This post explains the concept in greater detail.

Do ping requests put a load on a server?

I have a lot of clients (around 4000).
Each client pings my server every 2 seconds.
Can these ping requests put a load on the server and slow it down?
How can I monitor this load?
Right now the server responds slowly, but the processor is almost idle and the free memory is OK.
I'm running Apache on Ubuntu.
Assuming you mean a UDP/ICMP ping just to see if the host is alive, 4000 hosts probably isn't much load, and it's fairly easy to calculate. CPU- and memory-wise, ping is handled by your kernel and should be optimized not to take many resources, so you need to look at network resources. The most critical point is if you have a half-duplex link: because all of your hosts are chatty, you'll cause a lot of collisions and retransmissions (and dropped pings). If the links are all full duplex, let's calculate the actual amount of bandwidth required at the server.
4000 clients / 2 seconds = 2000 pings per second
Each ping is 74 bytes on the wire (32 bytes data + 8 bytes ICMP header + 20 bytes IP header + 14 bytes Ethernet). You might have some additional overhead if you use VLAN tagging or UDP-based pings.
If we can assume the pings are randomly distributed, we would have 2000 pings per second @ 74 bytes = 148,000 bytes per second.
Multiply by 8 to get bits per second: 1,184,000 bps, or about 1.2 Mbps.
On a 100 Mbps LAN, this would be about 1.2% utilization just for the pings.
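
For reference, here is the same back-of-the-envelope calculation as a small script (the frame-size breakdown follows the answer above; adjust the constants for your own ping payload):

# Back-of-the-envelope ping bandwidth estimate, using the figures above.
clients = 4000
interval_s = 2                       # each client pings every 2 seconds
data_bytes = 32                      # ICMP payload (Windows ping default)
frame = data_bytes + 8 + 20 + 14     # + ICMP + IP + Ethernet headers = 74 bytes

pings_per_sec = clients / interval_s
bytes_per_sec = pings_per_sec * frame
bits_per_sec = bytes_per_sec * 8

print(f"{pings_per_sec:.0f} pings/s, {bytes_per_sec:.0f} B/s, "
      f"{bits_per_sec / 1e6:.2f} Mbps "
      f"({bits_per_sec / 100e6 * 100:.1f}% of a 100 Mbps LAN)")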
If this is a LAN environment, I'd say this is basically no load at all; if it's going across a T1, then it's an immense amount of load. So you should run the same calculation for any other network links that might be a bottleneck.
Lastly, if you're not using ICMP pings to check the host but have an application-level ping, you will have all the overhead of whatever protocol you are using, the ping will need to go all the way up the protocol stack, and your application needs to respond. Again, this could be a very minimal load or it could be immense, depending on the implementation details and the network speed. If the host is idle, I doubt this is a problem for you.
Yes, they can. A single ping request does not create much load, but it does take up bandwidth and a nominal amount of CPU.
If you want to monitor this, you might use either tcpdump or Wireshark, or perhaps set up a firewall rule and monitor the number of packets it matches.
The other problem apart from bandwidth is the CPU. If a ping is passed up to the CPU for processing, thousands of them can cause load on any CPU. It's worth monitoring, but as you said yours is almost idle, it's probably going to be able to cope. Worth keeping in mind, though.
Depending on the clients, ping packets can be different sizes: their payload could be just "aaaaaaaaa", but some may be "thequickbrownfoxjumpedoverthelazydog", which obviously adds further bandwidth requirements.