Better way to scale out logstash and balance loading? - load-balancing

The question originated from: https://groups.google.com/forum/#!topic/logstash-users/cYv8ULhHeE0
Comparing the logstash scale-out strategies below, a TCP load balancer gives the best performance if traffic/CPU load is balanced.
However, it seems hard to keep traffic balanced all the time because of the nature of the persistent logstash-forwarder <-> logstash TCP connections.
Does anyone have a better idea for keeping traffic/CPU load balanced across logstash nodes?
Thanks for any advice :)
< My scenario >
10+ service nodes equipped with logstash-forwarder, forwarding logs to a central logstash node (cluster)
each service node's average log throughput, daily throughput distribution, and log-type filter complexity vary a lot
average log throughput: e.g. service_1: 0.5k events/s; service_2: 5k events/s
daily throughput distribution: e.g. service_1 peaks in the morning, service_2 peaks at night
log-type filter complexity: consuming 100% of a single logstash node's CPU, service_1's log type can be processed at 300 events/s, while service_2's can be processed at 1500 events/s
< TCP load balancer >
The TCP connections between logstash-forwarder and logstash are persistent. So even if the number of connections ends up balanced (or is distributed by least-connection / least-load) across all logstash nodes, that doesn't guarantee traffic/CPU load is balanced across them. In my scenario, each connection's traffic varies in its daily average, over time, and in event complexity.
In the worst case, logstash_1 and logstash_2 each hold 10 TCP connections, but logstash_1's CPU load might be 3x that of logstash_2, because logstash_1's connections carry higher traffic and more complex events.
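To make the worst case concrete, here's a tiny back-of-the-envelope sketch in Python. The per-connection event rates and per-event CPU costs are made-up illustrative numbers, not measurements from my setup; the point is only that equal connection counts can still mean very unequal CPU demand.

    # Toy model: connection-count balancing vs. actual CPU demand.
    # Event rates and per-event CPU costs below are illustrative only.
    connections = [
        # (node, events_per_sec, cpu_ms_per_event)
        ("logstash_1", 5000, 0.6),  # high-volume service with complex filters
        ("logstash_1", 500, 0.3),
        ("logstash_2", 500, 0.3),   # low-volume services with simple filters
        ("logstash_2", 500, 0.2),
    ]

    load = {}
    for node, eps, cost_ms in connections:
        # CPU-seconds of filtering needed per wall-clock second on this node
        load[node] = load.get(node, 0.0) + eps * cost_ms / 1000.0

    for node, cpu in sorted(load.items()):
        print(f"{node}: {cpu:.2f} cores busy")
    # Both nodes hold the same number of connections, yet logstash_1 needs
    # ~3.15 cores while logstash_2 needs ~0.25 -- connection count alone
    # says nothing about CPU load.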
< Manually assign logstash-forwarders to logstash nodes >
This might face the same situation as the TCP load balancer: we can plan the distribution based on historical daily average traffic, but that changes over time, and of course there's no HA.
< Message queue >
Architecture: service nodes with logstash-forwarder -> queuer: logstash shipping into RabbitMQ -> indexer: logstash reading from RabbitMQ and writing to Elasticsearch.
Around 30% CPU overhead on all nodes for sending messages to / receiving messages from the queue broker.
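For illustration, here's roughly what the queuer/indexer split looks like at the AMQP level. This is a minimal sketch using pika (the Python AMQP client), not actual logstash configuration, and the host/queue names are made up; the extra serialization, publishes and acks it shows are where the ~30% CPU overhead mentioned above comes from.

    import json
    import pika  # Python AMQP client

    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq.example.com"))
    ch = conn.channel()
    ch.queue_declare(queue="logstash-events", durable=True)

    # Queuer side: each event is serialized and published to the broker.
    event = {"host": "service_1", "message": "GET /index.html 200"}
    ch.basic_publish(
        exchange="",
        routing_key="logstash-events",
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
    )

    # Indexer side: pull events off the queue and hand them to the next stage.
    def handle(channel, method, properties, body):
        doc = json.loads(body)  # here logstash would run filters and ship to Elasticsearch
        channel.basic_ack(method.delivery_tag)

    ch.basic_consume(queue="logstash-events", on_message_callback=handle)
    # ch.start_consuming()  # blocks; queuer and indexer run as separate processes in practice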

I'll focus on one aspect of your question: load-balancing a RabbitMQ cluster. In a RabbitMQ cluster with mirrored queues, each queue has a single Master node and 0…n Slave nodes. It is therefore favourable to direct a queue's connections to its Master node, rather than implement round-robin, leastconn, etc.
RabbitMQ will route traffic to the Master node internally even if your load balancer sends the connection to a different node. This post explains the concept in greater detail.

Related

RabbitMQ clustering

I have created a RabbitMQ cluster on a single Windows machine with an HA policy applied to everything, with two DISC nodes, two RAM nodes and 1 STATS node. I then ran PerfTest (the RabbitMQ client test utility) and the results were disappointing: around 5000 msg/sec. But when I ran the same test against a single RabbitMQ node it gave a good result, i.e. 25000 msg/sec. I can't work out what is going wrong; the result should be better when running against a cluster, but it is the opposite. Has anyone encountered the same, or does anyone know the reason behind it?
Thanks
A RabbitMQ Cluster with Mirrored Queues won't go faster than a single node. Why? Clustering is there to improve reliability and fault tolerance, not to improve throughput.
What's the reason for this? When you enable mirrored queues, RabbitMQ needs to coordinate state between nodes; that is, it needs to coordinate publishes, consumers and acks so it doesn't deliver the same message more than once, or to more than one consumer. All this coordination affects performance, but that's the tradeoff with this kind of replication.
If you need decentralised replication, then you could use the Federation Plugin instead.
The throughput rate depends on a couple of factors. In our perf tests of RabbitMQ in a cluster we observed that the rate varied depending on whether the RabbitMQ nodes were DISC or RAM, but the biggest chunk of the variation came from running the cluster with mirrored queues versus without: with mirroring enabled we saw a rate of 3500 msg/sec, without it 5000 msg/sec. Also, what is your message size when you run PerfTest?
As is typical with RabbitMQ, it really depends. Here are a few ways that I have found to improve performance with RabbitMQ clustering (a small sketch follows the list):
Push messages only to a set of appropriately sized memory (RAM) nodes, using a load balancer
Keep the message size very small
Do not use AMQP transactions or publisher confirms
Only use HA mirrored queues for the small set of queues whose data you absolutely must keep
Set a TTL on all messages or queues using a policy
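As a rough illustration of the last few points, here's a pika (Python) sketch that declares a transient queue with a per-message TTL and publishes small, non-persistent messages without transactions or publisher confirms. The queue name, host and TTL are arbitrary; for cluster-wide rules you'd normally set the TTL with a policy (rabbitmqctl set_policy) rather than per-queue arguments.

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbit-lb.example.com"))
    ch = conn.channel()

    # Transient (non-durable) queue; messages expire after 60s instead of piling up.
    ch.queue_declare(
        queue="metrics",
        durable=False,
        arguments={"x-message-ttl": 60000},  # TTL in milliseconds
    )

    # Small payload, delivery_mode=1 (non-persistent), no confirm_delivery() or transactions.
    ch.basic_publish(
        exchange="",
        routing_key="metrics",
        body=b"cpu=0.42",
        properties=pika.BasicProperties(delivery_mode=1),
    )
    conn.close()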
Just to add on to the above comments, putting this here as an FYI:
http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/
The problem is that you are running a cluster on a single machine with the same resources.
The purpose of a RabbitMQ cluster is to scale out, not to scale in.
In other words, it is there to give you more network connections, more disk throughput and of course more CPU power to handle more messages.
When you add nodes on a single machine you don't add any resources, and you take on the overhead of running a cluster (as stated above).

When will LogStash exceed the queue capacity and drop messages?

I am using LogStash to collect the logs from my service. The volume of data is so large (20GB/day) that I am afraid some of it will be dropped at peak time.
So I asked a question here and decided to add Redis as a buffer between the ELB and LogStash to prevent data loss.
However, I am curious: when will LogStash exceed its queue capacity and drop messages?
I've done some experiments and the results show that LogStash can process all the data without any loss, e.g. local file --> LogStash --> local file, netcat --> LogStash --> local file.
Can someone give me a solid example of when LogStash will eventually drop messages? Then I'd have a better understanding of why we need a buffer in front of it.
As far as I know, the Logstash internal queue is very small. Please refer to here:
Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase.
This helps reduce any data loss and in general avoids logstash trying to act as a data storage
system. These internal queues are not for storing messages long-term.
As you say, your daily log volume is 20GB, which is quite a large amount, so it is recommended to install Redis in front of logstash. Another advantage of installing Redis is that if your logstash process hits an error and shuts down, Redis can buffer the logs for you; otherwise those logs would all be dropped.
The maximum queue size is configurable and the queue can be stored on-disk or in-memory. (Strongly advise in-memory due to high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, logstash will stop reading further when it can't keep up, and it can resume reading later: it keeps track of active log files and the last read position. The files basically act as an enormous buffer, so it's really unlikely to lose data (unless the files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the docs as to whether they support back pressure, and whether they can replay missed messages if a network connection is lost.
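A toy model of that difference, in plain Python rather than anything logstash-specific: a pull-based input such as a file tail can simply stop consuming when the small internal queue is full and resume later, whereas a push-based input such as UDP has nowhere to put an event once the queue is full and has to drop it.

    import queue

    internal = queue.Queue(maxsize=20)  # logstash's internal queues are tiny

    # Pull-based input (file): we control the pace, so a full queue just means "wait".
    def read_from_file(lines):
        for line in lines:
            internal.put(line)  # blocks until the filter stage catches up;
                                # the file itself is the buffer, and the last
                                # read position is remembered (sincedb)

    # Push-based input (UDP): packets arrive whether we're ready or not.
    def on_udp_packet(datagram):
        try:
            internal.put_nowait(datagram)
        except queue.Full:
            pass  # nowhere to park it -- the event is lost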
Generally speaking, 20 GB a day is pretty low (even in 2014 when this was originally posted); we're talking about roughly 1000 messages a second. logstash really doesn't need a Redis in front of it.
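That figure is just arithmetic; assuming an average event size of roughly 230 bytes:

    bytes_per_day = 20 * 1024**3   # 20 GB/day
    avg_event_size = 230           # bytes per event (assumed)
    events_per_sec = bytes_per_day / avg_event_size / 86400
    print(round(events_per_sec))   # ~1080 events/s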
For very large deployments (multiple TB per day), it's common to encounter kafka somewhere in the chain to buffer messages. At this stage there are typically many clients with different types of messages, flowing over a variety of protocols.

distributed cluster questions about performance

I'm using 6 servers to make a cluster, and they are all disc nodes. I use RabbitMQ for collecting log files for our website. At peak hour the publish rate is about 30k messages per second. There are 2 main consumers (HDFS and Elasticsearch) and each one needs to handle every message, so the delivery rate hits about 60k per second.
In my scenario a single server can sustain a 10k delivery rate, and I use 6 nodes to spread the load. My solution is to create 2 queues on each node, and give each message a random routing key (something like message.0, message.1, etc.) to distribute the pressure across every node.
What confuses me is:
All messages are sent to one node. Should I use HAProxy to load balance this publish pressure?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between Memory Node and Disk Node? As far as I know, the difference between a memory node and a disc node only concerns metadata such as queue configuration.
How can I improve the performance of my publish and delivery code? I've researched and know of several methods:
disable the confirm mechanism (in the publish code?)
enable HiPE (I've done that and it helped a lot)
For example, if the input is 10k mps (messages per second) and there are two consumers that each consume every message, then the output is 20k mps. If my server can handle 10k mps, I need two servers to handle that 20k-mps pressure. Now a new consumer needs to consume every message too, so the output hits 30k mps and I need one more server. In conclusion: one more consumer of every message means one more server?
"All message send to one node. Should I use a HA Proxy to load balance this publish pressure?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
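As a sketch of the publish side behind a load balancer, here's roughly what it could look like with pika (Python). The HAProxy frontend address, the exchange name and the message.0 … message.11 routing keys are assumptions based on your description of 2 queues per node across 6 nodes.

    import random
    import pika

    # Connect through the load balancer's frontend rather than a fixed node.
    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbit-haproxy.example.com"))
    ch = conn.channel()
    ch.exchange_declare(exchange="logs", exchange_type="topic", durable=True)

    routing_keys = [f"message.{i}" for i in range(12)]  # 2 queues per node x 6 nodes

    def publish(body: bytes):
        # Spreading the routing keys spreads messages across queues hosted on different nodes.
        ch.basic_publish(exchange="logs", routing_key=random.choice(routing_keys), body=body)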
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes, Durable Queues are written to disk so that they can be reinstated after a server restart, for example. This adds a nominal overhead, though the actual writing occurs asynchronously.
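For reference, the difference shows up at queue-declaration and publish time. A pika (Python) sketch with illustrative queue names:

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Durable queue + persistent messages: survive a broker restart, at the cost of disk writes.
    ch.queue_declare(queue="audit", durable=True)
    ch.basic_publish(exchange="", routing_key="audit", body=b"order-123",
                     properties=pika.BasicProperties(delivery_mode=2))

    # Transient queue + transient messages: memory only, faster, gone after a restart.
    ch.queue_declare(queue="clickstream", durable=False)
    ch.basic_publish(exchange="", routing_key="clickstream", body=b"page-view",
                     properties=pika.BasicProperties(delivery_mode=1))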
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I imrove the performance in publish and delivery codes?"
Try this out.

how to recover from message store exhaustion?

When an ActiveMQ broker gets flooded with messages, or the consumer fails, it will stop accepting messages once certain (configurable) limits are reached. In a network of brokers this effect can take down the whole cluster.
I'm currently using the default configuration for memory limits and experience the following behavior:
the consumer fails or becomes very slow (a known problem)
broker A (the one the consumer connects to) fills up and stops accepting messages
all other brokers fill up and stop accepting messages
the cluster is basically down
If the consumer comes back online now, it will try to reconnect to one of the cluster nodes, but the nodes will not accept the connection, because accepting it would create advisory messages that can't be handled while the broker is already full.
How do I have to configure the memory limits so that my production destinations are limited and blocked, but the broker can still accept advisories, so that my consumer can recover?
You should be able to use producerFlowControl to slow producers down so they don't overwhelm your broker. That said, it is enabled by default, so you are likely using it already...
I would try something like this (assuming an 8 GB box or so); a quick sizing sketch follows the list...
use the failover transport everywhere (broker/client connections)
increase the JVM heap to 4 GB
increase the systemUsage limits substantially (memoryUsage 3 GB, storeUsage/tempUsage 10 GB)
enable producer flow control on both topics and queues
set the per-destination memory limit to 2 GB divided by the total number of topics + queues
in other words, the per-destination limits should add up to substantially less than the memoryUsage limit
exclude the advisory topics from producer flow control (they might be already)
This should limit the producers and leave resources for your system to function/recover/accept consumer connections...
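A quick way to sanity-check the per-destination limit suggested above (plain Python; the destination count is an example value, use whatever your broker actually hosts):

    # Suggested split on an 8 GB box: 4 GB JVM heap, 3 GB memoryUsage,
    # and ~2 GB shared across all topics + queues for per-destination limits.
    memory_for_destinations_mb = 2 * 1024
    destinations = 40  # total number of topics + queues (example value)

    per_destination_mb = memory_for_destinations_mb / destinations
    print(f"memoryLimit per destination: ~{per_destination_mb:.0f} MB")  # ~51 MB
    # 40 destinations x ~51 MB = ~2 GB, comfortably below the 3 GB memoryUsage limit.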

How distributed should queues be in a RabbitMQ cluster?

Assume you have a small RabbitMQ system of 3 nodes that is supposed to handle 100+ decently high-volume queues in the same exchange. Given that queues only exist on the node they are created on (we're not using replicated, high-availability queues), what's the best way to create the queues? Is there any benefit to having the queues distributed among the cluster nodes, or is it better to keep them all on one node and have RabbitMQ do the routing?
It depends on your application, really.
RabbitMQ is smart about sending messages, so it'll only send a message to a node in the cluster if
a queue that holds that message resides on that node, or
a consumer connected to that node has requested the message.
In general, you should aim to declare queues on the node that both the publishers and the consumers of that queue will connect to; in other words, connect publishers and consumers to the node that holds the queues they use. This assumes you're trying to conserve overall bandwidth.
If you're using clustering to improve throughput (and you probably are), and you don't care about internal bandwidth used, you should aim to connect your publishers/consumers to the nodes in a balanced way and not worry about the internal routing mechanisms.
One last thing to think about is memory and disk space. Queues store messages in main memory and fall back to disk when that's insufficient. So if you declare all your queues in one place, you'll end up with one node that's over-worked and two nodes with memory to spare.
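As a sketch of the "declare where you connect" idea with pika (Python; node hostnames and queue names are placeholders): connecting both the declaring producer and the consumers to the node that is meant to own the queue keeps messages from being relayed between cluster nodes.

    import pika

    # The queue will live on the node we are connected to when we declare it.
    owner_node = "rabbit-node-2.example.com"

    conn = pika.BlockingConnection(pika.ConnectionParameters(owner_node))
    ch = conn.channel()
    ch.queue_declare(queue="orders")  # "orders" is now hosted on rabbit-node-2

    ch.basic_publish(exchange="", routing_key="orders", body=b"hello")

    # Consumers of "orders" should also connect to rabbit-node-2 so messages
    # are delivered locally instead of being forwarded across the cluster.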
As part of a move towards redundancy and failover in an application I'm working on, I've just finished setting up a RabbitMQ cluster behind a proxy, with all of my publishers and consumers connecting via the proxy, which round-robins connections to the individual nodes as they come in from the clients. Prior to upgrading RabbitMQ to 2.7.1, this seemed to distribute queues pretty evenly across the separate nodes, though that of course depends pretty heavily on how your proxy balances the requests and on when your clients connect (and declare a queue)...
Having said all that, I just upgraded to RabbitMQ 2.7.1, which was pretty painless and gave us HA queues, which is a pretty big win for our apps. At any rate, if you're interested in the setup and think it would be of benefit to your queue problem, I'd be happy to share it.