Definition of yarn queue capacity - hadoop-yarn

If I search for a generic definition of "capacity", Oxford Languages says, "the maximum amount that something can contain". If I ask YARN for the status of the default queue, I get the following (less relevant information omitted):
> yarn queue -status default
...
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 100.0%
Current Capacity : .0%
Maximum Capacity : 100.0%
I hate to sound pedantic, but if capacity is the "maximum amount", what is "Maximum Capacity", and how does it compare to "Capacity"? Does a "Current Capacity" of zero indicate that there is no room left in the queue? I think so, but one of my coworkers thinks it means the opposite: that all of the room is still available. And really, shouldn't capacity be expressed in some measurable unit rather than a percentage?
I haven't been able to find yarn's definition of these terms. I'm hoping somebody here can explain.

This is the best information I have found so far, from a Cloudera blog on YARN Capacity and Hierarchical Design.
Queues are laid out in a hierarchical design with the topmost parent being the 'root' of the cluster queues; from here, leaf (child) queues can be assigned from the root, or branches which can have leaves on themselves. Capacity is assigned to these queues as min and max percentages of the parent in the hierarchy. The minimum capacity is the amount of resources the queue should expect to have available to it if everything is running maxed out on the cluster. The maximum capacity is an elastic-like capacity that allows queues to make use of resources which are not being used to fill minimum capacity demand in other queues.
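
To make that concrete, these percentages correspond to properties in capacity-scheduler.xml. A minimal sketch for a hypothetical two-queue setup (queue names and numbers invented for illustration):

yarn.scheduler.capacity.root.queues: default,analytics
yarn.scheduler.capacity.root.default.capacity: 70
yarn.scheduler.capacity.root.default.maximum-capacity: 100
yarn.scheduler.capacity.root.analytics.capacity: 30
yarn.scheduler.capacity.root.analytics.maximum-capacity: 50

Read this way, "Capacity" in the status output is the queue's guaranteed minimum share of its parent (70% for default here), "Maximum Capacity" is how far it may elastically grow into resources other queues aren't using (up to 100%), and the capacities of sibling leaf queues under one parent sum to 100.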

Upper limit on number of Redis Streams consumer groups?

We are looking at using Redis Streams as a cluster-wide messaging bus, where each node in the cluster has a unique id. The idea is that each node, when spawned, creates a consumer group with that unique id on a central Redis stream, to guarantee that each node in the cluster gets a copy of every message. In an orchestrated environment, cluster nodes will be spawned and removed on the fly, each having a unique id. Over time I can see this resulting in hundreds or even thousands of old/unused consumer groups all subscribed to the same Redis stream.
My question is this: is there an upper limit to the number of consumer groups that Redis can handle, and does a large number of (unused) consumer groups have any real processing cost? It seems that a consumer group is just a pointer stored in Redis that points to the last-read entry in the stream, and is only accessed when a consumer of the group does a ranged XREADGROUP. That would lead me to assume (without diving into the Redis code) that the number of consumer groups really does not matter, save for the small amount of RAM that the consumer group pointers would eat up.
Now, I understand we should be smarter and a node should delete its own consumer groups when it is being killed, or we should be cleaning this up on a scheduled basis, but if a consumer group is just a record in Redis, I am not sure it is worth the effort - at least at the MVP stage of development.
TL;DR:
Is my understanding correct, that there is no practical limit on the number of consumer groups for a given stream and that they have no processing cost unless used?
Your understanding is correct: there's no practical limit to the number of CGs, and they do not impact operational performance.
That said, other than the wasted RAM (which could become significant, depending on the number of consumers in the group and PEL entries), this will add time complexity to invocations of XINFO STREAM ... FULL and XINFO GROUPS as these list the CGs. Once you have a non-trivial number of CGs, every call to these would become slow (and block the server while it is executing).
Therefore, I'd still recommend implementing some type of "garbage collection" for the "stale" CGs, perhaps as soon as the MVP is done. Like any computing resource (e.g. disk space, network, mutexes...) and given there are no free lunches, CGs need to be managed as well.
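As an illustration of such garbage collection, here is a minimal sketch with the redis-py client (the stream name and the "no consumers, nothing pending" criterion are assumptions to adapt):

import redis

r = redis.Redis(decode_responses=True)
STREAM = "bus"  # hypothetical stream name

# Destroy consumer groups that currently have no consumers attached and no
# pending entries; run this periodically or when a node shuts down.
for group in r.xinfo_groups(STREAM):
    if group["consumers"] == 0 and group["pending"] == 0:
        r.xgroup_destroy(STREAM, group["name"])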
P.S. IIUC, you're planning to use a single consumer in each group, and have each CG/consumer correspond to a node in your app's cluster. If that is the case, I'm not sure that you need CGs and you can use the simpler XREAD (instead of XREADGROUP) while keeping the last ID locally in the node.
OTOH, assuming I'm missing something and that there's a real need for this use pattern, I'd imagine Redis being able to support it better by offering some form of expiry for idle groups.
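To illustrate the P.S.: a minimal XREAD loop with redis-py, keeping the last-seen ID in the node itself rather than in a consumer group (stream name and handler are hypothetical):

import redis

r = redis.Redis(decode_responses=True)
last_id = "$"  # "$" = only entries that arrive after we start; use "0" to replay history

while True:
    # Block for up to 5 seconds waiting for entries newer than last_id.
    for stream, messages in r.xread({"bus": last_id}, block=5000, count=100):
        for msg_id, fields in messages:
            handle_message(fields)  # hypothetical per-node handler
            last_id = msg_id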

What are the recommended settings for fast queue binding in a RabbitMQ cluster

When binding a queue in a RabbitMQ cluster with com.rabbitmq:amqp-client:5.4.3, this takes a considerable amount of time when many queues are bound to an exchange within a short duration (averaging ca. 1 second, with max times reaching 10 seconds). The RabbitMQ cluster is running 3 nodes in version 3.7.8 (Erlang 20.3.4) inside Rancher/Kubernetes.
When reducing the number of nodes to a single node, maximum times stay well below 1 second (<700 ms; still somewhat long, if you ask me, but acceptable). With a single node, average times range between 10 and 100 milliseconds.
I understand that replicating this information between the cluster nodes can take some time, but 1 to 2 orders of magnitude worse performance with just 3 nodes? (The same happens with 2 nodes as well, but 3 nodes is the minimum for a meaningful cluster setup.)
Are there some knobs to turn to bring the time to bind a queue to an acceptable level for a cluster? Is there a configuration I'm missing? Having only a single node is a non-option with HA in mind. I couldn't find anything helpful in https://www.rabbitmq.com/clustering.html until now, maybe I missed it?
TL;DR:
Are these timings expected for a RabbitMQ cluster?
What is the performance one can expect from a simple RabbitMQ (3 nodes) when binding queues to exchanges?
How many "bind" operations can be performed in e.g. 1 second?
Which factors affect this? Number of exchanges, number of queues, number of existing bindings, frequency of the operation?
What are the options to reduce the time required to create bindings?
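
For reference, here is one way to measure bind latency in isolation; a minimal benchmark sketch using Python's pika client (host, exchange name, and iteration count are arbitrary), since the numbers should be comparable across client libraries:

import time
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="bind-bench", exchange_type="topic")

timings = []
for i in range(100):
    # Server-named, exclusive queue; binding it is the operation under test.
    q = ch.queue_declare(queue="", exclusive=True).method.queue
    start = time.monotonic()
    ch.queue_bind(queue=q, exchange="bind-bench", routing_key="key.%d" % i)
    timings.append(time.monotonic() - start)

print("avg %.1f ms, max %.1f ms" % (sum(timings) / len(timings) * 1000, max(timings) * 1000))
conn.close()

Running this against a single node and then against the cluster should show whether the slowdown scales with the number of existing bindings or is a flat per-operation replication cost.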

Idle Queue utilization in Capacity Scheduler - EMR

I configured the Capacity Scheduler to schedule jobs in specific queues. However, I see there are times when jobs in some queues complete faster, while other queues have jobs waiting on previous ones to complete. This creates a scenario where half of my capacity is idle and the other half is busy with jobs waiting to get resources.
Is there any config that I can tweak to maximize my utilization? I want to route waiting jobs to other queues where resources are available.
This seems to be an issue with the Capacity Scheduler. I switched to the Fair Scheduler and saw a huge improvement in cluster utilization: ~75%, way better than the ~40% I was getting with the Capacity Scheduler.
The reason is that when multiple users submit jobs to the same queue, the queue can consume up to its maximum resources, but a single user can't consume more than the queue's configured capacity, even though the maximum capacity is greater than that.
So if you specify yarn.scheduler.capacity.root.QUEUE-1.capacity: 20 in capacity-scheduler.xml, one user can't take more than 20% of the cluster's resources for queue QUEUE-1, even when the cluster has free resources.
This limit is controlled by user-limit-factor, which defaults to 1. If you set it to 2, a single user's jobs can use 40% of the cluster's resources, provided the queue's maximum capacity is greater than or equal to 40%.
yarn.scheduler.capacity.root.QUEUE-1.user-limit-factor: 2
Please follow this blog
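
Putting the properties from this answer together, a sketch of the relevant capacity-scheduler.xml settings (the maximum-capacity value is illustrative):

yarn.scheduler.capacity.root.QUEUE-1.capacity: 20
yarn.scheduler.capacity.root.QUEUE-1.maximum-capacity: 100
yarn.scheduler.capacity.root.QUEUE-1.user-limit-factor: 2

With these values, QUEUE-1 is guaranteed 20% of the cluster, the queue as a whole may elastically grow toward 100%, and a single user may use up to 2 x 20% = 40% when free resources are available.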

How can I measure the frequency which is good enough to take the data out of RabbitMQ?

I have RabbitMQ running on a server, and there's a script which inserts data into it. I know the approximate frequency at which the data is inserted, but it's not only approximate, it can also vary quite a lot.
How can I know how often another script has to take the data out of RabbitMQ?
What will happen if the 2nd script takes the data out of RabbitMQ slower than needed?
How can I measure whether or not the frequency is good enough?
How can I know how often another script has to take the data out of RabbitMQ?
You should consume messages from the queue at a rate greater than or equal to the rate at which they are published. RabbitMQ reports publish rates; however, you will want to get a reasonable estimate from load testing your application.
What will happen if the 2nd script takes the data out of RabbitMQ slower than needed?
In the short term, the number of messages in the queue will increase, as will processing time (think about what happens when more people get in line for Space Mountain at Disney). In the long term, the system will be unstable because the queue will increase without bound, eventually resulting in a failure of the queue, as well as other practical consequences (think of this as the case where Space Mountain is broken down, but people are still allowed to enter the queue line).
How can I measure whether or not the frequency is good enough?
From an information only perspective, you can monitor the queue yourself using the RabbitMQ management plugin. If you need automated processes to spawn up additional workers, you'll have to integrate those processes into the RabbitMQ management API. How to do this is the subject of a number of how-to articles.
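For example, a minimal monitoring sketch against the management plugin's HTTP API (default port 15672; host, credentials, and queue name are placeholders):

import requests

# The %2F in the URL is the URL-encoded default vhost "/".
url = "http://localhost:15672/api/queues/%2F/my-queue"
stats = requests.get(url, auth=("guest", "guest")).json()

backlog = stats.get("messages", 0)
publish_rate = stats.get("message_stats", {}).get("publish_details", {}).get("rate", 0.0)
deliver_rate = stats.get("message_stats", {}).get("deliver_get_details", {}).get("rate", 0.0)

print("backlog=%d publish=%.1f/s deliver=%.1f/s" % (backlog, publish_rate, deliver_rate))
if deliver_rate < publish_rate:
    print("consumers are falling behind; consider adding workers")

If the backlog grows between samples while the deliver rate stays below the publish rate, the consuming script is too slow.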

Distributed cluster questions about performance

I'm using 6 servers to make a cluster, and they are all disk nodes. I use RabbitMQ for collecting log files for our website. At peak hours, the publish rate is about 30k messages per second. There are 2 main consumers (HDFS and Elasticsearch) and each one needs to handle every message, so the delivery rate hits about 60k per second.
In my scenario, a single server can sustain a 10k delivery rate, so I use 6 nodes to spread the load. My solution is that I created 2 queues on each node. Each message is published with a random routing key (something like message.0, message.1, etc.) to distribute the pressure across every node.
What confused me is:
All messages are sent to one node. Should I use HAProxy to load balance this publish pressure?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between memory nodes and disk nodes? What I know is that the difference between them concerns only metadata, such as queue configuration.
How can I improve the performance of the publish and delivery code? I've done some research and know several methods:
disable the confirm mechanism (in the publishing code?)
enable HiPE (I've done that and it helped a lot)
For example, if the input is 10k mps (messages per second) and there are two consumers that each consume every message, then the output is 20k mps. If my server can handle 10k mps, I need two servers to handle the 20k-mps pressure. Now suppose a new consumer needs to consume every message too; the output hits 30k mps, so I need one more server. In conclusion: does one more consumer of every message mean one more server?
"All message send to one node. Should I use a HA Proxy to load balance this publish pressure?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
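One common client-side approach, independent of any proxy, is to spread connections across the nodes yourself. A minimal sketch with Python's pika (node names are hypothetical):

import random
import pika

NODES = ["rabbit-1", "rabbit-2", "rabbit-3"]  # hypothetical cluster hostnames

def connect():
    # Pick a node at random so publishers don't all funnel into one node;
    # an HAProxy/LB in front of the cluster achieves the same effect centrally.
    host = random.choice(NODES)
    return pika.BlockingConnection(pika.ConnectionParameters(host=host))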
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes. Durable queues are written to disk so that they can be reinstated on server restart, for example. This adds a nominal overhead, though the actual process occurs asynchronously.
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I imrove the performance in publish and delivery codes?"
Try this out.
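As a rough illustration of the publish-side knobs mentioned above, a minimal pika sketch (queue name and message count are arbitrary): confirms are left off for raw throughput, and delivery_mode controls whether messages are persisted:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="logs", durable=True)

# ch.confirm_delivery()  # enabling publisher confirms adds latency per publish

for i in range(10000):
    ch.basic_publish(
        exchange="",
        routing_key="logs",
        body=("log line %d" % i).encode(),
        properties=pika.BasicProperties(delivery_mode=1),  # 1 = transient, 2 = persistent (slower)
    )
conn.close()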