I am binding a queue to a topic exchange with multiple binding keys, and have the following related questions:
Is there a maximum number of binding keys that can be used to bind a single queue to an exchange?
Is it considered bad practice to use a lot of binding keys for a single queue (say around 50 binding keys)?
I see from this answer here that "For topic routing, performance decreases as the number of bindings increase". I am curious to understand a bit more specifically about how drastically performance can decrease, and generally how big the number of bindings can get before it starts affecting performance.
Hypothetical (but simpler) scenario:
I have many orders in my system.
I have external triggers that affect those orders (e.g. webhooks). They may occur in parallel, and are handled by different instances in my cluster.
In the scope of a single order, I would like to make sure that those events are processed in sequential order to avoid race conditions, version conflicts etc.
Events for different orders can (and should) be processed in parallel
I'm currently toying with the idea of leveraging RabbitMQ with a setup similar to this:
use a queue for each order (create on the fly)
if an event occurs, put it in that queue
Those queues would be short-lived, so I wouldn't end up with millions of them, but it should scale anyway (say, lower one-digit thousands if the project grows substantially). The question is whether that's an absolute anti-pattern as far as RabbitMQ (or similar systems) goes, or whether there are better solutions to ensure sequential execution anyway.
Thanks!
In my opinion, creating ephemeral queues might not be a great idea, as there will be considerable overhead in creating and deleting queues. The focus should be on message consumption. I can think of the following solutions:
You can limit the number of queues with a publishing strategy: for example, route by orderId modulo N, so that orders with orderId % N == 0 go to queue-0, orderId % N == 1 go to queue-1, and so forth. That gives you parallel throughput as well as a finite number of queues, at the cost of some additional publisher logic.
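As a sketch, such a strategy can be implemented with a plain modulo over the orderId (the queue names and queue count below are hypothetical); a single modulus keeps the mapping unambiguous, so every event for a given order always lands on exactly one queue:

```python
# Sketch of a modulo-based publishing strategy; queue names and
# NUM_QUEUES are illustrative assumptions, not a recommendation.

NUM_QUEUES = 4

def queue_for_order(order_id: int) -> str:
    """Route every event for the same order to the same queue."""
    return f"orders-queue-{order_id % NUM_QUEUES}"

# All events for order 42 land on one queue, preserving their relative order...
assert queue_for_order(42) == queue_for_order(42)
# ...while different orders spread across the fixed set of queues.
print(queue_for_order(42), queue_for_order(43))
```

The publisher then publishes each event to `queue_for_order(order_id)`; per-order sequencing follows from each queue being consumed in FIFO order by a single consumer.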
The same logic can be moved to the consumer side by using a single pub-sub-style queue; the onus then lies on the consumer to filter out unwanted orderIds.
If you are happy to explore other technologies, you can look into Kafka as well, where you can use the orderId as the partition key and multiple partitions to gain parallel throughput.
I'm using distributed pub/sub in an Akka.net cluster, and I've begun seeing this error when pub/sub grows to approx. 1000 subscribers and 3000 publishers:
max allowed size 128000 bytes, actual size of encoded Akka.Cluster.Tools.PublishSubscribe.Internal.Delta was 325691 bytes
I don't know, but I'm guessing distributed pub/sub is trying to pass the pub/sub list to other actor systems on the cluster?
Anyway, I'm a little hesitant about boosting size limits because of this post. So what would be a reasonable approach to correcting this?
You may want to tweak the distributed pub/sub HOCON settings. Messages in Akka.Cluster.DistributedPubSub are grouped together and sent as deltas. You may be interested in two settings:
akka.cluster.pub-sub.max-delta-elements = 3000 sets the maximum number of items a single delta message can contain. 3000 is the default value; you may want to lower it to reduce the size of the delta messages (which seems to be the issue in your case).
akka.cluster.pub-sub.gossip-interval = 1s indirectly affects how often gossips are sent. The more often they are sent, the smaller they may be - assuming a continuously, highly saturated channel.
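For example, both settings could be tightened together in your HOCON configuration (the values below are illustrative, not recommendations):

```hocon
akka.cluster.pub-sub {
  # fewer items per delta -> smaller delta messages
  max-delta-elements = 500
  # gossip more often so each individual delta carries less
  gossip-interval = 500ms
}
```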
If these won't help, you may also think about reducing the size of your custom messages by introducing custom serializers with smaller payload footprint.
I am reading "paxos" on wiki, and it reads:
"Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number."
But I don't understand: how does the proposer tell the difference between its proposal not being approved and the message simply taking more time to transmit?
One of the tricky parts to understanding Paxos is that the original paper and most others, including the wiki, do not describe a full protocol capable of real-world use. They only focus on the algorithmic necessities. For example, they say that a proposer must choose a number "n" higher than any previously used number. But they say nothing about how to actually go about doing that, the kinds of failures that can happen, or how to resolve the situation if two proposers simultaneously try to use the same proposal number (as in both choosing n=2). That actually completely breaks the protocol and would lead to incorrect results but I'm not sure I've ever seen that specifically called out. I guess it's just supposed to be "obvious".
Specifically to your question, there's no perfect way to tell the difference using the raw algorithm. Practical implementations typically go the extra mile by sending a Nack message to the Proposer rather than just silently ignoring it. There are plenty of other tricks that can be used but all of them, including the nacks, come with varying downsides. Which approach is best generally depends on both the kind of application employing Paxos and the environment it's intended to run in.
If you're interested, I put together a much longer-winded description of Paxos that includes many of the issues practical implementations must address in addition to the core components. It covers this issue along with several others.
Specific to your question, it isn't possible for a proposer to distinguish between lost messages, delayed messages, crashed acceptors, and stalled acceptors: in each case it gets no response. Typically, an implementation will time out on getting less than a quorum of responses and resend the proposal, on the assumption that messages were dropped or acceptors are rebooting.
Often implementations add "nack" messages as a negative acknowledgement, an optimisation to speed up recovery. The proposer only gets "nack" responses from reachable nodes that have made a higher promise. The "nack" can carry both the highest promise and the highest instance known to be fixed. How this helps is outlined below.
I wrote an implementation of Paxos called TRex with some of these techniques sticking as closely as possible to the description of the algorithm in the paper Paxos Made Simple. I wrote up a description of the practical considerations of timeouts and nacks on a blog post.
One of the interesting techniques it uses is for a timed-out node to make its first proposal with a very low number. This will always draw "nack" messages. Why? Consider a three-node cluster where the network link breaks between a stable proposer and one other node. The other node will time out and issue a prepare. If it issues a high prepare, it will get a promise from the third node, which interrupts the stable leader. You then have a symmetry where the two nodes that cannot message one another fight, with the leadership swapping back and forth and no forward progress.
To avoid this, a timed-out node can start with a low prepare. It can then inspect the "nack" messages to learn from the third node that there is a leader making progress: the highest instance known to be fixed in the nack will be greater than the local value. The timed-out node can then refrain from issuing a high prepare and instead ask the third node to send it the latest fixed and accepted values. With that enhancement, a timed-out node can distinguish between a stable proposer crashing and the connection failing. Such "nack"-based techniques don't affect the correctness of the implementation; they are only an optimisation to ensure fast failover and forward progress.
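As an illustrative sketch (hypothetical names, not TRex itself), the decision a timed-out node makes from the nacks to its deliberately low probe prepare might look like:

```python
# Sketch of the low-ballot probe described above: a timed-out node sends
# a prepare with a very low number, then inspects the "nack" responses
# to decide whether a live leader is already making progress elsewhere.
# All names here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Nack:
    highest_promise: int   # highest ballot the acceptor has promised
    highest_fixed: int     # highest instance the acceptor knows is fixed

def after_timeout(local_fixed: int, nacks: list[Nack]) -> str:
    """Choose the next step from the nacks to a low probe prepare."""
    if not nacks:
        return "retransmit"       # total silence: assume lost messages
    if max(n.highest_fixed for n in nacks) > local_fixed:
        return "sync"             # a leader is making progress: catch up
    return "prepare-high"         # no progress seen: try to take the lead

# A peer has fixed instance 9 > our 5: catch up rather than fight the leader.
print(after_timeout(5, [Nack(highest_promise=3, highest_fixed=9)]))
# No nack shows progress beyond our state: safe to issue a real high prepare.
print(after_timeout(5, [Nack(highest_promise=3, highest_fixed=5)]))
```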
Let's say I have 200 events which are going to be placed in multiple queues (or not), and I was thinking of binding each queue to a topic exchange with 200 unique keys. Am I going to see a performance bottleneck from adding 200 unique bindings between one queue and one exchange?
If yes, do I have an alternative?
Thanks in advance
In general, it is unlikely (like snow on July 4th) that routing will be the most resource-consuming part. For further reading on routing, please refer to Very fast and scalable topic routing – part 1 and Very fast and scalable topic routing – part 2.
As to your particular case, it depends on the resources available to the RabbitMQ server(s), message flow, number of bindings, binding-key complexity, etc. Anyway, it is always better to run some load tests first to find the bottlenecks; but again, it is unlikely that routing will be the cause of significant performance degradation.
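As a rough sketch (illustrative names), binding one queue 200 times is just a loop of queue_bind calls against a pika-style channel; a stub channel stands in below so the snippet runs without a broker:

```python
# Binding a single queue to a topic exchange with many routing keys.
# bind_all would work with a real pika channel; the stub below only
# records the calls so this runs without a broker. Names are illustrative.

def bind_all(channel, exchange: str, queue: str, routing_keys):
    """One binding per key; RabbitMQ imposes no hard cap on this count."""
    for key in routing_keys:
        channel.queue_bind(exchange=exchange, queue=queue, routing_key=key)

class StubChannel:
    """Records bindings instead of talking to a broker."""
    def __init__(self):
        self.bindings = []
    def queue_bind(self, exchange, queue, routing_key):
        self.bindings.append(routing_key)

keys = [f"event.{i}" for i in range(200)]  # the 200 unique keys from the question
ch = StubChannel()
bind_all(ch, "events-topic", "orders-queue", keys)
print(len(ch.bindings))  # 200
```

The bindings are created once at setup time, so even if routing cost grows with the number of bindings, 200 is far below the scale where topic routing becomes the bottleneck in practice.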
Side note: even though the question was posted several months ago, I'm still in search of a good answer, so any feedback is welcome.
While developing WCF Web Services I have encountered the error:
The maximum array length quota (16384) has been exceeded while reading XML data.
like many others and have solved it by modifying the binding configuration.
When looking for answers on the Internet, the solution was almost always to change the binding configuration, setting maxArrayLength to its maximum or switching to Streamed transfer.
In some situations, as in the question WCF sending huge data, people suggest modifying the binding configuration over transmitting data in smaller chunks.
But will the maximum values and streamed transfer always work, even in a system where you can never know how large the data will be?
How to choose between the two options?
Does it depend on what you transfer - downloading media vs. returning large log information?
The answers given so far revolve around the technical aspects of streaming, but the answer I am looking for should focus on guidelines for choosing between the two options in the situation described.
Not all bindings support streaming. The only ones that do are BasicHttpBinding, NetTcpBinding, NetNamedPipeBinding, and WebHttpBinding. You also cannot use reliable sessions with streaming.
So why the big deal about streaming for large messages? If you don't use streaming, the entire message is loaded into the memory buffer, which can exhaust the available resources.
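For example, a streamed binding configuration might look like this (the binding name and size limit are illustrative, not recommendations for every system):

```xml
<bindings>
  <basicHttpBinding>
    <!-- Streamed transfer: the body is read from the wire on demand
         instead of being buffered whole in memory. -->
    <binding name="streamedHttp"
             transferMode="Streamed"
             maxReceivedMessageSize="2147483647" />
  </basicHttpBinding>
</bindings>
```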
For more information, see this on MSDN: MSDN Large Message Transfers