What's the best way to broadcast a message to Akka.Net ConsistentHashingPool routees? - akka.net

This...
hasPoolRouter.Tell(new Broadcast(new SendStatus()))
Does not work. The message only makes it to the routee whose consistent hash ID matches, which defeats the purpose of Broadcast.
The router actor is defined like this:
hasPoolRouter = ... .WithRouter(new ConsistentHashingPool(4))
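To make the expected behavior concrete, here is a conceptual sketch in plain Python (not the Akka.NET API) of how a consistent-hashing pool router is supposed to treat a Broadcast-wrapped message differently from an ordinary one: a normal message is hashed to exactly one routee, while Broadcast bypasses the hashing and reaches every routee.

```python
# Conceptual model (plain Python, NOT Akka.NET code): a pool router that
# picks one routee by a consistent hash of the message, but delivers a
# Broadcast-wrapped message to every routee.
import hashlib

class Routee:
    def __init__(self, name):
        self.name = name
        self.received = []
    def tell(self, msg):
        self.received.append(msg)

class Broadcast:
    def __init__(self, msg):
        self.msg = msg

class ConsistentHashingPool:
    def __init__(self, size):
        self.routees = [Routee(f"routee-{i}") for i in range(size)]
    def tell(self, msg, hash_key=None):
        if isinstance(msg, Broadcast):
            for r in self.routees:          # Broadcast bypasses the hashing
                r.tell(msg.msg)
            return
        key = hash_key if hash_key is not None else repr(msg)
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        self.routees[digest % len(self.routees)].tell(msg)

pool = ConsistentHashingPool(4)
pool.tell("status?", hash_key="order-42")   # goes to exactly one routee
pool.tell(Broadcast("status?"))             # goes to all four routees
```

If only one routee receives the broadcast, as in the question, something other than the router (e.g. the deployment configuration) is routing the message.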

I was intending to have a pool of these locally, but I was running in a clustered configuration and only one was created locally... so the broadcast was only going to the one local instance. I also made a mistake in the configuration and had two configs using the same actor path, making things worse. The good thing that came out of it is that I better understand how this stuff works by screwing it up.
For clarification, I had both a clustered router and a non-clustered one configured at the same actor path. I was sending the broadcast to the clustered IActorRef, and it was delivered to only one of the multiple pooled non-clustered ones. If I had been running two nodes, then two would have received it, but not all, which might have clued me in.

Related

How to quickly subscribe to a relevant subset of a large set of routing keys?

I have the feeling I am not understanding something fundamental in AMQP/RabbitMQ, since I cannot find much help on this specific detail.
Let's assume I have a system made up of several components sending each other messages via a RabbitMQ broker. The messages can have routing keys of the form XXX.YYY. Let's further assume XXX and YYY are numbers between 000 and 999. That means there are a total of 1,000,000 different possible routing keys.
Now, not every component in my system is interested in every message. Let's say there is a component that wants all the messages in which XXX is between 300 and 500 and YYY is between 600 and 900. That means the component wants to process messages referring to 200*300 = 60,000 different routing keys. Also, the component might be restarted at any point in time and needs to be able to start processing the messages quickly after restart.
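A quick sanity check of the counts in the scenario above, using the post's round figures of 200 relevant XXX values and 300 relevant YYY values:

```python
# Sanity check of the routing-key counts from the scenario above
# (using the post's round figures of 200 and 300 relevant values).
xxx = range(300, 500)                  # 200 relevant values of XXX
yyy = range(600, 900)                  # 300 relevant values of YYY

exact_bindings = len(xxx) * len(yyy)   # one binding per XXX.YYY pair
wildcard_all = 1000 * 1000             # "*.*" matches every possible key
wildcard_xxx = len(xxx) * 1000         # one "XXX.*" binding per relevant XXX

print(exact_bindings, wildcard_all, wildcard_xxx)  # 60000 1000000 200000
```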
Furthermore, the routing keys the component is interested in might change at runtime.
There are several ways to approach this that I can think of:
Use topic exchanges and subscribe to each routing key. If I do this using one connection and one channel, it is awfully slow. My understanding is that bindings are created sequentially for each channel and thus creating 60,000 bindings takes a while. Adding and removing bindings is trivial, though. Would it be feasible to create more channels so that bindings can be created in parallel?
Use topic exchanges and wildcards, and discard messages you're not interested in in the client. We could subscribe to *.* and receive messages for all 1,000,000 routing keys => much more load on the client. Or subscribe to all 200 relevant values of XXX.* and receive messages for 200,000 routing keys. Is this a generally applied pattern?
Use headers exchanges and set x-match to any. This feels a little hacky and it seems headers exchanges are not widely used. You also have to deal with the maximum size of the header when defining a binding. Do people do this? You only need a handful of bindings though, so re-creating the bindings after a restart is very fast. Updating the set of topics we're interested in is also not a problem: Just re-create everything.
So, I guess my question is: What's the best practice to subscribe to a large number of topics very quickly (<5s) and still be able to change routing keys dynamically at run-time?
Would it be feasible to split the component that needs the messages and the subscription management into two components? One component would only be responsible for keeping the subscriptions up to date (this would use exchange-to-exchange bindings), and the other component would receive every message from the downstream exchange.

activemq multiple consumers multiple topics performance

I'm relatively new to ActiveMQ, and one of the first things I'm trying to do is publish from a server process to 5,000 topics (one topic per stock). The server and broker manage to keep up fine.
However, on the consumer side it's very odd. If I subscribe to all 5k topics with a single wildcard consumer ("mytopic.>"), everything keeps up fine. However, if I try to subscribe with a single consumer per topic, the performance just drops out and it can't keep up.
I've tried playing with prefetch limits and optimized ack modes; nothing seems to help.
Any idea why a single wildcard would perform fine whereas 5k individual topics would not?
I could just as well demultiplex the messages myself, but I would expect ActiveMQ to be able to do this for me as efficiently as I can.
EDIT: Some more information and updates on this:
I was testing this on ~6,000 topics, publishing once per second.
I'm using the activemq-cpp C++ library, and I was creating one session for all topics. It turns out the ActiveMQ implementation is horribly inefficient: it does a linear scan of all topics on every message (twice, actually) when delivering a message to a session.
To make matters worse, if you create a session per topic, it tries to create a thread per session, so that blows up pretty quickly.
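The cost of the linear scan described above is easy to see in miniature. This is an illustrative Python model, not activemq-cpp internals: scanning every consumer per delivery is O(n) per message, while a map keyed by topic name makes dispatch O(1).

```python
# Illustrative model (NOT activemq-cpp internals) of the dispatch problem:
# a per-message linear scan over all consumers vs. a direct dict lookup.
consumers = {f"mytopic.{i}": [] for i in range(6000)}

def deliver_linear(topic, msg):
    # What a per-message scan looks like: O(n) in the number of consumers.
    for name, inbox in consumers.items():
        if name == topic:
            inbox.append(msg)

def deliver_hashed(topic, msg):
    # Direct lookup keyed by topic name: O(1) per delivery.
    consumers[topic].append(msg)

deliver_linear("mytopic.42", "tick")
deliver_hashed("mytopic.42", "tock")
```

At ~6,000 topics ticking once per second, the linear version does millions of comparisons per second before any real work happens.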
But wait! There's an option on the connection, setAlwaysSessionAsync, so sessions don't create their own threads. Great!
D'oh! Not so fast: sessions still create an RW mutex in non-async mode, and they use some home-grown TLS data which had a hardcoded limit of ~300 instances per thread... ugh.
OK, so I had to limit the number of sessions I can create to ~150 (I guess other objects are using TLS data as well) and then round-robin my topics onto these...
It would be nice if I could control how many threads process the data off the wire, but alas that's not exposed either... ugh, hardcoded in the activemq-cpp code.
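The round-robin workaround described above (capping sessions at ~150 because of the TLS limit and spreading ~6,000 topics across them) can be sketched like this; the numbers are the post's, the code is a plain-Python illustration:

```python
# Sketch of the workaround above: cap the number of sessions at ~150
# (forced by the hardcoded TLS limit) and round-robin ~6,000 topics
# across them. Plain Python illustration, not activemq-cpp code.
NUM_SESSIONS = 150
topics = [f"mytopic.{i}" for i in range(6000)]

sessions = [[] for _ in range(NUM_SESSIONS)]
for i, topic in enumerate(topics):
    sessions[i % NUM_SESSIONS].append(topic)   # round-robin assignment
```

Each session ends up with an even share of topics, keeping the per-session linear scan bounded at topics/NUM_SESSIONS instead of the full topic count.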
TLDR; activemq is a streaming pile of messy poo

What to use: multiple queue names or multiple routing keys and when?

Can anyone explain in which cases I need to create multiple queues (one user -> one queue name), and when one queue name for all clients with different routing keys (one user -> one routing key) and why?
A user should not be able to read messages intended for another user.
I'm using direct exchange type.
First off, I am going to assume that when you say "user" you are interchangeably referring to a consumer or producer; they aren't the same thing, so I would read up on that in RabbitMQ's simplest explanation. Walking through that tutorial will definitely help solidify your understanding of Rabbit a bit more overall too, which is always good.
In any case, I would recommend doing this:
Create multiple queues, each one linked to a single consumer. The reason for doing this instead of using a single queue with multiple consumers is discussed here, but if you don't want a bunch of programmer jargon, it pretty much says that a single queue is super slow because only one message can be consumed at a time from the queue.
Also, there is a built-in "default exchange" that you can use instead of setting up another direct exchange, so it sounds like you're putting effort into something you might not need. Obviously I'm not sure what you are doing, but I would take that into consideration... hope this helps!
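To illustrate the isolation requirement ("a user should not be able to read messages intended for another user"), here is a plain-Python model of direct-exchange routing (not the RabbitMQ client API): one queue per user, each bound with that user's own routing key, so exact-match routing keeps every user's messages private.

```python
# Plain-Python model (NOT the RabbitMQ client API) of the recommended
# layout: one queue per user, each bound to a direct exchange with that
# user's routing key, so no user sees another user's messages.
class DirectExchange:
    def __init__(self):
        self.bindings = {}                       # routing key -> queues
    def bind(self, routing_key, queue):
        self.bindings.setdefault(routing_key, []).append(queue)
    def publish(self, routing_key, msg):
        for queue in self.bindings.get(routing_key, []):
            queue.append(msg)                    # exact-match routing only

exchange = DirectExchange()
alice_queue, bob_queue = [], []
exchange.bind("user.alice", alice_queue)         # hypothetical key names
exchange.bind("user.bob", bob_queue)

exchange.publish("user.alice", "hello alice")    # only alice receives this
```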

Redirect NServiceBus message based on Endpoint availability

I'm new to NServiceBus, but currently using it with SQL Server Transport to send messages between three machines: one belongs to an endpoint called Server, and two belong to an endpoint called Agent. This is working as expected, with messages sent to the Agent endpoint distributed to one of the two machines via the default round-robin.
I now want to add a new endpoint called PriorityAgent with a different queue and two additional machines. While all endpoints use the same message type, I know where each message should be handled prior to sending it, so normally I can just choose the correct destination endpoint and the message will be processed accordingly.
However, I need to build in a special case: if all machines on the PriorityAgent endpoint are currently down, messages that ordinarily should be sent there should be sent to the Agent endpoint instead, so they can be processed without delay. On the other hand, if all machines on the Agent endpoint are currently down, any Agent messages should not be sent to PriorityAgent, they can simply wait for an Agent machine to return.
I've been researching the proper way to implement this, and haven't seen many results. I imagine this isn't an unheard-of scenario, so my assumption is that I'm searching for the wrong things or thinking about this problem in the wrong way. Still, I came up with a couple potential solutions:
Separately track heartbeats of PriorityAgent machines, and add a mutator or behavior to change the destination of outgoing PriorityAgent messages to the Agent endpoint if those heartbeats stop.
Give PriorityAgent messages a short expiration, and somehow handle the expiration to redirect messages to the Agent endpoint. I'm not sure if this is actually possible.
Is one of these solutions on the right track, or am I off-base entirely?
You have not seen many do this because it's considered an antipattern. Or rather one of two antipatterns.
1) Either you are sending a command, in which case the RECEIVER of the command defines the contract. Why are you sending a command defined by PriorityAgent to Agent? There should be no coupling there. A command belongs to ONE logical endpoint/queue.
2) Or you are publishing an event defined by whoever publishes, with both PriorityAgent and Agent as subscribers. The two subscribers should be 100% autonomous and share nothing. Checking heartbeats/sharing info between these two logically separate entities is a bad thing. Why have them separately in the first place then? If they know about each other's "dirty secrets", they should be the same thing.
If your primary concern is that the PriorityAgent messages will not be handled if the machines hosting it are down, and want to use the machines hosting Agent as a backup, simply deploy PriorityAgent there as well. One machine can run more than one endpoint just fine.
That way you can leverage the additional machines, but don't have to get dirty with sending the same command to a different logical endpoint or coupling two different logical endpoints together through some back channel.
I'm Dennis van der Stelt and I work for Particular Software, makers of NServiceBus.
From what I understand, both PriorityAgent and Agent are already scaled out over multiple machines? Then they both work according to competing consumers pattern. In other words, both machines try to pick up messages from the same queue, where only one will win and starts processing the message.
You're also talking about high availability. So when PriorityAgent goes down, another machine will pick it up. That's what I don't understand. Why fail over to Agent, which seems to me to be a logically different endpoint? If it is logically different, how can it handle PriorityAgent messages? If it can handle the same message, it seems logically the same endpoint. Then why make the difference between PriorityAgent and Agent?
Besides that, SQL Server has all kinds of features (like Always-On) to make sure it does not (completely) go down. Why try to solve difficult scenarios with custom build solutions, when SQL Server can already solve this for you?
Another scenario could be that PriorityAgent should handle priority cases. Something like preferred customers, or high-value customers. That is sometimes used when (for example) a lot of orders (read: messages) come in, but we want to deal with high-value customers sooner than regular customers. But due to the amount of messages coming in, high-value customers would also end up in the back of the queue, together with regular customers. A solution could be to publish these messages and have two different endpoints (with different queues) both subscribed to this message. Both receive each unique message, but check whether it's a message they should handle. The Agent will ignore high-value customers; the PriorityAgent will ignore regular customers.
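The publish-to-both pattern just described can be sketched as a pair of filters; the predicate and field names here are hypothetical, and this is a plain-Python illustration rather than NServiceBus code:

```python
# Sketch of the publish-to-both pattern above: every endpoint sees every
# message, and each one handles only its own customers. Predicate and
# field names are hypothetical; this is not NServiceBus code.
def priority_agent_handles(msg):
    return msg["customer"] == "high-value"

def agent_handles(msg):
    return msg["customer"] != "high-value"

messages = [{"id": 1, "customer": "high-value"},
            {"id": 2, "customer": "regular"}]

# Each endpoint filters the shared stream down to its own work.
priority_work = [m for m in messages if priority_agent_handles(m)]
regular_work = [m for m in messages if agent_handles(m)]
```

Every message is handled exactly once, but high-value customers get a dedicated queue that regular traffic can never clog.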
These are some of the solutions available as standard messaging patterns, or infrastructural solutions to your issue. Again, it's not completely clear to me what it is you're looking for. If you'd like to continue the discussion, perhaps you want to email support@particular.net and we can continue there.

Can you decide which actors take which keys when using consistent hashing?

I've experimented a little with the Akka .NET consistent hashing router. It seems to me that although you can specify what key to use for the hashing, it is the router who decides how to allocate the keys across actors.
I would have liked to do something like Actor A takes messages of type A, Actor B takes messages of type B, etc. Is this at all possible with the consistent hashing router?
No, it's not possible with the existing routers.
You can subscribe your actors to particular message types using the EventBus (Context.System.EventStream.Subscribe(Self, typeof(MyMessage));) and publish them by calling system.EventStream.Publish(new MyMessage()); - this way a published message will be sent to all subscribers. The limitation of that approach is that it works only within the scope of a single ActorSystem.
For distributed publish/subscribe scenarios you may use the Akka.Cluster.Tools plugin, which exposes such an option. Remember, however, that in this case the subscription key is a string instead of a message type.
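The type-keyed EventStream approach described above boils down to this conceptual model (plain Python, not the Akka.NET API): subscribers register for a message type, and publish delivers to every subscriber of that type.

```python
# Conceptual model (plain Python, NOT the Akka.NET API) of the EventStream
# approach: subscribers register for a message *type*, and publish delivers
# the message to every subscriber registered for that type.
class EventStream:
    def __init__(self):
        self.subscribers = {}               # message type -> inboxes
    def subscribe(self, inbox, msg_type):
        self.subscribers.setdefault(msg_type, []).append(inbox)
    def publish(self, msg):
        for inbox in self.subscribers.get(type(msg), []):
            inbox.append(msg)               # every subscriber gets it

class MyMessage:
    pass

stream = EventStream()
actor_a, actor_b = [], []
stream.subscribe(actor_a, MyMessage)
stream.subscribe(actor_b, MyMessage)
stream.publish(MyMessage())                 # both subscribers receive it
```

This gives the "Actor A takes type A, Actor B takes type B" behavior the question asks for, which a consistent-hashing router cannot: the router owns the key-to-routee mapping.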