I'm setting up a consumer that will listen for messages from two different sources. I want to have a different callback for messages from each of these two sources (other solutions are welcome, though).
I'm very new to RabbitMQ and pika and I haven't grasped the nitty-gritty details yet. But what I want to know is:
Should I use different queues and set up two
channel.basic_consume(callback_1, ...)
channel.basic_consume(callback_2, ...)
for my callbacks, or should I do some tricks with routing keys instead?
That depends a little on your needs. It's really about processing. I am most familiar with Java, so I will tell you how I handle things and then you can make a decision based on that.
If I need to have different threads process different data or do different things with the data, I create two different queues and each thread consumes a different queue. I use topic exchanges to make sure the queues get the correct messages. If the data is only slightly different, then I can use the routing key to handle the data differently within the same thread. The decision is purely based on the parallelism I require, i.e. how many queues I want processing the data.
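A minimal sketch of both options in Python, assuming pika 1.x, where `basic_consume` takes `queue` and `on_message_callback` keyword arguments (older releases took the callback first, as in the question). The queue names, the routing keys `source_a` and `source_b`, the handler names and the `received` list are all hypothetical, and `channel` is assumed to be an already-open pika channel whose queues are declared and bound.

```python
received = []  # (source, body) pairs, only here to make the sketch observable


def handle_source_a(ch, method, properties, body):
    """Callback for messages from the first source (hypothetical)."""
    received.append(("A", body))


def handle_source_b(ch, method, properties, body):
    """Callback for messages from the second source (hypothetical)."""
    received.append(("B", body))


def consume_two_queues(channel):
    """Option 1: one queue per source, one callback per queue."""
    channel.basic_consume(queue="source_a_queue",
                          on_message_callback=handle_source_a, auto_ack=True)
    channel.basic_consume(queue="source_b_queue",
                          on_message_callback=handle_source_b, auto_ack=True)


# Option 2: a single queue, dispatching on the routing key of each delivery.
HANDLERS = {"source_a": handle_source_a, "source_b": handle_source_b}


def dispatch_by_routing_key(ch, method, properties, body):
    """Single callback that picks a handler from method.routing_key."""
    handler = HANDLERS.get(method.routing_key)
    if handler is None:
        return  # unknown routing key: ignored in this sketch
    handler(ch, method, properties, body)
```

With option 1 the two callbacks can later be moved to separate threads or processes; with option 2 everything stays on one consumer and only the dispatch table changes.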
Related
I have the feeling I am not understanding something fundamental in AMQP/RabbitMQ, since I cannot find much help on this specific detail.
Let's assume I have a system made up of several components sending each other messages via a RabbitMQ broker. The messages can have routing keys of the form XXX.YYY. Let's further assume XXX and YYY are numbers between 000 and 999. That means there are a total of 1,000,000 different possible routing keys.
Now, not every component in my system is interested in every message. Let's say there is a component that wants all the messages in which XXX is between 300 and 500 and YYY is between 600 and 900. That means the component wants to process messages referring to 200*300 = 60,000 different routing keys. Also, the component might be restarted at any point in time and needs to be able to start processing the messages quickly after restart.
Furthermore, the routing keys the component is interested in might change at runtime.
There are several ways to approach this that I can think of:
Use topic exchanges and subscribe to each routing key. If I do this using one connection and one channel, it is awfully slow. My understanding is that bindings are created sequentially for each channel and thus creating 60,000 bindings takes a while. Adding and removing bindings is trivial, though. Would it be feasible to create more channels so that bindings can be created in parallel?
Use topic exchanges and wildcards, and discard messages you're not interested in on the client side. We could subscribe to *.* and receive messages for all 1,000,000 routing keys => much more load in the client. Or subscribe to all 200 relevant values of XXX.* and receive messages for 200,000 routing keys. Is this a generally applied pattern?
Use headers exchanges and set x-match to any. This feels a little hacky and it seems headers exchanges are not widely used. You also have to deal with the maximum size of the header when defining a binding. Do people do this? You only need a handful of bindings though, so re-creating the bindings after a restart is very fast. Updating the set of topics we're interested in is also not a problem: Just re-create everything.
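The second option above (coarse wildcard bindings plus filtering in the client) can be sketched in plain Python. This is only an illustration: the function names are hypothetical, and the ranges are treated as half-open so that the counts match the 200 and 300 values used in the question.

```python
def wanted(routing_key, x_range=range(300, 500), y_range=range(600, 900)):
    """Return True if an 'XXX.YYY' routing key falls inside both ranges."""
    try:
        x_part, y_part = routing_key.split(".")
        x, y = int(x_part), int(y_part)
    except ValueError:
        return False  # malformed key: not exactly two numeric words
    return x in x_range and y in y_range


def coarse_binding_keys(x_range=range(300, 500)):
    """One 'XXX.*' binding per relevant XXX value instead of one per full key."""
    return [f"{x:03d}.*" for x in x_range]
```

The consumer would create the 200 bindings from `coarse_binding_keys()` and call `wanted(method.routing_key)` in its callback. Changing the YYY range at runtime then needs no change to the bindings at all.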
So, I guess my question is: What's the best practice to subscribe to a large number of topics very quickly (<5s) and still be able to change routing keys dynamically at run-time?
Would it be feasible to split the component which needs the messages and the subscription into two components? One component would only be responsible for keeping the subscriptions up-to-date (this would use exchange-to-exchange bindings) and the other component would receive every message from the downstream exchange.
I'm making a service that consumes a specific queue in RabbitMQ.
In the work up to this point, I don't need to worry about the internal behavior of the queue, I just need to properly process the value the queue delivers.
But I would like to understand it in more depth.
If too many services send messages to the queue I'm consuming, the queue could overflow.
To prevent this I would have to multiplex the queue or make it scalable. Is there a way to do this?
Should I create multiple queues with the same function and implement it so that consumer services can choose which one to use?
As far as I know, queues are single-threaded, so to scale things up there are some plugins that can help.
We are using x-consistent-hash (Ref), but there are also other plugins like rabbitmq-sharding.
Consistent hashing is a technique that lets you create multiple queues for consuming events coming from an exchange, so you can make use of all CPU cores of a server. It also lets you add more queues later on.
Learn more about consistent hashing here.
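To show the idea only (this is not the plugin's implementation, which runs inside the broker), here is a minimal hash ring in Python. The class name, the queue names and the number of points per queue are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping routing keys to queue names."""

    def __init__(self, queues, points_per_queue=100):
        self._ring = []  # sorted list of (hash value, queue name)
        for queue in queues:
            self.add(queue, points_per_queue)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, queue, points=100):
        """Place several points for this queue on the ring."""
        for i in range(points):
            bisect.insort(self._ring, (self._hash(f"{queue}#{i}"), queue))

    def queue_for(self, routing_key):
        """Pick the first ring point at or after the key's hash (wrapping)."""
        idx = bisect.bisect_left(self._ring, (self._hash(routing_key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The same routing key always lands on the same queue, and when a queue is added later, a key either stays where it was or moves to the new queue. That is what makes it possible to add queues without reshuffling everything.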
So we will have a topic exchange that looks something like
{class}.{genus}
So we have some consumers that bind with the topic
mammal.*
(or bird.*, etc.)
Now suppose later on we want to include species information so the topic exchange now looks like this:
{class}.{genus}.{species}
Now the old consumers are broken :(
However they could have bound as
mammal.*.#
And been able to listen to whatever future information is added. However, this is something my team came up with on our own which leads me to ask:
Is this good practice?
Are there tradeoffs to this I should be aware of?
Is there an alternate way to have a producer be able to add information without breaking existing consumers, without publishing to multiple exchanges?
Typically, if you need maximum control over queue delivery and want to do the logic in Rabbit, then you should consider headers exchanges.
Usually when we code up the publish we know exactly which queue it needs to go to, so whether you want to use a routing key or a boolean to do this might not make much difference depending on your application.
This brings up another design consideration to be aware of: whether you want routing logic in Rabbit. Some people prefer to just use simple routing keys and either direct or topic exchanges, focusing on flexible consumers. It's going to be hard to guess at what is best for your application, obviously.
Keep in mind that your consumers will be subscribed, often statically, to the queue(s) that the exchange delivers to. Also, mammal.# matches everything that mammal.*.# matches (plus the bare key mammal, since # stands for zero or more words), so the shorter form is usually enough (see: ref)
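To compare the binding patterns discussed here, a small matcher in Python can help. It is a simplified re-implementation of the topic rules (`*` stands for exactly one word, `#` for zero or more words), not RabbitMQ's own code, and the routing keys in the usage lines are made up.

```python
def topic_matches(pattern, routing_key):
    """Return True if a topic binding pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            return j == len(k)  # pattern used up: key must be used up too
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False  # pattern still expects a word, key has none left
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# The old binding breaks once a third word is added to the key:
old_binding_still_works = topic_matches("mammal.*", "mammal.canis.lupus")
# The forward-compatible bindings keep matching:
star_hash_works = topic_matches("mammal.*.#", "mammal.canis.lupus")
hash_works = topic_matches("mammal.#", "mammal.canis.lupus")
```

The main tradeoff of the open-ended patterns is that consumers will also receive keys with extra words they know nothing about, so they should parse the routing key defensively.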
I have found that this image is very similar to my business model. I need to split messages across several queues. For the heavy work, I can add more worker threads. For the lighter work, I can let a single consumer subscribe to the messages. But how do I do that in RabbitMQ?
In their documentation I only found the single-queue, multiple-consumer model.
You can add multiple workers to a queue
There can be multiple queues bound to an exchange.
In RabbitMQ, the producer always sends the message to an exchange. So, in your case, I expect a single exchange is enough. If you want to load balance on the consumer side, you have the two options above.
You can also read my article:
https://techietweak.wordpress.com/2015/08/14/rabbitmq-a-cloud-based-message-oriented-middleware/
RabbitMQ has a very flexible model, which enables a wide variety of routing scenarios to take place.
I need to split messages across several queues. For the heavy work, I can add more worker threads.
Yes, this is supported via a direct exchange. Publish a message using a routing key that is the same as the name of the queue. For convenience, let's say you use the fully-qualified object name (e.g. MyApp.Objects.DataTypeOne). All you need to do is subscribe multiple consuming processes to this queue, and RabbitMQ will load-balance using a round-robin approach.
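A rough sketch of that setup in Python, assuming pika 1.x call signatures and an already-open `channel`. It uses the default exchange (the empty string), which is a direct exchange that routes to the queue whose name equals the routing key. The function names and the durable flag are choices made for the example.

```python
def publish_task(channel, queue_name, body):
    """Publish to a named queue through the default (direct) exchange."""
    channel.queue_declare(queue=queue_name, durable=True)
    channel.basic_publish(exchange="", routing_key=queue_name, body=body)


def start_worker(channel, queue_name, on_message):
    """Attach one more consumer to the queue; call once per worker."""
    channel.queue_declare(queue=queue_name, durable=True)
    channel.basic_consume(queue=queue_name, on_message_callback=on_message)
```

Calling `start_worker` from several processes with the same queue name (for example `"MyApp.Objects.DataTypeOne"`) gives the load-balanced case. Calling it from a single process gives the single-consumer case.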
For the lighter work, I can let a single consumer subscribe to the messages.
Yes, you can do this also. Same process as in the paragraph above. Just don't attach multiple consuming processes.
I have found this image is very similar to my business model.
The diagram isn't very useful, because it lacks information about the type of messages being published. In that sense, it is only an interconnect diagram. The interesting lines are the ones connecting the queues to the exchange, as that is what you specify within RabbitMQ via Queue Bindings. You can also bind exchanges to one another, but that's a bit further than we probably need to go.
Everything else on the diagram is fully under your control as the user of the RabbitMQ/AMQP system. You can create an arbitrary number of publishers and have an arbitrary number of consuming processes each consuming from an arbitrary number of queues. There are no hard and fast limits, though there are some practical aspects you probably will want to think about to ensure your system is maintainable.
Pretty new to RabbitMQ and we're still in the investigation stage to see if it's a good fit for our use cases--
We've readily come to the conclusion that our desired topology would have us deploying a few topic-based exchanges, and then filtering from there to specific queues. For example, let's say we have a user and an upload exchange, where the user exchange might receive messages where the topic is "new-registration" or "friend-request" and the upload exchange might receive messages like "video-upload" or "picture-upload".
Creating the queues, getting messages routed to the appropriate queue, and then building listeners to handle the messages for the various queues has been quite straightforward.
What's unclear to me however is if it's possible to do a fanout on a topic exchange?
I.e. I have named queues that are bound to my topic exchange, but I'd like to be able to just throw tons of instances of my listeners at those queues to prevent single points of failure. But to the best of my knowledge, RabbitMQ treats these listeners in a straightforward round-robin fashion, e.g. every Nth message always goes to the same Nth listener rather than being dispatched to the first available consumer. This is generally acceptable to us, but given the load we anticipate, we'd like to avoid the possibility of hot spots developing amongst our consumer farm.
So, is there some way, either in the queue or exchange configuration or in the consumer code, where we can point our listeners to a topic queue but have the listeners treated in a fanout fashion?
Yes, by having the listeners bind using different queue names, they will be treated in a fanout fashion.
Fanout is 1:N though, i.e. each task can be delivered to multiple listeners like pub-sub. Note that this isn't restricted to a fanout exchange, but also applies if you bind multiple queues to a direct or topic exchange with the same binding key. (Installing the management plugin and looking at the exchanges there may be useful to visualize the bindings in effect.)
Your current setup is a task queue. Each task/message is delivered to exactly one worker/listener. Throw more listeners at the same queue name, and they will process the tasks round-robin as you say. With "fanout" (separate queues for a topic) you will process a task multiple times.
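The difference between the two setups can be seen in a toy in-memory model. This is not how RabbitMQ is implemented; the class and all names are invented for the illustration, and it only does exact key matching.

```python
from collections import defaultdict
from itertools import cycle


class ToyBroker:
    """Tiny model: bindings copy messages, consumers share a queue."""

    def __init__(self):
        self.bindings = defaultdict(list)   # binding key -> queue names
        self.consumers = defaultdict(list)  # queue name -> consumer names
        self._next = {}                     # queue name -> round-robin iterator
        self.delivered = []                 # (consumer, message) pairs

    def bind(self, queue, key):
        self.bindings[key].append(queue)

    def add_consumer(self, queue, consumer):
        self.consumers[queue].append(consumer)
        # Rebuild the round-robin iterator over the updated consumer list.
        self._next[queue] = cycle(self.consumers[queue])

    def publish(self, key, message):
        # Every bound queue gets its own copy of the message...
        for queue in self.bindings[key]:
            if queue in self._next:
                # ...but within one queue only one consumer receives it.
                self.delivered.append((next(self._next[queue]), message))
```

Two consumers on one queue split the messages between them (a task queue). Two queues bound with the same key each get every message, so each message is processed twice (fanout behaviour).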
Depending on your platform there may be existing work queue solutions that meet your requirements, such as Resque or DelayedJob for Ruby, Celery for Python or perhaps Octobot or Akka for the JVM.
I don't know for a fact, but I strongly suspect that RabbitMQ will skip consumers with unacknowledged messages, so it should never bottleneck on a single stuck consumer. The comments on their FAQ seem to suggest that RabbitMQ will make an effort to keep things chugging along even in the presence of troublesome consumers.
This is a late answer, but in case others come across this question...
It sounds like what you want is fair dispatch rather than a fan out model (which would publish a given message to every queue).
Fair dispatch will give a message to the next available worker rather than using a simple round-robin approach. This should avoid the "hotspots" you are concerned about, without delivering the same message to multiple consumers.
If this is what you are looking for, then see the "Fair Dispatch" section on this page in the Rabbit docs. A prefetch count of 1 is the key here.
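As a sketch of what that looks like with pika (assuming pika 1.x signatures and an already-open `channel`), the worker sets a prefetch count of 1 and acknowledges each message only after the work is finished. The `process` function and the other names are placeholders.

```python
def process(body):
    """Placeholder for the real work (hypothetical)."""
    return body


def on_task(ch, method, properties, body):
    """Handle one message, then acknowledge it."""
    process(body)
    # Ack only after the work is done, so the broker can send the next message.
    ch.basic_ack(delivery_tag=method.delivery_tag)


def start_fair_worker(channel, queue_name):
    """Consume with manual acks and at most one unacknowledged message."""
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue=queue_name, on_message_callback=on_task,
                          auto_ack=False)
```

With this in place a busy worker is not handed a new message until it has acknowledged the previous one, so idle workers pick up the slack.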