My question is whether it makes sense to publish a huge amount of streaming data to the same application which is also the consumer of those messages. Does asynchronous nature of queue messaging systems like RabbitMQ helps with the performance in such conditions?
Example: if a huge amount of streaming data comes into the service, will sending them into a queue in RabbitMQ and processing them (Listening) one by one from the same application make sense?
Related
We are building spark based jobs. Processing each message delivered by the queue takes time. There is a need to be able to reprioritize one already sent to the queue.
I am aware there is priority queue implementation available, but not sure how to re-prioritize the existing message in the queue?
One bad workaround is to push that message again as higher priority, so that it handled on priority. Later drop the message with same content which had low or no priority when it's turns comes next.
Is there a natural way we can handle this situation or any other queues that supports scenario better?
Unfortunately there isn't. Queues are to be considered as lists of messages in flight. It is not possible to delete/update them.
Your approach of submitting a higher priority message is the only feasible solution.
RabbitMQ is a messaging system (such as the postal one), it is not a DataBase or a storage service. The storage in form of queues is a necessary feature as much as the postal service needs storage for postcards in transit. It is optimized for the purpose and does not allow to access the messages easily.
My team wants to move to microservices architecture. Currently we are using Redis Pub/Sub as message broker for some legacy parts of our system. My colleagues think that it is naturally to continue use redis as service bus as they don't want spend their time on studying new product. But in my opinion RabbitMQ (especially with MassTransit) is a better approach for microservices. Could you please compare Redis Pub/Sub with Rabbit MQ and give me some arguments for Rabbit?
Redis is a fast in-memory key-value store with optional persistence. The pub/sub feature of Redis is a marginal case for Redis as a product.
RabbitMQ is the message broker that does nothing else. It is optimized for reliable delivery of messages, both in command style (send to an endpoint exchange/queue) and publish-subscribe. RabbitMQ also includes the management plugin that delivers a helpful API to monitor the broker status, check the queues and so on.
Dealing with Redis pub/sub on a low level of Redis client can be a very painful experience. You could use a library like ServiceStack that has a higher level abstraction to make it more manageable.
However, MassTransit adds a lot of value compared to raw messaging over RMQ. As soon as you start doing stuff for real, no matter what transport you decide to use, you will hit typical issues that are associated with messaging like handling replies, scheduling, long-running processes, re-delivery, dead-letter queues, and poison queues. MassTransit does it all for you. Neither Redis or RMQ client would deliver any of those. If your team wants to spend time dealing with those concerns in their own code - that's more like reinventing the wheel. Using the argument of "not willing to learn a new product" in this context sounds a bit weird, since, instead of delivering value for the product, developers want to spend their time dealing with infrastructure concerns.
RabbitMQ is far more stable and robust than Redis for passing messages.
RabbitMQ is able to hold and store a message if there is no consumer for it (e.g. your listener crashed , etc).
RabbitMQ has different methods for communication: Pub/Sub , Queue. That you can use for load balancing , etc
Redis is convenient for simple cases. If you can afford losing a message and you don't need queues then I think Redis is also a good option.
If you however can not afford losing a message then Redis is not a good option.
I haven't found an existing post asking this but apologize if I missed it.
I'm trying to get my head round microservices and have come across articles where RabbitMQ is used. I'm confused why RabbitMQ is needed. Is the intention that the services will use a web api to communicate with the outside world and RabbitMQ to communicate with each other?
In Microservices architecture you have two ways to communicate between the microservices:
Synchronous - that is, each service calls directly the other microservice , which results in dependency between the services
Asynchronous - you have some central hub (or message queue) where you place all requests between the microservices and the corresponding service takes the request, process it and return the result to the caller. This is what RabbitMQ (or any other message queue - MSMQ and Apache Kafka are good alternatives) is used for. In this case all microservices know only about the existance of the hub.
microservices.io has some very nice articles about using microservices
A message queue provide an asynchronous communications protocol - You have the option to send a message from one service to another without having to know if another service is able to handle it immediately or not. Messages can wait until the responsible service is ready. A service publishing a message does not need know anything about the inner workings of the services that will process that message. This way of handling messages decouple the producer from the consumer.
A message queue will keep the processes in your application separated and independent of each other; this way of handling messages could create a system that is easy to maintain and easy to scale.
Simply put, two obvious cases can be used as examples of when message queues really shine:
For long-running processes and background jobs
As the middleman in between microservices
For long-running processes and background jobs:
When requests take a significant amount of time, it is the perfect scenario to incorporate a message queue.
Imagine a web service that handles multiple requests per second and cannot under any circumstances lose one. Plus the requests are handled through time-consuming processes, but the system cannot afford to be bogged down. Some real-life examples could include:
Images Scaling
Sending large/many emails (like newsletters)
Search engine indexing
File scanning
Video encoding
Delivering notifications
PDF processing
Calculations
The middleman in between microservices:
For communication and integration within and between applications, i.e. as the middleman between microservices, a message queue is also useful. Think of a system that needs to notify another part of the system to start to work on a task or when there are a lot of requests coming in at the same time, as in the following scenarios:
Order handling (Order placed, update order status, send an order, payment, etc.)
Food delivery service (Place an order, prepare an order, deliver food)
Any web service that needs to handle multiple requests
Here is a story explaining how Parkster (a digital parking service) are breaking down their system into multiple microservices by using RabbitMQ.
This guide follow a scenario where a web application allows users to upload information to a web site. The site will handle this information and generate a PDF and email it back to the user. Handling the information, generating the PDF and sending the email will in this example case take several seconds and that is one of the reasons of why a message queue will be used.
Here is a story about how and why CloudAMQP used message queues and RabbitMQ between microservices.
Here is a story about the usage of RabbitMQ in an event-based microservices architecture to support 100 million users a month.
And finally a link to Kontena, about why they chose RabbitMQ for their microservice architecture: "Because we needed a stable, manageable and highly-available solution for messaging.".
Please note that I work for the company behind CloudAMQP (hosting provider of RabbitMQ).
The same question can be why REST is necessary for microservices? Microservice concept is not something new under moon. A long time distribution of workflow was used for backend engineering and asynchronous request processing, Microservice is the same component in a separated jvm which matches with S(single responsibility) in SOLID. What makes it micro SERVICE - is that it is balanced. And that is the all! Particularly (!), it can be REST Service on Spring Cloud/REST base, which is registered by Eureka, has proxy gateway and load balancing over Zuul and Ribbon. But it is not the whole world of microservices!By the way, asynchronous distributed processing is one of tasks which microservices are used for. Long time ago services(components) in separated JVM was integrated over any messaging and the pattern is known as ESB. Microservices are the same subjects the pattern. Due to fashion for Spring Cloud REST seems like it is the only way of microservices. Nope! Message based asynchronous microservice architecture is supported by Vertx https://dzone.com/articles/asynchronous-microservices-with-vertx, for example. Why not to use RabbitMQ as message channel? In this case load balancing can be provided by building RabbitMQ cluster. For example:https://codeburst.io/using-rabbitmq-for-microservices-communication-on-docker-a43840401819. So, world is much wide more.
In our application the publisher creates a message and sends it to a topic.
It then needs to wait, when all of the topic's subscribers ack the message.
It does not appear, the message bus implementations can do this automatically. So we are leaning towards making each subscriber send their own new message for the client, when they are done.
Now, the client can receive all such messages and, when it got one from each destination, do whatever clean-ups it has to do. But what if the client (sender) crashes part way through the stream of acknowledgments? To handle such a misfortune, I need to (re)implement, what the buses already implement, on the client -- save the incoming acknowledgments until I get enough of them.
I don't believe, our needs are that esoteric -- how would you handle the situation, where the sender (publisher) must wait for confirmations from multiple recipients (subscribers)? Sort of like requesting (and awaiting) Return-Receipts from each subscriber to a mailing list...
We are using RabbitMQ, if it matters. Thanks!
The functionality that you are looking for sounds like a messaging solution that can perform transactions across publishers and subscribers of a message. In The Java world, JMS specifies such transactions. One example of a JMS implementation is HornetQ.
RabbitMQ does not provide such functionality and it does for good reasons. RabbitMQ is built for being extremely robust and to perform like hell at the same time. The transactional behavior that you describe is only achievable with the cost of reasonable performance loss (especially if you want to keep outstanding robustness).
With RabbitMQ, one way to assure that a message was consumed successfully, is indeed to publish an answer message on the consumer side that is then consumed by the original publisher. This can be achieved through RabbitMQ's RPC procedure calls which might help you to get a clean solution for your problem setting.
If the (original) publisher crashes before all answers could be received, you can assume that all outstanding answers are still queued on the broker. So you would have to build your publisher in a way that it is capable to resume with processing those left messages. This might turn out to be none-trivial.
Finally, I recommend the following solution: Design your producing component in a way that you can consume the answers with one or more dedicated answer consumers that are separated from the origin publisher.
Benefits of this solution are:
the origin publisher can finish its task independent of consumer success
the origin publisher is independent of consumer availability and speed
the origin publisher implementation is far less complex
in a crash scenario, the answer consumer can resume with processing answers
Now to a more general point: One of the major benefits of messaging is the decoupling of application components by the broker. In AMQP, this is achieved with exchanges and bindings that allow you to move message distribution logic from your application to a central point of configuration.
If you add RPC-style calls to your clients, then your components are most likely closely coupled again, meaning that the publishing component fails if one of the consuming components fails / is not available / too slow. This is exactly what you will want to avoid. Otherwise, why would you have split the components then?
My recommendation is that you design your application in a way that publishers can complete their tasks independent of the success of consumers wherever possible. Back-channels should be an exceptional case and be implemented in the described not-so coupled way.
In our project, we want to use the RabbitMQ in "Task Queues" pattern to pass data.
On the producer side, we build a few TCP server(in node.js) to recv
high concurrent data and send it to MQ without doing anything.
On the consumer side, we use JAVA client to get the task data from
MQ, handle it and then ack.
So the question is:
To get the maximum message passing throughput/performance( For example, 400,000 msg/second) , How many queues is best? Does that more queue means better throughput/performance? And is there anything else should I notice?
Any known best practices guide for using RabbitMQ in such scenario?
Any comments are highly appreciated!!
For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.
According to a response I once got from the rabbitmq-discuss mailing group there are other things that you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
You will increase the throughput with a larger prefetch count AND at the same time ACK multiple messages (instead of sending ACK for each message) from your consumer.
But, of course, ACK with multiple flag on (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic on your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of delivery-tags of the messages delivered from the broker, their status (whether your application has handled them or not) and ACK every N-th delivery-tag (NDTAG) when all of the messages with delivery-tag less than or equal to NDTAG have been handled.