RabbitMQ - Delayed message exchange

Currently we have two systems that communicate directly.
Service A continuously (but not at regular intervals) sends messages to service B. Each message is a simple key/value pair: the key is an integer and the value is the current local date and time.
Service B uses the following logic to decide whether to process a request: for each key it examines the last incoming request, and if the difference between that timestamp and the system time is more than 10 minutes, it starts processing the request.
Now that we are bringing RabbitMQ into our solution, we need to revise this communication model as well. I was thinking of using a delayed message exchange for the 10-minute window, and then rewriting and resetting the timer (re-scheduling for another 10 minutes) for duplicate messages coming in from service A.
Could you share your thoughts on this proposed solution?

Well, after reading the documentation I'm certain that such logic should be implemented in the application layer (in my case, the consumer software).
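For reference, the exchange the question proposes comes from the rabbitmq_delayed_message_exchange plugin. A minimal pika sketch, assuming the plugin is enabled and using made-up exchange/queue names; as noted above, the reschedule-on-duplicate logic would still live in the application code:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # The plugin adds the 'x-delayed-message' exchange type; 'x-delayed-type'
    # says how to route once the per-message delay has elapsed.
    channel.exchange_declare(exchange='service-b-delayed',
                             exchange_type='x-delayed-message',
                             arguments={'x-delayed-type': 'direct'})
    channel.queue_declare(queue='service-b')
    channel.queue_bind(queue='service-b', exchange='service-b-delayed',
                       routing_key='service-b')

    # Publish one key/value message with a 10 minute delay (in milliseconds).
    channel.basic_publish(exchange='service-b-delayed',
                          routing_key='service-b',
                          body='42=2024-05-01T10:00:00',
                          properties=pika.BasicProperties(headers={'x-delay': 600000}))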

Related

RabbitMQ support for LIFO or time based priority queue

Is there any way to make a RabbitMQ queue behave as a Stack, i.e. the client gets the last message that was posted in the queue (LIFO) rather than the first one? Or maybe alternatively make it a priority queue using a timestamp which the client could set?
RabbitMQ does support priority queues but the priority it allows is just a number up to 255 (recommended to use up to 10).
What I want to achieve is that the latest messages are processed first because they contain the latest information about the source. I still want to process the old messages, but in situations when the client cannot keep up (or there was some downtime and the client is recovering) I want to process the latest state information first.
The only solution I came up with so far is to use a TTL on the messages of the main queue and have them go to a dead letter queue when they expire, which is also processed by the client. However, this is not so clean, and if the source of the message takes longer than the TTL to send a new status update, the latest state will be stuck in the queue behind older expired messages that still have to be processed.
If it is not possible to achieve with RabbitMQ, is there any other recommended messaging framework that supports this requirement?
Kafka Log Compaction was created for exactly the use case you describe:
Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition. It addresses use cases and scenarios such as restoring state after application crashes or system failure, or reloading caches after application restarts during operational maintenance. Let's dive into these use cases in more detail and then describe how compaction works.
So, RabbitMQ is a queue, not a stack. It is specifically designed NOT to do what you are asking (a queue is always a first-in, first-out data structure).
However, there are options:
Presumably some process (e.g. a web service) exists between the client and the message server. This process could save the data off to an additional storage location (e.g. memcached) for immediate access of the latest value, thus leaving the queue untouched.
You could configure a secondary queue/service combination. When messages are published, they can then be routed to both queues. The first queue is for your heavy processing, and the second queue would be a service whose only task is to update the latest value in memcached or some other fast storage/retrieval system. Thus, message lifetime in this queue would presumably be much shorter (see the sketch after this list).
You could implement multiple processing steps. The first step would be to update the current state (presumably a quick operation), after which the message is then re-published to the longer processing step's queue.
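The second option above is easy to wire up with a fanout exchange bound to two queues. A minimal pika sketch, with made-up exchange and queue names:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # One fanout exchange feeding two queues: one for the heavy processing,
    # one drained by a small consumer whose only job is updating the latest state.
    channel.exchange_declare(exchange='status-updates', exchange_type='fanout')
    channel.queue_declare(queue='heavy-processing')
    channel.queue_declare(queue='latest-state')
    channel.queue_bind(queue='heavy-processing', exchange='status-updates')
    channel.queue_bind(queue='latest-state', exchange='status-updates')

    # Producers publish once; both queues receive a copy of the message.
    channel.basic_publish(exchange='status-updates', routing_key='',
                          body='source-17:state-snapshot')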

Is it a good practice to create a channel for each user in a Redis message bus

We are using a Redis message bus and handling messages through a channel. But if our application is deployed across multiple instances, the request and response are passed to all the instances. To avoid this scenario, which of the approaches below is better?
Create a channel for each instance of the application
Create a channel for each user
Any suggestions will be highly appreciated
The limiting factor here is the number of subscribers to the same channel; the number of channels themselves can be large. So you can choose the granularity accordingly (see the sketch after the quote below). Read more here:
https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
All the complexity on the end is on the PUBLISH command, that performs an amount of work that is proportional to:
a) The number of clients receiving the message.
b) The number of clients subscribed to a pattern, even if they'll not match the message.
This means that if you have N clients subscribed to 100000 different channels, everything will be super fast.
If you have instead 10000 clients subscribed to the same channel, PUBLISH commands against this channel will be slow, and take maybe a few milliseconds (not sure about the actual time taken), since we have to send the same message to everybody.
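A minimal redis-py sketch of the channel-per-instance option, assuming each deployed instance knows its own id (the channel naming scheme here is made up):

    import redis

    INSTANCE_ID = 'instance-1'  # hypothetical; unique per deployed instance

    r = redis.Redis(host='localhost', port=6379)

    # Each instance subscribes only to its own channel, so a response published
    # to 'bus:instance-1' reaches exactly one instance instead of all of them.
    pubsub = r.pubsub()
    pubsub.subscribe('bus:' + INSTANCE_ID)

    # The responding side targets the instance that issued the request.
    r.publish('bus:instance-1', 'response payload')

    for message in pubsub.listen():
        if message['type'] == 'message':
            print(message['data'])
            break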
Similar question asked before: How does Redis PubSub subscribe mechanism works?

To be sure about concurrency: same group of work items in multiple queues (FIFO)

I have a question about multi-consumer concurrency.
I want to send work items that come from web requests to distributed queues in RabbitMQ.
I just want to be sure about the order of the work items across multiple queues (FIFO).
Because these requests come from different users, each user's requests/work items must be ordered.
I have found this feature under different names on Azure Service Bus and as message grouping on ActiveMQ.
Is there any way to do this in plain RabbitMQ?
I want to guarantee that each customer's requests are processed in order relative to each other.
Each customer may have multiple requests, but the requests for that customer must be processed in order.
I want to process incoming requests quickly by using multiple consumers on different nodes.
For example, different customers (1 to 1000) send over 1 million requests.
If I put this huge number of requests into only one queue, it takes a lot of time to consume, so I want to share the processing load between n (e.g. 5) nodes. But customer X's requests must stay in the same sequence for processing.
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so that the message handlers are idempotent; they should tolerate getting the same message twice (or more).
There are way too many things that may (and actually should be allowed to) interfere with the order:
The producers may deliver the messages at a slightly different pace
One producer might miss an ack (due to a lost packet) and will resend the message
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that service-bus systems like NServiceBus use to enforce the order in which messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are OK to handle it, you do so and update the storage (conditionally!).
If not, you nack the message so that it is thrown onto the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other messages in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and you get a message out of band, you will throw it on the retry queue (the dead-letter queue) and Rabbit will make sure it gets back into the original queue to be retried at a later stage, but only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out-of-order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exactly the same way: by rejecting the message and letting your infrastructure (Rabbit) take care of retrying it after a while (see the sketch below).
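A minimal pika sketch of that retry topology and the conditional handling. The queue names are made up, and check_and_update_progress stands in for the conditional update against your centralized storage:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # Rejected messages on 'work' dead-letter to 'work-retry'; after a 30 second
    # TTL they expire there and dead-letter back to 'work' to be retried.
    channel.queue_declare(queue='work', arguments={
        'x-dead-letter-exchange': '',
        'x-dead-letter-routing-key': 'work-retry'})
    channel.queue_declare(queue='work-retry', arguments={
        'x-message-ttl': 30000,
        'x-dead-letter-exchange': '',
        'x-dead-letter-routing-key': 'work'})

    def check_and_update_progress(customer_id, sequence):
        # Hypothetical helper: conditionally advance the customer's "last processed
        # sequence" in the central store; return False if earlier steps are missing.
        return True

    def handle(ch, method, properties, body):
        customer_id, sequence, payload = body.decode().split(':', 2)
        if check_and_update_progress(customer_id, int(sequence)):
            # ... process payload ...
            ch.basic_ack(delivery_tag=method.delivery_tag)
        else:
            # Out of band: reject without requeue so the message dead-letters
            # to the retry queue and comes back after the TTL.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue='work', on_message_callback=handle, auto_ack=False)
    channel.start_consuming()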
(Assuming the OP is asking about things like ActiveMQ's "message grouping")
This isn't currently built into RabbitMQ AFAIK (it wasn't as of 2013, as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can easily be created dynamically (this can be done in other messaging systems too, but, for example, if you read the ActiveMQ or Red Hat AMQ documentation you'll find that all of the examples in the user guides use pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: the first time a customer order (or a new group of customer orders) is seen, a queue with a unique name would be created for all messages in that group. That queue name would be communicated (via another queue) to a consumer whose sole purpose is to load-balance among the other consumers responsible for handling customer order groups. That is, the load-balancer would pull a message off its queue saying "new group with queue name XYZ", find a consumer in the pool of order-group consumers that can take the load, and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It uses a consistent hashing algorithm to distribute messages across queues, so messages with the same routing key consistently end up in the same queue.
For more information about consistent hashing:
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin on the RabbitMQ plugins page:
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html
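A minimal pika sketch of how the consistent hash exchange is typically used, assuming the plugin is enabled and using made-up queue names. With this exchange type the binding key acts as a weight, and the routing key of each message (here the customer id) determines which queue it hashes to, so a given customer's messages always land in the same queue:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    channel.exchange_declare(exchange='orders', exchange_type='x-consistent-hash')

    # Bind one queue per processing node; the binding key '1' is a weight, so the
    # hash space is split roughly evenly between the queues.
    for node_queue in ('orders-node-1', 'orders-node-2', 'orders-node-3'):
        channel.queue_declare(queue=node_queue)
        channel.queue_bind(queue=node_queue, exchange='orders', routing_key='1')

    # Every message for customer 42 hashes to the same queue, preserving
    # per-customer order while spreading customers across the nodes.
    channel.basic_publish(exchange='orders', routing_key='42',
                          body='order payload for customer 42')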

Setting a long timeout for RabbitMQ ack message

I was wondering if this is possible. I want to pull a task from a queue, and the work could take anywhere from 3 seconds to possibly minutes before an ack is sent back to RabbitMQ notifying it that the work has been completed. The work is done by a user, which is why the time it takes to process the job varies.
I don't want to ack the message immediately after I pop it off the queue, because I want the message to be requeued if no ack is received. Can anyone give me any insight into how to solve my problem?
Having a long timeout should be fine, and certainly as you say you want redelivery if something goes wrong, so you want to only ack after you finish.
The best way to achieve that, IMO, would be to have multiple consumers on the queue (i.e. multiple threads/processes consuming from the same queue). That should be fine as long as there's no particular ordering constraint on your queue contents (i.e. the way there might be if the queue were to contain contents representing Postgres data that involves FK constraints).
This tutorial on the RabbitMQ website provides more info (Python linked, but there should be similar tutorials for other languages): https://www.rabbitmq.com/tutorials/tutorial-two-python.html
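A minimal pika sketch of a consumer that acks only after the (possibly long-running) work is done; the queue name and the work function are made up:

    import pika

    def do_long_running_work(body):
        # Placeholder for the user-driven work described in the question;
        # this may take anywhere from seconds to minutes.
        pass

    def on_message(ch, method, properties, body):
        do_long_running_work(body)
        # Ack only once the work is finished; if this consumer dies before acking,
        # RabbitMQ redelivers the message to another consumer.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='tasks')
    channel.basic_qos(prefetch_count=1)  # one unacked message per consumer at a time
    channel.basic_consume(queue='tasks', on_message_callback=on_message, auto_ack=False)
    channel.start_consuming()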
Edit in response to comment from OP:
What's your heartbeat set to? If your worker doesn't acknowledge the heartbeat within the set period of time, the server will consider the connection to be dead.
Not sure which language you're using, but for Java you would use the setRequestedHeartbeat method to specify the heartbeat.
However you implement your workers, it's vital that the heartbeat can still be sent to the RabbitMQ server. If something blocks the client from sending the heartbeat, the server will kill the connection after the time interval expires.
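In the Java client that is done via setRequestedHeartbeat; with pika in Python the equivalent is roughly as follows (the 600-second value is just an example, not a recommendation):

    import pika

    # Ask the broker for a 600 second heartbeat interval on this connection.
    params = pika.ConnectionParameters(host='localhost', heartbeat=600)
    connection = pika.BlockingConnection(params)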

REST, WCF and Queues

I created a RESTful service using WCF which calculates some value and then returns a response to the client.
I am expecting a lot of traffic, so I am not sure whether I need to implement queues manually or whether that is unnecessary in order to process all client requests.
Actually, I am receiving measurements from clients which have to be stored in the database - each client sends a measurement every 200 ms, so with multiple clients there could be a lot of requests.
There are also other operations performed on the received data. For example, a client could send an instruction "give me the average of the last 200 measurements", so it could take some time to calculate this value, and in the meantime the same request could come from another client.
I would be very thankful if anyone could give any advice on how to create a reliable service using WCF.
Thanks!
You could use the MsmqBinding and utilize the method implemented by eedsi9n. However, from what I'm gathering from this post is that you're looking for something along the lines of a pub/sub type of architecture.
This can be implemented with the WSDualHttpBinding, which allows subscribers to subscribe to events. The publisher will then notify the subscriber when the action is completed.
Therefore you could have MSMQ running behind the scenes. The client subscribes to certain events, then perhaps publishes a message that needs to be processed. The client sits there and does other work (because it's all async), and when the publisher is done working on the message it can publish an event (the event your client subscribed to) letting you know that it's done. That way you don't have to implement a polling strategy.
There are pre-canned solutions for this as well, such as NServiceBus, MassTransit, and Rhino Service Bus.
If you are using a web service, the Transmission Control Protocol (TCP/IP) will act as the queue to a certain degree.
TCP provides reliable, ordered delivery of a stream of bytes from one program on one computer to another program on another computer.
This guarantees that if the client sends packets A, B, then C, the server will receive them in that order: A, B, then C. If you must reply back to the client in the same order as the requests, then you might need a queue.
By default, the maximum number of ASP.NET worker threads is 12 per CPU core, so on a dual-core machine you can run 24 connections at a time. Depending on how long the calculation takes and what you mean by "a lot of traffic", you could try different strategies.
The simplest one is to use serviceTimeouts and serviceThrottling, handle only what you can handle, and reject the requests you can't.
If that's not an option, increase hardware. That's the second option.
Finally, you could make the service completely asynchronous. Implement two methods,
string PostCalc(...) and double GetCalc(string id). PostCalc accepts the parameters, stuffs them into a queue (or a database) and returns a GUID immediately (I like using string instead of Guid). The client can use the returned GUID as a claim ticket and call GetCalc(string id) every few seconds; if the calculation has not finished yet, you can return 404 for REST. The calculation must now be done by a separate process that monitors the queue (sketched below).
This third option is the most complicated, but the outcome is similar to that of the first option of putting a cap on incoming requests.
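A rough, language-agnostic sketch of that claim-ticket pattern (shown here in Python rather than WCF, with made-up names): post_calc enqueues the work and returns a ticket, a separate worker drains the queue, and get_calc polls for the result:

    import queue
    import threading
    import uuid

    jobs = queue.Queue()
    results = {}  # claim ticket -> computed value

    def post_calc(measurements):
        # Accept the parameters, enqueue the work, and return a claim ticket.
        ticket = str(uuid.uuid4())
        jobs.put((ticket, measurements))
        return ticket

    def get_calc(ticket):
        # Returns None (a 404 in the REST service) until the worker has finished.
        return results.get(ticket)

    def worker():
        # The separate process/thread that monitors the queue and does the work.
        while True:
            ticket, measurements = jobs.get()
            results[ticket] = sum(measurements) / len(measurements)
            jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()

    # Usage: the client posts, keeps the ticket, and polls until a value appears.
    ticket = post_calc([1.0, 2.0, 3.0])
    jobs.join()
    print(get_calc(ticket))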
It will depend on what you mean by "calculates some value" and "a lot of traffic". You could do some load testing and see how the number of requests per second evolves with the traffic.
There's nothing WCF-specific here if you are being RESTful:
The GET for an average could return a URI where the answer will be available once the server finishes calculating (if it is indeed a long operation).
Regarding getting the measurements - you didn't specify the freshness needed (i.e. when you get a request for an average, how fresh do the results need to be?), nor the relative frequency of queries vs. new measurements.
In any event, you can (and IMHO should) put a queue behind the endpoint (assuming measuring your performance proves you need it). If you change the WCF binding you might still be RESTful, but you will not benefit from the standards-based approach of REST over HTTP.