Redis/hazelcast - parallel consuming without locks

Redis/hazelcast - parallel consuming without locks - redis

The question is asked for Redis, but can also be asked for hazelcast.
Is there a way for multiple instances to work on a Redis data-structure without locks?
Example:
Assume you have a queue stored on Redis.
When an applications gets an API request, it takes elements from that queue, does some processing and then returns the elements as the API response.
Is there a way to it in a way that satisfies the following requirements:
Elements are not lost. Meaning that in case where elements were taken from the queue and the application restarts, before done processing them, the elements will return to the queue.
Elements are read by a single consumer. Meaning, if a message is currently processed by a consumer then no other consumer will pick it up.
Seems that working with transactions satisfies 1 but fails 2, dequeuing messages satisfies 2 but fails 1.
Is there a way to satisfy both?

For Hazelcast, try TransactionalQueue
If you do IQueue.take() or TransactionalQueue.take() then only one consumer receives the message.
For the transactional queue, the item taken is at the front of the queue if the transaction goes to rollback.
Note here transaction isolation impacts on sequentiality semantics.
A queue may contain "A", "B" and "C".
A transaction take would get a hold on "A", as you'd expect.
While the transaction is running the next take gets "B".
If the transaction aborts, the next take gets "A".

Related

ActiveMQ: How do I limit the number of messages being dispatched?

Let's say I have one ActiveMQ Broker and an undefined numbers of consumers.
Problem:
To process a message, consumers need an external service which is either "DATA1" or "DATA2" (specified in the message)
Each server, "DATA1" and "DATA2", can only handle 20 connections
So at most 20 "DATA1" and 20 "DATA2" messages must be dispatched at any time
Because of priorization, the messages must be enqueued in the same queue
Even if message A has a higher prio than message B, if A can't be processed because the external service has no free slots, message B needs to be processed instead
How can this be solved? As long as I was using message pulling (prefetch of 0), I was able to do this by using a BrokerPlugin that, on messagePull, achieved this by using semaphores and selectors. If the limits were reached, the pull returned null.
However, due to performance issues I had to set prefetch to 1 and use push instead. Therefore, my messagePull hack no longer works (it's never called).
So far I'm considering implementing a custom Cursor but I was wondering if someone knows a better solution.
Update the custom cursor worked but broke features like message removal. I tried with a custom Queue and QueueDispatchSelector (which is a pain to configure since there isn't a proper API to do so) and it mostly works but I still have synchronisation issues.
Also, a very suitable API seems to be DispatchPolicy, however, while it is referenced by Queue, it's never used.

Queues give you buffering for system processing time for free. Messages are delivered on demand. With prefetch=0 or prefetch=1, should effectively get you there. Messages will only be delivered to a consumer when the consumer is ready (ie.. during the consumer.receive() method).
consumer.receive() is a blocking call, so you should not need any custom plugin or other to delay delivery until the consumer process (and its required downstream services) are ready to handle it.
The behavior should work out-of-the-box, or there are some details to your use case that are not provided to shed more light on the scenario.

RabbitMQ support for LIFO or time based priority queue

Is there any way to make a RabbitMQ queue behave as a Stack, i.e. the client gets the last message that was posted in the queue (LIFO) rather than the first one? Or maybe alternatively make it a priority queue using a timestamp which the client could set?
RabbitMQ does support priority queues but the priority it allows is just a number up to 255 (recommended to use up to 10).
What I want to achieve is that the latest messages are processed first because they contain the latest information about the source. I still want to process the old messages, but in situations when the client cannot keep up (or there was some downtime and the client is recovering) I want to process the latest state information first.
The only solution I came up with so far is to use a TTL on the messages of the main queue and have them go to a dead letter queue when they expire, which is also processed by the client. However this is not so clean, and if the source of the message takes longer than the TTL to send a new status update, the latest state will be stuck in queue behind the other older expired messages still to be processed.
If it is not possible to achieve with RabbitMQ, is there any other recommended messaging framework that supports this requirement?

Kafka Log Compaction was created for exactly the use case you describe:
Log compaction ensures that Kafka will always retain at least the last
known value for each message key within the log of data for a single
topic partition. It addresses use cases and scenarios such as
restoring state after application crashes or system failure, or
reloading caches after application restarts during operational
maintenance. Let's dive into these use cases in more detail and then
describe how compaction works.

So, RabbitMQ is a queue, not a stack. It is specifically designed NOT to do what you are asking (a queue is always a first-in, first-out data structure).
However, there are options:
Presumably some process (e.g. a web service) exists between the client and the message server. This process could save the data off to an additional storage location (e.g. memcached) for immediate access of the latest value, thus leaving the queue untouched.
You could configure a secondary queue/service combination. When messages are published, they can then be routed to both queues. The first queue is for your heavy processing, and the second queue would be a service whose only task is to update the latest value in memcached or some other fast storage/retrieval system. Thus, message lifetime in this queue would presumably be much shorter.
You could implement multiple processing steps. The first step would be to update the current state (presumably a quick operation), after which the message is then re-published to the longer processing step's queue.

AMQP/RabbitMQ - Process messages sequentially

I have one direct exchange. There is also one queue, bound to this exchange.
I have two consumers for that queue. The consumers are manually ack'ing the messages once they've done the corresponding processing.
The messages are logically ordered/sorted, and should be processed in that order. Is it possible to enforce that all messages are received and processed sequentially accross consumer A and consumer B? In other words, prevent A and B from processing messages at the same time.
Note: the consumers are not sharing the same connection and/or channel. This means I cannot use <channel>.basicQoS(1);.
Rationale of this question: both consumers are identicall. If one goes down, the other queue starts processing messages and everything keeps working without any required intervention.

One approach to handling failover in a case where you want redundant consumers but need to process messages in a specific order is to use the exclusive consumer option when setting up the bind to the queue, and to have two consumers who keep trying to bind even when they can't get the exclusive lock.
The process is something like this:
Consumer A starts first and binds to the queue as an exclusive consumer. Consumer A begins processing messages from the queue.
Consumer B starts next and attempts to bind to the queue as an exclusive consumer, but is rejected because the queue already has an exclusive consumer.
On a recurring basis, consumer B attempts to get an exclusive bind on the queue but is rejected.
Process hosting consumer A crashes.
Consumer B attempts to bind to the queue as an exclusive consumer, and succeeds this time. Consumer B starts processing messages from the queue.
Consumer A is brought back online, and attempts an exclusive bind, but is rejected now.
Consumer B continues to process messages in FIFO order.
While this approach doesn't provide load sharing, it does provide redundancy.

Even though this is already answered. May be this can help others.
RabbitMQ has a feature known as Single Active Consumer, which matches your case.
We can have N consumers attached to a Queue but only 1 (one) of them will be actively consuming messages from the Queue. Fail-over happens only when active consumer fails.
Kindly take a look at the link https://www.rabbitmq.com/consumers.html#single-active-consumer
Thank you

Usually the point of a MQ system is to distribute workload. Of course, there are some situations where processing of message N depends on result of processing the message N-1, or even the N-1 message itself.
If A and B can't process messages at the same time, then why not just have A or just B? As I see it, you are not saving anything with having 2 consumers in a way that one can work only when the other one is not...
In your case, it would be best to have one consumer but to actually do the parallelisation (not a word really) on the processing part.
Just to add that RMQ is distributing messages evenly to all consumers (in round-robin fashion) regardless on any criteria. Of course this is when prefetch is set to 1, which by default it is. More info on that here, look for "fair dispatch".

Queue Fairness and Messaging Servers

I'm looking to solve a problem that I have with the FIFO nature of messaging severs and queues. In some cases, I'd like to distribute the messages in a queue to the pool of consumers on a criteria other than the message order it was delivered in. Ideally, this would prevent users from hogging shared resources in the system. Take this overly simplified scenario:
There is a feature within an application where a user can empty their trash can.
This event dispatches a DELETE message for each item in trash can
The consumers for this queue invoke a web service that has a rate limited API.
Given that each user can have very large volumes of messages in their trash can, what options do we have to allow concurrent processing of each trash can without regard to the enqueue time? It seems to me that there are a few obvious solutions:
Create a separate queue and pool of consumers for each user
Randomize the message delivery from a single queue to a single pool of consumers
In our case, creating a separate queue and managing the consumers for each user really isn't practical. It can be done but I think I really prefer the second option if it's reasonable. We're using RabbitMQ but not necessarily tied to it if there is a technology more suited to this task.
I'm entertaining the idea of using Rabbit's message priorities to help randomize delivery. By randomly assigning a message a priority between 1 and 10, this should help distribute the messages. The problem with this method is that the messages with the lowest priority may be stuck in the queue forever if the queue is never completely emptied. I thought I could use a TTL on the message and then re-queue the message with an escalated priority but I noticed this in the docs:
Messages which should expire will still only expire from the head of
the queue. This means that unlike with normal queues, even per-queue
TTL can lead to expired lower-priority messages getting stuck behind
non-expired higher priority ones. These messages will never be
delivered, but they will appear in queue statistics.
I fear that I may heading down the rabbit hole with this approach. I wonder how others are solving this problem. Any feedback on creative routing, messaging patterns, or any alternative solutions would be appreaciated.

So I ended up taking a page out of the network router handbook. This a problem they routers need to solve to allow fair traffic patterns. This video has a good breakdown of the problem and the solution.
The translation of the problem into my domain:
And the solution:
The load balancer is a wrapper around a channel and a known number of queues that uses a weighted algorithm to balance between messages received on each queue. We found a really interesting article/implementation that seems to be working well so far.
With this solution, I can also prioritize workspaces after messages have been published to increase their throughput. That's a really nice feature.
The biggest challenge ahead of me is management of the queues. There will be too many queues to leave bound to the exchange for an extended period of time. I'm working on some tools to manage their lifecycle.

One solution could be to interpose a Resequencer. The principle is outlined in the diag in that link. In your case, something like:
The app dispatches its DELETE messages into the delete queue as originally.
The Resequencer (a new component you write) is interposed between the original publishers and original consumers. It:
pulls messages off the DELETE queue into memory
places them into (in-memory) queues-by-user
republishes them to a new queue (eg FairPriorityDeleteQueue), round-robinning to interleave fairly any messages from different original users
limits its republish rate into FairPriorityDeleteQueue, either such that the length of FairPriorityDeleteQueue (obtainable via polling the rabbitmq management api periodically) never exceeds some integer you choose N, or limited to some rate related to the rate-limited delete API the consumers use.
doesn't ack any message it pulled off the original DELETE queue, until it's republished it to FairPriorityDeleteQueue (so you never lose a message)
The original consumers subscribe instead to FairPriorityDeleteQueue.
You set the preFetchCount on these consumers fairly low (<10), to prevent them in turn bulk-buffering the contents of FairPriorityDeleteQueue in memory.
--
Some points to watch:
Rate- or length-limiting publishing into and/or drawing messages out of FairPriorityDeleteQueue is essential. If you don't limit, Resequencer may just hand messages on as fast as it receives them, limiting the potential for resequencing.
Resequencer of course acts as a kind of in-memory buffer while resequencing. If the original publishers can publish very large numbers of messages in to the queue suddenly, you may need to memory-limit the Resequencer process so that it doesn't ingest more than it can hold.
Your particular scenario is greatly helped by the fact that you have an external factor (the final delete API) limiting throughput. Without such an extrinsic limiting factor, it is much harder to choose the optimum parameters for such a resequencer, to balance throughput-versus-resequencing in a particular environment.

I don't think a resequencer is needed in this case. Maybe it is, if you need to ensure the items are deleted in a specific order. But that only comes into play when you send multiple messages at roughly the same time and need to guarantee order on the consumer end.
You should also avoid the timeout scenario, for the reasons you've mentioned. timeout is meant to tell RabbitMQ that a message doesn't need to be processed - or that it needs to be routed to a dead letter queue so that i can be processed by some other code. while you might be able to make timeout work, i don't think it's a good choice.
Priorities may solve part of the problem, but could introduce a scenario where files never get processed. if you have a priority 1 message sitting back in the queue somewhere, and you keep putting priority 2, 3, 5, 10, etc. into the queue, the 1 might not be processed. the timeout doesn't solve this, as you've noted.
For my money, I would suggest a different approach: sending delete requests serially, for a single file.
that is, send 1 message to delete 1 file. wait for a response to say it's done. then send the next message to delete the next file.
here's why i think that will work, and how to manage it:
Long-Running Workflow, Single File Delete Requests
In this scenario, I would suggest taking a multi-step approach to the problem using the idea of a "saga" (aka a long-running workflow object).
when a user requests to delete their trashcan, you send a single message through rabbitmq to the service that can handle the delete process. that service creates an instance of the saga for that user's trashcan.
the saga gathers a list of all files in the trashcan that need to be deleted. then it starts to send the requests to delete the individual files, one at a time.
with each request to delete a single file, the saga waits for the response to say the file was deleted.
when the saga receives the message to say the previous file has been deleted, it sends out the next request to delete the next file.
once all the files are deleted, the saga updates itself and any other part of the system to say the trash can is empty.
Handling Multiple Users
When you have a single user requesting a delete, things will happen fairly quickly for them. they will get their trash emptied soon.
u1 = User 1 Trashcan Delete Request
|u1|u1|u1|u1|u1|u1|u1|u1|u1|u1done|
when you have multiple users requesting a delete, the process of sending one file delete request at a time means each user will have an equal chance of getting the next file delete.
u1 = User 1 Trashcan Delete Request
u2 = User 2 Trashcan Delete Request
|u1|u2|u1|u1|u2|u2|u1|u2|u1|u2|u2|u1|u1|u1|u2|u2|u1|u2|u1|u1done|u2|u2done|
This way, there will be shared use of the resources to delete the files. Over-all, it will take a little longer for each person's trashcan to be emptied, but they will see progress sooner and that's an important aspect of people thinking the system is fast / responsive to their request.
Optimizing Small File Set vs Large File Set
In a scenario where you have a small number of users with a small number of files, the above solution may prove to be slower than if you deleted all the files at once. after all, there will be more messages sent across rabbitmq - at least 2 for every file that needs to be deleted (one delete request, one delete confirmation response)
To optimize this further, you could do a couple of things:
have a minimum trashcan size before you split up the work like this. below that minimum, you just delete it all at once
chunk the work into groups of files, instead of one at a time. maybe 10 or 100 files would be a better group size, than 1 file at a time
Either (or both) of these solutions would help to improve the over-all performance of the process by reducing the number of messages being sent, and batching the work a bit.
You would need to do some testing in your real scenario to see which of these (or maybe both) would help and at what settings.
Many Users Problem
There's one additional problem you may face - many users. If you have 2 or 3 users requesting deletes, it won't be a big deal.
But if you have 100 or 1000 users requesting deletes, it could take a very long time for an individual to get their trashcan emptied.
You may need to have a higher level controlling process for this situation, where all requests to empty trashcans would be managed by yet another Saga. This saga would rate-limit the number of active trashcan-deletion sagas.
For example, if you have 10 active requests for deleting trashcans, the rate-limiting saga would only start 3 of them and it would wait for one to finish before starting the next one.
Again, you would need to test your actual scenario to see if this is needed and see what the limits should be, for performance reasons.
There may be additional scenarios that have to be considered in your actual scenario, but I hope this gets you down the path! :)

How to solve message disorder in RabbitMq? [duplicate]

I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis comment: He writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in rabbitmq docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue
parameter (basic.recover, basic.reject and basic.nack), or due to a channel
closing while holding unacknowledged messages...With release 2.7.0 and later
it is still possible for individual consumers to observe messages out of
order if the queue has multiple subscribers. This is due to the actions of
other subscribers who may requeue messages. From the perspective of the queue
the messages are always held in the publication order."
If I need to handle messages by their order, I can only use rabbitMQ with an exclusive queue to each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?

Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or
due to a channel closing while holding unacknowledged messages. Any of
these scenarios caused messages to be requeued at the back of the
queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release
2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.

Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.

I think there are two things in this question which are not similar, consumption order and processing order.
Message Queues can -to a degree- give you a guarantee that messages will get consumed in order, they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned a consumer can fail while processing, here the message's consumption order was correct, however, the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than the other. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which by the way violates the consumption order) you still cannot guarantee they will also be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad but in many cases and business requirements it is completely valid and sometimes even mission critical.

There are proper ways to guarantuee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the message using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set a Executors.newSingleThreadExecutor().
If you use one Consumer with a single queue, you can bind this queue using multiple bindingKeys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
Channel channel2 = connection.createChannel()) {
// publish messages alternating to two different topics
for (int i = 0; i < messageCount; i++) {
final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
}
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas