Huge number of consumers per ActiveMQ session

I am creating one session per connection per thread to an ActiveMQ cluster, but I want to consume from hundreds of destinations. I understand that with only one thread (one session) I can't consume messages from these destinations concurrently; I don't want to do that either. What I want is to have hundreds of consumers per session, each associated with one of hundreds of different destinations. Is this a viable approach? Please also explain why it is or isn't viable.
PS: I don't do any heavy processing on the messages, which is why I want only one thread.

A session is not bound to a single thread; threading is a separate concern. You can use a session from multiple threads (not recommended) or multiple sessions in a single thread. The session construct exists mainly to control transactions, i.e. committing and rolling back messages in a transaction.
Anyway, you can use a single consumer to read from multiple destinations. Simply put the destinations in a comma-separated list (a composite destination), like: "my.first.queue,my.other.queue,my.last.queue". You can also consume using wildcards: "my.>" would match all of the queues above.
This way, you can use a single thread and a single session to read from a large number of queues.
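As a minimal sketch of that setup, assuming an ActiveMQ broker at tcp://localhost:61616 and the example queue names from above:

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class MultiQueueConsumer {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        // One non-transacted session, used from this thread only.
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Composite destination: one consumer receives from all listed queues.
        Destination destinations =
                session.createQueue("my.first.queue,my.other.queue,my.last.queue");
        // Alternatively, a wildcard covers every queue under the "my." prefix:
        // Destination destinations = session.createQueue("my.>");

        MessageConsumer consumer = session.createConsumer(destinations);
        while (true) {
            Message message = consumer.receive();  // blocks on the single thread
            if (message instanceof TextMessage) {
                System.out.println(((TextMessage) message).getText());
            }
        }
    }
}
```

Hundreds of separate createConsumer calls on the same session would also work; the composite or wildcard destination just keeps it to a single consumer object. Either way, delivery is serialized on that one session.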

Related

Coordinate scheduled jobs between multiple producers

I have a distributed system of producers and consumers across several servers, with redundant nodes—both for failover and load-balancing. The nodes communicate via RabbitMQ messages.
Each producer runs its own scheduler to invoke jobs, which one of the consumers should run. This works by publishing an appropriate RabbitMQ message, which one of the consumers then processes.
Now, the tricky part is, each job should be run only once. In short, my requirements are:
Only one invoke message per scheduled job should be processed (by any of the consumer instances)
If any of the producers goes down, the job should still be invoked by the other instances
I can't figure out how to implement this without relying on anything but RabbitMQ. I could make it work if there were such a thing as an "exclusive exchange", which only one producer can connect to at a time. I thought about making the consumers ignore duplicate invokes for the same job, but that won't work: because of the load balancing, subsequent messages may be received by any of the other instances. Another idea was to declare one of the producers the "principal" node, so that only it is allowed to send invokes, but that presents the same problem of coordinating between instances.
Any ideas? Thanks in advance.

Multiple service instances using Hangfire (shared tasks/objects), is it possible?

I need to run multiple instances of the same service, with the same database, for redundancy reasons.
I found some questions about "Hangfire multiple instances", but for a different purpose than mine: usually about running multiple instances for different tasks on the same database, or similar to this.
I need to know whether there are concurrency problems when two or more instances of Hangfire use the same database (we want to use MongoDB), and whether this is the right way to make the service resilient.
The goal is to have one instance take over all the jobs when another instance goes down.
Any suggestions for covering this scenario are welcome.
In our environment, we have a replica set used by about 10 Hangfire servers. If multiple Hangfire servers service the same queue, they share the load: whichever server checks the queue first picks up the job and continues. If you remove all but one server, the jobs will still be processed (as long as there are enough workers; otherwise they will remain queued until a worker is available).
To answer your question: yes, you can have two or more Hangfire servers using the same MongoDB. MongoDB handles concurrent access safely, so it's fine to have several servers sharing the same database backend. If you have two servers, both will be active, and if one instance goes offline, the other (depending on its queues) will continue to process the jobs in the queue.
Keep in mind that Hangfire servers process jobs from specific queues. If both servers listen to the same queue, you are load-balancing the jobs between the two servers. If they listen to different queues, you get the scenario you read about, where each Hangfire instance processes different jobs (because they are part of different queues).
Read about configuring Job Queues here

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queues are currently handling which users.
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
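A hedged sketch of the first option, using the RabbitMQ Java client with a Redis lock via Jedis; the queue name, the userId header, and the 60-second lock TTL are assumptions for illustration:

```java
import com.rabbitmq.client.*;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class LockingConsumer {
    public static void main(String[] args) throws Exception {
        Jedis redis = new Jedis("localhost", 6379);
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();
        channel.basicQos(1);  // one unacked message at a time

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            // Assumes the publisher sets a "userId" header on each message.
            String userId = delivery.getProperties().getHeaders().get("userId").toString();
            long deliveryTag = delivery.getEnvelope().getDeliveryTag();

            // Per-user lock: NX = set only if absent, EX = expire after 60s.
            String locked = redis.set("lock:user:" + userId, "me",
                                      SetParams.setParams().nx().ex(60));
            if (locked == null) {
                // Another consumer is working on this user: put the message back.
                channel.basicReject(deliveryTag, true);
                return;
            }
            try {
                processFile(delivery.getBody());  // the actual work
                channel.basicAck(deliveryTag, false);
            } finally {
                redis.del("lock:user:" + userId);
            }
        };
        channel.basicConsume("file_queue_main", false, onDeliver, tag -> {});
    }

    static void processFile(byte[] body) { /* ... */ }
}
```

Note that an immediate requeue can spin if one user's files dominate the queue; in practice you might add a small delay or a retry queue.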
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to choose the processing queue. That way the same user/file task is always queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
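A minimal sketch of that hashing idea, assuming ten processing queues named file_processing_0 through file_processing_9:

```java
import com.rabbitmq.client.Channel;
import java.nio.charset.StandardCharsets;

public class HashRouter {
    private static final int NUM_QUEUES = 10;

    // The same user ID always maps to the same queue, so that user's files
    // are processed sequentially by whichever consumer owns that queue.
    static String queueFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), NUM_QUEUES);
        return "file_processing_" + index;
    }

    static void publish(Channel channel, String userId, String payload) throws Exception {
        channel.basicPublish("", queueFor(userId), null,
                             payload.getBytes(StandardCharsets.UTF_8));
    }
}
```

As the answer above notes, this gives a stable mapping but no balancing guarantee: a few heavy users can back up one queue while others sit idle.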

Question about moving events from Redis to Kafka

I have a question about a tricky situation in an event-driven system that I would like advice on. Here is the situation:
In our system, I use Redis as an in-memory cache and Kafka as the message queue. To increase Redis performance, I use Lua scripting to process data and, at the same time, push events into a Redis blocking list. A separate process then picks events from that blocking list and moves them to Kafka. This process has three steps:
1) Read events from the Redis list
2) Produce them in a batch to Kafka
3) Delete the corresponding events in Redis
Unfortunately, if the process dies between steps 2 and 3, i.e. after producing all events into Kafka but before deleting the corresponding events in Redis, then after the process restarts it will produce duplicate events into Kafka, which is unacceptable. Does anyone have a solution to this problem? Thanks in advance, I really appreciate it.
Kafka consumers are prone to reprocessing events, even if those events were written exactly once. Reprocessing is almost always caused by client rebalancing, which might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption, causing the broker to evict the client from the consumer group.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that ensures related messages are consumed sequentially by the same client. That client could then maintain a database record of what it has already processed.
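A rough sketch of that approach, assuming each event carries a unique ID that serves both as the partition key and as the deduplication record (the topic name is made up, and the in-memory set stands in for the database table):

```java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

public class ClientSideDeduplication {
    // Producer side: keying by eventId means all retries of the same event
    // land on the same partition and reach the same consumer in order.
    static void produce(KafkaProducer<String, String> producer,
                        String eventId, String payload) {
        producer.send(new ProducerRecord<>("redis-events", eventId, payload));
    }

    // Consumer side: skip events already recorded as processed.
    static void consume(KafkaConsumer<String, String> consumer,
                        Set<String> processedIds) {
        consumer.subscribe(Collections.singletonList("redis-events"));
        while (true) {
            for (ConsumerRecord<String, String> record :
                    consumer.poll(Duration.ofMillis(500))) {
                if (!processedIds.add(record.key())) {
                    continue;  // duplicate from a restart or rebalance: ignore
                }
                handle(record.value());
            }
        }
    }

    static void handle(String payload) { /* ... */ }
}
```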

How can I get SQL Service Broker to actually use all available Queue Readers?

I've built a data collection framework around Service Broker. There are several procs that fill the queue with various jobs, and a listener (an activated procedure) that takes each job, decides what needs to be done with it, and hands it off to the correct collection proc.
The activation queue has MAX_QUEUE_READERS = 10, but it almost never reaches that limit. Instead, processing takes far longer with just one or two activated tasks, as seen in sys.dm_broker_activated_tasks.
How can I incentivize or even force the higher number of workers?
EDIT: This MS doc says it only checks for activation every 5 seconds.
Does that mean that if my tasks take less than 5 seconds, I have no way to parallelize them through Service Broker?
Service Broker has a specific concept for parallelism, namely the conversation group: only messages from different conversation groups can be processed in parallel. In practice, a RECEIVE locks the conversation group of the dequeued message, and no other RECEIVE can dequeue messages from the same conversation group.
So even if you do have more messages in your queue, if they belong to the same conversation group then SQL Server cannot activate more parallel readers.
Even if you don't manage conversation groups explicitly (almost nobody does), they are managed implicitly by the fact that a conversation handle is also a group. Basically, if you issue a single BEGIN DIALOG followed by several SENDs on the same handle, those messages will not be processable in parallel. If you issue a separate BEGIN DIALOG for each SEND, they are processable in parallel, but you lose the ordering guarantee.
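To make the trade-off concrete, here is a hedged JDBC sketch that begins a separate dialog per job, so each message lands in its own conversation group; the service, contract, and message type names are invented placeholders:

```java
import java.sql.*;
import java.util.List;

public class ParallelSender {
    // One BEGIN DIALOG per SEND puts each message in its own conversation
    // group, so activated readers can process them in parallel, at the
    // cost of any ordering guarantee between jobs.
    private static final String SEND_ONE =
        "DECLARE @h UNIQUEIDENTIFIER; " +
        "BEGIN DIALOG CONVERSATION @h " +
        "  FROM SERVICE [CollectorInitiator] TO SERVICE 'CollectorTarget' " +
        "  ON CONTRACT [JobContract] WITH ENCRYPTION = OFF; " +
        "SEND ON CONVERSATION @h MESSAGE TYPE [JobRequest] (CAST(? AS VARBINARY(MAX)));";

    static void sendJobs(Connection conn, List<String> jobs) throws SQLException {
        for (String job : jobs) {
            try (PreparedStatement ps = conn.prepareStatement(SEND_ONE)) {
                ps.setString(1, job);
                ps.execute();
            }
        }
    }
}
```

Batching many SENDs on one handle, by contrast, preserves per-dialog ordering but serializes all of those messages behind a single activated reader.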