I'm trying to unregister a worker node as described here, but the procedure doesn't seem to work correctly. If the distributor still holds any control messages related to the disconnecting worker when the unregistration script is run, then the next time messages come in it will consume those (effectively sending more work to the node). Only afterwards does the distributor reject control messages coming from the node.
Has anyone got this to work correctly, i.e. so that the worker does not receive any new messages right after unregistration?
What you describe is the intended behaviour for the unregister operation. It was designed and implemented like that.
It does not actively remove the existing ready messages for the worker from the distributor's storage queue. It only makes sure that the worker will not send any new ready messages back to the distributor, and thus will not request more work from it, after the unregister.
Related
I have a publisher that sends messages to a consumer that moves a motor.
The motor has a work queue which I cannot access, and it works slower than the rate of the incoming messages, so I'm trying to control the traffic on the consumer.
To keep updated and relevant data coming to the motor without the queue filling up and creating a traffic jam, I set the RabbitMQ queue size limit to 5 and basicQos to 1.
The idea is that the RabbitMQ queue will drop the old messages when it is filled up, so the newest commands are at the front of the queue.
Also, by setting basicQos to 1 I ensure that the consumer doesn't grab all the messages from the queue and bombard the motor at once, which is exactly what I'm trying to avoid, since I can't do anything once a command has been sent to the motor.
This way the consumer takes messages from the queue one by one, while new messages replace the old ones on the queue.
Practically this moves the bottleneck to the RabbitMQ queue instead of the motor's queue.
I also cannot check the motor's work queue, so all traffic control must be done on the consumer.
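For reference, here is a minimal sketch of that setup using pika (the Python RabbitMQ client) purely for illustration; the post itself appears to use the Java client, and the queue name and connection details are made up:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Cap the queue at 5 messages; with the default drop-head overflow
# behaviour RabbitMQ discards the oldest message when a new one arrives.
channel.queue_declare(queue="motor-commands", arguments={"x-max-length": 5})

# Prefetch of 1: the broker holds back further deliveries until the
# current message has been acknowledged.
channel.basic_qos(prefetch_count=1)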
I added a messageId and tested, and found that many messages are still coming through long after the publisher has been shut down.
I'm expecting around 5 messages after shutdown, since that's the size of the queue, but I'm getting hundreds.
I also added a few seconds of sleep inside the callback to make sure it isn't the robot's queue that's acting up, but I'm still getting many messages after shutdown, and I can see in the logs that the callback is being called every time, so it's definitely still getting messages from somewhere.
Please help.
Thanks.
Moving the acknowledgment to the end of the callback solved the problem.
I'm guessing that with basicQos set to 1 the callback was still executed for each message one after the other, but because the messages were being acknowledged before the slow work finished, the prefetch limit never held anything back and the consumer kept grabbing more messages from the queue in the background.
So even when the publisher was shut down, the consumer still held messages it had already taken from the queue, and those were the messages I saw being executed.
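A rough sketch of that fix, continuing the pika example above with made-up names (drive_motor stands in for whatever actually talks to the motor): because the prefetch limit only counts unacknowledged messages, acknowledging at the end of the callback keeps the broker from handing the consumer anything more until the current command is done.

def drive_motor(command):
    ...  # placeholder for the slow call to the motor

def on_command(ch, method, properties, body):
    drive_motor(body)
    # Ack only after the work is finished. With prefetch_count=1 the broker
    # will not deliver the next message until this ack arrives, so at most
    # one command is in flight at a time.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="motor-commands", on_message_callback=on_command, auto_ack=False)
channel.start_consuming()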
I am using Celery and RabbitMQ, but after pushing several tasks onto the queue my server's memory utilization goes above 40%, at which point RabbitMQ will not accept any further tasks. I want to delete the messages that have already been executed, but because of RabbitMQ's durable behaviour those messages are not deleted automatically. I would like to set some configuration like autoAck=True, so that when a message is consumed by Celery it is deleted from the RabbitMQ queue and from my server's memory. Please explain how we can do that.
OK, so while I don't fully understand why you have the problem you have, it is clear what is going on.
A publisher puts a message task in the queue
Your worker process pulls the message and processes it
The message is never actually removed from the queue
This behavior happens when a consumer fails to acknowledge the processing of a message. To confirm, if you look at the RabbitMQ management plug-in, you'll see a whole bunch of unacknowledged messages. These will be unavailable for consumption, but will continue to be held on the server, taking up disk space and memory.
Further, if you do a Basic.Recover, all of these messages will then get dumped back into the queue to be processed again.
This problem is due to incorrect configuration of your consumer. There are two ways to address this:
You can configure the consumer to auto-ack (i.e. acknowledge the message automatically upon receipt). This is done when you declare the consumer (using Basic.Consume). Edit: It looks like this may be the default behavior of Celery.
You can configure your worker process to submit an acknowledgement (using Basic.Ack). Edit: this is done via the acks_late property in Celery.
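For example, option 2 in Celery looks roughly like this (a sketch; the app and task names are made up):

from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

# By default Celery acknowledges a message just before the task is executed.
# task_acks_late makes the worker ack only after the task has finished.
app.conf.task_acks_late = True

@app.task(acks_late=True)  # the same thing can also be set per task
def process(payload):
    ...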
I'm attempting to create a high level test in my solution, and I want to 'catch' messages sent to the bus.
Here's what I do:
nUnit [SetUp] spins up the WebAPI project in IISExpress
SetUp also creates the bus
Send a HTTP request to the API
Verify whatever I want to verify
The WebAPI part of the whole test works fine. The creation of the bus and kicking it off seems great too. It even finds my fake message handler. The problem is that the handler never receives the command from the queue; the messages just stay in the RabbitMQ queue forever.
Here's how the bus is being configured:
var bus = Configure.With()
.DefineEndpointName("Local")
.Log4Net()
.UseTransport<global::NServiceBus.RabbitMQ>()
.UseInMemoryTimeoutPersister()
.RijndaelEncryptionService()
.UnicastBus()
.CreateBus();
In the log from NServiceBus starting up, I see that my fake handler is being associated with the command:
2014-09-24 15:29:59,007 [Runner thread] DEBUG NServiceBus.Unicast.MessageHandlerRegistry
[(null)] <(null)> - Associated 'Bloo.MyCommand' message with 'Blah.FakeMyCommandHandler' handler
So seeing as the message lands in the correct RabbitMQ queue, I'm assuming everything up until the handler point is working fine.
I've tried putting waits in my [TearDown] so that the bus lives a little longer, hoping to give the handler time to receive the message. I've also tried spinning off the in-memory bus for the consumer part of the interaction onto a new thread, with no luck.
Has anyone else tried this?
This is only the first step, what I would love to do is create a fake bus that records messages being sent to it. The need for RabbitMQ is just to get myself going (the bounds of my solution are WebAPI on the front and the bus at the back).
Cheers
You forgot to call .Start() on the bus, that's why it didn't listen for messages.
See here for more info: http://docs.particular.net/nservicebus/hosting-nservicebus-in-your-own-process-v4.x
Also, consider using NServiceBus.Testing for unit testing your handlers and sagas:
https://www.nuget.org/packages/NServiceBus.Testing
I'm guessing your messages are just sitting in your queue forever because your endpoint is listening on the "Local.MachineName" queue instead of "Local".
If you set the ScaleOut to be SingleBrokerQueue this should sort the issue.
Configure.ScaleOut(s => s.UseSingleBrokerQueue());
var bus = Configure.With()
.DefineEndpointName("Local")
...
If you are attempting to do full integration tests, using actual queues, then this answer won't help you.
If you are doing more focused tests, i.e. testing individual components that rely on the bus, I would recommend that you use a mocking framework (I like Moq) and mock out IBus. You can then verify that messages you expected to be sent to the bus were indeed sent.
I am using the alerts feature of IronMQ service provided by IronIO to start workers.
I have things setup so that a message is pushed onto the push queue. The push queue sends an alert that starts a worker. The worker pulls off the message on the push queue, reserving it. Sometimes for whatever reason the job fails, the reservation for a message expires, and the message becomes available again. However, from what I can tell, no alert is sent when the reservation expires on a message. So the message sits in the queue until another message is added to the queue firing an alert and starting a worker. But the new message is not processed.
Are alerts created for messages whose reservation expires in IronMQ? Is there any documentation that I missed describing what can happen?
I am working on having workers pull off multiple messages but I am running into issues unrelated to iron io when processing multiple messages in the same worker.
Also, is there a way to pull from the top of the queue, to avoid pulling off messages that may be causing errors? Should I just modify my workers to delete the messages that are causing errors?
Currently there are no alerts for when a message times out and goes back on the queue, but that does seem like it would be a good idea. I assume this is a pretty inactive queue? I made a feature request for this here: https://trello.com/c/XcHi0NdN/35-fire-alert-when-a-message-times-out-goes-back-on-queue
And regarding messages that are causing issues, your best bet would be to add them to a different queue (an error queue) and delete them off the original queue. Then you can go through the error queue to figure out why certain messages are causing you problems. This is known as a "dead letter queue" btw and we have a feature request for it here, please give it a vote! https://trello.com/c/bGnJcNa9/26-dead-letter-queue
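I can't vouch for the exact IronMQ client calls, so here is the error-queue idea sketched with RabbitMQ's Python client instead, just to show the shape of it; the queue names and the process() function are made up:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="work")         # hypothetical main queue
channel.queue_declare(queue="work-errors")  # hypothetical error queue

def process(body):
    ...  # placeholder for the real work

def handle(ch, method, properties, body):
    try:
        process(body)
    except Exception:
        # Park the failing message on the error queue for later inspection...
        ch.basic_publish(exchange="", routing_key="work-errors", body=body)
    # ...and in either case remove it from the original queue.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="work", on_message_callback=handle)
channel.start_consuming()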
We've been using Rabbit successfully for about a year. Recently we upgraded to v2.6.1, because we want to use clusters with replicated message queues.
My testing has hit a puzzling behavior that smells like a Rabbit bug to me. The test that uncovers this is working with a two-node cluster. Both nodes are running v2.6.1. Both nodes have disk. Both nodes are running on Mac OS, though I doubt this is pertinent.
I'm also running Alice on the node that runs the test. The test uses it to programmatically do a stop_app on one of the nodes, because the test is trying to validate that if the cluster master fails and a slave is elevated to take its place, we don't lose messages.
So, the test has a small thread pool, which is given tasks that periodically 1) publish messages, and 2) toggle the state of the Rabbit master node (stopped if running; started if stopped). Other threads are consuming messages from queues.
I'm using publisher confirms, and I'm also acknowledging the messages in the consumers (using autoAck=false for channel.basicConsume()).
When the master node is stopped, I see both the producers and consumers catching ShutdownSignalException. They handle this by attempting to reconnect to the cluster. This works fine. When reconnected, they continue with their business.
Sometimes, what I see is that a consumer has successfully fetched a message from the broker, and is calling channel.basicAck() when it gets that ShutdownSignalException.
Later, when the consumer has reconnected, it again pulls down the same message. (The message bodies are tagged with a UUID, so I know it is the same one.) This time, when the consumer attempts to basicAck() the message, it again gets ShutdownSignalException, but this one has the following text in it: "reply-text=PRECONDITION_FAILED - unknown delivery tag 7".
In fact, that is the same delivery tag that was offered to the consumer by the broker before the master went down and the consumer reconnected.
Googling suggests that this event means that the consumer is attempting to ack the same message more than once.
But, how can this be so? If the first ack succeeded, then the message should have been removed from the broker's queues, and the consumer shouldn't see the same message again.
Yet, if the first ack did not succeed, then the consumer shouldn't be dinged for attempting to re-ack the message.
Anyone seen this before? It smells like a bug in Rabbit's replicated queues to me, but I'm still new to Rabbit, and so am willing to believe there's a subtlety here in consuming from a clustered broker that I haven't yet grokked!
Thanks, --Steve
I'm not sure if my case matches yours, but I have seen a similar "unknown delivery tag" error when acking after a reconnect, followed by the same message arriving again. Initially it looked like a bug to me, but in fact this is expected behavior. A consumer with QoS > 1 may have some messages in its local buffer, and the delivery tags for all of them will be invalid after a reconnect. On the other hand, trying to ack even the current message after a reconnect doesn't make any sense, because that message was already requeued automatically when the connection was lost, which is why I got it again.
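One way to stay on the safe side of this, sketched here with pika rather than the Java client used in the question: always ack with the delivery tag of the delivery you are handling on the current channel, and treat a lost connection as "the unacked message will come back with a new tag". The queue and host names are made up.

import pika

def consume_forever(queue="test-queue"):
    while True:
        try:
            connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
            channel = connection.channel()
            channel.basic_qos(prefetch_count=1)

            def handle(ch, method, properties, body):
                # method.delivery_tag is scoped to *this* channel. After a
                # reconnect the same message is redelivered with a new tag, so
                # acking a tag remembered from the old channel would fail with
                # "PRECONDITION_FAILED - unknown delivery tag".
                ch.basic_ack(delivery_tag=method.delivery_tag)

            channel.basic_consume(queue=queue, on_message_callback=handle)
            channel.start_consuming()
        except pika.exceptions.AMQPConnectionError:
            # The node went down; reconnect and consume again. Anything that
            # was delivered but not acked will be redelivered by the broker.
            continue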