I have a batch job that is divided into multiple small messages and pushed to a queue; multiple workers listening on the queue pick up and process these messages.
When a connection reset occurred, as shown below, the same message was picked up by another worker instance even though the first worker had already picked it up and was still processing it.
ERROR org.springframework.amqp.rabbit.connection.CachingConnectionFactory$DefaultChannelCloseLogger:Channel shutdown: connection error
com.rabbitmq.client.impl.ForgivingExceptionHandler:An unexpected connection driver error occured (Exception message: Connection reset)
This happened with only one message of a particular job, and the connection reset time matches that message's redispatch.
I would like to know whether this is expected behavior as part of RabbitMQ auto-recovery, and whether it is unavoidable.
Correct; any unacknowledged message will be requeued and redelivered after a connection reset.
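To make that concrete, here is a minimal pika sketch of a manually acking consumer (connection details, queue name, and the process function are assumptions for illustration). Any message delivered on this channel but not yet acked when the connection resets is requeued by the broker and may be redelivered to a different worker, which is exactly the behavior observed above.

    import pika

    # Assumed connection details and queue name, for illustration only.
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='batch-jobs', durable=True)

    def on_message(ch, method, properties, body):
        process(body)  # hypothetical per-message processing
        # Until this ack reaches the broker, a connection reset causes the
        # message to be requeued and redelivered, possibly to another worker.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='batch-jobs', on_message_callback=on_message)
    channel.start_consuming()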
Related
I am throwing an AmqpException inside my consumer.
My expectation is that the message will return to the queue in FIFO order and will be reprocessed sometime in the future.
It seems as if Spring AMQP does not release the message back to the queue, but instead tries to reprocess the failed messages over and over again.
This blocks newly arrived messages from being processed. The stuck messages appear in the "unacked" state forever in the AMQP console.
Any thoughts?
That's the way RabbitMQ/Spring AMQP works; if a message is rejected (any exception is thrown), it is requeued by default and placed back at the head of the queue, so it is retried immediately.
... reprocessed sometime in the future.
You have to configure things appropriately to make that happen.
First, you have to tell the broker to NOT requeue the message. That is done by setting defaultRequeueRejected on the listener container to false (it's true by default). Or, you can throw an AmqpRejectAndDontRequeueException which instructs the container to reject (and not requeue) an individual message.
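At the protocol level, both of those options boil down to rejecting the delivery with requeue set to false. A hedged pika sketch of that raw operation, for illustration only (handle is a hypothetical processing function):

    def on_message(ch, method, properties, body):
        try:
            handle(body)  # hypothetical business logic
            ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            # requeue=False is the raw equivalent of setting
            # defaultRequeueRejected to false (or throwing
            # AmqpRejectAndDontRequeueException) in Spring AMQP.
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)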
But that's not the end of it; just doing that will simply cause the rejected message to be discarded.
To avoid that, you have to set up a Dead Letter Exchange/Queue for the queue - rejected messages are then sent to the DLX/DLQ instead of being discarded. Using a policy rather than queue arguments is generally recommended.
Finally, you can set a message time-to-live on the DLQ so that, after that time, the message is removed from the queue. If you set up another appropriate dead letter exchange on that queue (the DLQ), you can cause the message to be requeued back to the original queue after the time expires.
Note that this will only work for rejected deliveries from the original queue; it will not work when expiring messages in that queue.
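Putting those pieces together, here is a minimal pika sketch of the topology; every exchange and queue name is an assumption for illustration, and per the recommendation above a policy would normally replace these queue arguments. Messages rejected from the work queue land in the DLQ, and after a 30-second TTL they are dead-lettered back to the work queue.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = connection.channel()

    ch.exchange_declare(exchange='work.dlx', exchange_type='direct')
    ch.exchange_declare(exchange='work.retry', exchange_type='direct')

    # Rejected (not requeued) messages are routed to the DLX.
    ch.queue_declare(queue='work', arguments={
        'x-dead-letter-exchange': 'work.dlx',
        'x-dead-letter-routing-key': 'work'})
    ch.queue_bind(queue='work', exchange='work.retry', routing_key='work')

    # After 30 seconds a message expires and is dead-lettered back to 'work'.
    ch.queue_declare(queue='work.dlq', arguments={
        'x-message-ttl': 30000,
        'x-dead-letter-exchange': 'work.retry',
        'x-dead-letter-routing-key': 'work'})
    ch.queue_bind(queue='work.dlq', exchange='work.dlx', routing_key='work')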
See this answer and some of the links from its question for more details.
You can use the contents of the x-death header to decide if you should give up completely after some number of attempts (catch the exception and somehow dispose of the bad message; don't throw an exception and the container will ack the message).
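A hedged pika sketch of that check (the retry limit and the handle and park functions are assumptions for illustration; x-death is a list whose first entry carries the count for the most recent queue/reason pair):

    MAX_ATTEMPTS = 3  # assumed retry limit

    def on_message(ch, method, properties, body):
        deaths = (properties.headers or {}).get('x-death') or []
        attempts = deaths[0]['count'] if deaths else 0
        try:
            handle(body)  # hypothetical business logic
            ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            if attempts >= MAX_ATTEMPTS:
                park(body)  # hypothetical disposal of the bad message
                # Ack so the broker considers the message handled.
                ch.basic_ack(delivery_tag=method.delivery_tag)
            else:
                # Reject without requeue: another trip through the DLX/DLQ.
                ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)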
Here is a solution I used to solve this: I set up an interceptor to retry the message x number of times while applying a backoff policy.
http://trippstech.blogspot.com/2016/03/rabbitmq-deadletter-queue-with.html
Is it possible to configure workers (in a distributed scenario) to return failed (unprocessed) messages to the distributor's source queue so that other workers will try to process those messages?
When the processing of a message fails, it is moved to the error queue - regardless of whether the endpoint is scaled out across multiple workers.
When you send the message back from the error queue to be reprocessed (whether that's done via the old ReturnToSourceQueue.exe tool or the newer ServicePulse webapp), it is sent back to the "endpoint" - which would be the distributor in the case of a scaled out endpoint, not the specific worker that failed the first time.
I am sending SOAP messages to an external service using a web service call.
Sometimes the external web service is down, and I don't want to lose the failed messages.
I push those failed messages to a JMS queue designated as the retry queue.
Now my requirement is to implement a mechanism that processes failed messages from the retry queue after some time (let's say half an hour) and tries to deliver them to the web service again, using a fixed number of attempts at half-hour intervals. If I don't succeed after the fixed number of attempts, I should put the message in a dead letter queue.
I need help in implementing this requirement.
As an initial step in this direction, I tried JMS polling on the retry queue with the polling interval set to half an hour. The polling job wakes up every half hour and processes all the messages present in the retry queue. The drawback of this approach is that it tries to redeliver a failed message as soon as it receives it for the first time; for subsequent cycles it works fine.
Because of this, when a message fails and I put it in the retry queue, the job tries to redeliver that message immediately instead of waiting.
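If the broker supports per-queue TTL and dead-lettering (RabbitMQ does; this is the same pattern described in the answer above, and only a sketch under that assumption, with illustrative names), the retry queue itself can hold messages for half an hour and then dead-letter them back into the delivery flow, which avoids the immediate-redelivery problem entirely:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = connection.channel()

    # Failed deliveries are published to 'soap.retry', wait half an hour,
    # and are then dead-lettered to 'soap.deliver' for the next attempt.
    ch.queue_declare(queue='soap.retry', arguments={
        'x-message-ttl': 30 * 60 * 1000,              # half an hour, in ms
        'x-dead-letter-exchange': '',                 # default exchange
        'x-dead-letter-routing-key': 'soap.deliver'})
    ch.queue_declare(queue='soap.deliver')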
Is there any way I can achieve this:
Write a message to a queue
Block the producer process until there is a consumer on the other side
If there is no consumer after 10 seconds, raise an exception
If there is a consumer, unblock the producer process
When the 10sec timeout is reached and an exception is raised on the producer side, the message should be kept in the queue, so that a consumer can consume it later
I want to be able to notify a consumer in an asynchronous way.
Until now I have simply been sending a message. I want to know if there is an immediate consumer, but if there is not, the message should remain on the queue. That doesn't seem to be the behavior of the AMQP "immediate" flag.
Interesting problem, unfortunately there isn't an elegant solution.
From the RabbitMQ documentation the "immediate" flag works like this:
This flag tells the server how to react if the message cannot be routed to a queue consumer immediately. If this flag is set, the server will return an undeliverable message with a Return method. If this flag is zero, the server will queue the message, but with no guarantee that it will ever be consumed.
You could solve your problem in part using the immediate flag, I'm thinking something like this:
When the producer is ready to queue a message it fires it off with the immediate flag set
If the message is returned then start a timer and keep retrying for 10 seconds with the immediate flag set
If after 10 seconds of trying it has still failed to be picked up, then publish it with the immediate flag set to false (so that your consumer will pick it up when the consumer comes online)
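Note that RabbitMQ removed support for the immediate flag in version 3.0, and current pika releases no longer expose it. A rough substitute for the three steps above, sketched here with assumed names, is to poll the queue's consumer count with a passive declare and publish normally once a consumer appears, or after the timeout:

    import time
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = connection.channel()

    def publish_when_consumed(body, queue='notifications', timeout=10.0):
        """Wait up to `timeout` seconds for a consumer, then publish anyway."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # A passive declare reports the consumer count without
            # creating or modifying the queue.
            declare_ok = ch.queue_declare(queue=queue, passive=True)
            if declare_ok.method.consumer_count > 0:
                ch.basic_publish(exchange='', routing_key=queue, body=body)
                return True
            time.sleep(0.5)
        # No consumer appeared; queue the message anyway so a later
        # consumer can pick it up (the caller can raise at this point).
        ch.basic_publish(exchange='', routing_key=queue, body=body)
        return False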
It seems the longer I keep my RabbitMQ server running, the more trouble I have with unacknowledged messages. I would love to requeue them. In fact there seems to be an AMQP command to do this, but it only applies to the channel that your connection is using. I built a little pika script to at least try it out, but I am either missing something or it cannot be done this way (how about with rabbitmqctl?).
    import pika

    credentials = pika.PlainCredentials('***', '***')
    parameters = pika.ConnectionParameters(host='localhost', port=5672,
                                           credentials=credentials,
                                           virtual_host='***')

    def handle_delivery(body):
        """Called when we receive a message from RabbitMQ"""
        print(body)

    def on_connected(connection):
        """Called when we are fully connected to RabbitMQ"""
        connection.channel(on_open_callback=on_channel_open)

    def on_channel_open(new_channel):
        """Called when our channel has opened"""
        global channel
        channel = new_channel
        channel.basic_recover(callback=handle_delivery, requeue=True)

    try:
        connection = pika.SelectConnection(parameters=parameters,
                                           on_open_callback=on_connected)
        # Loop so we can communicate with RabbitMQ
        connection.ioloop.start()
    except KeyboardInterrupt:
        # Gracefully close the connection
        connection.close()
        # Loop until we're fully closed; will stop on its own
        connection.ioloop.start()
Unacknowledged messages are those that have been delivered across the network to a consumer and have not yet been acked or rejected, while the consumer that received them still has the channel and connection they arrived on open. The broker therefore cannot tell whether the consumer is just taking a long time to process those messages or has forgotten about them, so it leaves them in an unacknowledged state until the consumer dies or acks or rejects them.
Since those messages could still be validly processed in the future by the still-alive consumer that originally consumed them, you can't (to my knowledge) insert another consumer into the mix and try to make external decisions about them. You need to fix your consumers to make decisions about each message as they get processed rather than leaving old messages unacknowledged.
If messages are unacked there are only two ways to get them back into the queue:
basic.nack
This command will cause the message to be placed back into the queue and redelivered.
Disconnect from the broker
This action will force all unacked messages from this channel to be put back into the queue.
NOTE: basic.recover will try to republish unacked messages on the same channel (to the same consumer), which is sometimes the desired behaviour.
RabbitMQ spec for basic.recover and basic.nack
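For completeness, a minimal pika sketch of the first option, assuming an open channel and a delivery obtained from it:

    # Requeue a single unacked delivery on this channel.
    channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

    # Or requeue every outstanding unacked delivery on this channel at once.
    channel.basic_nack(delivery_tag=0, multiple=True, requeue=True)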
The real question is: Why are the messages unacknowledged?
Possible scenarios to cause unacked messages:
Consumer fetching too many messages, then not processing and acking them quickly enough.
Solution: Prefetch as few messages as appropriate (see the sketch after this list).
Buggy client library (I currently have this issue with pika 0.9.13): if the queue has a lot of messages, a certain number of them will get stuck unacked, even hours later.
Solution: I have to restart the consumer several times until all unacked messages are gone from the queue.
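For the first scenario, the prefetch window is set with basic.qos; a minimal pika sketch (the queue name and on_message callback are assumptions):

    # Deliver at most one unacked message to this consumer at a time,
    # so slow processing cannot pile up unacknowledged deliveries.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue='work', on_message_callback=on_message)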
All the unacknowledged messages will go to ready state once all the workers/consumers are stopped.
Ensure all workers are stopped by confirming with a grep on ps aux output, and stopping/killing them if found.
If you are managing workers with supervisor and it shows a worker as stopped, you may still want to check for zombies: supervisor may report the worker as stopped, yet zombie processes still show up in the ps aux output. Killing the zombie processes will bring the messages back to the ready state.