RabbitMQ messages not consumed - TCP closure

UPDATE - apparently a tcp closure
I see on rabbit server:
=ERROR REPORT==== 24-Jan-2015::03:22:00 ===
closing AMQP connection <0.1070.22> (209.151.226.37:38040 -> 192.168.80.81:5672):
{inet_error,etimedout}
This connection appears alive on my app's side. How can I prevent this? The TCP keepalive parameters look OK.
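The etimedout closure in the log means the broker gave up on the TCP connection while the client still considered it open. Below is a minimal Python sketch of enabling TCP keepalive on a client socket, which may help when checking what the application really sets. The helper name enable_keepalive and the timer values are illustrative assumptions, and the TCP_KEEP* constants are Linux names that other platforms may not expose. AMQP also defines protocol-level heartbeats, which serve a similar purpose.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and tune its timers where the platform allows.

    idle, interval and probes are example values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names exist on Linux; skip any the platform does not define.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    # Read the flag back so the caller can verify it took effect.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
keepalive_on = enable_keepalive(sock)
sock.close()
```

Reading the option back with getsockopt is a cheap way to confirm that the socket the client library uses has keepalive enabled, rather than relying on system-wide defaults alone.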
I have two apps.
One, "processor", consumes jobs from a queue and sends replies to a response queue.
The other, "responder" consumes from this response queue and talks to a database.
I had some replies which apparently made it into the response queue, because upon restart of the responder they were handled and the database was updated appropriately. But where were they before that restart?
How can I pinpoint why they weren't PREVIOUSLY handled? That responder seems to have been running fine.
In the responder I do
res = amqp_consume_message(Cx->conn, genvelope, &tqb, 0);
I ack (not multiple) after replying to the database.
I have prefetch at 11.
The processor was closed and restarted a few times during this FWIW. Also the processor is the one that establishes the exchange used for the replies; the responder connects to it.
I have the management url up.
I saw no indication that the replies were available from the consume(), which makes sense since the database wasn't updated. The processor did do its processing and put a reply in the response queue according to logs.
In separate testing I saw that messages in the reply queue aren't destroyed by restarting the processor; the reply exchange is durable.
The apps generally work.
Any debugging suggestions or conceptual info that might be relevant would be appreciated.

Related

Rabbitmq: Unacked message not going away after broker restart

We have observed the following behavior of RabbitMQ and are trying to understand if it is correct and how to resolve it.
Scenario:
A (persistent) message is delivered into a durable queue
The (single) Consumer (Spring-AMQP) takes the message and starts processing => Message goes from READY to UNACK
Now the broker is shut down => Client correctly reports "Channel shutdown"
The consumer finishes the processing, but cannot acknowledge the message as the broker is still down
Broker is started again => Client reconnects
As a result, one message remains unack'ed forever (or until the client is restarted).
Side note: In the Rabbit Admin UI, I can see that two channels now exist: the "dead" one that was created before the broker restart, which holds the unacked message, and a new one that is healthy.
Is this behavior expected to be like that? It seems to me "correct" in the way, that RabbitMQ can not know after the broker restart, whether the message processing was completed or not. But what solution is there, other than restarting the consumer process, to get that unacked message back into the queue and heal the system?
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
Is this behavior expected to be like that? It seems to me "correct" in the way, that RabbitMQ can not know after the broker restart, whether the message processing was completed or not.
Yes, you are observing expected behavior. RabbitMQ will re-enqueue the message once it determines that the consumer is really dead. Since your consumer re-connects with what must be the same consumer tag as before, it is up to that process to ack or nack the message.
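The rule behind this answer, that a delivered but unacknowledged message stays tied to the channel it was delivered on and returns to the queue only when that channel closes, can be illustrated with a toy in-memory model. This is a sketch for reasoning, not RabbitMQ code: ToyQueue and its methods are hypothetical names, and delivery tags are global here for brevity, whereas in AMQP they are scoped per channel.

```python
from collections import deque


class ToyQueue:
    """Toy model of one queue with READY and UNACKED messages."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}  # delivery tag -> (channel id, body)
        self._next_tag = 1

    def publish(self, body):
        self.ready.append(body)

    def deliver(self, channel_id):
        # Move the oldest READY message to UNACKED for the given channel.
        body = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = (channel_id, body)
        return tag, body

    def ack(self, tag):
        # An ack removes the message for good.
        del self.unacked[tag]

    def close_channel(self, channel_id):
        # Requeue everything the closed channel still held, oldest first.
        tags = sorted(t for t, (ch, _) in self.unacked.items() if ch == channel_id)
        for tag in reversed(tags):
            _, body = self.unacked.pop(tag)
            self.ready.appendleft(body)


queue = ToyQueue()
queue.publish("job-1")
tag, body = queue.deliver("old-channel")
queue.close_channel("new-channel")   # a different channel closing changes nothing
still_unacked = len(queue.unacked)
queue.close_channel("old-channel")   # closing the owning channel requeues the message
```

In this model the message stays unacked for as long as "old-channel" is open, which matches the two channels seen in the admin UI: the message is released only once the stale channel is closed or its owner acks or nacks it.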

RabbitMQ durable queue losing messages over STOMP

I have a webpage connecting to a RabbitMQ broker using JavaScript/WebSockets that are exposed by a Spring app deployed in Tomcat. Messages are produced once per second by an external application and are rendered on the webpage. The JavaScript subscription is durable.
The issue I'm experiencing is that when the network connection is broken on the javascript client for a period of time (say 60 seconds), the first ~24 seconds of messages are missing. I've looked through the logs of the app deployed in tomcat and the missing messages seem to be up until the following log statement:
org.springframework.messaging.simp.stomp.StompBrokerRelayMessageHandler - DEBUG - TCP connection to broker closed in session 14
I think this is the point at which the endpoint realises the javascript client is disconnected and decides to close the connection to the broker resulting in future messages queueing up.
My question is how can I ensure that the messages between the time the network is severed and the time the endpoint realises the client is disconnected are not lost? Should the endpoint put the messages back on the queue somehow? Maybe there's a way to make it transactional?
Thanks in advance.
Your Tomcat application should not acknowledge messages from RabbitMQ until it confirms that your Javascript client has received them. This way, any messages that aren't ack-ed by the JS client won't be ack-ed by Tomcat, and RabbitMQ will re-deliver them.
I don't know how your JS app and Tomcat interact, but you may have to implement your own ack process there.
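A minimal sketch of such an application-level ack process is shown below, assuming the relay can be handed three callbacks: one to push a message to the browser, one to ack a delivery on the broker, and one to hand a delivery back to the broker. AckRelay, its method names, and the callbacks are all hypothetical; nothing here is Spring or RabbitMQ API.

```python
class AckRelay:
    """Acks a broker delivery only after the downstream client confirms it."""

    def __init__(self, send_to_client, broker_ack, broker_requeue):
        self._send = send_to_client
        self._ack = broker_ack
        self._requeue = broker_requeue
        self._pending = {}  # client-facing message id -> broker delivery tag
        self._next_id = 1

    def on_broker_message(self, delivery_tag, body):
        # Forward to the client, but remember the delivery instead of acking it.
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = delivery_tag
        self._send(msg_id, body)

    def on_client_ack(self, msg_id):
        # The client confirmed receipt, so the broker delivery can be acked.
        tag = self._pending.pop(msg_id, None)
        if tag is not None:
            self._ack(tag)

    def on_client_disconnect(self):
        # Anything the client never confirmed goes back to the broker.
        for tag in self._pending.values():
            self._requeue(tag)
        self._pending.clear()


sent, acked, requeued = [], [], []
relay = AckRelay(lambda i, b: sent.append((i, b)), acked.append, requeued.append)
relay.on_broker_message(101, "tick-1")
relay.on_broker_message(102, "tick-2")
relay.on_client_ack(1)          # the browser confirmed only the first message
relay.on_client_disconnect()    # the second one is handed back for redelivery
```

With this shape, messages forwarded during the window between the network break and the relay noticing it are never acked, so the broker still holds them for redelivery.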

RabbitMQ dropping messages after the first one

I'm using celery 3.0.18 with RabbitMQ 3.0.2. I have a task sent to another application using celery.send_task. I can see the send_task call in my logs, I can see the packets leaving the worker instance, and I can see the packets reaching the RabbitMQ instance when I call tcpflow -ce -i any port 5672. However, only the first message gets to the queue. The messages all have the same routing key. I tried recreating the exchange and bindings, and even a new RabbitMQ instance, and nothing seems to work. This used to work fine for months, until we had to rebuild the RabbitMQ from scratch after a crash in our AWS infrastructure. Strangely, I have the exact same setup working in another application, using the same broker and the same exchange, binding and queue, and it works perfectly there. It also works when I send the messages to the same exchange using the same call from a management script run from the shell on the same instance, but not when they are sent from the celery task in the worker process.
Any ideas on what the problem might be?
Eventually, I figured out what was wrong, but it's not clear whether this is expected behavior, a celery bug, or a RabbitMQ bug.
Besides our application tasks, I have a custom logging handler that sends logs to a central location through RabbitMQ, using celery.send_task. This logging handler sends messages to an exchange named application.logger, with a routing key like application.logger.info, application.logger.warning, etc., and the exchange has bindings that route some logging levels to specific queues. The exchange, bindings and queues were created directly in RabbitMQ and not defined in Celery routes.
When the worker tried to send a message to this exchange and it didn't exist, Celery logged a 404 NOT_FOUND error. After that, tasks sent to other exchanges using the same connection weren't delivered. They were sent by the worker instance, we could see the packets arriving, and the RabbitMQ management screen for that connection even showed the data arriving from the client in kb/s, but no messages were delivered.

How do you replay missed messages when using STOMP to connect to RabbitMQ?

I've got an iOS application which uses a STOMP Client to talk to RabbitMQ. The application loads a lot of state during startup, and then keeps that state in sync by receiving updates published on STOMP. Of course, if it loses its connection, it can no longer be sure it's in sync, and therefore has to re-load that large initial blob. Any kind of network interruption triggers this behavior and makes my customers sad.
There are a lot of big-picture ways to fix this (and I'm working on them) but in the meantime, I'm trying to use persistent queues to solve this problem. The idea is that the server will create a queue, bind it to the appropriate topics, and then start building the large startup bundle. When finished, it will hand everything off to the client. The client will set itself up with the startup bundle, open a subscription to the queue, and then process any updates which happened while the server was getting things ready. Similarly, if the client should become disconnected, it can simply reconnect and resume reading the messages it finds in the queue.
My problem is that while the client successfully receives messages sent after it connects, if there were any messages in the queue before it connected, they are not read. Likewise, if the client becomes disconnected, when it reconnects, it won't see any messages which arrived while it was away.
Can anyone suggest how I might get the client to be able to read those missing messages?
It turns out what was happening was that the STOMP adapter was consuming the messages but failing to deliver them. Thus, when the client reconnected, it wouldn't have any messages waiting for it.
To fix the problem, I changed the "ack" setting on the subscribe request to "client", meaning that STOMP shouldn't consider the message delivered until the client sends back an ACK frame. By changing my client appropriately, messages now get delivered even after the client has been away.
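For illustration, here is a small Python sketch of what the SUBSCRIBE frame with "ack" set to "client" and the matching ACK frame look like on the wire. It assumes STOMP 1.2 header names (in 1.1 the ACK frame carries subscription and message-id headers instead of id); the destination /queue/updates, the subscription id sub-0, and the helper names are made up for the example.

```python
def stomp_frame(command, headers, body=""):
    """Serialize a STOMP frame: command, header lines, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return "\n".join(lines) + "\n\n" + body + "\x00"


def subscribe_frame(destination, subscription_id):
    # ack:client tells the broker to wait for an explicit ACK frame
    # instead of treating a message as delivered once it is sent.
    return stomp_frame(
        "SUBSCRIBE",
        {"id": subscription_id, "destination": destination, "ack": "client"},
    )


def ack_frame(ack_id):
    # In STOMP 1.2 the id header echoes the ack header of the MESSAGE frame.
    return stomp_frame("ACK", {"id": ack_id})


subscribe = subscribe_frame("/queue/updates", "sub-0")
ack = ack_frame("42")
```

Note that in "client" mode an ACK is cumulative, covering all earlier messages on the subscription as well; the "client-individual" mode acknowledges a single message at a time.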

Behavior of channels in "confirm" mode with RabbitMQ

I'm having trouble understanding RabbitMQ confirms. I see the following explanation from RabbitMQ:
Notes
The broker loses persistent messages if it crashes before said
messages are written to disk. Under certain conditions, this causes
the broker to behave in surprising ways. For instance, consider this
scenario:
a client publishes a persistent message to a durable queue
a client consumes the message from the queue (noting that the message is persistent and the queue durable), but doesn't yet ack it,
the broker dies and is restarted, and
the client reconnects and starts consuming messages.
At this point, the client could reasonably assume that the message
will be delivered again. This is not the case: the restart has caused
the broker to lose the message. In order to guarantee persistence, a
client should use confirms. If the publisher's channel had been in
confirm mode, the publisher would not have received an ack for the
lost message (since the consumer hadn't ack'd it and it hadn't been
written to disk).
Then I used http://hg.rabbitmq.com/rabbitmq-java-client/file/default/test/src/com/rabbitmq/examples/ConfirmDontLoseMessages.java to run some basic tests and verify the confirms, but I got some weird results:
The waitForConfirmsOrDie method doesn't block the producer, which is not what I expected. I assumed waitForConfirmsOrDie would block the producer until all the messages have been ack'd or one of them is nack'd.
I removed channel.confirmSelect() and channel.waitForConfirmsOrDie() from the publisher and changed the consumer from auto ack to manual ack. I publish all messages to the queue and consume messages one by one, then I stop the rabbitmq server during the consuming process. What I expect now is that the remaining messages will be lost after the rabbitmq server is restarted, because the channel is not in confirm mode, but I still see all the other messages in the queue after the server restart.
Since I am new to RabbitMQ, can anyone tell me where my understanding of confirms goes wrong?
My understanding is that channel confirmation is the broker confirming that it successfully received the message from the producer, regardless of whether a consumer has acked the message. When a message is confirmed depends on the queue type and the message delivery mode; see http://www.rabbitmq.com/confirms.html for details.
The broker confirms messages when:
it decides a message will not be routed to queues
(if the mandatory flag is set then the basic.return is sent first) or
a transient message has reached all its queues (and mirrors) or
a persistent message has reached all its queues (and mirrors) and been persisted to disk (and fsynced) or
a persistent message has been consumed (and if necessary acknowledged) from all its queues
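On the publisher side, these confirms arrive as acks or nacks carrying a sequence number and a "multiple" flag. A rough Python sketch of the bookkeeping a publisher might do is shown below. ConfirmTracker and its method names are hypothetical, and the sketch assumes sequence numbers start at 1 once the channel is put in confirm mode and that "multiple" covers every outstanding number up to and including the given one.

```python
class ConfirmTracker:
    """Tracks published messages until the broker acks or nacks them."""

    def __init__(self):
        self._next_seq = 1
        self.outstanding = {}  # publish sequence number -> message body

    def published(self, body):
        # Record the message under the sequence number the broker will use.
        seq = self._next_seq
        self._next_seq += 1
        self.outstanding[seq] = body
        return seq

    def _settle(self, delivery_tag, multiple):
        if multiple:
            tags = sorted(s for s in self.outstanding if s <= delivery_tag)
        else:
            tags = [delivery_tag] if delivery_tag in self.outstanding else []
        return [self.outstanding.pop(s) for s in tags]

    def on_ack(self, delivery_tag, multiple=False):
        # The broker took responsibility; the messages can be forgotten.
        return self._settle(delivery_tag, multiple)

    def on_nack(self, delivery_tag, multiple=False):
        # The broker could not handle these; the caller may republish them.
        return self._settle(delivery_tag, multiple)


tracker = ConfirmTracker()
for text in ("m1", "m2", "m3", "m4"):
    tracker.published(text)
confirmed = tracker.on_ack(2, multiple=True)   # settles m1 and m2 together
to_retry = tracker.on_nack(3)                  # m3 was rejected by the broker
```

Whatever is left in outstanding when the connection drops has been neither confirmed nor rejected, so a publisher that needs the guarantee would treat those messages as unsent.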
Old question but oh well..
I publish all messages to the queue and consume messages one by one, then I stop the rabbitmq server during the consuming process. What I expect now is that the remaining messages will be lost after the rabbitmq server is restarted, because the channel is not in confirm mode, but I still see all the other messages in the queue after the server restart.
This is actually how it should work, IF the persistence is enabled. If the server crashes or something else goes wrong, the messages cannot be confirmed, and thus, won't be removed from the queue.
Messages will only be removed from the queue if they are confirmed to be handled, or if the broker had not yet written them to memory or disk before the server crashed.
Confirming and acknowledging can be turned off if wanted, and then the producer won't wait for the acks. I cannot find the exact command for it right now, but it does exist.
More on the acks and confirms: https://www.rabbitmq.com/reliability.html