Spring AMQP Consumer stopped receiving messages - rabbitmq

I am using Spring AMQP.
My consumer has stopped listening for messages; however, there is no exception in the logs and I could not see any deadlock in the thread dump.
I have two servers which are listening to 10 queues.
Everything was working fine when cache-mode of rabbit:connection-factory was 'CONNECTION'.
But when I changed it to 'CHANNEL' this issue started appearing frequently.
From the RabbitMQ UI console I can see there is one unacknowledged message. (Acknowledge mode is 'AUTO'.)
It looks to me as if the client could not send the acknowledgement back to the server.
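For reference, this is roughly the Java equivalent of the cache-mode setting I changed (the host and cache size here are illustrative, not my real values):

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory.CacheMode;

CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
// Was CacheMode.CONNECTION before the change; the issue appeared after switching to CHANNEL.
connectionFactory.setCacheMode(CacheMode.CHANNEL);
connectionFactory.setChannelCacheSize(25); // illustrative size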
I can see the following in my thread dump:
"" prio=10 tid=0x00007f5f3823c000 nid=0x5faa waiting on condition [0x00007f5f847e2000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007b83a5260> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.nextMessage(BlockingQueueConsumer.java:359)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:1000)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:989)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$700(SimpleMessageListenerContainer.java:82)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1103)
at java.lang.Thread.run(Thread.java:745)

Related

Spring AMQP reconnection issue with rabbitmq cluster due to queue checking retry limit

I have a rabbitmq cluster with 3 nodes. One node has a durable and non-mirrored classic queue named test-queue.
I have a Spring Boot app that uses the default Spring AMQP connection factory, new CachingConnectionFactory(), to first ensure the queue exists and then subscribe to its messages. Everything works fine.
Then I started a rolling update of the RabbitMQ cluster, where the nodes were restarted one by one.
I observed the following in the logs during this process.
At startup I saw the output below:
Received shutdown signal for consumer tag=amq.ctag-pzPHM_GEd5e-J5Y_L2W7_g com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Attempting to connect to: xxx:5672
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Created new connection: xxx#66971f6b:58/SimpleConnection#4315e774
This shows that the app received the shutdown signal and successfully reconnected. At this point it looks like the node that hosts the queue was shut down, but the app was able to establish a new connection because there are other nodes.
Later I saw more shutdown signals, which indicates that the other nodes had started to shut down:
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Consumer raised exception, processing can restart if the connection factory supports it com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced
At the same time I noticed the logs below, which indicate that although connected, Spring AMQP can't find the queue. I guess this is because the node that hosts the queue was down. Spring AMQP might be checking the other nodes; it thought the queue did not exist, so it started trying to re-declare it. Also note that there is a retry limit of 3:
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Failed to declare queue: test-queuey
Queue declaration failed; retries left=3 org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[test-queue]
...
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'test-queue' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
In the end the retries were exhausted, and I noticed the following. It looks like Spring AMQP gave up and started to close everything. The end state was that no consumer was registered on the queue. The Spring app was still running but could not receive messages, and it no longer retried the way a plain disconnection is handled. The resolution was to restart the app.
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Cancelling Consumer#7f74d6dd: tags=[[]], channel=Cached Rabbit Channel: AMQChannel(amqp://guest#xxx:5672/,26), conn: Proxy#65ef722a Shared Rabbit Connection: SimpleConnection#4315e774 [delegate=amqp://guest#xxx:5672/, localPort= 37208], acknowledgeMode=AUTO local queue size=0
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Closing Rabbit Channel: Cached Rabbit Channel: AMQChannel(amqp://guest#xxx:5672/,26), conn: Proxy#65ef722a Shared Rabbit Connection: SimpleConnection#4315e774 [delegate=amqp://guest#xxx:5672/, localPort= 37208]
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Closing cached Channel: AMQChannel
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Stopping container from aborted consumer
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Shutting down Rabbit listener container
I understand that Spring AMQP comes with retry-on-disconnection logic that keeps reconnecting indefinitely. But for a case like this, how can I make Spring wait until the cluster restart is completed and then start reconnecting? Or is there a way to disable the retry limit on the queue check so that it keeps checking the queue until the cluster restart is completed instead of giving up early? Would changing the queue to a mirrored queue or a quorum queue resolve this issue?
See
https://docs.spring.io/spring-amqp/docs/current/reference/html/#declarationRetries
The number of retry attempts when passive queue declaration fails. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization. When none of the configured queues can be passively declared (for any reason) after the retries are exhausted, the container behavior is controlled by the missingQueuesFatal property, described earlier.
and
https://docs.spring.io/spring-amqp/docs/current/reference/html/#failedDeclarationRetryInterval
The interval between passive queue declaration retry attempts. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization.
You can increase one or both of these from their defaults (3 and 5000 respectively).
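A minimal sketch of raising these from Java configuration, assuming you configure a SimpleMessageListenerContainer directly (the queue name and values below are illustrative, not taken from your setup):

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;

@Bean
public SimpleMessageListenerContainer listenerContainer(ConnectionFactory connectionFactory) {
    SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
    container.setQueueNames("test-queue");
    container.setDeclarationRetries(60);                  // default is 3
    container.setFailedDeclarationRetryInterval(10000L);  // default is 5000 ms
    // Setting missingQueuesFatal to false keeps the container retrying instead of
    // stopping itself once the declaration retries are exhausted.
    container.setMissingQueuesFatal(false);
    // Register your listener as usual; a trivial one here for completeness.
    container.setMessageListener(message -> System.out.println(new String(message.getBody())));
    return container;
}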

Spring Cloud Stream : Sink loses RabbitMQ connectivity

I see that my custom Spring Cloud Stream sink, built with the log sink stream app dependency, loses RabbitMQ connectivity during a RabbitMQ outage, tries to connect 5 times, and then stops its consumer. I have to restart the app manually to make it connect successfully once RabbitMQ is back up. When I look at the default properties of the RabbitMQ binding here, there is an interval time but no property for infinite retry (which I assume to be the default behaviour). Can someone please let me know what I might be missing to make it retry connecting indefinitely?
Error faced during the outage that triggers the consumer retry:
2017-08-08T10:52:07.586-04:00 [APP/PROC/WEB/0] [OUT] Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rabbit#229ec9f90e07c75d56a0aa84dc28f602' of durable queue 'datastream.dataingestor.datastream' in vhost '8880756f-8a21-4dc8-9b97-95e5a3248f58' is down or inaccessible, class-id=50, method-id=10)
It appears you have a RabbitMQ cluster and the queue in question is hosted on a down node.
If the queue was HA, you wouldn't have this problem.
The listener container does not (currently) handle that condition. It will retry forever if it loses the connection to RabbitMQ itself.
Please open a JIRA Issue and we'll take a look. The container should treat that error the same as a connection problem.
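As for the HA point above: mirroring a classic queue is configured on the broker with a policy rather than in the application. A hedged sketch of such a policy (the policy name, pattern, and vhost here are placeholders) would be:

rabbitmqctl set_policy -p my-vhost ha-datastream "^datastream\." '{"ha-mode":"all"}'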

WAITING threads issue in wso2esb

Hi, I am working on WSO2 ESB and using ActiveMQ for message queues.
After around 3 weeks of usage the ESB server was hanging, and with the help of JMX monitoring of the ESB I found that a huge number of Java threads were in the WAITING state.
[EsbMonitoring] ***************** java Threads Attributes *********************
[EsbMonitoring] ThreadCount :8873
[EsbMonitoring] DaemonThreadCount :104
[EsbMonitoring] PeakThreadCount :8992
[EsbMonitoring] TotalStartedThreadCount :16086123
Initially, when we start the ESB server, there are around 560 threads:
[EsbMonitoring] ***************** java Threads Attributes *********************
[EsbMonitoring] ThreadCount :592
[EsbMonitoring] DaemonThreadCount :78
[EsbMonitoring] PeakThreadCount :592
[EsbMonitoring] TotalStartedThreadCount :1510
I took a thread dump of the ESB server with jstack:
"ActiveMQ Connection Executor: tcp://my-desktop/127.0.1.1:61616#60595" prio=10 tid=0x00007fdb1c890800 nid=0x259e waiting on condition [0x00007fda951cd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070bdf6c18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ActiveMQ Connection Executor: tcp://my-desktop/127.0.1.1:61616#60589" prio=10 tid=0x00007fdb1c8b7800 nid=0x259a waiting on condition [0x00007fda950cc000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070be68c48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ActiveMQ Connection Executor: tcp://my-desktop/127.0.1.1:61616#60590" prio=10 tid=0x00007fdb1c8ae800 nid=0x2597 waiting on condition [0x00007fda946c2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070be22c88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
All these connections are closed on the ActiveMQ side, but the ESB is still holding the threads, which drives resource consumption up, and after reaching a certain point it hangs. I need to restart the ESB server to resolve this.
Firstly, why are these connections in the WAITING state forever, and why is the ESB not removing these threads? Is there any way I can kill or destroy the WAITING threads without restarting the ESB?
Any help would be appreciated.
Thank you!
A few questions first:
What version of the WSO2 server?
What is the JVM version?
Anyway, I assume you have a pretty high load. I don't think I can reproduce it easily in a short time. I had something similar with the PostgreSQL JDBC driver: connections were getting stuck after some time and all connections in the pool ended up blocked. It was fixed simply by updating the JDBC driver.
According to http://mvnrepository.com/artifact/org.apache.activemq/activemq-broker, version 5.8.0 is quite old; it was released in 2013. There are some bug reports in the ActiveMQ JIRA about connection leaks. A few things you can try:
Get the latest version of the library from the JBoss repository. You will find the link on the same page, as shown in the picture underneath.
Just be careful with the library dependencies and update them as well. The list of jar files is underneath (it is from the official documentation, https://docs.wso2.com/display/ESB500/Configure+with+ActiveMQ):
activemq-broker-5.8.0.jar
activemq-client-5.8.0.jar
activemq-kahadb-store-5.8.0.jar
geronimo-jms_1.1_spec-1.1.1.jar
geronimo-j2ee-management_1.1_spec-1.0.1.jar
geronimo-jta_1.0.1B_spec-1.0.1.jar
hawtbuf-1.9.jar
slf4j-api-1.6.6.jar
activeio-core-3.1.4.jar (available in /lib/optional folder)
Another way is simply to update to the next release of the ActiveMQ broker. ActiveMQ is a mature project, so it is unlikely that a new release will introduce a massive amount of new features or break compatibility with the previous version. If the release notes don't state anything out of the ordinary, try it, test it in a test environment, and make a decision.

Rabbitmq consumer shutting down repeatedly

I have configured a dead letter exchange with an exponential back-off policy. After making these changes, the RabbitMQ consumer has started shutting down repeatedly, throwing the following exception:
Received shutdown signal for consumer tag=amq.ctag--Qn9jFNOd3vxhaHvEw8Nrw
com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:715)
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:705)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:563)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290)
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:532)
Can someone please give me some pointers regarding possible causes for this exception?
Thanks,
Shuchi
If you have changed your queue declarations in Spring, then make sure any queues that existed in RabbitMQ before the change have been deleted. RabbitMQ doesn't like it when you try to declare a queue with the same name but different parameters.
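As an illustration, here is a hedged sketch of a queue declared with dead-lettering arguments (the queue name, exchange, and TTL are placeholders, not taken from the question). If a queue with the same name already exists on the broker without these arguments, the broker rejects the declaration (PRECONDITION_FAILED) and closes the channel, so the old queue has to be deleted first:

import java.util.HashMap;
import java.util.Map;
import org.springframework.amqp.core.Queue;

Map<String, Object> args = new HashMap<>();
args.put("x-dead-letter-exchange", "my-dlx"); // placeholder DLX name
args.put("x-message-ttl", 30000);             // placeholder back-off TTL in ms
// name, durable, exclusive, autoDelete, arguments
Queue queue = new Queue("my-queue", true, false, false, args);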

ActiveMQ Durable consumer is in use for client and subscriptionName via STOMP

I have an iOS client that connects to several ActiveMQ topics and queues via the STOMP protocol. When I connect to the server, I send the following message:
2012-10-30 10:19:29,757 [MQ NIO Worker 2] TRACE StompIO
CONNECT
passcode:*****
login:system
2012-10-30 10:19:29,758 [MQ NIO Worker 2] DEBUG ProtocolConverter
2012-10-30 10:19:29,775 [MQ NIO Worker 2] TRACE StompIO
CONNECTED
heart-beat:0,0
session:ID:mbp.local-0123456789
server:ActiveMQ/5.6.0
version:1.0
And then, I subscribe to several topics using the following message:
2012-10-30 10:19:31,028 [MQ NIO Worker 2] TRACE StompIO
SUBSCRIBE
activemq.subscriptionName:user#mail.com-/topic/SPOT.SPOTCODE
activemq.prefetchSize:1
activemq.dispatchAsync:true
destination:/topic/SPOT.SPOTCODE
client-id:1234
activemq.retroactive:true
I'm facing two problems with the ActiveMQ server. Each time I connect, the Number of Consumers column in the web interface gets incremented, so I have just one real consumer but the count is around 50 consumers. The more problematic issue is that when I plug another iOS device into my laptop to test the messaging environment, I get the following error when connecting to ActiveMQ:
WARN | Async error occurred: javax.jms.JMSException: Durable consumer is in use for client: ID:mbp.local-0123456789 and subscriptionName: user#mail.com-/topic/SPOT.SPOTCODE
This seems to mean that disconnecting from ActiveMQ via STOMP is not working properly, because this login attempt is made when the other device is not running the app. I've tried the following things in order to solve the issue:
Always log off when attempting to subscribe to the topics.
Subscribe
I'm currently using v5.6.0, running the server on my laptop.
If you read the STOMP page on the ActiveMQ site you will notice that client-id and activemq.subscriptionName must match in order to use STOMP durable subscribers. These values should be different for each of your clients; otherwise you will see these errors because of the name clashes.
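For example, a sketch of distinct values per device could look like this (the IDs below are placeholders; the destination is taken from the question):

Device 1:
CONNECT
login:system
passcode:*****
client-id:ios-device-1

SUBSCRIBE
destination:/topic/SPOT.SPOTCODE
activemq.subscriptionName:ios-device-1-SPOT.SPOTCODE

Device 2 would then use client-id:ios-device-2 and activemq.subscriptionName:ios-device-2-SPOT.SPOTCODE, so the two durable subscriptions no longer clash.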