I recently upgraded to NMS ActiveMQ 1.5.2, and when I restart the broker the connection and the consumers are restored, but they come back in "pull" mode, which means the broker will not send them messages automatically. This is not how the previous version behaved. What I need is for the consumers to recover the way they used to, with a prefetch of 1000.
I think I must be missing a setting on the failover URL or something like that.
Does anyone know what I can try?
My stack is:
AMQ Broker 5.4.2
Spring.NET 1.3.2
Apache.NMS 1.5.0
Apache.NMS.ActiveMQ 1.5.2
(all the latest releases)
Here are the consumer log entries I see when I restart the broker:
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:-1:1 in pull mode pending recovery, overriding prefetch: 1000
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:-1:1
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:1:1 in pull mode pending recovery, overriding prefetch: 1000
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:1:1
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:2:1 in pull mode pending recovery, overriding prefetch: 1000
restore consumer: ID:csi-dul-516m-6334-634583598187658753-1:0:2:1
Sending queued commands...
Transport has resumed normal operation.
Connection established
Successfully reconnected to: tcp://localhost:61616/
I upgraded to Apache.NMS 1.5.3 and it corrected the behavior. So 1.5.2 carried a defect and probably should not be used.
Related
I have a rabbitmq cluster with 3 nodes. One node has a durable and non-mirrored classic queue named test-queue.
I have a Spring Boot app that uses the Spring AMQP default connection factory (new CachingConnectionFactory()) to first ensure the queue exists and then subscribe to its messages. Everything works fine.
Then I started a rolling update of the RabbitMQ cluster, during which the nodes were restarted one by one.
I observed the following in the logs during this process:
At the start I saw the output below:
Received shutdown signal for consumer tag=amq.ctag-pzPHM_GEd5e-J5Y_L2W7_g com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Attempting to connect to: xxx:5672
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Created new connection: xxx#66971f6b:58/SimpleConnection#4315e774
This shows that the app received the shutdown signal and successfully reconnected. At this point it looks like the node hosting the queue was shut down, but the app was able to establish a new connection because there were other nodes available.
Later I saw more shutdown signals, which indicates that another node had started to shut down:
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Consumer raised exception, processing can restart if the connection factory supports it com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced
At the same time I noticed the logs below, which indicate that although it was connected, Spring AMQP could not find the queue. I guess this is because the node hosting the queue was down; Spring AMQP may have been checking the other nodes, concluded the queue did not exist, and started trying to redeclare it. Also note that there was a retry limit of 3.
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Failed to declare queue: test-queue
Queue declaration failed; retries left=3 org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[test-queue]
...
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'test-queue' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
In the end the retries were exhausted and I noticed the following. It looks like Spring AMQP gave up and started to close everything. The end state was that no consumer was registered to the queue; the Spring app was still running but could no longer receive messages, and it did not keep retrying the way it does for a plain disconnection. The only resolution was to restart the app.
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Cancelling Consumer#7f74d6dd: tags=[[]], channel=Cached Rabbit Channel: AMQChannel(amqp://guest#xxx:5672/,26), conn: Proxy#65ef722a Shared Rabbit Connection: SimpleConnection#4315e774 [delegate=amqp://guest#xxx:5672/, localPort= 37208], acknowledgeMode=AUTO local queue size=0
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Closing Rabbit Channel: Cached Rabbit Channel: AMQChannel(amqp://guest#xxx:5672/,26), conn: Proxy#65ef722a Shared Rabbit Connection: SimpleConnection#4315e774 [delegate=amqp://guest#xxx:5672/, localPort= 37208]
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Closing cached Channel: AMQChannel
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Stopping container from aborted consumer
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Shutting down Rabbit listener container
I understand Spring AMQP comes with retry-on-disconnection logic that keeps reconnecting indefinitely. But for a case like this, how can I make Spring wait until the cluster restart is complete and then start reconnecting? Or is there a way to remove the retry limit on the queue declaration check so that it keeps checking for the queue until the cluster restart is complete instead of giving up early? Would changing the queue to a mirrored queue or a quorum queue resolve this issue?
See
https://docs.spring.io/spring-amqp/docs/current/reference/html/#declarationRetries
The number of retry attempts when passive queue declaration fails. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization. When none of the configured queues can be passively declared (for any reason) after the retries are exhausted, the container behavior is controlled by the missingQueuesFatal property, described earlier.
and
https://docs.spring.io/spring-amqp/docs/current/reference/html/#failedDeclarationRetryInterval
The interval between passive queue declaration retry attempts. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization.
You can increase one or both of these from their defaults (3 and 5000 respectively).
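If you configure the listener container yourself, those two settings (and missingQueuesFatal) can be set directly on the container. A minimal sketch, assuming a plain SimpleMessageListenerContainer bean and purely illustrative retry values:

import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ListenerConfig {

    @Bean
    public SimpleMessageListenerContainer container(ConnectionFactory connectionFactory) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("test-queue");
        // Retry the passive queue declaration for longer than a rolling cluster restart takes.
        container.setDeclarationRetries(60);                 // default is 3
        container.setFailedDeclarationRetryInterval(10_000); // default is 5000 ms
        // Optionally, do not treat a missing queue as fatal, so the container is not stopped.
        container.setMissingQueuesFatal(false);
        MessageListener listener = message ->
                System.out.println("received: " + new String(message.getBody()));
        container.setMessageListener(listener);
        return container;
    }
}

The two retry properties control how long the container keeps trying the passive declaration; missingQueuesFatal controls what happens once those retries are exhausted.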
Suppose I use the ‘when-synced’ policy for both ha-promote-on-failure and ha-promote-on-shutdown on an HA cluster.
If so, mirror-to-master promotion will never occur, and the queue is blocked if there are no synchronized mirrors on a controlled master shutdown.
That's what the documentation says.
https://www.rabbitmq.com/ha.html#cluster-shutdown
By default, RabbitMQ will refuse to promote an unsynchronised mirror
on controlled master shutdown (i.e. explicit stop of the RabbitMQ
service or shutdown of the OS) in order to avoid message loss; instead
the entire queue will shut down as if the unsynchronised mirrors were
not there.
If I use the 'when-synced' policy and no mirrors are synchronized at the time the master is shut down, then according to the documentation the master does not seem to shut down gracefully (the entire queue shuts down).
For me it seems like there are only two options.
Waiting for the master to be restored (regardless of how long that takes) if I use ‘when-synced’.
Abandoning, for the sake of availability, all the messages that are not yet synchronized to mirrors (i.e. that exist only on the master) if I use ‘always’.
Really?
There’s no option like “Blocking queues until one of the mirrors is fully synced, and then promote the synced mirror to the new master”?
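(For reference, the policy I'm describing would be set with something along these lines; the policy name and queue pattern are placeholders:)

rabbitmqctl set_policy ha-when-synced "^test-queue$" \
  '{"ha-mode":"all","ha-promote-on-shutdown":"when-synced","ha-promote-on-failure":"when-synced"}' \
  --apply-to queues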
My custom Spring Cloud Stream sink (which uses the log sink stream app dependency) loses RabbitMQ connectivity during a RabbitMQ outage, tries to connect 5 times, and then stops its consumer. I have to manually restart the app to make it connect successfully once RabbitMQ is back up. Looking at the default properties of the RabbitMQ binder here, there is an interval setting but no property for infinite retry (which I assumed to be the default behaviour). Can someone please let me know what I might be missing to make it retry the connection indefinitely?
Error seen during the outage that triggers the consumer retry:
2017-08-08T10:52:07.586-04:00 [APP/PROC/WEB/0] [OUT] Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rabbit#229ec9f90e07c75d56a0aa84dc28f602' of durable queue 'datastream.dataingestor.datastream' in vhost '8880756f-8a21-4dc8-9b97-95e5a3248f58' is down or inaccessible, class-id=50, method-id=10)
It appears you have a RabbitMQ cluster and the queue in question is hosted on a down node.
If the queue were HA, you wouldn't have this problem.
The listener container does not (currently) handle that condition. It will retry forever if it loses the connection to RabbitMQ itself.
Please open a JIRA Issue and we'll take a look. The container should treat that error the same as a connection problem.
We have two independent ActiveMQ brokers running (AMQ 5.11 and 5.14). The 5.14 must replace the 5.11 broker.
Yet the AMQ 5.11 broker still has messages in its schedulerDB. How can we migrate the scheduled messages from the 5.11 broker into the scheduler of 5.14? The 5.14 broker has already collected scheduled messages of its own, so we cannot simply replace the files.
Can we merge the schedulerDB?
What if you keep the old broker alive and configure a static bridge to the new broker? That way, all messages that appear on any queue flow over to the new instance. When all scheduled deliveries are done, you should be able to shut down the old broker. This requires you to keep both brokers alive and to disable the transport connector of the old broker so it won't accept clients.
How to set up a static bridge:
http://activemq.apache.org/networks-of-brokers.html
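A minimal sketch of what that could look like in the old broker's activemq.xml (broker name and host are placeholders): the networkConnector is a one-way static bridge to the new broker, the staticallyIncludedDestinations entry forwards all queues even before consumers attach on the new side, and leaving out any transportConnector keeps clients from connecting to the old broker.

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="old-511-broker" schedulerSupport="true">
  <networkConnectors>
    <!-- one-way static bridge: messages arriving on the old broker flow to the new one -->
    <networkConnector name="bridge-to-514" uri="static:(tcp://new-broker-host:61616)">
      <staticallyIncludedDestinations>
        <!-- forward every queue, regardless of consumer demand on the new broker -->
        <queue physicalName=">"/>
      </staticallyIncludedDestinations>
    </networkConnector>
  </networkConnectors>
  <!-- intentionally no transportConnector elements, so clients can no longer connect here -->
</broker>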
I have 3 ActiveMQ brokers in a networked Shared File System (GlusterFS) / Master-Slave configuration, all in VMs.
If the master fails the client should failover to the new master.
The issue I have is that the connection to the new master takes about 50 seconds.
Is that reasonable?
How to improve it?
My client connection URI looks like this:
failover:(tcp://a1:61616?connectionTimeout=1000,tcp://a2:61616?connectionTimeout=1000,tcp://a3:61616?connectionTimeout=1000)?randomize=false&maxReconnectDelay=10000&backup=true
Also, when I disconnect the master by pulling the network cable, it stops and throws an exception regarding KahaDB (which is on GlusterFS) and needs to be restarted.
Is there a workaround for this behavior so that the master broker auto-restarts, or reconnects automatically once the network comes back?
The failover time depends on how long the underlying file system takes to release the file lock.
In your case, the NFS cluster waits 50s to detect that the first node is lost before it releases the lock on the KahaDB file, which can then be taken by the second node.
You can customize this delay with the NFSD_V4_GRACE and NFSD_V4_LEASE parameters in the NFS server configuration file (/etc/sysconfig/nfs on RedHat/CentOS systems).
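For example (purely illustrative values, both in seconds; the NFS server service must be restarted afterwards):

# /etc/sysconfig/nfs
NFSD_V4_GRACE=10
NFSD_V4_LEASE=10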
You can also customize the KahaDB lockKeepAlivePeriod; see http://activemq.apache.org/pluggable-storage-lockers.html
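A minimal sketch of where that goes in activemq.xml (directory and values are placeholders):

<persistenceAdapter>
  <!-- lockKeepAlivePeriod: how often (ms) the master checks that it still holds the lock -->
  <kahaDB directory="/mnt/glusterfs/kahadb" lockKeepAlivePeriod="5000">
    <locker>
      <!-- how long (ms) a slave sleeps between attempts to acquire the lock -->
      <shared-file-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </kahaDB>
</persistenceAdapter>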