I am using Rabbitmq 3.8.16 with Vertx 3.9.4 Rabbitmq client for my backend application and noticed that sometimes the client looses the connection and it fails to recover correctly (autorecovery is turned ON). It means that after the recovery, It stops to consume messages and the durable queues has no consumers.
This happens randomly and with different exceptions. Since I am using Vertx Rabbitmq client, I cannot create different channel per consumer/publisher, I have something like 10 publishers and 10 consumers on the same node (same connection same channel) sometimes reaching up to 400 messages per second. Can it cause this kind of issues ??
Other problem can be that I have a queues representing websocket connections to my backend, these websocket connections are short living so it quite often declare and then remove a queue representing this user session. Is that a problem for Rabbitmq ?
From the Rabbitmq log I found following exceptions which were related to the connection crash:
First one :
2021-09-08 08:14:15.321 [error] <0.24867.686> ** Generic server <0.24867.686> terminating
** Last message in was {'$gen_cast',{method,{'basic.publish',0,<<"NOTIFICATION_EXCHANGE">>,<<"BROADCAST_NOTIFICATION_ADDRESS">>,false,false},{content,60,none,<<0,0>>,rabbit_framing_amqp_0_9_1,[<<"{MY_MESSAGE}">>]},flow}}
** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.24846.686>,<0.24862.686>,<0.24846.686>,<<"172.20.24.177:60848 -> 172.20.28.73:5672">>,undefined,{user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]},<<"backend">>,<<"8a4a4f71-251a-4ab6-85f6-50d594db46b9">>,<0.24850.686>,[{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"basic.nack">>,bool,true},{<<"publisher_confirms">>,bool,true},{<<"consumer_cancel_notify">>,bool,true}],none,200,134217728,900000,#{},1000000000},{lstate,<0.24877.686>,false},none,4927,{2,{[{4926,<<"amq.ctag-ckMzQHTLitAIZ86qk0_R0Q">>,1631088855286,{{'backend_SERVICE_QUEUE','rabbit#rabbitmq-server-2.rabbitmq-nodes.rabbitmq'},321}}],[{4925,<<"amq.ctag-xDoqkK1VmPptIM9OG3YaPw">>,1631088855286,{{'backend_SERVICE_QUEUE','rabbit#rabbitmq-server-2.rabbitmq-nodes.rabbitmq'},321}}]}},#{'backend_9fb541c5-a9ae-4bbc-99a4-ff444316b3f1' => {resource,<<"backend">>,queue,<<"9fb541c5-a9ae-4bbc-99a4-ff444316b3f1">>},'backend_5f188762-0a66-47e3-bfa8-76aa11481cb9' => {resource,<<"backend">>,queue,<<"5f188762-0a66-47e3-bfa8-76aa11481cb9">>},'backend_e4a3cec7-3f6f-4b05-bdf2-ade8a087425d' => {resource,<<"backend">>,queue,<<"e4a3cec7-3f6f-4b05-bdf2-ade8a087425d">>},'backend_e7a9834a-7ddc-4690-a317-4f19433edafd' => {resource,<<"backend">>,queue,<<"e7a9834a-7ddc-4690-a317-4f19433edafd">>},'backend_6fdcc521-4ff1-4721-bf12-cefffe1181eb' => ...,...},...}
** Reason for termination ==
** {{error,noproc},[{rabbit_fifo_client,enqueue,3,[{file,"src/rabbit_fifo_client.erl"},{line,168}]},{rabbit_quorum_queue,deliver,3,[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},{rabbit_amqqueue,'-deliver/3-fun-3-',4,[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},{rabbit_channel,deliver_to_queues,2,[{file,"src/rabbit_channel.erl"},{line,2204}]},{rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1375}]},{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,643}]}]}
2021-09-08 08:14:15.321 [error] <0.24867.686> CRASH REPORT Process <0.24867.686> with 0 neighbours exited with reason: {error,noproc} in rabbit_fifo_client:enqueue/3 line 168 in gen_server2:terminate/3 line 1183
2021-09-08 08:14:15.322 [error] <0.24859.686> Supervisor {<0.24859.686>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.24846.686>, <0.24862.686>, <0.24846.686>, <<"172.20.24.177:60848 -> 172.20.28.73:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]}, <<"backend">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.24850.686>, <0.24877.686>) at <0.24867.686> exit with reason {error,noproc} in rabbit_fifo_client:enqueue/3 line 168 in context child_terminated
2021-09-08 08:14:15.322 [error] <0.24859.686> Supervisor {<0.24859.686>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.24846.686>, <0.24862.686>, <0.24846.686>, <<"172.20.24.177:60848 -> 172.20.28.73:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]}, <<"backend">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.24850.686>, <0.24877.686>) at <0.24867.686> exit with reason reached_max_restart_intensity in context shutdown
2021-09-08 08:14:15.323 [error] <0.24846.686> Error on AMQP connection <0.24846.686> (172.20.24.177:60848 -> 172.20.28.73:5672, vhost: 'backend', state: running), channel 1:
{{error,noproc},
[{rabbit_fifo_client,enqueue,3,
[{file,"src/rabbit_fifo_client.erl"},{line,168}]},
{rabbit_quorum_queue,deliver,3,
[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},
{rabbit_amqqueue,'-deliver/3-fun-3-',4,
[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},
{rabbit_channel,deliver_to_queues,2,
[{file,"src/rabbit_channel.erl"},{line,2204}]},
{rabbit_channel,handle_method,3,
[{file,"src/rabbit_channel.erl"},{line,1375}]},
{rabbit_channel,handle_cast,2,
[{file,"src/rabbit_channel.erl"},{line,643}]}]}
2021-09-08 08:14:15.323 [warning] <0.24846.686> Non-AMQP exit reason '{{error,noproc},[{rabbit_fifo_client,enqueue,3,[{file,"src/rabbit_fifo_client.erl"},{line,168}]},{rabbit_quorum_queue,deliver,3,[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},{rabbit_amqqueue,'-deliver/3-fun-3-',4,[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},{rabbit_channel,deliver_to_queues,2,[{file,"src/rabbit_channel.erl"},{line,2204}]},{rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1375}]},{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,643}]}]}'
Another one that I noticed is :
2021-09-02 14:59:43.260 [error] <0.17646.16> closing AMQP connection <0.17646.16> (172.20.23.201:35420 -> 172.20.36.197:5672):
missed heartbeats from client, timeout: 60s
2021-09-02 14:59:43.263 [info] <0.29592.28> Closing all channels from '172.20.23.201:35420 -> 172.20.36.197:5672' because it has been closed
And the last one :
2021-09-08 09:47:14.948 [warning] <0.7610.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHES3OXCNJJBB4VW does not exist
2021-09-08 09:47:14.948 [warning] <0.7611.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHES0C5F3P5Y6WKO/00000001.segmenterror: enoent
2021-09-08 09:47:14.948 [warning] <0.199.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHES9RZ4RFC37MDG/00000001.segmenterror: enoent
2021-09-08 09:47:14.948 [warning] <0.7611.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHES0C5F3P5Y6WKO does not exist
2021-09-08 09:47:14.948 [warning] <0.199.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHES9RZ4RFC37MDG does not exist
2021-09-08 09:47:14.950 [warning] <0.7600.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHESAQL00HNNSPK4/00000001.segmenterror: enoent
2021-09-08 09:47:14.950 [warning] <0.7600.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/quorum/rabbit#rabbitmq-server-1.rabbitmq-nodes.rabbitmq/ORCHESAQL00HNNSPK4 does not exist
On the client side I noticed :
15:11:30.816 [vert.x-eventloop-thread-0]Fail to handle message
com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; cause: java.lang.IllegalStateException: Unsolicited delivery - see Channel.setDefaultConsumer to handle this case.
at com.rabbitmq.client.impl.AMQConnection.ensureIsOpen(AMQConnection.java:172)
at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:550)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.createChannel(AutorecoveringConnection.java:165)
at io.vertx.rabbitmq.impl.RabbitMQClientImpl.forChannel(RabbitMQClientImpl.java:476)
at io.vertx.rabbitmq.impl.RabbitMQClientImpl.basicPublish(RabbitMQClientImpl.java:194)
at io.vertx.reactivex.rabbitmq.RabbitMQClient.basicPublish(RabbitMQClient.java:363)
at io.vertx.reactivex.rabbitmq.RabbitMQClient.lambda$rxBasicPublish$9(RabbitMQClient.java:376)
at io.vertx.reactivex.rabbitmq.RabbitMQClient$$Lambda$699/0x00000000102c38c0.accept(Unknown Source)
at io.vertx.reactivex.impl.AsyncResultCompletable.subscribeActual(AsyncResultCompletable.java:44)
or
08:14:15.330 [AMQP Connection 10.100.56.15:5672] WARN com.rabbitmq.client.impl.ForgivingExceptionHandler - An unexpected connection driver error occured (Exception message: Connection reset)
08:14:15.332 [AMQP Connection 10.100.56.15:5672] INFO io.vertx.rabbitmq.impl.RabbitMQClientImpl - RabbitMQ connection shutdown! The client will attempt to reconnect automatically
com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=541, reply-text=INTERNAL_ERROR, class-id=0, method-id=0)
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:916)
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:906)
at com.rabbitmq.client.impl.AMQConnection.handleConnectionClose(AMQConnection.java:844)
at com.rabbitmq.client.impl.AMQConnection.processControlCommand(AMQConnection.java:799)
at com.rabbitmq.client.impl.AMQConnection$1.processAsync(AMQConnection.java:242)
at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:178)
at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111)
at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:650)
at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:597)
Since those errors are not clear to my, does anybody has an idea why is this happening ??
Thank you for your help