RabbitMQ crash: rabbit_disk_monitor

The crash exception log is below:
2022-12-06 03:40:44 =ERROR REPORT====
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"c:/Users/appadmin/AppData/Roaming/RabbitMQ/db/rabbit#FR-PPA004-mnesia",50000000,79131406336,100,10000,#Ref<0.526994179.3769630732.93858>,false,true,10,120000}
** Reason for termination ==
** {eacces,[{erlang,open_port,[{spawn,[67,58,92,87,105,110,100,111,119,115,92,115,121,115,116,101,109,51,50,92,99,109,100,46,101,120,101,32,47,99,32,100,105,114,32,47,45,67,32,47,87,32,34,92,92,63,92,"c:","\","Users","\","appadmin","\","AppData","\","Roaming","\","RabbitMQ","\","db","\","rabbit#FR-PPA004-mnesia",34]},[binary,stderr_to_stdout,stream,in,hide]],[{file,"erlang.erl"},{line,2272}]},{os,cmd,2,[{file,"os.erl"},{line,275}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,255}]},{rabbit_disk_monitor,internal_update,1,[{file,"src/rabbit_disk_monitor.erl"},{line,209}]},{rabbit_disk_monitor,handle_info,2,[{file,"src/rabbit_disk_monitor.erl"},{line,181}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,689}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,765}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
2022-12-06 03:40:44 =CRASH REPORT====
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.436.0>
registered_name: rabbit_disk_monitor
exception error: {eacces,[{erlang,open_port,[{spawn,[67,58,92,87,105,110,100,111,119,115,92,115,121,115,116,101,109,51,50,92,99,109,100,46,101,120,101,32,47,99,32,100,105,114,32,47,45,67,32,47,87,32,34,92,92,63,92,"c:","\","Users","\","appadmin","\","AppData","\","Roaming","\","RabbitMQ","\","db","\","rabbit#FR-PPA004-mnesia",34]},[binary,stderr_to_stdout,stream,in,hide]],[{file,"erlang.erl"},{line,2272}]},{os,cmd,2,[{file,"os.erl"},{line,275}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,255}]},{rabbit_disk_monitor,internal_update,1,[{file,"src/rabbit_disk_monitor.erl"},{line,209}]},{rabbit_disk_monitor,handle_info,2,[{file,"src/rabbit_disk_monitor.erl"},{line,181}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,689}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,765}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.281.0>]
message_queue_len: 0
messages: []
links: [<0.435.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 6772
stack_size: 28
reductions: 35778
neighbours:
Does anyone have an idea what this error means?
I'd appreciate your help. Thanks.

You must provide the RabbitMQ and Erlang version any time you ask a question or report an issue with RabbitMQ. In fact, any time you are talking about software, providing versions is necessary!
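If you are unsure of the versions, rabbitmqctl can report both (exact output varies by release):

# Prints the RabbitMQ and Erlang versions, among other node details
rabbitmqctl status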
This is the important part of the error text:
{eacces,[{erlang,open_port,[{spawn,[67,58,92,87,105,110,100,111,119,115,92,115,121,115,116,101,109,51,50,92,99,109,100,46,101,120,101,32,47,99,32,100,105,114,32,47,45,67,32,47,87,32,34,92,92,63,92,"c:","\","Users","\","appadmin","\","AppData","\","Roaming","\","RabbitMQ","\","db","\","rabbit#FR-PPA004-mnesia",34]},
eacces means that there is a permission issue. Since I don't know what version of RabbitMQ you are using, my guess is that the free disk space check fails because RabbitMQ/Erlang can't start PowerShell or cmd.exe. A sketch of the failing code path is below.
I have fixed issues around free disk space monitoring, as well as made RabbitMQ more UTF-8 friendly (in case you are using a non-English language).
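For context, here is a minimal sketch (not the actual RabbitMQ source; the module name is made up) of the code path in the stack trace: os:cmd/1 spawns cmd.exe through erlang:open_port/2, and if the account running RabbitMQ cannot execute C:\Windows\system32\cmd.exe, open_port exits with eacces, which is exactly the error above.

-module(disk_free_sketch).
-export([get_disk_free/1]).

%% Sketch of what rabbit_disk_monitor:get_disk_free/2 does on Windows:
%% shell out to "dir" and parse the trailing "<N> bytes free" line.
%% os:cmd/1 is where the eacces originates, via erlang:open_port({spawn, ...}).
get_disk_free(Dir) ->
    Output = os:cmd("dir /-C /W \"" ++ Dir ++ "\""),
    case re:run(Output, "([0-9]+) bytes free",
                [{capture, all_but_first, list}]) of
        {match, [Bytes]} -> list_to_integer(Bytes);
        nomatch          -> {error, could_not_parse}
    end.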
My suggestion is to upgrade to the latest RabbitMQ (3.11.4) and Erlang (25.1.2). If that does not resolve your issue, you should start a new discussion here.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Related

Node is not able to join cluster in v3.8.24

We are upgrading our system from RabbitMQ v3.6.10 and Erlang v19.3.4 to RabbitMQ v3.8.24 and Erlang v23.3.4.8.
We use RightScale for our deployments. While performing resiliency testing on a 3-node cluster, we deleted one node (node3), and as a result one new node (node4) was auto-churned with the same cluster ID. All the cluster join commands are in place and work properly on 3.6.10, but we have observed that after the upgrade, the newly launched node on v3.8.24 is not able to join the cluster; instead, it treats itself as a new single-node deployment.
On the 1st and 2nd nodes we are getting the error below in the crash.log file.
2022-02-17 09:01:32 =ERROR REPORT====
** gen_event handler lager_exchange_backend crashed.
** Was installed in lager_event
** Last event was: {log,{lager_msg,[],[{pid,<0.44.0>}],info,{["2022",45,"02",45,"17"],["07",58,"45",58,"12",46,"982"]},{1645,83912,982187},[65,112,112,108,105,99,97,116,105,111,110,32,"mnesia",32,101,120,105,116,101,100,32,119,105,116,104,32,114,101,97,115,111,110,58,32,"stopped"]}}
** When handler state == {state,{mask,127},lager_default_formatter,[date," ",time," ",color,"[",severity,"] ",{pid,[]}," ",message,"\n"],-576448326,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}}
** Reason == {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,367}]},{rabbit_basic,publish,1,[{file,"src/rabbit_basic.erl"},{line,65}]},{lager_exchange_backend,handle_log_event,2,[{file,"src/lager_exchange_backend.erl"},{line,173}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,620}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,602}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,604}]},{gen_event,handle_msg,6,[{file,"gen_event.erl"},{line,343}]}]}
2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbit#node-4' rejected. Invalid challenge reply. **
2022-02-17 09:01:37 =ERROR REPORT====
** Connection attempt from node 'rabbitmqcli-481-rabbit#node-4' rejected. Invalid challenge reply. **
*node-4 is the new node, which was churned automatically.
Here we have two concerns:
1. Why is the newly churned node not able to join the cluster?
2. Post-termination, the old node's details are still present in the Disc Nodes section. Is there a specific reason for retaining them, or is there a configuration change that needs to be performed?
Regards
Kushagra
I was able to resolve the issue after some googling, so I thought I would share my findings; they might help someone.
Per RabbitMQ's recommendations, it is always good for a RabbitMQ cluster to have a static set of nodes.
An unresponsive node may be able to rejoin the cluster once it recovers, and dynamic removal of nodes is not recommended. Please refer to:
https://www.rabbitmq.com/cluster-formation.html#:~:text=Nodes%20in%20clusters,understood%20and%20considered.
After doing all due diligence, if you want to remove the unused node, you can run forget_cluster_node from any working node and pass it the expired node's name; it will clean up all the entries. For example:
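# Run from any working cluster member; rabbit@node3 is an illustrative name.
rabbitmqctl forget_cluster_node rabbit@node3

# If the node you are running this on is itself stopped, the --offline
# flag removes the departed node from the stopped node's local database.
rabbitmqctl forget_cluster_node --offline rabbit@node3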
I hope it will help you guys.
Regards
Kushagra

SO_KEEPALIVE issue in Mulesoft

We have a Mulesoft app that picks messages from a queue (ActiveMQ), then posts them to a target app via an HTTP request to the target's API.
Runtime: 4.3.0
HTTP Connector version: v1.3.2
Server: Windows, On-premise standalone
However, sometimes the message doesn't get sent successfully after being picked from the queue, and the message below can be found in the log:
WARN 2021-07-10 01:24:46,080 [[masked-app].http.requester.requestConfig.02 SelectorRunner] [event: ] org.glassfish.grizzly.nio.transport.TCPNIOTransport: GRIZZLY0005: Can not set SO_KEEPALIVE to false
java.net.SocketException: Invalid argument: no further information
at sun.nio.ch.Net.setIntOption0(Native Method) ~[?:1.8.0_281]
The flow completes silently without any error after the above message, hence no error handling happens.
I found a document mentioning that this is a known bug on Windows Server that won't affect the application's behavior, but that document is about failing to set SO_KEEPALIVE to true rather than false.
It looks like the message didn't get posted successfully, as the target system's team can't find a corresponding incoming request in their log.
This is not acceptable, as the message is critical and no one knows it was lost unless the target system realizes something is wrong. I'm not sure whether the failure to set SO_KEEPALIVE to false is the root cause; could you please share some thoughts? Thanks a lot in advance.
This is probably unrelated to the warning you mentioned, but there doesn't seem to be enough information to identify the actual root cause.
Having said that, the version of the HTTP connector is old and missing almost 3 years of fixes. Updating to the latest version should improve the application's reliability. For example:
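In a Mule 4 project the connector version is bumped in pom.xml; the coordinates below are the standard ones for the HTTP connector, but the version shown is illustrative, so check Anypoint Exchange for the current release:

<!-- Bump the HTTP connector; version is illustrative -->
<dependency>
    <groupId>org.mule.connectors</groupId>
    <artifactId>mule-http-connector</artifactId>
    <version>1.7.3</version>
    <classifier>mule-plugin</classifier>
</dependency>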

RabbitMQ crash with bump_reduce_memory_use

I am using RabbitMQ 3.7.3 on Erlang 20.2.2 deployed on a docker (image rabbitmq:3.7-management).
Memory is set up like this: Memory high watermark set to 6000 MiB (6291456000 bytes) of 8192 MiB (8589934592 bytes) total
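For reference, a watermark like that is typically configured as an absolute limit. A sketch in rabbitmq.conf syntax, with the value illustrative:

# rabbitmq.conf: absolute memory high watermark
vm_memory_high_watermark.absolute = 6000MiB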
Here is the crash report that I am getting on automatic restart of RabbitMQ:
CRASH REPORT Process <0.818.0> with 0 neighbours exited with reason:
no function clause matching
rabbit_priority_queue:handle_info(bump_reduce_memory_use,
{state,rabbit_variable_queue,[{10,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},...}},...],...})
line 396 in gen_server2:terminate/3 line 1161
It seems to be due to messages posted to a queue set up like this (screenshot omitted), filled with 500k+ messages.
Thanks for your help!
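For context, the crashing module (rabbit_priority_queue) backs queues declared with the x-max-priority argument. A hedged sketch using the RabbitMQ Erlang client, with the queue name and connection details made up:

-module(priority_queue_sketch).
-export([declare_priority_queue/0]).
-include_lib("amqp_client/include/amqp_client.hrl").

%% Declares a priority queue; x-max-priority = 10 matches the priority
%% level visible in the vqstate of the crash report above.
declare_priority_queue() ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    {ok, Ch}   = amqp_connection:open_channel(Conn),
    #'queue.declare_ok'{} =
        amqp_channel:call(Ch, #'queue.declare'{
            queue     = <<"my_priority_queue">>,
            durable   = true,
            arguments = [{<<"x-max-priority">>, byte, 10}]}),
    amqp_connection:close(Conn).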
I filed this bug and opened these pull requests to fix this issue - 3.7.x PR, master PR. This fix will ship in RabbitMQ 3.7.4.
In the future, it would be preferable to discuss or report issues on the mailing list as the RabbitMQ core team monitors it daily.
Thanks for reporting this issue and for using RabbitMQ.

RabbitMQ_MQTT failing to start

I am trying to enable MQTT in RabbitMQ. The plugin was enabled successfully, but when I make the changes in the config for rabbitmq_mqtt, the service fails to start. Even after googling a lot, I cannot find the same issue being raised anywhere.
rabbitmq_mqtt is failing to load even when the port is available.
Starting broker...
BOOT FAILED
===========
Error description:
{could_not_start,rabbitmq_mqtt,
{{function_clause,
[{rabbit_networking,tcp_listener_addresses,
[{1993}],
[{file,"src/rabbit_networking.erl"},{line,176}]},
{rabbit_mqtt_sup,'-listener_specs/3-lc$^0/1-0-',3,
[{file,"src/rabbit_mqtt_sup.erl"},{line,55}]},
{rabbit_mqtt_sup,init,1,
[{file,"src/rabbit_mqtt_sup.erl"},{line,47}]},
{supervisor2,init,1,[{file,"src/supervisor2.erl"},{line,305}]},
{gen_server,init_it,2,[{file,"gen_server.erl"},{line,365}]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,333}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]},
{rabbit_mqtt,start,[normal,[]]}}}
Log files (may contain more information):
/var/log/rabbitmq/rabbit.log
/var/log/rabbitmq/rabbit-sasl.log
{"init terminating in do_boot",{could_not_start,rabbitmq_mqtt,{{function_clause,[{rabbit_networking,tcp_listener_addresses,[{1993}],[{file,"src/rabbit_networking.erl"},{line,176}]},{rabbit_mqtt_sup,'-listener_specs/3-lc$^0/1-0-',3,[{file,"src/rabbit_mqtt_sup.erl"},{line,55}]},{rabbit_mqtt_sup,init,1,[{file,"src/rabbit_mqtt_sup.erl"},{line,47}]},{supervisor2,init,1,[{file,"src/supervisor2.erl"},{line,305}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,365}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,333}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]},{rabbit_mqtt,start,[normal,[]]}}}}
You need to check the logs in /var/log/rabbitmq/startup_log or /var/log/rabbitmq/startup_err. It is very possible that your changes to the config file are causing the problem; usually it is the config file's syntax. If you are using the classic format (Erlang terms with array-like syntax), an extra or missing comma can also prevent the service from starting.
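In your trace, tcp_listener_addresses was called with {1993} (a one-element tuple) rather than the bare port 1993, which is exactly what a function_clause there means. In the classic config format the difference looks like this (sketch; the rest of the config file is omitted):

%% Broken: {1993} is a one-element tuple, so
%% rabbit_networking:tcp_listener_addresses/1 has no matching clause.
[{rabbitmq_mqtt, [{tcp_listeners, [{1993}]}]}].

%% Fixed: give a plain port number (or a {Host, Port} pair).
[{rabbitmq_mqtt, [{tcp_listeners, [1993]}]}].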

This Erlang code throws an exception and I don't know why

I'm using a Windows certification authority (AD CS) to issue certificates for the participants in a secure connection between a RabbitMQ Windows service and a client.
The Subject of my client certificate is my Distinguished Name (DN) in LDAP:
"CN=John Ruiz,CN=Users,DC=devexample,DC=com"
When I attempt to establish this connection, the server throws an exception, closes the connection, and I see this Erlang stack trace in the rabbit log:
=ERROR REPORT==== 30-Dec-2011::10:33:24 ===
exception on TCP connection <0.331.0> from 10.1.30.70:52269
{channel0_error,starting,
{error,{case_clause,[{printableString,"Users"},
{printableString,"John Ruiz"}]},
'connection.start_ok',
[{rabbit_ssl,find_by_type,2,[]},
{rabbit_auth_mechanism_ssl,init,1,[]},
{rabbit_reader,handle_method0,2,[]},
{rabbit_reader,handle_method0,3,[]},
{rabbit_reader,handle_input,3,[]},
{rabbit_reader,recvloop,2,[]},
{rabbit_reader,start_connection,7,[]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
Looking through the stack trace, I found the two files involved:
rabbit_ssl.erl
rabbit_auth_mechanism_ssl.erl
The problem is that I've never read or written Erlang before, so I don't know why find_by_type is throwing an exception. My best guess is that, since there are two CN=* elements in the list of relative DNs (RDNs), the call to lists:flatten produces a list of several values where a single value is expected.
Can someone familiar with Erlang please confirm or correct my assumption? If you see a way this code could be improved to handle the case I've described (instead of throwing an exception), I would really appreciate it, so that I can suggest it on the RabbitMQ mailing list.
Your guess is correct: it crashes because there are two CN=* elements. Looking at the code, a lot of it seems to depend on there being only one CN. The CN itself is used as the username for the SSL session, I think, so having multiple CNs makes little sense. A rough reconstruction of the failing function is below.
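This is a paraphrase inferred from the error output, not a copy of the RabbitMQ source: find_by_type collects every RDN value of the requested type, and only the "exactly one" and "none" shapes are handled, so two CNs produce a two-element list that matches no clause, hence {case_clause,[{printableString,"Users"},{printableString,"John Ruiz"}]}.

-module(rdn_sketch).
-export([find_by_type/2]).
-include_lib("public_key/include/public_key.hrl").

%% Returns the single RDN value of the given type, or not_found.
%% A subject with two CNs yields a two-element list and crashes here.
find_by_type(Type, {rdnSequence, RDNs}) ->
    case [V || #'AttributeTypeAndValue'{type = T, value = V}
                   <- lists:flatten(RDNs),
               T =:= Type] of
        [{printableString, S}] -> S;
        []                     -> not_found
    end.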