RabbitMQ crashing recently without reason

We have a RabbitMQ node running with 4GB of RAM and a 20GB disk, shared with Redis (which does virtually nothing). It usually copes with a few hundred messages a second.
However, it has recently started crashing at random, without logging any errors or triggering any warnings elsewhere (e.g. in Nagios), which is very odd behaviour.
Can anyone help us debug this problem?
Here's the status:
Status of node rabbit@cachingsessions ...
[{pid,13284},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","0.0.0"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","0.0.0"},
{amqp_client,"RabbitMQ AMQP Client","0.0.0"},
{rabbit,"RabbitMQ","2.7.1"},
{os_mon,"CPO CXC 138 46","2.2.7"},
{sasl,"SASL CXC 138 11","2.1.10"},
{rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","0.0.0"},
{webmachine,"webmachine","1.7.0-rmq0.0.0-hg"},
{mochiweb,"MochiMedia Web Server","1.3-rmq0.0.0-git"},
{inets,"INETS CXC 138 49","5.7.1"},
{mnesia,"MNESIA CXC 138 12","4.5"},
{stdlib,"ERTS CXC 138 10","1.17.5"},
{kernel,"ERTS CXC 138 10","2.14.5"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
{memory,
[{total,201354672},
{processes,104481264},
{processes_used,104294840},
{system,96873408},
{atom,1348809},
{atom_used,1326854},
{binary,41880856},
{code,14331631},
{ets,29589448}]},
{vm_memory_high_watermark,0.39999999961815885},
{vm_memory_limit,838044876}]
...done.
Sadly that's all the info I have to go on, as there are no error logs at all.
Cheers
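One quick sanity check on the status above, offered as a rough sketch rather than a diagnosis: RabbitMQ derives vm_memory_limit by multiplying the memory it detects by vm_memory_high_watermark, so dividing one by the other suggests how much RAM the broker believes it has. The helper name is made up for illustration; the inputs are copied from the status output.

```python
def implied_ram_bytes(vm_memory_limit, high_watermark):
    """Estimate the RAM RabbitMQ detected, assuming limit = watermark * RAM."""
    return vm_memory_limit / high_watermark

# Values copied from the rabbitmqctl status output above.
limit = 838044876
watermark = 0.39999999961815885
total_used = 201354672

ram_gib = implied_ram_bytes(limit, watermark) / 2**30
usage_ratio = total_used / limit

print(f"RAM seen by the broker: about {ram_gib:.2f} GiB")
print(f"Memory used relative to the limit: {usage_ratio:.0%}")
```

With these numbers the estimate comes out a little under 2 GiB rather than the 4GB mentioned, and usage at roughly a quarter of the limit, so it may be worth checking how much memory the machine actually exposes to the broker.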

RabbitMQ Shovel over TLS errors with badmatch after renewing certificates

My RabbitMQ installation has been running for over a year using shovels connected over TLS. The shovels worked with the self-signed certificates until those expired. After I created new certificates, the shovels still won't work, even though I placed the certs, keys, and CA certs in the same locations as the previous ones.
The errors I'm getting look like this (from rabbit@hostname-sasl.log; long lines have been "continued" with \ ):
=SUPERVISOR REPORT==== 31-Jul-2019::15:52:59 ===
Supervisor: {<0.879.0>,rabbit_shovel_dyn_worker_sup}
Context: child_terminated
Reason: {{badmatch,{error,closed}},
[{rabbit_shovel_worker,make_conn_and_chan,1,
[{file,"src/rabbit_shovel_worker.erl"},{line,236}]},
{rabbit_shovel_worker,handle_cast,2,
[{file,"src/rabbit_shovel_worker.erl"},{line,62}]},
{gen_server2,handle_msg,2,
[{file,"src/gen_server2.erl"},{line,1049}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}
Offender: [{pid,<0.14768.3>},
{name,{<<"/">>,<<"Pull Light Data">>}},
{mfargs,
{rabbit_shovel_worker,start_link,
[dynamic,
{<<"/">>,<<"Pull Light Data">>},
[{<<"src-uri">>,
<<"amqps://TLS_user:MWP3wCHKMNqGbnJrwKN3@source:5673 \
?cacertfile=/etc/pki/rmqca/source_rmq_cacert.pem \
&certfile=/etc/pki/rmqclient/source_client_cert.pem \
&keyfile=/etc/pki/rmqclient/source_client_key.pem \
&verify=verify_peer&server_name_indication=source">>},
{<<"src-exchange">>,<<"Data.E.source">>},
{<<"src-exchange-key">>,<<"#">>},
{<<"dest-uri">>,
<<"amqps://TLS_user:MWP3wCHKMNqGbnJrwKN3@destination:5673 \
?cacertfile=/etc/pki/rmqca/destination_rmq_cacert.pem \
&certfile=/etc/pki/rmqclient/destination_client_cert.pem \
&keyfile=/etc/pki/rmqclient/destination_client_key.pem \
&verify=verify_peer&server_name_indication=rdestination">>},
{<<"dest-exchange">>,<<"Data.E.destination">>},
{<<"add-forward-headers">>,false},
{<<"ack-mode">>,<<"on-confirm">>},
{<<"delete-after">>,<<"never">>}]]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
My RMQ status:
Status of node 'rabbit@destination' ...
[{pid,11710},
{running_applications,
[{rabbitmq_shovel_management,"Shovel Status","3.6.1"},
{rabbitmq_shovel,"Data Shovel for RabbitMQ","3.6.1"},
{rabbitmq_management,"RabbitMQ Management Console","3.6.1"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.1"},
{rabbit,"RabbitMQ","3.6.1"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.1"},
{webmachine,"webmachine","1.10.3"},
{mochiweb,"MochiMedia Web Server","2.13.0"},
{amqp_client,"RabbitMQ AMQP Client","3.6.1"},
{xmerl,"XML parser","1.3.9"},
{rabbit_common,[],"3.6.1"},
{compiler,"ERTS CXC 138 10","6.0.2"},
{ssl,"Erlang/OTP SSL application","7.2"},
{public_key,"Public key infrastructure","1.1"},
{crypto,"CRYPTO","3.6.2"},
{os_mon,"CPO CXC 138 46","2.4"},
{mnesia,"MNESIA CXC 138 12","4.13.2"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{asn1,"The Erlang ASN1 compiler version 4.0.1","4.0.1"},
{inets,"INETS CXC 138 49","6.1"},
{syntax_tools,"Syntax tools","1.7"},
{sasl,"SASL CXC 138 11","2.6.1"},
{stdlib,"ERTS CXC 138 10","2.7"},
{kernel,"ERTS CXC 138 10","4.1.1"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 18 [erts-7.2] [source] [64-bit] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true]\n"},
{memory,
[{total,102477624},
{connection_readers,978264},
{connection_writers,214256},
{connection_channels,252872},
{connection_other,1444608},
{queue_procs,4690544},
{queue_slave_procs,0},
{plugins,805496},
{other_proc,21533200},
{mnesia,496176},
{mgmt_db,2570432},
{msg_index,979048},
{other_ets,2654936},
{binary,30328624},
{code,27425521},
{atom,992409},
{other_system,7111238}]},
{alarms,[]},
{listeners,
[{clustering,25672,"::"},
{amqp,5672,"0.0.0.0"},
{'amqp/ssl',5673,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,1661373644},
{disk_free_limit,50000000},
{disk_free,1504694272},
{file_descriptors,
[{total_limit,924},
{total_used,112},
{sockets_limit,829},
{sockets_used,37}]},
{processes,[{limit,1048576},{used,814}]},
{run_queue,0},
{uptime,3664},
{kernel,{net_ticktime,60}}]
The problem turned out to be a misconfiguration of the RabbitMQ service itself. The configuration file /etc/rabbitmq/rabbitmq.config has an SSL section:
%% Configuring SSL.
%% See http://www.rabbitmq.com/ssl.html for full documentation.
%%
{ssl, [{versions, ['tlsv1.2', 'tlsv1.1']}]},
{ssl_options, [{cacertfile, "/etc/pki/rmq_cacert.pem"},
{certfile, "/etc/pki/rmqserver/server_cert.pem"},
{keyfile, "/etc/pki/rmqserver/server_key.pem"},
{versions, ['tlsv1.2', 'tlsv1.1']},
{verify, verify_peer},
{fail_if_no_peer_cert, false}]}
Note the cacertfile line (/etc/pki/rmq_cacert.pem). This is the wrong location for my installation: I keep the CA certificates in a directory called rmqca (following the same convention, my server certs go in rmqserver/ and my client certs go in rmqclient/). The corrected line is:
{ssl_options, [{cacertfile, "/etc/pki/rmqca/rmq_cacert.pem"},
and after a service restart all is well.
Thanks everyone for taking a look. I hope this answer helps someone else with this cryptic error message.
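Because a wrong path like this only surfaces as {badmatch,{error,closed}} on the shovel side, it can help to check the configured files directly. The following is a minimal sketch, not part of RabbitMQ: the helper name is hypothetical, and the paths are the ones from the ssl_options shown before the fix.

```python
import os

def unreadable_paths(paths):
    """Return the paths that are missing or not readable by this process."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]

# Paths taken from the ssl_options shown above (before the fix).
configured = [
    "/etc/pki/rmq_cacert.pem",
    "/etc/pki/rmqserver/server_cert.pem",
    "/etc/pki/rmqserver/server_key.pem",
]

for path in unreadable_paths(configured):
    print(f"cannot read: {path}")
```

The result is only meaningful when the script runs as the same OS user as the RabbitMQ service, since file permissions are checked for the current process.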

RabbitMQ delays my messages randomly

Here is my rabbitmqctl status:
[{pid,32074},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.2.2"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.2.2"},
{rabbit,"RabbitMQ","3.2.2"},
{os_mon,"CPO CXC 138 46","2.2.7"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.2.2"},
{webmachine,"webmachine","1.10.3-rmq3.2.2-gite9359c7"},
{mochiweb,"MochiMedia Web Server","2.7.0-rmq3.2.2-git680dba8"},
{xmerl,"XML parser","1.2.10"},
{inets,"INETS CXC 138 49","5.7.1"},
{mnesia,"MNESIA CXC 138 12","4.5"},
{amqp_client,"RabbitMQ AMQP Client","3.2.2"},
{sasl,"SASL CXC 138 11","2.1.10"},
{stdlib,"ERTS CXC 138 10","1.17.5"},
{kernel,"ERTS CXC 138 10","2.14.5"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:32:32] [rq:32] [async-threads:30] [kernel-poll:true]\n"},
{memory,
[{total,1954308048},
{connection_procs,619048024},
{queue_procs,166111144},
{plugins,4423520},
{other_proc,46207032},
{mnesia,44407568},
{mgmt_db,331614464},
{msg_index,6694584},
{other_ets,30005328},
{binary,63825504},
{code,17629100},
{atom,6531121},
{other_system,617810659}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,53967541043},
{disk_free_limit,50000000},
{disk_free,51883839488},
{file_descriptors,
[{total_limit,655260},
{total_used,12659},
{sockets_limit,589732},
{sockets_used,12657}]},
{processes,[{limit,1048576},{used,125740}]},
{run_queue,1},
{uptime,33320350}]
My queues are empty most of the time, but there are 84440 exchanges and 8917 queues in my RabbitMQ.
My problem is that a message may be delayed for a very long time after I send it.
I tried to gather some information while it was happening. Here are my monitoring charts from when the delay happened.
They show that the overview and my exchange look fine, but the messages in my queue are delayed at the Deliver and Acknowledge stages.
However, my app log shows:
2017-08-10 17:23:08.738 4219 INFO trove.openstack.common.rpc.amqp [-] [ProxyCallback]Received Message with Timestamp:2017-08-10T17:23:08, duration:0.737949s, unique_id:a17186068cae447bbada7a0f24ff45ef
The 17:23 message was received without delay, and I then sent an ACK back to MQ:
2017-08-10 17:23:08.739 4219 DEBUG trove.openstack.common.rpc.common [-] Consume Massage with ACK True
My 17:43 message, however, was delayed by 215.895117s, which really confused me:
2017-08-10 17:43:53.895 4219 INFO trove.openstack.common.rpc.amqp [-] [ProxyCallback]Received Message with Timestamp:2017-08-10T17:40:18, duration:215.895117s, unique_id:dc04b94c8fa64978bc9d681b020f4500
Finally, I found that it was a network problem.
Sometimes a message is dropped in delivery, and RabbitMQ takes a very long time to time out and retry the delivery.
As a result, my message is delayed.
The problem stems from the security group rules: the gateway disconnects a TCP connection without a reset (it just drops packets) after 900s with no packets sent.
The system-wide TCP keepalive (tcp_keepalive_intvl) was set to 1200s.
So I set tcp_keepalive_intvl to 600s with sysctl, and the problem was solved.
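On Linux, the idle time before the first keepalive probe is governed by net.ipv4.tcp_keepalive_time, while tcp_keepalive_intvl is the gap between subsequent probes, so both may be worth checking. The same idea can also be applied per connection instead of system-wide. The sketch below is an illustration under assumptions: the function name and the interval and probe-count values are made up, and only the 600s idle time (below the gateway's 900s cutoff) comes from the text.

```python
import socket

def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket; the values are illustrative."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform specific (present on Linux),
    # so each one is applied only if this platform defines it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print("keepalive enabled:", bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
sock.close()
```

Whether a given AMQP client library lets you reach its underlying socket varies, so the system-wide sysctl route described above may still be the simpler option.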

Can't access RabbitMQ Management Plugin on CentOS 7

I installed RabbitMQ following the installation tutorial on the RabbitMQ website.
I launched rabbitmq-server, enabled the management plugin and opened the port in CSF, but I still can't access the web management UI.
When I try http://IPadress:15672, http://servername:15672 or http://domain:15672, I get "Unavailable Website, failed to load resource".
I'm using a CentOS 7 server in production.
On my local Windows machine, I installed RabbitMQ and I can access the web management UI.
RabbitMQ status:
[{pid,24260},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.6.6"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.6"},
{webmachine,"webmachine","1.10.3"},
{mochiweb,"MochiMedia Web Server","2.13.1"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.6"},
{rabbit,"RabbitMQ","3.6.6"},
{os_mon,"CPO CXC 138 46","2.2.14"},
{ssl,"Erlang/OTP SSL application","5.3.3"},
{public_key,"Public key infrastructure","0.21"},
{crypto,"CRYPTO version 2","3.2"},
{amqp_client,"RabbitMQ AMQP Client","3.6.6"},
{rabbit_common,[],"3.6.6"},
{inets,"INETS CXC 138 49","5.9.8"},
{mnesia,"MNESIA CXC 138 12","4.11"},
{compiler,"ERTS CXC 138 10","4.9.4"},
{xmerl,"XML parser","1.3.6"},
{syntax_tools,"Syntax tools","1.6.13"},
{asn1,"The Erlang ASN1 compiler version 2.0.4","2.0.4"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{sasl,"SASL CXC 138 11","2.3.4"},
{stdlib,"ERTS CXC 138 10","1.19.4"},
{kernel,"ERTS CXC 138 10","2.16.4"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true]\n"},
{memory,
[{total,55228096},
{connection_readers,0},
{connection_writers,0},
{connection_channels,0},
{connection_other,2800},
{queue_procs,2800},
{queue_slave_procs,0},
{plugins,891672},
{other_proc,18595920},
{mnesia,68264},
{mgmt_db,758120},
{msg_index,51952},
{other_ets,1446776},
{binary,29688},
{code,27081119},
{atom,992409},
{other_system,5306576}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,4972414566},
{disk_free_limit,50000000},
{disk_free,140686581760},
{file_descriptors,
[{total_limit,924},{total_used,2},{sockets_limit,829},{sockets_used,0}]},
{processes,[{limit,1048576},{used,233}]},
{run_queue,0},
{uptime,598},
{kernel,{net_ticktime,60}}]
Opened ports:
20,21,25,26,53,80,110,143,443,465,587,993,995,1883,2077,2078,2082,2083,2086,2087,2095,2096,2908,4369,5671,5672,8883,15672,25672,30000:50000,61613,61614
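One way to narrow this down is to test whether port 15672 accepts TCP connections at all, first on the server itself and then from the remote machine. If it works locally but not remotely, the firewall is the likelier suspect; if it fails locally too, the management listener may not be running. This is a minimal sketch with a hypothetical helper name, and "localhost" is a placeholder for the real host.

```python
import socket

def port_is_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

if __name__ == "__main__":
    # "localhost" is a placeholder; replace it with the server's address
    # when running the check from another machine.
    for port in (5672, 15672):
        print(port, port_is_reachable("localhost", port))
```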

RabbitMQ 3.6.5 crashes with high memory utilization

Our cluster comprises 3 disc nodes in HA. Each node has 4 CPUs and 26GB of RAM. We use RabbitMQ 3.6.5 with Erlang 17.3.
The only plugin enabled is the management UI plugin.
The problem is that, usually over a span of about 3 hours, one of the servers (usually the one hosting the most queues) starts to gradually consume more and more memory until it crashes.
This happens daily, and we don't see any reason for it in the logs.
Attached are the logs from the server when its memory use had grown to 21GB; at that point, the overview pane in the management UI showed only 2GB in use. When this happens, we usually have ~400 connections, with ~470 channels, 16 exchanges, 54 queues, and ~300 consumers.
One of the queues is TTL enabled, 4 are priority queues, and all of the queues are durable.
Upon service restart, everything goes back to normal.
Any ideas as to what's causing it, or how we should approach debugging? Is there a checklist of known issues to rule out?
Status of node 'rabbit@scraped-node-name' ...
[{pid,399},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
{webmachine,"webmachine","1.10.3"},
{mochiweb,"MochiMedia Web Server","2.13.1"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
{rabbit,"RabbitMQ","3.6.5"},
{os_mon,"CPO CXC 138 46","2.3"},
{amqp_client,"RabbitMQ AMQP Client","3.6.5"},
{rabbit_common,[],"3.6.5"},
{mnesia,"MNESIA CXC 138 12","4.12.3"},
{ssl,"Erlang/OTP SSL application","5.3.6"},
{public_key,"Public key infrastructure","0.22.1"},
{crypto,"CRYPTO","3.4.1"},
{inets,"INETS CXC 138 49","5.10.3"},
{compiler,"ERTS CXC 138 10","5.0.2"},
{xmerl,"XML parser","1.3.7"},
{syntax_tools,"Syntax tools","1.6.16"},
{asn1,"The Erlang ASN1 compiler version 3.0.2","3.0.2"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{sasl,"SASL CXC 138 11","2.4.1"},
{stdlib,"ERTS CXC 138 10","2.2"},
{kernel,"ERTS CXC 138 10","3.0.3"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
{memory,
[{total,2049513496},
{connection_readers,10231416},
{connection_writers,3215768},
{connection_channels,35753016},
{connection_other,14065960},
{queue_procs,430585272},
{queue_slave_procs,34912},
{plugins,525312},
{other_proc,33015816},
{mnesia,333080},
{mgmt_db,33680},
{msg_index,38121640},
{other_ets,6595504},
{binary,1432921304},
{code,27606184},
{atom,992409},
{other_system,15482223}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,10983032422},
{disk_free_limit,50000000},
{disk_free,148079415296},
{file_descriptors,
[{total_limit,32668},
{total_used,440},
{sockets_limit,29399},
{sockets_used,418}]},
{processes,[{limit,1048576},{used,4683}]},
{run_queue,0},
{uptime,117601},
{kernel,{net_ticktime,60}}]
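As a starting point for debugging, it can help to rank the categories in the memory section by their share of the reported total. The sketch below does only that arithmetic; the function name is made up, and the figures are copied from the status output above.

```python
# Memory figures (bytes) copied from the status output above.
memory = {
    "connection_readers": 10231416,
    "connection_writers": 3215768,
    "connection_channels": 35753016,
    "connection_other": 14065960,
    "queue_procs": 430585272,
    "queue_slave_procs": 34912,
    "plugins": 525312,
    "other_proc": 33015816,
    "mnesia": 333080,
    "mgmt_db": 33680,
    "msg_index": 38121640,
    "other_ets": 6595504,
    "binary": 1432921304,
    "code": 27606184,
    "atom": 992409,
    "other_system": 15482223,
}
total = 2049513496

def top_categories(breakdown, total_bytes, n=3):
    """Return the n largest categories as (name, share_of_total) pairs."""
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    return [(name, size / total_bytes) for name, size in ranked[:n]]

for name, share in top_categories(memory, total):
    print(f"{name}: {share:.1%}")
```

With these numbers, binary accounts for roughly 70% of the reported total and queue_procs for about 21%. That does not explain the gap between the 2GB reported here and the 21GB seen at the OS level, but it suggests where to look first.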

Celery + RabbitMQ stuck at mingle: searching for neighbors

I encountered a strange issue while trying to get my system running on new machines.
OS:
Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-39-generic x86_64)
packages:
celery==3.1.13
django-celery==3.0.23
librabbitmq==1.5.2
broker:
RabbitMQ "3.2.4"
After a restart, the Celery process gets stuck:
[2014-11-10 18:32:55,792: INFO/MainProcess] Connected to amqp://user:**@172.16.10.6:5672/vhost
[2014-11-10 18:32:55,804: INFO/MainProcess] mingle: searching for neighbors
I tried to find the solution elsewhere. Here I found advice to increase the disk space available to RabbitMQ. I checked the docs and changed this in the rabbitmq.config file:
{disk_free_limit, {mem_relative, 1.0}}
Now RabbitMQ should have 6GB available, but this doesn't push Celery into the ready state. I also tried configuring (as someone suggested) the limit of open file descriptors in
/etc/default/rabbitmq-server
without any effect. Current RabbitMQ status:
[{pid,4131},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.2.4"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.2.4"},
{webmachine,"webmachine","1.10.3-rmq3.2.4-gite9359c7"},
{mochiweb,"MochiMedia Web Server","2.7.0-rmq3.2.4-git680dba8"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.2.4"},
{rabbit,"RabbitMQ","3.2.4"},
{os_mon,"CPO CXC 138 46","2.2.14"},
{inets,"INETS CXC 138 49","5.9.7"},
{mnesia,"MNESIA CXC 138 12","4.11"},
{amqp_client,"RabbitMQ AMQP Client","3.2.4"},
{xmerl,"XML parser","1.3.5"},
{sasl,"SASL CXC 138 11","2.3.4"},
{stdlib,"ERTS CXC 138 10","1.19.4"},
{kernel,"ERTS CXC 138 10","2.16.4"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:30] [kernel-poll:true]\n"},
{memory,
[{total,46342000},
{connection_procs,290304},
{queue_procs,62864},
{plugins,458136},
{other_proc,13673776},
{mnesia,76376},
{mgmt_db,127808},
{msg_index,34384},
{other_ets,1100432},
{binary,5282744},
{code,19974306},
{atom,703377},
{other_system,4557493}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,2503614464},
{disk_free_limit,6259036160},
{disk_free,2779140096},
{file_descriptors,
[{total_limit,924},{total_used,13},{sockets_limit,829},{sockets_used,9}]},
{processes,[{limit,1048576},{used,270}]},
{run_queue,0},
{uptime,4561}]
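One thing that may be worth ruling out from the status above: disk_free_limit is the minimum amount of free space that must remain on disk, not space set aside for RabbitMQ, and the broker blocks publishing connections while free space is below it. The sketch below simply compares the two figures from the status output; the function name is made up for illustration.

```python
def disk_alarm_expected(disk_free, disk_free_limit):
    """True when free disk space is below the configured limit."""
    return disk_free < disk_free_limit

# Values copied from the status output above.
disk_free_limit = 6259036160
disk_free = 2779140096

if disk_alarm_expected(disk_free, disk_free_limit):
    shortfall_gib = (disk_free_limit - disk_free) / 2**30
    print(f"free space is about {shortfall_gib:.2f} GiB below the limit")
else:
    print("free space is above the limit")
```

Here the free space is well below the limit produced by {mem_relative, 1.0}, so lowering the limit again or freeing disk space may be worth trying, independently of the librabbitmq fix.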
Another thing: when I stop the Celery processes with Supervisor, no warm shutdown is logged (actually, nothing is logged).
Could you help?
I had the same problem, it seems. Check my own answer to my own question here:
celeryd with RabbitMQ hangs on "mingle: searching for neighbors", but plain celery works
Long story short:
sudo apt-get remove librabbitmq1
worked for me.