rabbitmq connections blocking but memory is below watermark - rabbitmq

We use RabbitMQ in our application. Two hours ago, one of our app servers blocked when it tried to connect to RabbitMQ. After checking the RabbitMQ server, we found that one node's memory was over the watermark, and a few minutes later that node went down. After restarting the node, the whole cluster seems to work fine, but I notice a lot of connections in the blocking and blocked states in the web management UI, while running rabbitmqctl list_connections pid name peer_address state on all nodes shows no connection in blocking/blocked. This really confuses me:
When one node of the cluster went over the watermark while the other nodes were working fine, why couldn't my application connect to the RabbitMQ cluster? PS: we use spring.amqp & spring-rabbit version 1.1.0.RELEASE.
For what reason does a node go down when it is over the watermark?
Why, after restarting the node, are there still blocking connections, when rabbitmqctl shows them all in the running state?
Here are some logs from my RabbitMQ server:
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
alarm_handler: {clear,{resource_limit,memory,rabbit@cos22}}
When I try to close a blocked connection from the web management UI, it fails with an error:
=INFO REPORT==== 1-Mar-2013::20:55:24 ===
Closing connection <0.17197.115> because "Closed via management plugin"
=ERROR REPORT==== 1-Mar-2013::20:55:24 ===
webmachine error: path="/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"
{throw,
{error,{not_a_connection_pid,<0.17197.115>}},
[{rabbit_networking,close_connection,2,
[{file,"src/rabbit_networking.erl"},{line,317}]},
{rabbit_mgmt_wm_connection,delete_resource,2,
[{file,"rabbitmq-management/src/rabbit_mgmt_wm_connection.erl"},
{line,52}]},
{webmachine_resource,resource_call,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,169}]},
{webmachine_resource,do,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,128}]},
{webmachine_decision_core,resource_call,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,48}]},
{webmachine_decision_core,decision,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,416}]},
{webmachine_decision_core,handle_request,2,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,33}]},
{rabbit_webmachine,'-makeloop/1-fun-0-',3,
[{file,"rabbitmq-mochiweb/src/rabbit_webmachine.erl"},{line,75}]}]}
rabbitmqctl shows all connections in the running state:
rabbitmqctl list_connections pid name peer_address state
Listing connections ...
<rabbit@cos23.1.1271.51> 10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197 running
<rabbit@cos23.1.1100.51> 10.64.13.196:57240 -> 10.64.12.225:5672 10.64.13.196 running
<rabbit@cos23.1.1056.51> 10.64.12.196:58608 -> 10.64.12.225:5672 10.64.12.196 running
<rabbit@cos23.1.1079.51> 10.64.11.235:48962 -> 10.64.12.225:5672 10.64.11.235 running
<rabbit@cos23.1.1419.51> 10.64.13.228:49857 -> 10.64.12.225:5672 10.64.13.228 running
<rabbit@cos23.1.1049.51> 10.64.11.193:36387 -> 10.64.12.225:5672 10.64.11.193 running
<rabbit@cos23.1.1159.51> 10.64.10.123:52017 -> 10.64.12.225:5672 10.64.10.123 running
<rabbit@cos23.1.26289.45> 10.64.12.247:38504 -> 10.64.12.225:5672 10.64.12.247 running
<rabbit@cos23.1.1121.51> 10.64.10.29:51483 -> 10.64.12.225:5672 10.64.10.29 running
<rabbit@cos23.1.1067.51> 10.64.11.234:50244 -> 10.64.12.225:5672 10.64.11.234 running
<rabbit@cos23.1.1149.51> 10.64.11.178:33795 -> 10.64.12.225:5672 10.64.11.178 running
<rabbit@cos23.1.1136.51> 10.64.10.28:39557 -> 10.64.12.225:5672 10.64.10.28 running
<rabbit@cos23.1.1370.51> 10.64.13.233:38766 -> 10.64.12.225:5672 10.64.13.233 running
<rabbit@cos23.1.1388.51> 10.64.13.229:50932 -> 10.64.12.225:5672 10.64.13.229 running
<rabbit@cos23.1.1254.51> 10.64.13.241:49311 -> 10.64.12.225:5672 10.64.13.241 running
<rabbit@cos23.1.1031.51> 10.64.11.195:39455 -> 10.64.12.225:5672 10.64.11.195 running
<rabbit@cos23.1.1038.51> 10.64.10.27:58938 -> 10.64.12.225:5672 10.64.10.27 running
<rabbit@cos23.1.1167.51> 10.64.13.240:37777 -> 10.64.12.225:5672 10.64.13.240 running
<rabbit@cos23.1.1442.51> 10.64.10.130:37251 -> 10.64.12.225:5672 10.64.10.130 running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200 running
...done.
There is also a connection with a lot of channels in the blocked state, but I can't find this connection with rabbitmqctl list_connections:
AMQP 0-9-1 | 10.64.13.200:45891 -> 10.64.12.226:5672 | rabbit@cos22 | 0B/s (49.2MB total) | 0B/s (2.4MB total) | 0s | 60920
thanks a lot for any help and suggestion.

Got an answer from the RabbitMQ mailing list:
These connections / channels do not exist. You're seeing a bug in the
management plugin where it will retain information about connections
and channels that were alive on a cluster node when it crashed.
This bug was fixed in RabbitMQ 3.0.3.
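One way to confirm that an entry in the management UI is stale is to check whether the broker itself still lists a connection with that name. The following is a minimal sketch, not part of the mailing list answer: the helper names connection_names and name_from_api_path are hypothetical, and it assumes the four-column rabbitmqctl list_connections pid name peer_address state output shown in the question.

```python
from urllib.parse import unquote


def connection_names(ctl_output):
    """Collect connection names from the output of
    `rabbitmqctl list_connections pid name peer_address state`."""
    names = set()
    for line in ctl_output.splitlines():
        fields = line.split()
        # Expected shape: <pid> src -> dst peer_address state
        if len(fields) == 6 and fields[0].startswith("<") and fields[2] == "->":
            names.add(f"{fields[1]} -> {fields[3]}")
    return names


def name_from_api_path(path):
    """Decode the connection name from a management API path."""
    return unquote(path.rsplit("/", 1)[-1])


# Sample data copied from the question (one output row kept for brevity).
ctl_output = """\
Listing connections ...
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200 running
...done.
"""
path = "/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"

name = name_from_api_path(path)
print(name, "| known to broker:", name in connection_names(ctl_output))
```

With the data from the question, the connection named in the failing API call does not appear in the rabbitmqctl output, which fits the explanation that the management plugin kept an entry for a connection that no longer exists.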

Related

Unable to start second rabbitmq node on single Windows host

I am trying to run two RabbitMQ nodes on a single Windows host. My end goal is to run two RabbitMQ services.
Currently, I have the following commands for the second node in rabbitmq-env-conf.bat:
set RABBITMQ_CONFIG_FILE=%APPDATA%\RabbitMQ\rabbitmq.conf
set RABBITMQ_NODENAME=rabbit2@hostname
set RABBITMQ_DIST_PORT=5673
set RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]"
Running .\rabbitmq-server.bat start produces the following error :
. . .
Starting broker...Logger - error: {removed_failing_handler,rabbit_log}
BOOT FAILED
===========
Error during startup: {error,
{rabbitmq_management,
{bad_return,
{{rabbit_mgmt_app,start,[normal,[]]},
{'EXIT',
{{could_not_start_listener,
[{cowboy_opts,[{sendfile,false}]},{port,15672}],
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,
{acceptor,{0,0,0,0,0,0,0,0},15672},
eaddrinuse}}}},
. . .
From the log :
Application rabbitmq_management exited with reason: {{could_not_start_listener,[{cowboy_opts,[{sendfile,false}]},{port,15672}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,{acceptor,{0,0,0,0,0,0,0,0},15672},eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_management_tcp,[{cowboy_opts,[{sendfile,false}]},{port,15672}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[],[],rabbit_mgmt_wm_static,{priv_file,rabbitmq_management,"www/index.html"}},{[<<"api">>,<<"overview">>],[],rabbit_mgmt_wm_overview,...},...]}],...},...]}}
It looks like I am unable to set up the RabbitMQ management port successfully despite supplying start args.
15672 is the first RabbitMQ node's management port, and I am not sure why this number is being picked up.
Any troubleshooting pointers would be appreciated.

RabbitMQ consumes memory and shuts down

I just installed OpenStack Juno using devstack, and observed that RabbitMQ (package rabbitmq-server-3.1.5-10 installed by yum) is not stable: it quickly eats up the memory and shuts down. The machine has 2G of RAM. Below are the messages from the logs and 'systemctl status' before the daemon died:
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
vm_memory_high_watermark clear. Memory used:835116352 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:40 ===
memory resource limit alarm cleared on node rabbit@node
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
accepting AMQP connection <0.27011.5> (10.0.0.11:55198 -> 10.0.0.11:5672)
=INFO REPORT==== 18-Dec-2014::01:25:41 ===
vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:41 ===
memory resource limit alarm set on node rabbit@node.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
rabbitmqctl[770]: ===========
rabbitmqctl[770]: nodes in question: [rabbit@node]
rabbitmqctl[770]: hosts, their running nodes and ports:
rabbitmqctl[770]: - node: [{rabbitmqctl770,40089}]
rabbitmqctl[770]: current node details:
rabbitmqctl[770]: - node name: rabbitmqctl770@node
rabbitmqctl[770]: - home dir: /var/lib/rabbitmq
rabbitmqctl[770]: - cookie hash: FftrRFUESg4RKWsyb1cPqw==
systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
systemd[1]: Unit rabbitmq-server.service entered failed state.
I know about set_vm_memory_high_watermark, but it doesn't solve the issue. I want to ensure that the daemon doesn't shut down abruptly. I wonder if someone saw this before and could advise?
Thanks.
UPDATE
Upgraded to version 3.4.2, taken directly from www.rabbitmq.com/download.html. The new version doesn't consume RAM as fast and tends to run longer than the previous version, but it eventually still eats up all the memory and shuts down.
I think the number of connections to the server keeps increasing and they are held open without being closed, which is why it consumes more and more memory. When RAM usage rises above the watermark, the RabbitMQ server blocks publishers until the alarm clears. You either have to close the connections that are open or increase the RAM of the system. Increasing the RAM will only postpone the problem, and you will hit it again, so it is better to close the connections.
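To see how far over the limit the broker goes each time the alarm fires, the watermark lines in the log can be parsed directly. This is a rough sketch that assumes the log line format shown in the question; the function name parse_watermark_events is hypothetical.

```python
import re

# Matches lines such as:
# vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
WATERMARK_RE = re.compile(
    r"vm_memory_high_watermark (set|clear)\. Memory used:(\d+) allowed:(\d+)"
)


def parse_watermark_events(log_text):
    """Return (state, used, allowed, used - allowed) for each watermark line."""
    events = []
    for match in WATERMARK_RE.finditer(log_text):
        state = match.group(1)
        used = int(match.group(2))
        allowed = int(match.group(3))
        events.append((state, used, allowed, used - allowed))
    return events


# The two watermark lines from the question's log.
log_text = """\
vm_memory_high_watermark clear. Memory used:835116352 allowed:835212083
vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
"""

for state, used, allowed, over in parse_watermark_events(log_text):
    print(f"{state:5} used={used} allowed={allowed} over_by={over}")
```

In the excerpt from the question, the alarm clears with usage only about 96 KB under the limit and is set again one second later, so the node is hovering right at the watermark rather than recovering.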
Try using CloudAMQP instead of installing RabbitMQ locally; that should avoid the problem. First create an account here: https://customer.cloudamqp.com/signup
Then create your queue there and connect to it from your application.

RabbitMQ Management : webmachine error: path="/api/overview"

After I log in to the RabbitMQ management UI, I get the following error:
Got response code 500 with body
Internal Server Error
The server encountered an error while processing this request:
{error,{error,{badmatch,{error,nxdomain}},
[{rabbit_nodes,cluster_name_default,0},
{rabbit_nodes,cluster_name,0},
{rabbit_mgmt_wm_overview,to_json,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3},
{webmachine_decision_core,resource_call,1},
{webmachine_decision_core,decision,1},
{webmachine_decision_core,handle_request,2}]}}
I see the following error in the log file in /var/log/rabbitmq :
=ERROR REPORT==== 31-Oct-2014::06:20:40 ===
webmachine error: path="/api/overview"
{error,{error,{badmatch,{error,nxdomain}},
[{rabbit_nodes,cluster_name_default,0},
{rabbit_nodes,cluster_name,0},
{rabbit_mgmt_wm_overview,to_json,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3},
{webmachine_decision_core,resource_call,1},
{webmachine_decision_core,decision,1},
{webmachine_decision_core,handle_request,2}]}}
The workers are able to connect to the broker and are receiving messages, and the New Relic plugin for RabbitMQ also seems to be working fine. However, I am unable to log in through the management plugin. Any pointers in this regard would be helpful.
I had updated the hostname of the system and that was causing the issue. See the link below
https://groups.google.com/forum/#!msg/rabbitmq-users/9P-BAwGVHJU/fwOpZPJywwYJ
I added 127.0.0.1 'hostname' to /etc/hosts. That solved the management plugin problem. However, rabbitmqctl still showed the following error; restarting RabbitMQ solved the rabbitmqctl problem as well:
Listing queues ...
Error: unable to connect to node 'rabbit@<hostname>': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@<hostname>']
rabbit@<hostname>:
* connected to epmd (port 4369) on <hostname>
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: <nodename>
- home dir: <homedir>
- cookie hash: <cookiehash>
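Since nxdomain means the machine's hostname could not be resolved, a quick check of whether the current hostname resolves can confirm the diagnosis before and after editing /etc/hosts. A minimal sketch follows; the helper name resolves is hypothetical, and it uses the operating system resolver through Python, which may not behave identically to the resolver the Erlang VM uses.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the resolved address for hostname, or None if lookup fails."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError
        return None


hostname = socket.gethostname()
address = resolves(hostname)
if address is None:
    print(f"{hostname} does not resolve; consider adding it to /etc/hosts")
else:
    print(f"{hostname} resolves to {address}")
```

The resolver parameter exists only so the function can be exercised without touching the real network configuration.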

bad_header for AMQP connection while connecting sensu-client to server

I have installed Sensu with the Chef community cookbook. However, the Sensu client fails to connect to the server, resulting in a RabbitMQ connection error with the message "timed out while attempting to connect".
Here are the detailed client logs from sensu-client.log:
{"timestamp":"2014-07-08T12:39:33.982647+0000","level":"warn","message":"config file applied changes","config_file":"/etc/sensu/conf.d/config.json","changes":{"rabbitmq":{"heartbeat":[null,20]},"client":[null,{"name":"girija-sensu-client","address":"test sensu client","subscriptions":["test-node"]}],"version":[null,"0.12.6-4"]}}
{"timestamp":"2014-07-08T12:39:33.996680+0000","level":"info","message":"loaded extension","type":"mutator","name":"only_check_output","description":"returns check output"}
{"timestamp":"2014-07-08T12:39:34.000721+0000","level":"info","message":"loaded extension","type":"handler","name":"debug","description":"outputs json event data"}
{"timestamp":"2014-07-08T12:39:34.104300+0000","level":"warn","message":"reconnecting to rabbitmq"}
{"timestamp":"2014-07-08T12:39:39.108623+0000","level":"warn","message":"reconnecting to rabbitmq"}
{"timestamp":"2014-07-08T12:39:44.111818+0000","level":"warn","message":"reconnecting to rabbitmq"}
{"timestamp":"2014-07-08T12:39:49.115250+0000","level":"warn","message":"reconnecting to rabbitmq"}
{"timestamp":"2014-07-08T12:39:54.045648+0000","level":"fatal","message":"rabbitmq connection error","error":"timed out while attempting to connect"}
The RabbitMQ logs on the server show the following error:
=INFO REPORT==== 8-Jul-2014::12:39:54 ===
accepting AMQP connection <0.395.0> (10.254.153.131:42813 -> 10.254.130.25:5672)
=ERROR REPORT==== 8-Jul-2014::12:39:54 ===
closing AMQP connection <0.395.0> (10.254.153.131:42813 -> 10.254.130.25:5672):
{bad_header,<<129,15,1,3,3,0,246,0>>}
I am running this on CentOS 6.4 on AWS
Rabbitmq version 3.0.4
Erlang_version,
"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:30] [kernel-poll:true]\n"},
bad_header suggests a mismatch between the client and broker AMQP versions. Any help finding out the AMQP version and fixing this problem would be appreciated.
In my case, this issue occurred because my client was configured to use SSL, but the RabbitMQ server was not properly configured for SSL and was instead expecting a "plain" user/pass login without SSL.
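The bytes in the bad_header log entry are consistent with that explanation. An AMQP 0-9-1 client opens with the literal bytes "AMQP" followed by 0, 0, 9, 1, whereas the logged header starts with 129, 15, 1, 3, 3, which has the shape of an SSLv2-compatible ClientHello (high bit set on the first byte, message type 1, then a 3.3 version). The sketch below is an illustration under those assumptions, not a full protocol parser, and classify_first_bytes is a hypothetical helper name.

```python
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_first_bytes(data):
    """Guess what kind of client sent the first bytes of a connection."""
    if data[:4] == b"AMQP":
        return "amqp"
    # TLS record layer: content type 22 (handshake), major version 3
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-record"
    # SSLv2-compatible ClientHello: high bit set on byte 0, message type 1
    if len(data) >= 3 and data[0] & 0x80 and data[2] == 0x01:
        return "sslv2-hello"
    return "unknown"


# Bytes copied from the bad_header entry in the server log.
logged = bytes([129, 15, 1, 3, 3, 0, 246, 0])

print(classify_first_bytes(logged))
print(classify_first_bytes(AMQP_0_9_1_HEADER))
```

If the logged header classifies as a TLS or SSL hello, the client is most likely attempting an SSL handshake against the plain AMQP port (5672 in the log above), so either SSL should be disabled on the client or the client should be pointed at a listener that the broker has configured for SSL.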

access_refused on rabbitmq server with spring-boot-1.1.3, fine with 1.0.1

I have a working client application under spring-boot 1.0.1, but when I update the spring-boot version to 1.1.3.RELEASE, I get a periodic Connection Reset stack trace on the client, and I can see the following log on the server:
=INFO REPORT==== 3-Jul-2014::10:57:55 ===
accepting AMQP connection <0.3945.0> (192.168.100.14:64049 -> 192.168.100.116:5672)
=ERROR REPORT==== 3-Jul-2014::10:57:58 ===
closing AMQP connection <0.3945.0> (192.168.100.14:64049 -> 192.168.100.116:5672):
{handshake_error,opening,0,
{amqp_error,access_refused,
"access to vhost 'dev-lmu' refused for user 'hermes'",
'connection.open'}}
I think it's fair to set the premise that permission issues are out of the question, because the app works under boot 1.0.1
I use RabbitMQ 3.3.4
Has anyone else run into this issue?
Looks like this was a bug in Boot, but it has since been fixed (upgrade to 1.1.4):
https://github.com/spring-projects/spring-boot/commit/ad1636fd349b2e6636837d98af1ba1d07500ec9f#diff-19dc1e9553b1605c75168e38dcbc9477
Removed the leading '/' from the virtual host.
The relevant boot issue is: https://github.com/spring-projects/spring-boot/issues/1206