Trying to sort out why my local RabbitMQ is not starting.
I had an issue with a previous version of RabbitMQ on the system not starting, so I decided to uninstall it and reinstall using Chocolatey. The service had stopped starting after the queue accumulated quite a few messages and the system went to sleep and restarted multiple times... The uninstall did remove all the files from the AppData\Roaming\RabbitMQ directory, the service wasn't running, and the system was rebooted.
Currently I have RabbitMQ 3.8.2, which was installed with Erlang 20.0.
Here's the snippet from the rabbit log file:
=INFO REPORT==== 22-Jan-2020::19:39:24 ===
Starting RabbitMQ 3.6.11 on Erlang 20.0
Copyright (C) 2007-2017 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
=INFO REPORT==== 22-Jan-2020::19:39:24 ===
node : rabbit@myhostname
home dir : C:\WINDOWS
config file(s) : c:/Users/username/AppData/Roaming/RabbitMQ/rabbitmq.config
cookie hash : a hash goes here
log : C:/Users/username/AppData/Roaming/RabbitMQ/log/RABBIT~1.LOG
sasl log : C:/Users/username/AppData/Roaming/RabbitMQ/log/RABBIT~2.LOG
database dir : c:/Users/username/AppData/Roaming/RabbitMQ/db/RABBIT~1
=INFO REPORT==== 22-Jan-2020::19:39:25 ===
RabbitMQ hasn't finished starting yet. Waiting for startup to finish before stopping...
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Memory high watermark set to 6505 MiB (6821275238 bytes) of 16263 MiB (17053188096 bytes) total
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Enabling free disk space monitoring
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Disk free limit set to 50MB
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Limiting to approx 8092 file handles (7280 sockets)
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
FHC read buffering: OFF
FHC write buffering: ON
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left
=INFO REPORT==== 22-Jan-2020::19:39:31 ===
Priority queues enabled, real BQ is rabbit_variable_queue
=INFO REPORT==== 22-Jan-2020::19:39:52 ===
Error description:
{could_not_start,rabbit,
{error,
{{shutdown,
{failed_to_start_child,rabbit_epmd_monitor,
{{badmatch,noport},
[{rabbit_epmd_monitor,init,1,
[{file,"src/rabbit_epmd_monitor.erl"},{line,56}]},
{gen_server,init_it,2,
[{file,"gen_server.erl"},{line,365}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,333}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]}}},
{child,undefined,rabbit_epmd_monitor_sup,
{rabbit_restartable_sup,start_link,
[rabbit_epmd_monitor_sup,
{rabbit_epmd_monitor,start_link,[]},
false]},
transient,infinity,supervisor,
[rabbit_restartable_sup]}}}}
Log files (may contain more information):
C:/Users/username/AppData/Roaming/RabbitMQ/log/RABBIT~1.LOG
C:/Users/username/AppData/Roaming/RabbitMQ/log/RABBIT~2.LOG
=ERROR REPORT==== 22-Jan-2020::19:39:53 ===
Error trying to stop RabbitMQ: error:{badmatch,false}
=INFO REPORT==== 22-Jan-2020::19:39:53 ===
Halting Erlang VM with the following applications:
sasl
stdlib
kernel
Not a lot of help to a new RabbitMQ user trying to get an install working.
These are the first few lines from the erl_crash.dump file in the same dir as the logs:
=erl_crash_dump:0.3
Wed Jan 22 20:38:13 2020
Slogan: init terminating in do_boot ({undef,[{rabbit_nodes_common,make,rabbit@myhostname,[]},{rabbit_prelaunch,start,0,[{_},{_}]},{init,start_em,1,[{_},{_}]},{init,do_boot,3,[{_},{_}]}]})
System version: Erlang/OTP 20 [erts-9.0] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10]
Compiled: Tue Jun 20 19:49:32 2017
I've been going through the docs here, but haven't found much of a solution to this.
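Since the failing child is rabbit_epmd_monitor (the process that watches the Erlang port mapper daemon, epmd), these are the diagnostics I can think to run next (standard Erlang/RabbitMQ CLI commands; a sketch only, nothing verified yet):

# list node names registered with epmd on this machine
epmd -names

# overall node health; fails quickly if the node is unreachable
rabbitmqctl status

# re-register the Windows service, in case the old 3.6.11 registration is stale
rabbitmq-service.bat remove
rabbitmq-service.bat install
rabbitmq-service.bat start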
Related
I want to test an RPC nameko server with gitlab-ci.yml.
I can't get RabbitMQ to work inside .gitlab-ci.yml:
image: python:latest

before_script:
  - apt-get update -yq
  - apt-get install -y python-dev python-pip tree
  - curl -I http://guest:guest@rabbitmq:8080/api/overview

mytest:
  artifacts:
    paths:
      - dist
  script:
    - pip install -r requirements.txt
    - pip install .
    - pytest --amqp-uri=amqp://guest:guest@rabbitmq:5672 --rabbit-ctl-uri=http://guest:guest@rabbitmq:15672 tests
    # - python setup.py test
    - python setup.py bdist_wheel

look:
  stage: deploy
  script:
    - ls -lah dist

services:
  - rabbitmq:3-management
RabbitMQ starts correctly:
2017-04-13T18:19:23.436309219Z
2017-04-13T18:19:23.436409026Z RabbitMQ 3.6.9. Copyright (C) 2007-2016 Pivotal Software, Inc.
2017-04-13T18:19:23.436432568Z ## ## Licensed under the MPL. See http://www.rabbitmq.com/
2017-04-13T18:19:23.436451431Z ## ##
2017-04-13T18:19:23.436468542Z ########## Logs: tty
2017-04-13T18:19:23.436485607Z ###### ## tty
2017-04-13T18:19:23.436501886Z ##########
2017-04-13T18:19:23.436519036Z Starting broker...
2017-04-13T18:19:23.440790736Z
2017-04-13T18:19:23.440809836Z =INFO REPORT==== 13-Apr-2017::18:19:23 ===
2017-04-13T18:19:23.440819014Z Starting RabbitMQ 3.6.9 on Erlang 19.3
2017-04-13T18:19:23.440827601Z Copyright (C) 2007-2016 Pivotal Software, Inc.
2017-04-13T18:19:23.440835737Z Licensed under the MPL. See http://www.rabbitmq.com/
2017-04-13T18:19:23.443408721Z
2017-04-13T18:19:23.443429311Z =INFO REPORT==== 13-Apr-2017::18:19:23 ===
2017-04-13T18:19:23.443439837Z node : rabbit@ea1a207b738e
2017-04-13T18:19:23.443449307Z home dir : /var/lib/rabbitmq
2017-04-13T18:19:23.443460663Z config file(s) : /etc/rabbitmq/rabbitmq.config
2017-04-13T18:19:23.443470393Z cookie hash : h6vFB5LezZ4GR1nGuQOVSg==
2017-04-13T18:19:23.443480053Z log : tty
2017-04-13T18:19:23.443489256Z sasl log : tty
2017-04-13T18:19:23.443498676Z database dir : /var/lib/rabbitmq/mnesia/rabbit@ea1a207b738e
2017-04-13T18:19:27.717290199Z
2017-04-13T18:19:27.717345348Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.717355143Z Memory limit set to 3202MB of 8005MB total.
2017-04-13T18:19:27.726821043Z
2017-04-13T18:19:27.726841925Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.726850927Z Disk free limit set to 50MB
2017-04-13T18:19:27.732864417Z
2017-04-13T18:19:27.732882507Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.732891347Z Limiting to approx 1048476 file handles (943626 sockets)
2017-04-13T18:19:27.733030868Z
2017-04-13T18:19:27.733041770Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.733049763Z FHC read buffering: OFF
2017-04-13T18:19:27.733126168Z FHC write buffering: ON
2017-04-13T18:19:27.793026622Z
2017-04-13T18:19:27.793043832Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.793052900Z Database directory at /var/lib/rabbitmq/mnesia/rabbit@ea1a207b738e is empty. Initialising from scratch...
2017-04-13T18:19:27.800414211Z
2017-04-13T18:19:27.800429311Z =INFO REPORT==== 13-Apr-2017::18:19:27 ===
2017-04-13T18:19:27.800438013Z application: mnesia
2017-04-13T18:19:27.800464988Z exited: stopped
2017-04-13T18:19:27.800473228Z type: temporary
2017-04-13T18:19:28.129404329Z
2017-04-13T18:19:28.129482072Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.129491680Z Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-04-13T18:19:28.153509130Z
2017-04-13T18:19:28.153526528Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.153535638Z Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-04-13T18:19:28.193558406Z
2017-04-13T18:19:28.193600316Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.193611144Z Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-04-13T18:19:28.194448672Z
2017-04-13T18:19:28.194464866Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.194475629Z Priority queues enabled, real BQ is rabbit_variable_queue
2017-04-13T18:19:28.208882072Z
2017-04-13T18:19:28.208912016Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.208921824Z Starting rabbit_node_monitor
2017-04-13T18:19:28.211145158Z
2017-04-13T18:19:28.211169236Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.211182089Z Management plugin: using rates mode 'basic'
2017-04-13T18:19:28.224499311Z
2017-04-13T18:19:28.224527962Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.224538810Z msg_store_transient: using rabbit_msg_store_ets_index to provide index
2017-04-13T18:19:28.226355958Z
2017-04-13T18:19:28.226376272Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.226385706Z msg_store_persistent: using rabbit_msg_store_ets_index to provide index
2017-04-13T18:19:28.227832476Z
2017-04-13T18:19:28.227870221Z =WARNING REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.227891823Z msg_store_persistent: rebuilding indices from scratch
2017-04-13T18:19:28.230832501Z
2017-04-13T18:19:28.230872729Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.230893941Z Adding vhost '/'
2017-04-13T18:19:28.385440862Z
2017-04-13T18:19:28.385520360Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.385540022Z Creating user 'guest'
2017-04-13T18:19:28.398092244Z
2017-04-13T18:19:28.398184254Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.398206496Z Setting user tags for user 'guest' to [administrator]
2017-04-13T18:19:28.413704571Z
2017-04-13T18:19:28.413789806Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.413810378Z Setting permissions for 'guest' in '/' to '.*', '.*', '.*'
2017-04-13T18:19:28.451109821Z
2017-04-13T18:19:28.451162892Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.451172185Z started TCP Listener on [::]:5672
2017-04-13T18:19:28.475429729Z
2017-04-13T18:19:28.475491074Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.475501172Z Management plugin started. Port: 15672
2017-04-13T18:19:28.475821397Z
2017-04-13T18:19:28.475835599Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.475844143Z Statistics database started.
2017-04-13T18:19:28.487572236Z completed with 6 plugins.
2017-04-13T18:19:28.487797794Z
2017-04-13T18:19:28.487809763Z =INFO REPORT==== 13-Apr-2017::18:19:28 ===
2017-04-13T18:19:28.487818426Z Server startup complete; 6 plugins started.
2017-04-13T18:19:28.487826288Z * rabbitmq_management
2017-04-13T18:19:28.487833914Z * rabbitmq_web_dispatch
2017-04-13T18:19:28.487841610Z * rabbitmq_management_agent
2017-04-13T18:19:28.487861057Z * amqp_client
2017-04-13T18:19:28.487875546Z * cowboy
2017-04-13T18:19:28.487883514Z * cowlib
But I get this error:
$ pytest --amqp-uri=amqp://guest:guest@rabbitmq:5672 --rabbit-ctl-uri=http://guest:guest@rabbitmq:15672 tests
============================= test session starts ==============================
platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
...
E Exception: Connection error for the RabbitMQ management HTTP API at http://guest:guest@rabbitmq:15672/api/overview, is it enabled?
...
source:565: DeprecationWarning: invalid escape sequence \*
ERROR: Job failed: exit code 1
I used it in the following way and it worked for me:
image: "ruby:2.3.3"  # not required by rabbitmq

services:
  - rabbitmq:latest

variables:
  RABBITMQ_DEFAULT_USER: guest
  RABBITMQ_DEFAULT_PASS: guest
  AMQP_URL: 'amqp://guest:guest@rabbitmq:5672'
Now you can use the AMQP_URL env variable to connect to the rabbitmq server. The general rule of thumb is that any service declared will use its name (e.g. rabbitmq from rabbitmq:latest) as the host or URL. However, if you are running it on your own server or a Kubernetes cluster, it will be localhost or 127.0.0.1. In my humble opinion, that might be the issue in your code. Hope it helps. :)
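For example, a test script inside the job can then connect with pika using that variable (a minimal sketch, assuming pika is listed in requirements.txt; the queue name is just an illustration):

import os
import pika

# AMQP_URL comes from the variables: block above; inside GitLab CI the
# service container is reachable under the hostname "rabbitmq"
params = pika.URLParameters(os.environ["AMQP_URL"])
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="ci-smoke-test")  # hypothetical queue name
connection.close()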
I just installed OpenStack Juno using devstack, and observed that RabbitMQ (package rabbitmq-server-3.1.5-10 installed by yum) is not stable, i.e. it quickly eats up the memory and shuts down; the machine has 2G of RAM. Below are the messages from the logs and from 'systemctl status' before the daemon died:
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
vm_memory_high_watermark clear. Memory used:835116352 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:40 ===
memory resource limit alarm cleared on node rabbit@node
=INFO REPORT==== 18-Dec-2014::01:25:40 ===
accepting AMQP connection <0.27011.5> (10.0.0.11:55198 -> 10.0.0.11:5672)
=INFO REPORT==== 18-Dec-2014::01:25:41 ===
vm_memory_high_watermark set. Memory used:850213192 allowed:835212083
=WARNING REPORT==== 18-Dec-2014::01:25:41 ===
memory resource limit alarm set on node rabbit@node.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
rabbitmqctl[770]: ===========
rabbitmqctl[770]: nodes in question: [rabbit@node]
rabbitmqctl[770]: hosts, their running nodes and ports:
rabbitmqctl[770]: - node: [{rabbitmqctl770,40089}]
rabbitmqctl[770]: current node details:
rabbitmqctl[770]: - node name: rabbitmqctl770@node
rabbitmqctl[770]: - home dir: /var/lib/rabbitmq
rabbitmqctl[770]: - cookie hash: FftrRFUESg4RKWsyb1cPqw==
systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
systemd[1]: Unit rabbitmq-server.service entered failed state.
I know about set_vm_memory_high_watermark, but it doesn't solve the issue. I want to ensure that the daemon doesn't shut down abruptly. I wonder if someone has seen this before and could advise?
Thanks.
UPDATE
Upgraded to version 3.4.2, taken directly from www.rabbitmq.com/download.html. The new version doesn't consume RAM as fast and tends to work longer than the previous version, but eventually it still eats up all the memory and shuts down.
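For reference, this is the kind of watermark tuning I tried (a sketch of the classic rabbitmq.config syntax; the 0.4 shown here is just the default relative limit, not my actual value):

%% /etc/rabbitmq/rabbitmq.config
[{rabbit, [{vm_memory_high_watermark, 0.4}]}].

or transiently, until the next broker restart:

rabbitmqctl set_vm_memory_high_watermark 0.4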
I think the number of connections to the server is increasing, and they are being held open without being closed; that's why it is consuming more memory. When RAM usage increases beyond the watermark, the RabbitMQ server won't accept any network requests. Either you close the connections that are open, or you increase the RAM of the system. But increasing the RAM will only reduce the problem for a while, and then you'll face it again, so it is better to close the connections.
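For instance, a client that opens a connection per operation should close it explicitly instead of leaving it to garbage collection (a sketch using pika; the host and queue names are placeholders, and any AMQP client has an equivalent close call):

import pika

params = pika.ConnectionParameters(host="localhost")  # placeholder broker address
connection = pika.BlockingConnection(params)
try:
    channel = connection.channel()
    channel.queue_declare(queue="work")  # hypothetical queue
    channel.basic_publish(exchange="", routing_key="work", body="payload")
finally:
    # always release the connection, so the broker is not left holding
    # memory for thousands of idle connections and channels
    connection.close()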
Try using CloudAMQP instead of installing locally; that will fix this. First create a RabbitMQ account here: https://customer.cloudamqp.com/signup.
Then create your queue there and connect to it from your application.
I'm having some trouble with keeping RabbitMQ up.
I start it via the provided /etc/init.d/rabbitmq-server start, and it starts up fine. status shows that it's fine.
But after a while, the server dies. status prints
Error: unable to connect to node 'rabbit@myserver': nodedown
Checking the log file, it seems I've reached the memory threshold. Here are the logs:
# start
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
Limiting to approx 924 file handles (829 sockets)
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
Memory limit set to 723MB of 1807MB total.
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
Disk free limit set to 953MB
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
Management plugin upgraded statistics to fine.
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 26-Mar-2014::03:24:13 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index
=WARNING REPORT==== 26-Mar-2014::03:24:13 ===
msg_store_persistent: rebuilding indices from scratch
=INFO REPORT==== 26-Mar-2014::03:24:27 ===
started TCP Listener on [::]:5672
=INFO REPORT==== 26-Mar-2014::03:24:27 ===
Management agent started.
=INFO REPORT==== 26-Mar-2014::03:24:27 ===
Management plugin started. Port: 55672, path: /
=INFO REPORT==== 26-Mar-2014::03:24:39 ===
accepting AMQP connection <0.1999.0> (127.0.0.1:34788 -> 127.0.0.1:5672)
=WARNING REPORT==== 26-Mar-2014::03:24:40 ===
closing AMQP connection <0.1999.0> (127.0.0.1:34788 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 26-Mar-2014::03:24:42 ===
accepting AMQP connection <0.2035.0> (127.0.0.1:34791 -> 127.0.0.1:5672)
=INFO REPORT==== 26-Mar-2014::03:24:46 ===
accepting AMQP connection <0.2072.0> (127.0.0.1:34792 -> 127.0.0.1:5672)
=INFO REPORT==== 26-Mar-2014::03:25:19 ===
vm_memory_high_watermark set. Memory used:768651448 allowed:758279372
=INFO REPORT==== 26-Mar-2014::03:25:19 ===
alarm_handler: {set,{{resource_limit,memory,'rabbit@myserver'},
[]}}
=INFO REPORT==== 26-Mar-2014::03:25:48 ===
Statistics database started.
# server dies here
I seem to have been reaching the memory threshold, but from reading the docs, that shouldn't shut down the server, should it? Just prevent publishing until some memory is freed up?
And yes, I am aware that my celery workers are the cause of the memory usage; I'd just thought that RabbitMQ would handle it correctly, which the docs seem to imply. So am I doing something wrong?
EDIT: Refactored my task so its message is just a single string (max 15 chars). That doesn't seem to be making any difference.
I tried starting RabbitMQ and celery worker --purge with no events coming in to trigger the tasks, but RabbitMQ's memory usage still steadily climbs to 40% and then it crashes shortly afterwards, with none of my tasks having had the chance to run.
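For what it's worth, my reading of the alarm behaviour is that publishers should simply block; e.g. with pika you can even turn that block into a timeout instead of an indefinite hang (a sketch, assuming a pika version that supports blocked_connection_timeout; celery itself connects differently):

import pika

# raise an exception if the broker keeps the connection blocked
# (memory/disk alarm) for more than 30 seconds
params = pika.ConnectionParameters(host="localhost",
                                   blocked_connection_timeout=30)
connection = pika.BlockingConnection(params)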
Updating RabbitMQ to the official stable version fixed the issue. The RabbitMQ package in Ubuntu 12.04's repository was really old.
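At the time, pulling the current release from the upstream apt repository looked roughly like this (a sketch; the repository layout and signing key location may have changed since):

# add the rabbitmq.com apt repo and its signing key (historical layout)
echo 'deb http://www.rabbitmq.com/debian/ testing main' | sudo tee /etc/apt/sources.list.d/rabbitmq.list
wget -O- https://www.rabbitmq.com/rabbitmq-signing-key-public.asc | sudo apt-key add -
sudo apt-get update && sudo apt-get install rabbitmq-server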
We use RabbitMQ in our application. Two hours ago, one of our app servers blocked when trying to connect to RabbitMQ. After checking the RabbitMQ servers we found one node's memory was over the watermark, and a few minutes later that node went down. After restarting the node, the whole cluster seems to work fine, but I notice a lot of connections in blocking and blocked state in the web management UI. However, running rabbitmqctl list_connections pid name peer_address state on all nodes shows no connection in blocking/blocked state, so this really confuses me:
1. After one node of the cluster goes over the watermark while the other nodes keep working fine, why can't my application connect to the RabbitMQ cluster? (PS: we use spring-amqp & spring-rabbit version 1.1.0.RELEASE.)
2. For what reason does a node go down when it is over the watermark?
3. Why, after restarting the node, are there still blocking connections in the UI, while rabbitmqctl shows them all in running state?
Here are some logs from my RabbitMQ server:
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
alarm_handler: {clear,{resource_limit,memory,rabbit@cos22}}
When I try to close a blocked connection from the web management UI, it raises an error:
=INFO REPORT==== 1-Mar-2013::20:55:24 ===
Closing connection <0.17197.115> because "Closed via management plugin"
=ERROR REPORT==== 1-Mar-2013::20:55:24 ===
webmachine error: path="/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"
{throw,
{error,{not_a_connection_pid,<0.17197.115>}},
[{rabbit_networking,close_connection,2,
[{file,"src/rabbit_networking.erl"},{line,317}]},
{rabbit_mgmt_wm_connection,delete_resource,2,
[{file,"rabbitmq-management/src/rabbit_mgmt_wm_connection.erl"},
{line,52}]},
{webmachine_resource,resource_call,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,169}]},
{webmachine_resource,do,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,128}]},
{webmachine_decision_core,resource_call,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,48}]},
{webmachine_decision_core,decision,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,416}]},
{webmachine_decision_core,handle_request,2,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,33}]},
{rabbit_webmachine,'-makeloop/1-fun-0-',3,
[{file,"rabbitmq-mochiweb/src/rabbit_webmachine.erl"},{line,75}]}]}
rabbitmqctl shows them all in running state:
rabbitmqctl list_connections pid name peer_address state
Listing connections ...
<rabbit@cos23.1.1271.51> 10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197 running
<rabbit@cos23.1.1100.51> 10.64.13.196:57240 -> 10.64.12.225:5672 10.64.13.196 running
<rabbit@cos23.1.1056.51> 10.64.12.196:58608 -> 10.64.12.225:5672 10.64.12.196 running
<rabbit@cos23.1.1079.51> 10.64.11.235:48962 -> 10.64.12.225:5672 10.64.11.235 running
<rabbit@cos23.1.1419.51> 10.64.13.228:49857 -> 10.64.12.225:5672 10.64.13.228 running
<rabbit@cos23.1.1049.51> 10.64.11.193:36387 -> 10.64.12.225:5672 10.64.11.193 running
<rabbit@cos23.1.1159.51> 10.64.10.123:52017 -> 10.64.12.225:5672 10.64.10.123 running
<rabbit@cos23.1.26289.45> 10.64.12.247:38504 -> 10.64.12.225:5672 10.64.12.247 running
<rabbit@cos23.1.1121.51> 10.64.10.29:51483 -> 10.64.12.225:5672 10.64.10.29 running
<rabbit@cos23.1.1067.51> 10.64.11.234:50244 -> 10.64.12.225:5672 10.64.11.234 running
<rabbit@cos23.1.1149.51> 10.64.11.178:33795 -> 10.64.12.225:5672 10.64.11.178 running
<rabbit@cos23.1.1136.51> 10.64.10.28:39557 -> 10.64.12.225:5672 10.64.10.28 running
<rabbit@cos23.1.1370.51> 10.64.13.233:38766 -> 10.64.12.225:5672 10.64.13.233 running
<rabbit@cos23.1.1388.51> 10.64.13.229:50932 -> 10.64.12.225:5672 10.64.13.229 running
<rabbit@cos23.1.1254.51> 10.64.13.241:49311 -> 10.64.12.225:5672 10.64.13.241 running
<rabbit@cos23.1.1031.51> 10.64.11.195:39455 -> 10.64.12.225:5672 10.64.11.195 running
<rabbit@cos23.1.1038.51> 10.64.10.27:58938 -> 10.64.12.225:5672 10.64.10.27 running
<rabbit@cos23.1.1167.51> 10.64.13.240:37777 -> 10.64.12.225:5672 10.64.13.240 running
<rabbit@cos23.1.1442.51> 10.64.10.130:37251 -> 10.64.12.225:5672 10.64.10.130 running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200 running
...done.
And there is one connection with a lot of channels in blocked state, but I can't find this connection using rabbitmqctl list_connections:
AMQP 0-9-1
10.64.13.200:45891 -> 10.64.12.226:5672 on node rabbit@cos22
from client: 0B/s (49.2MB total)
to client: 0B/s (2.4MB total)
timeout: 0s, channels: 60920
Thanks a lot for any help and suggestions.
Got an answer from the RabbitMQ mailing list:
These connections / channels do not exist. You're seeing a bug in the
management plugin where it will retain information about connections
and channels that were alive on a cluster node when it crashed.
This bug was fixed in RabbitMQ 3.0.3.
I'm intermittently (about 20% of the time) getting an IOError exception from Celery when I attempt to retry a failed task.
Here is my task:
import urllib2

from celery.task import task  # celery 2.x style import, matching the traceback
from report.models import PK  # assumed location of the PK model

@task
def update_data(pk_id):
    try:
        pk = PK.objects.get(pk=pk_id)
        results = pk.get_update()
        return results
    except urllib2.HTTPError, exc:
        print "Let's retry in a few minutes."
        update_data.retry(exc=exc, countdown=600)
The exception:
[2011-10-07 11:35:53,594: ERROR/MainProcess] Task report.tasks.update_data[1babd4e3-45eb-4fa3-a497-68b67bb4a6df] raised exception: IOError()
Traceback (most recent call last):
File "/home/prj/prj_env/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/home/prj/prj/report/tasks.py", line 109, in update_data
update_data.retry(exc=exc, countdown=600)
File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/task/__init__.py", line 520, in retry
self.name, options["task_id"], args, kwargs))
HTTPError
RabbitMQ Logs
=INFO REPORT==== 7-Oct-2011::15:35:43 ===
closing TCP connection <0.4294.17> from 10.254.122.225:59704
=WARNING REPORT==== 7-Oct-2011::15:35:43 ===
exception on TCP connection <0.4330.17> from 10.254.122.225:59715
connection_closed_abruptly
=INFO REPORT==== 7-Oct-2011::15:35:43 ===
closing TCP connection <0.4330.17> from 10.254.122.225:59715
=WARNING REPORT==== 7-Oct-2011::15:35:49 ===
exception on TCP connection <0.4313.17> from 10.254.122.225:59709
connection_closed_abruptly
=INFO REPORT==== 7-Oct-2011::15:35:49 ===
closing TCP connection <0.4313.17> from 10.254.122.225:59709
=WARNING REPORT==== 7-Oct-2011::15:35:49 ===
exception on TCP connection <0.4350.17> from 10.254.122.225:59720
connection_closed_abruptly
=INFO REPORT==== 7-Oct-2011::15:35:49 ===
closing TCP connection <0.4350.17> from 10.254.122.225:59720
=INFO REPORT==== 7-Oct-2011::15:36:22 ===
accepted TCP connection on [::]:5672 from 10.255.199.63:50526
=INFO REPORT==== 7-Oct-2011::15:36:22 ===
starting TCP connection <0.4501.17> from 10.255.199.63:50526
Any ideas why this might be happening?
Thanks!
Maybe save each task in a database and retry it if no result is received after some time? Or maybe the dispatcher has its own persistent storage? And what happens if a worker thread crashes while receiving a task or while executing it?
Retry Lost or Failed Tasks (Celery, Django and RabbitMQ)
max_retries in celery defaults to 3, so if the same task fails 3 times in a row (i.e. your roughly 20% of the time), retry will rethrow the exception instead of scheduling another attempt.
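If a task needs to survive more transient failures, the retry policy can be raised on the task itself (a sketch against the old celery 2.x decorator API used in the question; the numbers and the do_update helper are illustrative, not from the original code):

import urllib2

from celery.task import task  # celery 2.x style import

@task(max_retries=10, default_retry_delay=600)  # raise the default of 3
def update_data(pk_id):
    try:
        return do_update(pk_id)  # hypothetical stand-in for the real work
    except urllib2.HTTPError, exc:
        # once max_retries is exhausted, retry() re-raises exc
        # instead of scheduling another attempt
        update_data.retry(exc=exc, countdown=600)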