celery+rabbitmq empty queue - rabbitmq

I am using Celery with RabbitMQ and can't find a convenient way to clear a queue. At the moment I do it by deleting and recreating the vhost:
rabbitmqctl delete_vhost <vhostpath>
rabbitmqctl add_vhost <vhostpath>
Is this the preferred way to clear a Celery queue?

I'm not quite sure how celery works, but I suspect you want to purge a RabbitMQ queue (you're currently simulating this by deleting the queues and having celery re-create them).
You could install RabbitMQ's Management Plugin. Its WebUI will allow you to purge the required queue. This should also tell you which queue you're aiming for, so you wouldn't need to delete everything.
Once you know which queue it is, you could purge it programmatically. For instance, using py-amqplib, you would do something like:
from amqplib import client_0_8 as amqp
conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest", virtual_host="/", insist=False)
conn = conn.channel()
conn.queue_purge("the-target-queue")
There's probably a better way to do it, though.
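If the Management Plugin is enabled, the purge can also be done over its HTTP API instead of the WebUI: sending DELETE to a queue's contents URL empties that queue. Here is a minimal sketch using only the standard library. The host, port 15672, the guest credentials and the queue name are placeholders, and the helper names are made up for illustration.

```python
import base64
import urllib.request
from urllib.parse import quote


def purge_url(host, port, vhost, queue):
    """Build the management-API URL for a queue's contents."""
    # vhost and queue name are single path segments, so "/" is sent as %2F
    return "http://%s:%d/api/queues/%s/%s/contents" % (
        host, port, quote(vhost, safe=""), quote(queue, safe=""))


def purge_queue(host, port, vhost, queue, user="guest", password="guest"):
    """Send DELETE to the contents URL, which purges the queue."""
    request = urllib.request.Request(
        purge_url(host, port, vhost, queue), method="DELETE")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the URL; call purge_queue(...) to actually purge.
    print(purge_url("localhost", 15672, "/", "the-target-queue"))
```

Only the named queue is touched, so other queues in the vhost keep their messages.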

If you are facing this problem because you used RabbitMQ as the result backend and ended up with too many queues, then I would suggest using a different result backend (Redis or MongoDB).
This is a well-known flaw in Celery: it creates a separate queue for each result if you use amqp as the result backend.
If you still want to stick with amqp as the result backend, the queues will clear themselves in 24 hours. You can, however, set a smaller value using the CELERY_AMQP_TASK_RESULT_EXPIRES setting.

If you need to purge ALL queues in a vhost (especially when the list is long):
1) Save all queue names into a file
sudo rabbitmqctl list_queues -p /yourvhost name > queues.txt
Don't forget to remove the first and last lines from 'queues.txt'.
2) Use the Python code mentioned above to do the job
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="127.0.0.1:5672", userid="guest", password="guest", virtual_host="/yourvhost", insist=False)
chan = conn.channel()
with open('queues.txt', 'r') as f:
    # one queue name per line; blank lines are skipped
    queues = [line.strip() for line in f if line.strip()]
for q in queues:
    #print('purging %s' % q)
    chan.queue_purge(q)
print('purged %d queues' % len(queues))
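The manual clean-up of the listing can be skipped by filtering out the non-queue lines in code. The sketch below makes assumptions about what those lines look like ("Listing queues ...", "...done.", a "Timeout:" line and a "name" column header, depending on the rabbitmqctl version), so check it against your own output. The function names are illustrative.

```python
import subprocess


def parse_queue_names(listing):
    """Return queue names from `rabbitmqctl list_queues name` output."""
    names = []
    for line in listing.splitlines():
        line = line.strip()
        # Skip blanks and informational lines printed around the data.
        if not line or line.startswith(("Listing queues", "Timeout:")):
            continue
        # Caveat: a queue literally called "name" would be skipped too.
        if line in ("...done.", "name"):
            continue
        names.append(line)
    return names


def list_queue_names(vhost="/yourvhost"):
    """Run rabbitmqctl (must be on PATH) and parse its output."""
    output = subprocess.check_output(
        ["rabbitmqctl", "list_queues", "-p", vhost, "name"])
    return parse_queue_names(output.decode("utf-8"))
```

The returned list can then be fed to the purge loop in place of the lines read from queues.txt.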

RabbitMQ delete a corrupted queue after node crash

RabbitMQ Version 3.7.21
Erlang Version Erlang 21.3.8.10
My team had 2 nodes hit the memory watermark last night and so I rebuilt the bad nodes but it left some queues in a bad state. I want to clear them out so that we can recreate them.
The stats show NaN for Ready, Unacked, and Total and the stats in queue look like:
It looks like the queue's node is one that no longer exists so unfortunately I can't access it. It's completely gone.
I have tried the following commands:
rabbitmqctl eval 'Q = rabbit_misc:r(<<"/">>, queue, <<"QUEUE">>), rabbit_amqqueue:internal_delete(Q).'
rabbitmqctl eval 'Q = {resource, <<"/">>, queue, <<"QUEUE">>}, rabbit_amqqueue:internal_delete(Q).'
but get this error:
{:undef, [{:rabbit_amqqueue, :internal_delete, [{:resource, "/", :queue, "QUEUE"}], []}, {:erl_eval, :do_apply, 6, [file: 'erl_eval.erl', line: 680]}, {:rpc, :"-handle_call_call/6-fun-0-", 5, [file: 'rpc.erl', line: 197]}]}
Which I assume means it's trying to make an RPC call to a node that no longer exists and it fails. This seems crazy to me: not only is the node gone, it has also been forgotten from the cluster, yet a couple of queues still remain.
Looks like there are 3 options:
1) Comb through the Mnesia tables and delete the corrupted ones
2) Fully rebuild the cluster and migrate to a new cluster
3) Rename your queues and ignore corrupted ones
We're going to go with Option 3 for now. I'm sure there will eventually be a breaking change in RabbitMQ that makes Option 2 more appealing, but for now the quick fix is best for me.
According to https://groups.google.com/g/rabbitmq-users/c/VSjzvOUfS3s/m/q8OmFTqACAAJ, the internal_delete function in 3.7.x takes two arguments:
In 3.7.x rabbit_amqqueue:internal_delete takes two arguments (acting user name is the second one).
Therefore, the next time you need to delete a queue in a bad state, try
rabbitmqctl eval 'Q = {resource, <<"/">>, queue, <<"QUEUE">>}, rabbit_amqqueue:internal_delete(Q, <<"CLI">>).'

kombu not reconnecting to RabbitMQ

I have two servers, call them A and B. B runs RabbitMQ, while A connects to RabbitMQ via Kombu. If I restart RabbitMQ on B, the kombu connection breaks, and the messages are no longer delivered. I then have to reset the process on A to re-establish the connection. Is there a better approach, i.e. is there a way for Kombu to re-connect automatically, even if the RabbitMQ process is restarted?
My basic code implementation is below, thanks in advance! :)
def start_consumer(routing_key, incoming_exchange_name, outgoing_exchange_name):
    global rabbitmq_producer
    incoming_exchange = kombu.Exchange(name=incoming_exchange_name, type='direct')
    incoming_queue = kombu.Queue(name=routing_key+'_'+incoming_exchange_name, exchange=incoming_exchange, routing_key=routing_key)#, auto_delete=True)
    outgoing_exchange = kombu.Exchange(name=outgoing_exchange_name, type='direct')
    rabbitmq_producer = kombu.Producer(settings.rabbitmq_connection0, exchange=outgoing_exchange, serializer='json', compression=None, auto_declare=True)
    settings.rabbitmq_connection0.connect()
    if settings.rabbitmq_connection0.connected:
        callbacks=[]
        queues=[]
        callbacks.append(callback)
        # if push_queue:
        #     callbacks.append(push_message_callback)
        queues.append(incoming_queue)
        print 'opening a new *incoming* rabbitmq connection to the %s exchange for the %s queue' % (incoming_exchange.name, incoming_queue.name)
        incoming_exchange(settings.rabbitmq_connection0).declare()
        incoming_queue(settings.rabbitmq_connection0).declare()
        print 'opening a new *outgoing* rabbitmq connection to the %s exchange' % outgoing_exchange.name
        outgoing_exchange(settings.rabbitmq_connection0).declare()
        with settings.rabbitmq_connection0.Consumer(queues=queues, callbacks=callbacks) as consumer:
            while True:
                settings.rabbitmq_connection0.drain_events()
On the consumer side, kombu.mixins.ConsumerMixin handles reconnecting when the connection goes away (it also does heartbeats, etc., and lets you write less code). There doesn't seem to be a ProducerMixin, unfortunately, but you could potentially dig into the code and adapt it.

celery - Programmatically list queues

How can I programmatically, using Python code, list current queues created on a RabbitMQ broker and the number of workers connected to them? It would be the equivalent to:
rabbitmqctl list_queues name consumers
I do it this way and display all the queues and their details (messages ready, unacknowledged etc.) on a web page -
import kombu
conn = kombu.Connection(broker_url)  # example: 'amqp://guest:guest@localhost:5672/'
conn.connect()
client = conn.get_manager()
queues = client.get_queues('/')#assuming vhost as '/'
You will need kombu to be installed and queues will be a dictionary with keys representing the queue names.
I think I got this when digging through the code of celery flower (The tool used for monitoring celery).
Update: As pointed out by @zaq178miami, you will also need the management plugin, which provides the HTTP API. I had forgotten that I had enabled that in RabbitMQ.
This way did it for me:
def get_queue_info(queue_name):
    with celery.broker_connection() as conn:
        with conn.channel() as channel:
            return channel.queue_declare(queue_name, passive=True)
This will return a namedtuple with the name, number of messages waiting and consumers of that queue.
ksrini's answer is correct too and can be used when you need more information about a queue.
Thanks to Ask Solem who gave me the hint.
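To make the returned namedtuple easier to work with, its fields can be copied into a dict. This is a sketch: queue_stats is a hypothetical helper, and FakeChannel only stands in for a real channel so the example runs without a broker. The field names queue, message_count and consumer_count are the ones py-amqp uses for its queue-declare reply.

```python
from collections import namedtuple


def queue_stats(channel, queue_name):
    """Passively declare a queue and return its counters as a dict."""
    # passive=True only checks the queue; it never creates or changes it
    declared = channel.queue_declare(queue=queue_name, passive=True)
    return {
        "name": declared.queue,
        "messages": declared.message_count,
        "consumers": declared.consumer_count,
    }


class FakeChannel:
    """Stand-in for a real channel (illustration only)."""
    Reply = namedtuple("Reply", "queue message_count consumer_count")

    def queue_declare(self, queue, passive=False):
        return self.Reply(queue, 3, 1)


print(queue_stats(FakeChannel(), "celery"))
```

With a real broker, pass the channel from the snippet above in place of FakeChannel(). Keep in mind that a passive declare fails if the queue does not exist.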
As a RabbitMQ client you can use pika. However, it has no equivalent of list_queues. The easiest solution would be to call the rabbitmqctl command from Python using subprocess:
import subprocess
command = "/usr/local/sbin/rabbitmqctl list_queues name consumers"
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
# communicate() returns a (stdout, stderr) tuple
print(process.communicate())
I would simply use this:
Just replace the user (default: guest), passwd (default: guest) and port with your values.
import requests
import json

def call_rabbitmq_api(host, port, user, passwd):
    # management plugin HTTP API; verify=False skips TLS certificate checks
    url = 'https://%s:%s/api/queues' % (host, port)
    r = requests.get(url, auth=(user, passwd), verify=False)
    return r

def get_queue_name(json_list):
    res = []
    # loop variable renamed so it no longer shadows the json module
    for item in json_list:
        res.append(item["name"])
    return res

if __name__ == '__main__':
    host = 'rabbitmq_host'
    port = 55672  # management port before RabbitMQ 3.0; 15672 since then
    user = 'guest'
    passwd = 'guest'
    res = call_rabbitmq_api(host, port, user, passwd)
    print("--- dump json ---")
    print(json.dumps(res.json(), indent=4))
    print("--- get queue name ---")
    q_name = get_queue_name(res.json())
    print(q_name)
Referred from here: https://gist.github.com/hiroakis/5088513#file-example_rabbitmq_api-py-L2
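Since the question also asks for the number of workers attached to each queue, the same /api/queues response can be reduced to a name-to-consumers mapping, which is roughly what rabbitmqctl list_queues name consumers prints. This is a sketch; summarize_queues is a hypothetical helper and the sample payload is a trimmed, made-up example of the response shape.

```python
import json


def summarize_queues(api_payload):
    """Map queue name -> consumer count from an /api/queues JSON document."""
    queues = json.loads(api_payload)
    # .get() guards against entries that lack the counter
    return {q["name"]: q.get("consumers", 0) for q in queues}


sample = '[{"name": "celery", "vhost": "/", "consumers": 2, "messages": 0}]'
print(summarize_queues(sample))
```

In practice the payload would be the response body of the HTTP call made above rather than a literal string.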

Celery with rabbitmq creates results multiple queues

I have installed Celery with RabbitMQ.
The problem is that for every result that is returned, Celery creates a queue in RabbitMQ named after the task's ID, in the exchange celeryresults.
I still want to have results, but on ONE queue.
my celeryconfig:
from datetime import timedelta
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp'
#CELERY_IGNORE_RESULT = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json', 'application/json']
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
    'every-minute': {
        'task': 'tasks.remote',
        'schedule': timedelta(seconds=30),
        'args': (),
    },
}
Is that possible? How?
Thanks!
The amqp backend creates a new queue for each task. Alternatively, there is a newer rpc backend, which keeps results in a single queue per client instead of one queue per task.
http://docs.celeryproject.org/en/master/whatsnew-3.1.html#new-rpc-result-backend
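Applied to the celeryconfig from the question, switching backends would be a one-line change. This is a sketch using the old-style uppercase setting names from the question and assuming Celery 3.1 or later; the rest of the config stays as it is.

```python
# celeryconfig.py (sketch): only the result backend line changes
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'rpc://'  # was 'amqp'
```

One trade-off to check before switching: with the rpc backend a result is delivered back to the client that sent the task, so another process cannot fetch it later by task ID.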
Nothing unusual.
That is how Celery works when amqp is used as the result backend. It creates a new temporary queue for every result, one for each task that the worker consumes.
If you are not interested in the result, you can try the CELERY_IGNORE_RESULT = True setting.
If you do want to store the result, then I would recommend using a different result backend such as Redis.
You say you want Celery to keep the result on one queue. Now, to answer your question, let me ask you one:
How do you expect each producer to check for its relevant result without reading every single message off the queue to find the one it needs/wants?
In essence, what you want is a database of key-value pairs so that the lookup is O(1). The only way to do that with a queue broker is to create one queue for each "pair".
I understand that having many GUID queues is not neat or pretty, but it's conceptually the only way to do it on a messaging broker.
This solution won't keep all the results to ONE queue, but it will at least clean up the extra queues right when you're done with them.
If you use Redis as your backend, when you're done with a result that has created an errant queue, run result.forget(). This will cause both the result and the queue for the result to disappear. This can help you manage the number of queues you have, and prevent OOM issues.
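A small helper can make the "read, then forget" pattern hard to skip. This is a sketch: fetch_and_forget is a hypothetical name, and async_result is assumed to be the AsyncResult returned by delay() or apply_async().

```python
def fetch_and_forget(async_result, timeout=10):
    """Read a task result, then ask the backend to drop it."""
    # get() blocks until the task finishes or the timeout expires
    value = async_result.get(timeout=timeout)
    # forget() is only reached once the result was read successfully
    async_result.forget()
    return value
```

Usage would look like value = fetch_and_forget(some_task.delay(arg)), where some_task stands for any of your own tasks.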

Why does celery add thousands of queues to rabbitmq that seem to persist long after the tasks complete?

I am using celery with a rabbitmq backend. It is producing thousands of queues with 0 or 1 items in them in rabbitmq like this:
$ sudo rabbitmqctl list_queues
Listing queues ...
c2e9b4beefc7468ea7c9005009a57e1d 1
1162a89dd72840b19fbe9151c63a4eaa 0
07638a97896744a190f8131c3ba063de 0
b34f8d6d7402408c92c77ff93cdd7cf8 1
f388839917ff4afa9338ef81c28aad75 0
8b898d0c7c7e4be4aa8007b38ccc00ea 1
3fb4be51aaaa4ac097af535301084b01 1
This seems to be inefficient, but further I have observed that these queues persist long after processing is finished.
I have found the task that appears to be doing this:
@celery.task(ignore_result=True)
def write_pages(page_generator):
    g = group(render_page.s(page) for page in page_generator)
    res = g.apply_async()
    for rendered_page in res:
        print rendered_page # TODO: print to file
It seems that because these tasks are being called in a group, they are being thrown into the queue but never being released. However, I am clearly consuming the results (as I can view them being printed when I iterate through res). So, I do not understand why those tasks are persisting in the queue.
Additionally, I am wondering if the large number of queues being created is some indication that I am doing something wrong.
Thanks for any help with this!
Celery with the AMQP backend will store task tombstones (results) in an AMQP queue named with the task ID that produced the result. These queues will persist even after the results are drained.
A couple of recommendations:
1) Apply ignore_result=True to every task you can. Don't depend on results from other tasks.
2) Switch to a different backend (perhaps Redis -- it's more efficient anyway): http://docs.celeryproject.org/en/latest/userguide/tasks.html
3) Use CELERY_TASK_RESULT_EXPIRES (or on 4.1 CELERY_RESULT_EXPIRES) to have a periodic cleanup task remove old data from rabbitmq.
http://docs.celeryproject.org/en/master/userguide/configuration.html#std:setting-result_expires
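For the expiry recommendation, the setting is a single line in the Celery config. This is a sketch with an arbitrary one-hour value; pick whatever retention your callers actually need.

```python
# celeryconfig.py (sketch): expire stored results after one hour
CELERY_TASK_RESULT_EXPIRES = 3600  # seconds; a timedelta is also accepted

# Lowercase name used by newer Celery versions, per the linked page:
# result_expires = 3600
```

Results that nobody has read by then are gone, so the value should comfortably exceed the longest time a caller waits before fetching a result.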