Task queues and result queues with Celery and Rabbitmq - rabbitmq

I have implemented Celery with RabbitMQ as Broker. I rely on Celery v4.4.7 since I have read that v5.0+ doesn't support RabbitMQ anymore. RabbitMQ is a MUST in my case.
Everything has been containerized then deployed as pods within Kubernetes 1.19. I am able to execute long running tasks and everything apparently looks fine at first glance. However, I have few concerns which require your expertise.
I have declared inbound and outbound queues but Celery created his owns and I do not see any message within those queues (inbound or outbound) :
inbound_queue = "_IN"
outbound_queue = "_OUT"
app = Celery()
app.conf.update(
broker_url = 'pyamqp://%s//' % path,
broker_heartbeat = None,
broker_connection_timeout = int(timeout)
result_backend = 'rpc://',
result_persistent = True,
task_queues = (
Queue(algorithm_queue, Exchange(inbound_queue), routing_key='default', auto_delete=False),
Queue(result_queue, Exchange(outbound_queue), routing_key='default', auto_delete=False),
),
task_default_queue = inbound_queue,
task_default_exchange = inbound_exchange,
task_default_exchange_type = 'direct',
task_default_routing_key = 'default',
)
#app.task(bind=True,
name='osmq.tasks.add',
queue=inbound_queue,
reply_to = outbound_queue,
autoretry_for=(Exception,),
retry_kwargs={'max_retries': 5, 'countdown': 2})
def execute(self, data):
<method_implementation>
I have implemented callbacks to get results back via REST APIs. However, randomly, it can return or not some results when the status is successfull. This is probably related to message persistency. In details, when I implement flower API to get info, status is successfull and the result is partially displayed (shortened json messages) - when I call AsyncResult, for the same status, result is either None or the right one. I do not understand the mechanism between rabbitmq queues and kombu which seems to cache the resulting message. I must guarantee to retrieve results everytime the task has been successfully executed.
def callback(uuid):
task = app.AsyncResult(uuid)

Specifically, it was that Celery 5.0+ did not support amqp:// as a result back end anymore. However, as your example, rpc:// is supported.
The relevant snippet is here: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/index.html#rabbitmq
We tend to always ignore_results=True in our implementation, so I can't give any practical tips of how to use rpc://, other than to infer that any response is put on an application-specific queue, instead of being able to put on a specified queue (or even different broker / rabbitmq instance) via amqp://.

Related

RabbitMQ routing key does not route

i'm trying to do a simple message queue with RabitMQ i push a message with create_message
and then trying to get the message by the routing key.
it works great when the routing key is the same. the problem is when the routing key is different i keep on getting the message with the wrong routing key:
for example
def callback(ch, method, properties, body):
print("%r:%r" % (method.routing_key, body))
def create_message(self):
connection = pika.BlockingConnection(pika.ConnectionParameters(
'localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='www')
channel.queue_declare(queue='hello')
channel.basic_publish(exchange='www',
routing_key="11",
body='Hello World1111!')
connection.close()
self.get_analysis_task_celery()
def get_message(self):
connection = pika.BlockingConnection(pika.ConnectionParameters(
'localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='www')
timeout = 1
connection.add_timeout(timeout, on_timeout)
channel.queue_bind(exchange="www", queue="hello", routing_key="10")
channel.basic_consume(callback,
queue='hello',
no_ack=True,
consumer_tag= "11")
channel.start_consuming()
example for my output: '11':'Hello World1111!'
what am i doing wrong?
tnx for the help
this is a total guess, since i can't see your rabbitmq server..
if you open the RabbitMQ management website and look at your exchange, you will probably see that the exchange is bound to the queue for routing key 10 and 11, both of which are bound to the same queue.
since both go to the same queue, your message will always be delivered to that queue, the consumer will always pick up the message
again, i'm guessing since i can't see your server. but check the server to make sure you don't have leftover / extra bindings

Consuming from rabbitmq using Celery

I am using Celery with Rabbitmq broker on Server A. Some tasks require interaction with another server say, Server B and I am using Rabbitmq queues for this interaction.
Queue 1 - Server A (Producer), Server B (Consumer)
Queue 2 - Server B (Producer), Server A (Consumer)
My celery is unexpectedly hanging and I have found the reason to be incorrect implementation of Server A consumer code.
channel.start_consuming() keeps polling Rabbitmq as expected however putting this in a celery task creates multiple pollers which don't expire. I can add expiry but the time completion for the data being sent to Server B cannot be guaranteed. The code pasted below is one method I used to tackle the issue but I am not convinced this is best solution.
I wish to know what I am doing wrong and what is the right way to implement this because I have failed searching for articles on the web. Any tips, insights and even links to articles would be extremely helpful.
Finally, my code -
#celery.task
def task_a(data):
do_some_processing
# Create only 1 Rabbitmq consumer instance to avoid celery hangups
task_d.delay()
#celery.task
def task_b(data):
do_some_processing
if data is not None:
task_c.delay()
#celery.task
def task_c():
data = some_data
data = json.dumps(data)
conn_params = pika.ConnectionParameters(host=RABBITMQ_HOST)
connection = pika.BlockingConnection(conn_params)
channel = connection.channel()
channel.queue_declare(queue=QUEUE_1)
channel.basic_publish(exchange='',
routing_key=QUEUE_1,
body=data)
channel.close()
#celery.task
def task_d():
def queue_helper(ch, method, properties, body):
'''
Callback from queue.
'''
data = json.loads(body)
task_b.delay(data)
conn_params = pika.ConnectionParameters(host=RABBITMQ_HOST)
connection = pika.BlockingConnection(conn_params)
channel = connection.channel()
channel.queue_declare(queue=QUEUE_2)
channel.basic_consume(queue_helper,
queue=QUEUE_2,
no_ack=True)
channel.start_consuming()
channel.close()

Celery with rabbitmq creates results multiple queues

I have installed Celery with RabbitMQ.
Problem is that for every result that is returned, Celery will create in the Rabbit, queue with the task's ID in the exchange celeryresults.
I still want to have results, but on ONE queue.
my celeryconfig:
from datetime import timedelta
OKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp'
#CELERY_IGNORE_RESULT = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json', 'application/json']
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
'every-minute': {
'task': 'tasks.remote',
'schedule': timedelta(seconds=30),
'args': (),
},
}
Is that possible? How?
Thanks!
amqp backend creates a new queue for each task. Alternatively, there is a new rpc backend which keeps results in a single queue.
http://docs.celeryproject.org/en/master/whatsnew-3.1.html#new-rpc-result-backend
Nothing unusual.
That is how celery works when we use amqp as result backend. It will create a new temporary queue for every result corresponding to each tasks that worker consumes.
If you are not interested in the result, you can try CELERY_IGNORE_RESULT = True setting
If you do want to store the result, then i would recommend using a different result backend like Redis.
You say you want Celery to keep the result on one queue. Now, to answer your question, let me ask you one:
How do you expect each producer to check for it's relevant result without reading every single message off the queue to find the one it needs/wants?
In essence, what you want is a database of key-value pairs so that the lookup is O(1). The only way to do that with a queue broker is to create one queue for each "pair".
I understand that having many GUID queues is not neat or pretty, but it's conceptually the only way to do it on a messaging broker.
This solution won't keep all the results to ONE queue, but it will at least clean up the extra queues right when you're done with them.
If you use Redis as your backend, when you're done with a result that has created an errant queue, run result.forget(). This will cause both the result and the queue for the result to disappear. This can help you manage the number of queues you have, and prevent OOM issues.

Why does celery add thousands of queues to rabbitmq that seem to persist long after the tasks completel?

I am using celery with a rabbitmq backend. It is producing thousands of queues with 0 or 1 items in them in rabbitmq like this:
$ sudo rabbitmqctl list_queues
Listing queues ...
c2e9b4beefc7468ea7c9005009a57e1d 1
1162a89dd72840b19fbe9151c63a4eaa 0
07638a97896744a190f8131c3ba063de 0
b34f8d6d7402408c92c77ff93cdd7cf8 1
f388839917ff4afa9338ef81c28aad75 0
8b898d0c7c7e4be4aa8007b38ccc00ea 1
3fb4be51aaaa4ac097af535301084b01 1
This seems to be inefficient, but further I have observed that these queues persist long after processing is finished.
I have found the task that appears to be doing this:
#celery.task(ignore_result=True)
def write_pages(page_generator):
g = group(render_page.s(page) for page in page_generator)
res = g.apply_async()
for rendered_page in res:
print rendered_page # TODO: print to file
It seems that because these tasks are being called in a group, they are being thrown into the queue but never being released. However, I am clearly consuming the results (as I can view them being printed when I iterate through res. So, I do not understand why those tasks are persisting in the queue.
Additionally, I am wondering if the large number queues that are being created is some indication that I am doing something wrong.
Thanks for any help with this!
Celery with the AMQP backend will store task tombstones (results) in an AMQP queue named with the task ID that produced the result. These queues will persist even after the results are drained.
A couple recommendations:
Apply ignore_result=True to every task you can. Don't depend on results from other tasks.
Switch to a different backend (perhaps Redis -- it's more efficient anyway): http://docs.celeryproject.org/en/latest/userguide/tasks.html
Use CELERY_TASK_RESULT_EXPIRES (or on 4.1 CELERY_RESULT_EXPIRES) to have a periodic cleanup task remove old data from rabbitmq.
http://docs.celeryproject.org/en/master/userguide/configuration.html#std:setting-result_expires

Pika + RabbitMQ: setting basic_qos to prefetch=1 still appears to consume all messages in the queue

I've got a python worker client that spins up a 10 workers which each hook onto a RabbitMQ queue. A bit like this:
#!/usr/bin/python
worker_count=10
def mqworker(queue, configurer):
connection = pika.BlockingConnection(pika.ConnectionParameters(host='mqhost'))
channel = connection.channel()
channel.queue_declare(queue=qname, durable=True)
channel.basic_consume(callback,queue=qname,no_ack=False)
channel.basic_qos(prefetch_count=1)
channel.start_consuming()
def callback(ch, method, properties, body):
doSomeWork();
ch.basic_ack(delivery_tag = method.delivery_tag)
if __name__ == '__main__':
for i in range(worker_count):
worker = multiprocessing.Process(target=mqworker)
worker.start()
The issue I have is that despite setting basic_qos on the channel, the first worker to start accepts all the messages off the queue, whilst the others sit there idle. I can see this in the rabbitmq interface, that even when I set worker_count to be 1 and dump 50 messages on the queue, all 50 go into the 'unacknowledged' bucket, whereas I'd expect 1 to become unacknowledged and the other 49 to be ready.
Why isn't this working?
I appear to have solved this by moving where basic_qos is called.
Placing it just after channel = connection.channel() appears to alter the behaviour to what I'd expect.