Queues with random GUID being generated in RabbitMQ server - rabbitmq

Queues with random GUID names are being created from the exchange 'celeryresults'.
This happened when I fired a task from the shell using the delay method, but forgot to pass the parameters of my original function in the argument list of delay.
Error displayed in the terminal where I run the Celery worker:
[2015-02-20 18:42:48,547: ERROR/MainProcess] Task customers.tasks.sendmail_task[1a4daf49-81bf-4122-8dea-2ee76c2a2ff8] raised unexpected: TypeError('sendmail_task() takes exactly 4 arguments (0 given)',)
Traceback (most recent call last):
  File "/home/cod/workspace/envs/cod/lib/python2.6/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/cod/workspace/envs/cod/lib/python2.6/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
TypeError: sendmail_task() takes exactly 4 arguments (0 given)
How do I stop random queues from being generated? Why won't these messages use the default queue?

There is a difference between the broker (sends/receives messages) and backend (stores/fetches task results) in celery. It sounds like you are using RabbitMQ both as the message broker and the result backend.
When RabbitMQ is used as a result backend, celery creates one queue per task to temporarily keep track of the result. This is described in the RabbitMQ Result Backend section of the docs.
If you don't want this behavior, you should either turn results off using CELERY_IGNORE_RESULT or switch to one of the other backend implementations listed in the Result Backend Settings.
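For example, a sketch of either option using the old-style (pre-4.0) setting names that match CELERY_IGNORE_RESULT above; the Redis URL is only an illustration:

# celeryconfig.py (or wherever your Celery settings live)

# Option 1: don't store results at all, so no per-task result queues are created.
CELERY_IGNORE_RESULT = True

# Option 2: keep results, but store them somewhere other than RabbitMQ,
# e.g. a Redis result backend (example URL, adjust to your setup):
# CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'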

Related

Task queues and result queues with Celery and Rabbitmq

I have implemented Celery with RabbitMQ as the broker. I rely on Celery v4.4.7 since I have read that v5.0+ doesn't support RabbitMQ anymore; RabbitMQ is a MUST in my case.
Everything has been containerized and deployed as pods within Kubernetes 1.19. I am able to execute long-running tasks and everything looks fine at first glance. However, I have a few concerns which require your expertise.
I have declared inbound and outbound queues, but Celery creates its own queues and I do not see any messages within mine (inbound or outbound):
inbound_queue = "_IN"
outbound_queue = "_OUT"
app = Celery()
app.conf.update(
broker_url = 'pyamqp://%s//' % path,
broker_heartbeat = None,
broker_connection_timeout = int(timeout)
result_backend = 'rpc://',
result_persistent = True,
task_queues = (
Queue(algorithm_queue, Exchange(inbound_queue), routing_key='default', auto_delete=False),
Queue(result_queue, Exchange(outbound_queue), routing_key='default', auto_delete=False),
),
task_default_queue = inbound_queue,
task_default_exchange = inbound_exchange,
task_default_exchange_type = 'direct',
task_default_routing_key = 'default',
)
#app.task(bind=True,
name='osmq.tasks.add',
queue=inbound_queue,
reply_to = outbound_queue,
autoretry_for=(Exception,),
retry_kwargs={'max_retries': 5, 'countdown': 2})
def execute(self, data):
<method_implementation>
I have implemented callbacks to get results back via REST APIs. However, when the status is successful, the result is sometimes returned and sometimes not. This is probably related to message persistence. In detail, when I use the Flower API to get the task info, the status is successful and the result is partially displayed (shortened JSON messages); when I call AsyncResult for the same status, the result is either None or the right one. I do not understand the mechanism between the RabbitMQ queues and kombu, which seems to cache the resulting message. I must guarantee that I can retrieve the result every time the task has been successfully executed.
def callback(uuid):
    task = app.AsyncResult(uuid)
Specifically, it is that Celery 5.0+ no longer supports amqp:// as a result backend. However, as in your example, rpc:// is still supported.
The relevant snippet is here: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/index.html#rabbitmq
We always set ignore_result=True in our implementation, so I can't give any practical tips on how to use rpc://, other than to infer that any response is put on an application-specific queue, instead of being able to put it on a queue you specify (or even a different broker / RabbitMQ instance) via amqp://.
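To make the distinction concrete, here is a minimal sketch of the rpc:// pattern (the app and task names are placeholders, not from the question): the process that sends the task must also be the one that reads the result, because the reply queue belongs to that client instance.

from celery import Celery

app = Celery('proj', broker='pyamqp://guest@localhost//', backend='rpc://')

@app.task(name='proj.add')
def add(x, y):
    return x + y

# The same client that sent the task collects the result; with rpc:// the
# reply goes to a queue owned by this app instance's connection.
result = add.delay(2, 3)
print(result.get(timeout=10))

# If you never read results, disable them per task instead:
# @app.task(name='proj.add', ignore_result=True)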

Python rq-scheduler: enqueue a failed job after some interval

I am using Python RQ to execute a job in the background. The job calls a third-party REST API and stores the response in the database (refer to the code below):
@classmethod
def fetch_resource(cls, resource_id):
    import requests
    clsmgr = cls(resource_id)
    clsmgr.__sign_headers()
    res = requests.get(url=f'http://api.demo-resource.com/{resource_id}', headers=clsmgr._headers)
    if not res.ok:
        raise MyThirdPartyAPIException(res)
    ....
The third-party API has a rate limit of about 7 requests/minute. I have created a retry handler to gracefully handle the 429 Too Many Requests HTTP status code and re-queue the job after a minute (the time unit changes based on the rate limit). To re-queue the job after some interval I am using rq-scheduler.
Please find the handler code below:
from redis import Redis
from rq_scheduler import Scheduler

def retry_failed_job(job, exc_type, exc_value, traceback):
    if isinstance(exc_value, MyThirdPartyAPIException) and exc_value.status_code == 429:
        import datetime as dt
        sch = Scheduler(connection=Redis())
        # sch.enqueue_in(dt.timedelta(seconds=60), job.func_name, *job.args, **job.kwargs)
I am facing issues re-queueing the failed job back into the task queue, as I cannot directly call sch.enqueue_in(dt.timedelta(seconds=60), job) in the handler code (as per the docs, the argument has to represent the delayed function call, not a Job instance). How can I re-queue the job's function with all of its args and kwargs?
Ahh, the following statement does the trick:
sch.enqueue_in(dt.timedelta(seconds=60), job.func, *job.args, **job.kwargs)
I'll leave the question open; let me know if anyone has a better approach to this.
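For reference, a sketch of the handler with that statement in place (MyThirdPartyAPIException and the 60-second delay come from the question; the plain Redis() connection is an assumption):

import datetime as dt

from redis import Redis
from rq_scheduler import Scheduler

def retry_failed_job(job, exc_type, exc_value, traceback):
    # Only re-queue when the third-party API answered 429 Too Many Requests.
    # MyThirdPartyAPIException comes from the question's own code base.
    if isinstance(exc_value, MyThirdPartyAPIException) and exc_value.status_code == 429:
        sch = Scheduler(connection=Redis())
        # Re-enqueue the original callable with its original arguments.
        sch.enqueue_in(dt.timedelta(seconds=60), job.func, *job.args, **job.kwargs)
        return False  # stop RQ's remaining exception handlers for this job
    return True       # otherwise fall through to the next handler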

Inserting Celery tasks directly into Redis

I have an Erlang system. I want this system to be able to trigger Celery tasks on another, Python-based system. They share the same host, and Celery is using Redis as its broker.
Is it possible to insert tasks for Celery directly into Redis (in my case, from Erlang), instead of using a Celery API?
Yes, you can insert tasks directly into Redis, or whatever broker you are using with Celery.
You'll have to match the Celery serialization format (JSON by default) and figure out which keys it inserts into. The key structure used isn't clearly documented, but this part of the source code is a good place to start.
You can also use the Redis MONITOR command to watch which keys Celery uses in real time.
According to the task message definition from the Celery docs, the body of the message has the following format (for version 5.2):
body = (
    object[] args,
    Mapping kwargs,
    Mapping embed {
        'callbacks': Signature[] callbacks,
        'errbacks': Signature[] errbacks,
        'chain': Signature[] chain,
        'chord': Signature chord_callback,
    }
)
Therefore, to trigger a task you should put a message with a body like this onto the broker queue that your Celery workers consume from (shown here as a Python data structure):
[
    ['arg1', 'arg2'],                      # positional arguments for the task
    {'kwarg1': 'val1', 'kwarg2': 'val2'},  # keyword arguments for the task
    {'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}
]
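Note that this body is only the innermost part of the message. With the Redis broker, Celery's transport wraps it in an envelope (headers plus delivery properties) and LPUSHes the JSON onto a list named after the queue ('celery' by default). A rough sketch with redis-py, assuming the default JSON serializer and task protocol v2; the field names were taken from messages observed with MONITOR, so double-check them against your Celery version:

import base64
import json
import uuid

import redis

task_id = str(uuid.uuid4())

# Inner body, as described above: args, kwargs, embed.
body = [[2, 3], {}, {'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None}]

envelope = {
    'body': base64.b64encode(json.dumps(body).encode()).decode(),
    'content-encoding': 'utf-8',
    'content-type': 'application/json',
    'headers': {
        'lang': 'py',
        'task': 'tasks.add',   # registered task name on the worker
        'id': task_id,
        'root_id': task_id,
        'parent_id': None,
        'group': None,
        'argsrepr': '(2, 3)',
        'kwargsrepr': '{}',
        'retries': 0,
        'eta': None,
        'expires': None,
        'timelimit': [None, None],
    },
    'properties': {
        'correlation_id': task_id,
        'delivery_mode': 2,
        'delivery_info': {'exchange': '', 'routing_key': 'celery'},
        'priority': 0,
        'body_encoding': 'base64',
        'delivery_tag': str(uuid.uuid4()),
    },
}

# The Redis transport stores each queue as a list keyed by the queue name.
redis.Redis().lpush('celery', json.dumps(envelope))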

How to set result queue in celery result backend RPC?

The documentation says: "RPC-style result backend, using reply-to and one queue per client."
So, how do I set the result queue in the rpc result backend?
I need it for these cases:
I do result = send_task('name', args) in one script (and save result.id as send_task_id), then try to get the result in another script with asyncresult = AsyncResult(id=send_task_id). I can't get the result because each script has its own connection to the broker, and rpc declares its own result queue for each client.
In the second case I call send_task and AsyncResult (with a retry while result.state == PENDING) in one script. When I run it as a worker with concurrency = 1 it is OK. With concurrency > 1 the result may never be returned: each worker fork gets its own connection to the broker and its own result queue, so it only works when the same worker fork that did send_task also performs the retry.
I'm using celery 4.0.2 and 4.1.0.
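For illustration, a minimal sketch of the first case (the file names and the task name are placeholders): with rpc:// the second process normally never sees the result, because the reply-to queue is declared by, and consumed from, the client that called send_task.

# producer.py (hypothetical): send the task and record its id
from celery import Celery
from celery.result import AsyncResult

app = Celery(broker='pyamqp://', backend='rpc://')
result = app.send_task('name', args=(1, 2))
send_task_id = result.id  # persisted somewhere for the other script

# consumer.py (hypothetical): a separate process with its own app instance
# tries to fetch that result
app = Celery(broker='pyamqp://', backend='rpc://')
asyncresult = AsyncResult(id=send_task_id, app=app)
# With rpc:// this usually stays PENDING forever here, because the reply-to
# queue belongs to producer.py's connection, not this one.
print(asyncresult.state)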

Celery: Task Singleton?

I have a task that I need to run asynchronously from the web page that triggered it. This task runs rather long, and as the web page could be getting a lot of these requests, I'd like celery to only run one instance of this task at a given time.
Is there any way I can do this in Celery natively? I'm tempted to create a database table that holds this state for all the tasks to communicate with, but it feels hacky.
You can probably create a dedicated worker for that task, configured with CELERYD_CONCURRENCY=1; then all tasks sent to that worker will run one at a time.
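A sketch of that setup with the old-style setting names used above (the task path myapp.tasks.long_task and the queue name long_tasks are placeholders):

# celeryconfig.py: route only this task to its own queue
CELERY_ROUTES = {
    'myapp.tasks.long_task': {'queue': 'long_tasks'},
}

# Then start one dedicated worker that consumes only that queue, one task at a time
# (equivalent to setting CELERYD_CONCURRENCY=1 for that worker):
#   celery -A myapp worker -Q long_tasks --concurrency=1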
You can use memcache/redis for that.
There is an example on the celery official site - http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html
And if you prefer Redis (this is a Django implementation, but you can easily modify it for your needs):
from celery import Task
from celery.utils.log import get_task_logger
from django.core.cache import cache

logger = get_task_logger(__name__)

class SingletonTask(Task):
    def __call__(self, *args, **kwargs):
        # One lock per task name; only the holder of the lock may run.
        lock = cache.lock(self.name)
        if not lock.acquire(blocking=False):
            logger.info("{} failed to lock".format(self.name))
            return
        try:
            super(SingletonTask, self).__call__(*args, **kwargs)
        except Exception:
            lock.release()
            raise
        lock.release()
And then use it as a base task:
from celery import shared_task

@shared_task(base=SingletonTask)
def test_task():
    from time import sleep
    sleep(10)
This implementation is non-blocking. If you want the next task to wait for the previous one, change blocking=False to blocking=True and add a timeout.
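A sketch of that blocking variant, assuming the django-redis cache backend (whose lock() wraps a redis-py lock); the 60/30 second values are arbitrary:

# Inside SingletonTask.__call__: let the lock expire after 60 s as a safety
# net, and wait up to 30 s for the previous task to release it.
lock = cache.lock(self.name, timeout=60, blocking_timeout=30)
if not lock.acquire(blocking=True):
    logger.info("{} timed out waiting for lock".format(self.name))
    return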