Retry Lost or Failed Tasks (Celery, Django and RabbitMQ)

Is there a way to determine whether a task was lost and retry it?
I think a task can be lost because of a dispatcher bug or a worker thread crash.
I was planning to retry them, but I'm not sure how to determine which tasks need to be retried.
And how can I make this process automatic? Can I use my own custom scheduler to create the new tasks?
Edit: I found in the documentation that RabbitMQ never loses tasks, but what happens when a worker thread crashes in the middle of task execution?

What you need is to set
CELERY_ACKS_LATE = True
Late ack means that the task messages will be acknowledged after the task has been executed,
not just before, which is the default behavior.
In this way, if the worker crashes, RabbitMQ will still have the message.
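For reference, here is a minimal sketch of how that could look in a Django-style settings module (the prefetch multiplier line is an optional addition of mine, not something the answer requires):

# settings.py -- sketch, using the old-style CELERY_* option names
CELERY_ACKS_LATE = True           # acknowledge the message only after the task has run
CELERYD_PREFETCH_MULTIPLIER = 1   # optional: fetch one message at a time per worker process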
Obviously, in case of a total crash (RabbitMQ + workers at the same time) there is no way of recovering the task, unless you implement logging of task start and task end.
Personally, I write a line to MongoDB every time a task starts and another one when it finishes (independently of the result); this way I can tell which tasks were interrupted by analyzing the Mongo logs.
You can do this easily by overriding the __call__ and after_return methods of the Celery base task class.
Below is a piece of my code that uses a TaskLogger class as a context manager (with entry and exit points).
The TaskLogger class simply writes a line containing the task info to a MongoDB instance.
def __call__(self, *args, **kwargs):
    """In a Celery task this method calls run(); here you can set up the
    environment before the task runs."""
    # Initialize the context manager
    self.taskLogger = TaskLogger(args, kwargs)
    self.taskLogger.__enter__()
    return self.run(*args, **kwargs)

def after_return(self, status, retval, task_id, args, kwargs, einfo):
    # Exit point for the context manager
    self.taskLogger.__exit__(status, retval, task_id, args, kwargs, einfo)
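The TaskLogger itself is not shown in the answer; here is a minimal sketch of what such a class could look like, assuming pymongo and a local MongoDB (the database, collection and field names are my own illustration, not the original code):

from datetime import datetime
import pymongo

class TaskLogger(object):
    """Sketch: writes one document when a task starts and one when it ends."""

    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs
        # Assumed connection details; adjust to your MongoDB setup
        self.collection = pymongo.MongoClient()["task_logs"]["entries"]

    def __enter__(self):
        # One line when the task starts
        self.collection.insert_one({
            "event": "start",
            "args": repr(self.args),
            "kwargs": repr(self.kwargs),
            "timestamp": datetime.utcnow(),
        })
        return self

    def __exit__(self, status, retval, task_id, args, kwargs, einfo):
        # Another line when the task finishes, whatever the outcome
        self.collection.insert_one({
            "event": "end",
            "task_id": task_id,
            "status": status,
            "timestamp": datetime.utcnow(),
        })

Any task that logged a start but never logged an end was interrupted mid-execution.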
I hope this helps.

Related

Celery task for ML prediction hangs in execution

I'm trying to create a web application that receives input from a POST request and provides ML predictions based on that input.
Since the prediction model is quite heavy, I don't want the user to wait for the calculation to complete. Instead, I delegated the heavy computation to a Celery task, and the user can inspect the result later.
I'm using a simple Flask application with Celery, Redis and Flower.
My view.py:
@ns.route('predict/')
class Predict(Resource):
    ...
    def post(self):
        ...
        do_categorize(data)
        return jsonify(success=True)
My tasks.py file looks something like this:
from ai.categorizer import Categorizer

categorizer = Categorizer(
    model_path='category_model.h5',
    tokenizer_path='tokenize.joblib',
    labels_path='labels.joblib'
)

@task()
def do_categorize(data):
    result = categorizer.predict(data)
    print(result)
    # Write result to the DB
    ...
My predict() method inside the Categorizer class:
def predict(self, value):
    K.set_session(self.sess)
    with self.sess.as_default():
        with self.graph.as_default():
            prediction = self.model.predict(np.asarray([value], dtype='int64'))
            return prediction
I'm running Celery like this:
celery worker -A app.celery --loglevel=DEBUG
The problem I've been having for the last couple of days is that the categorizer.predict(data) call hangs in the middle of execution.
I tried running categorizer.predict(data) inside the post method and it works. But if I place it inside a Celery task it stops working. There is no console log, and if I try to debug it, it just freezes on .predict().
My questions:
How can I solve this issue?
Is there any memory, CPU limit for the worker?
Are Celery tasks the "right" way to do such heavy computations?
How can I debug this problem? What am I doing wrong?
Is it correct to initialize models at the top of the file?
Thanks to this SO question I found the answer to my problem:
It turns out that Keras works better with a thread pool instead of the default process pool.
Luckily for me, the threaded task pool was reintroduced in Celery 4.4 not long ago.
You can read more in the Celery 4.4 changelog:
Threaded Tasks Pool
We reintroduced a threaded task pool using
concurrent.futures.ThreadPoolExecutor.
The previous threaded task pool was experimental. In addition it was based on the threadpool package which is obsolete.
You can use the new threaded task pool by setting worker_pool to 'threads' or by passing --pool threads to the celery worker command.
Now you can use threads instead of processes for pooling.
celery worker -A your_application --pool threads --loglevel=INFO
If you cannot use the latest Celery version, you can use the gevent pool instead:
pip install gevent
celery worker -A your_application --pool gevent --loglevel=INFO
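If you would rather configure the pool in code than on the command line, a minimal sketch (assuming a standard Celery application object called app, which is not shown in the answer):

app.conf.worker_pool = 'threads'    # equivalent to --pool threads (Celery 4.x setting name)
app.conf.worker_concurrency = 4     # optional: number of threads in the pool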

Celery + HaProxy + RabbitMQ lose task messages

Probably not the best place to ask (maybe Server Fault), but I'll try here:
I have django-celery sending tasks via HAProxy to a RabbitMQ cluster, and we are losing messages (tasks not being executed) every now and then.
Observed
We turned off the workers and monitored the queue size; we noticed that we started 100 jobs but only 99 showed up in the queue.
It seems to happen when other processes are using RabbitMQ for other jobs.
Tried
I tried flooding RabbitMQ with dummy messages over many connections while putting some proper tasks into the queue, but I couldn't replicate the issue consistently.
Just wondering if anyone has experienced this before?
UPDATE:
So I dove into the code and eventually stumbled onto celery/app/amqp.py. I was debugging by adding an extra publish call to a non-existent exchange, see below:
log.warning(111111111)
self.publish(
    body,
    exchange=exchange, routing_key=routing_key,
    serializer=serializer or self.serializer,
    compression=compression or self.compression,
    headers=headers,
    retry=retry, retry_policy=_rp,
    reply_to=reply_to,
    correlation_id=task_id,
    delivery_mode=delivery_mode, declare=declare,
    **kwargs
)
log.warning(222222222)
self.publish(
    body,
    exchange='celery2', routing_key='celery1',
    serializer=serializer or self.serializer,
    compression=compression or self.compression,
    headers=headers,
    retry=retry, retry_policy=_rp,
    reply_to=reply_to,
    correlation_id=task_id,
    delivery_mode=delivery_mode, declare=declare,
    **kwargs
)
log.warning(333333333)
Then I tried to trigger 100 tasks from the project code, and the result was that only 1 message got put into the celery queue. I think this is caused by the ProducerPool or ConnectionPool.

How to figure out if mule flow message processing is in progress

I have a requirement where I need to make sure only one message is being processed at a time by a Mule flow. The flow is triggered by a Quartz scheduler which reads one file from an FTP server each time.
My proposed solution is to keep a global variable "FLOW_STATUS" which will be set to "RUNNING" when a message is received and reset to "STOPPED" once the processing of the message is done.
Any message fed to the flow will check this variable and abort if "FLOW_STATUS" is "RUNNING".
This setup seems to be working, but I was wondering if there is a better way to do it.
Are there any best practices around this, or any built-in Mule helpers to achieve the same thing instead of relying on global variables?
It seems like a simpler solution would be to set maxActiveThreads for the flow to 1. In Mule, each message processed gets its own thread, so setting maxActiveThreads to 1 would effectively make your flow single-threaded. Other pending requests will wait in the receiver threads. You will need to make sure your receiver thread pool is large enough to accommodate all of the potentially waiting threads. That may mean throttling back your Quartz scheduler to allow time to process the files so the receiver thread pool doesn't fill up. For more information on the thread pools and how to tune performance, here is a good link: http://www.mulesoft.org/documentation/display/current/Tuning+Performance

How to ACK celery tasks with parallel code in reactor?

I have a Celery task that, when called, simply kicks off the execution of some parallel code inside a Twisted reactor. Here's some sample (not runnable) code to illustrate:
def run_task_in_reactor():
    # this takes a while to run
    do_something()
    do_something_more()

@celery.task
def run_task():
    print "Started reactor"
    reactor.callFromThread(run_task_in_reactor)
(For the sake of simplicity, please assume that the reactor is already running when the task is received by the worker; I used the @worker_process_init.connect signal to start my reactor in another thread as soon as the worker comes up.)
When I call run_task.delay(), the task finishes pretty quickly (since it does not wait for run_task_in_reactor() to finish, it only schedules its execution in the reactor). And when run_task_in_reactor() finally runs, do_something() or do_something_more() can throw an exception, which will go unnoticed.
Using pika to consume from my queue, I could send an ACK inside do_something_more() to make the worker report the correct completion of the task, for instance. However, inside Celery this does not seem to be possible (or, at least, I don't know how to accomplish the same effect).
Also, I cannot remove the reactor, since it is a requirement of some third-party code I'm using. Other ways to achieve the same result are appreciated as well.
Use blockingCallFromThread (from twisted.internet.threads) instead; it blocks until the function has run inside the reactor and re-raises any exception in the calling thread.
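A minimal sketch of what that could look like for the task above (reusing the non-runnable names from the question; blockingCallFromThread takes the reactor as its first argument):

from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

@celery.task
def run_task():
    # Blocks the worker thread until run_task_in_reactor() has finished in the
    # reactor thread; any exception it raises is re-raised here, so the task
    # fails visibly and (with CELERY_ACKS_LATE) the message is only acked
    # after the real work is done.
    return blockingCallFromThread(reactor, run_task_in_reactor)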

Using django-celery chord, celery.chord_unlock keeps executing forever not calling the provided callback

I'm using Django Celery with Redis to run a few tasks like this:
header = [
    tasks.invalidate_user.subtask(args=(user,)),
    tasks.invalidate_details.subtask(args=(user,))
]
callback = tasks.rebuild.subtask()
chord(header)(callback)
So basically the same as stated in the documentation.
My problem is that when this chord is called, the celery.chord_unlock task keeps retrying forever. The tasks in the header finish successfully, but because chord_unlock never completes, the callback is never called.
Guessing that my problem is with not being able to detect that the tasks from the header are finished, I turned to the documentation to see how this can be customized. I found a section describing how the synchronization is implemented, and an example is provided; what I'm missing is how to get that example function called (i.e. is there a signal for this?).
Further, there's a note that this method is not used with the Redis backend:
This is used by all result backends except Redis and Memcached, which increment a counter after each task in the header, then applying the callback when the counter exceeds the number of tasks in the set.
But it also says that the Redis approach is better:
The Redis and Memcached approach is a much better solution
What approach is that? How is it implemented?
So, why is chord_unlock never done and how can I make it detect finished header tasks?
I'm using: Django 1.4, celery 2.5.3, django-celery 2.5.5, redis 2.4.12
You don't have an example of your tasks, but I had the same problem and my solution might apply.
I had ignore_result=True on the tasks that I was adding to a chord, defined like so:
@task(ignore_result=True)
Apparently ignoring the result makes it so that the chord_unlock task doesn't know they're complete. After I removed ignore_result (even if the task only returns true) the chord called the callback properly.
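For example, a minimal sketch of the change (the task body and helper here are hypothetical, just to show where ignore_result is removed; the task name comes from the question's header):

from celery.task import task  # old-style import matching celery 2.x / django-celery

# Before: chord_unlock cannot see this task complete
# @task(ignore_result=True)
# def invalidate_user(user):
#     ...

# After: let the task publish a result, even a trivial one
@task()
def invalidate_user(user):
    do_the_invalidation(user)  # hypothetical helper, for illustration only
    return True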
I had the same error. I changed the broker to RabbitMQ, and chord_unlock keeps working until my tasks finish (2-3 minute tasks).
When using Redis, the tasks finish but chord_unlock only retried about 8-10 times (once every 1s), so the callback was not executed correctly.
[2012-08-24 16:31:05,804: INFO/MainProcess] Task celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] retry: Retry in 1s
[2012-08-24 16:31:06,817: INFO/MainProcess] Got task from broker: celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] eta:[2012-08-24 16:31:07.815719-05:00]
... just like 8-10 times....
Changing the broker worked for me. Now I am testing @Chris's solution, and my callback function never receives the results from the header subtasks :S, so it does not work for me.
celery==3.0.6
django==1.4
django-celery==3.0.6
redis==2.6
broker: redis-2.4.16 on Mac OS X
This could also cause such a problem. From the documentation:
Note:
If you are using chords with the Redis result backend and also overriding the Task.after_return() method, you need to make sure to call the super method or else the chord callback will not be applied.
def after_return(self, *args, **kwargs):
    do_something()
    super(MyTask, self).after_return(*args, **kwargs)
As I understand it, if you have overridden the after_return() method in your task, it must either be removed or at least call the super method.
See the bottom of the topic: http://celery.readthedocs.org/en/latest/userguide/canvas.html#important-notes