Can Celery email me if the task time limit is exceeded? - error-handling

So I have a celery setup using RabbitMQ as the broker and amqp as the results backend.
Sometimes I have tasks that run long because I underestimated the needed timeout, and, as intended, Celery kills the worker running the task.
The problem is that because this is a Celery problem and not a task problem, the error handling in the task that's supposed to email me never runs, and I receive no message about the failure.
Is there a way to have Celery do some error notification on its own when it kills a task due to Celery-related errors? Something like an on_timeout() function that I can define on the task? I really don't want the calling process to do the error handling, because the timeout is already a couple of hours and the calling process only runs for about 30 seconds.

Looks like this question is from a while ago and you've probably resolved the issue, but in case not: have you checked out the CELERY_SEND_TASK_ERROR_EMAILS config setting?
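For reference, a minimal sketch of what that configuration looked like. Note these settings existed in Celery 3.x and the built-in error-email feature was removed in later major releases, so check your Celery version; the addresses below are hypothetical.

```python
# Sketch of a celeryconfig using the old error-email feature (Celery 3.x).
CELERY_SEND_TASK_ERROR_EMAILS = True        # email admins on task failure
ADMINS = [("Ops team", "ops@example.com")]  # recipients (hypothetical)
SERVER_EMAIL = "celery@example.com"         # From: address (hypothetical)
EMAIL_HOST = "localhost"                    # SMTP server to send through
```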

Related

Celery task with a long ETA and RabbitMQ

RabbitMQ may enforce ack timeouts for consumers: https://www.rabbitmq.com/consumers.html#acknowledgement-modes
By default, if a delivery has not been acked within the timeout (15 minutes on the RabbitMQ versions discussed here), the channel is closed with a PRECONDITION_FAILED error.
I need to schedule a celery task (using RabbitMQ as a broker) with an ETA quite far in the future (1-3 h), and as of now (with Celery 4 and RabbitMQ 3.8), when I try that... I get a PreconditionFailed error after the consumer ack timeout configured for my RabbitMQ instance.
I expected that the task would be acknowledged before its ETA ...
Is there a way to configure an ETA celery task to be acknowledged within the consumer ack timeout?
Right now I am increasing the consumer_timeout to be longer than my ETA delta, but there must be a better solution ...
I think adjusting the consumer_timeout is your only option in Celery 5. Note that this is only applicable for RabbitMQ 3.8.15 and newer.
Another possible solution is to have the workers ack the message immediately upon receipt. Do this only if you don't need to guarantee task completion: for example, if the worker crashes before running the task, Celery will never know it wasn't completed.
In RabbitMQ, the best options for delayed tasks are the delayed-message exchange or dead lettering, and Celery can use neither. In Celery, messages are published to the broker and delivered to consumers as soon as possible; the ETA delay is enforced in the worker, not at the broker.
There's a way to change this consumer_timeout for a running instance by running the following command on the RabbitMQ server:
rabbitmqctl eval 'application:set_env(rabbit, consumer_timeout, 36000000).'
This sets the new timeout to 10 hours (36,000,000 ms). For it to take effect you need to restart your workers, though; existing worker connections will continue to use the old timeout.
You can check the current configured timeout value as well:
rabbitmqctl eval 'application:get_env(rabbit, consumer_timeout).'
If you are running RabbitMQ via the Docker image, set the value through the RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS environment variable, either in the container environment or directly on the command line:
docker run -e RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-rabbit consumer_timeout 36000000" ...
Hope this helps!
I faced this problem too. I think you'd be better off using a PeriodicTask; if you only want it to run once, set one_off=True.
https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html?highlight=periodic
I encountered the same problem and I resolved it.
With RabbitMQ version 3.8.14 (3.8.14-management), I am able to send long ETA tasks.
I personally use Celery to send tasks with a long ETA.
In my case, I set up Celery to add a timeout (~consumer_timeout); I can configure it with time_limit or soft_time_limit.
I also wanted to do something similar and tried both the rabbitmq-delayed-message-exchange plugin and a dead-letter queue. I wrote an article about each and listed the links below; I hope it helps someone. In a nutshell, both approaches can be used for scheduling Celery tasks (handling long ETAs).
Using DLX: Dead Letter Exchanges (DLX)
Using the delayed-message plugin: RabbitMQ Delayed Message Plugin
P.S.: I know StackOverflow answers should be self-explanatory, but the full write-ups are long, so I'm posting the links instead. Sorry!

Celery 4.3.0 - Send Signal To a Task Without Termination

On a Celery service on CentOS that runs a single task at a time, terminating a task is simple:
revoke(id, terminate=True, signal='SIGINT')
However, while the interrupt signal is being processed, the running task gets revoked, and a new task from the queue starts on the node. This is troublesome: two tasks end up running at the same time on the node, and the signal handling can take up to a minute.
The question is: how can a signal be sent to a running task without actually terminating the task in Celery?
Or let's say is there any way to send a signal to a running task?
The assumption is that the user should be able to send the signal from a remote node; in other words, the user cannot list the running processes on the node.
Any other solution is welcome.
I don't understand your goal.
Are you trying to kill the worker? If so, I guess you are talking about a "warm shutdown": you can send SIGTERM to the worker's process. The running task will get a chance to finish, but no new task will be started.
If you're just interested in revoking a specific task while keeping the same worker running, can you share your Celery configuration and the worker command? Are you sure you're running with concurrency 1?

If celery worker dies hard, does job get retried?

Is there a way for a celery job to be retried if the server where the worker is running dies? I don't just mean the sub-process that executes the job, but the entire server becoming unavailable.
I tried with RabbitMQ and Redis as brokers. In both cases, if a job is currently being processed, it is entirely forgotten. When a worker restarts, it doesn't even try to reprocess the job, and looking at Rabbit or Redis, their queues are empty. The result backend is also empty.
It looks like the worker grabs the message and assumes it will put it back if the subprocess fails; but if the worker itself dies, it can't put the message back.
(yes, I work in an environment where this happens more than once a year, and I don't want to lose tasks)
In theory, setting task_acks_late=True should do the trick. (doc)
With a Redis broker, the task will be redelivered after visibility_timeout, which defaults to one hour. (doc)
With RabbitMQ, the task is redelivered as soon as Rabbit notices that the worker died.
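A sketch of the relevant configuration (Celery 4+ lowercase setting names; the values are illustrative). task_reject_on_worker_lost is the companion setting that requeues a task whose worker process was killed mid-execution, and visibility_timeout only applies to the Redis transport.

```python
# celeryconfig sketch: make tasks survive a dead worker host.
task_acks_late = True              # ack only after the task finishes
task_reject_on_worker_lost = True  # requeue if the worker is killed mid-task
broker_transport_options = {
    "visibility_timeout": 3600,    # Redis only: redeliver after 1 hour
}
```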

Accessing AMQP connection from Resque worker when using Thin as Web Server

Trying to work past an issue when using a Resque job to process inbound AMQP messages.
I am using an initializer to set up the message consumer at application startup and then feed the received messages to a Resque job for processing. That is working quite well.
However, I also want to send a response message out of the worker, i.e. publish it back to a queue, and I am running into the issue that forking makes the app-wide AMQP connection unaddressable from inside the Resque worker. I'd be very interested to see how other folks have tackled this, as I can't believe this pattern is unusual.
Due to message volumes, firing up a new thread and AMQP connection for every response is not a workable solution.
Ideas?
My bust on this: I had my eye off the ball and forgot that Resque forks when it kicks off a worker. Going to go the route suggested by others and daemonize the process instead....

Celery tasks retry (Celery, Django and RabbitMQ)

Can you tell me what happens when you tell a Celery task to retry? Will it retry in the same worker thread, or will it be returned to the broker, which may send it elsewhere?
What happens to tasks awaiting retry if the worker or dispatcher suddenly stops? If tasks can be lost, is there some approach to avoid this? Maybe save each task in a database and retry it if no result is received for some time?
Or maybe the dispatcher has its own persistent storage? And what happens if the worker thread crashes while receiving the task, or while executing it?
Can you tell me what is happening when in celery you tell the task to retry? Will it retry in the same worker thread or it will be returned to broker which may send it elsewhere?
Yes, the task is returned to the broker (e.g. RabbitMQ) with a new estimated execution time.
What will happen with tasks for retry if worker or dispatcher suddenly stop? If tasks can be lost is there some approach to avoid this? May be save each task in database and retry them if no result is received for some time?
Or may be dispatcher have it's own persistent storage? What about then if worker thread crash receiving the task or while executing it?
Here is a complete answer: Retry Lost or Failed Tasks (Celery, Django and RabbitMQ)