Celery stops executing tasks after a month - Redis

My program runs specific tasks daily; these tasks are scheduled by django-celery-beat.
Recently I noticed that the tasks are not being executed, and the only thing that gets them running again is restarting the Celery service managed by supervisorctl.
command=/opt/taskjo/taskjo-venv/bin/celery -A taskjo worker --pool=gevent --autoscale 4,2 -B -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --loglevel=INFO
I added these options:
--pool=gevent --autoscale 4,2
New error in the log:
Traceback (most recent call last):
File "/home/ubuntu/venv/lib/python3.5/site-packages/billiard/pool.py", line 1224, in mark_as_worker_lost
human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0.
Recently I have been running 4 Odoo services on a server with 6 GB of RAM, and a large amount of that RAM is occupied.
What happens:
Celery sends several tasks, but it cannot execute them; when the service is restarted, all the queued tasks run.
Do you have a solution?
Related GitHub issues:
WorkerLostError: Worker exited prematurely: exitcode 0 when shutting down worker #273
WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM). #6291

The problem was solved.
Steps taken:
1) Checked the server's RAM (it was almost 98% full).
2) Checked the Celery configuration (about one gigabyte of RAM was occupied because of the number of workers).
3) Shut down a number of services and changed the Celery configuration.
As a result, it has been working without problems for several days.
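(Not part of the original fix, but for reference, the memory situation can be confirmed with standard Linux tools before and after the change:)
free -h                      # overall RAM and swap usage
ps aux --sort=-%mem | head   # processes using the most memory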
I replaced the supervisor config with the following:
[program:celery]
directory=/opt/taskjo/taskjo
command=/opt/taskjo/taskjo-venv/bin/celery -A taskjo worker -B -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --loglevel=INFO
;user=taskjo
;numprocs=1
stdout_logfile=/opt/taskjo/logs/celery/worker-access.log
stderr_logfile=/opt/taskjo/logs/celery/worker-error.log
stdout_logfile_maxbytes=50
stderr_logfile_maxbytes=50
stdout_logfile_backups=10
stderr_logfile_backups=10
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; Causes supervisor to send the termination signal (SIGTERM) to the whole process group.
stopasgroup=true
; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000
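If RAM pressure was the root cause, the worker itself can also be capped. A sketch of the relevant Celery 4.x flags with the default prefork pool (the values here are placeholders, not the ones used above):
# recycle each child process after 100 tasks, or when it grows past ~200 MB (value is in KiB)
celery -A taskjo worker -B -l info --concurrency=2 --max-tasks-per-child=100 --max-memory-per-child=200000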

Related

monit - Can I restart a process if system memory usage is too high?

I have this rule for system:
check system $HOST
if memory usage > 90% for 3 cycles then alert
and this rule for a process:
check process my_process matching "..."
restart program = "..."
I would like to restart the process my_process whenever system memory usage exceeds 90%.
Is this possible with monit?
I tried variants of if memory usage > 90% for 3 cycles then restart my_process, but the syntax is never recognized on monit reload.
Back to your sample, you can use something like this.
check system $HOST
if memory usage > 90% for 3 cycles then
exec "/bin/bash -c '/usr/local/bin/monit restart my_process'"
This restarts the service named "my_process". The monit commands (start, stop, restart, ...) execute the proper command for the service itself.
With regards,
Lutz
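(Not part of Lutz's answer, but since the question mentions syntax errors on reload: monit can validate the control file before it is reloaded.)
monit -t    # syntax-check the monit control file without applying it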

Celery worker receives unregistered task from celerybeat run by systemd

On my staging server I had my Celery worker (4.3.0) up and running with Celery Beat as daemons via systemd, with RabbitMQ as the broker. Everything was fine for a few weeks, until 4 days ago, when there was some sort of connection error between Celery and amqp through kombu: [Errno 104] Connection reset by peer after started.
I hadn't been paying much attention to the server logs, since the project is still in the WIP stage; however, when I tried to deploy the newest version of the code, I realized that something was wrong with the worker.
I googled the issue and this is what popped up:
https://github.com/celery/celery/issues/4867
The easy solution was to downgrade Celery to 4.1.1 and wait for a fix in a future stable release.
I removed celery, amqp, billiard and kombu from my venv and installed celery 4.1.1, which pulled in the above packages in the appropriate versions.
At the moment the celery and celerybeat services are active, and celerybeat sends the tasks to the celery worker, but the celery log shows me an error message (please see the error below, logged after the downgrade). It is weird, because I haven't changed anything in the task declarations or my settings (which may be the issue here).
The weirdest thing is that if I shut down the systemd services and run them with the command:
celery -A celery_cfg:app worker -B --loglevel=DEBUG
all current tasks are processed, just like the past ones were. So the celery and celerybeat configs, as they are, seem to be working.
A few specific approaches I tried:
1) Made sure to import all modules without relative imports.
2) In the past I ran into an issue with missing packages in the venv --> they are up to date.
3) Restarted celery/celerybeat/gunicorn/systemd/rabbitmq and the server itself.
4) Double-checked the paths in the systemd services (however, maybe I have been debugging this for too long and I just can't see the typo or something).
5) Tried the development version 4.4.0rc2 (the celery worker won't start).
6) INSTALLED_APPS contains all required apps.
Error message after the downgrade of the Celery version:
[2019-06-16 19:35:00,092: ERROR/MainProcess] Received unregistered task of type 'apps.mailing.tasks.execute_sending_system_mail'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
Please see
http://docs.celeryq.org/en/latest/internals/protocol.html
for more information.
The full contents of the message body was:
'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)
Traceback (most recent call last):
File "/home/user/apps/venv/loans/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 557, in on_task_received
strategy = strategies[type_]
KeyError: 'apps.mailing.tasks.execute_sending_system_mail'
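(Not from the original post, but a quick way to see which task names a running worker has actually registered, using the app path from the question:)
celery -A celery_cfg:app inspect registered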
Celery Service Systemd Code
[Unit]
Description=Celery Service
After=network.target
[Service]
Type=forking
User=<user>
Group=<user>
EnvironmentFile=/etc/default/celery
WorkingDirectory=/home/<user>/apps/loans
ExecStart=/bin/sh -c '${CELERY_BIN} multi start ${CELERYD_NODES} \
-A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
--logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait ${CELERYD_NODES} \
--pidfile=${CELERYD_PID_FILE}'
ExecReload=/bin/sh -c '${CELERY_BIN} multi restart ${CELERYD_NODES} \
-A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
--logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
Celery Beat Service Systemd Code
[Unit]
Description=Celery Beat Service
After=network.target
[Service]
Type=simple
User=user
Group=user
EnvironmentFile=/etc/default/celery
WorkingDirectory=/home/user/apps/loans
ExecStart=/bin/sh -c '${CELERY_BIN} beat \
-A ${CELERY_APP} --pidfile=${CELERYBEAT_PID_FILE} \
--logfile=${CELERYBEAT_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL}'
[Install]
WantedBy=multi-user.target
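Assuming the units are installed as celery.service and celerybeat.service (the unit file names are not shown in the post), edits to them are picked up and the daemons restarted with:
sudo systemctl daemon-reload
sudo systemctl restart celery celerybeat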
Conf file with the environment variables (/etc/default/celery):
CELERYD_NODES="w1"
CELERY_BIN="/home/user/apps/venv/loans/bin/celery"
CELERY_APP="celery_cfg:app"
CELERYD_MULTI="multi"
CELERYD_OPTS=""
CELERYD_PID_FILE="/home/user/apps/pids/celery/%n.pid"
CELERYD_LOG_FILE="/home/user/apps/logs/celery/%n%I.log"
CELERYD_LOG_LEVEL="INFO"
CELERYBEAT_PID_FILE="/home/user/apps/pids/celery/beat.pid"
CELERYBEAT_LOG_FILE="/home/user/apps/logs/celery/beat.log"
celery_cfg file
from celery import Celery
from celery.schedules import crontab
from django.conf import settings

app = Celery('loans_apps')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.set_default()
# <====CELERY BEAT PERIODIC TASKS ====>
app.conf.beat_schedule = {
'execute_sending_system_mail': {
'task': 'apps.mailing.tasks.execute_sending_system_mail',
'schedule': crontab(minute='*/5'),
'args': (),
},
}
@app.task(bind=True)
def debug_task(self):
print('Request: {0!r}'.format(self.request))
A minor cut of the settings containing the Celery config variables:
BROKER_URL = 'amqp://localhost//',
CELERY_ENABLE_UTC = True
I know I could try setting up celery and celerybeat without systemd, but I treat that as a last-resort solution. I'd like to keep the configuration as it is, even though I have no clue what is wrong up there.
EDIT
By accident, and guided by my friend, I just found out that both the celery and celerybeat services seem to work fine when run as the root user. That is obviously not the solution, but it narrows down the number of possible flaws.
It would be rude to leave the question unanswered, even though the answer comes from me, so here it is:
If anyone ever encounters such an issue, after going through the steps I listed above, check the permissions of the directories that celery and celerybeat use. You might have created them with root permissions, which can end up causing the issue described here. Good luck to everyone in the future!
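A minimal sketch of that check, using the pid/log paths from the /etc/default/celery file above (replace user with the account the services actually run as):
# show who owns the directories celery and celerybeat write to
ls -ld /home/user/apps/pids/celery /home/user/apps/logs/celery
# hand them back to the service user if they belong to root
sudo chown -R user:user /home/user/apps/pids/celery /home/user/apps/logs/celery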

Why is my supervised django celeryd process not accepting tasks?

We've had a django-celery process with 5 worker processes running in production for ages now. It properly receives and runs tasks. These processes run tasks which are inserted into two queues: live and celery.
The command used to run the celery process is roughly:
manage.py celeryd -E --loglevel=WARNING --concurrency=5 \
--settings=django_settings.production_celery -Q live,celery
I've now just built a new system which is supposed to process different tasks on a different queue called foobar. These celery processes are run with a command roughly like:
manage.py celeryd -E --loglevel=WARNING --concurrency=5 \
--settings=django_settings.production_foobar -Q foobar
However when I attempt to run tasks in the new queue using my_task.apply_async(queue='foobar'), the result object remains in a PENDING state indefinitely.
Through logging I have determined that the foobar workers never receive the task. So now I'm trying to debug at what point the task message is being lost.
(We use RabbitMQ as our AMQP message broker.)
How can I determine the current contents of a celery queue? Can I directly inspect the contents of the RabbitMQ queue?
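(Not from the original thread, but two common ways to check: rabbitmqctl can show how many messages are sitting in each queue, and Celery's remote-control inspection shows which queues the workers actually consume from. The -A proj app path is a placeholder, and the exact CLI spelling depends on the Celery version in use.)
# on the broker host: queue names, ready messages, and attached consumers
sudo rabbitmqctl list_queues name messages consumers
# from the application side (modern Celery CLI shown; older django-celery has manage.py equivalents)
celery -A proj inspect active_queues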

Celery node fails, pidbox already in use on restart

I have Celery running with a RabbitMQ broker.
Today one of my Celery nodes failed: it does not execute tasks and does not respond to the service celeryd stop command. After a few attempts the node stopped, but on start I get this message:
[WARNING/MainProcess] celery@nodename ready.
[WARNING/MainProcess] /home/ubuntu/virtualenv/project_1/local/lib/python2.7/site-packages/kombu/pidbox.py:73: UserWarning: A node named u'nodename' is already using this process mailbox!
Maybe you forgot to shutdown the other node or did not do so properly?
Or if you meant to start multiple nodes on the same host please make sure
you give each node a unique node name!
warnings.warn(W_PIDBOX_IN_USE % {'hostname': self.hostname})
Can anyone suggest how to unlock the process mailbox?
According to http://celery.readthedocs.org/en/latest/userguide/workers.html#starting-the-worker you might need to give each node a unique name. Example:
$ celery -A proj worker --loglevel=INFO --concurrency=10 -n worker1.%h
In supervisor, escape the % by using %%h.
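For example, the corresponding supervisor command line might look like this (the paths and app name are placeholders); note the doubled %% so that supervisor does not treat it as its own format specifier:
command=/path/to/venv/bin/celery -A proj worker --loglevel=INFO --concurrency=10 -n worker1.%%h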
A large log file or not enough free space was the reason, I think.
After deleting it, everything is OK.
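(Not part of the answer, but a quick way to confirm that theory; the log path is an assumption and may differ on your setup:)
df -h                      # free space per filesystem
du -sh /var/log/celery/*   # size of the celery log files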

Fork shell script (not &)

I'm accessing a webserver via PHP. I want to update some info in the Apache configs, so I start a shell script that makes the changes. Then I want to stop and restart Apache.
Problem: as soon as I stop Apache, my process stops and my shell script, being a child process, is killed. Apache never restarts. This also happens with Apache restart.
Is there a way to fork an independent, non-child process for the shell script, so I can restart Apache?
Thx,
Mr B
You can use disown:
disown [-ar] [-h] [jobspec ...]
Without options, each jobspec is removed from the table of active jobs. If the `-h' option is given, the job is not removed from the table, but is marked so that SIGHUP is not sent to the job if the shell receives a SIGHUP. If jobspec is not present, and neither the `-a' nor `-r' option is supplied, the current job is used. If no jobspec is supplied, the `-a' option means to remove or mark all jobs; the `-r' option without a jobspec argument restricts operation to running jobs.
./myscript.sh &
disown
./myscript.sh will continue running even if the script that started it dies.
Take a look at nohup; it may fit your needs.
Let's say you have a script called test.sh:
for i in $(seq 100); do
echo $i >> test.temp
sleep 1;
done
If you run nohup ./test.sh &, you can kill the shell and the process stays alive.
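Applied to the original question, a minimal sketch might look like this (the script name, the config path and the use of apachectl are assumptions, not taken from the thread):
#!/bin/sh
# restart_apache.sh (hypothetical): apply the config change, then bounce Apache
cp /tmp/new-vhost.conf /etc/apache2/sites-available/mysite.conf   # placeholder for the real change
apachectl stop
sleep 2
apachectl start
Started from PHP in a way that survives the Apache stop:
nohup /path/to/restart_apache.sh > /tmp/restart_apache.log 2>&1 &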