How to use celery beat with node-celery and RabbitMQ

I am using node-celery (GitHub link) to implement the Celery first-steps setup along with RabbitMQ.
As in Celery, we define tasks and then push them.
The task I have defined inside tasks.py is below:
import os
import logging
from celery import Celery
import requests

backend = os.getenv('CELERY_BACKEND_URL', 'amqp')
celery = Celery('tasks', backend=backend)
celery.conf.update(
    CELERY_RESULT_SERIALIZER='json',
    CELERY_ENABLE_UTC=True
)

@celery.task
def getOrders():
    requests.get('http://localhost:4000/getOrders')
My file which I run to trigger tasks:
eta.js:
var celery = require('../celery'),
    client = celery.createClient({
        CELERY_BROKER_URL: 'amqp://guest:guest@localhost:5672//'
    });

client.on('error', function(err) {
    console.log(err);
});

client.on('connect', function() {
    client.call('tasks.getOrders', {
        eta: new Date(Date.now() + 15 * 1000) // 15 seconds later
    });
});
I start the Celery worker using the command below:
celery worker -A tasks -l info
Now when I run eta.js, my task 'getOrders' defined inside tasks.py gets triggered; it then hits the URL I have requested and that URL does its work.
To run eta.js I run:
node eta.js
What I want is for my task 'getOrders' to keep running every x seconds.
I have read about setting up periodic tasks in Celery (periodic tasks docs), but that is in Python and I need this in node-celery.
I want to use celery beat with node-celery if that is possible, or should I just avoid node-celery and use the Python Celery setup instead? I know Python but couldn't find any link on setting up Celery this way in Python.
Can anyone with good knowledge of Celery help me or point me to a tutorial?
thanks!

You can add the scheduled task to Redis directly, so RedBeat will read this new task, execute it, and reschedule it according to the crontab setup.
To do this:
Add the RedBeat Redis URL in your Celery config
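A minimal sketch of that config step on the Python side, assuming celery-redbeat is installed (pip install celery-redbeat) and Redis runs locally; the URLs are placeholders for your own broker and Redis:
from celery import Celery

celery = Celery('tasks', broker='amqp://guest:guest@localhost:5672//')
# Tell RedBeat which Redis instance holds the schedule entries.
celery.conf.redbeat_redis_url = 'redis://localhost:6379/0'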
Create a periodic task
const task = {
    name: "getOrdersTask",
    task: "tasks.getOrders",
    schedule: {
        "__type__": "crontab",
        minute: "*/5",
        hour: "*",
        day_of_week: "*",
        day_of_month: "*/7",
        month_of_year: "[1-12]"
    },
    args: [],
    enabled: true,
};
Store the task in redis with:
redisClient.hset("redbeat:getOrdersTask", "definition", JSON.stringify(task));
Insert the task into the schedule sorted set (the 0 score forces RedBeat to execute it immediately, so it will then be rescheduled at the proper time according to the period you selected, in this case every 5 minutes):
redisClient.zadd("redbeat::schedule", 0, "getOrdersTask")
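Note that a beat process still needs to be running with the RedBeat scheduler so the entries written from Node are actually picked up; assuming the tasks module from the question, something like:
celery beat -A tasks -S redbeat.RedBeatScheduler -l info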

Related

Nextflow and Preemptible Machines in Google Life Sciences

Consider a nextflow workflow that uses the Google Life Sciences API and uses preemptible machines. The config might look like this:
google {
    project = "cool-name"
    region = "cool-region"
    lifeSciences {
        bootDiskSize = "200 GB"
        preemptible = true
    }
}
Let's say you have only a single process and this process has the directive maxRetries = 5. After five retries (i.e. after the 6th attempt), the process will be considered failed.
Is it somehow possible to specify in Nextflow that, after a certain number of unsuccessful retries, it should request a non-preemptible machine instead and continue retrying a couple more times?

Long running jobs redelivering after broker visibility timeout with celery and redis

I am using Celery 4.3 + Redis + Flower. I have a few long-running jobs with acks_late=True and task_reject_on_worker_lost=True. I am using a Celery group to run child jobs in parallel and collect their results for use in the parent job.
In this scenario, some of my jobs run for more than an hour, and after every hour the same child jobs are redelivered to the worker again.
The sample jobs are below.
@app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_1(self):
    do_something()
    task_group = group(job_2.s(batch) for batch in range(0, len([1, 2, 3, 4, 5, 6]), 3))
    result_group = task_group.apply_async()

@app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_2(self, batch):
    do_something()
    return result
The above job_2 runs for more than an hour, and after one hour it is redelivered to the worker again.
My Celery setup and config are shown below:
c = Celery(app.import_name,
           backend=app.config['CELERY_RESULT_BACKEND'],
           broker=app.config['CELERY_BROKER_URL'])
config.py
CELERY_BROKER_URL = os.environ['CELERY_BROKER_URL']
CELERY_RESULT_BACKEND = os.environ['CELERY_RESULT_BACKEND']
CELERY_BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 36000}
I tried increasing the visibility timeout to 10 hours in the configuration (as shown above) after running into the redelivery issue, but it does not seem to work.
Please help with this issue and let me know if there is any solution.
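For reference, the documented name of this setting is broker_transport_options (BROKER_TRANSPORT_OPTIONS in the old uppercase style), and it can also be set directly on the app; a minimal sketch with placeholder URLs, using the 10-hour value from the question:
from celery import Celery

c = Celery('app',
           broker='redis://localhost:6379/0',
           backend='redis://localhost:6379/1')
# Tasks that run for over an hour need a visibility_timeout longer than their
# runtime, otherwise Redis redelivers the unacknowledged message to a worker.
c.conf.broker_transport_options = {'visibility_timeout': 36000}  # 10 hours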

Celery run task periodically only next 5 days

I have a Celery task and I need it to run only for the next five days at 12:30am. How do I do this using celery beat? I know how to run it periodically forever, but I am not able to figure out how to do it for only the next five days. Any idea?
Take a look at the celery-beat docs for crontab. Though if this is literally a one time thing, then by definition, it isn't exactly periodic. You could set up a crontab periodic task to run at 12:30am for the next 5 days, but you would have to also remember to manually turn that off.
If you go this route
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'add-at-midnightish': {
        'task': 'tasks.add',
        'schedule': crontab(hour=0, minute=30),
        'args': (16, 16),
    },
}
Alternatively, you could use the eta keyword on apply_async, as mentioned in the Celery FAQ.
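A rough sketch of that alternative, assuming the tasks.add task from the schedule above: compute an explicit eta for 12:30am on each of the next five days and queue one run per day.
from datetime import datetime, timedelta
from tasks import add  # the example task from above

# 12:30am tomorrow (naive datetime; adjust for your timezone configuration).
first_run = datetime.now().replace(hour=0, minute=30, second=0, microsecond=0) + timedelta(days=1)
for day in range(5):
    add.apply_async(args=(16, 16), eta=first_run + timedelta(days=day))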

Celery fails to accept tasks

I'm adding a backend for Celery results, and I'm having an issue where I send tasks, and some are accepted while others aren't.
Tasks that are and aren't executed both show this log output:
[2014-06-09 15:50:59,091: INFO/MainProcess] Received task: tasks.multithread_device_listing[e3ae6d12-ad4b-4114-9383-5802c91541f2]
Ones that ARE executed then show this output:
[2014-06-09 15:50:59,093: DEBUG/MainProcess] Task accepted: tasks.multithread_device_listing[e3ae6d12-ad4b-4114-9383-5802c91541f2] pid:2810
While tasks that AREN'T executed never arrive at the above line.
How I send tasks:
from celery import group
from time import sleep

signatures = []

signature = some_method_with_task_decorator.subtask()
signatures.append(signature)

signature = some_other_method_with_task_decorator.subtask()
signatures.append(signature)

job = group(signatures)
result = job.apply_async()

while not result.ready():
    sleep(60)
My Celery config, as reported by the worker, is:
software -> celery:3.1.11 (Cipater) kombu:3.0.18 py:2.7.5
billiard:3.3.0.17 py-amqp:1.4.5
platform -> system:Darwin arch:64bit imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:amqp://username:pass@localhost:5672/automated_reports
CELERY_QUEUES:
(<unbound Queue automated_reports -> <unbound Exchange default(direct)> -> automated_reports>,)
CELERY_DEFAULT_ROUTING_KEY: '********'
CELERY_INCLUDE:
('celery.app.builtins',
'automated_reports.queue.tasks',
'automated_reports.queue.subtasks')
CELERY_IMPORTS:
('automated_reports.queue.tasks', 'automated_reports.queue.subtasks')
CELERY_RESULT_PERSISTENT: True
CELERY_ROUTES: {
'automated_reports.queue.tasks.run_device_info_report': { 'queue': 'automated_reports'},
'uploader.queue.subtasks.multithread_device_listing': { 'queue': 'automated_reports'},
'uploader.queue.subtasks.multithread_individual_device': { 'queue': 'automated_reports'},
'uploader.queue.tasks.multithread_device_listing': { 'queue': 'automated_reports'},
'uploader.queue.tasks.multithread_individual_device': { 'queue': 'automated_reports'}}
CELERY_DEFAULT_QUEUE: 'automated_reports'
BROKER_URL: 'amqp://username:********@localhost:5672/automated_reports'
CELERY_RESULT_BACKEND: 'amqp://username:pass@localhost:5672/automated_reports'
My startup command is:
~/Documents/Development/automated_reports/bin/celery worker --loglevel=DEBUG --autoreload -A automated_reports.queue.tasks -Q automated_reports -B --schedule=~/Documents/Development/automated_reports/log/celerybeat --autoscale=10,3
Also, when I stop celery, it pulls tasks out of my queue that were never accepted. Then when I restart, it accepts them and executes them.
Any help with this behavior is much appreciated. I'm certain it has something to do with my backend configuration, but I'm having difficulty isolating the issue or its correction. Thanks!
I found the answer to this.
I noticed that the 'inqueue' seemed to be properly receiving tasks in some cases, but not others. When I searched the Celery docs, I found this note: http://celery.readthedocs.org/en/latest/whatsnew-3.1.html?highlight=inqueue#caveats
I was executing the subtasks from within a long-running task, so this sounded very much like the behavior I was seeing. Also, I'm on the version mentioned, whereas on previous versions I hadn't had this problem with the same config.
I added the -Ofair parameter to starting the worker, and it immediately resolved the issue.
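For reference, that just means adding -Ofair to the worker invocation from the question:
~/Documents/Development/automated_reports/bin/celery worker -Ofair --loglevel=DEBUG --autoreload -A automated_reports.queue.tasks -Q automated_reports -B --schedule=~/Documents/Development/automated_reports/log/celerybeat --autoscale=10,3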

Celery task schedule (Celery, Django and RabbitMQ)

I want to have a task that executes every 5 minutes, but waits for the last execution to finish and only then starts counting those 5 minutes. (This way I can also be sure that there is only one task running.) The easiest way I found is to open the Django application's manage.py shell and run this:
while True:
    result = task.delay()
    result.wait()
    sleep(5)
but for each task that I want to execute this way I have to run its own shell. Is there an easier way to do it? Maybe some kind of custom Django Celery scheduler?
Wow it's amazing how no one understands this person's question. They are asking not about running tasks periodically, but how to ensure that Celery does not run two instances of the same task simultaneously. I don't think there's a way to do this with Celery directly, but what you can do is have one of the tasks acquire a lock right when it begins, and if it fails, to try again in a few seconds (using retry). The task would release the lock right before it returns; you can make the lock auto-expire after a few minutes if it ever crashes or times out.
For the lock you can probably just use your database or something like Redis.
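A rough sketch of that locking pattern with Redis, assuming a redis-py client; my_task, the lock key, and do_work() are placeholders, not names from the question:
import redis
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')
r = redis.Redis()

@app.task(bind=True, max_retries=None)
def my_task(self):
    # NX = only acquire if not already held; EX = auto-expire in case we crash.
    if not r.set('lock:my_task', 'locked', nx=True, ex=600):
        # Another instance is still running; try again in a few seconds.
        raise self.retry(countdown=10)
    try:
        do_work()  # placeholder for the actual task body
    finally:
        r.delete('lock:my_task')  # release the lock right before returning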
You may be interested in this simpler method that requires no changes to the Celery conf.
@celery.decorators.periodic_task(run_every=datetime.timedelta(minutes=5))
def my_task():
    # Insert fun-stuff here
    pass
All you need is to specify in the Celery conf which task you want to run periodically and with which interval.
Example: Run the tasks.add task every 30 seconds
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    "runs-every-30-seconds": {
        "task": "tasks.add",
        "schedule": timedelta(seconds=30),
        "args": (16, 16)
    },
}
Remember that you have to run celery in beat mode with the -B option
python manage.py celeryd -B
You can also use the crontab style instead of a time interval; check out this:
http://ask.github.com/celery/userguide/periodic-tasks.html
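For instance, a crontab-style entry for the same tasks.add task might look like this (a sketch based on the documented crontab schedule class):
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    "add-every-monday-morning": {
        "task": "tasks.add",
        "schedule": crontab(hour=7, minute=30, day_of_week=1),  # every Monday at 7:30am
        "args": (16, 16),
    },
}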
If you are using django-celery, remember that you can also use the Django database as the scheduler for periodic tasks; this way you can easily add new periodic tasks through the django-celery admin panel.
To do that you need to set the celerybeat scheduler in settings.py like this:
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
To expand on @MauroRocco's post, from http://docs.celeryproject.org/en/v2.2.4/userguide/periodic-tasks.html
Using a timedelta for the schedule means the task will be executed 30 seconds after celerybeat starts, and then every 30 seconds after the last run. A crontab like schedule also exists, see the section on Crontab schedules.
So this will indeed achieve the goal you want.
Because celery.decorators is deprecated, you can use the periodic_task decorator like this:
from celery.task.base import periodic_task
from django.utils.timezone import timedelta

@periodic_task(run_every=timedelta(seconds=5))
def my_background_process():
    # insert code
    pass
Add that task to a separate queue, and then use a separate worker for that queue with the concurrency option set to 1.
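Assuming that dedicated queue is called single_queue and the app is proj (both placeholder names), the dedicated worker could be started with a concurrency of 1 like this:
celery -A proj worker -Q single_queue -c 1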