Long-running jobs redelivered after broker visibility timeout with Celery and Redis

I am using Celery 4.3 + Redis + Flower. I have a few long-running jobs with acks_late=True and task_reject_on_worker_lost=True. I use a Celery group to run child jobs in parallel and append their results for use in the parent job.
Some of these jobs run for more than an hour, and after every hour the same child jobs are redelivered to the worker.
Sample jobs are shown below:
@app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_1(self):
    do_something()
    task_group = group(job_2.s(batch) for batch in range(0, len([1, 2, 3, 4, 5, 6]), 3))
    result_group = task_group.apply_async()

@app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_2(self, batch):
    do_something()
    return result
The job_2 above runs for more than an hour, and after one hour the same task is redelivered to the worker.
My Celery setup and config are shown below:
c = Celery(app.import_name,
           backend=app.config['CELERY_RESULT_BACKEND'],
           broker=app.config['CELERY_BROKER_URL'])
config.py
CELERY_BROKER_URL = os.environ['CELERY_BROKER_URL']
CELERY_RESULT_BACKEND = os.environ['CELERY_RESULT_BACKEND']
CELERY_BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 36000}
After the redelivery issue appeared, I increased the visibility timeout to 10 hours in the configuration, as shown above, but it does not seem to work.
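One possible cause worth checking (an assumption, not something confirmed above): the old-style name for this setting is BROKER_TRANSPORT_OPTIONS, without the CELERY_ prefix, so depending on how config.py is loaded the key above may never reach the broker. A minimal sketch that sets the option directly on the app object:
# Minimal sketch: set the transport option directly on the Celery app so it
# cannot be lost during config loading. 36000 s = 10 h; with acks_late=True
# on Redis, the visibility timeout must exceed the longest task runtime.
c.conf.broker_transport_options = {'visibility_timeout': 36000}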
Please help with this issue and let me know if there is a solution.

Related

How to add a 2-minute delay between jobs in a queue?

I am using Hangfire in ASP.NET Core with a server that has 20 workers, which means 20 jobs can be enqueued at the same time.
What I need is to enqueue them one by one with a 2-minute delay between each. Each job can take 1-45 minutes; I don't have a problem running jobs concurrently, but I do have a problem starting 20 jobs at the same time. That's why changing the worker count to 1 is not practical for me (it would slow the process down a lot).
The idea is that I just don't want 2 jobs to start in the same second, since this may cause conflicts in my logic; if the second job starts 2 minutes after the first one, then I am good.
How can I achieve that?
You can use BackgroundJob.Schedule() to run your job at a specific time:
BackgroundJob.Schedule(() => Console.WriteLine("Hello"), dateTimeToExecute);
Based on that, set a date for the first job to execute, and then push this date forward by 2 minutes for each new job.
Something like this:
var dateStartDate = DateTime.Now;
foreach (var j in listOfjobsToExecute)
{
    BackgroundJob.Schedule(() => j.Run(), dateStartDate);
    dateStartDate = dateStartDate.AddMinutes(2);
}
See more here:
https://docs.hangfire.io/en/latest/background-methods/calling-methods-with-delay.html?highlight=delay

APScheduler: Interval trigger with end_date

If I run the following:
from datetime import datetime
import logging

from apscheduler.schedulers.background import BackgroundScheduler

def my_job_2(text):
    for i in range(7):
        print("\ni: ", text)

logging.basicConfig()
logging.getLogger('apscheduler').setLevel(logging.DEBUG)

scheduler = BackgroundScheduler()
# scheduler = BlockingScheduler()
scheduler.add_job(my_job_2, 'interval', id="my_job_interval", seconds=10,
                  args=['Running a Job, every 10 seconds'], end_date=datetime(2021, 2, 22, 10, 15, 0))

try:
    scheduler.start()
except (KeyboardInterrupt, SystemExit):
    print("Exception Caught")
According to the documentation, this should work. However, when I run it, the script prints:
INFO:apscheduler.scheduler:Added job "my_job_2" to job store "default"
INFO:apscheduler.scheduler:Scheduler started
DEBUG:apscheduler.scheduler:Looking for jobs to run
DEBUG:apscheduler.scheduler:No jobs; waiting until a job is added

Execute multiple pyiron jobs with dependencies

I have 4 jobs (A, B, C, D), which I want to start using pyiron. All jobs need to run on a remote cluster using SLURM. Some of the jobs need results from other jobs as input.
Ideally, I would like to have a workflow like:
Job A is started by the user.
Jobs B and C start automatically and in parallel (!) as soon as job A is done.
Job D starts automatically as soon as the jobs B and C are finished.
I realize that I could implement this in Jupyter using some if-conditions and sleep commands.
However, jobs A, B, and C could run for multiple days, and I don't want to keep my Jupyter notebook running for that long.
Is there a more convenient way to realize these job dependencies in pyiron?
I guess the easiest way would be to submit the whole Jupyter notebook to the queue using the script job class:
job = pr.create.job.ScriptJob("script")
job.script_path = 'workflow.ipynb'
job.server.queue = 'my_queue'
job.server.cores = 32
job.run()
Here workflow.ipynb would be your current notebook, my_queue your SLURM queue for remote submission, and 32 the total number of cores to allocate.
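Inside that notebook the dependencies can then be expressed by blocking on each job before submitting its successors. A minimal sketch of what workflow.ipynb might contain (the job names and types are hypothetical placeholders for your real jobs A-D, and pr.wait_for_job is assumed to be available in your pyiron version):
# Hedged sketch of workflow.ipynb; job names/types are placeholders.
job_a = pr.create.job.ScriptJob("job_a")
job_a.server.queue = 'my_queue'
job_a.run()
pr.wait_for_job(job_a)  # block until A finishes

jobs_bc = []
for name in ("job_b", "job_c"):
    job = pr.create.job.ScriptJob(name)
    job.server.queue = 'my_queue'
    job.run()  # B and C are queued back to back, so SLURM runs them in parallel
    jobs_bc.append(job)
for job in jobs_bc:
    pr.wait_for_job(job)  # D must wait for both B and C

job_d = pr.create.job.ScriptJob("job_d")
job_d.server.queue = 'my_queue'
job_d.run()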

How to use celery beat with node-celery

I am using node-celery (GitHub link) to implement the Celery first steps along with RabbitMQ.
As in Celery, we define tasks and then push them.
The task I have defined inside tasks.py is below:
import os
import logging

import requests
from celery import Celery

backend = os.getenv('CELERY_BACKEND_URL', 'amqp')
celery = Celery('tasks', backend=backend)
celery.conf.update(
    CELERY_RESULT_SERIALIZER='json',
    CELERY_ENABLE_UTC=True
)

@celery.task
def getOrders():
    requests.get('http://localhost:4000/getOrders')
My file which I run to trigger tasks:
eta.js:
var celery = require('../celery'),
    client = celery.createClient({
        CELERY_BROKER_URL: 'amqp://guest:guest@localhost:5672//'
    });

client.on('error', function(err) {
    console.log(err);
});

client.on('connect', function() {
    client.call('tasks.getOrders', {
        eta: new Date(Date.now() + 15 * 1000) // 15 seconds later
    });
});
I start the Celery worker using the command below:
celery worker -A tasks -l info
Now when I run eta.js, the task 'getOrders' defined inside tasks.py gets triggered; it hits the URL I requested, and that endpoint does its work.
To run eta.js I run:
node eta.js
What I want is for my task 'getOrders' to keep running every x seconds.
I have read about setting up periodic tasks in Celery, but that documentation is for Python and I need this in node-celery.
I want to use celery beat with node-celery if that is possible, or should I just avoid node-celery and use the Python Celery setup instead? I know Python but couldn't find any link explaining how to set up Celery in Python.
Anyone with good knowledge of Celery, please help me or point me to a tutorial.
Thanks!
You can add the scheduled task to Redis directly; RedBeat will read the new task, execute it, and reschedule it according to the crontab setup.
To do this:
Add the RedBeat Redis URL to your Celery config (see the sketch after these steps)
Create a periodic task
const task = {
    name: "getOrdersTask",
    task: "tasks.getOrders",
    schedule: {
        "__type__": "crontab",
        minute: "*/5",
        hour: "*",
        day_of_week: "*",
        day_of_month: "*/7",
        month_of_year: "[1-12]"
    },
    args: [],
    enabled: true,
};
Store the task in redis with:
redisClient.hset("redbeat:getOrdersTask", "definition", JSON.stringify(task));
Insert the task into the schedule (the score 0 forces RedBeat to execute it immediately, after which it is rescheduled at the proper time according to the period you configured, in this case every 5 minutes):
redisClient.zadd("redbeat::schedule", 0, "getOrdersTask")
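For the first step above, a minimal sketch of the Celery-side configuration (an assumption: the celery-redbeat package is installed, and the Redis URL is only an example):
# Hedged sketch for tasks.py: point RedBeat at your Redis instance.
celery.conf.redbeat_redis_url = 'redis://localhost:6379/1'
Beat is then started with the RedBeat scheduler:
celery beat -A tasks -S redbeat.RedBeatScheduler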

apscheduler at 90-second intervals?

Is it possible to set an apscheduler cron job to run at 90-second intervals? (I have 40 machines that I'd like to schedule evenly over an hour without hard-coding time info into the script.) I've tried variations of this:
job = sched.add_cron_job(_test, minute='*/1', second='30')
job = sched.add_cron_job(_test, minute='*', second='90')
Try this instead:
job = sched.add_interval_job(_test, seconds=90)
Based on your question, you want to start a cron job at a particular time and run it indefinitely at a 90-second interval. You can achieve this by combining triggers:
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.combining import AndTrigger
from apscheduler.triggers.interval import IntervalTrigger
from apscheduler.triggers.cron import CronTrigger
def _test():
    print("code comes here")

scheduler = BackgroundScheduler()

# Runs on 2019-12-30 at 5:30 (am) & repeats every 90 seconds
trigger = AndTrigger([IntervalTrigger(seconds=90),
                      CronTrigger(start_date='2019-12-30', hour=5, minute=30)])
scheduler.add_job(_test, trigger)
scheduler.start()
Interval code example:
sched = BlockingScheduler()
sched.add_job(ClassTest, 'interval', seconds=90)
sched.start()
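If the underlying goal is to spread the 40 machines evenly over the hour rather than to run a single job every 90 seconds, another option is one hourly interval job per machine with staggered start dates. A sketch only; machines and check_machine are hypothetical placeholders:
from datetime import datetime, timedelta
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()
# 3600 s / 40 machines = one start every 90 s; each machine then repeats hourly.
for i, machine in enumerate(machines):
    sched.add_job(check_machine, 'interval', hours=1, args=[machine],
                  start_date=datetime.now() + timedelta(seconds=90 * i))
sched.start()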