APScheduler: Interval trigger with end_date - apscheduler

If I run the following:
def my_job_2(text):
for i in range(7):
print("\ni: ",text)
logging.basicConfig()
logging.getLogger('apscheduler').setLevel(logging.DEBUG)
scheduler = BackgroundScheduler()
#scheduler = BlockingScheduler()
scheduler.add_job(my_job_2,'interval',id="my_job_interval", seconds=10,
args=['Running a Job, every 10 seconds'], end_date=datetime(2021,2,22,10,15,0))
try:
scheduler.start()
except (KeyboardInterrupt, SystemExit):
print("Exception Caught")
According to the documentation, this should work. However, when I run it, the script prints:
INFO:apscheduler.scheduler:Added job "my_job_2" to job store "default"
INFO:apscheduler.scheduler:Scheduler started
DEBUG:apscheduler.scheduler:Looking for jobs to run
DEBUG:apscheduler.scheduler:No jobs; waiting until a job is added

Related

Scrapyd: How to cancel all jobs with one command?

I am running over 40 spiders which are until now scheduled via cron and issued via scrapy crawl Due to several reasons I am now switching to scrapyd, one of them is to be able to see which jobs are running in case I need to do maintenance and reboot - so I can cancel a job.
Is it possible to cancel multiple jobs at once? I noticed that multiple jobs might be running at once with many waiting in quene with status "pending". Stopping the crawl might therefore require multiple calls of the cancel.json endpoint.
How to stop (or better pause) all jobs?
The scrapyd API (as of v1.3.0) does not support pausing. It does have stopping one job per call, however, so you have to loop the jobs yourself
I took #kolas's script from this question and update it to work with python 3.
import json, os
PROJECT_NAME = "MY_PROJECT"
cd = os.system('curl http://localhost:6800/listjobs.json?project={} > kill_job.text'.format(PROJECT_NAME))
with open('kill_job.text', 'r') as f:
a = json.loads(f.readlines()[0])
pending_jobs = list(a.values())[2]
for job in pending_jobs:
job_id = job['id']
kill = 'curl http://localhost:6800/cancel.json -d project={} -d job={}'.format(PROJECT_NAME, job_id)
os.system(kill)

Long running jobs redelivering after broker visibility timeout with celery and redis

I am using celery 4.3 + Redis + flower. I have a few long-running jobs with acks_late=True and task_reject_on_worker_lost=True. I am using celery grouping to group jobs run parallelly and append result and use in parent job.
In this scenario, my few jobs will run more than an hour, after every one hour the same child jobs are redelivering again to the worker.
The sample jobs as below.
#app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_1(self):
do_something()
task_group = group(job_2.s(batch) for batch in range(0, len([1,2,3,4,5,6]), 3))
result_group = task_group.apply_async()
#app.task(queue='q1', bind=True, acks_late=True, task_reject_on_worker_lost=True, max_retries=3)
def job_2(self, batch):
do_something()
return result
The above job_2 will run more than hour and after one hour the same is redelivering again to the worker.
My celery setup and config as shown below:
c = Celery(app.import_name,
backend=app.config['CELERY_RESULT_BACKEND'],
broker=app.config['CELERY_BROKER_URL'])
config.py
CELERY_BROKER_URL = os.environ['CELERY_BROKER_URL']
CELERY_RESULT_BACKEND = os.environ['CELERY_RESULT_BACKEND']
CELERY_BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 36000}
I tried to increase visibility timeout to 10 hours after the redelivering issue like above in configuration. but it looks like not working.
Please help with this issue and let me know if any solution.

Run Job every 4 days but first run should happen now

I am trying to setup APScheduler to run every 4 days, but I need the job to start running now. I tried using interval trigger but I discovered it waits the specified period before running. Also I tried using cron the following way:
sched = BlockingScheduler()
sched.add_executor('processpool')
#sched.scheduled_job('cron', day='*/4')
def test():
print('running')
One final idea I got was using a start_date in the past:
#sched.scheduled_job('interval', seconds=10, start_date=datetime.datetime.now() - datetime.timedelta(hours=4))
but that still waits 10 seconds before running.
Try this instead:
#sched.scheduled_job('interval', days=4, next_run_time=datetime.datetime.now())
Similar to the above answer, only difference being it uses add_job method.
scheduler = BlockingScheduler()
scheduler.add_job(dump_data, trigger='interval', days=21,next_run_time=datetime.datetime.now())

How to manually run a Sidekiq job

I have an application which uses Sidekiq. The web server process will sometimes put a job on Sidekiq, but I won't necessarily have the worker running. Is there a utility which I could call from the Rails console which would pull one job off the Redis queue and run the appropriate Sidekiq worker?
Here's a way that you'll likely need to modify to get the job you want (maybe like g8M suggests above), but it should do the trick:
> job = Sidekiq::Queue.new("your_queue").first
> job.klass.constantize.new.perform(*job.args)
If you want to delete the job:
> job.delete
Tested on sidekiq 5.2.3.
I wouldn't try to hack sidekiq's API to run the jobs manually since it could leave some unwanted internal state but I believe the following code would work
# Fetch the Queue
queue = Sidekiq::Queue.new # default queue
# OR
# queue = Sidekiq::Queue.new(:my_queue_name)
# Fetch the job
# job = queue.first
# OR
job = queue.find do |job|
meta = job.args.first
# => {"job_class" => "MyJob", "job_id"=>"1afe424a-f878-44f2-af1e-e299faee7e7f", "queue_name"=>"my_queue_name", "arguments"=>["Arg1", "Arg2", ...]}
meta['job_class'] == 'MyJob' && meta['arguments'].first == 'Arg1'
end
# Removes from queue so it doesn't get processed twice
job.delete
meta = job.args.first
klass = meta['job_class'].constantize
# => MyJob
# Performs the job without using Sidekiq's API, does not count as performed job and so on.
klass.new.perform(*meta['arguments'])
# OR
# Perform the job using Sidekiq's API so it counts as performed job and so on.
# klass.new(*meta['arguments']).perform_now
Please let me know if this doesn't work or if someone knows a better way to do this.

apscheduler at 90 second intervals?

Is it possible to set an apscheduler cron job to run at 90 second intervals? (I have 40 machines that I'd like to schedule evenly over an hour without hard coding time info into the script). I've tried various kinds of this:
job = sched.add_cron_job(_test, minute='*/1', second='30')
job = sched.add_cron_job(_test, minute='*', second='90')
Try this instead:
job = sched.add_interval_job(_test, seconds=90)
Based on your question you want to start a cron job at a particular time and run it indefinitely with 90 seconds interval. You can achieve this by combining triggers
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.combining import AndTrigger
from apscheduler.triggers.interval import IntervalTrigger
from apscheduler.triggers.cron import CronTrigger
def _test():
print("code comes here")
scheduler = BackgroundScheduler()
# Runs on 2019-12-30 at 5:30 (am) & repeats every 90 seconds interval
trigger = AndTrigger([IntervalTrigger(seconds=90),
CronTrigger(start_date='2019-12-30', hour=5, minute=30)])
scheduler.add_job(_test, trigger)
scheduler.start()
Interval code example:
sched = BlockingScheduler()
sched.add_job(ClassTest, 'interval', seconds=90)
sched.start()