Can you alter the polling time interval (5 seconds) for delayed_job worker? - ruby-on-rails-3

Delayed Job is great, but I would like it to poll more frequently (every 2 seconds) to meet a specific need.
Is there a config option or a hard-coded value I can change?

With DJ 3.0 you can add this to the config/initializers/delayed_job_config.rb file:
Delayed::Worker.sleep_delay = 2

Try setting
Delayed::Worker.const_set("SLEEP", 2)
in your config/initializers/delayed_job_config.rb file.

Sure, just go to RAILS_ROOT/vendor/plugins/delayed_job/lib/delayed/worker.rb, look for the line
self.sleep_delay = 5
and change it to
self.sleep_delay = 2
or whatever you'd like
On an earlier version of DJ I set this to as little as 0.1 so that the jobs in the queue get picked up for processing almost instantly and it works just fine.

Related

How to run job every 15 mins using quartz in mule

How do I configure a cron expression to run the job every 15 minutes?
I configured the expression below, but it is not working:
fp.cron.expr=0 15 0 ? * *
Can you please help with this?
You can use the Poll Scheduler in this case and give the Cron scheduler expression as
0 0/15 * 1/1 * ? *
For reference -->
https://docs.mulesoft.com/mule-user-guide/v/3.6/poll-schedulers
Your expression is wrong. A job that runs every 15 minutes should look like this: 0 0/15 * * * ? *
The Quartz connector is deprecated. When scheduling tasks, it is recommended that you use the Poll Scope instead. It offers a Fixed Frequency Scheduler and a Cron Scheduler. With Fixed Frequency you can choose time units of MILLISECONDS, SECONDS, MINUTES, HOURS and DAYS.
You can try 0 0/15 * 1/1 * ? *. You can visit Cron Maker site to generate your cron expression.
The right expression is "0 0/15 * * * ?"
Visit https://www.quartz-scheduler.net/documentation/quartz-2.x/tutorial/crontriggers.html for further reference.
If the above expression does not work, you can also use a simple trigger with a 15-minute interval, as below:
scheduler.Start();
IJobDetail job = JobBuilder.Create<JobName>().Build();
ITrigger trigger = TriggerBuilder.Create().StartNow().WithSimpleSchedule(x => x.WithIntervalInMinutes(15).RepeatForever()).Build();
scheduler.ScheduleJob(job, trigger);
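To see why the original expression fires only once a day, it can help to expand the fields by hand. The following is a minimal sketch, not Quartz code: `expand_field` is a hypothetical helper that understands only `*`, `?`, a plain number, and `start/step`, and it assumes the first three fields are seconds, minutes and hours.

```python
def expand_field(field, low, high):
    """Return the sorted values a single Quartz-style cron field matches.

    Illustration only: supports '*', '?', a plain number, and 'start/step'.
    """
    if field in ("*", "?"):
        return list(range(low, high + 1))
    if "/" in field:
        start, step = field.split("/")
        first = low if start == "*" else int(start)
        return list(range(first, high + 1, int(step)))
    return [int(field)]


# Assumed field order: seconds, minutes, hours (the first three fields).
for expr in ("0 15 0 ? * *", "0 0/15 * * * ?"):
    seconds, minutes, hours = expr.split()[:3]
    print(expr, "-> minutes", expand_field(minutes, 0, 59),
          "in", len(expand_field(hours, 0, 23)), "hour(s) per day")
```

The first expression matches only minute 15 of hour 0, that is, 00:15:00 once a day. The second matches minutes 0, 15, 30 and 45 of every hour.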

Run Job every 4 days but first run should happen now

I am trying to set up APScheduler to run a job every 4 days, but I need the job to start running now. I tried using the interval trigger, but I discovered that it waits the specified period before running. I also tried using cron the following way:
sched = BlockingScheduler()
sched.add_executor('processpool')

@sched.scheduled_job('cron', day='*/4')
def test():
    print('running')
One final idea I had was using a start_date in the past:
@sched.scheduled_job('interval', seconds=10, start_date=datetime.datetime.now() - datetime.timedelta(hours=4))
but that still waits 10 seconds before running.
Try this instead:
@sched.scheduled_job('interval', days=4, next_run_time=datetime.datetime.now())
This is similar to the answer above; the only difference is that it uses the add_job method.
scheduler = BlockingScheduler()
scheduler.add_job(dump_data, trigger='interval', days=21, next_run_time=datetime.datetime.now())
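The effect of next_run_time can be pictured with a small timing model. This is only a sketch using the standard library, not APScheduler code: `run_times` is a hypothetical generator, and the fixed `start` value stands in for datetime.now(). It assumes, as described in the question, that a plain interval job first fires one full interval after it is added.

```python
from datetime import datetime, timedelta
from itertools import islice


def run_times(start, interval, first_run_now):
    """Yield successive run times for a fixed-interval job (timing model only)."""
    next_run = start if first_run_now else start + interval
    while True:
        yield next_run
        next_run += interval


start = datetime(2024, 1, 1, 9, 0)  # stands in for datetime.now()
every_four_days = timedelta(days=4)

default = list(islice(run_times(start, every_four_days, False), 3))
immediate = list(islice(run_times(start, every_four_days, True), 3))
print("default:  ", [t.isoformat() for t in default])
print("immediate:", [t.isoformat() for t in immediate])
```

In the "immediate" case the first run happens at the start time and later runs follow every 4 days, which is the behaviour the question asks for.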

Is it possible to set dynamic download delay in scrapy?

I know that a constant delay can be set in settings.py:
DOWNLOAD_DELAY = 2
However, a delay of 2 s is not efficient enough. If I set DOWNLOAD_DELAY = 0, the crawler is able to crawl about 10 pages; after that, the target site returns something like "you are requesting too frequently".
What I want is to keep the download delay at 0 and, once the "requesting too frequently" message is found in the HTML, change the delay to 2 s. After a while it should switch back to zero.
Is there any module that can do this, or a better way to handle such a case?
Update:
I found that there is an extension called AutoThrottle, but can it be customized with logic like this?
if (requesting too frequently) is found
    increase the DOWNLOAD_DELAY
If you can fetch a data page again 2 seconds after receiving the anti-spider page, then what you are asking for probably requires writing a downloader middleware. It would check for the anti-spider page, move all scheduled requests to a renew queue, and start a looping call while the spider is idle that pulls requests from the renew queue (the looping interval acts as the new download delay). It would also have to decide when the delay is no longer necessary (this requires some testing), then stop the loop and reschedule all requests in the renew queue with the Scrapy scheduler. For a distributed crawl you will need a Redis queue.
With the download delay set to 0, in my experience throughput can easily exceed 1000 items/min. If the anti-spider page pops up after 10 responses, this is not worth the effort.
Instead, try to find out how fast your target server lets you go (maybe 1.5 s, 1 s, 0.7 s, 0.5 s, etc.), and then redesign your product to take into account the throughput your crawler can achieve.
You can use the AutoThrottle extension now. It is turned off by default. To enable it, add these parameters to your project's settings.py file:
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 300
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = True
Yes, you can use the time module to set a dynamic delay.
import time
for i in range(10):
    # ... Operations 1 ...
    time.sleep(i)  # pause for i seconds
    # ... Operations 2 ...
Now there is a delay between Operations 1 and Operations 2.
Note:
the variable 'i' is in seconds.
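The back-off logic the question asks for (delay 0, jump to 2 s when the "requesting too frequently" page appears, return to 0 after a while) can be sketched independently of Scrapy. The class below is a hypothetical helper, not a Scrapy API; the `cooldown` parameter (number of consecutive normal responses before the delay is dropped again) is an assumption, since the question only says "after a while".

```python
class AdaptiveDelay:
    """Tracks the delay to wait before the next request.

    Hypothetical helper, not part of Scrapy: the delay jumps to `slow_delay`
    when a response looks throttled and drops back to `fast_delay` after
    `cooldown` consecutive normal responses.
    """

    def __init__(self, fast_delay=0.0, slow_delay=2.0, cooldown=20,
                 marker="requesting too frequently"):
        self.fast_delay = fast_delay
        self.slow_delay = slow_delay
        self.cooldown = cooldown
        self.marker = marker
        self.delay = fast_delay
        self._ok_streak = 0

    def update(self, body):
        """Inspect a response body and return the delay for the next request."""
        if self.marker in body.lower():
            self.delay = self.slow_delay
            self._ok_streak = 0
        else:
            self._ok_streak += 1
            if self._ok_streak >= self.cooldown:
                self.delay = self.fast_delay
        return self.delay


throttle = AdaptiveDelay(cooldown=3)
pages = ["<html>ok</html>", "You are requesting too frequently", "ok", "ok", "ok"]
delays = [throttle.update(page) for page in pages]
print(delays)
```

Wiring this into a crawl (for example from a downloader middleware, as suggested above) is left out here, because how the delay is applied depends on the Scrapy version and project setup.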

Add 30 second to current system time in selenium IDE

I am working on a script in which I have to write a command that adds 30 seconds to the current time.
I have the current time in my script, but I am having trouble adding 30 seconds to it.
Is there any command or solution for this?
Something like:
var timePlusThirtySeconds = new Date(new Date().getTime() + 30 * 1000);
Or are you actually talking about setting the system clock into the future?

Celery task schedule (Celery, Django and RabbitMQ)

I want to have a task that will execute every 5 minutes, but it will wait for last execution to finish and then start to count this 5 minutes. (This way I can also be sure that there is only one task running) The easiest way I found is to run django application manage.py shell and run this:
while True:
    result = task.delay()
    result.wait()
    sleep(5 * 60)  # wait 5 minutes after the previous run finishes
but for each task that I want to execute this way I have to run its own shell. Is there an easier way to do it? Maybe some kind of custom or django-celery scheduler?
Wow it's amazing how no one understands this person's question. They are asking not about running tasks periodically, but how to ensure that Celery does not run two instances of the same task simultaneously. I don't think there's a way to do this with Celery directly, but what you can do is have one of the tasks acquire a lock right when it begins, and if it fails, to try again in a few seconds (using retry). The task would release the lock right before it returns; you can make the lock auto-expire after a few minutes if it ever crashes or times out.
For the lock you can probably just use your database or something like Redis.
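The lock idea above can be sketched as follows. This is a minimal in-memory illustration, not Celery or Redis code: `ExpiringLock` and `run_exclusively` are hypothetical names, the lock is not thread-safe, and with several workers the lock state would have to live in a shared store such as Redis or the database.

```python
import time


class ExpiringLock:
    """In-memory stand-in for a shared lock with auto-expiry (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._expires_at = None  # None means the lock is free

    def acquire(self):
        """Take the lock if it is free or the previous holder's TTL ran out."""
        now = self._clock()
        if self._expires_at is not None and now < self._expires_at:
            return False
        self._expires_at = now + self.ttl_seconds
        return True

    def release(self):
        self._expires_at = None


def run_exclusively(lock, work):
    """Run `work` only if the lock is free; return True if it ran."""
    if not lock.acquire():
        return False  # a real task would retry in a few seconds here
    try:
        work()
    finally:
        lock.release()  # released right before returning, even on error
    return True


lock = ExpiringLock(ttl_seconds=300)
results = []
# The inner call happens while the outer call still holds the lock.
first = run_exclusively(lock, lambda: results.append(run_exclusively(lock, lambda: None)))
print(first, results)
```

The inner call is refused because the outer call still holds the lock. The TTL covers the case where a task crashes before it can release the lock.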
You may be interested in this simpler method, which requires no changes to the Celery conf.
@celery.decorators.periodic_task(run_every=datetime.timedelta(minutes=5))
def my_task():
    # Insert fun-stuff here
    pass
All you need is to specify in the Celery conf which task you want to run periodically and at which interval.
Example: run the tasks.add task every 30 seconds
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
    "runs-every-30-seconds": {
        "task": "tasks.add",
        "schedule": timedelta(seconds=30),
        "args": (16, 16)
    },
}
Remember that you have to run celery in beat mode with the -B option:
python manage.py celeryd -B
You can also use a crontab-style schedule instead of a time interval; check this out:
http://ask.github.com/celery/userguide/periodic-tasks.html
If you are using django-celery, remember that you can also use the Django database as the scheduler for periodic tasks. This way you can easily add new periodic tasks through the django-celery admin panel.
To do that, set the celerybeat scheduler in settings.py like this:
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
To expand on @MauroRocco's post, from http://docs.celeryproject.org/en/v2.2.4/userguide/periodic-tasks.html
Using a timedelta for the schedule means the task will be executed 30 seconds after celerybeat starts, and then every 30 seconds after the last run. A crontab like schedule also exists, see the section on Crontab schedules.
So this will indeed achieve the goal you want.
Because celery.decorators is deprecated, you can use the periodic_task decorator like this:
from datetime import timedelta
from celery.task.base import periodic_task

@periodic_task(run_every=timedelta(seconds=5))
def my_background_process():
    # insert code
    pass
Add that task to a separate queue, and then use a separate worker for that queue with the concurrency option set to 1.