Repeatedly running a background script in Rails3/Heroku Cedar deployment - ruby-on-rails-3

I am developing a Rails 3 app that will run on the Heroku Cedar stack and needs to constantly check for new tweets under a certain hashtag. I have the logic to do this in place, but I would like to run this task in the background so as not to interfere with the main app's performance. I also need to write any new tweets found to a database, so I will need access to Active Record. I am looking for advice on what might be the best way to achieve this.

I do something similar. It doesn't matter to me if tweets are slightly out of date, so we use the scheduler to run a rake task every 10 minutes that watches a hashtag. We can change the frequency to hourly or daily should we feel 10 minutes is too frequent.

You could use the Heroku scheduler to regularly execute a Rake task (or some other script).
Alternatively, if you're checking for Tweets in response to a certain user action or some other event, you could use a task queue like Delayed Job.
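As a rough illustration of what a single scheduled run could do, here is a minimal sketch of one polling pass that stores only tweets it has not seen before. Everything in it is hypothetical: `TweetStore` stands in for an Active Record model, `check_hashtag` stands in for the body of the Rake task, and the `{ id:, text: }` tweet shape is an assumption.

```ruby
require 'set'

# Hypothetical stand-in for an Active Record model: keeps tweets in memory.
class TweetStore
  attr_reader :rows

  def initialize
    @seen_ids = Set.new
    @rows = []
  end

  # Saves only tweets whose id has not been stored yet; returns how many were new.
  def save_new(tweets)
    fresh = tweets.select { |tweet| @seen_ids.add?(tweet[:id]) }
    @rows.concat(fresh)
    fresh.size
  end
end

# One polling pass: fetch the current tweets and store the unseen ones.
# `fetcher` is any callable returning an array of { id:, text: } hashes (assumed shape).
def check_hashtag(fetcher, store)
  store.save_new(fetcher.call)
end
```

Because each pass skips ids it has already stored, running it every 10 minutes (or more often) should not create duplicate rows, even when consecutive fetches overlap.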

Related

Daemons-Rails: scaling up to multiple workers

So, I've been given a code base which uses daemons, daemons-rails and delayed jobs to trigger a number of *_ctl files in /lib/daemons/ but here's the problem:
If two people do an action which starts the daemons doing some heavy lifting then whichever one clicks second will have to wait for the first to complete. Not good. We need to start multiple daemons on each queue.
Ideally what I want to do is read a config file like this:
default:
  queues: default ordering
  num_workers: 10
hours:
  queues: slow_admin_tasks
  num_workers: 2
minutes:
  queues: minute
  num_workers: 2
This would mean that 10 daemon processes are started to listen to the default and ordering queues, 2 for slow_admin_tasks, and so on.
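A minimal sketch of how such a file could be turned into a list of workers to start, assuming the config is YAML with the keys shown above. The `worker_specs` helper and the `group.index` naming are hypothetical, and actually spawning the daemons is left out.

```ruby
require 'yaml'

# The config from the question, inlined here so the sketch is self-contained.
CONFIG = <<~YAML
  default:
    queues: default ordering
    num_workers: 10
  hours:
    queues: slow_admin_tasks
    num_workers: 2
  minutes:
    queues: minute
    num_workers: 2
YAML

# Hypothetical helper: expands each group into one spec per worker process.
def worker_specs(yaml_text)
  YAML.safe_load(yaml_text).flat_map do |group, settings|
    queues = settings.fetch('queues').split
    Array.new(settings.fetch('num_workers')) do |index|
      { name: "#{group}.#{index}", queues: queues }
    end
  end
end
```

Each resulting hash names one process and the queues it should listen to; a process manager or a daemon control script could then iterate over the list.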
How would I go about defining multiple daemons like this? It looks like the change might belong in one of these places:
/lib/daemons/*_ctl
/lib/daemons/*.rb
/lib/daemons/daemons
I thought it might be a change to the daemons-rails rake tasks, but they just hit the daemons file.
Has anyone looked into scaling daemons-rails in this way? Where can I get more information?
I suggest you try Foreman.
Take a look at this discussion.
Foreman can help manage multiple processes that your Rails app depends upon when running in development. You can find a tutorial regarding Foreman on RailsCasts. There's a video tutorial plus some source code examples.
I also suggest you take a look at Sidekiq.
Sidekiq allows you to move jobs into the background for asynchronous processing. It uses threads instead of forks, so it is much more efficient with memory compared to Resque. You can find a tutorial here.
I also suggest you take a look at Resque.
Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later. You can find a tutorial here.

Schedule background task with Sidekiq

I have a Rails 3 app deployed on Heroku. I have a Sidekiq worker at app/workers/task_worker.rb:
class TaskWorker
  include Sidekiq::Worker

  def perform
    ...
  end
end
How can I schedule TaskWorker.perform_async to run daily at 12:01 a.m.?
You might want to have a look at sidetiq too: https://github.com/tobiassvn/sidetiq The gem supports complex timing expressions via the ice_cube gem.
I personally found it comfortable to have a gem that integrates seamlessly with Sidekiq.
Something like this should work:
class TaskWorker
  include Sidekiq::Worker
  include Sidetiq::Schedulable

  recurrence do
    daily.hour_of_day(0).minute_of_hour(1)
  end

  def perform
    # do magic
  end
end
Be careful when using this gem, though, since some time expressions have performance-related issues: https://github.com/tobiassvn/sidetiq/wiki/Known-Issues. The expression I gave you should avoid them.
I don't like the overhead Sidetiq adds to Sidekiq so I sought out a different solution.
Apparently Heroku has a little-known, but free scheduler addon that allows you to run rake tasks every 10 minutes, hourly or daily. This is Heroku's answer to cron jobs and it's nice that it's a free add-on. It should work for most non-critical scheduling.
Heroku states in their docs that the scheduler is a "Best Effort" service which may occasionally (but rarely) miss a scheduled event. If it is critical that this job is run, you'll probably want to use a custom clock process. Custom clock processes are more reliable but they count toward your dyno hours. (And as such, incur fees just like any other process.)
Currently it looks like clockwork is the recommended clock process on Heroku.
I'm stating the obvious, but what's wrong with having a Cron Job that invokes the Sidekiq job every night at that time?
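If the job is enqueued with an explicit timestamp rather than through a recurring scheduler, "daily at 12:01 a.m." comes down to computing the next such instant. A minimal sketch follows; the helper name `next_daily_run` is hypothetical, and it works in UTC, which is an assumption about the server's clock.

```ruby
# Hypothetical helper: returns the next occurrence of hour:minute (UTC)
# strictly after `now`. Defaults to 00:01, i.e. 12:01 a.m.
def next_daily_run(now, hour: 0, minute: 1)
  now = now.getutc
  candidate = Time.utc(now.year, now.month, now.day, hour, minute)
  # If today's slot has already passed (or is exactly now), use tomorrow's.
  candidate > now ? candidate : candidate + 24 * 60 * 60
end
```

Sidekiq's perform_at accepts a time, so something like TaskWorker.perform_at(next_daily_run(Time.now)) could enqueue the next run, provided something re-enqueues the job each day.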

What is the best practice for running a scheduler or delayed job on Heroku?

I've seen several solutions for doing this:
Redis / Resque
Delayed Job
Heroku Scheduler
Clockwork
Heroku scheduler won't work because it runs at random times and only once per 10 minutes at its most frequent.
Running on Cedar. Running multiple web dynos.
EDIT: Here's what I want to do:
Call an arbitrary method with params at an arbitrary point in the future. Something like Schedule.set(Notification.send_update_to_user(574), Time.now + 1.days)
I would choose Sidekiq, though there are several other options suitable for your example. Sidekiq lets you schedule jobs to run at arbitrary times in the future:
NotificationUpdateWorker.perform_at(Time.now + 1.day, 574)
The delayed extensions would let you write instead:
Notification.delay_for(1.day).send_update_to_user(574)
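Conceptually, scheduling a call for a future time means storing the call together with its due time and having a worker run whatever is due. The sketch below illustrates only that idea. The `Schedule` class is hypothetical and is not how Sidekiq is implemented, and because it lives in memory it would not be shared across multiple dynos; a real setup needs a shared store such as Redis or the database.

```ruby
# Hypothetical in-memory schedule; illustrates the idea only.
class Schedule
  Job = Struct.new(:run_at, :callable)

  def initialize
    @jobs = []
  end

  # Registers a block to run at or after `run_at`.
  def set(run_at, &block)
    @jobs << Job.new(run_at, block)
  end

  # Runs and removes every job due at `now`, earliest first; returns their results.
  def run_due(now = Time.now)
    due, @jobs = @jobs.partition { |job| job.run_at <= now }
    due.sort_by(&:run_at).map { |job| job.callable.call }
  end

  # Number of jobs still waiting.
  def pending
    @jobs.size
  end
end
```

A worker process would call run_due in a loop; with a block such as `schedule.set(Time.now + 86_400) { ... }` the call runs on the first pass after a day has elapsed.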
Try rufus-scheduler.
require 'rubygems'
require 'rufus/scheduler'

scheduler = Rufus::Scheduler.start_new

scheduler.every '1m' do
  Checkin.check_checkin()
end
After looking at different options, I chose Delayed Job, which is well documented on Heroku.
For jobs that need to run at a certain time each day or once an hour, Heroku scheduler works well, but sometimes it doesn't run.

Rails 3 and scheduling of repetitive tasks without cron

I have an app and I am currently using delayed_job. I was wondering if there are any recommended gems that do scheduling of repetitive tasks.
I want to schedule tasks that happen at a certain frequency to clean the database, send emails, or run other methods.
I might want to run some tasks every day or every hour.
Are there any good ones out there that are fairly easy to set up and configure and that do not use cron?
You can convert that repetitive work into rake tasks and call those tasks via cron.
For setting up cron on the server, whenever is a nice gem.
Check it out here: http://railscasts.com/episodes/164-cron-in-ruby
There are some very good gems for scheduling repetitive tasks.
I tried Delayed Job; it is simple and easy to use. You can watch the RailsCast for details on using it.
You can try rufus-scheduler: https://github.com/jmettraux/rufus-scheduler It is an app-based scheduler (unless you use its cron feature).

Database Job Scheduling

I have a procedure written in PL/Java in my Postgres database that sends out updates over JMS.
What I would like to do is have that function called on an interval (every 15 seconds) internally in the database (preferably not from an outside process). Is this possible? Any ideas?
If you need no external access, you are presumably able to modify the database design so that you don't need the update at all. Can you explain more about what the update is doing?
As depesz said, you could use either cron or pgAgent, but they only go down to one-minute granularity, not 15 seconds. Sleeping inside the stored procedure until the next iteration is not a good idea either, because you will hold a transaction open for all that time.
Strict answer: it is not possible. Since you don't want an outside process, and PostgreSQL doesn't support jobs, you are out of luck.
If you reconsider using outside processes, then you'll most likely want something like cron, or better yet pgAgent.
On the other hand, what do you need to do that has to happen every 15 seconds? This seems like a problem with the design.
First, you'll spend the least amount of effort if you just go with a cron job.
However, if you were starting from scratch: you are trying to periodically replicate rows from your database. I think you are looking at a replication queue.
The PGQ project (used for Londiste replication, both from Skype's SkyTools) has a queue that you can use independently. When configuring it, you set a maximum event count, and a loop delay, before batched events are generated. You can get batches spaced by no more than 15 seconds that way. You now have to produce the events that will be batched, using a trigger that calls pgq.insert_event; and consume the queues. The consumer can call your PL/Java stored proc; you'll have to rewrite the procedure to send everything in the batch instead of scanning the base table for new events.
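The batching rule described above (emit a batch once a maximum event count is reached or a maximum delay has passed) can be sketched in a few lines. This is only an illustration of the idea: the `Batcher` class and its parameters are hypothetical and do not model PGQ's actual API or ticker. Times are plain numbers of seconds, so the caller supplies the clock.

```ruby
# Hypothetical batcher illustrating "flush on max count or max delay".
class Batcher
  def initialize(max_count:, max_delay:, now:)
    @max_count = max_count
    @max_delay = max_delay
    @events = []
    @last_flush = now
  end

  # Queues an event; returns a batch (array) if one is ready, else nil.
  def push(event, now:)
    @events << event
    tick(now: now)
  end

  # Called periodically; returns the pending events as a batch when either
  # limit has been reached, otherwise nil. Never returns an empty batch.
  def tick(now:)
    return nil if @events.empty?
    return nil unless @events.size >= @max_count || now - @last_flush >= @max_delay

    batch = @events
    @events = []
    @last_flush = now
    batch
  end
end
```

With max_delay set to 15, a consumer that calls tick regularly would hand each batch to the stored procedure no more than about 15 seconds after the previous one.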
As far as I know, PostgreSQL doesn't support scheduled tasks. You'll need to use a script with cron or at (depending on your operating system).
Sounds like you're doing some sort of replication? Every 15 seconds sounds like a lot of updates. Could you set up a trigger (or a number of triggers) instead of polling?
If you are using JMS, why not just have the task wait for input on the queue?
Per your comment to depesz, you have a PL/Java stored procedure that "flushes out database tables (updates) as java objects". Since you want it to run at 15-second intervals, it must be processing a batch of updates each time. Rather than processing a batch of updates in a stored procedure every 15 seconds, why not process them one at a time as they happen, via an after-update trigger, and eliminate the need for a timed interval? If you are aggregating data from multiple tables to build your objects, then add the triggers to your uppermost tables only.
In my case the problem was that the agent couldn't authenticate to the database. After I set all connections from localhost to trusted, the service started successfully and the job works fine.
For more information about the error, look in the Windows Event Viewer or its equivalent on a Unix-based system. See my config file C:\Program Files\PostgreSQL\10\data\pg_hba.conf