So, I've been given a codebase which uses daemons, daemons-rails and delayed_job to trigger a number of *_ctl files in /lib/daemons/, but here's the problem:
If two people do an action which starts the daemons doing some heavy lifting then whichever one clicks second will have to wait for the first to complete. Not good. We need to start multiple daemons on each queue.
Ideally what I want to do is read a config file like this:
default:
  queues: default ordering
  num_workers: 10
hours:
  queues: slow_admin_tasks
  num_workers: 2
minutes:
  queues: minute
  num_workers: 2
This would mean that 10 daemon processes are started to listen to the default and ordering queues, 2 for slow_admin_tasks, and so on.
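As a rough sketch of what reading such a config could look like, the snippet below expands it into one entry per daemon process. The `worker_plan` helper and the `name`/`queues` keys are made up for illustration; nothing here is part of daemons or daemons-rails, and actually starting the processes is left out.

```ruby
require 'yaml'

config_text = <<~YAML
  default:
    queues: default ordering
    num_workers: 10
  hours:
    queues: slow_admin_tasks
    num_workers: 2
  minutes:
    queues: minute
    num_workers: 2
YAML

# Expand the config into one entry per daemon process to start.
# (Hypothetical helper; not provided by daemons or daemons-rails.)
def worker_plan(yaml_text)
  YAML.safe_load(yaml_text).flat_map do |group, settings|
    queues = settings.fetch('queues').split
    Array.new(settings.fetch('num_workers')) do |index|
      { name: "#{group}_#{index}", queues: queues }
    end
  end
end

plan = worker_plan(config_text)
plan.each { |worker| puts "#{worker[:name]} -> #{worker[:queues].join(',')}" }
```

Each entry could then be handed to whatever code starts a single daemon, so the number of processes per queue group is driven purely by the config file.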
How would I go about defining multiple daemons like this? It looks like it might be in one of these places:
/lib/daemons/*_ctl
/lib/daemons/*.rb
/lib/daemons/daemons
I thought it might be a change to the daemons-rails rake tasks, but they just hit the daemons file.
Has anyone looked into scaling daemons-rails in this way? Where can I get more information?
I suggest you try Foreman.
Take a look at this discussion.
Foreman can help manage multiple processes that your Rails app depends upon when running in development. You can find a tutorial regarding Foreman on RailsCasts. There's a video tutorial + some source code examples.
I also suggest you take a look at Sidekiq.
Sidekiq allows you to move jobs into the background for asynchronous processing. It uses threads instead of forks so it is much more efficient with memory compared to Resque. You can find a tutorial here.
I also suggest you take a look at Resque.
Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later. You can find a tutorial here.
I've seen several solutions for doing this:
Redis / Resque
Delayed Job
Heroku Scheduler
Clockwork
Heroku scheduler won't work because it runs at random times and only once per 10 minutes at its most frequent.
Running on Cedar. Running multiple web dynos.
EDIT: Here's what I want to do:
Call an arbitrary method with params at an arbitrary point in the future. Something like Schedule.set(Notification.send_update_to_user(574), Time.now + 1.days)
I would choose Sidekiq, though there are several other options suitable for your example. Sidekiq lets you schedule jobs to run at arbitrary times in the future:
NotificationUpdateWorker.perform_at(Time.now + 1.day, 574)
The delayed extensions would let you write instead:
Notification.delay_for(1.day).send_update_to_user(574)
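To make the idea behind both calls concrete, here is a toy in-memory sketch. It is not how Sidekiq is implemented; `ToySchedule`, `ScheduledCall` and the stub `Notification` module are invented for illustration. The point it shows is that a scheduler has to store the receiver, method name and arguments and invoke them later, whereas `Schedule.set(Notification.send_update_to_user(574), ...)` as written in the question would call the method immediately.

```ruby
# One deferred call: what to invoke, on what, with which arguments, and when.
ScheduledCall = Struct.new(:run_at, :receiver, :method_name, :args)

# Toy scheduler (illustration only, not Sidekiq): keeps calls in memory.
class ToySchedule
  def initialize
    @calls = []
  end

  # Record the call instead of executing it now.
  def set(run_at, receiver, method_name, *args)
    @calls << ScheduledCall.new(run_at, receiver, method_name, args)
  end

  # Execute and remove every call whose time has come; return the results.
  def run_due(now = Time.now)
    due, @calls = @calls.partition { |call| call.run_at <= now }
    due.sort_by(&:run_at).map do |call|
      call.receiver.public_send(call.method_name, *call.args)
    end
  end

  def pending
    @calls.size
  end
end

# Stub standing in for the question's Notification class.
module Notification
  def self.send_update_to_user(user_id)
    "update sent to user #{user_id}"
  end
end

schedule = ToySchedule.new
start = Time.now
schedule.set(start + 86_400, Notification, :send_update_to_user, 574)
puts schedule.run_due(start).inspect          # nothing is due yet
puts schedule.run_due(start + 86_400).inspect # one day later the call runs
```

A real job system persists these records (in Redis or a database) so they survive restarts, which is what Sidekiq or Delayed Job add on top of this idea.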
Try with rufus/scheduler.
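require 'rubygems'
require 'rufus/scheduler'

# start_new is the rufus-scheduler 2.x API; 3.x uses Rufus::Scheduler.new instead
scheduler = Rufus::Scheduler.start_new

# Run the check once a minute
scheduler.every '1m' do
  Checkin.check_checkin
end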
After looking at different options, I chose Delayed Job, which is well documented on Heroku.
For jobs that need to run at a certain time each day or once an hour, Heroku scheduler works well, but sometimes it doesn't run.
I am developing a Rails 3 app which will run on the Heroku Cedar stack and needs to constantly check for new tweets under a certain hashtag. I have the logic to do this in place, but I would like to run this task in the background so as not to interfere with the main app's performance. I also need to write any new tweets found to a database, so I will need access to Active Record. I am looking for advice on what might be the best way to achieve this.
I do something similar. It doesn't matter for me if tweets are slightly out of date, so we use the scheduler to run a Rake task every 10 minutes that watches a hashtag. We can change the frequency of execution to hourly or daily should we feel 10 minutes is too frequent.
You could use the Heroku scheduler to regularly execute a Rake task (or some other script).
Alternatively, if you're checking for Tweets in response to a certain user action or some other event, you could use a task queue like Delayed Job.
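Whichever trigger you pick, the task itself mostly needs to work out which fetched tweets are new before saving them. A minimal sketch of that step, assuming tweets carry an increasing numeric id: the `Tweet` struct and `new_tweets` helper are hypothetical, and in a real app the fetched list would come from the Twitter client and the last seen id from the database.

```ruby
# Hypothetical stand-in for a fetched tweet.
Tweet = Struct.new(:id, :text)

# Keep only tweets newer than the last one already stored, oldest first.
def new_tweets(fetched, last_seen_id)
  fetched.select { |tweet| tweet.id > last_seen_id }.sort_by(&:id)
end

fetched = [Tweet.new(103, 'c'), Tweet.new(101, 'a'), Tweet.new(102, 'b')]
new_tweets(fetched, 101).each { |tweet| puts "#{tweet.id}: #{tweet.text}" }
```

Because the filter depends only on the last stored id, running the task more or less often changes how stale the data is, not whether tweets get duplicated.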
I want to know if there are any good solutions for autoscaling dynos AND workers on Heroku in a production environment (probably a different solution for each of those, as they are pretty unrelated). What are you/companies using, regarding this?
I found lots of options, but none of them seem really mature for a production environment.
There is Heroscale, which seems to introduce some latency as it does not run locally, and I have also heard of some downtime. There are modifications of delayed_job, which have not been updated for a long time and which have some issues with current versions of Bundler. There are also some alternatives related to Resque, some of which seem not to handle certain HTTP exceptions very well, resulting in the app crashing, and others which seem to need an always-running worker to schedule other workers and may also suffer from HTTP exception problems.
So, in the end, what is being used nowadays for autoscaling Heroku's dynos and workers in a production environment with Rails 3?
Thanks in advance.
We ran into this a while ago and I spent quite a bit of time on it, to my great frustration. I'll try to stick to the salient points. There are several Heroku autoscaling solutions that seem decent at first glance.
The example that has already been given, heroku-autoscaler, is actually for autoscaling dynos and is pretty much the only solution out there that claims to do this (and it certainly doesn't do it well). Most others only claim to autoscale workers for you, so let's focus on that first. The autoscalers you'll look at for workers depend on what you're actually using for your background workers, e.g. delayed_job or resque. Those are the most common background processing libraries, so the autoscalers will try to hook into one of them. You can use things like:
workless
hirefire
heroku-resque-auto-scale
etc
Some of these work on the Cedar stack; some might need a bit of tweaking. The problem with all of them is that it's like trying to pull yourself out of the swamp by your own hair. Let's take hirefire as an example (it's probably the best of the lot). It modifies delayed_job so that the workers themselves can look at the queue and spin up more workers if necessary. If there are no more jobs in the queue, the workers will all shut each other down. There are several problems:
If you want to put a job on the queue to be executed in the future as opposed to right now, you're out of luck. A worker starts up when jobs enter the queue, but since the job is to be executed in the future, the worker will shut down and will not start up again unless another job enters the queue (that's the only thing that prompts workers to start up).
You lose the ability to retry failed jobs. Retrying is possible by default in delayed_job, but it takes a little while before a failed job is retried (and progressively longer if it fails multiple times). The workers will shut down during this delay and there is nothing to prompt them to start up again (in essence this is the same issue as in the first scenario).
The thing that solves this problem is to have one worker running continuously, so that it can monitor the queue periodically and execute jobs when necessary or even spin up more workers. But if you do that, you're not saving any money (you have a worker running continuously 24/7 and have to pay for that), and saving money is the whole premise behind autoscalers on Heroku. In essence, if you only have occasional background processing to do, or you have background jobs that are likely to fail but succeed on retry, or you have background jobs that don't need to be executed instantly, there is no autoscaling library you can use that will work for you.
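To put some numbers on the retry case: delayed_job's documented default is to reschedule a failed job after 5 seconds + N ** 4, where N is the number of attempts. The sketch below uses that formula; the `PendingJob` struct and the `runnable_now?` check are hypothetical simplifications of what a queue-driven autoscaler looks at, not code from any of the gems above.

```ruby
# Hypothetical stand-in for a queued job; only its next run time matters here.
PendingJob = Struct.new(:run_at)

# delayed_job's documented default backoff: 5 seconds + attempts ** 4.
def retry_delay_seconds(attempts)
  5 + attempts**4
end

# What a queue-driven autoscaler effectively asks: is anything runnable now?
def runnable_now?(jobs, now)
  jobs.any? { |job| job.run_at <= now }
end

now = Time.at(0)
retried_job = PendingJob.new(now + retry_delay_seconds(3)) # failed 3 times

(1..5).each { |n| puts "attempt #{n}: retry in #{retry_delay_seconds(n)}s" }
puts runnable_now?([retried_job], now)                          # workers see nothing to do
puts runnable_now?([retried_job], now + retry_delay_seconds(3)) # due, if anyone is polling
```

The second check only turns true if something is still around to ask the question, which is exactly the role of the continuously running worker or an external monitor.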
Here is one alternative. The guy who wrote Hirefire later spun it off into a webapp (Hirefire app), the essence of which is to externally monitor your Heroku workers/dynos for you and spin them up or shut them down as necessary. This was free in beta but it now costs money: less than what you'd pay to run a worker 24/7, but still not insignificant if you only need a few background jobs once in a while. Either way, this is the only viable way to make sure your background job infrastructure does what you want (well, that and rolling your own solution, which means having a machine such as an EC2 instance where you can put some scripts that ping your Heroku app and spin up/shut down workers as needed, a non-trivial amount of effort).
Hirefire app does offer to autoscale your dynos for you as well, based on the latency of your Heroku request queue. However, I found that this didn't work well. Perhaps if you're close to the Amazon datacenter where your Heroku app actually lives (we weren't), you might have a different experience. For us, it unnecessarily spun up a whole bunch of dynos and would never spin them down, no matter how much I tweaked the settings. You can put it down to the fact that it was a beta, and it may have improved since then, but that's the experience I had.
Long story short, if you want to autoscale your workers, use Hirefire app, you'll be saving a lot less money than you thought, but it is still the cheapest option. If you want to autoscale dynos you're basically out of luck. This is just one of those limitations you live with for having the convenience of a platform like Heroku.
Heroku is offering a new add-on called AdeptScale which is now just out of Beta.
Here is the add-on page for AdeptScale
Here is the more detailed documentation for AdeptScale
Here is the form to sign up for Heroku's Beta Program
Hopefully this will be a robust solution for autoscaling Heroku dynos, as I'm still not happy with the current options.
Update (2/4/13): I signed up for Heroku's Beta program to try out this add-on, and it's worked really well for me. It occasionally scales up with traffic, but mostly sits at the minimum of 2 dynos that I've set. It's greatly reduced my bill and eliminated the worry that my app might be slow during peak usage times.
Update (3/6/13): Added link to Heroku's Sign up page for their beta program.
Update (4/14/13): Looks like auto-scaling is out of Beta. It's still working really well for me.
HireFire.io (The Service, not the Open Source Project) now allows you to use your New Relic metrics to auto-scale your web dynos. New Relic is a performance monitoring tool provided as an add-on through Heroku. They have a free tier and it's sufficient to use with HireFire.
You can auto-scale based on:
Response Time
This is the Response Time you find on the New Relic Dashboard. It's a combination of various factors including Request Queuing, Database Performance, App-Layer, Router, etc.
Apdex Score
This allows you to scale based on your New Relic Apdex Score, enabling you to scale based on user experience/satisfaction, which is determined by this score.
Aside from this, we have become language/framework agnostic. For worker dynos, all you have to do to get auto-scaling working is to set up a JSON end-point at a certain path in your app that returns a very simple JSON string containing the queue size. We provide convenient, but not required, macros for the Ruby language and some out-of-the-box support for Django apps, but as I said it works for any language/framework by manually setting up a JSON end-point, which is very easy. For web dynos, you can use the HireFire Metric Source with basically any language/framework, and the above-mentioned New Relic Metric Source for languages/frameworks that are supported by New Relic (common languages such as Ruby, Python, Java, etc.).
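For readers wondering what such an end-point might look like, here is a minimal sketch as a Rack-compatible lambda using only the standard library. The `/queue_info` path and the `name`/`quantity` field names are assumptions made for illustration; the actual path and payload are defined by HireFire's documentation, and the queue sizes would normally be read from your job backend rather than passed in.

```ruby
require 'json'

# Serialize queue sizes as a JSON array of objects.
# Field names are assumed for illustration, not taken from HireFire's docs.
def queue_info_body(sizes)
  JSON.generate(sizes.map { |name, quantity| { name: name, quantity: quantity } })
end

# Build a Rack-compatible app (status, headers, body) for a hypothetical path.
def queue_info_app(sizes)
  lambda do |env|
    if env['PATH_INFO'] == '/queue_info'
      [200, { 'content-type' => 'application/json' }, [queue_info_body(sizes)]]
    else
      [404, { 'content-type' => 'text/plain' }, ['not found']]
    end
  end
end

app = queue_info_app('worker' => 12, 'mailer' => 0)
status, _headers, body = app.call('PATH_INFO' => '/queue_info')
puts status
puts body.first
```

Because the contract is just an HTTP response with a small JSON document, the same thing can be written in any language or framework.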
Disclaimer: I built HireFire.
I'm trying to find a good way to autoscale dynos too.
https://github.com/ddollar/heroku-autoscale does this but has a disclaimer about its immaturity.
I've recently written a heroku auto scaling system called Heroku Vector:
https://github.com/wpeterson/heroku-vector
It allows you to scale multiple types of dynos based on different traffic sources. It currently supports New Relic throughput and the number of busy Sidekiq threads. As traffic goes up or down, it will scale the number of dynos up or down. It's a daemon process that can be run in its own dyno on Heroku or elsewhere.
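The core of this kind of throughput-based scaling can be sketched in a few lines. This is not Heroku Vector's actual code; `dynos_for` and its parameters are hypothetical, and the capacity per dyno is something you would have to measure for your own app.

```ruby
# Pick a dyno count for the observed throughput, bounded by min and max.
# (Hypothetical helper; rpm_per_dyno is an assumed, app-specific capacity.)
def dynos_for(requests_per_minute, rpm_per_dyno:, min_dynos:, max_dynos:)
  needed = (requests_per_minute.to_f / rpm_per_dyno).ceil
  needed.clamp(min_dynos, max_dynos)
end

[0, 900, 5000].each do |rpm|
  count = dynos_for(rpm, rpm_per_dyno: 200, min_dynos: 2, max_dynos: 10)
  puts "#{rpm} rpm -> #{count} dynos"
end
```

The minimum keeps the app responsive when traffic is idle, and the maximum caps the bill if the metric spikes or misreports.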
I have an app and I am currently using delayed_job. I was wondering if there are any recommended gems for scheduling repetitive tasks.
I want to schedule tasks that run at a certain frequency to clean the database, send emails, or run other methods.
I might want to run some tasks every day or every hour.
Are there any good ones out there that are fairly easy to set up and configure and that do not use cron?
You can convert that repetitive work into Rake tasks and call them via cron.
For setting up cron on the server, whenever is a nice gem.
Check it out here: http://railscasts.com/episodes/164-cron-in-ruby
There are some very good gems for scheduling repetitive tasks.
I tried Delayed Job; it is simple and easy to use. You can watch the RailsCast for details on using it.
You can try rufus-scheduler (https://github.com/jmettraux/rufus-scheduler). It is an app-based scheduler (unless you use its cron feature).