Gems/Services for autoscaling Heroku's dynos and workers - ruby-on-rails-3

I want to know if there are any good solutions for autoscaling dynos AND workers on Heroku in a production environment (probably a different solution for each, as they are pretty unrelated). What are you or your companies using for this?
I found lots of options, but none of them seem really mature for a production environment.
There is Heroscale, which seems to introduce some latency, as it does not run locally, and I have also heard of some downtime. There are modifications of delayed_job which have not been updated for a long time and have issues with current Bundler versions. There are also some alternatives based on Resque which don't seem to handle certain HTTP exceptions well (resulting in app crashes), and others which seem to need an always-running worker to schedule the other workers and may suffer from the same HTTP exception problems.
So, in the end: what is being used nowadays for autoscaling Heroku's dynos and workers in a production environment with Rails 3?
Thanks in advance.

We ran into this a while ago and I spent quite a bit of time on it, to my great frustration. I'll try to stick to the salient points. There are several Heroku autoscaling solutions that seem decent at first glance.
The example that has already been given, heroku-autoscale, is actually for autoscaling dynos and is pretty much the only solution out there that claims to do this (and it certainly doesn't do it well). Most others will only claim to autoscale workers for you. So, let's focus on that first. The autoscalers you'll look at for workers depend on what you're actually using for your background workers, e.g. delayed_job or resque. Those are the most common background processing libs, so the autoscalers will try to hook into one of them. You can use things like:
workless
hirefire
heroku-resque-auto-scale
etc
Some of these work on the Cedar stack; some might need a bit of tweaking. The problem with all of them is that it's like trying to pull yourself out of the swamp by your own hair. Let's take hirefire as an example (it's probably the best one of the lot). It modifies delayed_job so that the workers themselves can look at the queue and spin up more workers if necessary; if there are no more jobs in the queue, the workers will all shut each other down (a sketch of this pattern follows the list of problems below). There are several problems:
if you want to put a job on the queue to be executed in the future, as opposed to right now, you're out of luck. A worker starts up when a job enters the queue, but since the job is to be executed in the future, the worker will shut down again and will not start up unless another job enters the queue (that's the only thing that prompts workers to start up)
you lose the ability to retry failed jobs. This is possible by default in delayed_job, but it takes a little while before a failed job is retried (and progressively longer if it fails multiple times), and the workers will shut down during this delay with nothing to prompt them to start up again (in essence, this is the same issue as in the first scenario)
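To make the mechanism concrete, here is the sketch promised above: a hypothetical, minimal version of the pattern these gems use. The module name, thresholds, and wiring are mine rather than any particular gem's; it assumes the delayed_job 3.x lifecycle hooks, an ActiveRecord backend, and the heroku-api gem:

    require 'heroku-api'

    module SelfScaler
      APP = ENV['HEROKU_APP']
      API = Heroku::API.new(:api_key => ENV['HEROKU_API_KEY'])

      # Look at the queue and scale the 'worker' process type to match.
      def self.adjust_workers
        pending = Delayed::Job.where(:failed_at => nil).count
        # One worker per 25 pending jobs, capped at 5, zero when idle.
        desired = pending.zero? ? 0 : [(pending / 25.0).ceil, 5].min
        API.post_ps_scale(APP, 'worker', desired)
      end
    end

    # Re-check after every job so the workers shut each other down
    # once the queue drains.
    Delayed::Worker.lifecycle.after(:perform) { |*| SelfScaler.adjust_workers }

Notice that nothing in this hook ever fires for a job scheduled in the future, or for a failed job sitting out its retry delay; that is exactly where the two problems above come from.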
The thing that solves this problem is to have one worker running continuously; it can then monitor the queue periodically, execute jobs when necessary, and even spin up more workers. But if you do that, you're not saving any money (you have a worker running 24/7 and have to pay for it), and saving money is the whole premise behind autoscalers on Heroku. In essence, if you only have occasional background processing to do, or you have background jobs that are likely to fail but succeed on retry, or you have background jobs that don't need to be executed instantly, there is no autoscaling library that will work for you.
Here is one alternative. The guy who wrote hirefire later spun it off into a web app (HireFire), the essence of which is to externally monitor your Heroku workers/dynos and spin workers and dynos up or down as necessary. This was free in beta, but it now costs money: less than what you'd pay to run a worker 24/7, but still not insignificant if you only need a few background jobs once in a while. Either way, this is the only viable way to make sure your background job infrastructure does what you want (well, that and rolling your own solution, which means having a machine like an EC2 instance where you can put some scripts that ping your Heroku app and spin workers up or down as needed - a non-trivial amount of effort).
Now, HireFire does offer to autoscale your dynos as well; it does this by hooking into the latency of your Heroku request queue. However, I found that this didn't work well. Perhaps if you're close to the Amazon datacenter where your Heroku app actually lives (we weren't), you might have a different experience. For us, it unnecessarily spun up a whole bunch of dynos and would never spin them down, no matter how much I tweaked the settings. You can put it down to the fact that it was in beta; it may have improved since then, but that's the experience I had.
Long story short: if you want to autoscale your workers, use HireFire; you'll be saving a lot less money than you thought, but it is still the cheapest option. If you want to autoscale dynos, you're basically out of luck. This is just one of those limitations you live with in exchange for the convenience of a platform like Heroku.

Heroku is offering a new add-on called AdeptScale, which is now just out of beta.
Here is the add-on page for AdeptScale
Here is the more detailed documentation for AdeptScale
Here is the form to sign up for Heroku's Beta Program
Hopefully this will be a robust solution for autoscaling Heroku dynos, as I'm still not happy with the current options.
Update (2/4/13): I signed up for Heroku's beta program to try out this add-on, and it's worked really well for me. It occasionally scales up with traffic, but mostly sits at the minimum number of dynos I've set (2). It has greatly reduced my bill and eliminated the worry that the app might be slow during peak usage times.
Update (3/6/13): Added link to Heroku's Sign up page for their beta program.
Update (4/14/13): Looks like auto-scaling is out of Beta. It's still working really well for me.

HireFire.io (The Service, not the Open Source Project) now allows you to use your New Relic metrics to auto-scale your web dynos. New Relic is a performance monitoring tool provided as an add-on through Heroku. They have a free tier and it's sufficient to use with HireFire.
You can auto-scale based on:
Response Time
This is the response time you find on the New Relic dashboard. It's a combination of various factors, including request queuing, database performance, the app layer, the router, etc.
Apdex Score
This allows you to scale based on your New Relic Apdex Score - that is, based on user experience/satisfaction as measured by that score.
Aside from this, we have become language/framework agnostic. For worker dynos, all you have to do to get auto-scaling working is to set up a JSON end-point at a certain path in your app that returns a very simple JSON string containing the queue size (we provide convenient, but not required, macros for Ruby and some out-of-the-box support for Django apps, but as I said, it works for any language/framework if you set up the JSON end-point manually - it's very easy; see the sketch below). For web dynos, you can use the HireFire Metric Source with basically any language/framework, and the above-mentioned New Relic Metric Source for languages/frameworks supported by New Relic (common languages such as Ruby, Python, Java, etc.).
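For illustration, here is roughly what such an end-point looks like in a Rails 3 app using delayed_job. The path and the exact JSON schema are placeholders; HireFire gives you the real ones (including a per-app token) when you configure the metric source:

    # config/routes.rb -- illustrative path only
    get '/hirefire/:token/info' => 'hirefire#info'

    # app/controllers/hirefire_controller.rb
    class HirefireController < ApplicationController
      def info
        # HireFire polls this and scales worker dynos to match the queue.
        pending = Delayed::Job.where(:failed_at => nil).count
        render :json => [{ :name => 'worker', :quantity => pending }]
      end
    end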
Disclaimer: I built HireFire.

I'm trying to find a good way to autoscale dynos too.
https://github.com/ddollar/heroku-autoscale does this but has a disclaimer about its immaturity.
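For reference, it hooks in as Rack middleware. A minimal sketch of its use follows; the option names are as I recall them from the project's README, so verify them against the repo before relying on this:

    # config.ru -- scales web dynos based on Heroku's request queue wait
    require 'heroku/autoscale'

    use Heroku::Autoscale,
      :username        => ENV['HEROKU_USERNAME'],
      :password        => ENV['HEROKU_PASSWORD'],
      :app_name        => 'myapp',  # hypothetical app name
      :min_dynos       => 1,
      :max_dynos       => 5,
      :queue_wait_low  => 0,        # scale down at this queue wait (ms)
      :queue_wait_high => 100,      # scale up at this queue wait (ms)
      :min_frequency   => 10        # seconds between scale operations

    run MyApp::Application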

I've recently written a Heroku autoscaling system called Heroku Vector:
https://github.com/wpeterson/heroku-vector
It allows you to scale multiple types of dynos based on different traffic sources. It currently supports NewRelic throughput and the number of busy Sidekiq threads. As traffic goes up or down, it will scale the number of dynos up or down. It's a daemon process that can be run in its own dyno on Heroku or elsewhere.

Using IronWorker while reusing my existing MVC code

My website is hosted on AWS Elastic Beanstalk (PHP). I use Yii Framework as an MVC.
A while ago I wanted to run a SQL query everyday. I looked up how to run crons on Beanstalk and it seemed complicated to merge the concepts of Cloud and Cron. I ran into Iron Worker (http://www.iron.io/worker), and managed to create a worker that is currently doing its job fine.
Today I want to run a more complex cron job (look for notifications in my database, decide whether to send an email, build an email template, and send the email via AWS SES).
From what I understand, worker files are supposed to be self-contained items, with everything they need to work.
However, I have invested a lot of time and effort in building my MVC. I have complex models, verifications, an email templating engine, etc...
It seems very difficult to reuse the work I've done when creating an Iron Worker. Even if I managed to port all of my code to a worker (which seems like a great deal of work), it means that any time I change my main code, I need to make sure the worker also gets those changes. In effect, I would have a "branch" of my code - even more so if I create more workers in the future.
What is the correct approach?
Short-term, you could likely just use the scheduling capabilities in IronWorker and have the worker hit an endpoint in your application. The endpoint then triggers the operations to run within your app environment (see the sketch below).
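The worker itself can then be trivially small, since all the real logic stays inside your Yii app. IronWorker supports several languages; here is the idea sketched in Ruby with a hypothetical URL and shared secret (a PHP worker would look much the same):

    # cron_ping.rb -- scheduled in IronWorker; its only job is to call
    # back into the app, which does the real work in its own environment.
    require 'net/http'
    require 'uri'

    uri = URI.parse('https://example.com/cron/notifications')   # placeholder
    res = Net::HTTP.post_form(uri, 'secret' => 'shared-secret') # placeholder
    abort "cron endpoint returned #{res.code}" unless res.code == '200'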
Longer-term, we do suggest you look at a more service-oriented approach, whereby you break your application up to be more loosely coupled and distributed. Here's a post on the subject. The advantages are many, especially around scalability and development agility.
https://blog.heroku.com/archives/2013/12/3/end_monolithic_app
You can also take a look at this Yii extension.
http://www.yiiframework.com/extension/yiiron/
We certainly don't want you to rewrite your app unnecessarily, but there are likely areas where you can decouple. We suggest creating a worker directory and making an effort to write the workers to be self-contained; that way, you can run them in a different environment and just pass payloads to them. (Push queues can also be used to push work to these workers.) Once you get used to distributed async processing, it's a pretty easy process to manage.
(Note: I work at Iron.io)

BIND 9.7: when several named processes are running, how do I judge which process is providing the service?

For example, if I execute "sudo named" several times, there are several named processes running, and when I use "pidof named" I get several pids.
I want to calculate the CPU usage rate of the BIND process, so I need to read some parameters from "/proc/pid/stat"; for that I need the pid of the named process which is really providing the domain resolution service.
What's the difference between the named process which is providing the service and the others? Could you give me a detailed explanation?
Thanks very much.
(It's my first time using Stack Overflow and asking questions in English, so please forgive any mistakes.)
There should be just one named running; the scripts that manage the service ensure that. You shouldn't start it like that - use whatever your distribution uses to start it, probably something along the lines of service bind start (that is probably a RedHat-ism) or /etc/rc.d/bind start (for bog-standard SysVinit).
I was responsible for DNS for quite some time here. Some tips:
DNS is a very critical service; configure and monitor it with extreme care. Do read up on setting it up and managing it, and don't go ahead until you are absolutely clear.
Get somebody as a backup for the times when you aren't available, and make sure they understand the previous point.
DNS isn't CPU intensive (OK, with signed domains and that newfangled stuff this might have changed); it is memory intensive (and network intensive, or at least sensitive to delays). Our main DNS server ran for months at a time and clocked up some half hour of CPU time during that kind of period, IIRC.
Separate your master server (responsible for your domain(s)) from the servers queried by clients (caching servers). There have been vulnerabilities where malformed questions, or "answers" to questions that hadn't been asked, soiled the database.
The master server will hold all the domain information in RAM, so make sure you have enough of it.
Make sure all machines under your jurisdiction use the same caching server. Using more than one makes no sense; it destroys the idea of a cache.
The caching servers collect immense amounts of data over time. This data is rarely performance critical, so configure them with plenty of swap space to accommodate overflows.
BIND spawns as many named processes (worker threads) as you have CPUs:
man named:
-n #cpus
Create #cpus worker threads to take advantage of multiple CPUs. If not specified, named will try to determine the number of CPUs present and create one thread per CPU. If it is unable to determine the number of CPUs, a single worker thread will be created.
External source:
https://unix.stackexchange.com/questions/140986/multiple-named-processes-for-bind9-in-debian
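Given that behaviour, the practical answer to the original question is to sum the CPU time over every pid rather than pick one. A small sketch in Ruby: utime and stime are fields 14 and 15 of /proc/<pid>/stat per proc(5), and splitting on whitespace is safe here only because the comm field ("named") contains no spaces:

    # Total CPU seconds consumed by all named processes/threads.
    ticks_per_sec = `getconf CLK_TCK`.to_i   # usually 100
    pids = `pidof named`.split

    total_ticks = pids.map { |pid|
      f = File.read("/proc/#{pid}/stat").split
      f[13].to_i + f[14].to_i                # utime + stime (0-indexed)
    }.inject(0, :+)

    puts "named CPU time: #{total_ticks.to_f / ticks_per_sec}s"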

On NServiceBus Profiles

I've been trying to find ways to improve our NServiceBus code's performance. I searched and stumbled upon the profiles that you can set when running/installing the NServiceBus host.
Currently we're running the NServiceBus host as-is, and I read that by default we are using the "Lite" profile. I've also learnt from this link:
http://docs.particular.net/nservicebus/hosting/nservicebus-host/profiles
that there are Integration and Production profiles. The documentation does not say much - has anyone tried the Production profile and noticed an improvement in NServiceBus performance, specifically in the speed of consuming messages from the queues?
One major difference between the NSB profiles is how they handle storage of subscriptions.
The Lite, Integration and Production profiles control how conservative NSB is about reliability (a profile is selected by passing its name to the host, e.g. NServiceBus.Host.exe NServiceBus.Production). For example, the Lite profile uses in-memory subscription storage for all pub/sub registrations. This is a concern because, in order to register a subscriber under the Lite profile, the publisher has to already be running (so the publisher can store the subscriber list in memory). This means that if the publisher crashes for any reason (or is taken offline), all the subscription information is lost (until each subscriber is restarted).
So, the Lite profile is good if you are running on a developer machine and want to quickly test how your services interact. However, it is just not suitable for other environments.
The Integration profile stores subscription information in a local queue. This can be good for simple environments (like QA, etc.). However, in a highly distributed environment, holding the subscription information in a database is best; hence the Production profile.
So, to answer your question, I don't think that by changing profiles you will see a performance gain. If anything, changing from the lite profile to one of the other profiles is likely to decrease performance (because you incur the cost of accessing queue or database storage).
Unless you have tuned the logging yourself, we've seen large improvements simply from the reduced logging in the non-Lite profiles. The performance of reading off the queues is the same all around. Since the queues are local, you won't gain much from the transport. I would look at tuning your handlers and the underlying infrastructure: check out tuning MSMQ, look at the disks you are using, etc. Another spot to look at is how distributed transactions are behaving, assuming you are using a remote database that requires them.
Another option to increase throughput is to increase the number of threads consuming the queue. This requires a license. If a license is not an option, you can run multiple instances of a single-threaded endpoint, but this requires you to shard your work based on message type or something else.
Continuing up the scale, you can then get into using the Distributor to load-balance work. Again, this requires a license, but you'll be able to add more nodes as necessary. All of the opportunities above also apply to this topology.

Periodic tasks inside WCF service hosted in IIS

We would like to have some periodic actions executed by our WCF service hosted in IIS. What is the best way to do this? Creating a timer doesn't look like a good solution. Creating a Windows service that would act as some kind of heartbeat looks like a possible solution, but it still doesn't smell right. What approach would be a good solution to this problem?
That depends on what your action is trying to do. If it's a database-related clean-up action, e.g. deleting orphaned shopping carts, you could schedule a job for this in your database of choice, like SQL Server's very reliable job engine. A Windows service would be a great candidate if it's an OS-based action, like periodic clean-up/deletion of files. Since an IIS/WCF service is usually designed to handle external requests, I don't think it'd be wrong to use the service layers of the OS or DB for your task.
I used to run into tasks like this in my PHP days, when I would want to schedule an email to be sent at a given time. After many months of tinkering (mainly trying to handle calls to a page that may never come in), I eventually came to the conclusion that an essentially stateless bit of code is not the place to do it, and scheduled a cron job to fire each night.
I'd definitely recommend going down the route of an externally triggered job (either in SQL, a windows service, etc) and handling your operations from there. The pain, as I know to my cost, is just not worth the return.
I have struggled much with this and have, in some cases where clean-up is required, just run an asynchronous (background) task on the back of a common function to do periodic clean-up, i.e. in GetCommonList(), I check a lastrun value in settings/appsettings and then kick the task off once a day, every 5 minutes, or whatever. That way, if the app goes to greener pastures (which does happen), I don't need to worry about any lingering tasks running somewhere. It doesn't work in all cases, but security etc. is also automatically taken care of, whereas with services you may still have issues with that. Just my 2c.

Deploying on EC2

This question is for anyone who has actually used Amazon EC2. I'm looking into what it would take to deploy a server there.
It looks like I can start in VirtualBox, setup my server and then export the image using the provided ec2-tools.
What gets tricky is that if I actually want to make configuration changes to my running server, they will not be persistent.
I have some PHP code that I need to be able to deploy (and redeploy) to the system, so I was thinking that EBS would be a good choice there.
I have a massive amount of data that I need stored, but it just so happens that latency is not an issue, so I was thinking something like s3fs might work.
So my question is... What would you do? What does your configuration look like? What have been particular challenges that perhaps you didn't see coming?
We have deployed a large-scale commercial app in the AWS environment.
There are three basic approaches to keeping your changes under control once the server is running, all of which we use in different situations:
Keep the changes in source control. Have a script that is part of your original image that can pull down the latest and greatest. You can pull down PHP code, Apache settings, whatever you need. If you need to restart your instance from your AMI (Amazon Machine Image), just run your script to get the latest code and configuration, and you're good to go (there is a sketch of such a script after this list).
Use EBS (Elastic Block Storage). EBS is like a big external hard drive that you can attach to your instance. Even if your instance goes away, EBS survives. If you later need two (or more) identical instances, you can give each one of them access to what you save in EBS. See https://stackoverflow.com/a/3630707/141172
Burn a new AMI after each change. There's a tool to create a new AMI from a running instance. If EBS is like having an external hard drive, creating a new AMI is like having a DVD-R. You can save the current state of your machine to it. Next time you have to start a new instance, base it on that new AMI. Good to go.
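Here is the sketch promised for the first approach: a boot-time script baked into the AMI that pulls the latest code and configuration before the instance starts serving. It is written in Ruby for consistency with the other sketches on this page, and the repository URL and paths are hypothetical placeholders:

    #!/usr/bin/env ruby
    # Run at boot (e.g. from rc.local): fetch the latest code and
    # config from source control, then reload Apache.
    REPO    = 'http://svn.example.com/myapp/trunk'  # placeholder
    DOCROOT = '/var/www/myapp'                      # placeholder

    system("svn checkout #{REPO} #{DOCROOT}") or abort('checkout failed')
    system("cp #{DOCROOT}/conf/httpd.conf /etc/httpd/conf/httpd.conf")
    system('service httpd reload')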
I recommend storing your PHP code in a repository such as SVN, and writing a script that checks the latest code out of the repository and redeploys it when you want to upgrade. You could also have this script run on instance startup so that you get the latest code whenever you spin up a new instance; saves on having to create a new AMI every time.
The main challenge that I didn't see coming with EC2 is instance startup time - especially with Windows. Linux instances take 5 to 10 minutes to launch, but I've seen Windows instances take up to 40 minutes; this can be an issue if you want to do dynamic load balancing and start up new instances when your load increases.
I'd suggest the best bet is to simply try it. The charges to run a small instance are not high and data transfer rates are very low - I have moved quite a few GB and my data fees are still less than a dollar(!) in my first month. I suspect you will end up paying mostly for system time rather than data.
I haven't deployed yet but have run up an instance, migrated it from Ubuntu 8.04 to 8.10, tried different port security settings, seen what sort of access attempts unknown people have tried (mostly looking for phpadmin), run some testing against it and generally experimented with the config and restart of the components I'm deploying. It has been a good prelude to my end deployment. I won't be starting with a big DB so will be initially sticking with the standard EC2 instance space.
The only negative I have heard is that some spammers have made some of the IP ranges subject to spam-blocking - but I have not yet confirmed that.
As for your VirtualBox approach, I suggest you take it up after you are more familiar with the EC2 infrastructure. I suggest that you go to EC2, open an account, and follow Amazon's EC2 getting-started guide. This guide will give you enough of an overview of everything (EBS, IPs, connections, and so on) to get you started. We are currently using EC2 for production, and the way we started was as I am explaining here.
I hope you become a cloud expert soon.
Per timbo's concern, I was able to nab an IP that, so far, hasn't legitimately shown up on any spam lists. You will have a few hiccups, since many blacklists are technically whitelists and will include every IP until notified that a mail server is running on that IP. Removal is really easy: most of them have automated removal-request forms, and every one that doesn't has been very cooperative in removing me from their lists. Just be professional, and ask if they can give a time and reason for the block and what steps you should take to remove your IP. None of the services I emailed asked me to jump through any hoops; within two or three business days they all informed me my IP had been removed.
Still, if you plan on running a mail server, I would recommend reserving IPs now. They're 1 cent for every hour they are not bound to an instance, so it works out to about $7 a month. I went ahead and reserved an extra one, as I plan on starting up another instance soon.
I have deployed some simple stuff to EC2 Win2k3 instances. Here's my advice:
Find a tutorial, sign up for the service, and just spend an afternoon setting up your first server. It's pretty darned easy; there will be obstacles to overcome, but it's not too tough.
When I was fooling with EC2 I think I spent like $2.00 setting up a server and playing with it for a while.
Some of your data will be persistent, but you can connect S3 to EC2 as well.
Just go for it!
With regards to the concerns about blacklisting of mail servers, you can also use Amazon's Simple Email Service (SES), which obviates the need to run the mail server on the EC2 instances.
I had trouble with this as well, but posted a note here in their forums - https://forums.aws.amazon.com/thread.jspa?threadID=80158&tstart=0