Amazon EC2 Load Testing

I am designing an AWS deployment solution for a new dynamic website project. I have acquired an EC2 instance for testing the environment. I need some help with how to do load testing on an EC2 instance to determine how many HTTP requests it can safely handle... P.S. I am new to the AWS platform.
Thanks...

RedLine offers an EC2 Load Testing solution that will automate the distribution of load tests on your own EC2 instances.

Late to the party but could help someone in the future:
A possible tool for load tests, stress tests, whatever you may call them, is Apache JMeter, but there are plenty of alternatives.
A simple starting setup, further explained in this excellent tutorial on DigitalOcean, can consist of a Thread Group containing an HTTP Request Sampler and a View Results in Table Listener. The Thread Group is used to configure the number of "clients" you want to simulate. The HTTP Request Sampler is used to configure the server's properties (hostname, path, etc.). The Table Listener outputs a handy CSV file that can be used to calculate means, compare different types of EC2 instances, and so on.
JMeter is a beautiful program with a GUI that can be run on your local workstation, producing an XML file that can be executed on another EC2 instance, for instance. You can even do simple manual edits to the XML file on your server afterward, if necessary.
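If you save the listener's output as CSV, a few lines of scripting will turn it into the summary numbers you care about. A minimal sketch in Python, assuming a file named results.csv with JMeter's default CSV columns (where "elapsed" is the response time in milliseconds and "success" is "true"/"false"):

    import csv
    import statistics

    # Summarise a JMeter CSV results file (default column names assumed).
    with open("results.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    elapsed = sorted(int(r["elapsed"]) for r in rows)
    errors = sum(1 for r in rows if r["success"].lower() != "true")

    print("samples:", len(elapsed))
    print("errors:", errors)
    print("mean ms:", round(statistics.mean(elapsed), 1))
    print("median ms:", statistics.median(elapsed))
    print("95th percentile ms:", elapsed[max(0, int(0.95 * len(elapsed)) - 1)])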
Take a look at Amazon's testing policy to make sure you're not doing anything illegal.

A couple of quick points:
Set the environment up exactly like it's supposed to run. If there's a database involved, you'll want to involve that in the testing too. Synthetic CPU-based benchmarks won't help you much, since normally very little of the time spent replying to HTTP requests is actual CPU time.
A recommendation is to use a service for the benchmarking. Setting up load testing is not without its complexities, and unless you consider benchmarking your core business, you're probably better off using something like Neustar to load and measure your site (there are many such services; that one isn't necessarily the best fit for you, it's just one pulled from memory).
Of course you can set up a load test yourself, but getting that done right is not something that can be described in a few sentences. There are very well-paid people who do only that for a living :)
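For what it's worth, the basic shape of a do-it-yourself test is easy to sketch; everything that makes load testing hard (realistic traffic mix, ramp-up, coordinated distributed clients, making sure you aren't just measuring your own client machine) is missing from it. A toy sketch in Python, standard library only, against a hypothetical target URL:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://your-ec2-host.example.com/"  # hypothetical target
    REQUESTS = 200                             # total requests to send
    CONCURRENCY = 20                           # simultaneous "clients"

    def fetch(_):
        start = time.time()
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                resp.read()
                ok = 200 <= resp.status < 300
        except Exception:
            ok = False
        return ok, time.time() - start

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(fetch, range(REQUESTS)))

    latencies = sorted(t for _, t in results)
    errors = sum(1 for ok, _ in results if not ok)
    print("errors:", errors, "/", REQUESTS)
    print("mean latency (s):", round(sum(latencies) / len(latencies), 3))
    print("95th pct (s):", round(latencies[max(0, int(0.95 * len(latencies)) - 1)], 3))

Run it from a separate machine (another small EC2 instance works well), increase CONCURRENCY step by step, and watch where latency and the error rate start to climb.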

There is good experience with the curl-loader (aka Davilka) tool, including in an Amazon EC2 environment:
http://curl-loader.sourceforge.net


Overriding httpd/apache upstream proxy httpcode with another

I have some React code (written by someone else) that needs to be served. The preferred method is via a Google Storage Bucket, fronted by their Cloud CDN, and this works. However, due to some quirks in the code, there is a requirement to override 404s with 200s and serve content from the homepage instead (i.e. if there is a 404, don't serve a 404; serve the content of the homepage and return a 200 instead).
(If anyone is interested, this override currently is implemented in CloudFront on AWS. Google CDN does not provide this functionality yet)
So, if the code is served at "www.mysite.com/app/" and someone hits "www.mysite.com/app/not-here" (which would return a 404), what should happen is that the response should NOT be 404, but a 200 with the content being served from index.html instead.
I was able to get this working by bundling all the code inside a docker container and then using the solution here. However, this setup means if we have a code change, all the running containers need to be restarted, and the customer expects zero downtime, hence the bucket solution.
So I now need to do the same thing but with the files being proxied in (with the upstream being the CDN).
I cannot use the original solution since the files are no longer local, and httpd can't check for existence of something that is not local.
I've tried things like ProxyErrorOverride and ErrorDocument, and managed to get it to redirect, but that is not what is needed.
Does anyone know how/if this can be done?
If the question is how to catch, with httpd/apache, the 404 error returned by Cloud Storage when a file is missing: I don't know.
However, I think that isn't the best solution. Serving files directly from Cloud Storage is convenient, but it isn't a production-grade setup.
Imagine you deploy several broken files in succession; how do you roll back to a stable state?
The best approach is to package each code release as an atomic unit, a container for instance. Each version lives in a different container, so performing a rollback is easier and more consistent.
Now, your "container restart" issue. I don't know which platform you run your containers on. If you run them on a Compute Engine VM, that is probably the worst option. Today, there are container orchestration systems that let you deploy containers, scale them up and down, and perform progressive rollouts, replacing the existing running containers with a newer version without downtime.
Cloud Run is a wonderful serverless solution for that; you also have Kubernetes (GKE on Google Cloud), which you can use with Knative for a better developer experience.

S3: Service that replaces access to the local file system with S3

I have an application that heavily uses the local file system. We need to port the application to use S3. What services are out there that will automate access to S3 without having to change the source code of the application?
These services somehow mask the S3 FS as a local FS.
Thanks.
See FuseOverAmazon (or s3fs), but keep in mind that S3 is an eventually consistent data store, and your app should be architected to take that into account. It's also important to note that mounting an S3 bucket as a file system has very poor performance.
Take a look at RioFS. Our project is an alternative to the "s3fs" project; its main advantages compared to "s3fs" are simplicity, speed of operations, and bug-free code. The project is currently in a "beta" state, but it's been running on several high-load fileservers for quite some time.
We are looking for more people to join our project and help with testing. From our side, we offer quick bug fixes and will listen to your requests for new features.
Hope it helps!

How to stub Active Resource?

My Active Resource connects to some stupid external service that takes a while to respond for whatever reason. This is getting a little too annoying. I would like to stub Active Resource during development to speed up my development time.
Is this a good thing to do? I think it is. If you think otherwise, please explain.
And is there a mechanism to stub it out based on a switch in environment configuration file, possibly any gem/plugin that you have used for this purpose?
What and how do you do all these in your experience?
I recommend using FakeWeb. I used this on a project recently and it allowed me to register a number of external urls with a predefined response. In your test setup you could do:
FakeWeb.register_uri(:get, %r|users.xml|, :body => File.read("spec/factories/xml/users.xml"))
Now, whenever Active Resource requests anyhost.com/users.xml (in the test environment), you'll instead immediately get the contents of the file you referred to. I like this approach because when you're testing a model, you don't really want to be testing the external service too. I'd leave that level of testing to an integration test.
This won't affect development or production environments, so you can use your stupid external service as usual.

Stop an in-progress query-string-authorized download on Amazon S3?

With Amazon S3, can I stop a query-string-authorized download that is in progress?
Are there other file download services that provide such a feature?
I'm not aware of a built in way to do this. If I understand your goal, you want to potentially stop an HTTP response mid-stream based on some custom rules you have. Is that right?
If so, perhaps you could write a very thin proxy to S3 that encapsulates this logic. If you ran the proxy on EC2 you wouldn't incur any additional bandwidth fees.
The downside is that you would have to manage scaling the proxy (i.e. add more EC2 nodes based on traffic), so depending on your scaling requirements, this could require a bit of work. But the proxy script itself would probably be fairly trivial. Something like:
    make a streaming HTTP request to S3 for the object
    for each x-byte chunk in the response from S3:
        check the auth condition; continue if valid, break if not
        send the chunk to the caller
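A rough sketch of such a proxy in Python, standard library only; the bucket URL is a placeholder and the auth check is a stub for whatever custom rule you want to enforce (error handling is omitted):

    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    S3_BASE = "https://my-bucket.s3.amazonaws.com"  # hypothetical bucket
    CHUNK_SIZE = 64 * 1024

    def still_authorized(path):
        # Stub: re-evaluate your custom rule here (e.g. check a revocation
        # list or an expiry stored elsewhere).
        return True

    class S3Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            upstream = urllib.request.urlopen(S3_BASE + self.path)
            self.send_response(upstream.status)
            self.send_header("Content-Type",
                             upstream.headers.get("Content-Type",
                                                  "application/octet-stream"))
            self.end_headers()
            while True:
                chunk = upstream.read(CHUNK_SIZE)
                if not chunk:
                    break
                if not still_authorized(self.path):
                    break  # stop the download mid-stream
                self.wfile.write(chunk)

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), S3Proxy).serve_forever()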
I'm not aware of anyone that allows this. In general, the authentication is only checked once, when you begin downloading, but not thereafter.
Can you describe what you're trying to do more broadly?

Are there alternatives to CGI (and do I really need one)?

I am designing an application that is going to consist of 3-4 services that run as separate processes and are linked by a suitable IPC. The system is going to have a web interface and I want to use whatever webserver is there.
The web interface should be accessible under some URL, while other URLs on the same webserver do totally different things. I'm planning to use the path below that URL to specify what the web interface should do. It has facilities both for other applications to use over the net and for humans to interact with in a browser.
Off the cuff, I'd work as follows (a rough sketch of such a script follows the list):
make the webserver fire up a CGI process for every request it receives (like SetHandler in Apache)
let the CGI connect to the IPC
let it get whatever it needs from the backend services
let the CGI return HTML / XML and whatever HTTP Status based on the services' answers
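To make that flow concrete, a minimal CGI script along those lines could look like the sketch below (Python); the Unix socket path and the line-based request/response protocol of the backend are made up for the example:

    #!/usr/bin/env python3
    # One process per request, as in the steps above.
    import os
    import socket
    import sys

    BACKEND_SOCKET = "/tmp/myapp-backend.sock"  # hypothetical IPC endpoint

    path = os.environ.get("PATH_INFO", "/")     # URL path below our prefix

    # Steps 2 + 3: connect to a backend service over IPC and ask about the path.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(BACKEND_SOCKET)
        s.sendall(path.encode() + b"\n")
        answer = s.makefile().readline().strip()

    # Step 4: translate the backend's answer into an HTTP response.
    if answer:
        sys.stdout.write("Status: 200 OK\r\n")
        sys.stdout.write("Content-Type: text/html\r\n\r\n")
        sys.stdout.write("<html><body>" + answer + "</body></html>")
    else:
        sys.stdout.write("Status: 404 Not Found\r\n")
        sys.stdout.write("Content-Type: text/plain\r\n\r\n")
        sys.stdout.write("not found")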
Now, what I really want is to avoid the first two steps, or if I can't, at least the second one, because I'm afraid I'm wasting performance on unnecessary overhead (the requests coming from other applications might be frequent).
PHP, for example, can open persistent connections to a MySQL database that survive the script's runtime and don't need to be recreated next time, though I don't know how they actually do it. Also, as I understand it, the Apache modules are loaded once when the server starts, so that might remove the first step but would tie me to Apache.
So, what are good ways to hook a handler for specific URLs into different webservers? I don't want to handle the HTTP myself, otherwise I might just use a proxy setup to a second server, but that seems like reinventing the wheel. If you think CGI is fine and have examples where it handles large numbers of requests of a similar structure, please let me know.
OK, I overlooked this previously. Explaining my question here brought me to it:
Instead of creating a new process for every request, FastCGI can use a single persistent process which handles many requests over its lifetime. -- Wikipedia: FastCGI
Even under moderate loads, CGI is a pretty unscalable beast. FastCGI is an option, but you'll probably also find a mod_XXXX package where XXXX is the name of your language. There are mods for Ruby, Perl, and Python, for instance, and probably a fair few others.
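To illustrate the persistent-process model, here is a minimal sketch in Python. It assumes the third-party flup package and a FastCGI-capable webserver (Apache with mod_fcgid, nginx, etc.) pointed at the socket; because the process stays alive across requests, the connections to your backend services can be opened once and reused:

    # Persistent FastCGI worker: started once, it serves many requests,
    # so expensive setup (e.g. connecting to backend services) happens once.
    from flup.server.fcgi import WSGIServer

    # Hypothetical: open IPC connections to the backend services here,
    # outside the per-request code path, so they are reused.
    backend = None  # e.g. connect_to_backends()

    def application(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        # Ask the already-connected backend services what to do with `path`.
        body = ("<html><body>you asked for " + path + "</body></html>").encode()
        start_response("200 OK", [("Content-Type", "text/html"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        # The webserver speaks FastCGI to this address; the process never exits.
        WSGIServer(application, bindAddress=("127.0.0.1", 9000)).run()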