Overriding httpd/apache upstream proxy httpcode with another - apache

I have some react code (written by someone else) that needs to be served. The preferred method is via a Google Storage Bucket, fronted by their Cloud CDN, and this works. However, due to some quirks in the code, there is a requirement to override 404s with 200s, and serve content from the homepage instead (i.e. if there is a 404, don't serve a 404, serve the content of the homepage and return as a 200 instead)
(If anyone is interested, this override currently is implemented in CloudFront on AWS. Google CDN does not provide this functionality yet)
So, if the code is served at "www.mysite.com/app/" and someone hits "www.mysite.com/app/not-here" (which would return a 404), what should happen is that the response should NOT be 404, but a 200 with the content being served from index.html instead.
I was able to get this working by bundling all the code inside a docker container and then using the solution here. However, this setup means if we have a code change, all the running containers need to be restarted, and the customer expects zero downtime, hence the bucket solution.
So I now need to do the same thing but with the files being proxied in (with the upstream being the CDN).
I cannot use the original solution since the files are no longer local, and httpd can't check for existence of something that is not local.
I've tried things like ProxyErrorOverride and ErrorDocument, and managed to get it to redirect, but that is not what is needed.
Does anyone know how/if this can be done?

If the question is: how to catch the 404 error provided by Cloud Storage when a file is missing with httpd/apache? I don't know.
However, I think that isn't the best solution. Serving files directly from Cloud Storage is convenient but not industrial.
Imagine, you deploy several broken files successively, how to rollback in a stable format?
The best is to package your different code release in an atomic bag, a container for instance. Each version are in a different container and performing a rollback is easier and consistent.
Now your "container restart" issue. I don't know on which platform you are running your container. If your run it on a Compute Engine (a VM) it's maybe the worse solution. Today, there is container orchestration system that allows you to deploy, scale up and down the containers, and to perform progressive rollout, to replace, without downtime, the existing running containers by a newer version.
Cloud Run is a wonderful serverless solution for that, you also have Kubernetes (GKE on Google Cloud) that you can use with Knative for a better developer experience.

Related

Host Blazor WASM in S3 bucket

I am trying to host my Blazor WASM application on my S3 bucket under a sub path, but I can't really get it to work like I would expect it to.
In order to get it to work, I had to make some changes which I would've preferred not needing to make, especially since they break the application on my development build, so I always have to change between the 2 when I publish / want to develop
In my application I have 2 pages, lets say page1 and page2. On application startup, I want to see page 1. this page has the #page "/" on top. The other page has #page ="/page2". These same paths are then found in my NavMenu.razor.
Now locally I don't care for sub paths, so I'm fine with the application being hosted at localhost:port/ which would go to page 1, and localhost:port/page2 which would go to page 2. In order to get it working in my S3 bucket however, where my application is hosted under s3Url/subpath/{published files}, I had to make some changes: in my index.html I put the
I find myself unable to make all of these changes through appsettings or environment, which makes it time consuming, and error prone to accidentally make errors and push the wrong settings to my s3 bucket when i publish.
Is there any way to handle a sub path better? In the documentation I found this the following article where they talk about this exact issue, and they say it can be fixed by running it locally using this command:
dotnet run --pathbase=/{RELATIVE URL PATH (no trailing slash)}
but even if I do that, since I had to edit all page names and nav links it still doesn't work.
I feel like there must be a better way for me to handle this, so that it works on both development and also on publish, without having to make additional code changes whenever I publish, and then undoing them again when I want to develop locally.

S3 download - SDK vs HTTP request inside lambda function

I'm looking for some benchmark or article explaining what is faster.
Inside a lambda function, is it faster to....:
A) Download an S3 file through cloudfront with a regular request module (i.e. hit the cloudfront URL with request or axios and download it)
B) Use the AWS SDK to get the file through the getObject methods
I've been googling this for a while now and I don't quite get to the answer, and I'm hoping I can skip benchmark it if someone else did already.
I'm talking about pretty small files, like fonts or images.
And the root of the question is, I believe AWS uses some sort of backbone communication for some cases. Given that lambda is inside their system, as S3 is, maybe requesting the image through the internet (HTTP) is not that fast.
Thanks!
In the same region it should be faster to use the SDK to download it. If it's not in the same region you might want to replicated it so that it is.

Amazon EC2 Load Testing

I am designing a AWS deployment solution for a new dynamic website project. I have acquired an EC2 instance for testing the environment. Need some help on how do I do a load testing on an Ec2 instance to determine how many HTTP requests it can safely handle... P.S. I am new to the AWS platform.
Thanks...
RedLine offers an EC2 Load Testing solution that will automate the distribution of load tests on your own EC2 instances.
Late to the party but could help someone in the future:
A possible tool for load tests, stress tests, whatever you may call them, is Apache JMeter, but there are plenty of alternatives.
A simple starting setup, further explained in this excellent tutorial on DigitalOcean, can exist of a Thread Group containing an HTTP Request Sampler and a View Results in Table Listener. The Thread Group can be used to configure the amount of "clients" you want to simulate. The Request Sampler will be used to configure the server's properties (hostname, path, etc). The Table View Listener outputs a handy CSV file that can be used to calculate means, compare different types of EC2 instances,...
JMeter is a beautiful program with a GUI that can be run on your local workstation, producing an XML file that can be executed on another EC2 instance, for instance. You can even do simple manual edits to the XML file on your server afterward, if necessary.
Take a look at Amazon's testing policy to make sure you're not doing anything illegal.
A couple of quick points;
Set the environment up exactly like it's supposed to run. If there's a database involved, you'll want to involve that in the testing too. Synthetic <?php echo "ok"; CPU based benchmarks won't help you much since normally very little of the time spent replying to HTTP requests is actual CPU time.
A recommendation is to use a service for the benchmarking. Setting load testing up is not without its complexities, and unless you consider benchmarking your core business, you're probably better off using something like Neustar to load and measure your site (there are many services, they're not necessarily what fits you best, just pulled one out of memory)
Of course you can set a load test up yourself, but getting that done right is not anything that can be described in a few sentences. There are very well paid people that only do that for a living :)
There is good experience in using curl-loader aka Davilka tool, also on Amazon EC2 env
http://curl-loader.sourceforge.net

Pyramid/Pylons: How to check if an uploaded file is complete in a POST request?

I'm building a web tool which allows users to upload PDFs to a server using their web browsers. The server is based on Python (Paste + Pyramid).
The problem I have right now is the following: If a user uploads a rather large file (let's say 100 MB) and they cancel the upload before it is completed, my handler code on the server is still called (instead of the request being aborted).
The problem is that the request.POST['myfile'].file is incomplete when that happens. This effectively means that the PDF file is corrupted if I simply write it to some place on the server.
When I watch the server's log, it shows a "broken pipe" exception within the Paste server; however I have no idea how to catch that exception and have it prevent my view/handler code from executing and storing the incomplete file.
Seems like the paster HTTP server does not correctly validate the uploaded form data and simply passes the request down the WSGI pipeline even if the connection (HTTP POST) was closed by the user.
I worked around this issue by simply setting up NGINX to act as a reverse proxy. This also adds some security benefits as it might be better tested than paster.
Update:
My main problem was that I was using runserver (the built in web server of manage.py). After some trial and error we ended up using WSGI.
More specifically, uWSGI and Nginx as web server. Static content is served directly by Nginx while dynamic pages are piped through uWSGI and are handled by the Python web app.
Unless you are doing something fancy (like tracking the upload progress, etc), your pylons controller should not be invoked until the entire file has been uploaded.

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed xmls. Any signed xml, once created, never changes. So I store every xml in the public folder and redirect the client appropriately to avoid unnecessary processing from the controller.
Now I want a new feature: every xml is associated with a date, and I'd like to implement the ability to serve a compressed file containing every xml whose date lies in a period specified by the client. Nevertheless, the period cannot be limited to less than one month for the feature to be useful, and this implies some zip files being served will be as big as 50M.
My application is deployed as a Passenger module of Apache. Thus, it's totally unacceptable to serve the file with send_data, since the client will have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea on how to implement the feature in Rails so the compressed file is produced while being served, I feel my server will get scarce on resources once some lengthy Ruby/Passenger processes are allocated to serve big zip files.
I've read about a better solution to serve static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I inform Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file to the client containing those files. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
You simply need to use whatever API you have available for you to create a zip file and write it to the response, flushing the output periodically. If this is serving large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value / low priority.
Worst case, you could run a command-line zip in a low priority process and pass the output along periodically.
it's tricky to do, but I've made a gem called zipline ( http://github.com/fringd/zipline ) that gets things working for me. I want to update it so that it can support plain file handles or paths, right now it assumes you're using carrierwave...
also, you probably can't stream the response with passenger... I had to use unicorn to make streaming work properly... and certain rack middleware can even screw that up (calling response.to_s breaks it)
if anybody still needs this bother me on the github page