HAProxy - forward request to S3 hosted site

I know very little about HAProxy. I was testing out a configuration with:
S3 hosted website (Route 53 alias) - content.mydomain.com
URL landing on HAProxy - www.mydomain.com/getfiles/
Is it possible to reroute www.mydomain.com/getfiles/ to content.mydomain.com (the S3 hosted website)?
I was able to route it to another application running on the same server with the config below:
acl display-s3-content path_beg /getfiles/
use_backend my-content if display-s3-content
backend my-content
reqrep ^([^\ ]*)\ (.*) \1\ /path/\2
server test www.mydomain.com:1936
But when I try to route it to the S3 hosted site, it does not work. Below is the backend from the non-working config:
reqrep ^([^\ ]*)\ (.*) \1\ /path/\2
server test1 content.mydomain.com
Thanks!

As you noted in comments, this works:
reqirep ^Host: Host:\ content.mydomain.com
The idea is to set the Host header to the value that S3 expects.
A somewhat cleaner approach is this:
http-request set-header Host content.mydomain.com
I say "cleaner" because http-request is a newer and safer mechanism for header manipulation, but it fundamentally accomplishes the same thing: changing the request header to what the destination server (S3) expects. It is safer because it is "smarter" about how it manipulates the request: it is much easier to completely break the protocol using req[i]rep than it is with http-request.
Originally, you asked about /getfiles/test.jpg mapping to /test.jpg in the bucket. This rewrite is very easy and clean in HAProxy 1.6 and later:
http-request set-path %[path,regsub(^/getfiles,)]
...but in 1.5 you have to use reqirep since the regsub (regex substitution) converter isn't available:
reqirep ^([^\ :]+)\ +/getfiles(.*) \1\ \2
This matches the GET (HEAD, POST, etc.) line from the request and removes the extra /getfiles from the path. Including the : in the exclusion pattern prevents it from matching any other header. A similar pattern can be used to add a prefix (such as a release version, etc.) before sending the request to S3.
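To see why the colon in the exclusion class matters, here is a minimal Python sketch of the same pattern. It is only an illustration: Python's regex syntax does not need HAProxy's escaped spaces, and the helper name strip_prefix and the sample lines are made up for the example.

```python
import re

# Python equivalent of the HAProxy pattern  ^([^\ :]+)\ +/getfiles(.*)
# (HAProxy writes a literal space as "\ "; Python does not need the escape).
# IGNORECASE mirrors the "i" in reqirep.
REQUEST_LINE = re.compile(r"^([^ :]+) +/getfiles(.*)", re.IGNORECASE)


def strip_prefix(line: str) -> str:
    """Drop the /getfiles prefix if the line is a request line; otherwise return it unchanged."""
    return REQUEST_LINE.sub(r"\1 \2", line, count=1)


if __name__ == "__main__":
    for line in (
        "GET /getfiles/test.jpg HTTP/1.1",                       # request line: rewritten
        "Host: www.mydomain.com",                                # header: left alone
        "Referer: http://www.mydomain.com/getfiles/index.html",  # header: left alone
    ):
        print(strip_prefix(line))
```

The first capture group stops at the first space or colon, so a header name such as Referer (which is followed by a colon, not a space) can never satisfy the pattern, even when its value contains /getfiles.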

I posted a sample config here as well. This config will serve up:
http://<haproxy>/foo/index.html
with index.html sitting on S3.
https://gist.github.com/amslezak/6573767376860ed59a74878fbff6e61f

Related

HA Proxy rule - 404 not found

I have configured the rule below in HAProxy. I get a 404 Not Found when I request http://haproxy_ip/service
frontend http-in
bind 10.254.23.225:80
acl has_service path_beg /service
use_backend service_server if has_service
backend service_server
balance roundrobin
cookie SERVERID insert
option httpchk HEAD /check.txt HTTP/1.0
option httpclose
option forwardfor
server server1 192.168.2.1:9000 cookie server1 check
The backend doesn't have /service itself. I use HAProxy to add a virtual path to the URL.
With the current configuration, /service/test.txt will trigger the acl has_service and will send /service/test.txt to your backend, without changing the url.
If you want to change the url (proxying /service in the frontend to / in the backend), you should add the following line in your backend:
reqrep ^([^\ ]*)\ /service(.*) \1\ \2
This will remove /service from the proxied request.
Edit:
HAProxy won't rewrite the html output: your assets won't get a leading /service/ and won't be served correctly.
When you proxy requests, it is way easier to keep the same path: proxying / to / or /myapp/ to /myapp/ for example. If you proxy /a to /b/c, the proxy itself will need to rewrite the response: <img src="/a/test.png"> needs to be changed to <img src="/b/c/test.png">. Or worse, <img src="../c/test.png" />. The same goes for relative references in HTML, JS and CSS. I'm not sure it's doable with HAProxy.
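The asset problem can be illustrated with a small Python sketch that resolves references the way a browser would and checks them against the path_beg /service rule. The page URL, the asset paths and the helper names are hypothetical, chosen only for the example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page served through the proxy under the virtual /service path.
PAGE_URL = "http://haproxy_ip/service/index.html"


def resolved_path(reference: str) -> str:
    """Path the browser will request for a reference found in the page."""
    return urlsplit(urljoin(PAGE_URL, reference)).path


def matches_acl(path: str) -> bool:
    """Mirror of 'acl has_service path_beg /service'."""
    return path.startswith("/service")


if __name__ == "__main__":
    for ref in ("/css/app.css", "css/app.css", "../img/logo.png"):
        path = resolved_path(ref)
        print(ref, "->", path, "matches ACL:", matches_acl(path))
```

Only the reference without a leading slash stays under /service. The absolute and the parent-relative references resolve outside the prefix, so the ACL never routes them to the backend, and HAProxy does not rewrite the HTML to fix that.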
If you can change the application and deploy it on /service/, you will avoid a lot of issues. Using another vhost (service.yourdomain.com for example) can solve this issue too.
If not, I'm not sure HAProxy is the right tool here. I'd try Apache 2.4 with mod_proxy_html (but not before trying really hard to deploy the app on /service/).

How to redirect all HTTP requests to HTTPS with GCP Load Balancer

I've set up the standard GCP load balancer to point to my instance group. It talks to the instance over the same port. I would like to redirect HTTP to HTTPS. I would normally do this in nginx or Apache on the instance, but that won't work since the traffic is already HTTPS at the load balancer.
Is there a way to rewrite the URL in GCP's load balancer, similar to what I would do if I were load balancing with nginx or Apache? Or should I forward HTTP and HTTPS to the instance and have the instance handle the rewrite as I normally would? I'm new to GCP, thanks in advance.
You can set it up the same way as you would with nginx. When you see traffic that did not arrive over HTTPS, you redirect it to HTTPS.
To do this, you can use the X-Forwarded-Proto header, which contains the protocol the client used to reach the load balancer. On your server, you can simply look for requests whose X-Forwarded-Proto value is http and upgrade those requests to HTTPS.
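The most commonly used way is a 301 redirect, but that is not a great practice. One should instead respond with HTTP 426 Upgrade Required and an Upgrade header. Be aware, though, that browsers generally do not act on a 426 response automatically, so for browser traffic a redirect is still what works in practice.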
Read more: Is HTTP status code 426 Upgrade Required only meant signal an upgrade to a secure channel is required?
RFC doc: https://www.rfc-editor.org/rfc/rfc2616#section-14.42
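The header check described above can be sketched as a small WSGI wrapper, assuming the application on the instance is a Python WSGI app and the load balancer is the only thing that can set X-Forwarded-Proto. The name force_https and the default status are choices made for this illustration, not part of any GCP API.

```python
from urllib.parse import quote


def force_https(app, status="301 Moved Permanently"):
    """Wrap a WSGI app and redirect requests that reached the load balancer over plain HTTP."""

    def middleware(environ, start_response):
        # WSGI exposes the X-Forwarded-Proto request header under this key.
        proto = environ.get("HTTP_X_FORWARDED_PROTO", "").lower()
        if proto == "http":
            host = environ.get("HTTP_HOST", environ.get("SERVER_NAME", ""))
            path = quote(environ.get("PATH_INFO", "/"))
            query = environ.get("QUERY_STRING", "")
            location = "https://" + host + path + ("?" + query if query else "")
            start_response(status, [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Already HTTPS at the load balancer (or header absent): serve normally.
        return app(environ, start_response)

    return middleware
```

The status is a parameter so the same wrapper can issue a 301, a 308, or whatever code fits the policy discussed above. Requests without the header are passed through, which keeps health checks that hit the instance directly from being redirected.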

mod_pagespeed with SSL: from // to https://

Apache 2.2.15 on RHELS 6.1
Using mod_pagespeed on a server behind https (implemented by the network's Reverse Proxy).
All html urls are written as "//server.example.com/path/to/file.css" (so, without the protocol specified).
Problem : using the default configuration, pagespeed rewrites the urls as "http://server.example.com/path/to/file.css"
I'm trying to figure out how to have it rewrite the urls as https (or leave it unspecified as //).
After reading the documentation, I tried using ModPagespeedMapOriginDomain like this
ModPagespeedMapOriginDomain http://localhost https://server.example.com
Also tried
ModPagespeedMapOriginDomain http://localhost //server.example.com
ModPagespeedMapOriginDomain localhost server.example.com
... To no avail. Urls keep being rewritten with "http://".
Question: how can I have pagespeed use https instead of http in its urls?
Full pagespeed config here, if needed
It turns out mod_pagespeed does not work with "protocol-relative" urls.
Still, the issue is bypassed if you enable trim_urls
ModPagespeedEnableFilters trim_urls
Be mindful of the potential risks (depending on your javascript codebase, ajax calls could break or produce unexpected html).
Adding this to your configuration might work:
ModPagespeedRespectXForwardedProto on
This works if your reverse proxy sends the X-Forwarded-Proto header with its requests.
That request header tells PageSpeed which protocol was originally used for the request at the load balancer, which is all it needs to rewrite URLs correctly.

Redirection on Apache (Maintain POST params)

I have Apache installed on my server and I need to redirect from http to https. The reason for this is that our load balancer solution cannot handle https, so requests come in on http and then we transfer them to https using the lines below in the httpd.conf file.
<VirtualHost 10.1.2.91:80>
Redirect 302 /GladQE/link https://glad-test.com/GladQE/link.do
</VirtualHost>
This works fine for GET requests, but POST requests lose their parameters. What would be the easiest way to perform this redirect and keep the POST params?
I need to get from http://glad-test.com/GladQE/link.do to https://glad-test.com/GladQE/link.do while keeping the POST params.
Thanks
Tom
You can try HTTP status code 307; an RFC-compliant browser should repeat the POST request.
Reference: http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
"In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For instance, a POST request should be repeated using another POST request."
To change from 302 to 307, do this:
<VirtualHost 10.1.2.91:80>
Redirect 307 /GladQE/link https://glad-test.com/GladQE/link.do
</VirtualHost>
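The difference between the status codes can be summarised in a few lines of Python. This is a simplified model of how browsers typically pick the method for the follow-up request, written for illustration; the function name is made up, and real clients may differ in edge cases.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a typical browser uses for the follow-up request (simplified model)."""
    method = method.upper()
    if status == 303 and method != "HEAD":
        # 303 See Other: the follow-up is a GET.
        return "GET"
    if status in (301, 302) and method == "POST":
        # Historical behaviour: POST is turned into GET and the body is dropped.
        return "GET"
    # 307 and 308 (and every other case) keep the original method and body.
    return method


if __name__ == "__main__":
    for status in (301, 302, 303, 307, 308):
        print(status, method_after_redirect(status, "POST"))
```

With the 302 from the question, the POST to /GladQE/link is replayed as a GET without its body, while a 307 makes the client repeat the POST to the https URL.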
Standard Apache redirects (301/302) do not carry POST data, because they work at the URL level. POST data is passed in the body of the request, which clients typically drop when they follow such a redirect.
You have the option of either using a PHP script to transparently forward the POST request, or using a combination of the Rewrite (mod_rewrite) and Proxy (mod_proxy) modules for Apache, as follows:
RewriteEngine On
RewriteRule /proxy/(.*)$ http://www.example.com/$1 [P,L]
The P flag passes the request to the Proxy module, so anything that comes to your site (via GET or POST, it doesn't matter) with a URL path starting with /proxy/ will be transparently proxied to http://www.example.com/.
For reference:
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
http://httpd.apache.org/docs/current/mod/mod_proxy.html
Either your public-facing website MUST use SSL to protect confidentiality, or there is no sensitive data ever passing through it and no possibility that your site will ever be used as a launchpad for SSL stripping (there's a very good reason why Google serves search results over HTTPS).
If you are not encrypting traffic between the browser and your site, then why are you trying to encrypt it between your load balancer and your web server? If you do happen to have SSL termination outside the load balancer (a very silly approach), then using HTTPS between the load balancer and the web server is far from efficient. The question also implies lots of other security problems, like session fixation/sniffing and SSL stripping vulnerabilities.

How to configure mod_pagespeed for SSL pages

We have a website, e.g. http://www.abc.com, which points to a hardware load balancer that is supposed to load-balance two dedicated servers. Each server runs Apache as a frontend and uses mod_proxy to forward requests to Tomcat.
Some pages of our website require SSL like https://www.abc.com/login or https://www.abc.com/checkout
SSL is terminated at hardware load-balancer.
When I configured mod_pagespeed, it compressed, minified and merged the CSS files and rewrote their URLs as absolute, http://www.abc.com/css/merged.pagespeedxxx.css, instead of relative, /css/merged.pagespeedxxx.css.
It works fine for non-SSL pages, but when I navigate to an SSL page such as https://www.abc.com/login, all the CSS and JS files are blocked by browsers such as Chrome because their absolute URLs do not use SSL.
How can I resolve this issue?
Check for the https string in this documentation and this one.
You should show us in your question your current ModPagespeedMapOriginDomain and ModPagespeedDomain settings.
From what I understand from these lines:
The origin_specified_in_html can specify https but the origin_to_fetch_from can only specify http, e.g.
ModPagespeedMapOriginDomain http://localhost https://www.example.com
This directive lets the server accept https requests for www.example.com without requiring a SSL certificate to fetch resources - in fact, this is the only way mod_pagespeed can service https requests as currently it cannot use https to fetch resources. For example, given the above mapping, and assuming Apache is configured for https support, mod_pagespeed will fetch and optimize resources accessed using https://www.example.com, fetching the resources from http://localhost, which can be the same Apache process or a different server process.
And these ones:
mod_pagespeed offers limited support for sites that serve content through https. There are two mechanisms through which mod_pagespeed can be configured to serve https requests:
Use ModPagespeedMapOriginDomain to map the https domain to an http domain.
Use ModPagespeedLoadFromFile to map a locally available directory to the https domain.
The solution would be something like that (or the one with ModPagespeedLoadFromFile)
ModPagespeedMapOriginDomain http://localhost https://www.example.com
BUT, the real problem for you is that Apache does not directly receive the HTTPS requests, as the hardware load balancer handles them on its own. So the mod_pagespeed output filter does not even know the page was requested on an SSL domain, and when it modifies the HTML content (applying a domain rewrite, maybe) it cannot handle the https case.
So... one solution (untested) would be to use another virtual host on the Apache server, still HTTP if you want, dedicated to https handling. All https-related URLs (/login, /checkout, ...) would then be sent to this specific domain name by the hardware load balancer. Let's say http://secure.abc.com. This name is only in use between the load balancer and the front Apaches (and quite certainly Apache should restrict access to this virtual host to the load balancer only).
Then, in this http://secure.abc.com virtual host, mod_pagespeed would be configured to externally rewrite domains to https://www.example.com. Something like:
ModPagespeedMapOriginDomain http://secure.example.com https://www.example.com
Finally, the end user requests https://www.example.com/login, the load balancer handles HTTPS and talks to Apache with http://secure.example.com, and the resulting page contains only references to https://www.example.com/* assets. Now, when these assets are requested on the https domain, you still have the problem of serving them. So the hardware load balancer should allow all these asset URLs on the https domain and send them to the http://secure.abc.com virtual host (or any other static virtual host).
This sounds like you configured the rewritten URL as http://www.abc.com/css/merged.pagespeedxxx.css yourself. Therefore, try a protocol-relative URL: remove http: and just state //www.abc.com/css/merged.pagespeedxxx.css. This will use the same protocol the embedding page was requested with.
One of the well standardized but relatively unknown features of URLs