Apache 2.2 Mod Proxy ProxyPass behavior

I have a server, server.example.com, which serves Tomcat on port 80 via a ProxyPass/ProxyPassReverse to 8080, and a Drupal site on the same box at server.example.com:8001. If I enter port 8001 explicitly, the Drupal site behaves properly, but I need to make it accessible via server.example.com/blog, so I created a ProxyPass/ProxyPassReverse for /blog to http://server.example.com:8001. This serves the initial page of the Drupal site correctly, but once the form on the Drupal home page is filled out and submitted, which POSTs to /, the site changes to the Tomcat site, presumably because the / is not relative to the current host on port :8001. How can I get the ProxyPass for /blog to remain persistent so that all subsequent requests remain within the :8001 VirtualHost (Drupal site)?
One thing I tried was with mod_rewrite:
RewriteCond %{HTTP_REFERER} /^blog/.*$
RewriteRule (.*) %{HTTP_HOST}:8001/$1 [L,P,NC]
But that did nothing at all as far as I can tell. I was hoping that if the initial request was for /blog then the referrer would be as well and I could keep requests on the :8001 virtualhost. Perhaps someone can explain why that is flawed.

The problem you are very likely running into is that the documents returned by Drupal include generated links that all reference / instead of /blog. mod_rewrite and ProxyPass don't do anything to the contents of documents; they only act upon the request (or, in the case of ProxyPassReverse, on response headers such as Location:).
To make an application that normally expects to be installed at / operate on a different URL, you need to either:
(a) Configure the application to be aware of the proper base URL. Many applications include such a setting in order to support exactly the situation you have described.
(b) Install some sort of filtering proxy that can modify the content of returned documents. For Apache, mod_proxy_html is made to do exactly this. This is included natively in Apache 2.4 but may need to be installed separately for 2.2.
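As a rough sketch of option (b), assuming Apache 2.4 with mod_proxy, mod_proxy_http, mod_proxy_html and mod_headers loaded, and reusing the host and port from the question, the /blog mapping could look something like the following. The ProxyHTMLURLMap rule is an assumption about how the Drupal pages reference their links and has not been tested against this site.

```apache
# Sketch only: assumes mod_proxy, mod_proxy_http, mod_proxy_html and
# mod_headers are loaded. Host and port are taken from the question.

# Forward /blog/ to the Drupal virtual host on port 8001.
ProxyPass        /blog/ http://server.example.com:8001/
# Fix up Location headers on redirects coming back from Drupal.
ProxyPassReverse /blog/ http://server.example.com:8001/

<Location /blog/>
    # Rewrite links inside returned HTML so that "/" becomes "/blog/".
    ProxyHTMLEnable On
    ProxyHTMLURLMap / /blog/
    # Ask the backend for uncompressed responses so the filter can parse them.
    RequestHeader unset Accept-Encoding
</Location>
```

ProxyPass rules are matched in configuration order, so a block like this would need to appear before the existing ProxyPass for / that sends traffic to Tomcat. Which HTML attributes get rewritten is controlled by ProxyHTMLLinks, usually supplied by a sample proxy-html.conf that may need to be included separately.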

Related

Rewrite subdomain.domain.com to domain.com/subdomain without redirect

I've read plenty of Stack Overflow posts but I seem to be missing something.
I have a PHP application running on https://subdomain.example.com/page/x but for SEO reasons I want people/bots to see https://example.com/subdomain/page/x.
I can rewrite the URL by using:
RewriteEngine on
RewriteCond %{HTTP_HOST} subdomain.example.com
RewriteRule ^(.*)$ https://example.com/subdomain/$1 [L,NC,QSA]
This rewrite results in: https://example.com/subdomain/page/x, but I keep receiving a 404 error since the "main" domain doesn't know the path /subdomain/page/x of course.
What I want is to have the URL https://example.com/subdomain/page/x but run it on https://subdomain.example.com/ in the background since this is the place where the PHP application is running.
Is this possible? How should I do this?
There is no strong SEO reason not to use subdomains. See Do subdomains help/hurt SEO? I recommend using subdirectories most of the time but subdomains when they are warranted.
One place where subdomains are warranted is when your content is hosted on a separate server in a separate hosting location. While it is technically possible to serve the content from a subdirectory from the separate server, that comes with its own set of SEO problems:
It will be slow.
It will introduce duplicate content.
From a technical standpoint, you would need to use a reverse proxy on your example.com webserver to fetch content for the /subdomain/ subdirectory from subdomain.example.com. The code for doing so in the .htaccess file of example.com would be something like:
RewriteEngine on
RewriteRule ^subdomain/(.*)$ https://subdomain.example.com/$1 [P]
The [P] flag means "reverse proxy" which will cause the server to fetch the content from the remote subdomain. This will necessarily make it slower for users. So much so that it would be better for SEO to use a subdomain.
For this to work you would also need to leave the subdomain up and running and serving content for the main server to fetch. This causes duplicate content. You could solve this issue by implementing canonical tags pointing to the subdirectory.
This requires several Apache modules to be available. On my Debian-based system I needed to run sudo a2enmod ssl proxy rewrite proxy_connect proxy_http and sudo service apache2 reload. I also had to add SSLProxyEngine on in the <VirtualHost> block for the site I wanted to use this on.
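To show where SSLProxyEngine sits relative to the rewrite rule, here is a minimal sketch of the same proxy placed in the virtual host instead of .htaccess. The ProxyPassReverse line is an addition not in the answer above, included on the assumption that the PHP application issues redirects. The certificate setup is left out.

```apache
# Sketch only: requires mod_ssl, mod_proxy, mod_proxy_http and mod_rewrite.
<VirtualHost *:443>
    ServerName example.com
    # Certificate directives (SSLEngine, SSLCertificateFile, ...) omitted.

    # Needed because the proxy target uses https://.
    SSLProxyEngine on

    RewriteEngine on
    # In virtual host context the matched path starts with a slash,
    # unlike the .htaccess version of the rule.
    RewriteRule ^/subdomain/(.*)$ https://subdomain.example.com/$1 [P]

    # Assumption: map Location headers on backend redirects back to /subdomain/.
    ProxyPassReverse /subdomain/ https://subdomain.example.com/
</VirtualHost>
```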

Apache 2.4 rewriting directory URLs without trailing slash to https://default_site/dir/ instead of preserving domain

This is a relatively recent behavioral change and appears to be related only to requests which include an "Upgrade-Insecure-Requests: 1" request header.
Apache has started rewriting such requests for sites which are HTTP-only to an HTTPS URL using the default site name instead of just adding the / at the end of the requested URL.
Example: URL submitted in browser: http://www.example.com/blah
Intended redirect: 301 to http://www.example.com/blah/
Instead redirects: 301 to https://default.site.configured/blah/
This happens whether it's a named virtual on the same address as the default server or a virtual using a separate address with separate Listen directives.
I understand all the arguments in favor of the idea that everything should always be encrypted and I don't want to get into a debate about that. This site doesn't consider the tradeoffs desirable at this time.
The default site does have SSL and is configured to redirect HTTP->HTTPS, but the www.example.com site is not configured that way and does not wish to implement SSL at this time.
Is there any way to get Apache 2.4 to disregard that "Upgrade" header and simply rewrite the URL as desired rather than altering the domain name?
After banging on this some more, I finally found the source of my woes.
This happens when you have IP based virtual hosts and did not configure a name for them using the "ServerName" directive.
tl;dr: If you are having this problem, try adding a "ServerName www.example.com" directive within the VirtualHost definition for the site and that should resolve it.
Details:
It does not happen until you encounter a URL that requires a rewrite other than adding a trailing /. (i.e. if you get a request that doesn't contain the "Upgrade-Insecure-Requests: 1" header, it only gets the trailing / added, but if you get one with that header, it also tries to rewrite the protocol to https which triggers the full URL rewrite).
In my case, the default host name had an SSL configuration, so it didn't fall back to HTTP after the rewrite or reject the rewrite as invalid.
YMMV, I did not continue to do an exhaustive test of all permutations once I found the solution.
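For illustration, a minimal sketch of the fix described in the tl;dr might look like this. The IP address and DocumentRoot are placeholders, not values from the original configuration.

```apache
# Sketch only: 192.0.2.10 and the DocumentRoot path are placeholders.
<VirtualHost 192.0.2.10:80>
    # Without an explicit ServerName, Apache may build self-referential
    # redirects (such as the trailing-slash redirect) from another host name.
    ServerName www.example.com
    DocumentRoot "/var/www/example"
</VirtualHost>
```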

Apache Reverse Proxy 404 errors when resources loaded from root context path

Seeing an issue after configuring a reverse proxy in my Apache web server and having a tough time finding a solution, hoping one of you may be able to assist.
Example:
I am trying to configure a reverse proxy to map the backend application URL http://appserver/app/ to the URL https://webserver/app/ on my public domain.
I noticed that any resources located under the http://appserver/app/ path (such as /app/images) are being served properly when accessed via corresponding reverse proxy URL (https://webserver/app/images).
However, some HTML files are being served from the backend application server root context path (http://appserver/test.html), and the requests for these files are returning 404 errors when the application is accessed via the reverse proxy URL.
When reviewing the chrome dev tools network trace, I see Apache is serving these resources from the root context of the reverse proxy URL (https://webserver/test.html), instead of the reverse proxy path (https://webserver/app/test.html), as intended.
I believe this is the expected behavior in Apache, but I am trying to find a way to rewrite the URLs to serve these resources via the reverse proxy context path https://webserver/app/ instead of the root.
Below is my current configuration and I am aware that it will not work as intended when configured this way, but I have tried just about every combination of RewriteRule and ProxyPassReverse directives I can think of, to no avail.
RewriteRule ^/app/(.*)$ http://appserver/app/$1 [P,L]
ProxyPassReverse /app/ http://appserver/app/
I have also tried the following, with no luck.
RewriteRule ^/app/(.*)$ http://appserver/$1 [P,L]
ProxyPassReverse /app/ http://appserver/
Outside of the basics, I am a bit of a noob when it comes to Apache, so I apologize if this is a dumb question, but I have looked all over to find a solution, and still haven't found one :(
Any help is appreciated!
Thanks a lot

Use Apache to load a page sitting on a different server with the same URL

We have a situation where ideally we would like a user to access a page on our site at a URL such as https://example.com/path/to/page. However, the HTML to render that page is sitting on an entirely different server (S3 to be exact) that we have control over, and we would like to render that page for that URL without redirecting (i.e. changing the URL itself).
I took a brief look at the Apache mod_proxy module, but it doesn't seem to do the job as we just get 500 or 404 errors. Here is an example entry from our .htaccess:
<IfModule mod_proxy.c>
RewriteRule "/path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
</IfModule>
Any help or a pointer in the right direction would be appreciated.
Most likely you are stumbling over the fact that you are using an absolute path inside a RewriteRule in a dynamic configuration file. Try this instead:
RewriteEngine on
RewriteRule "/?path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
That slightly modified version will work in dynamic configuration files and in the real HTTP server's host configuration.
But as mentioned in the comment, I wonder why you should not be able to use the proxy module directly to simplify things. You'd have to do that in the HTTP server's host configuration though; this is not possible in dynamic configuration files:
ProxyRequests off
ProxyPass "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
ProxyPassReverse "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
And a general hint: you should always prefer to place such rules inside the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error-prone, hard to debug, and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
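As a sketch of what the host-configuration variant might look like in context, the block below wraps the ProxyPass directives in a virtual host. SSLProxyEngine is added because the proxy target is an https:// URL. Certificate setup is left out. Note that S3 static website endpoints have generally been HTTP-only, so the target scheme may need to be http:// in practice, in which case SSLProxyEngine is not needed.

```apache
# Sketch only: assumes mod_proxy, mod_proxy_http and mod_ssl are loaded.
<VirtualHost *:443>
    ServerName example.com
    # Certificate directives (SSLEngine, SSLCertificateFile, ...) omitted.

    # Do not act as a forward proxy.
    ProxyRequests off
    # Required when the proxy target is an https:// URL.
    SSLProxyEngine on

    ProxyPass        "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
    ProxyPassReverse "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
</VirtualHost>
```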

reverse proxy with SSL and url encoding, path change

environment http://etrafficcontrol.com/misc/proxy.png
I have two applications. One is an e-commerce site (Drupal 7 running on LAMP) hosted on AWS, and the other is the checkout system which is ASP on IIS-6, is located inside our company, and requires SSL.
Currently we put up with the situation where our customers get forwarded to another domain for checkout -- kind of like what happens with ebay and PayPal. But this leads to difficulty with site tracking code, and kind of feels wrong for the shopper to get forwarded off of the e-commerce site for checkout.
The main concern is that we use Google campaigns, so we want to track conversions from advertising to, and rich content on, domain-1, but the actual sale happens at the time of checkout on domain-2.
Rather than send visitors from www.domain1.com/cart to domain2.com/miscX, I've tried to setup ProxyPass and ProxyPassReverse so I can send them to www.domain1.com/shop/miscX.
App1 (Drupal) is in domain1.com/*, and the .htaccess stuff bypasses Drupal's design to intercept everything. The "misc" paths come from the fact that I'm redirecting into a subdirectory, and then proxying from there. When the proxied pages render, they have some hard-coded paths to /miscX, and without making special provisions for those during the redirects, I wind up with /miscX/ (instead of /shop/miscX/ which will follow the proxy) and that causes missing css, js, etc.
Note: Our business customers can login directly to domain2.com, so I'd like to keep that portal unchanged.
Below, local-d7 is a local test instance of the domain1 server. A test of the proxy shows that this concept works, with SSL.
I have this almost working, but it seems like URL-encoded parameters are being lost (even though query strings are ok). When I introduce the proxy, server2 doesn't appear to see encoded params (it's a specialized app and I don't know how to view what IIS is receiving). When I route the domain2 test portal login through Apache on server-1 in such a way that doesn't have encoded params, the login works.
In effect I'm trying to
reverse proxy
change path (put an app running in / on domain-2 and expose it in a subdir "/shop" on domain-1)
support SSL
proxy an IIS server behind Apache
try to not modify the IIS server so that it can continue to be used by its original domain-2.com URL, and
do this on a hosted server where I [may] have limited configuration control of Apache. (currently testing on XAMPP).
I've tried all sorts of things in addition to what's shown here, including rewriterules, redirects, etc. I'm just not experienced at all at mod_proxy or mod_rewrite, etc. But it seems to me that this arrangement of a proxy should be doable with some amount of work and possibly fixing server SSL certificates.
Advice? --Thanks
vhosts.conf
## Redirect /misc1/ https://local-d7/shop/misc1/
## Redirect /misc2/ https://local-d7/shop/misc2/
## Redirect /misc3/ https://local-d7/shop/misc3/
## ProxyRequests Off
## ProxyPreserveHost On
## RequestHeader set Proxy-SSL true
## ProxyPass /shop/ https://www.shop.com/
## ProxyPassReverse /shop/ https://www.shop.com/
ProxyPass /shop/ https://www.domain2.com/
ProxyPassReverse /shop/ https://www.domain2.com/
ProxyPass /misc1/ https://www.domain2.com/misc1/
ProxyPassReverse /misc1/ https://www.domain2.com/misc1/
ProxyPass /misc2/ https://www.domain2.com/misc2/
ProxyPassReverse /misc2/ https://www.domain2.com/misc2/
ProxyPass /misc3/ https://www.domain2.com/misc3/
ProxyPassReverse /misc3/ https://www.domain2.com/misc3/
.htaccess
RewriteCond %{REQUEST_URI} ^/misc1/
RewriteCond %{REQUEST_URI} ^/misc2/
RewriteCond %{REQUEST_URI} ^/misc3/
RewriteRule (.*) /shop/$1