Apache Reverse Proxy 404 errors when resources loaded from root context path - apache

Seeing an issue after configuring a reverse proxy in my Apache web server and having a tough time finding a solution, hoping one of you may be able to assist.
Example:
I am trying to configure a reverse proxy to map the backend application URL http://appserver/app/ to the URL https://webserver/app/ on my public domain.
I noticed that any resources located under the http://appserver/app/ path (such as /app/images) are being served properly when accessed via corresponding reverse proxy URL (https://webserver/app/images).
However, some HTML files are served from the backend application server's root context path (http://appserver/test.html), and requests for these files return 404 errors when the application is accessed via the reverse proxy URL.
When reviewing the Chrome dev tools network trace, I see these resources being requested from the root context of the reverse proxy URL (https://webserver/test.html) instead of the reverse proxy path (https://webserver/app/test.html), as intended.
I believe this is the expected behavior in Apache, but I am trying to find a way to rewrite the URLs to serve these resources via the reverse proxy context path https://webserver/app/ instead of the root.
Below is my current configuration and I am aware that it will not work as intended when configured this way, but I have tried just about every combination of RewriteRule and ProxyPassReverse directives I can think of, to no avail.
RewriteRule ^/app/(.*)$ http://appserver/app/$1 [P,L]
ProxyPassReverse /app/ http://appserver/app/
I have also tried the following, with no luck.
RewriteRule ^/app/(.*)$ http://appserver/$1 [P,L]
ProxyPassReverse /app/ http://appserver/
Outside of the basics, I am a bit of a noob when it comes to Apache, so I apologize if this is a dumb question, but I have looked all over to find a solution, and still haven't found one :(
Any help is appreciated!
Thanks a lot
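One way to make the root-level requests resolve is to proxy those specific paths as well. The following is only a sketch: it assumes mod_proxy and mod_proxy_http are loaded, that the directives sit in the virtual host serving https://webserver/, and that test.html (taken from the example above) is the only root-level file involved. It does not change the URLs the browser requests; it only stops https://webserver/test.html from returning 404.

```apache
# Sketch only: assumes mod_proxy and mod_proxy_http are loaded and that
# these lines sit inside the <VirtualHost> that serves https://webserver/.

# Application paths under /app/ are passed to the backend unchanged.
ProxyPass        "/app/" "http://appserver/app/"
ProxyPassReverse "/app/" "http://appserver/app/"

# Root-level file that the application references by absolute path.
# test.html is the example from the question; each such file (or a
# common prefix) would need its own mapping.
ProxyPass        "/test.html" "http://appserver/test.html"
ProxyPassReverse "/test.html" "http://appserver/test.html"
```

If the goal really is for the browser to request https://webserver/app/test.html, the links inside the returned HTML have to change, which neither RewriteRule nor ProxyPassReverse does; that needs either a base-URL setting in the application or a content-rewriting filter.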

Related

Rewrite subdomain.domain.com to domain.com/subdomain without redirect

I've read plenty of Stack Overflow questions but I seem to be missing something.
I have a PHP application running on https://subdomain.example.com/page/x but for SEO reasons I want people/bots to see https://example.com/subdomain/page/x.
I can rewrite the URL by using:
RewriteEngine on
RewriteCond %{HTTP_HOST} subdomain.example.com
RewriteRule ^(.*)$ https://example.com/subdomain/$1 [L,NC,QSA]
This rewrite results in: https://example.com/subdomain/page/x, but I keep receiving a 404 error since the "main" domain doesn't know the path /subdomain/page/x of course.
What I want is to have the URL https://example.com/subdomain/page/x but run it on https://subdomain.example.com/ in the background since this is the place where the PHP application is running.
Is this possible? How should I do this?
There is no strong SEO reason not to use subdomains. See Do subdomains help/hurt SEO? I recommend using subdirectories most of the time but subdomains when they are warranted.
One place where subdomains are warranted is when your content is hosted on a separate server in a separate hosting location. While it is technically possible to serve the content from a subdirectory from the separate server, that comes with its own set of SEO problems:
It will be slow.
It will introduce duplicate content.
From a technical standpoint, you would need to use a reverse proxy on your example.com webserver to fetch content for the /subdomain/ subdirectory from subdomain.example.com. The code for doing so in the .htaccess file of example.com would be something like:
RewriteEngine on
RewriteRule ^subdomain/(.*)$ https://subdomain.example.com/$1 [P]
The [P] flag means "reverse proxy" which will cause the server to fetch the content from the remote subdomain. This will necessarily make it slower for users. So much so that it would be better for SEO to use a subdomain.
For this to work you would also need to leave the subdomain up and running and serving content for the main server to fetch. This causes duplicate content. You could solve this issue by implementing canonical tags pointing to the subdirectory.
This requires several Apache modules to be available. On my Debian based system I needed to run sudo a2enmod ssl proxy rewrite proxy_connect proxy_http and sudo service apache2 reload. I also had to add SSLProxyEngine on in my <VirtualHost> directive for the site I wanted to use this on.
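If you have access to the virtual host configuration, the same mapping can be written with mod_proxy directives instead of a rewrite rule. This is a minimal sketch under assumptions: the modules listed above are enabled, the virtual host below is the existing HTTPS host for example.com, and its certificate directives (omitted here) are already in place.

```apache
<VirtualHost *:443>
    ServerName example.com
    # Existing SSLEngine / certificate directives for example.com go here.

    # Needed because the backend URL below uses https://.
    SSLProxyEngine on

    # Fetch /subdomain/... from the subdomain without changing the URL
    # the visitor sees.
    ProxyPass        "/subdomain/" "https://subdomain.example.com/"

    # Rewrite Location headers from the backend so redirects stay
    # under /subdomain/ on the main domain.
    ProxyPassReverse "/subdomain/" "https://subdomain.example.com/"
</VirtualHost>
```

ProxyPassReverse only adjusts response headers. Links inside the PHP application's HTML that point at / or at subdomain.example.com are not rewritten by this.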

Use Apache to load a page sitting on a different server with the same URL

We have a situation where ideally we would like a user to access a page on our site at a URL such as https://example.com/path/to/page. However, the HTML to render that page is sitting on an entirely different server (S3 to be exact) that we have control over, and we would like to render that page for that URL without redirecting (i.e. changing the URL itself).
I took a brief look at the Apache mod_proxy module, but it doesn't seem to do the job as we just get 500 or 404 errors. Here is an example entry from our .htaccess:
<IfModule mod_proxy.c>
RewriteRule "/path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
</IfModule>
Any help or a pointer in the right direction would be appreciated.
Most likely you are stumbling over the fact that you are using an absolute path inside a RewriteRule in a dynamic configuration file. Try this instead:
RewriteEngine on
RewriteRule "/?path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
This slightly modified version will work both in dynamic configuration files and in the real HTTP server's host configuration.
But as mentioned in the comment, I wonder why you could not use the proxy module directly to simplify things. You would have to do that in the HTTP server's host configuration though; it is not possible in dynamic configuration files:
ProxyRequests off
ProxyPass "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
ProxyPassReverse "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
And a general hint: you should always prefer to place such rules in the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error-prone, hard to debug, and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).

Reverse proxy with request dispatch (to Rstudio server)

I have a multi-tier application with three layers, say public, business and workspace (all running Apache).
Client requests hit the public servers, are processed and dispatched to the business servers, which do 'things' and return a response to the public server, which then processes the response and passes it on to the client.
I have a scenario in which I want a request, say /rstudio, arriving at the public server to be dispatched to the business server, which in turn reverse proxies to the workspace server. There are two catches here:
the workspace server varies per request
the application running on the workspace server (RStudio) uses GWT and references resources (static resources such as JS and CSS, and RPC communications) at the root URL. All in-application redirection also happens on the domain.
From the business server, I have set up a reverse proxy to the RStudio server from my application server.
<Proxy *>
Allow from localhost
</Proxy>
ProxyPass /rstudio/ http://business_server/
ProxyPassReverse /rstudio/ http://business_server/
RedirectMatch permanent ^/rstudio$ /rstudio/
and this works fine (ref. https://support.rstudio.com/hc/en-us/articles/200552326-Running-with-a-Proxy). To handle the dynamic workspace server, I could use the following, but ProxyPassReverse does not support expressions in its value, so there is no joy with this approach.
ProxyPassMatch ^/rstudio/(.*)$ http://$1
ProxyPassReverse ^/rstudio/(.*)$ http://$1
RedirectMatch permanent ^/rstudio$ /rstudio/
I have tried the same with a mod_rewrite rule (below) but without ProxyPassReverse, and because of the domain redirection in the GWT RStudio application, this does not work. Adding ProxyPassReverse would fix the problem, but I am stuck because its value cannot contain an expression, which I need for the dynamic workspace server.
RewriteRule "^/rstudio/(.*)" "http://$1" [P]
The following is a third approach to this problem, using LocationMatch and mod_headers:
<LocationMatch ^/rstudio/(.+)>
ProxyPassMatch http://$1
Header edit Location ^http:// "http://%{SERVER_NAME}e/rstudio/"
</LocationMatch>
But this is no joy either, because the value of the Header directive is not evaluated against environment variables (only back-references work here). I can, however, get the reverse proxy working if I hard-code the business_server, as in:
<LocationMatch ^/rstudio/(.+)>
ProxyPassMatch http://$1
Header edit Location ^http:// "http://private_server/rstudio/"
</LocationMatch>
Question 1: Is there a better way to solve this problem without hard-coding the server DNS name in the Apache configuration?
Question 2: With the hard-coded server DNS name the reverse proxy works for me (patchy, but it works), but I am hit by the GWT issue of resources being referenced at the root, so the request dispatch is not fully working. I get to the sign-in page but resources are not found.
Is there a better way to handle that?
The following is an example log from the browser:
Navigated to https://public_server/rstudio
rworkspaces:43 GET https://public_server/rstudio.css
rworkspaces:108 GET https://public_server/js/encrypt.min.js
rworkspaces:167 GET https://public_server/images/rstudio.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonLeft.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonTile.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonRight.png 404 (Not Found)

Apache webserver rewrite all URL's excluding some

I'm using apache httpd v2.2 server as a frontend proxy for our actual tomcat web server which hosts the Java web application.
I want to forward all urls received by apache webserver other than those having the prefix /product to tomcat.
I've tried the following setup in httpd.conf but it doesn't seem to work:
<VirtualHost *:6111>
ServerName localhost
RewriteEngine on
RewriteRule !^(/product($|/)) http://localhost:1234/$1
Alias /product /opt/productdoc
</VirtualHost>
I tried to follow Redirect site with .htaccess but exclude one folder but was not successful
Basically all http://localhost:6111/product urls should serve from hard drive (using alias)
Any other url should be forwarded to http://localhost:1234/<original-path>
You probably want to use something like mod_jk: http://tomcat.apache.org/connectors-doc/webserver_howto/apache.html
There are a ton of examples and tutorials, and it should be pretty simple to set up and install. Now that you know the name of the connection technology, you should be able to find more information.
Using mod_jk also allows you to secure your Tomcat server and keep the public off of it.
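If installing a connector is not an option, the exclusion can also be expressed with mod_proxy alone. Two things stand in the way of the rule in the question: with a negated pattern (the leading !) mod_rewrite has no captured groups, so $1 is empty, and without the [P] flag an absolute URL in the substitution produces an external redirect rather than a proxied request. The sketch below assumes mod_proxy and mod_proxy_http are loaded; the port numbers and paths are the ones from the question.

```apache
<VirtualHost *:6111>
    ServerName localhost

    # Serve /product from disk. The target directory also has to be
    # made accessible in a matching <Directory> block.
    Alias /product /opt/productdoc

    # "!" excludes /product from proxying. This line has to come
    # before the catch-all ProxyPass below.
    ProxyPass /product !

    # Everything else is forwarded to Tomcat with the original path.
    ProxyPass        / http://localhost:1234/
    ProxyPassReverse / http://localhost:1234/
</VirtualHost>
```

ProxyPass with ! as the target is available in both 2.2 and 2.4.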

Apache 2.2 Mod Proxy ProxyPass behavior

I have a server, server.example.com, which serves Tomcat on port 80 via a ProxyPass/ProxyPassReverse to 8080, and a Drupal site on the same box at server.example.com:8001. If I enter port 8001 explicitly, the Drupal site behaves properly, but I need to make it accessible via server.example.com/blog, so I created a ProxyPass/ProxyPassReverse for /blog to http://server.example.com:8001. This serves the initial page of the Drupal site correctly, but once the form on the Drupal home page is filled out and submitted, which POSTs to /, the site changes to the Tomcat site, presumably because the / is not relative to the :8001 host on POST. How can I get the ProxyPass for /blog to remain persistent so that all subsequent requests stay within the :8001 VirtualHost (the Drupal site)?
One thing I tried was with mod_rewrite:
RewriteCond %{HTTP_REFERER} /^blog/.*$
RewriteRule (.*) %{HTTP_HOST}:8001/$1 [L,P,NC]
But that did nothing at all as far as I can tell. I was hoping that if the initial request was for /blog then the referrer would be as well and I could keep requests on the :8001 virtualhost. Perhaps someone can explain why that is flawed.
The problem you are very likely running into is that the documents returned by Drupal include generated links that all reference / instead of /blog. mod_rewrite and ProxyPass don't do anything to the contents of documents -- they only act upon the request (or, in the case of ProxyPassReverse, on URLs in response headers such as Location:).
To make an application that normally expects to be installed as / operate on a different URL, you need either to :
(a) Configure the application to be aware of the proper base URL. Many applications include such a setting in order to support exactly the situation you have described.
(b) Install some sort of filtering proxy that can modify the content of returned documents. For Apache, mod_proxy_html is made to do exactly this. This is included natively in Apache 2.4 but may need to be installed separately for 2.2.
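As an illustration of option (b), here is a minimal sketch for Apache 2.4. It assumes mod_proxy, mod_proxy_http, mod_proxy_html and mod_headers are loaded (mod_xml2enc is commonly loaded alongside mod_proxy_html for character-set handling), and it reuses the /blog path and port 8001 from the question. Treat the URL map as a starting point rather than a finished configuration.

```apache
# List the /blog/ mapping before the existing "/" mapping for Tomcat,
# so that it is matched first.
ProxyPass        "/blog/" "http://server.example.com:8001/"
ProxyPassReverse "/blog/" "http://server.example.com:8001/"

<Location "/blog/">
    # Turn on the HTML-rewriting output filter for proxied responses.
    ProxyHTMLEnable On

    # Rewrite root-relative links (href="/...", action="/", ...) in the
    # returned HTML so that they stay under /blog/.
    ProxyHTMLURLMap "/" "/blog/"

    # Ask the backend for uncompressed responses so the filter can
    # parse the HTML.
    RequestHeader unset Accept-Encoding
</Location>
```

This only touches HTML that passes through the filter. URLs built in JavaScript or embedded in other content types may still point at /, which is why option (a) is usually the cleaner fix when the application supports it.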