Apache reverse proxy and load balancer - does not work as it should - apache

I have 3 machines.
One (loadbalance.lan) is used as a load balancer, the other two (172.16.30.5 and 172.16.30.6) are tomcat's servers. Main page of the tomcat is listening on port 8080
Im typing in the browser loadbalance.lan/tomcat and I am able to see one of the tomcat content (default tomcat page)
The problem is page isn't displayed correctly. There's no images and when I click on any link it displays 404 Not found error.
Lets say I want to access one of the sub pages on the tomcat website. Tomcat website address: 172.16.30.5:8080
Now I can choose, lets say "status" link which redirects me to: 172.16.30.5:8080/manager/status (and works fine)
When I access the same page but via reverse proxy server (loadbalance.net) and click that link on the loadbalance.lan page, links redirect me to loadbalance.lan/manager/status and I get 404 error.
Of course when I type in the browser loadbalance.lan/tomcat/manager/status it displays correct.
Problem with the images is also weird. When I use url: loadbalance.lan/tomcat I can't see images (Tomcat logo)
When I use this one: loadbalance.lan/tomcat/ (slash at the end) it's ok. At least images because links still redirect in wrong place.
Here is my loadbalance.lan apache config:
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
<VirtualHost *:80>
ProxyRequests Off
ProxyVia On
ProxyPreserveHost On
<Proxy balancer://cluster>
Order Deny,Allow
Allow from all
</Proxy>
<Proxy balancer://cluster>
BalancerMember http://172.16.30.5:8080
BalancerMember http://172.16.30.6:8080
<Proxy balancer://cluster>
</Proxy>
<Location /tomcat>
ProxyPass balancer://cluster
ProxyPassReverse balancer://cluster
</Location>
</VirtualHost>
Could someone help me with this?
Obviously there is something wrong with that proxy but I have no idea how to fix that :(

From ProxyPassReverse documentation (strong added):
This directive lets Apache adjust the URL in the Location, Content-Location and URI headers on HTTP redirect responses. This is essential when Apache is used as a reverse proxy (or gateway) to avoid by-passing the reverse proxy because of HTTP redirects on the backend servers which stay behind the reverse proxy.
Only the HTTP response headers specifically mentioned above will be rewritten. Apache will not rewrite other response headers, nor will it rewrite URL references inside HTML pages. This means that if the proxied content contains absolute URL references, they will by-pass the proxy. A third-party module that will look inside the HTML and rewrite URL references is Nick Kew's mod_proxy_html.
So, the proxy job is not to rewrite the html content of the pages, if the proxyied content does not know that the final url should contain /tomcat extension and the proxy does not alter the pages... you're stuck.
This is usually something you do not see because the 172.16.30.5:8080 part is well rewritten in localhost.lan, but this rewrite is not made by the proxy, quite certainly because urls are in fact only relative (<img src="/foo/bar.png">). Check the source code of the page to see if the domain name is really rewritten in urls).
There's several ways of handling that:
- You could avoid altering relative urls paths in, the proxy (so not using a tomcat/ prefix, but instead a dedicated virtualhost with a name, like tomcat.lodabalncer.lan).
- You could also use some dedicated tools, like mod_proxy_html to rewrite the content of the pages, but that's a slow and complex thing.
- The third way is to manage the final full url on the application side (here tomcat) and detect the proxy chain elements in X-Forwareded-for Header to rebuild the right domain.
- Some applications provides tools for that, like the VirtualHostMonster in Zope
For tomcat the preferred tool is mod_proxy_ajp and not mod_proxy. But for a load balancer proxy I do not think you can use mod_proxy_ajp. And, it's been a long time since I made this, but in my memory I think mod_jk was the solution to that.
Read this full documentation on tomcat proxying for details. At least you should get some hints for the solution.

Related

How to set up a seamless proxy in Apache to get around my ISP's firewall?

I'm really hoping someone can help me out with this because I've been at it for several days and I think I'm going crazy!
I'm trying to do what to me sounds like a stupidly simple thing. I want to set up a proxy server using Apache on a dedicated machine that I rent so that I can get around my ISPs nonsense firewall. I am aware that I could use a VPN, I don't want to do that for reasons that should hopefully become clear after I explain the details of what I want.
First of all, I don't want the proxy server to be used for every request. Only for the sites that are blocked by my ISP.
Suppose I try to access blockedsite.com/path/to/resource and it fails. I then simply want to change the URL in the address bar to proxy.myserver.com/proxy/blockedsite.com/path/to/resource and have Apache handle everything to provide me with a seamless experience. That means,
ProxyPassReverse should modify the response headers to use to the proxy server.
All URLs in the response body should be modified to use the proxy
Here's what I have so far:
<VirtualHost *:80>
ServerName proxy.myserver.com
ProxyRequests off
ProxyPass /proxy/ http://
ProxyPassReverse /proxy/ http://
ProxyPassReverse /proxy/ https://
ProxyHTMLURLMap http:// /proxy/
ProxyHTMLURLMap https:// /proxy/
<Location /proxy/>
ProxyPassReverse /
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|</title>|</title><meta name='referrer' content='no-referrer' />|ni"
ProxyHTMLEnable On
#ProxyHTMLURLMap / /app1/
RequestHeader unset Accept-Encoding
Order allow,deny
Allow from all
</Location>
</VirtualHost>
This setup works beautifully for URLs that don't try to redirect me elsewhere. But if for example I try to access proxy.myserver.com/proxy/facebook.com I am still being redirected on the client side to https://www.facebook.com instead of https://proxy.myserver.com/proxy/www.facebook.com as I would like. The extra weird thing is that when I set up my own test site which does nothing except redirect me to an HTTPS address, the ProxyPassReverse rule for HTTPS does actually seem to work... but not when I try to access sites like Facebook or Google.
I see no reason to ramble on about my issues, what I'm looking for is astoundingly simple: a transparent, seamless experience! Aside from sticking proxy.myserver.com/proxy/ in front of the URL in the address bar, I shouldn't have to do anything else for it to work. Yet that is not the case and despite over a week of searching, I have found nothing online to help me with this. It's as if I'm the only person in the universe to want to create a simple proxy with Apache that actually works as a firewall-get-arounder.
Please can someone lend me a hand here?! Even just to tell me I'm going about this all wrong and should give up and install Squid or something??
Your last paragraph contains the right answer. You should indeed just "install Squid or something". In particular, I'd recommend Apache Traffic Server - http://trafficserver.apache.org/ - this is exactly what it's made for.
While Apache httpd can do proxying, it's not it's primary function, and so there are always things that will end up being frustrating with it. We could get your above scenario working, but it's really not the right tool for the job.

Reverse proxy with request dispatch (to Rstudio server)

I have a multi-tier application of three layers lets say public, business and workspace (all running apache).
Client requests hits the public servers, requests are processed and dispatched on to business servers that does 'things' and response is returned back to public server which then processes the response and pass it on to the client.
I have a scenario wherein I want a request say /rstudio coming to the public server dispatched onto the business which intern reverse proxy to workspace server. There are two catch here:
the workspace server varies per request
application running on workspace server (Rstudio) uses GWT and references resources (static resources js, css etc and RPC coms) on the root url. All the in-application redirection also happens on the domain.
From the business server, I have setup reverse proxy to Rstudio server from my application server.
<Proxy *>
Allow from localhost
</Proxy>
ProxyPass /rstudio/ http://business_server/
ProxyPassReverse /rstudio/ http://business_server/
RedirectMatch permanent ^/rstudio$ /rstudio/
and this work fine (ref. https://support.rstudio.com/hc/en-us/articles/200552326-Running-with-a-Proxy). To handle dynamic workspace server, I could the following but ProxyPassReverse does not support expression in value and this no joy with this approach.
ProxyPassMatch ^/rstudio/(.*)$ http://$1
ProxyPassReverse ^/rstudio/(.*)$ http://$1
RedirectMatch permanent ^/rstudio$ /rstudio/
I have tried the same with mod_rewrite rule (following) but without ProxyPassReverse and due to domain redirection on the GWT Rstudio, this does not work. Adding ProxyPassReverse would fix the problem but I am caught up with no expression on value part to deal with dynamic workspace server issue.
RewriteRule "^/rstudio/(.*)" "http://$1" [P]
Following is the third approach to solve this problem using LocationMatch and mod_headers:
<LocationMatch ^/rstudio/(.+)>
ProxyPassMatch http://$1
Header edit Location ^http:// "http://%{SERVER_NAME}e/rstudio/"
</LocationMatch>
But this is no joy too because value on header directive is not evaluated against environment variable (and only back-references work here). Althought I can get the reverse proxy thing working if I had code the business_server, which is :
<LocationMatch ^/rstudio/(.+)>
ProxyPassMatch http://$1
Header edit Location ^http:// "http://private_server/rstudio/"
</LocationMatch>
Question 1: I was wondering if there are any better way to solve this problem without hardcoding the server DNS in apache conf?
Question 2: With the hard coded server DNS the reverse proxy works for me (patchy but works) but I am hit with GWT issue of resource references on root and the request dispatch is not fully working. I get to the signin page but resources are not found.
I was wondering if there is any better way to handle that?
Following is the example log from browser:
Navigated to https://public_server/rstudio
rworkspaces:43 GET https://public_server/rstudio.css
rworkspaces:108 GET https://public_server/js/encrypt.min.js
rworkspaces:167 GET https://public_server/images/rstudio.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonLeft.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonTile.png 404 (Not Found)
rworkspaces:218 GET https://public_server/images/buttonRight.png 404 (Not Found)

Sonatype Nexus: Proxy from SSL using Apache

We're running Sonatype's Nexus to store all of our builds, cache our dependencies, etc. etc. However, I'd like to move away from the default install's port 8081 URL and instead host it over SSL via an Apache proxy. I've setup Apache's mod_proxy to proxy to it such that https://myserver.com/nexus brings up Nexus. I used the following configuration directives inside of my virtual host config:
# Configure mod_proxy to be used for proxying URLs on this site to other URLs/ports on this server.
ProxyRequests Off
ProxyVia Off
ProxyPreserveHost On
<Proxy *>
AddDefaultCharset off
Order deny,allow
Allow from all
</Proxy>
# Proxy the Sonatype Nexus OSS web application running at http://localhost:8081/nexus
<Location /nexus>
ProxyPass http://localhost:8081/nexus
ProxyPassReverse http://localhost:8081/nexus
</Location>
This seems to match the instructions at Running Nexus Behind a Proxy. However, I was unable to clear the "Base URL" setting in Nexus: it wouldn't let me leave it blank.
And everything mostly works: I can access Nexus at the HTTPS URL, log in, and perform most GUI functions.
However, when logging in I get the following warning message:
WARNING: Base URL setting of http://myserver.com/nexus does not match your actual URL! If you're running Apache mod_proxy, here's more information on configuring Nexus with it.
And not everything in the GUI actually works. So far I've noticed the following:
System Feeds: Gives the following error:
Problem accessing /nexus/service/local/feeds. Reason:
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request
Nexus returned an error: ERROR 406: The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request
Deleting Hosted Repositories: I went through and deleted several empty & unneeded repositories. However, after confirming the deletions, only the first was removed. I had to login to the 8081 site to delete any of the others.
Per the documentation, it looks like a better solution may be to add a RequestHeader to the Apache configuration:
RequestHeader set X-Forwarded-Proto "https"
I tried the accepted answer, which appears to work, but once I added the RequestHeader, I was able to uncheck Force URL and the warning was cleared. I have not tested the other behavior the OP is describing, though.
You just need to adjust the baseUrl setting in the Administration->Server configuration screen. Set the url you are using and click the Force Base Url option.

ProxyPassMatch with ProxyPassReverse

Folks,
We are trying to setup Apache reverse proxy for the following scenario:
Incoming requests take the form http://foo.com/APP/v1/main.html
For some servers the URL will reference a difference version, say, http://foo.com/APP/v2/main.html
An upstream load balancer (HAProxy) will send the request to the right server which will have an Apache2 reverse proxy fronting a JBoss server.
When the request shows up at Apache 2 it will have request path like /APP/v1/main.html
We want it to (reverse) proxy out to http://localhost:8080/AppContext/main.html, irrespective of version fragment in URL (v1, v2, etc.).
I have been trying to do this like so:
ProxyPassMatch ^/.*?/APP.*?/(.*)$ http://localhost:8080/AppContext/$1
ProxyPassReverse /APP http://localhost:8080/AppContext
My questions are:
Is my use of ProxyPassMatch correct?
My ProxyPassReverse is "static". How do I make it aware of the potentially variable stuff after /APP?
Thanks for any insights.
-Raj
You're close, try changing the regex a little to account for the version fragment:
ProxyPassMatch ^/.*?/APP.*?/v[0-9]+/(.*)$ http://localhost:8080/AppContext/$1
The ProxyPassReverse is mostly to ensure the rewriting on-the-fly of location header fields in the responses given by the proxied app. So when it returns a 301 redirect to, say, http://localhost:8080/AppContext/something, apache knows to change it to /APP/v1/something so information behind the proxy won't get exposed. Because you have a dynamic URL used in the reverse proxy, you have a few choices here. You can either send it to the HAProxy load balancer (not sure where that is for you), or you can just pick one and hope for the best. For example, if you have a load balancer at /APP/balancer/ which then sends requests to /APP/v1/, /APP/v2/, /APP/v3/, etc. Then you can do this:
ProxyPassReverse /APP/balancer http://localhost:8080/AppContext
Otherwise, you can just point it to one and hope for the best:
ProxyPassReverse /APP/v1 http://localhost:8080/AppContext

Apache - Reverse Proxy and HTTP 302 status message

My team is trying to setup an Apache reverse proxy from a customer's site into one of our web applications.
http://www.example.com/app1/some-path maps to http://internal1.example.com/some-path
Inside our application we use struts and have redirect = true set on certain actions in order to provide certain functionality. The 302 status messages from these re-directs cause the user to break out of the proxy resulting in an error page for the end user.
HTTP/1.1 302 Found
Location: http://internal.example.com/some-path/redirect
Is there any way to setup the reverse proxy in apache so that the redirects work correctly?
http://www.example.com/app1/some-path/redirect
There is an article titled Running a Reverse Proxy in Apache that seems to address your problem. It even uses the same example.com and /app1 that you have in your example. Go to the "Configuring the Proxy" section for examples on how to use ProxyPassReverse.
The AskApache article is quite helpful, but in practice I found a combination of Rewrite rules and ProxyPassReverse to be more flexible. So in your case I'd do something like this:
<VirtualHost example>
ServerName www.example.com
ProxyPassReverse /app1/some-path/ http://internal1.example.com/some-path/
RewriteEngine On
RewriteRule /app1/(.*) http://internal1.example.com/some-path$1 [P]
...
</VirtualHost>
I like this better because it gives you finer-grained control over the paths you're proxying for the internal server. In our case we wanted to expose only part of third-party application. Note that this doesn't address hard-coded links in HTML, which the AskApache article covers.
Also, note that you can have multiple ProxyPassReverse lines:
ProxyPassReverse / http://internal1.example.com/some-path
ProxyPassReverse / http://internal2.example.com/some-path
I mention this only because another third-party app we were proxying was sending out redirects that didn't include their internal host name, just a different port.
As a final note, keep in mind that Firebug is extremely useful when debugging the redirects.
Basically, ProxyPassReverse should take care of rewriting the Location header for you, as Kevin Hakanson pointed out.
One pitfall I have encountered is missing the trailing slash in the url argument. Make sure to use:
ProxyPassReverse / http://internal1.example.com/some-path/
(note the trailing slash!)
Try using the AJP connector instead of reverse proxy. Certainly not a trivial change, but I've found that a lot of the URL nightmares go away when using AJP instead of reverse proxy.