How to set up a seamless proxy in Apache to get around my ISP's firewall?

I'm really hoping someone can help me out with this because I've been at it for several days and I think I'm going crazy!
I'm trying to do what sounds to me like a stupidly simple thing. I want to set up a proxy server using Apache on a dedicated machine that I rent so that I can get around my ISP's nonsense firewall. I am aware that I could use a VPN, but I don't want to, for reasons that should hopefully become clear after I explain the details of what I want.
First of all, I don't want the proxy server to be used for every request, only for the sites that are blocked by my ISP.
Suppose I try to access blockedsite.com/path/to/resource and it fails. I then simply want to change the URL in the address bar to proxy.myserver.com/proxy/blockedsite.com/path/to/resource and have Apache handle everything to provide me with a seamless experience. That means:
ProxyPassReverse should modify the response headers to point to the proxy server.
All URLs in the response body should be modified to use the proxy.
Here's what I have so far:
<VirtualHost *:80>
ServerName proxy.myserver.com
ProxyRequests off
ProxyPass /proxy/ http://
ProxyPassReverse /proxy/ http://
ProxyPassReverse /proxy/ https://
ProxyHTMLURLMap http:// /proxy/
ProxyHTMLURLMap https:// /proxy/
<Location /proxy/>
ProxyPassReverse /
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|</title>|</title><meta name='referrer' content='no-referrer' />|ni"
ProxyHTMLEnable On
#ProxyHTMLURLMap / /app1/
RequestHeader unset Accept-Encoding
Order allow,deny
Allow from all
</Location>
</VirtualHost>
This setup works beautifully for URLs that don't try to redirect me elsewhere. But if for example I try to access proxy.myserver.com/proxy/facebook.com I am still being redirected on the client side to https://www.facebook.com instead of https://proxy.myserver.com/proxy/www.facebook.com as I would like. The extra weird thing is that when I set up my own test site which does nothing except redirect me to an HTTPS address, the ProxyPassReverse rule for HTTPS does actually seem to work... but not when I try to access sites like Facebook or Google.
I see no reason to ramble on about my issues, what I'm looking for is astoundingly simple: a transparent, seamless experience! Aside from sticking proxy.myserver.com/proxy/ in front of the URL in the address bar, I shouldn't have to do anything else for it to work. Yet that is not the case and despite over a week of searching, I have found nothing online to help me with this. It's as if I'm the only person in the universe to want to create a simple proxy with Apache that actually works as a firewall-get-arounder.
Please can someone lend me a hand here?! Even just to tell me I'm going about this all wrong and should give up and install Squid or something??

Your last paragraph contains the right answer. You should indeed just "install Squid or something". In particular, I'd recommend Apache Traffic Server - http://trafficserver.apache.org/ - this is exactly what it's made for.
While Apache httpd can do proxying, it's not its primary function, and so there are always things that will end up being frustrating with it. We could get your above scenario working, but it's really not the right tool for the job.

Related

Apache Reverse Proxy Sending Browser to Backend Directly Instead

(UPDATE at the bottom for the main question, below may be superfluous details)
I'm having an interesting problem with Apache not reverse proxying as expected.
Basically, what's happening is that when I click a link on my website that goes to the relative path /app1, I expect the URL to be external.company.ca/app1 with content coming from internal.company.com/some_app. Instead, the browser goes directly to internal.company.com/some_app.
No 302 or anything, just straight there. This is odd to me, since internal.company.com is not mentioned anywhere in the configuration except in the reverse proxy config, so I don't know how the browser is learning of the domain at all.
Here is a Fiddler capture from the client (browser) point of view showing the behaviour right after I click the link that goes to /app1 (you'll have to trust me that the green names are external.company.ca and the black names are internal.company.com and the path is /some_app/blahblah):
Everything happening after this point is loading the page with internal.company.com. This won't work at all in production, of course.
The following is a (truncated) version of our Apache configuration files for consideration:
<VirtualHost *:80>
# rewrite rules to 443
</VirtualHost>
<VirtualHost *:443>
ServerName external.company.ca
ServerAlias external.company.com
# Logging rules.........
SSLEngine on
SSLProxyEngine on
SSLProxyVerify none
# Most of this is off for testing purposes, adding in case it matters
SSLProxyCheckPeerCN off
SSLProxyCheckPeerName off
SSLProxyCheckPeerExpire off
# more SSL stuff.... Now on to the interesting part
ProxyPreserveHost On
ProxyPass /app1 https://internal.company.com/some_app
ProxyPassReverse /app1 https://internal.company.com/some_app
</VirtualHost>
At one point, I thought that possibly the cookies were throwing things off since they were under different domains (.ca in front, .com in back), but I believe if the reverse proxying was working correctly, the browser would be none the wiser. Anyone see anything wrong with the above?
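If the cookie domains do turn out to matter, mod_proxy can rewrite the Domain and Path attributes of Set-Cookie headers coming back from the backend. A minimal sketch, assuming the backend scopes its cookies to internal.company.com and /some_app (check the actual Set-Cookie headers in the Fiddler capture before relying on this):

```apache
# Inside the *:443 virtual host, next to the existing ProxyPass lines.
# Rewrite "Domain=internal.company.com" in Set-Cookie to the public domain.
ProxyPassReverseCookieDomain internal.company.com external.company.ca
# Rewrite "Path=/some_app" so the browser sends the cookie back for /app1.
ProxyPassReverseCookiePath /some_app /app1
```

These directives only touch Set-Cookie headers; like ProxyPassReverse, they do nothing about URLs inside the page body.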
UPDATE
I found the culprit:
<script type="text/javascript">window.location.assign('https://internal.company.com/app1/login?redirectUrl=' + encodeURIComponent(window.location.pathname + window.location.hash));</script>
The problem is, how do I rewrite this absolute URL using Apache? I know mod_proxy_html modifies element attributes (such as href in the a element) but can it rewrite arbitrary data in an element itself?
The internal application was provided by a vendor, and although it may be possible to make modifications to it to remove code like the above, I would prefer to stay away from that path for now to see if there are alternatives.
I've come up with a somewhat nasty work-around:
ProxyHTMLEnable On
ProxyHTMLExtended On
ProxyHTMLLinks script src
ProxyHTMLURLMap https://internal.company.com
The problem is the use of absolute URLs throughout the HTML (and JavaScript) coming from the vendor's app. A search and removal of the domain solves the problem (but is incredibly slow).
If anyone has this problem in the future, I do not recommend using this solution. I'm guessing you're here because you can't modify the internal application. You should instead be sending in a ticket to whoever maintains the code to make their application more reverse-proxy friendly.
A potentially safer solution would be the use of mod_substitute. You could also consider ProxyHTMLExtended, but it can be quite brutal in its replacements, occasionally breaking JavaScript here and there.
Edit: Just noticed you're currently using ProxyHTMLExtended. My bad. As you've highlighted it is a pretty brutal and dangerous solution to the problem.
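A minimal mod_substitute sketch for the script above, assuming mod_substitute, mod_filter and mod_headers are loaded and that the only thing to fix is the internal host name appearing in HTML and JavaScript responses (the JavaScript content type your backend actually sends may differ):

```apache
<Location /app1>
    # Ask the backend for an uncompressed body; the filter cannot see into gzip.
    RequestHeader unset Accept-Encoding
    # Run the SUBSTITUTE output filter on HTML and JavaScript responses only.
    AddOutputFilterByType SUBSTITUTE text/html application/javascript
    # "n" makes the pattern a fixed string instead of a regex.
    Substitute "s|https://internal.company.com/|https://external.company.ca/|n"
</Location>
```

This has the same effect as the ProxyHTMLURLMap workaround (the inline script would send the browser to https://external.company.ca/app1/login instead of the internal host), but it works on the raw text, including inline script bodies. It is still a blind search-and-replace, so it is worth limiting it to the content types that actually contain the URL.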

Apache reverse proxy and load balancer - does not work as it should

I have 3 machines.
One (loadbalance.lan) is used as a load balancer, and the other two (172.16.30.5 and 172.16.30.6) are Tomcat servers. Tomcat listens on port 8080.
When I type loadbalance.lan/tomcat in the browser, I can see one of the Tomcat instances' content (the default Tomcat page).
The problem is that the page isn't displayed correctly. There are no images, and when I click any link I get a 404 Not Found error.
Let's say I want to access one of the subpages of the Tomcat website. Tomcat website address: 172.16.30.5:8080
Now I can choose, let's say, the "status" link, which takes me to 172.16.30.5:8080/manager/status (and works fine).
When I access the same page via the reverse proxy server (loadbalance.lan) and click that link, it takes me to loadbalance.lan/manager/status and I get a 404 error.
Of course, when I type loadbalance.lan/tomcat/manager/status in the browser, it displays correctly.
The problem with the images is also weird. When I use the URL loadbalance.lan/tomcat, I can't see images (the Tomcat logo).
When I use loadbalance.lan/tomcat/ (with a slash at the end), it's OK, at least for the images; the links still point to the wrong place.
Here is my loadbalance.lan apache config:
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
<VirtualHost *:80>
ProxyRequests Off
ProxyVia On
ProxyPreserveHost On
<Proxy balancer://cluster>
Order Deny,Allow
Allow from all
</Proxy>
<Proxy balancer://cluster>
BalancerMember http://172.16.30.5:8080
BalancerMember http://172.16.30.6:8080
</Proxy>
<Location /tomcat>
ProxyPass balancer://cluster
ProxyPassReverse balancer://cluster
</Location>
</VirtualHost>
Could someone help me with this?
Obviously there is something wrong with that proxy but I have no idea how to fix that :(
From the ProxyPassReverse documentation:
This directive lets Apache adjust the URL in the Location, Content-Location and URI headers on HTTP redirect responses. This is essential when Apache is used as a reverse proxy (or gateway) to avoid by-passing the reverse proxy because of HTTP redirects on the backend servers which stay behind the reverse proxy.
Only the HTTP response headers specifically mentioned above will be rewritten. Apache will not rewrite other response headers, nor will it rewrite URL references inside HTML pages. This means that if the proxied content contains absolute URL references, they will by-pass the proxy. A third-party module that will look inside the HTML and rewrite URL references is Nick Kew's mod_proxy_html.
So the proxy's job is not to rewrite the HTML content of the pages. If the proxied content does not know that the final URL should contain the /tomcat prefix, and the proxy does not alter the pages, you're stuck.
This is usually something you do not notice, because the 172.16.30.5:8080 part seems to be correctly rewritten to loadbalance.lan. But that rewrite is not made by the proxy: almost certainly the URLs are only relative (<img src="/foo/bar.png">), so the browser resolves them against the proxy's host. Check the source code of the page to see whether the domain name really appears in the URLs.
There are several ways of handling that:
- You could avoid altering the relative URL paths in the proxy (so not using a /tomcat/ prefix, but instead a dedicated virtual host with its own name, like tomcat.loadbalance.lan).
- You could also use dedicated tools, like mod_proxy_html, to rewrite the content of the pages, but that's slow and complex.
- The third way is to manage the final full URL on the application side (here, Tomcat) and detect the proxy chain elements in the X-Forwarded-* headers that mod_proxy adds (X-Forwarded-Host carries the original host name) to rebuild the right domain.
- Some applications provide tools for that, like the VirtualHostMonster in Zope.
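The first option could look roughly like this. A sketch, assuming a DNS entry for the hypothetical name tomcat.loadbalance.lan pointing at the load balancer, with the cluster served from the root of its own virtual host so that root-relative links such as /manager/status resolve unchanged:

```apache
<VirtualHost *:80>
    # Hypothetical host name dedicated to the Tomcat cluster
    ServerName tomcat.loadbalance.lan
    ProxyRequests Off
    ProxyPreserveHost On
    <Proxy balancer://cluster>
        BalancerMember http://172.16.30.5:8080
        BalancerMember http://172.16.30.6:8080
    </Proxy>
    # Map the whole URL space: /manager/status on the proxy is /manager/status on Tomcat
    ProxyPass / balancer://cluster/
    ProxyPassReverse / balancer://cluster/
</VirtualHost>
```

Because the public path and the backend path are now identical, neither ProxyPassReverse nor any HTML rewriting has to invent the /tomcat prefix.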
For Tomcat, the preferred tool is mod_proxy_ajp rather than plain mod_proxy_http, and it also works behind a load balancer, since mod_proxy_balancer accepts ajp:// members. It's been a long time since I set this up, but as I recall, mod_jk was the other common solution.
Read the full documentation on Tomcat proxying for details; at the very least you should get some hints toward the solution.
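For comparison, an AJP variant of the original balancer. A sketch, assuming mod_proxy_ajp and mod_proxy_balancer are loaded and that each Tomcat has an AJP connector on port 8009 (recent Tomcat releases ship with it commented out and expect a shared secret, so check server.xml first):

```apache
<Proxy balancer://cluster>
    # ajp:// members are handled by mod_proxy_ajp instead of mod_proxy_http
    BalancerMember ajp://172.16.30.5:8009
    BalancerMember ajp://172.16.30.6:8009
</Proxy>
# Trailing slashes on both sides keep /tomcat/x mapped to /x on the backend
ProxyPass /tomcat/ balancer://cluster/
ProxyPassReverse /tomcat/ balancer://cluster/
```

Switching protocol does not by itself fix the /tomcat prefix problem: root-relative links in Tomcat's pages will still point outside /tomcat/.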

Apache routes trafic from port 80 to other port

I have an Apache server handling many virtual hosts, and everything works fine. I don't know how it works internally, but it does.
I recently tinkered a bit with Node.js, running experiments on this server on port 8080. Now that I want to go to production, I have set up a domain name pointing to my server, but I want to avoid the ugly example.org:8080/ URL that I have at the moment. How can I tell Apache, which is listening on port 80, to route traffic from example.org to 123.12.12.123:8080 and vice versa, without breaking access to the other virtual hosts?
I have tried mod_rewrite with [L], but specifying the port and domain forces them to appear in the browser's address bar, which is even uglier. I have tried mod_rewrite with [P] and ProxyPass, but with no success (both give a 500 error). What should I try next?
Use the mod_proxy module instead of mod_rewrite.
You need these lines in the virtual host that serves example.org:
ProxyRequests off
ProxyPass / http://123.12.12.123:8080/
ProxyPassReverse / http://123.12.12.123:8080/
That's it.
Oh, and yes, that is ProxyRequests Off, not On.
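A fuller sketch of where those lines go, assuming the Node app keeps listening on 123.12.12.123:8080 and that example.org gets its own name-based virtual host next to the existing ones, so requests for the other host names are untouched:

```apache
<VirtualHost *:80>
    ServerName example.org
    # Hypothetical alias; drop it if www.example.org is not in DNS
    ServerAlias www.example.org
    ProxyRequests Off
    # Let the Node app see the original Host header
    ProxyPreserveHost On
    ProxyPass / http://123.12.12.123:8080/
    ProxyPassReverse / http://123.12.12.123:8080/
</VirtualHost>
```

Both mod_proxy and mod_proxy_http have to be loaded (on Debian/Ubuntu, `a2enmod proxy proxy_http`). A 500 error from a ProxyPass or [P] rule is often a sign that mod_proxy_http is missing, so it is worth checking the error log for that before changing anything else.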

Front-end Proxy does not reference resources correctly

I'm having quite a lot of difficulty running a front-end proxy in front of Play.
This post is also on the Google group; I'll post the suggestions I receive in both places.
I'm using Apache and mod_proxy, and the application is supposed to run at the location "mywebsite.be/dev/app/". It is able to display the HTML.
But when I run the application, all the CSS/JS/images are missing, and all references are incorrect. Looking at the source, it seems that Play did not rewrite the #{/pathname/to/resources} and other relative links to their correct URLs. For everything to work, all URLs should be prefixed with "/dev/app". How can this be done?
I tried experimenting with the ctxPath, but that's not what I need, the application runs fine on its own, but apache2 has issues translating all those urls in the reverse proxy.
Can this be solved? I was thinking of somehow editing the #-operator in the templating system, but that can't be it, right?
greetings,
Jasper
Have you looked at this post? I think it is related.
Can not generate correct URLs for static resources with playframework when using Apache as a Proxy
Also, please keep an eye out for Play 1.2.2, as this intends to solve this problem, according to a post I have read in the Play groups.
One of my teammates came up with the answer. It's quite simple.
In your apache2 configuration, instead of pointing to the localhost root, point to localhost:9000/dev/app:
ProxyPreserveHost On
RedirectMatch ^/dev/app$ /dev/app/
<Location /dev/app/>
AuthType Basic
AuthName "Test Omgeving"
AuthUserFile /var/trac/htpasswd
Require valid-user
ProxyPass http://127.0.0.1:9000/dev/app/
ProxyPassReverse http://127.0.0.1:9000/dev/app/
</Location>
This tricks apache2 into thinking that there is another subdirectory on your localhost server. In fact there isn't one, but now the references resolve correctly, so all traffic reaches the corresponding resources.
Perhaps not the classy way to do things, but it works fine :)
Thanks for all the help. Hope this post helps other people with frontend proxies out there.
Greetings
I would recommend using a proxy balancer, as it will help balance your servers if you plan to run more than one instance of the Play server in the future:
<Proxy balancer://my-balancer>
Order deny,allow
Allow from all
BalancerMember url1:port route=instanceOne
BalancerMember url2:port route=instanceTwo
ProxySet lbmethod=bytraffic
</Proxy>
ProxyPass / balancer://my-balancer/
Now it will pass your traffic to url1:port or url2:port, and it will also fetch your images and other static URLs.
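Combined with the /dev/app fix from the teammate's answer, a balancer setup might look like this. A sketch, assuming two Play instances on the hypothetical ports 9000 and 9001, each serving under /dev/app:

```apache
<Proxy balancer://my-balancer>
    BalancerMember http://127.0.0.1:9000 route=instanceOne
    BalancerMember http://127.0.0.1:9001 route=instanceTwo
    # On Apache 2.4 this needs mod_lbmethod_bytraffic loaded
    ProxySet lbmethod=bytraffic
</Proxy>
# Keep the /dev/app prefix on both sides so Play's generated URLs stay valid
ProxyPass /dev/app/ balancer://my-balancer/dev/app/
ProxyPassReverse /dev/app/ balancer://my-balancer/dev/app/
```

The route= values only matter if you also enable sticky sessions through the stickysession parameter on ProxyPass.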

Apache - Reverse Proxy and HTTP 302 status message

My team is trying to setup an Apache reverse proxy from a customer's site into one of our web applications.
http://www.example.com/app1/some-path maps to http://internal1.example.com/some-path
Inside our application we use Struts and have redirect = true set on certain actions in order to provide certain functionality. The 302 status messages from these redirects cause the user to break out of the proxy, resulting in an error page for the end user.
HTTP/1.1 302 Found
Location: http://internal1.example.com/some-path/redirect
Is there any way to set up the reverse proxy in Apache so that the redirects work correctly, pointing to:
http://www.example.com/app1/some-path/redirect
There is an article titled Running a Reverse Proxy in Apache that seems to address your problem. It even uses the same example.com and /app1 that you have in your example. Go to the "Configuring the Proxy" section for examples on how to use ProxyPassReverse.
The AskApache article is quite helpful, but in practice I found a combination of Rewrite rules and ProxyPassReverse to be more flexible. So in your case I'd do something like this:
<VirtualHost *:80>
ServerName www.example.com
ProxyPassReverse /app1/some-path/ http://internal1.example.com/some-path/
RewriteEngine On
RewriteRule ^/app1/some-path/(.*)$ http://internal1.example.com/some-path/$1 [P]
...
</VirtualHost>
I like this better because it gives you finer-grained control over the paths you're proxying for the internal server. In our case we wanted to expose only part of a third-party application. Note that this doesn't address hard-coded links in HTML, which the AskApache article covers.
Also, note that you can have multiple ProxyPassReverse lines:
ProxyPassReverse / http://internal1.example.com/some-path
ProxyPassReverse / http://internal2.example.com/some-path
I mention this only because another third-party app we were proxying was sending out redirects that didn't include their internal host name, just a different port.
As a final note, keep in mind that Firebug is extremely useful when debugging the redirects.
Basically, ProxyPassReverse should take care of rewriting the Location header for you, as Kevin Hakanson pointed out.
One pitfall I have encountered is missing the trailing slash in the URL argument. Make sure to use:
ProxyPassReverse / http://internal1.example.com/some-path/
(note the trailing slash!)
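Applied to the mapping from the question, the pair of directives might look like this. A sketch, assuming ServerName www.example.com for the public virtual host:

```apache
ProxyPass /app1/ http://internal1.example.com/
# The trailing slashes must match on both arguments
ProxyPassReverse /app1/ http://internal1.example.com/
# Backend sends: Location: http://internal1.example.com/some-path/redirect
# Client should see a Location pointing at /app1/some-path/redirect on www.example.com
```

ProxyPassReverse only rewrites Location values that start with exactly the URL it is given, so a redirect naming a different host or port needs its own ProxyPassReverse line.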
Try using the AJP connector instead of reverse proxy. Certainly not a trivial change, but I've found that a lot of the URL nightmares go away when using AJP instead of reverse proxy.