Reverse proxy + dispatcher - what is the easiest way?

I am looking for a solution that would redirect/proxy the externally facing http://mycompany.com/external/* to http://internal-host:1234/internal/*
(the asterisk is used as a wildcard)
OK, I guess the sentence above is not enough, so here are the details:
In my intranet I have several servers, (names, addresses, ports, and context paths are obviously made-up for the sake of simplicity):
HRServer running at address 10.10.10.10:1010/hr
MailServer running at address 20.20.20.20:2020/mail
My system is accessible from the internet only via the IP 78.78.78.78, and the constraint here is that I can use only one port (e.g. 8080). In other words - whatever the solution to my problem is - the external address should start with 78.78.78.78:8080
What I need to do is to expose both the HR and Mail services through this port.
The first thing which came to my mind was to write two simple portlets (or an HTML with two frames) and to embed them in a simple web page at 78.78.78.78:8080/
But obviously this will not work, as the portlets will redirect the browser to e.g 10.10.10.10:1010/hr which is not visible from the internet.
So my next thought was - OK, let's find a reverse proxy which has dispatching capabilities. Then I can make
78.78.78.78:8080/hr to "redirect" to the internal 10.10.10.10:1010/hr
78.78.78.78:8080/mail to "redirect" to the internal 20.20.20.20:2020/mail
I'd also expect that if, let's say, the mail server's unread messages are shown at 20.20.20.20:2020/mail/unread, they would also be accessible from the internet at 78.78.78.78:8080/mail/unread.
Roughly speaking - I'd expect
78.78.78.78:8080/mail/* to redirect to the internal 20.20.20.20:2020/mail/* (the asterisk is used as a wildcard)
I really feel I am missing the obvious here, but honestly - I've spent quite a while researching several proxies and I did not find the answer. I might be searching for the wrong words or something, but I could not find a reverse proxy which can be configured to dispatch different external paths to different internal paths.
So please - if the answer is e.g. the Apache mod_proxy - please give me a hint about the parameter names that I should be looking for.
Lastly - I am going to run this on FreeBSD, but this is not a strong requirement (other *nix OSes are also fine)
Thanks!

It took quite a while, but here is the answer:
A good solution is nginx (pronounced "Engine X").
To reroute all traffic which comes to
https://mycompany.com/external/* to
http://internal-host:1234/internal/* (the asterisk is used as a wildcard) you need to have the following configuration:
    location ~ ^/external/ {
        # Swap the external prefix for the internal one, then stop rewriting
        rewrite ^/external/(.*)$ /internal/$1 break;
        # No URI part here: nginx forwards the rewritten URI as-is
        proxy_pass http://internal-host:1234;
    }
And this approach can be used for all the other addresses - e.g. HR portal, mail, etc.
Finally, to give you a heads up - the following configuration did not work for me:
    location ~ ^/external/(.*)$ {
        proxy_pass http://internal-host:1234/internal/$1;
    }
In a regex location nginx cannot replace the matched part of the path the way it does for a plain prefix location, so proxy_pass should be given without a URI part and the path changed explicitly, as in the rule above (which does a URL rewrite).
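To make the dispatching idea concrete, here is a minimal Python sketch of the mapping such a proxy performs. It is only an illustration, not how nginx is implemented; the ROUTES table and the resolve_backend name are made up for this example, and the addresses are the made-up ones from the question.

```python
from typing import List, Optional, Tuple

# Hypothetical routing table: (external prefix, internal server, internal prefix).
ROUTES: List[Tuple[str, str, str]] = [
    ("/external", "http://internal-host:1234", "/internal"),
    ("/hr", "http://10.10.10.10:1010", "/hr"),
    ("/mail", "http://20.20.20.20:2020", "/mail"),
]


def resolve_backend(path: str) -> Optional[str]:
    """Map an external request path to the internal URL it should be proxied to."""
    for external_prefix, server, internal_prefix in ROUTES:
        # Match the prefix itself or anything below it, but not e.g. "/hrx".
        if path == external_prefix or path.startswith(external_prefix + "/"):
            remainder = path[len(external_prefix):]
            return server + internal_prefix + remainder
    return None  # no route: a real proxy would answer 404 here
```

For example, resolve_backend("/mail/unread") gives the internal mail URL with the same trailing path, which is the wildcard behaviour asked for in the question.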

Related

Get FQDN from domain

This is my first question here, so I will try my best.
I am trying to get the protocol and the FQDN (fully qualified domain name) from a bunch of domains, i.e. get https://es.aliexpress.com from aliexpress.com.
I have tried Selenium webdriver, but it takes too long to compute all the domains (even with short timeouts and blocking images).
I am asking if someone knows a way to do this without loading the content, something like wget but only for the URL.
Thank you for reading.
Not really...
First of all, http and https have nothing to do with domain names. Those are transfer protocols.
Ignoring that part, what you are calling FQDN are often generated at the time you access them.
For instance, many websites redirect the browser from a desktop site to a mobile version (the typical m.something.com) based on your User Agent string, which means www.something.com and m.something.com are both valid answers.
In the example you gave, aliexpress.com, the site prepended es., which means there is most likely some code on the server that reads either your location (based on IP address) or a locale setting in your browser to direct you to the es version of the website as opposed to the en or dk version.
These changes can be done via an .htaccess file in the root folder of the website, or via back end code.
Google Chrome itself automatically tries to add www. if it looks like you typed a URL into the everything bar.
It's also possible that the URL is one giant redirect. Some websites buy up extra domain names that all redirect to their core site. So even if you input xyz.com you'll end up at abcd.com.
There is no algorithmic way to go from a base URL to what you're calling the FQDN.
P.S. Here is an article about what FQDN means.
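That said, if all you need is wherever one particular request ends up, you can follow the redirects without a browser. A rough sketch using only the Python standard library; the function names origin_of and final_origin are made up, and, for the reasons above, the result depends on your IP address and request headers, so it is one possible answer rather than the answer.

```python
import urllib.request
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a full URL to scheme plus host, e.g. 'https://es.aliexpress.com'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def final_origin(domain: str, timeout: float = 5.0) -> str:
    """Request http://<domain>, let urllib follow redirects, report where it landed.

    The response body is never read, so page content is not processed.
    """
    with urllib.request.urlopen(f"http://{domain}", timeout=timeout) as response:
        return origin_of(response.geturl())
```

Calling final_origin("aliexpress.com") needs network access and may return a different host for different users.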

Secretly do a mod_rewrite on domain name

I'm struggling with a mod_rewrite problem. Basically I need to do a secret redirect on the domain name, going from
http://domainname.com.someotherstuff.com
to
http://domainname.com
This rule should affect all subdirectories as well.
I've understood there are three steps:
tell the system if the path matches what we're looking for
define the RewriteRule
pass the new path to the old one so that the system knows (even if it doesn't show) that the two match
I've looked up several posts and resources (the closest ones being this and this) but none of them can solve both my problems – rewriting and secrecy – at once.
Can anybody point me in the right direction?
Moreover, can someone explain the tradeoff between a hidden redirect and a 301? Hidden redirect is not search engine friendly, correct?
Thanks a lot!
Referring to an older post for clarification on rewrite vs. redirect:
If you want the customer's browser to say http://domainname.com, but fetch the content from http://domainname.com.someotherstuff.com, then what you want is a rewrite. You will point your customer at http://domainname.com and that answering frontend (server/LB/etc...) will then rewrite "domainname.com" to "domainname.com.someotherstuff.com" and send the request on to a backend service that will answer that request. I prefer to SNAT in this case, so the backend responds directly to the frontend, which then returns the content to the customer none the wiser.
You have several moving parts here:
DNS entries for domainname.com and domainname.com.someotherstuff.com
frontend - F5s are my favorites, but you can achieve similar results with any linux server; needs to be able to resolve domainname.com.someotherstuff.com and has network connectivity to the backend; servicing requests for http://domainname.com
backend - web server; servicing requests from frontend for http://domainname.com.someotherstuff.com
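The difference between the two behaviours can be sketched in a few lines of Python. This is a toy model, not mod_rewrite or F5 configuration; the function names and fake_fetch (a stand-in for the real request to the backend) are hypothetical.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)

BACKEND_HOST = "domainname.com.someotherstuff.com"


def visible_redirect(path: str) -> Response:
    """301: the browser is told the new address, requests it, and displays it."""
    return 301, {"Location": f"http://{BACKEND_HOST}{path}"}, ""


def hidden_rewrite(path: str, fetch: Callable[[str, str], str]) -> Response:
    """Rewrite: the frontend fetches from the backend itself and relays the body,
    so the browser keeps showing the address it originally asked for."""
    return 200, {}, fetch(BACKEND_HOST, path)


def fake_fetch(host: str, path: str) -> str:
    # Hypothetical stand-in for the frontend's request to the backend.
    return f"content of {path} served by {host}"
```

With the redirect the client sees a Location header naming the backend host; with the rewrite it only ever sees a normal 200 response.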

multiple domains in server - howto

Let's suppose we have shopify.com, a platform where everybody can create their own e-shop and serve it under their own domain; in other words, the user can add a domain.
When somebody adds a domain, what's actually happening under the hood?
As far as I know, in apache2 a new VirtualHost is created for each new domain, pointing to the user's folder. But is this the best and most efficient solution?
I'm asking mainly out of curiosity, and I'd also like to know how those systems work (like shopify.com or webs.com, where every user adds a domain).
Thank you in advance!!
You have a few options that I know of, mostly depending on whether traffic goes to the same ip or not.
When setting up DNS entries you can specify a wildcard for subdomains: *.example.com makes any request for a subdomain that isn't matched by another DNS record go to the address in the wildcard record.
So, having:
*.example.com <ip A>
blog.example.com <ip B>
Would make blog.example.com go to <ip B> and all other subdomains go to <ip A>. Note that the wildcard does not cover the bare example.com, which needs its own record.
This means you could give each new subdomain its own IP (very unlikely). You can also catch them all at the same IP and handle it there.
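As a rough illustration of how the two records above interact, here is a deliberately simplified lookup in Python. It only handles a single label in front of the wildcard's domain and is not a full model of DNS wildcard rules; RECORDS and lookup are names invented for the sketch, and "ip-A"/"ip-B" stand in for real addresses.

```python
from typing import Dict, Optional

# The two example records; the values are placeholders for real addresses.
RECORDS: Dict[str, str] = {
    "*.example.com": "ip-A",
    "blog.example.com": "ip-B",
}


def lookup(name: str) -> Optional[str]:
    """Exact records win; otherwise fall back to a wildcard on the parent domain."""
    if name in RECORDS:
        return RECORDS[name]
    if "." not in name:
        return None
    parent = name.split(".", 1)[1]  # "shop.example.com" -> "example.com"
    return RECORDS.get("*." + parent)
```

Here lookup("shop.example.com") falls through to the wildcard, lookup("blog.example.com") hits its own record, and lookup("example.com") finds nothing because the wildcard only covers names below example.com.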
As you mentioned, you could add a new virtual host for each new sub domain created. However, that's kind of a heavy solution, and I think it would generally involve restarting your webserver program to reload the new configuration. Instead, you can use something like rewrites to achieve something similar to the virtual host.
Having a rewrite rule that does <subdomain>.example.com/<resource> => example.com/<subdomain>/<resource> would mean all that is necessary is creating a new folder in the root of your served directories containing the user's content. No change to configuration. Also, I'm not sure if you're familiar with rewrites, but they're invisible to the browser/user, so the user still sees <subdomain>.example.com/<resource> even though they're being served content from example.com/<subdomain>/<resource>.
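The mapping such a rewrite rule performs can be sketched in Python as follows. This only shows the string transformation, not actual web server configuration; BASE_DOMAIN and rewrite_subdomain are names invented for the example.

```python
from typing import Tuple

BASE_DOMAIN = "example.com"


def rewrite_subdomain(host: str, path: str) -> Tuple[str, str]:
    """Turn <subdomain>.example.com/<resource> into example.com/<subdomain>/<resource>.

    Hosts that are not subdomains of BASE_DOMAIN are returned unchanged.
    """
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        subdomain = host[: -len(suffix)]
        if subdomain:
            return BASE_DOMAIN, f"/{subdomain}{path}"
    return host, path
```

So a request for alice.example.com/index.html is served from the alice folder under the main site, and adding a user only means adding a folder.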
This isn't a definitive list of the possibilities, simply a couple possibilities. Any large or scalable solution is probably going to involve many layers of indirection allowing for more complex DNS directing, load balancing, and serving of content.

Strange domains in mod_pagespeed cache folder

About a year ago I installed mod_pagespeed on my VPS server, set it up and left it running. Recently I was exploring files on my server, went to the pagespeed cache folder and discovered some strange folders.
Folders are usually named this way: ,2Fwww.mydomain.com or ,2F111.111.111.111 for IP addresses. I was surprised to see some domains that do not belong to me, like:
24x7-allrequestsallowed.com
allrequestsallowed.com
m.odnoklassniki.ru
www.fbi.gov
www.securitylab.ru
It looks like something dodgy is going on. Was my server compromised, or is there a reasonable explanation?
That does look peculiar. Everything in the cache folder should be files that mod_pagespeed tried to rewrite. There are two ways that I know of that this can happen:
1) You reference some third-party resource (say an image from another domain, or google analytics script) and you have explicitly enabled rewriting of that domain with ModPagespeedDomain www.example.com or ModPagespeedDomain *.
2) If your server accepts HTTP requests with invalid Host headers. Try (for example) wget --header="Host: www.fbi.gov" www.yourdomain.com/foo/bar.html. If your server accepts requests like that it may be providing mod_pagespeed with an incorrect base domain, and then subresources would be fetched from the same domain (so if www.yourdomain.com/foo/bar.html references some.jpeg, and your server accepts invalid host headers, we could fetch www.fbi.gov/foo/some.jpeg as the resource). There was a recent security release that makes sure all of these subrequests are done against localhost (not arbitrary third-party websites). Please see: https://developers.google.com/speed/docs/mod_pagespeed/CVE-2012-4001
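The check a server should make for case 2 boils down to comparing the Host header against the names it actually serves. A minimal Python sketch of that idea; ALLOWED_HOSTS and is_valid_host are made-up names, this is not mod_pagespeed or Apache code, and IPv6 literal hosts are not handled.

```python
# Hypothetical list of the names this server is really responsible for.
ALLOWED_HOSTS = {"www.yourdomain.com", "yourdomain.com"}


def is_valid_host(host_header: str) -> bool:
    """Return True only if the Host header names one of our own domains."""
    host = host_header.strip().lower()
    # Drop an optional ":port" suffix (IPv6 literals are out of scope here).
    host = host.split(":", 1)[0]
    return host in ALLOWED_HOSTS
```

A request carrying "Host: www.fbi.gov" would fail this check and could be rejected before any rewriting happens.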
You might want to look through these folders and see what specific resources are in there. I think that the biggest concern you should have is that someone might be trying to perform an XSS attack on your users or maybe a DDoS attack against another website (like www.fbi.gov), using your server as one vector. I do not think that these folders are indicative that your server itself is compromised.
If you would like to discuss this more, https://groups.google.com/forum/?fromgroups#!forum/mod-pagespeed-discuss is a good list to join and email.

301 web forwarding on main domain - Can subdomain point somewhere else?

Excuse the potential noobishness of the question, but I'm, well, a bit of a noob when it comes to this domain architecture lark. If "domain architecture" is even the technical term. Anyway, I digress...
So, I've googled this question, but I can't see the answer I'm looking for (maybe it doesn't exist, who knows!?). The situation is that I host a .com top-level domain which does a 301 forward to another site on the net not hosted by me. Can I set up a subdomain that then points somewhere else, whether that be on my host itself or just some other site elsewhere on the net?
Essentially, if I set up a subdomain, will it too inherit the web forwarding, and if so, can I directly affect where that subdomain points?
Any answers gratefully appreciated!
Before I try to answer your question, let me be a little fussy :)
First things first: you are confusing and mixing together two different protocols ([DNS] and [HTTP]). There is even a dedicated Wikipedia page for HTTP 301 responses: http://en.wikipedia.org/wiki/HTTP_301 (but you should read the whole shebang: [Wikipedia, search for HTTP] is always a good start, and [RFC 2616] is absolutely a must; IETF RFCs are not easy reading but the Internet is built on them).
DNS is used to translate a name, like www.example.com, into an IP address, like 192.168.0.1, in order to locate a machine on the Internet. So DNS is involved as one of the very first steps a browser takes in order to resolve a URL: but once the "machine name" has been translated by the separate DNS service into an IP address, the DNS job is over and it is involved no more.
Then the browser, using HTTP, contacts the web server located on that machine (in this example www.example.com, which the DNS service has kindly translated to 192.168.0.1, because the operating system can only use an IP address as the argument for an [internet socket]). Only at that moment does the web server, instead of serving a page, answer with an "error" code (which is actually a numeric status code in the response that does not start with "2").
Only that this error code is actually used to tell something else: that the browser should try again an "HTTP request", this time connecting to another machine (and, as long as this redirection is "permanent" instead of "temporary" ([HTTP_307]), the new address should be remembered by the browser, its cache and history).
So, if you can setup [redirection response header] on the first machine, it means that there is a Web Server on that first machine that is programmed (given a certain URL pattern) to spit out a Redirection Header, and as long as you can control these redirections, you can as well send the browser wherever you want, not merely sending them to another machine on the Internet but to another URL as well, even on the same website (actually this is the original intended use of code 301, as a measure against [link rot]).
Basically you are free to do whatever you want, or better, to send them wherever you want.
The pros are obvious... the cons are that you must have control over the first web server, and that visiting browsers will have to perform two GET requests in order to land at the intended page (this is not as grim as it looks, since [RFC 2616] suggests that the browser (they call it the User Agent) caches and remembers the redirection, because it is permanent).
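To show that the redirect is decided per host name by the web server and is not inherited through DNS, here is a small Python sketch. The REDIRECTS table, the host names and the respond function are all hypothetical; a real setup would do the same thing in the web server's configuration.

```python
from typing import Dict, Tuple

# Hypothetical setup: only the main domain is forwarded elsewhere.
REDIRECTS: Dict[str, str] = {
    "example.com": "http://elsewhere.example",
}


def respond(host: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request, based on the Host it was sent to."""
    target = REDIRECTS.get(host)
    if target is not None:
        # Permanent redirect: the browser repeats the request at the new URL.
        return 301, {"Location": target + path}
    # Any other host (e.g. a subdomain) is served normally by this server.
    return 200, {}
```

A request for example.com gets the 301, while sub.example.com is untouched by it and can point wherever you like.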
Disclaimer: I am being prevented from posting hyperlinks, but they were basically all from Wikipedia, so, if you will, you can look up the words in brackets "[...]" on Wikipedia...