TL;DR
I want to set up a local HTTPS proxy that can (LOCALLY) modify the content of HTML pages on my machine. Is this possible?
Motivation
I have used an HTTP Proxy called GlimmerBlocker for years. It started in 2008 as a proxy-based approach to blocking ads (as opposed to browser extensions or other OS X-specific hacks like InputManagers). But besides blocking ads, it also allows the user to inject their own CSS or JavaScript into the page. Development has seriously slowed, but it remains incredibly useful.
The only problem is that it doesn’t do HTTPS (from its FAQ):
Ads on https pages are not blocked
When Safari fetches an https page using a proxy, it doesn't really use the http protocol, but makes a tunneled tcp connection so Safari receives the encrypted bytes. The advantage is that any intermediate proxies can't modify or read the contents of the page, nor the URL. The disadvantage is, that GlimmerBlocker can't modify the content. Even if GlimmerBlocker tried to work as a middleman and decoded/encoded the content, it would have no means of telling Safari to trust it, nor to tell Safari if the websites certificate is valid, so Safari would think you have visited a dubious website.
Fortunately, most ad-providers are not going to switch to https as serving pages using https are much slower and would have a huge processing overhead on the ad-providers servers.
Back in 2008, maybe that last part was true…but not any more.
To be clear, I think the increasing use of SSL is a good thing. I just want to get back the control I had over the content after it arrives on my end.
Points of Confusion
While searching for a solution, I’ve become confused by some apparently contradictory points.
(Also, although I’m quite experienced with the languages of web pages, I’ve always had a difficult time grokking networks and protocols. On that note, sorry if I’m missing something that is way obvious!)
I found this StackOverflow question asking whether HTTPS proxies were possible. The best answer says that “TLS/SSL (The S in HTTPS) guarantees that there are no eavesdroppers between you and the server you are contacting, i.e. no proxies.” (The same answer then described a hack to pull it off, but I don’t understand the instructions. It was very theoretical, anyway.)
In OS X under Network Preferences ▶︎ Advanced… ▶︎ Proxies, there is clearly a setting for an HTTPS proxy. This seems to contradict the previous statement that TLS/SSL’s guarantee against eavesdropping implies the impossibility of proxies.
Other things of note
I can’t remember where, but I read that it is possible to set up an HTTPS proxy, but that it makes HTTPS pointless (by breaking the secure communication in the process). I don’t want this! Encryption is good. I don’t want to filter anyone else’s traffic; I just want something to customize the content after I’ve already received it.
GlimmerBlocker has a nice GUI, but I’m fine with non-GUI solutions, too. I may have a poor understanding of networking and protocols, but I’m perfectly comfortable on the command line, tweaking settings in text editors, and so on.
Is what I’m asking possible? Or is my question a case of “either you get security, or you can break it with hacks and get to customize your content—but not both”?
The common idea of an HTTP proxy for HTTPS traffic is a server which accepts a CONNECT request containing the target hostname and port, and then just builds a tunnel to the target server. All of the HTTPS traffic flows inside the tunnel, so the proxy has no way to modify it (end-to-end security from browser to web server).
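To make the tunnel idea concrete, here is a minimal Python sketch of the only thing such a proxy gets to read: the CONNECT request line. The function name parse_connect is made up for illustration, and a real proxy would also read the request headers and then relay bytes in both directions.

```python
from typing import Tuple


def parse_connect(request_line: str) -> Tuple[str, int]:
    """Split a CONNECT request line into (host, port).

    Hypothetical helper: this is all a tunneling proxy learns about
    the connection. Everything sent afterwards is encrypted bytes.
    """
    parts = request_line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line")
    method, target, _version = parts
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request: " + method)
    # The target has the form host:port; split on the last colon.
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    # Drop the brackets around an IPv6 literal such as [::1].
    return host.strip("[]"), int(port)


print(parse_connect("CONNECT example.com:443 HTTP/1.1"))
```

Note that the proxy sees the hostname and port but never the URL path, headers or body, which is why a plain tunneling proxy cannot filter page content.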
To modify the data, you need a proxy which plays man-in-the-middle. In this case there is one HTTPS connection between the proxy and the web server and another HTTPS connection between the browser and the proxy. Between proxy and web server the original server certificate is used, while between browser and proxy a newly created certificate is used, which is signed by a CA specific to the proxy. Of course, this CA must be imported as trusted into the browser; otherwise it would complain all the time about possible attacks.
Of course, all the verification of the original server certificate now has to be done in the proxy, and not all solutions do this correctly. See also http://www.secureworks.com/cyber-threat-intelligence/threats/transitive-trust/
There are several proxy solutions which can do this SSL interception, such as squid, mitmproxy (Python) or App::HTTP_Proxy_IMP (Perl). The last two are specifically designed to let you modify the content with your own code, so these might be good places to start.
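Once an intercepting proxy hands you the decrypted response body, the modification itself is ordinary string processing. The following Python sketch shows one way to inject custom CSS; inject_css is a hypothetical helper, and how it gets wired into a particular proxy depends on that proxy's own API, which is not shown here.

```python
import re


def inject_css(html: str, css: str) -> str:
    """Insert a <style> block just before the first </head> tag.

    Hypothetical helper. If the page has no </head>, the style
    block is appended at the end instead.
    """
    style = "<style>" + css + "</style>"
    # Case-insensitive search, since HTML tag names are not case-sensitive.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return html + style
    return html[:match.start()] + style + html[match.start():]


page = "<html><head><title>t</title></head><body>hi</body></html>"
print(inject_css(page, ".ad { display: none; }"))
```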
Related
I have a medium sized website called algebra.com. As of today, it is ranked 900th website in US in Quantcast ratings.
At the peak of its usage, during weekday evenings, it serves over 120-150 queries for objects per second. Almost all objects, INCLUDING IMAGES, are dynamically generated.
It has 7.5 million page views per month.
It is served by Apache2 on Ubuntu and is supplemented by the Perlbal reverse proxy, which helps reduce the number of Apache slots/child processes in use.
I spent an inordinate amount of time working on performance for HTTP and the result is a fairly well functioning website.
Now that the times call for transition to HTTPS (fully justified here, as I have logons and registered users), I want to make sure that I do not end up with a disaster.
I am afraid, however, that I may end up with a performance nightmare, as HTTPS sessions last longer and I am not sure whether a reverse proxy can help as much as it did with HTTP.
Secondly, I want to make sure that I will have enough CPU capacity to handle HTTPS traffic.
Again, this is not a small website with a few hits per second, we are talking 100+ hits per second.
Additionally, I run multiple sites on one server.
For example, can I have a reverse proxy, that supports several virtual domains on one IP (SNI), and translates HTTPS traffic into HTTP, so that I do not have to encrypt twice (once by apache for the proxy, and once by the proxy for the client browser)?
What is the "best practices approach" to have multiple websites, some large, served by a mix of HTTP and HTTPS?
Maybe I can continue running perlbal on port 80, and run nginx on port 443? Can nginx be configured as a reverse proxy for multiple HTTPS sites?
You really need to load test this, and no one can give a definitive answer other than that.
I would offer the following pieces of advice though:
First up, Stack Overflow is really for programming questions. This question probably belongs on the sister site www.serverfault.com.
HTTPS processing is, IMHO, not an issue for modern hardware unless you are encrypting large volumes of traffic (e.g. video streaming), especially with proper caching and the other performance tuning that I presume you've already done from what you say in your question. However, I have not dealt with a site with your level of traffic, so it could become an issue there.
There will be a small hit to clients as they negotiate the HTTPS session on initial connection. This is on the order of a few hundred milliseconds, will only happen on the initial connection for each session, and is unlikely to be noticed by most people, but it is there.
There are several things you can do to optimise HTTPS, including choosing fast ciphers and implementing session resumption (there are two methods for this, and it can get complicated on load-balanced sites). SSL Labs runs an excellent HTTPS tester to check your setup, Mozilla has some great documentation and advice, or you could check out my own blog post on this.
As to whether you terminate HTTPS at your end point (proxy/load balancer), that's very much up to you. Yes, there will be a performance hit if you re-encrypt to HTTPS again to connect to your actual server. Most proxy servers also allow you to just pass the HTTPS traffic through to your main server so you only decrypt once, but then you lose the original IP address from your web server logs, which can be useful. It also depends on whether you access your web server directly at all. For example, at my company we don't go through the load balancer for internal traffic, so we enable HTTPS on the web server as well and make the load balancer re-encrypt to connect to it, so we can view the site over HTTPS.
Other things to be aware of:
You could see an SEO hit during migration. Make sure you redirect all traffic, tell Google Search Console your preferred site (http or https), update your sitemap and all links (or make them relative).
You need to be aware of insecure content issues. All resources (e.g. CSS, JavaScript and images) need to be served over HTTPS, or browsers will show warnings and may refuse to load those resources. HSTS can help with links on your own domain for those browsers that support HSTS, and CSP can also help (either to report on them or to automatically upgrade them, for browsers that support upgrade-insecure-requests).
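Before switching, it can help to scan your generated pages for resources that would trigger these warnings. Here is a rough Python sketch using only the standard library. The RESOURCE_ATTRS table is an illustrative subset of the tags that load subresources, and the names are made up for this example.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags that load a subresource, and the attribute holding its URL.
# Illustrative subset only; real pages have more (e.g. CSS url(...)).
RESOURCE_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}


class InsecureResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for subresources loaded over plain http."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a href=...> is a link, not a subresource
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html: str) -> List[Tuple[str, str]]:
    finder = InsecureResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


sample = '<img src="http://a.example/i.png"><img src="https://a.example/j.png">'
print(find_insecure_resources(sample))
```

Protocol-relative and relative URLs are deliberately not flagged, since they follow the scheme of the page itself.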
Moving to HTTPS-only does take a bit of effort, but it's a one-off and after that it makes your site much easier to manage than trying to maintain two versions of the same site. The web is moving to HTTPS more and more, and if you have (or are planning to have) logged-in areas then you have no choice, as you should 100% not use HTTP for this. Google gives a slight ranking boost to HTTPS sites (though it's apparently quite small, so it shouldn't be your main reason to move), and has even talked about actively showing HTTP sites as insecure. Better to be ahead of the curve IMHO and make the move now.
Hope that's useful.
I've created a WCF service and I share it via SSL. I have little knowledge about security, but I'm curious why I can see the whole communication as plain text in HTTP Analyzer, even though the POSTs are sent via HTTPS?
When my client application invokes the WCF service, I can see it in the sniffer: passwords, etc.
Does it mean that SSL works only on the lower layer, while transporting data? So any evil application can sniff communication on the client's side, and encryption only secures us against man-in-the-middle?
SSL works indeed on a "lower layer" than HTTP. According to the OSI Model, SSL works on the Session Layer, while HTTP is on the Application Layer.
Most of these clientside HTTP Analyzers work from within the browser, analyzing the HTTP traffic on the application layer, before it is processed by the SSL logic. So it is completely normal to see the plain HTTP request.
Concerning security, an evil application installed within the browser can indeed read the traffic. But once it has been processed by the SSL layer, it becomes much harder for an evil application to read the traffic.
SSL works by first authenticating the server to you as a client. (Am I talking to the one I really want to talk to?) As you can't know all of the servers and their certificates beforehand, you use some well-known root certificates, which are pre-installed on your OS. These are used to check whether a server is vouched for by an already well-known authority. (I don't know you, but some really important server tells me that you indeed are who you say you are.)
This authentication step works independently of the encryption of the traffic. No program can decrypt an arbitrary SSL stream just by "installing a root certificate". (As said, these root certificates are already on your machine from the first moment you install an OS on it =)
But if an evil program is able to make you believe that you are talking to a legitimate server when you are actually talking to malware, for example by using a forged root certificate, it is able to see the contents of the SSL traffic. But then again, you are talking to the evil program itself, not the server you intended to talk to. This is, however, not the case with HTTP Analyzer.
This is in short terms how SSL works and hopefully answers your question.
Most likely HTTP Analyzer installs its own root certificate and intercepts SSL traffic, working as a man-in-the-middle.
I'm running a program (Mathematica) in a VMWare VPC behind a corporate internet proxy. Various programs installed in that VPC like IE, Chrome, Excel, Word, Acrobat Reader, and even MS Paint get data from the Internet without problems, but Mathematica doesn't seem to handle the proxy correctly.
My guess is it's not able to handle the proxy's NTLM authentication.
In an earlier situation, behind a different firewall, I had some success with CNTLM as an intermediary between Mathematica and the proxy. CNTLM talks to the proxy and takes care of the NTLM authentication, and Mathematica is given the IP address (localhost) and port that CNTLM listens on. However, in that earlier case I knew the credentials to be used for the proxy (i.e., my own).
In the current situation, my logon takes place using a smartcard and a PIN. The VPC gets credentials passed transparently (I don't have to enter them) and apparently all the programs I mentioned above automagically know about them. This makes me think Mathematica or CNTLM should be able to do this as well. However, my PIN used as password doesn't work (in fact, I get locked out if I try too often). I assume that the credentials used are in fact not my own but are either the windows password (that I don't have as smartcard user) or are derived from my PIN and smartcard.
My question is: how can I make this setup work? This may involve CNTLM, but other solutions are welcome as well.
You may have a chance using a local proxy such as Fiddler.
Like CNTLM, Fiddler acts as a local proxy and allows applications that support a “plain” proxy but not NTLM to reach the corporate proxy through a local proxy instead of directly.
Unlike CNTLM, Fiddler doesn't require you to configure credentials; it uses the current user's credentials to authenticate the web requests.
I can't be sure that this is the solution for you, since I don't have an environment like yours, but this workaround works in some other cases, as reported in this answer about a Ruby gem and in this blog post about Mercurial, so I hope it can work with Mathematica too.
Note: Once you run Fiddler, it automatically configures the browser proxy to point to itself (http://localhost:8888), so you can leave the proxy settings of your application at "Use Proxy Settings from My System or Browser". By the way, Fiddler is not only a local proxy; it can also be used for troubleshooting and debugging. The feature list is available here.
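For comparison, this is roughly what "support for a plain proxy" looks like from the application's side. The Python sketch below points a standard-library HTTP client at the local proxy address mentioned above. It only illustrates the idea and says nothing about how Mathematica itself is configured; opener_via_local_proxy is a made-up name.

```python
import urllib.request


def opener_via_local_proxy(
    proxy_url: str = "http://localhost:8888",
) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests via a local proxy.

    The default address is the Fiddler address quoted above; for a
    different local proxy, pass whatever address it listens on.
    """
    proxies = {"http": proxy_url, "https": proxy_url}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


opener = opener_via_local_proxy()
# opener.open("http://example.com/") would now go through the local proxy,
# which in turn handles the NTLM authentication with the corporate proxy.
```

The application only speaks plain proxy HTTP to localhost, and the local proxy is the one that authenticates upstream.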
A friend of mine asked me this, and I had not much of an idea about it.
So, here I am asking you:
A custom application that works like a proxy server (not the complete version of one), i.e. the app allows you to specify which websites the users of the network can visit in their client browsers.
I have some idea that this functionality is built into a proxy server and that we can use Apache as a proxy server, but I don't know how to do it.
Can I develop such an application in Java, Ruby, or .NET? That would allow me to use a database to maintain the list of allowed and blocked websites, and to provide an easier UI to add or modify data.
Help me, I am quite confused.
Any proxy server has this functionality. For example, using squid, you can set it to deny all requests by default and only allow specific sites. However, if that's the only goal, you may want to consider denying requests to port 80 and only allowing specific IP ranges in your firewall instead.
Both options work, though. The firewall option is faster and cannot be bypassed by the browser, but it is less dynamic (DNS resolving only happens on rule start/reload) and may allow more sites than intended if one IP hosts more than one site.
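If you do end up writing the check yourself, the core of a default-deny allowlist is small. Here is a minimal Python sketch; is_allowed is a hypothetical name, and the set of domains would come from your database.

```python
from typing import Set


def is_allowed(host: str, allowed_domains: Set[str]) -> bool:
    """Return True if host is an allowed domain or a subdomain of one.

    Default deny: anything not matched is rejected.
    """
    host = host.lower().rstrip(".")
    labels = host.split(".")
    # Check the host itself, then each parent domain in turn:
    # www.example.com -> example.com -> com
    for i in range(len(labels)):
        if ".".join(labels[i:]) in allowed_domains:
            return True
    return False


allowed = {"example.com", "wikipedia.org"}
print(is_allowed("en.wikipedia.org", allowed))
print(is_allowed("example.com.evil.net", allowed))
```

Matching on whole labels rather than on a substring or suffix is what keeps a host like example.com.evil.net from slipping through.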
You should probably ask your friend what his/her non-technical goals are. Like "I don't want my daughter to surf porn" rather than "I need an app that blocks sites".
I have an HTTP server which is in our internal network and accessible only from inside it. I would like to put up another server that would listen on an HTTPS port accessible from outside and forward the requests to that HTTP server (and send back the responses via HTTPS). I know that there are several ways to do this with some programming involved (and I myself made a temporary solution with Tomcat and a very simple servlet I wrote), but is there a way to do the same just by plugging together ready-made parts (like Apache + modules)?
This is the sort of use-case that stunnel is designed for. There is a specific example of using stunnel to wrap an HTTP server.
You should consider whether this is really a good idea, though. Web applications designed for use inside a corporate firewall are often fairly lax about security. Merely encrypting the connections prevents casual eavesdropping, but does not secure the site. If an attacker finds your outward facing server and starts connecting to it, they can still try to find exploitable flaws in the web service (SQL injection, cross-site scripting, etc).
With Apache, look into mod_proxy.
Apache 2.2 mod_proxy docs
Apache 2.0 mod_proxy docs