Medium sized website: Transition to HTTPS, Apache and reverse proxy

I have a medium sized website called algebra.com. As of today, it is ranked the 900th website in the US in Quantcast ratings.
At the peak of its usage, during weekday evenings, it serves 120-150 requests for objects per second. Almost all objects, INCLUDING IMAGES, are dynamically generated.
It has 7.5 million page views per month.
It is served by Apache2 on Ubuntu and is supplemented by a Perlbal reverse proxy, which helps reduce the number of Apache slots/child processes in use.
I spent an inordinate amount of time working on performance for HTTP and the result is a fairly well functioning website.
Now that the times call for transition to HTTPS (fully justified here, as I have logons and registered users), I want to make sure that I do not end up with a disaster.
I am afraid, however, that I may end up with a performance nightmare, as HTTPS sessions last longer and I am not sure whether a reverse proxy can help as much as it did with HTTP.
Secondly, I want to make sure that I will have enough CPU capacity to handle HTTPS traffic.
Again, this is not a small website with a few hits per second, we are talking 100+ hits per second.
Additionally, I run multiple sites on one server.
For example, can I have a reverse proxy, that supports several virtual domains on one IP (SNI), and translates HTTPS traffic into HTTP, so that I do not have to encrypt twice (once by apache for the proxy, and once by the proxy for the client browser)?
What is the "best practices approach" to have multiple websites, some large, served by a mix of HTTP and HTTPS?
Maybe I can continue running perlbal on port 80, and run nginx on port 443? Can nginx be configured as a reverse proxy for multiple HTTPS sites?

You really need to load test this, and no one can give a definitive answer other than that.
I would offer the following pieces of advice though:
First up, Stack Overflow is really for programming questions. This question probably belongs on the sister site www.serverfault.com.
HTTPS processing is, IMHO, not an issue for modern hardware unless you are encrypting large volumes of traffic (e.g. video streaming), especially with proper caching and the other performance tuning that I presume you've already done from what you say in your question. However, I have not dealt with a site with your level of traffic, so it could become an issue there.
There will be a small hit to clients as they negotiate the HTTPS session on initial connection. This is on the order of a few hundred milliseconds, will only happen on the initial connection for each session, and is unlikely to be noticed by most people, but it is there.
There are several things you can do to optimise HTTPS, including choosing fast ciphers and implementing session resumption (there are two methods for this, session IDs and session tickets, and this can get complicated on load balanced sites). SSL Labs runs an excellent HTTPS tester to check your set up, Mozilla has some great documentation and advice, or you could check out my own blog post on this.
As to whether you terminate HTTPS at your end point (proxy/load balancer), that's very much up to you. Yes, there will be a performance hit if you re-encrypt to HTTPS again to connect to your actual server. Most proxy servers also allow you to just pass the HTTPS traffic through to your main server so you only decrypt once, but then you lose the original IP address from your web server logs, which can be useful. It also depends on whether you access your web server directly at all. For example, at my company we don't go through the load balancer for internal traffic, so we do enable HTTPS on the web server as well and make the load balancer re-encrypt to connect to that, so we can view the site over HTTPS.
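As a rough illustration of terminating HTTPS at the proxy: the proxy decrypts once, picks a backend by hostname, and forwards plain HTTP while passing the client's address along in a header so the web server logs stay useful. Here is a minimal Python sketch of that routing step only. The hostnames, backend addresses and the function name are made up for illustration; X-Forwarded-For and X-Forwarded-Proto are the conventional header names, but your backend has to be configured to trust them.

```python
# Hypothetical routing table: one proxy IP, several HTTPS sites (chosen via SNI/Host).
BACKENDS = {
    "www.example.com": ("127.0.0.1", 8080),
    "shop.example.com": ("127.0.0.1", 8081),
}


def plan_forward(host, client_ip, headers):
    """Return (backend, headers) for a request the proxy has already decrypted."""
    name = host.split(":")[0].lower()  # drop any port, ignore case
    backend = BACKENDS.get(name)
    if backend is None:
        raise LookupError("no backend configured for " + name)
    forwarded = dict(headers)
    # Append to an existing chain if another proxy sits in front of this one.
    prior = forwarded.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = prior + ", " + client_ip if prior else client_ip
    forwarded["X-Forwarded-Proto"] = "https"  # the browser-facing leg was HTTPS
    return backend, forwarded
```

With this layout the backend never does any encryption itself; whether that is acceptable depends on how much you trust the network between the proxy and the web server.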
Other things to be aware of:
You could see an SEO hit during migration. Make sure you redirect all traffic, tell Google Search Console your preferred site (http or https), update your sitemap and all links (or make them relative).
You need to be aware of insecure content issues. All resources (e.g. CSS, JavaScript and images) need to be served over HTTPS, or browsers will show warnings and refuse to use those resources. HSTS can help with links on your own domain for those browsers that support HSTS, and CSP can also help (either to report on them or to automatically upgrade them, for browsers that support upgrade-insecure-requests).
Moving to HTTPS-only does take a bit of effort, but it's a one-off, and after that it makes your site so much easier to manage than trying to maintain two versions of the same site. The web is moving to HTTPS more and more, and if you have (or are planning to have) logged-in areas then you have no choice, as you should 100% not use HTTP for this. Google gives a slight ranking boost to HTTPS sites (though it's apparently quite small, so it shouldn't be your main reason to move), and has even talked about actively showing HTTP sites as insecure. Better to be ahead of the curve IMHO and make the move now.
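The redirect and the HSTS/CSP headers mentioned above are normally set in the web server or proxy configuration. Purely to show what they amount to, here is a small Python sketch; the function names and the one-year max-age are my own assumptions, not values from any particular setup.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url):
    """Return the https:// equivalent of an http:// URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    host = parts.hostname or ""
    # Drop the old port (e.g. :80); this sketch assumes HTTPS on the default 443.
    return urlunsplit(("https", host, parts.path or "/", parts.query, parts.fragment))


def security_headers(hsts_max_age=31536000):
    """Headers to send on HTTPS responses (the max-age default is an assumed one year)."""
    return {
        # Tells supporting browsers to use HTTPS for this host from now on.
        "Strict-Transport-Security": "max-age=" + str(hsts_max_age),
        # Asks supporting browsers to fetch http:// subresources over https:// instead.
        "Content-Security-Policy": "upgrade-insecure-requests",
    }
```

A permanent (301) redirect to the URL returned by `https_redirect_target` is the usual choice for the migration. It is safer to start with a short HSTS max-age and raise it once everything works, since browsers remember the value.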
Hope that's useful.

Related

Apache Server Timing Out taking long time

I'm in trouble and need help figuring out a problem. I've run my website on my Apache server for quite some time now and recently ran into an issue that has me stumped.
My server has been DDOS attacked in the past requiring me to move my server behind a proxy/WAF. For some time I was behind Sucuri as it provided the best affordable defense at the time. The attacks tapered off and I moved to Cloudflare free to protect my IP address while lightening up on my monthly server costs. The switch was smooth and everything has been working fine for several months.
I was recently hit again with what seemed to be a layer 7 attack. I could see several IP addresses making 10-20 requests every couple of seconds in my domain's access.log. Running netstat returned thousands of TIME_WAIT and SYN_RECV all with Cloudflare IP addresses. This led me to believe the attack was against my domain, being proxied by Cloudflare, and reaching my server regardless of my security settings. I confirmed this by viewing the statistics provided by Cloudflare and seeing millions of requests being made in a short time period. Unfortunately this is making it even more difficult to pinpoint the attack. What should I do?
I've enabled SYN cookies, added mod_cloudflare to Apache, activated Cloudflare's WAF / rate limiting rules, blocked offending IP addresses, and used mod_evasive to automatically blacklist future offenders. This has reduced (and almost stopped) the malicious requests seen in the Apache access log but has not resolved the timeouts.
According to Cloudflare analytics, I've only received 16,000 requests in the previous 6 hours (as opposed to the tens of millions when I was being actively attacked) but I get timeouts on every other request (even directly connecting, without Cloudflare).
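For anyone wanting to track the TIME_WAIT / SYN_RECV counts mentioned above over time, a tally along these lines may help. It is only a sketch: it assumes the six-column layout that `netstat -ant` prints on Linux, and the sample rows are invented addresses, not real data.

```python
from collections import Counter

# Invented sample in the assumed `netstat -ant` layout (documentation-range addresses).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 192.0.2.10:443 198.51.100.4:51234 TIME_WAIT
tcp 0 0 192.0.2.10:443 198.51.100.5:40000 SYN_RECV
tcp 0 0 192.0.2.10:443 198.51.100.6:40001 TIME_WAIT
"""


def count_tcp_states(netstat_output):
    """Tally the state column of `netstat -ant`-style lines."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: proto recv-q send-q local remote state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


print(count_tcp_states(SAMPLE).most_common())
```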
Thanks

Boost proxy server security and defend against DoS attacks by blocking unsolicited packets or by using load balancers, as these actions could help reduce the impact the attack has on the server.
There are also attacks that use a proxy server on the Internet as a transit device to hide the originating source of the attack on your network. Blocking open or malicious proxy servers from accessing the network or servers is one way to prevent this type of attack from being successful.
I hope this helps.

I think you should ask your web host or Cloudflare support, and also raise a ticket with Sucuri. Their team works closely with the respective developers on fixing security issues. Once fixed, Sucuri patches those vulnerabilities at the firewall level.
During attacks, a website with heavy traffic like yours would slow down significantly due to the high server load. Sometimes it would even cause the server to restart, causing downtime.
When you enable Sucuri, all your site traffic goes through their CloudProxy firewall before reaching your hosting server. This allows them to block the attacks and send you only legitimate visitors.
Sucuri’s firewall blocks the attacks before they even touch your server. Since they’re one of the leading security companies, Sucuri proactively researches and reports potential security issues to the WordPress core team as well as third-party plugins.
If that still does not resolve the problem, then it may be a different type of attack:
TCP Connection Attacks
These attempt to use up all the available connections to infrastructure devices such as load-balancers, firewalls and application servers. Even devices capable of maintaining state on millions of connections can be taken down by these attacks.
Volumetric Attacks
These attempt to consume the bandwidth either within the target network/service, or between the target network/service and the rest of the Internet. These attacks are simply about causing congestion.
Fragmentation Attacks
These send a flood of TCP or UDP fragments to a victim, overwhelming the victim's ability to re-assemble the streams and severely reducing performance.
Application Attacks
These attempt to overwhelm a specific aspect of an application or service and can be effective even with very few attacking machines generating a low traffic rate (making them difficult to detect and mitigate).

Local HTTPS proxy possible?

TL;DR
I want to set up a local HTTPS proxy that can (LOCALLY) modify the content of HTML pages on my machine. Is this possible?
Motivation
I have used an HTTP Proxy called GlimmerBlocker for years. It started in 2008 as a proxy-based approach to blocking ads (as opposed to browser extensions or other OS X-specific hacks like InputManagers). But besides blocking ads, it also allows the user to inject their own CSS or JavaScript into the page. Development has seriously slowed, but it remains incredibly useful.
The only problem is that it doesn’t do HTTPS (from its FAQ):
Ads on https pages are not blocked
When Safari fetches an https page using a proxy, it doesn't really use the http protocol, but makes a tunneled tcp connection so Safari receives the encrypted bytes. The advantage is that any intermediate proxies can't modify or read the contents of the page, nor the URL. The disadvantage is, that GlimmerBlocker can't modify the content. Even if GlimmerBlocker tried to work as a middleman and decoded/encoded the content, it would have no means of telling Safari to trust it, nor to tell Safari if the websites certificate is valid, so Safari would think you have visited a dubious website.
Fortunately, most ad-providers are not going to switch to https as serving pages using https are much slower and would have a huge processing overhead on the ad-providers servers.
Back in 2008, maybe that last part was true…but not any more.
To be clear, I think the increasing use of SSL is a good thing. I just want to get back the control I had over the content after it arrives on my end.
Points of Confusion
While searching for a solution, I’ve become confused by some apparently contradictory points.
(Also, although I’m quite experienced with the languages of web pages, I’ve always had a difficult time grokking networks and protocols. On that note, sorry if I’m missing something that is way obvious!)
I found this StackOverflow question asking whether HTTPS proxies were possible. The best answer says that “TLS/SSL (The S in HTTPS) guarantees that there are no eavesdroppers between you and the server you are contacting, i.e. no proxies.” (The same answer then described a hack to pull it off, but I don’t understand the instructions. It was very theoretical, anyway.)
In OS X under Network Preferences ▶︎ Advanced… ▶︎ Proxies, there is clearly a setting for an HTTPS proxy. This seems to contradict the previous statement that TLS/SSL’s guarantee against eavesdropping implies the impossibility of proxies.
Other things of note
I can’t remember where, but I read that it is possible to set up an HTTPS proxy, but that it makes HTTPS pointless (by breaking the secure communication in the process). I don’t want this! Encryption is good. I don’t want to filter anyone else’s traffic; I just want something to customize the content after I’ve already received it.
GlimmerBlocker has a nice GUI interface, but I’m fine with non-GUI solutions, too. I may have a poor understanding of networking and protocols, but I’m perfectly comfortable on the command line, tweaking settings in text editors, and so on.
Is what I’m asking possible? Or is my question a case of “either you get security, or you can break it with hacks and get to customize your content—but not both”?

The common idea of an HTTP proxy handling HTTPS is a server which accepts a CONNECT request that includes the target hostname and port, and then just builds a tunnel to the target server. All the HTTPS is done inside the tunnel, so there is no way for the proxy to modify it (end-to-end security from browser to web server).
To modify the data you need a proxy which plays man-in-the-middle. In this case you have one HTTPS connection between the proxy and the web server and another HTTPS connection between the browser and the proxy. Between proxy and web server the original server certificate is used, while between browser and proxy a newly created certificate is used, which is signed by a CA specific to the proxy. Of course this CA must be imported as trusted into the browser, otherwise it would complain all the time about possible attacks.
Of course, all the verification of the original server certificate now has to be done in the proxy, and not all solutions do this the correct way. See also http://www.secureworks.com/cyber-threat-intelligence/threats/transitive-trust/
There are several proxy solutions which can do this SSL interception, like squid, mitmproxy (Python) or App::HTTP_Proxy_IMP (Perl). The last two are specifically designed to let you modify the content with your own code, so these might be good places to start.
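To give an idea of what "modify the content with your own code" looks like, here is a minimal sketch of a mitmproxy script that injects a stylesheet into HTML pages. The CSS rule and the helper name `inject_css` are just placeholders; `response(flow)` is the hook mitmproxy calls for each intercepted response when the file is loaded with `mitmdump -s yourscript.py`. It assumes you have already imported mitmproxy's CA certificate into the browser as described above.

```python
import re


def inject_css(html, css):
    """Insert a <style> block just before </head>; leave the page alone if there is no </head>."""
    match = re.search(r"</head>", html, re.IGNORECASE)
    if match is None:
        return html
    index = match.start()
    return html[:index] + "<style>" + css + "</style>" + html[index:]


def response(flow):
    """mitmproxy event hook: called once for each intercepted HTTP(S) response."""
    content_type = flow.response.headers.get("content-type", "")
    if "text/html" in content_type:
        # Placeholder rule; put your own CSS here.
        flow.response.text = inject_css(flow.response.text, "body { font-family: serif; }")
```

Everything stays on your own machine: the browser talks HTTPS to the local proxy, and the proxy talks HTTPS to the real server.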

Should I run Tomcat by itself or Apache + Tomcat?

I was wondering if it would be okay to run Tomcat as both the web server and container? On the other hand, it seems that the right way to go about scaling your webapp is to use Apache HTTP listening on port 80 and connecting that to Tomcat listening on another port?
Are both ways acceptable? What is being used nowadays? What's the prime difference? How do most major websites go about this?
Thanks.

Placing an Apache (or any other webserver) in front of your application server(s) (Tomcat) is a good thing for a number of reasons.
First consideration is about static resources and caching.
Tomcat will probably also serve a lot of static content, and even for dynamic content it will send some caching directives to browsers. However, each browser that hits your Tomcat for the first time will cause Tomcat to send the static file. Since processing a request is a bit more expensive in Tomcat than it is in Apache (because Apache is super-optimized and exploits very low-level facilities not always available to Tomcat, because Tomcat extracts much more information from the request than Apache needs, etc.), it may be better for the static files to be served by Apache.
However, since configuring Apache to serve part of the content and Tomcat the rest of the URL space is a daunting task, it is usually easier to have Tomcat serve everything with the right cache headers, and to put Apache in front of it capturing the content, serving it to the requesting browser, and caching it so that other browsers hitting the same file will get served directly from Apache without even disturbing Tomcat.
Besides static files, a lot of dynamic content may not need to be updated every millisecond. For example, a JSON document loaded by the homepage that tells the user how much stuff is in your database is an expensive query performed thousands of times, which can safely be performed once an hour or so without making your users angry. So Tomcat may serve the JSON with a proper one-hour caching directive, and Apache will cache the JSON fragment and serve it to any browser requesting it for one hour. There are obviously a ton of other ways to implement it (a caching filter, a JPA cache that caches the query, etc.), but sending proper cache headers and using Apache as a reverse proxy is quite easy, REST compliant and scales well.
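To make the caching idea concrete, here is a toy model in Python of what the reverse proxy does with a `Cache-Control: max-age` header. It is not how Apache's cache is implemented, just a sketch of the behaviour: the class name is made up, and it deliberately ignores directives such as `no-store`, `private` and `Vary` that a real cache must honour.

```python
import re
import time


class TinyProxyCache:
    """Caches backend responses for as long as their Cache-Control max-age allows."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> (headers dict, body)
        self._clock = clock      # injectable so the example can be tested without sleeping
        self._entries = {}       # url -> (expiry time, body)

    def get(self, url):
        now = self._clock()
        cached = self._entries.get(url)
        if cached and cached[0] > now:
            return cached[1]     # served without touching the backend
        headers, body = self._fetch(url)
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        if match:
            self._entries[url] = (now + int(match.group(1)), body)
        return body
```

If the backend answers the JSON request with `Cache-Control: public, max-age=3600`, every call to `get` within the next hour returns the stored body, and the expensive query runs at most once per hour no matter how many browsers ask.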
Another consideration is load balancing. Apache comes with a nice load balancing module that can help you scale your application across a number of Tomcat instances, provided that your application can scale horizontally or run on a cluster.
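In its simplest form, load balancing just means handing each incoming request to the next Tomcat instance in turn. The Python sketch below shows plain round robin with made-up backend names; Apache's balancer module offers more refined scheduling than this, so treat it only as an illustration of the idea.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        return next(self._cycle)


# Hypothetical Tomcat instances behind the front-end web server.
balancer = RoundRobinBalancer(["tomcat-a:8080", "tomcat-b:8080"])
```

This only works as-is if any instance can serve any request; applications that keep session state in memory need sticky sessions or session replication on top.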
A third consideration is about URLs, headers, etc. From time to time you may need to change some URLs, or remove or override some headers. For example, before a major update you may want to disable caching in browsers for some hours to keep them from using stale data (same as lowering the DNS TTL before switching servers), or move the old application to another URL space, or rewrite old URLs to new ones when possible. While reconfiguring the servlets inside your web.xml files is possible, and filters can do wonders, if you are using a framework that interprets the URLs you may need to do a lot of work on your sitemap files or similar stuff.
Having Apache or another web server in front of Tomcat may help a lot here, since you only change Apache configuration files, with modules like mod_rewrite.
So, I always recommend having Apache httpd in front of Tomcat. The small overhead on connection handling is usually recovered thanks to caching of resources, and the additional configuration work is repaid the first time you need to move URLs or handle some headers.

It depends on your network and how you wish to have security set up.
If you have a two-firewall DMZ, with applications deployed inside the second firewall, it makes sense to have an Apache or IIS instance in between the two firewalls to handle security and proxy calls into the app server. If it's acceptable to put the Tomcat instance in the DMZ you're free to do so. The only downside that I see is that you'll have to open a port in the second firewall to access a database inside. That might put the database at risk.
Another consideration is traffic. You don't say anything about traffic, sizing servers, and possible load balancing and clustering. A load balancer in front of a cluster of app servers is more likely to be kept inside the second firewall. The Tomcat instance is capable of handling traffic on its own, but there are always volume limitations depending on the hardware it's deployed on and what the application is doing with each request. It's almost impossible to give a yes or no answer without more detailed, application-specific information.
Search the site for "tomcat without apache" - it's been asked before. I voted to close before finding duplicates.

Avoiding bandwidth cap by ISP on an Apache HTTP Web Server: encryption methods can do it?

I got a working Apache HTTP web server on my computer so a friend (and only him, no one else) who has no computer at home could get my files directly, as from a website, from an Internet café.
I did some speed tests on my computer at home and on my computer at my workplace and found out that, in both cases, I get almost full bandwidth (~7MB/s) when using protocol encryption methods in some P2P software (BitTorrent, eMule). This leads me to believe that this is happening because the data is hidden from their ISPs.
Well, at the same very moment, when downloading from my web server at home to my work, it goes sluggish as hell (~90KB/s)...
Is there a protocol encryption method like the one in P2P to prevent my Apache web server from being slowed down by the ISP? Or at least some alternate solution to achieve better speed in this situation? Tried HTTPS but it seemed to not work.

Download != upload. Your upload at home will most likely be 1 Mbit/s (do you have an ADSL connection?), which comes down to ~90 KB/s in practice.
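The arithmetic behind that figure, as a quick Python check. The 1 Mbit/s uplink is the guess from this answer, not a measured value, and the gap between the ceiling and the observed rate is presumably protocol and line overhead.

```python
def mbit_to_kilobytes_per_second(mbit_per_s):
    """Convert a line rate in Mbit/s to KB/s (decimal units: 1 KB = 1000 bytes)."""
    return mbit_per_s * 1_000_000 / 8 / 1000


ceiling = mbit_to_kilobytes_per_second(1)   # theoretical maximum for a 1 Mbit/s uplink
observed = 90                               # KB/s reported in the question
print(f"ceiling {ceiling:.0f} KB/s, observed is {observed / ceiling:.0%} of it")
# prints: ceiling 125 KB/s, observed is 72% of it
```

So ~90 KB/s is about what a saturated 1 Mbit/s upload looks like, and no encryption trick on the web server will raise it.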
But this doesn't belong on SO. :-)

Using SSL Across Entire Site

Instead of just having a few select pages for HTTPS access, I was thinking about just using SSL for my entire site.
What would be the drawbacks to this?
Edit Aug 7, 2014
Google now factors in HTTPS for rankings, so you absolutely should use SSL across your entire site:
http://googleonlinesecurity.blogspot.com/2014/08/https-as-ranking-signal_6.html

It is highly recommended these days to run the entire site on TLS (https that is) if possible.
The overhead concern is a thing of the past. It is no longer an issue with the newer TLS protocols, because they now maintain sessions, and even cache them for reuse if the client drops the connection. In the old days this was not the case. This means that today, the only time you have to do public-key crypto (the type that is CPU heavy) is when establishing the connection. So there aren't really any drawbacks when you have a cert anyway. It also means that you won't have to send people back and forth between http and https, and the customers will always see the lock sign in their browser.
Extra attention has been drawn to this subject after the release of Firesheep. As you might've heard, Firesheep is a Firefox addon that lets you easily (if you are both using the same open wifi network) hijack other people's sessions on sites like Facebook, Twitter etc. This works because those sites only use TLS selectively, and this would not be a problem for them if TLS was enabled site-wide.
So, in conclusion, the cons (such as added CPU use) are negligible with the state of current technology, and the pros are clear, so serve all content via SSL/TLS! It's the way to go these days.
Edit: As mentioned in other answers, another problem with serving some of a site's content (like images) without SSL/TLS, is that customers/users will get a very annoying "unsecure content on secure page" message.
Also, as stated by thirtydot, you should redirect people to the https site. And you can even enable the flag that makes your server deny non-ssl connections.
Another edit: As pointed out in a comment below, remember that SSL/TLS isn't the only solution to all your site's security needs. There are still a lot of other considerations, but it does solve a few security issues for the users, and solves them well (even though there are ways to do a man-in-the-middle, even with SSL/TLS).

It is a good idea to do this if possible.
I originally suggested serving static resources (images, CSS, etc) from plain HTTP to avoid the HTTPS overhead. Don't do this, or you will get warnings about "insecure resources".
You should also redirect the HTTP homepage to the HTTPS version so that users do not have to type HTTPS to access your site.
Drawbacks include:
Less responsive browsing experience - because there is more back and forth between the server and client with HTTPS vs HTTP - the amount this is noticeable will be dependent on the latency between the server and client.
More CPU usage on your server - because every page has to be encrypted instead of just the select few.

Server side algorithms for establishing SSL connection are expensive, so serving all content via SSL requires more CPU power on the back end.
As far as I know that is the only drawback.

SSL was not designed for virtual hosting, especially of the elastic cloud type. You may face some difficulties if you cannot control the host names of the web servers, and how they resolve to IP addresses.
But in general, it is an excellent idea, and if you allow users to log in to your site, almost a necessity (as shown by Firesheep).
I should also add what I am trying to do. I would like to allow social service logins (like FaceBook), but we will also be storing credit card information
For the pages where the user can review his credit card information, or make financial transactions, better shift into a more secure authentication mode. Facebook is a big target, and attracts hackers. If someone's Facebook account gets hacked, and they can then spend money or gather credit card info from your site, that would not be good. Accepting social service logins for non-critical stuff is fine, but for the more serious parts of your site, better require additional passwords.

"It is highly recommended these days to run the entire site on TLS"
It's highly recommended by some people.
The total number of users your system can support is gated by either the CPU demands or IO load; if you are up against the CPU, TLS makes it that much worse.
Encrypting the traffic makes it impossible to use certain kinds of diagnostic techniques.
Most browsers will give your user a warning if you load any non-encrypted files, which can be a huge problem if you are trying to access third-party resources.
In some circumstances (e.g. a lot of money at stake), it makes sense to just bite the bullet and encrypt everything; in others, the odds of an attacker intercepting a packet in flight and deciding to hijack the session are so low and the amount of damage that could be done is so small, you can just go bare-back, as it were. (For example, this session, the one I'm using to post this answer, is unencrypted and I really, really don't care.)
For still other cases, you may want to offer your user a choice. Someone using a hard-wired connection in his own basement is in a different situation than someone using WiFi at the Starbucks across from a Black Hat convention.
I'm working on a protocol and a library to let you sign XHR requests. The idea is that the entire site would be set up as static files of HTML, CSS, and JavaScript, which would be loaded from a CDN. The actual application would be conducted entirely by JavaScript making AJAX and COMET requests. Any request that has to be authenticated is, but as a practical matter, most requests do not. I've done several sites this way -- they're very, very scalable.

We run a fully forced, secured website and shop. I've done this on the advice of a friend that knows a thing or two about website security.
The positive is that our website doesn't seem noticeably slower. Also, Google Analytics runs, although I can't get ecommerce to work. Whether it has protected us against attacks I can't say, of course, but so far we have had no trouble.
The bad thing however is that you will have a very hard time running Youtube and Social ("Like") boxes on a secured website.
Tips for good security:
Good webhost (they will cost you but it's worth it!)
No login for visitors. It kills usability but with a fast and easy checkout it goes and the obvious pro is that you simply don't store sensitive info.
Use a good Payment Service Provider and let them handle payment.
*2 I know this won't go for a lot of websites but "what you don't have, can't be stolen".
We have been selling on our webshop without login for 2 years now and it works fine as long as the Checkout is Mega simple and lightning fast.
