Speeding up website load using multiple servers/domains - optimization

The Yahoo! developer guide says: "Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective."
As an explanation, I read somewhere that browsers will load up to five resources simultaneously from the same domain.
Would a subdomain, for example cdn.example.com, be considered a new domain in the statement above?

Yahoo: The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel.
Google also says you only need different host names.
This may depend on the browser, but I believe the hostnames may need to have different IP addresses. All the HTTP spec really says is: "Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server."
So the safest choice is to use a different hostname AND a different IP address.
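To make this concrete, here is a minimal sketch of the "domain sharding" idea: spread asset URLs across several hostnames so the browser opens more parallel connections. The hostnames and the helper function are hypothetical, purely for illustration.

    import zlib

    # Hypothetical shard hostnames; in practice they would be CNAMEs (or extra IPs)
    # that all serve the same static content.
    SHARDS = ["static1.example.com", "static2.example.com"]

    def shard_url(path: str) -> str:
        # crc32 keeps the path-to-host mapping stable, so each asset is always fetched
        # from the same hostname and browser/CDN caching still works.
        host = SHARDS[zlib.crc32(path.encode("utf-8")) % len(SHARDS)]
        return f"https://{host}{path}"

    print(shard_url("/img/logo.png"))
    print(shard_url("/css/site.css"))

Whether those shard hostnames also need distinct IP addresses is exactly the ambiguity discussed above, so test with the browsers you care about.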

Related

Medium sized website: Transition to HTTPS, Apache and reverse proxy

I have a medium-sized website called algebra.com. As of today, it is ranked as the 900th website in the US in Quantcast ratings.
At the peak of its usage, during weekday evenings, it serves 120-150 queries for objects per second. Almost all objects, INCLUDING IMAGES, are dynamically generated.
It has 7.5 million page views per month.
It is served by Apache 2 on Ubuntu and is supplemented by a Perlbal reverse proxy, which helps reduce the number of Apache slots/child processes in use.
I spent an inordinate amount of time working on performance for HTTP and the result is a fairly well functioning website.
Now that the times call for a transition to HTTPS (fully justified here, as I have logins and registered users), I want to make sure that I do not end up with a disaster.
I am afraid, however, that I may end up with a performance nightmare, as HTTPS sessions last longer and I am not sure whether a reverse proxy can help as much as it did with HTTP.
Secondly, I want to make sure that I will have enough CPU capacity to handle HTTPS traffic.
Again, this is not a small website with a few hits per second, we are talking 100+ hits per second.
Additionally, I run multiple sites on one server.
For example, can I have a reverse proxy that supports several virtual domains on one IP (SNI) and translates HTTPS traffic into HTTP, so that I do not have to encrypt twice (once by Apache for the proxy, and once by the proxy for the client browser)?
What is the "best practices approach" to have multiple websites, some large, served by a mix of HTTP and HTTPS?
Maybe I can continue running Perlbal on port 80 and run nginx on port 443? Can nginx be configured as a reverse proxy for multiple HTTPS sites?
You really need to load test this, and no one can give a definitive answer other than that.
I would offer the following pieces of advice though:
First up, Stack Overflow is really for programming questions; this question probably belongs on the sister site www.serverfault.com.
HTTPS processing is, IMHO, not an issue for modern hardware unless you are encrypting large volumes of traffic (e.g. video streaming), especially with proper caching and the other performance tuning that, from what you say in your question, I presume you've already done. However, I have not dealt with a site with your level of traffic, so it could become an issue at that scale.
There will be a small hit to clients as they negotiate the HTTPS session on the initial connection. This is in the order of a few hundred milliseconds, happens only on the initial connection for each session, and is unlikely to be noticed by most people, but it is there.
There are several things you can do to optimise HTTPS, including choosing fast ciphers and implementing session resumption (there are two methods for this, and it can get complicated on load-balanced sites). SSL Labs runs an excellent HTTPS tester to check your setup, Mozilla has some great documentation and advice, or you could check out my own blog post on this.
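If you want to check resumption from the outside, here is a rough sketch using Python's ssl module; the hostname is a placeholder, and this only tests session reuse from a single client, not your full load-balanced setup.

    import socket
    import ssl

    HOST = "www.example.com"  # placeholder - point this at your own site
    ctx = ssl.create_default_context()

    def handshake(session=None):
        # Open a fresh TCP connection, optionally offering a previously obtained TLS session.
        with socket.create_connection((HOST, 443)) as raw:
            with ctx.wrap_socket(raw, server_hostname=HOST, session=session) as tls:
                # Exchange a little data so (with TLS 1.3) the session ticket has time to arrive.
                tls.sendall(b"HEAD / HTTP/1.1\r\nHost: " + HOST.encode() + b"\r\nConnection: close\r\n\r\n")
                tls.recv(1024)
                return tls.session, tls.session_reused

    first_session, _ = handshake()                 # full handshake
    _, resumed = handshake(session=first_session)  # second handshake offers the old session
    print("Server resumed the TLS session:", resumed)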
As to whether you terminate HTTPS at your endpoint (proxy/load balancer), that's very much up to you. Yes, there will be a performance hit if you re-encrypt to HTTPS again to connect to your actual server. Most proxy servers also allow you to just pass the HTTPS traffic through to your main server so you only decrypt once, but then you lose the original IP address in your web server logs, which can be useful. It also depends on whether you access your web server directly at all. For example, at my company we don't go through the load balancer for internal traffic, so we enable HTTPS on the web server as well and make the load balancer re-encrypt to connect to it, so we can view the site over HTTPS.
Other things to be aware of:
You could see an SEO hit during migration. Make sure you redirect all traffic, tell Google Search Console your preferred site (http or https), update your sitemap and all links (or make them relative).
You need to be aware of insecure content issues. All resources (e.g. CSS, JavaScript and images) need to be served over HTTPS, or you will get browser warnings and browsers will refuse to use those resources. HSTS can help with links on your own domain for those browsers that support HSTS, and CSP can also help (either to report on them or to automatically upgrade them, for browsers that support upgrade-insecure-requests); there is a rough sketch of the redirect/HSTS part below.
Moving to HTTPS-only does take a bit of effort, but it's a one-off, and after that it makes your site much easier to manage than trying to maintain two versions of the same site. The web is moving to HTTPS more and more, and if you have (or are planning to have) logged-in areas then you have no choice, as you should 100% not use HTTP for this. Google gives a slight ranking boost to HTTPS sites (though it's apparently quite small, so it shouldn't be your main reason to move), and has even talked about actively showing HTTP sites as insecure. Better to be ahead of the curve IMHO and make the move now.
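As a rough illustration of the redirect and HSTS points above, here is a framework-agnostic WSGI sketch; the function names and the max-age value are my own choices, not anything specific to your Apache/Perlbal setup.

    def force_https(app):
        """Wrap a WSGI app: redirect plain HTTP to HTTPS and add an HSTS header."""
        def middleware(environ, start_response):
            if environ.get("wsgi.url_scheme") != "https":
                # Redirect to the HTTPS version (query string omitted for brevity).
                location = "https://" + environ.get("HTTP_HOST", "") + environ.get("PATH_INFO", "/")
                start_response("301 Moved Permanently", [("Location", location)])
                return [b""]

            def add_hsts(status, headers, exc_info=None):
                headers.append(("Strict-Transport-Security", "max-age=31536000"))
                return start_response(status, headers, exc_info)

            return app(environ, add_hsts)
        return middleware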
Hope that's useful.

One domain name "load balanced" over multiple regions in Google Compute Engine

I have a service running on Google Compute Engine. I've got a few instances in Europe in one target pool and a few instances in the US in another target pool. At the moment I have a domain name which is hooked up to the Europe target pool's IP, and it load balances between those two instances very nicely.
Now, can I configure the Compute Engine Load Balancer so that the one domain name is connected to both regions? All load balancing rules seem to be related to a single region, and I don't know how I could get all the instances involved.
Thanks!
You can point one domain name (A record) at multiple IP addresses, e.g. mydomain.com -> 196.240.7.22 and 204.80.5.130, but this setup will send half the users to the U.S. and the other half to Europe.
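You can see that behaviour from the client side with a couple of lines of Python; the domain is a placeholder and the addresses shown are just the ones from the example above.

    import socket

    # Placeholder domain with two A records (say, one US and one EU address).
    infos = socket.getaddrinfo("mydomain.com", 80, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    print(addresses)  # e.g. ['196.240.7.22', '204.80.5.130'] - each client simply picks one

Plain round-robin DNS like this has no idea where the client actually is, which is why the geo-aware DNS services mentioned next exist.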
What you probably want to look for is a service that provides geo-aware or geo-located DNS. A few examples include loaddns.com, Dyn, or geoipdns.com, and it also looks like there are patches to do the same thing with BIND.
You should configure your DNS server. Google does not offer a DNS service as part of their platform at the moment. You can use Amazon's Route 53 to route your requests. It has a nice feature called latency-based routing, which allows you to route clients to different IP addresses (in your case, target pools) based on latency. You can find more information here - http://aws.amazon.com/about-aws/whats-new/2012/03/21/amazon-route-53-adds-latency-based-routing/
With Google's HTTP load balancing, you can balance traffic over these VMs in different regions by exposing them via one IP address. This eliminates the need for geo DNS. Have a look at the doc:
https://developers.google.com/compute/docs/load-balancing/
Hope it helps.

Explain CouchDB's serving of websites: is CouchDB bundled somehow with Apache, and how does it work?

I am trying to understand how CouchDB works. Does it come bundled with its own separate Apache, or does it use the Apache already on the system? I am trying to understand how it determines where to serve the site and how the different redirections are done. This is important information because I am trying to understand how to implement the Apache 2.2 mod_proxy module with it. Do I need to tune CouchDB, or do I need to tune a separate Apache process? Suppose you have 10 CouchDB processes and you want to direct their results to siteA; how can you do that?
Sorry if I am being vague, but I am trying to understand how to combine different things from one site to another, with different authorization cookies etc. I am having a problem where I have two separate sites, hello.com/myCouchDb/ and hallo.de/someOthersite.html, working separately. When I merge the code, the authentication fails. I think there are at least three candidate solutions:
A) redirect the verification requests from one site to the other (a bit hackish), and/or
B) somehow configure the CouchDB/Apache settings (I have tried in Futon but failed), and/or
C) store the authentication cookies in some directory or DB and refresh them when they become old (or use cookies that never expire).
So how can I merge different CouchDB instances together with different authentication settings? Suppose you have ten people with different authentication cookies and you want to get them somehow incorporated into the same site. How can you do it? Do you tune network settings, Apache settings or CouchDB settings? Or do you just store the cookies in some directory or DB and refresh them every time they become old?
P.S. I am the admin, so do not worry about OAuth 2.0; I have the authentication cookies to do whatever I want with the different instances. I just cannot understand how to merge the different instances.
Perhaps related
CouchDB proxy? Apache As a Reverse Proxy?
https://stackoverflow.com/questions/12398389/different-definitions-of-the-term-proxy
What is a proxy? What is it in Apache? Does it have many different meanings?
It sounds like you're confused about the structure of CouchDB. CouchDB is a native JSON Database that has an HTTP API. That API is provided via Mochiweb, an Erlang based webserver that is bundled inside CouchDB. There's only one CouchDB server running, but it runs inside the Erlang Virtual Machine (BEAM) and has a fundamentally different architecture to the typical Apache httpd approach.
Regarding authentication, CouchDB has a per-instance (per-server) _users database that contains passwords and minimal account details. As an admin you can see this using Futon, although normal users only have access to their own profile. You can assign users to various roles, and then apply those roles and users to each database. Once the _security object is set on a DB, you need to be authenticated to read, and you can use validate_doc_update functions to enforce constraints on writes. There is some brief information at http://blog.couchbase.com/what%E2%80%99s-new-couchdb-10-%E2%80%94-part-4-security%E2%80%99n-stuff-users-authentication-authorisation-and-permissions and http://blog.mattwoodward.com/2012/03/definitive-guide-to-couchdb.html, as well as on the wiki.
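For example, here is a rough sketch (Python requests, with placeholder credentials and database names) of setting the _security object on one database so that only certain users and roles can reach it:

    import requests

    # Placeholders: a local CouchDB instance, admin credentials, and a database name.
    COUCH = "http://admin:secret@127.0.0.1:5984"
    DB = "sitea"

    security = {
        "admins":  {"names": ["alice"], "roles": ["sitea_admin"]},
        "members": {"names": [],        "roles": ["sitea_user"]},
    }

    # Once this is set, anonymous reads on /sitea are rejected; users must authenticate
    # (cookie or basic auth against the per-server _users database) and hold a listed role.
    resp = requests.put(f"{COUCH}/{DB}/_security", json=security)
    print(resp.status_code, resp.json())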

Error with DOJO when using IP

Strange error with a project using Dojo:
If I call http://localhost/project, everything works as expected.
If I call http://127.0.0.1/project, everything works as expected.
If I call http://192.168.2.1/project, I get the following error (ONLY in IE6!):
"Bundle not found, locale.."
Any ideas?
I am running Zend Server CE with PHP 5.2.
If I add 192.168.2.1 to "hosts", it works (Windows).
Sounds like Zend Server is performing some kind of virtual site support using the site name as a partial domain.
I can't say 100% if/how it does this because I don't use Zend, but I can explain the principle using Apache as an example.
There are 3 ways in which a website can be virtually hosted under a single web server application; this applies to most servers on the market today: Apache, IIS, nginx and many others.
It all boils down to one thing, giving one running server application instance the ability to host multiple individual websites.
The 3 methods of separating sites are as follows:
By IP address: If you have multiple IP addresses (usually, but not always, because you have multiple network interface cards), then you can tell your server application to listen on one IP for one site, another IP for another site, and so on. If you browse to one IP you'll get one site, and likewise the other site on the other IP.
By Port Number: If you're using only one IP address, then you can bind to multiple port numbers. Port 80 is generally the default for web servers, but by browsing to an address and pinning the port number on the end (http://mysite.com:99) you force the browser to use that port. You can then have multiple websites listening on different ports and select them manually at browse time as required.
By Host Name Header: This is by far the most common way of supporting multiple sites. All web servers that understand the HTTP/1.1 protocol have to obey a header field in the request that contains the host name. When a request comes in for e.g. http://mysite.com/, there will be an entry in the request header that looks like 'Host: mysite.com'. The web server can then use that to say "ah yes, I know which one that is", and it selects and serves the correct website.
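You can see the host-name-header mechanism directly with a few lines of Python: connect to the bare IP, but set the Host header yourself (the IP and the site name below are just illustrative).

    import http.client

    # Connect to the server by IP, but tell it which virtual site we want via the Host header.
    conn = http.client.HTTPConnection("192.168.2.1", 80)
    conn.request("GET", "/project", headers={"Host": "mysite.com"})
    resp = conn.getresponse()
    print(resp.status, resp.reason)
    conn.close()

Send a Host value the server doesn't recognise (or just the raw IP, as the browser does) and you typically fall through to the default site instead.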
The problems start to arise, however, when you use IP addresses that cannot be resolved or have no DNS name, because the web server then doesn't know which hostname to match the request against.
As an example in Apache, if you set up a virtual host, then try to browse that server using just the IP address, you'll get the default server, which in many cases won't even be configured to respond correctly or display anything.
To compound this, going up to the web application layer, many frameworks also do their own checks on hostnames and other variables passed to them by the web server, and many make decisions on how to operate based on this information.
If you've gotten to the default web application by IP address, then there's a high chance that the framework may get confused at being presented with an IP address as a host name.
As the OP noted, in many cases you can add a name to your hosts file and use this as a poor man's DNS substitute. The file to modify can be found in the following locations:
c:\windows\system32\drivers\etc\ on Windows
/etc/ on Linux/Unix
The file is generally just called 'hosts' and is a plain text file. Adding a line like:
192.168.2.1 myserver
will tie http://myserver/ to http://192.168.2.1/
If you can, and you're doing a lot of web applications, it may be worth setting up your own DNS server; most Linux distros will allow you to install BIND, and I believe there is a version available for Windows too.
I'm not going to go into the pros and cons of private DNS servers here; it's a whole other subject in itself. But if you're likely to be making a lot of additions to your hosts file, then in the long run you'll find it a better option.

Unique identification of a certain computer

I have the following scenario and can't seem to find anything on the net, or maybe I am looking for the wrong thing:
I am working on a web-based data storage system. There are different users and different places, and only certain users are allowed to access certain parts of the system. Now, we do not want them to connect to these parts from home or with a different computer than the one they use at their workplace (there are various reasons for that).
Now my question is: if there is a way to have the workplace PC identify itself to the server in some way via the browser, how can I do that?
Oh, and yes, it is supposed to be web-based.
I hope I explained it so everyone understands.
Thanks for your replies in advance.
... dg
I agree with Lenni... an IP address is a possible solution if the addresses are static or the DHCP server consistently assigns the same IP address to the same machine.
Alternatively, you might also consider authentication via "personal certificates" ... that's what they are called in Firefox; I don't know if that's the standard name or not. (Obviously I haven't worked with these before.)
Basically they are SSL or PKI certificates that are installed on the client (user's) machine that identify that machine as being the machine it says it is -- that is, if the user tries to connect from a machine that doesn't have a certificate or doesn't have a certificate that you allow, you would deny them.
I don't know the issues around this ... it might be relatively easy for the same user to take the certificate off one computer and install it on another one with the correct password (i.e. it authenticates the user), or it might be keyed specifically to that machine somehow (i.e. it authenticates the machine). And a quick google search didn't turn up any obvious "how to" instructions on how it all works, but it might be worth looking into.
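To give a feel for the server side, here is a minimal sketch (Python standard library only, certificate file names are placeholders) of a TLS endpoint that refuses any client not presenting a certificate signed by your own CA:

    import http.server
    import ssl

    # Placeholders: a server certificate/key pair, plus the CA used to issue per-machine client certs.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")
    ctx.load_verify_locations(cafile="machine-ca.crt")
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails unless the client presents a valid cert

    httpd = http.server.HTTPServer(("0.0.0.0", 8443), http.server.SimpleHTTPRequestHandler)
    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()

Whether this authenticates the machine or merely the user depends on whether the private key can be exported and moved, which is exactly the caveat above.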
---Lawrence
Since you're going web-based you can:
Examine the remote host's IP address (compare it against known internal subnets, etc.) - see the sketch after this list.
During the authentication process, you can ping the remote IP and take a look at the TTL on the returned packets; if it's too low, the computer can't be on the local network. (Of course this can be fooled, but it's just one more check.)
If you're doing it over IIS, then you can integrate with SSO (probably the best option if you can do it).
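Here is a minimal sketch of that first check using Python's ipaddress module; the subnets are placeholders for whatever your internal ranges really are.

    import ipaddress

    # Placeholder internal ranges; replace with your real office/VPN subnets.
    INTERNAL_NETS = [
        ipaddress.ip_network("10.1.0.0/16"),
        ipaddress.ip_network("192.168.2.0/24"),
    ]

    def is_internal(remote_addr: str) -> bool:
        # True if the request's source address falls inside one of the known internal subnets.
        addr = ipaddress.ip_address(remote_addr)
        return any(addr in net for net in INTERNAL_NETS)

    print(is_internal("192.168.2.57"))   # True  - workplace machine
    print(is_internal("203.0.113.10"))   # False - somewhere on the internet

Of course this identifies the network rather than the individual computer, so it complements rather than replaces the certificate approach.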
If it's supposed to be web-based (and by that I mean that the web server should be able to uniquely identify the user's machine), then your choices are limited: per se, there's nothing you can obtain from the browser's headers or request body that allows you to identify the machine. I suppose this is by design, due to the obvious privacy implications.
There are options though, none of which is pain-free: you could use an ActiveX control, which however only runs on Windows (and not in all browsers, I think) and requires elevated privileges. You could think of a Firefox plug-in (obviously Firefox only). At any rate, a plain-vanilla browser will otherwise escape identification.
There are only a few REAL solutions to this. Here are a couple:
Use domain authentication, and disallow users who are connecting over a VPN.
Use known IP ranges to allow or disallow access.
IP address. Not bombproof security but a start.