Do I need more resources or a load balancer? - load-balancing

I am not a pro, but I set up a fully working Fedora server and hosted my website on it. I ran a small traffic test with around 100 visitors and both RAM and CPU were maxed out. Do I need a load balancer, or do I need more RAM and CPU?
Regards

A load balancer is used in a clustered environment to achieve high availability. You will eventually want high availability as your traffic grows and if your website's availability matters enough. For now, though, it sounds like you need more resources, provided you are sure your web server and applications are already using their resources optimally.
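To confirm that, it helps to see where the memory and CPU actually go during the test. A rough sketch of what to look at (the process name httpd is an assumption; on Debian/Ubuntu it is apache2):

    # Memory used per Apache worker, largest first (process name assumed: httpd)
    ps -C httpd -o pid,rss,cmd --sort=-rss | head

    # Overall memory/swap pressure and CPU saturation while the traffic test runs
    free -m
    vmstat 5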

Related

Server configuration for high traffic website

I'm managing a hosting server and one of my customers is about to launch a high-traffic PHP website. It's a penny-auction site and we expect between 25k and 30k visitors per day.
Could you please tell me what I should change in my server configuration (PHP and Apache) to avoid problems? I'm afraid the server will crash under a large number of visitors.
Thank you
Using a lighter web server like nginx as a reverse proxy and static content server should keep Apache's memory and CPU usage to a minimum; on larger sites that usage becomes a problem.
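A minimal sketch of that split, assuming Apache has been moved to port 8080 on the same box and a document root of /var/www/example (both assumptions):

    server {
        listen 80;
        server_name example.com;        # hypothetical domain
        root /var/www/example;          # assumed document root

        # Let nginx serve static files directly
        location ~* \.(jpg|jpeg|png|gif|css|js|ico)$ {
            expires 30d;
        }

        # Everything else (the PHP pages) goes to Apache
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }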
APC as an opcode cache will also be useful on a large site, because compiling PHP scripts to opcode on every request is expensive.
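For reference, enabling APC is usually just a few php.ini lines once the extension is installed (the 64M shared-memory size is an arbitrary starting point, not a recommendation):

    ; php.ini - assumes the APC extension has been installed, e.g. via PECL
    extension = apc.so
    apc.enabled = 1
    ; older APC releases expect a plain integer number of megabytes here (64)
    apc.shm_size = 64M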
Which Apache MPM (forking model) are you using on the server? The Event and Worker MPMs will probably work better for larger sites with many concurrent connections.
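If you go with the Event MPM, the directives that matter most for concurrency look roughly like this; the numbers below are illustrative starting points, not tuned values (MaxRequestWorkers is the Apache 2.4 name for what 2.2 calls MaxClients):

    <IfModule mpm_event_module>
        StartServers              4
        MinSpareThreads          25
        MaxSpareThreads          75
        ThreadsPerChild          25
        # Total requests handled concurrently
        MaxRequestWorkers       400
        # 0 = never recycle worker processes; raise above 0 if you suspect leaks
        MaxConnectionsPerChild    0
    </IfModule>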
How is PHP set up within Apache, i.e. FastCGI/CGI/DSO/suPHP/FPM? suPHP will be the slowest, while FastCGI, FPM and DSO will give you much better performance and let you use opcode caches.
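As an illustration of the FPM option, on Apache 2.4.10+ with mod_proxy_fcgi the hand-off to PHP-FPM can be as small as this (the socket path is distribution-specific and assumed here):

    # Requires mod_proxy and mod_proxy_fcgi to be loaded
    <FilesMatch "\.php$">
        SetHandler "proxy:unix:/run/php/php-fpm.sock|fcgi://localhost/"
    </FilesMatch>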
If you don't need SSL support on the site, a free service like https://www.cloudflare.com/ will also lessen the load on your servers.
You could put an opcode cache into use; eAccelerator is a good one for this purpose.
You may also want to consider creating Apache vhosts from which static content like images/CSS/JavaScript is served. If that content can be put on a CDN, even better.
There are other tools available for benchmarking, including the Apache benchmarking tool "ab". You can use this to stress-test your site.
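A typical ab run against a single URL looks like this (10,000 requests, 100 concurrent; the URL is a placeholder):

    ab -n 10000 -c 100 http://www.example.com/index.php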
There are several areas in which tuning can take place, not just PHP.

Lighttpd instead of Apache

No doubt Apache is the most popular web server to use with PHP, and it definitely works great. However, I'm curious what the advantages (if any) of using Lighttpd instead of Apache are.
Thanks.
Theoretically, because of its smaller footprint, Lighttpd should allow more users to visit the site at the same time using exactly the same resources as Apache would.
As an example (just to illustrate the point; these are not real numbers):
on the same hardware Apache might allow 100 users to view your page at the same time, while Lighttpd might allow 150.
Lighttpd also has a different process model, so it may serve better when the number of visitors spikes.
Every server and web page is written differently, so it is very hard to predict how each of these servers would perform on:
a) your specific hardware; it is a good idea to contact your hosting company and ask what they advise for their hardware
b) your software; a Plesk or cPanel stack will perform differently than a clean Apache or Lighttpd installation
c) your site content; a site with a lot of pictures has a different fingerprint than a server that serves video
d) your processor cores
They claim better scaling as their main advantage: http://www.lighttpd.net/benchmark/

What's the most scalable and high performing Amazon Web Service (AWS) configuration for a RESTful web service?

I'm building an asynchronous RESTful web service and I'm trying to figure out what the most scalable and high performing solution is. Originally, I planned to use the FriendFeed configuration, using one machine running nginx to host static content, act as a load balancer, and act as a reverse proxy to four machines running the Tornado web server for dynamic content. It's recommended to run nginx on a quad-core machine and each Tornado server on a single core machine. Amazon Web Services (AWS) seems to be the most economical and flexible hosting provider, so here are my questions:
1a.) On AWS, I can only find c1.medium (dual core CPU and 1.7 GB memory) instance types. So does this mean I should have one nginx instance running on c1.medium and two Tornado servers on m1.small (single core CPU and 1.7 GB memory) instances?
1b.) If I needed to scale up, how would I chain these three instances to another three instances in the same configuration?
2a.) It makes more sense to host static content in an S3 bucket. Would nginx still be hosting these files?
2b.) If not, would performance suffer from not having nginx host them?
2c.) If nginx won't be hosting the static content, it's really only acting as a load balancer. There's a great paper here that compares the performance of different cloud configurations, and says this about load balancers: "Both HaProxy and Nginx forward traffic at layer 7, so they are less scalable because of SSL termination and SSL renegotiation. In comparison, Rock forwards traffic at layer 4 without the SSL processing overhead." Would you recommend replacing nginx as a load balancer by one that operates on layer 4, or is Amazon's Elastic Load Balancer sufficiently high performing?
1a) Nginx is an asynchronous (event-based) server; even with a single worker it can handle a lot of simultaneous connections (max_clients = worker_processes * worker_connections / 4, ref) and still perform well. I myself tested around 20K simultaneous connections on a c1.medium-class box (not on AWS). Here you would set workers to two (one per CPU) and run four backends (you can test with more to see where it breaks). Only if this still gives you problems should you add another similar setup and chain them via an Elastic Load Balancer.
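The two-worker setup described above maps to only a few lines of nginx.conf; a sketch with illustrative numbers (the Tornado backends are assumed to listen on ports 8001-8004):

    worker_processes  2;            # one per CPU core on a c1.medium-class box

    events {
        worker_connections  4096;   # per worker; max_clients ~ worker_processes * worker_connections / 4
    }

    http {
        upstream tornado_backends {
            server 127.0.0.1:8001;
            server 127.0.0.1:8002;
            server 127.0.0.1:8003;
            server 127.0.0.1:8004;
        }

        server {
            listen 80;
            location / {
                proxy_pass http://tornado_backends;
            }
        }
    }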
1b) As said in (1a), use an Elastic Load Balancer. Somebody tested ELB at 20K requests/sec, and that is not the limit; he gave up because they lost interest.
2a) Host static content on CloudFront; it's a CDN and meant for exactly this (cheaper and faster than S3, and it can pull content from an S3 bucket or your own server). It's highly scalable.
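With the modern AWS CLI, pointing a CloudFront distribution at an existing S3 bucket can be as simple as the following (the bucket name is a placeholder, and this is the simple shorthand form of create-distribution):

    aws cloudfront create-distribution \
        --origin-domain-name my-static-assets.s3.amazonaws.com \
        --default-root-object index.html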
2b) With nginx serving static files, it obviously has to serve more requests to the same number of users. Taking that load away reduces the work of accepting connections and sending the files across (less bandwidth usage).
2c) Dropping nginx altogether looks like a good solution (one less middleman). The Elastic Load Balancer will handle SSL termination and reduce the SSL load on your backend servers (which improves backend performance). From the experiments above it handled around 20K requests/sec, and since it is elastic it should stretch further than a software LB (see this nice document on how it works).

nginx/apache/php vs nginx/php

I currently have one server where nginx reverse-proxies to Apache (on the same server) for processing PHP requests. I'm wondering whether, if I dropped Apache and ran nginx/FastCGI to PHP, I'd see any sort of performance increase. I'm assuming I would, since Apache is pretty bloated, but at the same time I'm not sure how reliable FastCGI/PHP is, especially in high-traffic situations.
My site gets around 200,000 unique visitors a month, with around 6,000,000 page crawls from search engines monthly. This number is steadily increasing, so I'm looking at performance options.
My site is very optimized code-wise and there isn't any caching (I don't want that either); each page has at most 2 SQL queries without any joins on other tables, and the indexes are perfect as well.
In a year or so I'll be rewriting everything to use ClearSilver for the templates, and then probably use Python or else C++ for extreme performance.
I suppose I'm more or less looking for advice from anyone who is familiar with nginx/FastCGI and is willing to provide some benchmarks. My site runs on one server with a quad-core Xeon, 8 GB RAM and a 150 GB VelociRaptor drive.
nginx will definitely work faster than Apache. I can't say much about FastCGI since I've never used it with nginx, but that solution seems to make more sense across several servers (one for static content and one for FastCGI/PHP).
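For what it's worth, the nginx-to-PHP wiring the question asks about is only a few lines inside the existing server block; a minimal sketch, assuming PHP-FPM listens on 127.0.0.1:9000:

    location ~ \.php$ {
        include        fastcgi_params;
        fastcgi_pass   127.0.0.1:9000;   # or unix:/var/run/php-fpm.sock
        fastcgi_param  SCRIPT_FILENAME   $document_root$fastcgi_script_name;
    }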
If you are really targeting performance, and are even considering C/C++, then you should give G-WAN a try, an all-in-one server which provides (very fast) C scripts.
Not only does G-WAN have a ridiculously small memory footprint (120 KB), but it scales like nothing else. There's work ahead of you if you migrate from PHP, but you can start with the performance-critical tasks and migrate progressively.
We have made the jump and can't imagine going back to Apache!
Here is a chart showing the respective performance of nginx, Apache and G-WAN:
g-wan.com/imgs/gwan-lighttpd-nginx-cherokee.png
Apache does not seem to lead the pack (and that's on a Quad XEON @ 3 GHz).
Here is an independent benchmark of G-WAN vs nginx, Varnish and others: http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/
G-WAN handles many more requests per second with much less CPU time.
NGINX is the best choice as a web server these days.
The main difference between Apache and NGINX lies in their design architecture. Apache uses a process-driven approach and creates a new process or thread for each request, whereas NGINX uses an event-driven architecture to handle many requests within one thread.
As far as static content is concerned, Nginx outperforms Apache; both are good at processing dynamic content.
Apache runs on all operating systems such as UNIX, Linux and BSD and has full support for Microsoft Windows. NGINX also runs on several modern Unix-like systems and has support for Windows, but its performance on Windows is not as stable as on UNIX platforms.
Apache allows additional configuration on a per-directory basis via .htaccess files, whereas Nginx doesn't allow per-directory configuration.
Request interpretation: Apache maps requests to file-system locations, whereas Nginx passes the URI to interpret requests.
Apache has about 60 official dynamically loadable modules that can be turned on and off. Nginx has third-party core modules (not dynamically loadable). NGINX provides all of the core features of a web server without sacrificing the lightweight, high-performance qualities that have made it successful.
Apache supports customization of the web server through dynamic modules; Nginx is not flexible enough to support dynamic module loading.
Apache tries to make sure that all the websites running on it are safe from harm and attackers; it offers configuration tips for handling DDoS attacks, as well as the mod_evasive module for responding to HTTP DoS, DDoS or brute-force attacks.
When to choose Apache over NGINX?
When you need .htaccess files, so you can override system-wide settings on a per-directory basis.
In a shared hosting environment, Apache works better because of its .htaccess configuration.
In case of functionality limitations, use Apache.
When to choose NGINX over Apache?
For fast static content processing.
Great for high-traffic websites.
When to use both of them together?
You can use Nginx in front of Apache as a reverse proxy.

Round robin server setup

From what I understand, if you have multiple web servers, then you need some kind of load balancer that will split the traffic amongst your web servers.
Does this mean that the load balancer is the main connecting point on the network? I.e. does the load balancer have the IP address of the domain name?
If this is the case, it makes it really easy to add new hardware, since you don't have to wait for any DNS propagation, right?
There are several solutions to this "problem".
You could do round-robin at the DNS level, i.e. have www.yourdomain.com point to several IP addresses (all of your servers).
This doesn't give you any intelligence in the load balancing; the load will be more or less randomly distributed, and you wouldn't be resilient to hardware failures, since taking a dead server out of rotation would still require changes to DNS.
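In a BIND-style zone file, DNS round robin is nothing more than several A records for the same name (the addresses below are from the documentation example range):

    ; queries for www are answered with all three addresses, in rotating order
    www   IN  A   192.0.2.10
    www   IN  A   192.0.2.11
    www   IN  A   192.0.2.12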
On the other hand, you could use a proxy or a load-balancing proxy that has a single IP but distributes the traffic to several back-end boxes. This gives you a single point of failure (the proxy; you could of course run several proxies to mitigate that problem), and it also gives you the added bonus of being able to use some metric to divide the load more evenly and intelligently than plain round-robin DNS.
This setup can also handle hardware failure in the back-end pretty seamlessly. The end user never sees the back-end, just the front-end.
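As a concrete sketch of such a front-end, an nginx load-balancing proxy with a small backend pool looks roughly like this (backend addresses are placeholders; this fragment belongs inside the http block):

    upstream backend_pool {
        least_conn;                   # pick the least-busy backend instead of plain round robin
        server 10.0.0.11:80 max_fails=3 fail_timeout=30s;
        server 10.0.0.12:80 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend_pool;
            proxy_set_header Host $host;
        }
    }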
There are other issues to think about as well: if your pages use sessions or other smart logic, you can run into synchronisation problems when your users (potentially) hit different servers on every access.
It does (in general). It depends on what server OS and software you are using, but in general you'll hit the load balancer for each request, and the load balancer will then farm out the work according to the scheme you have in place (round robin, least busy, session-controlled, application-controlled, etc.).
andy has part of the answer, but for true load balancing and high availability you would want to use a pair of hardware load balancers, like F5 BIG-IPs, in an active-passive configuration.
Yes, your domain's IP would be hosted on these devices, and traffic would connect first to them. BIG-IPs offer a lot of added functionality, including multiple load-balancing methods, some great URL rewriting, SSL acceleration, etc. They also allow you to run your web servers on a separate non-routable address scheme and even run multiple sites on different ports, with the F5s handling the translation.
Once you introduce load balancing you may have some other considerations to take into account for your application(s), like sticky sessions and session state, but that is a different subject.