NGINX as a Web Server + Load Balancer with Caching Enabled - apache

We currently run a SaaS application on Apache which serves ecommerce websites (it's a store builder). We currently host over 1000 clients on that application and are now running into scalability issues (CPU going over 90% even on a fairly large server: 20 cores, 80 GB RAM, all-SSD disks).
We're looking for help from an nginx expert who can:
1. Explain the difference between running nginx as a web server vs. using it as a reverse proxy. What are the benefits?
2. We also want to use nginx as a load balancer (and already have that set up in testing), but we haven't enabled caching on the load balancer. So while it's helping distribute requests, it's not really serving any traffic directly; it simply passes everything through to one of the two Apache servers.
The question is: since we have a lot of user-generated content coming from the Apache servers, how do we invalidate the cache for only certain pages that are being cached by nginx? If we set up a cron job to clear the cache every minute or so, it wouldn't be very useful, as the cache would then be virtually nonexistent.
--
Also need an overall word on what is the best architecture to build for given the above scenarios.
Is it:
NGINX Load Balancer + Caching ==> NGINX Web Server
NGINX Load Balancer ==> NGINX Web Server + Caching
NGINX Load Balancer + Caching ==> Apache Web Server
NGINX Load Balancer ==> Apache Web Server (unlikely)
Please help!

Scaling horizontally to support more clients is a good option, but it's recommended to first evaluate what is causing the bottleneck: memory pressure within the application, long-running requests, etc.
Nginx vs. other web servers: nginx is an HTTP server, not a servlet engine. Given that, you can check whether it fits your needs.
It is a fast web server. You need to evaluate the benefits of using it as a single standalone web server against other web servers; its speed and low memory footprint could help. A minimal standalone server block is sketched below.
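For illustration, this is roughly what nginx looks like when used purely as a web server for static content (a minimal sketch; the document root and hostname are placeholders, not from the original question):

# Sketch: nginx as a standalone web server serving static files.
# Paths and server_name are example values.
server {
    listen 80;
    server_name example.com;
    root /var/www/example;
    index index.html;

    location / {
        try_files $uri $uri/ =404;  # serve the file or directory, else 404
    }
}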
Nginx as a load balancer:
You can run multiple web server instances behind nginx.
It supports load-balancing algorithms like round robin and weighted round robin, so the load can be distributed based on resource availability.
It lets you terminate SSL at nginx, filter requests, modify headers, compress responses, upgrade the application without downtime, serve cached content, and so on. This frees up resources on the servers running the application, and gives you separation of concerns.
This setup is a reverse proxy, with the benefits that come with it; a minimal sketch follows.
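As a rough sketch of that reverse-proxy/load-balancer setup (the backend hostnames, ports, and weight are placeholders, not from the original question):

# Sketch: nginx load balancing two Apache backends via weighted round robin.
upstream apache_backend {
    server apache1.internal:8080 weight=2;  # gets twice the share of requests
    server apache2.internal:8080;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://apache_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}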
You can handle cache expiry and invalidation with nginx; the nginx admin guide has good details: http://nginx.com/resources/admin-guide/caching/
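On invalidating only certain pages: open-source nginx has no built-in purge command (the proxy_cache_purge directive is an NGINX Plus feature, and the third-party ngx_cache_purge module offers something similar), but you can refresh individual URLs on demand with proxy_cache_bypass. A minimal sketch, assuming the upstream from the previous example and a hypothetical X-Refresh header:

# Sketch: per-URL cache refresh using proxy_cache_bypass.
# A request carrying "X-Refresh: 1" skips the cached copy for that URL
# and stores the fresh upstream response in its place.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m max_size=10g inactive=60m;

server {
    listen 80;

    location / {
        proxy_cache app_cache;
        proxy_cache_valid 200 10m;
        proxy_cache_bypass $http_x_refresh;  # restrict who may send this in production
        proxy_pass http://apache_backend;
    }
}

With this in place, when a client edits a page, your application can request just that URL with the X-Refresh header set, and only that cache entry is renewed; no cron-based wholesale clearing is needed.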

Related

Can F5 work as both a web server and a load balancer?

I have a fair bit of understanding of web servers, namely Apache httpd. We have a web component which is built in Angular, HTML5, and CSS3.
We deploy the UI component on Apache httpd 2.4.
Recently someone proposed to replace this with an F5 load balancer.
Is it possible to replace a web server with a load balancer?
Can we deploy the HTML components on the F5 load balancer?
My understanding is that a load balancer helps in clustering the web servers based on different algorithms like round robin, weighted round robin, least connections, etc., and cannot independently serve the requests coming from end users.
You can build out basic web functionality in iRules, but if you need more than something like a static or slightly dynamic maintenance page, or serving a proxy PAC file, I'd recommend against it.
Maintenance page
Serving a proxy PAC file
I need to clean up the formatting of the second link, but you'll get the idea.

Varnish + Apache CDN & HTTPS

I'm currently trying to set up a DIY CDN using Varnish, Nginx, & Apache.
The following is the setup I have planned.
The following assumes:
1. The Varnish Origin server is on the same server as the web server (Apache in this case)
2. You have 2 Varnish Cache Servers located in different countries, one in NA, & one in EU
Example of an NA client attempting to retrieve data:
NA Client --> Varnish Origin Server --> NA Varnish Cache Server --> If result NOT IN cache --> Return Apache Origin Server results --> Input request data into NA Varnish Cache Server
Example of an EU client attempting to retrieve data:
EU Client --> Varnish Origin Server --> EU Varnish Cache Server --> If result IN cache --> Return EU Varnish Cache results
Any suggestions and/or mistakes? Where would I insert Nginx/HAProxy in order to terminate SSL, since Varnish doesn't accept HTTPS?
What you're suggesting is perfectly possible and has become an increasingly popular use case for us at Varnish Software.
Geolocation
First things first: I'm assuming all users, regardless of their location, will use the same hostname to connect to the CDN. Let's say the hostname is www.example.com.
US users should automatically connect to a Varnish edge node in the US, EU users should be directed to the EU.
This geographical targeting requires some sort of GeoDNS approach. If your DNS provider can do this for you, things will be a lot easier.
If not, I know for a fact that AWS Route 53 does this. If you want to use open source technology, you can host the DNS zone yourself using https://github.com/abh/geodns/.
So if a US user does a DNS call on www.example.com, this should resolve to us.example.com. For EU users this will be eu.example.com.
Topology
Your setup connects from a local Varnish server to a remote Varnish server. This seems like one hop too many. If the geolocation works properly, you'll end up directly on the Varnish server that is closest to your user.
We call these geolocated servers "edge nodes". They will connect back to the origin server(s) in case requested content is not available in cache.
It's up to you to decide if one origin Apache will do, or if you want to duplicate your Apache servers in the different geographical regions.
SSL/TLS
My advice in terms of SSL/TLS termination: use Hitch. It's a dedicated TLS proxy that was developed by Varnish Software for use with Varnish. It's open source.
You can install Hitch on each Varnish server and accept HTTPS there. The connection between Hitch and Varnish can be done over Unix Domain Sockets, which further reduces latency.
Our tests show you can easily process 100 Gbps on a single server with TLS terminated by Hitch.
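To the original question about Nginx/HAProxy: if you'd rather terminate TLS with nginx than with Hitch, a minimal sketch is an nginx server block that holds the certificate and proxies to Varnish on the same host (certificate paths, hostname, and the Varnish port are assumptions, not from the original answer):

# Sketch: nginx terminating TLS in front of a local Varnish listener.
# 6081 is a common default Varnish port; adjust to your setup.
server {
    listen 443 ssl;
    server_name www.example.com;

    ssl_certificate     /etc/ssl/example.com.crt;
    ssl_certificate_key /etc/ssl/example.com.key;

    location / {
        proxy_pass http://127.0.0.1:6081;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}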
Single tier Varnish or multi-tier Varnish
If your CDN requires a lot of storage, I'd advise you to set up a multi-tier Varnish setup in each geographical location:
The edge tier will be RAM heavy and will cache the hot content in memory using the malloc stevedore in Varnish
The storage tier will be disk heavy and will cache long tail content on disk using the file stevedore in Varnish
Although the file stevedore is capable of caching terabytes of data, it is quite prone to disk fragmentation, which at very large scale will slow you down in the long run.
If you have tiered Varnish servers, you can tune each tier to its needs. Combined, the results will be quite good: although the file stevedore has its limitations, it will still be a lot faster than constantly accessing the origin when the cache of the edge servers is full.
Varnish Software's DIY CDN solution
Varnish Software, the company behind the Varnish Cache project, has done many CDN integration projects for some of the world's biggest web platforms.
Varnish Cache, the open source project, is the foundation of these CDN solutions. However, typical CDN clients have some extra requirements that are not part of the open source solution.
That's why we developed Varnish Enterprise, to tackle these limitations.
Have a look at Varnish Software's DIY CDN solution to learn more. Please also have a look at the docs describing the extra features of the product.
If you want to try these features without buying a license up front, you can play around with Varnish Enterprise images in the cloud.
We have an AWS image available on the AWS marketplace.
We have an Azure image available on the Azure marketplace.
We have a GCP image available on the GCP marketplace.
Our most significant CDN feature in Varnish Enterprise is the Massive Storage Engine. It was specifically built to counter the limitations of the file stevedore, which is prone to disk fragmentation and is not persistent.
There's a lot of other cool stuff in Varnish Enterprise for CDN as well, but you'll find that on the docs pages I referred to.

Weblogic vs Apache load balancer

In our typical production environment, the Apache web server works as a proxy to our application server, e.g. WebLogic. I have a question about load balancing. Both Apache and WebLogic provide their own load-balancing functionality. If Apache can balance the load, what is the use of the WebLogic load balancer?
As mentioned in the Oracle doc Load Balancing, there are many ways of doing load balancing for WebLogic. If you already have an Apache web server, it is better to use that instead of having WebLogic do the load balancing. The load balancer should typically sit outside the JVM, because should traffic spike, WebLogic must keep resources in reserve for those incidents. Apache does load balancing very easily, whereas WebLogic requires more effort, as it is an additional feature. It's basically like a boat in water versus a car that can also float (the car being WebLogic).

Improving a Web App's Performance

My web app, an exploded WAR, is hosted by Apache (static content) and Tomcat (dynamic content) via mod_jk. Optionally, there's an ActiveMQ component of this system, but it's currently not being used.
As I understand it, each HTTP request hits Apache first. If it's a request for dynamic content, Apache forwards it to Tomcat via mod_jk. To fulfill the request, Tomcat starts a new thread to do the work.
I'm running the app on a 6-core, 12 GB RAM machine.
Besides using the ActiveMQ component, how can I improve my system's performance? Also, please correct me if I'm misstating how Apache and Tomcat communicate.
while (unhappyWithSitePerformance) {
    executeLoadTest();
    identifyBiggestBottleneck(); // e.g. what breaks first
    fixIdentifiedBottleneck();
}
There is no silver bullet to offer. You should make sure your load test simulates realistic user behaviour, and define the number of (virtual) users you want your server to handle within a given response time. Then tune your server until your goal is met.
Common parameters to look for are:
memory consumption
CPU consumption (e.g. certain algorithms)
I/O saturation - e.g. communication with the database, or general HTTP traffic saturating the network adapter
database or backend response time - sometimes you'll have to tune the backend, not the web server itself.

What's the most scalable and high performing Amazon Web Service (AWS) configuration for a RESTful web service?

I'm building an asynchronous RESTful web service and I'm trying to figure out what the most scalable and high performing solution is. Originally, I planned to use the FriendFeed configuration, using one machine running nginx to host static content, act as a load balancer, and act as a reverse proxy to four machines running the Tornado web server for dynamic content. It's recommended to run nginx on a quad-core machine and each Tornado server on a single core machine. Amazon Web Services (AWS) seems to be the most economical and flexible hosting provider, so here are my questions:
1a.) On AWS, I can only find c1.medium (dual core CPU and 1.7 GB memory) instance types. So does this mean I should have one nginx instance running on c1.medium and two Tornado servers on m1.small (single core CPU and 1.7 GB memory) instances?
1b.) If I needed to scale up, how would I chain these three instances to another three instances in the same configuration?
2a.) It makes more sense to host static content in an S3 bucket. Would nginx still be hosting these files?
2b.) If not, would performance suffer from not having nginx host them?
2c.) If nginx won't be hosting the static content, it's really only acting as a load balancer. There's a great paper here that compares the performance of different cloud configurations, and says this about load balancers: "Both HaProxy and Nginx forward traffic at layer 7, so they are less scalable because of SSL termination and SSL renegotiation. In comparison, Rock forwards traffic at layer 4 without the SSL processing overhead." Would you recommend replacing nginx as a load balancer by one that operates on layer 4, or is Amazon's Elastic Load Balancer sufficiently high performing?
1a) Nginx is an asynchronous, event-based server; even a single worker can handle lots of simultaneous connections (max_clients = worker_processes * worker_connections / 4; see the nginx docs) and still perform well. I myself tested around 20K simultaneous connections on a c1.medium-class box (not in AWS). Here you would set workers to two (one per CPU) and run the 4 backends behind it (you can even test with more to see where it breaks); a sketch follows. Only if this gives you problems should you go for one more similar setup and chain the two via an Elastic Load Balancer.
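A minimal sketch of that configuration (the Tornado hostnames and ports are placeholders, not from the original answer):

# Sketch: two nginx workers (one per CPU on a c1.medium) proxying to
# four Tornado backends.
worker_processes 2;

events {
    worker_connections 1024;
}

http {
    upstream tornado_backend {
        server tornado1:8000;  # hostnames/ports are example values
        server tornado2:8000;
        server tornado3:8000;
        server tornado4:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://tornado_backend;
            proxy_set_header Host $host;
        }
    }
}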
1b) As said in (1a), use an Elastic Load Balancer. Somebody tested ELB at 20K reqs/sec, and that is not the limit; he gave up because they lost interest.
2a) Host static content on CloudFront; it's a CDN and meant for exactly this (cheaper and faster than S3, and it can pull content from an S3 bucket or your own server). It's highly scalable.
2b) Obviously, with nginx serving static files it has to serve more requests to the same number of users. Taking that load away reduces the work of accepting connections and sending files across (and lowers bandwidth usage).
2c) Avoiding nginx altogether looks like a good solution (one less middleman). An Elastic Load Balancer will handle SSL termination and reduce the SSL load on your backend servers (this will improve backend performance). The experiments above showed around 20K reqs/sec, and since it's elastic it should stretch further than a software LB (see this nice document on how it works).