IIS, APACHE, YAWS runtime environment - apache

I recently read an article describing the potential of the YAWS server and the number of requests it can process per second. It said that YAWS can handle 80K requests per second and that it also runs in a multi-threaded environment to raise its request-processing limit.
How do IIS and Apache compare with YAWS? Which one can process the most requests? Are there any published comparisons?

Check this link out: http://www.sics.se/~joe/apachevsyaws.html (Yaws vs Apache)
There you can see that Yaws handles 80,000 concurrent connections (and keeps going), while Apache fails at around 4,000. This is because Yaws runs on the Erlang/OTP VM. Erlang processes are created and scheduled by the VM rather than by the operating system, which makes them very cheap, and Erlang has been designed for concurrent programming. In fact, other Erlang web servers and toolkits such as mochiweb and webmachine also handle many concurrent requests far better than Apache. Yaws scales better than any web server I know of today. With the ability to create appmods, you can write Erlang applications that communicate over HTTP and make use of the power of Yaws.
The Yaws home page is http://yaws.hyber.org/. Yaws gets its power from OTP (Open Telecom Platform). This set of libraries, found at http://erlang.org/, provides well-established design patterns such as fail-over systems, supervision trees, finite state machines and event handlers. You should seriously consider Erlang for your next web application!


Apache mod_wsgi slowloris DoS protection

Assuming the following setup:
Apache server 2.4
mpm_prefork with default settings (256 workers?)
Default Timeout (300s)
High KeepAliveTimeout (100s)
mod_reqtimeout enabled with the following config: RequestReadTimeout header=62,MinRate=500 body=62,MinRate=500
Outdated mod_wsgi 3.5 using Daemon mode with 15 threads and 1 process
AWS ElasticBeanstalk's load balancer acting as a reverse proxy to apache with 60s idle connection timeout
Python/Django being the wsgi application
A simple slowloris attack like the one described here, using a "slow" request body: https://www.blackmoreops.com/2015/06/07/attack-website-using-slowhttptest-in-kali-linux/
The above attack, with just 15 requests (same as mod_wsgi threads) can easily lock the server until a timeout happens, either due to:
Load balancer timeout (60s) happens due to no data sent, this kills the apache connection and mod_wsgi can once again serve requests
Apache RequestReadTimeout happens due to data being sent, but not enough, again mod_wsgi is able to serve requests after this
However, with just 15 concurrent "slow" requests, I was able to lock the server for up to 60 seconds.
Repeating the same with a much larger number, like 4096 requests, pretty much locks the server permanently, since there will always be a new slow request waiting to be served by mod_wsgi as soon as the previous one times out.
I would expect that the load balancer should handle/detect this before even sending requests to apache, which it already does for similar attacks (partial headers, or tcp syn flood attacks never hit apache which is nice)
What options are available to help against this? I know there's no foolproof option since these kinds of attacks are difficult to detect and protect against, but it's quite silly that the server can be locked that easily.
Also, if the WSGI application never reads the request body, I would expect the issue not to happen at all, since the request should return immediately. I'm not sure about this or about the internals of mod_wsgi, though. It holds when using a local dev WSGI server (the attack fails since the request body is never read), but the attack succeeds when using mod_wsgi, which leads me to think it tries to read the body before handing the request to the WSGI code.
Slowloris is a very simple Denial-of-Service attack. This is easy to detect and block.
Detecting and preventing DoS and DDoS attacks are complex topics with many solutions. In your case you are making the situation worse by using outdated software and picking a low worker thread count, so the problem arises quickly.
A combination of services is available to manage DoS and DDoS attacks.
The front end of the whole system would be protected by a firewall. Typically this would include a Web Application Firewall that understands the nuances of HTTP. In the AWS world, AWS WAF and AWS Shield are commonly used.
Another service that helps is a CDN. Amazon CloudFront is protected by AWS Shield, so it has good DDoS support.
The next step is to combine load balancers with auto scaling mechanisms. When the health checks start to fail (caused by Slowloris), the auto scaler will begin launching new instances and terminating failed instances. However, a sustained Slowloris attack will just hit the new servers. This is why the Web Application Firewall needs to detect the attack and start blocking it.
For your studies, take a look at mod_reqtimeout. This is an effective and tuneable solution for Apache for most Slowloris attacks.
[Update]
In the Amazon DDoS White Paper June 2015, Slowloris is specifically mentioned.
On AWS, you can use Amazon CloudFront and AWS WAF to defend your
application against these attacks. Amazon CloudFront allows you to
cache static content and serve it from AWS Edge Locations that can
help reduce the load on your origin. Additionally, Amazon CloudFront
can automatically close connections from slow-reading or slow-writing
attackers (e.g., Slowloris).
Amazon DDoS White Paper June 2015
In mod_wsgi daemon mode there are a number of options that help combat such attacks by recovering from them and by discarding queued requests that have been waiting too long. Try your tests using mod_wsgi-express, as it defines defaults for a lot of these options, whereas there are no defaults when you configure mod_wsgi directly. Use mod_wsgi-express start-server --help to see what the defaults are. The options to look at for mod_wsgi daemon mode are request-timeout, connect-timeout, socket-timeout and queue-timeout. There are also other options related to buffer sizes and listener backlog that you can play with. Do note that the listen backlog of the main Apache worker processes can ultimately still be an issue, because it usually defaults to around 500. That means a lot of requests can queue up and get stuck before you can even tag them with a time, and that timestamp is what allows the backlog to be discarded by tracking queue time.
You can find the documentation at:
http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
On the question of whether mod_wsgi reads the request body before passing the request on: no, it doesn't. Because Apache reads in blocks, it may partially read the request body while reading the headers, but it shouldn't block on it. Once the full request headers have been handed to mod_wsgi and sent through to the daemon process, mod_wsgi starts transferring the request body.
Solution:
If you are getting hit, I recommend you go to a provider that protects against DDoS attacks. However, your best bet would be to programmatically block an IP once it has been judged malicious. If you receive two POST requests with a large Content-Length from the same IP, then you should block that IP for a few minutes for suspicious activity. Many large providers are very cheap, and some of them, such as Cloudflare, are free for the basic package. I use them for my company and I am beyond happy to have them!
Edit: Their job is literally just to protect you. That is it.
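As a rough illustration of the "block the IP after two large POSTs" idea, here is a minimal in-memory sketch. The class name SlowPostGuard, the thresholds and the window sizes are all hypothetical and not taken from any library or from the answer itself. A real deployment would more likely do this at the proxy or WAF layer.

```python
import time
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
LARGE_BODY_BYTES = 1_000_000   # what counts as a "large" Content-Length
MAX_LARGE_POSTS = 2            # the Nth large POST inside the window triggers a block
WINDOW_SECONDS = 60            # how far back to look
BLOCK_SECONDS = 300            # "a few minutes"


class SlowPostGuard:
    """Tracks large POST requests per IP and temporarily blocks repeat offenders."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._large_posts = defaultdict(list)  # ip -> timestamps of large POSTs
        self._blocked_until = {}               # ip -> time the block expires

    def is_blocked(self, ip):
        until = self._blocked_until.get(ip)
        if until is None:
            return False
        if self._clock() >= until:
            del self._blocked_until[ip]        # block expired
            return False
        return True

    def allow(self, ip, method, content_length):
        """Return True if the request may proceed, False if it is rejected."""
        if self.is_blocked(ip):
            return False
        if method != "POST" or content_length < LARGE_BODY_BYTES:
            return True
        now = self._clock()
        recent = [t for t in self._large_posts[ip] if now - t < WINDOW_SECONDS]
        recent.append(now)
        self._large_posts[ip] = recent
        if len(recent) >= MAX_LARGE_POSTS:
            self._blocked_until[ip] = now + BLOCK_SECONDS
            del self._large_posts[ip]
            return False
        return True
```

The guard would be consulted once per incoming request, before the body is read, for example guard.allow(client_ip, "POST", declared_length). Keeping the state in a dictionary only works for a single process; anything multi-process would need shared storage.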

Apache or nginx? I'd like to understand the basic working flow of Nginx, its advantages and disadvantages

What are the pros and cons of Apache versus nginx, and how do they work internally to maximize resource utilization?
Can I use Apache and Nginx together? If I use only Nginx, what problems might I face?
Apache has some disadvantages, especially when it is used with the PHP module.
Apache's process model (with the prefork MPM, which mod_php setups normally use) is such that each connection uses a separate process. Each process carries all the overhead of PHP and any other modules you may have loaded with it. An Apache process might run a PHP script or serve static content for one request. If the PHP code has a memory leak (which does happen sometimes), the process continues to grow in size. Also, when KeepAlive is enabled, which is usually recommended, that process stays alive for a few seconds after the request, occupying a "slot" that another client might be able to use and helping the server reach its MaxClients sooner.
Nginx is an alternative webserver that normally uses the Linux "epoll" API to process requests in a non-blocking mode. This means that one single process can handle many simultaneous connections. Epoll is an efficient way to tell the single process which connection(s) it needs to deal with and which can wait. Nginx has a goal of solving the "C10k" problem - how to have 10,000 concurrent connections.
This naturally goes hand in hand with php-fpm, the FastCGI Process Manager. Nginx itself does not have PHP built-in. When it receives a request for a PHP script, it makes a call out to php-fpm to run the script, which then returns the result to nginx, which returns it to the client.
This all uses a lot less memory than a similar Apache+mod_php configuration.
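The single-process, non-blocking model described above for nginx can be sketched with Python's standard selectors module, which picks epoll on Linux. This is only an illustration of the readiness-notification idea, not how nginx is implemented; the function names serve_ready and demo are made up, and socket pairs stand in for real client connections.

```python
import selectors
import socket


def serve_ready(sel, timeout=1.0):
    """Handle every connection that is ready right now; return how many were served."""
    served = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)          # does not block: the selector reported it readable
        if data:
            conn.sendall(data.upper())  # trivial stand-in for a response
            served += 1
        else:
            sel.unregister(conn)        # peer closed the connection
            conn.close()
    return served


def demo(num_clients=100):
    sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
    clients, server_sides = [], []
    for _ in range(num_clients):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        clients.append(client)
        server_sides.append(server_side)

    # Only a few clients speak; the idle ones cost no CPU time in the loop.
    active = clients[:3]
    for i, c in enumerate(active):
        c.sendall(b"hello %d" % i)

    served = serve_ready(sel)
    replies = [c.recv(4096) for c in active]

    sel.close()
    for s in clients + server_sides:
        s.close()
    return served, replies
```

One process holds all the connections open, and the selector tells it which few need attention. Idle keep-alive connections just sit in the kernel's interest list instead of each occupying a process.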
There are a couple more huge advantages of php-fpm over mod_php:
It uses different "pools", each of which can run as a separate Linux user. This provides a simple and effective way of isolating websites (for example, if they are run by different customers who should not read each other's code) without the overhead or nastiness of suexec or suphp.
It has a slow log feature where it can dump a PHP stack trace of any script that has been running for greater than X seconds. This can help diagnose slow code issues.
Php-fpm can be run with Apache, and in fact this allows you to take advantage of Apache's more efficient Worker MPM (or Event in Apache 2.4). However, my experience is that configuring it in Apache is significantly more complex than configuring it in nginx, and even with Worker, it is still not quite as efficient as nginx.
Disadvantages of moving to nginx - not many, but things to keep in mind:
It does not support .htaccess files. I think this is a good thing personally as .htaccess files must be parsed by Apache for every request, which can cause significant overhead.
Configuration files need to be re-written. If you have many complex site configurations, this could take some doing. For simple cases it is not usually a big deal.
Features of Nginx
Nginx is fast because it does not need to create a new process for each new request.
HTTP proxy and Web server features
Ability to handle more than 10,000 simultaneous connections with a low memory footprint (~2.5 MB per 10k inactive HTTP keep-alive connections)
Handling of static files, index files, and auto-indexing
Reverse proxy with caching
Load balancing with in-band health checks
Fault tolerance
Nginx uses very little memory, especially for static Web pages.
FastCGI, SCGI, uWSGI support with caching
Name- and IP address-based virtual servers
IPv6-compatible
SPDY protocol support
FLV and MP4 streaming
Web page access authentication
gzip compression and decompression
URL rewriting having its own rewrite engine
Custom logging with on-the-fly gzip compression
Response rate and concurrent requests limiting
Bandwidth throttling
Server Side Includes
IP address-based geolocation
User tracking
WebDAV
XSLT data processing
Embedded Perl scripting
Nginx is highly scalable and performs well even on modest hardware.
With only Nginx, you lose a whole bunch of Apache-specific features such as all the mod_dav stuff. Effectively, you lose a lot of modules.
Conclusion
The best use for nginx is in front of Apache if you need Apache modules. Use it as a load-balancer if you might, between multiple Apache instances, and you suddenly have a mixed set-up that is rather

The Node.js event loop - nginx/apache

Both nginx and Node.js have event loops to handle requests. I put nginx in front of Node.js, as recommended in "Using Node.js only vs. using Node.js with Apache/Nginx", with the setup shown in "Node.js + Nginx - What now?".
How do the two event loops play together? Is there any risk of conflicts between the two? I wonder because Nginx may not be able to handle as many events per second as Node.js or vice versa. For example, if Nginx can handle 1000 events per second but node.js only 500, won't that cause issues? (I have no idea if 1000,500 are reasonable orders of magnitude, you could correct me on that.)
What about putting Apache in front of Node.js? Apache has no event loop. Just threads. So won't putting Apache in front of Node.js defeat the purpose?
In this 2010 talk, Node.js creator Ryan Dahl had a vision of getting rid of nginx/apache/whatever entirely and making node talk directly to the internet. When do you think this will become reality?
Both nginx and Node use an asynchronous and event-driven approach. The communication between them will go more or less like this:
nginx receives a request
nginx forwards the request to the Node process and immediately goes back to wait for more requests
Node receives the request from nginx
Node handles the request with minimal CPU usage, until at some point it needs to issue one or more I/O requests (read from a database, write the response, etc). At this point it launches all these I/O requests and goes back to wait for more requests.
The above can repeat lots of times. You could have hundreds of thousands of requests all in a non-blocking wait state where nginx is waiting for Node and Node is waiting for I/O. And while this happens both nginx and Node are ready to accept even more requests!
Eventually async I/O started by the Node process will complete and a callback function will get invoked.
If there are still I/O requests that haven't completed for this request, then Node goes back to its loop one more time. It can also happen that once an I/O operation completes this data is consumed by the Node callback and then new I/O needs to happen, so Node can start more async I/O requests before going back to the loop.
Eventually all I/O operations started by Node for a particular request will be complete, including those that write the response back to nginx. So Node ends this request, and then as always goes back to its loop.
nginx receives an event indicating that response data has arrived for a request, so it takes that data and writes it back to the client, once again in a non-blocking fashion. When the response has been written to the client, an event will trigger and nginx will then end the request.
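The flow in the steps above can be imitated in a few lines with Python's asyncio. This is only a sketch of the idea: both roles run in one process on one event loop here, whereas nginx and Node are separate processes with separate loops, and the function names and the 0.1 second "I/O" delay are made up for illustration.

```python
import asyncio
import time


async def backend_handle(request_id):
    """Stands in for Node: start the I/O, yield to the loop, resume when it completes."""
    await asyncio.sleep(0.1)            # simulated database or network I/O
    return f"response {request_id}"


async def proxy_handle(request_id):
    """Stands in for nginx: forward the request, then relay the response."""
    return await backend_handle(request_id)


async def main(num_requests=1000):
    start = time.perf_counter()
    # All requests are in flight at once; each one is just parked while it waits.
    responses = await asyncio.gather(*(proxy_handle(i) for i in range(num_requests)))
    elapsed = time.perf_counter() - start
    return responses, elapsed
```

Running asyncio.run(main()) handles all the requests in roughly the time of one simulated I/O wait rather than a thousand of them, because nothing ever sits blocked while waiting.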
You are asking what would happen if nginx and Node can handle different maximum numbers of connections. They don't really have a maximum of their own. In general the limit comes from operating system configuration, for example the maximum number of open handles the system can have at a time, or from CPU throughput. So your question does not really apply. If the system is configured correctly and all processes are I/O bound, neither nginx nor Node will ever block.
Putting Apache in front of Node will only work well if you can guarantee that your Apache never blocks (i.e., it never reaches its maximum connection limit). This is hard or impossible to achieve for a large number of connections, because Apache uses an individual process or thread for each connection. nginx and Node scale really well; Apache does not.
Running Node without another server in front works fine and it should be okay for small/medium load sites. The reason putting a web server in front of it is preferred is that web servers like nginx come with features that Node does not have and you would need to implement yourself. Things like caching, load balancing, running multiple apps from the same server, etc.
I think your questions have been largely covered by some of the other answers, but there are a few pieces missing, and some that I disagree with, so here are mine:
The event loops are isolated from each other at the process level, but do interact. The issues you're most likely to encounter are around the configuration of nginx response buffers, chunked data, etc. but this is optimisation rather than error resolution.
As you point out, if you use Apache you're nullifying the benefit of using Node.js, i.e. massive concurrency and websockets. I wouldn't recommend doing that.
People are already using Node.js at the front of their stack. Searching for benchmarks returns some reasonable-looking results in Node's favour, so performance to my mind isn't an issue. However, there are still reasons to put Nginx in front of Node.
Security - Node has been given increasing scrutiny, but it's still young. You may not have problems here, but caution is often your friend.
Training - Ops staff that you hire will know how to manage Nginx, but the configuration and management of your custom Node app will only ever be understood by those people your developers successfully communicate it to. In some companies this is nobody.
Operational Flexibility - If you reach scale you might want to split out the serving of static content, purely to reduce the load on your app servers. You might want to split content amongst different domains and have it managed separately, or have different SSL or proxying behaviour for different domains or URL patterns. These are the things that are easy for Ops guys to configure in Nginx, but you'd have to code manually in a Node app.
The event loops are independent. Event loops are implemented at the application level, so neither cares what sort of architecture the other uses.
NodeJS is good at many things, but there are some places where it still falters. One example is serving static files. At the moment, nodejs performs fairly poorly at this, so having a dedicated web server for your static files greatly improves response time. Also, nodejs is still in its infancy, and has not been "tested and hardened" in matters of security like Apache or nginx.
It'll take a long time for people to consider fronting nodejs all by itself. The cluster module is a step in the right direction, but it'll take a long time even after it reaches v1 before it happens.
Both event loops are unrelated. They don't play together.
Yes, it is pretty useless. Apache is not a load balancer.
What Ryan Dahl said may be applicable already. Node's limit on concurrent users is definitely higher than Apache's. Before Node.js, websites with a fair number of concurrent users had to use nginx to balance the load. For small to medium-sized businesses it can now be done with Node.js alone. But ruling out nginx completely will take time. Let Node.js become stable before it pursues this ambitious dream.

What specifically makes Node.js more scalable than Apache?

To be honest I haven't understood it completely yet, even though I do understand how Node.js works, as a single thread using the event model. I just don't get how this is better than Apache, and how it scales horizontally if it's single-threaded.
I've found that this blog post by Tomislav Capan explains it very well:
Why The Hell Would I Use Node.js? A Case-by-Case Introduction
My interpretation of the gist of it, for Node 0.10, compared to Apache:
The good parts
Node.js avoids spinning up a thread for each request, and it does not need to distribute requests across a pool of threads the way Apache does. Therefore it has less overhead per request, and excels at responding quickly.
Node.js can delegate execution of the request to a separate component, and focus on new requests until the delegated component returns with the processed result. This is asynchronous code, and is made possible by the eventing model. Apache executes requests in serial within a pool, and cannot reuse the thread when one of its modules is simply waiting for a task to complete. Apache will then queue requests until a thread in the pool becomes available again.
Node.js talks JavaScript and is therefore very fast in passing through and manipulating JSON retrieved from external web API sources like MongoDB, reducing time needed per request. Apache modules, like PHP, may need more time, because they cannot efficiently parse and manipulate JSON because they need marshalling to process the data.
The bad parts
Note: most of the bad parts listed below will be improved with the upcoming version 0.12, something to keep aware of.
Node.js sucks at computationally intensive tasks, because whenever it does something long-running, all other incoming requests are queued behind it, due to its single thread. Apache will generally have more threads available, and the OS will neatly and fairly schedule CPU time between these threads, still allowing new requests to be handled, albeit a bit slower. The exception is when all available threads in Apache are busy handling requests; then Apache will also start queueing requests.
Node.js doesn't fully utilize multi-core CPUs unless you set up a Node.js cluster or spin up child processes. Ironically, if you do either of those, you may add more orchestration overhead, the same issue that Apache has. Logically you could also spin up more Node.js processes yourself, but this is not managed by Node.js. You would have to test your code to see what works better: 1) multiple processes managed from within Node.js with cluster and child processes, or 2) multiple independent Node.js processes.
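The first point, that a long computation on a single-threaded event loop delays every other request, is easy to reproduce. The sketch below uses Python's asyncio as a stand-in for the Node.js event loop. The function names and the 0.2 second busy loop are invented for the demonstration, and the thread pool is just one way of getting work off the loop (in CPython a process pool would be needed for real CPU parallelism).

```python
import asyncio
import time


def busy_work(seconds):
    """CPU-bound stand-in: spins without ever yielding to the event loop."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


async def quick_request():
    """A cheap request; returns how long it had to wait for one trip through the loop."""
    start = time.perf_counter()
    await asyncio.sleep(0)
    return time.perf_counter() - start


async def blocking_server():
    waiter = asyncio.create_task(quick_request())
    await asyncio.sleep(0)              # let the quick request get started
    busy_work(0.2)                      # runs on the loop thread: everything else stalls
    return await waiter


async def offloading_server():
    loop = asyncio.get_running_loop()
    waiter = asyncio.create_task(quick_request())
    await asyncio.sleep(0)
    await loop.run_in_executor(None, busy_work, 0.2)  # default thread pool; loop stays free
    return await waiter
```

Comparing asyncio.run(blocking_server()) with asyncio.run(offloading_server()) shows the quick request waiting for the whole computation in the first case and returning almost immediately in the second.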
Mitigations
All server platforms have an upper limit. Node.js and Apache both will reach it at some point.
Node.js will reach it the fastest when you have heavy computational tasks.
Apache will reach it the fastest when you throw tons of small requests at it that require long serial execution.
Three things you could do to scale the throughput of Node.js
Utilize multi-core CPUs, by setting up a cluster, using child processes, or using a multi-process orchestrator like Phusion Passenger.
Set up worker roles connected with a message queue. This will be the most effective solution against computationally intensive, long-running requests: off-load them to a worker farm. This splits your servers into two parts: 1) public-facing clerical servers that accept requests from users, and 2) private worker servers handling long-running tasks. Both are connected with a message queue. The clerical servers add messages (incoming long-running requests) to the queue. The worker roles listen for incoming messages, handle those, and may return the result into the message queue. If request/response is needed, then the clerical server could asynchronously wait for the response message to arrive in the message queue. Examples of message queues are RabbitMQ and ZeroMQ.
Set up a load balancer and spin up more servers. Now that you use hardware efficiently and delegate long-running tasks, you can scale horizontally. If you have a load balancer, you can add more clerical servers. Using a message queue, you can add more worker servers. You could even set this up in the cloud so that you could scale on demand.
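The clerical-server / worker-farm split from the second point can be sketched in-process with Python's standard queue module. This is only a toy: queue.Queue stands in for a real broker such as RabbitMQ, threads stand in for separate worker servers, and the names worker and run_farm as well as the sum-of-squares "job" are invented for illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Private worker role: pull long-running jobs off the queue and post results."""
    while True:
        job = tasks.get()
        if job is None:                 # sentinel: no more work, shut down
            break
        job_id, n = job
        results.put((job_id, sum(i * i for i in range(n))))  # stand-in for heavy work


def run_farm(jobs, num_workers=4):
    """Clerical side: enqueue jobs, then collect the replies keyed by job id."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)                  # incoming long-running requests
    for _ in threads:
        tasks.put(None)                 # one shutdown sentinel per worker
    for t in threads:
        t.join()

    replies = {}
    while not results.empty():
        job_id, value = results.get()
        replies[job_id] = value
    return replies
```

For example, run_farm([(1, 4), (2, 3)]) returns {1: 14, 2: 5}. The point of the design is that the side accepting requests never does the heavy work itself, so adding capacity means adding workers behind the queue.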
It depends on how you use it. Node.js is single-threaded by default, but using the (relatively) new cluster module you can spread the load across multiple processes.
Furthermore, your database needs will also dictate how effective scaling is with node. For example, using MySQL with node.js won't get you nearly as much benefit as using MongoDB, because of the event driven nature of both MongoDB and node.js.
The following link has a lot of nice benchmarks of systems with different setups:
http://www.techempower.com/benchmarks/
Node.js doesn't rank the highest but compared to other setups using nginx (no apache on their tables, but close enough) it does pretty well.
Again though, it highly depends on your needs. I believe that if you are simply serving static websites, it is recommended that you stick with a more traditional stack. However, people have done some amazing things with node.js for other needs: http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/ (c10k? ha!)
Edit: It is worth mentioning that you really aren't 'replacing' just Apache with node.js. You would be replacing Apache AND PHP (in a typical LAMP stack).

Low latency web server/load balancer for the non-Twitters of the world

Apache httpd has done me well over the years, just rock solid and highly performant in a legacy custom LAMP stack application I've been maintaining (read: trying to escape from)
My LAMP stack days are now numbered and I am moving on to the wonderful world of polyglot:
1) Scala REST framework on Jetty 8 (on the fence between Spray & Scalatra)
2) Load balancer/Static file server: Apache Httpd, Nginx, or ?
3) MySQL via ScalaQuery
4) Client-side: jQuery, Backbone, 320 & up or Twitter Bootstrap
Option #2 is the focus of this question. The benchmarks I have seen indicate that Nginx, Lighttpd, G-WAN (in particular) and friends blow away Apache in terms of performance, but this blowing away appears to manifest more in high-load scenarios where the web server is handling many simultaneous connections. Given that our server does at most 100 GB of bandwidth per month and average load is around 0.10, the high-load scenario is clearly not at play.
Basically I need the connection to the application server (Jetty) and static file delivery by the web server to be both reliable and fast. Finally, the web server should double duty as a load balancer for the application server (SSL not required, server lives behind an ASA). I am not sure how fast Apache Httpd is compared to the alternatives, but it's proven, road warrior tested software.
So, if I roll with Nginx or other Apache alternative, will there be any difference whatsoever in terms of visible performance? I assume not, but in the interest of achieving near instant page loads, putting the question out there ;-)
if I roll with Nginx or other Apache alternative, will there be any difference whatsoever in terms of visible performance?
Yes, mostly in terms of latency.
According to Google (who might know a thing or two about latency), latency is important for the user experience, for high search-engine rankings, and for surviving high loads (success, script kiddies, real attacks, etc.).
But scaling on multicore and/or using less RAM and CPU resources cannot hurt - and that's the purpose of these Web server alternatives.
The benchmarks I have seen indicate that Nginx, Lighttpd, G-WAN (in particular) and friends blow away Apache in terms of performance, but this blowing away appears to manifest more in high-load scenarios where the web server is handling many simultaneous connections
The benchmarks show that even at low numbers of clients, some servers are faster than others. The comparison in question covers Apache 2.4, Nginx, Lighttpd, Varnish, Litespeed, Cherokee and G-WAN.
Since this test has been made by someone independent from the authors of those servers, these tests (made with virtualization and 1,2,4,8 CPU Cores) have clear value.
There will be a massive difference. Nginx wipes the floor with Apache for anything over zero concurrent users. That's assuming you properly configure everything. Check out the following links for some help diving into it.
http://wiki.nginx.org/Main
http://michael.lustfield.net/content/dummies-guide-nginx
http://blog.martinfjordvald.com/2010/07/nginx-primer/
You'll see improvements in terms of requests/second, but you'll also see significantly less RAM and CPU usage. One thing I like is the greater control over what's going on with a simpler configuration.
Apache made a claim that apache 2.4 will offer performance as good or better than nginx. They made a bold claim calling out nginx and when they made that release it kinda bit them in the ass. They're closer, sure, but nginx still wipes the floor in almost every single benchmark.