What exactly is a pre-fork web server model? - apache

I want to know what exactly it means when a web server describes itself as a pre-fork web server. I have a few examples such as unicorn for ruby and gunicorn for python.
More specifically, these are the questions:
What problem does this model solve?
What happens when a pre-fork web server is initially started?
How does it handle requests?
Also, a more specific question for unicorn/gunicorn:
Let's say that I have a webapp that I want to run with (g)unicorn. On initialization, the webapp will do some initialization stuff (e.g. fill in additional database entries). If I configure (g)unicorn with multiple workers, will the initialization stuff be run multiple times?

Pre-forking basically means a master creates forks which handle each request. A fork is a completely separate *nix process.
Update as per the comments below. The pre in pre-fork means that these processes are forked before a request comes in. They can however usually be increased or decreased as the load goes up and down.
Pre-forking can be used when you have libraries that are NOT thread safe. It also means issues within a request causing problems will only affect the process which they are processed by and not the entire server.
The initialisation running multiple times all depends on what you are deploying. Usually however connection pools and stuff of that nature would exist for each process.
In a threading model the master would create lighter weight threads to dispatch requests too. But if a thread causes massive issues it could have repercussions for the master process.
With tools such as Nginx, Apache 2.4's Event MPM, or gevent (which can be used with Gunicorn) these are asynchronous meaning a process can handle hundreds of requests whilst not blocking.

How does a "pre-fork worker model" work?
Master Process: There is a master process that spawns and kills workers, depending on the load and the capacity of the hardware. More incoming requests would cause the master to spawn more workers, up to a point where the "hardware limit" (e.g. all CPUs saturated) is reached, at which point queing will set in.
Workers: A worker can be understood as an instance of your application/server. So if there are 4 workers, your server is booted 4 times. It means it occupies 4 times the "Base-RAM" than only one worker would, unless you do shared memory wizardry.
Initialization: Your initialization logic needs to be stable enough to account for multiple servers. For example, if you write db entries, check if they are there already or add a setup job before your app server
Pre-fork: The "pre" in prefork means that the master always adds a bit more capacity than currently required, such that if the load goes up the system is "already ready". So it preemptively spawns some workers. For example in this apache library, you control this with the MinSpareServers property.
Requests: The requests (TCP connection handles) are being passed from the master process to the children.
What problem do pre-fork servers solve?
Multiprocessing: If you have a program that can only target one CPU core, you potentially waste some of your hardware's capacity by only spawning one server. The forked workers tackle this problem.
Stability: When one worker crashes, the master process isn't affected. It can just spawn a new worker.
Thread safety: Since it's really like your server is booted multiple times, in separate processes, you don't need to worry about threadsafety (since there are no threads). This means it's an appropriate model when you have non-threadsafe code or use non-threadsafe libs.
Speed: Since the child processes aren't forked (spawned) right when needed, but pre-emptively, the server can always respond fast.
Alternatives and Sidenotes
Container orchestration: If you're familiar with containerization and container orchestration tools such as kubernetes, you'll notice that many of the problems are solved by those as well. Kubernetes spawns multiple pods for multiprocessing, it has the same (or better) stability and things like "horizontal pod autoscalers" that also spawn and kill workers.
Threading: A server may spawn a thread for each incoming request, which allows for many requests being handled "simultaneously". This is the default for most web servers based on Java, since Java natively has good support for threads. Good support meaning the threads run truly parallel, on different cpu cores. Python's threads on the other hand cannot truly parallelize (=spread work to multiple cores) due to the GIL (Global Interpreter Lock), they only provide a means for contex switching. More on that here. That's why for python servers "pre-forkers" like gunicorn are so popular, and people coming from Java might have never heard of such a thing before.
Async / non-blocking processing: If your servers spend a lot of time "waiting", for example disk I/O, http requests to external services or database requests, then multiprocessing might not be what you want. Instead consider making your code "non-blocking", meaning that it can handle many requests concurrently. Async / await (coroutines) based systems like fastapi (asgi server) in python, Go or nodejs use this mechanism, such that even one server can handle many requests concurrently.
CPU bound tasks: If you have CPU bound tasks, the non-blocking processing mentioned above won't help much. Then you'll need some way of multiprocessing to distribute the load on your CPU cores, as the solutions mentioned above, that is: container orchestration, threading (on systems that allow true parallelization) or... pre-forked workers.
Sources
https://www.reddit.com/r/learnprogramming/comments/25vdm8/what_is_a_prefork_worker_model_for_a_server/
https://httpd.apache.org/docs/2.4/mod/prefork.html

Related

Failed Deployment in App Engine Google Cloud

I am deploying my nodejs application in google cloud app engine but it is giving error
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical request for your application. -- when making request.
I had also saw some stackoverflow answers, but they didn't worked for me.
my app.yaml have this config
runtime: nodejs10
Can anyone help me out
You could add the following to your app.yaml:
inbound_services:
- warmup
And then implement a handler that will catch all warmup requests, so that your application doesn't get the full load. The full explanation is given here. Another detailed post about this topic can be found here.
Additionally you can also add automatic scaling options. You can play a bit with those to find the optimum for your application. Especially the latency related variables are important. Good to note that they can be set in a standard GAE environment.
automatic_scaling:
min_idle_instances: automatic
max_idle_instances: automatic
min_pending_latency: automatic
max_pending_latency: automatic
More scaling options can be found here.
The "request caused a new process to be started" notification usually occurred when there is no warm up request present in your application.
Can you try to implement a health check handler that only returns a ready status when the application is warmed up. This will allow your service to not receive traffic until it is ready.
Warning: Legacy health checks using the /_ah/health path are now
deprecated, and you should migrate to use split health checks.
Here you can find Split health checks for Nodejs
Liveness checks
Liveness checks confirm that the VM and the Docker container are
running. Instances that are deemed unhealthy are restarted.
path: "/liveness_check"
check_interval_sec: 30
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
Readiness checks
Readiness checks confirm that an instance can accept incoming
requests. Instances that don't pass the readiness check are not added
to the pool of available instances.
path: "/readiness_check"
check_interval_sec: 5
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
app_start_timeout_sec: 300
Edit
For App Engine Standard, which doesn't afford you that flexibility, hardware and software failures that cause early termination or frequent restarts can occur without prior warning. link
App Engine attempts to keep manual and basic scaling instances running
indefinitely. However, at this time there is no guaranteed uptime for
manual and basic scaling instances. Hardware and software failures
that cause early termination or frequent restarts can occur without
prior warning and can take considerable time to resolve; thus, you
should construct your application in a way that tolerates these
failures.
Here are some good strategies for avoiding downtime due to instance
restarts:
Reduce the amount of time it takes for your instances restart or for
new ones to start.
For long-running computations, periodically create
checkpoints so that you can resume from that state.
Your app should be "stateless" so that nothing is stored on the instance.
Use queues for performing asynchronous task execution.
If you configure your instances to manual scaling: Use load balancing across > multiple instances. Configure more instances than required to handle normal
traffic. Write fall-back logic that uses cached results when a manual
scaling instance is unavailable.
Instance Uptime

NServiceBus Outbox with RavenDB persistence clean up / DeduplicationDataCleanup design

I'm looking into an issue where we're seeing CPU usage maxed out on a RavenDB instance using NServiceBus outbox implementation.
The design currently has all outbox workers configuring the deduplicationdatacleanup settings in their start up configuration. i.e. these settings:
endpointConfiguration.SetTimeToKeepDeduplicationData(TimeSpan.FromDays(7));
endpointConfiguration.SetFrequencyToRunDeduplicationDataCleanup(TimeSpan.FromMinutes(1));
If you've got multiple workers that are processing messages for the system, should the cleanup be implemented on each of those, or should it be treated like a cronjob where you run the clean up process on just one of those workers, or a dedicated system in the environment that is not a worker, but more of a utility role?
I would imagine the latter, otherwise if you scale out workers, all of them are going to be trying to run the cleanup process every minute, or am I understanding the way this configuration executes the clean up incorrectly?
Thanks

Handling Http Request with Apache2 (or Nginx). Does a new process gets created for each or a set of N requests?

Will a web server (WS) (like apache2 or nginix (or container like tomcat(TC)) create a new process to handle incoming request. My concern is about servers that support high number of parallel users (say 20K+ parallel users).
I think load balancing happens on the other side of web server (if it is used to front Tomcat etc). So in theory, a single web server should be accepting all the (20K+)incoming request before it can distribute the load to other servers backing it.
So, the questions is: Does Web Server (WS) handle all these requests in a single process or it smartly spawns other process to help share the work (i know the "client - server" binding happens though - client_host:random_port plus server_host:fixed_port).
Reference: Prior to reading this article:Fronting Tomcat with Apache I was thinking it is a single process doing all the smart work. But in this article there is mentioning of MPM (Multi-Processing Module)
It combines the best from two worlds, having a set of child processes each having a set of separate threads. There are sites that are running 10K+ concurrent connections using this technology.
And as it goes, it is getting more sophisticated as threads also being spawned like mentioned above. (these are not the tomcat threads that serve each individual request by calling the service method, but these are threads on Apache WS to handle request and distribute them to nodes for processing).
If any one used MPM. Little further explanation of how all this works will be great.
Questions like -
(1) As child processes are spawned what is it exact role. Is the child process just for mediating the request to tomcat or any thing more. If so, then after the child process gets response from TC, does the child process forward the response to parent process or directly to the client (since it can know the client_host:random_port from parent process. I am not sure if this is allowed in theory, though the child process can not accept any new request as the fixed_port which can bind to only one process is already tied to parent process.
(2) What kind of load is shared to thread by the child or parent process. Again it must almost be same as in (1). But what I am not sure is that even in theory if a thread can directly send the request to client.
Apache historically use prefork model of processing. In this model each request == separate operation system (OS) process. It's calling "prefork" because Apache fork some spare processes and process request within. If number of preforked processes not enough - Apache fork new. Pros: process can execute other modules or processes and not care that they do; cons: each request = one process, too much memory used and OS fork also can be slow for your requests.
Other model of Apache - worker MPM. Almost same as prefork, but using not OS processes but OS threads. Thread - it's like lightweight process. One OS process can run many threads using one memory space. Worker MPM used much less memory and new threads created fast. Cons: modules need to support thread, crash of module can crash all threads of all OS process (but this it not important for you because you are using apache as reverse proxy only). Other cons: CPU switching context when switching between threads.
So yes, worker much better than prefork in your case, but...
But we have Nginx :) Nginx using other model (btw, Apache has event MPM too). In this case you has only one process (well, can be few processes, see below). How it works. New request rising special event, OS process waking up, receive request, prepare answer, write answer and gone sleep.
You can say "wow, but this is not multitasking" and will be right. But one big difference between this model and simple sequentially request processing. What happens if you need write big data to slow client? In synchronous way your process need to wait acknowledging about data receiving and only after - process new request. Nginx and Apache event model use asynchronous model. Nginx tell to OS to send some piece of data write this data to OS buffer and... gone sleep, or process new requests. When OS will send piece of data - special event will be sent to nginx. So, main difference - Nginx do not wait I/O (like connect, read, write), Nginx tell to OS that he want and OS send event to Nginx than this task ready (socket connected, data written or new data ready to read in local buffer). Also, modern OS can work asynchronously with HDD (read/write) and even can send files from HDD to tcp socket directly.
Sure, all math operations in this Nginx process will block this process and its stop to process new and existing requests. But when main workflow is work with network (reverse proxy, forward requests to FastCGI or other backend server) plus send static files (asynchronous too) - Nginx can serve thousands simultaneous requests in one OS process! Also, because one process of OS (and one thread) - CPU will execute it in one context.
How I told before - Nginx can start few OS processes and each of this process will be assigned by OS to separate CPU core. Almost no reasons to fork more Nginx OS processes (there is only one reason to do it: if you need to do some blocking operations, but simple reverse proxy with backend balancing - not this case)
So, pros: less CPU context switching, less memory (comparing with worker MPM too), fast connection processing. More pros: Nginx created as HTTP load balancer and have lot of options for it (and even more in commercial Nginx Plus). Cons: If you need some hard math inside OS process, this process will be blocked (but all you math in Tomcat, so Nginx only balancer).
PS: typo fix will come later, out of time. Also, my English bad, so fixes always welcome :)
PPS: Answer question about number of TC thread, asked in comments (was too long for post as comment):
Best way to know it - test it using stress loading tools. Because this number depend on application profile. Response time is not good enough to help answer. Because, for example, big difference between 200ms of 100% math (100% cpu bound) vs 50ms of math + 150ms of sleep waiting database answer.
If application is 100% CPU bound - probably one thread per one core, but in real cases all applications also spent some time in I/O (receive request, send answer to client).
If application work with I/O and need to wait for answers from other services (database, for example), this application spends some time in sleep state and CPU can process other tasks.
So best solution to create number of requests close to real load and run stress test increasing number of concurrent requests (and number of TC workers for sure). Find acceptable response time and fix this number of threads. Sure, need to check before that it is not database fault.
Sure, here I'm talking about dynamic content only, requests for static files from disk must be processed before tomcat (by Nginx, for example).

The Node.js event loop - nginx/apache

Both nginx and Node.js have event loops to handle requests. I put nginx in front of Node.js as has been recommended here
Using Node.js only vs. using Node.js with Apache/Nginx
with the setup shown here
Node.js + Nginx - What now?
How do the two event loops play together? Is there any risk of conflicts between the two? I wonder because Nginx may not be able to handle as many events per second as Node.js or vice versa. For example, if Nginx can handle 1000 events per second but node.js only 500, won't that cause issues? (I have no idea if 1000,500 are reasonable orders of magnitude, you could correct me on that.)
What about putting Apache in front of Node.js? Apache has no event loop. Just threads. So won't putting Apache in front of Node.js defeat the purpose?
In this 2010 talk, Node.js creator Ryan Dahl had vision to get rid of nginx/apache/whatever entirely and make node talk directly to the internet. When do you think this will be reality?
Both nginx and Node use an asynchronous and event-driven approach. The communication between them will go more or less like this:
nginx receives a request
nginx forwards the request to the Node process and immediately goes back to wait for more requests
Node receives the request from nginx
Node handles the request with minimal CPU usage, until at some point it needs to issue one or more I/O requests (read from a database, write the response, etc). At this point it launches all these I/O requests and goes back to wait for more requests.
The above can repeat lots of times. You could have hundreds of thousands of requests all in a non-blocking wait state where nginx is waiting for Node and Node is waiting for I/O. And while this happens both nginx and Node are ready to accept even more requests!
Eventually async I/O started by the Node process will complete and a callback function will get invoked.
If there are still I/O requests that haven't completed for this request, then Node goes back to its loop one more time. It can also happen that once an I/O operation completes this data is consumed by the Node callback and then new I/O needs to happen, so Node can start more async I/O requests before going back to the loop.
Eventually all I/O operations started by Node for a particular request will be complete, including those that write the response back to nginx. So Node ends this request, and then as always goes back to its loop.
nginx receives an event indicating that response data has arrived for a request, so it takes that data and writes it back to the client, once again in a non-blocking fashion. When the response has been written to the client and event will trigger and nginx will then end the request.
You are asking about what would happen if nginx and Node can handle a different number of maximum connections. They really don't have a maximum, the maximum in general comes from operating system configuration, for example from the maximum number of open handles the system can have at a time or the CPU throughput. So your question does not really apply. If the system is configured correctly and all processes are I/O bound, neither nginx or Node will ever block.
Putting Apache in front of Node will only work well if you can guarantee that your Apache never blocks (i.e it never reaches its maximum connection limit). This is hard/impossible to achieve for large number of connections, because Apache uses an individual process or thread for each connection. nginx and Node scale really well, Apache does not.
Running Node without another server in front works fine and it should be okay for small/medium load sites. The reason putting a web server in front of it is preferred is that web servers like nginx come with features that Node does not have and you would need to implement yourself. Things like caching, load balancing, running multiple apps from the same server, etc.
I think your questions have been largely covered by some of the others answers, but there are a few pieces missing, and some that I disagree with, so here are mine:
The event loops are isolated from each other at the process level, but do interact. The issues you're most likely to encounter are around the configuration of nginx response buffers, chunked data, etc. but this is optimisation rather than error resolution.
As you point out, if you use Apache you're nullifying the benefit of using Node.js, i.e. massive concurrency and websockets. I wouldn't recommend doing that.
People are already using Node.js at the front of their stack. Searching for benchmarks returns some reasonable-looking results in Node's favour, so performance to my mind isn't an issue. However, there are still reasons to put Nginx in front of Node.
Security - Node has been given increasing scrutiny, but it's still young. You may not have problems here, but caution is often your friend.
Training - Ops staff that you hire will know how to manage Nginx, but the configuration and management of your custom Node app will only ever be understood by those people your developers successfully communicate it to. In some companies this is nobody.
Operational Flexibility - If you reach scale you might want to split out the serving of static content, purely to reduce the load on your app servers. You might want to split content amongst different domains and have it managed separately, or have different SSL or proxying behaviour for different domains or URL patterns. These are the things that are easy for Ops guys to configure in Nginx, but you'd have to code manually in a Node app.
The event loops are independent. Event loops are implemented at the application level, so neither cares what sort of architecture the other uses.
NodeJS is good at many things, but there are some places where it still falters. Once example is serving static files. At the moment, nodejs performs fairly poorly in this test, so having a dedicated web server for your static files greatly improves response time. Also, nodejs is still in its infancy, and has not been "tested and hardened" in the matters of security like Apache on nginX.
It'll take a long time for people to consider fronting nodejs all by itself. The cluster module is a step in the right direction, but it'll take a long time even after it reaches v1 before it happens.
Both event loops are unrelated. They don't play together.
Yes, it is pretty useless. Apache is not a load balancer.
What Ryan Dahl said may be applicable already. The limit of concurrent users is definitely higher than that of Apache. Before node.js websites with fair amount of concurrent users had to use nginx to balance the load. For small to medium sized businesses it can be done with node.js alone. But ruling out nginx completely will take time. Let node.js be stable before it can follow this ambitious dream.

What specifically makes Node.js more scalable than Apache?

To be honest I've not understood it completely yet - and I even do understand how Node.js works, as a single thread using the event model. I just don't get how this is better than Apache, and how it scales horizontally if it's single-threaded.
I've found that this blog post by Tomislav Capan explains it very well:
Why The Hell Would I Use Node.js? A Case-by-Case Introduction
My interpretation of the gist of it, for Node 0.10, compared to Apache:
The good parts
Node.js avoids spinning up threads for each request, or does not need to handle pooling of requests to a set of threads like Apache does. Therefore it has less overhead to handle requests, and excels at responding quickly.
Node.js can delegate execution of the request to a separate component, and focus on new requests until the delegated component returns with the processed result. This is asynchronous code, and is made possible by the eventing model. Apache executes requests in serial within a pool, and cannot reuse the thread when one of its modules is simply waiting for a task to complete. Apache will then queue requests until a thread in the pool becomes available again.
Node.js talks JavaScript and is therefore very fast in passing through and manipulating JSON retrieved from external web API sources like MongoDB, reducing time needed per request. Apache modules, like PHP, may need more time, because they cannot efficiently parse and manipulate JSON because they need marshalling to process the data.
The bad parts
Note: most of the bad parts listed below will be improved with the upcoming version 0.12, something to keep aware of.
Node.js sucks at computational intensive tasks, because whenever it does something long running, it will queue all other incoming requests, due to its single thread. Apache will generally have more threads available, and the OS will neatly and fairly schedule CPU time between these threads, still allowing new threads to be handled, albeit a bit slower. Except when all available threads in Apache are handling requests, then Apache will also start queueing requests.
Node.js doesn't fully utilize multi-core CPUs, unless you make a Node.js cluster or spin up child processes. Ironically, if you do the latter two, you may add more orchestrating overhead, the same issue that Apache has. Logically you could also spin up more Node.js processes, but this is not managed by Node.js. You would have to test your code to see what works better; 1) multi-threading from within Node.js with clusters and child processes, or 2) multiple Node.js processes.
Mitigations
All server platforms have an upper limit. Node.js and Apache both will reach it at some point.
Node.js will reach it the fastest when you have heavy computational tasks.
Apache will reach it the fastest when you throw tons of small requests at it that require long serial execution.
Three things you could do to scale the throughput of Node.js
Utilize multi-core CPUs, by either setting up a cluster, use child processes, or use a multi-process orchestrator like Phusion Passenger.
Setup worker roles connected with a message queue. This will be the most effective solution against computational intensive long running requests; off-load them to a worker farm. This will split up your servers in two parts; 1) public facing clerical servers that accept requests from users, and 2) private worker servers handling long running tasks. Both are connected with a message queue. The clerical servers add messages (incoming long-running requests) to the queue. The worker roles listen for incoming messages, handle those, and may return the result into the message queue. If request/response is needed, then the clerical server could asynchronously wait for the response message to arrive in the message queue. Examples of message queues are RabbitMQ and ZeroMQ.
Setup a load balancer and spin up more servers. Now that you efficiently use hardware and delegate long running tasks, you can scale horizontally. If you have a load balancer, you can add more clerical servers. Using a message queue, you can add more worker servers. You could even set this up in the cloud so that you could scale on demand.
It depends on how you use it. Node.js is single threaded by default, but using the (relatively) new cluster module you can scale horizontally across multiple threads.
Furthermore, your database needs will also dictate how effective scaling is with node. For example, using MySQL with node.js won't get you nearly as much benefit as using MongoDB, because of the event driven nature of both MongoDB and node.js.
The following link has a lot of nice benchmarks of systems with different setups:
http://www.techempower.com/benchmarks/
Node.js doesn't rank the highest but compared to other setups using nginx (no apache on their tables, but close enough) it does pretty well.
Again though, it highly depends on your needs. I believe if you are simply serving static websites it is recommend you stick with a more traditional stack. However people have done some amazing things with node.js for other needs: http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/ (c10k? ha!)
Edit: It is worth mentioning that you really aren't 'replacing' just apache with node.js. You would be replacing apache AND php (in a typical lamp stack).