Is Netty's UDP running in single-threaded mode?

Is Netty's UDP running in single-threaded mode?
I have configured the NioDatagramChannelFactory like below:
new NioDatagramChannelFactory(Executors.newFixedThreadPool(4), 4);
But when I run the code as a server and launch more than 20 clients that continuously send UDP packets to it, the server still uses only one worker thread.
Why?

Normally it should use 4 worker threads here. So how are you seeing that it only uses one thread? Did you check with jstack to see how many worker threads are running?
You should also use
new NioDatagramChannelFactory(Executors.newCachedThreadPool(), 4);
This should take care of having at most 4 worker threads.
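For reference, here is a minimal sketch of how such a UDP server might be bootstrapped with Netty 3.x. The port number and the trivial handler are placeholders (not taken from the question); logging the current thread name is an easy way to see which worker threads are actually doing the work, in addition to jstack:

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ConnectionlessBootstrap;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;
import org.jboss.netty.channel.socket.nio.NioDatagramChannelFactory;

public class UdpServer {
    public static void main(String[] args) {
        // Cached thread pool plus an explicit worker count of 4, as suggested above.
        ConnectionlessBootstrap bootstrap = new ConnectionlessBootstrap(
                new NioDatagramChannelFactory(Executors.newCachedThreadPool(), 4));

        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() {
                return Channels.pipeline(new SimpleChannelUpstreamHandler() {
                    @Override
                    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
                        // Handle the datagram; print the thread name to see which worker runs it.
                        System.out.println("Handled on " + Thread.currentThread().getName());
                    }
                });
            }
        });

        bootstrap.bind(new InetSocketAddress(9999)); // placeholder port
    }
}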

Related

Does a port in supervisor.slots.ports translate to one physical JVM?

From the Storm docs:
supervisor.slots.ports: "For each worker machine, you configure how many workers run on that machine with this config. Each worker uses a single port for receiving messages, and this setting defines which ports are open for use. If you define five ports here, then Storm will allocate up to five workers to run on this machine."
And from the Storm concepts:
Workers: Topologies execute across one or more worker processes. Each worker process is a physical JVM and executes a subset of all the tasks for the topology.
My storm.yaml defines:
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
And then I run a topology with topology.workers set to 3 (kafka-spout-parallelism set to 1 and solr-bolt-parallelism set to 2, no other bolts).
Storm UI also shows that my topology is running fine with 1 spout and 2 bolts.
But when I log in to the Storm machines and run ps -aef | grep storm or jps -l, I do not see the JVM processes for the workers anywhere. The only processes I see are:
Machine 1:
jps -l
30675 backtype.storm.daemon.supervisor
30583 backtype.storm.daemon.logviewer
Machine 2:
jps -l
6818 backtype.storm.ui.core
6995 backtype.storm.daemon.supervisor
6739 backtype.storm.daemon.nimbus
6904 backtype.storm.daemon.logviewer
Does Storm not create one physical JVM per worker? And does that not translate to one JVM per port mentioned in supervisor.slots.ports?
One worker slot corresponds to one worker JVM. Slots are only taken up once a topology is deployed to them.
It's possible you checked the processes before the workers actually started. Check the Storm UI to make sure the topology is up, running, and processing data. If it's not, use the log viewer to look for errors; it is possible the workers are crashing due to uncaught exceptions.
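For completeness, the worker count is requested on the topology itself when it is submitted. A sketch of what that looks like (the spout/bolt classes and the topology name are hypothetical stand-ins for your Kafka spout and Solr bolt):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new MyKafkaSpout(), 1);   // hypothetical spout class
        builder.setBolt("solr-bolt", new MySolrBolt(), 2)         // hypothetical bolt class
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(3); // request 3 worker JVMs, i.e. 3 of the 4 slots in supervisor.slots.ports

        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}

Once the slots are actually occupied, the worker JVMs should show up in jps alongside the supervisor (typically as backtype.storm.daemon.worker, one per used port).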

Handling HTTP requests with Apache2 (or Nginx): does a new process get created for each request, or for a set of N requests?

Will a web server (WS) like Apache2 or Nginx (or a container like Tomcat (TC)) create a new process to handle each incoming request? My concern is about servers that support a high number of parallel users (say 20K+ parallel users).
I think load balancing happens on the other side of the web server (if it is used to front Tomcat etc.). So in theory, a single web server has to accept all of the (20K+) incoming requests before it can distribute the load to the servers backing it.
So the question is: does the web server handle all these requests in a single process, or does it smartly spawn other processes to help share the work? (I know the "client - server" binding happens, though: client_host:random_port plus server_host:fixed_port.)
Reference: prior to reading the article "Fronting Tomcat with Apache" I thought a single process was doing all the smart work. But that article mentions MPM (Multi-Processing Module):
It combines the best from two worlds, having a set of child processes each having a set of separate threads. There are sites that are running 10K+ concurrent connections using this technology.
So it is getting more sophisticated than I thought, with threads also being spawned as mentioned above. (These are not the Tomcat threads that serve each individual request by calling the service method, but threads on the Apache WS that handle requests and distribute them to nodes for processing.)
If anyone has used MPM, a little further explanation of how all this works would be great.
Questions like:
(1) As child processes are spawned, what is their exact role? Is a child process just for mediating the request to Tomcat, or something more? If so, after the child process gets the response from TC, does it forward the response to the parent process or directly to the client (since it can know client_host:random_port from the parent process)? I am not sure if this is allowed in theory, though, since the child process cannot accept any new request because the fixed_port, which can be bound to only one process, is already tied to the parent process.
(2) What kind of load is shared with a thread by the child or parent process? Again, it must be almost the same as in (1). But what I am not sure about is whether, even in theory, a thread can directly send the response to the client.
Apache historically uses the prefork model of processing. In this model, each request == a separate operating system (OS) process. It is called "prefork" because Apache forks some spare processes in advance and handles requests within them. If the number of preforked processes is not enough, Apache forks new ones. Pros: a process can execute other modules or processes and not care what they do. Cons: each request = one process, which uses a lot of memory, and the OS fork itself can also be slow for your requests.
Apache's other model is the worker MPM. It is almost the same as prefork, but it uses OS threads rather than OS processes. A thread is like a lightweight process: one OS process can run many threads sharing one memory space. The worker MPM uses much less memory, and new threads are created quickly. Cons: modules need to be thread-safe, and a crash in one module can take down all the threads of an OS process (but this is not important for you, because you are using Apache as a reverse proxy only). Another con: the CPU switches context when switching between threads.
So yes, worker is much better than prefork in your case, but...
But we have Nginx :) Nginx uses a different model (by the way, Apache has an event MPM too). In this case you have only one process (well, it can be a few processes, see below). How it works: a new request raises a special event, the OS process wakes up, receives the request, prepares the answer, writes the answer, and goes back to sleep.
You could say "wow, but this is not multitasking" and you would be right. But there is one big difference between this model and simple sequential request processing. What happens if you need to write a lot of data to a slow client? In the synchronous approach your process has to wait for acknowledgement that the data was received, and only then can it process a new request. Nginx and the Apache event MPM use an asynchronous model: Nginx tells the OS to send some piece of data (writes this data to the OS buffer) and... goes to sleep, or processes other requests. When the OS has sent that piece of data, a special event is sent to Nginx. So the main difference is that Nginx does not wait on I/O (like connect, read, write); Nginx tells the OS what it wants, and the OS sends an event to Nginx when the task is ready (the socket is connected, the data is written, or new data is ready to read in a local buffer). Also, a modern OS can work asynchronously with the disk (read/write) and can even send files from disk directly to a TCP socket.
Sure, any heavy computation in this Nginx process will block it, and it will stop processing new and existing requests. But when the main workload is network work (reverse proxying, forwarding requests to FastCGI or another backend server) plus sending static files (asynchronously too), Nginx can serve thousands of simultaneous requests in one OS process! Also, because it is one OS process (and one thread), the CPU executes it in a single context.
As I said before, Nginx can start a few OS processes, and each of these processes will be assigned by the OS to a separate CPU core. There is almost no reason to fork more Nginx OS processes than that (there is only one reason to do so: if you need to do some blocking operations, but a simple reverse proxy with backend balancing is not that case).
So, pros: less CPU context switching, less memory (compared with the worker MPM too), and fast connection processing. More pros: Nginx was created as an HTTP load balancer and has a lot of options for that (and even more in the commercial Nginx Plus). Cons: if you need to do heavy math inside the OS process, that process will be blocked (but all your math is in Tomcat, so Nginx is only the balancer).
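To make the event model more concrete, here is a minimal single-threaded, non-blocking echo loop using Java NIO (the port and buffer size are arbitrary). This is not how Nginx is implemented (Nginx is written in C on top of epoll/kqueue), but the shape is the same idea: register interest, sleep until the OS signals readiness, then do one small non-blocking step and move on:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EventLoopSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080)); // arbitrary port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select(); // "sleep" until the OS reports events
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) { // a new connection is ready to accept
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) { // data is already in the kernel buffer
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer); // does not block; data is already there
                    if (n == -1) {
                        client.close();
                    } else {
                        buffer.flip();
                        client.write(buffer); // echo back (may be partial; a real server
                                              // would register OP_WRITE and continue later)
                    }
                }
            }
        }
    }
}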
PS: To answer the question from the comments about the number of Tomcat (TC) threads (it was too long to post as a comment):
The best way to know is to test it using stress-loading tools, because this number depends on the application's profile. Response time alone is not enough to answer it, because, for example, there is a big difference between 200 ms of 100% math (100% CPU bound) and 50 ms of math + 150 ms of sleeping while waiting for a database answer.
If the application is 100% CPU bound, then probably one thread per core, but in real cases all applications also spend some time in I/O (receiving the request, sending the answer to the client).
If the application works with I/O and needs to wait for answers from other services (a database, for example), it spends some time in a sleeping state and the CPU can process other tasks.
So the best solution is to create a request mix close to the real load and run a stress test, increasing the number of concurrent requests (and the number of TC worker threads, for sure). Find an acceptable response time and fix that number of threads. Of course, you need to check first that it is not the database at fault.
Of course, here I am talking about dynamic content only; requests for static files from disk should be handled before Tomcat (by Nginx, for example).
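As a rough illustration of why the wait/compute ratio matters more than the raw response time, here is a common rule-of-thumb sizing sketch (a heuristic, not something prescribed above; the core count and timings are the hypothetical numbers from the example):

public class ThreadEstimate {
    // Rule of thumb: threads ~ cores * (1 + waitTime / computeTime).
    static int estimateThreads(int cores, double waitMs, double computeMs) {
        return (int) Math.ceil(cores * (1 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        int cores = 4; // assumed core count
        // 200 ms of pure computation: roughly one thread per core keeps the CPU busy.
        System.out.println(estimateThreads(cores, 0, 200));  // prints 4
        // 50 ms of computation + 150 ms waiting on the database:
        // each thread leaves the CPU idle 75% of the time, so more threads can be kept busy.
        System.out.println(estimateThreads(cores, 150, 50)); // prints 16
    }
}

Treat the result only as a starting point for the stress test described above.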

Celery workers missing heartbeats and getting substantial drift on EC2

I am testing my Celery implementation over 3 EC2 machines right now. I am pretty confident in my implementation, but I am having problems with the actual worker execution. My test structure is as follows:
1 EC2 machine is designated as the broker and also runs a Celery worker
1 EC2 machine is designated as the client (it runs the client Celery script that enqueues all the tasks using .delay()) and also runs a Celery worker
1 EC2 machine is purely a worker.
All the machines have 1 Celery worker running. At first, I was immediately getting the message:
"Substantial drift from celery#[other ec2 ip] may mean clocks are out of sync."
A drift amount in seconds would then be printed, which would increase over time.
I would also get the message: "missed heartbeat from celery#[other ec2 ip]".
The machine would be doing very little work at this point, so my Auto Scaling configuration in EC2 would shut the instance down automatically once CPU utilization got very low (<5%).
So, to try to solve this problem, I attempted to sync all my machines' clocks (although I thought Celery handled this) with these commands, which were run on startup on all machines:
apt-get -qy install ntp
service ntp start
With this, they all performed well for about 10 minutes with no hitches, after which I started getting missed heartbeats and my EC2 instances stalled and shut down. The weird thing is that the drift sometimes increased and then decreased.
Any idea why this is happening?
I am using the newest version of Celery (3.1) and RabbitMQ.
EDIT: It should be noted that I am using the us-west-1a and us-west-1c availability zones on EC2.
EDIT 2: I am starting to think memory might be the issue. I am using a t2.micro instance, and running 3 Celery workers on the same machine (only 1 instance), which is also the broker, still causes heartbeat misses and stalls.

What exactly is a pre-fork web server model?

I want to know what exactly it means when a web server describes itself as a pre-fork web server. I have a few examples, such as Unicorn for Ruby and Gunicorn for Python.
More specifically, these are the questions:
What problem does this model solve?
What happens when a pre-fork web server is initially started?
How does it handle requests?
Also, a more specific question for unicorn/gunicorn:
Let's say that I have a webapp that I want to run with (g)unicorn. On initialization, the webapp will do some initialization stuff (e.g. fill in additional database entries). If I configure (g)unicorn with multiple workers, will the initialization stuff be run multiple times?
Pre-forking basically means that a master process creates forks, which handle the requests. A fork is a completely separate *nix process.
Update as per the comments below: the "pre" in pre-fork means that these processes are forked before a request comes in. Their number can, however, usually be increased or decreased as the load goes up and down.
Pre-forking can be used when you have libraries that are NOT thread-safe. It also means that a problem within one request only affects the process handling it, and not the entire server.
Whether the initialisation runs multiple times depends on what you are deploying. Usually, however, connection pools and things of that nature would exist for each process.
In a threading model, the master would create lighter-weight threads to dispatch requests to. But if a thread causes serious problems, it can have repercussions for the master process.
With tools such as Nginx, Apache 2.4's event MPM, or gevent (which can be used with Gunicorn), request handling is asynchronous, meaning a single process can handle hundreds of requests without blocking.
How does a "pre-fork worker model" work?
Master Process: There is a master process that spawns and kills workers, depending on the load and the capacity of the hardware. More incoming requests cause the master to spawn more workers, up to the point where the "hardware limit" (e.g. all CPUs saturated) is reached, at which point queuing sets in.
Workers: A worker can be understood as an instance of your application/server. So if there are 4 workers, your server is booted 4 times. It means it occupies 4 times the base RAM of a single worker, unless you do shared-memory wizardry.
Initialization: Your initialization logic needs to be robust enough to account for multiple servers. For example, if you write DB entries, check whether they are already there, or run the setup as a job before your app server starts.
Pre-fork: The "pre" in prefork means that the master always adds a bit more capacity than is currently required, so that if the load goes up the system is "already ready". So it pre-emptively spawns some workers. For example, in Apache's prefork MPM you control this with the MinSpareServers directive.
Requests: The requests (TCP connection handles) are passed from the master process to the children.
What problem do pre-fork servers solve?
Multiprocessing: If you have a program that can only target one CPU core, you potentially waste some of your hardware's capacity by only spawning one server. The forked workers tackle this problem.
Stability: When one worker crashes, the master process isn't affected. It can just spawn a new worker.
Thread safety: Since it's really like your server is booted multiple times, in separate processes, you don't need to worry about thread safety (since there are no shared threads). This means it's an appropriate model when you have non-thread-safe code or use non-thread-safe libs.
Speed: Since the child processes aren't forked (spawned) right when needed, but pre-emptively, the server can always respond fast.
Alternatives and Sidenotes
Container orchestration: If you're familiar with containerization and container orchestration tools such as Kubernetes, you'll notice that many of these problems are solved by those as well. Kubernetes spawns multiple pods for multiprocessing, it has the same (or better) stability, and things like "horizontal pod autoscalers" also spawn and kill workers.
Threading: A server may spawn a thread for each incoming request, which allows many requests to be handled "simultaneously". This is the default for most web servers based on Java, since Java natively has good support for threads. Good support meaning the threads run truly in parallel, on different CPU cores. Python's threads, on the other hand, cannot truly parallelize (= spread work to multiple cores) due to the GIL (Global Interpreter Lock); they only provide a means for context switching. More on that here. That's why for Python servers "pre-forkers" like Gunicorn are so popular, and people coming from Java might never have heard of such a thing before. (See the thread-pool sketch after this list.)
Async / non-blocking processing: If your servers spend a lot of time "waiting", for example on disk I/O, HTTP requests to external services or database requests, then multiprocessing might not be what you want. Instead, consider making your code "non-blocking", meaning that it can handle many requests concurrently. Async/await (coroutine) based systems like FastAPI (an ASGI server) in Python, Go, or Node.js use this mechanism, such that even one server can handle many requests concurrently.
CPU-bound tasks: If you have CPU-bound tasks, the non-blocking processing mentioned above won't help much. Then you'll need some way of multiprocessing to distribute the load across your CPU cores, using one of the solutions mentioned above: container orchestration, threading (on systems that allow true parallelization) or... pre-forked workers.
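To illustrate the threading alternative mentioned in the list above, here is a bare-bones thread-pool HTTP responder in Java; the port, pool size, and response are placeholders, and a real server such as Tomcat or Jetty adds request parsing, keep-alive handling, limits, and much more:

import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPoolServer {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(200); // placeholder pool size
        try (ServerSocket server = new ServerSocket(8080)) {      // placeholder port
            while (true) {
                Socket client = server.accept();   // one connection...
                pool.submit(() -> handle(client)); // ...handled on a pool thread
            }
        }
    }

    static void handle(Socket client) {
        try (Socket c = client; OutputStream out = c.getOutputStream()) {
            String body = "hello\n"; // a real server would parse the request and route it
            out.write(("HTTP/1.1 200 OK\r\nContent-Length: " + body.length()
                    + "\r\nConnection: close\r\n\r\n" + body).getBytes(StandardCharsets.US_ASCII));
        } catch (IOException ignored) {
            // An error here only affects this one request, not the whole server.
        }
    }
}

Because these are real OS threads, the work can spread across cores; the GIL caveat above is exactly why the same pattern does not parallelize CPU-bound work in CPython.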
Sources
https://www.reddit.com/r/learnprogramming/comments/25vdm8/what_is_a_prefork_worker_model_for_a_server/
https://httpd.apache.org/docs/2.4/mod/prefork.html

Apache web server - What happens to requests when all worker threads are busy

As far as I have researched, in the scenario where all worker threads are busy serving requests, what happens to the requests that come next?
Do they wait?
Is this related to some configurable parameters?
Can I get the count of such requests?
Adding to this, could you please explain, or give a link to, a clear picture of the request-processing strategy of the Apache web server?
Thanks for looking!
When all Apache worker threads are busy, a new request is stalled (it waits) until one of those worker threads becomes available. If the client gives up waiting, or the maximum wait time in your configuration file is exceeded, the connection is dropped.
This answer was given in 2015, so I am talking about Apache httpd 2.4.
They wait, because the connection is queued on the TCP socket (the connection is not ACKed). Note that the default length of the backlog may be set far too high on Linux boxes, which can result in connections being closed because kernel limits kick in.
The relevant parameter is ListenBacklog (with the caveats from the previous point).
This is described here, with lots of other interesting detail.
Read through "Apache TCP Backlog" by Ryan Frantz to get the gory details about the Apache backlog.
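To watch the backlog behaviour in isolation (outside Apache), here is a small Java experiment: a listener that never calls accept(), with a deliberately tiny backlog, so only a few clients get queued by the kernel and the rest time out or are refused. The exact behaviour depends on the OS and on settings such as net.core.somaxconn, so treat it as a sketch:

import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {
    public static void main(String[] args) throws Exception {
        // Backlog of 2: the kernel queues roughly this many completed connections
        // while the application is too busy to call accept().
        ServerSocket server = new ServerSocket(8081, 2); // placeholder port

        // Simulate "all workers busy": never call server.accept().
        for (int i = 1; i <= 10; i++) {
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress("localhost", 8081), 1000);
                System.out.println("client " + i + " was queued by the kernel");
            } catch (Exception e) {
                System.out.println("client " + i + " timed out or was refused: " + e);
            }
        }
        server.close();
    }
}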