How to build up a parameter server? - tensorflow

In distributed tensorflow, does a parameter server need to be built as a tensorflow server with both master service and worker service ? If the answer is yes, then can a ps machine also be a worker ?

In distributed TensorFlow, every worker is also a master. Furthermore, TensorFlow runtime has only one kind of worker, unlike its predecessor DistBelief, which had specialized parameter server workers.
You implement traditional parameter server architecture by using some workers to store parameters and others to execute session.run requests.

Related

How do you manage 100 virtual servers for inference?

I need to boot 100 virtual servers and use each for tensorflow model inference for 30 days. What are some tools to do this?
I currently boot the servers using an image, and open two tmux sessions manually. One session is for the model client and the other is for the tensorflow server. I receive a slack notification if any of the server CPUs stop working as a way to know if a server fails (I also manually SSH to debug/restart the server).
Would appreciate any tips!

Servers and sessions

I'm learning about distributed TensorFlow applications, and I understand jobs, tasks, and servers. A server's target identifies its gRPC location, as in grpc://localhost:1234.
I don't understand what happens when you create a session with a server's target, as in the following code:
with tf.Session(server.target) as sess:
...
The documentation states that server.target identifies the session's execution engine. Another page says that the constructor creates a session on the server. This isn't clear to me. How exactly does a server's target affect the session's execution?

How to construct remote TenosrFlow session without starting a Server?

How can I construct a TensorFlow session that connects to an existing set of TF servers?
Following the distributed TensorFlow guide I started a bunch of TensorFlow servers that are connected to form a cluster.
I would now like to start a session that can connect to those TF servers and assign ops to them. I assume I just need to specify an appropriate target in the constructor of tf session; e.g something like
with tf.Session(
target, config=tf.ConfigProto(log_device_placement=True)) as sess:
However its not clear to me how to construct a target object pointing at any existing cluster of TF servers. The docs only show how to get the cluster spec from the server by calling server.target.
Do I need to start another server in order to construct a client that talks to the existing servers?
I want to connect to the TF cluster remotely. My TF servers are running on GCE VMs. I want to connect and assign ops to them from my local machine. Is this possible?
For the target you can just use a string of the form
grpc://<host>:<port>
Where host and port are the hostname (or ip address) and port of one of the TF servers comprising the cluster. It doesn't matter which one because where an op executes depends on the device you assigned it to when you constructed the graph.

What exactly is a pre-fork web server model?

I want to know what exactly it means when a web server describes itself as a pre-fork web server. I have a few examples such as unicorn for ruby and gunicorn for python.
More specifically, these are the questions:
What problem does this model solve?
What happens when a pre-fork web server is initially started?
How does it handle requests?
Also, a more specific question for unicorn/gunicorn:
Let's say that I have a webapp that I want to run with (g)unicorn. On initialization, the webapp will do some initialization stuff (e.g. fill in additional database entries). If I configure (g)unicorn with multiple workers, will the initialization stuff be run multiple times?
Pre-forking basically means a master creates forks which handle each request. A fork is a completely separate *nix process.
Update as per the comments below. The pre in pre-fork means that these processes are forked before a request comes in. They can however usually be increased or decreased as the load goes up and down.
Pre-forking can be used when you have libraries that are NOT thread safe. It also means issues within a request causing problems will only affect the process which they are processed by and not the entire server.
The initialisation running multiple times all depends on what you are deploying. Usually however connection pools and stuff of that nature would exist for each process.
In a threading model the master would create lighter weight threads to dispatch requests too. But if a thread causes massive issues it could have repercussions for the master process.
With tools such as Nginx, Apache 2.4's Event MPM, or gevent (which can be used with Gunicorn) these are asynchronous meaning a process can handle hundreds of requests whilst not blocking.
How does a "pre-fork worker model" work?
Master Process: There is a master process that spawns and kills workers, depending on the load and the capacity of the hardware. More incoming requests would cause the master to spawn more workers, up to a point where the "hardware limit" (e.g. all CPUs saturated) is reached, at which point queing will set in.
Workers: A worker can be understood as an instance of your application/server. So if there are 4 workers, your server is booted 4 times. It means it occupies 4 times the "Base-RAM" than only one worker would, unless you do shared memory wizardry.
Initialization: Your initialization logic needs to be stable enough to account for multiple servers. For example, if you write db entries, check if they are there already or add a setup job before your app server
Pre-fork: The "pre" in prefork means that the master always adds a bit more capacity than currently required, such that if the load goes up the system is "already ready". So it preemptively spawns some workers. For example in this apache library, you control this with the MinSpareServers property.
Requests: The requests (TCP connection handles) are being passed from the master process to the children.
What problem do pre-fork servers solve?
Multiprocessing: If you have a program that can only target one CPU core, you potentially waste some of your hardware's capacity by only spawning one server. The forked workers tackle this problem.
Stability: When one worker crashes, the master process isn't affected. It can just spawn a new worker.
Thread safety: Since it's really like your server is booted multiple times, in separate processes, you don't need to worry about threadsafety (since there are no threads). This means it's an appropriate model when you have non-threadsafe code or use non-threadsafe libs.
Speed: Since the child processes aren't forked (spawned) right when needed, but pre-emptively, the server can always respond fast.
Alternatives and Sidenotes
Container orchestration: If you're familiar with containerization and container orchestration tools such as kubernetes, you'll notice that many of the problems are solved by those as well. Kubernetes spawns multiple pods for multiprocessing, it has the same (or better) stability and things like "horizontal pod autoscalers" that also spawn and kill workers.
Threading: A server may spawn a thread for each incoming request, which allows for many requests being handled "simultaneously". This is the default for most web servers based on Java, since Java natively has good support for threads. Good support meaning the threads run truly parallel, on different cpu cores. Python's threads on the other hand cannot truly parallelize (=spread work to multiple cores) due to the GIL (Global Interpreter Lock), they only provide a means for contex switching. More on that here. That's why for python servers "pre-forkers" like gunicorn are so popular, and people coming from Java might have never heard of such a thing before.
Async / non-blocking processing: If your servers spend a lot of time "waiting", for example disk I/O, http requests to external services or database requests, then multiprocessing might not be what you want. Instead consider making your code "non-blocking", meaning that it can handle many requests concurrently. Async / await (coroutines) based systems like fastapi (asgi server) in python, Go or nodejs use this mechanism, such that even one server can handle many requests concurrently.
CPU bound tasks: If you have CPU bound tasks, the non-blocking processing mentioned above won't help much. Then you'll need some way of multiprocessing to distribute the load on your CPU cores, as the solutions mentioned above, that is: container orchestration, threading (on systems that allow true parallelization) or... pre-forked workers.
Sources
https://www.reddit.com/r/learnprogramming/comments/25vdm8/what_is_a_prefork_worker_model_for_a_server/
https://httpd.apache.org/docs/2.4/mod/prefork.html

Cluster-wide singleton in Websphere Cluster

I need to run a component using Apache Camel (or Spring Integration) under WAS ND 8.0 cluster. They both run some threads on startup, and stop them on shutdown normally. No problem to supply WAS managed threadpool. But that threads must run on single cluster's node at the same time. Moreover it must be high-available i.e. switch to other node when active node falls.
Solution I found - is WAS Partitioning Facility. It requires additional Extended Deployment licenses. Is it the only way, or there is some way to implement this using Network Deployment license only?
Thanks in advance.
I think that there is not a feature that address this interesting requirement.
I can imagine a "trick":
A Timer EJB send a message on a queue (let's say 1 per minute)
Configure a Service Integration Bus (SIB) with High Availability and No Scalability, so the HA Manager ensure that only one messaging engine (ME) is alive.
Create a non-reliable queue for high performances and low resource consumption.
The Activation Spec should be configured to listen only local ME.
A MDB implement the following logic: when the message arrives, it check if the singleton thread is alive, otherwise it start the thread.
Does it make sense?