Is there any way to do federated learning with real multiple machines using the tensorflow-federated API? - tensorflow

I am studying the tensorflow-federated API in order to do federated learning across multiple real machines.
But I found an answer on this site saying that it does not support real federated learning across multiple machines.
Is there no way to do federated learning with multiple real machines?
Even if I set up a network for federated learning with 2 client PCs and 1 server PC, is it impossible to build that system using the TensorFlow Federated API?
Or even if I apply the code, can't I build the system I want?
If the code can be modified to configure this, can you give me a tip? If not, when will there be an example of configuring it on real computers?

In case you are still looking for something: If you're not bound to TensorFlow, you could have a look at PySyft, which is using PyTorch. Here is a practical example of a FL system built with one server and two Raspberry Pis as clients.

TFF is really about expressing the federated computations you wish to execute. In terms of physical deployments, TFF includes two distinct runtimes: one "reference executor" which simply interprets the syntactic artifact that TFF generates, serially, all in Python and without any fancy constructs or optimizations; another still under development, but demonstrated in the tutorials, which uses asyncio and hierarchies of executors to allow for flexible executor architectures. Both of these are really about simulation and FL research, and not about deploying to devices.
In principle, this may address your question (in particular, see tff.framework.RemoteExecutor). But I assume that you are asking more about deployment to "real" FL systems, e.g. data coming from sources that you don't control. This is really out of scope for TFF. From the FAQ:
Although we designed TFF with deployment to real devices in mind, at this stage we do not currently provide any tools for this purpose. The current release is intended for experimentation uses, such as expressing novel federated algorithms, or trying out federated learning with your own datasets, using the included simulation runtime.
We anticipate that over time the open source ecosystem around TFF will evolve to include runtimes targeting physical deployment platforms.
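To make concrete the kind of computation that a simulation runtime executes, here is a minimal sketch of one round of federated averaging with one server and two simulated clients. It is plain Python and does not use the TFF API; the names client_update, federated_average and run_round, the one-parameter model y = w * x, and the data are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical local dataset type: (x, y) pairs for the model y = w * x.
Dataset = List[Tuple[float, float]]


def client_update(w: float, data: Dataset, lr: float = 0.1) -> Tuple[float, int]:
    """Run one pass of SGD on a client's local data; return new weight and sample count."""
    for x, y in data:
        grad = 2.0 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
    return w, len(data)


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Server step: average client weights, weighted by number of local examples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def run_round(server_w: float, clients: List[Dataset]) -> float:
    """One round: hand the server weight to each client, train locally, aggregate."""
    updates = [client_update(server_w, data) for data in clients]
    return federated_average(updates)


if __name__ == "__main__":
    # Two simulated clients whose data follow y = 3x.
    clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5)]]
    w = 0.0
    for _ in range(50):
        w = run_round(w, clients)
    print(f"learned weight: {w:.3f}")
```

Everything here runs in one process, which is the point: the "clients" are just list entries, much as in a simulation runtime, and nothing in the sketch addresses transport, authentication, or device scheduling.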

Related

How to transfer data between clients using TensorFlow Federated

I'm planning to develop a decentralized federated learning simulation environment using Tensorflow Federated API, so nodes are able to coordinate themselves to obtain data without the interference of the server.
Is there any function on tff API which manages data transfers between specific clients? I only see functions that move data from server to client (tff.federated_broadcast()) and client to server (tff.federated_collect()).
There are several ways to interpret this question I think, but in all of them the answer is no. All client-to-client communication in TFF must currently be intermediated by the server.
A little more detail: TFF provides no intrinsics which model 'information exchange' between clients, though in principle such a thing can exist. That is, TFF is designed in such a way that this could be added. However, there are no plans to pursue this path in the immediate future.
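Since every exchange is intermediated by the server, data that one client wants another to see has to be expressed as a client-to-server step followed by a server-to-client step. The sketch below simulates that pattern in plain Python; collect_at_server, broadcast_to_clients and relay_via_server are hypothetical stand-ins, not TFF functions.

```python
from typing import Dict, List


def collect_at_server(client_values: Dict[str, float]) -> List[float]:
    """Client -> server: gather every client's value at the server."""
    return [client_values[name] for name in sorted(client_values)]


def broadcast_to_clients(server_value: object, client_names: List[str]) -> Dict[str, object]:
    """Server -> client: hand the same server-side value to every client."""
    return {name: server_value for name in client_names}


def relay_via_server(client_values: Dict[str, float]) -> Dict[str, object]:
    """Emulate client-to-client exchange as a collect followed by a broadcast."""
    gathered = collect_at_server(client_values)
    return broadcast_to_clients(gathered, sorted(client_values))


if __name__ == "__main__":
    seen = relay_via_server({"client_a": 1.0, "client_b": 2.0})
    # Each client now holds the values of all clients, routed through the server.
    print(seen["client_b"])
```

A decentralized simulation built this way still has a server in the data path; the server just does no computation beyond forwarding.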

single management system covers several ML frameworks

Question: is there any open source project that manages all ML frameworks in a single system?
Scenario description: in some education scenarios, many students and teachers would like to use different ML frameworks such as TensorFlow, Caffe, MXNet, etc. It's hard for the people maintaining the environment to prepare all of them one by one.
Maybe you can use the AWS Deep Learning AMI. The AMI has all the frameworks you mentioned pre-installed for you.
The AMI itself is free of cost. You only pay for the EC2 instances you use.

Is it possible to use TensorFlow Serving with distributed TensorFlow cluster to improve throughput/latency?

I'm looking into ways to improve latency and/or throughput of a TensorFlow Serving instance. I've seen the "Serving Inception" manual and three GitHub Issues (2, 3, 4), but all of them seem to create a separate instance of TensorFlow Serving per server and then choose a server on the client side. Issue 4 is actually about adding a load balancer in front of those instances, which is currently absent in TensorFlow Serving itself.
However, there is also the "Distributed TensorFlow" tutorial, which shows how to join a set of machines into a fixed cluster and then manually "pin" some computations to some machines, which can improve both latency and throughput if the model is "wide" and can be parallelized well. However, I do not see any mention of combining this with TensorFlow Serving in either set of documentation.
The question is: is it possible to configure TensorFlow Serving to use a distributed TensorFlow cluster?
I was able to make it create and use gRPC sessions (instead of local) with some hacks:
Make tensorflow/core/distributed_runtime/rpc:grpc_session target publicly visible (it's internal to tensorflow package by default) by modifying BUILD file.
Add it as a dependency to the tensorflow_serving/model_servers:tensorflow_model_server target.
Add an extra flag to tensorflow_model_server called --session_target which sets up session_bundle_config.session_target() in main.cc.
Run the binary with --session_target=grpc://localhost:12345, where localhost:12345 is an arbitrary node which will be used to create master sessions.
See my cluster performing some computations on behalf of TensorFlow Serving.
However, this set of hacks does not look sufficient for "real-world usage" for three reasons:
grpc_session target is probably internal for a reason.
As noticed in my other question, distributed TensorFlow works better when computations are manually "pinned" to specific machines. So, if we use TensorFlow Serving, we need a way to save those "pins" and model's structure becomes tied with cluster's structure. I'm not sure whether this information is exported with Exporter/Saver at all.
tensorflow_model_server creates session once - during bootstrap. If master node of the cluster goes down and then restores, serving server still holds the "old" session and cannot process further requests.
All in all, it looks like this scenario is not officially supported yet, but I'm not sure.
If your model fits on a single machine, then it's hard to see how distributing it over many machines will improve throughput. Essentially you are taking computations which can be done independently and adding a dependency. If one of your machines is slow or crashes, instead of making some queries slow, it will make all queries slow.
That said, it's worth benchmarking to see if it helps, in which case it would make sense to ask for this use-case to be officially supported.
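The dependency argument can be made concrete with a little arithmetic. Assume (as an illustrative model, not a measurement) that each machine is independently slow with probability p at any given moment. A query served by one independent replica is slow with probability p, while a query that must touch all n machines of a distributed model is slow with probability 1 - (1 - p)^n. The function names below are hypothetical.

```python
def p_slow_replicated(p: float) -> float:
    """Query hits a single independent replica: slow only if that replica is slow."""
    return p


def p_slow_distributed(p: float, n: int) -> float:
    """Query depends on all n machines: slow if at least one of them is slow."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.01  # assumed per-machine probability of being slow
    for n in (1, 4, 16):
        print(n, round(p_slow_distributed(p, n), 4))
```

Under this model the fraction of slow queries grows with cluster size, which is why a benchmark is needed to show that the parallel speedup outweighs it.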
Regarding questions:
Worker assignments are done through the device field in the graph .pbtxt. Some importers/exporters clear those assignments and have a clear_devices flag. You could open the graph definition (the .pbtxt file or, equivalently, str(tf.get_default_graph().as_graph_def())) and grep for device strings to check.
If any worker restarts, or there's a temporary loss of network connectivity, your sess.run fails with an error (Unavailable) and you need to recreate the session. This is handled automatically by MonitoredTrainingSession in tf.train, but you need to handle this yourself with serving.
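Handling this yourself amounts to catching the error, building a fresh session, and retrying. A minimal sketch of that idea follows, in plain Python so that it runs without a cluster. UnavailableError here is a local stand-in for the error TensorFlow raises (tf.errors.UnavailableError), and ResilientRunner, FlakySession and make_session are hypothetical names, not TensorFlow Serving APIs.

```python
from typing import Any, Callable


class UnavailableError(Exception):
    """Stand-in for the 'Unavailable' error raised when a worker is unreachable."""


class ResilientRunner:
    """Wraps a session factory and recreates the session after Unavailable errors."""

    def __init__(self, make_session: Callable[[], Any], max_retries: int = 3):
        self._make_session = make_session
        self._max_retries = max_retries
        self._session = make_session()

    def run(self, *args: Any, **kwargs: Any) -> Any:
        for attempt in range(self._max_retries + 1):
            try:
                return self._session.run(*args, **kwargs)
            except UnavailableError:
                if attempt == self._max_retries:
                    raise
                # The old session is stale; build a fresh one and retry.
                self._session = self._make_session()


class FlakySession:
    """Toy session that fails a fixed number of times across all instances."""

    failures_left = 2

    def run(self, x: int) -> int:
        if FlakySession.failures_left > 0:
            FlakySession.failures_left -= 1
            raise UnavailableError("worker restarted")
        return x * 2


if __name__ == "__main__":
    runner = ResilientRunner(FlakySession)
    print(runner.run(21))
```

A production version would likely add a delay or backoff between attempts and a lock around session replacement if requests are served from several threads; both are omitted here for brevity.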
If your model is not using images, or is not entirely too large, you shouldn't need too much compute for each inference/serve, and I'm saying this using Inception-v# which takes ~1 sec to serve a response to an image on a Google Cloud Platform n1-standard-1 machine.
Now that being said, perhaps it's the throughput that you need to scale up, and that is a different problem. Your best option for scale at that point would be to use Docker Swarm & Compose, as well as Kubernetes, to help scale up and serve your inference "micro-service". You could also use Flask to iterate over a sequence of requests if your use case warrants it.

Building new TensorFlow Op, is there a build system standard?

After watching this question I decided to give writing a new op for TensorFlow a try.
Since the required tools (C++, Python and likely a *nix system) are not my primary tools, I would like to avoid reaching a point where I have to back out and change systems or tools just because I did not ask.
Is there a standard or preferred system and/or set of tools used by those working on TensorFlow?
I know that recommendation questions are not allowed here; I am not asking for a personal recommendation, I am asking for the standard used by or what the TensorFlow group finds that works.
Really, anything where you can get Bazel and the required libraries up and running. But since you're starting from scratch: Ubuntu's a very safe bet and (I haven't measured this, but this is a solid estimate) probably gets the most testing and development by the tf team. But there are many options that all work -- you can develop inside a virtualenv on many environments. Things like GPU support get a little more platform-specific, and that's where Ubuntu starts to become the easiest choice if you don't have any other constraints.
The key requirements are outlined in installing Tensorflow from sources.

Spread vs MPI vs zeromq?

In one of the answers to Broadcast like UDP with the Reliability of TCP, a user mentions the Spread messaging API. I've also run across one called ØMQ. I also have some familiarity with MPI.
So, my main question is: why would I choose one over the other? More specifically, why would I choose to use Spread or ØMQ when there are mature implementations of MPI to be had?
MPI was designed for tightly coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large distributed systems. If you're designing a parallel scientific application, go with MPI, but if you are designing a persistent distributed system that needs to be resilient to faults and network instability, use one of the others.
MPI has very limited facilities for fault tolerance; the default error handling behavior in most implementations is a system-wide fail. Also, the semantics of MPI require that all messages sent eventually be consumed. This makes a lot of sense for simulations on a cluster, but not for a distributed application.
I have not used any of these libraries, but I may be able to give some hints.
MPI is a communication protocol while Spread and ØMQ are actual implementations.
MPI comes from "parallel" programming while Spread comes from "distributed" programming.
So, it really depends on whether you are trying to build a parallel system or a distributed system. They are related to each other, but the implied connotations/goals are different. Parallel programming deals with increasing computational power by using multiple computers simultaneously. Distributed programming deals with building a reliable (consistent, fault-tolerant and highly available) group of computers.
The concept of "reliability" is slightly different from that of TCP. TCP's reliability is "give this packet to the end program no matter what." Distributed programming's reliability is "even if some machines die, the system as a whole continues to work in a consistent manner." To really guarantee that all participants got the message, one would need something like two-phase commit or one of its faster alternatives.
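As a rough illustration of that last point, two-phase commit has a coordinator first ask every participant to vote, and tell them to commit only if all of them voted yes. The sketch below is an in-memory toy with hypothetical Participant and two_phase_commit names; it ignores timeouts, coordinator failure, and durable logging, all of which a real implementation needs.

```python
from typing import List, Optional


class Participant:
    """Toy participant that stages a value and applies it only on commit."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.staged: Optional[str] = None
        self.committed: Optional[str] = None

    def prepare(self, value: str) -> bool:
        """Phase 1: stage the value and vote yes, or vote no if unhealthy."""
        if not self.healthy:
            return False
        self.staged = value
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged value visible."""
        self.committed = self.staged
        self.staged = None

    def abort(self) -> None:
        """Phase 2 (failure): discard the staged value."""
        self.staged = None


def two_phase_commit(participants: List[Participant], value: str) -> bool:
    """Return True if every participant committed, False if all aborted."""
    votes = [p.prepare(value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    nodes = [Participant("a"), Participant("b"), Participant("c", healthy=False)]
    # A single "no" vote makes every participant abort.
    print(two_phase_commit(nodes, "msg-1"))
```

The all-or-nothing outcome is what distinguishes this from TCP-style delivery: either every participant has the message or none of them acts on it.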
You're addressing very different APIs here, with different notions about the kind of services provided and infrastructure for each of them. I don't know enough about MPI and Spread to answer for them, but I can help a little more with ZeroMQ.
ZeroMQ is a simple messaging communication library. It does nothing else than send a message to different peers (including local ones) based on a restricted set of common messaging patterns (PUSH/PULL, REQUEST/REPLY, PUB/SUB, etc.). It handles client connection, retrieval, and basic congestion strictly based on those patterns and you have to do the rest yourself.
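To give a feel for what "based on a messaging pattern" means, here is a toy model of the PUSH/PULL pattern, in which a PUSH endpoint distributes messages round-robin among its connected PULL peers. This is a standard-library sketch of the routing behavior only: Pusher is a hypothetical class, the queues stand in for PULL sockets, and none of this is the ZeroMQ API.

```python
import queue
from typing import List


class Pusher:
    """Toy PUSH endpoint: hands each message to the next connected puller in turn."""

    def __init__(self) -> None:
        self._pullers: List[queue.Queue] = []
        self._sent = 0

    def connect(self, puller: queue.Queue) -> None:
        """Register a puller; later messages are shared among all registered pullers."""
        self._pullers.append(puller)

    def send(self, message: str) -> None:
        """Deliver the message to exactly one puller, chosen round-robin."""
        if not self._pullers:
            # Simplification of this sketch: fail fast when nobody is connected.
            raise RuntimeError("no puller connected")
        target = self._pullers[self._sent % len(self._pullers)]
        target.put(message)
        self._sent += 1


if __name__ == "__main__":
    worker_1: queue.Queue = queue.Queue()
    worker_2: queue.Queue = queue.Queue()
    pusher = Pusher()
    pusher.connect(worker_1)
    pusher.connect(worker_2)
    for i in range(4):
        pusher.send(f"task-{i}")
    print(worker_1.qsize(), worker_2.qsize())
```

The sender never names a recipient; the pattern decides where each message goes. That is the sense in which the library handles routing and leaves everything else (deployment, discovery, monitoring) to the application.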
Although it appears very restricted, this simple behavior is mostly what you would need for the communication layer of your application. It lets you scale very quickly from a simple prototype, all in memory, to more complex distributed applications in various environments, using simple proxies and gateways between nodes. However, don't expect it to do node deployment, network discovery, or server monitoring; you will have to do that yourself.
Briefly, use ZeroMQ if you have an application that you want to scale from a simple multithreaded process to a distributed and variable environment, or if you want to experiment and prototype quickly and no existing solution seems to fit your model. Expect, however, to have to put some effort into the deployment and monitoring of your network if you want to scale to a very large cluster.