distributed tensorflow clarification - tensorflow

Is my understanding correct that model_deploy lets the user train a model using multiple devices on a single machine? The basic premise seems to be that the clones share variables, and that the variables are distributed to the parameter servers in a round-robin fashion.
The distributed TensorFlow framework, on the other hand, lets the user train a model on a cluster, that is, using multiple devices across multiple servers.
I think the Slim documentation is very slim, and the point has been raised a couple of times already, for example in: Configuration/Flags for TF-Slim across multiple GPU/Machines
Thank you.
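The round-robin placement mentioned in the question can be pictured with a few lines of plain Python. This is only an illustration of the idea, not code taken from model_deploy or TensorFlow; the function name, the variable names, and the number of parameter servers are made up for the example.

```python
from itertools import cycle


def assign_round_robin(variable_names, num_ps):
    """Map each variable to a parameter-server device, cycling through the servers."""
    if num_ps < 1:
        raise ValueError("need at least one parameter server")
    ps_devices = cycle([f"/job:ps/task:{i}" for i in range(num_ps)])
    # Variables are placed in creation order: ps 0, ps 1, ..., then back to ps 0.
    return {name: next(ps_devices) for name in variable_names}


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    names = ["conv1/weights", "conv1/biases", "fc1/weights", "fc1/biases"]
    for name, device in assign_round_robin(names, num_ps=2).items():
        print(f"{name} -> {device}")
```

Note that round-robin placement balances the number of variables per server, not their size, so one large variable can still make a single parameter server the bottleneck.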

Related

TensorFlow Serving Cluster Architecture

Folks, I am writing an application that will produce recommendations based on calls to ML models. The application will have different models, some of which should be called in sequence. A data scientist should be able to upload a model to the system. This means that the application needs logic to store each model's metadata as well as the address of its model server. A model server will be instantiated dynamically on a model upload event.
I would like to use a cluster of TensorFlow Serving instances here, but I am stuck on a question of architecture.
Is there a way to have something like a service registry for TensorFlow servers? What is the best way to build such a cluster of servers with different models?
I need some clarification on what you're trying to do. Is the feature vector for all the models the same? If not then it will be quite a bit harder to do this. Trained models are encapsulated in the SavedModel format. It sounds like you're trying to train an ensemble, but some of the models are frozen? You could certainly write a custom component to make an inference request as part of the input to Trainer, if that's what you need.
UPDATE 1
From your comment below it sounds like what you might be looking for is a service mesh, such as Istio for example. That would help manage the connections between services running inside containers, and the connections between users and services. In this case tf.Serving instances running your models are the services, but the basic request-response pattern is the same. Does that help?
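As a rough sketch of the bookkeeping described in the question (model metadata plus the address of the server hosting each model), an in-memory registry could look like the following. Everything here is hypothetical: the class and field names, the addresses, and the `call_model` callback that stands in for the actual request to a TensorFlow Serving instance. With a service mesh or a real service registry, the dictionary would be replaced by that system's own lookup.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelEntry:
    name: str
    address: str  # host:port of the server hosting this model (hypothetical)
    metadata: Dict[str, Any] = field(default_factory=dict)


class ModelRegistry:
    """Keeps track of which server hosts which model."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        # Would be called on a model upload event, once the server is up.
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> ModelEntry:
        try:
            return self._entries[name]
        except KeyError:
            raise LookupError(f"no server registered for model {name!r}") from None


def run_in_sequence(
    registry: ModelRegistry,
    model_names: List[str],
    payload: Any,
    call_model: Callable[[str, str, Any], Any],
) -> Any:
    """Feed the output of each model into the next one in the list."""
    result = payload
    for name in model_names:
        entry = registry.lookup(name)
        result = call_model(entry.address, entry.name, result)
    return result
```

Passing `call_model` in as a parameter keeps the registry independent of the transport, so the same code works whether the request goes over REST or gRPC.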

Is it possible to share a Tensorflow model among Gunicorn workers?

I need to put a TensorFlow model into production behind a simple API endpoint. The model should be shared among processes/workers/threads so that memory is not wasted on multiple copies.
I have already tried running multiple gunicorn workers with the --preload option, loading the model before the resources are defined, but after a while I get a timeout error and the model stops responding. I know that TensorFlow Serving exists for this purpose, but the problem is that the model is not yet available at deployment time, and the ML pipeline is composed of many different components (the TensorFlow model is just one of them). The end user (customer) is the one who trains and saves the model. (I am using Docker.) Thanks in advance.
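The issue is that the TensorFlow runtime and its global state are not fork-safe, so a model loaded in the gunicorn master process (which is what --preload does) cannot be used safely in the forked workers.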
See this thread: https://github.com/tensorflow/tensorflow/issues/5448 or this one: https://github.com/tensorflow/tensorflow/issues/51832
You could attempt to load the model only in the workers using the post_fork and post_worker_init server hooks.

Serving multiple deep learning models from cluster

I have been thinking about how to deploy multiple models for use. I am currently working with TensorFlow and was referring to this and this article.
However, I cannot find any article that addresses serving several models in a distributed manner.
Q.1. Does TensorFlow Serving serve models from a single machine only? Is there a way to set up a cluster of machines running TensorFlow Serving, so that multiple machines serve the same model (working as master and slave, or load balancing between them) while also serving different models?
Q.2. Does similar functionality exist for other deep learning frameworks, such as Keras or MXNet (that is, not restricted to TensorFlow, and serving models from different frameworks)?
A1: Serving TensorFlow models in a distributed fashion is made easy by Kubernetes, a container orchestration system that takes away much of the pain of running a distributed system, including load balancing. Please check serving kubernetes.
A2: Sure, check PredictionIO, for instance. It is not specific to deep learning, but it can be used to deploy models built with, for example, Spark MLlib.
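From the client's point of view, a load-balanced set of TensorFlow Serving replicas looks like a single address. The sketch below builds a request against TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`) using only the standard library. The host name and model name are placeholders, and it assumes the REST port (8501 by default) is the one exposed by the load balancer.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, model_name, instances, timeout=10.0):
    """Send the request; whichever replica sits behind base_url answers it."""
    request = build_predict_request(base_url, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

Because the model name is part of the URL, the same address can front replicas that serve several different models, for example `predict("http://tf-serving.example:8501", "recommender", [[1.0, 2.0]])` with a placeholder host.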

Is there any example of distributed TensorFlow training by using multi-node multi-GPU?

Searching on Google, I can find the following two types of deployment for TensorFlow training:
Training on a single node with multiple GPUs, such as a CNN;
Distributed training on multiple nodes, such as between-graph replicated training.
Is there any example that uses multi-node multi-GPU? To be specific, there are two levels of parallelism:
On the first level, the parameter servers and workers are distributed among different nodes;
On the second level, each worker on a single machine uses multiple GPUs for training.
The TensorFlow Inception model documentation on GitHub (link) has a very good explanation of the different types of training. Make sure to check it out, along with the source code.
Also, have a look at this code, which does distributed training in a slightly different way.
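The two levels of parallelism in the question map onto the two parts of a TensorFlow device string: the job and task identify a worker process on some node, and the device identifies a GPU within that process. The sketch below only enumerates those strings for an assumed cluster shape. It does not build a graph or start any servers, and the function name and the counts are made up for illustration.

```python
def worker_gpu_devices(num_workers, gpus_per_worker):
    """List the GPU device strings for each worker task.

    First level: one entry per worker task (one per node in this sketch).
    Second level: one device string per GPU inside that worker.
    """
    return {
        f"/job:worker/task:{task}": [
            f"/job:worker/task:{task}/device:GPU:{gpu}"
            for gpu in range(gpus_per_worker)
        ]
        for task in range(num_workers)
    }


if __name__ == "__main__":
    # Assumed shape: 2 worker nodes with 4 GPUs each.
    for worker, gpus in worker_gpu_devices(2, 4).items():
        print(worker, "->", ", ".join(gpus))
```

In a multi-node multi-GPU setup, each worker would build one model replica (tower) per device in its own list, while the variables stay on the parameter servers.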

Already implemented neural network on Google Cloud Platform

I have implemented a neural network model using Python and Tensorflow, which normally runs on my own computer.
Now I would like to train it on new datasets on the Google Cloud Platform. Do you think it is possible? Do I need to change my code?
Thank you very much for your help!
Google Cloud offers the Cloud ML Engine service, which lets you train your models and serve predictions without having to run and maintain an instance with the required software.
To run the TensorFlow models you already have, you will not need to change your code. You only have to package the trainer appropriately, as described in the documentation, and run an ML Engine job that performs the training. Once you have your model, you can also deploy it in the same service and later get predictions in different ways depending on your requirements (urgency in getting the predictions, data set sources, etc.).
Alternatively, as suggested in the comments, you can always launch a Compute Engine instance and run your TensorFlow model there as if you were doing it locally on your computer. However, I would strongly recommend the approach I proposed earlier: you will save some money, because you are only charged for your usage (training jobs and/or predictions), and you do not need to configure an instance from scratch.
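As a sketch of what "packaging the trainer" can amount to in practice, the entry point of the trainer package usually just needs to accept its settings as command-line flags. The example below assumes an entry module (conventionally something like `trainer/task.py`) and that the job's output location arrives as a `--job-dir` flag. The `--train-files` and `--epochs` flags and the default values are hypothetical, and the call into your existing training code is left as a comment.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Trainer entry point (sketch)")
    # Output location for checkpoints and the exported model.
    parser.add_argument("--job-dir", default="output")
    # Hypothetical flags for the new datasets and the training length.
    parser.add_argument("--train-files", nargs="+", default=["data/train.csv"])
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Your existing training code would be called here, reading from
    # args.train_files and writing to args.job_dir.
    print(f"training on {args.train_files} for {args.epochs} epochs -> {args.job_dir}")
    return args
```

The same module can be run locally first, for example by calling `main(["--job-dir", "/tmp/run1"])`, before it is submitted as a training job.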