What exactly is "Tensorflow Distributed", now that we have Tensorflow Serving? - tensorflow

I don't understand why "Tensorflow Distributed" still exists, now that we have Tensorflow Serving. It seems to be some way to use core Tensorflow as a serving platform, but why would we want that when Tensorflow Serving and TFX are much more robust platforms? Is it just legacy support? If so, then the Tensorflow Distributed pages should make that clear and point people towards TFX.

Distributed Tensorflow is about training rather than serving: it supports training one model across many machines, for example by implementing a parameter server, with either data parallelism or model parallelism.
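To make the parameter server idea concrete, here is a minimal pure-Python sketch of data parallelism. It does not use TensorFlow at all; the class and function names (ParameterServer, worker_gradient, train) are made up for illustration, and the "workers" run one after another in a single process rather than on separate machines.

```python
# Toy parameter-server training with data parallelism (no TensorFlow).
# All names are hypothetical and only illustrate the idea.

class ParameterServer:
    """Holds the shared model parameter and applies gradient updates."""

    def __init__(self, w=0.0, lr=0.01):
        self.w = w
        self.lr = lr

    def pull(self):
        # Workers fetch the current parameter value.
        return self.w

    def push(self, grads):
        # Average the workers' gradients and take one SGD step.
        self.w -= self.lr * sum(grads) / len(grads)


def worker_gradient(w, shard):
    """Gradient of the mean squared error for y ~ w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train(shards, steps=200):
    ps = ParameterServer()
    for _ in range(steps):
        w = ps.pull()                                    # same parameters for every worker
        grads = [worker_gradient(w, s) for s in shards]  # one gradient per worker
        ps.push(grads)                                   # server aggregates and updates
    return ps.w


# Data generated from y = 3x, split across two "workers".
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]
w_final = train(shards)
```

Each worker sees only its own shard of the data, while the model parameter lives in one place. In model parallelism, by contrast, it would be the model itself that is split across machines.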

Related

Tensorflow Serving Performance Very Slow vs Direct Inference

I am running in the following scenario:
Single Node Kubernetes Cluster (1x i7-8700K, 1x RTX 2070, 32GB RAM)
1 Tensorflow Serving Pod
4 Inference Client Pods
The inference clients get images from 4 separate cameras (1 each) and pass them to TF-Serving for inference, to understand what is seen on the video feeds.
I was previously doing inference inside each Inference Client Pod by calling TensorFlow directly, but that used too much of the graphics card's RAM. Tensorflow Serving was introduced into the mix quite recently to reduce RAM usage, since we no longer load duplicate copies of the model onto the graphics card.
And the performance is not looking good. For 1080p images it looks like this:
Direct TF: 20ms for input tensor creation, 70ms for inference.
TF-Serving: 80ms for GRPC serialization, 700-800ms for inference.
The TF-Serving pod is the only one that has access to the GPU and it is bound exclusively. Everything else operates on CPU.
Are there any performance tweaks I could do?
The model I'm running is Faster R-CNN Inception V2 from the TF Model Zoo.
Many thanks in advance!
This is from TF Serving documentation:
Please note, while the average latency of performing inference with TensorFlow Serving is usually not lower than using TensorFlow directly, where TensorFlow Serving shines is keeping the tail latency down for many clients querying many different models, all while efficiently utilizing the underlying hardware to maximize throughput.
From my own experience, I've found TF Serving to be useful in providing an abstraction over model serving which is consistent, and does not require implementing custom serving functionalities. Model versioning and multi-model which come out-of-the-box save you lots of time and are great additions.
Additionally, I would also recommend batching your requests if you haven't already. I would also suggest playing around with the TENSORFLOW_INTER_OP_PARALLELISM, TENSORFLOW_INTRA_OP_PARALLELISM, OMP_NUM_THREADS arguments to TF Serving. Here is an explanation of what they are
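As a rough illustration of request batching on the client side, the sketch below groups several frames into one request body in the row ("instances") format of the TF Serving REST predict API. The helper names (chunk, build_predict_body) and the tiny placeholder frames are assumptions for the example, and nothing is actually sent over the network. Server-side batching is a separate mechanism that TF Serving enables with its --enable_batching flag.

```python
import json


def chunk(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def build_predict_body(batch):
    """Serialize one batch in the row ("instances") format of the REST predict API."""
    return json.dumps({"instances": batch})


# Four tiny placeholder "frames" standing in for decoded camera images.
frames = [[[0, 0, 0]], [[1, 1, 1]], [[2, 2, 2]], [[3, 3, 3]]]
bodies = [build_predict_body(b) for b in chunk(frames, batch_size=2)]

# Each body would be POSTed to a URL of the form
#   http://<host>:8501/v1/models/<model_name>:predict
# where <host> and <model_name> are placeholders for your own deployment.
```

With four cameras in four separate pods, each client can only batch its own frames, so server-side batching is likely the more relevant option in that setup.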
Maybe you could try OpenVINO? It's a heavily optimized toolkit for inference. You could utilize your i7-8700K and run some frames in parallel. Here are some performance benchmarks for the very similar i7-8700T.
There is even OpenVINO Model Server which is very similar to Tensorflow Serving.
Disclaimer: I work on OpenVINO.

Has tf.Estimator become obsolete in Tensorflow 2.0?

Today I've set up a custom model using its tf.Estimator high-level API in Tensorflow 2.0.
It was a pain in the *** to get it running, and there are very few complete examples online that implement custom Estimators in Tensorflow 2, which made me question the reasons for using this API.
According to the docs, the main advantages of using the tf.Estimator API are:
1. You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
2. You no longer have to worry about creating the computational graph or sessions since Estimators handle all the "plumbing" for you.
Advantage 2. clearly doesn't apply to Tensorflow 2.0 anymore, as it runs in eager mode by default, so you don't have to worry about sessions anyways.
Advantage 1. also seems quite irrelevant in Tensorflow 2.0 - using tf.distribute.Strategy, you can now easily run even high level tf.Keras models in a distributed fashion and on CPUs/GPUs/TPUs.
tf.Keras models are so much easier and faster to set up, so why did they even bother to keep the tf.Estimator API in Tensorflow 2.0? Are there other advantages of using this API?
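For reference, this is roughly what running a tf.Keras model under tf.distribute.Strategy looks like. It is only a minimal sketch, assuming TensorFlow 2.x is installed; the layer sizes and the random placeholder data are arbitrary choices for the example.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU
# (with no GPU it runs as a single replica on the CPU).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables must be created inside the scope so they are mirrored.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# Random placeholder data, only to make the sketch runnable.
x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")
history = model.fit(x, y, batch_size=16, epochs=1, verbose=0)
```

The model definition is the same as it would be without distribution; only the strategy.scope() block is added.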

Distributed Tensorflow Using fit_generator()

Is it possible to use Tensorflow in a distributed manner and use fit_generator()? In my research so far I have not seen anything on how to do this or whether it is possible. If it is not possible, what are some possible solutions for using distributed Tensorflow when all the data will not fit in memory?
Using fit_generator() is not possible under a tensorflow distribution scope.
Have a look at tf.data. I rewrote all my Keras ImageDataGenerators as a TensorFlow data pipeline. It doesn't take much time, is more transparent, and is remarkably faster.
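A minimal sketch of that kind of rewrite, assuming TensorFlow 2.4 or later (for output_signature and tf.data.AUTOTUNE). The generator sample_generator is a hypothetical stand-in for whatever currently feeds fit_generator(); the shapes and batch size are arbitrary.

```python
import numpy as np
import tensorflow as tf


def sample_generator():
    """Hypothetical generator yielding one (features, label) pair at a time."""
    for i in range(100):
        yield np.full((4,), i, dtype=np.float32), np.float32(i % 2)


# Wrap the Python generator in a tf.data pipeline; samples are produced lazily,
# so the full data set never has to fit in memory.
dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_signature=(
            tf.TensorSpec(shape=(4,), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32),
        ),
    )
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)
```

The resulting dataset can be passed to model.fit() in place of a generator. For image data read from disk, building the pipeline directly from file paths is often preferable to wrapping a Python generator, but that depends on the data.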

"Using TensorFlow backend". Is this an error?

I am new to deep learning on jupyter-notebook. I compiled this first cell and got this reply. It says "TensorFlow backend". Is this an error?
No it's not an error. Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not handle itself low-level operations such as tensor products, convolutions and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather than picking one single tensor library and making the implementation of Keras tied to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.
At this time, Keras has three backend implementations available: the TensorFlow backend, the Theano backend, and the CNTK backend.
In your case it is TensorFlow backend.
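As a rough illustration of how the backend gets chosen in multi-backend Keras: the name is read from the "backend" field of ~/.keras/keras.json, and a KERAS_BACKEND environment variable overrides it. The function below is only a sketch of that lookup order, not Keras's own code; resolve_backend is a made-up name, and the example writes a temporary keras.json instead of touching the real one.

```python
import json
import os
import tempfile


def resolve_backend(config_path, environ):
    """Return the backend name: env var first, then keras.json, then the default."""
    backend = "tensorflow"  # default when nothing is configured
    if os.path.exists(config_path):
        with open(config_path) as f:
            backend = json.load(f).get("backend", backend)
    # A non-empty KERAS_BACKEND environment variable takes precedence.
    return environ.get("KERAS_BACKEND") or backend


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keras.json")
    with open(path, "w") as f:
        json.dump({"backend": "theano"}, f)
    from_file = resolve_backend(path, {})                        # value from the file
    from_env = resolve_backend(path, {"KERAS_BACKEND": "cntk"})  # env var wins
```

The "Using TensorFlow backend" message simply reports which of these was picked when Keras was imported.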

how to serve pytorch or sklearn models using tensorflow serving

I have found tutorials and posts which only explain how to serve TensorFlow models using TensorFlow Serving.
In the model.conf file, there is a parameter model_platform in which tensorflow or any other platform can be specified. But how do we export models from other platforms in the TensorFlow way, so that they can be loaded by TensorFlow Serving?
I'm not sure if you can. The tensorflow platform is designed to be flexible, but if you really want to use it, you'd probably need to implement a C++ library to load your saved model (in protobuf) and provide a servable to the TensorFlow Serving platform. Here's a similar question.
I haven't seen such an implementation, and the efforts I've seen usually go towards two other directions:
Pure Python code serving a model over HTTP or GRPC, for instance, such as what's being developed in Pipeline.AI
Dump the model in PMML format, and serve it with Java code.
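To give an idea of the first direction, here is a minimal sketch of pure Python code serving a model over HTTP, using only the standard library. The predict function is a hypothetical stand-in with made-up weights; with scikit-learn or PyTorch you would call your loaded model there instead. The /predict path and the JSON field names are also assumptions for the example, not any existing product's API.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for model.predict: a fixed linear model (hypothetical weights)."""
    weights = [0.5, -1.0, 2.0]
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, expected as {"instances": [[...], [...]]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(
            {"predictions": [predict(row) for row in payload["instances"]]}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def query(port, instances):
    """Send one prediction request to the local server and decode the reply."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(
        "POST",
        "/predict",
        body=json.dumps({"instances": instances}),
        headers={"Content-Type": "application/json"},
    )
    reply = json.loads(conn.getresponse().read())
    conn.close()
    return reply


server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = query(server.server_address[1], [[1.0, 2.0, 3.0]])
server.shutdown()
server.server_close()
```

This leaves out everything a real serving platform provides, such as model versioning, batching and monitoring, which is where the dedicated model servers come in.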
Not answering the question, but since no better answers exist yet: As an addition to the alternative directions by adrin, these might be helpful:
Clipper (Apache License 2.0) is able to serve PyTorch and scikit-learn models, among others
Further reading:
https://www.andrey-melentyev.com/model-interoperability.html
https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58
Now you can serve your scikit-learn model with Tensorflow Extended (TFX):
https://www.tensorflow.org/tfx/guide/non_tf