Dask with Tensorflow - Joblib? - tensorflow

I was amazed at the results of Dask and Joblib when testing our different algorithms on a multi-node Dask cluster with scikit-learn and XGBoost.
I would like to know how to combine TensorFlow 2.0 with Dask and Joblib to run neural networks in parallel.

I don't think there are currently any "out of the box" solutions for this, though I believe there is chatter about building this functionality into Dask or dask-ml. Ray is also in early development, but it is worth checking out; its docs have some information about distributed TensorFlow: https://docs.ray.io/en/master/raysgd/raysgd_tensorflow.html

Related

Tensorflow profiling for non-model computations

I have a computation that contains for loops, calls to TensorFlow matrix routines such as tf.linalg.lstsq, and TensorFlow iteration with tf.map_fn. I would like to profile it to see how much parallelism I am getting in tf.map_fn and in the matrix routines it calls.
This doesn't seem to be the use case at all for the Tensorflow Profiler which is organized around the neural network model training loop.
Is there a way to use the TensorFlow Profiler for arbitrary TensorFlow computations, or is the go-to move in this case to use NVIDIA tools like nvprof?
I figured out that the nvprof, nvvp and Nsight tools I was looking for are available as a Conda install of cudatoolkit-dev. Usage is described in this gist.
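For the first part of the question, a minimal sketch of profiling a non-training computation with the programmatic profiler API (tf.profiler.experimental.start and stop, available in recent TensorFlow 2.x releases) might look like the following. The function name solve_batch, the tensor shapes, and the trace label are assumptions made for illustration; whether the resulting trace gives enough detail about parallelism inside tf.map_fn will depend on the device and the TensorFlow version.

```python
import tempfile

import tensorflow as tf

# Hypothetical log directory; point TensorBoard at it to inspect the trace.
logdir = tempfile.mkdtemp()


@tf.function
def solve_batch(matrices, rhs):
    # Solve one least-squares problem per (matrix, rhs) pair with tf.map_fn.
    return tf.map_fn(
        lambda pair: tf.linalg.lstsq(pair[0], pair[1]),
        (matrices, rhs),
        fn_output_signature=tf.float32,
    )


# Illustrative data: 8 independent 5x3 least-squares problems.
matrices = tf.random.normal([8, 5, 3])
rhs = tf.random.normal([8, 5, 1])

# Profile only the region of interest; no Keras training loop is involved.
tf.profiler.experimental.start(logdir)
with tf.profiler.experimental.Trace("lstsq_map_fn"):
    solutions = solve_batch(matrices, rhs)
tf.profiler.experimental.stop()
```

The trace can then be opened in TensorBoard's Profile tab, assuming the profiler plugin is installed.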

What is the difference between JAX, Trax, and TensorRT, in simple terms?

I have been using TensorRT and TensorFlow-TRT to accelerate the inference of my DL algorithms.
Then I have heard of:
JAX https://github.com/google/jax
Trax https://github.com/google/trax
Both seem to accelerate DL, but I am having a hard time understanding them. Can anyone explain them in simple terms?
Trax is a deep learning framework created by Google and used extensively by the Google Brain team. It is an alternative to TensorFlow and PyTorch for implementing off-the-shelf state-of-the-art deep learning models such as Transformers and BERT, principally in the field of Natural Language Processing.
Trax is built upon TensorFlow and JAX. JAX can be thought of as an enhanced and optimised version of NumPy. The important distinction between JAX and NumPy is that JAX uses a compiler called XLA (Accelerated Linear Algebra), which lets your NumPy-style code run on GPUs and TPUs as well as on the CPU, where plain NumPy runs, and thus speeds up computation.
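To make the "NumPy on accelerators" description concrete, here is a small sketch. The function scaled_sum_of_squares and the input values are invented for illustration; jax.numpy, jax.jit and jax.grad are the pieces that matter. The same code runs on CPU, GPU or TPU depending on which backend JAX finds.

```python
import jax
import jax.numpy as jnp
import numpy as np


def scaled_sum_of_squares(x):
    # Written with jax.numpy, which mirrors much of the NumPy API.
    return jnp.sum(x ** 2) / x.shape[0]


# jax.jit compiles the function with XLA for the available backend.
fast_fn = jax.jit(scaled_sum_of_squares)

# jax.grad builds a function that returns the gradient with respect to x.
grad_fn = jax.grad(scaled_sum_of_squares)

x_np = np.arange(4, dtype=np.float32)  # [0, 1, 2, 3]
x = jnp.asarray(x_np)

value = fast_fn(x)     # (0 + 1 + 4 + 9) / 4
gradient = grad_fn(x)  # 2 * x / 4
```

Automatic differentiation via jax.grad is the other feature plain NumPy lacks, and it is what makes JAX usable as a base for frameworks such as Trax.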

Solutions for big data preprocessing for feeding deep neural network models built with TensorFlow 2.0?

Currently I am using Python, NumPy, pandas and scikit-learn to do data preprocessing (LabelEncoder, MinMaxScaler, fillna, etc.), and then feeding the processed data to DNN models built with TensorFlow 2.0. This input pipeline meets my needs when the data is small enough to fit in a PC's RAM.
Now I have some large datasets of more than 10GB, and some are larger still. I also plan to deploy the models in a production environment, which means there will be new data coming in every day. For DNN model training there is the distribution strategy of TensorFlow 2.0. But for data preprocessing I obviously cannot use pandas and scikit-learn on the large datasets with one PC. It seems to me I need to use a for-loop where I repeatedly fetch a small part of the data and use it for training?
I am wondering what people typically use in experimental or production environments for big data preprocessing.
Should I use Spark(Scala) / PySpark and Tensorflow input pipeline?
Yeah, the way you are currently doing preprocessing will not scale well.
PySpark is one good way to run your preprocessing layer. Set up a simple standalone Spark cluster with a few workers and then run your preprocessing (LabelEncoder/OneHotEncoder/fillna/...) on it. This solution should scale well, and it abstracts away the distributed computation layer.
PS: PySpark might not be the only way forward, but it is one of the good options for this use case.

Distributed Tensorflow Using fit_generator()

Is it possible to use TensorFlow in a distributed manner together with fit_generator()? In my research so far I have not seen anything on how to do this, or whether it is possible at all. If it is not possible, what are some possible solutions for using distributed TensorFlow when the data will not all fit in memory?
Using fit_generator() is not possible under a TensorFlow distribution scope.
Have a look at tf.data. I rewrote all my Keras ImageDataGenerators as a TensorFlow data pipeline. It doesn't take much time, it is more transparent, and it is remarkably faster.
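As a rough sketch of that rewrite, a Python generator can be wrapped with tf.data.Dataset.from_generator and passed to model.fit inside a distribution strategy scope. The generator sample_generator, the feature shape and the tiny model are hypothetical stand-ins; MirroredStrategy is used here only because it also runs on a single machine, and the output_signature argument assumes a reasonably recent TensorFlow 2.x release.

```python
import numpy as np
import tensorflow as tf


def sample_generator():
    # Hypothetical generator standing in for a Keras data generator;
    # it yields one (features, label) pair at a time.
    rng = np.random.default_rng(0)
    for _ in range(256):
        x = rng.normal(size=(4,)).astype(np.float32)
        y = np.array([x.sum() > 0], dtype=np.float32)
        yield x, y


dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_signature=(
            tf.TensorSpec(shape=(4,), dtype=tf.float32),
            tf.TensorSpec(shape=(1,), dtype=tf.float32),
        ),
    )
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# Build and compile the model under the strategy scope.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

# The dataset streams batches, so the full data never sits in memory.
history = model.fit(dataset, epochs=2, verbose=0)
```

For data stored in files, reading directly with tf.data (for example from TFRecord files) usually parallelises better than from_generator, since the generator runs in a single Python process.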

What exactly is "Tensorflow Distributed", now that we have Tensorflow Serving?

I don't understand why "Tensorflow Distributed" still exists, now that we have Tensorflow Serving. It seems to be some way to use core TensorFlow as a serving platform, but why would we want that when Tensorflow Serving and TFX are a much more robust platform? Is it just legacy support? If so, then the Tensorflow Distributed pages should make that clear and point people towards TFX.
Distributed TensorFlow is about training, not serving: it supports training one model across many machines by implementing a parameter server, with either data parallelism or model parallelism.
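The parameter-server idea with data parallelism can be illustrated without a cluster. The sketch below is a plain NumPy simulation, not TensorFlow's implementation: the "workers" are just loop iterations over data shards, and the linear-regression problem, shard count and learning rate are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data with known weights.
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(400, 2))
y = X @ true_w

# Data parallelism: each simulated worker owns one shard of the data.
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))


def worker_gradient(w, X_shard, y_shard):
    # Gradient of the mean squared error on this worker's shard only.
    residual = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ residual / len(y_shard)


# The parameter server holds the single shared copy of the weights.
w = np.zeros(2)
learning_rate = 0.1

for step in range(200):
    # Each worker reads the current weights and computes a local gradient.
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]
    # The server averages the gradients and applies one update.
    w = w - learning_rate * np.mean(grads, axis=0)
```

In a real deployment the workers run on separate machines and exchange weights and gradients over the network; serving the trained model afterwards is a separate concern, which is where Tensorflow Serving comes in.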