Keras run predict_generator on multiple GPUs - tensorflow

I am generating bottleneck features for a VGG16 model. I have a lot of data and am using 6 GPUs. As of Keras 2.0.9, training on multiple GPUs seems simple enough. I tried calling multi_gpu_model(model, gpus=6) and then running predict_generator. Although nvidia-smi shows my program running on each GPU, the predictions seem to be generated on the CPU. Is there an easy way to run predict_generator on multiple GPUs? (I have a bad feeling I'll have to write my own version of multi_gpu_model.) If I were to implement this myself, would the best approach be to create one thread per GPU, each running predict_generator on a separate batch?
Sorry if I have missed something obvious.
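One way to picture the thread-per-GPU idea from the question is the framework-agnostic sketch below. It assumes one prediction callable per device (hypothetical here; in practice each would wrap a model replica placed on its own GPU), hands out batches round-robin, and returns the results in input order. Whether this beats a single large batch depends on the model and the input pipeline, so treat it as a starting point rather than a replacement for predict_generator.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_in_parallel(predict_fns, batches):
    """Run batches through one worker thread per device, keeping input order.

    predict_fns: one callable per device. These are hypothetical stand-ins;
    each would normally wrap a model replica pinned to a single GPU.
    """
    num_workers = len(predict_fns)

    def run_shard(worker_idx):
        # Worker i handles batches i, i + n, i + 2n, ... (round-robin).
        fn = predict_fns[worker_idx]
        indices = range(worker_idx, len(batches), num_workers)
        return [(i, fn(batches[i])) for i in indices]

    results = [None] * len(batches)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for shard in pool.map(run_shard, range(num_workers)):
            for i, output in shard:
                results[i] = output  # put each output back at its batch index
    return results


# Stand-in "replicas" that double every value, used only to show the call.
fake_replicas = [lambda batch: [2 * x for x in batch] for _ in range(3)]
outputs = predict_in_parallel(fake_replicas, [[1], [2], [3], [4]])
print(outputs)
```

Threads are a reasonable fit here because the heavy work happens inside the framework's native code rather than in Python itself, but that is an assumption worth checking with a profiler.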

Related

How to get the exact GPU memory usage for Keras

I recently started learning Keras and TensorFlow. I am currently testing a few models on the MNIST dataset (pretty basic stuff). I want to know exactly how much memory my model consumes during training and inference. I tried googling but did not find much information.
I came across nvidia-smi. I tried setting config.gpu_options.allow_growth = True, but I still cannot see the exact memory python.exe is consuming because of some issues with nvidia-smi. I know that I could run separate training and inference passes, but this is too cumbersome. It would be very easy if I could just find the right API for the job.
With TensorFlow being such a well-known and widely used library, I am hoping to find a better and faster way to get these numbers.
Finally, once again, my question is:
How do I get the exact memory usage of a Keras model during training and inference?
Relevant specs:
OS: Windows 10
GPU: GTX 1050
TensorFlow version: 1.14
Please let me know if any other details are required.
Thanks!
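This does not give the exact figure the question asks for, but a rough lower bound can be worked out by hand from the parameter count. The sketch below assumes float32 weights (4 bytes each) and a hypothetical 784-128-10 dense MNIST classifier. It ignores activations, gradients, optimizer state, and framework workspace, all of which add to what nvidia-smi reports.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


def weight_bytes(param_count, bytes_per_param=4):
    """Memory taken by the parameters alone (float32 by default)."""
    return param_count * bytes_per_param


# Hypothetical MNIST classifier: 784 inputs -> 128 hidden units -> 10 classes.
params = dense_param_count([784, 128, 10])
print(params, "parameters,", weight_bytes(params), "bytes of weights")
```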

How to do parallel GPU inferencing in Tensorflow 2.0 + Keras?

Let me begin with the premise that I'm new to TensorFlow and deep learning in general.
I have a TF 2.0 Keras-style model trained using tf.keras.Model.fit(), two available GPUs, and I'm looking to reduce inference times.
I trained the model across both GPUs using the extremely handy tf.distribute.MirroredStrategy().scope() context manager:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model.compile(...)
    model.fit(...)
Both GPUs get used effectively (even if I'm not quite happy with the resulting accuracy).
I can't seem to find a similar strategy for distributing inference between GPUs with the tf.keras.Model.predict() method: when I run model.predict(), I (obviously) get usage from only one of the two GPUs.
Is it possible to instantiate the same model on both GPUs and feed them different chunks of data in parallel?
There are posts that suggest how to do it in TF 1.x, but I can't seem to replicate the results in TF 2.0:
https://medium.com/@sbp3624/tensorflow-multi-gpu-for-inferencing-test-time-58e952a2ed95
Tensorflow: simultaneous prediction on GPU and CPU
My main difficulties with the question are:
- TF 1.x is tf.Session()-based, while sessions are implicit in TF 2.0. If I understand correctly, the solutions I read use a separate session for each GPU, and I don't know how to replicate that in TF 2.0.
- I don't know how to use the model.predict() method with a specific session.
I know that the question is probably not well-formulated but I summarize it as:
Does anybody have a clue on how to run Keras-style model.predict() on multiple GPUs (inferencing on a different batch of data on each GPU in a parallel way) in TF2.0?
Thanks in advance for any help.
Try loading the model inside a tf.distribute.MirroredStrategy scope and using a larger batch_size:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.models.load_model(saved_model_path)
    # data is a placeholder for your input samples; the original snippet omitted it
    result = model.predict(data, batch_size=greater_batch_size)
There still does not seem to be an official example for distributed inference. There is a potential solution here using tf.distribute.MirroredStrategy: https://github.com/tensorflow/tensorflow/issues/37686. However, it does not seem to fully utilize multiple GPUs.
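To make the "larger batch_size" suggestion concrete, here is a minimal sketch. It assumes that Keras splits the batch passed to predict() across the replicas of the strategy, so the batch is sized as a per-GPU amount times the number of replicas. The function names, the saved_model_path and data arguments, and the default of 64 samples per replica are illustrative assumptions, and as noted above this may still not keep every GPU fully busy.

```python
def global_batch_size(per_replica_batch_size, num_replicas):
    """Batch size to pass to predict() so each replica gets a full share."""
    return per_replica_batch_size * num_replicas


def predict_on_all_gpus(saved_model_path, data, per_replica_batch_size=64):
    """Load a saved Keras model under MirroredStrategy and run predict()."""
    # Imported here so the helper above can be used without TensorFlow.
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        # Loading inside the scope mirrors the variables on every visible GPU.
        model = tf.keras.models.load_model(saved_model_path)
        batch_size = global_batch_size(
            per_replica_batch_size, strategy.num_replicas_in_sync
        )
        return model.predict(data, batch_size=batch_size)


# With two GPUs and 64 samples per GPU, predict() would get batches of 128.
print(global_batch_size(64, 2))
```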

How to determine the optimal number of GPUs for my machine learning script?

The cluster I am using has 4 NVIDIA P100 GPUs per node. I have TensorFlow code that I need to run. It takes many hours to complete, so I tried to use all 4 GPUs available on the node, but it runs slower with all 4 GPUs than with only 1 GPU, and I am not sure why. What is the best strategy for determining how many GPUs I should use for my problem?
It is possible that your code is not structured optimally for multi-GPU training, for example if you distributed it layer-wise. Generally, training speed should scale roughly linearly with the number of GPUs when the work is split data-parallel.
Please refer to this answer on what options you have to adapt your network to multi-gpu training.
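A practical way to answer the "how many GPUs" question is to time a short, fixed amount of work on 1, 2, and 4 GPUs and compare. The sketch below computes speedup and parallel efficiency from such timings. The numbers in the usage example are made up for illustration and are not measurements.

```python
def scaling_report(seconds_by_gpu_count):
    """Speedup and efficiency of each run relative to the 1-GPU run."""
    baseline = seconds_by_gpu_count[1]
    report = {}
    for gpus, seconds in sorted(seconds_by_gpu_count.items()):
        speedup = baseline / seconds
        # Efficiency of 1.0 means perfectly linear scaling.
        report[gpus] = {"speedup": speedup, "efficiency": speedup / gpus}
    return report


def fastest_gpu_count(seconds_by_gpu_count):
    """GPU count with the shortest wall-clock time."""
    return min(seconds_by_gpu_count, key=seconds_by_gpu_count.get)


# Hypothetical timings (seconds for the same number of training steps).
timings = {1: 100.0, 2: 50.0, 4: 125.0}
for gpus, stats in scaling_report(timings).items():
    print(gpus, "GPU(s):", stats)
print("fastest:", fastest_gpu_count(timings))
```

An efficiency well below 1.0, or a speedup below 1.0 as in the made-up 4-GPU row, points to overhead such as communication or a starved input pipeline outweighing the extra compute.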

data generator with tensorflow on the gpu

I am building a neural network with TensorFlow and ran into a problem using a generator to split up my data: it is too slow.
My training data consists of 52x52 numpy arrays. I need to split each array into a 52x52x3 array before I input it into my NN. As mentioned, I have a working generator that does this, but I noticed that even though my NN is running on the GPU, GPU usage is very low (usually under 10%). I think this might be because the generator runs on the CPU.
Is there any way of running my generator on the GPU?
What I tried:
- I thought of using PyCUDA to program the generator on the GPU, but found that TensorFlow and PyCUDA don't support each other.
- I tried using the from_generator function from the Dataset API, as mentioned here:
https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset
While having issues with it, I ran into this GitHub thread, which mentions that this function isn't supported on the GPU anyway:
https://github.com/tensorflow/tensorflow/issues/13610
Any help would be greatly appreciated.
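If the GPU is idle because it waits for the generator, one option that keeps the generator on the CPU is to prepare the next items in a background thread while the current training step runs. The sketch below is a generic prefetching wrapper using only the standard library. The function name and the buffer size of 4 are arbitrary choices, and the gain depends on the generator spending its time in code that releases Python's global interpreter lock, which NumPy operations often do. tf.data offers a comparable built-in step through Dataset.prefetch.

```python
import queue
import threading


def prefetch(generator, buffer_size=4):
    """Yield items from `generator`, producing them in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def produce():
        try:
            for item in generator:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            # Always signal the end, even if the generator raises,
            # so the consumer does not block forever.
            buffer.put(done)

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


# Usage with a stand-in generator; a real one would yield training batches.
for batch in prefetch(iter(range(3))):
    print(batch)
```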

TensorFlow - GPU Acceleration only for training?

Will GPU acceleration with TensorFlow speed up only the training of models, or will it also speed up using the model on data?
Most guides only talk about GPU acceleration for training.
Also, will it work with any of the TensorFlow models, even those run via shell scripts?
In addition, would it be used by the shell scripts by default, or does it require explicit coding to make it work?
It will work for both, and yes, it should make using the models faster even when not training (unless the model is really simple and the overhead of placing it on the GPU outweighs the performance gain). That said, I think a GPU is less necessary for just evaluating the model. During training, the data is usually batched so that each train step contains multiple runs of the model. The gradients also need to be calculated, which takes a lot of compute time and memory, and the weights need to be updated. A simple forward pass is therefore a lot faster. You would really see a benefit if you needed to make a whole bunch of forward passes at once.
As for running TensorFlow models through shell scripts, I would assume that if they train on the GPU, they will also run on the GPU.
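The point about making many forward passes at once can be illustrated with a small batching helper. It is a sketch only: predict_fn stands for any function that maps a list of samples to a list of outputs (for example, a wrapper around a model's predict call), and the default batch size of 32 is an arbitrary assumption.

```python
def predict_in_batches(predict_fn, samples, batch_size=32):
    """Call predict_fn once per batch instead of once per sample."""
    outputs = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        outputs.extend(predict_fn(batch))  # one call covers the whole batch
    return outputs


# Stand-in for a model: adds 1 to every sample and records the batch sizes.
batch_sizes_seen = []


def fake_predict(batch):
    batch_sizes_seen.append(len(batch))
    return [x + 1 for x in batch]


print(predict_in_batches(fake_predict, list(range(5)), batch_size=2))
print(batch_sizes_seen)
```

Fewer, larger calls give the GPU more work per launch, which is where the overhead of moving data to the device starts to pay off.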