I have a tensorflow (tf2.0)/keras model that uses multiple GPUs for its computations. There are 2 branches in the model and each branch is on a separate GPU.
I have a 4 GPU system that I want to use for training and I would like to mirror this model so that GPUs 1 and 2 contain one model and GPUs 3 and 4 contain the mirrored model.
Will tf.distribute.MirroredStrategy handle this mirroring automatically? Or does it assume that my model will be a single GPU model?
If tf.distribute.MirroredStrategy will not handle this, does anyone have any suggestions for how to customise MirroredStrategy to achieve this training structure?
This sounds a lot like you are going to need a custom training loop.
MirroredStrategy replicates the model on each GPU, but since your model already spans two GPUs, I don't think it will work properly out of the box.
You can still try it out and check with nvidia-smi what TensorFlow is actually doing.
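For reference, here is a minimal sketch of what such a two-branch model might look like, with each branch pinned to its own GPU via tf.device (the layer sizes and the build function are hypothetical). You could build it inside strategy.scope() and watch nvidia-smi to see where MirroredStrategy actually places the replicas.
import tensorflow as tf

def build_two_branch_model():
    inputs = tf.keras.Input(shape=(128,))
    # Branch A is pinned to the first GPU...
    with tf.device("/gpu:0"):
        branch_a = tf.keras.layers.Dense(256, activation="relu")(inputs)
    # ...and branch B to the second.
    with tf.device("/gpu:1"):
        branch_b = tf.keras.layers.Dense(256, activation="relu")(inputs)
    merged = tf.keras.layers.Concatenate()([branch_a, branch_b])
    outputs = tf.keras.layers.Dense(10, activation="softmax")(merged)
    return tf.keras.Model(inputs, outputs)

# Hypothetical usage: MirroredStrategy will try to mirror variables across all
# visible GPUs, which may conflict with the manual tf.device placement above.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_two_branch_model()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")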
Suppose we have a simple TensorFlow model with a few convolutional layers. We would like to train this model on a cluster of computers that is not equipped with GPUs. Each computational node of this cluster might have 1 or multiple cores. Is it possible out of the box?
If not, which packages are able to do that? Are those packages able to perform data and model parallelism?
According to Tensorflow documentation
tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines or TPUs.
As mentioned above, it also supports distributed training on CPUs, provided all devices are on the same network.
Yes, you can use multiple devices to train a model; you need to set up a cluster and worker configuration on each of the devices, as shown below.
tf_config = {
    'cluster': {
        'worker': ['localhost:1234', 'localhost:6789']
    },
    'task': {'type': 'worker', 'index': 0}
}
To learn more about the configuration and model training, please refer to Multi-worker training with Keras.
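As a rough sketch (the model, hostnames/ports, and data are placeholders), a CPU-only multi-worker setup could look like this; each worker process runs the same script with its own task index in TF_CONFIG, and the multi-worker strategy handles data-parallel training over the network:
import json
import os
import tensorflow as tf

# Set TF_CONFIG before creating the strategy; on the second worker, change
# 'index' to 1. Hostnames and ports here are placeholders.
os.environ["TF_CONFIG"] = json.dumps({
    'cluster': {
        'worker': ['localhost:1234', 'localhost:6789']
    },
    'task': {'type': 'worker', 'index': 0}
})

# In newer TF versions this is tf.distribute.MultiWorkerMirroredStrategy.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    # A small convolutional model, just as an example.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# model.fit(train_dataset, epochs=3)  # run the same script on every worker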
According to this SO answer
tf.distribute.Strategy is integrated into tf.keras, so when model.fit is used with a tf.distribute.Strategy instance and the model is created under strategy.scope(), distributed variables are created. This allows the strategy to divide your input data equally across your devices.
Note: distributed training mainly pays off (performance-wise) when dealing with huge data and complex models.
Let's begin with the premise that I'm new to TensorFlow and deep learning in general.
I have a TF 2.0 Keras-style model trained using tf.Model.train(), two available GPUs, and I'm looking to reduce inference time.
I trained the model distributed across the GPUs using the extremely handy tf.distribute.MirroredStrategy().scope() context manager:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model.compile(...)
    model.train(...)
Both GPUs get used effectively (even if I'm not quite happy with the resulting accuracy).
I can't seem to find a similar strategy for distributing inference between GPUs with the tf.Model.predict() method: when I run model.predict() I get (obviously) usage on only one of the two GPUs.
Is it possible to instantiate the same model on both GPUs and feed them different chunks of data in parallel?
There are posts that suggest how to do it in TF 1.x, but I can't seem to replicate the results in TF 2.0:
https://medium.com/#sbp3624/tensorflow-multi-gpu-for-inferencing-test-time-58e952a2ed95
Tensorflow: simultaneous prediction on GPU and CPU
My main struggles with the question are:
TF 1.x is tf.Session()-based, while sessions are implicit in TF 2.0; if I understand correctly, the solutions I read use a separate session for each GPU, and I don't really know how to replicate that in TF 2.0.
I don't know how to use the model.predict() method with a specific session.
I know that the question is probably not well-formulated but I summarize it as:
Does anybody have a clue how to run Keras-style model.predict() on multiple GPUs (running inference on a different batch of data on each GPU, in parallel) in TF 2.0?
Thanks in advance for any help.
Try loading the model inside a tf.distribute.MirroredStrategy scope and using a larger batch_size:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.models.load_model(saved_model_path)
    # input_data: your inference inputs; predict() needs the data as well as a batch_size
    result = model.predict(input_data, batch_size=greater_batch_size)
There still does not seem to be an official example for distributed inference. There is a potential solution using tf.distribute.MirroredStrategy here: https://github.com/tensorflow/tensorflow/issues/37686. However, it does not seem to fully utilize multiple GPUs.
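In the meantime, a hedged sketch of the approach from that issue (model path, shapes, and batch sizes are placeholders) is to distribute a dataset across the replicas and call the model inside strategy.run, so each GPU runs the forward pass on its own shard of every batch:
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.models.load_model('saved_model_path')  # placeholder path

# Global batch = per-replica batch * number of GPUs.
global_batch_size = 64 * strategy.num_replicas_in_sync
inputs = np.random.rand(1024, 128).astype('float32')  # placeholder data
dataset = tf.data.Dataset.from_tensor_slices(inputs).batch(global_batch_size)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def predict_step(batch):
    return model(batch, training=False)

# Each element is a PerReplica object holding one GPU's predictions; gather the
# local results with strategy.experimental_local_results and concatenate them.
# Note: in TF 2.0/2.1, strategy.run is called strategy.experimental_run_v2.
per_replica_outputs = [strategy.run(predict_step, args=(batch,))
                       for batch in dist_dataset]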
I'm using Google Colab for training my models.
But speed is still low.
So is there a way I can train from two different accounts and combine the training later?
No, you cannot train the same model on Colab from two accounts. Google Colab is meant for research purposes only, not for training large-scale production models, and it also disconnects the kernel every 12 hours.
You can instead train the model using multiple GPUs on a single computer. Keras supports multi-GPU training when using TensorFlow as the backend, but training across two different computers/VMs is not possible: how would gradients flow during backpropagation?
There is a workaround, though it is not an end-to-end approach. You can split your model into two different models, where the output of the first becomes the input of the second and the second produces the final output. For this you need a separate training set for each model.
Take this example.
Suppose you are building a face recogniser where the model takes in a raw camera picture and recognises the face as yes/no.
Instead of training this one big network, you could split it into two different nets, where the task of the first net is to crop the face and remove other useless things from the image, and the task of the second is to recognise the face from the cropped image.
This is a non-end-to-end model; you can train the two models separately on different machines with different datasets and then eventually merge them together. This is usually more powerful and easier to train.
Look up this question Tensorflow Combining Two Models End to End
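For illustration, here's a minimal sketch of merging two separately trained models into one inference pipeline (the paths and input shape are hypothetical): the first model's output simply feeds the second.
import tensorflow as tf

# Hypothetical saved models: 'cropper' preprocesses/crops, 'recognizer' classifies.
cropper = tf.keras.models.load_model('cropper_model_path')        # placeholder
recognizer = tf.keras.models.load_model('recognizer_model_path')  # placeholder

inputs = tf.keras.Input(shape=(256, 256, 3))  # raw camera image (placeholder shape)
cropped = cropper(inputs)
prediction = recognizer(cropped)
combined = tf.keras.Model(inputs, prediction)

# combined.predict(images) now runs the whole pipeline in one pass at inference time.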
Another possibility is to ensemble the two trained models. You'd have to make sure, however, that the data for both models come from the same distribution.
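A rough sketch of such an ensemble (paths and shapes are again placeholders) is to average the two models' predictions:
import tensorflow as tf

model_a = tf.keras.models.load_model('model_a_path')  # placeholder
model_b = tf.keras.models.load_model('model_b_path')  # placeholder

inputs = tf.keras.Input(shape=(256, 256, 3))
# Average the predicted probabilities of the two models.
averaged = tf.keras.layers.Average()([model_a(inputs), model_b(inputs)])
ensemble = tf.keras.Model(inputs, averaged)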
So I have this very large and deep model I implemented with TensorFlow r1.2, running on an NVIDIA Tesla K40 with 12 GB of memory. The model consists of several RNNs, a bunch of weight and embedding matrices, as well as bias vectors. When I launched the training program, it first took about 2-3 hours to build the model and then crashed due to OOM issues. I tried reducing the batch size to even 1 data sample per batch, but still ran into the same issue.
If I google tensorflow multiple gpu, the examples I find mainly focus on utilizing multiple GPUs through data parallelism, i.e. having each GPU run the same graph while the CPU computes the total gradient that is then propagated back to the parameters.
I know one possible solution might be running the model on a GPU with more memory. But I wonder if there's a way to split my graph (model) into different parts sequentially and assign them to different GPUs?
The official guide on using GPUs shows you that example in "Using multiple GPUs". You just need to create the operations within different tf.device contexts; the nodes will still be added to the same graph, but they will be annotated with device directives indicating where they should be run. For example:
with tf.device("/gpu:0"):
    net0 = make_subnet0()
with tf.device("/gpu:1"):
    net1 = make_subnet1()
result = combine_subnets(net0, net1)
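For a model like the one described (RNNs plus embeddings), a hedged TF 1.x-style sketch of that sequential split might look as follows; the sizes are hypothetical, and each GPU only has to hold the variables of its own part of the graph, while activations cross devices automatically:
import tensorflow as tf

inputs = tf.placeholder(tf.int32, [None, 50])  # batch of token ids (placeholder shape)

with tf.device("/gpu:0"):
    embeddings = tf.get_variable("embed", [10000, 256])
    embedded = tf.nn.embedding_lookup(embeddings, inputs)
    cell0 = tf.nn.rnn_cell.LSTMCell(512)
    out0, _ = tf.nn.dynamic_rnn(cell0, embedded, dtype=tf.float32, scope="rnn0")

with tf.device("/gpu:1"):
    cell1 = tf.nn.rnn_cell.LSTMCell(512)
    out1, _ = tf.nn.dynamic_rnn(cell1, out0, dtype=tf.float32, scope="rnn1")
    logits = tf.layers.dense(out1[:, -1, :], 10)

# allow_soft_placement lets ops without a GPU kernel fall back to the CPU.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())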
I have been training my TensorFlow retraining algorithm using a single GTX Titan and it works just fine, but when I try to use multiple GPUs in the flower retraining example it does not work and seems to utilize only one GPU when I check nvidia-smi.
Why is this happening, given that it does work with multiple GPUs when training the Inception model from scratch, but not during retraining?
TensorFlow's flower retraining example does not work with multiple GPUs at all, even if you set --num_gpus > 1. It should support a single GPU as you noted.
The model would need to be modified to utilize multiple GPUs in parallel. Unfortunately, a TensorFlow program like the flower retraining example can't automatically be split over multiple GPUs at this time.