As part of the TensorFlow Research Cloud initiative, I have access to 100 TPU v2 machines with 8 TPU cores each (TPU v2-8s).
I need to achieve data parallelism for my model. Is there a way for me to run data parallelism across the 100 machines at once? I would rather use tf.distribute.TPUStrategy if possible. Or do I absolutely need to write my own script that communicates between the machines to average the gradients between them?
As far as I'm aware, we currently don't have a good way of all-reducing gradients across TPU devices over a regular network.
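What does work out of the box is data parallelism within a single v2-8 host via TPUStrategy. A minimal sketch, assuming a standard TF 2.x environment where the TPU is resolvable (e.g. Colab or a Cloud TPU VM; the empty tpu='' argument and the toy model are placeholders):

```python
import tensorflow as tf

# Connect to the single TPU host; tpu='' assumes the TPU address is
# discoverable from the environment (as in Colab / Cloud TPU VMs).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TF 2.3+; on older versions this was tf.distribute.experimental.TPUStrategy.
strategy = tf.distribute.TPUStrategy(resolver)
print("Replicas:", strategy.num_replicas_in_sync)  # 8 on a v2-8

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```

Gradients here are all-reduced over the interconnect of that one host; spanning the 100 separate v2-8s would still require your own cross-machine gradient averaging.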
Related
I have recently become interested in incorporating distributed training into my TensorFlow projects. I am using Google Colab and Python 3 to implement a neural network with custom distributed training loops, as described in this guide:
https://www.tensorflow.org/tutorials/distribute/training_loops
In that guide, under the section 'Create a strategy to distribute the variables and the graph', there is a picture of some code that sets up a MirroredStrategy and then prints the number of generated replicas of the model; see below.
Console output
From what I can understand, the output indicates that the MirroredStrategy has only created one replica of the model, and therefore only one GPU will be used to train the model. My question: is Google Colab limited to training on a single GPU?
I have tried to call MirroredStrategy() both with and without GPU acceleration, but I only get one model replica every time. This is a bit surprising, because when I use the multiprocessing package in Python I get four threads. I therefore expected that it would be possible to train four models in parallel in Google Colab. Are there issues with TensorFlow's implementation of distributed training?
On Google Colab you can only use one GPU; that is the limit set by Google. However, you can run different programs on different GPU instances by creating separate Colab notebooks, each connected to its own GPU, but you cannot place the same model on many GPU instances in parallel.
There is no problem with MirroredStrategy; speaking from personal experience, it works fine if you have more than one GPU.
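A quick way to check what the runtime actually exposes (standard TF 2.x calls, nothing Colab-specific):

```python
import tensorflow as tf

# List the accelerators the runtime can see; on a standard Colab GPU
# runtime this prints a single GPU.
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# MirroredStrategy creates one replica per visible GPU.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)  # 1 on Colab
```

The four threads you get from the multiprocessing package correspond to CPU cores, not GPUs, so they don't imply that four model replicas can be trained in parallel.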
I'm new to machine learning and TensorFlow. I have a question about distributed training in TensorFlow. I've read about multi-GPU environments, and it looks like that is quite possible (https://www.tensorflow.org/guide/using_gpu).
But what about multiple machines with multiple GPUs? Is it possible to divide training tasks between a few machines? Are there specific algorithms or tasks that require such distribution, or are multiple GPUs enough for machine learning? Will there be demand for this?
Thanks
It is possible.
You can run the same model on multiple machines using data parallelism, with distribution strategies or Horovod, to speed up your training. In that case you are running the same model across multiple machines to emulate a larger batch; see the sketch below.
You can also go a somewhat less conventional way with GPipe or TF-Mesh and split a single model across multiple machines, to increase the number of model layers or even split individual layers across multiple workers.
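A rough sketch of the data-parallel route with tf.distribute.MultiWorkerMirroredStrategy (TF 2.x; the TF_CONFIG hosts below are placeholders for your own machines, and each worker runs the same script with its own index):

```python
import json
import os

# TF_CONFIG must be set before the strategy is created. The host names are
# placeholders; worker 0 uses index 0, worker 1 uses index 1, and so on.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0},
})

import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

# model.fit(dataset, ...) then trains the same model on every worker, with
# gradients all-reduced after each step -- effectively one larger global batch.
```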
I'm using ML Engine to run a hyperparameter tuning job for a Keras/TensorFlow model. I originally set the machine type to complex_model_l, which costs $1.65/hour. However, I'm using TFRecords saved on Google Cloud Storage for my training and validation sets.
Given that they only take up ~6 GB of space combined, is there any need for such a large machine? Could I use a standard machine (which costs $0.27/hour) and run the tuning job just as quickly?
Any advice would be awesome! I'm just not sure to what degree TensorFlow can make use of multiple cores by default.
Thanks!
The cluster I am using has 4 NVIDIA GPUs (P100) per node. I have TensorFlow code that I need to run. It takes many hours to complete, and I tried to use all 4 GPUs available on the node, but it looks like it runs slower with all 4 GPUs than with only 1 GPU, and I am not sure why. What is the best strategy to determine how many GPUs I should use for my problem?
It is possible that you didn't structure your code optimally for multi-GPU training, for example if you distributed the model layer-wise across the GPUs. Generally, training speed should scale roughly linearly with the number of GPUs.
Please refer to this answer for the options you have to adapt your network to multi-GPU training.
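As a concrete sketch of the data-parallel alternative with MirroredStrategy (layer sizes and batch numbers are placeholders; the key point is scaling the global batch size with the replica count, since tiny per-GPU batches plus all-reduce overhead can easily make 4 GPUs slower than 1):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()        # uses all visible GPUs
per_gpu_batch = 64                                 # tune for your model/memory
global_batch = per_gpu_batch * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# dataset = ...  # your tf.data pipeline, batched with global_batch
# model.fit(dataset, epochs=10)
```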
I'd like to run a TensorFlow application using multiple GPUs on Cloud ML.
My TensorFlow application is written in the non-distributed paradigm that is outlined here
From what I understand, if I want to use Cloud ML to run this same application with multiple GPUs, then the application must use the CUSTOM scale tier and I need to set up parameter servers and worker servers, which seems to be the distributed TensorFlow paradigm. Link here
Is this the only way to run multiple GPU training jobs on Cloud ML?
Is there a guide that helps me scope the changes required to go from my multi-GPU (tower-based) training application to a distributed TensorFlow application?
You can use the CUSTOM tier with only a single master node and no workers/parameter servers; those are optional parameters.
Note that complex_model_m_gpu has 4 GPUs, and complex_model_l_gpu has 8.
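For reference, a hedged config.yaml sketch for such a single-master GPU job (the machine type names are the Cloud ML Engine tiers mentioned above; pass the file via the --config flag when submitting the training job):

```yaml
# config.yaml -- CUSTOM tier with a single GPU master, no workers or
# parameter servers, so a non-distributed (tower-based) script still works.
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m_gpu   # 4 GPUs; use complex_model_l_gpu for 8
```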