Train two models on two GPUs

I have two models (Theano scripts) that I want to train and evaluate.
I have two GPUs I can use to train them.
How can I run a model on each GPU at the same time?

When running your scripts you can choose which device each program will run on with THEANO_FLAGS:
THEANO_FLAGS='device=gpu0' python script_1.py
THEANO_FLAGS='device=gpu1' python script_2.py
Change gpuX for each GPU (e.g. gpu0, gpu1, gpu2, ...).
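If you want to launch both jobs from one place instead of two shell commands, here is a minimal Python sketch (assuming the script names script_1.py and script_2.py from above) that sets THEANO_FLAGS per process via subprocess:

import os
import subprocess

# Launch each training script in its own process with its own THEANO_FLAGS,
# so Theano places each run on a different GPU.
jobs = []
for script, device in [("script_1.py", "gpu0"), ("script_2.py", "gpu1")]:
    env = dict(os.environ, THEANO_FLAGS="device=" + device)
    jobs.append(subprocess.Popen(["python", script], env=env))

for job in jobs:
    job.wait()  # block until both trainings have finished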

Related

How to use multiple GPUs for separate training with Tensorflow?

I have looked through many articles and posts about using multiple GPUs with TensorFlow. Most of them cover how to use GPUs in parallel to train a single neural network (NN), but I have a different question: can separate GPUs be used to train different NNs at the same time?
More details:
I have neural networks A and B, and GPUs GPU1 and GPU2. I want to train network A on GPU1 and network B on GPU2 at the same time. Is it possible?
I suggest using two separate Python scripts, such as trainA.py and trainB.py, to train the two networks.
In the first two lines of trainA.py you select your preferred GPU.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
For trainB.py you select the other GPU:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
Now you should be able to run both training scripts at the same time.
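Here is a minimal sketch of what trainA.py might look like; the tiny Keras model and the random data are placeholders for your real network A and its dataset. The key detail is that CUDA_VISIBLE_DEVICES is set before TensorFlow is imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before importing TensorFlow

import numpy as np
import tensorflow as tf  # TensorFlow now only sees GPU 0

# Placeholder model and data standing in for network A and its dataset.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x_train = np.random.rand(64, 4).astype("float32")
y_train = np.random.randint(0, 10, size=64)
model.fit(x_train, y_train, epochs=1)

trainB.py would look the same, with "1" instead of "0" on the second line.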

Using ssd_inception_v2 to train on different resolution

The dataset contains images of different sizes.
The pretrained weights are trained on 300x300 resolution.
I am training on widerface dataset where objects are as small as 15x15.
Q1. I want to train at 800x800 resolution. Do I need to resize all the images manually, or will this be done by TensorFlow automatically?
I am using the following command to train:
python3 /opt/github/models/research/object_detection/legacy/train.py --logtostderr --train_dir=/opt/github/object_detection_retraining/wider_face_checkpoint/ --pipeline_config_path=/opt/github/object_detection_retraining/models/ssd_inception_v2_coco_2018_01_28/pipeline.config
Q2. I also tried training using model_main.py, but after 1000 iterations it is evaluating the dataset at each iteration.
I am using the following command to train:
python3 /opt/github/models/research/object_detection/model_main.py --num_train_steps=200000 --logtostderr --model_dir=/opt/github/object_detection_retraining/wider_face_checkpoint/ --pipeline_config_path=/opt/github/object_detection_retraining/models/ssd_inception_v2_coco_2018_01_28/pipeline.config
Q3. Also, if you can suggest any model I should use for real-time face detection apart from MobileNet and Inception, please do.
Thanks.
Q1. No, you do not need to resize the images manually. See this detailed answer.
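For reference, the input resolution is controlled by the image_resizer block in pipeline.config; a sketch of the relevant part when switching to 800x800 might look like this (field names as used by the TF Object Detection API; compare against your own config file before editing):

model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 800
        width: 800
      }
    }
  }
}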
Q2. By 1000 iterations you mean 1000 steps, right? (A step processes a single batch; a complete pass over the dataset is usually called an epoch.) The model usually performs evaluation after a certain amount of time, e.g. every 10 minutes: a checkpoint is saved and the model is evaluated on the evaluation set.
Q3. SSD models with MobileNet are among the fastest detectors; apart from that, you can try YOLO models for real-time detection.

Assign Torch and Tensorflow models two separate GPUs

I am comparing two pre-trained models, one in TensorFlow and one in PyTorch, on a machine that has multiple GPUs. Each model fits on one GPU, and both are loaded in the same Python script. How can I assign one GPU to the TensorFlow model and another GPU to the PyTorch model?
Setting CUDA_VISIBLE_DEVICES=0,1 only tells both frameworks that these GPUs are available; how can I (within Python, I guess) make sure that TensorFlow takes GPU 0 and PyTorch takes GPU 1?
You can refer to torch.device. https://pytorch.org/docs/stable/tensor_attributes.html?highlight=device#torch.torch.device
In particular, do
device = torch.device("cuda:0")
tensor = tensor.to(device)
or, to load a pretrained model,
device = torch.device("cuda:0")
model = model.to(device)
to put the tensor/model on GPU 0.
Similarly, TensorFlow has tf.device: https://www.tensorflow.org/api_docs/python/tf/device. Its usage is described here: https://www.tensorflow.org/guide/using_gpu
For TensorFlow to load the model on GPU 0, do:
with tf.device("/gpu:0"):
    load_model_function(model_path)
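Putting both together in a single script, here is a minimal sketch that pins a TensorFlow model to GPU 0 and a PyTorch model to GPU 1 (the tiny Dense/Linear models below are placeholders for your real pretrained models):

import tensorflow as tf
import torch

# TensorFlow: place this model's ops and variables on GPU 0.
with tf.device("/gpu:0"):
    tf_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])

# PyTorch: move this model's parameters to GPU 1.
torch_device = torch.device("cuda:1")
torch_model = torch.nn.Linear(10, 1).to(torch_device)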

Merge weights of same model trained on 2 different computers using tensorflow

I was doing some research on training deep neural networks using TensorFlow. I know how to train a model. My problem is that I have to train the same model on two different computers with different datasets and then save the model weights. Later I have to merge the two weight files somehow, but I have no idea how. Is there a function that does this, or should the weights be averaged?
Any help on this problem would be appreciated.
Thanks in advance.
There is no meaningful way to merge the weights: you cannot average or combine them in any way, as the result would not mean anything. What you could do instead is combine predictions, but for that the training classes have to be the same.
This is not a programming limitation but a theoretical one.
It is better to merge weight updates (gradients) during training and keep a common set of weights than to try to merge the weights after the individual trainings have completed. Two individually trained networks may each find a different optimum, and e.g. averaging the weights may give a network that performs worse on both datasets.
There are two things you can do:
Look at 'data parallel training': distributing forward and backward passes of the training process over multiple compute nodes each of which has a subset of the entire data.
In this case typically:
each node propagates a minibatch forward through the network
each node propagates the loss gradient backwards through the network
a 'master node' collects gradients from minibatches on all nodes and updates the weights correspondingly
and distributes the weight updates back to the compute nodes to make sure each of them has the same set of weights
(There are variants of the above that avoid compute nodes idling for too long while waiting for results from others.) The above assumes that the TensorFlow processes running on the compute nodes can communicate with each other during training.
See https://www.tensorflow.org/deploy/distributed for more details and an example of how to train networks over multiple nodes; a toy sketch of the gradient-averaging idea is shown after this list.
If you really have to train the networks separately, look at ensembling, see e.g. this page: https://mlwave.com/kaggle-ensembling-guide/. In a nutshell, you would train the individual networks on their own machines and then e.g. use an average or maximum over the outputs of both networks as a combined classifier / predictor; a sketch follows below.
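First, a toy NumPy sketch of the gradient-averaging idea from the list above (the shapes and "gradients" are made up; in practice TensorFlow's distributed runtime does this bookkeeping for you):

import numpy as np

# Shared weights kept by the master node.
weights = np.zeros(10)
learning_rate = 0.01

# Stand-ins for gradients computed by two compute nodes on their own minibatches.
node_gradients = [np.random.randn(10), np.random.randn(10)]

# The master averages the gradients, applies a single update, and would then
# send the updated weights back to every node so they all stay in sync.
weights -= learning_rate * np.mean(node_gradients, axis=0)

And a sketch of the ensembling alternative, assuming both separately trained networks output probabilities over the same classes:

import numpy as np

def ensemble_predict(probs_a, probs_b):
    """Average the class probabilities of two models and pick the best class."""
    avg = (probs_a + probs_b) / 2.0
    return np.argmax(avg, axis=1)

# Example with made-up predictions for 3 samples and 4 classes.
probs_a = np.random.dirichlet(np.ones(4), size=3)
probs_b = np.random.dirichlet(np.ones(4), size=3)
print(ensemble_predict(probs_a, probs_b))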

TensorFlow Slim - Clone on cpu

What does 'Use CPUs to deploy clones' mean in the following snippet (slim/train_image_classifier.py):
tf.app.flags.DEFINE_boolean(
    'clone_on_cpu', False,
    'Use CPUs to deploy clones.')
Regarding 'Use CPUs to deploy clones':
In general setup model losses and gradients are calculated on GPUs, a single clone use a single GPU. For multi GPUs training multiples clones are created. If you have 4 GPUs 4 clones are created and loss for separate batches are computed simultaneously (data parallelism). That said, Now if you don't have GPUs you can use multiple CPUs to for data parallelism ( will be slower than GPU off course). USE CPUs to deploy clones option let you use CPUs for data parallelism; to compute model losses and gradients on cpus.